I just want to point out that schools should use median grades, not mean grades. For example, the grades A, A, A-, B, C have a median of A- but a mean of B+. A B+ for getting mostly As is stupid. Or A, A, A, A-, A-, C, C, F, F can average to a C!! Fs count too much with mean grading.
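
The first example is easy to check with a few lines of Python. This is only a sketch: the 4.0-scale point values in `POINTS` are an assumption (schools differ), and `nearest_letter` and `summarize` are made-up helper names.

```python
from statistics import mean, median_low

# Assumed 4.0-scale point values; real schools vary.
POINTS = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
          "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0}

def nearest_letter(value):
    """Map a point value back to the closest letter grade."""
    return min(POINTS, key=lambda letter: abs(POINTS[letter] - value))

def summarize(letters):
    """Return (median letter, mean letter) for a list of letter grades."""
    points = [POINTS[g] for g in letters]
    # median_low always returns one of the actual grades in the list.
    return nearest_letter(median_low(points)), nearest_letter(mean(points))

median_grade, mean_grade = summarize(["A", "A", "A-", "B", "C"])
print(median_grade, mean_grade)  # A- B+
```

How the second example comes out depends on the scale, in particular on how many points an F is worth.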

On the other hand, mean grading is good at forcing children to learn what they hate most.

The problem is your sample sizes are too small. If enough samples of the student's work/knowledge (whatever that means) were included in the formula then the median/mean distinction would make little difference. A way to approximate the effect you desire is to toss out outliers (i.e. drop the lowest couple of scores), which is something teachers often do anyway. (Think homework assignments, where typically even the really good students will have a bunch of A's but then one Zero because they were sick that week or whatever...)
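
To make the drop-the-lowest idea concrete, here's a small Python sketch. The scores are made up for illustration (five strong homeworks and one zero), and `drop_lowest` is a hypothetical helper, not something out of a real gradebook.

```python
from statistics import mean, median

def drop_lowest(scores, k=1):
    """Return the scores with the k lowest values removed."""
    return sorted(scores)[k:]

# Made-up homework scores: mostly A's plus one zero from a sick week.
scores = [95, 92, 98, 0, 94, 96]

print("plain mean:      ", round(mean(scores), 1))
print("drop-lowest mean:", mean(drop_lowest(scores)))
print("median:          ", median(scores))
```

With these numbers the single zero drags the plain mean down into the high 70s, while dropping it or taking the median both land in the mid 90s.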

My ideal dreamworld grading system would have these features (think a math/science class):

-gotta be based on summing the results of lots of little questions/problems, not a few Big ones

-all grades out of 2 "points". 0=not done/sucky. 1=partial credit. 2=correct.

-did I say LOTS?

-graded on a curve

The curve part is to normalize for how sucky the teacher is in teaching or selecting course material; if the teacher sucks/teaches stuff that's too difficult so that most scores are between 40-50%, that doesn't mean they should all fail, it means 50+ is an A. Remove the teacher from the equation if possible.

The lots-of-little-questions thing is just to use the law of averages so that students don't get killed because they misunderstood one problem statement. You get enough samples so that the grade is a true assessment and randomness/luck (i.e. the student just *happened* to study a problem just *like* that the night before...) plays as small a role as possible. Hopefully the central limit theorem helps you get a true "bell curve" as well, making final categorization easier.. ;-)

The out-of-two-points thing is for two reasons.

One is that it eliminates, as much as possible, the factor of teacher favoritism/subjectivity in grading. I mean really WTF does "you get 7/10 on this problem!" mean, anyway? The grader has to constantly make those little judgments, and then keep in mind who got 6 for what, who got 7 for what, who got 9 for what, to make sure the partial-credit is assigned rationally... and it's all largely a bogus process (starts to feel like being a figure-skating judge). You give a student 22/30 for such and such problem, he's GOING to come to your office hours and ask why not 23 or 24, and WTF do you say? He's got a point after all. Now, the really gutsy thing to do then would be to grade each problem out of 1: Yes or no, you got it right, or not. However, students would whine too much about that, so that's why I'd go for the 2-point system.

The other reason for the 2-point system is that it's much easier on the person who has to grade these things, for reasons described above :-)

Of course I come at this from the point of view of someone who's taught/graded, and I realize you'll probably be horrified at some of my assumptions here (for example: lots of homeworks & quizzes with lots of problems... I'd guess you're against stuff like homework & quizzes...)

I've seen a pretty cool way of avoiding the arbitrary weighting problem for individual questions/problems, but it probably requires software to implement...

What you do is make the value of an answer proportional to the number of people who missed it (or answered worse, if there are multiple levels of credit for the problem). That way, what's difficult and what's easy is determined by how people actually performed, rather than by how hard (or important) the teacher decided it was.
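
Here's a rough Python sketch of how that could work, as I read it. It's an interpretation, not the actual software: `inverse_scores` is a made-up name, the three students and their 0/1/2 credit levels are invented, and "proportional to" is taken with a constant of 1, so each answer is worth exactly the number of people who answered that question worse.

```python
def inverse_scores(results):
    """results maps student name -> list of per-question credit levels.

    Each answer is worth the number of students who did strictly worse
    on that question, so questions everyone aced are worth nothing.
    """
    students = list(results)
    n_questions = len(next(iter(results.values())))
    totals = {s: 0 for s in students}
    for q in range(n_questions):
        levels = [results[s][q] for s in students]
        for s in students:
            totals[s] += sum(1 for other in levels if other < results[s][q])
    return totals

# Invented data: three questions, credit levels 0/1/2.
results = {
    "ann": [2, 2, 2],
    "bob": [2, 0, 1],
    "cat": [2, 0, 0],
}
print(inverse_scores(results))  # {'ann': 4, 'bob': 1, 'cat': 0}
```

The first question, which everybody got right, contributes nothing to anyone's total; the points all come from the questions that separated people.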

Also, this post reminds me of this geeky pun:

When she told me I was average, she was just being mean.

Neat idea Gil, but it does over-weight bad questions.

Blixa,

Drop the lowest is done sometimes, but median is a more unified/consistent way to get the same sort of effect. The effect being that A, A, A, A, C and A, A, A, A, B are equal. Mean grading on the other hand is designed to make you have to try your hardest on every last thing to avoid being screwed, which is cruel. If you start with one zero, you need 14 perfects in a row to get up to an A (assuming cutoff is 93.0). That's absurd.
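
The 14 can be checked with a tiny Python sketch. It assumes every assignment is out of 100 and weighted equally; `perfects_needed` is just a name made up for this.

```python
def perfects_needed(cutoff=93.0, zeros=1, perfect=100):
    """Smallest number of perfect scores that lifts the mean to the cutoff
    when the record also contains `zeros` scores of zero.
    Assumes perfect > cutoff, otherwise the loop would never end."""
    n = 0
    # mean = perfect * n / (n + zeros); compare without dividing.
    while perfect * n < cutoff * (n + zeros):
        n += 1
    return n

print(perfects_needed())         # 14: one zero, A cutoff at 93.0
print(perfects_needed(zeros=2))  # 27: two zeros
```

Thirteen perfects only get you to 1300/14, about 92.9, so the fourteenth really is needed.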

I don't like curves because they fail people. Everyone who wants to be there and isn't fully incompetent should pass.

Mean vs median can make a *huge* difference even with a large sample size. Using 0/1/2 grading:

00000000000000011111111111222222222222222222222222222222

mean: about 1.3, I'm guessing from looking. median 2.
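
For anyone who'd rather not eyeball it, here's a quick Python check of that string. The digits are copied from the line above, and the code counts them itself rather than relying on a hand count.

```python
from statistics import mean, median

# The 0/1/2 grades from the example, one digit per graded problem.
grades = "00000000000000011111111111222222222222222222222222222222"
scores = [int(ch) for ch in grades]

print("count: ", len(scores))
print("mean:  ", round(mean(scores), 2))
print("median:", median(scores))
```

The mean comes out close to the 1.3 guess, and the median is 2 because more than half of the grades are 2s.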

Ah well we're sort of talking about different things right here, but it's my fault cuz I kinda contradicted myself.

I said median/mean makes little difference w/ enough samples, which would generally be *true* if we were talking about something like: 20 homework assignments, each one's worth 100, and the scores spread rather nicely from 60-100, which is what I was actually thinking of at that point, cuz that's typical.

But then I forgot about that whole deal and went on to fantasize about my 0-1-2 grading system. Yes, median/mean is obviously more likely to make a difference when all the grades are concentrated on 3 spikes. *shrug* duh

The thing is that my 0-1-2 system is not intended for a class that calculates the final score via median, in the first place. The whole point was to *sum* lots and lots of these little numbers after all, because the whole point is that in the end you want a nice spread to differentiate among students.

If you told me I *had* to use the median I'd grade everything from 0-100 or whatever, like normal, so there's at least some decent spread built in at the beginning, and *then* calculating median wouldn't scare me so much. And like I said, in that case it would be similar to a "drop lowest" rule which is often done anyway.

I'm not opposed to median or anything, I'm just saying the typical change that would result probably isn't significant enough to be worth the effort. (okok there's no more *computational* "effort", just effort in explaining to students that this is what you're doing and so on :)

Not sure I understand your point about curves failing people. Curves don't fail people, People fail people ;-) Ok what I mean is, in the very next sentence you say people shouldn't fail unless they are incompetent or don't show. Well right. And that's who will fail, curve or no. It sounds like you have in mind some robotically-applied curve (i.e. stoopid hard and fast rules about standard deviations etc.) which ends up binning people into the F category who shouldn't be. Well I agree, I wouldn't robotically apply a curve or use standard deviation rules at all, that's not what I meant by "curve". There are many ways to curve. Typically I'd make a histogram and look for clusters, and go from there...

i.e. If there are 20 students and 8 of them get between 30-50% of the points (with a couple in the 10% range) I'm not gonna Fail the 30%-50%ers just because mathematically they're 2 S.D.'s below the class mean or whatever the Orthodox Curve Rule supposedly is... that's not what I meant at all. That such a large number got that small percentage probably indicates that something I did was too difficult, so that cluster gets a break (like probably a C or whatever). And *that's the only purpose of the curve*, to give them a break. (The curve I have in mind is one that would only increase a student's grade, never decrease it.)

ps Why do I type so much?

my understanding of a curve is you give out the same number of As and Fs, Bs and Ds. I think in most classes most people should get As. anything else hurts people for no good reason.

one thing about using means with a system where scores normally fall in the 60-100 range is that it lets teachers hurt students a lot with super low grades whenever they feel like it. and this is standard policy when you don't do an assignment, or even turn one in late (often that means half credit! half is outside the 60-100 range)

"my understanding of a curve is you give out the same number of As and Fs, Bs and Ds."

Whoa. That's a crazy way to curve & I've never encountered that. For the record: I'm NOT advocating things that are obviously dumb-ass, ok? ;-)

Usually "curve" just means you are graded with a view to what the rest of the class's scores are, rather than with respect to some stupid predetermined numerical scale (i.e. 90=A, 80=B etc). In practice I've never encountered a curve which "hurts" a student's grade. Under a curve all students *could* get an A, but (usually) what is prevented is a situation where all or most of them fail just because their numerical score is below some pre-set F number.

"I think in most classes most people should get As. anything else hurts people for no good reason."

I disagree. If most people get A's that seems to defeat the purpose of grades, why not just say "attended"? Now, I don't necessarily think grades are a great idea to begin with, but if you're gonna have them you may as well use them.

Also the demand (from e.g. colleges) for separating students according to how they did, is not going to go away, so when you inflate grades like this all that happens is that the student's room for error is smaller. Under a sane grading system, C is fine, B is good and A- is very good. Under your (inflated) grading system, some kids'll think A- a horrible failure and kill themselves over it. Now this leads to pressure on teachers to never give A- or lower (heck isn't this what happens at like Yale?), but that leads to all grades being A's - so why even have grades? Well, like it or not, they *are* needed for some reasons (i.e. colleges)

Inflating grades (and similar things) seems to come from human nature however. Again the figure-skating comparison comes to mind. They pretend to grade out of 6.0 but no matter how badly one screws up they ain't getting lower than 5.0. This drives me nuts and rationally I think why not just subtract 5.0 from everyone's score and score from 0.0-1.0? Well because human nature would consider .6/1.0 a horrible score even if 5.6/6.0 for the same performance is fine. So scores would be automatically re-inflated so that no one gets lower than .7. But then I'd wonder why not just subtract .7 from everyone's score and grade from 0-.3. And so on....

The point is if you got your way and inflated grades so most people get As, the demand for sorting students, which doesn't go away, would (probably) lead to further stratification - say a system of A--, A-, A, A+, A++. I'm sure you can see how this would end up being just a re-labeling of the existing grade system we already have.

If you're going to insist that grades *don't* stratify (i.e. no plus/minus), that most students just get "A", the problem is that grades will lose their usefulness altogether for how they are used. But again, the demand for sorting students by performance can't go away, so some other (perhaps stealth) measure would have to be used. What? I have no idea but I doubt it would be any better than the grading system we already have....

Elliot,

Bad questions are going to be a problem for any grading system.

I saw the inverse-scoring system I described above at car rallyes (brain-teasers on wheels, where people would follow complex instructions carefully and answer questions along the way, avoiding logical tricks). At the finish, each entrant would receive the official Answers and Explanations. He would also have a chance to protest any wrong answers he got where he thought his answer was more correct than the official one, or that he answered as he did because of a problem with the question, rather than because he failed to catch the intended trap. The rallyemaster would usually grant full credit for a valid protest, or throw out a question entirely if it seemed that it was sufficiently flawed that many people would answer for the wrong reasons.

Anyway, most teachers I have had were not open to this sort of protest system that would correct invalid test results. That's the problem, and I don't think that any grading system short of throwing out grades altogether will fix it.

