Standardized Testing in Plain Words (continued)
Last post I wrote a little about local norms versus national norms and gave the example of how the best-performing student in the area can still be below grade level.
Today, I want to talk a little about tests. As I mentioned previously, when we conducted the pretest prior to students playing our game, Spirit Lake, the average student scored 37% on a test of mathematics standards for grades 2-5. These were questions that required them to, say, subtract one three-digit number from another or multiply two one-digit numbers.
Originally, we had written our tests to model the state standardized tests which, at the time, were multiple choice. This ended up presenting quite a problem. Here is a bit of test theory for you. A test score is made up of two parts – true score variance and error variance.
True score variance exists when Bob gets an answer right and Fred gets it wrong because Bob really knows more math (and the correct answer) compared to Fred.
Error variance occurs when, for some reason, Bob gets the answer right and Fred gets it wrong even though there really is no difference between the two. That is, the variance between Fred and Bob is an error. (If you want to be picky about it, you would say it was actually the variance from the mean that was the error, but just hush.)
How could this happen? Well, the most likely explanation is that Bob guessed and happened to get lucky. (It could happen for other reasons – Fred really knew the answer but misread the question, etc.)
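If you like a formula with your plain words, classical test theory writes it this way:

Observed score = True score + Error

Var(Observed) = Var(True) + Var(Error)

Reliability is just the share of the observed variance that is true score variance, Var(True) / Var(Observed). The more guessing there is, the bigger the error piece and the lower the reliability.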
If very little guessing occurs on a test, or if guesses have very little chance of being correct, then you don’t have to worry too much.
However, the test we used initially had four answer choices for each question. The odds of guessing correctly were 1 in 4, that is, 25%. Because students turned out to be substantially further below grade level than we had anticipated, they did a LOT of guessing. In fact, for several of the items, the percentage of correct responses was close to the 25% students would get from randomly guessing.
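You can put a number on what guessing alone would do with a quick back-of-the-envelope check. The sketch below is mine, not part of our actual analysis, and it assumes the pretest was the same 24-item, four-choice test scored later in this post. It uses the SAS binomial functions to show that pure guessing gets you 6 of 24 items (25%) on average, and to compute the probability of a student hitting 9 of 24 (about 37%, our pretest average) by luck alone.

SAS CODE FOR A GUESSING CHECK

DATA guessing ;
   /* expected number right from pure guessing: n * p = 24 * .25 = 6 items, i.e., 25% */
   expected_right = 24 * 0.25 ;
   /* probability of 9 or more right out of 24 by guessing: P(X >= 9) = 1 - P(X <= 8) */
   p_nine_or_more = 1 - CDF('BINOMIAL', 8, 0.25, 24) ;
   PUT expected_right= p_nine_or_more= ;
RUN ;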
When we computed the internal consistency reliability coefficient (Cronbach's alpha), which measures the degree to which items on a test correlate with one another, it was a measly .57. In case you are wondering, no, this is not good. It indicates a relatively high degree of error variance. So, we were sad.
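For the curious, the formula behind alpha is short. For a test of k items:

alpha = ( k / (k - 1) ) * ( 1 - (sum of the item variances) / (variance of the total score) )

If the items barely correlate with one another, the variance of the total score is close to the sum of the item variances, that ratio approaches 1, and alpha heads toward zero. A pile of lucky and unlucky guesses looks exactly like items that do not correlate.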
SAS CODE FOR COMPUTING ALPHA
PROC CORR DATA = mydataset NOCORR ALPHA ; /* ALPHA requests coefficient alpha, NOCORR suppresses the correlation matrix */
   VAR item1 - item24 ;
RUN ;
The very simple code above will give you coefficient alpha as well as descriptive statistics for each item. Since we very wisely scored our items 0 = wrong, 1 = right, a mean of, say, .22 would indicate that only 22% of students answered that item correctly.
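If your raw file has the letter each student marked rather than 0/1 scores, a data step like the sketch below does that scoring first. Every dataset and variable name here other than item1 - item24 (rawanswers, answer1 - answer24, key1 - key24) is made up for the example.

DATA scored ;
   SET rawanswers ;                          /* one row per student (hypothetical dataset) */
   ARRAY answer {24} $ answer1 - answer24 ;  /* the choice the student marked */
   ARRAY key    {24} $ key1 - key24 ;        /* the correct choice for each item */
   ARRAY item   {24}   item1 - item24 ;      /* scored items: 0 = wrong, 1 = right */
   DO i = 1 TO 24 ;
      item{i} = ( answer{i} = key{i} ) ;     /* a SAS comparison evaluates to 1 or 0 */
   END ;
   DROP i ;
RUN ;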
To find out how we fixed this, read the next post.
Ooooh – a cliffhanger before Thanksgiving, no less!
I always thought my statistics professor was being a smartass when he would refer to multiple “choice” tests as multiple “guess”. Hmmmm