Standardized Testing in Plain Words (continued)
Last post I wrote a little about local norms versus national norms and gave the example of how the best-performing student in the area can still be below grade level.
Today, I want to talk a little about tests. As I mentioned previously, when we conducted the pretest prior to students playing our game, Spirit Lake, the average student scored 37% on a test of mathematics standards for grades 2-5. These were questions that required them to, say, subtract one three-digit number from another or multiply two one-digit numbers.
Originally, we had written our tests to model the state standardized tests which, at the time, were multiple choice. This ended up presenting quite a problem. Here is a bit of test theory for you. A test score is made up of two parts – true score variance and error variance.
True score variance exists when Bob gets an answer right and Fred gets it wrong because Bob really knows more math (and the correct answer) compared to Fred.
Error variance occurs when, for some reason, Bob gets the answer right and Fred gets it wrong even though there really is no difference between the two. That is, the variance between Fred and Bob is an error. (If you want to be picky about it, you would say it was actually the variance from the mean that was the error, but just hush.)
How could this happen? Well, the most likely explanation is that Bob guessed and happened to get lucky. (It could happen for other reasons – Fred really knew the answer but misread the question, etc.)
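If you like a formula with your plain words, classical test theory writes it this way:

Observed score = True score + Error

Var(Observed) = Var(True) + Var(Error)

Reliability is just the share of the observed variance that is true score variance, Var(True) / Var(Observed). The more guessing there is, the bigger the error piece and the lower the reliability.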
If very little guessing occurs on a test, or if guesses have very little chance of being correct, then you don’t have to worry too much.
However, the test we used initially had four answer choices for each question. The odds of guessing correctly were 1 in 4, that is, 25%. Because students turned out to be substantially further below grade level than we had anticipated, they did a LOT of guessing. In fact, for several of the items, the percentage of correct responses was close to the 25% students would get from randomly guessing.
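You can put a number on what guessing alone would do with a quick back-of-the-envelope check. The sketch below is mine, not part of our actual analysis, and it assumes the pretest was the same 24-item, four-choice test scored later in this post. It uses the SAS binomial functions to show that pure guessing gets you 6 of 24 items (25%) on average, and to compute the probability of a student hitting 9 of 24 (about 37%, our pretest average) by luck alone.

SAS CODE FOR A GUESSING CHECK

DATA guessing ;
   /* expected number right from pure guessing: n * p = 24 * .25 = 6 items, i.e., 25% */
   expected_right = 24 * 0.25 ;
   /* probability of 9 or more right out of 24 by guessing: P(X >= 9) = 1 - P(X <= 8) */
   p_nine_or_more = 1 - CDF('BINOMIAL', 8, 0.25, 24) ;
   PUT expected_right= p_nine_or_more= ;
RUN ;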
When we computed the internal consistency reliability coefficient (Cronbach's alpha), which measures the degree to which items on a test correlate with one another, it was a measly .57. In case you are wondering, no, this is not good. It indicates a relatively high degree of error variance. So, we were sad.
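For the curious, the formula behind alpha is short. For a test of k items:

alpha = ( k / (k - 1) ) * ( 1 - (sum of the item variances) / (variance of the total score) )

If the items barely correlate with one another, the variance of the total score is close to the sum of the item variances, that ratio approaches 1, and alpha heads toward zero. A pile of lucky and unlucky guesses looks exactly like items that do not correlate.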
SAS CODE FOR COMPUTING ALPHA
PROC CORR DATA = mydataset NOCORR ALPHA ; /* ALPHA requests coefficient alpha, NOCORR suppresses the correlation matrix */
   VAR item1 - item24 ;
RUN ;
The very simple code above will give you coefficient alpha as well as descriptive statistics for each item. Since we very wisely scored our items 0 = wrong, 1 = right, a mean of, say, .22 would indicate that only 22% of students answered that item correctly.
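If your raw file has the letter each student marked rather than 0/1 scores, a data step like the sketch below does that scoring first. Every dataset and variable name here other than item1 - item24 (rawanswers, answer1 - answer24, key1 - key24) is made up for the example.

DATA scored ;
   SET rawanswers ;                          /* one row per student (hypothetical dataset) */
   ARRAY answer {24} $ answer1 - answer24 ;  /* the choice the student marked */
   ARRAY key    {24} $ key1 - key24 ;        /* the correct choice for each item */
   ARRAY item   {24}   item1 - item24 ;      /* scored items: 0 = wrong, 1 = right */
   DO i = 1 TO 24 ;
      item{i} = ( answer{i} = key{i} ) ;     /* a SAS comparison evaluates to 1 or 0 */
   END ;
   DROP i ;
RUN ;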
To find out how we fixed this, read the next post.
Ooooh – a cliffhanger before Thanksgiving, no less!
I always thought my statistics professor was being a smartass when he would refer to multiple “choice” tests as multiple “guess”. Hmmmm