Day 2: Start-up News – Boring, Important Measurement

Bar graph showing percentage correct by item grade level

It never ceases to amaze me that intelligent people will spend huge amounts of time doing a literature review, designing elaborate theories, generating elegant hypotheses, selecting a three-stage stratified random sample, and performing multivariate analyses, yet the measures on which this brilliant study rests are some questions they made up with their three best friends over Chardonnay during happy hour one Friday night. This is also known as the “panel of experts” method, and it has the added benefit of allowing you to deduct the wine on your taxes. (Not actual tax advice. Consult your accountant. Of course, if you are doing your 1040 based on reading this blog, you are probably beyond help.)

We did not go with this approach. Our original idea was to use released items from the North Dakota state standards test but, unfortunately, North Dakota is one of the states that never releases items. Instead, we found North Dakota standards that were worded verbatim the same as other states' standards and then found released items from those states. For example, the standard might be

“Compute a given percent of a whole number”

and the problem would be

“What is 40% of 250?”

with the same four multiple choice options that had been used on the state test.

As someone pointed out, even if the test as a whole had not been validated previously, the individual items had been, because we pulled only the items that tested exactly what we included in the game. So, we had content validity.

One piece of evidence for construct validity came from the item difficulty levels. Here is one of several charts. It shows the percentage of fourth-grade students who answered each item correctly, with the items grouped by grade level. It is also important to know that the state tests showed the majority of students at this school to be low-performing in mathematics. As students go from second-grade-level items, every one of which the majority of students answered correctly, to fifth-grade-level items, the percentage correct declines. On the fifth-grade items, there was only one where the students exceeded the 25% correct that random guessing would produce (remember, there were four multiple-choice options).
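The comparison behind a chart like this can be sketched in a few lines of Python. This is a minimal illustration, not the analysis the study actually ran: the function names, the response format (one dictionary per student mapping a hypothetical item ID to the option chosen) and the toy data are all invented for the example. It computes the proportion of students answering each item correctly, groups items by the grade level of the standard they test, and flags items whose proportion correct exceeds the 25% chance rate for four options.

```python
from collections import defaultdict

CHANCE_RATE = 1 / 4  # four answer options, so blind guessing gets about 25% right

def percent_correct_by_item(responses, answer_key):
    """Proportion of students answering each item correctly.

    responses: list of dicts, one per student, mapping item ID -> option chosen
    answer_key: dict mapping item ID -> correct option
    Students who skipped an item are left out of that item's denominator.
    """
    proportions = {}
    for item, correct_option in answer_key.items():
        answers = [r[item] for r in responses if item in r]
        if answers:
            proportions[item] = sum(a == correct_option for a in answers) / len(answers)
    return proportions

def summarize_by_grade_level(proportions, item_grade):
    """Group items by the grade level of the standard they test.

    Returns {grade: [(item, proportion_correct, exceeds_chance), ...]},
    sorted by grade so a decline across levels is easy to see.
    """
    by_level = defaultdict(list)
    for item, p in proportions.items():
        by_level[item_grade[item]].append((item, p, p > CHANCE_RATE))
    return dict(sorted(by_level.items()))

# Hypothetical toy data, invented for illustration only (not study results).
answer_key = {"g2_a": "B", "g4_a": "C", "g5_a": "A"}
item_grade = {"g2_a": 2, "g4_a": 4, "g5_a": 5}
responses = [
    {"g2_a": "B", "g4_a": "C", "g5_a": "D"},
    {"g2_a": "B", "g4_a": "A", "g5_a": "B"},
    {"g2_a": "B", "g4_a": "D", "g5_a": "C"},
    {"g2_a": "A", "g4_a": "B", "g5_a": "D"},
]

summary = summarize_by_grade_level(percent_correct_by_item(responses, answer_key), item_grade)
for grade, graded_items in summary.items():
    for item, p, above_chance in graded_items:
        print(f"grade {grade}  {item}: {p:.0%} correct, above chance: {above_chance}")
```

This check is purely descriptive: with a small class, an item can land a little above 25% by luck alone.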

Since the state tests have shown these students to be performing poorly, we should see that they generally are not at grade level; that is, they do not answer many of the fourth-grade items correctly at a rate exceeding chance. That, as you can see from the chart, is exactly the situation.
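The post does not say whether a formal test was used to decide that an item was answered correctly “at a rate exceeding chance.” One common way to check, assumed here rather than taken from the study, is a one-sided exact binomial test against a guessing rate of 0.25, treating students' answers as independent. A minimal sketch using only the Python standard library (the function name `binomial_tail_p` is hypothetical):

```python
from math import comb

CHANCE_RATE = 0.25  # four multiple-choice options

def binomial_tail_p(k, n, p0=CHANCE_RATE):
    """Return P(X >= k) for X ~ Binomial(n, p0).

    k: number of students who answered the item correctly
    n: number of students who attempted the item
    A small value means k correct answers would be unlikely if every
    student were simply guessing.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical usage: 12 of 30 students answered an item correctly.
p_value = binomial_tail_p(12, 30)
print(f"P(12 or more correct by guessing) = {p_value:.4f}")
```

If SciPy is available, `scipy.stats.binomtest(12, 30, p=0.25, alternative='greater')` performs the same one-sided test. When many items are tested at once, some will cross a 0.05 threshold by chance, so a multiple-comparison adjustment may be worth considering.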

Of course, we did more than this, beginning with replicating this chart with fifth-graders, who showed much the same pattern but, as would be expected, answered a higher proportion of items correctly at each grade level than did the fourth-graders.

That’s the sort of thing that too many studies take for granted and never test. This isn’t the exciting part of creating a game, the part where you make an attack scene and the kid gets to shoot flaming arrows. So, what good does this do us? Well, taken together, the different analyses of the measure confirm that the test we used to find out whether students' mathematics achievement increased is, in fact, a valid measure of mathematics achievement.

Also, this method has the advantage that we are not required to share any of the wine with our best friends/expert panel, so we get to drink it all ourselves.
