Open Data & TIMSS: I am not a masochist (or a witch)
I’ve never understood masochists. Back in the days when I was competing I would regularly get calls from creepy men who were willing to pay me big bucks to beat them up. As one of my lovely daughters said of the sumo wrestler who sent her a picture of himself posing in his diaper-thingie and asked her for a date,
In a word – eeew!
I am not a masochist. Nor a witch, for that matter, in case anyone is interested. And yet, I find myself spending not hours, but days poring over codebooks and technical manuals to understand open data datasets. The latest is the TIMSS (Trends in International Mathematics and Science Study) dataset.
There does not seem to be any substantial resource for analysis of open data or for collaboration among the people doing it. I noticed that donorschoose.org has a competition going on, and that is certainly a worthwhile cause. It focuses on their own data, although they do suggest merging with other possible datasets. There are also the community forums on the data.gov website, which get surprisingly little traffic given that over 300,000 raw datasets are available. Since I could not find any place else to post this information, I am putting it here, largely for myself for later use, but also for anyone else who might be working with TIMSS or similar data and finds this information useful. If you do know of resources on analysis of open data, PLEASE post the information here!
Sampling – from the TIMSS technical report and user guide
I don’t believe anything anyone says unless I can prove it myself. My initial suspicion was that perhaps the country comparisons were not equivalent, that is, it may be that we had a very representative sample of students in the U.S. where other countries had more selective samples. For a HYPOTHETICAL example, if you had a country where nearly 100% of students get an education at least through the eighth grade (like in the U.S.) and you compared them to a country where the dropout rate before eighth grade was 50%, then the U.S. might appear to perform more poorly when that is not necessarily the case at all, because the other country would be testing only the students who stayed in school.
I still wonder if that might be true, but what I have concluded so far is that the TIMSS sample seems to be a pretty fair, representative sample of the U.S. Students from high poverty and central city schools are, in fact, slightly underrepresented, but the difference from the population is really so slight that it is not worth mentioning, even though I did just mention it.
What did I expect? Well, I hoped that this might be the case, but one reads everything about education in the media, from claims that the average teacher in Wisconsin makes $100,000 a year (AS IF!) to claims that high school dropout problems are due to illegal aliens (slight effect if you remove non-citizens from the data, but very small compared to other factors). The data on sampling available are extremely detailed and probably more complex than really necessary, in my opinion. However, my opinion is primarily based on the use I intend for these data, which is not the same as the main goal of TIMSS.
Personally, I’m very interested in what exactly American eighth-graders know at a micro-level, that is, which questions did they get right and which did they get wrong? One of the benefits of making your data open to everyone is that it can be applied to answer questions that were not part of your original study. It opens up the possibility of getting a great deal more in the way of analyses than your research team can do on their own.
Of course, it also means that others can scrutinize every bit of your research design and hold it up for criticism. So, kudos to the TIMSS team for taking the plunge!
Administration and instrument
TIMSS documentation states that the test is designed to measure five areas of mathematics and three levels of difficulty. There are fourteen versions of the test and students receive one at random. They can impute plausible values for the items that were not administered from the ones that were. I can speculate on some very good reasons for doing it that way. In particular, if you have one version of a very high stakes test, it won’t be that hard for people to get hold of that test and teach the answers to the exact questions that are on it. Having 14 different versions certainly makes cheating much harder. Regardless, I would have preferred for my purposes that everyone had been given the same test because I’d like a very large N. When I look at the item frequencies, each item says “NOT ADMINISTERED” for about 86% of the subjects, which means each item was administered to about 14% of them. Well, 1/14 is a lot less than 14%, in fact it is more like 7%, so, obviously, each of the 14 different parallel forms wasn’t completely different. If the forms were handed out in roughly equal numbers, 14% works out to a typical item appearing on about two of the 14 forms (2/14 is about 14.3%).
The good news so far is that TIMSS documentation and data have had almost everything I have wanted.
- There is a clear identification for each student whether an item was not answered because it wasn’t administered to a specific student or because there was no response.
- The data download as a text file but come with SAS files to read in the data.
- They have kindly included code for merging and missing values.
- Documentation is exhaustive (and I mean that literally!)
The bad news is that there is an almost overwhelming amount of information. The technical report and user guide is over 300 pages long. The codebook for the eighth grade mathematics achievement is 178 pages long and the eighth grade achievement dataset alone has 572 variables. I’m not particularly interested in science at this point, so I can drop all of those. On the other hand, there are no data on ethnicity or income in this dataset, so it is clear I am going to have to merge these data with the school and student files at some point.
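For later reference, a merge of that kind might look roughly like the sketch below. This is not the merge code that ships with TIMSS. The dataset names (ACHIEVE, STUDENT, MATH8_MERGED) and the key variable STUDENT_ID are placeholders I made up, so check the codebook for the real identifiers, and add more key variables to the BY statement if one is not enough to identify a student.

```sas
/* Sketch only: ACHIEVE, STUDENT, MATH8_MERGED and STUDENT_ID are
   made-up placeholder names, not the actual TIMSS names. */

/* Both datasets must be sorted by the key before a match-merge */
proc sort data=achieve;
   by student_id;
run;

proc sort data=student;
   by student_id;
run;

data math8_merged;
   merge achieve (in=in_achieve)
         student (in=in_student);
   by student_id;
   /* Keep only students who have an achievement record */
   if in_achieve;
run;
```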
The programs reference user-defined formats, so you can get format errors if you run them as is. You have three options. One is to just delete their FORMAT statement. A second is to create the formats, either temporarily (by copying the definitions to the top of your program or, more sensibly, by using a %INCLUDE statement) or permanently. The third is to use OPTIONS NOFMTERR when you don’t feel like messing with the formats. Usually I would not add formats permanently, but since I can already sense that I am going to be using these data a lot, I’m going to go ahead and do that this time.
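Here is roughly what the second and third options look like in code. Treat it as a sketch: the file path, the library name TIMSSLIB, and the YESNO format are all hypothetical stand-ins. The real format definitions are the ones in the programs that come with the data.

```sas
/* Option 3: do not stop on formats SAS cannot find.
   Values simply print unformatted. */
options nofmterr;

/* Option 2, temporary: re-create the formats in every session by
   including a file of PROC FORMAT code.
   The path and file name are hypothetical. */
%include "C:\timss\timss_formats.sas";

/* Option 2, permanent: store the formats in a catalog on disk.
   TIMSSLIB and its path are hypothetical. */
libname timsslib "C:\timss\formats";

proc format library=timsslib;
   /* Made-up example format, only to show the syntax */
   value yesno
      1 = "Yes"
      2 = "No";
run;

/* Tell SAS to look in the permanent catalog when resolving formats */
options fmtsearch=(timsslib);
```

With the permanent version, only the LIBNAME and OPTIONS FMTSEARCH= statements need to run in later sessions. The PROC FORMAT step runs once.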
SURPRISE – an age of 13 doesn’t mean 13 years old!
It’s going to be a lot of work, but worth it. I’ve already seen some very interesting statistics. I was astounded to see that only 1% of the students were under 14 at the time of testing. In fact, there were twice as many students who were 16 years old in the eighth grade as 13. Even if the testing is at the end of the year, that seems really low.
Just a little tip on age – the ages appear to be stored as decimal values even though they print as integers. So, if you have a statement that says something like
If bsdage in (14,15) then ….
You will find very few students. In fact, you’ll get the students who are exactly 14 and 15.
My recommendation is when you read in the data NOT to use the age format TIMSS uses, which seems to round students to the nearest age. I don’t think most people think of age like that after they’re five years old. They quit saying they are “almost six” and give you their real age.
You can then cut the data however you like. I used a cut-off of under 13.5 years as “young” for the eighth grade and over 15.5 years as “old”.
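A minimal sketch of those cuts follows. BSDAGE is the age variable from the statement above. Everything else (the dataset names MATH8 and MATH8_AGE, the AGEGRP format, and the new variables) is a name I made up for illustration. It also assumes missing ages are stored as ordinary SAS missing values, so check the codebook for any special missing codes first.

```sas
/* Sketch only: MATH8, MATH8_AGE, AGEGRP, AGE_YEARS and AGE_GROUP
   are hypothetical names. BSDAGE is the TIMSS age variable. */
proc format;
   value agegrp
      low  -< 13.5 = "Young for grade"   /* under 13.5 years   */
      13.5 -  15.5 = "Typical"           /* 13.5 through 15.5  */
      15.5 <- high = "Old for grade";    /* over 15.5 years    */
run;

data math8_age;
   set math8;
   format bsdage;              /* strip the attached format so decimals show */
   length age_group $ 16;

   /* Completed years, so that 14.25 counts as 14 */
   age_years = floor(bsdage);

   if missing(bsdage) then age_group = " ";
   else age_group = put(bsdage, agegrp.);
run;

proc freq data=math8_age;
   tables age_group age_years;
run;
```

With AGE_YEARS in hand, a test like AGE_YEARS IN (14,15) picks up every 14- and 15-year-old rather than only the students whose decimal age is exactly 14 or 15.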
Since my mother let three of her children skip grades in school against the advice of administrators, and it turned out with varying results, I have always been fascinated by the experiences of kids who are young for their grade. Personally, I think it was the best thing my mom ever did for me and will be forever grateful that she told the principal to stick it (actually, my mom would never say anything like that and for her to buck authority, especially a nun, was extraordinarily out of character). I let my oldest daughter start kindergarten at four and go off to college at 17. My brother, on the other hand, thought it was a great hardship for him and said he would never let his kids skip a year of school.
As I suspected, those who were young for their grade were disproportionately female (64%) while those who were old for their grade were disproportionately male (67%). Whether this represents anything more than people accepting a stereotype that boys mature late, I don’t know.
The boys who were younger than average seemed to be doing quite well. Male or female, children who were young for their grade did best and children who were old for their grade did worst. My guess would be that children who were advanced were promoted and those who did poorly were held back, certainly not an earth-shattering revelation. What it does seem to suggest at a first glance, though, is that for both males and females, being a grade ahead of children their same age isn’t related to academic problems in the eighth grade.
It’s 1 a.m. and I should probably be thinking about sleeping but I am just starting to get into the fun part of the data.
So, what have I learned from open data? In a nutshell, it’s a mountain of work to get started with a dataset of any complexity, but like most times in life, the work can pay off, sometimes in unexpected ways.