It Beats Working Backward from the Semi-Colon
My niece, Samantha, should be far more trusting at her tender age. She was skeptical – skeptical, I say! – that I once worked at a job where I got so bored that I started coding my SAS statements backwards, typing a semi-colon first and then working backwards, say, to PROC.
Non-naive nieces aside, it is nonetheless true.
I left that job before I had progressed to starting at the end of the program and working backward up to LIBNAME.
Lately, I have had a lot of similar problems where I needed to recode data. All of these involved rectangular data sets – that is, hundreds of variables over a few hundred people – so speed and efficiency of processing were negligible concerns.
A few days ago, I gave a solution for when you have, for some bizarro reason, questions on a one-to-five scale coded into five different variables. Before that, I had a similar problem when true/false questions were coded into two variables.
Later in the week, I ran into essentially the same problems but I was bored with doing it the same way. In this case, there were about a zillion questions where people were to check any that applied.
Check all of the things you have in your pocket right now
__ keys
__ USB drive
__ lint
__ guinea pig poop
__ a chicken
__ Julia De Mars’ cell phone
These are scored a 1 if checked and missing if not. If we take the mean, we would get 1 for every item, because everyone who did not check an item has a missing value and is left out of the calculation. Having a 0 if they did not check it would be better. For one thing, when we do a PROC MEANS, the mean will then be the proportion of people who selected the item. (If you have Julia’s cell phone, give it back.)
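To see the difference, here is a minimal sketch on made-up data. The data set name pockets and the variables keys_raw and keys01 are hypothetical, not from the real file.

```sas
/* Hypothetical toy data: five respondents, one check-all-that-apply item */
DATA pockets ;
   INPUT keys_raw ;
   /* A comparison returns 1 when true and 0 when false, so missing becomes 0 */
   keys01 = (keys_raw = 1) ;
   DATALINES ;
1
.
1
.
.
;
RUN ;

/* keys_raw: N = 2, mean = 1   (missing values are left out)        */
/* keys01:   N = 5, mean = 0.4 (the proportion who checked the item) */
PROC MEANS DATA = pockets N MEAN ;
   VAR keys_raw keys01 ;
RUN ;
```

The mean of the 0/1 version is the share of all five respondents who checked the item, which is usually the number you want.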
This time I used a PROC FORMAT, like this:
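PROC FORMAT ;
   /* Map checked (1) to 1 and unchecked (missing) to 0 */
   VALUE yn
      1 = '1'
      . = '0' ;
RUN ;
DATA scoredfile ;
   SET oldfile ;
   /* The double dash picks up every variable stored between these two, in data set order */
   ARRAY yn{*} q0041_00 -- Q0051_04 ;
   DO x = 1 TO DIM(yn) ;
      /* PUT returns character '1' or '0'; SAS converts it back to numeric on assignment */
      yn{x} = PUT(yn{x},yn.) ;
   END ;
   DROP x ;
RUN ;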
The PUT function returns the formatted value of each element in the array yn, and the assignment stores it back in that same variable. Generally I am a bit gun-shy about recoding variables into themselves, but notice that I wrote the result to a new data set, scoredfile, so it is pretty clear it has been scored already.
Shortly after this problem, I had a test with a gazillion questions and each one, in the manner of tests, was scored right or wrong. However, it turns out that the answer to every question was not C, contrary to what you put on your SATs (and now you know why you did not get into UCLA). The example below only shows three, but actually there were six choices, which made this a little more worth doing.
So, I created a macro. I used the same index variable, i, for each array. One less thing to delete at the end of the job. There are only two parameters and they are both required, the array name and the number.
For each array, say, all of the items with the correct answer 1 (how A was stored in our file), it will score the item 1 if the answer 1 was given and 0 otherwise. This is a test, so whether you got it wrong or you skipped it you get a zero either way. If you knew the answer, you should have answered it.
I can call this macro six times, just giving the name of the array and the correct answer.
/* Score every item in one array: 1 if the answer matches the key, 0 otherwise (missing included) */
%macro sa(aname,num) ;
   do i = 1 to dim(&aname) ;
      if &aname{i} = &num then &aname{i} = 1 ;
      else &aname{i} = 0 ;
   end ;
%mend sa ;
DATA scoredfile ;
   SET oldfile ;
   /* One array per correct answer */
   array a1{*} q0004 q0007 q0008 q0009 q0014 q0016 ;
   array a2{*} q0005 q0006 q0013 ;
   array a3{*} q0011 q0012 q0015 q0018 q0020 q0022 q0024 ;
   %sa(a1,1) ;
   %sa(a2,2) ;
   %sa(a3,3) ;
   DROP i ;
RUN ;
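If you want to convince yourself the macro does what it says, a quick check on a tiny made-up data set is one way to go. This is only a sketch: the data sets toy and toy_scored and the variables q1 to q3 are hypothetical, and the macro is repeated so the block runs on its own. OPTIONS MPRINT writes the statements the macro generates to the log.

```sas
OPTIONS MPRINT ;   /* show the generated DATA step statements in the log */

%macro sa(aname,num) ;
   do i = 1 to dim(&aname) ;
      if &aname{i} = &num then &aname{i} = 1 ;
      else &aname{i} = 0 ;
   end ;
%mend sa ;

/* Hypothetical toy data: three test-takers; q1 and q2 are keyed 1, q3 is keyed 2 */
DATA toy ;
   INPUT q1 q2 q3 ;
   DATALINES ;
1 1 2
2 . 2
1 3 1
;
RUN ;

DATA toy_scored ;
   SET toy ;
   array a1{*} q1 q2 ;
   array a2{*} q3 ;
   %sa(a1,1)
   %sa(a2,2)
   DROP i ;
RUN ;

/* Scored rows should be: 1 1 1 / 0 0 1 / 1 0 0 */
PROC PRINT DATA = toy_scored ;
RUN ;
```

Note that the skipped item in the second row comes out as 0, the same as a wrong answer.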
So, there you have it, two more ways to solve the same problem. Thank God all of the data are read in and we can go on to analyzing it, though, because much more of this and I would have started writing backwards from the semi-colon.
An interesting bit of trivia: the approach that uses PROC FORMAT works because the DATA step automatically converts character values to numeric values, and vice versa! The result of PUT (character) is converted on the fly to a numeric value when it is assigned to yn{x}. Not many languages do this, and even some languages in SAS do not. (For an example, see http://blogs.sas.com/content/iml/2011/10/17/does-symput-work-in-iml/)
Thanks. I did not know that.
Yes, it is correct that the value is converted to numeric on the fly.
You can get around this by using the input() function, which is a good programming habit. In this specific example, it’s not really a big deal (a note is printed to the log and the code looks a little “funny” at first glance). But if you had assigned it to a new variable, you would get results you might not have expected (the new variable would be character ‘1’ or ‘0’).
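A minimal sketch of the input() approach, on made-up data. The data set demo and the variables item, as_char and as_num are hypothetical. The yn format is the one from the post, repeated so the block runs on its own.

```sas
PROC FORMAT ;
   VALUE yn
      1 = '1'
      . = '0' ;
RUN ;

DATA demo ;
   INPUT item ;
   /* PUT alone creates a new CHARACTER variable holding '1' or '0' */
   as_char = PUT(item, yn.) ;
   /* Wrapping it in INPUT with a numeric informat makes the conversion explicit */
   as_num  = INPUT(PUT(item, yn.), 1.) ;
   DATALINES ;
1
.
;
RUN ;

/* as_char is listed as Char, as_num as Num */
PROC CONTENTS DATA = demo ;
RUN ;
```

In the array version from the post, the same idea would be yn{x} = INPUT(PUT(yn{x}, yn.), 1.), which should avoid the conversion note in the log.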