PROC COMPARE FOR VALIDATING SAS CODE
I know people who are so obsessive about testing and validating their code that they spend more time testing it than actually writing it and analyzing the output. I said I know people like that; I didn’t say I was one of them. However, it is good practice to validate your SAS code and, despite false rumors spread by my enemies, I do it sometimes.
Here is a simple example. I believed that, given a list of the upper case letters, the COMPRESS function gave the same results with the “l” modifier (which adds the lower case letters to the list) as with the “I” modifier (which ignores case). I wanted to test that. So, I ran two data steps:
DATA USE_L;
set mydata.aztech_pre ;
q3 = compress(Q3,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');
q5 = compress(Q5,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','l');
… and a whole bunch more statements like that.
Then, I ran the exact same data step, this time creating USE_I, but with an “I” instead of an “l”.
Finally, I ran a PROC COMPARE step:
PROC COMPARE base=USE_L compare=USE_I ;
Title "Using l for lowercase vs I for insensitive" ;
run ;
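If you want to try the same check without my data, here is a minimal sketch on a toy data set. The data set name toy and the four responses in it are made up for illustration; they stand in for mydata.aztech_pre, and only one variable is cleaned instead of a whole bunch.

```sas
/* Hypothetical toy responses standing in for mydata.aztech_pre */
data toy;
   infile datalines truncover;
   input q5 $20.;
   datalines;
150m
42 miles
one thousand
200 MILES
;
run;

data use_l;
   set toy;
   /* 'l' adds the lower case letters to the list of characters to remove */
   q5 = compress(q5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'l');
run;

data use_i;
   set toy;
   /* 'i' ignores case when matching the listed characters */
   q5 = compress(q5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'i');
run;

title "Using l for lowercase vs I for insensitive";
proc compare base=use_l compare=use_i;
run;
title;   /* clear the title so it does not carry over to later steps */
```

If the two modifiers really do behave the same way for this list, PROC COMPARE should find no unequal values between use_l and use_i.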
But, hey, maybe PROC COMPARE just doesn’t work. Is COMPRESS really removing every letter, whether it is upper or lower case? To test this, I ran the procedure again, this time comparing the original data set with the data set containing the compressed results.
PROC COMPARE base=mydata.aztech_pre compare=use_I ;
Title "Comparing with and without compress function" ;
run ;
The result was a whole lot of output, which I am not going to reproduce here, but some of the most relevant was:
Values Comparison Summary
Number of Variables Compared with All Observations Equal: 24.
Number of Variables Compared with Some Observations Unequal: 16.
Number of Variables with Missing Value Differences: 10.
Total Number of Values which Compare Unequal: 694.
Looking further in the results, I can see a comparison of the values for each variable, by observation number:
          ||  q5
          ||  Base Value     Compare Value
      Obs ||  q5             q5
 ________ ||  ____________   ____________
          ||
        5 ||  150m           150
        6 ||  42 miles       42
       10 ||  one thousand
       12 ||  200 MILES      200
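With 40 variables, that listing gets long. One way to narrow it down, sketched here on made-up data and not something I ran for this post, is to limit the comparison to one variable with a VAR statement and write only the mismatched observations to a data set with the OUT= and OUTNOEQUAL options. The names toy_pre, toy_clean and q5_diffs are hypothetical.

```sas
/* Hypothetical toy responses; not the real pretest data */
data toy_pre;
   infile datalines truncover;
   input q5 $20.;
   datalines;
150m
42 miles
75
200 MILES
;
run;

data toy_clean;
   set toy_pre;
   /* strip letters of either case, leaving the digits */
   q5 = compress(q5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'i');
run;

/* VAR limits the comparison to q5.                               */
/* OUT= with OUTBASE and OUTCOMP keeps the base and compare rows; */
/* OUTNOEQUAL drops observations where the values are equal;      */
/* NOPRINT suppresses the usual printed report.                   */
proc compare base=toy_pre compare=toy_clean
             out=q5_diffs outnoequal outbase outcomp noprint;
   var q5;
run;

proc print data=q5_diffs;
run;
```

Here the response 75 has no letters to remove, so it should be left out of q5_diffs, while the other three responses should each show up as a base row and a compare row.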
So, I can see that the data step is doing what I want, which is removing all of the text from the responses and only leaving numbers. This is important because the next step is comparing the responses to the questions with the answer key and I don’t want any mismatches to occur because the student wrote ‘200 miles’ instead of 200.
In case you are interested, this is the pretest for two games that are used to teach fractions and statistics. You can find Aztech: The Story Begins here and play it for free on your iPad, Mac, Windows or Chromebook computer.
Forgotten Trail can be played in a browser on any Mac, Windows or Chromebook computer.