Watch me work: Data Project
On twitter, there were a few comments from people who said they didn’t like to take interns because “More than doing work, they want to watch me work.”
I see both sides of that. You’re busy. You’re not netflix. I get it. On the other hand, that’s a good way to learn.
So, here you go. I’m starting on a data analysis project today and I thought I’d give you the blow by blow.
It just so happens that the first several steps are all point-y and click-y. You could do it other ways but this is how I did it today. So, step one, I went to phpMyAdmin on the server where the data were saved and clicked Export.
For the format to export, I selected CSV and then clicked on the Go button. Now I have it downloaded to my desktop.
Step 3: I opened SAS Enterprise Guide and selected Import Data. I could have done this with SAS and written code to read the file, but, well, I didn’t. Nope, no particular reason, just this isn’t a very big data set so I thought, what the heck, why not.
Step 4: DO NOT ACCEPT THE DEFAULTS! Since I have a comma-delimited file with no field names, I need to uncheck the box that says File contains field names on record number. SAS conveniently shows you the data below so I can see that it is comma-delimited. I know I selected CSV but it’s always god practice to check. I can also see that the data starts at the first record, so I want to change that value in Data records start at record number to 1.
Step 5: Change the names – I am never going to remember what F1, F2 etc. are, so for the first 5 , I click on the row and edit the names to be the name and label I want.
That’s it. Now I click the next button on the bottom of the screen until SAS imports my data.
I could have continued changing all of the variable names, because I KNOW down the line I am not going to remember that F6 is actually the first question or that F25 is question 28a. However, I wanted to do some other things that I thought would be easier to code, so I opened up a program file in SAS Enterprise guide and wrote some code.
/* THIS CREATES TWO ARRAYS BECAUSE I AM TOO LAZY
TO RENAME 32 QUESTIONS INDIVIDUALLY
THE PRETEST DATA SET WAS CREATED BY THE STEPS ABOVE USING IMPORT DATA */
data pretest2 ;
set pretest ;
** NOTE THAT THERE IS A $ AFTER THE NUMBER OF ELEMENTS IN THE ARRAY
** BECAUSE THIS IS A CHARACTER ARRAY ;
array ren{32} $ f6-f37 ;
array qs {32} $ q1-q27 q28a q28b q28c q29 q30;
do i = 1 to 32 ;
qs{i} = ren{i} ;
end ;
** YOU CAN ALSO USE A RENAME STATEMENT TO RENAME THE SAME VARIABLES ;
rename f38 = date_test ;
*** SINCE I NO LONGER NEED THE VARIABLES F6- F37 OR THE INDEX VARIABLE FOR THE
ARRAY, I DROP THEM HERE ;
drop f6- f37 i ;
*** SOME STUDENTS SAVED THE TEST MORE THAN ONCE BECAUSE THEY SAVED BEFORE THEY WERE DONE AND AT THE END. SO, I SORT BY USERNAME AND TEST. WE WILL ONLY KEEP THE LAST ONE.
proc sort data=pretest2 ;
by username date_test ;
*** THIS KEEPS JUST THE LATEST TEST DATE. ALSO, WE TESTED THIS 45 TIMES IN
THE PROCESS OF GETTING READY FOR USE IN THE SCHOOLS. ALL OF OUR STAFF USED USERNAMES WITH ‘TEST” SO I USED THE INDEX FUNCTION TO FIND IF THERE WAS A “TEST” IN THE USERNAME AND, IF SO, DELETED THAT RECORD ;
data pretest2 ;
set pretest2;
by username date_test ;
if last.username ;
if index(username,‘TEST’) > 0 then delete;
run;
Okay, that’s it. Now I have my data all ready to analyze. Pretty painless, isn’t it?
Want to learn more about SAS?
Here is a good paper on Arrays made easy .
If you’re interested in character functions like index, here is a good paper by Ron Cody.