PROC FORMAT : One way to do a table lookup in SAS
About ROUND(100) years ago, I took a couple of COBOL courses. I never coded anything in COBOL after the classes, but the concept of a table look up has stuck with me.
Just like it sounds, this is used when you want to look something up in a table. For example, if I have ICD-10 codes, which are used to classify diagnoses, I could look up J09.X2 in a table and see that is “influenza”.
There are several ways to do this with SAS, including using PROC FORMAT, a one-to-many merge with either the data step or PROC SQL , a bunch of IF statements and a macro.
One way I learned very early on to do this in SAS was to use a PROC FORMAT.
Say I have a bunch of possible codes for outcomes for my program and they are coded 02 = applies all the way to 32 = “Post-employment services” .
PROC FORMAT;
VALUE stats
0 = “referral”
2 = “applicant”
6 = “evaluation”
8 = “closed”
10 = “eligible”
12 = “IPE complete”
14 = “counseling”
18 = “training”
20 = “service complete”
24 = “service interrupted”
22 = “employed”
26 = “successful closure”
28 = “closed after service”
30 = “closed before service”
32 = “post-employment”
;
Another option, if I wanted to combine categories, like those who had a successful and unsuccessful outcome, is to do it like this :
PROC FORMAT;
VALUE stats
0, 2, 6, 10 , 12, 14, 18, 20 = “open case”
24, 28, 30 = “unsuccessful”
22, 26 , 32 = “successful closure”
;
In either case, if I just wanted to have the type of service printed , I could use a FORMAT statement , like this.
FORMAT status_type stats. ;
If I wanted to create a new variable I could use the put function to put the original value into a new variable using the format.
DATA testy ;
SET mydata.vr_codes ;
recoded_status = PUT(status_type,stats.) ;
Is there any advantage of PROC FORMAT over doing 20 or 50 IF statements?
I can think of several. First of all, you can use the same PROC FORMAT repeatedly. If you need to do the same transformation with several different data sets, you can just do the format procedure once, include one PUT or FORMAT statement in each data step and you are done. Second, since you can store formats permanently, if you haven’t gotten around to learning macros yet, this can be one method of using the same code over and over in different programs. Third, it’s just less messy to type, which seems trivial until you have 300 values to recode.
Some day I might write a post on user-defined formats, especially how to store and re-use them. Today is not that day. In the meantime, I highly recommend reading this paper on building and using user-defined formats by Art Carpenter while you are waiting.
I live in opposite world. I blog on SAS and statistics for fun and make games for a living. Check out Making Camp Premium. Learn about Ojibwe culture, brush up your math skills, learn more English and have fun. All for under two bucks.