Learning Advanced SAS from a Macro
I was using a macro this week written by Feng, Yu & Xu and there were so many less-used and advanced techniques in it I thought it would be a great example to use for teaching SAS.
** ** Nifty thing one computing the caliper width ***** ;
Like with other propensity score macros, it started with a logistic regression. This was to do a matching with calipers and since a lot of people use .25* the logit as the caliper value, I need to calculate that. Notice the xbeta = logit on the OUTPUT statement in the PROC LOGISTIC step.
%macro match(data, class, yvar, xvar, id);
**Perform logistic regression, select independent variables, and create the propensity
score***;
PROC LOGISTIC descending data=&data noprint;
&class;
model &yvar=&xvar;
OUTPUT =propen(drop=_level_) prob=prob XBETA = LOGIT ;
run;
**** Here they use PROC UNIVARIATE to output the standard deviation of the logit. The result is a file with one record ;
/* Calculate standard deviation of the logit */
proc univariate data=propen noprint;
var logit;
output out=propen_sd std=sd;
run;
*** This is pretty standard just outputting to case and control files, giving the control file a random number;
***create datasets of case and control, which contain patients from case and control groups ***;
data ctrl case;
set propen;
if &yvar=1 and prob ne . then do;
rannum=ranuni(1234567);
label rannum=’Uniform Randomization Score’;
output ctrl;
end;
else if &yvar=0 and prob ne . then do;
output case;
end;
**** Nifty thing two: Adding the standard deviation of the logit to every record.
data ctrl;
if _n_=1 then set propen_sd(keep=sd);
set ctrl;
run;
I had to look this up. Did you know that using a conditional SET statement causes you to RETAIN every variable in that dataset. Notice that data set only has one variable, due to the KEEP option that follows it. The result is that the value of sd, that is, the SD of the logit, is added to every record in the ctrl file.
Thanks to Paul McDonald who wrote a paper for SAS Global Forum If _N_ = 1 then set . Yes, that was actually the name of it.
Then, they sort by the random number, not too interesting.
***sort ctrl by random number generated ***;
proc sort data=ctrl;
by rannum;
run;
*** Nifty thing three, if you are learning macros , an example of CALL SYMPUT, plus creating a macro variable that is the total N ;
/* The tot macro variable is created here */
data _null_;
set ctrl;
by sd;
if last.sd then call symput(‘tot’,put(_n_,8.));
run;
This is nifty in the first place because it uses CALL SYMPUT to put the value of _N_ into a newly created macro variable called tot . Notice they did not actually bother sorting by sd, since it is the same for every record. However, they used the BY sd and then LAST.SD . That means the IF statement will be true on the last record in the data set, since everyone has the same value for SD. On the last record, the value of _N_ is the last record in the data set. I found this very clever because there are many times when people would want to know the number of records and put that into a macro variable and there are multiple ways to do it. This is one I had not thought of, plus it demonstrates the CALL SYMPUT so double learning points.
*** Nifty macro learning things four, five and six
First, they just create a data set (boring!) then they use the %LET statement and the %EVAL function. The %EVAL function is pretty essential in SAS macro language. It evaluates the value of a variable as a number instead of text. By default, SAS is going to treat the value of any macro variable as text. So, for example, 1 +2 is not evaluated as 3. %EVAL only works with integers but since the number of records is always going to be an integer, you’re good here.
data case_temp;
set case;
*** select one patient at a timee time from dataset ctrl, and search for matched-pair in case group ***;
%let j=%eval(&tot);
Next, they use the value of j, which is now the total number of subjects in the data set, in a DO statement, so that they are going to do these statements for every person in the control group. A “caliper” is set up for each person so that is the logit for that subject plus or minus .25* the standard deviation of the logit. All the cases that fall within that range are collected in a data set named match_temp
%do i=1 %to &j;
data ctrl_temp;
set ctrl;
if _n_=&i;
low=logit – 0.25*sd;
up =logit + 0.25*sd;
run;
data match_temp;
if _n_=1 then set ctrl_temp(keep=low up);
set case_temp(drop=rannum);
if low<=logit<=up;
run;
*** Nifty thing seven – an illustration of a use of PROC SQL.
*************************************************************************************;
*** calculate number of patients who were included in the match dataset *** ;
*** 1) if there is no match found, then select next patient from case group ***
*** 2) if there is one matched, select this patient and go to next round ***
*** 3) if there are more than 1 patients, calculate Mahalanobis distance and ***
*** select the patient with smallest distance, then go to next round ***
*************************************************************************************;
Here they use PROC SQL for another way of finding the number of records in a file (I told you there were multiple ways). They create a new variable that is the count of IDs, which is going to be the number of people in the data set since each one has a unique ID.
proc sql;
create table casen
as select count(&id) as n
from match_temp;
quit;
data _null_;
set casen;
if _n_=1 then call symput(‘casen’,put(n,8.));
run;
This is followed by some more %IF , %THEN and %DO loops that, if no match is found, goes on to the next case, and if only one matches is found, keeps that in the matched data set we are building.
If more than one match is found — well, that’s where it gets really interesting.
—– Tune in next time for the rest of the story, as Paul Harvey used to say.
For “nifthy thing 2” there’s an even nicer solution using the point= option. See here: http://www.kenkleinman.net/home/index.php/sas-and-r-code/sas-tricks/62-sas-tricks.html
Nifty thing seven could be shortened using the :into operator in proc sql. This way you avoid creating the casen dataset in work and directly put the result of the count in a macro variable.
proc sql noprint;
select count(&id) into :casen from match_temp;
quit;
There are a number of different ways things could be done (and thank you both very very much for your comments, by the way). I had to put in very twice to show that I really appreciated them.
Sometimes I read other people’s code and wonder why they chose to do things the way they did.
I know in my case it is sometimes that is the way I thought of it first and I didn’t have time to go back and change it later.
What I really like about this macro though, and there is more coming, after I get done fighting with Python and SPSS today, is the number of different techniques they did from the qualified SET statement to PROC SQL to the PRINCOMP in the second part of it.