SPSS Propensity Scores – Part 2
I wrote Part 1 a couple of years ago, so I guess I’m due for a part 2. In this case, I started with a data set in SAS but because it was going to be used by a group who had some SAS users and some SPSS users, they wanted to have the code for both SPSS and SAS.
Levesque wrote a lovely macro years ago to do propensity score matching and a few years later John Painter added to it a bit. Both of them did great work for which many people should be extremely grateful (I am!).
However, I think many people who use SPSS primarily by pointing and clicking may still have a bit of trouble with the pre-processing that this macro assumes you will do. They also may not be so sure how to code things for a Mac or how to do the post-processing after the macro. So, as a public service, here you go, Part 2.
Start with defining the path where your interim files and the final matched file will be stored. If my path looks funny to you it is because you are probably using Windows and I did this on a Mac.
/* Change file path here and only here */
DEFINE !pathd() '/Volumes/Mystuff/SaveHere/' !ENDDEFINE.
/* This is the data set with all of my original data */
DEFINE !readin() '/Volumes/Otherplace/HereIs/inputfile.sav' !ENDDEFINE.
IT IS ASSUMED that your dependent variable is named treatm and coded 0 for the control (larger) group and 1 for the treatment (smaller) group. If that is NOT the case, you will need to execute a GET FILE statement to read in your data and then do whatever you need to get it coded 0 or 1. In my example I have a variable, Alive, coded in the OPPOSITE direction of what I need; that is, most people are alive (1) and a few people are dead (0). It is easy enough here to execute a COMPUTE command and make treatm = 1 - Alive, so the few people who died become the treatm = 1 group.
* Get file and make sure .
* Dependent is named treatm and coded 0 or 1.
GET
FILE= !readin .
COMPUTE treatm=1 - Alive .
EXECUTE.
******************** .
* Perform logistic regression to compute propensity score .
******************* .
I could have made the variable list and the dependent variable macro variables also, but since they are only used this one time, that seemed kind of silly.
Note the RENAME VARIABLES command. It renames the saved predicted probability (PRE_1) to propen. That name is used throughout the macro, so don't change it. Also, the output file from the logistic regression is going to be test.sav in whatever directory you specified above. That is also used throughout the macro, so don't change that either. In fact, don't change anything here other than the dependent variable name (depend below stands in for yours, which is treatm if you followed the recode above) and the list of independent variables after ENTER.
LOGISTIC REGRESSION VARIABLES depend
/METHOD=ENTER V1 v2 othervar morevar1 morevar2
/SAVE=PRED
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
RENAME VARIABLES (PRE_1=propen) .
SAVE OUTFILE=!pathd + "test.sav" .
EXECUTE.
After this is the macro Levesque wrote. It works fine. Just copy and paste it into your syntax file. Yay, Levesque. He also has a really good book on data management and programming. I HIGHLY recommend it, as well as just perusing his site. You’ll learn a ton about SPSS.
********************* .
** End Preparation .
********************* .
GET FILE= !pathd + "test.sav".
COMPUTE x = RV.UNIFORM(1,1000000) .
SORT CASES BY treatm(D) propen x.
COMPUTE idx=$CASENUM.
SAVE OUTFILE=!pathd + "mydata.sav".
* Erase the previous temporary result file, if any.
ERASE FILE=!pathd + "results.sav".
COMPUTE key=1.
SELECT IF (1=0).
* Create an empty data file to receive results.
SAVE OUTFILE=!pathd + "results.sav".
exec.
********************************************.
* Define a macro which will do the job.
********************************************.
SET MPRINT=no.
*////////////////////////////////.
DEFINE !match (nbtreat=!TOKENS(1))
!DO !cnt=1 !TO !nbtreat
GET FILE=!pathd + "mydata.sav".
SELECT IF idx=!cnt OR treatm=0.
* Select one treatment case and all control .
DO IF $CASENUM=1.
COMPUTE #target=propen.
ELSE.
COMPUTE delta=propen-#target.
END IF.
EXECUTE.
SELECT IF ~MISSING(delta).
IF (delta<0) delta=-delta.
SORT CASES BY delta.
SELECT IF $CASENUM=1.
COMPUTE key=!cnt .
SAVE OUTFILE=!pathd + "used.sav".
ADD FILES FILE=*
/FILE=!pathd + "results.sav".
SAVE OUTFILE=!pathd + "results.sav".
************************************************ Match back to original and drop case from original .
GET FILE= !pathd + "mydata.sav".
SORT CASES BY idx .
MATCH FILES
/FILE=*
/IN=mydata
/FILE=!pathd + "used.sav"
/IN=used
/BY idx .
SELECT IF (used = 0).
SAVE OUTFILE=!pathd + "mydata.sav"
/ DROP = used mydata key delta.
EXECUTE.
!DOEND
!ENDDEFINE.
*////////////////////////////////.
SET MPRINT=yes.
**************************.
* MACRO CALL (first insert the number of cases after nbtreat below) .
**************************.
So much for the macro definition. Now you need to call it.
Replace the ### here with the number of cases in your treatment (smaller) group, that is, the people who are coded treatm = 1.
!match nbtreat= ### .
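If you are curious what the macro is doing on each pass of its loop, here is a rough sketch of the same idea in Python rather than SPSS. It is only an illustration under my own assumptions: the function name greedy_match and the toy IDs and scores are made up, and ties here go to the first control in the list rather than being broken the way SPSS happens to sort them. The idea is the same, though: take the treatment cases in order of propensity score, grab the closest remaining control, and remove that control from the pool so it cannot be used twice.

```python
def greedy_match(treated, controls):
    """Nearest-neighbor matching without replacement (illustrative sketch).

    treated, controls: lists of (case_id, propensity_score) tuples.
    Returns a list of (treated_id, control_id, delta) tuples.
    """
    pool = list(controls)  # controls still available for matching
    pairs = []
    # Work through the treatment cases in ascending propensity order.
    for t_id, t_score in sorted(treated, key=lambda case: case[1]):
        if not pool:
            break  # ran out of controls
        # Closest remaining control by absolute difference in score.
        best = min(pool, key=lambda case: abs(case[1] - t_score))
        pairs.append((t_id, best[0], abs(best[1] - t_score)))
        pool.remove(best)  # each control can be used only once
    return pairs


# Hypothetical toy data: two treatment cases, four controls.
treated = [("T1", 0.30), ("T2", 0.62)]
controls = [("C1", 0.10), ("C2", 0.33), ("C3", 0.60), ("C4", 0.35)]
pairs = greedy_match(treated, controls)
for t_id, c_id, delta in pairs:
    print(t_id, c_id, round(delta, 2))
```

Because the matching is greedy, a treatment case handled late in the loop can end up with a poor match if its best controls were already taken, which is one reason to look at the delta values afterwards.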
Here is more of Levesque's work. Just copy and paste it.
* Sort results file to allow matching.
GET FILE=!pathd + "results.sav".
SORT CASES BY key.
SAVE OUTFILE=!pathd + "results.sav".
******************.
* Match each treatment case with the most similar non treatment case.
* To include additional variables from original file list them on the RENAME subcommand below .
******************.
GET FILE=!pathd + "mydata.sav".
MATCH FILES /FILE=*
/FILE=!pathd + "results.sav"
/RENAME (idx = d0) (id=id2) (propen=propen2)
(treatm=treatm2) (key=idx)
/BY idx
/DROP= d0.
FORMATS delta propen propen2 (F10.8).
SAVE OUTFILE=!pathd + "mydata and results.sav".
EXECUTE.
* That's it!.
He says that's it, but there is a little more to it than that. The original macro assumed you had four variables, a propensity score, ID, treatm and improve, that is, a variable that shows improvement.
What if you have a lot more than that and you would like to merge this back with your original data set and have the matched and treatment subjects selected out by ID number with everything else you may have recorded on them?
In the file the macro produces, it has N cases, where N is the number of people in the treatment group. Perfect if you want to do a dependent t-test sort of analysis, but that is not what I want. I want N*2 cases with each ID on a separate row.
There are probably more beautiful ways to do this, but here is one that works.
Get the file that the macro produced with all of the data. If the case has a value for propen2, it was one of the cases selected. Output that to the results2 file and keep id2 (this is the match ID). Rename that variable to ID.
GET
FILE= !pathd +'mydata and results.sav'.
SELECT IF NOT (SYSMIS(propen2)).
SAVE OUTFILE=!pathd + 'results2.sav' / KEEP = id2.
EXECUTE.
GET
FILE=!pathd + 'results2.sav'.
RENAME VARIABLES (ID2 =ID) .
SAVE OUTFILE=!pathd + 'results2.sav' / KEEP = id .
EXECUTE.
Get the file with the results and keep the id variable. That is the case ID. Output that to the results1 file.
GET
FILE= !pathd + 'mydata and results.sav'.
SELECT IF NOT (SYSMIS(propen2)).
SAVE OUTFILE=!pathd + 'results1.sav' / KEEP = id.
EXECUTE.
Concatenate the results1 and results2 files. This is your file of all of the ids, the treatment cases and their nearest match.
ADD FILES
/FILE= !pathd + 'results1.sav'
/FILE= !pathd + 'results2.sav' .
SAVE OUTFILE=!pathd + 'resultsall.sav' .
EXECUTE.
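If the shuffling of files above is hard to follow, this is the whole idea in a few lines of Python. It is just a sketch with made-up ID numbers; the names pairs, treated_ids, control_ids and all_ids are mine, not anything the macro produces. Each row of the paired file carries two IDs, and stacking the two columns gives one ID per row.

```python
def stack_ids(pairs):
    """Turn N paired rows into a list of 2N IDs (illustrative sketch)."""
    treated_ids = [row["id"] for row in pairs]    # plays the role of results1.sav
    control_ids = [row["id2"] for row in pairs]   # plays the role of results2.sav, id2 renamed to id
    return treated_ids + control_ids              # plays the role of resultsall.sav


# Hypothetical paired file: one row per treatment case with its matched control.
pairs = [
    {"id": 101, "id2": 207},
    {"id": 102, "id2": 315},
]
all_ids = stack_ids(pairs)
print(all_ids)
```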
This is just a quality check. It adds a variable, inmatch, with a value of 1 to every record in the file of matched IDs, so that after the final merge every record that was part of a match should have inmatch = 1.
GET
FILE= !pathd + 'resultsall.sav'.
COMPUTE inmatch = 1 .
SAVE OUTFILE=!pathd + 'resultsall.sav' .
EXECUTE.
You absolutely have to sort the cases by ID and save the sorted file before you can merge them. This sorts the resultsall.sav file just created.
SORT CASES BY ID(A).
SAVE OUTFILE= !pathd + 'resultsall.sav'
/COMPRESSED.
This sorts the original data file (remember your original data file?)
GET
FILE= !readin .
EXECUTE.
SORT CASES BY ID(A).
SAVE OUTFILE= !readin
/COMPRESSED.
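Now, we will finally match the subjects from the treatment group and their matched controls back to the original data and save them in a new file named matches.sav in that directory you specified at the very beginning. Because resultsall.sav is merged in as a TABLE, every case in the original file is kept and inmatch is missing for anyone who was not matched, so the SELECT IF keeps only the matched cases.
MATCH FILES /TABLE= !pathd + 'resultsall.sav'
/FILE= !readin
/BY ID.
SELECT IF (inmatch = 1).
SAVE OUTFILE= !pathd + 'matches.sav'
/COMPRESSED.
EXECUTE.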
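One thing worth checking after a merge like this is who actually ended up in the file. As I understand the TABLE subcommand, it works like a lookup: it adds variables to the cases in the main file but does not drop anybody. Here is a small Python sketch of that behavior, with made-up IDs and a made-up age variable, showing why the inmatch flag is what identifies the matched cases.

```python
# Hypothetical original data file, one row per person.
original = [
    {"id": 1, "age": 70},
    {"id": 2, "age": 54},
    {"id": 3, "age": 61},
    {"id": 4, "age": 66},
]
# Hypothetical IDs from the file of matched cases (all flagged inmatch = 1).
matched_ids = {2, 3}

# Lookup-style merge: every original row is kept; inmatch is 1 for matched
# cases and missing (None) for everyone else.
merged = [
    dict(row, inmatch=1 if row["id"] in matched_ids else None)
    for row in original
]

# Keep only the treatment cases and their matched controls.
matches = [row for row in merged if row["inmatch"] == 1]
print(len(merged), len(matches))
```

A quick count of the records with inmatch = 1 should come out to twice the number of treatment cases you matched.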
Now that you have your matched file, I strongly suggest you do some tests to see that your propensity score matching worked as hoped. I did, and nothing was within shouting distance of being significant, so I was happy.
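Significance tests are what I used, but another check people often run on matched data is the standardized mean difference for each covariate. This is not something the macro does; the sketch below is a generic Python illustration with made-up ages, and the function name is my own. A common rule of thumb treats an absolute value under about 0.1 as reasonably balanced.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0  # no variation in either group, so nothing to standardize
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages for matched treatment cases and their controls.
treated_age = [60, 65, 70]
control_age = [61, 64, 69]
smd = standardized_mean_difference(treated_age, control_age)
print(round(smd, 3))
```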
Once you have calculated a propensity score to use for matching, you could just use the FUZZY extension command available from the SPSS Community website to match within a specified tolerance based on that score. It requires the Python Essentials for SPSS Statistics, also available from that site.
Jon,
Since that was what I needed to do tomorrow, you are now officially my new best friend!
The FUZZY/Help. command takes data from two different datasets. But, my cases and controls are embedded in one dataset. How can I create a syntax for that?
Thanks for your help!
I need to match 1:4 using propensity score matching. I have spss 24 on a pc.
Can anyone help me with this?
Thank you