Controlling for Damn Near Everything: Propensity Score Matching

By AnnMaria De Mars, June 3, 2009

Lately I have been on a roll looking at relatively less common statistical techniques: proportional hazards, survival analysis, etc.

In keeping with that, I have been taking a look at propensity score matching, fondly known as PSM by, well, by no one actually.

The problem to be solved ….

Think about some of these comparisons:

Hospitals with special burn, cardiac or neonatal units versus general hospitals
Public schools versus parochial, private or charter schools
People who watch TV > 40 hours weekly versus those surfing the Internet > 40 hours

In all of these cases, and probably a lot more you can think of, there are very likely differences in certain “outcome” variables, whether it be survival in the case of hospital patients, academic achievement of students or annual income of TV versus Internet users. However, all of these comparisons also begin with groups who are already different.

For example …

You have two groups, say people who are treated at a hospital with a specialized unit for terminally ill patients and patients from another hospital without any such specialized unit. Your outcome variable of interest is whether the patient lived or died.

The simplest way to test this is a chi-square test. You compare the percentage of people who survived at St. George of Money Hospital versus Heart of Despair County Hospital. There is a problem with that, though. A simple comparison will almost always show WORSE outcomes for hospitals with special units for patients who are terminally ill, seriously burned, born extremely premature, etc. The reason is probably obvious: if you get sicker patients, they are less likely to live. If your interest is in knowing whether having a specialized unit increases your chances of survival, you would want to compare similar groups.
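To make the problem concrete, here is a small sketch in plain Python. Every count in it is invented for illustration, and the two severity levels are an assumption of mine rather than anything from real hospital data. It computes the naive chi-square on the pooled table and then shows survival within each severity level.

```python
# Hypothetical counts, invented purely for illustration: (survived, died).
counts = {
    ("St. Money's", "severe"): (200, 200),
    ("St. Money's", "mild"): (95, 5),
    ("Despair", "severe"): (40, 60),
    ("Despair", "mild"): (360, 40),
}
HOSPITALS = ("St. Money's", "Despair")
SEVERITIES = ("severe", "mild")


def survival_rate(survived, died):
    """Proportion of patients who survived."""
    return survived / (survived + died)


def pooled(hospital):
    """Add up survivors and deaths across severity levels for one hospital."""
    survived = sum(counts[(hospital, s)][0] for s in SEVERITIES)
    died = sum(counts[(hospital, s)][1] for s in SEVERITIES)
    return survived, died


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


pooled_table = [list(pooled(h)) for h in HOSPITALS]
chi2 = chi_square_2x2(pooled_table)
pooled_rates = {h: survival_rate(*pooled(h)) for h in HOSPITALS}
stratum_rates = {key: survival_rate(*value) for key, value in counts.items()}

CRITICAL_05_DF1 = 3.841  # chi-square critical value for alpha = .05 with 1 df
print(f"chi-square = {chi2:.2f}, significant: {chi2 > CRITICAL_05_DF1}")
for h in HOSPITALS:
    print(f"{h}: pooled survival {pooled_rates[h]:.0%}")
for (h, s), rate in stratum_rates.items():
    print(f"{h}, {s}: survival {rate:.0%}")
```

With these made-up numbers the pooled comparison is significant and favors Despair, yet St. Money's has the higher survival rate within both the severe and the mild patients. The only thing driving the pooled result is that St. Money's gets most of the severe cases.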

It isn’t as simple as just controlling for severity of condition, though. There are other variables. For example, people who are better educated, who have private insurance and who live in urban areas may all be more likely to be patients at more “elite” hospitals. Some of those factors may be related to survival as well. What we’d really like is to compare a group of people from St. Money’s that is similar to patients from Despair.

In short, certain types of people have a greater propensity to be admitted to one type of place than the other.

Enter propensity score matching — to the sounds of trumpets and wearing a cape.

In fact, the first step is to do a logistic regression analysis and I will admit that it is not strictly necessary to wear a cape while doing so but it would probably be more comfortable than this business suit from Filene’s that I am wearing.

Using SPSS, go to the ANALYZE menu, select REGRESSION, then select BINARY LOGISTIC. Your dependent variable will be the hospital to which the patient was admitted. Covariates are the variables, such as education, severity of illness and insurance, that you want to control. For variables that are categorical, e.g., insurance, which could be private, public (a.k.a. Medi-Cal if it hasn’t disappeared in the latest round of state budget cuts) and none, click on the CATEGORICAL button and move those over to the “Categorical covariate” window.

Here’s the really important part — click on SAVE and select PREDICTED PROBABILITIES – that is your propensity score.

This is what you are going to match on. Hence the name.
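If you want to see what that step is doing outside of the SPSS menus, here is a rough sketch in plain Python. It is not what SPSS runs internally: the patients are simulated, the coefficients used to generate them are made up, and the model is fit with simple gradient descent only so the example needs no extra libraries. The names `fit_logistic`, `linear_predictor` and `propensity` are mine.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(weights, row):
    """Intercept plus the weighted sum of one patient's covariates."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def fit_logistic(X, y, learning_rate=0.5, epochs=1000):
    """Fit a logistic regression by batch gradient descent.

    Returns [intercept, b1, ..., bk]. Gradient descent is swapped in here
    only to stay dependency-free; a statistics package would use its own
    maximum-likelihood routine.
    """
    n, k = len(X), len(X[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        gradient = [0.0] * (k + 1)
        for row, label in zip(X, y):
            error = sigmoid(linear_predictor(weights, row)) - label
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


def mean(values):
    return sum(values) / len(values)


# --- Simulated patients (all values are invented) ---
rng = random.Random(2009)
INSURANCE = ("private", "public", "none")
X, y = [], []
for _ in range(200):
    severity = rng.random()           # 0 = mild, 1 = most severe
    education = rng.gauss(0.0, 1.0)   # standardized years of schooling
    insurance = rng.choice(INSURANCE)
    # Dummy-code the categorical covariate, with "none" as the reference level.
    private = 1.0 if insurance == "private" else 0.0
    public = 1.0 if insurance == "public" else 0.0
    # Assumed "true" admission model, used only to generate the fake data.
    p_admit = sigmoid(-1.5 + 2.0 * severity + 0.5 * education + 1.0 * private)
    X.append([severity, education, private, public])
    y.append(1 if rng.random() < p_admit else 0)  # 1 = St. Money's, 0 = Despair

weights = fit_logistic(X, y)
# The predicted probability for each patient is that patient's propensity score.
propensity = [sigmoid(linear_predictor(weights, row)) for row in X]

treated_scores = [p for p, label in zip(propensity, y) if label == 1]
control_scores = [p for p, label in zip(propensity, y) if label == 0]
print(f"mean propensity, St. Money's: {mean(treated_scores):.2f}")
print(f"mean propensity, Despair:     {mean(control_scores):.2f}")
```

The `propensity` list plays the role of the column SPSS adds when you save predicted probabilities: one number between 0 and 1 per patient, summarizing how likely that patient was to end up at St. Money's given the covariates.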

This is step one. I would say it gets easier after this point – but it doesn’t.

23 Comments

Cristina Barattoni says:

July 22, 2010 at 6:35 am

Please….. What’s step no. two????

thank-you..
Jack says:

September 14, 2010 at 9:15 pm

Yes, what is step two?!
AC says:

October 17, 2010 at 8:20 am

and step 2???
dave says:

January 17, 2011 at 10:22 pm

this is quite helpful. what’s the next step
lisa kiesel says:

January 27, 2011 at 3:19 pm

Yes, Please, what is the next step????
Gbogbo Emmanuel says:

March 10, 2011 at 9:53 am

Thank you, but desperately need the follow up step to enable me finish up with my thesis.
susan says:

March 25, 2011 at 3:08 pm

Do you describe Step 2?
annie says:

September 17, 2011 at 8:09 pm

what’s the step 2?
AnnMaria says:

January 12, 2012 at 3:55 am

Once you have the scores, you match every participant with a non-participant. That is the matching part. Say I am looking at 600 people who were admitted to St. Money’s and 7,200 admitted to Despair. For each of the 600, I find a person in Despair who has the identical propensity score. If more than one person matches, I randomly sample one person from those that match. If there is no one with the identical score, I take the person whose score is as close as possible. So, I end up with 600 in each group and then do my analysis.
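
For anyone who would rather not do this by hand, the matching described above can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: each Despair patient is used at most once (matching without replacement), the St. Money's patients are processed in random order, and the IDs and scores are made up. The function name `match_nearest` is hypothetical, not from any package.

```python
import random


def match_nearest(treated, controls, seed=0):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    treated, controls: dicts mapping a patient ID to a propensity score.
    Each control is used at most once. Ties (including exact matches)
    are broken by random sampling.
    """
    rng = random.Random(seed)
    available = dict(controls)   # controls not yet matched
    order = list(treated)
    rng.shuffle(order)           # greedy matching depends on processing order
    pairs = {}
    for treated_id in order:
        if not available:
            break                # ran out of controls
        score = treated[treated_id]
        best = min(abs(s - score) for s in available.values())
        candidates = [cid for cid, s in available.items() if abs(s - score) == best]
        control_id = rng.choice(candidates)
        pairs[treated_id] = control_id
        del available[control_id]
    return pairs


# Made-up scores: T = St. Money's patients, C = Despair patients.
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.80}
controls = {"C1": 0.62, "C2": 0.62, "C3": 0.30, "C4": 0.77, "C5": 0.10, "C6": 0.45}

pairs = match_nearest(treated, controls)
for treated_id in sorted(pairs):
    print(treated_id, "->", pairs[treated_id])
```

Here T1 has two exact matches (C1 and C2), so one is sampled at random; T2 and T3 have no exact match and get the closest available score. In a real analysis you would probably also want a limit on how far apart two scores can be before you refuse to call it a match, which this sketch does not do.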
sandra says:

January 25, 2012 at 11:38 am

I think I love you. Just got a job where they expect me to do this!
Cate says:

January 27, 2012 at 9:04 pm

AnnMaria,
that is what I did as well, but it is very time consuming doing this manually. Do you have any tricks to doing the matching? I was only matching around 80, but 600 would have been huge to do manually.
Appreciate the thread!
Cheers,
Cate
AnnMaria says:

January 27, 2012 at 10:03 pm

In step 2 you run a macro to match the scores. You can do this in SPSS or SAS, and there are a number of macros available that you can customize to your own needs. Here is one example in SPSS:

http://www.spsstools.net/Syntax/RandomSampling/MatchCasesOnBasisOfPropensityScores.txt
Jannick says:

February 13, 2012 at 5:22 am

Can I ask why you just don’t control for all covariates? Won’t controlling for severity of the condition, education, having private insurance and living in urban areas, and all the other covariates relating to hospital attended and survival chances, produce the same results as the work-intensive propensity score matching? In other words, what are the advantages of propensity score matching versus controlling for all covariates in your initial multivariate model? Thanx!
AnnMaria says:

February 13, 2012 at 5:32 am

That is a really interesting question and it is very timely because I wrote this post years ago and am writing part 2 at this very moment.

Some people say there is no advantage of propensity score matching versus controlling for all covariates:

Check out “The Importance of Covariate Selection in Controlling for Selection Bias in Observational Studies” by Steiner et al.

They argue that having the right covariates is far more important than whether you use propensity scores or covariates. I agree.
AnnMaria says:

February 13, 2012 at 5:43 am

Here is part 2

http://www.thejuliagroup.com/blog/?p=2104
Jannick says:

February 13, 2012 at 10:39 am

Dear AnnMaria,

Thank you so much for the reference! It was enormously helpful. I agree too now 😉

Take care!
jannick
Pankaj says:

February 18, 2012 at 12:09 am

Thanks so much. I was almost lost while reading this topic in many of the books/documents etc. but was unable to get the crux.
It was really helpful.

Pankaj
Pankaj says:

February 18, 2012 at 12:49 am

One more thing to ask: are there any criteria for when to apply a logit or a probit model for propensity scores, or does it simply work using the basics of modelling (regression)?
Kari says:

March 9, 2012 at 1:36 am

Stata would make it easier to perform this propensity matching, without any macros.
Vivian says:

October 20, 2014 at 8:56 pm

Dear AnnMaria,
I have a dilemma. I have two experimental groups, roughly the same size, which could be combined, and a comparison group that is roughly equivalent in size to one of the experimental groups, or half the combined experimental groups. I know I can weight by propensity scores as well as use them for matching, but my sample sizes are small to start with. I also thought I could compare each experimental group separately to the control. Any suggestions?
AnnMaria says:

October 20, 2014 at 9:36 pm

If your two groups are very small, propensity scores are not going to be a good choice. Comparing each experimental group separately to the control would give you a higher overall Type I error rate, because each test would carry its own .05 probability of error. If you can reasonably combine your experimental groups, e.g., they were just people sampled on two different days to see the targeted ad, that is a possibility. Or, you could run an ANOVA with the three groups if you don’t think it is justifiable to combine the two experimental groups.
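
The Type I error point is just arithmetic, sketched below. The first figure assumes the two tests are independent, which is not quite true when both use the same control group, so treat it as an approximation; the second is an upper bound that holds either way.

```python
alpha = 0.05   # Type I error rate for each individual test
n_tests = 2    # each experimental group compared separately to the control

# Chance of at least one false positive if the tests were independent.
familywise_if_independent = 1 - (1 - alpha) ** n_tests

# Upper bound on that chance that holds whether or not the tests are independent.
familywise_upper_bound = min(1.0, n_tests * alpha)

print(f"if independent: {familywise_if_independent:.4f}")
print(f"upper bound:    {familywise_upper_bound:.4f}")
```

So running the two comparisons separately puts the chance of at least one false positive somewhere near .10 rather than .05.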
Shahida says:

February 17, 2015 at 12:56 pm

Hi! Have you ever used SPSS Complex Samples with Propensity Score Matching (v22)? I’m encountering an issue with my large dataset… It seems to just keep running without any results. Is it possible that missing data is causing the analysis to crash? Any insight?

Thank you!!!
