Before you even THINK about propensity score matching
Propensity score matching has had a huge rise in popularity over the past few years. That isn’t a terrible thing, but in my not so humble opinion, many people are jumping on the bandwagon without thinking through whether this is what they really need to do.
The idea is quite simple – you have two groups which are non-equivalent, say, people who attend a support group to quit being douchebags and people who don’t. At the end of the group term, you want to test for a decline in douchebaggery.
However, you believe that people who don’t attend the groups are likely different from those who do in the first place: bigger douchebags, younger, and, it goes without saying, more likely to be male.
The very, very important key phrase in that sentence is YOU BELIEVE.
Before you ever do a propensity score matching program you should test that belief and see if your groups really ARE different. If not, you can stop right now. You’d think doing a few ANOVAs, t-tests or cross-tabs in advance would be common sense. Let me tell you something, common sense suffers from false advertising. It’s not common at all.
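Here is a minimal sketch of that kind of advance check, using nothing but the Python standard library. The ages are made up for illustration, and the function names `welch_t` and `standardized_mean_difference` are my own, not from any package. It computes a Welch t statistic and a standardized mean difference for one covariate.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se_a, se_b = var_a / len(a), var_b / len(b)  # squared standard errors
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Made-up ages, for illustration only.
attendee_ages = [34, 41, 29, 38, 45, 33, 40, 36]
non_attendee_ages = [35, 39, 30, 37, 44, 34, 41, 35]

t, df = welch_t(attendee_ages, non_attendee_ages)
smd = standardized_mean_difference(attendee_ages, non_attendee_ages)
print(f"t = {t:.2f} on about {df:.1f} df, standardized difference = {smd:.2f}")
```

If every covariate you care about looks like this one, with a t statistic near zero and a tiny standardized difference, the groups are probably not different enough to bother matching.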
Even if there are differences between the groups, they may not matter unless they are related to your dependent variable, in this case, the Unreliable Measure of Douchebaggedness.
Say, for example, that you find that your subjects in the support group are more likely to eat grapefruits for breakfast, live on even-numbered streets and own a parrot. Even though I’d be a little suspicious of anyone who gets up early enough to eat breakfast, if it turns out that none of those variables are related to how big of a douchebag you are, there is no point in doing a propensity score match.
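One rough way to check this is to correlate each covariate with the outcome before deciding it belongs in a matching model. The sketch below uses made-up data and a hand-rolled `pearson_r` helper (my own name, not a library function). The variables `owns_parrot`, `baseline_score` and `outcome_score` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up data, for illustration only.
owns_parrot = [1, 0, 1, 0, 1, 0, 1, 0]       # differs between groups, say
baseline_score = [4, 5, 8, 6, 2, 3, 7, 6]    # pre-test on the same measure
outcome_score = [5, 5, 7, 7, 3, 3, 6, 6]     # the dependent variable

r_parrot = pearson_r(owns_parrot, outcome_score)
r_baseline = pearson_r(baseline_score, outcome_score)
print(f"parrot vs outcome:   r = {r_parrot:.2f}")
print(f"baseline vs outcome: r = {r_baseline:.2f}")
```

In this toy example the parrot variable has nothing to do with the outcome, so balancing the groups on it would buy you nothing, while the baseline score is worth worrying about.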
Finally, and perhaps most obvious and most frequently overlooked, if your dependent variable is not measured reliably, no amount of statistical hocus-pocus is going to make anything predict it. (Short explanation – an unreliable measure is one that has a large proportion of error variance. Error variance is, by definition, random. Random error is not going to be related to anything. Imagine that every student just colored in the bubbles in the test at random. Now imagine trying to predict the test scores with any variable. Not happening. I think all students SHOULD color in the test sheets at random. I did once. The school psychologist told me I was mentally retarded. She was wrong.)
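If you want to see the bubbles-at-random point for yourself, here is a small simulation, again a sketch under my own assumptions. The sample size, the error sizes and all of the variable names are arbitrary choices made for illustration. One outcome is measured with little error, and the other is pure random noise.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

predictor = [rng.gauss(0, 1) for _ in range(n)]
# The true score depends on the predictor plus some real individual variation.
true_score = [x + rng.gauss(0, 0.5) for x in predictor]
# A reliable measure: the true score plus a small amount of random error.
reliable_measure = [s + rng.gauss(0, 0.1) for s in true_score]
# Everyone colors in the bubbles at random: nothing but error variance.
random_bubbles = [rng.gauss(0, 1) for _ in range(n)]

r_reliable = pearson_r(predictor, reliable_measure)
r_random = pearson_r(predictor, random_bubbles)
print(f"reliable measure: r = {r_reliable:.2f}")
print(f"random bubbles:   r = {r_random:.2f}")
```

The same predictor that works well on the reliable measure predicts essentially nothing when the scores are random, and no matching procedure changes that.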
and AFTER you do propensity score matching (or anything else) …
Even after all of this, sometimes it still doesn’t work. A few years ago, I had a client who had a really logical theory and a well-designed study, and when we ran the analyses every which way, none of the data supported their hypotheses.
At the end of it all, the client asked me what else we could do, and I said
“There isn’t anything else we can do that I would recommend. You know, sometimes the theory is just wrong.”
It reminds me of the title of a good presentation I went to at the Joint Statistical Meetings earlier this month,
“Bayesian statistics are powerful but they’re not magical”
I think that could be applied to just about any kind of statistical technique. I wish I had said it first.
Nice post, but I thought the real advantage of propensity score matching was to combine the effects of a bunch of variables on which the groups likely vary into one score, thus saving a lot of degrees of freedom in the regression (of whatever type) you are doing.
It can also make the output from a regression simpler, if you aren’t interested in all those covariates.
But propensity score matching has problems if the groups are really different – that is, if there isn’t much overlap in their scores on the covariates. I saw this happen in one study of the effects of job training. The two groups had almost no overlap on education, and no overlap at all on joblessness. Propensity score analysis gave ridiculous results.
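A quick way to spot that kind of problem before matching is to look at how much the two groups' estimated propensity scores overlap. The sketch below assumes you already have the scores. The helper names `common_support` and `share_in_support` and the example numbers are hypothetical.

```python
def common_support(treated_scores, control_scores):
    """Return the (low, high) range covered by both groups, or None if empty."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    if low > high:
        return None
    return low, high


def share_in_support(scores, support):
    """Fraction of scores that fall inside the common-support range."""
    if support is None:
        return 0.0
    low, high = support
    return sum(low <= s <= high for s in scores) / len(scores)


# Made-up propensity scores, for illustration only.
trained = [0.80, 0.85, 0.90, 0.95]
untrained = [0.05, 0.10, 0.20, 0.30]

support = common_support(trained, untrained)
print("common support:", support)
print("share of trained group inside it:", share_in_support(trained, support))
```

When the range comes back empty, or only a small share of either group falls inside it, there is nobody comparable to match against, and results like the ones described above should not be a surprise.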
Aside from the situation Peter mentions (saving degrees of freedom when your dataset is extremely limited), you shouldn’t expect propensity scores to do anything that a normal regression model controlling for the same variables wouldn’t do. If you have unmeasured confounding factors, both types of analysis are going to be biased.
If you’re dealing with a client who needs to be convinced that propensity scores have no magic powers, you might be interested in “Propensity scores: help or hype?” by Winkelmayer et al. (Nephrol Dial Transplant 2004) as a reference.