What’s the first thing you tell students about statistics?
I’m looking forward to teaching my first master’s-level course in a lo-o-ng time next week. Since this may be the first course students take in their master’s program, the question I’m faced with is,
“What would you tell someone at the very beginning of learning about statistics?”
I’m starting with this:
Bias = bad
Bias is to statisticians as sin is to preachers. We’re against it.
Bias is SYSTEMATIC error. While it is generally impossible to avoid error, in an unbiased study, error will be random.
Random = good
If error is random, we would be as likely to err in one direction as in the other, and so, on the average, would get the correct result. For example, if I were evaluating fighters to decide if they really did have brain damage as a result of being hit in the head too many times, in some borderline cases I might incorrectly decide the fighter was fine when, in fact, there was some minimal brain damage. In other cases, I might decide the person had damage, when he or she was just somewhat on the low side of the bell curve in terms of functioning brain cells. On the average, though, those errors should balance out and I should get the correct conclusion.
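Here is a minimal sketch of that idea in Python, if you want to see it rather than take my word for it. Everything in it is made up for illustration: the "true" score of 100, the noise level, the 5-point bias, and the function name `simulate_mean` are all assumptions, not data from any real study.

```python
import random
import statistics


def simulate_mean(true_value, n, noise_sd, bias=0.0, seed=0):
    """Average n noisy measurements of true_value.

    Each measurement is true_value + bias + random noise.
    bias=0.0 means the only error is random error.
    """
    rng = random.Random(seed)
    measurements = [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    return statistics.mean(measurements)


TRUE_SCORE = 100.0  # hypothetical true value we are trying to measure

# Random error only: individual measurements miss, but in both directions.
unbiased = simulate_mean(TRUE_SCORE, n=10_000, noise_sd=15.0)

# Same random error plus a systematic 5-point shift in one direction.
biased = simulate_mean(TRUE_SCORE, n=10_000, noise_sd=15.0, bias=5.0)

print(f"random error only: {unbiased:.2f}")
print(f"with bias:         {biased:.2f}")
```

With random error alone, the average should land close to 100. With the bias added, it should land close to 105, and collecting more measurements does not pull it back.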
Random assignment is good because it means that people are equally likely to be assigned to one group versus another, so it is likely to control for confounding variables. What are confounding variables? Those are factors related to both your predictors/risk factors and your outcome variables, which can distort the relationships found between them. For example, people residing in nursing homes (my predictor) may be more likely to die (my outcome) but that might be because they are older or in poorer health (confounding variables).
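A rough sketch of why random assignment helps, using age as the confounder. The ages are simulated, and the rule that everyone 80 or older ends up in the nursing home is a deliberately crude assumption to make the point, not a claim about real nursing homes.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical study population: 2000 people aged 60 to 95.
ages = [rng.uniform(60, 95) for _ in range(2000)]

# Random assignment: shuffle, then split in half.
shuffled = ages[:]
rng.shuffle(shuffled)
group_a, group_b = shuffled[:1000], shuffled[1000:]
random_gap = abs(statistics.mean(group_a) - statistics.mean(group_b))

# Non-random "assignment": older people end up in the nursing home.
nursing_home = [a for a in ages if a >= 80]
community = [a for a in ages if a < 80]
confounded_gap = statistics.mean(nursing_home) - statistics.mean(community)

print(f"age gap with random assignment: {random_gap:.2f} years")
print(f"age gap without it:             {confounded_gap:.2f} years")
```

With random assignment the two groups should have nearly the same average age, so age cannot explain a difference in outcomes. Without it, the nursing home group is much older before the study even starts.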
Random selection is good because it means that everyone in the population has an equal chance to be selected, which means that, if you have a large enough sample, your sample is likely to be representative.
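The same kind of sketch works for random selection. The population below is simulated, and the "convenience" sample (just taking the 2000 highest values) is a hypothetical stand-in for any selection method that favors one kind of person.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 100,000 values centered near 50.
population = [rng.gauss(50, 10) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Random selection: every member has the same chance of being picked.
small_sample = rng.sample(population, 20)
large_sample = rng.sample(population, 2000)

# Non-random selection: only the 2000 highest values get in.
convenience_sample = sorted(population)[-2000:]

print(f"population mean:         {pop_mean:.2f}")
print(f"random sample, n=20:     {statistics.mean(small_sample):.2f}")
print(f"random sample, n=2000:   {statistics.mean(large_sample):.2f}")
print(f"convenience sample mean: {statistics.mean(convenience_sample):.2f}")
```

The large random sample should come out very close to the population mean. The small one can miss by more, which is the "large enough" part. The convenience sample has plenty of cases and is still far off.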
What’s a sample? What’s a population? What’s representative?
Well, we’ll get into that shortly.
But, speaking of random, I thought the most important thing to begin with was not how to find a mean or standard deviation but that bias is bad, because if you have bias, you are worse off after you found the mean than before you knew how to compute it. Before, you didn’t have any information: you didn’t know the mean, and you knew you didn’t know it.
With bias, you still don’t know the mean, but you think you do. You’ve actually gone backwards.
Think about it.
I think it is also important to question the quality of the data and how it was obtained. Missing values, negative ages, more categories than there should be, etc. all affect the statistics and should also be investigated/explored whilst considering bias and randomness.
“on the low side of the bell curve in terms of functioning brain cells.”
lol
Well, I’m not a prof, but I’ve taken intro stats courses several times. : ) I like the bias/precision discussion. I always found images like this useful for that: http://www.yorku.ca/psycho/en/postscript.asp
But for intro stats, I might start with explaining *why* people use statistics. Simplistically, in many settings we can’t count/measure everything. For those populations that are too big to measure everything, we will never know the truth exactly. When that happens, we can either know nothing, or use statistics to develop an *estimate* based on a sample….
Which leads nicely into your discussion of whether you would rather have an estimate with bias or imprecision.
Love your blog. It’s inspiring.
Regarding bias, and whether it is universally bad, I have always liked Maurice Kendall’s “Hiawatha designs an experiment” http://www.columbia.edu/~to166/hiawatha.html
Ha ha, Alex – The Hiawatha designs an experiment is funny!