You can’t “exactly” tell anything with statistics
Some people believe you can say anything with statistics. I don’t believe that is true, unless you flat out lie, but if you are a big fat liar, I am sure you would lie just as much without statistics.
However, Marshall made a good point today while we were discussing, via email, our presentation for the National Indian Education Association. One thing we say in the presentation is that, while most vocational rehabilitation projects serve relatively few youth, the percentage of youth served at Spirit Lake has risen dramatically. He said,
You said the percentage of youth increased from 2% to 20% and then you said the percentage of youth served tripled. Which was it?
It depends on how you slice your data
There is more decision-making in even basic statistics than most people realize. We are looking at a pretty basic question here: “Did the percentage of the caseload made up of youth aged 25 and under increase?”
The first question is, “Increase from when to when?” That is, what year is the cutoff? In this case, the answer is easy. We had observed that the percentage of youth served was decreasing, and changes were made in 2015 to reverse that trend. So, the decision was to compare 2015 and later with 2014 and earlier.
How much of an increase is found depends on the year used for comparison and whether we use one year or an average.
The discrepancy between the 10x improvement and the 3x improvement arises because the percentage of youth served by the project varied from year to year, although the overall trend was downward. If we wanted to make ourselves look really good, we could compare the lowest year, 2013 at 2%, with the highest year, 2015 at 20%, and say the increase was 10x. That is true, but I don’t think it is the best representation. One reason is that the changes we discussed in the paper weren’t implemented until 2015, so there is no justification for using 2013 as the basis.
The second question is how to compute the baseline. If we use all of the data from 2008-2014, youth made up 7% of the new cases added. That is the baseline I used at first, and compared with 20.2% in 2015, it means the percentage of youth served almost tripled.
However, we only started using the current database system in fiscal year 2012, so the only people from earlier years in the data were those who had enrolled before 2012 and were still receiving services. The further back in time we went, the fewer people there were in the system, and they were definitely a non-representative sample. Typically, people don’t continue receiving vocational rehabilitation services for three or four years.
You can see the number by year below. The 2018 figure is only through June of this year, which is when I took a snapshot of the database.
If we use 2013-2014 as a baseline, the percentage of youth among those served was 4%. If we use 2012-2014, it’s 6%.
To me, it makes more sense to compare against an aggregate over a few years. I averaged 2012 through 2014 because it gave a larger sample size and representative data, and also because I didn’t feel comfortable using the absolute lowest year as a baseline. Maybe it was just a bad year. As any good psychometrician knows, the more data points you have, the more reliable your measure.
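“Averaging” several years can mean two slightly different calculations: pooling all the cases from those years into one percentage, or taking the mean of each year’s percentage. They give different answers when the yearly caseloads differ in size. Here is a minimal Python sketch of both. The YearCount type, the function names, and the three years of counts are hypothetical, made up only to show the arithmetic; they are not the project’s data.

```python
from dataclasses import dataclass


@dataclass
class YearCount:
    """Hypothetical per-year tallies of new cases (illustration only)."""
    label: str
    youth: int   # new cases aged 25 and under
    total: int   # all new cases that year


def pooled_pct(rows: list[YearCount]) -> float:
    """Youth share of all cases combined across years (each person weighted equally)."""
    youth = sum(r.youth for r in rows)
    total = sum(r.total for r in rows)
    return 100.0 * youth / total


def mean_of_yearly_pct(rows: list[YearCount]) -> float:
    """Unweighted mean of each year's percentage (each year weighted equally)."""
    return sum(100.0 * r.youth / r.total for r in rows) / len(rows)


# Made-up counts: one large year with a low youth share, two small years with high shares.
TOY_COUNTS = [
    YearCount("Year A", youth=5, total=25),   # 20%
    YearCount("Year B", youth=5, total=100),  # 5%
    YearCount("Year C", youth=5, total=25),   # 20%
]

if __name__ == "__main__":
    print(f"pooled: {pooled_pct(TOY_COUNTS):.1f}%")                 # pooled: 10.0%
    print(f"mean of years: {mean_of_yearly_pct(TOY_COUNTS):.1f}%")  # mean of years: 15.0%
```

With these made-up counts, pooling gives 10% while the mean of the yearly percentages gives 15%, because the large, low-share year counts for more when every case is weighted equally. Neither is wrong; it is one more choice to state when reporting a baseline.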
The third question is how to select the years for comparison. I combined 2015-2018 also because it gave a larger sample size and, again, I did not want to just pick the best year as a comparison. Over that period, 18% of those served by the project were youth.
So … what have we learned? Depending on how we select the baseline and comparison years, we have improved 10 times, from 2% to 20%; 2.6 times, from 7% to 18%; tripled, from 6% to 18%; or increased fivefold, from 4% to 20%. And there are some other permutations possible as well.
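To make those “other permutations” concrete, here is a small Python sketch that pairs every baseline with every comparison period mentioned in this post and computes the fold change (comparison percentage divided by baseline percentage). It uses only the rounded percentages reported here, so the ratios are approximate; the dictionary names and helper functions are my own, hypothetical, and not part of any tool we used.

```python
from itertools import product

# Youth (25 and under) as a percentage of the caseload, as reported in this post.
# These are already rounded, so the ratios below are approximate.
BASELINES = {
    "2013 only": 2.0,
    "2013-2014": 4.0,
    "2012-2014": 6.0,
    "2008-2014": 7.0,
}
COMPARISONS = {
    "2015 only": 20.2,
    "2015-2018": 18.0,
}


def fold_change(baseline_pct: float, comparison_pct: float) -> float:
    """How many times larger the comparison percentage is than the baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline percentage must be positive")
    return comparison_pct / baseline_pct


def all_fold_changes() -> list[tuple[str, str, float]]:
    """Every baseline/comparison pairing, sorted from largest to smallest change."""
    rows = [
        (base, comp, fold_change(BASELINES[base], COMPARISONS[comp]))
        for base, comp in product(BASELINES, COMPARISONS)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for base, comp, ratio in all_fold_changes():
        print(f"{base:>10} -> {comp:<10} {ratio:5.1f}x")
```

Every one of the eight pairings comes out well above 1, running from roughly 2.6x (2008-2014 against 2015-2018) to about 10x (2013 alone against 2015 alone). The size of the multiplier depends on the slicing; the direction does not.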
Notice something here, though. No matter how we slice it, after 2014, the percentage of youth increased, and substantially so. This increase was maintained year after year.
I thought this was an interesting example of how you can come up with different values for the specific statistic, yet no matter which one you choose, you reach the same conclusion: the changes in outreach and recruitment had a substantial impact.