Fun studying deaths of old people – or not
I am probably going to hell for this … because today I was studying the death rate of older people using the data from Kaiser Permanente available on the Inter-university Consortium for Political and Social Research (ICPSR) website and really having a great time.
Reading .stc
First funny thing: after I extracted the download and noticed I had a .stc file, I remembered needing to do a PROC CIMPORT or something, but not exactly how to do it. I typed it into Google and the first page that came up was a post I wrote over two years ago!
This is all it takes to read in the file
Libname in "C:\Users\AnnMaria\Documents\oldpeople\sasdata" ;
Filename readit
'C:\Users\AnnMaria\Documents\oldpeople\ICPSR_04219\DS0002\04219-0002-Data.stc' ;
proc cimport infile = readit library = in ;
run ;
Read your SAS log
Check this out
NOTE: Proc CIMPORT begins to create/update data set IN.DA4219P2
NOTE: Data set contains 39 variables and 14730 observations.
Logical record length is 248
NOTE: Proc CIMPORT begins to create/update catalog IN.FORMATS
NOTE: Entry DTHFLAG.FORMAT has been imported.
NOTE: Entry SEX.FORMAT has been imported.
NOTE: Entry DISP.FORMATC has been imported.
NOTE: Total number of entries processed in catalog IN.FORMATS: 3
So, SAS has now very nicely created my formats and stored them in a catalog.
Go, .stc files!
Using the formats created
options fmtsearch = (in) ;
The “in” is the libref from my LIBNAME statement, so it points to the folder where the formats catalog was automatically stored.
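If you want to double-check what actually landed in that catalog, something like the sketch below should list each imported format along with its value labels. It assumes the same "in" libref as above is still assigned; FMTLIB is the PROC FORMAT option that prints the contents of the formats catalog in the named library.

```sas
/* Sketch: print the definitions of the formats stored in IN.FORMATS.      */
/* Assumes the libref "in" from the LIBNAME statement is still assigned.   */
proc format library = in fmtlib ;
run ;
```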
Proc freq awesome options
I cannot believe I have never used the binomial options in PROC FREQ. How did that happen? I decided to overcome this lack in my life today. I decided to test whether your probability of living to the end of the study was 1 out of 3. Check this out ….
proc freq data = in.da4219p2 ;
tables dthflag / binomial (exact equiv p = .333) alpha = .05 ;
run ;
The binomial option with p = .333 will produce a test that the population proportion is .333 for the first category. That is “No” for death. A Z-value will be produced, along with probabilities for one-tailed and two-tailed tests. The equiv keyword additionally requests an equivalence test around that same proportion.
The exact keyword will produce exact (Clopper-Pearson) confidence limits and, since I have specified alpha = .05, these will be the 95% confidence limits.
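For reference, the Z-value in a test like this is usually the standard one-sample statistic for a proportion. Here is a sketch, assuming the standard error is computed from the null proportion (which I believe is the PROC FREQ default). The symbols are mine, not SAS's: \hat{p} is the observed proportion of “No”, p_0 is the hypothesized proportion, and n is the number of nonmissing observations.

```latex
z = \frac{\hat{p} - p_0}{\sqrt{p_0\,(1 - p_0)/n}}, \qquad p_0 = 0.333
```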
You can see the output here. It is very cool.
Why am I going to hell?
Well, hopefully I am not, but I was having such a good time today and then I remembered something from decades ago. I was a graduate student and I was working for two different professors on two projects at the same time. One was a project interviewing parents of children with disabilities to understand family functioning. The other used a very large government records data set for estimating mortality among people with different disabilities. Because death rates are low, particularly for children, every time a record changed to show another death, we were happy because our sample size increased, thus giving greater (statistical) power for our tests. We had another death show up in our data and we were quite pleased about that because we were getting close to a large enough sample size for some of the statistical analyses we had in mind.
A few days later, we went to interview a family. The mother came to the door, looking like she’d been crying for a solid week, which she probably had. She said dully,
“I’m sorry I forgot to call and cancel the appointment. Sammy died 12 days ago.”
Then it hit me. That six-year-old in our data who died was not a number. He was her son, Sammy. (No, his name wasn’t really Sammy, but over twenty years later, I do remember his name.)
So, yeah, all the SAS stuff and the statistics were fun today, but at the end of it I tried to remember that every one of those 9,170 people who died was someone’s husband, wife, mother, father, grandmother or grandfather and be respectful of that.