SAS Studio: Finding prevalence with pointing and clicking

By AnnMaria De Mars, February 24, 2016

Policy makers have very good reason for wanting to know how common a condition or disease is. It allows them to plan and budget for treatment facilities, supplies of medication, and rehabilitation personnel. There are two broad answers to the question, “How common is condition X?”, prevalence and incidence, and, interestingly, both of these use the exact same SAS procedures. Prevalence is the number of persons with a condition divided by the number in the population. It’s often given as per thousand, or per 100,000, depending on how common the condition is. Prevalence is often referred to as a snapshot: it’s how many people have a condition at a given point in time.
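Written out as a formula, the definition above looks roughly like this. The symbols are my own shorthand, not anything SAS uses: $c$ is the number of persons with the condition, $N$ is the number of persons in the population, and $k$ is whatever scaling base you report on (1,000 or 100,000, say).

```latex
\[
\text{prevalence} = \frac{c}{N},
\qquad
\text{prevalence per } k = \frac{c}{N} \times k
\]
```

A smaller $k$ suits common conditions and a larger $k$ suits rare ones, which is just a way of keeping the reported number readable.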

Just for fun, let’s take a look at how to compute prevalence with SAS Studio.

Step 1: Access your data set

First, assign a libname so that you can access your data. To do that, you create a new SAS program by clicking on the first tab in the top menu and selecting SAS Program.

libname mydata "/courses/number/number/" access=readonly;

(Students have only read-only access to data sets in the course directory. This prevents them from accidentally deleting files shared by the whole class. As a professor with many years of experience, let me just tell you that this is a GREAT idea.)

Click on the little running guy at the top of your screen and, voila, your LIBNAME is assigned and the directory is now available for access.

(Didn’t believe me that there is a little running guy that means “run”? Ha!)
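If you would like to confirm from code that the library really is assigned, one option is to ask SAS to list what is in it. This is only a sketch, and it assumes the LIBNAME statement above has already been run with the libref mydata.

```sas
/* List the data sets in the MYDATA library without printing     */
/* the variable-level details for each one (that is what NODS    */
/* does). Assumes the libref mydata was assigned as shown above. */
proc contents data=mydata._all_ nods;
run;
```

If the library is assigned, the results should show a directory listing of the data sets in it. If not, the log will tell you the libref is not assigned.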

Next, in the left window pane, click on Tasks and in the window to the right, click on the icon next to the data field.

From the drop down menu of directories, select the one with your data and then click on the file you need to analyze.

Step 2: Select the statistic that you want and then select the variable

In this case, I selected one-way frequencies, and one cool thing is that SAS will automatically show you ONLY the roles you need for a specific test. If you were doing a two-sample t-test, for example, it would ask for your groups variable and your analysis variable. Since I am doing a one-way frequency, there is only an analysis variable.

When you click on the plus next to Analysis Variables, all of the variables in your data set pop up and you can select which you want to use. Then, click on your little running guy again, and voila again, results.

So … the prevalence of diabetes is about 11% of the ADULT population in California, or about 110 per 1,000.

You can also code it very simply if you would like:
libname mydata "/courses/number/number/" access=readonly;

PROC FREQ DATA = mydata.datasetname ;
TABLE variable ;
RUN ;

Of course, all of this assumes that your data is cleaned and that you have a binary variable coded as has disease / doesn’t have disease, which is a pretty large assumption.
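If you want SAS to do the "per 1,000" arithmetic for you, here is a minimal sketch under the same assumptions (clean data, one binary variable). The names datasetname and variable are the placeholders from the code above. The names freqout, prevalence and per_1000 are hypothetical ones I made up for illustration. The output data sets go to the WORK library because the course directory is read-only.

```sas
/* Sketch only: datasetname and variable are placeholders.         */
/* freqout, prevalence and per_1000 are hypothetical names.        */
proc freq data=mydata.datasetname;
   tables variable / out=freqout;  /* OUT= saves COUNT and PERCENT */
run;                               /* for each level of variable   */

data prevalence;
   set freqout;
   per_1000 = percent * 10;        /* percent (0-100) to per 1,000 */
run;

proc print data=prevalence noobs;
run;
```

PERCENT in the OUT= data set is on a 0 to 100 scale, so multiplying by 10 gives the rate per 1,000. By default the percentages are computed over nonmissing values only, which is worth keeping in mind if your variable has a lot of missing data.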

Now, curiously, the code above is the exact SAME code we used to compute incidence of Down syndrome a few weeks ago. What’s up with that and how can you use the exact same code to compute two different statistics?

Patience, my dear. That is a post for another day.


3 Comments

E says:

February 24, 2016 at 1:34 pm

Curious- how current are the data sets that you are able to access?
Annmaria says:

February 25, 2016 at 5:02 am

The data used here was the 2011 California Health Interview Survey.
Pingback: SAS Studio – Import Excel with Tasks & Utilities : AnnMaria's Blog
