statistics

Chi-square, by request, and not in a few words

By AnnMaria De Mars, December 15, 2008 / December 17, 2008

Recently, someone asked me if I could explain chi-square in a few words. The short answer is, “No, I am incapable of using only a few words for any purpose whatsoever. If you doubt this, ask any of my children.”

What is chi-square?
Chi-square is a measure of relationship between two categorical variables. For example, let’s pick Proposition 8, the recent ballot initiative regarding gay marriage. The two categories of voters were “For” and “Against”. The initiative passed 52% to 48%. Remember these proportions. They are important later on.

The gender of voters also fell into two categories, “Male” and “Female”, who are around 49% and 51% of the population. If I wanted to test whether there was a relationship between votes on Prop 8 and gender, a chi-square would be a great test to use.

The null hypothesis being tested is: “There is no relationship between gender and how one voted on Proposition 8.”

I tried to find actual data on this relationship but after searching through a lot of websites and articles trying to find facts on this issue, I was depressed by the number of people who hate other groups of people and were not at all reluctant to write about it, data or no, so I just gave up. Here, proceeding without any interference from real data, is a hypothetical example.

We find 1,000 people who are willing to tell us how they voted and their gender. Just to make life easier, we deliberately select 500 males and 500 females. This gives us a two-by-two table:

Gender      Vote: Yes    Vote: No
Female         237          263
Male           280          220

More males voted yes and more females voted no. Was this just random or are males really more likely to vote against gay marriage?

The formula for a chi-square is the sum, over all cells, of the observed number in each cell minus the expected number, squared, and divided by the expected number.
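Written out in symbols, the verbal formula looks like the following. The symbols O_i and E_i (observed and expected count in cell i) are notation introduced here for illustration and do not appear in the original post.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```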

In this case, if there were no relationship between gender and which way you voted, the expected number in each cell would be 260 yes (52% of 500) and 240 no (48% of 500) for both male and female.

In the first cell, we have (237 - 260)**2 / 260 = 2.03
In the second cell, we have (263 - 240)**2 / 240 = 2.20
In the third cell, (280 - 260)**2 / 260 gives us 1.54
And, in the fourth cell, (220 - 240)**2 / 240 = 1.67
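The four cell calculations can be reproduced with a short Python sketch. It assumes, as the text does, that the expected counts come from applying the 52% / 48% split to 500 voters of each gender. The names `observed`, `share` and `cell_contributions` are made up for this illustration.

```python
# Observed counts from the hypothetical table: (gender, vote) -> count
observed = {
    ("Female", "Yes"): 237,
    ("Female", "No"): 263,
    ("Male", "Yes"): 280,
    ("Male", "No"): 220,
}

# Assumption taken from the text: expected counts use the 52% / 48%
# result applied to 500 voters of each gender.
group_size = 500
share = {"Yes": 0.52, "No": 0.48}


def cell_contributions(observed, group_size, share):
    """Return (observed - expected)**2 / expected for each cell."""
    contributions = {}
    for (gender, vote), count in observed.items():
        expected = group_size * share[vote]
        contributions[(gender, vote)] = (count - expected) ** 2 / expected
    return contributions


contributions = cell_contributions(observed, group_size, share)
chi_square = sum(contributions.values())

for cell, value in contributions.items():
    print(cell, round(value, 2))
print("chi-square:", round(chi_square, 2))
```

Summing the unrounded cell values gives roughly 7.44, which rounds to the 7.4 used in the text.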

I end up with a chi-square value of 7.4, which is statistically significant. With one degree of freedom, the probability of obtaining a chi-square value of 7.4 or larger when the null hypothesis is true is less than .01, or one out of 100. Therefore, if these data were real and not some random numbers that I made up, I could conclude that women are less likely to be opposed to gay marriage than men.
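For anyone without a chi-square table handy, the "less than .01" figure can be checked with the standard library alone. This is a minimal sketch that assumes one degree of freedom (the case for a two-by-two table) and relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal. The function name `chi_square_p_value_1df` is hypothetical.

```python
import math


def chi_square_p_value_1df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    Valid only for 1 df: P(X >= x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))


p_value = chi_square_p_value_1df(7.4)
print("p < .01:", p_value < 0.01)
```

For tables larger than two by two the degrees of freedom change and this shortcut no longer applies. A statistics package or a printed table is the safer route there.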

Why did I detour into chi-square when I said I was going to spend the next week talking about categorical models? It’s not a detour, really.

Understanding chi-square is one of the building blocks of getting into log-linear models and more. Next, I want to talk about another basic statistic, the phi coefficient, and how, like marzipan, it really isn’t all it’s cracked up to be.

============================

How to get a chi-square in SAS:

Proc freq data = datasetname ;

tables variable1 * variable2 / chisq ;

run ;

============================

How to get a chi-square in SPSS:

CROSSTABS

/TABLES = variable1 BY variable2

/STATISTICS = CHISQ.

=====================================

Chi-square in Stata

tabulate variable1 variable2 , chi2
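
One thing to keep in mind when comparing printouts with the hand calculation: these procedures build the expected counts from the row and column totals of the table itself, not from an outside figure such as the 52% / 48% election result. A rough Python sketch of that margin-based calculation, using the hypothetical table from this post, is below. The names `table` and `pearson_chi_square` are made up for illustration.

```python
# Observed counts: rows are gender, columns are vote
table = {
    "Female": {"Yes": 237, "No": 263},
    "Male": {"Yes": 280, "No": 220},
}


def pearson_chi_square(table):
    """Pearson chi-square with expected = row total * column total / n."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    n = sum(row_totals.values())

    statistic = 0.0
    for row, cells in table.items():
        for col, count in cells.items():
            expected = row_totals[row] * col_totals[col] / n
            statistic += (count - expected) ** 2 / expected
    return statistic


statistic = pearson_chi_square(table)
print(round(statistic, 2))
```

With these made-up numbers the margin-based expected counts are 258.5 and 241.5 rather than 260 and 240, and the statistic still comes out at about 7.4.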

Now you know more than you wanted to know about chi-square.


3 Comments

missukamaka says:

October 5, 2013 at 11:14 am

Thank you for this information!
I am trying to familiarize myself with stats and it has not been easy. My question, how did you determine that the value of 7.4 was statistically significant?

AnnMaria says:

October 5, 2013 at 12:13 pm

You can read the p value on your printout. If you calculated the chi-square by hand or on a calculator you can look up the probability in a chi-square table.

missukamaka says:

October 6, 2013 at 10:44 am

Thank you!
