Finding Groups in Data

By AnnMaria De Mars, September 26, 2008

Today, Dr. De Mars is — happy.

One of the fun things about my job is that I get to do lots of different things. That can be a bit troubling some days, because “statistical software consultant” covers a wide range: different types of models, coding, various operating systems, and all of the non-parametric, parametric, Bayesian and other statistics that I cannot remember at the moment.

Because the range of people I work with continually increases, I am now more often running into questions I cannot answer off the top of my head. I do know how Mahalanobis’ distance is used, even though I had not thought about it in years until someone asked me a question yesterday. I do know the calculation for pooled variance, which should be used when Levene’s test is not rejected, that is, when there is no evidence that the group variances differ. Still, once a day or so, someone asks me a question I have to look up. Sometimes these are on techniques I have not used before, and just as often the question relates to something that I KNOW can be done, because I personally have used that statistic or written that code before. I just can’t remember how.

You know that saying,

“I have forgotten more about statistics than you’ll ever know.”

Well, that is my problem. I keep forgetting it. Fortunately for me, and this is why I am happy, I get to consult on a lot of different projects each week that remind me of things I used to know. For example, cluster analysis, as the Stata multivariate statistics guide so poetically says, is used for finding groups in data. You can use it to identify or validate specific diagnostic groups, or to try to group just about anything. Most often, cluster analysis is used as an exploratory technique, which is my favorite type of statistics, where you are turning a bunch of numbers into knowledge.

The most common way to use cluster analysis is the k-means technique. You assume there are k groups (with k being a number you specify) and the program iterates to a solution. The program starts with k “seeds”, which serve as the initial means for each group. Every observation is assigned to the group whose mean is closest to it. New group means are calculated based on the observations in the group. If an observation is now closer to a different group’s mean, it is moved into that group. Then, group means are calculated again. This continues until a step is reached where none of the observations change groups. And that is one way to do cluster analysis.
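If it helps to see those steps written out, here is a minimal sketch in plain Python. It is only an illustration of the loop described above, not how Stata or any other package implements it. The function name `kmeans`, the `max_iter` safety cap, the use of Euclidean distance, and the toy data are all my own assumptions.

```python
import math


def kmeans(points, seeds, max_iter=100):
    """Group points (tuples of numbers) around len(seeds) means.

    Hypothetical helper for illustration: Euclidean distance is assumed,
    and max_iter is a safety cap that the description above does not mention.
    """
    means = [tuple(s) for s in seeds]  # the k seeds are the starting means
    assignment = None
    for _ in range(max_iter):
        # Assign every observation to the group whose mean is closest.
        new_assignment = [
            min(range(len(means)), key=lambda g: math.dist(p, means[g]))
            for p in points
        ]
        # Stop once a step is reached where no observation changes groups.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recalculate each group mean from the observations now in it.
        for g in range(len(means)):
            members = [p for p, a in zip(points, assignment) if a == g]
            if members:  # an empty group simply keeps its previous mean
                means[g] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, means


# Toy data (made up): two obvious clumps, seeded with one point from each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignment, means = kmeans(points, seeds=[points[0], points[3]])
print(assignment)  # group label for each observation
print(means)       # final group means
```

With data this tidy the groups settle after the first pass. On real data the answer can depend on which seeds you start with, so it is worth trying more than one set.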
