R vs SAS/SPSS in Corporations: A view from the other side
I read Allen Engelhardt’s post this morning on R vs SAS/SPSS in corporations, and it motivated me to set aside my infinite to-do list and write about something I’ve been thinking about for a long time.
Since Allen writes on R-bloggers, it will surprise no one that his conclusion was that R is preferable to SAS and the main obstacle to its use is the inertia and ignorance of executives and HR departments. What may surprise some people is that I agree with him that there may be cases where R is preferable, although not for the same reasons he gives, and that SAS Institute has some serious issues it needs to address, although looking at it from the side of someone who likes and uses SAS, I see different problems.
As someone who has used SAS daily for 29 years, I disagree on some of Mr. Engelhardt’s reasons both for and against SAS. I do agree, though, that there are some serious issues that, unless SAS Institute starts taking them seriously, may eventually end up in SAS going the way of WordPerfect or COBOL.
Engelhardt said that one reason R is not the choice for corporations is
“R takes talent to use. (That is kind of why we like it.) It takes talent to maintain. My problem as the manager of a commercial analytical insights team is that it is very hard for me to retain that talent.”
I quoted this so you would not think I made it up. I thought of incredibly brilliant people like Rick Wicklin, the author of Statistical Programming with SAS/IML software. The first paper I pulled up at random in my notes from SAS Global Forum was An Overview of Survival Analysis using Complex Sample Survey Data, by Dr. Patricia Berglund. I could add a vast number of examples of SAS users who are not talent-less hacks, but you get my point.
He’s incorrect in assuming most of the people who use SAS use the menu-driven SAS Enterprise Guide, Enterprise Miner, etc. I’ve been to many user group meetings and conferences where, when asked how many do, fewer than 10% of the people in the room raise their hands. That is a non-random sample, I know, but in 29 years in diverse organizations I see the same thing – the great majority of people who use SAS write code. Those who use it for very long write macros, create their own formats, extend it with CSS, Perl, Python, IML and sometimes even R. Assuming R = talented, SAS = pointing, clicking drone is a bit over-simplistic.
With SPSS, I’ve seen the opposite, and I agree on that point. People who are SPSS users are hardly likely to abandon it for R – yet (see below for why they may). I was once speaking with a developer at SPSS about a problem and he asked me, as one of the standard questions, “Do you write syntax?” Then, because we had been talking for a while already, he caught himself and said, “Of course you do.” My point is that the assumption was that you did not use syntax, and, again, in my admittedly non-random sample over 25 years of using SPSS, that assumption has been increasingly borne out ever since menus became an option.
So, I disagree with his assumption that R people are just more talented (although that was popular with readers of R-bloggers) and I am not completely sold on his disadvantage that SAS costs corporations a lot of money. I think Mr. Engelhardt over-estimated the ignorance of executives and under-estimated the cost of the vast body of legacy code out there.
As I have said before,
Re-writing everything to run on free software is only a good deal if your time has no value.
I think he under-emphasized one thing for corporations: the enormous COST of replacing legacy code. You’d need to re-write the code, re-write the documentation and re-train the employees. Anyone who has written much code, especially for a complex system, realizes that it will not work right out of the gate. For a while, you will be running two parallel systems. That’s expensive. You will need to keep all of your SAS people until you have your new system up and running. Will you have those people learn R? As Engelhardt notes, there is a difference between reading an introductory book and being an expert. Will you hire new people with years of experience with R? Then what will you do with your SAS people? Fire them all? I presume they have other knowledge of statistics, your industry, etc. that you might want. Will you just take the SAS code and re-write it in R? As anyone who has worked in corporations on large systems will guarantee, a lot of that code “grew like Topsy”. It can be improved because you probably have patches on top of patches. What do you say to your manager when your R code has a bug and quits running? (This happens to everyone, but remember, you are replacing a system that was running with a new one that, although made with free software and better in many ways, is not running.) Also, does that mean your people who are writing the R code are going to be well-versed in SAS, too? Or are you going to have one of those talent-less SAS people you are going to fire sit next to you and tell you what each piece does?
I said this before but, who is going to write the documentation of everything the program does and how to maintain it for when your talented R person leaves?
So why should SAS (and SPSS) be worried about R?
First of all, for those people and organizations that do NOT have legacy code, the major barrier I just talked about is removed. If you are a new company, you don’t have any legacy. There is no cost of re-writing, re-documenting anything. If you are a student, your time doesn’t have any value to anyone but you. This is why R is so popular among students, and this should make SAS very, very worried. Yes, lots of students hate R, but lots of them hate SAS, too (more about that in a minute).
A few days ago, I was at a SAS USERS GROUP MEETING and three people sitting around me were discussing using R to teach students. One person said that the students would hate it because it was too difficult, while a second professor countered that he had used RStudio and it was not that difficult. The third chimed in that he had used it in graduate school. Again, this is not a random sample but rather one that should be biased toward SAS. These are people who are interested in SAS enough to attend users group meetings, and yet they were discussing the benefits of switching to R. One had already done it, a second was at least considering it, though unconvinced, and the third saw no problem with it.
A major reason that people, especially in academia, consider switching to R – or a piece of slate and a sharp rock, for that matter – is that the SAS installation process BLOWS. If you have never had to install SAS, let me just tell you that it is bad beyond imagination and has been for thirty years. I remember in graduate school, using SAS 5, how every time we had to renew our license and I had to get things working again, the SPSS people in sociology would laugh at me. It has only gotten worse. A month ago, I was having lunch with the SAS administrator at a large university and she told me she hated SAS. She tells people to switch to JMP or SPSS every chance she gets. I asked about SAS On-demand and she said that almost every single person had a problem installing it. At one point, I was the SAS administrator for a large university and about 10% of the people had trouble installing SAS. These are not stupid, lazy people. They’re faculty and researchers at a prestigious institution.
I am using SAS On-demand for the statistics course I am teaching. Here is what I did:
- Tested everything myself and registered a month before class.
- Made a PowerPoint showing, step by step, how to get the software
- Made a MOVIE of how to get and install the software that students can watch to review the steps
- Demonstrated in class how to get a SAS profile, register for the course and download and install the software.
Obviously I did this because I believed learning SAS would benefit my students, but it took quite a bit of time I would not have had when I was an assistant professor trying to get tenure.
As it is, about half of my students have been able to use SAS On-demand. Why? Mostly because it doesn’t run on a Mac (more on that later). Those who had Windows were able to get it to run by the third week of class. One student, however, could not get it to run. I tried uninstalling and re-installing it. Still didn’t run. In the end, he received this message from SAS Technical Support, who were no doubt correct:
It sounds like you may have a registry key that is acting up. Lets try the following:
1. Reboot your system.
2. Log in as the Administrator.
3. Close all applications including anti-virus software (even if it is just running in the background).
4. Go to the system registry by clicking on Start>Run and type:
regedit.
5. Examine the following Windows registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager
If it contains FileRenameOperations or PendingFileRenameOperations, delete this key, and retry your SAS installation.
Warning: Always back up your registry before you make any registry changes. For assistance, see Windows Help, Microsoft documentation, or the Microsoft Windows Web site. SAS is not responsible when you edit the Windows registry: changes in the Windows registry can render your system unusable and will require that you reinstall the operating system.
After removing these keys, continue on with the installation.
I am not faulting SAS Technical Support. They are probably right: this was probably the problem and the fix probably would have worked. I once did something similar to get Enterprise Miner to work on a computer, and it did work. The problem is that when you send this to a student who is just trying to pass a statistics class, and Advanced Quantitative Data Analysis is not a fluff course to begin with, their response is going to be, and I believe this is a direct quote, “Fuck it!”
The student asked if he could use a different software package he had used as an undergraduate and I said sure, go ahead.
This type of problem does not occur often – this was one out of 10 or 11 students who tried to install SAS – but when it does, this student becomes like the SAS Administrator I mentioned above. They both hate SAS. This cannot be good.
After the problem of installation, the biggest problem SAS has is that it does not run natively on a Mac, and SAS On-Demand doesn’t run on virtual machines, either.
Of the 17 students in my class, 7 or 8 have Macs. When I required SAS On-Demand, I found that it did not run on a virtual machine, so I had to partition my hard drive, install Boot Camp, buy a copy of Windows 7 and install that. Since I am using this for a class, I was able to get Windows 7 for under $50, so it was not a big deal for me. Still, my “free” version of SAS has now cost me $50, which is as much as or more than many student licenses for statistical software. Also, there is the time part. I like playing with computers, so installing Boot Camp and partitioning the hard drive was pretty effortless (your mileage may vary), and downloading and installing SAS On-Demand took very little time with the very, very good connection we have in our office.
I have taught statistics at three private universities in California in the last several years (again, a non-random sample). At one, 25% of the students had a Mac. At the other two, it was closer to 50%. According to tech support, this was what they saw campuswide. Perhaps if you can afford $30K and up for tuition you buy more expensive computers. This was also something the folks at the SAS user group mentioned about R – you know it runs on Mac and Unix, too.
A few of the students did what I did: installed Boot Camp, installed SAS On-Demand, and it worked fine. The only problem now is that much of your other software, like PowerPoint and Word, is probably on the Mac side. You can do what I do and install OpenOffice, which I really like, but now you are taking more time to install Boot Camp and install OpenOffice – so the time advantage of using SAS over R is starting to disappear.
The final problem – the free cloud-based service, SAS On-Demand, is pathetically slow. I’m holding out hope for that one, though, because it has improved so much from a year ago, when it was just useless. Useless to usable and decent but slow is a pretty big leap.
Why I Recommend SAS Anyway – for now
There are advantages, too.
First of all, amazing technical support. Engelhardt just brushes this aside, but SAS tech support is AMAZING. See the answer above. If I were that student and really wanted to get SAS working, I’ll bet it would work. I called them the other day because a client needed the equations used to calculate power in PROC POWER because her dissertation committee required it (no, I very seldom have clients who are students because they can’t afford our fees, but this was a special case). I got transferred to the right person and got an answer in 10 minutes. Or you can read here about the amazing Tom from SAS Technical Support (not to be confused with the creepy Tom from MySpace). See the post “I see smart people” for more details on both the problems with SAS installation and the amazingness of technical support.
Compare this to SPSS where I have sat on hold for 45 minutes, as the norm. (This was before they were bought by IBM, it may be better now.)
Second, SAS has a huge user group base. Their user groups are amazing. I know R has meet-ups and meetings that are becoming more common around the country. From what I have seen, though, the SAS user groups are growing in size and activity as well. Orange County is starting a new user group, the one in San Diego meets quarterly, LA has annual meetings and we were discussing at WUSS possibly making this semi-annual. There is SAS-L and its archives, which are a fountain of information, and the growing SAScommunity.org. Did I mention their user groups are amazing? They have regional user group meetings, PharmaSUG and SAS Global Forum, which is amazing cubed. All of the regional user groups offer student and junior professional scholarships, including travel, to allow people starting their career to attend for free, learn and network.
Third, SAS does EVERYTHING. This might be why it takes the sacrifice of a flamingo to get it to install sometimes, but once installed it can be used for anything. More than once, when someone has had a problem computing a statistic, I’ve heard someone sniff, “Well you could do that in R”. Believe me, whether it is reporting with columns in alternating chartreuse and magenta, running a nightly analysis of your data that is uploaded to the web at 2 a.m. or analyzing a complex national survey, SAS does it.
Because SAS does everything, including being great for analyzing huge and complex data sets, really great statistical graphics, maps, every flavor of report and every type of statistic, there are jobs out there in those corporations now. That is the main reason I chose it for my students. Many of them are mid-career professionals getting a Ph.D. and there will be SAS jobs available when they graduate and for the 10 or 15 years remaining until they retire.
For younger students, and down the road, I think unless SAS Institute can get SAS On-demand working and fix its installation fiasco, there are going to be some serious problems. That makes me sad because I think SAS On-Demand could be insanely great and SAS Institute is completely missing it. This must be how Steve Jobs and Steve Wozniak felt when they saw the first GUI interface and mouse at Xerox.
Dudes! This could be insanely great! Don’t you see that?
Apparently, they don’t. If nothing else, they should license it to some start-up that will realize that potential. If you are interested in that, holler and I’ll holler back.
More on that later, this post is already thousands of words longer than I meant to write today, I have a paper to write, need to price a contract and the rocket scientist is asking why we live by the beach in Santa Monica if I won’t walk down and have a drink with him while overlooking the ocean. Having no answer for that, I’m heading out for Chardonnay.
Hi – I wrote that post that inspired you to write this much longer and much more balanced view. Thank you for this and also for your comment on our site.
Just a few points to clarify.
First, you say that my “conclusion was that R is preferable to SAS” but I start out with “it [choosing R] is not necessarily the right choice to make”.
The context of the post is ‘commercial organizations’ like telecommunications, not consulting or research companies like, say, your own Julia Group.
Second, I maintain that talent is hard to retain *in that context* of commercial organizations where analytics is not the main activity.
The very talented people you mention are both *researchers* working at SAS and University of Michigan. There are plenty of talented researchers out there.
But my point is that even if I could somehow entice Rick or Patricia to come and work for, say, one of the cellphone companies to do marketing campaigns, I could not retain them there. They would go crazy working on the same data and same problem types over and over. As I wrote, we have “new challenges: yes, some, but we are not a research university and it tends to be the same few problem types that we are always working on”.
Your other points are all fair. (And yes, my menu-clicking comment was meant for SPSS.)
Hope the Chardonnay was good. Thanks for the post.
There are two reasons why I doubt my current company will ever be able to switch.
There are hundreds of legacy mainframe programs that are completely automated, and it would take years just to rewrite them. I’m not even sure R would run on the mainframe.
For auditing purposes we have to keep clean logs of the final version of any report we run. The R console doesn’t replicate the accuracy or information SAS puts out into the log files. The ods excelxp options are also essential to what we put out. SAS Internet reports as well run on a non stop basis.
In short, it’s just not worth the $ and human capital for us to switch. We do use R a lot for its graphic capabilities after using SAS for the data cleaning etc.
You’re definitely correct on the barrier existing companies face. All of the above is based on my experience at a 10,000+ employee health insurance company.
The Chardonnay was great, and I *kind of* see your point about retaining talent. I have worked on very lucrative consulting contracts for some huge organizations that were as nice as could possibly be but after a while doing Repeated Measures ANOVA day after day (or whatever) bored me out of my mind. So, yes, for me, that is totally true.
There are people, though, who work for huge companies and seem to be perfectly happy doing that. Many of them express their creative urges by writing papers for user group meetings, writing books and similar activities with the half of their brain not used for their job. Not my cup of tea and I don’t really get it but it seems to work for some people.
Keeping talent is more about the management, the work atmosphere, and the feeling that there is a new challenge in the same ole’ data.
However, if a lot of folks and they have the right talent in high demand, then let the bidding wars begin. 😉
Hi,
I disagree that a rewrite with free software is only a good deal if your time is free.
For the organisations I work/consult for, the internal cost of building a new reporting system was less than the *annual* SAS licence fees.
In that context it is worth a re-write even without any new features.
And R is a proper programming language without the need for text macro expansions to pass data between steps. And the data frame is extensible with your own metadata. SAS datasets are not, and that is a real barrier to building proper systems in a clean way.
(Not to mention error handling…)
Actually given the language differences a re-write in R would pay back pretty fast just to get a maintainable system.
Then who cares about retaining talent 😉
In practice I would maybe rewrite in JMP with JSL because the GUIs for R are so partial and fragmented. And a canned system leaving people with a very capable client is better for open ended problems. And actually many practical problems are really open ended.
Dave
Hi, Dave –
Certainly there may be times when it is cheaper to use open source solutions. If you are doing a fairly simple reporting system then I can see that would be less than the cost of an annual SAS license. However, most of the places I worked did not have one unit with one reporting system. They had 1,000 units with 1,000 different systems.
You are right, of course, that free software isn’t only WORTH IT if your time is free. My point is really that the cost of software is
Purchase/License Cost (A) +
Cost to program solution (B) +
Cost to maintain solution (C) +
Cost to document solution (D)
and that reducing A to zero does not often make the cost zero, nor does it always make open source the preferred choice. As you note, sometimes it does. Other times, though, A is dwarfed by B + C + D.
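To make the arithmetic concrete, here is a minimal sketch of that A + B + C + D comparison. The class name, field names and every dollar figure are hypothetical, made up purely for illustration; they are not taken from any real organization or price list.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCost:
    """Total cost of a solution: A + B + C + D (all figures hypothetical)."""
    license: float        # A: purchase/license cost
    programming: float    # B: cost to program the solution
    maintenance: float    # C: cost to maintain the solution
    documentation: float  # D: cost to document the solution

    def total(self) -> float:
        return (self.license + self.programming
                + self.maintenance + self.documentation)


# Assumption: the legacy system already exists, so B and D are already paid for.
keep_legacy = SoftwareCost(license=100_000, programming=0,
                           maintenance=40_000, documentation=0)

# Assumption: the rewrite has no license fee, but everything must be rebuilt.
rewrite_free = SoftwareCost(license=0, programming=150_000,
                            maintenance=40_000, documentation=30_000)

print("Keep legacy system:", keep_legacy.total())
print("Rewrite in free software:", rewrite_free.total())
```

With these invented numbers the rewrite costs more even though A is zero. Shrink B and D, as in Dave's simple reporting system, and the comparison flips, which is the point: the answer depends on all four terms, not just A.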
I sometimes wonder if the decline of organized religion in the West is the reason that people get so passionate about things like software. First World problems!
Thanks for writing this- Hope the shitheads who run SAS On Demand get off their welfare checks and use some of their talent to get that piece flying.
And yes, here is my own bozo-like view on open source as an “and” option, not an “or” option, in analytics software:
you can use both !!
http://www.allanalytics.com/author.asp?section_id=1408&doc_id=233454
Funny you would mention that. After writing this I walked down to the beach with my husband and he was talking about how he used some open source software but to some people it was like a religion and if you are one of the unbelievers you are just a stupid corporate shill.
I read your blog yesterday. It could not be further from bozo like. I think you made a lot of valid points, chief among them that if something is making you money by, e.g. , increasing lift, you’re not going to be too inclined to trade it in.
Hello there,
I can not add much to your discussion as I am an R user with very little experience in SAS.
However, I do see your point in terms of the value in preserving the legacy code, and understand the deep role SAS will keep playing in our future.
The reason I am writing is to inform you about:
http://www.sas-x.com
In case you might join it yourself, or help promoting it by posting/linking to it from your blog.
With regards,
Tal
p.s: consider the “subscribe to comments” WordPress plugin.
When it comes to general nonlinear mixed models (not the usual suspects) there are a lot of things you can do with AD Model Builder’s random effects module that you cannot do with SAS NLMIXED. It is free software and runs on Windows, Linux, Macs (and on my Kindle, just for fun).
Dave
Hi, Tal –
You have certainly spotted one of my weak points which is updating the website. It gets done annually whether it needs it or not. Adding new plugins is on the list.
It is funny you mention sas-x.com. I ran across the site a few days ago and thought “This is awesome, I can’t believe I haven’t been checking this regularly.” So, you are right, it is also on my list to update my blogroll with sas-x and other really cool sites like John D Cook’s blog and more.
This is my December project.
On your Kindle! Oh, that is something I have to check out (just for fun). Another on the December project list!
I don’t understand the SAS-On Demand thing and the whole SAS academic license thing. Just give it free to Universities, no shitty On Demand thing. This shows the greedy part of SAS, seeing the tree while missing the whole forest, which is the blowing BI & Analytics market.
Analytical software thrives because of its users, as well as its built-in functionality and power. With the rollout of R 2.14, multi-core capability is standard with the base package, while IML is still single threaded. If SAS had added parallel capability to IML a couple of years ago, SAS might have held its market longer.
What a well written article. I rarely comment on blogs but I had to respond to this one. Your writing style and the wealth of information presented in your articles is great.
If they can get the speed problem worked out, SAS On-Demand could become the preferred solution for universities, so I don’t see the fact that they are not giving away the regular SAS as the main problem.
December 2, 2011 at 5:15 pm
In my experience the R cognoscenti do not like to involve themselves
with mundane matters like “quality control”. Recently, Zhang et al.
2011 published some simulation results indicating serious problems with
the lme4 package. I verified some of the results and posted to the
R list. There was absolutely no response whatsoever.
For comparison I used AD Model Builder which is free software. It got results close to those reported by Zhang et al. for SAS NLMIXED.
I certainly would not use R for any serious mixed model analysis.
The link is:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q4/006953.html
I think SAS should consider SPSS’s take on R: make a plug-in that lets you run R code in an SPSS syntax file (though IBM does make you jump through 30 links to get to the download).
I can see how corporations might use SAS if they do not want to rewrite their analyses in R. But in an academic or research setting SAS is very stupid. As you said, it’s generally a nightmare to install. Check the table on the SAS website showing which version of Windows you need for each version of SAS, and tell me if it means anything to anyone except the SAS developers. It has a time bomb built in for most academic licenses, so it just stops working after 6 or so months anyway. Professors might teach SAS because that’s what the previous generations used, but try teaching SAS to people who are still learning basic statistics. The students end up hating SAS, equating SAS with statistics, and not understanding any of the math behind the ancient SAS procedures.
Now you can use the SAS web editor. It’s free, runs on Windows and Mac (haven’t tried it on Unix yet) and doesn’t require any installation.