Why SAS Enterprise Miner on demand is like Jennifer & other thoughts on learning data mining
A miracle has occurred and I have had time to spend evaluating two things that have been on my to-do list forever, JMP and SAS Enterprise Miner. Both of these products are produced by SAS and the first interesting point is that knowing SAS won’t really help at all. That isn’t to say that having some knowledge of programming logic won’t help. In fact, I am taking a data mining course just for fun. It is very interesting because while I have taken plenty of workshops and short courses I haven’t been a student in a regular class for over a decade.
Until she went to college, every parent-teacher conference ever held about my daughter, Jennifer, went like this:
“Jennifer has great potential. She is obviously brilliant and if she just exerted some effort, she could do anything she wanted. Jennifer makes A’s on all of the work that she turns in.”
In fact, Jenn dropped out of high school, took her GED, went to community college, finished her B.A. at 21, taught school for a while and had her master’s degree from USC by 24. So, that is my general view on SAS Enterprise Miner on-demand. I think it has great potential and is worth keeping around. When it grows up, it will do impressive stuff and be a really good teacher.
In learning data mining, whether using JMP or Enterprise Miner, background knowledge makes a difference. Because I have had decades of experience with both programming and statistics, when I see something in JMP like FORMULA > Conditional it makes perfect sense to me as an IF statement. Some people reading this are probably thinking, “Of course”. If you are one of those people you may be proficient with SAS, or SPSS syntax, or any number of programming languages. In Enterprise Miner, when I right-click on the Partition Node and see options like Cluster and Stratification, again, I think “of course”. This is why my fellow students hate me.
It’s not just me. There were a few posts on this cool blog, BZST, about SAS Enterprise Miner’s On-Demand version:
SAS Enterprise Miner
http://blog.bzst.com/2009/10/sas-on-demand-enterprise-miner-update.html
http://blog.bzst.com/2010/05/sas-on-demand-take-3-success.html
and I agreed with pretty much all of her points. Enterprise Miner is cool and the current on-demand version is a great improvement. It is much easier to install than the desktop version. As for the client-server version, it involves over 340,000 steps to install, one of which (and I may be imagining this) requires a band of marching flamingos.
So… points in favor of Enterprise Miner on Demand
1. Way easier to install than previous versions
2. Free for students and faculty for teaching purposes
3. Students like it better than sitting in the lab. They can download it and use it on their own computers.
4. Just the general cool options: you can use the Partition node to create test, training and validation data sets. When you first read in a data source you can set Bayes prior probabilities, and you can include the costs of decisions. It is really cool. I was going to include screenshots of some of the really cool output from the cluster analysis I did earlier today but SAS EM kept giving me an error about
“The load balancing object spawner timed out. Please check your Enterprise Miner license.”
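For anyone for whom "stratification" in a partition is not an "of course", here is a rough idea of what that kind of split does, written in plain Python rather than anything from SAS. This is only a sketch and not how Enterprise Miner implements its Partition node. The function name `stratified_partition`, the made-up `churn` column and the 40/30/30 split are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_partition(rows, target, fractions=(0.4, 0.3, 0.3), seed=42):
    """Split rows into training, validation and test lists.

    Rows are grouped by the value of `target` first, so each of the three
    lists keeps roughly the same class mix as the full data set.
    `fractions` is (train, validation, test); the 40/30/30 default is an
    illustrative assumption.
    """
    rng = random.Random(seed)

    # Group the rows by class so each class is split separately.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    train, validation, test = [], [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's data is untouched
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train.extend(members[:n_train])
        validation.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])   # whatever is left over
    return train, validation, test


# Hypothetical data: 100 customers, 20 of whom churned.
rows = [{"id": i, "churn": 1 if i % 5 == 0 else 0} for i in range(100)]
train, validation, test = stratified_partition(rows, "churn")
```

With this made-up data, each of the three sets ends up with the same 20% share of churners as the full file, which is the whole point of stratifying instead of just shuffling and cutting.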
Disadvantages
1. It is not easy. Very little of it is self-evident and even less so if you have never used SAS or JMP. As Dr. Shmuéli said in her posts, most MBA students probably aren’t going to be thrilled by the need to download, install and learn another piece of software. On the other hand, those really interested in statistics, software or data mining will probably be pumped about that part.
2. As noted in the BZST blog also, if you don’t have some knowledge of statistics and a general idea of program logic you are going to have a hard time using Enterprise Miner. Some people, and I can’t say I wholly disagree, will say this is not a disadvantage. You should know what the heck you are using.
3. It can be excruciatingly slow. Sometimes it pops up in a minute. Other times it may take 15 minutes to get from launching EM to seeing the results of one analysis, once you add up the delays in opening EM, adding a new data source, creating a new diagram, dragging the data source to the diagram, creating a sample and running the analysis. When using it at my desk I usually read a book while waiting for each step to execute. For teaching in a lab, it is just about useless from what I have observed. [And kudos to those brave souls who tried.]
4. It is unreliable. Even while writing this blog on the cool stuff it does, I could not get it to come up to do the cool stuff.
So…. EM is like Jennifer, because:
1. It will no doubt be awesome when it is all grown-up
2. It is worth waiting around for, and
3. The growing pains in the meantime can be REALLY irritating (oh, you have no idea).