Getting only numbers, in SAS and Javascript
The advantage of learning a new language is it sometimes makes you re-think the old languages you know. For example, here is a problem that happens often:
Some people are morons.
For example, say I were to ask you the following question:
“How old are you?”
YOU would probably answer something like, 42 or 21. You didn’t mistake that for an essay question, now, did you? That, my dear reader, is because YOU are not a moron. However, trust me when I tell you that other people are not as smart as you.
A rather annoying percentage of people enter responses along the lines of:
I am 47 years old.
I just turned 21. Happy Birthday to me.
36 years
87 (yes, eighty-seven)
54 yrs.
and so on ….
Just using the sub-string function to read in the first two characters won’t work, obviously.
Well, I was doing something in javascript where I asked the person their age and then stripped off everything but numbers before I tried to use the age they had given me, like so:
var age = prompt(“How old are you?”) ;
var ageyears = age.replace(/[\D]/g, ”);
Usually, in my SAS programs, I would either just define age as a numeric variable and all of those who included text had their values set to missing. Or, if I wanted to minimize missing data, I would write a statement to just read in the first two characters, or maybe to strip out “years” and “yrs”. However, in the latest data set I have, it seems to be a sample of people who are creatively annoying, so I had to settle for a lot of missing data or do something else. I got to thinking that there MUST be some function in SAS that does something similar.
Well, wouldn’t you know ….
Age_numonly = compress(age,'0123456789','K');
Having the ‘K’ at the end reverses what the COMPRESS function normally does and instead of deleting your numbers it keeps them. I don’t know how I did not know this. Maybe I knew it at one point and forgot it? Be sure you have the ‘K’ in quotes, by the way.
Well, now I have it stored in my blog, which is better than having it in memory, because unlike my memory, this blog gets backed up regularly.
I find compress one of the most useful functions in SAS, probably the one I’ve ever used most. I’m sure you would make good friends with perl regular expressions in SAS when you run into something more complex than age. http://www2.sas.com/proceedings/sugi29/265-29.pdf. I use them a lot to find things like [Width]x[Height] with Width and Height in digits.
FYI – There are a many modifiers you can specify (not just only ‘K’) as the 3rd argument. Check it out in the help… http://support.sas.com/documentation/cdl/en/syntaxidx/64656/HTML/default/index.htm#/documentation/cdl//en/lefunctionsref/63354/HTML/default/n0fcshr0ir3h73n1b845c4aq58hz.htm
and as a side-note this new Syntax Index page at support.sas.com is absolutely awesome to look up SAS syntax.
COMPRESS is really useful, but I think PRXMATCH is better for your case. PRXMATCH (with the correct expression) will find the occurrences of consecutive numbers that might be an age (and you can then decide which to select as a match). COMPRESS (I think) will create funky results from a response such as “40, but I don’t look a day over 39”. Unless you want your respondent to be clustered with Methuselah.
Hi, Chris –
This is actually part of a longer solution that handles response like that and also takes the average from those who say 40 – 50
you could have just written
Age_numonly = compress(age,,’DK’);