Super-computing doesn’t take super powers: Part I
“So often times it happens that we live our lives in chains, and we never even know we have the key.”
I remember listening to that song by The Eagles at the Mississippi River Fest back when I was a teenager.
Yes, the Mississippi River existed back then.
It’s a good point. So often we are held back in life thinking,
“I could never do that.”
Just think how much faster that job that takes 40 hours on a desktop could run if it used 1,000 CPUs. Many people who are proficient at programming on Windows, Mac and even Linux desktops are daunted by the idea of tackling a supercomputer.
I have seen an unfortunate trend among some people in tech to act as if using any new technology requires god-like powers, or, at a minimum, wizardry. Well, just like the man behind the curtain in the Wizard of Oz, they’re selling you a bill of goods.
The Truth, in three parts
Here’s the truth: if you know a fair bit about computers (say you can write a SAS or Stata program to merge datasets, compute a variable and conduct an ANOVA, and you have a good understanding of one or more operating systems), you could probably learn everything you need to know in a couple of weeks, max, if you put your mind to it.
The second part of the truth – during that week or two, you will swear, be tempted to break things and wonder if you really do need to buy a magic wand and sacrifice a chicken to the computer gods after all.
HOWEVER, after you get through that week or two you will never believe that you used to spend hours waiting for a job to run, that you would put a book on the space bar to keep the computer from going to sleep and stopping your job, that you would submit a job before you left for the day, or for the weekend.
You will find yourself agreeing with the young man who reportedly said, after having sex for the first time,
“Why didn’t anyone tell me about this before? To hell with baseball!”
[DISCLAIMER: This is not to imply that high-performance computing is better than sex. For one thing, sex takes longer. If it doesn’t for you, re-check your code. If it still doesn’t, well, it’s probably a personal issue and I think they have doctors for that. ]
First of all, what exactly is a super-computer? There is no one set definition, in part because capacity keeps changing: the speed that made a supercomputer thirty years ago makes a lousy desktop today. The term many centers now prefer is high performance computing. Even “computer”, singular, is a misnomer. The “super-computer” at the University of Southern California is actually a cluster of computers.
Basically, you need to know:
1. How to get an account on a high performance computer. All that takes is finding the right person at your organization to give you that access. It shouldn’t take much more effort than signing up for an email account.
2. How to get data on the server. Any file transfer program that supports SFTP should do. I use Fetch or Cyberduck when I am using a Mac and Filezilla on Windows. When I am using a Linux computer, I just open a Terminal window and use the sftp command. If you are already whining, “I don’t know how to do any of that”, cut it out!!! If you can sign up for Yahoo and send email with it, you can use any of the programs I just mentioned. It takes about that level of technical know-how. You’ll need to know the hostname, which is the name of the computer you log into. It should be something like hpcc.myschool.edu. You’ll also need to know your username, which is something like rrousey. Of the connection options, you want to select SFTP. It is more secure than regular FTP, which most organizations won’t even allow, and if they do, they shouldn’t. If you need to fill in a port number, use 22, the standard port for SFTP. That’s not guaranteed to work, but it’s a very good bet.
As for using the Terminal window: you can find it under the APPLICATIONS menu under ACCESSORIES in Ubuntu. If you have a Mac, it is in your APPLICATIONS folder in the UTILITIES folder. Open the Terminal window and type
sftp username@hostcomputername
<hit enter>
You’ll be asked for your password. Type that and hit enter.
Type
put /directory/filename [like, put /Users/bob/Documents/somefile.txt ]
<hit enter>
Now type
exit
So you can quit.
[Don’t you hate it when no one tells you how to quit and you are typing “bye”, “logout”, “quit”, “hastalavista” and every other damned thing?]
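Once you have done that by hand a few times, the same upload can be scripted. What follows is only a sketch, and it assumes a few things the steps above don’t: that your sftp is the OpenSSH one (its -b option reads commands from a file), and that you have key-based login set up, since a batch run can’t stop to ask for a password. The username, hostname and file path are the made-up examples from above, and upload.batch is just a placeholder name.

```shell
#!/bin/sh
# Sketch: put the sftp commands in a batch file, then hand it to sftp.
# REMOTE and LOCAL_FILE are placeholders; substitute your own values.
REMOTE="rrousey@hpcc.myschool.edu"
LOCAL_FILE="/Users/bob/Documents/somefile.txt"
BATCH_FILE="upload.batch"

# One sftp command per line, the same ones you would type at the prompt.
# (A path containing spaces would need quoting inside the batch file.)
printf 'put %s\nexit\n' "$LOCAL_FILE" > "$BATCH_FILE"

# Print the command instead of running it, so nothing is uploaded
# until you have looked over the batch file. Remove "echo" to run it.
echo sftp -b "$BATCH_FILE" "$REMOTE"
```

If your site uses a port other than 22, OpenSSH’s sftp takes it as -P followed by the port number.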
According to Forbes Magazine, which seems to be run by morons who hate the president of the United States, but that’s another issue, 301 of the top 500 supercomputers in the world run Linux and another 189 run Unix. Having used both systems and very rarely noticed any differences, for the rest of this post, and maybe the next post, until I get bored and go on to do something more productive, I’m going to assume that you are using Linux.
3. Some stuff about Linux.
Don’t freak out. Linux isn’t that hard and, from the pictures, Linus Torvalds, who invented it, looks kind of cute. (The nice thing about being my age is you can say men are cute because even if they think you are hitting on them, no one cares.)
As I said, Linux isn’t that hard, it just takes some getting used to. A lot of times, instead of windows, icons and folders to click on, you just see a blank screen waiting for you to type something on it.
Let’s talk about directories for a while. These are pretty much the same as folders on a Mac or Windows machine, subdivisions of your whole computer which can then be subdivided further.
Your home directory is your personal space. Think of it like the hard drive on your desktop computer. It has your stuff. Normally you don’t need to do anything special to write to it, open files or delete files and no one else has access to it. The two problems with your home directory are that it usually isn’t very big and it isn’t a good choice for sharing files with others.
A project directory is not “yours” even though it may have a subdirectory with your name on it. I’m going to belabor this point because I have seen it drive people crazy. How can you not have access to your files that are in your subdirectory of the project directory? Simple, really. If I created a folder on my laptop with your name on it, you wouldn’t automatically be able to see it, would you? Think of a project directory like a network drive on a Windows server. It has a lot more space, which is nice, but someone has to give you access to it. Often project directories have multiple users who can read, write and execute files. Be aware of this before uploading sensitive or potentially embarrassing information.
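A quick way to see where you stand with a directory is to ask the system directly. This is a small sketch using standard shell tests; check_access is a name made up for this example, and you would pass it the path of your own project directory (it falls back to your home directory if you give it nothing).

```shell
#!/bin/sh
# Sketch: report what the current user may do in a directory.
# check_access is a hypothetical helper name, not a standard command.
check_access() {
    dir="$1"
    # Show the owner, group and permission bits for the directory itself.
    ls -ld "$dir"
    # -r: can list the contents; -w: can create or delete files;
    # -x: can move into the directory.
    if [ -r "$dir" ]; then echo "read: yes"; else echo "read: no"; fi
    if [ -w "$dir" ]; then echo "write: yes"; else echo "write: no"; fi
    if [ -x "$dir" ]; then echo "enter: yes"; else echo "enter: no"; fi
}

# With no argument, check the home directory.
check_access "${1:-$HOME}"
```

If “write: no” comes back for a project directory with your name on it, that is your cue to ask whoever manages the project to give you access.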
Now that you know two things, about sftp and directories, you can put those together:
The example below uses Filezilla. For address, enter the hostname of the computer you are connecting to. You’ll need to get this from your organization. It will be something like hpcc.myu.edu. User is your userid. For port, enter 22, the standard port for SFTP. Click Quickconnect.
In the left pane, you’ll see your local computer. Under local site, select or type in the folder that holds the data you want to upload. In my case, these are on my travel drive, the e drive. Your datasets in that folder will show up below.
IMPORTANT: on the remote site, select where your data should go. The default will be your home directory, which may or may not be where you want your data uploaded. Second important point: each time you use Filezilla, this default resets to your home directory. So if you chose another directory when you used this yesterday and you go back today, it will once again have your home directory as the remote site.
Once you have the Local (from) and Remote (to) sites set up, then just double click on the file that you want to move.
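If you are using the sftp command instead of Filezilla, the Local and Remote choices become lcd and cd. Here is a minimal sketch, assuming the OpenSSH sftp client; both directory names and the file name are placeholders, not real paths on your system.

```shell
#!/bin/sh
# Sketch: the sftp equivalent of picking the Local and Remote sites.
# LOCAL_DIR, REMOTE_DIR and somefile.txt are made-up placeholders.
LOCAL_DIR="/Users/bob/Documents"
REMOTE_DIR="project/mydata"

# lcd sets the local (from) directory, cd sets the remote (to) directory,
# and pwd prints the remote directory so you can confirm it before the put.
cat > to_project.batch <<EOF
lcd $LOCAL_DIR
cd $REMOTE_DIR
pwd
put somefile.txt
exit
EOF

# Show the batch file for review; this script uploads nothing by itself.
cat to_project.batch
```

You would run it with sftp -b to_project.batch username@hostcomputername, which needs key-based login because a batch run can’t prompt for a password. Because the batch file names the remote directory explicitly, it doesn’t matter that a new session usually starts in your home directory.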
So, now you have mastered the first part: getting your data on the computer. You can also upload programs like this, of course. For example, if you have SAS or Stata programs you want to run on a high performance computing cluster, you can just make the few edits to get those to run under Linux and then upload your program. Or, you can make those edits on the host computer, whatever makes you happy.
You should also learn how to make a directory, move between directories, edit a file and some other Linux commands. More on that next time.
Right now, I need to go down and administer death threats to the world’s most spoiled twelve-year-old about what will happen if she does not turn off Buffy the Vampire Slayer and go to bed this instant. Hopefully, she hasn’t watched so many of those shows that she leaps off the couch and puts a stake through my heart.
If I don’t post more on this tomorrow, you’ll know why.