How to start using a Compute Canada cluster with R and SpaDES

supercomputer
HPC
Grex
Simulation
R
Author

Eliot McIntire

Published

July 4, 2016

Is your R script taking too long? Do you find yourself rerunning things over and over? Are you using loops or lapply and suspect they are slowing you down? In many cases, working with parallel computing in R can help solve these problems. Working on a supercomputer with hundreds or thousands of nodes can potentially help a lot.
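As a small illustration, many lapply calls can be swapped for parallel::parLapply with little change to the code. Here is a minimal sketch, assuming a machine with at least 4 cores; slowFn is a made-up stand-in for your own slow function:

##########
# a minimal sketch of replacing lapply with a parallel equivalent
library(parallel)

slowFn <- function(x) {
  Sys.sleep(1) # pretend this is an expensive computation
  x^2
}

cl <- makeCluster(4) # assumes at least 4 cores are available
out <- parLapply(cl, 1:8, slowFn) # ~2 s, versus ~8 s for lapply(1:8, slowFn)
stopCluster(cl)
##########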

These instructions are for the Compute Canada supercomputing network, and specifically the Grex machine on the WestGrid network, but they should work with some minor modifications on any supercomputing cluster.

Create an account on your supercomputing network

For Compute Canada, that is here.

Set up your computer

Connect to the supercomputer by command line.

Windows – requires PuTTY

PuTTY is the software used to make the command-line connection to WestGrid.

  1. Download PuTTY from https://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
  2. Put putty.exe somewhere easy to find (like the Desktop or taskbar); it is how you connect to WestGrid.
  3. Open PuTTY (double click)
  4. Create a new Session:
     1. Type grex.westgrid.ca in the Host Name box and in the Saved Sessions box
     2. Select Connection - Data on the left side, and fill in your user name in Auto-login username near the top
     3. Click Save

Connect using PuTTY / SSH

Windows

  1. Go back to Session on the left side, and click Open at the bottom
  2. You may see a prompt about the server's SSH host key; type yes to accept it
  3. Type your password and press Enter
  4. You should now be connected
  5. When you want to disconnect, type exit

Linux

It is much easier on Linux because SSH is built into most distributions.

##########
ssh -l LOGINNAME grex.westgrid.ca # change LOGINNAME to your login name

exit # to disconnect
##########

Connect to the supercomputer for file transfer

Windows

We will use WinSCP (Windows Secure CoPy) to transfer files between your machine and WestGrid.

  1. Download WinSCP
  2. Unzip it somewhere easy to find
  3. Double click on WinSCP.exe
  4. Follow approximately the same steps as for PuTTY above; it is easier because the username field is on the same page
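Linux

As with SSH, nothing extra is needed on Linux, because scp (secure copy) is built in. A minimal sketch, with LOGINNAME and the file names as placeholders:

##########
# copy a local file to your home directory on Grex
scp myfile.txt LOGINNAME@grex.westgrid.ca:~/

# copy a file from Grex back to the current local directory
scp LOGINNAME@grex.westgrid.ca:~/output.rdata .
##########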

Once connected

Once logged in to Grex on WestGrid, you need to load R and GDAL. The specific way to do this will vary by machine and cluster; contact your cluster administrators, or consult the list of software available on each machine.


##########

module load r/3.2.2
module load gdal/2.1.0

##########

Now, you need to work with your own files. You can either manually copy files (drag and drop) in WinSCP, or use another tool, like GitHub.

On Linux machines, ~ is your home directory and is shorthand for /home/USERNAME/, so you can type cd ~ to get back to your home directory, in case you ever get lost in sub-sub-sub directories.
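For example, to check where you are and get back home:

##########
pwd  # print the directory you are currently in
cd ~ # return to /home/USERNAME from anywhere
##########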

Use a GitHub repository

Here, the user must change the lines below to point at their own GitHub repository of interest. The one below is private, so it will not work unless you are part of that repository's user group. [edited addition] If you need to connect to a private repo, follow these instructions:

https://help.github.com/articles/generating-an-ssh-key/
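In brief, those instructions amount to generating a key pair on Grex and adding the public key to your GitHub account. A sketch (the email address is a placeholder):

##########
# on Grex, generate a key pair (accepting the default file location)
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

# print the public key, then copy it into GitHub:
#   Settings -> SSH and GPG keys -> New SSH key
cat ~/.ssh/id_rsa.pub
##########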

Once you have completed those steps, then you can clone a private repo:

##########

# Perhaps clone the McIntire-lab repository
mkdir -p Documents/GitHub/
cd ~/Documents/GitHub/
git clone git@github.com:eliotmcintire/McIntire-lab.git

# Keep it up to date:
cd ~/Documents/GitHub/McIntire-lab
git pull

##########

Start R

From the prompt, start R

##########
R
##########

Prepare your R session for what you will need, i.e., install some packages. In this case, we are installing a simulation package, SpaDES, which has many dependencies and so can take a while.

##########
# From within R, install necessary packages
install.packages("devtools") # choose HTTP #18, then #1. This is because it is an old version of R, specifically 3.2.2  If it is a newer version of R, then you can choose HTTPS
install.packages("Rmpi")
library(devtools)
install_github("PredictiveEcology/SpaDES@development")
##########

You can work with this interactive session for small tests, but the connection we have so far is NOT intended for high-performance work. See the next step for that.

Submitting jobs

The R session that we have entered is the “interactive” part of Grex. You can do small things here, but not big analyses. For those, you need to submit “jobs” from the command prompt, NOT from inside R.

You need a submission file and an R file with your R code. See the two files ending with .pbs in the McIntire-lab GitHub repo for example text that could be put in a submission file; here we use test.pbs.

##########

cd Documents/GitHub/McIntire-lab/ComputeCanada/
qsub test.pbs

##########

At this point, nothing will appear to happen, but you will have submitted a job to the queue. Go to the next steps right away to monitor it. If your job is very quick and the queue accepts it quickly, there may be no jobs left to monitor, and you will only have the output or error files that your job produced (see below).

Monitoring jobs

There are several commands that you can use at the command line to monitor your jobs; qdel will remove them.

##########

# to monitor jobs
qstat -u USERNAME # change this for your user name
qstat -f 9963747 # change this for your job number, which can be found from previous line
checkjob 9963747 # change this for your job number, which can be found from previous lines

# delete
qdel 9963747 # delete that job

##########

Finding your output files [edited addition]

If your R script saved any files, those should be in the directory your R script put them in. In addition, there should be two files in the folder from which you submitted your job (the qsub statement). They will be named after the submission file, but with new endings: .eJOBNUMBER and .oJOBNUMBER. The .e file is your “error” file, which may be useful if there are errors; the .o file holds any output that would have been written to your R console.
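For example, assuming the submission file was test.pbs, the job number was 9963747 (the same made-up number as in the monitoring examples above), and the job name defaulted to the script name (if you set #PBS -N, that job name is used instead):

##########
ls -l test.pbs.*      # list the .e and .o files from your jobs
cat test.pbs.o9963747 # console output from the job
cat test.pbs.e9963747 # error messages, if any
##########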

Other commands in Linux that may be useful for new-to-Linux users

There are many others that you can find on the WestGrid web page, or widely throughout the internet, but these are certain to come in handy.

##########
# linux commands that may be useful
ls # list the contents of a directory
ls -l # list with details (permissions, sizes, modification dates)
mv filename newFilename # move (or rename) a file
cd ~ # change to home
cd Documents/GitHub #  change to another directory
rm dist7* # remove all files in the current directory starting with dist7
  
nano filename # a simple text editor
# CTRL-X exits the editor; use the keyboard (not the mouse) for minor edits
##########

Example qsub file

There are a few lines that you will generally change. Going through it from top to bottom:

  1. Give your submission a name, with #PBS -N SomeNamehere
  2. Estimate the time it will take your job to complete, and enter it as walltime=HH:MM:SS
  3. Decide how many processors (here, 100) and how much memory per processor (here, 500mb). You can use gb or kb as suffixes, but not decimals: instead of 1.5gb, use 1500mb
  4. Perhaps point to an epilogue script (see the version below), so every job will print its summary information
  5. Load all the modules you need
  6. Here, we will use MPI, a protocol for using multiple processors that WestGrid supports out of the box, so we don’t have to do anything special
  • Note: using MPI with R, we take this approach: ask for 1 MPI process with many processors, which translates to mpiexec -n 1 and #PBS -l procs=100
##########

#!/bin/bash
#PBS -S /bin/bash
#PBS -N Rmpi-Test2
#PBS -l walltime=00:25:00
#PBS -l procs=100,pmem=500mb
#PBS -r n
#PBS -l epilogue=/home/USERNAME/epilogue.script

module load r/3.2.2
module load gdal/2.1.0
module load geos/3.5.0

# Script for running the R script test.R under MPI.

cd $PBS_O_WORKDIR # change the working directory of the mpiexec process to the directory where the qsub statement was made
echo "Current working directory is `pwd`"
echo "Running on hostname `hostname`"

echo "Starting run at: `date`"
mpiexec -n 1 Rscript --vanilla ./test.R
echo "Program test finished with exit code $? at: `date`"

##########

Running R with MPI

You will need to make a script that can be called from the submission file. In the submission file above, I called it test.R, and it is called on the mpiexec line. An example R script file is next.

Here is an example using SpaDES. The key step is to pass type = "MPI" as an argument, either to raster::beginCluster(type = "MPI") or to parallel::makeCluster(100, type = "MPI").

Key points below; these are specific to WestGrid:

  1. Use the scratch directory for processes that require lots of reading and writing to disk; the home directory can be used for infrequent reads and writes
  2. There must be a makeCluster or beginCluster function call, and the number of processes should match the number requested in the submission file submitted via qsub (i.e., in this case, 100)
  3. Always run stopCluster(ClusterObjName) or endCluster() to clean up
##########

library(SpaDES)
library(parallel)
library(raster)

moduleDir <- "~/Documents/GitHub/McIntire-lab/Wolf"
scratchDir <- "/global/scratch/USERNAME"

### Next section are all things for SpaDES -- can be skipped for more general R use
times <- list(start = 0, end = 14, timeunit = "year")

modules <- list("wolfAlps")

paths <- list(
  modulePath = moduleDir,
  cachePath = file.path(scratchDir, "outputR", "cache"),
  inputPath = file.path(moduleDir, "wolfAlps", "data"),
  outputPath =  file.path(scratchDir, "outputR")
)

inputs <- data.frame(file = c("wolves2008.asc", "packs2008.asc",
                              "CMR.asc", "HabitatSuitability.asc"))

mySim <- simInit(times = times, #params = list(wolfAlps=parameters),
                 modules = modules,
                 inputs = inputs, paths = paths)
### End of SpaDES specific stuff

N <- 100

# Make the cluster
cl <- makeCluster(N, type = "MPI")

# You may need things loaded in the R slave processes, such as libraries. Each of the
#   slave R processes is a clean R session with few or no packages loaded.
clusterEvalQ(cl = cl, {
    library(SpaDES)
}) 

# run a function that knows how to use the cluster; experiment() distributes the replicates across it
outSimList <- experiment(copy(mySim), replicates = N, .plotInitialTime = NA, cl = cl) # don't use plotting

# Stop the cluster. You can use the same cluster many times within this script; only close it
#   once it is no longer needed.
stopCluster(cl)

# save it for accessing later
save(outSimList, file = "outputs/outSimList.rdata")

##########
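In a later R session (on Grex, or on your own machine after transferring the file), load() restores the saved object under the name it was saved with:

##########
load("outputs/outSimList.rdata") # recreates the object named outSimList
str(outSimList, max.level = 1)   # a quick look at what was restored
##########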

Possible mechanism to get files off WestGrid via FTP

Because file transfer across the internet is slow, it may be worthwhile to set up an automated copy mechanism at the end of a run. That way, if the program runs overnight and finishes at 2 a.m., the copying starts right away.

##########

library(RCurl)
# You will have to set the FTP user name, password, and server manually (they are not what you see here)
filename <- "output.rdata"
system.time(ftpUpload(filename, paste0("ftp://ftpUsername:ftpPassword@ftpServer/", filename)))

##########

Epilogue file

If you would like to see some extra information from your job, you can write the following to a file called epilogue.script in your home directory and make it executable (e.g., chmod u+x epilogue.script). It will then be called via the #PBS line that refers to epilogue.script (above).


##########

#!/bin/sh
echo "Epilogue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo "Job Name: $4"
echo "Session ID: $5"
echo "Resource List: $6"
echo "Resources Used: $7"
echo "Queue Name: $8"
echo "Account String: $9"
echo ""
exit 0

##########