How to submit R code to a Portable Batch System (PBS) managed High Performance Computing (HPC) Facility (e.g. QUT's HPC Facility 'Lyra').
You will need:
- An account with the relevant HPC facility (QUT staff and HDR students can request access to the QUT facility here),
- the R Script (.R file) you want to run,
- a Job Script (.sub file) that tells Lyra how to run your R script and,
- a file containing any data your R Script needs to run.
Note: these instructions were written on a system running Debian (GNU+Linux) with the GNOME desktop environment. The main difference for MS Windows and macOS users will be in how you connect to the HPC File Store; there are instructions on how to do that here (that page also has several useful guides on topics relevant to using the HPC facilities at QUT).
MS Windows users wishing to use a secure shell to interact with a PBS system have the option of downloading and installing the terminal emulation software PuTTY.
Step 1 - creating your .sub file
Example .sub file to run an R script:
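A minimal sketch of such a .sub file, assuming a hypothetical username (username), job name and script path; adjust the resource requests and paths to suit your own job:

```shell
#!/bin/bash
# Example PBS job script for running an R script in serial on Lyra.
#PBS -N my_r_job                   # job name (assumed; choose your own)
#PBS -l walltime=1:00:00           # request 1 hour of processing time
#PBS -l select=1:ncpus=1:mem=8G    # 1 CPU on 1 node with 8 GB of RAM
#PBS -j oe                         # merge terminal output and errors into one file

module load r/3.3.1-foss-2016a     # load R
R --file=/home/username/your_R_script.R   # run the R script
```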
The important lines of this .sub file to understand for submitting simple R jobs that run in serial are explained below. The remaining lines you can copy and paste unchanged into your .sub file without worrying too much about them at this stage.
#PBS -l walltime=1:00:00 requests 1 hour of processing time ('walltime').
Your job will be killed once this time limit is reached, regardless of whether it has finished running your script. It is therefore good practice to predict how long your job will take to run and request that amount of walltime. For instance, if your job involves iterative calculations, run a few of these on your own machine (if possible) to get an average time per iteration. The CPU in your computer will likely run at a different speed to those in the QUT HPC nodes, so once you have an estimate of how long an iteration takes on your computer, submit a test job to Lyra that also runs only a few iterations and request your best estimate of the required walltime. You can then use the qjobs -x command (more on this below) to see how long your job took to run on Lyra, calculate an average time per iteration, and forecast the walltime required for your full job accordingly.
#PBS -l select=1:ncpus=1:mem=8G requests that your job be run on a single CPU of a single node with 8 GB of RAM. If possible, it is worth testing your job on your local machine to determine how much RAM it requires, as you will encounter errors if your compute job attempts to use more RAM than you have requested.
module load r/3.3.1-foss-2016a loads R version 3.3.1 (see the section at the end of this guide for how to check which versions of R are currently available on Lyra).
R --file=/home/username/your_R_script.R uses the version of R loaded by the line above to run the R script located at the file path /home/username/your_R_script.R on the HPC filestore (Step 2 details how to copy files across to the HPC filestore).
MS Windows users, your .sub file must use Unix/Linux/OSX style line endings.
You can make such a file with Notepad++ (or any good text editor) by activating the relevant setting e.g. in Notepad++:
Settings -> Preferences -> New Document -> Format (Line Ending) Unix/OSX.
Alternatively you can convert a .sub file with MS Windows style line endings to a .sub file with Unix/OSX style line endings with the
dos2unix command line tool on Lyra itself. Type man dos2unix when logged into Lyra via ssh to read about how to do this. If you are curious about the difference between these two styles of line endings, a succinct explanation can be found here.
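For example, assuming your job script is named your_job.sub (a hypothetical filename), the conversion is a single command:

```shell
# Convert Windows (CRLF) line endings to Unix (LF) line endings in place.
dos2unix your_job.sub
```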
Step 2 - copy your files to the HPC Filestore
How to send your Job Script (.sub file), the R Script (.R file) referenced in the Job Script and any required data file(s) to the HPC File Store. MS Windows and macOS users, instructions on how to connect to the HPC File Store from your machine and copy files back and forth may be found here.
Under GNU+Linux with a GNOME based GUI:
Open the Nautilus File Browser
Click Connect to Server
(use your QUT Username, the same one you use to log into your QUT Webmail)
- Copy files to your directory on the HPC Filestore (i.e. copy your .sub file, your .R file and any .Rdata or .csv files of data you need).
Note: if your .R file needs to load some data, you will need to copy the data across to the HPC filestore and have a load() or read.table() line in your .R file that specifies the location of the data on the HPC filestore.
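For instance, assuming the data files were copied to /home/username/ (a hypothetical path; use your own), the relevant lines in your .R file might look like:

```r
# Load an .Rdata file from its location on the HPC filestore.
load('/home/username/my_data.Rdata')

# Or read a .csv data file into a data frame.
my_data <- read.table('/home/username/my_data.csv', header = TRUE, sep = ',')
```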
Step 3 - Use a secure shell to log into the HPC facility and submit your .sub file to the PBS system
Open a terminal (on Windows, open PuTTY).
Log into Lyra via a secure shell. If you are on campus you just need to be connected to the network; if you are off campus you need to be using the QUT Virtual Private Network (VPN), e.g. the QUT-endorsed Cisco AnyConnect VPN (have a look at the IT Helpdesk pages on this). The VPN also enables you to connect to the HPC filestore from off campus.
Enter your QUT password (the same one you use to log into your QUT webmail).
Set the working directory to wherever you copied your .sub, .R and data files.
Submit your job to the queue:
You can check the progress of your active jobs with:
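The steps above can be sketched as follows, assuming a hypothetical username and host name (check the QUT HPC documentation for the actual Lyra address):

```shell
# Log into Lyra via ssh (host name is an assumption; use the address QUT provides).
ssh username@lyra.qut.edu.au

# Set the working directory to wherever you copied your files.
cd /home/username

# Submit your job script to the PBS queue.
qsub your_job.sub

# Check the progress of your active jobs
# (qjobs is QUT's helper; 'qstat -u username' is the standard PBS equivalent).
qjobs
```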
If you have requested sufficient walltime for your R script to run and it runs without errors, any results it writes out should appear in the current directory (unless you have changed directories in your .R file). A copy of the terminal output of running your script will also be written to this directory (this is very useful for debugging your jobs). Once your job has completed you can copy your results back to your machine with Nautilus (or whatever you are using to access the HPC File Store).
You can also view information on completed jobs with the qjobs -x command. This will output information such as the amount of RAM used over the duration of a job and the overall CPU utilisation given as a percentage. If you see that the CPU utilisation for a completed job is less than 50%, you could well benefit from the advice of the HPC Support Team on optimising your code.
Checking the versions of R currently available for use on Lyra
Log into Lyra with ssh as above, then run the module avail r command.
The output should look something like this:
For standard use of R, use the versions that end in -foss-2016a (these have been compiled with the GNU Compiler Collection). Versions of R that end in -intel-2016b have been compiled with Intel compilers.
If you need to find the model number of the CPU your job is running on, execute qjobs -x to find the Host/Array/GPU/mics entry for your job. It will be something like cl3n004. Then run
pbsnodes cl3n004 | grep cputype
which outputs something like
resources_available.cputype = E5-2680v3,avx,avx2
This informs you that cl3n004 (Cluster 3, Node 4) has an E5-2680v3 CPU. You can then Google this model number to discover its clock speed.
Shared Memory Parallel Computing on a Single Node of the HPC Cluster
R includes a variety of packages for parallel computing summarised on the CRAN HPC Task View here.
In this example I will use the doMC package for parallel computing.
To use doMC you need to write your .sub file slightly differently:
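A sketch of such a .sub file for a 16-core doMC job, assuming the same hypothetical username and file names as above; the OMP_NUM_THREADS line is an assumption about how to stop external libraries parallelising (the purpose of that line is discussed below):

```shell
#!/bin/bash
# Example PBS job script for shared-memory parallel R (doMC) on a single node.
#PBS -N my_parallel_r_job            # job name (assumed; choose your own)
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=16:mem=120G   # 16 CPUs on one node with 120 GB of RAM
#PBS -j oe                           # merge terminal output and errors into one file

export MC_CORES=16        # must match ncpus above; R reads this via the mc.cores option
export OMP_NUM_THREADS=1  # assumption: keep OpenMP-using libraries called by R single-threaded

module load r/3.3.1-foss-2016a
R CMD BATCH --slave /home/username/your_R_script.R your_R_terminal_output.out
```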
export MC_CORES=16 sets the environment variable MC_CORES, which R's parallel package reads into the mc.cores option when it loads. You must set MC_CORES to the number you supplied to ncpus in the .sub file line
#PBS -l select=1:ncpus=16:mem=120G
We use this option to inform R how many 'cores' are available, which in turn is the number of parallel processes R can run. The Lyra nodes each have 16 or more 'cores' (actually they have 8 or more physical cores each, and each core has hyperthreading, which allows it to efficiently execute two 'cores' worth of work).
As we are using parallel computing at the R level we need to ensure that external libraries called by R do not attempt to use parallelism. This is the function of the line:
For parallel computing with the doMC package we need to run a 'batch' R process. This is achieved with the line:
R CMD BATCH --slave /home/username/your_R_script.R your_R_terminal_output.out
Your R script will need to load the doMC package with require(doMC) and set the number of 'cores' doMC uses to be the number we stored in MC_CORES:
registerDoMC(cores = getOption('mc.cores', 2L))
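Putting this together, a minimal sketch of the parallel part of the .R script (the loop body is a placeholder; substitute your real computation):

```r
# doMC loads foreach; the parallel package initialises the mc.cores
# option from the MC_CORES environment variable set in the .sub file.
require(doMC)
registerDoMC(cores = getOption('mc.cores', 2L))

# Run 16 iterations across the registered workers and
# combine the results into a single vector with c().
results <- foreach(i = 1:16, .combine = c) %dopar% {
  sqrt(i)  # placeholder computation
}
```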
#PBS -j oe results in the terminal output and errors being written to a single file.
For a more comprehensive example of parallel computing with R on the QUT HPC Lyra please see Marcela's example here.
Installing an R Package for personal use on the HPC System
- Download the package source from CRAN (e.g. ranger_0.6.0.tar.gz),
- Copy the package source across to the HPC filestore
- Add the command to install the package to your .sub file (note the R CMD INSTALL command must come after R has been loaded and before the command to execute the R script that loads the package).
R CMD INSTALL -l /home/username/pkgs /home/username/ranger_0.6.0.tar.gz
installs the package ranger from the source file located at /home/username/ranger_0.6.0.tar.gz to the location /home/username/pkgs.
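In context, the relevant portion of the .sub file would be ordered something like this (a sketch; the paths assume the hypothetical username used throughout):

```shell
module load r/3.3.1-foss-2016a            # 1. load R first
R CMD INSTALL -l /home/username/pkgs /home/username/ranger_0.6.0.tar.gz  # 2. install the package
R --file=/home/username/your_R_script.R   # 3. then run the script that loads it
```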
To load the newly installed package in your .R script, use the lib.loc argument of your library() command to load the package from the location to which you installed it on the HPC filestore, e.g.
library('ranger', lib.loc = '/home/username/pkgs/')