Raijin User Guide
- Getting Started
- Job Submission and Scheduling
- File Systems
- Changes from Old System to Raijin
- Useful Links
- To gain access to our system, you first need to fill in the forms of Registration and Connection. Make sure you fill in the connection form in order to get a username and password.
- To apply for new projects, please fill in the form here: Application for Resource
- If you want to update your details, please fill in the form here: Update User Details
To log in from your local desktop or another NCI computer, run ssh:
ssh -l userid raijin.nci.org.au
Your ssh connection will be to one of six possible login nodes, raijin[1-6]. (If ssh to raijin fails, try specifying one of the nodes, e.g. raijin3.nci.org.au.) As usual, for security reasons we ask that you avoid setting up passwordless ssh to raijin. Entering your password every time you log in, or using a specialised ssh agent, is more secure.
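The fallback described above (try the generic address first, then a specific login node) can be scripted. The following is a minimal sketch only: the function name raijin_hosts is hypothetical, userid stands for your own username, and the per-node host names are assumed to follow the raijin3.nci.org.au pattern for all six nodes.

```shell
#!/bin/bash
# Print the generic login address followed by the six individual login nodes.
# (raijin_hosts is an illustrative helper, not an NCI-provided command.)
raijin_hosts() {
    echo "raijin.nci.org.au"
    for n in 1 2 3 4 5 6; do
        echo "raijin${n}.nci.org.au"
    done
}

# Show the candidate hosts in the order they would be tried.
raijin_hosts

# To connect, try each host until the connection itself succeeds.
# ssh exits with status 255 when it cannot connect, so stop on anything else:
#   for h in $(raijin_hosts); do
#       ssh -l userid "$h"
#       [ $? -ne 255 ] && break
#   done
```

The connection loop is left as a comment so that running the sketch only prints host names and never opens a session by accident.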
Connecting under Unix/Mac:
- For ssh – ssh
- For scp/sftp – scp, sftp
- For X11 – ssh -Y; make sure you have installed XQuartz on OS X 10.8 or higher.
Connecting under Windows:
- For ssh – putty, mobaxterm
- For scp/sftp – putty, Filezilla, winscp, mobaxterm
- For X11 – Cygwin, XMing, VNC, mobaxterm
If you are connecting for the first time, please change your initial password to one of your own choice via the passwd command, which will prompt you as below: (Note the % is the command prompt supplied by the interactive “shell” as in all examples in this document – it is not something you type in.)
% passwd
Old password:
New password:
Re-enter new password:
Interactive Use and Basic Unix
The operating system on all systems is Linux. You can read our Unix quick reference guide for basic usage.
When you log in you will come in under the Resource Accounting SHell (referred to as RASH), which is a local shell used to impose interactive limits and account for the time used in each interactive session.
Your account will be set up with an initial environment via a default .login file, and an equivalent .profile file, as well as a .rashrc file. The .rashrc file can be edited to change the default project (see Project Accounting) and the command interface shell to be started by RASH as you log in. Your initial command interface shell will be bash. You can change this to tcsh by changing the line in .rashrc from
setenv SHELL /bin/bash
to
setenv SHELL /bin/tcsh
Other shells including ksh are available but may not provide the same support for modules as bash. There has been a local modification made for ksh and details of that are here. If you try to use a shell not registered with rash for the particular machine you will default to the
Each interactive process you run on the login nodes has a time limit (30 minutes) and a memory limit (2GB) imposed on it. If you want to run a longer or more memory-intensive interactive job, please submit an interactive job (qsub -I); see Interactive PBS Jobs in the section below for more details.
At login you will not be asked which project to use. A default project will be chosen by the login shell if one is not already set in ~/.rashrc. You can change your default project by editing .rashrc in your home directory. To switch to a different project for interactive use once you have already logged in you can use the following helpful command:
Note that this is just for interactive sessions. For PBS jobs, use the -P option to specify a project.
Monitoring Resource Usage
nci_account displays the usage of the project in the current quarter, as well as some recent history of the project if available. It also shows the /short and massdata storage usage for the projects you are connected to. You can also use -v to display detailed accounting information per user.
lquota displays your disk usage and quota in your home directory and the
short_files_report reports /short file usage. Use -G project to see location and usage in /short owned by the group, and use -P project to see group and user information of files in the /short/ folder.
nf_limits -P project -n ncpus -q queue displays the walltime and memory limits for the user. More default resource limits can be found in the section Queue Limits below.
The systems have a simple queue structure with two main levels of priority; the queue names reflect their priority. There is no longer a separate queue for the lowest priority “bonus jobs” as these are to be submitted to the other queues, and PBS lowers their priority within the queues.
- express:
- high priority queue for testing, debugging or quick turnaround
- charging rate of 3 SUs per processor-hour (walltime)
- small limits particularly on time and number of cpus
- normal:
- the default queue designed for all production use
- charging rate of 1 SU per processor-hour (walltime)
- allows the largest resource requests
- copyq:
- specifically for IO work, in particular, mdss commands for copying data to the mass-data system.
- Note: always use -l other=mdss when using mdss commands in copyq. This is so that jobs only run when the mdss system is available.
- runs on nodes with external network interface(s) and so can be used for remote data transfers (you may need to configure passwordless ssh).
- tar, compression and other manipulation of /short files can be done in copyq.
- purely compute jobs will be deleted whenever detected.
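The charging rates above translate into a simple estimate: SUs charged = rate x number of cpus x walltime in hours. The sketch below illustrates that arithmetic. The function name su_cost is hypothetical, it only knows the two rates quoted above (3 for the high priority queue, 1 for the default queue, assumed here to be named express and normal), and it works in whole hours only.

```shell
#!/bin/bash
# Estimate the service units (SUs) charged for a job.
# Rates as quoted in this guide: 3 SUs (express) or 1 SU (normal)
# per processor-hour of walltime. su_cost is an illustrative helper only.
su_cost() {
    local queue=$1 ncpus=$2 hours=$3 rate
    case "$queue" in
        express) rate=3 ;;
        normal)  rate=1 ;;
        *) echo "no charging rate listed for queue: $queue" >&2; return 1 ;;
    esac
    echo $(( rate * ncpus * hours ))
}

su_cost express 16 2   # 3 x 16 x 2
su_cost normal 16 2    # 1 x 16 x 2
```

Because charging is based on walltime, the same estimate applies whether or not the cpus were busy for the whole run.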
Most projects can continue to submit jobs when their time allocation is exhausted – such jobs are called “bonus jobs”.
but are in fact submitted to either of the
- bonus jobs:
- queue at a lower priority than other jobs and will generally only run if there are no non-bonus jobs
- are more suspendable than non-bonus jobs
- make use of otherwise idle cycles while minimally hindering other jobs
- may be terminated if they are impeding normal jobs or for system management reasons (usually jobs are just suspended)
- Please note jobs requesting more than 160 cpus will never run when the project is in bonus. You will have to reduce the number of cpus in your job request or wait until next quarter.
There are many reasons jobs may be prevented from starting. The first thing to do is to run “qstat -s jobid”; this will print the comments from the job scheduler about your job.
- If you see a “–” after the job, it means the scheduler has not yet considered your job. Be patient.
- If you see “Storage resources unavailable”, it means that you have exceeded one of your storage quotas. Run “nci_account” to get more information.
- If you see “Waiting for software licenses”, it indicates that all the licenses for a software package you have requested are currently in use.
- If you see “Not Running: Insufficient amount of resource ncpus”, it indicates that all the cpus are busy. Please be patient; PBSPro scheduling is based on the resources available and requested, see our scheduling policy for more details. Furthermore, at the beginning and close to the end of each quarter, the number of jobs increases significantly compared to other periods, hence a longer waiting time. You can also find out about the current raijin usage at our website:
We are using PBSPro for job submission and scheduling. For example, a sample job script looks like this:
#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=20:00:00
#PBS -l mem=300MB
#PBS -l wd
./a.out
You submit this script for execution by PBS using the command:
% qsub jobscript
More detailed PBSPro usage can be found in How to use PBS.
Note: Please make sure you specify #PBS -lother=gdata1 when submitting jobs that access files in /g/data1. If the /g/data1 filesystem is not available, your job will not start.
The -I option for qsub will result in an interactive shell being started on the compute nodes once your job starts. A submission script cannot be used in this mode – you must provide all qsub options on the command line. To use X windows in an interactive batch job, include the -X option when submitting your job – this will automatically export the DISPLAY environment variable.
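Since every option has to be given on the command line, it can help to assemble the command in one place. The sketch below only prints a qsub command line and does not submit anything. The function name interactive_qsub_cmd is hypothetical, the project code a99 is borrowed from the sample job script in this guide, and the resource values are placeholders to be replaced with your own.

```shell
#!/bin/bash
# Build (but do not run) a qsub command line for an interactive session.
# Arguments: project, queue, ncpus, walltime (HH:MM:SS), mem
# interactive_qsub_cmd is an illustrative helper, not part of PBS.
interactive_qsub_cmd() {
    echo "qsub -I -X -P $1 -q $2 -l ncpus=$3 -l walltime=$4 -l mem=$5"
}

# Example: a one-hour, single-cpu session with X forwarding in the express queue.
interactive_qsub_cmd a99 express 1 01:00:00 2GB
```

Drop -X from the printed command if you do not need X windows.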
Your job is subject to all the same constraints and management as any other job in the same queue. In particular, it will be charged on the basis of walltime, the same as any other batch job, since you will have dedicated access to the cpus reserved for your request. Don’t forget to exit your interactive batch session when finished to avoid both leaving cpus idle on the machine and wasting your grant!
Interactive batch jobs are likely to be used for debugging large or parallel programs etc. Since you want interactive response, it may be necessary to use the express queue to run immediately and avoid your session being suspended. However the express queue attracts a higher charging rate, so again avoid leaving the session idle.
nf_limits -P project -n ncpus -q queue will show your current limits.
If you require exemptions to these limits please contact email@example.com.
The current default walltime and cpu limits for the queues are as follows:
|queue||queue (type)||maximum jobs allowed queuing (running) per project||available memory per node||default cpu limit||default walltime limit|
|express||express (route)||5 queuing only||—||—||24 hours for 1-16 cores; 5 hours for 17-128 cores|
|normal||normal (route)||200 queuing only||—||—||48 hours for 1-255 cores; 24 hours for 256-511 cores; 10 hours for 512-1024 cores; 5 hours for 1025-56064 cores|
|copyq||copyq||200 (25)||32GB||1||10 hours|
The number of jobs that you can have running at any given time depends on the availability of resources. For express-def, the maximum number of jobs allowed to run also depends on the number of cpus requested.
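The default walltime limits in the table above depend on both the queue and the number of cores requested. As a rough sketch, the lookup can be written as follows. The function name default_walltime_hours is hypothetical, the values are copied from the table and may differ from your project's actual limits (nf_limits is the authoritative source), and treating the copyq cpu limit of 1 as a hard maximum is an assumption.

```shell
#!/bin/bash
# Look up the default walltime limit (in hours) for a queue and core count,
# using the values listed in this guide. Returns non-zero if the request
# falls outside the listed ranges. Illustrative helper only.
default_walltime_hours() {
    local queue=$1 ncpus=$2
    [ "$ncpus" -ge 1 ] || return 1
    case "$queue" in
        express)
            if   [ "$ncpus" -le 16 ];  then echo 24
            elif [ "$ncpus" -le 128 ]; then echo 5
            else return 1
            fi ;;
        normal)
            if   [ "$ncpus" -le 255 ];   then echo 48
            elif [ "$ncpus" -le 511 ];   then echo 24
            elif [ "$ncpus" -le 1024 ];  then echo 10
            elif [ "$ncpus" -le 56064 ]; then echo 5
            else return 1
            fi ;;
        copyq)
            # Assumption: the default cpu limit of 1 is treated as a maximum.
            if [ "$ncpus" -eq 1 ]; then echo 10; else return 1; fi ;;
        *) return 1 ;;
    esac
}

default_walltime_hours normal 256    # falls in the 256-511 core band
default_walltime_hours express 17    # falls in the 17-128 core band
```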
The version of PBS used on NF systems has been modified to include customisable per-user/per-project limits:
- All limits can be (and are intended to be) varied on a per-user or per-project basis – reasonable variation requests will be granted where possible.
- Resources on the system are strictly allocated with the intent that if a job does not exceed its resource (time, memory, disk) requests, it should not be unduly affected by other jobs on the system. The converse of this is that if a job does try to exceed its resource requests, it will be terminated.
- Please note jobs requesting more than 256 cpus will never run when the project is in bonus. You will have to reduce the number of cpus in your job request or wait until next quarter.
As well as 6 login nodes there are 3592 compute nodes with the following configurations:
All nodes run CentOS 6.5. Note that the Linux OS requires some physical memory to be reserved for system functions, leaving the following memory available to user applications:
Memory Available to User jobs:
|32GB:||r1..r2395||(~67% of all nodes)|
|64GB:||r2396..r3520||(~31% of all nodes)|
|128GB:||r3521..r3592||(2% of all nodes)|
All nodes have 16 cpu cores, meaning that OpenMP shared memory jobs that were previously restricted to 8 cpu cores on vayu can now run on up to 16 cpu cores. The architecture of each node is 2 sockets with 8 CPU cores each. As in the past, please check that your code can scale to this greater number of cores – many codes don’t.
In a PBS job script, the memory you specify using the -lmem= option is the total memory across all nodes. However, this value is internally converted into the per-node equivalent, and this is how it is monitored. For example, since raijin has 16 CPUs per node, if you request -lncpus=32,mem=10GB, the actual limit will be 5GB on each of the two nodes. If you exceed this on either of the nodes, your job will be killed.
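The per-node conversion described above can be checked before submitting. The following is a minimal sketch, assuming 16 cpus per node as stated and memory given in whole megabytes; the function name mem_per_node_mb is hypothetical, and how PBS itself rounds uneven divisions is not covered here.

```shell
#!/bin/bash
# Convert a job's total memory request into the per-node limit.
# Arguments: ncpus, total memory in MB. Assumes 16 cpus per node.
# mem_per_node_mb is an illustrative helper, not a PBS command.
mem_per_node_mb() {
    local ncpus=$1 mem_mb=$2 cores_per_node=16
    [ "$ncpus" -ge 1 ] || return 1
    # Number of nodes the request spans, rounded up.
    local nodes=$(( (ncpus + cores_per_node - 1) / cores_per_node ))
    echo $(( mem_mb / nodes ))
}

# The example from the text: 32 cpus and 10GB (10240MB) in total
# span two nodes, so each node is limited to half of the request.
mem_per_node_mb 32 10240
```

When sizing a request, work backwards: multiply the memory your program needs on its busiest node by the number of nodes.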
Please check out our FILESYSTEMS page for more details.
MPI related issue
Our PBSPro does not currently support cpusets so it is possible for two small (i.e. fewer than 16 cpu) OpenMPI jobs to be scheduled to run on the same cpus. Experience suggests that using
mpirun -np $PBS_NCPUS -bind-to-none run.exe
will avoid this problem. We will be investigating this more and may modify the mpirun wrapper to automate this process but expect that future releases of PBSPro will handle cpusets.
MPI jobs that request more than 16 CPU cores will need to request full nodes, that is, a multiple of 16 in #PBS -l ncpus.
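The full-node rule above is easy to check before submitting. The sketch below is illustrative only: the function name valid_mpi_ncpus is hypothetical, and it encodes just the rule stated here (requests of up to 16 cores are accepted as they are, larger requests must be a multiple of 16).

```shell
#!/bin/bash
# Check whether an ncpus request satisfies the full-node rule for MPI jobs:
# up to 16 cores is fine, anything larger must be a multiple of 16.
# valid_mpi_ncpus is an illustrative helper, not a PBS command.
valid_mpi_ncpus() {
    local ncpus=$1
    [ "$ncpus" -ge 1 ] || return 1
    if [ "$ncpus" -le 16 ]; then
        return 0
    fi
    [ $(( ncpus % 16 )) -eq 0 ]
}

for n in 8 16 24 32; do
    if valid_mpi_ncpus "$n"; then
        echo "$n ok"
    else
        echo "$n must be a multiple of 16"
    fi
done
```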
For MPI profiling and performance analysis tools, please see here for more details:
||Irreproducible data eg. source code||raijin only||2GB (user)||none||Yes|
||Large data IO, data maintained beyond one job||raijin only||72GB (project)||365 days||No|
||Processing of large data files||global||none||No|
|massdata||Archiving large data files||external – access using the||20GB||none||2 copies in two different locations|
||IO intensive data, job lifetime||local to each individual raijin nodes||unlimited(3)||duration of job||No|
- Each user belongs to at least two Unix groups: unigrp – determined by their host institution, and projectid(s) – one for each project they are attached to.
- Increases to these quotas will be considered on a case-by-case basis.
- Users request allocation of /jobfs as part of their job submission – the actual disk quota for a particular job is given by the jobfs request. Requests larger than 396GB will be automatically redirected to /short (but will still be deleted at the end of the job).
- Please make sure you specify #PBS -lother=gdata1 when submitting jobs that access files in /g/data1. If the /g/data1 filesystem is not available, your job will not start. The following command can be used to monitor the status of /g/data1 on raijin and can be incorporated into your jobscript to check the status of /g/data1:
/opt/rash/bin/modstatus -n gdata1_status
- Please make sure you specify #PBS -lother=mdss when submitting jobs that access files in mdss. If the mdss filesystem is not available, your job will not start. The following command can be used to monitor the status of mdss on raijin and can be incorporated into your jobscript to check the status of mdss:
/opt/rash/bin/modstatus -n mdss_status
At login users will have modules loaded for openmpi and the Intel Fortran and C compilers. The module command syntax is the same no matter which command shell you are using. module avail will show you a list of the software environments which can be loaded via a module load package command. module help package should give you a little information about what the module load package will achieve for you. Alternatively module show package will detail the commands in the module file. Please see module manual for more details.
Access to a licensed third-party software package is granted by adding the user to the appropriate software Unix group. Before that, the user must fulfil all license requirements as stated in the ‘Access prerequisites’ on the third-party software package page in the ‘Software Available‘ Section.
Major differences are shown here:
|-l vmem||-l mem|
|quotasu and quota||lquota and nci_account|
More details can be found here.
- Raijin Quick Reference Guide
- Application Software
- How to use PBS
- Emergency & Downtime Notices
- Raijin Live Status
- Training Courses
- /g/data FAQs
- Software Development
Detailed instructions on compiling, parallel programming with MPI or OpenMP, debugging, profiling and benchmarking.
- Parallel Program Debugging
User guide on parallel debugging programs PADB and TotalView.
- Profiling — General Performance Analysis Tools
User guide on profiling programs HPCToolKit, OpenSpeedShop (OSS) and gprof.
- Profiling — MPI Performance Analysis Tools
User guide on MPI profiling and tracing programs IPM, mpiP, PAPI, darshan and Vampir.
- Debuggers & Profilers & Simulators
Usage on each debugger, profiler and simulator program can be found under Debuggers & Profilers & Simulators in our Application software.
- Debugging Memory Problems
Finding memory problems in code can be a difficult task but there are tools available on the National Facility machines to make it possible.
- Canonical User Environment Variables
There are many environment variables that users can set or adjust to customise their environment. Some of these, like $PATH, are well-known and well-understood. Many, however, are poorly known and understood. This is particularly true of variables that are used by compilers, linkers, etc when building software. This page attempts to document such variables, including how, when, where and why they should be used, as well as any clashes and/or gotchas.
- Profiling Performance Tool Presentation
Presentation on performance tool @ NF systems.