National Computational Infrastructure (NCI)

UNSW Beginners Guide

System Overview

If you are familiar with the Katana, Leonardi, or Orange machines, the table below summarises the similarities and differences between these machines and Raijin, to help you transition onto Raijin.

  • Katana is the computational cluster supported by the UNSW Faculty of Science; it is not managed by NCI.
  • Leonardi is the computational cluster supported by the UNSW Faculty of Engineering; it is not managed by NCI.
  • Orange is the computational cluster supported by Intersect; it is not managed by NCI.
              Katana                  Leonardi                  Orange                      Raijin
CPU           Intel Xeon (various)    AMD Opteron               Intel Sandy Bridge          Intel Sandy Bridge
Architecture  x86_64                  x86_64                    x86_64                      x86_64
Interconnect  10Gb/s Ethernet         10Gb/s Ethernet           40Gb/s QDR InfiniBand       56Gb/s FDR InfiniBand
Core count    2160                    2944                      1600                        61088 + GPU + 2048 KNL
Cores/node    12/16/24                48/64                     16                          16/24/64
Memory        16.8TB:                 5.8TB:                    9.1TB:                      160TB:
              16 x 24GB               8 x 96GB                  90 x 64GB                   2395 x 32GB
              34 x 96GB               40 x 128GB                13 x 256GB                  1125 x 64GB
              87 x 128GB              6 x 256GB                                             72 x 128GB
              12 x 144GB              2 x 512GB                                             12 x 256GB (GPU)
              1 x 256GB                                                                     3 x 1024GB
Disk          340TB global scratch    100TB                     101TB shared +              30PB Lustre +
                                                                200TB local scratch         1.6PB local scratch
                                                                                            (150GB/s on /short,
                                                                                            60-120GB/s on /g/data)
Filesystems   /home (10GB per user)   /home                     /home (60GB per user)       /home (2GB per user)
              /srv/scratch            /share                    /projects                   /short
                                                                                            /g/data
                                                                                            /massdata
OS            CentOS                  CentOS                    CentOS                      CentOS
MPI           OpenMPI                 OpenMPI                   OpenMPI                     OpenMPI
                                      MPICH2                    SGI-mpt (based on mpich)    Intel-MPI
                                                                                            mvapich2
Compilers     PGI, GNU, Intel         PGI, GNU                  GNU, Intel                  GNU, Intel
Scheduler     PBS                     Slurm                     PBS                         PBSPro
Single node   #PBS -l nodes=1:ppn=16  #SBATCH --nodes=1         #PBS -l select=1:ncpus=16   #PBS -l ncpus=16
Multi node    #PBS -l nodes=2:ppn=16  #SBATCH --nodes=2         #PBS -l select=2:ncpus=16:  #PBS -l ncpus=32
                                      #SBATCH --mem-per-cpu=60  mem=60G:mpiprocs=16         #PBS -l mem=120GB

Register

Please go to my.nci.org.au to request a user account and propose a project (under the UNSW scheme) to obtain a compute and storage grant.

unsw_mancini

You can also select software groups to join, e.g.:

unsw_software_1

unsw_software_2


Getting Started

Once your account has been created, use your username to access our peak system, Raijin:

ssh abc123@raijin.nci.org.au

A simple example job script looks like this:

Single Node Job

#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=20:00:00
#PBS -l mem=300MB
#PBS -l jobfs=1GB
#PBS -l ncpus=16
## For licensed software, you have to specify it to get the job running. For unlicensed software, you should also specify it to help us analyse the software usage on our system.
#PBS -l software=my_program 
## The job will be executed from current working directory instead of home.
#PBS -l wd 
 
./my_program.exe > my_output.out

Multi Node MPI Job

#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=06:00:00
#PBS -l mem=128GB
#PBS -l jobfs=1GB
#PBS -l ncpus=64
## For licensed software, you have to specify it to get the job running. For unlicensed software, you should also specify it to help us analyse the software usage on our system.
#PBS -l software=my_program 
## The job will be executed from current working directory instead of home.
#PBS -l wd 
 
 
module load openmpi/1.10.2
mpirun ./my_program.exe > my_output.out
 
## Please make sure your program is MPI-enabled.

To submit the job:

qsub jobscript
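
For example, submitting the single-node script above and then checking on it (the job ID is whatever qsub prints back; 1234567 here is hypothetical):

qsub jobscript
## qsub prints the job ID, e.g. 1234567.r-man2
qstat -s 1234567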

More detailed PBSPro usage can be found in How to use PBS.


Application Software

To see all available software:

module avail 

Over 300 applications are installed on Raijin; see the list here, and click on a software package for a customised jobscript and its specific licence conditions and job limitations.
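
For example, to narrow the list down to one package and load a specific build (the version shown is only illustrative; pick one from the module avail output):

module avail openmpi
module load openmpi/1.10.2
module list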

Special ones:

  • ANSYS/Fluent: please join the unsw_ansys group on my.nci.org.au. After you are approved, you will need to add the flag -l software=unsw_ansys in your PBS jobscript to access the license.
  • Matlab: please join the matlab_unsw group on my.nci.org.au. After you are approved, you will need to add the flag -l software=matlab_unsw in your PBS jobscript to access the license (see the sketch below).
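
As a sketch, a minimal Matlab jobscript would contain lines like the following (the project code a99 and script name my_script are placeholders):

#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=02:00:00
#PBS -l mem=4GB
#PBS -l ncpus=1
#PBS -l software=matlab_unsw
#PBS -l wd

## pick a specific version from module avail
module load matlab
## run my_script.m without opening a display
matlab -nodisplay -r my_script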

Raijin Quick Guide

Filesystems
 • /home Backed up, important files. 2GB default per user.
 • /short Not backed up, temporary files.
 • /g/data Not backed up, long-term large data files.
 • /projects Backed up, important files shared amongst groups.
 • $PBS_JOBFS Not backed up, local to the node, I/O intensive data.
 • MDSS Backed up, archiving large data files (see the example after this list).
   ○ mdss ls List files on tape
   ○ mdss dmls -l List file status: online (disk cache) or on tape
   ○ mdss put/get Put or retrieve files from mdss
   ○ netcp Submit a copyq job to copy files onto mdss
   ○ netmv Submit a copyq job to move files onto mdss
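
A typical archive round trip, using only the commands above (the directory proj99 and file names are placeholders):

## copy a tarball onto tape, check its status, retrieve it later
mdss put results.tar proj99/
mdss dmls -l proj99
mdss get proj99/results.tar .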
Accounting
 • nci_account Display compute and disk quota usage
   ○ nci_account -v Display detailed accounting information per user
 • lquota Display /home, /short and /g/data usage
 • nf_limits Display walltime/memory limits for a project
 • short_files_report -G group Report the location and usage of files in /short owned by the group (see the example below)
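
For example, to review your project's compute grant, disk quotas and /short usage (a99 is the example project code used elsewhere in this guide):

nci_account -v
lquota
short_files_report -G a99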
Module command
 • module avail List available packages
 • module load/unload package Load specific package
 • module show package Show environments set by the module
 • module list List which modules are loaded
 • module use directory Add directory to the module search path (see the example below)
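
For example, to inspect what a package sets up, or to pick up modules installed in your own directory (the path is a placeholder):

module show openmpi
module use /short/a99/modules
module avail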
PBSPro command
 • qsub [options] jobname Submit the job to the queue
 • qdel jobid Delete a job from the queue
 • qalter [options] jobid Modify the resources of a job already in the queue
 • qmove destination jobid Move a job to a different queue (e.g. normal to express)
 • qselect [options] Select PBS batch jobs
 • qstat/nqstat_anu Display the status of PBS batch jobs
   ○ qstat -s jobid See the scheduler's comment on the job, i.e. why it is not running (example below)
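
A typical monitoring sequence (the job ID is hypothetical):

qsub jobscript
## overview of all your queued and running jobs
nqstat_anu
## read the scheduler's comment on a waiting job
qstat -s 1234567
## remove the job from the queue
qdel 1234567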
PBSPro job script
#PBS -P project Specifies a project for the job
#PBS -q normal/express/copyq Specifies the destination queue upon submission
#PBS -l ncpus=xx Specifies the number of cpus
#PBS -l walltime=xx:xx:xx Specifies the walltime requirement
#PBS -l mem=xxxMB Specifies the memory requirement
#PBS -l jobfs=xxxMB Specifies the disk requirement
#PBS -l software=xxx Specifies all the licensed software
#PBS -l wd Starts the job from the directory it was submitted from
#PBS -W depend=after:xxx Sets dependencies between this and other jobs (see the sketch below)
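
As a sketch of the depend flag, chaining two jobs so the second waits for the first (the script names are placeholders):

## qsub prints the new job's ID; capture it for the dependency
first=$(qsub stage1.sh)
## stage2 starts once stage1 has started; use depend=afterok:$first to wait for successful completion instead
qsub -W depend=after:$first stage2.sh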

Please see more details in our Raijin User Guide.


Filesystem

Name(1)             Purpose                           Availability                 Quota(2)        Timelimit        Backup
/home/unigrp/user   Irreproducible data, e.g. source  raijin only                  2GB (user)      none             Yes
                    code
/short/projectid    Large data IO; data maintained    raijin only                  72GB (project)  none             No
                    beyond one job
/g/data/projectid   Processing of large data files    global                       -               none             No
massdata            Archiving large data files        external (via mdss command)  -               none             2 copies in two locations
$PBS_JOBFS          IO intensive data                 local to each raijin node    unlimited(3)    duration of job  No
  1. Each user belongs to at least two Unix groups:
    unigrp – determined by their host institution, and
    projectid(s) – one for each project they are attached to.
  2. Increases to these quotas will be considered on a case-by-case basis.
  3. Users request allocation of /jobfs as part of their job submission – the actual disk quota for a particular job is given by the jobfs request. Requests larger than 420GB will be automatically redirected to /short (but will still be deleted at the end of the job).
  4. Please make sure you specify #PBS -lother=gdata1 when submitting jobs accessing files in /g/data1. If the /g/data1 filesystem is not available, your job will not start. The following command can be used to monitor the status of /g/data1 on raijin and can be incorporated inside your jobscript (see the sketch after this list):
    /opt/rash/bin/modstatus -n gdata1_status
  5. Please make sure you specify #PBS -lother=mdss when submitting jobs accessing files in mdss. If mdss filesystem is not available, your job will not start. The following command can be used to monitor the status of mdss on raijin and can be incorporated inside your jobscript for checking the status of mdss:
    /opt/rash/bin/modstatus -n mdss_status
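
A minimal sketch of such a check inside a jobscript, assuming modstatus exits with a non-zero status when the filesystem is unavailable (the project code and program names are placeholders):

#!/bin/bash
#PBS -P a99
#PBS -l ncpus=16
#PBS -lother=gdata1
#PBS -l wd

## stop immediately if /g/data1 is not available
## (assumes modstatus returns non-zero when /g/data1 is down)
/opt/rash/bin/modstatus -n gdata1_status || exit 1

./my_program.exe > my_output.out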

More detail can be found in Filesystem User Guide.


Getting Help

Please visit opus.nci.org.au to see our FAQs, or send enquiries to help@nci.org.au.
