Raijin User Guide
1. Logging In
2. Choosing and switching projects
3. Login environment
4. Transferring Files to raijin
5. Binary Compatibility and Recompiling
6. Batch jobs
7. JOBFS Requests
8. Copyq and mdss command
9. Quotas and Reporting
10. Using mpirun in batch jobs
11. Outstanding Issues
1. Logging In

To log in from your local desktop or another NCI computer, run ssh:
ssh -l userid raijin.nci.org.au
If you use X windows, you may need to add -Y to your command line options to pass through the DISPLAY variable.
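For example (userid is your NCI username):
ssh -Y -l userid raijin.nci.org.au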
Your ssh connection will be to one of six possible login nodes (selected round-robin through the DNS). As usual, for security reasons we ask that you avoid setting up passwordless ssh to raijin. Entering your password every time you log in is more secure, as is using specialised ssh agents, which we will describe in more detail in the future. Windows users should follow the instructions given in our FAQ at http://nci.org.au/access/getting-help/faqs.
For users who had accounts on vayu prior to mid-June 2013: if you want to carry across changes to your login environment that you made on vayu, you will need to copy them from the from_vayu directory, for example:
cp from_vayu/.profile .profile
cp from_vayu/.login .login
cp from_vayu/.cshrc .cshrc
cp from_vayu/.bashrc .bashrc
Most established users' .cshrc and .bashrc files from vayu will need the following lines added at the end, matching those already in .login and .profile:
module load dot
module load openmpi/1.6.3
module load intel-fc/12.1.9.293
module load intel-cc/12.1.9.293
You will then need to log out and log in again for these settings to take effect. The changes need to be made to .bashrc and .cshrc because of the behaviour of PBSPro when starting batch jobs (see below).
All active (as of mid-June) users' /home directories from vayu have been copied into a subdirectory of their /home directory on raijin called from_vayu. /short directories are set up but not populated with files from vayu; you should also see /short/$PROJECT/$USER.
Users with new accounts set up since raijin became generally available will have the correct dot files installed in their /home directories.
2. Choosing and switching projects

At login you will not be asked which project to use; a default project will be chosen by the login shell if one is not already set in ~/.rashrc. You can change your default project by editing .rashrc in your home directory. A helper command is available for switching to a different project interactively once you have logged in.
Note that this applies only to interactive sessions; for PBS jobs, use the -P option to specify the project.
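For example, to run a batch job under a hypothetical project code a12, include in the job script:
#PBS -P a12
The same can be done at submission time with qsub -P a12 jobscript.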
3. Login environment

As well as the 6 login nodes there are 3592 compute nodes. All nodes run CentOS 6.4. Note that the Linux OS requires some physical memory to be reserved for system functions, leaving the following memory available to user applications:
Memory available to user jobs:
- 31GB: r0001..r2395
- 62GB: r2396..r3520
- 126GB: r3521..r3592
All nodes have 16 cpu cores, arranged as 2 sockets of 8 cores each, meaning that OpenMP shared memory jobs that were restricted to 8 cpu cores on vayu can now run on up to 16 cpu cores. As in the past, please check that your code scales to this larger core count; many codes do not.
MPI jobs that request more than 16 CPU cores will need to request full nodes, that is, a multiple of 16 in #PBS -l ncpus.
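For example, an MPI job needing 64 cores (4 full nodes) would request:
#PBS -l ncpus=64
A request such as ncpus=40 is not a multiple of 16 and so does not correspond to full nodes.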
Raijin uses Intel Xeon E5-2670 CPUs on the compute nodes with the following parameters:
Standard frequency: 2.60GHz (26 x 100MHz BClk)
Turbo boost additional multipliers: 7/7/6/6/5/5/4/4 (the extra 100MHz steps available when 1 through 8 cores are active).
This effectively gives raijin's compute nodes an all-core frequency of 3.0GHz, rising to 3.3GHz when only 1 or 2 cores are active.
At login users will have modules loaded for pbs, openmpi and the Intel Fortran and C compilers.
See module avail for the full list of installed software. Not all packages have been ported from vayu, so please email help@nci.org.au if there is something that you require urgently.
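Two commands are useful for checking your environment:
module list                 # show the modules currently loaded in your session
module avail                # list all installed software packages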
4. Transferring Files to raijin

If you are transferring data to raijin from off-site, please scp/rsync/sftp to r-dm.nci.org.au rather than to raijin.nci.org.au. The login nodes should be used for normal interactive work rather than data transfers.
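As a minimal sketch, an off-site transfer into /short with rsync (the project code a12 and the paths are hypothetical):
rsync -av mydata/ userid@r-dm.nci.org.au:/short/a12/userid/mydata/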
5. Binary Compatibility and Recompiling

We recommend that codes be recompiled on raijin for maximum performance. However, many binaries from vayu will work without recompilation.
We prefer to make available only recent versions of the Intel compilers; version 11.1.046, the default on vayu, has not been ported, so some codes will need to be recompiled with a more recent compiler version. The current recommendation is to use the option -xHost for maximum performance on the Intel processors. Version 12.1.9.293 has been set as the default, as there have been some problems reported with the version 13 compilers. Run module avail intel-fc to see the versions currently installed.
We also recommend that users build MPI executables with openmpi/1.6.3. Any MPI binaries from vayu should be rebuilt to use the most recent version of OpenMPI, although openmpi/1.4.3 will be available to support some pre-built packages. The mpirun command produces warning output which can be ignored at this stage.
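As a minimal sketch, recompiling a serial and an MPI Fortran code with the recommended settings (the source and executable names are hypothetical):
module load intel-fc
module load openmpi/1.6.3
ifort -O2 -xHost -o serial.exe serial.f90        # serial build tuned to the host CPU
mpif90 -O2 -xHost -o mpi.exe mpi.f90             # MPI build via the Open MPI compiler wrapper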
6. Batch jobs

We are using PBSPro for job submission and scheduling. Many batch scripts from vayu and xe should work without major changes. The known differences are listed below; a complete example job script is sketched after the list.
- Users should use the -lmem request in their batch jobs rather than -lvmem as on vayu. You may need to experiment to find the best value of the mem request for your jobs.
- #PBS -wd becomes #PBS -l wd to start the batch job in the working directory from which it was submitted.
- Make sure that the $PROJECT variable is set before submitting a job, or ensure that your script includes a line such as #PBS -P z00. Running qsub -v PROJECT will also mean that the batch job runs under the correct project.
- The standard PBS syntax of #PBS -l nodes or #PBS -l select is not currently allowed. We are looking at relaxing this to allow that syntax; however, our intention is to have jobs allocated full nodes rather than partial nodes. In the meantime, please use our previously allowed syntax of #PBS -lncpus.
- Batch jobs do not start as though they were a fresh login, as was the practice on vayu. This means that modules you load in .login or .profile will not be loaded in batch jobs. You need to edit the .cshrc (for tcsh) or .bashrc (for bash) file in your /home directory to load modules automatically in batch jobs; otherwise you need to explicitly load all modules and set all environment variables needed by the batch job in the script.
- You may find the command qstat useful e.g. qstat -a to list all running jobs and qstat -f jobid to show the resources being used by a running job. The command nqstat is also available and has options such as -a for all jobs, -u for a particular user and -P for a project.
- The vayu NCI commands such as qps, qcat, qls etc. are being ported and will appear in due course (qps and qcat are available now).
- The current default walltime limits for the queues are as follows:
copyq: 10 hours.
express: 24 hours for up to 511 cores, 10 hours for 512-1023 cores, and 5 hours for larger jobs; restricted to 31GB of memory per node.
normal: 48 hours for 1-255 cores, 24 hours for 256-511 cores, 10 hours for 512-1024 cores, and 5 hours for larger jobs.
- The command nf_limits, with options -P (project), -n (ncpus) and -q (queue), will show your current batch job limits for walltime, memory, JOBFS scratch space etc. If you require exemptions to these limits please contact help@nci.org.au.
- The man pages for qsub and pbs_resources are not yet correct as they need to be rewritten to include the changes that have been made to standard PBSPro.
- If you need to use pbsdsh with the -N option, please use pbsdsh_anu for the moment. There is a significant difference between pbsdsh on vayu and pbsdsh (pbsdsh_anu) on raijin: on vayu the pbsdsh working directory is set to $PBS_O_WORKDIR, while on raijin it is set to the user's home directory. Therefore, if your script uses commands like pbsdsh -N cp file $PBS_JOBFS, it needs to be modified to pbsdsh_anu -N cp $PBS_O_WORKDIR/file $PBS_JOBFS. Furthermore, a double dash needs to be placed between the pbsdsh options and the command for pbsdsh to run, e.g.
pbsdsh -n 16 -- cp -a $PBS_O_WORKDIR/directory $PBS_JOBFS
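Putting the points above together, a minimal sketch of a raijin batch script (the project code a12 and my_program.exe are hypothetical):
#!/bin/bash
#PBS -P a12
#PBS -q normal
#PBS -l ncpus=32
#PBS -l mem=60GB
#PBS -l walltime=10:00:00
#PBS -l wd

# Batch jobs do not start as a fresh login, so load modules explicitly here
# (or in .bashrc/.cshrc as described above).
module load openmpi/1.6.3

mpirun -np $PBS_NCPUS ./my_program.exe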
7. JOBFS Requests

Job scratch space, jobfs, can be requested as in the past. Currently, small JOBFS requests are supplied by local disk while larger ones are placed in larger scratch spaces on Lustre; at present this means that large JOBFS requests use /short. A PBSPro fix is being developed so that the /short quota for the project will be increased by the JOBFS request for the duration of the job, so that you do not run out of quota on /short. This is not yet in place, so if you experience I/O errors or messages such as Disk quota exceeded, please contact us to have your /short quota increased so that you can run jobs with large JOBFS requests.
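A minimal sketch of using jobfs in a batch script (the 10GB request and file names are hypothetical):
#PBS -l jobfs=10GB

# Stage input to the fast job scratch area, run there, and copy results back.
cp $PBS_O_WORKDIR/input.dat $PBS_JOBFS/
cd $PBS_JOBFS
$PBS_O_WORKDIR/my_program.exe input.dat
cp results.out $PBS_O_WORKDIR/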
8. Copyq and mdss command

The copyq queue has a maximum walltime request of 40 hours; the default is 10 hours if you do not specifically request a walltime.
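As a minimal sketch, a copyq job that archives results to the mass data store using mdss (the tar file name and mdss path are hypothetical):
#!/bin/bash
#PBS -q copyq
#PBS -l ncpus=1
#PBS -l walltime=10:00:00
#PBS -l wd

mdss put results.tar.gz archive/results.tar.gz   # copy the file to the mdss tape store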
9. Quotas and Reporting

We have moved to a new accounting and reporting system for raijin, which is not integrated with the old accounting database used on vayu. This means that commands such as quotasu, quota -v etc. will not work. The replacement for quotasu is called nci_account. It provides more information than the old command, allows reporting of usage funded by multiple partners, and gives information on grants for storage. Run nci_account to see usage information.
Note that we are still modifying the format and updating the information presented; if you notice issues please contact us.
Note that the units used for SU/CPU in the grant and queue tables may differ: we use dynamic units (KSUs, MSUs) in the budget table, and SUs in the per-queue table.
File system quotas are currently being implemented using the raw Lustre file system quotas. Initially these will be fairly restrictive, but please contact help@nci.org.au if you believe you have a case for your project's quota to be changed. As with our systems in the past, the /home quota is set quite small, but there is much more space available in /short/$PROJECT. The command lquota will give you details of your project's quota limits and usage.
Note that as these are hard Lustre filesystem limits, if you exceed them you will not be able to write files and will receive the error Disk quota exceeded. We will be introducing soft quotas (which will stop the running of PBS jobs) in the near future.
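Both reporting commands can be run at any time to keep an eye on your limits:
lquota          # show your project's /home and /short quota limits and usage
nci_account     # show compute usage and grant information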
10. Using mpirun in batch jobs

Our PBSPro does not currently support cpusets, so it is possible for two small (i.e. fewer than 16 cpu) OpenMPI jobs to be scheduled to run on the same cpus. Experience suggests that using
mpirun -np $PBS_NCPUS -bind-to-none run.exe
will avoid this problem. We will be investigating this more and may modify the mpirun wrapper to automate this process but expect that future releases of PBSPro will handle cpusets.
IPM profiling is now available for all versions of openmpi; add module load ipm to your batch script to generate a profile. The ipm_view tool is also available for viewing the resulting profile.
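A minimal sketch of enabling IPM in a batch script (my_program.exe is hypothetical):
module load openmpi/1.6.3
module load ipm                               # preload the IPM profiling library
mpirun -np $PBS_NCPUS ./my_program.exe        # an IPM profile is written when the job completes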
11. Outstanding Issues

- If a node fails while running a job, the job may disappear from the queue. There should be an explanation in the job's .o output file. If you notice a job disappear without any explanation, please let us know the jobid.
- We are still making some changes to the Lustre filesystem setup, as read performance was slightly lower than expected. This temporary arrangement will be addressed in the next couple of weeks when additional hardware is installed.