NCI supports Australian researchers across all of their computational and data needs, from High-Performance Computing (HPC) through to High-Performance Data (HPD) and High-Throughput Computing (HTC).
Optimising performance is especially important for NCI’s users who rely on our HPC and HTC capacity. HPC refers to distributed, highly coordinated computation across hundreds or thousands of compute nodes at once. In HTC, by contrast, many smaller, independent tasks are executed sequentially, and the bottlenecks tend to arise in the data storage and communication steps. Research in bioinformatics and genomics often makes use of HTC, with analysis pipelines involving multiple stages repeated across thousands of samples.
NCI delivers both HPC and HTC at scale, extending the benefits of the Gadi supercomputer to more disciplines and scientific workflows. As computational science methods develop, the HPC community is turning more and more to HTC to cater for its data analysis needs.
Dr Matthew Downton is stepping into a new Associate Director role at NCI, focused on HPC Performance Optimisation and Productivity. He is working closely with users to help them make best use of the varied and powerful computing and storage infrastructures available at NCI. Dr Downton says he is most looking forward to working with a variety of researchers. “I’ll be helping our users perform world-class research by ensuring critical needs are addressed across a range of disciplines.”
NCI supports HTC users by developing and maintaining specific tools alongside our compute and data resources. Tools such as Nextflow and Cromwell for bioinformatics allow researchers to run complex pipelines on Gadi. The nci-parallel tool has also been developed to help farm out tasks across multiple compute nodes, making use of the Gadi supercomputer’s powerful nodes to speed up HTC workloads. The Sydney Informatics Hub, a long-term NCI user, has used nci-parallel to create scalable bioinformatics workflows.
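The general idea of farming out independent tasks can be illustrated with a minimal Python sketch. It is not the nci-parallel interface, and it runs on a single machine rather than across compute nodes. The function names `process_sample` and `farm_out`, the sample IDs and the worker count are all hypothetical, chosen only to show the pattern of one independent task per sample handed to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_sample(sample_id: str) -> str:
    """Hypothetical stand-in for one independent pipeline step on one sample."""
    return f"{sample_id}:done"


def farm_out(sample_ids, max_workers=4):
    """Run one independent task per sample and collect results by sample ID."""
    results = {}
    # Threads are used here for simplicity; this only illustrates the pattern
    # and does not distribute work across separate compute nodes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the sample it belongs to.
        futures = {pool.submit(process_sample, s): s for s in sample_ids}
        # Tasks are independent, so results are gathered in completion order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    samples = [f"sample_{i:03d}" for i in range(8)]
    for sample_id, status in sorted(farm_out(samples).items()):
        print(sample_id, status)
```

Because no task depends on another, the number of workers can be raised or lowered without changing the results, which is what makes this kind of workload a natural fit for spreading across many cores.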
In addition to our computing capacity, NCI hosts petabytes of nationally significant reference datasets across multiple disciplines. This on-hand storage is crucial for HTC projects, as researchers can easily work with prepared, analysis-ready datasets, or upload their own for easy, networked access. “At smaller facilities, storage is often as big a problem as computing capacity,” Dr Downton notes.
One example of a dataset that was produced using HTC at NCI is the Garvan Institute of Medical Research’s Medical Genome Reference Bank (MGRB). This large, transformative dataset was analysed on Gadi with a complex pipeline that took each of the samples through more than twenty steps of processing. The computation required tens of thousands of compute cores and hundreds of terabytes of temporary storage, but took only eight days to complete on Gadi.
“If Gadi didn’t exist, research groups would be forced to scale back their research to look at less useful problems, or at problems with lower resolution, to fit on smaller local machines with much reduced capabilities, or turn to the commercial cloud, which becomes cost prohibitive when computing and data at large scales are involved,” explains Dr Downton. “Instead, Gadi and the software tools we provide the community are efficient and powerful resources available for Australian researchers.”
NCI will continue to work with researchers from a diverse range of fields to find the best fit for their computational needs. By providing national infrastructure, backed by expert staff who help researchers optimise their experience, we are growing the range of scientific and computational approaches we support.