Australian computational innovation leads development of world-class Medical Genome Reference Bank
FRIDAY 16 DECEMBER 2016
Medical research in Australia and throughout the world will benefit greatly from the development of the Medical Genome Reference Bank, the world’s first publicly available genome bank specifically of healthy older people, developed by the Garvan Institute of Medical Research, funded by NSW Health and powered by the National Computational Infrastructure (NCI).
With the technical expertise and computational performance available at NCI, over 1200 whole human genome sequences are now available to researchers worldwide through an online portal (https://sgc.garvan.org.au).
NCI’s Raijin supercomputer was instrumental in the computation of the whole human genome sequences. This process, including simultaneous alignment of all 1200 genomes, was completed within 24 hours and used almost 20,000 cores during peak computation periods, or over a third of Raijin’s entire computational capacity.
Associate Professor Marcel Dinger (Head, Kinghorn Centre for Clinical Genomics), who co-leads the Medical Genome Reference Bank project, said that without the compute power of NCI this project could not have been delivered.
“Sequencing and analysis of genomic data requires dedicated and highly-specialised resources and teams. Our partnership with the National Computational Infrastructure has provided an ideal complement to our genomics capabilities and has really accelerated our ability to get this important resource into the research community,” said Associate Professor Dinger.
NCI Director Professor Lindsay Botten said the success of the Medical Genome Reference Bank is an indication that Australian advanced computing remains an important cornerstone of innovation for our research priorities.
“The scale of this computation illustrates both how large these clinically-relevant datasets are becoming, and how important it is to have access to massive computational power to achieve such outcomes in near real time,” he said.
“NCI is pleased to be able to provide the infrastructure and technical assistance that is needed to drive such an important initiative.”
Processing a single genome sequence is a data-intensive computational activity, ingesting 60 to 100 GB of raw sequence data and producing approximately 200 GB of annotated output. Storing thousands of these sequences is an enormous task that requires the significant data storage resources available at NCI. Up to 300 TB of NCI’s high-speed parallel file system storage was in use at any one time during the genome computation.
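To give a sense of scale, the per-genome figures quoted above can be multiplied out across the 1200 genomes. The short Python sketch below is a back-of-envelope illustration only, not part of the project's pipeline. It assumes the quoted sizes are gigabytes in decimal units (1 TB = 1000 GB), and the variable and function names are made up for the example.

```python
# Back-of-envelope storage estimate using the figures quoted in this release.
# Assumptions (not from the source): sizes are gigabytes, decimal units.
GENOMES = 1200             # genomes currently in the Reference Bank
RAW_GB_RANGE = (60, 100)   # raw sequence data ingested per genome
ANNOTATED_GB = 200         # approximate annotated output per genome
GB_PER_TB = 1000           # decimal conversion (assumption)


def total_tb(per_genome_gb, genomes=GENOMES):
    """Total size in terabytes for a given per-genome size in gigabytes."""
    return per_genome_gb * genomes / GB_PER_TB


raw_low_tb, raw_high_tb = (total_tb(gb) for gb in RAW_GB_RANGE)
annotated_tb = total_tb(ANNOTATED_GB)

print(f"Raw input:        {raw_low_tb:.0f} to {raw_high_tb:.0f} TB")
print(f"Annotated output: about {annotated_tb:.0f} TB")
```

On these assumptions the annotated output alone comes to roughly 240 TB, the same order of magnitude as the peak storage figure quoted above.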
With the backing of NCI’s capabilities, the number of genomes within the Reference Bank will increase to over 4000 in 2017, making it the largest single Australian genomic cohort.
The Medical Genome Reference Bank is the world’s largest publicly available genome bank specifically of healthy older people. The 1200 genome sequences were all provided by healthy Australians over the age of 70. These healthy older Australians and their genome sequences represent an important reference dataset to build our understanding of medically important variation in the human genome. Because the donors are unaffected by ailments such as heart disease, neurodegenerative disease and cancer, the genomes held within the Reference Bank are expected to contain few genetic variants that are associated with disease, and will therefore act as an important control for medical researchers.
The genomes held within the Reference Bank will enable researchers to better diagnose genetic disease in patients, and may even help to unlock the secrets of healthy ageing.