Garvan partnership to grapple with big data in genomics
The Garvan Institute of Medical Research has become a collaboration partner of the National Computational Infrastructure (NCI), bringing together the southern hemisphere’s largest genome sequencing centre and its most powerful supercomputing environment for data-intensive research. Together, the two institutions will develop systems for the secure, centralised storage and analysis of genomic information in Australia.
The collaboration will mean that the large-scale genomic data generated at Garvan can be archived in a cost-effective and secure manner. In addition, collaborating research partners will be able to analyse Garvan’s genomic information in a secure environment by using the NCI’s supercomputer or high-performance cloud computing infrastructure.
The collaboration with Garvan marks a new direction for NCI, whose hosted datasets have until now focused on geological and meteorological data, climate science, and information from satellite imagery.
Professor Lindsay Botten, Director of NCI, says, “I am very excited about the collaboration with Garvan—one which sets an important new direction for NCI, and which provides an opportunity to bring the same transformational outcomes to genomics research that its ‘big data’ technologies have delivered to the environmental portfolio.
“NCI is strongly outcome-driven, and I am therefore delighted that we are partnering with Garvan to deliver an infrastructure platform that will be crucial for genomics research at the population scale.”
Dr Warren Kaplan, Chief of Informatics at Garvan’s Kinghorn Centre for Clinical Genomics, says that NCI provides an ideal environment to accommodate Garvan’s rapidly increasing computational and data storage needs.
“There are over 70 bioinformaticians working on genomic data at Garvan, and we are generating mind-bogglingly large amounts of genomic information. Until now, Garvan researchers have stored and analysed that information within our own High Performance Computing Infrastructure.
“However, as we scale to tens of thousands of genomes per year, it’s timely that we, in collaboration with NCI, switch to a new model of storing and analysing large-scale genomic data.
“We seek to make Garvan an attractive destination for the best genome scientists in the world. With our partnership with NCI and the dedicated high-speed link connecting the two sites, Garvan is well placed to retain its position amongst the finest genome-empowered medical research institutes.”
Dr Kaplan also explains how the collaboration will facilitate responsible data sharing between Garvan and other genomic researchers across Australia.
“Genomic datasets are now so large that it’s no longer feasible to be sharing data with others by copying it to different locations. Instead, a more workable approach is for the analysis to come to the data – and we see the NCI as the natural home for Australia’s genomic data.
“By storing genomic data at NCI, it will become easier for Garvan’s collaborators across the country to access data for research purposes, while maintaining strict rules of access that ensure data remains secure.”
Professor Chris Goodnow, Deputy Director of Garvan, sees the collaboration as a big step forward in how Australia manages genomic information.
He says, “Some things are just best handled at the national scale, and the secure storage and analysis of genomic information is one of those things.
“NCI provides an academically accessible but secure computational environment, so it’s an ideal repository for the large-scale genomic datasets that Garvan is producing.
“This is not just about Garvan and NCI – this is doing something good for all Australia.”
As Australia’s national, high-performance research computing facility, NCI manages the southern hemisphere’s most integrated supercomputer and filesystems, delivering high-quality computational and data services to researchers in three national science agencies and nearly 30 of Australia’s universities.
NCI is home to one of Australia’s largest data catalogues, hosting over 10 petabytes (10 billion megabytes) of nationally and internationally significant research data. Its Raijin supercomputer has a peak performance of 1.2 petaflops, enabling Australian researchers to work with their data in ways that would not otherwise be possible.
Garvan is one of Australia’s leading medical research institutions, and is at the forefront of next-generation genomic sequencing in Australia. In 2014, Garvan acquired the HiSeq X Ten sequencing platform, making it possible to sequence 18,000 whole human genomes per year. At full capacity, Garvan generates approximately 1,800 terabytes (1.8 billion megabytes) of archivable data annually.