Reprocessing a highly valuable genomic dataset

Genetic medicine in coming years will benefit from the technical innovations that NCI and the Gadi supercomputer are now enabling. The Garvan Institute of Medical Research is refreshing its 4,000-strong database of genomes from healthy seniors – the Medical Genome Reference Bank (MGRB). Free from the markers of heart disease, cancer and diabetes, these healthy genomes form a robust comparison dataset for the sequenced patient genomes that doctors work with. The genomes were initially processed and stored at NCI in smaller batches. Researchers and NCI staff then reached a milestone in 2017 by processing 1,000 of the genomes simultaneously over the course of a single night. Now, advances in the complex software controlling the hundreds of steps required have enabled reprocessing of all 4,012 MGRB genomes in one go.

Reprocessing the genomes with the latest scientific understanding of the human genome gives researchers the most accurate data available for diagnosis and treatment. As genetic medicine becomes more central to medical practice, being able to compare patient test results with the rigorous baseline set out by the MGRB will be key. Doctors and clinicians will be able to reach diagnoses much more quickly, especially, we hope, for rare genetic diseases with complex and debilitating symptoms.

Reaching this number of processed genomes takes a lot of scientific effort. The sequencing data from thousands of genomes is transferred from Garvan’s Sydney laboratory to NCI in Canberra, where the reprocessing begins. Thousands of tiny snippets of sequence are compared, lined up and combined into a long string making up the entire human genome. Getting from the snippets to the final sequence takes more than 40 different computational steps, all guided by the expertise of the bioinformaticians and programmers who built the software.

The final product is a treasure trove of valuable genomic data. The entire MGRB is securely shared with approved Australian and international genome researchers. The medical benefits to come from this modern dataset, built using the Gadi supercomputer’s performance and filesystem speed, are only beginning to be realised.
