As a trusted national data repository, NCI optimises some of the largest and most significant datasets that require high-performance capabilities.
NCI curates and optimises nationally and internationally significant reference datasets, making them suitable for data-intensive science as well as publication for broader access. These data collections are used by scientists, government agencies and industry to undertake research which underpins many important scientific advancements and decision-making.
Before publication, the NCI team carries out a series of quality assurance tests to ensure that datasets are suitable for HPC analysis and for the advanced data analysis techniques used by our many computational users.
NCI Data Repository
NCI manages one of Australia’s largest collections of curated research data, with tens of petabytes of nationally significant datasets now registered within our National Research Data Repository. The data includes many environmentally significant datasets such as international climate modelling datasets, a huge corpus of time-series satellite imagery for the Australasian region and the globe, Australian geophysics reference datasets, and emerging big data reference datasets in optical astronomy and genomics.
The NCI data management team expertly curates and optimises these datasets for HPC analysis and advanced data analysis techniques, serving our large number of computational users. This standardised approach to data management, together with the use of internationally recognised protocols, increases the harmonisation of data and interoperability across different science domains.
As well as being used on our platforms, the data is tuned for new high-performance data and informatics technologies. This enables a range of new applications, from mobile or desktop devices that contact NCI to analyse data on the fly, through to information queries over massively distributed data protocols. These technologies provide an invisible fabric of services that research communities now use routinely and seamlessly.
The datasets at NCI are published so that they can be found through our publicly accessible data catalogue. Our team makes sure that datasets are correctly catalogued, published and cited according to international standards and the FAIR data principles (Findable, Accessible, Interoperable, Reusable), and that they are open to the research community. Our data catalogue is referenced by journal publications, and is harvested by many aggregators in Australia and abroad to help users find the data we manage and publish. Each of these steps makes the data more easily discoverable and shareable for the communities that use it.
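As a rough illustration of what a cataloguing check against the FAIR principles might involve, the sketch below flags catalogue records with missing metadata. It is a minimal example only: the field names, the mapping of fields to principles, and the `missing_fair_fields` helper are all assumptions made for illustration, and do not describe NCI's actual catalogue schema or tooling.

```python
# Hypothetical mapping of metadata fields to FAIR principles.
# These field names are illustrative assumptions, not NCI's real schema.
REQUIRED_FIELDS = {
    "identifier": "Findable",       # e.g. a persistent identifier
    "title": "Findable",
    "access_url": "Accessible",
    "format": "Interoperable",
    "licence": "Reusable",
    "citation": "Reusable",
}


def missing_fair_fields(record: dict) -> dict:
    """Return {principle: [field names]} for fields that are absent or blank."""
    missing = {}
    for field, principle in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.setdefault(principle, []).append(field)
    return missing


# A made-up record with a blank licence and no citation.
example = {
    "identifier": "doi:10.0000/example",
    "title": "Example satellite imagery collection",
    "access_url": "https://example.org/data",
    "format": "NetCDF",
    "licence": "",
}

print(missing_fair_fields(example))
# → {'Reusable': ['licence', 'citation']}
```

A check of this kind only confirms that fields are present. Judging whether their values are correct and meet the relevant international standards would still need curation by the data management team.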