Climate Science Data-enhanced Virtual Laboratory
Enhancing climate research capabilities in Australia
The Australian Climate Science Data-enhanced Virtual Laboratory (Climate DeVL) project was established to further develop Australian research environments and data management capabilities, addressing the data access, data analysis and user support needs of the climate research community. This includes coordinating the ongoing development and availability of training materials, updating the research platforms, and providing access to collaboration tools that enable the community to develop standardised workflows suitable for the intensive data analysis required for the Coupled Model Intercomparison Project phase 6 (CMIP6).
Australia’s contribution to CMIP6
A major focus of the Australian climate research community currently is the contribution to the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project phase 6 (CMIP6). CMIP6 is an internationally coordinated research activity that provides climate model output from a series of carefully designed and targeted experiments. The analysis of CMIP6 data will form the basis for assessments by the Intergovernmental Panel on Climate Change (IPCC) and inform policy- and decision-makers around the world.
For Australia, CMIP6 will underpin research into historical climate variability as well as research into future projections of the timing, extent and consequences of climate change and extreme events. This work may be used to help Australian government, business, agriculture and industry manage the risks and opportunities related to climate variability, change and extremes.
Climate research is computationally demanding and requires data-intensive High Performance Computing (HPC). About 20 petabytes of CMIP6 data are expected globally, making it the largest collection of climate data ever produced, and a substantial portion will be made available and analysed at NCI. The complexity and volume of this data archive mean that NCI’s integrated data storage, supercomputing and data services, combined with a deeply collaborative community effort, are essential for Australia’s contributions to CMIP6.
Overcoming the challenges of ‘big data’
The unique challenges of CMIP, in both size and complexity, have required the development of new and upgraded services. A significant example is the need for users to search the global CMIP data index and then find which data are available at NCI for use in analysis. This need has been addressed by integrating services and further developing mechanisms that improve the accessibility and usability of the data. Key to this is NCI’s Metadata Attribute Search (MAS), which provides consistent access to the information contained in the climate data collections by harvesting the metadata within the millions of self-describing files that constitute the CMIP data collection. The MAS underpins a Python-based API called CleF, developed by the ARC CoE for Climate Extremes (CLEX), which provides command line search tools for accessing this data. Combined, MAS and CleF give researchers an easy interface for discovering which published CMIP data match their specified requirements (experiment, variable, etc.).
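The idea of searching harvested metadata by attribute can be illustrated with a minimal Python sketch. This is not the MAS or CleF interface: the record fields, the sample values, the file paths and the `search` function are all hypothetical, chosen only to show how records that carry attributes such as experiment and variable can be filtered against a researcher's requirements.

```python
from typing import Dict, List

# Hypothetical sample records standing in for metadata harvested from
# self-describing files. Field names, values and paths are invented.
RECORDS: List[Dict[str, str]] = [
    {"experiment": "historical", "variable": "tas", "model": "model-a",
     "path": "/example/model-a/tas_historical.nc"},
    {"experiment": "historical", "variable": "pr", "model": "model-a",
     "path": "/example/model-a/pr_historical.nc"},
    {"experiment": "ssp585", "variable": "tas", "model": "model-b",
     "path": "/example/model-b/tas_ssp585.nc"},
]


def search(records: List[Dict[str, str]], **facets: str) -> List[Dict[str, str]]:
    """Return the records whose attributes equal every requested facet value."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in facets.items())
    ]


# Example query: surface air temperature from the historical experiment.
for match in search(RECORDS, experiment="historical", variable="tas"):
    print(match["path"])
```

A real service would query an indexed database rather than scan a list, but the matching logic, where every requested attribute must agree, is the same.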
The climate data at NCI are provided according to the FAIR principles: Findable, Accessible, Interoperable and Reusable. Providing a FAIR data service for such a large and complex data collection exposes significant data management challenges. NCI’s Data Quality Strategy (DQS) delivers data curation practices that support FAIR standards and interdisciplinary data availability. This includes a community approach to defining the highest priority CMIP6 data to be replicated in Australia for local analysis, so that scientific research papers analysing the CMIP6 data can be developed and published promptly as the data become available. The outcome of this work is streamlined access to and analysis of CMIP6 data, enabling efficient, state-of-the-art climate science research.
The Climate DeVL is a collaborative project involving NCI, CSIRO, the Bureau of Meteorology and the ARC Centre of Excellence for Climate Extremes, and is co-funded by the Australian Research Data Commons (ARDC). It builds on previous Australian e-infrastructure programs: the Climate & Weather Science Lab and the National Earth Systems Data Collection and Data Services programs. It also supports NCI’s leading role in international collaborations, most notably the Earth System Grid Federation (ESGF), which provides the international federated capability for CMIP data. This work was made possible by funding from the participants and other parties, including the NCRIS programs ANDS, RDS and NeCTAR (now ARDC). The infrastructure directly contributed to outcomes achieved through other major government-funded research investments, including CAWCR (Collaboration for Australian Weather and Climate Research), NESP (National Environmental Science Program), the ARC CoE for Climate System Science (ARCCSS) and the ARC CoE for Climate Extremes (CLEX).