Status update – January 2014
Development of Data Management at NCI
Data services and management form an important and growing component in the NCI portfolio of activities, both complementing and integrating with the comprehensive, high-performance environment that supports computational and data-intensive research nationally. It is thus with pleasure that we announce the recent appointment of Dr Jingbo Wang as NCI’s foundation Manager of Collections Services. Jingbo joins NCI from Geoscience Australia, where she served as Seismic Data Manager, and will lead the development and implementation of a growing capability in data policy, data curation, and life-cycle management, all of which are essential to the sustainable delivery of research data services to NCI’s researchers and stakeholders.
NCI plans to grow the data collections team as opportunities permit, in order to keep pace with the growing emphasis on data-driven research and to enhance the delivery of services aligned with the research objectives of national communities and partner organisations.
NCI’s Storage Installation
Data services from NCI are established on a storage platform that has evolved over the past three years and is closely integrated with the high-performance computing environment. Over the past year, the filesystem has been completely redeveloped to provide both performance and robustness. By March 2014, NCI will have 10 (real) PBytes on disk in a Lustre (parallel) filesystem that is built from a combination of DDN and SGI disk arrays and driven by a large bank of Dell servers to provide a sustained throughput of 25 GBytes per second, identical to the performance of the scratch space on NCI’s previous HPC system (vayu), which was decommissioned in October 2013. This complements the performance of the scratch filesystem on Raijin, the current peak system (a Fujitsu 1.2 Pflop supercomputer), which has a sustainable throughput of 150 GBytes per second, the fastest filesystem in the southern hemisphere.
Of the current installation of 10 PBytes, about 3 PBytes will have been provided with support under the RDSI ReDS program, while the remainder has been put in place through investments from NCRIS, Super Science and NCI partner contributions over the past three years. The Lustre filesystem, which is supplemented by a substantial tape library of four Spectra Logic T950 systems spanning two data centres, forms the backbone of the NCI environment, with each major computational system (supercomputer, cloud facilities) able to mount the Lustre filesystem for high-speed access. The importance of this filesystem to NCI’s operations is reflected in the support structure, which comprises an internal team of six led by Daniel Rodwell, Manager of NCI’s Storage Systems, supplemented by on-site contracted support from DDN and SGI, together with back-end support from Intel (Whamcloud). Further details about the storage platform are available on the website, nci.org.au.
Status of the NCI Node of RDSI
The NCI node of the RDSI network was established with the specific goals of:
- Making accessible significant collections that are currently held by national agencies;
- Complementing these with other nationally and internationally significant collections, and combining datasets held by research communities into coherent collections; and
- Establishing these collections in a rich environment of high-end computational and data-intensive services.
The collection policy is informed by the priorities of the sustaining partners (ANU, CSIRO, BoM and GA), with a particular emphasis on:
- Climate and earth system science,
- Geosciences, including earth observation,
- Astronomy, particularly optical astronomy,
together with support for the life sciences, the physical sciences, and the social sciences and humanities.
As of January 2014, the Allocation Committee had authorised an aggregate of 14.714 PBytes (7.256 PBytes on disk and 7.321 PBytes on tape, first copy), with a further round of allocations anticipated during the next two months. At this time the total ingest comprises 2.619 PBytes (1.197 PBytes on disk and 1.397 PBytes on tape), with the largest collections being CMIP5 (climate), and MODIS and LANDSAT (earth observation). Future contributions to newsletters will highlight such national collections, the services that are built around them, the means of access, and the contribution that they make to particular research communities.