iRODS for the APERTIF Long-Term Archive

The APERTIF upgrade of the Westerbork Synthesis Radio Telescope will be used to carry out a full survey of the radio sky over several years. The data products generated by APERTIF will be stored in ALTA, the Apertif Long Term Archive. Only a small, fixed set of data product types will be supported, which is acceptable given the survey nature of the instrument.

Some numbers characterizing the Apertif Long Term Archive:

  • 4 PB per year of data products
  • Data will be ingested for an estimated 5 years, leading to a total size of 20 PB
  • Number of data products: of order 10 to 100 million
  • Typical data rates: 10 – 20 Gbps
  • Number of users: hundreds (plus thousands of ‘anonymous’ users)
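As a rough consistency check on these figures, the short Python sketch below derives a few quantities from them: the total archive size, the mean size of a data product, and the mean sustained ingest rate. It assumes decimal petabytes (10^15 bytes) and an evenly spread ingest, neither of which is stated above, so the results are only indicative. Under those assumptions the mean data product is between about 200 MB and 2 GB, and the mean ingest rate is about 1 Gbps.

```python
# Back-of-envelope figures for the archive, using the numbers listed above.
# Assumptions (not from the source): decimal units, ingest spread evenly in time.
PB = 10**15                              # bytes in one decimal petabyte
SECONDS_PER_YEAR = 365.25 * 24 * 3600

ingest_pb_per_year = 4
ingest_years = 5
total_bytes = ingest_pb_per_year * ingest_years * PB


def mean_product_size_bytes(n_products):
    """Mean size of a data product if the full archive holds n_products."""
    return total_bytes / n_products


def mean_ingest_gbps():
    """Mean sustained ingest rate in gigabit per second."""
    return ingest_pb_per_year * PB * 8 / SECONDS_PER_YEAR / 1e9


def days_to_move(n_bytes, gbps):
    """Days needed to move n_bytes at a sustained rate of gbps."""
    return n_bytes * 8 / (gbps * 1e9) / 86400


print("total size [PB]:", total_bytes / PB)
print("mean product size [MB]:", mean_product_size_bytes(100_000_000) / 1e6,
      "to", mean_product_size_bytes(10_000_000) / 1e6)
print("mean ingest rate [Gbps]:", round(mean_ingest_gbps(), 2))
print("days to move 1 PB at 10 Gbps:", round(days_to_move(PB, 10), 1))
```

The last line shows why the quoted data rates matter: even at a sustained 10 Gbps, moving a single petabyte between sites takes more than a week.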

For storage management, iRODS (Integrated Rule-Oriented Data System) has been adopted. This middleware maintains a single global namespace through which files (data objects) stored in several locations can be retrieved. It maintains metadata separately from the data and offers built-in Authentication and Authorization (A&A). The ALTA implementation, however, relies on external components for A&A.

Two sites, Dwingeloo and an external site, will become integrated iRODS resources. The Dwingeloo site will serve as a ‘hot’ archive to support the analysis workflows. The external site is expected to provide a tape-based backend for long-term storage.
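The idea of one logical namespace spanning a ‘hot’ disk site and a tape site can be illustrated with a toy model in Python. This is not the iRODS API and not the ALTA implementation. The resource names (`dwingeloo_hot`, `external_tape`), the paths and the "prefer disk over tape" rule are all hypothetical, chosen only to show how a logical path can be resolved to one of several physical replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str        # name of the storage resource holding this copy
    physical_path: str   # where the bytes live on that resource


@dataclass
class DataObject:
    logical_path: str
    replicas: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)   # kept apart from the data


class Catalog:
    """Toy catalog mapping logical paths to replicas (illustration only)."""

    # Hypothetical preference order: the hot disk archive before tape.
    PREFERRED = ["dwingeloo_hot", "external_tape"]

    def __init__(self):
        self._objects = {}

    def register(self, logical_path, resource, physical_path, **metadata):
        """Record one more replica, and optional metadata, for a logical path."""
        obj = self._objects.setdefault(logical_path, DataObject(logical_path))
        obj.replicas.append(Replica(resource, physical_path))
        obj.metadata.update(metadata)
        return obj

    def locate(self, logical_path):
        """Return the replica on the most preferred resource."""
        obj = self._objects[logical_path]
        return min(obj.replicas, key=lambda r: self.PREFERRED.index(r.resource))


catalog = Catalog()
path = "/alta/survey/obs0001/image.fits"            # hypothetical logical path
catalog.register(path, "external_tape", "/tape/a1/0001.fits", product_type="image")
catalog.register(path, "dwingeloo_hot", "/disk/07/0001.fits")

best = catalog.locate(path)
print(best.resource, best.physical_path)
```

A user only ever refers to the logical path. Which site serves the bytes is decided by the catalog, which is what allows data to migrate from disk to tape without changing how it is addressed.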

A central PostgreSQL relational database stores a subset of the metadata of the ALTA data products. Storing a larger fraction of this metadata in a NoSQL database is being considered, to support searches that were not foreseen when the relational schema was designed.
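The split between a fixed relational subset and a larger, schema-free remainder might look roughly like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in for a relational database and a JSON text column as a stand-in for a document store. The table layout, field names and example records are hypothetical and do not describe the actual ALTA schema.

```python
import json
import sqlite3

# In-memory SQLite, used here only as a stand-in for a relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dataproduct (
        logical_path TEXT PRIMARY KEY,
        product_type TEXT NOT NULL,     -- fixed, foreseen search field
        size_bytes   INTEGER NOT NULL,  -- fixed, foreseen search field
        extra        TEXT NOT NULL      -- remaining metadata as a JSON document
    )
    """
)


def register(logical_path, product_type, size_bytes, **extra):
    """Store the fixed fields in columns and everything else as JSON."""
    conn.execute(
        "INSERT INTO dataproduct VALUES (?, ?, ?, ?)",
        (logical_path, product_type, size_bytes, json.dumps(extra)),
    )


def find_by_type(product_type):
    """A foreseen query: answered directly from a fixed column."""
    rows = conn.execute(
        "SELECT logical_path FROM dataproduct "
        "WHERE product_type = ? ORDER BY logical_path",
        (product_type,),
    )
    return [row[0] for row in rows]


def find_by_extra(key, value):
    """An unforeseen query: scans the schema-free part of the metadata."""
    rows = conn.execute(
        "SELECT logical_path, extra FROM dataproduct ORDER BY logical_path"
    )
    return [path for path, extra in rows if json.loads(extra).get(key) == value]


# Hypothetical example records.
register("/alta/obs0001/image.fits", "image", 2_000_000_000, beam=7, pipeline="v1")
register("/alta/obs0001/cube.fits", "cube", 8_000_000_000, beam=7)
register("/alta/obs0002/image.fits", "image", 2_000_000_000, beam=12)

print(find_by_type("image"))
print(find_by_extra("beam", 7))
```

The scan in `find_by_extra` is deliberately naive. A real document store would index such fields, which is the kind of capability that motivates considering a NoSQL database for the larger part of the metadata.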

Apart from the standard iRODS interfaces, such as WebDAV and the native iRODS clients, ASTRON is developing a custom web front-end using the Django framework. The query front-end of ALTA will support a VO (Virtual Observatory) compliant interface.


Contact persons and website

Tammo Jan Dijkema, ASTRON