Qserv: petascale distributed database for LSST

The Large Synoptic Survey Telescope (LSST) [1], currently under construction in Chile, is designed to conduct a ten-year survey of the dynamic universe. This large-aperture, wide-field, ground-based telescope will map the entire southern sky in just a few nights in six optical bands from 320 to 1050 nm with its 3.2-gigapixel camera. LSST will take about 2000 exposures per observing night, for a total raw data volume of about 20 TB per 24-hour period. Detected and measured objects will be stored in a database catalog that, in its final year, is estimated to include 37 billion stars and galaxies and tens of trillions of detections, forming a multi-petabyte data set with a final database size of 15 PB for the last data release.
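To give a feel for these orders of magnitude, the following back-of-envelope sketch derives a few ratios from the figures quoted above. It assumes decimal units (1 TB = 10^12 bytes) and, as a simplification, attributes the whole daily raw volume to the nightly exposures and the whole final database size to the 37 billion objects; both ratios are therefore rough upper bounds, not design values.

```python
# Back-of-envelope ratios from the figures quoted in the text.
# Assumption: decimal units (TB = 1e12 bytes, PB = 1e15 bytes).
TB = 10**12
PB = 10**15

exposures_per_night = 2000      # "about 2000 exposures per observing night"
raw_bytes_per_day = 20 * TB     # "about 20 TB per 24-hour period"
camera_pixels = 3.2e9           # 3.2-gigapixel camera
final_db_bytes = 15 * PB        # final database size, last data release
objects = 37e9                  # stars and galaxies in the final catalog

# Simplification: all raw bytes are attributed to the science exposures.
bytes_per_exposure = raw_bytes_per_day / exposures_per_night
bytes_per_pixel = bytes_per_exposure / camera_pixels

# Simplification: the whole database (all tables) is divided by the object count.
db_bytes_per_object = final_db_bytes / objects

print(f"raw data per exposure : {bytes_per_exposure / 1e9:.1f} GB")
print(f"raw bytes per pixel   : {bytes_per_pixel:.2f}")
print(f"database per object   : {db_bytes_per_object / 1e3:.0f} kB")
```

Under these assumptions, each exposure accounts for roughly 10 GB of raw data, and the final database holds on the order of several hundred kilobytes per catalogued object once all detections are included.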

Service specifications

To satisfy the need to efficiently store, query, and analyze these catalogs, which will ultimately contain trillions of rows and petabytes of data, the LSST database team (~10 software engineers), based at SLAC National Accelerator Laboratory [2] with contributions from CNRS/IN2P3 [3], is building a prototype system for user query access called Query Service, or Qserv [4, 5], an open-source, distributed, shared-nothing SQL database system. The Qserv system relies on several production-quality components, including MariaDB and XRootD [6]. The key requirements driving the LSST database architecture include incremental scaling, near-real-time response for simple ad hoc user queries, fast turnaround for full-sky scans and correlations, reliability, and low cost, all at the multi-petabyte scale.
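The shared-nothing idea can be illustrated with a toy scatter-gather query. The sketch below is not Qserv code and does not use MariaDB or XRootD: it stands in for worker nodes with independent in-memory SQLite databases, and the table name `Object`, its columns, the worker count, and the stripe-based partitioning on right ascension are all hypothetical choices made for illustration.

```python
import sqlite3

NUM_WORKERS = 4  # hypothetical number of worker nodes


def make_worker():
    """Create one independent in-memory database (a stand-in for a worker node)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Object ("
        "objectId INTEGER PRIMARY KEY, ra REAL, decl REAL, mag REAL)"
    )
    return conn


def worker_for(ra):
    """Toy partitioning: equal-width stripes in right ascension over [0, 360)."""
    stripe = int((ra % 360.0) // (360.0 / NUM_WORKERS))
    return min(stripe, NUM_WORKERS - 1)  # guard against floating-point edge cases


def load(workers, rows):
    """Route each row to exactly one worker; no data is shared between workers."""
    for row in rows:
        workers[worker_for(row[1])].execute(
            "INSERT INTO Object VALUES (?, ?, ?, ?)", row
        )


def count_brighter_than(workers, mag_limit):
    """Scatter the same sub-query to every worker, then merge the partial counts."""
    partial = [
        w.execute(
            "SELECT COUNT(*) FROM Object WHERE mag < ?", (mag_limit,)
        ).fetchone()[0]
        for w in workers
    ]
    return sum(partial), partial


workers = [make_worker() for _ in range(NUM_WORKERS)]
# Synthetic rows: (objectId, ra, decl, mag), with magnitudes cycling from 15 to 24.
rows = [(i, (i * 37.0) % 360.0, -30.0, 15.0 + (i % 10)) for i in range(1000)]
load(workers, rows)

total, partial = count_brighter_than(workers, 20.0)
print("partial counts per worker:", partial)
print("merged total:", total)
```

Because each worker holds a disjoint slice of the rows, capacity can in principle be added by adding workers, which is the property the incremental-scaling requirement points at; a full-sky scan touches every worker, while a query confined to a small sky region would only need the workers holding the relevant stripes.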

Development and test status

While still under active development, Qserv is being tested on various data sets. Thanks to a partnership with Dell, a first Qserv test bench (50 nodes, 400 cores, 800 GB memory, 500 TB disk storage) has been deployed at CC-IN2P3 [7] for large-scale tests on a 100 TB synthetic catalog. A second test bench has been deployed as a component of a prototype LSST Data Access Center at NCSA [8] for feature tests on real data from the Wide-field Infrared Survey Explorer (WISE). The CC-IN2P3 test bench will also be used by members of the Dark Energy Science Collaboration (DESC) [9] to exercise the Qserv database on real science cases. The database will be populated with catalogs made from real images (CFHT, Subaru, etc.) processed by the LSST DM software stack [10], as well as catalogs made from simulated images produced by DESC in the context of its second data challenge [11]. Together, the tests performed by the Qserv developers and the DESC members should yield an estimate of the technical performance (query efficiency, scalability, etc.) and scientific performance (user requirements vs. Qserv features) that Qserv currently achieves.
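For orientation, the aggregate figures of the CC-IN2P3 test bench can be broken down per node. The sketch below assumes homogeneous nodes and a synthetic catalog spread evenly across them without replication; neither assumption is stated in the text.

```python
# Per-node breakdown of the first test bench, from the aggregate figures above.
# Assumptions: identical nodes, catalog spread evenly, no replication.
nodes, cores, memory_gb, disk_tb = 50, 400, 800, 500
catalog_tb = 100

per_node = {
    "cores": cores / nodes,
    "memory_gb": memory_gb / nodes,
    "disk_tb": disk_tb / nodes,
    "catalog_tb": catalog_tb / nodes,
}
disk_fill = catalog_tb / disk_tb  # fraction of total disk used by the catalog

for name, value in per_node.items():
    print(f"{name:>10} per node: {value:g}")
print(f"catalog occupies {disk_fill:.0%} of the total disk storage")
```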

Contact person & website

 

References

[1] LSST Science Book, Version 2.0, Paul A. Abell et al., arXiv:0912.0201 (https://www.lsst.org)

[2] https://www6.slac.stanford.edu/

[3] http://www.in2p3.fr/

[4] “Qserv: A distributed shared-nothing database for the LSST catalog” - Daniel L. Wang et al., 2011 International Conference for High Performance Computing, Networking Storage and Analysis (SC), http://ieeexplore.ieee.org/document/6114487

[5] https://github.com/lsst/qserv

[6] https://mariadb.org/ and http://xrootd.org/

[7] https://cc.in2p3.fr/

[8] http://www.ncsa.illinois.edu/

[9] LSST: Dark Energy Science Collaboration, arXiv:1211.0310 (http://lsst-desc.org/)

[10] “The LSST Data Management System” - Mario Jurić et al., arXiv:1512.07914

[11] “LSST DESC 2018a”, in preparation

Author: 
CNRS-LAPP