CORELib (Cosmic Ray Event Library) is a collection of simulated cosmic-ray shower events. The production is currently based on the CORSIKA software and uses a common set of physical parameters in order to achieve a general-purpose, high-statistics production. Cosmic rays are a source of background for many astroparticle and astronomy experiments, but at the same time they provide a useful benchmarking tool to assess detector performance.
Especially in the high-energy region of the spectrum, simulation is very demanding in terms of CPU time per event. A reference set of simulations such as CORELib can be used “as-is” in several cases, saving computing resources (wall time, CPU time and energy) and providing a common playground for comparing reconstruction algorithms and detector performance; when custom running parameters are needed, it can serve as a benchmark that reduces the effort spent on debugging and fine-tuning.
CORELib consists of a “pilot production” (approximately 0.6 TB) in which only proton-induced showers are simulated; the energy spectrum follows a power law with spectral index -2. The “full scale production”, instead, covers several kinds of primary cosmic rays: showers induced by protons and by heavy nuclei (He, C, N, O, Fe) are simulated (see Table 1). For proton-induced showers, two productions are available: with and without Cherenkov radiation (see Table 2). In addition, to increase the statistics at high energies, a production with a flat energy spectrum (proton-induced showers only) is also being generated (see Table 2).
Table 1 (on the left): Summary of the heavy-nuclei-induced showers produced. Currently, only the production without Cherenkov radiation is complete.
Table 2 (on the right): Summary of the proton-induced showers produced. The production with α = -2 is complete, while the flat-spectrum production is ongoing.
To simplify access to the library, the information about secondary cosmic rays is extracted from the CORSIKA output files and written to separate ASCII files (EM, Hadrons+Tau, Muons, Neutrinos).
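As an illustration of this kind of splitting, the sketch below groups secondaries by their CORSIKA particle code. The particle codes are those of the CORSIKA particle table; the mapping onto the four file categories, the function names and the record layout are assumptions made for the example and are not taken from the CORELib extraction code.

```python
from collections import defaultdict

# CORSIKA particle codes; the grouping into CORELib categories is an assumption.
EM_CODES = {1, 2, 3}                            # gamma, e+, e-
MUON_CODES = {5, 6}                             # mu+, mu-
NEUTRINO_CODES = {66, 67, 68, 69, 133, 134}     # nu_e, nu_mu, nu_tau and antiparticles


def category(particle_code):
    """Return the (hypothetical) output category for a CORSIKA particle code."""
    if particle_code in EM_CODES:
        return "EM"
    if particle_code in MUON_CODES:
        return "Muons"
    if particle_code in NEUTRINO_CODES:
        return "Neutrinos"
    # Assumption: everything else (hadrons, tau leptons 131/132) goes here.
    return "Hadrons+Tau"


def split_secondaries(records):
    """Group (particle_code, energy_GeV) records by category."""
    groups = defaultdict(list)
    for code, energy in records:
        groups[category(code)].append((code, energy))
    return dict(groups)


# Hypothetical records: gamma, mu-, proton, nu_mu, tau+
example = [(1, 0.5), (6, 12.0), (14, 3.2), (68, 7.1), (131, 40.0)]
groups = split_secondaries(example)
```

In this sketch each category could then be written to its own ASCII file, so that a user interested only in muons, for instance, never has to parse the electromagnetic component.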
Both productions are stored at CNAF (see below).
The CORELib production currently uses more than 1000 CPU cores allocated to the KM3NeT Collaboration, but its technical specifications exceed the needs of KM3NeT, in the spirit of serving the community of astroparticle experiments and potentially other fields of application. The zenith angle range extends from 0° (vertical) to 89°. Energy bins are populated as shown in Table 3:
|Energy range (GeV)||Number of events|
Table 3: Number of simulated events for each energy bin.
Several high-energy interaction models are available (QGSJET01, QGSJETII-04, EPOS LHC) in combination with GHEISHA for the low-energy interaction model, and with TAULEP/CHARM options.
The first productions (about 0.6 TB) are stored and available via SFTP on a local server hosted at the University of Salerno:
email@example.com (password: Asterics2020)
The whole production completed so far (April 2019, about 30 TB) is stored at CNAF, the national information technology centre of INFN (the Italian National Institute for Nuclear Physics). It can be downloaded through GridFTP with an X.509 certificate using the endpoint:
subject to prior authorization by the administrator (firstname.lastname@example.org).
Besides the CNAF repository, members of the KM3NeT Virtual Organization can access the whole production via the GRID using the storage endpoints:
CORELib is a library of simulated cosmic ray events. Cosmic ray-induced particle showers are a source of background for many experiments seeking rare phenomena and high-energy particles of astrophysical and cosmological origin. Conversely, secondary cosmic rays such as high-energy muons provide a tool for investigation in application fields such as muography (i.e. using muons to image the interior of volcanic edifices and faults, buildings that are part of the cultural and historical heritage, nuclear waste depots and reactors, etc.). Simulations of primary cosmic ray interactions in the atmosphere and of the secondary fluxes are expensive in terms of setup, computation time, electrical power and data storage. CORELib can be used as a ready-made source of data as well as a reference benchmark for other simulations.
A first production was made with spectral index -2. Simulation of Cherenkov radiation for proton-induced showers was completed in this period. In parallel, data production was completed including heavy nuclei.
The spectral index of -2 is relatively close to the actual energy dependence of the flux, but it yields low statistics at very high energy. With a view to using CORELib products to develop algorithms and train machine learning models to recognise rare high-energy events, another data production with a flat logarithmic spectrum for proton primaries was set up.
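The difference between the two spectra can be made concrete with a minimal sketch based on inverse-transform sampling. Here gamma is taken as a positive number such that dN/dE is proportional to E^(-gamma), so the spectral index -2 corresponds to gamma = 2 and a spectrum flat in log E corresponds to gamma = 1. The energy limits of 10^3 and 10^9 GeV are placeholders chosen for the example, not the actual CORELib range, and the function names are hypothetical.

```python
import math
import random


def sample_energy(u, e_min, e_max, gamma):
    """Map a uniform number u in [0, 1] to an energy with dN/dE ~ E**(-gamma)."""
    if gamma == 1.0:
        # Flat in log E: log(E) is uniform between log(e_min) and log(e_max).
        return e_min * (e_max / e_min) ** u
    a = 1.0 - gamma
    return (e_min**a + u * (e_max**a - e_min**a)) ** (1.0 / a)


def fraction_above(e0, e_min, e_max, gamma):
    """Expected fraction of events with energy above e0."""
    if gamma == 1.0:
        return math.log(e_max / e0) / math.log(e_max / e_min)
    a = 1.0 - gamma
    return (e_max**a - e0**a) / (e_max**a - e_min**a)


# Placeholder energy range in GeV (assumption, not the CORELib binning).
E_MIN, E_MAX = 1.0e3, 1.0e9

rng = random.Random(42)
energies = [sample_energy(rng.random(), E_MIN, E_MAX, 2.0) for _ in range(1000)]

frac_index2 = fraction_above(1.0e6, E_MIN, E_MAX, 2.0)
frac_flat = fraction_above(1.0e6, E_MIN, E_MAX, 1.0)
```

With these placeholder limits, about 0.1% of the events fall above 10^6 GeV for gamma = 2, against 50% for the flat logarithmic spectrum, which is the motivation given above for the second production.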
CORELib covers inclinations from 0° to 89°, which makes it suitable for different applications, from underwater neutrino telescopes to air showers and muography of large objects above sea level.
The efforts required included designing the simulation jobs for the CORSIKA 7.5000 framework, managing a large production over thousands of computation nodes on the GRID, checking data quality, and writing code to cast the output in a form that simplifies data access and usage, in particular by splitting the particles produced into different categories.
In the near future, simulation of Cherenkov radiation is also planned for heavy-nuclei-induced events. Contacts are ongoing to involve a larger community from several ESFRIs in joint efforts. Not only will this version of CORELib remain a reference production, but it is also promoting cross-fertilization and cooperation among researchers with different interests and cultural backgrounds.
We wanted to provide a reliable set of simulated cosmic ray events for users who cannot or do not want to afford the time and cost of a large-scale simulation. We did it by identifying a suitable version of CORSIKA as the generator and investing a large but sustainable amount of computing resources.
Unlike usual simulations, the library of events produced features a particularly hard spectrum, extending to very high energies. This allows high statistics for very rare events, which are the most interesting for some classes of problems (e.g. diffuse ultra-high energy neutrino flux).
We found it was useful to simulate a flat log-E spectrum, whereas the first production had spectral index -2, resulting in fewer high energy events.
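Events generated with one spectrum can be reweighted to another when a physical flux is needed. The sketch below is a generic illustration of this standard technique, not a description of any CORELib tool: with dN/dE proportional to E^(-gamma), an event of energy E generated with gamma_gen receives a weight proportional to E^(gamma_gen - gamma_target). The function name and the example energies are hypothetical.

```python
def reweight(energies, gamma_gen, gamma_target):
    """Return normalized weights converting an E**(-gamma_gen) sample
    into an E**(-gamma_target) spectrum."""
    raw = [e ** (gamma_gen - gamma_target) for e in energies]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical energies in GeV, generated flat in log E (gamma_gen = 1)
# and reweighted to spectral index -2 (gamma_target = 2).
example_energies = [1.0e3, 1.0e4]
weights = reweight(example_energies, gamma_gen=1.0, gamma_target=2.0)
```

In this example the lower-energy event carries ten times the weight of the higher-energy one, so the flat production keeps its high-energy statistics while weighted distributions follow the steeper spectrum.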
The lower end of the energy spectrum also needs more statistics. It would be useful to simulate several kinds of atmosphere and observation levels at different altitudes.