The Cosmology Data Repository curates publicly available cosmology datasets and co-locates them at NERSC alongside the computing resources needed to analyze them. This enables work such as multimodal AI model training, which requires simultaneous access to petabyte-scale datasets and supercomputer-level resources.
The DESI Legacy Imaging Surveys jointly fit imaging data from multiple ground- and space-based telescopes to generate a catalog of 2.8 billion objects spanning more than 20,000 square degrees, measured in four optical bands (g, r, i, z) and four infrared bands.
The Dark Energy Spectroscopic Instrument (DESI) Data Release 1 includes spectra and redshifts of 18 million galaxies, quasars, and stars.
The Multimodal Universe (MMU) gathered 100 TB of data from 20 major astronomical surveys into a common structure to simplify training AI foundation models.
The AstronomIcal Omnimodal Network (AION-1) is an AI foundation model built upon the Multimodal Universe dataset.
Other datasets in the Cosmology Data Repository include 2MASS, CFHT-Luau, Gaia, GALEX, SCUSS, SDSS, Tycho-2, UCAC, WISE, and ZTF light curves.
The primary purpose of the Cosmology Data Repository is to provide
large-scale, direct access to the files at NERSC, which are located at
/dvs_ro/cfs/cdirs/cosmo/data/ .
For browsing the directory structure and downloading individual
files, the same data are also available at
https://portal.nersc.gov/project/cosmo/data/ .
For bulk downloads, users are encouraged to get the data from
the original sources when possible, though the data are also
available for download via the
NERSC DTN Globus endpoint.
In the future, we will coordinate with the
American Science Cloud
(AmSC) to provide these datasets within the AmSC infrastructure
as part of the Genesis Mission.
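The two access routes described above (direct reads on the NERSC filesystem, or single-file downloads from the web portal) can be sketched in Python as follows. This is a minimal, hypothetical helper, not an official tool: it assumes that the portal at https://portal.nersc.gov/project/cosmo/data/ mirrors the directory layout under /dvs_ro/cfs/cdirs/cosmo/data/, and the function names `locate` and `fetch` are illustrative only.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Both roots come from the text; that they share one layout is an ASSUMPTION.
NERSC_ROOT = Path("/dvs_ro/cfs/cdirs/cosmo/data")
PORTAL_ROOT = "https://portal.nersc.gov/project/cosmo/data/"


def locate(relpath: str) -> str:
    """Hypothetical helper: return the NERSC filesystem path if it exists
    (i.e. we are running at NERSC), otherwise the matching portal URL."""
    relpath = relpath.lstrip("/")
    local = NERSC_ROOT / relpath
    if local.exists():
        return str(local)
    # Percent-encode the path for use in a URL, keeping "/" separators.
    return PORTAL_ROOT + urllib.parse.quote(relpath)


def fetch(relpath: str, dest_dir: str = ".") -> str:
    """Hypothetical helper: read in place at NERSC, or download one file
    from the portal into dest_dir. Returns the usable local path."""
    where = locate(relpath)
    if not where.startswith("https://"):
        return where  # Already on NERSC: no copy needed.
    dest = Path(dest_dir) / Path(relpath).name
    urllib.request.urlretrieve(where, dest)
    return str(dest)
```

For example, `fetch("some/dataset/file.fits")` (a placeholder path, not a real file in the repository) would return the filesystem path when run at NERSC and download a copy elsewhere. As the text notes, this per-file route suits browsing and occasional downloads; bulk transfers should use the original data sources or the NERSC DTN Globus endpoint instead.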