Cosmology Data Repository

The Cosmology Data Repository curates publicly available cosmology datasets and co-locates them at NERSC alongside the computing resources necessary to analyze them. This enables work such as multimodal AI model training, which requires simultaneous access to petabyte-scale datasets and supercomputer-level resources.

Flagship Datasets

DESI Imaging Legacy Surveys

The DESI Imaging Legacy Surveys jointly fit imaging data from multiple ground- and space-based telescopes to generate a catalog of 2.8 billion objects, measured in 4 optical bands (g, r, i, z) and 4 infrared bands, covering more than 20,000 square degrees.

DESI Spectroscopy

The Dark Energy Spectroscopic Instrument (DESI) Data Release 1 includes spectra and redshifts of 18 million galaxies, quasars, and stars.

Multimodal Universe

The Multimodal Universe (MMU) gathered 100 TB of data from 20 major astronomical surveys into a common structure to simplify training AI foundation models.

AION-1

The AstronomIcal Omnimodal Network (AION-1) is an AI foundation model built upon the Multimodal Universe dataset.

Additional Datasets

Other datasets in the Cosmology Data Repository include 2MASS, CFHT-Luau, Gaia, GALEX, SCUSS, SDSS, Tycho-2, UCAC, WISE, and ZTF lightcurves.

Data Access

The primary purpose of the Cosmology Data Repository is to provide large-scale direct access to the files at NERSC, which are located at /dvs_ro/cfs/cdirs/cosmo/data/ . The same files are also available at https://portal.nersc.gov/project/cosmo/data/ for browsing the directory structure and downloading individual files. For bulk downloads, users are encouraged to get the data from the original sources when possible, though the files are also available for download via the NERSC DTN Globus endpoint. In the future, we will coordinate with the American Science Cloud (AmSC) to provide these datasets within the AmSC infrastructure as part of the Genesis Mission.
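
As a rough illustration of how the two access points described above relate, the following Python sketch converts a file path under the NERSC directory into a portal URL. It assumes that the portal mirrors the on-disk directory layout one-to-one, which this page implies but does not guarantee. The function name portal_url and the example file path are hypothetical and only serve the illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Both roots are quoted from the Data Access text above.
NERSC_ROOT = PurePosixPath("/dvs_ro/cfs/cdirs/cosmo/data")
PORTAL_ROOT = "https://portal.nersc.gov/project/cosmo/data"


def portal_url(nersc_path: str) -> str:
    """Return the presumed portal URL for a path under NERSC_ROOT.

    Assumption: the portal exposes the same relative layout as the
    NERSC file system. Raises ValueError for paths outside NERSC_ROOT.
    """
    relative = PurePosixPath(nersc_path).relative_to(NERSC_ROOT)
    if not relative.parts:
        # The repository root itself maps to the portal root.
        return PORTAL_ROOT + "/"
    # Percent-encode any characters that are unsafe in a URL path.
    return f"{PORTAL_ROOT}/{quote(relative.as_posix())}"


if __name__ == "__main__":
    # Hypothetical path, used only to show the mapping.
    example = "/dvs_ro/cfs/cdirs/cosmo/data/example_survey/example_file.fits"
    print(portal_url(example))
```

A helper like this may be handy when an analysis developed at NERSC needs to point collaborators without NERSC accounts at an individual file. For bulk transfers, the original sources or the Globus endpoint mentioned above remain the better route.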

Partners