This work makes available a first survey-scale Lyman-α hydrodynamical snapshot produced with the Nyx code and our ML methodology. The snapshot spans 960 Mpc/h on a uniform 6144³ grid at redshift z = 3. The main hydrodynamic fields include baryon density, temperature, and three components of velocity, along with Lyman-α optical-depth products and simulation metadata. The release is complemented by a halo catalog from a companion high-resolution N-body simulation run with the HACC code.
The main paper describing the gigaparsec volume hosted at this portal is Jacobus et al. (2025).
The publication describing the machine learning methodology and the training data is Jacobus et al. (2023).
The Nyx simulation output is currently a single large HDF5 file that stores all the relevant hydrodynamical fields as 3D arrays over the entire (960 Mpc/h)³ volume. These fields are discretized on a uniform grid of 6144³ cells. The entire file occupies 11.2 TB. We are working on functionality to extract subvolumes from this simulation.
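To put these numbers in perspective, the sketch below estimates the footprint of a single field and of a cubic subvolume. It assumes 4-byte (float32) values, which this page does not state, and the helper names are purely illustrative.

```python
# Rough memory planning for the 6144^3 grid.
# ASSUMPTION: each field value is a 4-byte float (not stated on this page).

BOX_MPCH = 960.0      # box side length in Mpc/h (from the text)
N_GRID = 6144         # cells per side (from the text)
BYTES_PER_VALUE = 4   # assumed float32


def field_bytes(n_side: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to hold one scalar field on an n_side^3 grid."""
    return n_side**3 * bytes_per_value


def cells_for_length(length_mpch: float) -> int:
    """Number of cells spanned by a physical length, rounded to nearest."""
    return round(length_mpch * N_GRID / BOX_MPCH)


full = field_bytes(N_GRID)
sub = field_bytes(cells_for_length(80.0))
print(f"one full field:       {full / 1e12:.2f} TB")
print(f"one 80 Mpc/h subcube: {sub / 1e9:.2f} GB")
```

Under this assumption a single full-volume field is about 0.93 TB, while an 80 Mpc/h cube (512 cells per side) is about 0.54 GB per field.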
Hydrodynamic outputs are stored in HDF5 with the following group layout:
/native_fields/: dm_density, baryon_density, temperature, velocity_x, velocity_y, velocity_z.
/derived_fields/: tau_real, tau_red (Lyman-α optical depth in real and redshift space).
/universe, /domain: cosmology, units, grid specifications, and snapshot redshift.
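Because HDF5 supports partial reads, a region of one field can in principle be loaded without reading the whole file. The following is a minimal sketch using h5py, not an official access tool. It assumes each dataset is a 3D array indexed [x, y, z] and that the requested cube lies fully inside the box; the helper names and the file name in the usage note are hypothetical.

```python
# Sketch: read a cubic subvolume of one field via an HDF5 partial read.
# ASSUMPTIONS: datasets are 3D arrays indexed [x, y, z]; helper names are
# illustrative and not part of the data release.

BOX_MPCH = 960.0
N_GRID = 6144
CELL_MPCH = BOX_MPCH / N_GRID  # 0.15625 Mpc/h per cell


def subvolume_slices(origin_mpch, side_mpch):
    """Map a cube (corner position + side length, in Mpc/h) to index slices.

    No periodic wrapping is applied: the cube must lie inside the box.
    """
    n = round(side_mpch / CELL_MPCH)
    slices = []
    for x0 in origin_mpch:
        i0 = int(x0 // CELL_MPCH)
        if i0 < 0 or i0 + n > N_GRID:
            raise ValueError("subvolume extends beyond the box")
        slices.append(slice(i0, i0 + n))
    return tuple(slices)


def read_subvolume(path, dataset, origin_mpch, side_mpch):
    """Read only the requested region of one dataset from disk."""
    import h5py  # third-party; imported lazily so the helper above stays dependency-free

    with h5py.File(path, "r") as f:
        return f[dataset][subvolume_slices(origin_mpch, side_mpch)]
```

For example, `read_subvolume("snapshot.h5", "native_fields/baryon_density", (0.0, 100.0, 880.0), 80.0)` would return a 512³ array, where `snapshot.h5` stands in for the actual file name. Read performance will depend on how the file is chunked on disk.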
The baryon density, temperature, and Lyman-α optical depth fields have been reconstructed by our machine learning algorithm. The dark matter density and velocity fields remain as originally returned by the low-resolution simulation.
A companion N-body simulation run with the HACC code uses the same initial conditions to evolve 6144³ dark-matter particles from z = 200 to z = 3. Halos are identified using a friends-of-friends (FOF) algorithm with a linking length of b = 0.168, and a spherical-overdensity (SO) definition with Δ = 200 ρc. The halo catalog is provided in the GenericIO format.
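The halo-finder parameters can be turned into physical scales with a short calculation. This sketch assumes the usual convention that b is expressed in units of the mean interparticle separation, and that the N-body box has the same 960 Mpc/h side as the Nyx snapshot (suggested by the shared initial conditions, but not stated explicitly). The critical density is left as an input because the cosmological parameters are not quoted on this page.

```python
import math

BOX_MPCH = 960.0     # ASSUMPTION: N-body box side equals the Nyx box side
N_PART_SIDE = 6144   # particles per side (from the text)
B_LINK = 0.168       # FOF linking parameter (from the text)
DELTA = 200.0        # SO overdensity relative to critical (from the text)


def linking_length(box, n_side, b):
    """FOF linking length: b times the mean interparticle spacing."""
    return b * box / n_side


def so_mass(radius, rho_crit, delta=DELTA):
    """Mass of a sphere whose mean density equals delta * rho_crit."""
    return 4.0 / 3.0 * math.pi * radius**3 * delta * rho_crit


ll = linking_length(BOX_MPCH, N_PART_SIDE, B_LINK)
print(f"linking length = {ll * 1e3:.2f} kpc/h")
```

Under these assumptions the linking length is 26.25 kpc/h, that is, 0.168 times a mean interparticle spacing of 156.25 kpc/h.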
For reproducibility, we also provide the training and test data from our methods paper (Jacobus et al. 2023). These consist of pairs of HDF5 files representing the same 80 Mpc/h volumes at two resolutions, 4096³ and 512³; one pair was used for training and the other for testing and validation.
Important note on file sizes: each 512³ volume is 6.1 GB, each 4096³ volume is 3.1 TB, and the 6144³ volume is 11.2 TB! We are working on a way to extract subvolumes.
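As a quick consistency check, the quoted file sizes can be compared against the grid dimensions. The sketch below assumes that "GB" and "TB" are decimal units (10⁹ and 10¹² bytes); the labels and variable names are illustrative.

```python
# Cross-check of the quoted file sizes against the grid dimensions.
# ASSUMPTION: "GB"/"TB" are decimal units (1e9 / 1e12 bytes).

datasets = {
    # label: (box side in Mpc/h, cells per side, quoted file size in bytes)
    "512^3 (80 Mpc/h)": (80.0, 512, 6.1e9),
    "4096^3 (80 Mpc/h)": (80.0, 4096, 3.1e12),
    "6144^3 (960 Mpc/h)": (960.0, 6144, 11.2e12),
}

summary = {}
for label, (box, n, size) in datasets.items():
    cell_kpch = 1e3 * box / n        # cell side length in kpc/h
    bytes_per_cell = size / n**3     # average bytes stored per grid cell
    summary[label] = (cell_kpch, bytes_per_cell)
    print(f"{label:>20}: cell = {cell_kpch:7.2f} kpc/h, {bytes_per_cell:5.1f} bytes/cell")
```

The 512³ training volumes and the 6144³ snapshot share the same 156.25 kpc/h cell size, and all three files work out to roughly 45 to 48 bytes per cell, so file size scales essentially with the number of cells.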
The model is a conditional generative adversarial network with a fully convolutional, multi-scale encoder–generator built from residual blocks.
It conditions on coarse hydrodynamic fields (baryon density, temperature, and three components of velocity) and outputs the corresponding enhanced fields and the Lyman-α optical depths.
To represent unresolved stochastic structure, the generator injects learned Gaussian noise at several internal scales. Inputs are log-normalized before inference, and outputs use field-specific activations (e.g., tanh for hydrodynamic variables, sigmoid for flux proxies) prior to restoring physical units.
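The normalization and activation steps can be illustrated with a small sketch. The exact constants and mappings used by the model are not given here, so everything below is an assumption made for illustration: fields are mapped to [-1, 1] by linearly rescaling log10 of the value between placeholder bounds, and the sigmoid output is interpreted as a transmitted flux F = exp(-τ). The functions act on single floats for clarity.

```python
import math

# ASSUMPTIONS (not from the source): the log10 bounds are placeholders, the
# [-1, 1] rescaling is one plausible choice, and the sigmoid output is read
# as a transmitted flux F = exp(-tau).

LOG_MIN, LOG_MAX = -2.0, 4.0  # hypothetical log10 range of a field


def log_normalize(value):
    """Physical value -> network input in [-1, 1]."""
    t = (math.log10(value) - LOG_MIN) / (LOG_MAX - LOG_MIN)
    return 2.0 * t - 1.0


def denormalize_tanh(raw):
    """Raw network output -> physical value, via a tanh activation."""
    a = math.tanh(raw)  # bounded in (-1, 1)
    log_v = LOG_MIN + 0.5 * (a + 1.0) * (LOG_MAX - LOG_MIN)
    return 10.0 ** log_v


def tau_from_sigmoid(raw):
    """Raw network output -> optical depth, via a sigmoid 'flux' in (0, 1)."""
    flux = 1.0 / (1.0 + math.exp(-raw))
    return -math.log(flux)
```

With this choice the bounded activations guarantee that restored values stay inside the assumed physical range: `denormalize_tanh` inverts `log_normalize` when composed with `math.atanh`, and the optical depth from `tau_from_sigmoid` is always positive.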
Adversarial training employs multi-scale, patch-based discriminators to encourage realism across both local textures and larger-scale coherence.
The model reliably reconstructs realizations of the hydrodynamical fields that share the large-scale morphology of the input low-resolution simulations while greatly improving small-scale features. On small scales, the reconstructed fields match the high-resolution simulations much more closely, despite being of much lower resolution. This is possible because of the model's generative, stochastic nature: it distills the injected small-scale noise into realistic features that complement the large-scale morphology of the input maps.
If you find this data useful for your research, please cite Jacobus et al. (2025) and/or Jacobus et al. (2023). Please also include this data portal URL in any data availability statements.
This project was supported by Berkeley Lab's LDRD program (PI: Zarija Lukić). Computational resources were provided by the Oak Ridge Leadership Computing Facility (OLCF) and the National Energy Research Scientific Computing Center (NERSC). Long-term file storage is provided by NERSC.