Nyx Gpc Hydrodynamical Volume

Overview

This work makes available a first survey-scale Lyman-α hydrodynamical snapshot, produced with the Nyx code and our machine learning (ML) methodology. The snapshot spans 960 Mpc/h on a uniform 6144³ grid at redshift z = 3. The main hydrodynamic fields are baryon density, temperature, and the three velocity components; Lyman-α optical-depth products and simulation metadata are also included. The release is complemented by a halo catalog from a companion high-resolution N-body simulation run with the HACC code.

Visualization of the entire volume. — Visualization of the baryon density for the entire 960 Mpc/h wide volume that we have improved using our machine learning model. The dashed box highlights a region 80 Mpc/h wide, which is the size of our training data and comparable to previous Lyman-alpha forest simulations used for DESI and other surveys (see, for example: Walther et al. (2025)).

Publications

The main paper describing the gigaparsec volume hosted at this portal is Jacobus et al. (2025).

The machine learning methodology and the training data are described in Jacobus et al. (2023).

Data Products

The Nyx simulation output is currently a single large HDF5 file containing all relevant hydrodynamical fields as 3D arrays over the entire (960 Mpc/h)³ volume, discretized on a uniform grid of 6144³ cells. The file occupies 11.2 TB. We are working on providing functionality to extract subvolumes from this simulation.
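The grid size alone explains why subvolume extraction matters. The sketch below estimates the footprint of a single field and of an 80 Mpc/h cube; the per-value byte sizes (float32 or float64) are assumptions, since the storage dtypes are not documented on this page.

```python
# Back-of-the-envelope sizes for fields on the Gpc grid.
# ASSUMPTION: one value per cell, stored as float32 (4 bytes) or float64 (8 bytes);
# the actual on-disk dtypes are not stated on this page.

N_GPC = 6144        # cells per side of the Gpc grid
BOX_MPC_H = 960.0   # box side length in Mpc/h


def field_bytes(n_per_side: int, bytes_per_value: int) -> int:
    """Bytes needed to hold one n^3 field with the given element size."""
    return n_per_side ** 3 * bytes_per_value


def cells_for_side(side_mpc_h: float, box_mpc_h: float = BOX_MPC_H, n: int = N_GPC) -> int:
    """Number of grid cells spanning a cube edge of the given physical length."""
    cell = box_mpc_h / n  # cell size in Mpc/h
    return round(side_mpc_h / cell)


print(f"cells in the full grid: {N_GPC ** 3:,}")
print(f"one float32 field: {field_bytes(N_GPC, 4) / 1e12:.2f} TB")
print(f"one float64 field: {field_bytes(N_GPC, 8) / 1e12:.2f} TB")

side = cells_for_side(80.0)
print(f"80 Mpc/h subvolume: {side}^3 cells, "
      f"{field_bytes(side, 4) / 1e9:.2f} GB per float32 field")
```

Under these assumptions a single float32 field on the full grid is about 0.93 TB, while an 80 Mpc/h cube is 512³ cells, about 0.54 GB per float32 field.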

Hydrodynamic outputs are stored in HDF5 with the following group layout:

/native_fields/: dm_density, baryon_density, temperature, velocity_x, velocity_y, velocity_z.
/derived_fields/: tau_real, tau_red (Lyman-α optical depth in real and redshift space).
/universe, /domain: cosmology, units, grid specifications, and snapshot redshift.
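A minimal sketch of how one might inspect this layout and read only part of a field with h5py, assuming each field listed above is a 3D dataset at the path shown, indexed as [x, y, z]. The file name is hypothetical, and since it is not stated whether the /universe and /domain metadata are stored as datasets or attributes, `describe` prints both. The flux helper uses the standard relation F = exp(-τ) between optical depth and transmitted flux.

```python
import h5py
import numpy as np

PATH = "nyx_gpc_z3.h5"  # HYPOTHETICAL local file name


def describe(path):
    """Print every group and dataset (with shape and dtype) plus any attributes."""
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
        else:
            print(f"{name}/")
        for key, value in obj.attrs.items():
            print(f"    @{key} = {value}")

    with h5py.File(path, "r") as f:
        f.visititems(visit)


def read_subvolume(path, field, start, size):
    """Read a size^3 cube of `field` whose corner is at cell index `start`.

    Slicing an h5py Dataset reads only the requested hyperslab from disk,
    so the full file is never loaded into memory.
    """
    ix, iy, iz = start
    with h5py.File(path, "r") as f:
        return f[field][ix:ix + size, iy:iy + size, iz:iz + size]


def mean_flux(tau):
    """Mean transmitted Lyman-alpha flux from optical depth, using F = exp(-tau)."""
    return float(np.mean(np.exp(-tau)))
```

For example, `read_subvolume(PATH, "/derived_fields/tau_red", (0, 0, 0), 512)` would read a 512³ cube (80 Mpc/h on a side at this grid spacing) of the redshift-space optical depth, which `mean_flux` then reduces to a single number.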

The baryon density, temperature, and Lyman-α optical depth fields have been reconstructed by our machine learning model. The dark matter density and velocity fields are unchanged from the original low-resolution simulation output.

A companion N-body simulation run with the HACC code uses the same initial conditions to evolve 6144³ dark-matter particles from z = 200 to z = 3. Halos are identified with a friends-of-friends (FOF) algorithm using a linking length b = 0.168 (in units of the mean interparticle spacing), and with a spherical-overdensity criterion of Δ = 200 ρ_c. The halo catalog is provided in the GenericIO format.
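To make the linking-length convention concrete: assuming, as is usual for FOF, that b is measured in units of the mean interparticle spacing, the physical linking length for this run is 0.168 × 960/6144 Mpc/h = 26.25 kpc/h. The toy friends-of-friends below illustrates the grouping rule with a brute-force O(N²) pair search and union-find. It is an illustrative sketch only, not HACC's implementation, and it omits the spherical-overdensity step.

```python
import itertools


def linking_length(box, n_side, b):
    """Physical linking length: b times the mean interparticle spacing box / n_side."""
    return b * box / n_side


def fof_groups(positions, box, ll):
    """Toy friends-of-friends on a periodic cube of side `box`.

    Any two particles within `ll` of each other are friends; groups are the
    connected components of the friendship graph, found with union-find.
    Returns lists of particle indices, largest group first.
    """
    parent = list(range(len(positions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(positions)), 2):
        d2 = 0.0
        for a, c in zip(positions[i], positions[j]):
            d = abs(a - c)
            d = min(d, box - d)  # minimum-image separation in a periodic box
            d2 += d * d
        if d2 <= ll * ll:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two groups

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

Note that friendship is transitive through chains: two particles farther apart than the linking length still end up in the same group if intermediate friends connect them.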

For reproducibility, we also provide the training and test data from our methods paper (Jacobus et al. 2023). These consist of pairs of HDF5 files, each pair representing the same 80 Mpc/h volume at two resolutions, 4096³ and 512³. One pair was used for training and the other for testing/validation.
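The grid spacings implied by these numbers follow from a few divisions. The Gpc grid (960 Mpc/h over 6144 cells) turns out to have the same 156.25 kpc/h spacing as the 512³ training input, which appears consistent with applying the trained model at its training resolution; the script below only does the arithmetic.

```python
# Grid spacing (box side / cells per side) for each grid mentioned on this page.
GRIDS = {
    "training/test 4096^3": (80.0, 4096),
    "training/test 512^3": (80.0, 512),
    "Gpc volume 6144^3": (960.0, 6144),
}


def cell_size(box_mpc_h, n_per_side):
    """Cell edge length in Mpc/h."""
    return box_mpc_h / n_per_side


for name, (box, n) in GRIDS.items():
    print(f"{name:22s} {cell_size(box, n) * 1e3:7.2f} kpc/h")

# Ratio between the low- and high-resolution training grids (per dimension).
print("refinement factor:", cell_size(80.0, 512) / cell_size(80.0, 4096))
```

This prints spacings of 19.53, 156.25, and 156.25 kpc/h, and a refinement factor of 8.0.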

Data Download

Gpc Hydro Volume
Halo Catalog
Training 4096³ Volume
Training 512³ Volume
Test 4096³ Volume
Test 512³ Volume

Important note on file sizes: each 512³ volume is 6.1 GB, each 4096³ volume is 3.1 TB, and the 6144³ volume is 11.2 TB! We are working on a way to provide subvolume extraction.

Machine Learning Approach

The model is a conditional generative adversarial network (GAN) with a fully convolutional, multi-scale encoder–generator built from residual blocks. It is conditioned on coarse hydrodynamic fields (baryon density, temperature, and the three velocity components) and outputs enhanced baryon density and temperature fields together with the Lyman-α optical depths. To represent unresolved stochastic structure, the generator injects learned Gaussian noise at several internal scales. Inputs are log-normalized before inference, and outputs pass through field-specific activations (e.g., tanh for hydrodynamic variables, sigmoid for flux proxies) before being converted back to physical units. Adversarial training uses multi-scale, patch-based discriminators to encourage realism in both local textures and large-scale coherence.
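The input and output transforms described above can be sketched with NumPy. The log10 transform with a linear rescale to [-1, 1], and the bounds `LOG_MIN`/`LOG_MAX`, are hypothetical choices for illustration; the actual normalization constants are not stated on this page. Likewise, reading the sigmoid output as a transmitted flux F and recovering τ = -ln F is an assumption about what "flux proxy" means here.

```python
import numpy as np

# HYPOTHETICAL bounds of the field in log10 space (not taken from the paper).
LOG_MIN, LOG_MAX = -2.0, 3.0


def log_normalize(x, lo=LOG_MIN, hi=LOG_MAX):
    """Positive physical field -> log10 -> linear rescale so [lo, hi] maps to [-1, 1]."""
    return 2.0 * (np.log10(x) - lo) / (hi - lo) - 1.0


def tanh_to_physical(raw, lo=LOG_MIN, hi=LOG_MAX):
    """Apply a tanh output head, then invert log_normalize back to physical units."""
    z = np.tanh(raw)                          # bounded to (-1, 1)
    log_x = (z + 1.0) * 0.5 * (hi - lo) + lo  # back to log10 space
    return 10.0 ** log_x


def sigmoid_to_tau(raw):
    """Treat sigmoid(raw) as flux F in (0, 1) and return tau = -ln F.

    -ln(sigmoid(raw)) = ln(1 + exp(-raw)), computed stably with logaddexp.
    """
    return np.logaddexp(0.0, -raw)
```

Bounded activations like these keep every output inside a physically sensible range by construction, which is one common motivation for pairing them with a normalized input space.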

Cartoon Architecture of the Neural Network Structure — Flowchart showing a rough sketch of the architecture of our custom machine learning model, which improves the simulation and spectral accuracy of features in low-resolution hydrodynamic simulations, and a sketch of the training and implementation procedure. See Jacobus et al. (2023) for details.

The model reliably reconstructs realizations of the hydrodynamical fields that share the large-scale morphology of the low-resolution input simulations but have greatly improved small-scale features. On small scales, the reconstructed fields match the high-resolution simulations much more closely, despite being on a much coarser grid. This is possible because the model is generative and stochastic: it turns the injected small-scale noise into realistic features that complement the large-scale morphology of the input maps.
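One common way to inject learned noise, used for example in StyleGAN, is to draw a single-channel Gaussian noise volume and add it to every feature channel with a learned per-channel weight. The NumPy sketch below shows that pattern for 3D feature maps at several scales. Whether the model uses exactly this form is an assumption, and the weights here are plain inputs rather than trained parameters.

```python
import numpy as np


def inject_noise(features, weights, rng):
    """Add Gaussian noise to a (C, D, H, W) feature map.

    One noise volume of shape (D, H, W) is drawn and broadcast over the C
    channels, each channel scaling it by its own weight (shape (C,)).
    """
    noise = rng.standard_normal(features.shape[1:])
    return features + weights[:, None, None, None] * noise[None, ...]


def inject_multiscale(pyramid, weight_list, rng):
    """Inject independent noise at every scale of a feature pyramid."""
    return [inject_noise(f, w, rng) for f, w in zip(pyramid, weight_list)]


# Usage: a two-level pyramid with 4 channels at 16^3 and 8^3 cells.
rng = np.random.default_rng(0)
pyramid = [np.zeros((4, 16, 16, 16)), np.zeros((4, 8, 8, 8))]
weights = [np.full(4, 0.1), np.full(4, 0.05)]
noisy = inject_multiscale(pyramid, weights, rng)
print([a.shape for a in noisy])
```

Because the weights are learned, training can decide per channel and per scale how much randomness to admit; channels whose weight goes to zero ignore the noise entirely.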

Result Comparison: the neural network output compared with the low-resolution input field and the corresponding high-resolution field. The high-resolution simulation has 8 times higher resolution (per dimension) than both the input and the ML-enhanced output fields.
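Comparing fields at different resolutions requires putting them on a common grid. A simple way to do that, assumed here for illustration rather than taken from the paper, is to average each non-overlapping 8³ block of high-resolution cells into one low-resolution cell (4096/512 = 8 per dimension). A full 4096³ float32 field is about 275 GB, so in practice this would be applied to subvolumes.

```python
import numpy as np


def block_average(field, factor):
    """Downsample a cubic field by averaging non-overlapping factor^3 blocks.

    For a 4096^3 -> 512^3 comparison, factor = 8, so each coarse cell is the
    plain volume average of 8^3 = 512 fine cells.
    """
    n = field.shape[0]
    if field.shape != (n, n, n) or n % factor != 0:
        raise ValueError("expected a cube whose side is divisible by factor")
    m = n // factor
    # Split each axis into (coarse index, offset within block), then average offsets.
    return field.reshape(m, factor, m, factor, m, factor).mean(axis=(1, 3, 5))
```

Plain volume averaging is the simplest choice; a mass-weighted average may be preferable for a field such as temperature.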

Credits

If you find this data useful for your research, please cite Jacobus et al. (2025), and/or Jacobus et al. (2023). Also, please include this data portal URL in any data availability statements.

This project was supported by Berkeley Lab's LDRD program (PI: Zarija Lukić). Computational resources were provided by the Oak Ridge Leadership Computing Facility (OLCF) and the National Energy Research Scientific Computing Center (NERSC). Long-term file storage is provided by NERSC.