Conversion

This section describes the mapping between the FITS and HDF5 formats, followed by the design of the two H5Boss formats.

  • The Mapping Diagram visualizes the mapping between FITS and HDF5.
  • The Mapping Table gives more detailed information about the relationship between the BOSS FITS structure and the BOSS HDF5 hierarchy.
  • The Example Codes illustrate, for example, the difference between reading the 'FLUX' dataset in the two formats.

Mapping Diagram

[Figure: _images/fits2fmt1.png — mapping between the FITS structure and the HDF5 hierarchy]

Mapping Table

The conversion code takes all files in a PLATE folder and converts them into a single HDF5 file. Note that the BOSS data on the NERSC project file system is organized as follows:

Fits Folder Top Level:

user@edison12: pwd
/global/projecta/projectdirs/sdss/data/sdss/dr12/boss/spectro/redux/v5_7_0
user@edison12: ls
3520  4032  4458  4848  5338  5897  6375  6758
3523  4033  4459  4849  5339  5898  6376  6759
3536  4034  4460  4850  5340  5899  6377  6760
3537  4035  4461  4851  5341  5900  6378  6780
3538  4036  4462  4852  5342  5901  6379  6781
3540  4037  4463  4853  5343  5902  6380  6782
3548  4038  4464  4854  5344  5903  6381  6783
...
...

Then, inside one folder, which is named after the PLATE number, e.g., plate 4857:

Fits Folder Second Level:

4857/
       photoMatchPlate-4857-55711.fits  spFluxcalib-r2-00131416.fits.gz
       photoPlate-4857-55711.fits       spFluxcalib-r2-00131417.fits.gz
       photoPosPlate-4857-55711.fits    spFluxcalib-r2-00131418.fits.gz
       redux-4857-55711                 spFluxcalib-r2-00131419.fits.gz
       redux-4857-55711.o22925          spFluxcorr-b1-00131415.fits.gz
       ...                              ...
       spCFrame-b1-00131415.fits        spFluxcorr-b2-00131415.fits.gz
       spCFrame-b1-00131416.fits        spFluxcorr-b2-00131416.fits.gz
       spCFrame-r2-00131415.fits        spFluxdistort-4857-55711.fits
       spCFrame-r2-00131416.fits        spFluxdistort-4857-55711.ps
       spCFrame-r2-00131417.fits        spFrame-b1-00131415.fits.gz
       spCFrame-r2-00131418.fits        spFrame-b1-00131416.fits.gz
       spFluxcalib-b2-00131417.fits.gz  spFrame-r2-00131418.fits.gz
       spFluxcalib-b2-00131418.fits.gz  spFrame-r2-00131419.fits.gz
       spFluxcalib-r1-00131416.fits.gz  spPlate-4857-55711.fits
       spFluxcalib-r1-00131417.fits.gz  spSN2d-4857-55711.ps
       spFluxcalib-r1-00131418.fits.gz  spSN2d-4857-55711.ps.orig
       spFluxcalib-r1-00131419.fits.gz  spectro_redux_v5_7_0_4857.sha1sum
       spFluxcalib-r2-00131415.fits.gz  v5_7_0

After the conversion, the whole folder 4857 becomes a single HDF5 file, in which there are 1000 fiber groups and a few catalog datasets.

HDF5 Folder Top Level:

user@edison12: pwd
$SCRATCH/h5boss/v5_7_0
user@edison12: ls
3520.h5  4032.h5  4458.h5  4848.h5  5338.h5  5897.h5  6375.h5  6758.h5
3523.h5  4033.h5  4459.h5  4849.h5  5339.h5  5898.h5  6376.h5  6759.h5
3536.h5  4034.h5  4460.h5  4850.h5  5340.h5  5899.h5  6377.h5  6760.h5
3537.h5  4035.h5  4461.h5  4851.h5  5341.h5  5900.h5  6378.h5  6780.h5
3538.h5  4036.h5  4462.h5  4852.h5  5342.h5  5901.h5  6379.h5  6781.h5
3540.h5  4037.h5  4463.h5  4853.h5  5343.h5  5902.h5  6380.h5  6782.h5
3548.h5  4038.h5  4464.h5  4854.h5  5344.h5  5903.h5  6381.h5  6783.h5
...
...
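
The internal layout of one of these files can be inspected directly with h5py. The snippet below is only a quick inspection sketch; it assumes the converted file for plate 4857 is named 4857.h5, as in the listing above:

import h5py

# Walk the file and print every group/dataset path, e.g.
# 4857/55711/1/coadd, 4857/55711/photo/match, 4857/55711/zbest, ...
with h5py.File('4857.h5', 'r') as f:
    f.visit(print)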

For the catalog, the conversion simply reads one HDU from a FITS file and writes it into a table dataset in HDF5. For example, as shown in the following catalog mapping table, HDU 1 of 'photoMatchPlate-pppp-mmmmm.fits' becomes an HDF5 dataset within the 'photo' group, whose parent groups are 'plate' and 'mjd'. A short sketch of this step follows the table.

Catalog:

Fits File                         Fits HDU   HDF5 Group        HDF5 Dataset
photoMatchPlate-pppp-mmmmm.fits   HDU 1      plate/mjd/photo   match
photoPosPlate-pppp-mmmmm.fits     HDU 1      plate/mjd/photo   matchpos
photoPlate-pppp-mmmmm.fits        HDU 1      plate/mjd/photo   matchflux
v5_7_0/spZbest-pppp-mmmmm.fits    HDU 1      plate/mjd         zbest
v5_7_0/spZline-pppp-mmmmm.fits    HDU 1      plate/mjd         zline
spPlate-pppp-mmmmm.fits           HDU 5      plate/mjd         plugmap
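
As an illustration of this catalog step, the following sketch (not the actual conversion code) reads HDU 1 of the photoMatchPlate file with fitsio and writes it as the 'match' dataset under 'plate/mjd/photo', using plate 4857 / mjd 55711 as an example:

import fitsio
import h5py

# Read HDU 1 of the photoMatchPlate file as a numpy record array.
match = fitsio.read('photoMatchPlate-4857-55711.fits', ext=1)

# Write it as a table dataset under the plate/mjd/photo group.
with h5py.File('4857.h5', 'a') as h5:
    photo = h5.require_group('4857/55711/photo')
    photo.create_dataset('match', data=match)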

In each of the 1000 fiber groups, the fiber number is used as the group name in HDF5, e.g., '1', placed under 'plate/mjd'. A fiber group contains a 'coadd' dataset, which is a 4000x8 2D table (the number 4000 varies between plates). The number 8 is the number of converted columns, i.e., flux, ivar, and_mask, or_mask, wavedisp, wave, sky, and model. These columns come from different HDUs of the 'spPlate-pppp-mmmmm.fits' file; for example, 'FLUX' comes from the primary HDU. In FITS, this primary HDU is a 1000 by 4000 2D array; in the HDF5 file, it is split across the 1000 fiber groups, so each fiber group holds only the spectrum of one fiber. A similar conversion is applied to the 'exposures', which come from the 'spCFrame-[br][12]-[0-9]{8}.fits' files. Special attention needs to be paid to the 'WAVE' column in 'coadd', and to the 'WAVE' and 'CLIB' columns in the 'b'/'r' datasets, as noted below the table.
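
A rough sketch of this per-fiber split is shown below. It is not the real conversion code (which writes all eight columns into one compound 'coadd' table); for brevity it stores only the FLUX column, and it assumes the plate 4857 / mjd 55711 file names used elsewhere on this page:

import fitsio
import h5py
import numpy as np

# HDU 0 of spPlate is a (nfiber x nwave) image, one row per fiber.
flux = fitsio.read('spPlate-4857-55711.fits', ext=0)

with h5py.File('4857.h5', 'a') as h5:
    for fiber in range(1, flux.shape[0] + 1):
        # In the real file, 'coadd' has 8 columns (WAVE, FLUX, IVAR, ...);
        # only FLUX is filled in here to keep the sketch short.
        coadd = np.zeros(flux.shape[1], dtype=[('FLUX', 'f4')])
        coadd['FLUX'] = flux[fiber - 1, :]
        h5.create_dataset('4857/55711/%d/coadd' % fiber, data=coadd)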

Spectra:

Id  Fits File                         Fits HDU          HDF5 Group                             HDF5 Dataset(ColumnID)  ColumnName
1   spPlate-pppp-mmmmm.fits           HDU 0             plate/mjd/[1-1000]                     coadd(col2)             FLUX
2   spPlate-pppp-mmmmm.fits           HDU 1 IVAR        plate/mjd/[1-1000]                     coadd(col3)             IVAR
3   spPlate-pppp-mmmmm.fits           HDU 2 ANDMASK     plate/mjd/[1-1000]                     coadd(col4)             AND_MASK
4   spPlate-pppp-mmmmm.fits           HDU 3 ORMASK      plate/mjd/[1-1000]                     coadd(col5)             OR_MASK
5   spPlate-pppp-mmmmm.fits           HDU 4 WAVEDISP    plate/mjd/[1-1000]                     coadd(col6)             WAVEDISP
6   spPlate-pppp-mmmmm.fits           HDU 5 PLUGMAP     plate/mjd/[1-1000]                     coadd(col1)*            WAVE
7   spPlate-pppp-mmmmm.fits           HDU 6 SKY         plate/mjd/[1-1000]                     coadd(col7)             SKY
8   -                                 -                 plate/mjd/[1-1000]                     coadd(col8)             MODEL
9   spCFrame-[br][12]-[0-9]{8}.fits   HDU 0             plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col1)               FLUX
10  spCFrame-[br][12]-[0-9]{8}.fits   HDU 1 IVAR        plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col2)               IVAR
11  spCFrame-[br][12]-[0-9]{8}.fits   HDU 2 MASK        plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col3)               MASK
12  spCFrame-[br][12]-[0-9]{8}.fits   HDU 3 WAVELENGTH  plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col4)*              WAVE
13  spCFrame-[br][12]-[0-9]{8}.fits   HDU 4 WAVEDISP    plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col5)               WAVEDISP
14  spCFrame-[br][12]-[0-9]{8}.fits   HDU 6 SKY         plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col6)               SKY
15  spCFrame-[br][12]-[0-9]{8}.fits   HDU 7 X           plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col7)               X
16  spCFrame-[br][12]-[0-9]{8}.fits   HDU 8 SUPERFLAT   plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col8)*              CLIB
17  spFlat-[br][12]-[0-9]{8}.fits.gz  HDU 0 FIBERFLAT   plate/mjd/[1-1000]/exposures/[0-9]{8}  b/r(col8)*              CLIB

Notes: The conversion simply copies the data from the FITS files and reorganizes them in the HDF5 file. A few exceptions, where the data need additional calculation, can be better understood by reading the code; here are brief descriptions:

Line 6: this WAVE column is not read directly from an HDU; it is obtained with the following Python code:

from astropy.io import fits
import numpy as np

header = fits.open(platefile)[0].header   # primary HDU header of spPlate-pppp-mmmmm.fits
c0 = header['COEFF0']                      # log10 of the starting wavelength
c1 = header['COEFF1']                      # log10 wavelength step per pixel
nwave = header['NAXIS1']                   # number of wavelength bins
wave = 10**(c0 + c1*np.arange(nwave))      # wavelength grid

Line 12: the wavelength in this HDU is stored as log10, so the conversion code applies the inverse, i.e., 'WAVE' = 10^wavelength.
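
For example (a sketch, assuming one of the spCFrame files listed above and that HDU 3 stores log10 wavelength, as in line 12 of the table):

import fitsio

loglam = fitsio.read('spCFrame-b1-00131415.fits', ext=3)  # log10(wavelength) per pixel
wave = 10**loglam                                          # linear wavelength stored as 'WAVE'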

Lines 16 and 17: the 'CLIB' column is calculated with the following Python code:

# fiberflat and superflat come from the spFlat and spCFrame HDUs listed in
# lines 16 and 17; eflux and flux are read elsewhere in the conversion code.
electrons = eflux * fiberflat * superflat
calib = flux / electrons

Example Codes

Sample code for reading the same data from FITS versus from the converted HDF5 file:

Read plate: 4857, mjd: 55711, fiber: 4, FLUX

Read Flux from Fits:

import fitsio

dfits = fitsio.FITS('spPlate-4857-55711.fits')
dflux = dfits[0][3:4,:]   # fiber 4 is row index 3 in HDU 0 (FLUX)

Read Flux from HDF5:

import h5py

dh5   = h5py.File('4857-55711.h5', 'r')
dflux = dh5['4857/55711/4/coadd']['FLUX']

Read Multiple HDUs from Fits:

dfits = fitsio.FITS('spPlate-4857-55711.fits')
dflux = dfits[0][3:4,:]   # HDU 0: FLUX
divar = dfits[1][3:4,:]   # HDU 1: IVAR

Read Multiple HDUs from HDF5:

dh5        = h5py.File('4857-55711.h5', 'r')
dflux_ivar = dh5['4857/55711/4/coadd'][('FLUX','IVAR')]
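
In newer h5py releases, the same multi-column read can also be written with the fields() helper; this is just an alternative form, not part of the original example:

dflux_ivar = dh5['4857/55711/4/coadd'].fields(['FLUX', 'IVAR'])[:]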

Read All HDUs from Fits:

dall = {}
for i in range(0, 6):
    dall[i] = dfits[i][3:4,:]   # the row for fiber 4 from each HDU

Read All HDUs from HDF5:

dall = dh5['4857/55711/4/coadd'][()]   # read the whole coadd table for fiber 4
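
The per-exposure spectra described in the Spectra table can be read the same way. A minimal sketch, assuming the 'b'/'r' dataset layout from the table and picking whichever exposure id exists under the fiber group:

exposures = dh5['4857/55711/4/exposures']
print(list(exposures.keys()))            # available exposure ids
expid = list(exposures.keys())[0]        # pick the first one for illustration
db = exposures[expid + '/b'][()]         # read the blue-camera table for this exposure
print(db['FLUX'], db['WAVE'])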