Conversion
==========

This section shows the mapping between the Fits and HDF5 formats, followed by the design of the two H5Boss formats.

Mapping Diagram
===============

Mapping Table
=============

The conversion code takes all files in a PLATE folder and converts them into a single HDF5 file. Note that the BOSS data on the NERSC project file system is organized as follows:

**Fits Folder Top Level**::

  user@edison12: pwd
  /global/projecta/projectdirs/sdss/data/sdss/dr12/boss/spectro/redux/v5_7_0
  
  user@edison12: ls
  3520  4032  4458  4848  5338  5897  6375  6758
  3523  4033  4459  4849  5339  5898  6376  6759
  3536  4034  4460  4850  5340  5899  6377  6760
  ...

Then, inside one folder, which is named after the **PLATE** number, e.g., plate 4857:

**Fits Folder Second Level**::

  4857/
  photoMatchPlate-4857-55711.fits    spFluxcalib-r2-00131416.fits.gz
  photoPlate-4857-55711.fits         spFluxcalib-r2-00131417.fits.gz
  photoPosPlate-4857-55711.fits      spFluxcalib-r2-00131418.fits.gz
  redux-4857-55711                   spFluxcalib-r2-00131419.fits.gz
  redux-4857-55711.o22925            spFluxcorr-b1-00131415.fits.gz
  ...

After the conversion, the whole folder 4857 becomes a **single** HDF5 file, which contains 1000 fiber groups and a few catalog datasets.
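
The layout of such a file can be explored with a few lines of h5py. The following is a minimal sketch, not part of the conversion code: it builds a tiny in-memory stand-in (3 fibers instead of 1000, placeholder data) and lists the fiber groups under 'plate/mjd'. The helper name ``list_fibers`` and the file name ``demo_plate.h5`` are assumptions made for illustration.

.. code-block:: python

  import h5py
  import numpy as np

  def list_fibers(h5file, plate, mjd):
      """Return the fiber group names under plate/mjd, sorted numerically."""
      grp = h5file[f"{plate}/{mjd}"]
      # Fiber groups are named by number; catalog groups such as 'photo' are skipped.
      return sorted((name for name in grp if name.isdigit()), key=int)

  # In-memory toy file that mimics the plate/mjd/fiber/coadd hierarchy.
  f = h5py.File("demo_plate.h5", "w", driver="core", backing_store=False)
  for fiber in (1, 2, 3):
      f.create_dataset(f"4857/55711/{fiber}/coadd", data=np.zeros(5))
  f.create_group("4857/55711/photo")  # a catalog group next to the fiber groups

  fibers = list_fibers(f, "4857", "55711")
  print(fibers)

On a real converted file the same helper should return the fiber numbers as strings, assuming the fiber groups are the only purely numeric names under 'plate/mjd'.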

**HDF5 Folder Top Level**::

  user@edison12: pwd
  $SCRATCH/h5boss/v5_7_0
  
  user@edison12: ls
  3520.h5  4032.h5  4458.h5  4848.h5  5338.h5  5897.h5  6375.h5  6758.h5
  3523.h5  4033.h5  4459.h5  4849.h5  5339.h5  5898.h5  6376.h5  6759.h5
  3536.h5  4034.h5  4460.h5  4850.h5  5340.h5  5899.h5  6377.h5  6760.h5
  ...

For the catalog, the conversion simply reads one HDU in a Fits file and writes it into a table dataset in HDF5. For example, as shown in the following catalog mapping table, HDU 1 in 'photoMatchPlate-pppp-mmmmm.fits' becomes an HDF5 dataset within the 'photo' group, whose parent groups are 'plate' and 'mjd'.

Catalog:

=============================== ======== =============== ============
Fits File                       Fits HDU HDF5 Group      HDF5 Dataset
=============================== ======== =============== ============
photoMatchPlate-pppp-mmmmm.fits HDU 1    plate/mjd/photo match
photoPosPlate-pppp-mmmmm.fits   HDU 1    plate/mjd/photo matchpos
photoPlate-pppp-mmmmm.fits      HDU 1    plate/mjd/photo matchflux
v5_7_0/spZbest-pppp-mmmmm.fits  HDU 1    plate/mjd       zbest
v5_7_0/spZline-pppp-mmmmm.fits  HDU 1    plate/mjd       zline
spPlate-pppp-mmmmm.fits         HDU 5    plate/mjd       plugmap
=============================== ======== =============== ============

Each of the 1000 fiber groups is named after its fiber number, e.g., 1, and sits under 'plate/mjd'. A fiber group contains a 'coadd' dataset, which is a 4000*8 2D array (the number 4000 varies between plates). The number 8 is the total number of columns that are converted, i.e., **Flux, Ivar, and_mask, or_mask, wavedisp, wave, sky and model**. These columns come from different HDUs of the 'spPlate-pppp-mmmmm.fits' file. For example, 'Flux' comes from the primary HDU. In Fits, this primary HDU is a 1000 by 4000 2D table; in the HDF5 file, this table is split across the 1000 fiber groups, so that each fiber group holds only the spectrum of one fiber.

A similar conversion is applied to the 'Exposures', which come from the 'spCFrame-[br][12]-[0-9]{8}.fits' files. Special attention needs to be paid to the column 'wave' in coadd, and to the columns 'wave' and 'clib' in the b/r dataset, as noted below the table.

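
The per-fiber split of a 2D table can be sketched as follows. This is an illustration under stated assumptions, not the actual conversion code: the arrays are toy data (3 fibers by 6 wavelength bins), only two of the eight coadd columns are included, and the file name ``demo_split.h5`` is made up. It assumes the 'coadd' dataset is stored as a compound (table) dataset whose fields carry the column names.

.. code-block:: python

  import h5py
  import numpy as np

  nfiber, nwave = 3, 6  # toy sizes; a real plate has 1000 fibers
  flux = np.arange(nfiber * nwave, dtype="f4").reshape(nfiber, nwave)
  ivar = np.ones((nfiber, nwave), dtype="f4")

  # Subset of the coadd columns, stored as named fields of a compound dtype.
  coadd_dtype = np.dtype([("FLUX", "f4"), ("IVAR", "f4")])

  f = h5py.File("demo_split.h5", "w", driver="core", backing_store=False)
  for i in range(nfiber):
      table = np.empty(nwave, dtype=coadd_dtype)
      table["FLUX"] = flux[i]  # row i of the 2D image becomes one column
      table["IVAR"] = ivar[i]
      # Fiber numbers are 1-based, array rows are 0-based.
      f.create_dataset(f"4857/55711/{i + 1}/coadd", data=table)

  fiber2_flux = f["4857/55711/2/coadd"]["FLUX"]
  print(fiber2_flux)

Storing each fiber as its own small table is what lets a reader fetch one spectrum, or one column of it, without touching the other fibers.
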
Spectra:

== ================================ ================ ===================================== ==================================
Id Fits File                        Fits HDU         HDF5 Group                            HDF5 Dataset(ColumnID) ColumnName
== ================================ ================ ===================================== ==================================
1  spPlate-pppp-mmmmm.fits          HDU 0            plate/mjd/[1-1000]                    coadd(col2)            FLUX
2  spPlate-pppp-mmmmm.fits          HDU 1 IVAR       plate/mjd/[1-1000]                    coadd(col3)            IVAR
3  spPlate-pppp-mmmmm.fits          HDU 2 ANDMASK    plate/mjd/[1-1000]                    coadd(col4)            AND_MASK
4  spPlate-pppp-mmmmm.fits          HDU 3 ORMASK     plate/mjd/[1-1000]                    coadd(col5)            OR_MASK
5  spPlate-pppp-mmmmm.fits          HDU 4 WAVEDISP   plate/mjd/[1-1000]                    coadd(col6)            WAVEDISP
6  spPlate-pppp-mmmmm.fits          HDU 5 PLUGMAP    plate/mjd/[1-1000]                    coadd(col1)*           WAVE
7  spPlate-pppp-mmmmm.fits          HDU 6 SKY        plate/mjd/[1-1000]                    coadd(col7)            SKY
8                                                    plate/mjd/[1-1000]                    coadd(col8)            MODEL
9  spCFrame-[br][12]-[0-9]{8}.fits  HDU 0            plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col1)              FLUX
10 spCFrame-[br][12]-[0-9]{8}.fits  HDU 1 IVAR       plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col2)              IVAR
11 spCFrame-[br][12]-[0-9]{8}.fits  HDU 2 MASK       plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col3)              MASK
12 spCFrame-[br][12]-[0-9]{8}.fits  HDU 3 WAVELENGTH plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col4)*             WAVE
13 spCFrame-[br][12]-[0-9]{8}.fits  HDU 4 WAVEDISP   plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col5)              WAVEDISP
14 spCFrame-[br][12]-[0-9]{8}.fits  HDU 6 SKY        plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col6)              SKY
15 spCFrame-[br][12]-[0-9]{8}.fits  HDU 7 X          plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col7)              X
16 spCFrame-[br][12]-[0-9]{8}.fits  HDU 8 SUPERFLAT  plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col8)*             CLIB
17 spFlat-[br][12]-[0-9]{8}.fits.gz HDU 0 FIBERFLAT  plate/mjd/[1-1000]/exposures/[0-9]{8} b/r(col8)*             CLIB
== ================================ ================ ===================================== ==================================

Notes: The conversion simply copies the data from the Fits files and reorganizes it in the HDF5 file. A few exceptions, where the data need additional calculation, are best understood by reading the code. Brief descriptions of the rows marked with ``*`` follow:

.. highlight:: c

Row 6, the ``wave`` column of coadd is obtained with the following python code::

  header = fits.open(platefile)[0].header
  c0     = header['COEFF0']
  c1     = header['COEFF1']
  nwave  = header['NAXIS1']
  wave   = (10**(c0 + c1*np.arange(nwave)))

Row 12, the wavelength is stored as a base-10 logarithm, so the conversion code calculates the inverse, i.e., 'WAVE' = 10^wavelength.

Rows 16 and 17, the 'CLIB' column is calculated with the following python code::

  electrons = eflux * fiberflat * superflat
  calib = flux / electrons

.. highlight:: c

.. _examplecode:

Example Codes
=============

Sample code for reading the same data from Fits and from the converted HDF5 file:

**Read plate: 4857, mjd: 55711, fiber: 4, FLUX**

Read Flux from Fits::

  import fitsio
  dfits = fitsio.FITS('spPlate-4857-55711.fits')
  dflux = dfits[0][3:4,:]   # fiber 4 is row index 3 (0-based)

Read Flux from HDF5::

  import h5py
  dh5   = h5py.File('4857-55711.h5', 'r')
  dflux = dh5['4857/55711/4/coadd']['FLUX']

Read Multiple HDUs from Fits::

  dfits = fitsio.FITS('spPlate-4857-55711.fits')
  dflux = dfits[0][3:4,:]
  divar = dfits[1][3:4,:]   # HDU 1 is IVAR

Read Multiple HDUs from HDF5::

  dh5 = h5py.File('4857-55711.h5', 'r')
  dflux_ivar = dh5['4857/55711/4/coadd'][('FLUX','IVAR')]

Read All HDUs from Fits::

  dall = {}
  for i in (0, 1, 2, 3, 4, 6):   # image HDUs only; HDU 5 (PLUGMAP) is a table
      dall[i] = dfits[i][3:4,:]

Read All HDUs from HDF5::

  dall = dh5['4857/55711/4/coadd'][()]
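
The log-linear wavelength solution used for the coadd ``wave`` column can be tried without a Fits file. The sketch below wraps the same formula in a function that accepts any header-like mapping. The function name ``coadd_wave`` and the header values are made up for illustration; real values come from the primary header of the spPlate file.

.. code-block:: python

  import numpy as np

  def coadd_wave(header):
      """Wavelength grid from COEFF0, COEFF1 and NAXIS1 of a header-like mapping."""
      c0 = header["COEFF0"]     # log10 of the wavelength of the first bin
      c1 = header["COEFF1"]     # log10 step between consecutive bins
      nwave = header["NAXIS1"]  # number of wavelength bins
      return 10 ** (c0 + c1 * np.arange(nwave))

  # Hypothetical header values, chosen only to make the example easy to check.
  demo_header = {"COEFF0": 3.5, "COEFF1": 1e-4, "NAXIS1": 4}
  wave = coadd_wave(demo_header)
  print(wave)

Because the grid is uniform in log10 space, the ratio between consecutive wavelengths is the constant ``10**COEFF1``, and taking ``np.log10(wave)`` recovers the log-based form stored in the spCFrame WAVELENGTH HDU.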