The latest dataset contains the data splits and a trained model. Please use model_new.pth and config_new.pth for any new projects, as these provide fixed normalisation factors.
For more information, and for code that makes it easy to train your own model on this data, please refer to the GitHub repo.
The data is divided into a training set and a test set: the training set consists of data points before 2011, and the test set consists of data points after 2011. Each file is named according to the convention "data-YEAR-MONTH-DAY-RUN-TIMESTEP_LABELING.nc".
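The naming convention and the year-based split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field widths in the regular expression, the optional "_LABELING" suffix, the helper names parse_filename and assign_split, and the choice to put the year 2011 itself in the test set are not specified by the description above.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: the description gives the convention but not exact field widths.
# The "_LABELING" suffix is treated as optional.
_FILENAME_RE = re.compile(
    r"^data-(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"-(?P<run>\d+)-(?P<timestep>\d+)(?:_(?P<labeling>[^.]+))?\.nc$"
)


class FileInfo(NamedTuple):
    year: int
    month: int
    day: int
    run: int
    timestep: int
    labeling: Optional[str]  # None when the file name has no labeling suffix


def parse_filename(name: str) -> FileInfo:
    """Split a file name into its date, run, timestep and labeling fields."""
    m = _FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return FileInfo(
        year=int(m["year"]),
        month=int(m["month"]),
        day=int(m["day"]),
        run=int(m["run"]),
        timestep=int(m["timestep"]),
        labeling=m["labeling"],
    )


def assign_split(info: FileInfo, boundary_year: int = 2011) -> str:
    """Return "train" for years before boundary_year, otherwise "test".

    Assumption: files from the boundary year itself are counted as test data.
    """
    return "train" if info.year < boundary_year else "test"
```

For example, assign_split(parse_filename("data-2002-11-17-01-1.nc")) returns "train".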
The NetCDF (.nc) files contain no subgroups. Each file holds 17 variables, each of shape (768, 1152): 16 data channels, identified by their variable names, and 1 label channel named LABELS.
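As a rough sketch of how this layout could be checked after opening a file, the helper below separates the 16 data channels from the LABELS variable. The function name split_channels and the strict count and shape checks are assumptions made for illustration; it accepts any dict-like mapping from variable name to variable.

```python
from typing import Any, Dict, Mapping, Tuple

EXPECTED_SHAPE = (768, 1152)  # per-variable shape stated in the description
LABEL_NAME = "LABELS"         # name of the label channel
N_CHANNELS = 16               # number of data channels


def split_channels(variables: Mapping[str, Any]) -> Tuple[Dict[str, Any], Any]:
    """Separate the data channels from the label channel.

    `variables` is any mapping from variable name to variable object.
    Shapes are only checked for objects that expose a `.shape` attribute.
    """
    if LABEL_NAME not in variables:
        raise KeyError(f"missing label variable {LABEL_NAME!r}")

    for name, var in variables.items():
        shape = getattr(var, "shape", None)
        if shape is not None and tuple(shape) != EXPECTED_SHAPE:
            raise ValueError(f"{name}: expected shape {EXPECTED_SHAPE}, got {tuple(shape)}")

    channels = {name: var for name, var in variables.items() if name != LABEL_NAME}
    if len(channels) != N_CHANNELS:
        raise ValueError(f"expected {N_CHANNELS} data channels, found {len(channels)}")
    return channels, variables[LABEL_NAME]
```

With the netCDF4 package (an assumption, since no particular reader is prescribed here), the mapping would be the `variables` attribute of an opened `netCDF4.Dataset`. If a file also carries coordinate variables, the strict count check would need to be relaxed.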
The files have the format: "data-YEAR-MONTH-DAY-RUN-TIMESTEP.nc".
Each .nc file is organized as follows:
To give an example:
The file data-2002-11-17-01-1.nc contains the All-Hist data with the filename data-2002-11-17-01-1.h5, together with all corresponding expert labels. In this case there are 9 such labels, as shown by the value of the num_labels attribute of the labels subgroup. Each labeling is stored as two NC variables (one ending in _ar and one ending in _tc), so 18 NC variables represent the 9 labelings. Their names are:
label_0_ar
label_0_tc
...
...
label_8_ar
label_8_tc
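The naming pattern in the list above can be generated from the num_labels value. The following is a minimal sketch; the helper name label_variable_names is hypothetical, and the pattern is inferred only from the names shown in the example.

```python
from typing import List


def label_variable_names(num_labels: int) -> List[str]:
    """Build the expected label variable names for `num_labels` labelings.

    Each labeling i contributes two names: label_i_ar and label_i_tc.
    """
    names = []
    for i in range(num_labels):
        names.append(f"label_{i}_ar")
        names.append(f"label_{i}_tc")
    return names
```

For the example file, label_variable_names(9) yields 18 names, starting with label_0_ar and ending with label_8_tc.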