ClimateNet Datasets

Data and Model (up-to-date)

The latest dataset contains the data splits and a trained model. For any new project, please use model_new.pth and config_new.pth, which provide fixed normalisation factors.

For more information, and for code to easily train your own model on this data, please refer to the GitHub repo.

The data is divided into a training set and a test set: the training set consists of data points before 2011, and the test set consists of data points after 2011. Each file is named according to the convention "data-YEAR-MONTH-DAY-RUN-TIMESTEP_LABELING.nc".
There are no subgroups in the NC files, and each file contains 17 (768,1152) NC variables: 16 data channels, specified by the name of the variable, and 1 label channel (named LABELS).
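
The naming convention and file layout described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the loader from the repo: the helper names (parse_filename, split_for_year, load_sample) are hypothetical, the digit widths in the pattern are guesses, LABELING is treated as an opaque string, and the netCDF4 Python package is assumed to be installed for the loading step.

```python
import re

# Pattern for "data-YEAR-MONTH-DAY-RUN-TIMESTEP_LABELING.nc".
# The digit widths are an assumption; LABELING is kept as an opaque string.
FILENAME_RE = re.compile(
    r"^data-(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"-(?P<run>\d+)-(?P<timestep>\d+)_(?P<labeling>.+)\.nc$"
)


def parse_filename(name):
    """Split a file name into its components; raise ValueError on mismatch."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    parts = match.groupdict()
    # RUN and LABELING stay as strings so leading zeros are preserved.
    for key in ("year", "month", "day", "timestep"):
        parts[key] = int(parts[key])
    return parts


def split_for_year(year):
    """Return 'train' before 2011 and 'test' after 2011.

    The description does not say where 2011 itself belongs, so None is
    returned for that year rather than guessing.
    """
    if year < 2011:
        return "train"
    if year > 2011:
        return "test"
    return None


def load_sample(path, shape=(768, 1152)):
    """Return (channels, labels): a dict of data arrays and the LABELS array."""
    # Imported lazily so the helpers above work without netCDF4 installed.
    from netCDF4 import Dataset

    with Dataset(path, "r") as ds:
        # Keep only variables whose trailing dimensions match the grid; this
        # skips any coordinate variables a file might also carry (assumption).
        grids = {
            name: var[:].squeeze()
            for name, var in ds.variables.items()
            if tuple(var.shape[-2:]) == shape
        }
    labels = grids.pop("LABELS")
    return grids, labels
```

For a file name such as the hypothetical "data-2002-11-17-01-1_0.nc", parse_filename returns the year 2002, and split_for_year(2002) assigns it to the training set.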

Climate datapoints and corresponding expert labels for all samples used for training the model (deprecated)

Warning: The following data is deprecated. There is an alignment problem between the labels and the data, and the data is not compatible with the code provided in the GitHub repo. Only use this if you absolutely have to.

NetCDF4 files

The files have the format: "data-YEAR-MONTH-DAY-RUN-TIMESTEP.nc".

Each .nc file is organized as follows:

The root group contains two subgroups: 'data' and 'labels'.
'data' contains 16 (768,1152) NC variables, each corresponding to one of the 16 input channels, specified by the name of the variable.
Each such variable also contains 'description' and 'units' attributes.
'labels' contains all labelings for the corresponding data point. The number of total expert labels can be retrieved by the num_labels attribute.
Every labeling is stored as 2 different NC variables, namely the AR (atmospheric river) mask and the TC (tropical cyclone) mask, following the convention used to store the expert labels in .h5 format.

To give an example:
The file data-2002-11-17-01-1.nc contains the All-Hist data with the filename data-2002-11-17-01-1.h5, and all corresponding expert labels. In this case, there are 9 such labels, as is evident from the value of the num_labels attribute of the labels subgroup. The 9 labelings are represented by 18 NC variables, named:
label_0_ar
label_0_tc
...
label_8_ar
label_8_tc
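
The layout above can be read with a few lines of Python. The following is a minimal sketch, assuming the netCDF4 Python package is installed; the function names are hypothetical and do not come from the repo. It relies only on the structure described here: the 'data' and 'labels' subgroups, the 'units' attribute, the num_labels attribute, and the label_<i>_ar / label_<i>_tc naming.

```python
def label_variable_names(num_labels):
    """List the mask variable names for num_labels labelings, in order."""
    names = []
    for i in range(num_labels):
        names.append(f"label_{i}_ar")
        names.append(f"label_{i}_tc")
    return names


def load_deprecated_sample(path):
    """Return (channels, units, masks) from one deprecated-format .nc file."""
    # Imported lazily so label_variable_names works without netCDF4 installed.
    from netCDF4 import Dataset

    with Dataset(path, "r") as ds:
        data_group = ds.groups["data"]
        labels_group = ds.groups["labels"]

        # One (768, 1152) array per input channel, keyed by variable name.
        channels = {name: var[:] for name, var in data_group.variables.items()}
        units = {
            name: var.getncattr("units")
            for name, var in data_group.variables.items()
        }

        # The number of expert labelings is stored as a group attribute.
        num_labels = int(labels_group.getncattr("num_labels"))
        masks = {
            name: labels_group.variables[name][:]
            for name in label_variable_names(num_labels)
        }
    return channels, units, masks
```

For the example file, label_variable_names(9) yields the 18 names listed above, from label_0_ar to label_8_tc.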

Trained model and PyTorch loading script (deprecated)

Model files