ClimateNet Datasets

Data and Model (up-to-date)

In the latest dataset, you can find the data splits and a trained model. Please use model_new.pth and config_new.pth for any new projects. These provide fixed normalisation factors.

For more information, and for code to train your own model on this data, please refer to the GitHub repo.

The data is divided into a training set and a test set: the training set consists of data points before 2011, and the test set consists of data points after 2011. Each file follows the naming convention "data-YEAR-MONTH-DAY-RUN-TIMESTEP_LABELING.nc".
The NC files contain no subgroups. Each file holds 17 NC variables of shape (768, 1152): 16 data channels, each identified by its variable name, and 1 label channel (named LABELS).
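
As an illustration of the naming convention and the year-based split described above, here is a minimal Python sketch using only the standard library. The helper names (parse_sample_name, split_for_year, SampleName) are hypothetical and not part of the ClimateNet code. The sketch assumes that the fields before the underscore are numeric and that LABELING is whatever follows the underscore; the example filename is made up for illustration.

```python
import re
from typing import NamedTuple

# Pattern for "data-YEAR-MONTH-DAY-RUN-TIMESTEP_LABELING.nc".
# Assumption: fields before the underscore are numeric, and LABELING is
# everything between the underscore and the ".nc" extension.
_NAME_RE = re.compile(
    r"^data-(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"-(?P<run>\d+)-(?P<timestep>\d+)_(?P<labeling>[^/\\]+)\.nc$"
)


class SampleName(NamedTuple):
    year: int
    month: int
    day: int
    run: str        # kept as a string so a leading zero ("01") survives
    timestep: int
    labeling: str


def parse_sample_name(filename: str) -> SampleName:
    """Split a file name into the fields of the naming convention."""
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a recognised sample file name: {filename!r}")
    return SampleName(
        year=int(m["year"]),
        month=int(m["month"]),
        day=int(m["day"]),
        run=m["run"],
        timestep=int(m["timestep"]),
        labeling=m["labeling"],
    )


def split_for_year(year: int) -> str:
    """Return "train" for years before 2011 and "test" for years after 2011.

    The description does not say where 2011 itself belongs, so this
    sketch refuses to guess.
    """
    if year < 2011:
        return "train"
    if year > 2011:
        return "test"
    raise ValueError("the split for the year 2011 is not specified")


# Hypothetical file name, for illustration only.
sample = parse_sample_name("data-2002-11-17-01-1_0.nc")
print(sample, split_for_year(sample.year))
```

Keeping RUN as a string is a deliberate choice here, so that the parsed fields can be joined back into the original file name.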


Climate datapoints and corresponding expert labels for all samples used for training the model (deprecated)

Warning: The following data is deprecated. There is an alignment problem between the labels and the data, and the data is not compatible with the code provided in the GitHub repo. Only use this if you absolutely have to.

NetCDF4 files

The files have the format: "data-YEAR-MONTH-DAY-RUN-TIMESTEP.nc".

Each .nc file is organized as follows:

To give an example:
The file data-2002-11-17-01-1.nc contains the All-Hist data with the filename data-2002-11-17-01-1.h5, and all corresponding expert labels. In this case, there are 9 such labels, as indicated by the value of the num_labels attribute of the labels subgroup. Each labeling is stored as two NC variables (with the suffixes ar and tc), giving 18 variables in total. Their names are:
label_0_ar
label_0_tc
...
...
label_8_ar
label_8_tc
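
The naming pattern in the listing above can be reproduced programmatically. The following Python sketch is an illustration only: the function name label_variable_names is hypothetical, and the pattern label_INDEX_SUFFIX with the suffixes ar and tc is inferred from the listed names rather than taken from any official specification.

```python
from typing import List


def label_variable_names(num_labels: int) -> List[str]:
    """Build the expected label variable names for a deprecated .nc file.

    Assumes every labeling index i in [0, num_labels) has exactly two
    variables, "label_<i>_ar" and "label_<i>_tc", in that order.
    """
    if num_labels < 0:
        raise ValueError("num_labels must be non-negative")
    names = []
    for i in range(num_labels):
        for suffix in ("ar", "tc"):
            names.append(f"label_{i}_{suffix}")
    return names


# For the example file, num_labels is 9, which yields 18 names.
names = label_variable_names(9)
print(len(names), names[0], names[-1])
```

In practice, num_labels would be read from the num_labels attribute of the labels subgroup rather than hard-coded.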

Trained model and PyTorch loading script (deprecated)

Model files