The DESC ELAsTiCC Challenge

About ELAsTiCC

The purpose of ELAsTiCC ("Extended LSST Astronomical Time-series Classification Challenge") is to spur the creation and testing of an end-to-end real-time pipeline for time-domain science. The challenge starts with a simulation of ~5 million detected events that includes ~50 million alerts. These alerts will be streamed from LSST to brokers, who will classify the events and send new alerts with classifications back to DESC. A talk about ELAsTiCC given at the LSSTC Enabling Science Broker Workshop in 2021 can be found on YouTube. Two posters on ELAsTiCC given at conferences can be found below on this page.

For discussion or questions about the challenge, use the #elasticc-comms channel on the DESC Slack.

The first ELAsTiCC campaign ran from September 2022 until early January 2023. Metrics and diagnostics from that campaign can be found on the ELAsTiCC page of the DESC TOM (login required).

The second ELAsTiCC campaign (dubbed ELAsTiCC2) ran from mid-November to mid-December 2023, streaming alerts at ~3× the rate of the first campaign. Diagnostics and some metrics from that campaign can be found on the ELAsTiCC2 page of the DESC TOM (login required).

There is a new github repository for ELAsTiCC-related code and information: LSSTDESC/elasticc.

ELAsTiCC Poster at the Dec 2023 Cosmic Streams Conference

(Click image for PDF.)

ELAsTiCC Poster at the Jan 2023 AAS

(Click image for PDF.)

Participants

For questions, message #elasticc-comms on the DESC Slack.

ELAsTiCC Lead: Gautham Narayan (UIUC)

ELAsTiCC team members: Alex Gagliano (UIUC), Alex Malz (Ruhr-Universität Bochum), Catarina Alves (University College London), Deep Chatterjee (UIUC), Emille Ishida (Université Clermont-Ferrand), Heather Kelly (SLAC), John Franklin Crenshaw (U. Washington), Konstantin Malanchev (UIUC), Laura Salo (UMN), Maria Vincenzi (ICG Portsmouth), Martine Lokken (U. Toronto), Qifeng Cheng (UIUC), Rahul Biswas (Oskar Klein Center), Renée Hložek (U. Toronto), Rick Kessler (U. Chicago), Robert Knop (LBNL), Ved Shah Gautam (UIUC)

Brokers:

Timeline

ELAsTiCC2

2023 May 20: Pre-streaming for plumbing tests with updated alert schema starts.
2023 June 6: ELAsTiCC2 training set released.
2023 July 6: Second pre-streaming for plumbing starts.
2023 July 10-18: Updated ELAsTiCC2 training set released.
2023 Oct 17: Final ELAsTiCC2 training set released.
2023 Nov 11-Dec 16: ELAsTiCC2 streaming

Original ELAsTiCC Campaign

2022 May 18: Training set released.
2022 June 17 (moved from June 15): A small set of practice alerts sent out for testing infrastructure and validation.
2022 June 25: Training set updated.
Wed June 29 - Fri July 8: Second 10% alert stream for connection testing.
Wed July 13 - Fri July 22: Third 10% alert stream for connection testing.
Wed July 27 - Fri Aug 5: Fourth 10% alert stream for connection testing.
Wed Aug 10 - Fri Aug 19: Fifth 10% alert stream for connection testing.
Aug/Sep: Additional 10% test alert streams as necessary.
2022 September 28 (moved from September 21): ELAsTiCC campaign alert streaming begins.
2022 October 6-16: Streaming interrupted as a result of system problems at NERSC.
Sat 2023 Jan 14: ELAsTiCC streaming complete.
Mon 2023 Jan 23: End of listening to broker kafka streams, broker posting; truth tables released.

The DESC TOM

The DESC TOM is a web server based on Django and the TOM Toolkit that sent out the simulated ELAsTiCC alerts and collected all of the classifications from the brokers. Some of the data access described below (direct web access, and the API access covered in the documentation and example Jupyter scripts) requires an account on the TOM. If you do not already have one, contact Rob Knop on the LSST slack at #elasticc-comms.

Some relevant pages on the DESC TOM:

The ELAsTiCC Challenge — includes links to metrics and diagnostics from the first ELAsTiCC campaign that ran from Sep 2022-Jan 2023.
ELAsTiCC2: includes links to metrics, and some diagnostics, from the ELAsTiCC2 campaign that ran from Nov-Dec 2023.

To get access to the database behind the DESC TOM (via web APIs, and directly via SQL), see Accessing classification results and metrics below.

ELAsTiCC & ELAsTiCC2 Data Sets

The ELAsTiCC and ELAsTiCC2 data sets each include SNANA simulated photometry of ~4 million transient and variable objects. Some objects (AGN, and especially variable stars) are underrepresented, as the focus of ELAsTiCC was photometric identification of different types of transients. The ELAsTiCC2 data set includes ~50 million detections ("sources") and ~400 million photometry points ("forced sources", some of which are redundant with sources). The simulation was a photometry-level simulation, not a pixel-level simulation, so there is no pixel data, and there is no simulated uncertainty on the RA and Dec of detected objects. Host galaxies were simulated, and each object includes zero to two possible hosts.

In most cases, you will want to use the ELAsTiCC2 data set. It uses a more current simulated LSST cadence (baseline 3.2, including a rolling cadence in years 2-3, and including DDF fields), and some models were updated between ELAsTiCC and ELAsTiCC2.

ELAsTiCC and ELAsTiCC2 data are both stored in the database behind the DESC TOM; see Accessing classification results and metrics below for information about this.

ELAsTiCC

The original ELAsTiCC data set is available in alert format, which is not the most convenient format for most uses. If you require the SNANA FITS files, we may be able to dig them up, but for most cases just use the ELAsTiCC2 data set which is already in that format.

ELAsTiCC alerts can be found at NERSC:
/global/cfs/cdirs/desc-td/ALERTS/ELASTICC_ALERTS_FINAL

ELAsTiCC2

The ELAsTiCC2 data set is available as SNANA FITS files, including HEAD and PHOT files. These may be read as standard FITS tables, but of course to really use them you need to know something about the format that SNANA writes.

ELAsTiCC2 data can be found at NERSC:
/global/cfs/cdirs/desc-td/ELASTICC2
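
As a rough illustration of what "knowing the format" means in practice, the sketch below shows how one object's photometry might be pulled out of a PHOT table once both tables have been read. It assumes the usual SNANA convention that each HEAD row carries PTROBS_MIN and PTROBS_MAX columns holding 1-based, inclusive row numbers into the matching PHOT table; check that assumption against the files themselves before relying on it. The function name and the toy rows are made up for this example and are not real ELAsTiCC2 data.

```python
def lightcurve_rows(phot_rows, ptrobs_min, ptrobs_max):
    """Return the PHOT rows belonging to one object.

    Assumption: ptrobs_min and ptrobs_max are 1-based, inclusive row
    numbers into the PHOT table, taken from one row of the HEAD table.
    """
    # Convert the 1-based inclusive range to a Python slice.
    return phot_rows[ptrobs_min - 1:ptrobs_max]


# Toy stand-ins for a PHOT table and a HEAD table (illustrative values only).
phot = [
    {"MJD": 60305.1, "BAND": "g"},
    {"MJD": 60306.2, "BAND": "r"},
    {"MJD": 60310.3, "BAND": "i"},
]
head = [
    {"SNID": 1, "PTROBS_MIN": 1, "PTROBS_MAX": 2},
    {"SNID": 2, "PTROBS_MIN": 3, "PTROBS_MAX": 3},
]

# Map each object ID to its own photometry rows.
lightcurves = {
    row["SNID"]: lightcurve_rows(phot, row["PTROBS_MIN"], row["PTROBS_MAX"])
    for row in head
}
```

With real files, the lists above would be replaced by the tables returned by whatever FITS reader you use; the slicing logic stays the same.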

Training Sets

Before each ELAsTiCC campaign, brokers were sent a "training set" of lightcurves to use for training their models. These training sets were not identical in composition to the actual data set; this was intentional, because when the real LSST survey starts, brokers will not have been trained on data that is identical to the data they will be receiving. (If anything, the training sets in ELAsTiCC were too close to the actual data sets in comparison to anything we'll have before the start of LSST and the first year of LSST data.) The cadence was at least slightly different, and some models were updated between the production of the training set and the actual data sets. The ELAsTiCC2 training set is closer to the final ELAsTiCC2 data set than was the case for ELAsTiCC.

For your purposes, you may wish to ignore these training sets and instead divide the actual ELAsTiCC2 data set (above) into training and validation sets yourself.
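
One way to do such a split is sketched below. It is only a suggestion, not how the ELAsTiCC team divided anything: it splits by object ID, so that all photometry for a given object lands on the same side, and it uses a fixed seed so the split is reproducible. The function name, the 20% validation fraction, and the seed are arbitrary choices made for this example.

```python
import random


def split_objects(object_ids, validation_fraction=0.2, seed=42):
    """Split object IDs into (training, validation) sets, reproducibly."""
    ids = sorted(set(object_ids))        # fixed starting order
    rng = random.Random(seed)            # private RNG, does not touch global state
    rng.shuffle(ids)
    n_val = int(round(len(ids) * validation_fraction))
    return set(ids[n_val:]), set(ids[:n_val])


# Example with made-up object IDs 0..999.
train_ids, val_ids = split_objects(range(1000))
```

Splitting on objects rather than on individual sources or alerts avoids having the same lightcurve contribute to both the training and the validation set.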

ELAsTiCC

The training samples can be found at https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES/
For a single 7.3GiB file with all of the training files, download FULL_ELASTICC_TRAIN.tar
If you have an account at NERSC, you can find these files in /global/cfs/cdirs/lsst/www/DESC_TD_PUBLIC/ELASTICC/TRAINING_SAMPLES
(The previous training sample, prior to the June 25 update, can be found in the obsolete_2022-05-18 subdirectory.)

Format of training set files

The format of the training set files is outlined in the file A_FORMAT.TXT (found in the same directory as the training set). A log of the models produced by the SNANA simulation is in the file A_MODEL_SUMMARY.TXT.

This Jupyter notebook has a demo of using the ELAsTiCC photo-z quantiles.

ELAsTiCC2

The ELAsTiCC2 training sample may be found at https://portal.nersc.gov/cfs/lsst/DESC_TD_PUBLIC/ELASTICC/ELASTICC2_TRAINING_SAMPLE_2 (which is also accessible directly on NERSC in the directory /global/cfs/cdirs/lsst/www/DESC_TD_PUBLIC/ELASTICC/ELASTICC2_TRAINING_SAMPLE_2). Meta information can be found in the A_FORMAT.TXT and A_MODEL_SUMMARY.TXT files. The training set is available in a few different formats:

In the ELASTICC2_TRAIN_* subdirectories are SNANA FITS files, divided by model. This format will be the most convenient format to use for most purposes. Truth tables for each model can be found in the *.DUMP file in the model's subdirectory.
The 7.4GiB file ELASTICC2_TRAIN_02.tar.bz2 has the contents of all those directories packed into a single tar file, so you can download it all at once if you need to. (If you're on NERSC, there's no need to do this; just point to the directory mentioned above.)
In the AVRO subdirectory you can find the training set stored as AVRO alerts. For most purposes, this is not a convenient format; it was provided so that brokers might send the training set through the same machinery that they would use for actual alerts during the ELAsTiCC2 campaign.

Truth Tables

Truth tables are available in the database behind the DESC TOM; for more information, see Accessing classification results and metrics. Additionally, they may be found in the files described below.

ELAsTiCC

The following CSV files hold the ELAsTiCC truth tables:

The "OBJECT" truth tables have truth for each object; the column SNID corresponds to the field diaObjectId from the alerts. The "ALERT" truth tables have information for each source (there was one alert for each source); the column SourceID corresponds to the field diaSourceId from the alerts. The object type is in the GENTYPE (for object alerts) or TRUE_GENTYPE (for source alerts). These do not correspond directly to the taxonomy brokers used to classify objects, but are internal types corresponding to SNANA models. The definitions of these types may be found in the file elasticc_origmap.txt in the alert_schema subdirectory of the elasticc GitHub archive.

Broker classifications used the ELAsTiCC Taxonomy (which was different from the ELAsTiCC2 taxonomy!). The following CSV files hold the mapping between SNANA gentype and taxonomy id (they are dumps of tables from the DESC TOM database):

gentypeofclassid.csv : Use this to find the gentypes that correspond to a given classId (i.e. taxonomy classification). gentype is blank for one- and two-digit classIds (general categories). Each gentype appears only once, but classIds appear multiple times (as there are multiple different gentypes, i.e. SNANA models, for some classes, such as SNIb/c and SII). This is also the table you want to use to get the exact three-digit taxonomy match for a given gentype.
classidofgentype.csv : Use this to find all the classIds that correspond to a given gentype. Gentypes appear multiple times in this table because of the hierarchical nature of the taxonomy. For instance, gentype 10 (SNIa) appears with all of classId 1 (Non-Recurring), 11 (SN-like), and 111 (SNIa).
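
The SNIa example suggests that, in the original ELAsTiCC taxonomy, the broader categories of a three-digit classId are simply its leading digits. The sketch below makes that explicit. It generalizes from the single example given here (1, 11, 111), so treat it as an assumption to check against the CSV files; the function name is made up for this illustration.

```python
def broader_classids(class_id):
    """List a classId together with the broader categories above it.

    Assumption: each broader category is a leading-digit prefix of the
    specific classId, as in 111 (SNIa) -> 11 (SN-like) -> 1 (Non-Recurring).
    """
    digits = str(class_id)
    return [int(digits[:n]) for n in range(1, len(digits) + 1)]


# classIds expected on the rows for gentype 10 (SNIa) in classidofgentype.csv
snia_classids = broader_classids(111)   # [1, 11, 111]
```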

There were some types of objects that were in the ELAsTiCC set that were deliberately not in the training set. These have SNANA GENTYPE 71-74 and 98. 71-74 represent strongly lensed SN Ia/II/Ib/Ic, and 98 is...special. Here is a note by Rick Kessler and Justin Pierel describing the strongly lensed SNe.

ELAsTiCC2

In addition to being stored in the DESC TOM database, ELAsTiCC2 truth tables may be found in the *.DUMP files alongside the SNANA FITS files with the ELAsTiCC2 Data Set.

ELAsTiCC2 Classification Taxonomy

ELAsTiCC2 used a hierarchical classification taxonomy with broad classes and specific classes. In practice, most brokers classified only to specific classes, but some brokers used some of the broad classes, and they were there if somebody needed them. The design of the taxonomy also allows us to group subclasses directly into a broad class by only looking at the first digits of a classification.

The classification taxonomy may be found in the Jupyter notebook that generated the ids, but is also listed here for convenience:

 Alert
├── 0 Meta
│   ├── 100 Meta/Other
│   ├── 200 Residual
│   └── 300 NotClassified
├── 1000 Static
│   └── 1100 Static/Other
└── 2000 Variable
    ├── 2100 Variable/Other
    ├── 2200 Non-Recurring
    │   ├── 2210 Non-Recurring/Other
    │   ├── 2220 SN-like
    │   │   ├── 2221 SN-like/Other
    │   │   ├── 2222 Ia
    │   │   ├── 2223 Ib/c
    │   │   ├── 2224 II
    │   │   ├── 2225 Iax
    │   │   └── 2226 91bg
    │   ├── 2230 Fast
    │   │   ├── 2231 Fast/Other
    │   │   ├── 2232 KN
    │   │   ├── 2233 M-dwarf Flare
    │   │   ├── 2234 Dwarf Novae
    │   │   └── 2235 uLens
    │   └── 2240 Long
    │       ├── 2241 Long/Other
    │       ├── 2242 SLSN
    │       ├── 2243 TDE
    │       ├── 2244 ILOT
    │       ├── 2245 CART
    │       └── 2246 PISN
    └── 2300 Recurring
        ├── 2310 Recurring/Other
        ├── 2320 Periodic
        │   ├── 2321 Periodic/Other
        │   ├── 2322 Cepheid
        │   ├── 2323 RR Lyrae
        │   ├── 2324 Delta Scuti
        │   ├── 2325 EB
        │   └── 2326 LPV/Mira
        └── 2330 Non-Periodic
            ├── 2331 Non-Periodic/Other
            └── 2332 AGN

To connect broker classifications to truth tables, you need the mapping from this taxonomy class ID to the SNANA "gentype". These mappings may be found in the DESC TOM database, but are also provided below:

elasticc2_gentypeofclassid.csv is the one to use if you want the specific classid that corresponds to a given gentype. (Several gentypes may share the same classid, because what's considered a single class in the taxonomy may have multiple SNANA models behind it.) In this table, where not blank, the gentype is unique. (Where gentype is blank, classid is a category rather than a specific type; every SNANA object has a specific type, because it came from some specific model.) You can also use this table the other way: for a given specific classid, which gentypes does it correspond to? (If you want to handle the broader classifications, you need the next table.)
elasticc2_classidofgentype.csv is the one you would use to find all values of gentype that are consistent with a given classid. So, for example, every supernova gentype is on a row with classid 2220. The various boolean "match" columns are left as an exercise for the alert reader. (Hint: compare them to when the classid has a 0 digit in various places.)
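
As a rough sketch of how the first of these tables might be loaded into lookup dictionaries, consider the code below. The column names (classid, gentype) and every row in the sample are placeholders invented for this example; inspect the header of the real CSV file and adjust accordingly.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for elasticc2_gentypeofclassid.csv (not real mapping data).
SAMPLE = """classid,gentype
2220,
2222,10
2222,11
2223,20
"""


def load_mapping(fileobj):
    """Build gentype -> classid and classid -> [gentypes] lookups."""
    gentype_to_classid = {}
    classid_to_gentypes = defaultdict(list)
    for row in csv.DictReader(fileobj):
        if not row["gentype"]:
            continue  # blank gentype: a broad category, not a specific type
        gentype = int(row["gentype"])
        classid = int(row["classid"])
        gentype_to_classid[gentype] = classid
        classid_to_gentypes[classid].append(gentype)
    return gentype_to_classid, dict(classid_to_gentypes)


gentype_to_classid, classid_to_gentypes = load_mapping(io.StringIO(SAMPLE))
```

A plain dictionary works for the gentype lookup because each non-blank gentype appears only once in this table, whereas a classid may need a list of gentypes.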

ELAsTiCC2 Alert Schema

These are the alert schema used in ELAsTiCC2.

The alert schema can be found in the alert_schema subdirectory of the LSSTDESC/elasticc github repository: https://github.com/LSSTDESC/elasticc/tree/main/alert_schema.

Brokers ingested alerts in the elasticc.v0_9_1.alert.avsc format. (A perusal of the schema will reveal that some of the other schema in that directory are embedded in this.) They issued their classification alerts, which DESC then ingested, in the elasticc.v0_9_1.brokerClassification.avsc schema. The mapping of event type to classId can be found in a Jupyter notebook in the taxonomy subdirectory of the github archive.

All alerts were published without embedded schema on Kafka servers (both to and from brokers). As such, for things to work, everybody had to be using the same version of the alerts. (The alert format and schema remained consistent throughout the running of the ELAsTiCC2 campaign.)

Forced Photometry in Alerts

The first detection of a transient will not have any forced photometry. The model is that the project will need time to produce that forced photometry.

All detections at least one night later than the first detection will have forced photometry going back to 30 days before the first detection.

For example, suppose object 42 is detected on MJD 60305, 60306, 60310, and 60340:

The alert for the detection on MJD 60305 will only have the source information for that detection.
The alert for the detection on MJD 60306 will have the source information for the detections on 60305 and 60306. It will also have forced photometry for any images taken between MJD 60275 and 60306.
...
The alert for the detection on MJD 60340 will have the source information for the detections on 60305, 60306, 60310, and 60340. It will also have forced photometry for any images taken between MJD 60275 and 60340.
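
The rule in this example can be written down compactly. The sketch below is a simplified model of it, not the code that generated the alerts: it treats "at least one night later" as "at least one MJD later", which is an assumption, and the function name is invented for this illustration.

```python
def alert_contents(detection_mjds, alert_mjd, lookback_days=30):
    """Return (source MJDs, forced-photometry window) for one alert.

    The window is None for alerts issued less than one day after the
    first detection; otherwise it runs from lookback_days before the
    first detection up to the MJD of the alert.
    """
    detections = sorted(detection_mjds)
    first = detections[0]
    # Every detection up to and including this alert is reported as a source.
    sources = [mjd for mjd in detections if mjd <= alert_mjd]
    if alert_mjd - first >= 1:
        window = (first - lookback_days, alert_mjd)
    else:
        window = None
    return sources, window


# Object 42 from the example above.
detections = [60305, 60306, 60310, 60340]
first_alert = alert_contents(detections, 60305)
second_alert = alert_contents(detections, 60306)
last_alert = alert_contents(detections, 60340)
```

Here first_alert carries a single source and no forced-photometry window, while second_alert and last_alert both have windows starting at MJD 60275.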

Accessing classification results and metrics

You will need an account on the DESC TOM to do this. (See above.)

You can find some metrics by going directly to a couple of TOM pages in your browser:

Using the DESC TOM elasticc2 REST API

Because some of these tables are quite large, queries joining them together can be slow. For ELAsTiCC2, broker classifications have been aggregated in ways that are useful for some metrics. For documentation on the aggregation and how to get access to it, see this Jupyter Notebook in the DESC ELAsTiCC metrics github archive.

Directly via SQL

Technically, this isn't possible, because the PostgreSQL database behind the TOM can't be addressed directly. However, there are a couple of APIs on the TOM that allow you to send SQL that will be run on the PostgreSQL server, and pull back the results. For documentation and examples, see the Jupyter notebook sql_query_tom_db.ipynb in the github DESC TOM repository.
