Fireworks¶

FireWorks is a free, open-source code for defining, managing, and executing complex scientific workflows.

Strengths of FireWorks¶

Well-suited to NERSC-- we have many FireWorks users
Easy to install via conda
Server/worker model is inherently load-balancing
Extremely flexible and able to handle complex job structures
Friendly dashboard to display workflow status
Can handle jobs that contain MPI
Interfaces well with SLURM
Can run jobs on CPU or GPU

Disadvantages of FireWorks¶

Requires a database which must be set up by NERSC staff (contact help.nersc.gov)
FireWorks requires some time to learn

Overview of FireWorks¶

FireWorks uses a centralized server model where the server manages the workflows and workers run the jobs. It can handle complex job dependencies, restart jobs that have died, and track the workflow status on a friendly dashboard. It can be used via bash scripting or via a Python API. We will cover using FireWorks via bash shell here. You can find more information in the official FireWorks documentation.

overview_of_fireworks_dashboard

To use FireWorks you must first request a mongoDB database. Please use the database creation form here to request your mongoDB database.

If you need more help with FireWorks itself please open a separate ticket with NERSC consulting at help.nersc.gov.

Terminology¶

FireWorks uses a number of terms to describe the different parts of the workflow manager:

LaunchPad/FireServer: MongoDB that controls the workflow. It contains all the tasks to be run, and whether they have run successfully etc.
FireTask: computing tasks to be performed.
FireWork: list of FireTasks.
Rocket: fetches a FireWork from the LaunchPad and runs it. Could be run on a separate machine (FireWorker) or through a batch system (in this case, SLURM).

Creating your Fireworks Environment¶

We recommend that anyone wanting to use FireWorks at NERSC install it in a conda environment.

Create a new conda environment for FireWorks

module load python
conda create -n fireworks python=3.8 -y
source activate fireworks
conda install -c conda-forge fireworks

For more information about using conda environments at NERSC, check out this page.

Setting up your LaunchPad¶

Setting up the Launchpad can be done interactively using the command lpad init, similar to the example shown below. You will need to specify the name of the database host and your own username and password - note that you need to have admin access to the database, so make sure you use the administrators username/password with which you were provided.

On a Cori login node, navigate to the directory where you will issue your fireworks commands. In this example, we are working from $HOME/fw_test. Make sure you have initialized your FireWorks environment, for example via source activate fireworks.

How to configure your LaunchPad

lpad init

Please supply the following configuration values
(press Enter if you want to accept the defaults)

Enter host (default: localhost) : mongodb03
Enter port (default: 27017) :
Enter name (default: fireworks) : my_db
Enter username (default: None) : my_db_admin
Enter password (default: None) : my_pswd

This information will go into a file named "my_launchpad.yaml". This file should be located in the directory where you are issuing FireWorks commands.

Basic example¶

Below is an example of how to use FireWorks at NERSC. This is based heavily on the Fireworks tutorial which you will find here.

spec:
  _tasks:
  - _fw_name: ScriptTask
    script: echo "howdy, your job launched successfully!" >> howdy.txt

Here all we do is write a file called howdy.txt, but in your real workload you can substitute in the name of your script.

Singleshot example

Now let's add our fw_test.yaml to the launchpad and run it:

lpad add fw_test.yaml
rlaunch singleshot

The output should look like:

(fireworks) stephey@cori10:~/fw_test> rlaunch singleshot
2021-02-24 19:57:04,135 INFO Hostname/IP lookup (this will take a few seconds)
2021-02-24 19:57:04,136 INFO Launching Rocket
2021-02-24 19:57:04,223 INFO RUNNING fw_id: 1 in directory: /global/u1/s/stephey/fw_test
2021-02-24 19:57:04,232 INFO Task started: ScriptTask.
2021-02-24 19:57:04,240 INFO Task completed: ScriptTask 
2021-02-24 19:57:04,257 INFO Rocket finished

Rapidfire example

One task isn't very interesting. Let's try again with two tasks:

lpad add fw_test.yaml
lpad add fw_test.yaml

Note: adding the FireTask again will result in a second, identical job. Now let's run in rapidfire mode instead of singleshot mode:

rlaunch rapidfire

In this mode, FireWorks will launch tasks until it runs out. The output data (output files and job status) is given in launcher_* directories.

The output should look like:

(fireworks) stephey@cori07:~> rlaunch rapidfire
2021-02-22 13:18:26,274 INFO Hostname/IP lookup (this will take a few seconds)
2021-02-22 13:18:26,351 INFO Created new dir /global/u1/s/stephey/launcher_2021-02-22-21-18-26-349075
2021-02-22 13:18:26,351 INFO Launching Rocket
2021-02-22 13:18:26,370 INFO RUNNING fw_id: 2 in directory: /global/u1/s/stephey/launcher_2021-02-22-21-18-26-349075
2021-02-22 13:18:26,378 INFO Task started: ScriptTask.
2021-02-22 13:18:26,385 INFO Task completed: ScriptTask 
2021-02-22 13:18:26,402 INFO Rocket finished
2021-02-22 13:18:26,406 INFO Created new dir /global/u1/s/stephey/launcher_2021-02-22-21-18-26-406464
2021-02-22 13:18:26,406 INFO Launching Rocket
2021-02-22 13:18:26,423 INFO RUNNING fw_id: 3 in directory: /global/u1/s/stephey/launcher_2021-02-22-21-18-26-406464
2021-02-22 13:18:26,428 INFO Task started: ScriptTask.
2021-02-22 13:18:26,436 INFO Task completed: ScriptTask 
2021-02-22 13:18:26,453 INFO Rocket finished

Submitting FireWorks jobs via SLURM¶

You are probably most interested in how to use FireWorks to launch jobs at NERSC via SLURM. We will use an example to demonstrate how this works. You can read more about how to launch FireWorks jobs in a queue in the official FireWorks documentation.

You will need the following:

A queue-adapter for SLURM, my_qadapter.yaml
The script you want to run, wrapped in fw_test.yaml

You will also need to be in the directory where you have configured your LaunchPad (i.e. did lpad init). You can always move your my_launchpad.yaml file or re-configure your launchpad.

Here is an example of the SLURM queue adapter. You can copy this and save it as my_qadapter.yaml.

_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -l my_launchpad.yaml rapidfire
ntasks: 1
cpus_per_task: 1
ntasks_per_node: 1
walltime: '00:02:00'
queue: debug
constraint: haswell
account: null
job_name: null
logdir: null
pre_rocket: null
post_rocket: null

You can read more about FireWorks queue adapters here.

Launching a FireWorks job via SLURM

lpad reset
lpad add fw_test.yaml
qlaunch singleshot

The result should look something like:

(fireworks) stephey@cori10:~/fw_test> qlaunch singleshot
2021-02-22 21:16:43,317 INFO moving to launch_dir /global/u1/s/stephey/fw_test
2021-02-22 21:16:43,318 INFO submitting queue script
2021-02-22 21:16:45,847 INFO Job submission was successful and job_id is 39771362

This means your my_fwork.yaml was submitted successfully to SLURM.

To submit more tasks with a 3 second pause between them, you can do the following:

lpad reset
lpad add fw_test.yaml
lpad add fw_test.yaml
lpad add fw_test.yaml
qlaunch rapidfire -m 3

To submit tasks in "infinite mode" which maintains 2 jobs in the queue until the LaunchPad is empty:

lpad reset
lpad add fw_test.yaml
lpad add fw_test.yaml
lpad add fw_test.yaml
lpad add fw_test.yaml
qlaunch rapidfire -m 2 --nlaunches infinite

You can find more information about launching FireWorks jobs via a queue here.

For information about how to check the status of your FireWorks tasks or workflows, please see here.

Display the FireWorks dashboard¶

The FireWorks dashboard can be viewed on your laptop/desktop if you connect to Cori with ssh port forwarding. For example, connecting using an sshproxy key:

ssh -L 5000:localhost:5000 -i ~/.ssh/nersc cori.nersc.gov

Once you're on Cori, navigate to the directory where you have initialized your launchpad and start up the FireWorks webgui:

module load python
source activate fireworks
cd $HOME/fw_test
lpad webgui

This will open a summary-- you can hit q to exit. Your console should display

(fireworks) stephey@cori06:~/fw_test> lpad webgui
 * Serving Flask app "fireworks.flask_site.app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Leave this window running. You can CTRL+C to kill your FireWorks dashboard when you're done.

Since you've forwarded port 5000 from Cori to port 5000 on your local machine, you can open a browser and navigate to

http://127.0.0.1:5000/

You should be able to see and interact with the FireWorks dashboard tracking your jobs on Cori.

screenshot_of_fireworks_dashboard

Running a FireWorks GPU job via SLURM¶

Not all NERSC users can do this example

This FireWorks GPU example is for users with corigpu access only. It may also be useful to anyone preparing to run their FireWorks code on Perlmutter.

Prepare our FireWorks conda environment for GPU work

module load cgpu
module load cuda/11.1.1
module load python
source activate fireworks
pip install cupy-cuda111

GPU CuPy program we'll run cupy_eigh.py

import numpy as np
import cupy as cp

def cupy_eigh(input_data,precision):

    x = cp.asarray(input_data, dtype=precision)
    w,v = cp.linalg.eigh(x)

    #move back to cpu
    w_cpu = cp.asnumpy(w)

    return w_cpu

asize = 1000
rng = np.random.default_rng(seed=42)
randarray = rng.random((asize, asize))

input_data = randarray
precision = 'float64'

w_cpu = cupy_eigh(input_data, precision)

print(w_cpu)

We'll adjust our FireWork fw_test_gpu.yaml to run our GPU test workload:

spec:
  _tasks:
  - _fw_name: ScriptTask
    script: srun -n 1 python cupy_eigh.py

You'll need to change your my_qadapter.yaml to allocate gpu resources.

_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -l my_launchpad.yaml rapidfire
constraint: gpu
ntasks: 1
cpus_per_task: 10
gpus_per_task: 1
account: m1759
walltime: '00:02:00'
queue: gpu_regular
job_name: null
logdir: null
pre_rocket: null
post_rocket: null

Edit the FireWorks SLURM template to include the SLURM GPU settings

The template is located at:

$CONDA_PREFIX/lib/python3.8/site-packages/fireworks/user_objects/queue_adapters/SLURM_template.txt

You'll need to add the line #SBATCH --gpus-per-task=$${gpus_per_task}:

#!/bin/bash -l

#SBATCH --nodes=$${nodes
#SBATCH --ntasks=$${ntasks}
#SBATCH --ntasks-per-node=$${ntasks_per_node}
#SBATCH --ntasks-per-core=$${ntasks_per_core}
#SBATCH --core-spec=$${core_spec}
#SBATCH --exclude=$${exclude_nodes}
#SBATCH --cpus-per-task=$${cpus_per_task}
#SBATCH --gpus-per-task=$${gpus_per_task}
#SBATCH --gres=$${gres}
#SBATCH --qos=$${qos}
#SBATCH --time=$${walltime}
#SBATCH --time-min=$${time_min}
#SBATCH --partition=$${queue}
#SBATCH --account=$${account}
#SBATCH --job-name=$${job_name}
#SBATCH --license=$${license}
#SBATCH --output=$${job_name}-%j.out
#SBATCH --error=$${job_name}-%j.error
#SBATCH --constraint=$${constraint}
#SBATCH --signal=$${signal}
#SBATCH --mem=$${mem}
#SBATCH --mem-per-cpu=$${mem_per_cpu}
#SBATCH --mail-type=$${mail_type}
#SBATCH --mail-user=$${mail_user}


$${pre_rocket}
cd $${launch_dir}
$${rocket_launch}
$${post_rocket}

# CommonAdapter (SLURM) completed writing Template

Run our gpu test program in singleshot mode

lpad rest
lpad add fw_test_gpu.yaml
qlaunch singleshot

The output should look like:

(fireworks) stephey@cori10:~/fw_test> qlaunch singleshot
2021-02-24 21:32:10,943 INFO moving to launch_dir /global/u1/s/stephey/fw_test
2021-02-24 21:32:10,953 INFO submitting queue script
2021-02-24 21:32:11,084 INFO Job submission was successful and job_id is 1527948

If you have any questions or problems using FireWorks at NERSC, please contact us at help.nersc.gov.