Testing zppy

Follow the steps below to test zppy. As you do so, please produce a Markdown report summarizing your results.

Step 1: Determine what the current expected results are

Machine-specific setup

Chrysalis:

expected_results_dir=/lcrc/group/e3sm/public_html/zppy_test_resources
expected_results_records_dir=/lcrc/group/e3sm/public_html/zppy_test_resources_previous

Compy:

expected_results_dir=/compyfs/www/zppy_test_resources
expected_results_records_dir=/compyfs/fors729/zppy_test_resources_previous

Note that Compy doesn’t give write access to /compyfs/www/, so we can’t add a new directory there. That’s why zppy_test_resources_previous is in a separate path.

Perlmutter:

expected_results_dir=/global/cfs/cdirs/e3sm/www/zppy_test_resources
expected_results_records_dir=/global/cfs/cdirs/e3sm/www/zppy_test_resources_previous

Process

ls -lt ${expected_results_dir}

In your Markdown report, note the date the expected results were last updated.
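If you prefer a scripted check, the following is a minimal Python sketch of what ls -lt is used for here: it finds the newest modification time among the entries in expected_results_dir. The function name last_updated is hypothetical. It assumes the newest entry's modification time is a good proxy for when the expected results were last updated.

```python
import os
from datetime import datetime, timezone

# expected_results_dir for each machine, from the machine-specific setup above.
EXPECTED_RESULTS_DIRS = {
    "chrysalis": "/lcrc/group/e3sm/public_html/zppy_test_resources",
    "compy": "/compyfs/www/zppy_test_resources",
    "perlmutter": "/global/cfs/cdirs/e3sm/www/zppy_test_resources",
}


def last_updated(directory: str) -> datetime:
    """Return the newest modification time (UTC) among the entries in `directory`.

    Like `ls -lt`, this uses each entry's own mtime without following symlinks.
    Unlike plain `ls`, hidden entries are also considered.
    """
    with os.scandir(directory) as entries:
        mtimes = [entry.stat(follow_symlinks=False).st_mtime for entry in entries]
    if not mtimes:
        raise ValueError(f"{directory} is empty")
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
```

For example, on Chrysalis: last_updated(EXPECTED_RESULTS_DIRS["chrysalis"]).date().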

Step 2: Review changes since expected results were updated

Now that we know the date the expected results are from, we can review what changes we’ll be testing.

Review each of the following commit logs and note commits made since the date the expected results were updated:

For the e3sm_to_cmip task: e3sm_to_cmip
For the e3sm_diags task: e3sm_diags
For the mpas_analysis task: MPAS-Analysis
For the global_time_series and pcmdi_diags tasks: zppy-interfaces
For zppy itself: zppy

For the remaining tasks (climo, ts, tc_analysis, ilamb, livvkit), we typically just use the associated package’s latest release rather than making dev environments. As such, their latest development will have no impact on our tests unless we have started using one of their newer releases.

In your Markdown report, make a table like:

| Package | Changes since expected results were updated |
| --- | --- |
| [package name](link to package's commit log) | Links to all PRs merged since the expected results were updated |
...
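A minimal Python sketch of filling in that table, assuming you have already collected the merge date and URL of each PR per package (for example, by hand from the commit logs). The function name and input layout are hypothetical; the commit-log URLs are the ones listed in Step 3.

```python
from datetime import date
from typing import Dict, List, Tuple

# (merge date, PR URL) pairs collected for one package.
PRList = List[Tuple[date, str]]


def changes_table(packages: Dict[str, Tuple[str, PRList]], since: date) -> str:
    """Build the Step 2 Markdown table from PRs merged on or after `since`.

    PRs merged on the same day as the update are kept, since they may postdate it.
    """
    lines = [
        "| Package | Changes since expected results were updated |",
        "| --- | --- |",
    ]
    for name, (log_url, prs) in packages.items():
        recent = [url for merged, url in sorted(prs) if merged >= since]
        cell = ", ".join(recent) if recent else "None"
        lines.append(f"| [{name}]({log_url}) | {cell} |")
    return "\n".join(lines)
```

A package with no qualifying PRs gets "None" in its cell, so every package you reviewed still appears in the report.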

Step 3: Set up environments for called packages

Machine-specific setup

Chrysalis:

repo_parent_dir=~/ez/ # Or wherever you keep your repos

start_bash_subshell()
{
    # Interactive subshell that loads your aliases, etc.; `exit` discards its environment changes.
    bash --rcfile ~/.bashrc # Or wherever you have your aliases, etc. defined
}

activate_dev_env()
{
    env_name=$1

    lcrc_conda # Or however you activate conda
    rm -rf build # Remove stale build artifacts from earlier installs
    conda clean --all --yes
    conda env create -f conda/dev.yml -n ${env_name}
    conda activate ${env_name}
    pre-commit run --all-files # Confirm this passes
    python -m pip install .
}

activate_unified_env()
{
    source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh
}

Compy:

repo_parent_dir=~/ez/ # Or wherever you keep your repos

start_bash_subshell()
{
    # Interactive subshell that loads your aliases, etc.; `exit` discards its environment changes.
    bash --rcfile ~/.bash_profile # Or wherever you have your aliases, etc. defined
}

activate_dev_env()
{
    env_name=$1

    compy_conda # Or however you activate conda
    rm -rf build # Remove stale build artifacts from earlier installs
    conda clean --all --yes
    conda env create -f conda/dev.yml -n ${env_name}
    conda activate ${env_name}
    pre-commit run --all-files # Confirm this passes
    python -m pip install .
}

activate_unified_env()
{
    source /share/apps/E3SM/conda_envs/load_latest_e3sm_unified_compy.sh
}

Perlmutter:

repo_parent_dir=~/ez/ # Or wherever you keep your repos

start_bash_subshell()
{
    # Interactive subshell that loads your aliases, etc.; `exit` discards its environment changes.
    bash --rcfile ~/.bash_profile.ext # Or wherever you have your aliases, etc. defined
}

activate_dev_env()
{
    env_name=$1

    nersc_conda # Or however you activate conda
    rm -rf build # Remove stale build artifacts from earlier installs
    conda clean --all --yes
    conda env create -f conda/dev.yml -n ${env_name}
    conda activate ${env_name}
    pre-commit run --all-files # Confirm this passes
    python -m pip install .
}

activate_unified_env()
{
    source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh
}

Process

set_up_repo()
{
    repo_name=$1
    main_branch_name=$2

    cd ${repo_parent_dir}/${repo_name}
    git status # Check for uncommitted changes

    # If there are uncommitted changes,
    # commit them so we can move cleanly to a new branch:
    git add -A
    git commit -m "Checkpoint"

    git fetch upstream ${main_branch_name} # This assumes you've named your remote for the main repo as "upstream"
    git checkout ${main_branch_name}
    git reset --hard upstream/${main_branch_name}
    git log --oneline | head -n 1
    # Check that this matches the corresponding commit log:
    # https://github.com/E3SM-Project/e3sm_to_cmip/commits/master
    # https://github.com/E3SM-Project/e3sm_diags/commits/main
    # https://github.com/MPAS-Dev/MPAS-Analysis/commits/develop/
    # https://github.com/E3SM-Project/zppy-interfaces/commits/main

    # Activate EITHER a dev environment or the Unified env:
    # Dev environment -- test latest development
    # Unified environment -- test latest Unified environment
    activate_dev_env ${repo_name}-yyyymmdd # Use today's date
    # OR: activate_unified_env
}

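# Make the functions and variables defined above visible inside the subshells:
export repo_parent_dir
export -f activate_dev_env activate_unified_env set_up_repo
start_bash_subshell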
set_up_repo e3sm_to_cmip master
exit # Exit bash subshell

start_bash_subshell
set_up_repo e3sm_diags main
exit # Exit bash subshell

start_bash_subshell
set_up_repo MPAS-Analysis develop
exit # Exit bash subshell

start_bash_subshell
set_up_repo zppy-interfaces main
# Since zppy-interfaces is so integrated into `zppy`,
# run its unit tests as well:
pytest tests/unit/global_time_series/test_*.py
pytest tests/unit/pcmdi_diags/test_*.py
exit # Exit bash subshell
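The repository names, main branches, and commit-log URLs used in set_up_repo follow a single pattern. Below is a minimal Python sketch, with hypothetical helper names, that derives the commit-log URL to compare against the output of git log --oneline | head -n 1, along with the dated environment name (<repo_name>-yyyymmdd).

```python
from datetime import date
from typing import Optional

# (GitHub organization, main branch) for each repo, as listed in set_up_repo.
REPOS = {
    "e3sm_to_cmip": ("E3SM-Project", "master"),
    "e3sm_diags": ("E3SM-Project", "main"),
    "MPAS-Analysis": ("MPAS-Dev", "develop"),
    "zppy-interfaces": ("E3SM-Project", "main"),
    "zppy": ("E3SM-Project", "main"),
}


def commit_log_url(repo_name: str) -> str:
    """Return the GitHub commit-log URL for the repo's main branch."""
    org, branch = REPOS[repo_name]
    return f"https://github.com/{org}/{repo_name}/commits/{branch}"


def dated_env_name(repo_name: str, day: Optional[date] = None) -> str:
    """Return e.g. 'e3sm_diags-20250102'; defaults to today's date."""
    day = day or date.today()
    return f"{repo_name}-{day:%Y%m%d}"
```

In the shell, date +%Y%m%d prints the same suffix, e.g. activate_dev_env e3sm_diags-$(date +%Y%m%d).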

Step 4: Set up zppy environment

cd ${repo_parent_dir}/zppy
git status # Check for uncommitted changes

# If there are uncommitted changes,
# commit them so we can move cleanly to a new branch:
git add -A
git commit -m "Checkpoint"

git fetch upstream main # This assumes you've named your remote for the main repo as "upstream"
git checkout -b test-zppy-yyyymmdd upstream/main # Use today's date
git log --oneline | head -n 1
# Check that this matches the corresponding commit log:
# https://github.com/E3SM-Project/zppy/commits/main

start_bash_subshell
# Activate EITHER a dev environment or the Unified env:
# Dev environment -- test latest development
# Unified environment -- test latest Unified environment
activate_dev_env zppy-yyyymmdd # Use today's date
# OR: activate_unified_env

# Note the Python version being used
# If you activated dev env:
conda list python
# If you activated unified env:
pixi list python

# Run zppy unit tests
pytest tests/test_*.py # Confirm there are no failures (e.g., "44 passed in 0.87s")
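The pass count changes as tests are added, so what matters is that nothing fails. A minimal Python sketch, with hypothetical function names, that parses pytest's final summary line (for example, 44 passed in 0.87s) into counts so you can check for failures or errors:

```python
import re
from typing import Dict

# Matches fragments such as "44 passed", "1 failed", "2 errors", "3 warnings".
_COUNT = re.compile(
    r"(\d+) (passed|failed|errors?|skipped|xfailed|xpassed|warnings?|deselected)"
)


def parse_pytest_summary(line: str) -> Dict[str, int]:
    """Turn a pytest summary line into {outcome: count}."""
    counts: Dict[str, int] = {}
    for number, outcome in _COUNT.findall(line):
        # Normalize plural forms ("errors" -> "error", "warnings" -> "warning").
        key = outcome.rstrip("s") if outcome in ("errors", "warnings") else outcome
        counts[key] = counts.get(key, 0) + int(number)
    return counts


def all_passed(line: str) -> bool:
    """True if at least one test passed and none failed or errored."""
    counts = parse_pytest_summary(line)
    return (
        counts.get("passed", 0) > 0
        and not counts.get("failed")
        and not counts.get("error")
    )
```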

Step 5: Launch zppy jobs

Machine-specific setup

Chrysalis:

conda_setup_cmd="source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh"
output_dir=/lcrc/group/e3sm/ac.forsyth2

Compy:

conda_setup_cmd="source /qfs/people/fors729/miniforge3/etc/profile.d/conda.sh"
output_dir=/compyfs/fors729

Perlmutter:

conda_setup_cmd="source /global/homes/f/forsyth/miniforge3/etc/profile.d/conda.sh"
output_dir=/global/cfs/cdirs/e3sm/forsyth

Process

First, let’s edit tests/integration/utils.py.

In place of each {env_cmd}, put either ${conda_setup_cmd}; conda activate ${repo_name}-yyyymmdd (with the variables written out as their actual values, since this is a Python file) or the command from activate_unified_env. You may want to use dev environments for some tasks and the Unified environment for others.

You can comment out cfgs from cfgs_to_run to run fewer configuration files. Likewise, you can comment out tasks from tasks_to_run to run fewer tasks.

Be sure to set the unique_id; this allows us to avoid path name collisions.

TEST_SPECIFICS: Dict[str, Any] = {
    # This is the NCO path.
    # Keep as "" to use the production-version NCO commands.
    # Set to a specific path to use development-version NCO commands.
    "nco_path": "",
    # These are custom environment_commands for specific tasks.
    # Never set these to "", because they will print the line
    # `environment_commands = ""` for the corresponding task,
    # thus overriding the value set higher up in the cfg.
    # That is, there will be no environment set.
    # (`environment_commands = ""` only redirects to Unified
    # if specified under the [default] task)
    "e3sm_to_cmip_environment_commands": "{env_cmd}",
    "diags_environment_commands": "{env_cmd}",
    "mpas_analysis_environment_commands": "{env_cmd}",
    "global_time_series_environment_commands": "{env_cmd}",
    "livvkit_environment_commands": "{env_cmd}",
    "pcmdi_diags_environment_commands": "{env_cmd}",
    # This is the environment setup for other tasks.
    # Leave as "" to use the latest Unified environment.
    "environment_commands": "{env_cmd}",
    # For a complete test, run the set of latest cfgs and at least one set of legacy cfgs
    "cfgs_to_run": [
        "weekly_bundles", # Typically, we run on Chrysalis, Compy
        "weekly_comprehensive_v2", # Typically, we run on Chrysalis, Compy
        "weekly_comprehensive_v3", # Typically, we run on all 3 machines
        "weekly_legacy_3.1.0_bundles", # Typically, we run on Chrysalis
        "weekly_legacy_3.1.0_comprehensive_v2", # Typically, we run on Chrysalis
        "weekly_legacy_3.1.0_comprehensive_v3", # Typically, we run on Chrysalis
        "weekly_legacy_3.0.0_bundles", # Typically, we run on Chrysalis
        "weekly_legacy_3.0.0_comprehensive_v2", # Typically, we run on Chrysalis
        "weekly_legacy_3.0.0_comprehensive_v3", # Typically, we run on Chrysalis
    ],
    "tasks_to_run": [
        "e3sm_diags",
        "mpas_analysis",
        "global_time_series",
        "ilamb",
        "livvkit",
        "pcmdi_diags",
    ],
    "unique_id": "test_zppy_yyyymmdd", # Use today's date
}
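If you fill in {env_cmd} programmatically rather than by hand, the substitution amounts to the following minimal Python sketch. The helper names and the key-to-repo mapping are hypothetical; the conda and Unified commands are the Chrysalis values from the machine-specific setup, and the dev-environment names assume the <repo_name>-yyyymmdd convention from Step 3.

```python
from typing import Any, Dict, Optional

CONDA_SETUP_CMD = "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh"
UNIFIED_CMD = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh"

# Hypothetical mapping from TEST_SPECIFICS keys to the repo whose dev env serves them.
# None (or a key missing from this map) means "use the Unified environment".
DEV_REPOS: Dict[str, Optional[str]] = {
    "e3sm_to_cmip_environment_commands": "e3sm_to_cmip",
    "diags_environment_commands": "e3sm_diags",
    "mpas_analysis_environment_commands": "MPAS-Analysis",
    "global_time_series_environment_commands": "zppy-interfaces",
    "pcmdi_diags_environment_commands": "zppy-interfaces",
    "livvkit_environment_commands": None,
    "environment_commands": None,
}


def env_cmd(repo_name: Optional[str], yyyymmdd: str) -> str:
    """Return the dev-env activation command, or the Unified one if repo_name is None."""
    if repo_name is None:
        return UNIFIED_CMD
    return f"{CONDA_SETUP_CMD}; conda activate {repo_name}-{yyyymmdd}"


def fill_env_cmds(specifics: Dict[str, Any], yyyymmdd: str) -> Dict[str, Any]:
    """Return a copy of `specifics` with every "{env_cmd}" placeholder replaced."""
    filled = dict(specifics)
    for key, value in specifics.items():
        if value == "{env_cmd}":
            filled[key] = env_cmd(DEV_REPOS.get(key), yyyymmdd)
    return filled
```

Because every placeholder becomes a real command, this never writes an empty string into a task-specific *_environment_commands entry, which the comments in TEST_SPECIFICS warn against.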

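git diff # Check that the diff looks as you expect
python tests/integration/utils.py
# This will generate the actual test cfgs based on the templates.

# Later steps refer to ${unique_id}, so set it to the same value as in utils.py:
unique_id=test_zppy_yyyymmdd # Use today's date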

# Set up an alias for checking jobs:
alias sqa='squeue -o "%8u %.7a %.4D %.9P %7i %.2t %.10r %.10M %.10l %.8Q %j" --sort=P,-t,-p'
alias sq='sqa -u $USER'

sq
# Check that you have no jobs currently queued.
# It's ok if you do, but it makes counting remaining zppy jobs easier if you don't have any existing jobs.

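# The commands below use the Chrysalis cfgs; on another machine, use the cfgs generated for that machine.
# Typically run on Chrysalis, Compy, Perlmutter: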
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg

# Typically run on Chrysalis, Compy:
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg

# Typically run on Chrysalis:
zppy -c tests/integration/generated/test_weekly_legacy_3.1.0_bundles_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.1.0_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.1.0_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_bundles_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_comprehensive_v3_chrysalis.cfg


sq | wc -l # This includes the header row, so subtract 1 to get the number of jobs you have running
# WAIT until that returns 1 (i.e., 0 jobs running)
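Rather than subtracting the header row by hand, note that squeue's -h/--noheader option omits it. Below is a minimal Python sketch, with hypothetical function names, that polls squeue --noheader -u <your user> until no jobs remain. The 60-second interval is an arbitrary assumption.

```python
import getpass
import subprocess
import time


def count_jobs(squeue_output: str) -> int:
    """Count job lines in squeue output produced with --noheader."""
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def wait_for_jobs(poll_seconds: int = 60) -> None:
    """Block until the current user has no queued or running Slurm jobs."""
    user = getpass.getuser()
    while True:
        result = subprocess.run(
            ["squeue", "--noheader", "-u", user],
            capture_output=True,
            text=True,
            check=True,
        )
        remaining = count_jobs(result.stdout)
        if remaining == 0:
            return
        print(f"{remaining} job(s) remaining")
        time.sleep(poll_seconds)
```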

Step 6: Launch zppy jobs – bundles part 2

This section is relevant only if you’re running the bundles jobs. Only run the lines relevant to the jobs you launched in Step 5.

# Check on bundles status
cd ${output_dir}/zppy_weekly_bundles_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.1.0_bundles_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.0.0_bundles_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

# Now, run bundles part 2
cd ${repo_parent_dir}/zppy
git status
# You might have changed branches while you were waiting for jobs to finish.
# Make sure you're now back on the correct branch: test-zppy-yyyymmdd
# Also confirm you're back in the correct env: zppy-yyyymmdd or the Unified env


zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.1.0_bundles_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_legacy_3.0.0_bundles_chrysalis.cfg
sq | wc -l
# WAIT until that returns 1 (i.e., 0 jobs running)
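The scripts directories checked in Steps 6 and 7 all follow one pattern: ${output_dir}/zppy_<cfg>_output/${unique_id}/<case>/post/scripts, where the case is v2.LR.historical_0201 for the comprehensive_v2 cfgs and v3.LR.historical_0051 otherwise. Here is a minimal Python sketch of that pattern. The function name is hypothetical, and the rule is inferred from the paths listed in this document.

```python
from pathlib import Path


def scripts_dir(output_dir: str, cfg: str, unique_id: str) -> Path:
    """Return the post/scripts directory for one cfg, e.g. cfg='weekly_bundles'."""
    # Only the comprehensive_v2 cfgs use the v2 case; all others use the v3 case.
    case = (
        "v2.LR.historical_0201"
        if cfg.endswith("comprehensive_v2")
        else "v3.LR.historical_0051"
    )
    return Path(output_dir) / f"zppy_{cfg}_output" / unique_id / case / "post" / "scripts"
```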

Step 7: Review finished runs

Only run the lines relevant to the jobs you launched in steps 5/6.

### v2 ###
cd ${output_dir}/zppy_weekly_comprehensive_v2_output/${unique_id}/v2.LR.historical_0201/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.0.0_comprehensive_v2_output/${unique_id}/v2.LR.historical_0201/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.1.0_comprehensive_v2_output/${unique_id}/v2.LR.historical_0201/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

### v3 ###
cd ${output_dir}/zppy_weekly_comprehensive_v3_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.0.0_comprehensive_v3_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.1.0_comprehensive_v3_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

### bundles ###
cd ${output_dir}/zppy_weekly_bundles_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.0.0_bundles_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

cd ${output_dir}/zppy_weekly_legacy_3.1.0_bundles_output/${unique_id}/v3.LR.historical_0051/post/scripts
grep -v "OK" *status # Confirm no non-OK statuses appear

In your Markdown report, note any output subdirectories that had non-OK statuses.
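To collect those subdirectories for the report in one pass, here is a minimal Python sketch equivalent to running grep -v "OK" *status in each scripts directory. The function name is hypothetical. Like the grep, it assumes that any line in a *status file that does not contain OK indicates a problem.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def non_ok_statuses(scripts_dirs: Iterable[Path]) -> Dict[str, List[str]]:
    """Map each status file (or missing directory) to its lines lacking "OK".

    Like `grep -v "OK" *status`, every line without "OK" is reported.
    """
    problems: Dict[str, List[str]] = {}
    for directory in map(Path, scripts_dirs):
        if not directory.is_dir():
            # `cd` would fail here; flag it rather than silently reporting nothing.
            problems[str(directory)] = ["directory not found"]
            continue
        for status_file in sorted(directory.glob("*status")):
            bad = [
                line
                for line in status_file.read_text().splitlines()
                if "OK" not in line
            ]
            if bad:
                problems[str(status_file)] = bad
    return problems
```

An empty result corresponds to grep printing nothing in every directory.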

Step 8: Run Python tests

Machine-specific setup

Chrysalis:

launch_compute_node()
{
    salloc --nodes=1 --partition=debug --time=02:00:00 --account=e3sm
}

Compy:

launch_compute_node()
{
    salloc --nodes=1 --partition=short --time=01:00:00 --account=e3sm
}

Perlmutter:

launch_compute_node()
{
    salloc --nodes=1 --qos=interactive --time=01:00:00 --constraint=cpu --account=e3sm
}

Process

cd ${repo_parent_dir}/zppy
git status
# You might have changed branches while you were waiting for jobs to finish.
# Make sure you're now back on the correct branch: test-zppy-yyyymmdd
# Also confirm you're back in the correct env: zppy-yyyymmdd or the Unified env

# This test doesn't make use of an expected results directory.
pytest tests/integration/test_last_year.py

# These tests do make use of an expected results directory.
# That is, the expected results may need to be updated if expected behavior has changed.
pytest tests/integration/test_bash_generation.py
pytest tests/integration/test_campaign.py
pytest tests/integration/test_defaults.py

# These tests make use of an expected results directory
# AND rely on the jobs we just ran:
# 1. The bundles test:
pytest tests/integration/test_bundles.py
# 2. The image checker test, which we'll run from a compute node:
launch_compute_node

start_bash_subshell
# Activate EITHER the dev environment:
conda activate zppy-yyyymmdd
# OR: the command from `activate_unified_env`

pytest tests/integration/test_images.py
# Typically takes between 10 and 20 minutes on Chrysalis and Perlmutter.
# Typically takes closer to 50 minutes on Compy.
cat test_images_summary.md
exit # Exit bash subshell
exit # Exit compute node

In your Markdown report:

From the command-line output of pytest tests/integration/test_images.py, copy everything after “Captured stdout call” into a code block labeled “Output”.
Copy the output of cat test_images_summary.md into a section labeled “Complete summary table”.
Make a new section named “Summary table – only failing image-check tests, sorted by task”. For each task that has missing and/or mismatched images, copy the relevant rows from the summary table. Skip this section if there were no failing image-check tests.
Note any test failures from the other Python tests.
If there were no failures at all, write “All tests pass”.
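Two helpers can take some of the copying out of those report steps. This is a minimal Python sketch, and several details are assumptions. The function names are hypothetical. The captured output is located by matching the text “Captured stdout call” in pytest's section header. The summary is assumed to be a single Markdown pipe table with columns named Task, Missing, and Mismatched, so check the real headers in test_images_summary.md and adjust.

```python
from typing import Dict, List


def captured_stdout(pytest_output: str) -> str:
    """Return everything after the first line containing 'Captured stdout call'."""
    lines = pytest_output.splitlines()
    for i, line in enumerate(lines):
        if "Captured stdout call" in line:
            return "\n".join(lines[i + 1:])
    return ""


def _cells(line: str) -> List[str]:
    """Split one Markdown pipe-table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def _count(row: Dict[str, str], column: str) -> int:
    """Read a non-negative integer cell; treat blanks or non-numbers as 0."""
    value = row.get(column, "")
    return int(value) if value.isdigit() else 0


def failing_rows_by_task(markdown: str) -> List[str]:
    """Return table rows with missing or mismatched images, sorted by task.

    ASSUMPTION: a single pipe table with "Task", "Missing", and "Mismatched" columns.
    """
    table = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if len(table) < 3:
        return []  # No data rows (or no table at all)
    header = _cells(table[0])
    failing = []
    for line in table[2:]:  # table[1] is the "| --- |" separator row
        row = dict(zip(header, _cells(line)))
        if _count(row, "Missing") > 0 or _count(row, "Mismatched") > 0:
            failing.append((row.get("Task", ""), line))
    return [line for _, line in sorted(failing)]
```

If failing_rows_by_task returns an empty list, skip the “only failing image-check tests” section as described above.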