
Best Practices for Experimental Science NERSC Users

NERSC has an increasing number of users from the DOE's experimental and observational facilities. This user community has specific needs that we have heard through requirements gathering and our long experience working with these facilities, including:

  • HPC is required to analyze data from experimental facilities, but changes are needed to both application workflows and HPC environments.
  • Scientists require support for analysis software and tools, many of which differ significantly from traditional simulation software.
  • New approaches are needed for analyzing large datasets, including advanced statistics and machine learning.
  • As science increasingly becomes a community effort, the need to share, transfer, search, and access data becomes even more important.
  • New strategies for resilient workflows are required.
  • Experimental facilities will require new modes of interacting with the systems, including notebooks and faster queue turnaround.
  • Changes in policies are as important to address as technical challenges.

In addition to the standard best practices, we have additional recommendations for this community. This page presents best practices for experimental scientists using NERSC, covering data management, running jobs, and edge services. It is not a replacement for the full NERSC documentation; rather, it is a curated subset of our documentation focused on the needs of this user community.

How do I get my data into NERSC?

Globus

Globus is the recommended tool for moving data in and out of NERSC. It is a reliable, easy-to-use web-based service available at http://www.globus.org/ or http://globus.nersc.gov/. It is accessible to all NERSC users, with options for web-based interaction with the service, a REST API for scripted interactions (from Bash, Python, and other languages), and Globus Connect Server & Personal for setting up additional remote endpoints such as your personal laptop. The Globus website contains extensive documentation. NERSC maintains managed endpoints for optimized data transfers.
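
As an illustration of scripting, here is a minimal sketch using the Globus CLI, assuming you have installed it, authenticated with globus login, and looked up the endpoint UUIDs (the UUIDs and paths below are placeholders):

# Find the UUID of the NERSC DTN managed endpoint.
globus endpoint search "NERSC DTN"

# Transfer a directory from a remote endpoint to CFS (UUIDs and paths are placeholders).
SRC_UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
DST_UUID=ffffffff-1111-2222-3333-444444444444
globus transfer --recursive --label "raw data to CFS" \
    "$SRC_UUID:/detector/run0123" \
    "$DST_UUID:/global/cfs/cdirs/<project_name>/raw/run0123"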

Data Transfer Nodes (DTNs)

The Data Transfer Nodes (DTNs) are dedicated servers for moving data at NERSC. These servers include high-bandwidth network interfaces and are tuned for efficient data transfers. The DTNs offer direct access to the global NERSC file systems and Cori cscratch1. They can (and should) be used to move data internally between NERSC systems and/or NERSC HPSS. These specialized servers do not provide the full software environment: do not use them for non-transfer purposes.
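
If you use a point-to-point tool such as rsync or scp for smaller transfers, point it at a DTN rather than a login node. A minimal sketch, assuming the dtn01.nersc.gov hostname and a hypothetical source directory:

# Push a directory from your local machine to CFS through a data transfer node.
rsync -av --progress ./run0123/ <user_id>@dtn01.nersc.gov:/global/cfs/cdirs/<project_name>/raw/run0123/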

Where should my data go?

Overview of NERSC file systems here.

Community File System (CFS)

This is recommended as the first landing pad for your data. CFS is allocated per project, and the quota is set by your DOE program manager. Your quota depends on the project allocation and is shared with the other members of your project. To access CFS via Globus, use the collection NERSC DTN and the path /global/cfs/cdirs/<project_name>. Data on CFS is never purged and is protected by 7 days of snapshot backups. The PI can partition the storage allocation into custom directories via Iris.

Cori Scratch (cscratch)

This is recommended for data used in active computing. Cori Scratch space is allocated per user, with a 20 TB quota by default. To access Cori Scratch via Globus, use the collection NERSC DTN and the path /global/cscratch1/sd/<user_id>. Data on Cori Scratch is purged after 12 weeks; see the .purged dot files for a list of purged files.

Tape Archive (HPSS)

This is recommended for data that doesn't need to be touched for months or years. Archive space is allocated per project, and your individual share of that allocation can be set by your PI. To access HPSS via Globus, use the collection NERSC HPSS and the path /home/<u>/<user_id> for your personal archive or /home/projects/<project_name> for your project's archive. Package your data in units of 100-500 GB to avoid files being spread over many tapes. If you're retrieving many files from HPSS, please use Globus or the NERSC HPSS command-line tools for a more efficient transfer. Note that archive access comes with significant latency and limited transfer speeds.
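
One way to package data into archive-friendly bundles is htar, which writes a tar archive directly into HPSS. A minimal sketch, assuming a hypothetical run directory of a few hundred GB on scratch:

# Bundle a run directory into a single archive in your project's HPSS space.
cd $SCRATCH
htar -cvf /home/projects/<project_name>/run0123.tar run0123

# List the archive contents later without retrieving it.
htar -tvf /home/projects/<project_name>/run0123.tar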

How do I share my data?

With other project members

We recommend using /global/cfs/cdirs/<project>, since other project members have read permissions by default.

With other NERSC users

You can use give and take (see the give/take usage page) or simply modify directory access permissions, e.g. chmod o+rx /path/to/sharing_dir. Note that other users need execute permission on the directory (and every parent directory) to reach the files inside it.
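
A short sketch of both routes; the give/take flags shown are assumptions, so check the usage page on the system:

# Let other users traverse the directory and read a file inside it.
chmod o+rx /path/to/sharing_dir
chmod o+r /path/to/sharing_dir/results.h5

# Or hand a single file to a specific user (flags assumed; see the give/take usage page).
give -u other_user results.h5
# ...and, logged in as other_user:
take -u your_user_name results.h5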

With external users

The best practice for sharing data with non-NERSC users is Globus Sharing. To enable Globus Sharing:

  • Tell us that you would like Globus Sharing enabled for your project.
  • Place files in a subdirectory of the agreed-upon sharing directory ("gsharing").
  • In Globus, use the endpoint NERSC SHARE and the path /global/cfs/cdirs/<project_name>/gsharing/<share_subdir>.
  • Use NERSC SHARE to create a Globus share for the subdirectory.
  • Shares are read-only, but any Globus user can be added to a share (a scripted sketch of granting access follows this list).
  • Delete the share when access is no longer needed; this will not delete the data.
  • See the Globus Sharing documentation for more information.
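
If you prefer to script the access-granting step, the Globus CLI can also manage permissions on an existing share. A hedged sketch, assuming the share's endpoint UUID is known and that your CLI version provides the endpoint permission commands (the UUID and e-mail are placeholders):

# Grant a collaborator read access to the share.
SHARE_UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
globus endpoint permission create "$SHARE_UUID:/" \
    --permissions r --identity collaborator@university.edu

# Review the access rules later.
globus endpoint permission list "$SHARE_UUID"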

Tips and Tricks for data management at NERSC

  • Use Globus Online for large, automated, or monitored transfers. Remember that every aspect of Globus can be scripted using the CLI or the (Python) API.
  • scp is fine for smaller, one-time transfers (<100 MB), but note that Globus works for small transfers too.
  • Plain cp can be used for transfers within a file system, but you can use Globus for convenience.
  • Staging data from HPSS for a compute job? Avoid doing this from a login shell (you risk being kicked out mid-transfer), and split your transfers across multiple jobs if you run out of time in your batch job.
  • Use the transfer queue if you do a lot of data movement (see the sketch after this list).
  • Don't use your $HOME directory for anything other than very small data (a few MB). Instead use /global/cfs/cdirs/<project> or $SCRATCH.
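
A minimal sketch of a data-movement batch job for the transfer queue; the QOS name xfer and the walltime are assumptions, so check the queue policy page for the exact name and limits:

#!/bin/bash
#SBATCH --qos=xfer              # transfer QOS (name assumed; see queue policies)
#SBATCH --time=12:00:00
#SBATCH --job-name=stage_from_hpss

# Stage an archive out of HPSS onto scratch ahead of a compute job.
cd $SCRATCH
htar -xvf /home/projects/<project_name>/run0123.tar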

What if my data transfers fail or are too slow?

Performance is often limited by the remote (non-NERSC) endpoint. If that endpoint is not tuned for WAN transfers or has a limited network link, performance can drop below 100 MB/sec. Use the ESnet test DTNs to check the link to NERSC and to your facility; this can be done in Globus as well. These test DTNs are read-only and contain datasets of varying sizes. Initiate transfers from these sites to NERSC and to your endpoint; Globus logs the average transfer speed, and all transfers are listed in the "Activity" tab of Globus Online. You can also consult the ESnet perfSONAR dashboard to diagnose network issues.
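
Transfer details can also be checked from the command line; a hedged sketch using the Globus CLI task commands (the task ID is a placeholder):

# List recent transfers and their status.
globus task list

# Show details (status, bytes transferred, timing) for one transfer.
globus task show 12345678-90ab-cdef-1234-567890abcdef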

File system contention may also be an issue. If the network connection appears healthy, try the transfer at a different time or on a different file system.

Which QOS should I use for quick access?

Detailed information about available QOSs.

Interactive

This is the recommended QOS for interactive processing. Your request is either granted within about 5 minutes or rejected. The maximum number of nodes available is limited per project. Please see the QOS policy page for the limits on the number of running and submitted jobs and on the maximum job time; the maximum walltime is 4 hours. This QOS tends to be less busy during off-hours (Pacific Time).

To submit to the "interactive" QOS, use:

salloc --nodes=2 --qos=interactive --constraint=knl,quad,cache --time=2:00:00

As the name suggests, batch submission to the "interactive" QOS is disabled.

Debug

You can also request an interactive job on the "regular" and "debug" QOSs, but "debug" will typically return an interactive session more quickly. Depending on job size, it can take a while for your interactive session to be granted. The limits in "debug" QOS are per user. Please see the QOS policy page for the limits on the number of running and submitted jobs, and max job time.

To submit to the "debug" QOS, use:

salloc --nodes=20 --qos=debug --constraint=haswell --time=00:30:00

Realtime

Access to the "realtime" QOS is only available via special request. It is intended for groups that rely on immediate computing turnaround to operate an experiment, not simply for impatient users. It gives immediate access to resources on Cori Haswell (currently not available for Cori KNL). If you have access to the "realtime" QOS, submit jobs using:

salloc --qos=realtime --account=<nersc project>

or

sbatch --qos=realtime --account=<nersc project>

Tip

If you have other realtime needs, then please contact NERSC support; we want to work with you!

Why can't I just run all my computing on a login node?

Most resources at NERSC are shared. This includes network bandwidth, I/O bandwidth to the global file systems (scratch, project, HPSS, etc.), and even human support! It also includes the login nodes. When on a Cori login node (reached via ssh cori.nersc.gov, for example), please be mindful that this resource is shared with other users. Use the login nodes primarily to edit files, compile code, submit batch jobs, and access compute nodes, and to run short, serial utilities and applications.

Note that SSH connections are not always reliable and might get interrupted. Consider NoMachine (NX) for a longer session with graphics, and Jupyter for Python scripts.

What if my job doesn't need a full node?

Does your job require only a few cores or threads? If so, you can use the "shared" QOS, which lets you submit jobs as small as a single physical core (plus its hyperthreads).
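
A minimal sketch of a shared-QOS batch script; the executable name and resource requests are placeholders to adapt to your application:

#!/bin/bash
#SBATCH --qos=shared
#SBATCH --constraint=haswell
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2       # one physical core plus its hyperthread
#SBATCH --time=02:00:00

srun ./my_analysis --input $SCRATCH/run0123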

What if I have a large number of small, independent tasks?

Do you need to run a large number of small jobs? Avoid using job arrays, since Slurm only considers a small number of array jobs for scheduling at a time. Also avoid calling srun repeatedly in large for-loops, as Slurm will struggle to execute them all.

Instead, you can pack the tasks into a single job with a workflow tool, e.g. TaskFarmer for non-MPI tasks. NERSC is currently evaluating workflow tools; a full discussion can be found on the Workflows page.
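
As an illustration, here is a hedged sketch of packing many independent tasks with TaskFarmer; the module name, the runcommands.sh wrapper, the worker count, and the process_event.sh script are assumptions, so consult the Workflows page for current usage before relying on them.

# Build a task list: one independent command per line (process_event.sh is hypothetical).
for i in $(seq 0 999); do
    echo "$PWD/process_event.sh run0123 $i" >> tasks.txt
done

The corresponding batch script (a sketch; TaskFarmer typically needs one extra node for its manager) could look like:

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=haswell
#SBATCH --nodes=3               # one extra node assumed for the TaskFarmer manager
#SBATCH --time=04:00:00

module load taskfarmer
export THREADS=32               # workers per node (assumed value)
runcommands.sh tasks.txt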

How can I talk to my running job?

This can be achieved via Software Defined Networking (SDN). Compute nodes in Cori do not have IP addresses by default and cannot be reached from the outside world. NERSC has deployed a software translation layer on Cori bridge nodes to direct IP traffic to the head node of a job.

Usage example for an interactive session:

$ salloc --constraint=haswell --qos=interactive --sdn
salloc: Granted job allocation 29234281
user@nid00025:~> echo $SDN_IP_ADDR
128.55.224.202

You can now reach the head node from the outside at the IP address 128.55.224.202 or via job29234281.cori.services.nersc.gov.
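
The same flag can be used in a batch script, assuming sbatch also accepts --sdn; a hedged sketch that starts a hypothetical service on the head node:

#!/bin/bash
#SBATCH --constraint=haswell
#SBATCH --qos=regular
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --sdn                   # request an externally reachable address (flag assumed for sbatch)

# SDN_IP_ADDR is set on the head node once the route is in place.
echo "Job reachable at $SDN_IP_ADDR"

# Start a hypothetical TLS-secured listener that your experiment's pipeline connects to.
./my_secure_listener --bind $SDN_IP_ADDR --port 8443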

Danger

This address is directly exposed to the internet; make sure to run only secure services.