How to use Python on NERSC systems

There are five options for using and configuring your Python environment at NERSC. We give a brief overview here and explain each option in greater detail below.

  1. Module only
  2. Module + source activate
  3. Conda init + conda activate
  4. Install your own Python
  5. Use a Shifter container (best practice for 10+ nodes)

Small scale: Our data show that about 80 percent of NERSC Python users use custom conda environments (Options 2 and 3). You might find these are a good solution for you, too.

Large scale: While Options 2 and 3 are well-suited for small scale jobs, they do not scale well. If you intend to run at large scale (10+ nodes), Shifter is the best option.

We provide more discussion about how to achieve good performance by choosing the right filesystems.

Option 1: Module only

In this mode, you just module load python and use it however you like. This is the simplest option but also the least flexible. If you require a package that is not in our default modules, this option will not work for you.

Who should use Option 1?

Option 1 is best for users who want to get started quickly and who do not require special libraries or custom packages.

Option 2: Module + source activate

In this mode, you first module load python and then build and use a conda environment on top of our module. To use this method:

module load python
source activate myenv

To leave your environment:

conda deactivate

and you will return to the base Python environment.
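
If you are ever unsure which environment is active after running these commands, you can ask the interpreter itself. The following is a minimal sketch rather than a NERSC-provided tool: the helper name describe_environment is hypothetical, and it assumes that conda has set the CONDA_DEFAULT_ENV environment variable, which it normally does when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is currently running."""
    return {
        # Full path of the running Python executable
        "executable": sys.executable,
        # Root of the installation or environment the interpreter belongs to
        "prefix": sys.prefix,
        # Name (or path) of the active conda environment; None if conda did not set it
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Python version as major.minor.micro
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this before and after source activate myenv should show the prefix change from the module's installation to your environment.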

Who should use Option 2?

Option 2 is a good choice for any user who doesn't want a specific version of Python loaded automatically when they log on to Cori. It is also good for users who prefer to use the most recent Python module.

Option 3: Conda init + conda activate

In this mode, you are not actually using the Python module. Rather, you will configure your environment one time based on a Python module. This means that your configuration will not have variables like PYTHONUSERBASE set that help group pip packages in an organized fashion. Option 3 also doesn't include any safety checks that might prevent you from mixing Python environments. If these things are not an issue for you, you can configure your setup one time via:

module load python
conda init

This will add the following to your .bashrc file:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh" ]; then
        . "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh"
    else
        export PATH="/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

After you have configured your environment, the only command you need to run when you log on to Cori is:

conda activate myenv

To leave your environment:

conda deactivate

and you will return to the base Python environment.

What should you do if you decide you don't like Option 3? You can simply delete the lines that conda init has added to your .bashrc file and choose another Python option.
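
If you would rather not edit the file by hand, the removal can be scripted. The sketch below is illustrative only: strip_conda_init_block is a hypothetical helper, and it assumes the block is delimited by exactly the two marker comments shown in the .bashrc excerpt above. It works on a string, so you can inspect the result before writing anything back, and it returns the text unchanged if the closing marker is missing.

```python
START_MARKER = "# >>> conda initialize >>>"
END_MARKER = "# <<< conda initialize <<<"


def strip_conda_init_block(text):
    """Return text with the block managed by 'conda init' removed."""
    kept = []
    inside = False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if not inside and stripped == START_MARKER:
            inside = True          # drop the opening marker
            continue
        if inside:
            if stripped == END_MARKER:
                inside = False     # drop the closing marker
            continue               # drop everything between the markers
        kept.append(line)
    if inside:
        # Opening marker without a closing one: leave the text untouched
        return text
    return "".join(kept)


# Hypothetical file contents used only to demonstrate the helper
sample = (
    "export EDITOR=vim\n"
    + START_MARKER + "\n"
    + "__conda_setup=placeholder\n"
    + END_MARKER + "\n"
    + "alias ll='ls -l'\n"
)
print(strip_conda_init_block(sample))
```

Make a backup copy of your .bashrc before replacing it with the filtered text.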

Who should use Option 3?

Option 3 is suitable for any user who would like a particular Python environment loaded by default whenever they access Cori. However, the user must be willing to manually monitor and update their configuration. Users should also be aware that they will need to manage their pip package installations themselves, for example by setting PYTHONUSERBASE. Users who choose Option 3 should not combine their conda-init configured Python environment with our NERSC Python modules.

Option 4: Install your own Python

You don't have to use any of the Python options we described above. You are free to install your own Python via Miniconda, Anaconda, Intel Python, or a custom collaboration install to have complete control over your stack.

Collaborations, projects, or experiments may wish to install a shareable, managed Python stack to /global/common/software independent of the NERSC modules. You are welcome to use the Anaconda installer script for this purpose. You may also want to consider the more "stripped-down" Miniconda installer as a starting point, which lets you start with only the bare essentials and build up. Be sure to select the Linux version in either case. For instance:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b \
    -p /global/common/software/myproject/env
[installation messages]
source /global/common/software/myproject/env/bin/activate
conda install <only-what-my-project-needs>

You can customize the path with the -p argument. Without it, the installation above would go to $HOME/miniconda3. You should also consider the PYTHONSTARTUP environment variable, which you may wish to unset altogether. It is mainly relevant to the system Python, which we advise against using.

Who should use Option 4?

Option 4 is suitable for individuals or collaborations who would like to install, maintain, and control their own Python stack. Users who choose Option 4 should not combine their custom Python installations with our NERSC Python modules.

Option 5: Install/Use Python inside a Shifter container

We strongly suggest this option for any user who needs to run Python on 10+ nodes. This will result in better performance for your own application, make you less vulnerable to filesystem slowdowns caused by other users, and keep you from causing filesystem slowdowns for other users. Please see our Python in Shifter documentation and examples.

Who should use Option 5?

Option 5 is suitable for users willing to build their own software stack inside of a container. mpi4py works best at scale in Shifter.

Creating conda environments

Creating custom conda environments is usually quick and easy. If you require a package that is not available in our default module, this is the option you must use.

If you are using Option 2 (source activate):

module load python
conda create --name myenv python=3.8
source activate myenv
conda install numpy scipy astropy

If you are using Option 3 (conda activate):

conda create --name myenv python=3.8
conda activate myenv
conda install numpy scipy astropy

For better performance or if you plan to run your application at scale, consider installing your custom environment in your project's directory on /global/common/software:

conda create --prefix /global/common/software/myproject/myenv python=3.8
source activate /global/common/software/myproject/myenv
conda install numpy scipy astropy

We are aware the project directory quotas on /global/common/software are small. Please open a ticket at help.nersc.gov if you need more space.
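
To confirm that a job is really picking up the environment installed on /global/common/software rather than one in your home directory, you can check where the running interpreter lives. This is a small sketch under stated assumptions: prefix_is_under is a hypothetical helper, and the paths in the usage lines are placeholders taken from the example above.

```python
import sys
from pathlib import Path


def prefix_is_under(root, prefix=None):
    """Return True if prefix (default: the running interpreter's prefix) is inside root."""
    prefix_path = Path(prefix if prefix is not None else sys.prefix).resolve()
    root_path = Path(root).resolve()
    try:
        # relative_to raises ValueError when prefix_path is not below root_path
        prefix_path.relative_to(root_path)
    except ValueError:
        return False
    return True


# Check the interpreter that is running this script
print(prefix_is_under("/global/common/software"))

# Check an explicit, illustrative environment path
print(prefix_is_under("/global/common/software",
                      "/global/common/software/myproject/myenv"))
```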

Installing libraries via conda channels

Conda has several default channels that will be used first for package installation. If you want to use another channel beyond the defaults channel, you can, but we suggest that you select your channel carefully.

Here is an example that demonstrates why your channels matter. If we run

conda install numpy

conda will search the default channels first. This is good because it means that MKL-enabled NumPy will be installed, which generally performs well on Cori's Intel hardware.

If, however, you have added other channels to your search path, for example conda-forge, the packages that conda decides to install from conda-forge may not be optimal for NERSC. In this example, you will likely get a version of NumPy that uses OpenBLAS instead of MKL, and this can be substantially slower on Cori.

Don't permanently add other channels to your conda config, i.e.

conda config --add channels conda-forge

Do this instead:

conda install numpy --channel conda-forge

It's better to append --channel conda-forge to the individual install command. This uses conda-forge only when you ask for it and not all the time.
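
To see which channel each package in an environment actually came from, you can group the output of conda list --json by channel. The sketch below assumes each entry in that JSON output carries "name" and "channel" keys; packages_by_channel is a hypothetical helper, and the sample input is hand-written for illustration, not real output from a NERSC system.

```python
import json
from collections import defaultdict


def packages_by_channel(conda_list_json):
    """Group package names by the channel recorded in 'conda list --json' output."""
    grouped = defaultdict(list)
    for entry in json.loads(conda_list_json):
        # Fall back to "unknown" if an entry has no channel recorded
        channel = entry.get("channel") or "unknown"
        grouped[channel].append(entry["name"])
    return {channel: sorted(names) for channel, names in grouped.items()}


# Hand-written sample input; in practice, pass the text printed by 'conda list --json'
sample = json.dumps([
    {"name": "numpy", "channel": "pkgs/main"},
    {"name": "scipy", "channel": "pkgs/main"},
    {"name": "astropy", "channel": "conda-forge"},
])

for channel, names in packages_by_channel(sample).items():
    print(channel, names)
```

If a performance-critical package such as NumPy shows up under a channel you did not intend, that is a hint to revisit how it was installed.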

Installing libraries via pip

Pip is available under Anaconda Python. If you create a conda environment but you are unable to find a conda build of whatever package (or version of that package) you want to install, then pip is one viable alternative. However, pip users at NERSC should be aware of the following:

  1. Users of the pip command may want to use the --user flag for per-user site-package installation following the PEP370 standard. On Linux systems this defaults to $HOME/.local, and packages can be installed to this path with pip install --user <package name>. This can be overridden by defining the PYTHONUSERBASE environment variable.
  2. To prevent per-user site-package installations from conflicting across machines and module versions, at NERSC we have configured our Python modules so that PYTHONUSERBASE is set to $HOME/.local/$NERSC_HOST/<version> where <version> corresponds to the version of the Python module loaded. Note that anyone using Option 3 will have to configure this themselves.
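
For Option 3 users who need to reproduce this layout themselves, the path can be assembled as shown in the sketch below. The helper name nersc_style_userbase is hypothetical, the home directory is a placeholder, and the version string is only an assumption borrowed from the module path shown earlier; substitute whatever matches your own setup. The last line prints the user base that the running interpreter currently uses.

```python
import posixpath
import site


def nersc_style_userbase(home, nersc_host, module_version):
    """Build a per-host, per-version user base: $HOME/.local/$NERSC_HOST/<version>."""
    # posixpath is used because the target systems are Linux
    return posixpath.join(home, ".local", nersc_host, module_version)


# Placeholder values for illustration only
example = nersc_style_userbase("/global/homes/u/user", "cori", "3.7-anaconda-2019.10")
print(example)

# The user base currently in effect (honors PYTHONUSERBASE if it is set)
print(site.getuserbase())
```

Exporting the resulting path as PYTHONUSERBASE in your shell before running pip install --user keeps per-user packages separated by host and version.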

Mixing pip and conda: an example

We have observed that users often don't realize that the per-user site-package directories are included in the search path from all their conda environments created with the same module. What does this mean? We'll demonstrate with an example. If you have done the following:

module load python
pip install numpy --user

Any conda environment you have created based on this Python module will have this pip-installed NumPy in its search path.

It can be easy to forget you've done pip install --user and then create a new conda environment and be confused by how it works (or doesn't).
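
When an environment behaves unexpectedly, it helps to check whether the per-user site-packages directory is on the search path and whether a given module is being imported from it. The following is a minimal sketch using only the standard library; user_site_status and is_from_user_site are hypothetical helper names.

```python
import importlib.util
import os
import site
import sys


def user_site_status():
    """Report the per-user site-packages directory and whether Python is using it."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        # False or None in site.ENABLE_USER_SITE both mean the directory is not used
        "enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_site in sys.path,
    }


def is_from_user_site(module_name):
    """Return True if module_name would be imported from the per-user site-packages."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin:
        return False  # module not found, or it has no file location
    return spec.origin.startswith(site.getusersitepackages() + os.sep)


print(user_site_status())
# Replace "json" with the package you are investigating, for example "numpy"
print(is_from_user_site("json"))
```

If is_from_user_site returns True for a package you expected to come from your conda environment, a pip install --user copy is shadowing it.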

If you're using a conda environment anyway, think about whether you really want a pip-installed package to be accessible to multiple conda environments. If you don't, just drop the --user part and install it into your conda environment:

module load python
source activate myenv
pip install numpy