How to use Python on NERSC systems
There are 5 options for using and configuring your Python environment at NERSC. We provide a brief overview here and will explain each option in greater detail below.
- Module only
- Module + source activate
- Conda init + conda activate
- Install your own Python
- Use a Shifter container (best practice for 10+ nodes)
Small scale: Our data show that about 80 percent of NERSC Python users use custom conda environments (Options 2 and 3), so you might find these a good solution for you, too.
Large scale: While Options 2 and 3 are well suited to small-scale jobs, they do not scale well. If you intend to run at large scale (10+ nodes), Shifter is the best option.
We provide more discussion about how to achieve good performance by choosing the right filesystems.
Option 1: Module only
In this mode, you just run module load python and use it however you like. This is the simplest option but also the least flexible: if you require a package that is not in our default modules, this option will not work for you.
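Before settling on Option 1, it can help to check whether the module already provides every package you need. The sketch below defines check_imports, a hypothetical helper (not a NERSC-provided command) that tries to import each named package with the current interpreter. The PYTHON variable is also an assumption of this sketch; it defaults to python, the interpreter you get after module load python. Pass import names, which can differ from package names (for example sklearn rather than scikit-learn).

```shell
# Hypothetical helper (not part of the NERSC modules): report which of the
# given import names the current Python can import.
# PYTHON is an assumption of this sketch; it defaults to "python", which is
# the interpreter provided by "module load python".
check_imports() {
    local py="${PYTHON:-python}"
    local missing=0
    local pkg
    for pkg in "$@"; do
        if "$py" -c "import $pkg" 2>/dev/null; then
            echo "found:   $pkg"
        else
            echo "missing: $pkg"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero when at least one package could not be imported.
    [ "$missing" -eq 0 ]
}
```

For example, after module load python, running check_imports numpy scipy astropy lists each package as found or missing; any missing package suggests that Option 2 or 3 may suit you better.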
Who should use Option 1?
Option 1 is best for users who want to get started quickly and who do not require special libraries or custom packages.
Option 2: Module + source activate
In this mode, you first run module load python and then build and use a conda environment on top of our module. To use this method:
module load python
source activate myenv
To leave your environment, run:
conda deactivate
and you will return to the base Python environment.
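A quick way to confirm which environment a shell is using is to look at CONDA_PREFIX, which conda sets to the environment's directory when you activate it. The active_env function below is a hypothetical convenience written for this sketch, not a NERSC or conda command.

```shell
# Hypothetical convenience function: print the conda environment that is
# currently active in this shell, based on the CONDA_PREFIX variable that
# conda sets on activation, plus the python executable first on PATH.
active_env() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        echo "active conda environment: $CONDA_PREFIX"
    else
        echo "CONDA_PREFIX is not set"
    fi
    echo "python on PATH: $(command -v python || echo none)"
}
```

Running active_env after source activate myenv should print the path of myenv; running it again after conda deactivate shows what is active once you have left the environment.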
Who should use Option 2?
Option 2 is a good choice for any user who doesn't want a specific version of Python loaded automatically when they log on to Cori. It is also good for users who prefer to use the most recent Python module.
Option 3: Conda init + conda activate
In this mode, you are not actually using the Python module. Rather, you configure your environment one time based on a Python module. This means that your configuration will not have variables like PYTHONUSERBASE set, which help keep pip-installed packages organized. Option 3 also doesn't include any safety checks that might prevent you from mixing Python environments. If these things are not an issue for you, you can configure your setup one time via:
module load python
conda init
This will add the following to your .bashrc file:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh" ]; then
        . "/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/etc/profile.d/conda.sh"
    else
        export PATH="/global/common/cori_cle7/software/python/3.7-anaconda-2019.10/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
After you have configured your environment, all you need to do when you log on to Cori is:
conda activate myenv
To leave your environment:
conda deactivate
and you will return to the base Python environment.
What should you do if you decide you don't like Option 3? Simply delete the lines that conda init added to your .bashrc file and choose another Python option.
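Deleting the block by hand works fine. If you prefer to script it, the sketch below removes everything from the "# >>> conda initialize >>>" line through the "# <<< conda initialize <<<" line and keeps a backup first. It assumes GNU sed (for in-place editing with -i), as found on Linux systems such as Cori, and remove_conda_init is a hypothetical helper name.

```shell
# Hypothetical helper: remove the conda init block from a shell startup file.
# Defaults to $HOME/.bashrc; assumes GNU sed for in-place editing.
remove_conda_init() {
    local rc="${1:-$HOME/.bashrc}"
    [ -f "$rc" ] || { echo "no such file: $rc" >&2; return 1; }
    # Keep a backup in case you want to restore the conda init block later.
    cp "$rc" "$rc.bak"
    # Delete both marker lines and everything between them.
    sed -i '/^# >>> conda initialize >>>$/,/^# <<< conda initialize <<<$/d' "$rc"
}
```

Running remove_conda_init with no argument edits $HOME/.bashrc and leaves the original in $HOME/.bashrc.bak; log out and back in afterwards so the change takes effect.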
Who should use Option 3?
Option 3 is suitable for any user who would like a particular Python environment loaded by default whenever they access Cori. However, the user must be willing to manually monitor and update their configuration. Users should also be aware that they will need to manage their own pip package installations, for example by setting PYTHONUSERBASE. Users who choose Option 3 should not combine their conda-init configured Python environment with our NERSC Python modules.
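One way to approximate what the NERSC modules do is to set PYTHONUSERBASE yourself in your .bashrc, after the conda init block. The sketch below follows the module convention of $HOME/.local/$NERSC_HOST/<version>. The label 3.7-anaconda-2019.10 is only an example taken from the conda init paths shown earlier, and the unknown-host fallback is an assumption for shells where NERSC_HOST is not set.

```shell
# Example label only: choose one per Python installation so that pip --user
# installs made with different Pythons do not collide.
PY_USERBASE_LABEL="3.7-anaconda-2019.10"

# Mirror the NERSC module layout $HOME/.local/$NERSC_HOST/<version>.
# NERSC_HOST is set on NERSC systems; the fallback is an assumption for elsewhere.
export PYTHONUSERBASE="$HOME/.local/${NERSC_HOST:-unknown-host}/$PY_USERBASE_LABEL"
```

With this in place, pip install --user installs packages under that directory instead of the default $HOME/.local.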
Option 4: Install your own Python
You don't have to use any of the Python options described above; you are free to install your own Python via Miniconda, Anaconda, Intel Python, or a custom collaboration install to have complete control over your stack.
Collaborations, projects, or experiments may wish to install a shareable, managed Python stack to /global/common/software independent of the NERSC modules. You are welcome to use the Anaconda installer script for this purpose. In fact, you may want to consider the more "stripped-down" Miniconda installer as a starting point, which lets you start with only the bare essentials and build up. Be sure to select the Linux version in either case. For instance:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b \
-p /global/common/software/myproject/env
[installation messages]
source /global/common/software/myproject/env/bin/activate
conda install <only-what-my-project-needs>
The -p argument sets the installation path; without it, the installation above would go to $HOME/miniconda3. You should also consider the PYTHONSTARTUP environment variable, which you may wish to unset altogether. It is mainly relevant to the system Python, which we advise against using.
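Both suggestions can go together in a short login or job snippet. This sketch assumes the example prefix /global/common/software/myproject/env from the installer commands; myproject is a placeholder for your own project directory. It unsets PYTHONSTARTUP and only activates the environment if the installation is actually present.

```shell
# Placeholder prefix from the installer example; replace myproject with your project.
MYENV_PREFIX=/global/common/software/myproject/env

# PYTHONSTARTUP is mainly relevant to the system Python, so drop it here.
unset PYTHONSTARTUP

# Activate the custom install only if it exists on this system.
if [ -f "$MYENV_PREFIX/bin/activate" ]; then
    source "$MYENV_PREFIX/bin/activate"
else
    echo "no Python installation found at $MYENV_PREFIX" >&2
fi
```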
Who should use Option 4?
Option 4 is suitable for individuals or collaborations who would like to install, maintain, and control their own Python stack. Users who choose Option 4 should not combine their custom Python installations with our NERSC Python modules.
Option 5: Install/Use Python inside a Shifter container
We strongly suggest this option for any user who needs to run Python on 10+ nodes. It will result in better performance for your own application, make you less vulnerable to filesystem slowdowns caused by other users, and keep your jobs from causing filesystem slowdowns for other users. Please see our Python in Shifter documentation and examples.
Who should use Option 5?
Option 5 is suitable for users willing to build their own software stack inside of a container. mpi4py works best at scale in Shifter.
Creating conda environments
Creating custom conda environments is usually quick and easy. If you require a package that is not available in our default module, this is the option you must use.
If you are using Option 2 (source activate):
module load python
conda create --name myenv python=3.8
source activate myenv
conda install numpy scipy astropy
If you are using Option 3 (conda activate):
conda create --name myenv python=3.8
conda activate myenv
conda install numpy scipy astropy
For better performance, or if you plan to run your application at scale, consider installing your custom environment in your project's directory on /global/common/software:
conda create --prefix /global/common/software/myproject/myenv python=3.8
source activate /global/common/software/myproject/myenv
conda install numpy scipy astropy
We are aware that the project directory quotas on /global/common/software are small. Please open a ticket at help.nersc.gov if you need more space.
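If you script environment setup, for example in a project's install script, it can be useful to create the prefix environment only once. The sketch below relies on the fact that every conda environment directory contains a conda-meta subdirectory. ensure_env is a hypothetical helper, and --yes simply skips conda's confirmation prompt.

```shell
# Hypothetical helper: create a conda environment at the given prefix unless
# one already exists there. Extra arguments are passed on to conda create.
ensure_env() {
    local prefix="$1"
    shift
    # conda keeps the package metadata for each environment in conda-meta/.
    if [ -d "$prefix/conda-meta" ]; then
        echo "environment already exists: $prefix"
    else
        conda create --yes --prefix "$prefix" "$@"
    fi
}
```

For example, after module load python, ensure_env /global/common/software/myproject/myenv python=3.8 numpy scipy astropy creates the environment on the first run and only prints a message on later runs.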
Installing libraries via conda channels
Conda has several default channels that will be used first for package installation. You can use channels beyond the defaults, but we suggest that you select them carefully.
Here is an example that demonstrates why your channels matter. Suppose you run:
conda install numpy
Conda will search the default channels first. This is good because it means that MKL-enabled NumPy will be installed, which generally performs well on Cori's Intel hardware.
If, however, you have added other channels such as conda-forge to your search path, the packages conda installs from them may not be optimal for NERSC. In this example, you will likely get a version of NumPy that uses OpenBLAS instead of MKL, and this can be substantially slower on Cori.
Don't permanently add other channels to your conda configuration, for example with:
conda config --add channels conda-forge
Do this instead:
conda install numpy --channel conda-forge
It's better to request the channel you need for a single command with --channel conda-forge. This uses conda-forge only when you ask for it, not all the time.
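If you are not sure whether a channel was already added permanently, conda config --show channels prints the configured list, and conda config --remove channels conda-forge takes it back out. For a quick check in a script, the hypothetical helper below looks for a channel entry in a .condarc file. It only understands the simple "  - name" list layout that conda config --add writes, and it does not distinguish which key the list belongs to, so treat conda config --show channels as authoritative.

```shell
# Hypothetical helper: succeed if the given .condarc file lists CHANNEL as
# an entry of the form "  - CHANNEL" (the layout conda config --add writes).
has_permanent_channel() {
    local condarc="$1"
    local channel="$2"
    [ -f "$condarc" ] || return 1
    grep -Eq "^[[:space:]]*-[[:space:]]*${channel}[[:space:]]*$" "$condarc"
}
```

For example: if has_permanent_channel ~/.condarc conda-forge; then conda config --remove channels conda-forge; fi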
Installing libraries via pip
Pip is available under Anaconda Python. If you create a conda environment but you are unable to find a conda build of whatever package (or version of that package) you want to install, then pip is one viable alternative. However, pip users at NERSC should be aware of the following:
- Users of the pip command may want to use the --user flag for per-user site-package installation following the PEP 370 standard. On Linux systems this defaults to $HOME/.local, and packages can be installed to this path with pip install --user <package name>. This location can be overridden by defining the PYTHONUSERBASE environment variable.
- To prevent per-user site-package installations from conflicting across machines and module versions, we have configured our Python modules at NERSC so that PYTHONUSERBASE is set to $HOME/.local/$NERSC_HOST/<version>, where <version> corresponds to the version of the Python module loaded. Note that anyone using Option 3 will have to configure this themselves.
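To see where pip install --user will put packages in your current setup, you can ask Python directly: site.getuserbase() returns PYTHONUSERBASE when it is set, and the platform default (such as $HOME/.local on Linux) otherwise. The wrapper name show_user_base and the PYTHON variable are assumptions of this sketch.

```shell
# Hypothetical wrapper: print the per-user base directory under which
# "pip install --user" places packages for the current Python interpreter.
# PYTHON defaults to "python", as provided by "module load python".
show_user_base() {
    "${PYTHON:-python}" -c 'import site; print(site.getuserbase())'
}
```

With a NERSC Python module loaded, the result should follow the $HOME/.local/$NERSC_HOST/<version> pattern.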
Mixing pip and conda: an example
We have observed that users often don't realize that the per-user site-package directories are included in the search path of all the conda environments they create from the same module. What does this mean? We'll demonstrate with an example. If you have done the following:
module load python
pip install numpy --user
Any conda environment you have created based on this Python module will have this pip-installed NumPy in its search path.
It is easy to forget that you've run pip install --user, then create a new conda environment and be confused by how it works (or doesn't).
If you're using a conda environment anyway, think about whether you really want a pip-installed package to be accessible to multiple conda environments. If you don't, just drop the --user flag and install the package into your conda environment:
module load python
source activate myenv
pip install numpy
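When an environment behaves unexpectedly, it helps to check which copy of a package Python is actually importing; pip list --user also shows what is installed in the per-user site directory. Python honors the PYTHONNOUSERSITE variable: when it is set, the per-user site-packages directory is left off the search path. The where_imported helper below is hypothetical. Because PYTHONNOUSERSITE also hides any packages you deliberately installed with --user, consider exporting it only in the sessions or job scripts that need an isolated environment.

```shell
# Hypothetical helper: print the file that a module is imported from, which
# tells you whether it comes from the conda environment or a --user install.
where_imported() {
    "${PYTHON:-python}" -c "import $1; print($1.__file__)"
}

# Keep the per-user site-packages directory off sys.path for this shell
# and any programs started from it.
export PYTHONNOUSERSITE=1
```

For example, running where_imported numpy before exporting PYTHONNOUSERSITE shows a path under your PYTHONUSERBASE if a pip --user copy is shadowing the environment's NumPy; afterwards it should show the copy inside your conda environment.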