# Current known issues

## Perlmutter
Please keep in mind that Perlmutter is under active development. Access to the file systems (scratch, CFS, and HPSS) as well as to the outside world may be slow or hang unexpectedly.
- `PrgEnv-gnu` users must load `cpe-cuda` in order to get compatible versions of `gcc` and `cuda` or `cudatoolkit`. The correct order is:

    ```
    module load PrgEnv-gnu
    module load cpe-cuda
    module load cuda
    ```

- Static compiling doesn't work; only dynamic linking works (see the compile sketch after this list).
- CUDA-aware MPICH can only use up to half the resources on a CPU when it can see a GPU, because it allocates GPU memory. Setting GPU binding will help distribute the CUDA objects between GPUs and avoid an out-of-memory error (see the GPU binding sketch after this list).
- Full `myquota` functionality is not yet available.
- The lmod configuration for csh doesn't carry over to non-interactive logins (like batch scripts). This can be worked around by adding `source /usr/share/lmod/8.3.1/init/csh` to your `.tcshrc` file.
- Some users may see messages like `-bash: /usr/common/usg/bin/nersc_host: No such file or directory` when they log in. This means you have outdated dotfiles that need to be updated. To stop this message, you can either delete this line from your dotfiles or check whether `NERSC_HOST` is set before overwriting it (see the dotfile sketch after this list). Please see our environment page for more details.
- Using OpenMPI in Shifter requires `--mpi=pmi2` (see the `srun` example after this list).
- Shifter MPICH communication doesn't work across multiple nodes.
- Machine learning applications (see the ML issues page for workarounds):
    - Users sometimes encounter a `CUDA Unknown Error` during initialization.
    - Some Nvidia NGC containers don't properly enter compatibility mode when running with Shifter.
- Users may notice MKL-based CPU code runs more slowly. Try `module load fast-mkl-amd`.
- Nodes on Perlmutter currently do not give a constant `hostid` (IP address) response.
- Nodes failing unexpectedly is a known issue. Please open a GitHub issue if your jobs are killed but the node remains up.
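For the static-linking limitation above, a minimal compile sketch. It assumes the standard Cray compiler wrappers and a hypothetical source file `hello.c`; the point is simply not to request a static build.

```bash
# Build dynamically with the Cray wrapper (cc/CC/ftn); do not pass -static,
# since statically linked binaries are not currently supported.
cc -o hello hello.c

# If a build system forces static linking, the link type can be requested
# explicitly (assumption: the wrappers honor CRAYPE_LINK_TYPE as on other
# Cray PE systems).
export CRAYPE_LINK_TYPE=dynamic
```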
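For the CUDA-aware MPICH item, a sketch of one way to set GPU binding with Slurm. The task counts, the `map_gpu` list, and the executable name `./my_gpu_app` are assumptions for a node with 4 GPUs, not a prescribed configuration.

```bash
# One task per GPU, each task bound to a single device, so CUDA-aware MPICH
# only allocates memory on the GPU that task can actually see.
srun -n 4 --ntasks-per-node=4 --gpus-per-node=4 --gpu-bind=map_gpu:0,1,2,3 ./my_gpu_app
```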
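For the outdated-dotfiles message, a sketch of the "check before overwriting" approach in a bash dotfile. The fallback value is only for illustration; the main point is to stop calling the removed `/usr/common/usg/bin/nersc_host` script.

```bash
# In ~/.bashrc: keep whatever NERSC_HOST the system already set, and only
# fall back to a hard-coded value if it is empty, instead of running the
# old /usr/common/usg/bin/nersc_host helper (which no longer exists).
if [ -z "${NERSC_HOST:-}" ]; then
    export NERSC_HOST="perlmutter"   # assumed value for illustration
fi
```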
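For the OpenMPI-in-Shifter item, a sketch of a job step that passes `--mpi=pmi2`. The image name and executable are placeholders, not a tested configuration.

```bash
# srun launches the containerized ranks; --mpi=pmi2 provides the PMI-2
# wire-up that OpenMPI inside Shifter needs.
srun -n 8 --mpi=pmi2 shifter --image=docker:myrepo/my_openmpi_app:latest ./my_mpi_app
```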