# Current known issues

## Perlmutter
Please keep in mind that Perlmutter is under active development. Access to the file systems (scratch, CFS, and HPSS) as well as to the outside world may be slow or hang unexpectedly.
- `PrgEnv-gnu` users must load `cpe-cuda` in order to get compatible versions of `gcc` and `cuda` or `cudatoolkit`. The correct order is:

    ```
    module load PrgEnv-gnu
    module load cpe-cuda
    module load cuda
    ```
- Static compiling doesn't work; only dynamic linking works.
- CUDA-aware MPICH can only use up to half the resources on a CPU when it can see a GPU, because it allocates GPU memory. Setting GPU binding will help distribute the CUDA objects between GPUs and avoid an out-of-memory error.
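    One minimal sketch of such a binding is a per-task wrapper script (the script name, the 4-GPUs-per-node count, and the use of Slurm's `SLURM_LOCALID` variable are illustrative assumptions here, not an official workaround):

    ```shell
    #!/bin/bash
    # gpu-bind.sh (hypothetical wrapper): pin each local rank to one GPU
    # so CUDA memory is only allocated on that device.
    # Assumes 4 GPUs per node; Slurm sets SLURM_LOCALID for each task.
    export CUDA_VISIBLE_DEVICES=$(( ${SLURM_LOCALID:-0} % 4 ))
    echo "local rank ${SLURM_LOCALID:-0} -> GPU ${CUDA_VISIBLE_DEVICES}"
    exec "$@"
    ```

    Launched as, e.g., `srun ./gpu-bind.sh ./my_app`, each task then sees exactly one GPU.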
- Full `myquota` functionality is not yet available.
- The Lmod configuration for csh doesn't carry over to non-interactive logins (like batch scripts). This can be worked around by adding `source /usr/share/lmod/8.3.1/init/csh` to your `.tcshrc` file.
- Some users may see messages like `-bash: /usr/common/usg/bin/nersc_host: No such file or directory` when they log in. This means you have outdated dotfiles that need to be updated. To stop this message, either delete this line from your dotfiles or check whether `NERSC_HOST` is set before overwriting it. Please see our environment page for more details.
- Using OpenMPI in Shifter requires `--mpi=pmi2`.
- Shifter MPICH communication doesn't work across multiple nodes.
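- For the outdated-dotfiles message above, a minimal guard along these lines (a sketch; the `perlmutter` value is an illustrative assumption, use whatever value your dotfile previously computed) keeps a bash dotfile from overwriting `NERSC_HOST` when the system already sets it:

    ```shell
    # In ~/.bashrc: set NERSC_HOST only if the environment doesn't already
    # define it, instead of calling the removed nersc_host helper.
    if [ -z "${NERSC_HOST:-}" ]; then
        export NERSC_HOST="perlmutter"   # assumed fallback, for illustration
    fi
    ```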
- Machine learning applications (see the ML issues page for workarounds):
    - Users sometimes encounter a `CUDA Unknown Error` during initialization.
    - Some NVIDIA NGC containers don't properly enter compatibility mode when running with Shifter.
- Users may notice MKL-based CPU code runs more slowly. Try `module load fast-mkl-amd`.
- Nodes on Perlmutter currently do not return a constant `hostid` (IP address).
- Nodes unexpectedly failing is a known issue. Please open a GitHub issue if your jobs are killed but the node remains up.