Current known issues

Perlmutter

Please keep in mind that Perlmutter is under active development. Access to the file systems (scratch, CFS, and HPSS) as well as to the outside world may be slow or hang unexpectedly.

  • PrgEnv-gnu users must load cpe-cuda in order to get compatible versions of gcc and cuda or cudatoolkit. The correct order is:

    module load PrgEnv-gnu
    module load cpe-cuda
    module load cuda
    
  • Static compilation does not work; only dynamic linking is supported.

  • CUDA-aware MPICH can use only up to half the resources on a CPU when it can see a GPU, because it allocates GPU memory. Setting GPU binding helps distribute the CUDA objects across GPUs and avoid an out-of-memory error.
  • Full myquota is not yet available.
  • The lmod configuration for csh doesn't carry over to non-interactive logins (like batch scripts). This can be worked around by adding source /usr/share/lmod/8.3.1/init/csh to your .tcshrc file.
  • Some users may see messages like -bash: /usr/common/usg/bin/nersc_host: No such file or directory when logging in. This means your dotfiles are outdated and need to be updated. To stop this message, either delete the line that references /usr/common/usg/bin/nersc_host from your dotfiles or check whether NERSC_HOST is set before overwriting it. Please see our environment page for more details.
  • Using OpenMPI in Shifter requires --mpi=pmi2.
  • Shifter MPICH communication doesn't work across multiple nodes.
  • Machine learning applications (see the ML issues page for workarounds):
    • Users sometimes encounter a CUDA Unknown Error during initialization.
    • Some NVIDIA NGC containers don't properly enter compatibility mode when running with Shifter.
  • Users may notice MKL-based CPU code runs more slowly. Try module load fast-mkl-amd.
  • Nodes on Perlmutter currently do not get a constant hostid (IP address) response.
  • Unexpected node failures are a known issue. Please open a GitHub issue if your jobs are killed but the node remains up.
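
For the outdated-dotfiles item above, the second option (checking whether NERSC_HOST is set before overwriting it) might look roughly like the following. This is a minimal bash sketch, not NERSC's official dotfile content: the function name set_nersc_host_if_unset is hypothetical, and the sketch assumes the old dotfile assigned NERSC_HOST from the output of the nersc_host helper named in the error message.

```shell
# Hypothetical helper for ~/.bashrc (the name is illustrative, not a NERSC convention).
# It assigns NERSC_HOST only when the variable is empty or unset AND the legacy
# helper binary actually exists, so no "No such file or directory" message appears.
set_nersc_host_if_unset() {
    # Path copied from the error message; it may be absent on Perlmutter.
    local helper=/usr/common/usg/bin/nersc_host

    if [ -z "${NERSC_HOST:-}" ] && [ -x "$helper" ]; then
        # Assumption: the old dotfile set NERSC_HOST from this helper's output.
        NERSC_HOST="$("$helper")"
        export NERSC_HOST
    fi
}

set_nersc_host_if_unset
```

With this guard in place, a value of NERSC_HOST already provided by the system is left untouched, and the helper is called only where it is present.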