Skip to content

VTune

VTune is a performance analysis tool targeting Intel architectures. Users are encouraged to read the Intel VTune Amplifier documentation for general usage.

Using VTune

VTune is available on Cori by loading the vtune module.

Recommended compiler flags for VTune performance collection

Intel provides a page documenting their recommended compiler flags for compiling applications when collecting performance data with VTune. Users will generally have the best results when compiling codes using the Intel compilers, although the CCE and GCC compilers can also produce application suitable for analysis with VTune.

When collecting performance data with VTune, users should add the Slurm flag --perf=vtune[/<version>] where <version> is an explicit version of the vtune module, e.g., --perf=vtune/2020.up3.

For some versions of VTune, users must use --perf=likwid instead

The --perf flag is designed to enable users to request a specific version of the vtune modulefile, e.g., --perf=vtune/2020.up3. However, this currently fails with some versions of VTune, with an error like the following:

srun: error: nersc_perf: invalid perf module version specififed "vtune/2020.up3"
srun: error: Invalid --perf argument: vtune/2020.up3

If a user encounters this error, they are encouraged to replace --perf=vtune with --perf=likwid, which has the same effect, and enables VTune to perform all of its expected functions.

Defer finalization on KNL

It is generally recommended to defer finalization when running on KNL. Finalization is an inherently serial process and the individual core performance on KNL is very poor. Thus, when running VTune on KNL, add the parameter -finalization-mode=deferred

#!/bin/bash
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --perf=vtune
# ... additional sbatch parameters ...

module load vtune

vtune -finalization-mode=deferred -collect ... -r <result-dir> -- <command-to-profile>

# in some cases, it one might want to copy over the libraries need to finalize
vtune -archive -r <result-dir>

and then finalize on a login node:

vtune -finalize -result-dir <PATH>

Using VTune with Shifter

VTune can be attached to a Shifter container by executing the process in the background and then attaching VTune to the process via the PID (process identifier).

Cannot directly run collection on containers

The following will not work:

vtune -collect ... -- shifter <command-to-execute-in-container>

The recommended method is as follows:

#!/bin/bash
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --perf=vtune
#SBATCH --image=<username/some-image>
# ... additional sbatch parameters ...

module load vtune

PID_FILE=$(mktemp pid.XXXXXXX)

# the first "&" causes the command to execute in the background
# "echo $!" prints the PID
# "&> ${PID_FILE}" writes the PID to the temporary file
shifter <command-to-execute-in-container> & echo $! &> ${PID_FILE}

# read the PID from the file
TARGET_PID=$(cat ${PID_FILE})

# attach VTune to the process
vtune -collect <collection-mode> --target-pid=${TARGET_PID} ...

VTune finalization with Shifter

In the Using VTune section, it was recommended to not finalize on KNL. However, when using containers, deferring finalization creates a problem because the binaries needed for finalization exist only within the container. Due to this fact, it is recommended to not defer finalization when using containers.

VTune + Shifter Example

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --constraint=knl
#SBATCH --nodes=1
#SBATCH --time=03:00:00
#SBATCH --job-name=tomopy_gridrec
#SBATCH --output=out_tomopy_%j.log
#SBATCH --image=jrmadsen/tomopy-reference:gcc
#SBATCH --perf=vtune

set -o errexit

# ensure VTune module is loaded
module load vtune

# this format of assignment only sets the variable to the specified value
# if not already set in the environment
: ${OMP_NUM_THREADS:=1}
: ${NUMEXPR_MAX_THREADS:=$(nproc)}
: ${VTUNE_COLLECTION_MODE:="advanced-hotspots"}
: ${VTUNE_SAMPLING_INTERVAL:=25}
: ${VTUNE_RESULTS_DIR:=$(mktemp -d ${PWD}/run-${VTUNE_COLLECTION_MODE}-XXXXX)}

export OMP_NUM_THREADS
export NUMEXPR_MAX_THREADS
export VTUNE_COLLECTION_MODE
export VTUNE_SAMPLING_INTERVAL
export VTUNE_RESULTS_DIR

# make sure empty, let vtune create directory
rm -rf ${VTUNE_RESULTS_DIR}

# use mktemp to ensure guard against multiple jobs in same dir
PID_FILE=$(mktemp pid.XXXXXX)

echo -e "\n### Submitting shifter job into background and storing PID in file: ${PID_FILE} ###\n"
shifter /opt/conda/bin/python ./run_tomopy.py -a gridrec -n 256 -s 512 -f jpeg -S 1 -c 8 -p shepp3d -i 5 & echo $! &> ${PID_FILE}

echo -e "\n### Reading PID file: ${PID_FILE} ###\n"
TARGET_PID=$(cat ${PID_FILE})

# echo the ps for debugging
echo -e "\n### Target PID: ${TARGET_PID} ###\n"
ps

# echo the environment for reference
echo -e "\n### Environment ###\n"
env

echo -e "\n### Attaching VTune process to PID ${TARGET_PID} ###\n"
vtune \
    -collect ${VTUNE_COLLECTION_MODE} \
    -knob collection-detail=hotspots-sampling \
    -knob event-mode=all \
    -knob analyze-openmp=true \
    -knob sampling-interval=${VTUNE_SAMPLING_INTERVAL} \
    -data-limit=0 \
    --target-pid=${TARGET_PID} \
    -r ${VTUNE_RESULTS_DIR}

echo -e "\nCompleted\n"