Timing
Omega includes a module for timing the code and incorporates timers around various parts of the code.
The timing output is written to two files: omega.summary and omega.timing.0.
The omega.summary file presents timing statistics accumulated across all MPI ranks.
The omega.timing.0 file shows timing results from the first rank only.
Two user-set parameters in the input configuration file control the timing behavior:
Timing:
  Level: 1
  Autofence: True
The Level parameter is a non-negative integer that determines the granularity of timers.
Increasing it will turn on more timers.
Having more timers provides more detailed information, but it also comes with increased overhead,
and may be counter-productive if a high-level look at model performance is sufficient.
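The effect of the Level parameter can be illustrated with a minimal sketch. The LeveledTimers class, the timer names, and the rule that a timer is enabled when its own level does not exceed the configured Level are all assumptions made for illustration; this is not Omega's timer API.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a level-gated timer registry (not Omega's API).
class LeveledTimers {
   using Clock = std::chrono::steady_clock;

 public:
   explicit LeveledTimers(int ConfiguredLevel) : MaxLevel(ConfiguredLevel) {}

   // Start a timer only if its level does not exceed the configured level
   // (assumed gating rule).
   bool start(const std::string &Name, int TimerLevel) {
      if (TimerLevel > MaxLevel)
         return false; // disabled: no clock call, no bookkeeping
      Starts[Name] = Clock::now();
      return true;
   }

   // Stop a timer and accumulate elapsed seconds; ignored if never started.
   bool stop(const std::string &Name) {
      auto It = Starts.find(Name);
      if (It == Starts.end())
         return false;
      Totals[Name] +=
          std::chrono::duration<double>(Clock::now() - It->second).count();
      Starts.erase(It);
      return true;
   }

   // Number of timers that recorded at least one interval.
   std::size_t numActive() const { return Totals.size(); }

 private:
   int MaxLevel;
   std::map<std::string, Clock::time_point> Starts;
   std::map<std::string, double> Totals;
};

// Run the same instrumented region and report how many timers fired.
std::size_t countTimersAtLevel(int Level) {
   LeveledTimers Timers(Level);
   Timers.start("timeStep", 0);          // coarse timer
   Timers.start("computeTendencies", 1); // finer timer
   Timers.start("fluxKernel", 2);        // finest timer
   Timers.stop("fluxKernel");
   Timers.stop("computeTendencies");
   Timers.stop("timeStep");
   return Timers.numActive();
}
```

In this sketch, raising the level from 0 to 2 increases the number of recorded timers from one to three, and each extra timer adds clock calls and bookkeeping to the instrumented region.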
The Autofence boolean option determines whether Kokkos fences are automatically added before every timer call.
This option needs to be true for accurate timing using Omega timers on GPU-based systems.
However, there are circumstances when turning off automatic fences is useful.
The main use case is using external profiling tools.
Another one is measuring the overhead of automatic synchronization for very high timing levels.
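Why a fence matters for timing asynchronous work can be shown with a host-only analogy. The sketch below does not use Kokkos: a std::async task stands in for an asynchronously launched GPU kernel, and waiting on its future plays the role of a fence. The function names launchKernel and timeLaunch are hypothetical.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

// Stand-in for an asynchronous kernel launch: returns immediately while the
// "kernel" (a 50 ms sleep) runs on another thread.
std::future<void> launchKernel() {
   return std::async(std::launch::async, [] {
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
   });
}

// Time one launch. With AutoFence, wait for outstanding work before the
// timer stops, so the measurement covers the kernel itself.
double timeLaunch(bool AutoFence) {
   auto Start = Clock::now();
   std::future<void> Kernel = launchKernel();
   if (AutoFence)
      Kernel.wait(); // plays the role of a fence before the timer stops
   auto Stop = Clock::now();
   double Seconds = std::chrono::duration<double>(Stop - Start).count();
   Kernel.wait(); // drain remaining work outside the timed region
   return Seconds;
}
```

Without the wait, the timer captures only the cost of launching the work, which is typically far shorter than the work itself. With the wait, it captures at least the full 50 ms. The difference between the two is also the synchronization cost that disabling automatic fences removes.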
Integration with external profiling tools
External profilers often include APIs to mark regions of code for detailed profiling. On some platforms, Omega timers automatically add these annotations. Currently, this is only implemented on systems with NVIDIA GPUs using NVTX.
This makes it possible, for example, to use the Nsight Compute kernel profiler to obtain detailed kernel information for all kernels enclosed in the Tend:computeVelocityTendencies Omega timer:
mpirun -np 1 ncu --nvtx --nvtx-include "Tend:computeVelocityTendencies/" omega.exe