Timemory
Description
Timemory is a multi-purpose C++ toolkit and suite of C/C++/Fortran/Python tools for performance analysis, optimization studies, logging, and debugging.
Timemory may be used as a profiler for C, C++, Fortran, CUDA, and/or Python or as a C++ or Python backend for the creation of a custom profiling tool/library.
Features
- C++ / Python Profiling Toolkits
- C / C++ / CUDA / Fortran / Python Manual Instrumentation APIs
- Data analysis via Pandas
- Drop-in replacement for the time command
- Binary rewriting and runtime instrumentation
- Python Function Profiler
- Python Line Profiler
- Kokkos Profiling Libraries
- MPI Profiling
- NCCL Profiling
- OpenMP Profiling
- Compiler Instrumentation
Data Collection Support
- Various timers (wall, user, system, CPU, CUDA kernels, etc.)
- Various resource usage (peak RSS, page RSS, virtual memory, context switches, etc.)
- Hardware Counters (CPU and GPU)
- Roofline (CPU and GPU)
- Data trackers
- I/O
- Network statistics
- Trip counts
Third-Party API Support
- Allinea-Map
- VTune
- NVTX
- CrayPat
- Caliper
- TAU
- LIKWID
- gperftools
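When should you use timemory?
Replacement for time
Timemory provides a command-line tool, timem, that can be used in place of the time command-line tool. It reports the wall-clock, user, and system times that time provides and adds further measurements, such as resource usage.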
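timem itself is not reproduced here. The Python sketch below illustrates the baseline that time reports and that timem extends. It runs a child process and collects:

- wall-clock time,
- user and system CPU time,
- peak resident set size.

It uses only the standard library (`subprocess`, `resource`). The function name `time_command` is hypothetical, and the sketch is Unix-only.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv as a child process and return time-style measurements.

    Illustrative sketch only (this is not timem). Unix-only: it relies on
    the standard-library `resource` module.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(argv, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_s": wall,
        # CPU time spent by the child in user and kernel mode, computed as
        # the difference of the cumulative totals for waited-for children
        "user_s": after.ru_utime - before.ru_utime,
        "system_s": after.ru_stime - before.ru_stime,
        # Largest peak RSS among all waited-for children so far (not only
        # this one); kilobytes on Linux, bytes on macOS
        "peak_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Time a trivial Python child process and print each measurement.
    for key, value in time_command([sys.executable, "-c", "pass"]).items():
        print(f"{key:>10}: {value}")
```

To make this behave more like time, a command-line front end could pass `sys.argv[1:]` as `argv`. That wiring is omitted to keep the sketch minimal.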
Built-in Instrumentation
Timemory is well suited for adding built-in, high-level performance analysis to your application, which you can easily control, extend, and customize.
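Built-in instrumentation typically relies on named regions that are opened and closed around code of interest. The timemory_push_region/timemory_pop_region calls in the C sample below follow this pattern.

The following is a minimal Python sketch of the push/pop idea. The `RegionProfiler` class and its methods are hypothetical and do not reflect how timemory stores its data. The `flat` switch only loosely mirrors the `TIMEMORY_FLAT_PROFILE` setting.

```python
import time
from collections import defaultdict


class RegionProfiler:
    """Hypothetical push/pop region API; a sketch, not timemory's implementation.

    Hierarchical mode keys each region by its full path (e.g. "main/step");
    flat mode keys it by its own name, loosely like TIMEMORY_FLAT_PROFILE=ON.
    """

    def __init__(self, flat=False):
        self.flat = flat
        self._stack = []                  # (name, start time) of open regions
        self.totals = defaultdict(float)  # key -> accumulated wall-clock seconds
        self.counts = defaultdict(int)    # key -> number of completed entries

    def push_region(self, name):
        self._stack.append((name, time.perf_counter()))

    def pop_region(self, name):
        if not self._stack or self._stack[-1][0] != name:
            raise RuntimeError(f"pop_region({name!r}) does not match the innermost open region")
        elapsed = time.perf_counter() - self._stack[-1][1]
        # Build the key while this region is still on the stack.
        key = name if self.flat else "/".join(n for n, _ in self._stack)
        self._stack.pop()
        self.totals[key] += elapsed
        self.counts[key] += 1


if __name__ == "__main__":
    prof = RegionProfiler()
    prof.push_region("main")
    for _ in range(3):
        prof.push_region("step")
        prof.pop_region("step")
    prof.pop_region("main")
    print(dict(prof.counts))  # {'main/step': 3, 'main': 1}
```

Requiring the popped name to match the innermost open region catches mismatched push/pop pairs early. A real tool might instead tolerate or report them.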
Performance Analysis Automation
The C++ and Python APIs provide in-situ programmatic access to the profiling data as it is being collected, and the JSON output can easily be read in Python and/or converted to pandas DataFrames using hatchet.
Python Profiling
timemory supports line profiling and function profiling and has an extensive Python interface with decorators, context managers, and individual profiling components.
Python Line Profiling
timemory-python-trace can be used on a Python script to record one or more metrics for each line of Python code. Furthermore, selected functions can be decorated with a bare @profile to record only the lines within the designated function(s).
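Restricting line-level recording to functions marked with a bare `@profile` can be sketched with the standard library's `sys.settrace`. This is only an illustration under stated assumptions, not timemory-python-trace's implementation:

- `profile`, `run_traced`, and `line_hits` are hypothetical names.
- Only line hit counts are recorded, not timing or other metrics.

```python
import sys
from collections import Counter

line_hits = Counter()   # (function name, line number) -> times executed
_profiled_code = set()  # code objects of functions marked with @profile


def profile(func):
    """Bare decorator that marks a function for line-level recording (sketch)."""
    _profiled_code.add(func.__code__)
    return func


def _line_tracer(frame, event, arg):
    # Local trace function: count every "line" event in a marked frame and
    # return itself so tracing continues for the rest of that frame.
    if event == "line":
        line_hits[(frame.f_code.co_name, frame.f_lineno)] += 1
    return _line_tracer


def _call_tracer(frame, event, arg):
    # Global trace function: called for each new frame. Only marked
    # functions get the per-line tracer; everything else runs untraced.
    if event == "call" and frame.f_code in _profiled_code:
        return _line_tracer
    return None


def run_traced(func, *args, **kwargs):
    """Run func with line tracing enabled for the current thread only."""
    sys.settrace(_call_tracer)
    try:
        return func(*args, **kwargs)
    finally:
        sys.settrace(None)
```

Unmarked callers such as a driver function produce no entries. Only the bodies of decorated functions appear in `line_hits`.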
Python Function Profiling
timemory-python-profiler can be used on a Python script to record one or more metrics for each function. Furthermore, selected functions can be decorated with a bare @profile to record only the designated function(s).
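Function-level profiling can be sketched with `sys.setprofile`, which reports `call` and `return` events for Python functions. The sketch below records call counts and inclusive wall-clock time per function name.

`run_profiled`, `call_counts`, and `inclusive_time` are hypothetical names. This is not how timemory-python-profiler is implemented.

```python
import sys
import time
from collections import defaultdict

call_counts = defaultdict(int)       # function name -> number of calls
inclusive_time = defaultdict(float)  # function name -> seconds, including callees
_start_times = {}                    # active frame -> entry timestamp


def _profiler(frame, event, arg):
    # "call"/"return" fire for Python functions; "c_call"/"c_return"
    # events for built-ins are ignored in this sketch.
    if event == "call":
        _start_times[frame] = time.perf_counter()
    elif event == "return":
        start = _start_times.pop(frame, None)
        if start is not None:
            name = frame.f_code.co_name
            call_counts[name] += 1
            inclusive_time[name] += time.perf_counter() - start


def run_profiled(func, *args, **kwargs):
    """Run func with function profiling enabled for the current thread only."""
    sys.setprofile(_profiler)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)
```

Results are keyed by bare function name, so same-named functions from different modules are merged. A fuller tool would key by module, file, and line as well.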
Profiling hybrid C/C++ and Python codes
By instrumenting C or C++ code (manually, or automatically via timemory-run) together with Python code (manually, or automatically via timemory-python-trace/timemory-python-profiler), hybrid C/C++ and Python applications can obtain unified profiling output.
Create/Prototype Custom Performance Analysis Metrics
The C++ template API available through timemory is highly modular and customizable. Its design is ideal for creating composite or custom metrics, e.g., a metric relative to user-provided data or a metric normalized by another metric.
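timemory's C++ template API is not reproduced here. The Python sketch below illustrates "a metric normalized by another metric". It accumulates process CPU time and wall-clock time and reports their ratio as a CPU utilization percentage. This is similar in spirit to the cpu_util component used in the Python context-manager example.

The `CpuUtil` class and its `start`/`stop`/`get` methods are hypothetical.

```python
import time


class CpuUtil:
    """Hypothetical composite metric: process CPU time normalized by wall time."""

    def __init__(self):
        self.wall = 0.0  # accumulated wall-clock seconds
        self.cpu = 0.0   # accumulated process CPU seconds (all threads)

    def start(self):
        self._wall0 = time.perf_counter()
        self._cpu0 = time.process_time()

    def stop(self):
        self.wall += time.perf_counter() - self._wall0
        self.cpu += time.process_time() - self._cpu0

    def get(self):
        """CPU utilization in percent of one core; may exceed 100 with threads."""
        return 100.0 * self.cpu / self.wall if self.wall > 0 else 0.0
```

A busy loop measured this way should report a utilization near 100, while a region that mostly sleeps should report a value near 0. The normalization is what makes the two regions comparable.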
When should you NOT use timemory?
GUI-based Profiling
timemory does not have a graphical user interface (GUI) for launching profiling sessions, and visualizing its results is relatively complicated compared with established profilers such as VTune, Nsight Systems, and Nsight Compute.
Low-Overhead, Whole-Application Profiling
At present, timemory collects most of its profiling data through instrumentation rather than sampling. Instrumentation adds instructions to the target program to collect the required information. As a result, it can change the program's performance depending on what information is collected and how much timing detail is reported.

A sampling profiler instead probes the target program's call stack at regular intervals using operating-system interrupts. Sampling profiles are typically less numerically accurate and specific, but they allow the target program to run at near full speed. Instrumenting a whole application can therefore perturb its performance. In some cases it can also artificially inflate the measurements around small functions that are called very frequently.
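For contrast, the following is a minimal statistical (sampling) profiler built only on the Python standard library. `signal.setitimer` with `ITIMER_PROF` interrupts the process at a fixed interval of CPU time, and the `SIGPROF` handler attributes each sample to the function that is currently executing.

This is not part of timemory. The function names are hypothetical, and the sketch works only on Unix and when called from the main thread.

```python
import signal
import time
from collections import Counter

samples = Counter()  # function name -> number of samples landing in it


def _on_sample(signum, frame):
    # SIGPROF handler: attribute the sample to the innermost Python function
    # that was running when the timer fired.
    if frame is not None:
        samples[frame.f_code.co_name] += 1


def start_sampling(interval_s=0.001):
    """Sample every interval_s seconds of process CPU time.

    Returns the previous SIGPROF handler so it can be restored later.
    """
    previous = signal.signal(signal.SIGPROF, _on_sample)
    # ITIMER_PROF counts down on user + system CPU time and delivers SIGPROF.
    signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)
    return previous


def stop_sampling(previous_handler):
    # Disarm the timer first so no further SIGPROF signals are generated.
    signal.setitimer(signal.ITIMER_PROF, 0, 0)
    if previous_handler is None:
        previous_handler = signal.SIG_DFL
    signal.signal(signal.SIGPROF, previous_handler)
```

The handler runs once per interval, so its cost does not grow with how often a small function is called. The price is that the per-function counts are statistical estimates rather than exact measurements.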
Relevant Links
- GitHub
- Documentation
- Doxygen
- Wiki
- Tutorials
- timemory ECP 2021 Tutorial Day 1 (YouTube)
- timemory ECP 2021 Tutorial Day 2 (YouTube)
Sample C / C++ Library API
```c
#include "timemory/library.h"
#include "timemory/timemory.h"

int
main(int argc, char** argv)
{
    // configure settings
    int overwrite       = 0;
    int update_settings = 1;
    // default to flat-profile
    timemory_set_environ("TIMEMORY_FLAT_PROFILE", "ON", overwrite, update_settings);
    // force timing units
    overwrite = 1;
    timemory_set_environ("TIMEMORY_TIMING_UNITS", "msec", overwrite, update_settings);

    // initialize with cmd-line
    timemory_init_library(argc, argv);
    // check if inited, init with name
    if(!timemory_library_is_initialized())
        timemory_named_init_library("ex-c");

    // define the default set of components
    timemory_set_default("wall_clock, cpu_clock");

    // create a region "main"
    timemory_push_region("main");
    timemory_pop_region("main");

    // pause and resume collection globally
    timemory_pause();
    timemory_push_region("hidden");
    timemory_pop_region("hidden");
    timemory_resume();

    // Add/remove component(s) to the current set of components
    timemory_add_components("peak_rss");
    timemory_remove_components("peak_rss");

    // get an identifier for a region and end it
    uint64_t idx = timemory_get_begin_record("indexed");
    timemory_end_record(idx);

    // assign an existing identifier for a region
    timemory_begin_record("indexed/2", &idx);
    timemory_end_record(idx);

    // create region collecting a specific set of data
    timemory_begin_record_enum("enum", &idx, TIMEMORY_PEAK_RSS, TIMEMORY_COMPONENTS_END);
    timemory_end_record(idx);

    timemory_begin_record_types("types", &idx, "peak_rss");
    timemory_end_record(idx);

    // replace current set of components and then restore previous set
    timemory_push_components("page_rss");
    timemory_pop_components();

    timemory_push_components_enum(2, TIMEMORY_WALL_CLOCK, TIMEMORY_CPU_CLOCK);
    timemory_pop_components();

    // Output the results
    timemory_finalize_library();
    return 0;
}
```
Sample Fortran API
```fortran
program fortran_example
    use timemory
    use iso_c_binding, only : C_INT64_T
    implicit none
    integer(C_INT64_T) :: idx

    ! initialize with explicit name
    call timemory_init_library("ex-fortran")

    ! initialize with name extracted from get_command_argument(0, ...)
    ! call timemory_init_library("")

    ! define the default set of components
    call timemory_set_default("wall_clock, cpu_clock")

    ! Start region "main"
    call timemory_push_region("main")

    ! Add peak_rss to the current set of components
    call timemory_add_components("peak_rss")

    ! Nested region "inner" nested under "main"
    call timemory_push_region("inner")

    ! End the "inner" region
    call timemory_pop_region("inner")

    ! remove peak_rss
    call timemory_remove_components("peak_rss")

    ! begin a region and get an identifier
    idx = timemory_get_begin_record("indexed")

    ! replace current set of components
    call timemory_push_components("page_rss")

    ! Nested region "inner" with only page_rss components
    call timemory_push_region("inner (pushed)")

    ! Stop "inner" region with only page_rss components
    call timemory_pop_region("inner (pushed)")

    ! restore previous set of components
    call timemory_pop_components()

    ! end the "indexed" region
    call timemory_end_record(idx)

    ! End "main"
    call timemory_pop_region("main")

    ! Output the results
    call timemory_finalize_library()

end program fortran_example
```
Sample Python API
Decorator
```python
from timemory.bundle import marker

@marker(["cpu_clock", "peak_rss"])
def foo():
    pass
```
Context Manager
```python
from timemory.profiler import profile

def bar():
    with profile(["wall_clock", "cpu_util"]):
        foo()
```
Individual Components
```python
from timemory.component import WallClock

def spam():
    wc = WallClock("spam")
    wc.start()
    bar()
    wc.stop()
    data = wc.get()
    print(data)
```
Argparse Support
```python
import argparse

import timemory

parser = argparse.ArgumentParser("example")
# ...
timemory.add_arguments(parser)
args = parser.parse_args()
```
Component Storage
```python
from timemory.storage import WallClockStorage

# data for the current rank
data = WallClockStorage.get()
# data combined from all ranks onto rank zero (all ranks must call this)
dmp_data = WallClockStorage.dmp_get()
```