Timemory

Description

Timemory is a multi-purpose C++ toolkit and suite of C/C++/Fortran/Python tools for performance analysis, optimization studies, logging, and debugging.

Timemory may be used as a profiler for C, C++, Fortran, CUDA, and/or Python or as a C++ or Python backend for the creation of a custom profiling tool/library.

Features

  • C++ / Python Profiling Toolkits
  • C / C++ / CUDA / Fortran / Python Manual Instrumentation APIs
  • Data analysis via Pandas
  • Drop-in replacement for time
  • Binary rewriting and runtime instrumentation
  • Python Function Profiler
  • Python Line Profiler
  • Kokkos Profiling Libraries
  • MPI Profiling
  • NCCL Profiling
  • OpenMP Profiling
  • Compiler Instrumentation

Data Collection Support

  • Various timers (wall, user, system, CPU, CUDA kernels, etc.)
  • Various resource usage (peak RSS, page RSS, virtual memory, context switches, etc.)
  • Hardware Counters (CPU and GPU)
  • Roofline (CPU and GPU)
  • Data trackers
  • I/O
  • Network statistics
  • Trip counts

Third-Party API Support

  • Allinea-Map
  • VTune
  • NVTX
  • CrayPat
  • Caliper
  • TAU
  • LIKWID
  • gperftools

When should you use timemory?

Replacement for time

Timemory provides a command-line tool, timem, which is a superior alternative to the time command-line tool.

Built-in Instrumentation

Timemory is ideal for adding built-in high-level performance analysis to your application which you can easily control, extend, and customize.

Performance Analysis Automation

The C++ and Python APIs provide in-situ programmatic access to the profiling data while it is being collected. The JSON output can be read directly in Python and/or converted to pandas dataframes using hatchet.
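As a rough illustration of digesting hierarchical profiling output in Python, the sketch below flattens a nested JSON call tree into one row per region. The JSON layout, the `SAMPLE` text, and the `flatten` helper are hypothetical and are not timemory's actual output schema; only the standard library is used.

```python
import json

# Hypothetical layout for illustration only; NOT timemory's actual JSON schema.
SAMPLE = """
{"label": "main", "wall_clock": 2.0, "children": [
    {"label": "load", "wall_clock": 0.5, "children": []},
    {"label": "solve", "wall_clock": 1.25, "children": [
        {"label": "kernel", "wall_clock": 1.0, "children": []}
    ]}
]}
"""


def flatten(node, depth=0, prefix=""):
    """Yield one flat record per region, walking the call tree depth-first."""
    path = f"{prefix}/{node['label']}" if prefix else node["label"]
    yield {"path": path, "depth": depth, "wall_clock": node["wall_clock"]}
    for child in node["children"]:
        yield from flatten(child, depth + 1, path)


rows = list(flatten(json.loads(SAMPLE)))

if __name__ == "__main__":
    for row in rows:
        # Indent by depth so the hierarchy stays visible in the flat listing
        print(f"{'  ' * row['depth']}{row['path']}: {row['wall_clock']}")
```

A list of flat dictionaries like `rows` is a shape that `pandas.DataFrame(rows)` accepts, which is one way to move from a call tree to tabular analysis.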

Python Profiling

Timemory supports line profiling and function profiling and has an extensive Python interface with decorators, context managers, and individual profiling components.

Python Line Profiling

timemory-python-trace can be used on a Python script to record one or more metrics for each line of Python code. Furthermore, select functions can be decorated with a bare @profile to record only the lines within the designated function(s).

Python Function Profiling

timemory-python-profiler can be used on a Python script to record one or more metrics for each function. Furthermore, select functions can be decorated with a bare @profile to record only the designated function(s).
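A script that uses a bare @profile will raise a NameError when run without a tool that defines that name. The sketch below keeps such a script runnable either way. It assumes, without confirmation from this document, that the tools above make `profile` available as a global name when they run the script; the no-op fallback and the `fibonacci` and `untracked` functions are illustrative.

```python
# Assumption: timemory-python-trace / timemory-python-profiler define
# `profile` before the script runs. If it is missing, substitute a no-op
# decorator so the same file also runs under a plain interpreter.
try:
    profile  # noqa: F821
except NameError:
    def profile(func):
        """No-op stand-in used when no profiler has provided `profile`."""
        return func


@profile
def fibonacci(n):
    """Hypothetical workload that is selected for profiling."""
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


def untracked(n):
    """Not decorated, so a selective profiling run would not record it."""
    return sum(range(n))


if __name__ == "__main__":
    print(fibonacci(10), untracked(10))
```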

Profiling hybrid C/C++ and Python codes

By instrumenting C or C++ manually or automatically via timemory-run in conjunction with instrumenting Python manually or automatically via timemory-python-trace/timemory-python-profiler, hybrid C/C++ and Python applications can obtain unified profiling output.

Create/prototype Custom Performance Analysis Metric

The C++ template API available through timemory is highly modular and customizable. The design is ideal for creating a composite or custom metric, e.g. a metric relative to user-provided data or a metric normalized by another metric.
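To make "a metric normalized by another metric" concrete, here is a minimal Python sketch of a composite metric: CPU time divided by wall-clock time, reported as a percentage. It only illustrates the idea and does not use timemory's C++ template API; the `CpuUtilization` class and its methods are hypothetical names built on the standard library.

```python
import time


class CpuUtilization:
    """Composite metric: process CPU time normalized by wall-clock time."""

    def __init__(self):
        self.wall = 0.0   # accumulated wall-clock seconds
        self.cpu = 0.0    # accumulated process CPU seconds
        self._wall0 = None
        self._cpu0 = None

    def start(self):
        self._wall0 = time.perf_counter()
        self._cpu0 = time.process_time()

    def stop(self):
        # Accumulate so that repeated start/stop pairs are combined
        self.wall += time.perf_counter() - self._wall0
        self.cpu += time.process_time() - self._cpu0

    def get(self):
        """Return CPU time as a percentage of wall time (0.0 if no wall time)."""
        return 100.0 * self.cpu / self.wall if self.wall > 0 else 0.0


if __name__ == "__main__":
    metric = CpuUtilization()
    metric.start()
    sum(i * i for i in range(200_000))  # CPU-bound work
    time.sleep(0.1)                     # idle time lowers the ratio
    metric.stop()
    print(f"cpu utilization: {metric.get():.1f} %")
```

Keeping the two underlying measurements separate and deriving the ratio only in `get()` means the composite stays correct when several intervals are accumulated.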

When should you NOT use timemory?

GUI-based Profiling

Timemory does not have a GUI (graphical user interface) for launching profiling sessions, and visualizing results is relatively complicated compared with established profilers such as VTune, Nsight Systems, and Nsight Compute.

Low Overhead, Whole Application Profiling

At present, timemory collects most of its profiling data through instrumentation rather than sampling. Instrumentation adds instructions to the target program to collect the required information, so it can change the program's performance depending on what information is being collected and on the level of timing detail reported. A sampling profiler instead probes the target program's call stack at regular intervals using operating system interrupts. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed. Instrumenting a whole application may, in some cases, artificially inflate the measurements around small functions which are called very frequently.
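The effect on small, frequently called functions can be seen with a generic timing wrapper. The sketch below is not how timemory instruments code; `instrument`, `records`, `tiny`, and `run` are hypothetical names, and the measured numbers will vary by machine.

```python
import functools
import time
from collections import defaultdict

# name -> [call count, accumulated seconds]
records = defaultdict(lambda: [0, 0.0])


def instrument(func):
    """Wrap func so every call is counted and timed (adds work per call)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            rec = records[func.__name__]
            rec[0] += 1
            rec[1] += time.perf_counter() - t0
    return wrapper


def tiny(x):
    """A very small function: the wrapper can cost more than the body."""
    return x + 1


instrumented_tiny = instrument(tiny)


def run(fn, n):
    """Call fn n times and return the total elapsed wall time in seconds."""
    t0 = time.perf_counter()
    for i in range(n):
        fn(i)
    return time.perf_counter() - t0


if __name__ == "__main__":
    n = 200_000
    plain = run(tiny, n)
    wrapped = run(instrumented_tiny, n)
    print(f"plain: {plain:.4f} s, instrumented: {wrapped:.4f} s")
    print(f"calls recorded: {records['tiny'][0]}")
```

Comparing the two totals gives a rough sense of per-call overhead, which is why instrumenting every function of an application can distort results for hot, short-lived calls.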

Sample C / C++ Library API

#include "timemory/library.h"
#include "timemory/timemory.h"

int
main(int argc, char** argv)
{
    // configure settings
    int overwrite       = 0;
    int update_settings = 1;
    // default to flat-profile
    timemory_set_environ("TIMEMORY_FLAT_PROFILE", "ON", overwrite, update_settings);
    // force timing units
    overwrite = 1;
    timemory_set_environ("TIMEMORY_TIMING_UNITS", "msec", overwrite, update_settings);

    // initialize with cmd-line
    timemory_init_library(argc, argv);

    // check if inited, init with name
    if(!timemory_library_is_initialized())
        timemory_named_init_library("ex-c");

    // define the default set of components
    timemory_set_default("wall_clock, cpu_clock");

    // create a region "main"
    timemory_push_region("main");
    timemory_pop_region("main");

    // pause and resume collection globally
    timemory_pause();
    timemory_push_region("hidden");
    timemory_pop_region("hidden");
    timemory_resume();

    // Add/remove component(s) to the current set of components
    timemory_add_components("peak_rss");
    timemory_remove_components("peak_rss");

    // get an identifier for a region and end it
    uint64_t idx = timemory_get_begin_record("indexed");
    timemory_end_record(idx);

    // assign an existing identifier for a region
    timemory_begin_record("indexed/2", &idx);
    timemory_end_record(idx);

    // create region collecting a specific set of data
    timemory_begin_record_enum("enum", &idx, TIMEMORY_PEAK_RSS, TIMEMORY_COMPONENTS_END);
    timemory_end_record(idx);

    timemory_begin_record_types("types", &idx, "peak_rss");
    timemory_end_record(idx);

    // replace current set of components and then restore previous set
    timemory_push_components("page_rss");
    timemory_pop_components();

    timemory_push_components_enum(2, TIMEMORY_WALL_CLOCK, TIMEMORY_CPU_CLOCK);
    timemory_pop_components();

    // Output the results
    timemory_finalize_library();
    return 0;
}

Sample Fortran API

program fortran_example
    use timemory
    use iso_c_binding, only : C_INT64_T
    implicit none
    integer(C_INT64_T) :: idx

    ! initialize with explicit name
    call timemory_init_library("ex-fortran")

    ! initialize with name extracted from get_command_argument(0, ...)
    ! call timemory_init_library("")

    ! define the default set of components
    call timemory_set_default("wall_clock, cpu_clock")

    ! Start region "main"
    call timemory_push_region("main")

    ! Add peak_rss to the current set of components
    call timemory_add_components("peak_rss")

    ! Nested region "inner" nested under "main"
    call timemory_push_region("inner")

    ! End the "inner" region
    call timemory_pop_region("inner")

    ! remove peak_rss
    call timemory_remove_components("peak_rss")

    ! begin a region and get an identifier
    idx = timemory_get_begin_record("indexed")

    ! replace current set of components
    call timemory_push_components("page_rss")

    ! Nested region "inner" with only page_rss components
    call timemory_push_region("inner (pushed)")

    ! Stop "inner" region with only page_rss components
    call timemory_pop_region("inner (pushed)")

    ! restore previous set of components
    call timemory_pop_components()

    ! end the "indexed" region
    call timemory_end_record(idx)

    ! End "main"
    call timemory_pop_region("main")

    ! Output the results
    call timemory_finalize_library()

end program fortran_example

Sample Python API

Decorator

from timemory.bundle import marker

@marker(["cpu_clock", "peak_rss"])
def foo():
    pass

Context Manager

from timemory.profiler import profile

def bar():
    with profile(["wall_clock", "cpu_util"]):
        foo()

Individual Components

from timemory.component import WallClock

def spam():

    wc = WallClock("spam")
    wc.start()

    bar()

    wc.stop()
    data = wc.get()
    print(data)

Argparse Support

import argparse
import timemory

parser = argparse.ArgumentParser("example")
# ...
timemory.add_arguments(parser)

args = parser.parse_args()

Component Storage

from timemory.storage import WallClockStorage

# data for current rank
data = WallClockStorage.get()
# combined data on rank zero but all ranks must call it
dmp_data = WallClockStorage.dmp_get()