Debugging & Performance

Interpreting cProfile: Cumulative vs Total Time

You ran cProfile, you have a table, and the top row by one sort is your entry point while the top row by another sort is a tiny helper — and it is not obvious which one to optimize. The confusion is almost always between tottime and cumtime. This guide explains every column pytest's profiler emits, why a dispatcher shows enormous cumulative time but trivial total time, and how to sort pstats to find the function actually worth changing.

Prerequisites

  • Python 3.8+ (cProfile, pstats, pstats.SortKey).
  • A .prof file produced by cProfile.run(..., filename="x.prof") or Profile.dump_stats("x.prof") — see CPU profiling with cProfile and py-spy for capture.

Solution

Load the stats, strip directory noise, and print both sorts:

Python
import pstats
from pstats import SortKey

stats = pstats.Stats("workload.prof")
stats.strip_dirs()                       # turn /abs/path/mod.py:func into mod.py:func

# cumtime: which entry points dominate the whole run (body + everything called).
stats.sort_stats(SortKey.CUMULATIVE).print_stats(8)

# tottime: which leaf functions burn CPU in their own body.
stats.sort_stats(SortKey.TIME).print_stats(8)

A representative row looks like this:

Plain text
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      240/2    0.001    0.000    1.480    0.740 app.py:12(dispatch)
   1000000    1.310    0.000    1.420    0.000 app.py:40(parse_record)

Read it column by column:

  • ncalls — number of calls. A 240/2 split means the function recursed: 240 total entries, 2 primitive (non-recursive) entries.
  • tottime — time spent in this function's own body, excluding anything it called. This is the leaf-level CPU cost.
  • percall (first) — tottime / primitive ncalls, the average self-cost per call.
  • cumtime — total time in this function including every subcall, summed across all entries. The top-level entry point has the largest cumtime.
  • percall (second) — cumtime / primitive ncalls.

In the example, dispatch has tiny tottime (0.001s) but huge cumtime (1.480s): it does almost nothing itself and spends all its time in parse_record. parse_record has tottimecumtime (1.31 vs 1.42), so it is a genuine leaf hotspot. Optimize parse_record, not dispatch.

Why this works

cProfile records, for every function, the time charged to its own frame separately from the time charged to frames it called. tottime is the self-time; cumtime rolls in the descendants. That is why callers high in the stack accumulate large cumtime while doing little real work, and why a tight inner loop shows tottime ≈ cumtime. Sorting by cumtime answers "where does the run spend its time" (the hot path), and sorting by tottime answers "which line is doing the work" (the hot leaf). You almost always want both: cumulative to navigate to the path, total to find the function to change.

Edge cases and failure modes

  • Recursion inflates ncalls. The total/primitive split is the tell. Optimizing a recursive function means cutting call count (memoization) or self-cost per call — percall on primitive calls tells you which.
  • Built-in rows dominate. Many <built-in method> entries with high tottime usually mean a hot loop calling cheap builtins millions of times; the fix is fewer calls, not a faster builtin. Cross-check wall-clock attribution with py-spy on a live process when deterministic overhead is suspect.
  • cumtime is not additive across rows. You cannot sum cumtime over functions to get total runtime; nested calls double-count. Read the root frame's cumtime for the whole-run figure.
  • callers / callees views. Use stats.print_callers("parse_record") to see who drives a hotspot — essential when a leaf is called from several paths.
  • Threaded code under-reports. cProfile only profiles the thread that started it; spawned threads are invisible. Profile each thread, or sample with py-spy.

Frequently Asked Questions

What is the difference between tottime and cumtime in cProfile?tottime (total time) is the time spent in a function's own body, excluding calls it makes. cumtime (cumulative time) includes that plus all time spent in functions it called. A function that mostly delegates has low tottime and high cumtime.

Which column should I sort by to find a hotspot? Sort by tottime to find the leaf functions actually burning CPU, and by cumtime to find which high-level entry points dominate the run. Start with cumtime to locate the hot path, then tottime to find the line doing the work.

What does ncalls show when it reads like 240/2? The first number is total calls and the second is primitive (non-recursive) calls. A split like 240/2 means the function recursed: it was entered 240 times overall but only 2 of those were the original non-recursive entries.

← Back to CPU Profiling with cProfile and py-spy