You ran cProfile, you have a table, and the top row by one sort is your entry point while the top row by another sort is a tiny helper — and it is not obvious which one to optimize. The confusion is almost always between tottime and cumtime. This guide explains every column pytest's profiler emits, why a dispatcher shows enormous cumulative time but trivial total time, and how to sort pstats to find the function actually worth changing.
Prerequisites
- Python
3.8+(cProfile,pstats,pstats.SortKey). - A
.proffile produced bycProfile.run(..., filename="x.prof")orProfile.dump_stats("x.prof")— see CPU profiling with cProfile and py-spy for capture.
Solution
Load the stats, strip directory noise, and print both sorts:
import pstats
from pstats import SortKey
stats = pstats.Stats("workload.prof")
stats.strip_dirs() # turn /abs/path/mod.py:func into mod.py:func
# cumtime: which entry points dominate the whole run (body + everything called).
stats.sort_stats(SortKey.CUMULATIVE).print_stats(8)
# tottime: which leaf functions burn CPU in their own body.
stats.sort_stats(SortKey.TIME).print_stats(8)
A representative row looks like this:
ncalls tottime percall cumtime percall filename:lineno(function)
240/2 0.001 0.000 1.480 0.740 app.py:12(dispatch)
1000000 1.310 0.000 1.420 0.000 app.py:40(parse_record)
Read it column by column:
ncalls— number of calls. A240/2split means the function recursed:240total entries,2primitive (non-recursive) entries.tottime— time spent in this function's own body, excluding anything it called. This is the leaf-level CPU cost.percall(first) —tottime / primitive ncalls, the average self-cost per call.cumtime— total time in this function including every subcall, summed across all entries. The top-level entry point has the largestcumtime.percall(second) —cumtime / primitive ncalls.
In the example, dispatch has tiny tottime (0.001s) but huge cumtime (1.480s): it does almost nothing itself and spends all its time in parse_record. parse_record has tottime ≈ cumtime (1.31 vs 1.42), so it is a genuine leaf hotspot. Optimize parse_record, not dispatch.
Why this works
cProfile records, for every function, the time charged to its own frame separately from the time charged to frames it called. tottime is the self-time; cumtime rolls in the descendants. That is why callers high in the stack accumulate large cumtime while doing little real work, and why a tight inner loop shows tottime ≈ cumtime. Sorting by cumtime answers "where does the run spend its time" (the hot path), and sorting by tottime answers "which line is doing the work" (the hot leaf). You almost always want both: cumulative to navigate to the path, total to find the function to change.
Edge cases and failure modes
- Recursion inflates ncalls. The
total/primitivesplit is the tell. Optimizing a recursive function means cutting call count (memoization) or self-cost per call —percallon primitive calls tells you which. - Built-in rows dominate. Many
<built-in method>entries with hightottimeusually mean a hot loop calling cheap builtins millions of times; the fix is fewer calls, not a faster builtin. Cross-check wall-clock attribution with py-spy on a live process when deterministic overhead is suspect. - cumtime is not additive across rows. You cannot sum
cumtimeover functions to get total runtime; nested calls double-count. Read the root frame'scumtimefor the whole-run figure. callers/calleesviews. Usestats.print_callers("parse_record")to see who drives a hotspot — essential when a leaf is called from several paths.- Threaded code under-reports.
cProfileonly profiles the thread that started it; spawned threads are invisible. Profile each thread, or sample with py-spy.
Frequently Asked Questions
What is the difference between tottime and cumtime in cProfile?tottime (total time) is the time spent in a function's own body, excluding calls it makes. cumtime (cumulative time) includes that plus all time spent in functions it called. A function that mostly delegates has low tottime and high cumtime.
Which column should I sort by to find a hotspot?
Sort by tottime to find the leaf functions actually burning CPU, and by cumtime to find which high-level entry points dominate the run. Start with cumtime to locate the hot path, then tottime to find the line doing the work.
What does ncalls show when it reads like 240/2?
The first number is total calls and the second is primitive (non-recursive) calls. A split like 240/2 means the function recursed: it was entered 240 times overall but only 2 of those were the original non-recursive entries.
← Back to CPU Profiling with cProfile and py-spy