A long-running worker creeps from 200 MB to 2 GB over a day and gets OOM-killed; a pytest session that should be flat grows with every test until CI runners die. top tells you that memory is growing but not which line is responsible. tracemalloc, in the standard library since Python 3.4, records the Python call stack for every allocation, so you can attribute live bytes to exact source lines and call paths, diff two points in time, and assert ceilings in tests. This guide covers driving tracemalloc end to end: configuring frame depth, taking and grouping snapshots, filtering noise, and wiring memory assertions into pytest.
Prerequisites
- Python 3.4+ for tracemalloc itself; 3.6+ for Snapshot.compare_to ordering by size_diff to behave as documented here; 3.9+ for tracemalloc.reset_peak(), which the pytest fixture below uses.
- A way to start tracing before the allocations you care about: either tracemalloc.start() early in the process, or the PYTHONTRACEMALLOC=N environment variable / -X tracemalloc=N flag to set frame depth at launch.
- pytest 6+ if you intend to gate memory in CI with the fixture pattern below.
- Awareness that tracing adds CPU and memory overhead proportional to nframe; never leave it on in production hot paths.
Core concept
tracemalloc hooks Python's memory allocators. Once tracemalloc.start(nframe) runs, every subsequent allocation is recorded with up to nframe stack frames. A snapshot (take_snapshot()) is an immutable copy of all currently tracked allocations at that instant. You then ask the snapshot to aggregate its traces into statistics, grouped either by 'lineno' (one entry per source line) or 'traceback' (one entry per distinct call path). filter_traces removes entries you do not care about — the standard library, importlib, tracemalloc's own frames — before you read the numbers.
The leak-hunting workflow is a pipeline: capture a baseline snapshot, exercise the code, capture a second snapshot, and call compare_to against the baseline to surface the lines whose retained bytes grew. The (current, peak) pair returned by get_traced_memory() is the cheap gate for tests. The diagram traces that pipeline.
Step-by-step implementation
1. Start tracing with the right frame depth
tracemalloc.start(nframe) begins recording. nframe is the number of stack frames stored per allocation. The default of 1 tells you the allocating line but not how it was reached; raise it when allocations funnel through a shared helper and you need the caller.
import tracemalloc
# Record up to 25 frames so we can group by full call path later.
tracemalloc.start(25)
To enable tracing from the very first allocation — before your own code runs — set the environment instead of calling start():
PYTHONTRACEMALLOC=25 python worker.py # or: python -X tracemalloc=25 worker.py
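To confirm that the requested depth took effect, a minimal check might look like the following. It relies only on tracemalloc.is_tracing() and tracemalloc.get_traceback_limit(), and it assumes the process is not already tracing (for example via PYTHONTRACEMALLOC) when start() is called.

```python
import tracemalloc

# Assumption: tracing is not yet enabled in this process.
tracemalloc.start(25)

tracing = tracemalloc.is_tracing()          # True once start() has run
depth = tracemalloc.get_traceback_limit()   # frames stored per allocation

print(f"tracing={tracing} depth={depth}")
tracemalloc.stop()
```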
2. Take a snapshot
take_snapshot() copies the currently tracked allocations into a Snapshot that later allocations do not change. A snapshot can be written to disk with Snapshot.dump() and reloaded with Snapshot.load(), so you can capture one, run work, capture another, and diff offline.
import tracemalloc
tracemalloc.start(25)
data = [bytes(1024) for _ in range(10_000)] # ~10 MB of work
snapshot = tracemalloc.take_snapshot()
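For offline diffing, a snapshot can be saved and restored with Snapshot.dump() and Snapshot.load(). The sketch below round-trips one through a temporary directory; the file name worker.snapshot and the total_bytes helper are illustrative, not part of tracemalloc.

```python
import os
import tempfile
import tracemalloc

tracemalloc.start(5)
payload = [bytes(1024) for _ in range(1_000)]  # roughly 1 MB kept alive
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()


def total_bytes(snap):
    """Hypothetical helper: sum the traced size of every allocation."""
    return sum(stat.size for stat in snap.statistics("lineno"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "worker.snapshot")  # illustrative file name
    snapshot.dump(path)                          # write the snapshot to disk
    restored = tracemalloc.Snapshot.load(path)   # e.g. later, on another machine

original_total = total_bytes(snapshot)
restored_total = total_bytes(restored)
print(original_total == restored_total)
```

Because the restored object is an ordinary Snapshot, it accepts the same statistics(), filter_traces(), and compare_to() calls as a live one.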
3. Group statistics by lineno or traceback
snapshot.statistics(key_type) returns a list of Statistic objects sorted largest-first. Use 'lineno' to collapse everything allocated on the same line into one entry, or 'traceback' to keep distinct call paths separate.
for stat in snapshot.statistics("lineno")[:5]:
    # stat.size is bytes retained; stat.count is the number of blocks
    print(f"{stat.size / 1024:8.1f} KiB {stat.count:>7} blocks {stat.traceback[0]}")
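Illustrative output (exact sizes and block counts vary by Python version, because each bytes object carries a small header and the list itself is also counted):
10240.0 KiB 10000 blocks worker.py:4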
Switching to 'traceback' and printing stat.traceback.format() shows the full path that reached the allocating line, which is essential when a line inside a shared helper is the named site but the cause is one specific caller.
top = snapshot.statistics("traceback")[0]
print("\n".join(top.traceback.format())) # full call stack for the biggest allocation
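The difference between the two groupings is easiest to see when one helper allocates for several callers. In this sketch, make_buffer, load_images, and load_thumbnails are hypothetical names; grouping by 'lineno' yields a single entry for the helper's line, while grouping by 'traceback' yields one entry per call path.

```python
import tracemalloc


def make_buffer(size):
    return bytearray(size)  # the shared allocation site


def load_images():
    return [make_buffer(10_000) for _ in range(100)]


def load_thumbnails():
    return [make_buffer(10_000) for _ in range(10)]


tracemalloc.start(10)
images = load_images()
thumbnails = load_thumbnails()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

helper_file = make_buffer.__code__.co_filename
helper_line = make_buffer.__code__.co_firstlineno + 1  # the `return` line


def touches_helper(stat):
    """True if any recorded frame is the helper's allocating line."""
    return any(
        frame.filename == helper_file and frame.lineno == helper_line
        for frame in stat.traceback
    )


by_line = [s for s in snapshot.statistics("lineno") if touches_helper(s)]
by_path = [s for s in snapshot.statistics("traceback") if touches_helper(s)]

print(len(by_line), "entry when grouped by lineno")
for stat in by_path:
    print(f"{stat.size / 1024:8.1f} KiB via:")
    print("\n".join(stat.traceback.format()))
```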
4. Filter out noise
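Raw snapshots are often dominated by the import machinery and tracemalloc's own bookkeeping. snapshot.filter_traces([...]) returns a new Snapshot containing only the traces that pass the filters. Exclusive filters (inclusive=False) drop traces whose frames match the filename pattern; inclusive filters (inclusive=True) keep only matching traces, for example those from your own module.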
import tracemalloc
filtered = snapshot.filter_traces((
tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
tracemalloc.Filter(False, tracemalloc.__file__), # drop tracemalloc's own frames
tracemalloc.Filter(False, "<unknown>"),
))
for stat in filtered.statistics("lineno")[:5]:
    print(stat)
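The inclusive direction can be sketched the same way. Here build_rows is a hypothetical function standing in for your own module, and the filter keeps only traces whose allocating frame lives in the file that defines it.

```python
import tracemalloc


def build_rows(n):
    return [bytes(512) for _ in range(n)]


tracemalloc.start()
rows = build_rows(2_000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Use the defining file of our own function as the filename pattern.
this_file = build_rows.__code__.co_filename
only_mine = snapshot.filter_traces([tracemalloc.Filter(True, this_file)])

mine_stats = only_mine.statistics("lineno")
mine_total = sum(stat.size for stat in mine_stats)
print(f"{len(mine_stats)} lines, {mine_total / 1024:.1f} KiB from {this_file}")
```

The pattern is matched with fnmatch-style rules, so a glob such as a package directory followed by `*` should also work when the code spans several files.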
5. Read the single-number gate with get_traced_memory
For a fast pass/fail, skip snapshots and read tracemalloc.get_traced_memory(), which returns (current, peak) in bytes for memory traced since start(). reset_peak() (Python 3.9+) resets the peak to the current traced size, so you can measure the peak of a specific region.
import tracemalloc
tracemalloc.start()
buf = [bytes(2048) for _ in range(5_000)]
current, peak = tracemalloc.get_traced_memory()
print(f"current={current/1e6:.1f} MB peak={peak/1e6:.1f} MB")
tracemalloc.stop()
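To isolate one region's peak, reset_peak() can be called just before it. The sketch below (Python 3.9+) allocates and frees a large setup buffer, then measures a smaller region; the buffer sizes are arbitrary.

```python
import tracemalloc

tracemalloc.start()

setup = bytearray(8 * 1024 * 1024)   # 8 MiB of setup we do not want to count
del setup
_, peak_with_setup = tracemalloc.get_traced_memory()

tracemalloc.reset_peak()             # peak now equals the current traced size
region = bytearray(1 * 1024 * 1024)  # the 1 MiB region we actually care about
_, region_peak = tracemalloc.get_traced_memory()

tracemalloc.stop()
print(f"with setup: {peak_with_setup / 2**20:.1f} MiB")
print(f"region only: {region_peak / 2**20:.1f} MiB")
```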
6. Assert a memory ceiling in pytest
Wrap tracing in a fixture so each test starts clean and the assertion reads peak. This is the CI counterpart to the session-fixture leaks called out in advanced pytest architecture and configuration — a leaking session-scoped fixture is exactly what blows the ceiling.
# conftest.py
import tracemalloc
import pytest
@pytest.fixture
def memory_ceiling():
    tracemalloc.start()
    tracemalloc.reset_peak()  # Python 3.9+: ignore allocations before the test body
    yield
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Stash the peak on the pytest module for later inspection; the test's own
    # assert has already run by the time this teardown code executes.
    pytest.peak_bytes = peak
# test_memory.py
import tracemalloc
def build_report(rows):
    return [{"id": r, "blob": bytes(1024)} for r in range(rows)]

def test_report_stays_under_5mb(memory_ceiling):
    build_report(2_000)
    _, peak = tracemalloc.get_traced_memory()
    assert peak < 5 * 1024 * 1024, f"peak {peak/1e6:.1f} MB exceeded 5 MB ceiling"
When the ceiling fails because allocations leak across tests rather than within one, fixture scope is usually the culprit; pin it down with the techniques in mastering pytest fixtures.
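A peak ceiling does not catch memory that is retained between calls. One way to probe for that, sketched here with only the standard library, is to compare the current traced size before and after repeated calls. The retained_growth helper, leaky, clean, and _registry are all hypothetical names for illustration.

```python
import gc
import tracemalloc

_registry = []  # hypothetical module-level state that never shrinks


def leaky(n):
    _registry.append(bytes(n))  # retained after the call returns


def clean(n):
    return len(bytes(n))        # the buffer is freed immediately


def retained_growth(func, *args, repeats=50):
    """Return bytes still held after calling func `repeats` times."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        func(*args)                 # warm-up so one-time caches are excluded
        gc.collect()
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            func(*args)
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    return after - before


leaky_growth = retained_growth(leaky, 10_000)
clean_growth = retained_growth(clean, 10_000)
print(f"leaky: {leaky_growth} bytes, clean: {clean_growth} bytes")
```

In a test, the returned value could be asserted against a small tolerance rather than zero, since interpreter caches can shift the number slightly.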
7. Locate growth by comparing snapshots
The core leak technique is diffing two snapshots. Call take_snapshot() before and after a repeated operation, then call compare_to on the later snapshot to rank lines by size_diff. The focused walkthroughs live in finding memory leaks with tracemalloc snapshots and comparing tracemalloc snapshots to locate growth.
import tracemalloc
tracemalloc.start(25)
before = tracemalloc.take_snapshot()
cache = {}
for i in range(50_000):
    cache[i] = bytes(64)  # an unbounded cache: the leak
after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:3]:
    # size_diff is the byte growth between the two snapshots
    print(f"+{stat.size_diff/1024:8.1f} KiB {stat.traceback[0]}")
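A variation on the same diff takes the baseline after a warm-up call and filters tracemalloc's own frames from both snapshots before comparing. This is a sketch under those assumptions; remember and cache are hypothetical names, and count_diff is read alongside size_diff to show how many blocks were added.

```python
import tracemalloc

cache = {}


def remember(key):
    cache[key] = bytes(64)  # unbounded cache: the leak


tracemalloc.start(25)
remember(-1)                              # warm-up before the baseline
before = tracemalloc.take_snapshot()
for i in range(20_000):
    remember(i)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Drop tracemalloc's own bookkeeping from both sides of the comparison.
noise = [tracemalloc.Filter(False, tracemalloc.__file__)]
diff = after.filter_traces(noise).compare_to(before.filter_traces(noise), "lineno")

top = diff[0]
leak_line = remember.__code__.co_firstlineno + 1  # the assignment line
print(f"+{top.size_diff / 1024:.1f} KiB, +{top.count_diff} blocks at {top.traceback[0]}")
```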
Verification
- Confirm tracing is live before measuring: tracemalloc.is_tracing() must return True.
- Sanity-check the get_traced_memory() peak against a known allocation: allocate a 10 MB list and verify the peak rises by roughly that much.
- Run the pytest gate with -q and deliberately bump the workload above the ceiling once; the assertion message must print the real peak in MB.
- Cross-check the suspected leaking line by printing stat.count (block count) alongside stat.size; a line whose count grows unboundedly across iterations is a leak, not a one-time buffer.
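The block-count cross-check from the last item can be sketched as follows. The handle function, cache, and blocks_on_site helper are hypothetical; the idea is to snapshot after each round of work and watch the count on one suspected line.

```python
import tracemalloc

cache = {}


def handle(request_id):
    cache[request_id] = bytes(64)  # suspected leak site


site_file = handle.__code__.co_filename
site_line = handle.__code__.co_firstlineno + 1


def blocks_on_site(snapshot):
    """Return the number of live blocks allocated on the suspected line."""
    for stat in snapshot.statistics("lineno"):
        frame = stat.traceback[0]
        if frame.filename == site_file and frame.lineno == site_line:
            return stat.count
    return 0


tracemalloc.start()
counts = []
for round_no in range(3):
    for request_id in range(round_no * 1_000, (round_no + 1) * 1_000):
        handle(request_id)
    counts.append(blocks_on_site(tracemalloc.take_snapshot()))
tracemalloc.stop()

print(counts)  # a steadily rising sequence suggests a leak
```

A one-time buffer would show the same count in every round; a count that rises with each round is the signature described above.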
Troubleshooting
| Symptom | Root cause | Fix |
|---|---|---|
| RuntimeError: the tracemalloc module must be tracing memory | Called take_snapshot() before start() | Call tracemalloc.start() first, or set PYTHONTRACEMALLOC |
| Top stats point only at <frozen importlib._bootstrap> | Import machinery dominates unfiltered snapshots | Apply filter_traces with Filter(False, ...) for importlib |
| traceback only shows one frame | nframe too low (default 1) | Restart with tracemalloc.start(25) |
| tracemalloc number much lower than RSS | C extensions that call malloc directly bypass Python's allocators and are invisible to it | Cross-check with psutil/memray for non-Python memory |
| Peak includes setup you do not care about | Peak accumulates since start() | Call reset_peak() (3.9+) right before the region |
| Snapshot diff shows growth that is actually a cache warm-up | First snapshot taken too early | Take the baseline after warm-up, then loop the operation |
Frequently Asked Questions
What does the nframe argument to tracemalloc.start() control?
nframe sets how many stack frames tracemalloc records for each allocation. With nframe=1 (the default) you only get the line that allocated; a higher value lets you group by full traceback to see the call path that led to the allocation, at the cost of more memory and overhead.
Why does tracemalloc report less memory than the operating system?
tracemalloc only tracks allocations made through Python's memory allocators after start() was called. Memory that C extensions obtain directly from the system allocator, allocations made before tracing started, and interpreter overhead are invisible to it, so RSS is usually larger.
How do I assert a memory ceiling in a pytest test?
Call tracemalloc.start() in a fixture, run the code under test, then read tracemalloc.get_traced_memory() which returns current and peak bytes. Assert the peak against a threshold and stop tracing in teardown so the next test starts clean.
What is the difference between grouping statistics by lineno and by traceback?
statistics('lineno') aggregates all allocations on the same source line into one entry, regardless of how that line was reached. statistics('traceback') keeps each distinct call path separate, which is essential when the same helper allocates on behalf of many callers.
Related guides
- For the full leak-hunting recipe, follow finding memory leaks with tracemalloc snapshots.
- To rank which lines grew between two points in time, use comparing tracemalloc snapshots to locate growth.
- When the problem is CPU-bound work rather than retained bytes, switch to interactive debugging with pdb and ipdb to step through the allocating path.
- Memory that grows only under concurrency often points at retained tasks; see debugging async code and event loops.