Debugging & Performance

Memory Profiling with tracemalloc

A long-running worker creeps from 200 MB to 2 GB over a day and gets OOM-killed; a pytest session that should be flat grows with every test until CI runners die. top tells you that memory is growing but not which line is responsible. tracemalloc, in the standard library since Python 3.4, records the Python call stack for every allocation, so you can attribute live bytes to exact source lines and call paths, diff two points in time, and assert ceilings in tests. This guide covers driving tracemalloc end to end: configuring frame depth, taking and grouping snapshots, filtering noise, and wiring memory assertions into pytest.

Prerequisites

  • Python 3.4+ for tracemalloc itself; 3.6+ for Snapshot.compare_to ordering by size_diff to behave as documented here.
  • A way to start tracing before the allocations you care about — either tracemalloc.start() early in the process, or the PYTHONTRACEMALLOC=N environment variable / -X tracemalloc=N flag to set frame depth at launch.
  • pytest 6+ if you intend to gate memory in CI with the fixture pattern below.
  • Awareness that tracing adds CPU and memory overhead proportional to nframe; never leave it on in production hot paths.

Core concept

tracemalloc hooks Python's memory allocators. Once tracemalloc.start(nframe) runs, every subsequent allocation is recorded with up to nframe stack frames. A snapshot (take_snapshot()) is an immutable copy of all currently tracked allocations at that instant. You then ask the snapshot to aggregate its traces into statistics, grouped either by 'lineno' (one entry per source line) or 'traceback' (one entry per distinct call path). filter_traces removes entries you do not care about — the standard library, importlib, tracemalloc's own frames — before you read the numbers.

The leak-hunting workflow is a pipeline: capture a baseline snapshot, exercise the code, capture a second snapshot, and compare_to the baseline to surface the lines whose retained bytes grew. The single number get_traced_memory() (current, peak) is the cheap gate for tests. The diagram traces that pipeline.

[Figure: The tracemalloc pipeline. start(nframe) hooks the allocators; snapshot A is the baseline; the workload runs and allocations happen; snapshot B is taken after; B.compare_to(A, 'lineno') is sorted by size_diff, biggest growth first; the top stats are the leaking lines, for example +12.4 MiB at cache.py:88 (90342 blocks).]
Two snapshots bracket the workload; compare_to diffs them and sorts by size_diff so the lines retaining the most new memory rise to the top.

Step-by-step implementation

1. Start tracing with the right frame depth

tracemalloc.start(nframe) begins recording. nframe is the number of stack frames stored per allocation. The default of 1 tells you the allocating line but not how it was reached; raise it when allocations funnel through a shared helper and you need the caller.

Python
import tracemalloc

# Record up to 25 frames so we can group by full call path later.
tracemalloc.start(25)

To enable tracing from the very first allocation — before your own code runs — set the environment instead of calling start():

Bash
PYTHONTRACEMALLOC=25 python worker.py        # or: python -X tracemalloc=25 worker.py
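
To see what nframe changes in practice, the following sketch traces the same allocation at depth 1 and at depth 25 and reports how many frames each traceback kept. The functions allocate, caller, and frames_recorded are hypothetical names introduced for illustration, and indexing the most recent frame with [-1] assumes Python 3.7 or later.

```python
import tracemalloc


def allocate():
    return bytes(50_000)          # the allocating line


def caller():
    return allocate()             # the call path we want to recover


def frames_recorded(nframe):
    """Trace one allocation at depth `nframe`; return (limit, frames kept)."""
    tracemalloc.start(nframe)
    blob = caller()               # keep a reference so the block is still live
    limit = tracemalloc.get_traceback_limit()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()

    code = allocate.__code__
    alloc_line = code.co_firstlineno + 1
    for stat in snapshot.statistics("traceback"):
        frame = stat.traceback[-1]            # most recent frame (Python 3.7+)
        if frame.filename == code.co_filename and frame.lineno == alloc_line:
            return limit, len(stat.traceback)
    raise LookupError("allocation not found in snapshot")


shallow = frames_recorded(1)
deep = frames_recorded(25)
print("nframe=1 :", shallow)      # one frame: only the allocating line
print("nframe=25:", deep)         # several frames: the callers are visible too
```

With a depth of 1 the traceback holds only the allocating line; with 25 it also holds caller and everything above it, which is what makes grouping by call path possible later.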

2. Take a snapshot

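take_snapshot() freezes the current set of tracked allocations into a Snapshot that later allocations do not change. A snapshot can be written to disk with Snapshot.dump() and reloaded with Snapshot.load(), so you can capture one, run work, capture another, and diff offline. Each snapshot copies every live trace, so its own size grows with the number of tracked blocks and with nframe.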

Python
import tracemalloc

tracemalloc.start(25)
data = [bytes(1024) for _ in range(10_000)]   # ~10 MB of work
snapshot = tracemalloc.take_snapshot()
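
For offline analysis, a snapshot can be saved and reloaded. The sketch below round-trips one through Snapshot.dump() and Snapshot.load(); the file name worker.snapshot and the temporary directory are assumptions made for the example.

```python
import os
import tempfile
import tracemalloc

tracemalloc.start()
data = [bytes(1024) for _ in range(1_000)]        # roughly 1 MB kept alive
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "worker.snapshot")   # hypothetical file name
    snapshot.dump(path)                           # pickles the snapshot to disk
    restored = tracemalloc.Snapshot.load(path)    # e.g. later, in another process

# The restored snapshot answers the same questions as the original.
restored_bytes = sum(stat.size for stat in restored.statistics("lineno"))
print(f"{len(restored.traces)} traces, {restored_bytes / 1024:.1f} KiB restored")
```

Because the file is a pickle, load it only from sources you trust.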

3. Group statistics by lineno or traceback

snapshot.statistics(key_type) returns a list of Statistic objects sorted largest-first. Use 'lineno' to collapse everything allocated on the same line into one entry, or 'traceback' to keep distinct call paths separate.

Python
for stat in snapshot.statistics("lineno")[:5]:
    # stat.size is bytes retained; stat.count is the number of blocks
    print(f"{stat.size / 1024:8.1f} KiB  {stat.count:>7} blocks  {stat.traceback[0]}")
Plain text
 10240.0 KiB    10000 blocks  worker.py:4

Switching to 'traceback' and printing stat.traceback.format() shows the full path that reached the allocating line — essential when a generic list.append is the named site but the cause is one specific caller.

Python
top = snapshot.statistics("traceback")[0]
print("\n".join(top.traceback.format()))   # full call stack for the biggest allocation
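
The difference between the two groupings is easiest to see when one helper allocates for several callers. In this sketch (make_blob, load_users, and load_orders are hypothetical names), 'lineno' reports a single entry for the helper line while 'traceback' splits it per call path. It assumes Python 3.7+, where the most recent frame is the last element of a traceback.

```python
import tracemalloc


def make_blob():
    return bytes(100_000)                     # shared helper: the only allocating line


def load_users():
    return [make_blob() for _ in range(3)]


def load_orders():
    return [make_blob() for _ in range(5)]


tracemalloc.start(10)                         # enough frames to see the callers
users, orders = load_users(), load_orders()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

code = make_blob.__code__
helper_line = code.co_firstlineno + 1


def from_helper(frame):
    """True if the frame is the allocating line inside make_blob."""
    return frame.filename == code.co_filename and frame.lineno == helper_line


by_line = [s for s in snapshot.statistics("lineno") if from_helper(s.traceback[0])]
by_path = [s for s in snapshot.statistics("traceback") if from_helper(s.traceback[-1])]

print(len(by_line), "entry by lineno,", len(by_path), "entries by traceback")
for stat in by_path:
    print(stat.count, "blocks via", stat.traceback[-2])   # the frame that called make_blob
```

Grouping by line answers "where is memory allocated"; grouping by traceback answers "who asked for it".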

4. Filter out noise

Raw snapshots are dominated by the import machinery and tracemalloc's own bookkeeping. snapshot.filter_traces([...]) returns a new snapshot containing only the traces that pass the filters. Exclusive filters (inclusive=False) drop matching traces; inclusive filters keep only traces from the files you name, such as your own module.

Python
import tracemalloc

filtered = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, tracemalloc.__file__),   # drop tracemalloc's own frames
    tracemalloc.Filter(False, "<unknown>"),
))
for stat in filtered.statistics("lineno")[:5]:
    print(stat)
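
The inclusive direction can be sketched the same way. Here a single Filter(True, ...) keeps only traces whose allocating frame is in the current file, which stands in for "your module"; build_payload is a hypothetical function, and json.loads is used only to generate allocations attributed to the standard library.

```python
import json
import tracemalloc


def build_payload():
    return [bytes(2048) for _ in range(1_000)]    # ~2 MB allocated by "our" code


tracemalloc.start()
parsed = json.loads("[" + ",".join(["{}"] * 5_000) + "]")   # attributed to json's files
payload = build_payload()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

own_file = build_payload.__code__.co_filename     # stands in for "your module"
# The pattern is matched with fnmatch, so *, ? and [ in a path act as wildcards.
mine = snapshot.filter_traces([tracemalloc.Filter(True, own_file)])

mine_bytes = sum(stat.size for stat in mine.statistics("lineno"))
print(f"kept {len(mine.traces)} of {len(snapshot.traces)} traces, "
      f"{mine_bytes / 1024:.1f} KiB from {own_file}")
```

By default a filter looks only at the most recent frame of each trace; pass all_frames=True to match anywhere in the recorded stack.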

5. Read the single-number gate with get_traced_memory

For a fast pass/fail, skip snapshots and read tracemalloc.get_traced_memory(), which returns (current, peak) bytes since start(). reset_peak() (Python 3.9+) resets the peak to the current traced size, so anything allocated and freed before the call no longer counts and you can measure a specific region.

Python
import tracemalloc

tracemalloc.start()
buf = [bytes(2048) for _ in range(5_000)]
current, peak = tracemalloc.get_traced_memory()
print(f"current={current/1e6:.1f} MB  peak={peak/1e6:.1f} MB")
tracemalloc.stop()
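
A minimal sketch of measuring one region with reset_peak(), assuming Python 3.9+. The setup list is deliberately larger than the region so the effect on the peak is visible; the sizes are arbitrary.

```python
import tracemalloc

tracemalloc.start()

setup = [bytes(1024) for _ in range(20_000)]      # ~20 MB of setup we do not want to count
del setup                                         # released, but the peak still remembers it
_, peak_with_setup = tracemalloc.get_traced_memory()

tracemalloc.reset_peak()                          # peak is now the current traced size
region = [bytes(1024) for _ in range(1_000)]      # ~1 MB: the region under measurement
_, peak_region = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"peak including setup: {peak_with_setup / 1e6:.1f} MB")
print(f"peak for region only: {peak_region / 1e6:.1f} MB")
```

Without the reset, the second reading would still report the roughly 20 MB high-water mark from setup.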

6. Assert a memory ceiling in pytest

Wrap tracing in a fixture so each test starts clean and the assertion reads peak. This is the CI counterpart to the session-fixture leaks called out in advanced pytest architecture and configuration — a leaking session-scoped fixture is exactly what blows the ceiling.

Python
# conftest.py
import tracemalloc
import pytest

@pytest.fixture
def memory_ceiling():
    tracemalloc.start()
    tracemalloc.reset_peak()          # Python 3.9+: ignore allocations before the test body
    yield
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Record the final peak for later reporting; this runs in teardown, after the test's own assert.
    pytest.peak_bytes = peak

# test_memory.py
import tracemalloc

def build_report(rows):
    return [{"id": r, "blob": bytes(1024)} for r in range(rows)]

def test_report_stays_under_5mb(memory_ceiling):
    build_report(2_000)
    _, peak = tracemalloc.get_traced_memory()
    assert peak < 5 * 1024 * 1024, f"peak {peak/1e6:.1f} MB exceeded 5 MB ceiling"

When the ceiling fails because allocations leak across tests rather than within one, fixture scope is usually the culprit; pin it down with the techniques in mastering pytest fixtures.
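
The same gate can be packaged so the test body does not call tracemalloc directly. The sketch below is one possible shape, not part of pytest or tracemalloc: peak_growth is a hypothetical context manager that reports the peak above the level at entry, and it assumes Python 3.9+ for reset_peak().

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def peak_growth():
    """Hypothetical helper: measure peak traced bytes above the starting level."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    tracemalloc.reset_peak()                      # Python 3.9+
    baseline, _ = tracemalloc.get_traced_memory()
    result = {"peak": 0}
    try:
        yield result
    finally:
        _, peak = tracemalloc.get_traced_memory()
        result["peak"] = peak - baseline          # growth attributable to the with-block
        if started_here:
            tracemalloc.stop()                    # leave tracing as we found it


def build_report(rows):
    return [{"id": r, "blob": bytes(1024)} for r in range(rows)]


with peak_growth() as measured:
    report = build_report(2_000)

limit = 5 * 1024 * 1024
assert measured["peak"] < limit, f"peak {measured['peak'] / 1e6:.1f} MB exceeded ceiling"
print(f"peak growth: {measured['peak'] / 1e6:.1f} MB")
```

Subtracting the baseline keeps the number meaningful even when tracing was already enabled by an outer fixture or by PYTHONTRACEMALLOC.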

7. Locate growth by comparing snapshots

The core leak technique is diffing two snapshots. Call take_snapshot() before and after a repeated operation, then use compare_to to rank lines by how much their retained bytes changed (size_diff). The focused walkthroughs live in finding memory leaks with tracemalloc snapshots and comparing tracemalloc snapshots to locate growth.

Python
import tracemalloc

tracemalloc.start(25)
before = tracemalloc.take_snapshot()
cache = {}
for i in range(50_000):
    cache[i] = bytes(64)          # an unbounded cache: the leak
after = tracemalloc.take_snapshot()

for stat in after.compare_to(before, "lineno")[:3]:
    # size_diff is the byte change between the two snapshots (negative if memory was freed)
    print(f"{stat.size_diff/1024:+10.1f} KiB  {stat.traceback[0]}")
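
To separate a real leak from warm-up, take the baseline after the first few iterations and read count_diff alongside size_diff. In this sketch, handle_request, _cache, and _scratch are hypothetical: the scratch buffer is replaced on every call and should show no growth, while the cache retains one block per call.

```python
import tracemalloc

_cache = {}                                   # hypothetical unbounded cache
_scratch = None                               # hypothetical reusable buffer


def handle_request(i):
    global _scratch
    _scratch = bytearray(256 * 1024)          # replaced each call: size stays flat
    _cache[i] = bytes(512)                    # retained forever: grows per call


tracemalloc.start()
for i in range(100):                          # warm-up: one-time buffers get allocated here
    handle_request(i)
baseline = tracemalloc.take_snapshot()        # baseline taken after warm-up

for i in range(100, 2_100):                   # the repeated operation
    handle_request(i)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

top = after.compare_to(baseline, "lineno")[0]
leak_line = handle_request.__code__.co_firstlineno + 3    # the _cache[i] line
print(f"{top.traceback[0]}  {top.size_diff / 1024:+.1f} KiB  {top.count_diff:+d} blocks")
```

A block count that rises by about one per iteration points at retention; the scratch buffer line does not rank because its size is the same in both snapshots.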

Verification

  • Confirm tracing is live before measuring: tracemalloc.is_tracing() must return True.
  • Sanity-check get_traced_memory() peak against a known allocation — allocate a 10 MB list and verify the peak rises by roughly that much.
  • Run the pytest gate with -q and deliberately bump the workload above the ceiling once; the assertion message must print the real peak in MB.
  • Cross-check the suspected leaking line by printing stat.count (block count) alongside stat.size — a line whose count grows unboundedly across iterations is a leak, not a one-time buffer.
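
The first two checks can be sketched in a few lines. A bytearray is used here in place of a list because its size is known almost exactly; that substitution is an assumption of this example.

```python
import tracemalloc

tracemalloc.start()
assert tracemalloc.is_tracing(), "start tracing before measuring"

before, _ = tracemalloc.get_traced_memory()
known = bytearray(10 * 1024 * 1024)           # a known 10 MiB allocation
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

rise_mib = (peak - before) / (1024 * 1024)
print(f"peak rose by {rise_mib:.2f} MiB")     # should be close to 10
```

If the rise is far from the expected size, tracing was probably started too late or stopped too early.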

Troubleshooting

Symptom | Root cause | Fix
RuntimeError: the tracemalloc module must be tracing memory | Called take_snapshot() before start() | Call tracemalloc.start() first, or set PYTHONTRACEMALLOC
Top stats point only at <frozen importlib._bootstrap> | Import machinery dominates unfiltered snapshots | Apply filter_traces with Filter(False, ...) for importlib
traceback only shows one frame | nframe too low (default 1) | Restart with tracemalloc.start(25)
tracemalloc number much lower than RSS | C-extension allocations are invisible to it | Cross-check with psutil/memray for non-Python memory
Peak includes setup you do not care about | Peak accumulates since start() | Call reset_peak() (3.9+) right before the region
Snapshot diff shows growth that is actually a cache warm-up | First snapshot taken too early | Take the baseline after warm-up, then loop the operation

Frequently Asked Questions

What does the nframe argument to tracemalloc.start() control? nframe sets how many stack frames tracemalloc records for each allocation. With nframe=1 (the default) you only get the line that allocated; a higher value lets you group by full traceback to see the call path that led to the allocation, at the cost of more memory and overhead.

Why does tracemalloc report less memory than the operating system? tracemalloc only tracks allocations made through Python's memory allocators after start() was called. Memory that C extensions allocate directly with malloc rather than through Python's allocator APIs, allocations made before tracing started, and interpreter overhead are invisible to it, so RSS is normally larger.

How do I assert a memory ceiling in a pytest test? Call tracemalloc.start() in a fixture, run the code under test, then read tracemalloc.get_traced_memory() which returns current and peak bytes. Assert the peak against a threshold and stop tracing in teardown so the next test starts clean.

What is the difference between grouping statistics by lineno and by traceback? key_type='lineno' aggregates all allocations on the same source line into one entry, regardless of how that line was reached. key_type='traceback' keeps each distinct call path separate, which is essential when the same helper allocates on behalf of many callers.
