Finding Memory Leaks with tracemalloc Snapshots

A service's resident memory climbs steadily and never plateaus; restarts are the only mitigation. The leak is not a crash, so there is no traceback to follow, just a number going up. tracemalloc snapshots turn that into an exact line: bracket the suspect operation with two snapshots, diff them to locate the growth, and the line whose retained bytes grow with iteration count is your leak.

Prerequisites

Python 3.4+ for tracemalloc; 3.6+ for the f-strings and underscore numeric literals used in the examples here.
The snapshot and statistics basics from memory profiling with tracemalloc.

Solution

The technique relies on the fact that a real leak grows linearly with iteration count while warm-up allocations (caches, interned strings, lazy imports) are one-time. Warm up first, snapshot a baseline, loop many times, snapshot again, then compare_to.

import tracemalloc

# A classic leak: an unbounded module-level cache that nothing ever evicts.
_CACHE = {}

def handle_request(request_id):
    # Each call retains a 1 KiB payload keyed by id; keys are never removed.
    _CACHE[request_id] = bytes(1024)
    return _CACHE[request_id]


tracemalloc.start(25)                 # 25 frames so we can see the call path

handle_request(-1)                    # warm-up pass: absorb one-time allocations
baseline = tracemalloc.take_snapshot()

for i in range(10_000):               # loop the suspect operation many times
    handle_request(i)

after = tracemalloc.take_snapshot()

# Diff the two snapshots; size_diff is byte growth between them.
top = after.compare_to(baseline, "lineno")
for stat in top[:3]:
    print(f"+{stat.size_diff/1024:8.1f} KiB  count {stat.count_diff:>6}  {stat.traceback[0]}")

+10240.0 KiB  count  10000  leak.py:8
+    1.2 KiB  count     31  leak.py:18

The first entry (line 8, the _CACHE[request_id] = bytes(1024) assignment) grew by ~10 MiB across 10,000 iterations with a matching count_diff of 10,000 blocks. One retained block per iteration, with bytes growing in step, is the signature of a leak. To see who drove the allocation, switch the grouping to traceback and format the path:

top_tb = after.compare_to(baseline, "traceback")
print("\n".join(top_tb[0].traceback.format()))   # full call stack to the leaking line

If the same leaking line is reached from many callers, the 'traceback' grouping separates them so you can tell which call site is unbounded — exactly the case where nframe=1 would hide the answer.
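As a minimal sketch of that situation, the example below routes one shared allocation line through two callers, only one of which retains the result. The names allocate_block, bounded_caller, unbounded_caller, _RETAINED, and leaking_call_path are hypothetical and exist only for illustration.

```python
import tracemalloc

_RETAINED = []                        # hypothetical sink that is never emptied


def allocate_block():
    return bytes(2048)                # the one allocation line both callers share


def bounded_caller():
    allocate_block()                  # result dropped immediately


def unbounded_caller():
    _RETAINED.append(allocate_block())    # result kept forever


def leaking_call_path(iterations=500):
    """Return (this file's frame line numbers, size_diff) for the fastest-growing path."""
    this_file = allocate_block.__code__.co_filename
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start(10)         # keep enough frames to tell callers apart
    bounded_caller()                  # warm-up pass
    unbounded_caller()
    baseline = tracemalloc.take_snapshot()
    for _ in range(iterations):
        bounded_caller()
        unbounded_caller()
    after = tracemalloc.take_snapshot()
    if started_here:
        tracemalloc.stop()
    # Grouping by traceback keeps each distinct call path as its own entry.
    worst = after.compare_to(baseline, "traceback")[0]
    lines = [frame.lineno for frame in worst.traceback if frame.filename == this_file]
    return lines, worst.size_diff
```

The returned line numbers should include the body of unbounded_caller and the shared allocation line, but not the body of bounded_caller, which is the separation a 'lineno' grouping cannot give you.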

A snapshot is a table of allocation sites, and the diff is what turns two tables into a diagnosis.

Warm-up before the baseline is what separates one-off initialisation from the growth you are hunting.

Why this works

A snapshot records the currently live tracked allocations. Anything freed between the two snapshots does not appear in the diff, so transient buffers cancel out and only retained growth survives. Because a leak retains a new block every iteration, its count_diff scales with the loop count while bounded structures stay flat. Grouping by lineno collapses all blocks from the offending line into a single ranked entry, and sorting by size_diff puts the worst offender first.
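The cancellation of transient buffers can be checked directly. The sketch below is illustrative only: transient_work, retaining_work, _KEPT, and growth_by_line are hypothetical names, and the sizes are arbitrary.

```python
import tracemalloc

_KEPT = []                            # hypothetical stand-in for a leaking container


def transient_work():
    scratch = bytes(64 * 1024)        # large, but released when the function returns
    return len(scratch)


def retaining_work():
    _KEPT.append(bytes(1024))         # small, but never released


def growth_by_line(iterations=200):
    """Map line number -> size_diff for this file's lines between two snapshots."""
    this_file = growth_by_line.__code__.co_filename
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    transient_work()                  # warm-up pass
    retaining_work()
    baseline = tracemalloc.take_snapshot()
    for _ in range(iterations):
        transient_work()
        retaining_work()
    after = tracemalloc.take_snapshot()
    if started_here:
        tracemalloc.stop()
    return {
        stat.traceback[0].lineno: stat.size_diff
        for stat in after.compare_to(baseline, "lineno")
        if stat.traceback[0].filename == this_file
    }
```

Although transient_work allocates far more bytes per call, none of them are live at either snapshot, so only the line inside retaining_work shows growth.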

Between the baseline and after snapshots the leaking line goes from 1 live block to 10,001, a gain of 10,000 blocks (+10,240 KiB) that matches the loop count one-to-one, while bounded warm-up allocations stay flat. That proportional growth is the leak signature compare_to surfaces.

The distinction that makes the diff trustworthy is linearity: a leak's byte growth and its block count grow together and proportionally with iteration count, whereas a bounded cache saturates. That is why looping the operation matters — a single call cannot separate a slow leak from a one-time allocation, but a few thousand calls make the leaking line tower over everything bounded.

Edge cases and failure modes

Warm-up not excluded: skipping the baseline-after-warm-up step floods the diff with import and cache allocations that look like leaks but plateau — always warm up first.
GC-deferred frees: objects in reference cycles are not freed until gc runs; call gc.collect() before the second snapshot to avoid mistaking deferred frees for a leak.
Too few iterations: a small loop lets a one-time 5 MiB cache outrank a slow leak; loop enough that linear growth dominates.
C-extension memory: raw malloc in a native library is invisible; if tracemalloc shows nothing but RSS climbs, reach for memray or valgrind.
Per-test leaks vs per-process: a leak that only appears across a pytest session usually means a session-scoped fixture retains state — confirm with the scoping guidance in mastering pytest fixtures.
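The GC-deferred case in the list above is easy to reproduce. In this sketch the Node class, make_cycles, and growth_with_and_without_collect are hypothetical names, and automatic collection is disabled only so the demonstration is deterministic.

```python
import gc
import tracemalloc


class Node:
    """Hypothetical object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.payload = bytes(4096)
        self.me = self                # the cycle keeps the refcount above zero


def make_cycles(count):
    for _ in range(count):
        Node()                        # unreachable at once, but not yet freed


def growth_with_and_without_collect(count=200):
    """Return (growth seen before gc.collect(), growth seen after it) for this file."""
    this_file = make_cycles.__code__.co_filename

    def file_growth(snapshot, baseline):
        return sum(
            stat.size_diff
            for stat in snapshot.compare_to(baseline, "lineno")
            if stat.traceback[0].filename == this_file
        )

    gc_was_enabled = gc.isenabled()
    gc.disable()                      # demo only: keep automatic collection out of the way
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        make_cycles(1)                # warm-up pass
        gc.collect()
        baseline = tracemalloc.take_snapshot()
        make_cycles(count)
        before_collect = tracemalloc.take_snapshot()
        gc.collect()                  # reclaims the cyclic garbage
        after_collect = tracemalloc.take_snapshot()
    finally:
        if started_here:
            tracemalloc.stop()
        if gc_was_enabled:
            gc.enable()
    return file_growth(before_collect, baseline), file_growth(after_collect, baseline)
```

The first number looks like a leak of roughly count × 4 KiB; the second should be close to zero, which is why the collection belongs before the second snapshot.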

From the allocation site to the reference that holds it

tracemalloc tells you where the memory was allocated. It does not tell you what is still holding it, and those are different questions: the allocation site is usually innocent library code, and the retaining reference is in yours.

Once the diff names a line, switch tools. gc.get_referrers walks backwards from an object to whatever refers to it, which is enough to identify the container that is growing:

import gc

def who_holds(sample_type: type, limit: int = 5) -> None:
    """Print the containers currently referring to instances of a leaking type."""
    gc.collect()
    # gc.get_objects() lists only objects the collector tracks, so atomic types
    # such as bytes, str, and int never show up; pass a class or container type.
    instances = [o for o in gc.get_objects() if type(o) is sample_type]
    print(f"{len(instances)} live {sample_type.__name__} instance(s)")
    sample = instances[:limit]
    for obj in sample:
        for ref in gc.get_referrers(obj):
            # Skip our own working lists, which refer to every instance found.
            if ref is instances or ref is sample:
                continue
            if isinstance(ref, dict):
                print("  held by a dict with keys:", list(ref)[:6])
            elif isinstance(ref, (list, set)):
                print(f"  held by a {type(ref).__name__} of length {len(ref)}")

Four containers account for most real leaks, and each has a recognisable shape in that output.

Module-level collections. A list or dict defined at module scope that something appends to per request. The referrer is that list or dict itself; one hop further back, its own referrer is a dict whose keys look like module globals. The fix is a bounded structure or an explicit eviction.

Registries and observer lists. A register() with no matching unregister(). The referrer is a list on a long-lived singleton, growing by exactly one per operation.

Caches without bounds. A hand-rolled dict cache keyed on something high-cardinality (a request id, a timestamp, a full URL), which by construction never gets a cache hit and never evicts. functools.lru_cache with a maxsize is the fix, or a TTL-based cache such as the third-party cachetools.TTLCache when age rather than count is the right bound.
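A small sketch of the difference, using hypothetical render_unbounded and render_bounded functions and an arbitrary maxsize of 128:

```python
from functools import lru_cache

_HAND_ROLLED = {}                     # hypothetical unbounded cache


def render_unbounded(request_id):
    # Keyed on a value that never repeats, so every call adds an entry.
    if request_id not in _HAND_ROLLED:
        _HAND_ROLLED[request_id] = bytes(1024)
    return _HAND_ROLLED[request_id]


@lru_cache(maxsize=128)               # 128 is an arbitrary illustrative bound
def render_bounded(request_id):
    return bytes(1024)


def cache_sizes(requests=10_000):
    """Return (entries in the hand-rolled dict, entries in the lru_cache)."""
    _HAND_ROLLED.clear()
    render_bounded.cache_clear()
    for request_id in range(requests):
        render_unbounded(request_id)
        render_bounded(request_id)
    return len(_HAND_ROLLED), render_bounded.cache_info().currsize
```

With 10,000 distinct ids, cache_sizes() returns (10000, 128): the dict holds everything it has ever seen, while the lru_cache evicts down to its bound. Neither gets a single hit, so bounding the cache fixes the leak but not the poor choice of key.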

Closures and default arguments. A default argument evaluated once at definition time (def f(acc=[])) or a closure captured by a long-lived callback. The referrer here is a function object or a cell, which is the giveaway.
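For the default-argument case, the chain can be walked by hand. This is a sketch with a hypothetical record function; on CPython the shared list is held by the function's __defaults__ tuple, which in turn is held by the function object.

```python
import gc


def record(event, seen=[]):           # the list is built once, when def runs
    seen.append(event)
    return len(seen)


def holders_of_default():
    """Walk two hops back from the shared default list to what keeps it alive."""
    shared = record.__defaults__[0]
    # First hop: the tuple of defaults refers to the list.
    first_hop = [r for r in gc.get_referrers(shared) if r is record.__defaults__]
    # Second hop: the function object refers to that tuple.
    second_hop = [r for r in gc.get_referrers(record.__defaults__) if r is record]
    return shared, first_hop, second_hop
```

Every call to record() that omits seen grows the same list, and the list lives exactly as long as the function does.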

For cycles specifically, gc.set_debug(gc.DEBUG_SAVEALL) makes the collector append every unreachable object it finds to gc.garbage instead of freeing it, so you can inspect exactly what was being kept alive only by cycles. That is heavyweight and belongs in a scratch reproduction rather than in a service, but it is definitive: with the flag set, anything that lands in gc.garbage after a collection was cyclic garbage. Without the flag, gc.garbage holds only objects the collector found unreachable but could not free, which since Python 3.4 is mostly limited to C extension types with a tp_del slot, because objects with a __del__ method no longer end up there.
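A scratch-reproduction sketch of that inspection follows. The Cyclic class and capture_cyclic_garbage helper are hypothetical; the gc calls are the standard ones.

```python
import gc


class Cyclic:
    """Hypothetical object that refers to itself."""

    def __init__(self):
        self.me = self


def capture_cyclic_garbage(count=3):
    """Create `count` self-referencing objects and return them from gc.garbage."""
    gc.collect()                      # flush garbage that predates the experiment
    old_flags = gc.get_debug()
    start = len(gc.garbage)
    gc.set_debug(gc.DEBUG_SAVEALL)    # park unreachable objects instead of freeing them
    try:
        for _ in range(count):
            Cyclic()
        gc.collect()
        found = [o for o in gc.garbage[start:] if isinstance(o, Cyclic)]
    finally:
        gc.set_debug(old_flags)       # restore normal collection
        del gc.garbage[start:]        # drop only the entries this run added
    return found
```

Each returned object can then be passed to gc.get_referrers or inspected directly to see which attribute closes the cycle.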

The end state of this workflow is a one-line change: a bound on a cache, a matching unregister, a weakref.WeakValueDictionary in place of a strong one. Reaching it takes two tools rather than one, because allocation and retention are genuinely separate facts about a program.
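To illustrate the last of those fixes, the sketch below indexes objects in a plain dict and in a weakref.WeakValueDictionary. The Session class and surviving_entries helper are hypothetical.

```python
import gc
import weakref


class Session:
    """Hypothetical object whose lifetime is owned by other code."""

    def __init__(self, session_id):
        self.session_id = session_id


def surviving_entries(count=100):
    """Return (strong index size, weak index size) once the owners have let go."""
    strong_index = {}
    weak_index = weakref.WeakValueDictionary()
    still_owned = Session("kept")     # one session that some owner keeps using
    weak_index["kept"] = still_owned
    for i in range(count):
        strong_index[i] = Session(i)  # the index alone keeps this session alive
        session = Session(i)
        weak_index[i] = session       # the index does not keep this one alive
        del session                   # the owner lets go
    gc.collect()                      # unnecessary on CPython here, harmless elsewhere
    return len(strong_index), len(weak_index)
```

surviving_entries() returns (100, 1): the strong index retains every session it was handed, while the weak index keeps only the one that still has an owner.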

Each shape is identified by its referrer type, which is why the second tool in the workflow is gc rather than tracemalloc.

Frequently Asked Questions

How many times should I repeat the operation before the second snapshot? Repeat enough times that a genuine leak dwarfs one-time warm-up allocations, typically hundreds to thousands of iterations. A leak grows roughly linearly with iterations, while caches and interned objects plateau, which makes the leaking line obvious in the diff.

Why does the first run always show growth even with no leak? The first iterations allocate caches, compiled regexes, lazily imported modules, and interned strings that are never freed. Take the baseline snapshot after a warm-up pass so these one-time allocations are excluded from the comparison.

Can tracemalloc find leaks in C extensions? Only partially. tracemalloc sees allocations routed through Python's allocators, so objects a C extension creates via PyObject_Malloc are visible, but raw malloc outside the Python heap is not. Use memray or valgrind for native leaks.

Why does RSS stay high after I fix the leak? Because CPython returns freed memory to its own allocator arenas rather than to the operating system, and an arena is only released when every block in it is free. A fixed leak therefore shows as a flat RSS rather than a falling one, with the freed space reused by later allocations. Judge the fix by tracemalloc's totals and by whether RSS stops growing, not by whether it drops.

Comparing tracemalloc Snapshots to Locate Growth — the general snapshot-diffing workflow this leak hunt is built on.
Profiling a Running Process with py-spy — when RSS climbs but tracemalloc is blind to the cause, sample the live process instead.
Post-Mortem Debugging with pdb.pm() — drop into the leaking frame once the diff has named the line.
Mastering pytest Fixtures — trace per-session leaks back to a fixture scope that retains state.
