A test that passes locally but fails roughly one run in twenty under CI — then passes on the automatic retry — is the canonical flaky test, and reaching for pytest-rerunfailures to silence it usually buries the real defect. Used deliberately, the plugin is a diagnostic amplifier: it forces a non-deterministic execution path into a repeatable state where you can capture sys.modules, GC object counts, and thread counts at the exact moment of failure. This guide shows the rerun mechanics, a state-capture hook, and the retry policy that surfaces root causes instead of hiding them.
Prerequisites
- pytest-rerunfailures >= 14.0, pytest >= 8.0, Python 3.9+.
- Optional companions: pytest-randomly (order shuffling) and pytest-xdist >= 3.0 (parallel runs; note that reruns are worker-local).
- Markers registered so @pytest.mark.flaky does not warn:
# pyproject.toml
[tool.pytest.ini_options]
addopts = "--strict-markers"
markers = ["flaky: test with known transient instability under controlled rerun"]
The collection ordering that makes order-dependent flakiness reproducible is covered in Optimizing Test Discovery; timing-based flakiness in async suites overlaps with Debugging Async Code and Event Loops.
Solution
pytest-rerunfailures implements pytest_runtest_protocol for each item. When an attempt fails and reruns remain, it runs the whole protocol again for the same item: setup, call, and teardown. Function-scoped fixtures are therefore torn down and recreated on every attempt, while wider-scoped fixtures that set up successfully are reused. Teardown has already altered the process by the time the next attempt starts, so anchor your diagnostics to makereport, not teardown, and dump state the instant the failure is observed:
# conftest.py
import gc
import json
import re
import sys
import threading

import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield  # let the report be built first
    report = outcome.get_result()
    # Only snapshot on the failing call phase, before the plugin retries.
    if report.when == "call" and report.failed:
        # report.rerun is not set yet inside this hook; the plugin's
        # item.execution_count (1 on the first attempt) already is.
        attempt = getattr(item, "execution_count", 1) - 1
        snapshot = {
            "nodeid": item.nodeid,
            "sys_modules": len(sys.modules),  # detects import-time leakage
            "gc_objects": len(gc.get_objects()),  # rising count => retained objects
            "thread_count": len(threading.enumerate()),  # leaked workers / pools
            "rerun": attempt,  # which attempt this was (0 = first run)
        }
        # nodeids contain "/", "::" and "[...]"; keep the filename filesystem-safe.
        safe_id = re.sub(r"[^\w.-]+", "_", item.nodeid)
        fname = f"flaky_state_{safe_id}_{attempt}.json"
        with open(fname, "w") as fh:
            json.dump(snapshot, fh, indent=2)
Comparing gc_objects and thread_count across the per-attempt artifacts separates a memory leak (monotonic growth) from transient concurrency contention (flat counts, intermittent failure). Reproduce verbosely first:
# Stream real-time traces CI normally suppresses, and shuffle order.
pytest --reruns=3 --capture=no --log-cli-level=DEBUG -p randomly
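The artifact comparison described above can be scripted instead of read by eye. The following is a minimal sketch, assuming the snapshot files keep the flaky_state_*.json naming and the nodeid and rerun fields written by the hook. The function names and the tolerance threshold are hypothetical choices for illustration, not part of the plugin.

```python
import json
from pathlib import Path


def load_snapshots(directory="."):
    """Load flaky_state_*.json artifacts, grouped by test and ordered by attempt."""
    by_test = {}
    for path in sorted(Path(directory).glob("flaky_state_*.json")):
        snap = json.loads(path.read_text())
        by_test.setdefault(snap["nodeid"], []).append(snap)
    for snaps in by_test.values():
        snaps.sort(key=lambda s: s["rerun"])
    return by_test


def classify(snaps, key, tolerance=0.02):
    """Label one metric across attempts as 'growing', 'flat', or 'mixed'.

    tolerance is an assumed noise threshold: relative changes no larger
    than this between consecutive attempts count as no change.
    """
    values = [s[key] for s in snaps]
    if len(values) < 2:
        return "insufficient-data"
    steps = [(b - a) / max(a, 1) for a, b in zip(values, values[1:])]
    if all(step > tolerance for step in steps):
        return "growing"  # monotonic growth: suspect a leak
    if all(abs(step) <= tolerance for step in steps):
        return "flat"  # stable counts: suspect contention or timing
    return "mixed"


if __name__ == "__main__":
    for nodeid, snaps in load_snapshots().items():
        for key in ("gc_objects", "thread_count", "sys_modules"):
            print(f"{nodeid} {key}: {classify(snaps, key)}")
```

A "growing" label on gc_objects or thread_count points toward retained objects or leaked workers, while "flat" on both alongside intermittent failures points toward contention. Treat "mixed" as a prompt to collect more attempts rather than as a verdict.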
Apply a delay only when timing is the suspect, and keep teardown explicit so state cannot leak across attempts:
import pytest
from sqlalchemy import create_engine, text


@pytest.fixture
def db_session():
    conn = create_engine("sqlite:///:memory:").connect()
    yield conn
    # Drop test data so nothing leaks into the next attempt.
    conn.execute(text("DROP TABLE IF EXISTS test_data"))
    conn.close()
    assert conn.closed, "Connection leak detected during rerun teardown"


@pytest.mark.flaky(reruns=3, reruns_delay=2)  # delay lets a transient partition heal
def test_writes_then_reads(db_session):
    ...
Why this works
The plugin replays the full setup, call, and teardown sequence on every attempt, so a function-scoped fixture is genuinely fresh each retry while wider-scoped fixtures are not. Capturing state inside the makereport hookwrapper records the process exactly when the failure is observed, before teardown or a retry mutates it, turning a Heisenbug into a diff between artifacts. A test that passes with a delay and fails without one is strong evidence of timing-dependent resource contention rather than a deterministic logic error, which tells you where to look instead of just hiding the symptom.
Edge cases and failure modes
- Wider-scoped fixtures compound corruption. Session/module/class fixtures never reset between reruns, so a parametrized test sharing one of them inherits mutated state from earlier failures. Convert to function scope or clear state in the yield teardown.
- Hypothesis shrinking conflicts. Reruns can interfere with Hypothesis's deterministic shrinking, leading to runaway reduction or false passes. Check the attempt counter that the plugin exposes as item.execution_count and skip property tests during retries, or isolate Hypothesis suites from retry logic entirely (see Hypothesis Framework Fundamentals).
- Worker-local reruns under pytest-xdist. A failure on worker A retries only on A; shared DBs, ports, or locks cause cross-worker cascades. Route flaky subsets to sequential execution.
- Coverage / JUnit XML overwrite. Each attempt can overwrite reports. Use --cov-append plus coverage combine, and collect per-attempt JUnit artifacts to preserve failure history.
- --reruns as a permanent fix. It masks genuine regressions and inflates pipeline time. Cap at 2-3, pair with failure-signature hashing, and quarantine tests that keep recurring.
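Failure-signature hashing, mentioned in the last item, can be as simple as hashing a normalized copy of the failure text. The sketch below is one possible approach, not a feature of pytest-rerunfailures: failure_signature is a hypothetical helper, and the masking rules (addresses, numbers, whitespace) are assumptions you would tune for your own suite.

```python
import hashlib
import re


def failure_signature(nodeid, longrepr_text):
    """Return a short, stable hash for a failure.

    Volatile details (memory addresses, numbers such as timings, ports,
    and line numbers, plus whitespace runs) are masked so the same defect
    hashes identically across runs.
    """
    text = re.sub(r"0x[0-9a-fA-F]+", "<addr>", longrepr_text)
    text = re.sub(r"\d+", "<n>", text)
    text = re.sub(r"\s+", " ", text).strip()
    digest = hashlib.sha256(f"{nodeid}\n{text}".encode("utf-8"))
    return digest.hexdigest()[:16]
```

Inside a pytest_runtest_makereport hookwrapper you could call it as failure_signature(item.nodeid, report.longreprtext) and store the result next to the state snapshot. Masking every number is deliberately coarse: it keeps hashes stable when line numbers shift, at the cost of merging failures that differ only in a numeric value.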
Frequently Asked Questions
Does pytest-rerunfailures reset fixture state between retry attempts?
Only function-scoped fixtures are recreated per rerun attempt. Module, class, and session-scoped fixtures persist across retries and carry mutated state forward. To force a full reset, convert fixtures to function scope or clear mutable state in a yield-based teardown.
How do I stop pytest-rerunfailures from masking genuine failures?
Keep --reruns low (2-3), add --reruns-delay to space retries, and log failure signatures in a pytest_runtest_makereport hook. Alert on identical failure hashes recurring across pipeline runs so persistent logical bugs are escalated rather than silently retried.
Can I combine pytest-rerunfailures with pytest-xdist?
Yes, but reruns are worker-local: a test that fails on worker A retries only on worker A. Shared databases, file locks, or ports can cause cascading failures across workers, so route flaky subsets to sequential execution or isolate per-worker resources.
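Isolating per-worker resources usually comes down to deriving resource names from the worker id. A minimal sketch follows, relying on the PYTEST_XDIST_WORKER environment variable that pytest-xdist sets in each worker process. The helper names and the SQLite file layout are hypothetical.

```python
import os
from pathlib import Path


def worker_name(environ=None):
    """Return the xdist worker id ("gw0", "gw1", ...) or "master" outside xdist."""
    environ = os.environ if environ is None else environ
    return environ.get("PYTEST_XDIST_WORKER", "master")


def per_worker_sqlite_url(base_dir, environ=None):
    """Build a SQLite URL unique to this worker so workers never share a DB file."""
    db_path = Path(base_dir) / f"test_{worker_name(environ)}.sqlite3"
    return f"sqlite:///{db_path.as_posix()}"
```

A fixture can pass per_worker_sqlite_url(tmp_dir) to create_engine so that a rerun on one worker never touches rows another worker is writing. The same naming idea applies to ports and lock files.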