Fixing Hypothesis FailedHealthCheck Errors

Your Hypothesis test fails not with a falsifying example but with hypothesis.errors.FailedHealthCheck, naming too_slow, filter_too_much, data_too_large, or function_scoped_fixture. These are not bugs in the code under test; they are guardrails reporting that the test harness is degenerate: generation is too slow, too many examples are being thrown away, generated data is too large, or a fixture is silently shared across examples. This guide explains each check and its correct fix, as opposed to the blunt suppress-everything anti-pattern.

Prerequisites

hypothesis >= 6.0 (the HealthCheck enum and suppress_health_check settings argument are stable across 6.x; function_scoped_fixture was added in 5.x)
pytest >= 7.0, Python 3.9+
Familiarity with @given, @settings, assume, and strategy filtering from Hypothesis Framework Fundamentals.

Solution

Work the failure in order rather than reaching for suppress_health_check first:

Read the traceback and name the HealthCheck member that fired — each maps to a distinct cause below.
Reproduce with pytest --hypothesis-show-statistics to see typical runtimes, the share of time spent in data generation, and the counts of passing, failing, and invalid (discarded) examples.
Apply the source-level fix for that check.
Suppress only the single member you have understood, and only once the cost is confirmed intrinsic.

Name the member that fired, apply its source-level harness fix first, and reach for a single-member suppress_health_check only once the cost is confirmed intrinsic.

Each health check has a distinct trigger and a distinct correct fix. Import the enum from hypothesis:

from hypothesis import given, settings, HealthCheck, assume
import hypothesis.strategies as st

too_slow — Hypothesis aborts because generating the inputs (running the strategies, not the test body) took too long. The fix is rarely suppression; usually the strategy does expensive work per draw, or a slow test body, which the deadline governs, is being blamed on generation.

import time

def expensive_build(n):
    time.sleep(0.2)   # per-draw work that could be hoisted, cached, or simplified
    return n

# WRONG: generation is slow for an avoidable reason; suppression masks it
@settings(suppress_health_check=[HealthCheck.too_slow])
@given(st.integers().map(expensive_build))
def test_blunt(n):
    assert n == n

# RIGHT: a slow *body* (real I/O) is a deadline problem, not a too_slow one.
# Having confirmed the cost is intrinsic, raise the deadline for this test only.
@settings(deadline=None)
@given(st.integers())
def test_legit_slow_io(n):
    time.sleep(0.3)   # e.g. an unavoidable network round-trip, past the 200 ms default
    assert n == n

Note the distinction: deadline=None disables the per-example timing check on the test body (200 ms by default), while suppress_health_check=[HealthCheck.too_slow] disables the guard on data-generation time. They are independent: a slow body needs only the deadline change, a strategy with intrinsic per-draw cost needs only the too_slow suppression, and a test that is slow in both places needs both, but only after confirming the cost is real.
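Since deadline and suppress_health_check are separate settings, their independence can be checked directly on settings objects. This is a minimal sketch; it starts from the built-in "default" profile via settings.get_profile so the comparison does not depend on whichever profile happens to be loaded.

```python
from hypothesis import HealthCheck, settings

# Start from the built-in "default" profile so the comparison does not depend
# on whichever profile is currently loaded.
base = settings.get_profile("default")

# deadline bounds each call of the test body.
no_deadline = settings(base, deadline=None)
# suppress_health_check switches off named guards on data generation.
only_too_slow = settings(base, suppress_health_check=[HealthCheck.too_slow])

# Disabling the deadline suppresses no health check...
print("suppressed with deadline=None:", list(no_deadline.suppress_health_check))
# ...and suppressing too_slow leaves the per-example deadline in force.
print("deadline with too_slow suppressed:", only_too_slow.deadline)
```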

filter_too_much — Hypothesis discarded too many generated examples because a .filter() predicate or assume() rejects most candidates. Suppressing it does not help; the generator is starving. Constrain the strategy at its source instead — and when the constraint cannot be expressed as a bound, compose the strategy with flatmap or @composite so valid values are built directly rather than filtered out.

# WRONG: filter rejects ~99% of integers, starving the generator
@given(st.integers().filter(lambda x: 1000 <= x <= 1005))
def test_starved(x):
    assert 1000 <= x <= 1005

# RIGHT: generate inside the constraint so nothing is discarded
@given(st.integers(min_value=1000, max_value=1005))
def test_bounded(x):
    assert 1000 <= x <= 1005
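When the constraint relates two values instead of bounding one, there is no min_value/max_value to reach for, and that is where flatmap and @composite come in. A minimal sketch of both styles, assuming a hypothetical constraint that the second integer of a pair be strictly greater than the first; the strategy and test names are illustrative.

```python
from hypothesis import given, strategies as st

# flatmap: draw a first, then build a strategy whose bounds depend on it,
# so every pair is valid by construction and nothing is filtered out.
ordered_pairs = st.integers(min_value=0, max_value=1_000).flatmap(
    lambda a: st.tuples(st.just(a), st.integers(min_value=a + 1, max_value=a + 100))
)

# @composite: the same constraint as sequential draws, which reads more
# naturally once several dependent values are involved.
@st.composite
def ordered_pairs_composite(draw):
    a = draw(st.integers(min_value=0, max_value=1_000))
    b = draw(st.integers(min_value=a + 1, max_value=a + 100))
    return a, b

@given(ordered_pairs)
def test_flatmap_pairs(pair):
    a, b = pair
    assert a < b

@given(ordered_pairs_composite())
def test_composite_pairs(pair):
    a, b = pair
    assert a < b
```

Neither strategy ever discards a draw, so filter_too_much has nothing to measure.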

data_too_large — too many generated examples are aborted for exceeding Hypothesis's internal size budget, typically because of unbounded collections or recursion. Bound the sizes; for structured payloads, build a custom strategy that caps depth and width at the source.

# WRONG: unbounded nested data routinely exceeds the buffer
@given(st.lists(st.lists(st.integers())))
def test_huge(data):
    assert isinstance(data, list)

# RIGHT: cap element and collection sizes
@given(st.lists(st.lists(st.integers(), max_size=20), max_size=20))
def test_bounded_size(data):
    assert isinstance(data, list)
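For structured payloads, the cap can be built into a custom strategy. A minimal sketch, assuming hypothetical limits MAX_DEPTH and MAX_WIDTH and a hypothetical nested_lists strategy that threads an explicit depth counter through @composite; st.recursive with max_leaves is the built-in alternative when a leaf budget is a better fit than a depth limit.

```python
from hypothesis import given, strategies as st

MAX_DEPTH = 3  # hypothetical caps; size them to what the code under test accepts
MAX_WIDTH = 5

@st.composite
def nested_lists(draw, depth=MAX_DEPTH):
    # At depth 0 only a leaf is drawn, so the recursion always terminates.
    if depth == 0:
        return draw(st.integers())
    # Otherwise draw a bounded list whose items are leaves or shallower lists.
    return draw(
        st.lists(
            st.one_of(st.integers(), nested_lists(depth=depth - 1)),
            max_size=MAX_WIDTH,
        )
    )

def depth_of(value):
    # A leaf has depth 0; a list is one deeper than its deepest item.
    if not isinstance(value, list):
        return 0
    return 1 + max((depth_of(item) for item in value), default=0)

def widest(value):
    # Length of the longest list anywhere in the structure.
    if not isinstance(value, list):
        return 0
    return max([len(value), *(widest(item) for item in value)])

@given(nested_lists())
def test_depth_and_width_are_capped(data):
    assert isinstance(data, list)
    assert depth_of(data) <= MAX_DEPTH
    assert widest(data) <= MAX_WIDTH
```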

function_scoped_fixture — a function-scoped pytest fixture is requested by a @given test. The fixture is set up once for the whole test function, but the body runs once per example, so the fixture is not reset between examples. Shared mutable state of this kind is a common source of order-dependent flakiness: whether an example fails depends on which examples ran before it.

import pytest

# WRONG: function-scoped, mutable, shared across all examples
@pytest.fixture
def buffer():
    return []  # one list reused for every generated example

@given(st.integers())
def test_leaky(buffer, n):     # fails the function_scoped_fixture health check
    buffer.append(n)
    assert len(buffer) == 1    # with the check suppressed, fails once a second example runs

# RIGHT (option A): build the per-example resource inside the test body
@given(st.integers())
def test_local_state(n):
    buffer = []                # fresh each example
    buffer.append(n)
    assert len(buffer) == 1

# RIGHT (option B): if the fixture is genuinely set-up-once and read-only
# (e.g. a connection pool), keep it and suppress only this check.
@pytest.fixture
def readonly_engine():
    return {"url": "sqlite://"}  # never mutated by the test

@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
@given(st.integers())
def test_readonly(readonly_engine, n):
    assert readonly_engine["url"].startswith("sqlite")
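A third option for a genuinely read-only resource is to promote the fixture's scope, since only function-scoped fixtures trigger this check. A minimal sketch, assuming a hypothetical module-scoped variant of the read-only engine above; nothing is suppressed, so every health check stays active.

```python
import pytest
from hypothesis import given, strategies as st

# Module scope: pytest builds the fixture once per test module. Hypothesis
# only flags function-scoped fixtures, so no health check fires and none
# needs to be suppressed.
@pytest.fixture(scope="module")
def readonly_engine():
    # Hypothetical read-only resource; no test may mutate it.
    return {"url": "sqlite://"}

@given(st.integers())
def test_readonly_module_scope(readonly_engine, n):
    assert readonly_engine["url"].startswith("sqlite")
```

Promoting the scope only states honestly what Hypothesis was already doing (one setup for many examples), which is why it is safe only for state that is never mutated.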

Why this works

Health checks measure properties of the test harness, not the system under test, so the durable fix changes the harness — bound the strategy, generate within constraints, or stop sharing mutable state — rather than silencing the signal. suppress_health_check takes a list of specific HealthCheck members precisely so you can disable one diagnostic you have understood while leaving the rest active. Suppressing function_scoped_fixture is only safe when the fixture is set up once and never mutated, because Hypothesis cannot otherwise guarantee example independence.

Edge cases and failure modes

deadline=None does not silence too_slow. They are independent settings; a slow generation phase still trips too_slow even with the deadline disabled.
Suppressing filter_too_much hides a starving generator. The test will still run, but on a biased, tiny slice of the input space, weakening coverage silently.
Session/module-scoped fixtures do not raise function_scoped_fixture — only function scope does. Promoting a read-only fixture's scope is a legitimate fix.
Stateful machines have their own timing. too_slow on a RuleBasedStateMachine usually means real I/O per rule; lower stateful_step_count rather than suppressing blindly. See Stateful and Model-Based Testing.
CI-only failures. Slower runners trip too_slow even when local runs pass; profile and cache before suppressing, as described in Reducing Hypothesis test execution time.
Async tests. Function-scoped async fixtures combine both this check and event-loop scoping concerns; the trade-offs are covered in pytest-asyncio vs anyio: scoping trade-offs.
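For the stateful case, lowering stateful_step_count shortens each run instead of switching off a check. A minimal sketch, assuming a hypothetical LedgerMachine whose "system" is a list of entries checked against a running-total model; in a real suite each rule might perform I/O.

```python
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    rule,
    run_state_machine_as_test,
)

class LedgerMachine(RuleBasedStateMachine):
    # Hypothetical machine: the "system" is a list of entries and the model is
    # a running total. Real rules might do I/O, which is what makes long step
    # sequences slow.
    def __init__(self):
        super().__init__()
        self.entries = []
        self.expected_total = 0

    @rule(amount=st.integers(min_value=0, max_value=100))
    def deposit(self, amount):
        self.entries.append(amount)
        self.expected_total += amount

    @invariant()
    def total_matches_model(self):
        assert sum(self.entries) == self.expected_total

# Fewer steps per run means fewer rule executions per example, which cuts the
# per-example cost without switching off any health check.
FEWER_STEPS = settings(stateful_step_count=10, max_examples=50)

# pytest collects this TestCase as usual; the settings apply to every run.
TestLedger = LedgerMachine.TestCase
TestLedger.settings = FEWER_STEPS
```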

What each health check is actually measuring

Health checks are not style warnings — each one measures a specific property of the generation loop, and knowing which measurement failed points straight at the fix.

filter_too_much and too_slow are both about the ratio of useful work to wasted work. The first fires when rejection sampling throws away too many draws; the second when generation itself dominates the time budget. Both are usually fixed in the strategy rather than in the settings: construct valid values instead of filtering invalid ones, and move expensive setup out of the strategy and into a fixture that runs once.
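A minimal sketch of moving setup out of the strategy. A strategy expression is evaluated when the @given decorator runs at import time, so it cannot refer to a pytest fixture; this sketch therefore swaps in a module-level functools.lru_cache as the "runs once" mechanism. squares_table is a hypothetical stand-in for expensive setup.

```python
import math
from functools import lru_cache

from hypothesis import given, strategies as st

@lru_cache(maxsize=None)
def squares_table():
    # Hypothetical expensive setup. lru_cache runs the body once per process;
    # every later call returns the cached tuple.
    return tuple(i * i for i in range(10_000))

# WRONG: the table is rebuilt inside every draw, so generation dominates.
slow_squares = st.integers(min_value=0, max_value=9_999).map(
    lambda i: tuple(j * j for j in range(10_000))[i]
)

# RIGHT: the per-draw work is now an index into prepared data.
fast_squares = st.integers(min_value=0, max_value=9_999).map(lambda i: squares_table()[i])

@given(fast_squares)
def test_is_perfect_square(x):
    assert math.isqrt(x) ** 2 == x
```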

data_too_large fires when too many generated examples are aborted for exceeding the internal size limit, which almost always means an unbounded collection strategy: st.lists(st.text()) with no max_size regularly pushes examples up against that limit. Bounding both the outer and inner sizes fixes it and makes failures readable at the same time.

function_scoped_fixture is different in kind: it flags a function-scoped fixture that is reused across every generated example rather than re-created per example, so state leaks between examples of the same test. The fix is either to make the fixture session-scoped and stateless, or to move the setup inside the test body where each example gets its own.

import pytest
from hypothesis import given, settings, HealthCheck, strategies as st

class Client:                       # hypothetical stand-in for a stateful client
    def __init__(self):
        self.records = []

    def record(self, n):
        self.records.append(n)

@pytest.fixture
def client():                       # function-scoped: created ONCE for all examples
    return Client()

@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
@given(st.integers())
def test_wrong(client, n):          # suppressing hides real cross-example leakage
    client.record(n)                # client.records keeps growing across examples

@given(st.integers())
def test_right(n):
    client = Client()               # per-example construction: no shared state
    client.record(n)
    assert client.records == [n]

Suppressing a health check is legitimate exactly once: when you have measured the cost, understood why it is intrinsic, and cannot remove it. Suppress the single member rather than passing a list of everything, and leave a comment naming the measurement you accepted — suppress_health_check=[HealthCheck.too_slow] on a strategy that builds a real cryptographic key is reasonable, the same line on a strategy that filters 95% of its draws is a bug being silenced.
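A minimal sketch of the legitimate case: a strategy that derives a real key with PBKDF2 from Python's hashlib, suppressing only too_slow and saying why in a comment. The password, the iteration count, and the ONLY_TOO_SLOW name are illustrative assumptions.

```python
import hashlib

from hypothesis import HealthCheck, given, settings, strategies as st

def derive_key(salt: bytes) -> bytes:
    # Real key derivation (PBKDF2-HMAC-SHA256). The iteration count is
    # illustrative; the point is that the per-draw cost is intrinsic.
    return hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 50_000)

# Accepted measurement: too_slow. Each draw runs a real KDF, profiled and
# confirmed intrinsic. Every other health check remains active.
ONLY_TOO_SLOW = settings(suppress_health_check=[HealthCheck.too_slow], max_examples=10)

@ONLY_TOO_SLOW
@given(st.binary(min_size=16, max_size=16).map(derive_key))
def test_derived_key_length(key):
    assert len(key) == 32  # SHA-256 digest size
```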

Each check names a measurement, and each measurement has a source-level fix; suppression is the last resort, one member at a time.

Frequently Asked Questions

Should I just set deadline=None to silence Hypothesis health checks? No. deadline=None only disables the per-example timing deadline on the test body; it does not suppress the too_slow health check, which measures time spent generating data. Fix the strategy first; if its cost is intrinsic, suppress only HealthCheck.too_slow, and reserve deadline=None for test bodies that are legitimately slow.

Why does Hypothesis warn about function-scoped fixtures with @given? A function-scoped pytest fixture is set up once for the test function, but @given runs the body many times, so the fixture is not reset between examples. Hypothesis raises HealthCheck.function_scoped_fixture because shared mutable state across examples causes order-dependent flakiness.

How do I suppress a single Hypothesis health check without disabling the rest? Pass a list of specific HealthCheck members to suppress_health_check, e.g. @settings(suppress_health_check=[HealthCheck.too_slow]). Every other health check stays active.

Work the steps of the Solution in order; a suppression added before the source-level fix (step three) is a silenced measurement rather than a fix.

Why does the same test pass locally and fail the health check in CI? Because both too_slow and deadline are wall-clock measurements, and CI runners are slower and noisier than a developer laptop. Confirm by re-running locally with the CI profile and a fixed seed before changing anything; if the cost is genuinely intrinsic, adjust that one test (raise its deadline for a slow body, or suppress only too_slow for slow generation) rather than suppressing the check globally, so the measurement still applies everywhere else.
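A minimal sketch of that workflow, assuming profiles registered in conftest.py. The profile values and the HYPOTHESIS_PROFILE environment variable name are illustrative choices for this sketch; from the command line, pytest --hypothesis-profile=ci --hypothesis-seed=12345 selects a profile and seed in the same way.

```python
import os
from datetime import timedelta

from hypothesis import given, seed, settings, strategies as st

# Register profiles once, typically in conftest.py. The values are
# illustrative, not recommendations.
settings.register_profile("dev", max_examples=50)
settings.register_profile("ci", max_examples=200)

# HYPOTHESIS_PROFILE is a variable name chosen for this sketch.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))

# Reproduce a CI failure locally: a fixed seed, and the deadline raised for
# this one intrinsically slow test instead of a suite-wide suppression.
@seed(12345)
@settings(deadline=timedelta(seconds=1))
@given(st.integers())
def test_intrinsically_slow(n):
    assert isinstance(n, int)
```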

Reducing Hypothesis test execution time — profile and cache generation before you reach for too_slow suppression.
Composing strategies with flatmap and @composite — the durable cure for filter_too_much.
Generating custom strategies with hypothesis.strategies — cap size and depth at the source to avoid data_too_large.
Stateful and model-based testing — where too_slow on rule-based machines actually comes from.
pytest-asyncio vs anyio: scoping trade-offs — untangles the async side of function_scoped_fixture.
