Fixing Hypothesis FailedHealthCheck Errors

Your Hypothesis test fails not with a falsifying example but with hypothesis.errors.FailedHealthCheck, naming too_slow, filter_too_much, data_too_large, or function_scoped_fixture. These are not bugs in your code under test; they are guardrails telling you the test harness is degenerate: generation is too slow, too many examples are being thrown away, generated data is enormous, or a fixture is silently shared across examples. This guide explains each check and its correct fix, as opposed to the blunt suppress-everything anti-pattern.

Prerequisites

  • hypothesis >= 6.0 (the HealthCheck enum and suppress_health_check settings argument are stable across 6.x; function_scoped_fixture was added in 5.x)
  • pytest >= 7.0, Python 3.9+
  • Familiarity with @given, @settings, assume, and strategy filtering from Hypothesis Framework Fundamentals.

Solution

Each health check has a distinct trigger and a distinct correct fix. Import the enum from hypothesis:

Python
from hypothesis import given, settings, HealthCheck, assume
import hypothesis.strategies as st

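too_slow — Hypothesis aborts because generating examples took too long. The check times data generation (the strategy draws), not the test body; time spent in the body is governed by deadline instead. The fix is rarely suppression; usually the strategy is doing expensive work per example, or a deadline problem is being mistaken for a health check problem.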

Python
import time

def expensive_build(n):
    time.sleep(0.05)  # stand-in for costly per-example construction
    return n

# WRONG: the strategy does avoidable work on every draw; suppression masks it
@settings(suppress_health_check=[HealthCheck.too_slow])
@given(st.integers().map(expensive_build))
def test_blunt(n):
    assert n == n

# RIGHT: if generation is legitimately slow (each input needs real I/O),
# suppress only too_slow, having confirmed the cost is intrinsic.
# Add deadline=None only when the test body itself is slow as well.
@settings(deadline=None, suppress_health_check=[HealthCheck.too_slow])
@given(st.integers().map(expensive_build))
def test_legit_slow_io(n):
    time.sleep(0.05)   # e.g. an unavoidable network round-trip in the body
    assert n == n

Note the distinction: deadline=None disables the per-example timing assertion, while suppress_health_check=[HealthCheck.too_slow] disables the aggregate generation-time guard. They are separate; slow tests often need both, but only after confirming the cost is real.
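
When too_slow points at avoidable generation cost, one option is to pay that cost once rather than on every draw. The sketch below assumes a hypothetical load_catalog() helper standing in for expensive setup; it is cached and evaluated when the strategy is built, so each example only samples from the result.

```python
import functools

from hypothesis import given, strategies as st


@functools.lru_cache(maxsize=None)
def load_catalog():
    # Hypothetical stand-in for expensive setup (parsing a file, querying a service).
    return tuple(range(100))


# The catalog is built once, when the decorator is evaluated; drawing an
# example is then a cheap pick from an in-memory tuple.
@given(st.sampled_from(load_catalog()))
def test_catalog_entry(item):
    assert item in load_catalog()  # served from the cache, not rebuilt
```

Whether this applies depends on the inputs: it only helps when the expensive part does not vary per example.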

filter_too_much — Hypothesis discarded too many generated examples because a .filter() predicate or assume() rejects most candidates. Suppressing it does not help; the generator is starving. Constrain the strategy at its source instead.

Python
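# WRONG: the filter rejects almost every integer, starving the generator
# (newer Hypothesis releases may rewrite simple bounds like this one, but do not rely on it)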
@given(st.integers().filter(lambda x: 1000 <= x <= 1005))
def test_starved(x):
    assert 1000 <= x <= 1005

# RIGHT: generate inside the constraint so nothing is discarded
@given(st.integers(min_value=1000, max_value=1005))
def test_bounded(x):
    assert 1000 <= x <= 1005
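
Not every constraint can be expressed as min_value/max_value. A minimal sketch of two other ways to generate inside the constraint, assuming illustrative names (evens, ordered_pairs) that are not part of Hypothesis: transform values with .map() instead of rejecting them, and use st.composite when one drawn value bounds the next.

```python
from hypothesis import given, strategies as st

# Every integer maps to an even integer, so nothing is discarded
# (a .filter(lambda n: n % 2 == 0) would throw away half the draws).
evens = st.integers().map(lambda n: n * 2)


@st.composite
def ordered_pairs(draw):
    # Draw lo first, then draw hi from a range that already satisfies lo <= hi,
    # instead of drawing both freely and calling assume(lo <= hi).
    lo = draw(st.integers())
    hi = draw(st.integers(min_value=lo))
    return lo, hi


@given(evens)
def test_even(n):
    assert n % 2 == 0


@given(ordered_pairs())
def test_ordered_pair(pair):
    lo, hi = pair
    assert lo <= hi
```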

data_too_large — generated examples exceed Hypothesis's internal byte budget, typically from unbounded collections or recursion. Bound the sizes.

Python
# WRONG: unbounded nested data can exceed the buffer
@given(st.lists(st.lists(st.integers())))
def test_huge(data):
    assert isinstance(data, list)

# RIGHT: cap element and collection sizes
@given(st.lists(st.lists(st.integers(), max_size=20), max_size=20))
def test_bounded_size(data):
    assert isinstance(data, list)
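
For the recursion case, st.recursive accepts a max_leaves argument that limits how many values are drawn from the base strategy in one example. The following is a sketch under that assumption; the trees strategy and the count_leaves helper are illustrative names, not library API.

```python
from hypothesis import given, strategies as st

# Leaves are ints or None; each extension wraps children in a short list.
# max_leaves limits how many base values a single example may draw.
trees = st.recursive(
    st.none() | st.integers(),
    lambda children: st.lists(children, max_size=5),
    max_leaves=20,
)


def count_leaves(value):
    # Count the non-list values in a nested list structure.
    if isinstance(value, list):
        return sum(count_leaves(item) for item in value)
    return 1


@given(trees)
def test_tree_is_bounded(tree):
    assert count_leaves(tree) <= 20
```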

function_scoped_fixture — a function-scoped pytest fixture is requested by a @given test. The fixture is set up once for the whole function, but the body runs once per example, so the fixture is not reset between examples. This is a common cause of "Hypothesis test passes alone, fails in a suite" flakiness.

Python
import pytest

# WRONG: function-scoped, mutable, shared across all examples
@pytest.fixture
def buffer():
    return []  # one list reused for every generated example

@given(st.integers())
def test_leaky(buffer, n):     # fails with FailedHealthCheck (function_scoped_fixture)
    buffer.append(n)
    assert len(buffer) == 1    # with the check suppressed, fails on the second example

# RIGHT (option A): build the per-example resource inside the test body
@given(st.integers())
def test_local_state(n):
    buffer = []                # fresh each example
    buffer.append(n)
    assert len(buffer) == 1

# RIGHT (option B): if the fixture is genuinely set-up-once and read-only
# (e.g. a connection pool), keep it and suppress only this check.
@pytest.fixture
def readonly_engine():
    return {"url": "sqlite://"}  # never mutated by the test

@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
@given(st.integers())
def test_readonly(readonly_engine, n):
    assert readonly_engine["url"].startswith("sqlite")
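
Option A extends to resources that need teardown as well as setup. One possible shape, assuming a hypothetical fresh_buffer() helper, is a context manager entered inside the test body, so setup and cleanup run once per example rather than once per test function.

```python
import contextlib

from hypothesis import given, strategies as st


@contextlib.contextmanager
def fresh_buffer():
    # Hypothetical per-example resource: created on entry, cleaned up on exit.
    buffer = []
    try:
        yield buffer
    finally:
        buffer.clear()


@given(st.integers())
def test_per_example_resource(n):
    # Entered inside the body, so every generated example gets its own buffer.
    with fresh_buffer() as buffer:
        buffer.append(n)
        assert len(buffer) == 1
```

Because no pytest fixture is involved, the function_scoped_fixture check has nothing to flag.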

Why this works

Health checks measure properties of the test harness, not the system under test, so the durable fix changes the harness — bound the strategy, generate within constraints, or stop sharing mutable state — rather than silencing the signal. suppress_health_check takes a list of specific HealthCheck members precisely so you can disable one diagnostic you have understood while leaving the rest active. Suppressing function_scoped_fixture is only safe when the fixture is set up once and never mutated, because Hypothesis cannot otherwise guarantee example independence.

Edge cases and failure modes

  • deadline=None does not silence too_slow. They are independent settings; a slow generation phase still trips too_slow even with the deadline disabled.
  • Suppressing filter_too_much hides a starving generator. The test will still run, but on a biased, tiny slice of the input space, weakening coverage silently.
  • Session/module-scoped fixtures do not raise function_scoped_fixture — only function scope does. Promoting a read-only fixture's scope is a legitimate fix.
  • Stateful machines have their own timing. too_slow on a RuleBasedStateMachine usually means real I/O per rule; lower stateful_step_count rather than suppressing blindly. See Stateful and Model-Based Testing.
  • CI-only failures. Slower runners trip too_slow even when local runs pass; profile and cache before suppressing, per reducing Hypothesis test execution time.
  • Async tests. Function-scoped async fixtures combine both this check and event-loop scoping concerns; the trade-offs are covered in pytest-asyncio vs anyio: Scoping Trade-offs.
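
For the CI-only case above, if profiling shows the extra cost really is intrinsic to the runner, a settings profile keeps the suppression out of local runs. This is a sketch, not a recommendation to suppress by default; the profile names and the HYPOTHESIS_PROFILE environment variable are assumptions chosen for illustration.

```python
import os

from hypothesis import HealthCheck, settings

# "ci" relaxes only the two timing-related guards; every other health check stays on.
settings.register_profile(
    "ci",
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow],
)
# "dev" keeps Hypothesis defaults so local runs still surface slow generation.
settings.register_profile("dev")

# HYPOTHESIS_PROFILE is an assumed variable name; the CI job would export it as "ci".
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))
```

Placed in conftest.py, this applies to every test in the suite without touching individual @settings decorators.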

Frequently Asked Questions

Should I just set deadline=None to silence Hypothesis health checks?

No. deadline=None only disables the per-example timing deadline; it does not suppress the too_slow health check, which measures aggregate generation time. Use suppress_health_check for the specific check and reserve deadline=None for operations that are legitimately slow.

Why does Hypothesis warn about function-scoped fixtures with @given?

A function-scoped pytest fixture is set up once for the test function, but @given runs the body many times, so the fixture is not reset between examples. Hypothesis raises HealthCheck.function_scoped_fixture because shared mutable state across examples causes order-dependent flakiness.

How do I suppress a single Hypothesis health check without disabling the rest?

Pass a list of specific HealthCheck members to suppress_health_check, e.g. @settings(suppress_health_check=[HealthCheck.too_slow]). Every other health check stays active.
