Pytest & CI

Optimizing Test Discovery

Introduction: The Hidden Cost of Test Collection

In enterprise Python codebases, test execution latency is rarely the primary bottleneck. The true performance sink lies in the collection phase: the period between invoking pytest and the actual execution of the first test case. Optimizing test discovery is foundational to Advanced Pytest Architecture & Configuration, as it dictates how efficiently your CI/CD pipelines scale, how reliably your developer feedback loops operate, and how predictably your test infrastructure consumes compute resources.

Collection latency manifests when pytest traverses the filesystem, evaluates directory structures, parses Python modules, resolves fixture dependencies, and expands parametrized matrices. In monorepos exceeding 10,000 Python files, default discovery routines can consume 30 to 90 seconds before a single assertion runs. This overhead multiplies across parallel CI runners, pull request checks, and nightly regression suites. The distinction between discovery and execution latency is critical: execution scales roughly linearly with test count and complexity, while discovery scales with repository topology, import graph density, and configuration overhead.

Baseline metrics for enterprise environments typically reveal that 15–40% of total pipeline runtime is spent in the collection phase. This is unacceptable for high-velocity teams targeting sub-5-minute feedback loops. Optimizing this phase requires shifting from reactive execution tuning to proactive architectural intervention. By intercepting pytest's hookspec lifecycle, replacing runtime imports with static analysis, and strategically pruning the collection graph, engineering teams can reduce startup overhead by 60–80%. The following sections deconstruct pytest's internal discovery pipeline, provide production-ready interception patterns, and outline profiling methodologies to systematically eliminate collection bottlenecks.

Deconstructing Pytest's Discovery Pipeline

Pytest's collection phase operates as a deterministic, hook-driven state machine. Understanding its execution order is mandatory for targeted optimization. The pipeline progresses through five distinct phases:

  1. Filesystem Traversal & Pattern Matching: Pytest recursively walks the target directory tree, applying glob-style filters defined by python_files, python_classes, and python_functions (configurable in pytest.ini, as shown after this list). This phase is I/O bound and highly sensitive to directory depth.
  2. AST Parsing & Module Importation: For each matched file, pytest attempts to import the module. This triggers __init__.py execution, top-level imports, and module-level code evaluation.
  3. Class & Function Collection (pytest_pycollect_makeitem): Once imported, pytest inspects the module's namespace, identifying callable objects matching the configured naming patterns.
  4. Fixture Dependency Graph Resolution: Discovered items are analyzed for @pytest.fixture dependencies. Pytest constructs a directed acyclic graph (DAG) to determine setup/teardown ordering and scope boundaries.
  5. Parametrization Expansion & ID Generation: Parametrized tests are expanded into discrete test nodes, generating unique identifiers and computing Cartesian products for multi-dimensional parameter sets.
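
Tightening these matching patterns narrows phase 1 before any parsing occurs. A minimal pytest.ini sketch, using deliberately tightened versions of pytest's defaults:

INI
[pytest]
# Narrower patterns mean fewer files reach AST parsing and import
python_files = test_*.py
python_classes = Test*
python_functions = test_*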

The default collection strategy relies heavily on runtime imports. When pytest encounters a test file, it executes importlib.import_module(), which evaluates every top-level statement in the file and its transitive dependencies. In large repositories, this triggers massive import chains, database connection initializations, environment variable validations, and network calls—all during collection.

To mitigate this, engineers must intercept the pipeline before module execution occurs. The pytest_collect_file hook lets plugins inspect file paths and influence whether collection is delegated to the default pytest.Module collector. By combining this with pytest_ignore_collect, teams can bypass entire directory trees or apply static analysis to determine test presence without triggering imports. Profiling this phase requires isolating collection from execution:

Bash
# Profile collection phase overhead
python -m cProfile -o collection.prof -m pytest --collect-only -q

# Analyze import chain bloat
py-spy record --subprocesses -o profile.svg -- pytest --collect-only

Monitoring sys.modules growth during --collect-only reveals unnecessary imports. If sys.modules contains 500+ entries before any test runs, the collection pipeline is executing heavy initialization logic that should be deferred to fixture scopes.
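
As a minimal sketch, a conftest.py hook can log the module count once collection completes (pytest_collection_finish is a standard pytest hook; the 500-entry threshold is the illustrative baseline from above):

Python
import sys

def pytest_collection_finish(session):
    """Runs once after collection completes, before any test executes."""
    count = len(sys.modules)
    print(f"[collection] sys.modules entries: {count}")
    # A large count here indicates heavy import-time initialization
    # that belongs in session- or module-scoped fixtures instead.
    if count > 500:
        print("[collection] warning: possible import-time initialization bloat")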

Static Analysis & Bypassing Heavy Imports

Runtime imports during collection are the primary source of latency and memory bloat. By leveraging Python's built-in ast module, engineers can parse source files into abstract syntax trees and identify test functions or classes without executing module-level code. This technique, known as static collection interception, eliminates import-time side effects and drastically reduces memory footprint.

The ast module provides a safe, zero-execution parsing mechanism. By walking the syntax tree and inspecting ast.FunctionDef and ast.ClassDef nodes, we can determine whether a file contains tests matching pytest's naming conventions. This approach integrates seamlessly with Mastering Pytest Fixtures by preventing premature fixture graph resolution: when pytest imports a module, it immediately evaluates fixture decorators and registers them in the internal registry, whereas static parsing avoids that registration entirely for files that define no tests, pruning the fixture graph before it is built.

Below is an AST-based static pre-filter, implemented as a hookwrapper around pytest_collect_file:

Python
import ast
from pathlib import Path

import pytest


def _has_test_definitions(file_path: Path) -> bool:
    """Statically check for test functions or classes without importing the module."""
    try:
        tree = ast.parse(file_path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        # Fall back to default collection for files we cannot parse
        return True
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name.startswith(("test_", "Test"))
        for node in ast.walk(tree)
    )


@pytest.hookimpl(hookwrapper=True)
def pytest_collect_file(file_path: Path, parent):
    """Discard collectors for .py files that define no tests, so those
    modules are never imported during collection."""
    outcome = yield
    if file_path.suffix == ".py" and not _has_test_definitions(file_path):
        # Drop whatever the default Module collector produced; the file is
        # skipped and its top-level code never executes.
        outcome.force_result([])

This implementation registers as a hookwrapper, so it runs around pytest's default file collection logic. The default collector still constructs a Module node (which is cheap; the import only happens later, during Module.collect()), but when static parsing finds no test definitions, the wrapper discards that result, so the module is never imported and none of its top-level code executes. A plain tryfirst hook is not sufficient here: pytest_collect_file is not a first-result hook, so returning a collector alongside the default one yields duplicate nodes, while returning None merely defers to default collection.

Architectural Trade-offs:

  • Pros: Eliminates import overhead for test-free files, avoids module-level side effects, reduces sys.modules pollution, scales with file count rather than dependency graph complexity.
  • Cons: Cannot detect dynamically generated tests (e.g., globals()["test_" + name] = func), requires AST parsing overhead (negligible compared to imports), may miss tests defined via metaclasses or decorators that modify names at runtime.

For maximum effectiveness, combine static parsing with fixture scope optimization. Session-scoped fixtures should avoid heavy initialization during collection. Instead, defer resource allocation to the setup phase using pytest.fixture(scope="session", autouse=False) with explicit dependency injection.
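
A minimal sketch of this deferral pattern, assuming a hypothetical heavy database dependency (the sqlalchemy import and connection URL are illustrative):

Python
import pytest

@pytest.fixture(scope="session")
def db_engine():
    # Nothing here runs during collection; the heavy import and the
    # connection happen only when the first test requests this fixture.
    import sqlalchemy  # hypothetical heavy dependency

    engine = sqlalchemy.create_engine("sqlite://")
    yield engine
    engine.dispose()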

Custom Collection Hooks & Plugin Architecture

Pytest's extensibility model allows engineers to override collection behavior through custom hooks and plugin registration. The pytest_ignore_collect and pytest_collect_file hooks form the backbone of selective collection strategies, enabling teams to bypass vendor directories, generated code, or legacy test suites without modifying pytest.ini configuration files.

When implementing custom collection logic, hook priority management is critical. Pluggy calls hook implementations in LIFO registration order, with tryfirst=True implementations executing before the rest and trylast=True implementations executing after. Misconfigured priorities can cause duplicate collection, infinite recursion, or silent test omissions. Proper plugin registration via entry_points ensures deterministic hook execution across environments.

The following pattern demonstrates selective collection using directory markers and pytest_ignore_collect:

Python
from pathlib import Path

import pytest


@pytest.hookimpl
def pytest_ignore_collect(collection_path: Path, config):
    """Bypass vendor and generated directories unless explicitly opted in."""
    parts = collection_path.parts

    # Skip vendored and generated code by default
    if "vendor" in parts or "generated" in parts:
        # A sentinel file opts a directory back in to collection
        base = collection_path if collection_path.is_dir() else collection_path.parent
        marker_file = base / ".pytest_collect"
        return not marker_file.exists()

    # Defer to other pytest_ignore_collect hooks and default behavior
    return None

This implementation leverages a sentinel file (.pytest_collect) to opt specific subdirectories back in within excluded paths. This is particularly valuable in polyglot monorepos where Python tests coexist with Go, Rust, or JavaScript tooling. The hook returns True to ignore the path and None to defer to other hooks; because pytest_ignore_collect is a first-result hook, returning False forces collection and short-circuits every other implementation, so reserve it for deliberate overrides.

For non-standard naming schemes (e.g., *_spec.py), engineers can register custom collectors by subclassing pytest.File, or pytest.Module for ordinary Python sources, and overriding collect(); a short sketch follows. Reference Building Custom Pytest Plugins for comprehensive guidance on entry_points configuration, pyproject.toml plugin declarations, and hookspec lifecycle management.
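
As a minimal sketch, a conftest.py can route an assumed *_spec.py naming scheme to a Module subclass (the suffix and class name are illustrative; a dotted suffix like .spec.py is avoided because it yields invalid module names under the default import mode):

Python
from pathlib import Path

import pytest


class SpecFile(pytest.Module):
    """Collects *_spec.py files with standard Python test semantics."""


def pytest_collect_file(file_path: Path, parent):
    # The default collector ignores *_spec.py (it fails the python_files
    # patterns), so returning a collector here creates no duplicate nodes.
    if file_path.name.endswith("_spec.py"):
        return SpecFile.from_parent(parent, path=file_path)
    return None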

Plugin Architecture Best Practices:

  • Always return None from pytest_collect_file when a file does not match your criteria. Returning False or raising exceptions breaks pytest's collection chain.
  • Use config.getini("python_files") to respect user-defined patterns in custom collectors.
  • Cache AST parse results using functools.lru_cache or pytest's config.cache API to avoid redundant parsing across test runs (see the caching sketch after this list).
  • Validate hook implementations with pytest --traceconfig to verify registration order and priority resolution.
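
A minimal caching sketch using functools.lru_cache, folding the file's mtime into the cache key so edits invalidate stale entries (the helper names are illustrative):

Python
import ast
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=4096)
def _parse_for_tests(path_str: str, mtime_ns: int) -> bool:
    """Cached AST check; mtime_ns in the key invalidates on file change."""
    tree = ast.parse(Path(path_str).read_text(encoding="utf-8"))
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name.startswith(("test_", "Test"))
        for node in ast.walk(tree)
    )


def has_tests(path: Path) -> bool:
    return _parse_for_tests(str(path), path.stat().st_mtime_ns)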

Parallelization Strategies: Discovery vs Execution

A pervasive misconception in pytest optimization is the assumption that parallel execution frameworks parallelize test discovery. They do not. pytest-xdist and pytest-parallel distribute test execution across worker processes, but discovery is never divided among them. This architectural constraint means that optimizing discovery is orthogonal to parallelizing execution.

When workers are spawned, each process independently imports modules, resolves fixtures, and repeats the full collection pass. If collection is inefficient, every worker inherits the same startup penalty, multiplying the wasted time by the number of workers. The controller must then wait for all workers to finish collecting, and verify that their collections agree, before distributing test IDs, creating a hard serialization bottleneck.

Worker distribution modes interact differently with discovery overhead. The --dist=loadscope strategy groups tests by module (and by class for test methods) so that expensive scoped fixtures are set up once per group, but grouping can only happen after every worker finishes collecting. Conversely, --dist=loadfile distributes tests by file path, which keeps whole files on single workers but can increase fixture duplication across workers.

For large test matrices, reference the pytest-xdist vs pytest-parallel performance comparison to select optimal distribution modes. In practice:

  • Use --dist=loadscope when expensive module- or class-scoped fixtures dominate (e.g., database connections, Docker containers); session-scoped fixtures are instantiated once per worker regardless of mode.
  • Use --dist=loadfile when tests are I/O bound and fixture dependencies are minimal.
  • Use --dist=worksteal for highly heterogeneous test durations to prevent worker starvation (example invocations below).
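
Typical invocations for these modes look like the following (assuming pytest-xdist is installed; worksteal requires pytest-xdist 3.2+):

Bash
# One worker per CPU core, tests grouped by module/class
pytest -n auto --dist=loadscope

# Fixed worker count with work stealing for uneven test durations
pytest -n 8 --dist=worksteal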

Collection caching mitigates repeated discovery overhead. Pytest's .pytest_cache directory persists data between runs through the config.cache API; the built-in cache plugin uses it to store test node IDs for --lf and --ff. It does not skip collection of unchanged files on its own, but custom hooks (or plugins such as pytest-testmon) can store file hashes and prior parse results there to short-circuit repeated work. Cache invalidation must be carefully managed in CI environments to prevent stale collection graphs.

Network Serialization Overhead: When distributing tests across remote workers (e.g., pytest-xdist with --tx), test IDs and fixture parameters are serialized over sockets. Large parametrized matrices can generate megabytes of metadata, increasing network latency. Compressing test IDs or reducing parametrization dimensions during collection improves distribution throughput.

Mitigating Flakiness During Collection

Collection-phase flakiness manifests when environment-dependent code executes during module import. Common culprits include missing environment variables, network calls in conftest.py, database migrations triggered at import time, or platform-specific library loading. Because discovery runs before test execution, these failures appear as collection errors rather than test failures, making them notoriously difficult to debug.

Import-time side effects violate pytest's separation of concerns. Collection should be deterministic, stateless, and environment-agnostic. Heavy setup must be deferred to fixtures with explicit scopes. When conftest.py contains top-level code that initializes connections or validates configurations, it executes during every collection cycle, regardless of whether the tests in that directory are actually scheduled.

To isolate collection from execution state (a short sketch follows the list):

  1. Wrap initialization logic in @pytest.fixture(scope="session") with autouse=False.
  2. Use pytest.importorskip() to gracefully handle missing optional dependencies during collection.
  3. Validate environment variables using os.environ.get() with explicit defaults rather than raising KeyError at module level.
  4. Defer network calls to setup_module or fixture yield blocks.
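
A minimal conftest.py sketch combining points 1 and 3 above (the API_URL variable is illustrative):

Python
import os

import pytest


# Anti-pattern: API_URL = os.environ["API_URL"] at module level raises
# KeyError during collection. Validate lazily inside a fixture instead:
@pytest.fixture(scope="session")
def api_url():
    url = os.environ.get("API_URL")
    if url is None:
        pytest.skip("API_URL not configured in this environment")
    return url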

When collection flakiness persists, post-collection retry strategies can mask transient failures. Reference Debugging flaky tests with pytest-rerunfailures for implementing robust retry mechanisms. However, note that pytest-rerunfailures operates at the execution phase. Collection errors cannot be retried without modifying the collection pipeline itself.

State Leakage Prevention:

  • Avoid global mutable state in test modules. Use pytest.mark.parametrize with explicit data rather than module-level dictionaries.
  • Remove entries from sys.modules and call importlib.invalidate_caches() when implementing hot-reload collection; invalidate_caches() alone resets finder caches but does not purge already-imported modules.
  • Use pytest.warns() and pytest.raises() at the test level, not during collection.

Large Repository Optimization & Startup Profiling

Scaling pytest to monorepo environments requires systematic profiling and configuration tuning. The default recursive traversal algorithm performs poorly when directory depth exceeds 15 levels or when norecursedirs is misconfigured. Tactical optimization begins with --collect-only profiling to establish baseline metrics.

Bash
# Measure wall-clock collection time with quiet output
time pytest --collect-only -q --no-header

# Profile hook execution durations
pytest --collect-only --durations=0

The --durations=0 flag reports the time spent in each collection hook, revealing bottlenecks in pytest_collect_file or pytest_pycollect_makeitem. If collection exceeds 10 seconds, implement directory exclusion patterns in pytest.ini:

INI
[pytest]
norecursedirs = 
 .git
 .tox
 venv
 node_modules
 vendor
 *.egg-info
 __pycache__
 migrations
 generated

Leverage .pytest_cache to persist collection metadata across CI runs. Configure your CI pipeline to cache the .pytest_cache directory between stages:

YAML
# GitHub Actions example
- name: Cache pytest collection
  uses: actions/cache@v3
  with:
    path: .pytest_cache
    key: pytest-cache-${{ runner.os }}-${{ hashFiles('**/pytest.ini', '**/conftest.py') }}

Import chain optimization requires auditing sys.modules from inside the pytest process. Log len(sys.modules) once collection completes (see the pytest_collection_finish sketch earlier) and compare it against a bare interpreter baseline (python -c "import sys; print(len(sys.modules))") to quantify import pollution. Reduce transitive dependencies by:

  • Replacing from package import * with explicit imports.
  • Using importlib.util.find_spec() to conditionally load heavy modules.
  • Implementing lazy imports via a module-level __getattr__ in __init__.py (PEP 562), as sketched below.
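
A minimal PEP 562 sketch for a package __init__.py (the heavy submodule name is hypothetical):

Python
# package/__init__.py
import importlib

_LAZY_SUBMODULES = {"heavy": ".heavy_module"}  # hypothetical heavy submodule


def __getattr__(name):
    # Resolve heavy submodules on first attribute access only, keeping
    # the package import cheap during test collection.
    if name in _LAZY_SUBMODULES:
        module = importlib.import_module(_LAZY_SUBMODULES[name], __name__)
        globals()[name] = module  # cache so __getattr__ is not hit again
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")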

For comprehensive startup reduction strategies, consult Optimizing pytest startup time in large repos. Key takeaways include configuring PYTHONDONTWRITEBYTECODE=1 in ephemeral CI runners to reduce disk I/O, using site.addsitedir() to register critical paths up front, and applying PYTHONOPTIMIZE with care: level 1 strips assert statements and level 2 additionally strips docstrings, so it must never reach test code itself, since pytest's assertion handling depends on asserts.

Common Pitfalls in Test Collection

  1. Returning the wrong value from pytest_collect_file: Returning False or any other non-collector value breaks pytest's collection chain, and returning a collector for files the default collector also handles produces duplicate nodes. Return None for files your hook does not own, and discard the default result (e.g., via a hookwrapper, as shown earlier) when replacing collection for .py files.
  2. Placing heavy initialization in conftest.py module scope: Database connections, HTTP clients, or file I/O at the top level execute during discovery. This triggers unnecessary setup/teardown cycles and inflates memory usage. Defer to session-scoped fixtures.
  3. Misconfiguring norecursedirs in pytest.ini: Incorrect glob patterns (matched with fnmatch against directory basenames) lead to silent test omissions, and overriding the option discards pytest's defaults. Validate exclusion rules with pytest --collect-only -q and verify node counts match expectations.
  4. Assuming pytest-xdist parallelizes discovery: Discovery is never divided among workers; each worker repeats the full collection pass, and execution only begins once collection completes everywhere. Optimize collection first, then scale execution.
  5. Using pytest.mark.parametrize with expensive generators: Argument values are evaluated during collection, not execution. Pass cheap identifiers and compute heavy payloads inside the test function (or use the third-party pytest-lazy-fixture plugin) to defer evaluation, as in the sketch below.
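
A minimal sketch of the deferred-parametrization pattern (build_case is a hypothetical expensive payload builder):

Python
import pytest


def build_case(case_id: int) -> dict:
    # Stand-in for an expensive builder; runs at execution time, not collection.
    return {"id": case_id}


# Anti-pattern: @pytest.mark.parametrize("case", expensive_generator())
# forces the generator to be consumed while tests are being collected.
@pytest.mark.parametrize("case_id", range(3))
def test_case(case_id):
    payload = build_case(case_id)
    assert payload["id"] == case_id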

Frequently Asked Questions

Does pytest-xdist parallelize test discovery? No. pytest-xdist parallelizes test execution across worker processes; discovery is not divided among them, and each worker repeats the full collection pass. To optimize discovery, use static AST parsing, collection caching, or custom pytest_ignore_collect hooks.

How can I profile pytest's collection phase? Run pytest --collect-only -q under Python's cProfile or py-spy, and use python -X importtime to attribute import costs. Monitor sys.modules growth to detect unnecessary imports. For granular hook timing, log timestamps from the pytest_collectstart and pytest_collectreport hooks, which fire around each collector, rather than from runtest-phase hooks.

Why are my tests failing during discovery but passing when run individually? This typically indicates import-time side effects in conftest.py or test modules. Discovery executes module-level code. Isolate heavy setup into fixtures with explicit scopes (session, module) to defer execution. Ensure environment variables and external dependencies are validated lazily rather than at import time.

Can I cache test discovery results across CI runs? Yes, with caveats. Pytest's .pytest_cache directory persists data between runs via the config.cache API. Ensure --cache-clear is not used unnecessarily, and configure CI to persist the cache directory between pipeline stages. Note that the built-in cache does not hash file contents, so custom collection hooks that cache AST or node data must handle their own invalidation (e.g., by keying entries on file mtimes or content hashes) to prevent stale node resolution.