
pytest-xdist vs pytest-parallel Performance

A suite that runs in eight minutes sequentially might still take six under pytest-parallel's thread mode but two under pytest-xdist -n auto, or the reverse, depending on whether the work is CPU-bound or I/O-bound. The choice between the two runners is not a benchmark contest but a match between workload profile and concurrency primitive: pytest-xdist spawns isolated interpreter processes over execnet, while pytest-parallel runs tests on threads, in multiprocessing worker processes, or both. This guide compares their execution models, fixture and pickling constraints, and the failure modes that only appear under concurrency.

Prerequisites

  • pytest >= 8.0, Python 3.9+.
  • pytest-xdist >= 3.0 (actively maintained). pytest-parallel has been effectively unmaintained since 2021 and does not officially support recent pytest releases — confirm it imports against your pinned pytest before relying on it.
  • For benchmarking: pytest-benchmark and memory_profiler (both third-party), plus cProfile from the standard library.

The collection caching that reduces per-worker startup cost is detailed in Optimizing Test Discovery; per-worker fixture instantiation builds on Managing Conftest Hierarchies.

Solution

Start from the workload profile, then pick the runner whose worker model fits.

Choosing a parallel runner

Dimension         pytest-xdist                 pytest-parallel
Worker model      processes via execnet        threads or processes
Isolation         full memory isolation        shared heap (threads)
Serialization     execnet custom layer         standard pickle
Memory / worker   high (full interpreter)      low (threads)
Best fit          CPU-bound, heavy fixtures    I/O-bound, light state
Maintenance       actively maintained          stale since 2021

pytest-xdist trades higher per-worker memory for full process isolation and robust serialization, making it the safer default for CPU-bound suites; pytest-parallel's thread mode is lighter for I/O-bound work but shares the heap and is no longer actively maintained.

pytest-xdist uses execnet to spawn isolated interpreters that exchange messages through execnet's own serialization layer, so each worker pays for full interpreter startup, conftest.py evaluation, and its own test collection, but gains full memory isolation. Module- and session-scoped fixtures are instantiated once per worker, not once per run. pytest-parallel starts multiprocessing worker processes (--workers) and runs tests on threads inside each worker (--tests-per-worker); thread mode shares the interpreter heap for near-zero startup cost but exposes you to GIL contention on CPU-bound work and race conditions on any shared global state.
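The difference between a shared heap and an isolated interpreter can be seen with the standard library alone. The sketch below uses neither plugin; STATE, bump, and the two run_* helpers are hypothetical names chosen for illustration. It only mimics what thread mode and process isolation do to module-level state.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for module-level state that a fixture might wrap.
STATE = {"hits": 0}
_LOCK = threading.Lock()


def bump():
    # Threads share STATE, so the read-modify-write is guarded by a lock.
    with _LOCK:
        STATE["hits"] += 1


def run_in_threads(n):
    """Run bump() on n threads and return the shared counter."""
    threads = [threading.Thread(target=bump) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return STATE["hits"]


def run_in_fresh_interpreter():
    """Run the same increment in a new interpreter, which starts from zero."""
    code = "STATE = {'hits': 0}; STATE['hits'] += 1; print(STATE['hits'])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

Every thread sees the same counter, while each new interpreter starts from its own copy. The same mechanism explains why a session-scoped fixture runs once per process-based worker rather than once per run.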

Run each with the appropriate flags and capture metrics:

Bash
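# pytest-xdist: auto-detect CPUs; loadscope keeps each module or class on one worker so its fixtures are reused.
# pytest-benchmark disables itself when xdist is active, so time the whole run instead.
time pytest -n auto --dist loadscope --durations=10

# pytest-parallel, process mode (CPU-bound work): one worker process per core.
time pytest --workers auto --durations=10

# pytest-parallel, thread mode (I/O-bound work): one process running many tests on threads.
time pytest --tests-per-worker auto --durations=10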

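To find serialization bottlenecks, profile the run and watch for multiprocessing.reduction (pytest-parallel) or execnet.gateway_base (pytest-xdist) dominating the call graph. Either one suggests that objects which are expensive or impossible to serialize are crossing the process boundary, or that heavy parametrization is inflating the number of messages.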
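One way to quantify "dominating the call graph" is to sum the self-time of every profiled function whose file or function name matches a module of interest. The following is a minimal sketch using only cProfile and pstats; time_share, profile_call, and serialize_many are hypothetical helpers, and the pickle workload merely stands in for a real test run.

```python
import cProfile
import pickle
import pstats


def time_share(stats, needle):
    """Return the fraction of total self-time spent in functions whose
    file name or function name contains `needle`."""
    total = stats.total_tt
    if total <= 0:
        return 0.0
    matched = 0.0
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, _lineno, funcname), entry in stats.stats.items():
        tottime = entry[2]
        if needle in filename or needle in funcname:
            matched += tottime
    return matched / total


def profile_call(func, *args, **kwargs):
    """Profile a single call and return its pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    return pstats.Stats(profiler)


def serialize_many(payload, repeats=200):
    # Hypothetical workload: repeatedly pickle a large object.
    for _ in range(repeats):
        pickle.dumps(payload)
```

For a real suite, the same time_share function could be applied to a saved profile, for example pstats.Stats("run.prof") after python -m cProfile -o run.prof -m pytest (run.prof is an assumed file name). Such a profile covers only the parent process, not the workers it spawns.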

Why this works

The two runners win in opposite regimes because the cost they avoid differs. pytest-xdist's process isolation removes the GIL ceiling and prevents cross-test contamination, so CPU-bound suites keep scaling as cores are added, and loadscope/loadfile distribution amortizes expensive fixture setup. pytest-parallel's thread mode skips interpreter duplication entirely, which can cut peak RSS by 60-80% and eliminates spawn latency. That latency dominates total time for short I/O-bound suites, where threads still overlap because the GIL is released during blocking calls. Picking the wrong runner means paying a penalty (interpreter startup, or GIL contention) that exceeds the parallelism gain.
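The claim that threads overlap during blocking calls can be checked with a small simulation. This sketch uses a plain ThreadPoolExecutor and time.sleep as stand-ins; fake_io_test and run_threaded are hypothetical names, and nothing here is pytest-parallel's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_test(delay):
    # time.sleep releases the GIL, much like a blocking socket or disk read.
    time.sleep(delay)
    return delay


def run_threaded(n_tests, delay, workers):
    """Run n_tests simulated I/O-bound tests on a thread pool.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_io_test, [delay] * n_tests))
    elapsed = time.perf_counter() - start
    assert len(results) == n_tests
    return elapsed
```

With eight simulated tests of 0.1 s each on eight threads, the elapsed time should sit close to a single delay rather than the 0.8 s a sequential run would need. Replacing the sleep with a pure-Python CPU loop would remove most of that gain, which is the regime where process-based workers pay off.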

Edge cases and failure modes

  • Fixture scope leaks. Module/session fixtures instantiate per worker in both runners' process modes; a fixture wrapping a mutable singleton produces failures that vanish at -n 1. Diagnose with pytest --setup-show -n auto, then narrow scope or key resources by os.getpid().
  • Pickling errors in pytest-parallel. TypeError: cannot pickle 'function' object arises when closures or dynamically generated fixtures cross the multiprocessing boundary. Move closures to module level or switch to thread mode for I/O-bound work. pytest-xdist largely sidesteps the problem because each worker collects its own tests and only test IDs and reports travel over execnet, but file descriptors and C-extension objects still cannot be sent across any process boundary.
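  • Coverage fragmentation. Processes that all write to the same .coverage file overwrite one another. pytest-cov combines worker data automatically under pytest-xdist; for other process-based runners, set parallel = true under [run] in .coveragerc so each process writes its own .coverage.* file, then run coverage combine. Note that --cov-append only preserves data from earlier runs, and --cov-context=test attributes lines to tests, not to workers. See memory profiling with tracemalloc for tracking worker RSS growth.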
  • Hypothesis example DB desync. Independent workers build separate example databases, defeating shrinking. Point them at a shared DirectoryBasedExampleDatabase(".hypothesis/examples") — see Hypothesis Framework Fundamentals.
  • OS resource exhaustion. OSError: [Errno 24] Too many open files or OOM kills under high concurrency. Raise ulimit -n to 65536, set --max-worker-restart=3, and use connection pooling with bounded max_overflow.
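
Keying resources by worker, as suggested for fixture scope leaks above, might look like the following sketch. It relies on the PYTEST_XDIST_WORKER environment variable that pytest-xdist sets in each worker (values such as gw0) and falls back to os.getpid() elsewhere; worker_key and worker_scoped_name are hypothetical helper names.

```python
import os


def worker_key():
    """Return an identifier unique to the current test worker.

    pytest-xdist sets PYTEST_XDIST_WORKER (for example "gw0") in each
    worker process; outside xdist this falls back to the process ID.
    """
    return os.environ.get("PYTEST_XDIST_WORKER") or f"pid{os.getpid()}"


def worker_scoped_name(base):
    """Derive a per-worker resource name, e.g. for a database or temp dir."""
    return f"{base}_{worker_key()}"
```

A session-scoped fixture could call worker_scoped_name("testdb") to give each worker its own database instead of sharing one mutable resource. The process-ID fallback does not help in thread mode, where all tests share one PID.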

Frequently Asked Questions

Why does pytest-parallel fail with 'cannot pickle local object' while pytest-xdist works? pytest-parallel uses standard multiprocessing, whose pickle protocol cannot serialize lambdas, closures, or non-picklable C extensions. pytest-xdist workers collect tests and build fixtures themselves and exchange only simple messages through execnet's own serializer, so test objects never need to be pickled. Refactor closures into module-level functions, avoid dynamic fixtures, or use thread mode for I/O-bound work.
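The pickling limitation is easy to reproduce outside either plugin. In this sketch, make_scaler_closure, make_scaler_partial, and is_picklable are hypothetical helpers; the second factory shows one possible refactor, swapping the closure for functools.partial over a callable that is importable by name.

```python
import functools
import operator
import pickle


def make_scaler_closure(factor):
    # The nested function captures `factor`; pickle cannot serialize it.
    def scale(x):
        return x * factor
    return scale


def make_scaler_partial(factor):
    # operator.mul is importable by name, so a partial over it can be pickled.
    return functools.partial(operator.mul, factor)


def is_picklable(obj):
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

Calling is_picklable on the closure or on a lambda returns False, while the partial round-trips and still multiplies by the captured factor. The exact exception type raised for local objects varies between Python versions, which is why the helper catches several.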

Can pytest-xdist and pytest-parallel be combined for nested parallelism? No. Both override pytest_runtestloop and pytest_collection_modifyitems to control distribution, so nesting them causes hook recursion, worker deadlocks, and dropped tests. Pick one runner per suite based on whether the workload is CPU-bound or I/O-bound.

Which runner is faster, and should I still consider pytest-parallel? pytest-xdist wins for CPU-bound suites with heavy fixtures via loadscope/loadfile distribution; pytest-parallel's thread mode wins for lightweight I/O-bound tests by avoiding interpreter duplication. But pytest-parallel has been effectively unmaintained since 2021 and lacks support for recent pytest releases, so verify compatibility before adopting it.
