A suite that runs in eight minutes sequentially may still take six under pytest-parallel thread mode but only two under pytest-xdist -n auto, or the reverse, depending on whether the work is CPU-bound or I/O-bound. The choice between the two runners is not a benchmark contest but a match between workload profile and concurrency primitive: pytest-xdist spawns isolated interpreter processes over execnet, while pytest-parallel uses in-process threads or multiprocessing. This guide compares their execution models, fixture and pickling constraints, and the failure modes that only appear under concurrency.
Prerequisites
- pytest >= 8.0, Python 3.9+.
- pytest-xdist >= 3.0 (actively maintained).
- pytest-parallel has been effectively unmaintained since 2021 and does not officially support recent pytest releases; confirm it imports against your pinned pytest before relying on it.
- For benchmarking: pytest-benchmark, memory_profiler, and cProfile from the standard library.
The collection caching that reduces per-worker startup cost is detailed in Optimizing Test Discovery; per-worker fixture instantiation builds on Managing Conftest Hierarchies.
Solution
Start from the workload profile, then pick the runner whose worker model fits.
pytest-xdist uses execnet to spawn isolated interpreters that exchange serialized messages with the controller, so each worker pays for full interpreter startup, conftest.py evaluation, and its own collection pass but gains memory isolation from the other workers. Module- and session-scoped fixtures are instantiated once per worker, not once per run. pytest-parallel builds on the standard library's multiprocessing (--workers) and threading (--tests-per-worker) modules; thread mode shares the interpreter heap for near-zero startup cost but exposes you to GIL contention on CPU-bound work and race conditions on any shared global state.
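Because module- and session-scoped fixtures are built once per worker, any external resource they create needs a per-worker name. The following is a minimal sketch, assuming the resource can be keyed by a string suffix. The helper worker_scoped_name and the base name testdb are hypothetical; PYTEST_XDIST_WORKER is the environment variable pytest-xdist sets inside each worker process.

```python
import os


def worker_scoped_name(base, environ=None):
    """Return a resource name that is unique to the current worker process."""
    env = os.environ if environ is None else environ
    # pytest-xdist exports PYTEST_XDIST_WORKER (for example "gw0") in each worker.
    # For other runners, or a sequential run, fall back to the process ID.
    worker = env.get("PYTEST_XDIST_WORKER") or f"pid{os.getpid()}"
    return f"{base}_{worker}"


# Passing an explicit mapping makes the behaviour easy to see without xdist.
print(worker_scoped_name("testdb", {"PYTEST_XDIST_WORKER": "gw3"}))
```

A session-scoped fixture could call worker_scoped_name("testdb") to pick a database or temp-directory name, so that two workers never build the same resource.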
Run each with the appropriate flags and capture metrics:
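# pytest-xdist: auto-detect CPUs, group tests by module to reuse fixtures.
# pytest-benchmark disables itself when xdist is active, so --benchmark-only fails here; time the run instead.
time pytest -n auto --dist loadscope --durations=10
# pytest-parallel: --workers sets the process count, --tests-per-worker the threads per process.
time pytest --workers auto --tests-per-worker 1 --durations=10   # processes, for CPU-bound work
time pytest --workers 1 --tests-per-worker auto --durations=10   # threads, for I/O-bound work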
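To find serialization bottlenecks, profile a run and watch for multiprocessing.reduction (pytest-parallel) or execnet.gateway_base (pytest-xdist) dominating the call graph. Either one suggests that large or expensive-to-serialize objects are crossing the process boundary, or that heavy parametrization is inflating the number of messages.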
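One way to look for those modules in a profile is to filter cProfile output by name. This is a rough sketch, not a prescribed workflow: serialization_hotspots is a hypothetical helper, the default needles are guesses at useful substrings, and the json call merely stands in for real serialization work.

```python
import cProfile
import json
import pstats


def serialization_hotspots(func, needles=("multiprocessing", "execnet", "pickle")):
    """Profile func() and return (label, cumulative_seconds) for matching frames."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    hits = []
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive_calls, total_calls, total_time, cumulative_time, callers).
    for (filename, _lineno, funcname), entry in stats.stats.items():
        label = f"{filename}:{funcname}"
        if any(needle in label for needle in needles):
            hits.append((label, entry[3]))
    # Most expensive frames first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Stand-in workload: serialize a list and look for frames from the json module.
for label, seconds in serialization_hotspots(
    lambda: json.dumps(list(range(10_000))), needles=("json",)
)[:3]:
    print(f"{seconds:.6f}s  {label}")
```

If the matching frames account for a large share of cumulative time in a real run, the payload crossing the process boundary is worth shrinking before tuning worker counts.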
Why this works
The two runners win in opposite regimes because the cost they avoid differs. pytest-xdist's process isolation removes the GIL ceiling and prevents cross-test contamination, so CPU-bound suites keep scaling as cores are added, and loadscope/loadfile distribution amortizes expensive fixture setup by keeping tests that share a module or file on one worker. pytest-parallel's thread mode skips interpreter duplication entirely, which can cut peak RSS by 60-80% and eliminates spawn latency; that latency dominates total time for short I/O-bound suites, where the GIL is released during blocking calls. Picking the wrong one means paying a penalty (interpreter startup, or GIL contention) that can exceed the parallelism gain.
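A quick way to estimate which regime a test body falls into is to compare CPU time with wall-clock time. The sketch below is a heuristic only: cpu_fraction and suggest_worker_model are hypothetical helpers, and the 0.5 threshold is an arbitrary assumption that should be tuned against real measurements.

```python
import time


def cpu_fraction(func):
    """Run func() once and return CPU seconds divided by wall-clock seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def suggest_worker_model(fraction, threshold=0.5):
    """Map a CPU fraction to a worker model; the threshold is an assumption."""
    # A high fraction means the interpreter was busy (GIL-bound work);
    # a low one means most of the time was spent waiting.
    return "processes" if fraction >= threshold else "threads"


# A sleeping call spends almost no CPU, so it behaves like I/O-bound work.
print(suggest_worker_model(cpu_fraction(lambda: time.sleep(0.05))))
```

Measuring a representative handful of tests this way gives a first guess; the timing runs with each runner remain the deciding evidence.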
Edge cases and failure modes
- Fixture scope leaks. Module- and session-scoped fixtures instantiate once per worker process in both runners; a fixture wrapping a mutable singleton or an external resource produces failures that vanish at -n 1. Diagnose with pytest --setup-show -n auto, then narrow the scope or key resources by os.getpid() (under xdist, the worker_id fixture serves the same purpose).
- Pickling errors in pytest-parallel. Errors such as TypeError: cannot pickle 'function' object arise when closures or dynamically generated fixtures cross the multiprocessing boundary. Move closures to module level or switch to thread mode for I/O-bound work. pytest-xdist sidesteps most of this because each worker collects its own tests and only simple built-in types travel over execnet, but fixtures holding file descriptors or C-extension handles still cannot be shared between workers.
- Coverage fragmentation. Workers that write to the same .coverage file overwrite each other's data. pytest-cov detects xdist and combines worker data itself; when running coverage directly, set parallel = True and run coverage combine afterwards. --cov-append preserves data from earlier runs, and --cov-context=test records which test covered each line. See memory profiling with tracemalloc for tracking worker RSS growth.
- Hypothesis example DB desync. Workers that build separate example databases do not replay the failing examples found by other workers. Point them at a shared DirectoryBasedExampleDatabase(".hypothesis/examples"); see Hypothesis Framework Fundamentals.
- OS resource exhaustion. OSError: [Errno 24] Too many open files or OOM kills appear under high concurrency. Raise ulimit -n (for example to 65536), cap xdist worker restarts with --max-worker-restart=3, and use connection pooling with a bounded max_overflow.
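The pickling failure in the list above can be reproduced without either runner. This is a small sketch using only the standard library; is_picklable, make_adder_closure, and make_adder_partial are hypothetical names, and functools.partial over an importable function is one possible replacement for a closure, not the only one.

```python
import functools
import operator
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies by object and Python version.
        return False
    return True


def make_adder_closure(n):
    # The nested function only exists inside this call, so pickle cannot
    # look it up by name in another process.
    def add(x):
        return x + n
    return add


def make_adder_partial(n):
    # operator.add is importable by name, so the partial pickles cleanly.
    return functools.partial(operator.add, n)


print(is_picklable(make_adder_closure(1)), is_picklable(make_adder_partial(1)))
```

Running a check like this over parametrization values and fixture return values before switching to process mode surfaces the objects that need refactoring.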
Frequently Asked Questions
Why does pytest-parallel fail with 'cannot pickle local object' while pytest-xdist works?
pytest-parallel uses standard multiprocessing, which under the spawn start method must pickle what it hands to each worker process, and pickle cannot serialize lambdas, closures, or non-picklable C-extension objects. pytest-xdist never pickles test functions or fixtures: each worker collects the suite itself and exchanges only simple built-in types through execnet's own serializer. Refactor closures into module-level functions, avoid dynamic fixtures, or use thread mode for I/O-bound work.
Can pytest-xdist and pytest-parallel be combined for nested parallelism?
No. Both plugins take over the test run loop (pytest_runtestloop) to control how tests are distributed, so enabling them together produces conflicting run loops, with symptoms such as deadlocked workers or dropped tests. Pick one runner per suite based on whether the workload is CPU-bound or I/O-bound.
Which runner is faster, and should I still consider pytest-parallel?
pytest-xdist wins for CPU-bound suites with heavy fixtures via loadscope/loadfile distribution; pytest-parallel's thread mode wins for lightweight I/O-bound tests by avoiding interpreter duplication. But pytest-parallel has been effectively unmaintained since 2021 and lacks support for recent pytest releases, so verify compatibility before adopting it.