[{"data":1,"prerenderedAt":1195},["ShallowReactive",2],{"page-\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002F":3},{"id":4,"title":5,"body":6,"description":1156,"extension":1157,"meta":1158,"navigation":368,"path":1191,"seo":1192,"stem":1193,"__hash__":1194},"content\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Findex.md","CPU Profiling with cProfile and py-spy",{"type":7,"value":8,"toc":1141},"minimark",[9,32,37,120,124,142,173,328,332,337,347,422,439,477,484,523,535,539,549,602,624,628,638,655,662,709,712,716,730,748,754,788,805,809,815,819,822,881,885,1027,1031,1043,1054,1063,1089,1093,1130,1137],[10,11,12,13,17,18,21,22,24,25,28,29,31],"p",{},"A service that meets its latency target in local benchmarks but burns CPU under production load is the canonical reason to reach for a profiler, and the first decision is which kind. Python ships a deterministic profiler, ",[14,15,16],"code",{},"cProfile",", that records every call and return — exact, reproducible, and ideal when you can run the workload yourself. It cannot help with a wedged production worker you must not restart; for that you need a sampling profiler such as ",[14,19,20],{},"py-spy"," that attaches to a live PID from outside the interpreter. 
This guide treats both as one toolkit: capture with ",[14,23,16],{},", read the numbers correctly with ",[14,26,27],{},"pstats",", visualize the call graph, and sample running processes with ",[14,30,20],{}," when the deterministic path is closed to you.",[33,34,36],"h2",{"id":35},"prerequisites","Prerequisites",[38,39,40,62,86,96,106],"ul",{},[41,42,43,44,47,48,50,51,53,54,57,58,61],"li",{},"Python ",[14,45,46],{},"3.8+"," (",[14,49,16],{},", ",[14,52,27],{},", and ",[14,55,56],{},"cProfile.Profile"," are stdlib; ",[14,59,60],{},"runctx"," has existed since 2.x).",[41,63,64,47,67,70,71,74,75,50,78,81,82,85],{},[14,65,66],{},"py-spy >= 0.3.14",[14,68,69],{},"pip install py-spy","); ",[14,72,73],{},"0.3.x"," added ",[14,76,77],{},"dump",[14,79,80],{},"--native",", and reliable ",[14,83,84],{},"--pid"," attach.",[41,87,88,91,92,95],{},[14,89,90],{},"snakeviz >= 2.2"," for interactive visualization (",[14,93,94],{},"pip install snakeviz",").",[41,97,98,101,102,105],{},[14,99,100],{},"gprof2dot >= 2024.6.6"," plus Graphviz (",[14,103,104],{},"dot",") for static call graphs.",[41,107,108,109,111,112,115,116,119],{},"On Linux, attaching ",[14,110,20],{}," to another process requires either ",[14,113,114],{},"CAP_SYS_PTRACE",", running as root, or relaxing ",[14,117,118],{},"kernel.yama.ptrace_scope"," — covered under Troubleshooting.",[33,121,123],{"id":122},"core-concept","Core concept",[10,125,126,127,131,132,135,136,138,139,141],{},"A ",[128,129,130],"strong",{},"deterministic profiler"," hooks the interpreter's call and return events, so it knows exactly how many times every function ran and how long each invocation took. That precision costs overhead — typically 2–5x slowdown — and the overhead is heaviest on functions that make many small calls, which can distort the picture. A ",[128,133,134],{},"sampling profiler"," instead interrupts the program at a fixed frequency (py-spy defaults to 100 Hz) and records the current call stack. 
It never touches the code path, adds negligible overhead, and reflects where wall-clock time is actually spent, at the cost of missing functions that run between samples. The mental model: ",[14,137,16],{}," answers \"how many times and how long per call\"; ",[14,140,20],{}," answers \"where is wall time going right now\".",[10,143,144,146,147,47,150,153,154,47,157,160,161,163,164,166,167,172],{},[14,145,27],{}," reports two times per function that engineers routinely confuse. ",[128,148,149],{},"Total time",[14,151,152],{},"tottime",") is time spent in the function body itself, excluding subcalls. ",[128,155,156],{},"Cumulative time",[14,158,159],{},"cumtime",") includes everything the function transitively called. A dispatcher near the top of the stack has huge ",[14,162,159],{}," but tiny ",[14,165,152],{},"; a tight numeric loop is the opposite. The distinction is important enough to have its own deep dive on ",[168,169,171],"a",{"href":170},"\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Finterpreting-cprofile-cumulative-vs-total-time\u002F","interpreting cProfile cumulative vs total time",".",[174,175,178,324],"figure",{"className":176},[177],"diagram",[179,180,187,188,187,192,187,196,187,206,187,217,187,223,187,228,187,237,187,241,187,244,187,248,187,251,187,255,187,258,187,262,187,266,187,270,187,273,187,279,187,284,187,287,187,290,187,293,187,297,187,301,187,305,187,308,187,312,187,316,187,320],"svg",{"viewBox":181,"role":182,"ariaLabelledBy":183,"xmlns":186},"0 0 800 380","img",[184,185],"cprof-t","cprof-d","http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg","\n  ",[189,190,191],"title",{"id":184},"Deterministic versus sampling profiling",[193,194,195],"desc",{"id":185},"cProfile hooks every call and return for exact counts; py-spy samples the stack at a fixed rate from outside the 
process.",[197,198,205],"text",{"x":199,"y":200,"textAnchor":201,"fontSize":202,"fontWeight":203,"fill":204},"400","32","middle","19","700","#3d405b","Two ways to see CPU time",[207,208],"rect",{"x":209,"y":210,"width":211,"height":212,"rx":213,"fill":214,"stroke":215,"strokeWidth":216},"40","58","340","270","14","#fffdf8","#e07a5f","2",[197,218,222],{"x":219,"y":220,"textAnchor":201,"fontSize":221,"fontWeight":203,"fill":204},"210","86","15","cProfile (deterministic)",[197,224,227],{"x":219,"y":225,"textAnchor":201,"fontSize":226,"fill":204},"108","12","hooks every call + return",[207,229],{"x":230,"y":231,"width":232,"height":233,"rx":234,"fill":235,"stroke":204,"strokeWidth":236},"70","126","280","34","8","#f4f1de","1.5",[197,238,240],{"x":219,"y":239,"textAnchor":201,"fontSize":226,"fill":204},"148","call handler -- exact ncalls",[207,242],{"x":230,"y":243,"width":232,"height":233,"rx":234,"fill":235,"stroke":204,"strokeWidth":236},"170",[197,245,247],{"x":219,"y":246,"textAnchor":201,"fontSize":226,"fill":204},"192","tottime: in the body only",[207,249],{"x":230,"y":250,"width":232,"height":233,"rx":234,"fill":235,"stroke":204,"strokeWidth":236},"214",[197,252,254],{"x":219,"y":253,"textAnchor":201,"fontSize":226,"fill":204},"236","cumtime: body + all subcalls",[197,256,257],{"x":219,"y":232,"textAnchor":201,"fontSize":226,"fill":215},"precise, but 2-5x overhead",[197,259,261],{"x":219,"y":260,"textAnchor":201,"fontSize":226,"fill":204},"302","you run the code yourself",[207,263],{"x":264,"y":210,"width":211,"height":212,"rx":213,"fill":214,"stroke":265,"strokeWidth":216},"420","#81b29a",[197,267,269],{"x":268,"y":220,"textAnchor":201,"fontSize":221,"fontWeight":203,"fill":204},"590","py-spy (sampling)",[197,271,272],{"x":268,"y":225,"textAnchor":201,"fontSize":226,"fill":204},"reads stacks from 
outside",[274,275],"line",{"x1":276,"y1":277,"x2":278,"y2":277,"stroke":265,"strokeWidth":236},"460","130","720",[280,281],"circle",{"cx":282,"cy":277,"r":283,"fill":265},"490","5",[280,285],{"cx":286,"cy":277,"r":283,"fill":265},"550",[280,288],{"cx":289,"cy":277,"r":283,"fill":265},"610",[280,291],{"cx":292,"cy":277,"r":283,"fill":265},"670",[197,294,296],{"x":268,"y":295,"textAnchor":201,"fontSize":226,"fill":204},"156","snapshot stack at 100 Hz",[207,298],{"x":299,"y":300,"width":232,"height":233,"rx":234,"fill":235,"stroke":204,"strokeWidth":236},"450","174",[197,302,304],{"x":268,"y":303,"textAnchor":201,"fontSize":226,"fill":204},"196","attaches by --pid, no restart",[207,306],{"x":299,"y":307,"width":232,"height":233,"rx":234,"fill":235,"stroke":204,"strokeWidth":236},"218",[197,309,311],{"x":268,"y":310,"textAnchor":201,"fontSize":226,"fill":204},"240","wall-clock, near-zero overhead",[197,313,315],{"x":268,"y":314,"textAnchor":201,"fontSize":226,"fill":265},"284","safe on production workers",[197,317,319],{"x":268,"y":318,"textAnchor":201,"fontSize":226,"fill":204},"306","misses sub-sample functions",[197,321,323],{"x":199,"y":322,"textAnchor":201,"fontSize":226,"fill":204},"360","Profile deterministically when you can; sample when not.",[325,326,327],"figcaption",{},"cProfile instruments every call for exact counts at the cost of overhead; py-spy samples the stack from outside the interpreter, reflecting wall-clock time with negligible cost and no restart.",[33,329,331],{"id":330},"step-by-step-implementation","Step-by-step implementation",[333,334,336],"h3",{"id":335},"_1-capture-a-deterministic-profile-with-cprofilerun-runctx","1. Capture a deterministic profile with cProfile.run \u002F runctx",[10,338,339,342,343,346],{},[14,340,341],{},"cProfile.run(statement, filename=None)"," executes a string of code under the profiler. 
Pass a ",[14,344,345],{},"filename"," to persist raw stats for later analysis instead of dumping a table to stdout:",[348,349,354],"pre",{"className":350,"code":351,"language":352,"meta":353,"style":353},"language-python shiki shiki-themes github-light github-dark","import cProfile\n\ndef fib(n: int) -> int:\n    return n if n \u003C 2 else fib(n - 1) + fib(n - 2)\n\ndef workload() -> int:\n    return sum(fib(n) for n in range(30))\n\n# Run the statement under the profiler and write raw stats to disk.\n# The string is exec'd in the __main__ module's namespace, so module-level names resolve.\ncProfile.run(\"workload()\", filename=\"workload.prof\")\n","python","",[14,355,356,363,370,376,382,387,393,399,404,410,416],{"__ignoreMap":353},[357,358,360],"span",{"class":274,"line":359},1,[357,361,362],{},"import cProfile\n",[357,364,366],{"class":274,"line":365},2,[357,367,369],{"emptyLinePlaceholder":368},true,"\n",[357,371,373],{"class":274,"line":372},3,[357,374,375],{},"def fib(n: int) -> int:\n",[357,377,379],{"class":274,"line":378},4,[357,380,381],{},"    return n if n \u003C 2 else fib(n - 1) + fib(n - 2)\n",[357,383,385],{"class":274,"line":384},5,[357,386,369],{"emptyLinePlaceholder":368},[357,388,390],{"class":274,"line":389},6,[357,391,392],{},"def workload() -> int:\n",[357,394,396],{"class":274,"line":395},7,[357,397,398],{},"    return sum(fib(n) for n in range(30))\n",[357,400,402],{"class":274,"line":401},8,[357,403,369],{"emptyLinePlaceholder":368},[357,405,407],{"class":274,"line":406},9,[357,408,409],{},"# Run the statement under the profiler and write raw stats to disk.\n",[357,411,413],{"class":274,"line":412},10,[357,414,415],{},"# The string is exec'd in the __main__ module's namespace, so module-level names resolve.\n",[357,417,419],{"class":274,"line":418},11,[357,420,421],{},"cProfile.run(\"workload()\", filename=\"workload.prof\")\n",[10,423,424,427,428,430,431,434,435,438],{},[14,425,426],{},"run"," execs the statement in the __main__ module's namespace, so it cannot see a function's 
locals. When the code you want to profile depends on local variables — common inside a test or a function — use ",[14,429,60],{},", which takes explicit ",[14,432,433],{},"globals"," and ",[14,436,437],{},"locals"," dicts:",[348,440,442],{"className":350,"code":441,"language":352,"meta":353,"style":353},"import cProfile\n\ndef profile_request(payload: dict) -> None:\n    handler = build_handler(payload)          # local objects the statement needs\n    # runctx exec's \"handler.run()\" with these exact namespaces, so the local\n    # `handler` resolves correctly where plain run() would raise NameError.\n    cProfile.runctx(\"handler.run()\", globals(), locals(), filename=\"request.prof\")\n",[14,443,444,448,452,457,462,467,472],{"__ignoreMap":353},[357,445,446],{"class":274,"line":359},[357,447,362],{},[357,449,450],{"class":274,"line":365},[357,451,369],{"emptyLinePlaceholder":368},[357,453,454],{"class":274,"line":372},[357,455,456],{},"def profile_request(payload: dict) -> None:\n",[357,458,459],{"class":274,"line":378},[357,460,461],{},"    handler = build_handler(payload)          # local objects the statement needs\n",[357,463,464],{"class":274,"line":384},[357,465,466],{},"    # runctx exec's \"handler.run()\" with these exact namespaces, so the local\n",[357,468,469],{"class":274,"line":389},[357,470,471],{},"    # `handler` resolves correctly where plain run() would raise NameError.\n",[357,473,474],{"class":274,"line":395},[357,475,476],{},"    cProfile.runctx(\"handler.run()\", globals(), locals(), filename=\"request.prof\")\n",[10,478,479,480,483],{},"For surgical control — profiling a single hot region without wrapping it in a string — drive a ",[14,481,482],{},"Profile"," object directly:",[348,485,487],{"className":350,"code":486,"language":352,"meta":353,"style":353},"import cProfile, pstats\n\nprofiler = cProfile.Profile()\nprofiler.enable()\nresult = expensive_pipeline(records)          # only this region is 
measured\nprofiler.disable()\nprofiler.dump_stats(\"pipeline.prof\")          # raw stats for pstats \u002F SnakeViz\n",[14,488,489,494,498,503,508,513,518],{"__ignoreMap":353},[357,490,491],{"class":274,"line":359},[357,492,493],{},"import cProfile, pstats\n",[357,495,496],{"class":274,"line":365},[357,497,369],{"emptyLinePlaceholder":368},[357,499,500],{"class":274,"line":372},[357,501,502],{},"profiler = cProfile.Profile()\n",[357,504,505],{"class":274,"line":378},[357,506,507],{},"profiler.enable()\n",[357,509,510],{"class":274,"line":384},[357,511,512],{},"result = expensive_pipeline(records)          # only this region is measured\n",[357,514,515],{"class":274,"line":389},[357,516,517],{},"profiler.disable()\n",[357,519,520],{"class":274,"line":395},[357,521,522],{},"profiler.dump_stats(\"pipeline.prof\")          # raw stats for pstats \u002F SnakeViz\n",[10,524,525,526,529,530,534],{},"Avoid profiling under ",[14,527,528],{},"pytest"," collection: the deterministic overhead skews fixture setup. Profile the function under test directly, or use the ",[168,531,533],{"href":532},"\u002Fadvanced-pytest-architecture-configuration\u002Fmastering-pytest-fixtures\u002Fhow-to-scope-pytest-fixtures-for-async-tests\u002F","pytest fixture scoping rules"," to isolate the call inside a function-scoped fixture so collection time is excluded.",[333,536,538],{"id":537},"_2-sort-and-read-the-stats-with-pstats","2. Sort and read the stats with pstats",[10,540,126,541,544,545,548],{},[14,542,543],{},".prof"," file is raw; ",[14,546,547],{},"pstats.Stats"," turns it into a sortable table. 
Strip directory prefixes so function names are legible, then sort:",[348,550,552],{"className":350,"code":551,"language":352,"meta":353,"style":353},"import pstats\nfrom pstats import SortKey\n\nstats = pstats.Stats(\"workload.prof\")\nstats.strip_dirs()                            # drop absolute paths from names\nstats.sort_stats(SortKey.CUMULATIVE)          # rank by total + subcall time\nstats.print_stats(10)                         # top 10 rows\n\n# A second pass by total time surfaces leaf hotspots, not just hot callers.\nstats.sort_stats(SortKey.TIME).print_stats(10)\n",[14,553,554,559,564,568,573,578,583,588,592,597],{"__ignoreMap":353},[357,555,556],{"class":274,"line":359},[357,557,558],{},"import pstats\n",[357,560,561],{"class":274,"line":365},[357,562,563],{},"from pstats import SortKey\n",[357,565,566],{"class":274,"line":372},[357,567,369],{"emptyLinePlaceholder":368},[357,569,570],{"class":274,"line":378},[357,571,572],{},"stats = pstats.Stats(\"workload.prof\")\n",[357,574,575],{"class":274,"line":384},[357,576,577],{},"stats.strip_dirs()                            # drop absolute paths from names\n",[357,579,580],{"class":274,"line":389},[357,581,582],{},"stats.sort_stats(SortKey.CUMULATIVE)          # rank by total + subcall time\n",[357,584,585],{"class":274,"line":395},[357,586,587],{},"stats.print_stats(10)                         # top 10 rows\n",[357,589,590],{"class":274,"line":401},[357,591,369],{"emptyLinePlaceholder":368},[357,593,594],{"class":274,"line":406},[357,595,596],{},"# A second pass by total time surfaces leaf hotspots, not just hot callers.\n",[357,598,599],{"class":274,"line":412},[357,600,601],{},"stats.sort_stats(SortKey.TIME).print_stats(10)\n",[10,603,604,605,608,609,612,613,50,616,50,619,53,621,623],{},"Sort by ",[14,606,607],{},"SortKey.CUMULATIVE"," to find which entry points dominate the run, and by ",[14,610,611],{},"SortKey.TIME"," (total time) to find the leaf functions actually burning CPU. 
The ",[14,614,615],{},"ncalls",[14,617,618],{},"percall",[14,620,152],{},[14,622,159],{}," columns are dissected in the dedicated guide linked above.",[333,625,627],{"id":626},"_3-visualize-the-call-graph","3. Visualize the call graph",[10,629,630,631,634,635,637],{},"Tables hide structure. ",[128,632,633],{},"SnakeViz"," renders a ",[14,636,543],{}," file as an interactive icicle chart in the browser — the width of each block is its cumulative time:",[348,639,643],{"className":640,"code":641,"language":642,"meta":353,"style":353},"language-bash shiki shiki-themes github-light github-dark","snakeviz workload.prof\n","bash",[14,644,645],{"__ignoreMap":353},[357,646,647,651],{"class":274,"line":359},[357,648,650],{"class":649},"sScJk","snakeviz",[357,652,654],{"class":653},"sZZnC"," workload.prof\n",[10,656,657,658,661],{},"For a static, shareable artifact (CI logs, code review), ",[128,659,660],{},"gprof2dot"," converts stats to a Graphviz call graph:",[348,663,665],{"className":640,"code":664,"language":642,"meta":353,"style":353},"# gprof2dot reads the cProfile format and emits a DOT graph; dot renders it.\npython -m gprof2dot -f pstats workload.prof | dot -Tsvg -o callgraph.svg\n",[14,666,667,673],{"__ignoreMap":353},[357,668,669],{"class":274,"line":359},[357,670,672],{"class":671},"sJ8bj","# gprof2dot reads the cProfile format and emits a DOT graph; dot renders it.\n",[357,674,675,677,681,684,687,690,693,697,700,703,706],{"class":274,"line":365},[357,676,352],{"class":649},[357,678,680],{"class":679},"sj4cs"," -m",[357,682,683],{"class":653}," gprof2dot",[357,685,686],{"class":679}," -f",[357,688,689],{"class":653}," pstats",[357,691,692],{"class":653}," workload.prof",[357,694,696],{"class":695},"szBVR"," |",[357,698,699],{"class":649}," dot",[357,701,702],{"class":679}," -Tsvg",[357,704,705],{"class":679}," -o",[357,707,708],{"class":653}," callgraph.svg\n",[10,710,711],{},"Each node is colored by cumulative time, making the hot path visually obvious without 
scanning rows.",[333,713,715],{"id":714},"_4-sample-a-running-process-with-py-spy","4. Sample a running process with py-spy",[10,717,718,719,721,722,725,726,729],{},"When the workload is a long-running process you cannot restart, attach ",[14,720,20],{}," by PID. ",[14,723,724],{},"py-spy top"," gives a live, ",[14,727,728],{},"top","-like view of the hottest functions:",[348,731,733],{"className":640,"code":732,"language":642,"meta":353,"style":353},"py-spy top --pid 48291\n",[14,734,735],{"__ignoreMap":353},[357,736,737,739,742,745],{"class":274,"line":359},[357,738,20],{"class":649},[357,740,741],{"class":653}," top",[357,743,744],{"class":679}," --pid",[357,746,747],{"class":679}," 48291\n",[10,749,750,753],{},[14,751,752],{},"py-spy record"," writes a flame graph SVG over a sampling window:",[348,755,757],{"className":640,"code":756,"language":642,"meta":353,"style":353},"# Sample PID 48291 for 30 seconds and write an interactive flame graph.\npy-spy record --pid 48291 --duration 30 --output flame.svg\n",[14,758,759,764],{"__ignoreMap":353},[357,760,761],{"class":274,"line":359},[357,762,763],{"class":671},"# Sample PID 48291 for 30 seconds and write an interactive flame graph.\n",[357,765,766,768,771,773,776,779,782,785],{"class":274,"line":365},[357,767,20],{"class":649},[357,769,770],{"class":653}," record",[357,772,744],{"class":679},[357,774,775],{"class":679}," 48291",[357,777,778],{"class":679}," --duration",[357,780,781],{"class":679}," 30",[357,783,784],{"class":679}," --output",[357,786,787],{"class":653}," flame.svg\n",[10,789,790,793,794,796,797,800,801,172],{},[14,791,792],{},"py-spy dump"," prints a one-shot snapshot of every thread's stack — the fastest way to see what a hung process is doing right now. 
Driving these subcommands against a live PID, including ",[14,795,80],{}," for C extensions and ",[14,798,799],{},"ptrace_scope"," permissions, is the focus of ",[168,802,804],{"href":803},"\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Fprofiling-a-running-process-with-py-spy\u002F","profiling a running process with py-spy",[333,806,808],{"id":807},"_5-confirm-the-fix","5. Confirm the fix",[10,810,811,812,814],{},"After changing the suspect function, re-profile the same workload and compare ",[14,813,159],{}," on that function. A real win shows the number shrinking; if total runtime barely moved, the hotspot relocated and you optimized the wrong thing.",[33,816,818],{"id":817},"verification","Verification",[10,820,821],{},"Confirm each tool actually measured what you intended:",[38,823,824,844,861,871],{},[41,825,826,829,830,832,833,835,836,839,840,843],{},[128,827,828],{},"cProfile captured the region:"," load the ",[14,831,543],{}," and check the call you care about appears with a plausible ",[14,834,615],{},". A zero or missing row means ",[14,837,838],{},"enable()","\u002F",[14,841,842],{},"disable()"," bracketed the wrong code.",[41,845,846,849,850,853,854,857,858,860],{},[128,847,848],{},"pstats sort is correct:"," the first row under ",[14,851,852],{},"CUMULATIVE"," should be your entry point (it transitively contains everything); under ",[14,855,856],{},"TIME"," it should be a leaf. If the entry point tops the ",[14,859,856],{}," list, it is doing real work in its own body, not just dispatching.",[41,862,863,866,867,870],{},[128,864,865],{},"py-spy attached:"," ",[14,868,869],{},"py-spy dump --pid \u003CPID>"," should print live stacks. An empty or error result means a permissions or interpreter-detection problem, not an idle process.",[41,872,873,876,877,880],{},[128,874,875],{},"The fix held:"," diff cumulative time before and after. 
Use ",[14,878,879],{},"pstats.Stats(old).sort_stats(SortKey.CUMULATIVE)"," and the same on the new profile.",[33,882,884],{"id":883},"troubleshooting","Troubleshooting",[886,887,888,904],"table",{},[889,890,891],"thead",{},[892,893,894,898,901],"tr",{},[895,896,897],"th",{},"Symptom",[895,899,900],{},"Root cause",[895,902,903],{},"Fix",[905,906,907,931,946,974,993,1012],"tbody",{},[892,908,909,920,925],{},[910,911,912,915,916,919],"td",{},[14,913,914],{},"cProfile.run"," raises ",[14,917,918],{},"NameError"," for a variable",[910,921,922,924],{},[14,923,426],{}," execs in the __main__ module's namespace, so function locals are invisible",[910,926,927,928],{},"Use ",[14,929,930],{},"cProfile.runctx(stmt, globals(), locals())",[892,932,933,940,943],{},[910,934,935,936,939],{},"Profile shows huge time in ",[14,937,938],{},"\u003Cbuilt-in method>"," rows",[910,941,942],{},"Deterministic overhead inflates many tiny calls",[910,944,945],{},"Cross-check with py-spy; trust sampling for wall-clock attribution",[892,947,948,960,965],{},[910,949,950,952,953,956,957],{},[14,951,20],{}," exits with ",[14,954,955],{},"Permission denied"," \u002F ",[14,958,959],{},"Operation not permitted",[910,961,962,964],{},[14,963,799],{}," restricts attaching",[910,966,967,968,970,971],{},"Run with sudo, grant ",[14,969,114],{},", or set ",[14,972,973],{},"kernel.yama.ptrace_scope=0",[892,975,976,984,987],{},[910,977,978,980,981],{},[14,979,20],{}," reports ",[14,982,983],{},"Failed to find python interpreter",[910,985,986],{},"Process is in a container or static build",[910,988,989,990,992],{},"Run py-spy inside the same namespace \u002F container, or use ",[14,991,84],{}," of the in-namespace PID",[892,994,995,1002,1005],{},[910,996,997,998,1001],{},"Flame graph is all idle \u002F ",[14,999,1000],{},"wait"," frames",[910,1003,1004],{},"App is I\u002FO-bound, not CPU-bound",[910,1006,1007,1008,1011],{},"Add ",[14,1009,1010],{},"--idle"," to include idle threads, or profile the blocking call, not 
CPU",[892,1013,1014,1017,1024],{},[910,1015,1016],{},"SnakeViz shows one giant block, no detail",[910,1018,1019,1020,1023],{},"Stats were not ",[14,1021,1022],{},"strip_dirs()","'d or the region is one call",[910,1025,1026],{},"Profile a representative workload with many iterations",[33,1028,1030],{"id":1029},"frequently-asked-questions","Frequently Asked Questions",[10,1032,1033,1036,1037,1039,1040,1042],{},[128,1034,1035],{},"When should I use cProfile instead of py-spy?","\nUse ",[14,1038,16],{}," when you can run the code yourself and want exact, deterministic per-call counts. Use ",[14,1041,20],{}," when you need to profile a process you cannot restart, such as a production worker, because it samples from outside the interpreter and adds almost no overhead.",[10,1044,1045,1048,1050,1051,1053],{},[128,1046,1047],{},"Why does cProfile show a different hot function than py-spy?",[14,1049,16],{}," is deterministic and counts every call, so its overhead inflates functions that make many tiny calls. ",[14,1052,20],{}," samples the call stack at a fixed frequency, so it reflects wall-clock time spent and is less skewed by call frequency. Trust py-spy for where wall time goes and cProfile for exact call counts.",[10,1055,1056,1059,1060,1062],{},[128,1057,1058],{},"Does py-spy require changing my application code?","\nNo. ",[14,1061,20],{}," attaches to a running Python process by PID using OS debugging facilities and reads its stacks externally. 
You do not import anything or add hooks; you point py-spy at the PID and it produces top output, a flame graph, or a one-shot dump.",[10,1064,1065,1068,1069,1072,1073,434,1075,1077,1078,1080,1081,434,1083,1085,1086,1088],{},[128,1066,1067],{},"How do I profile only one function instead of the whole program?","\nWrap the call with ",[14,1070,1071],{},"cProfile.runctx",", passing the expression and explicit ",[14,1074,433],{},[14,1076,437],{}," dicts, or use a ",[14,1079,56],{}," object with ",[14,1082,838],{},[14,1084,842],{}," around the region. Then load the stats with ",[14,1087,547],{}," and sort by cumulative time.",[33,1090,1092],{"id":1091},"related-guides","Related guides",[38,1094,1095,1101,1107,1115,1122],{},[41,1096,1097,1098,1100],{},"Pin down the column meanings in ",[168,1099,171],{"href":170}," before you trust a sort order.",[41,1102,1103,1104,1106],{},"When the target is a live worker, ",[168,1105,804],{"href":803}," covers attach, flame graphs, and ptrace permissions.",[41,1108,1109,1110,1114],{},"Pair CPU work with ",[168,1111,1113],{"href":1112},"\u002Fsystematic-debugging-performance-profiling\u002Fmemory-profiling-with-tracemalloc\u002F","memory profiling using tracemalloc"," when high CPU is actually GC pressure from allocation churn.",[41,1116,1117,1118,172],{},"For async services, a CPU hotspot is often a blocked event loop — see ",[168,1119,1121],{"href":1120},"\u002Fsystematic-debugging-performance-profiling\u002Fdebugging-async-code-and-event-loops\u002F","debugging async code and event loops",[41,1123,1124,1125,1129],{},"When the slow path lives in a test suite, ",[168,1126,1128],{"href":1127},"\u002Fsystematic-debugging-performance-profiling\u002Finteractive-debugging-with-pdb-and-ipdb\u002F","interactive debugging with pdb and ipdb"," helps you reach the call site before profiling it.",[10,1131,1132,1133],{},"← Back to ",[168,1134,1136],{"href":1135},"\u002Fsystematic-debugging-performance-profiling\u002F","Systematic Debugging & 
Performance Profiling",[1138,1139,1140],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .sScJk, html code.shiki .sScJk{--shiki-default:#6F42C1;--shiki-dark:#B392F0}html pre.shiki code .sZZnC, html code.shiki .sZZnC{--shiki-default:#032F62;--shiki-dark:#9ECBFF}html pre.shiki code .sJ8bj, html code.shiki .sJ8bj{--shiki-default:#6A737D;--shiki-dark:#6A737D}html pre.shiki code .sj4cs, html code.shiki .sj4cs{--shiki-default:#005CC5;--shiki-dark:#79B8FF}html pre.shiki code .szBVR, html code.shiki .szBVR{--shiki-default:#D73A49;--shiki-dark:#F97583}",{"title":353,"searchDepth":365,"depth":365,"links":1142},[1143,1144,1145,1152,1153,1154,1155],{"id":35,"depth":365,"text":36},{"id":122,"depth":365,"text":123},{"id":330,"depth":365,"text":331,"children":1146},[1147,1148,1149,1150,1151],{"id":335,"depth":372,"text":336},{"id":537,"depth":372,"text":538},{"id":626,"depth":372,"text":627},{"id":714,"depth":372,"text":715},{"id":807,"depth":372,"text":808},{"id":817,"depth":365,"text":818},{"id":883,"depth":365,"text":884},{"id":1029,"depth":365,"text":1030},{"id":1091,"depth":365,"text":1092},"Profile Python CPU 
hotspots with cProfile, sort pstats by cumulative vs total time, visualize with SnakeViz, and sample live processes with py-spy in production.","md",{"slug":1159,"type":1160,"breadcrumb":1161,"datePublished":1162,"dateModified":1162,"faq":1163,"howto":1172},"cpu-profiling-with-cprofile-and-py-spy","cluster","CPU Profiling","2026-06-18",[1164,1166,1168,1170],{"q":1035,"a":1165},"Use cProfile when you can run the code yourself and want exact, deterministic per-call counts. Use py-spy when you need to profile a process you cannot restart, such as a production worker, because it samples from outside the interpreter and adds almost no overhead.",{"q":1047,"a":1167},"cProfile is deterministic and counts every call, so its overhead inflates functions that make many tiny calls. py-spy samples the call stack at a fixed frequency, so it reflects wall-clock time spent and is less skewed by call frequency. Trust py-spy for where wall time goes and cProfile for exact call counts.",{"q":1058,"a":1169},"No. py-spy attaches to a running Python process by PID using OS debugging facilities and reads its stacks externally. You do not import anything or add hooks; you point py-spy at the PID and it produces top output, a flame graph, or a one-shot dump.",{"q":1067,"a":1171},"Wrap the call with cProfile.runctx, passing the expression and explicit globals and locals dicts, or use a cProfile.Profile object with enable() and disable() around the region. 
Then load the stats with pstats.Stats and sort by cumulative time.",{"name":1173,"description":1174,"steps":1175},"How to profile Python CPU usage with cProfile and py-spy","Capture a deterministic profile with cProfile, analyze it with pstats, and sample a live process with py-spy.",[1176,1179,1182,1185,1188],{"name":1177,"text":1178},"Capture a deterministic profile","Run the target under cProfile with cProfile.run or cProfile.runctx and write the raw stats to a .prof file with the filename argument.",{"name":1180,"text":1181},"Sort and read the stats","Load the .prof file with pstats.Stats, strip directory noise, and sort by cumulative time to find callers and by total time to find leaf hotspots.",{"name":1183,"text":1184},"Visualize the call graph","Render the .prof file with SnakeViz for an interactive icicle chart or pipe it through gprof2dot and Graphviz for a static call-graph image.",{"name":1186,"text":1187},"Sample a running process","Attach py-spy to the live process by PID with py-spy top for a live view or py-spy record to write a flame graph SVG, with no code change or restart.",{"name":1189,"text":1190},"Confirm the fix","Re-profile after the change and compare cumulative time on the suspect function to verify the hotspot shrank rather than moving elsewhere.","\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy",{"title":5,"description":1156},"systematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Findex","wBsdKDxk6bqqxI-wsqSPAqSad0ofeqmOlyYbjoLZh6k",1781793487406]