[{"data":1,"prerenderedAt":476},["ShallowReactive",2],{"page-\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Finterpreting-cprofile-cumulative-vs-total-time\u002F":3},{"id":4,"title":5,"body":6,"description":442,"extension":443,"meta":444,"navigation":111,"path":472,"seo":473,"stem":474,"__hash__":475},"content\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Finterpreting-cprofile-cumulative-vs-total-time\u002Findex.md","Interpreting cProfile: Cumulative vs Total Time",{"type":7,"value":8,"toc":435},"minimark",[9,30,35,76,80,83,159,162,170,173,244,275,279,303,307,378,382,398,416,425,431],[10,11,12,13,17,18,21,22,25,26,29],"p",{},"You ran ",[14,15,16],"code",{},"cProfile",", you have a table, and the top row by one sort is your entry point while the top row by another sort is a tiny helper — and it is not obvious which one to optimize. The confusion is almost always between ",[14,19,20],{},"tottime"," and ",[14,23,24],{},"cumtime",". 
This guide explains every column the cProfile report contains, why a dispatcher shows enormous cumulative time but trivial total time, and how to sort ",[14,27,28],{},"pstats"," to find the function actually worth changing.",[31,32,34],"h2",{"id":33},"prerequisites","Prerequisites",[36,37,38,55],"ul",{},[39,40,41,42,45,46,48,49,48,51,54],"li",{},"Python ",[14,43,44],{},"3.8+"," (",[14,47,16],{},", ",[14,50,28],{},[14,52,53],{},"pstats.SortKey",").",[39,56,57,58,61,62,65,66,69,70,75],{},"A ",[14,59,60],{},".prof"," file produced by ",[14,63,64],{},"cProfile.run(..., filename=\"x.prof\")"," or ",[14,67,68],{},"Profile.dump_stats(\"x.prof\")"," — see ",[71,72,74],"a",{"href":73},"\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002F","CPU profiling with cProfile and py-spy"," for capture.",[31,77,79],{"id":78},"solution","Solution",[10,81,82],{},"Load the stats, strip directory noise, and print both sorts:",[84,85,90],"pre",{"className":86,"code":87,"language":88,"meta":89,"style":89},"language-python shiki shiki-themes github-light github-dark","import pstats\nfrom pstats import SortKey\n\nstats = pstats.Stats(\"workload.prof\")\nstats.strip_dirs()                       # turn \u002Fabs\u002Fpath\u002Fmod.py:func into mod.py:func\n\n# cumtime: which entry points dominate the whole run (body + everything called).\nstats.sort_stats(SortKey.CUMULATIVE).print_stats(8)\n\n# tottime: which leaf functions burn CPU in their own body.\nstats.sort_stats(SortKey.TIME).print_stats(8)\n","python","",[14,91,92,100,106,113,119,125,130,136,142,147,153],{"__ignoreMap":89},[93,94,97],"span",{"class":95,"line":96},"line",1,[93,98,99],{},"import pstats\n",[93,101,103],{"class":95,"line":102},2,[93,104,105],{},"from pstats import SortKey\n",[93,107,109],{"class":95,"line":108},3,[93,110,112],{"emptyLinePlaceholder":111},true,"\n",[93,114,116],{"class":95,"line":115},4,[93,117,118],{},"stats = 
pstats.Stats(\"workload.prof\")\n",[93,120,122],{"class":95,"line":121},5,[93,123,124],{},"stats.strip_dirs()                       # turn \u002Fabs\u002Fpath\u002Fmod.py:func into mod.py:func\n",[93,126,128],{"class":95,"line":127},6,[93,129,112],{"emptyLinePlaceholder":111},[93,131,133],{"class":95,"line":132},7,[93,134,135],{},"# cumtime: which entry points dominate the whole run (body + everything called).\n",[93,137,139],{"class":95,"line":138},8,[93,140,141],{},"stats.sort_stats(SortKey.CUMULATIVE).print_stats(8)\n",[93,143,145],{"class":95,"line":144},9,[93,146,112],{"emptyLinePlaceholder":111},[93,148,150],{"class":95,"line":149},10,[93,151,152],{},"# tottime: which leaf functions burn CPU in their own body.\n",[93,154,156],{"class":95,"line":155},11,[93,157,158],{},"stats.sort_stats(SortKey.TIME).print_stats(8)\n",[10,160,161],{},"A representative row looks like this:",[84,163,168],{"className":164,"code":166,"language":167,"meta":89},[165],"language-text","   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n      240\u002F2    0.001    0.000    1.480    0.740 app.py:12(dispatch)\n   1000000    1.310    0.000    1.420    0.000 app.py:40(parse_record)\n","text",[14,169,166],{"__ignoreMap":89},[10,171,172],{},"Read it column by column:",[36,174,175,196,208,220,234],{},[39,176,177,183,184,187,188,191,192,195],{},[178,179,180],"strong",{},[14,181,182],{},"ncalls"," — number of calls. A ",[14,185,186],{},"240\u002F2"," split means the function recursed: ",[14,189,190],{},"240"," total entries, ",[14,193,194],{},"2"," primitive (non-recursive) entries.",[39,197,198,202,203,207],{},[178,199,200],{},[14,201,20],{}," — time spent in this function's own body, ",[204,205,206],"em",{},"excluding"," anything it called. 
This is the leaf-level CPU cost.",[39,209,210,215,216,219],{},[178,211,212],{},[14,213,214],{},"percall"," (first) — ",[14,217,218],{},"tottime \u002F ncalls",", the average self-cost per call, counting every call including recursive ones.",[39,221,222,226,227,230,231,233],{},[178,223,224],{},[14,225,24],{}," — total time in this function ",[204,228,229],{},"including"," every subcall, measured from invocation until exit. The top-level entry point has the largest ",[14,232,24],{},".",[39,235,236,240,241,233],{},[178,237,238],{},[14,239,214],{}," (second) — ",[14,242,243],{},"cumtime \u002F primitive ncalls",[10,245,246,247,250,251,253,254,256,257,260,261,263,264,266,267,269,270,272,273,233],{},"In the example, ",[14,248,249],{},"dispatch"," has tiny ",[14,252,20],{}," (0.001s) but huge ",[14,255,24],{}," (1.480s): it does almost nothing itself and spends nearly all of its time in ",[14,258,259],{},"parse_record",". ",[14,262,259],{}," has ",[14,265,20],{}," ≈ ",[14,268,24],{}," (1.31 vs 1.42), so it is a genuine leaf hotspot. Optimize ",[14,271,259],{},", not ",[14,274,249],{},[31,276,278],{"id":277},"why-this-works","Why this works",[10,280,281,283,284,286,287,289,290,292,293,296,297,299,300,302],{},[14,282,16],{}," records, for every function, the time charged to its own frame separately from the time charged to frames it called. ",[14,285,20],{}," is the self-time; ",[14,288,24],{}," rolls in the descendants. That is why callers high in the stack accumulate large ",[14,291,24],{}," while doing little real work, and why a tight inner loop shows ",[14,294,295],{},"tottime ≈ cumtime",". Sorting by ",[14,298,24],{}," answers \"where does the run spend its time\" (the hot path), and sorting by ",[14,301,20],{}," answers \"which function is doing the work\" (the hot leaf). 
You almost always want both: cumulative to navigate to the path, total to find the function to change.",[31,304,306],{"id":305},"edge-cases-and-failure-modes","Edge cases and failure modes",[36,308,309,322,340,352,369],{},[39,310,311,314,315,318,319,321],{},[178,312,313],{},"Recursion inflates ncalls."," The ",[14,316,317],{},"total\u002Fprimitive"," split is the tell. Optimizing a recursive function means cutting call count (memoization) or self-cost per call — ",[14,320,214],{}," on primitive calls tells you which.",[39,323,324,327,328,331,332,334,335,339],{},[178,325,326],{},"Built-in rows dominate."," Many ",[14,329,330],{},"\u003Cbuilt-in method>"," entries with high ",[14,333,20],{}," usually mean a hot loop calling cheap builtins millions of times; the fix is fewer calls, not a faster builtin. Cross-check wall-clock attribution with ",[71,336,338],{"href":337},"\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Fprofiling-a-running-process-with-py-spy\u002F","py-spy on a live process"," when deterministic overhead is suspect.",[39,341,342,345,346,348,349,351],{},[178,343,344],{},"cumtime is not additive across rows."," You cannot sum ",[14,347,24],{}," over functions to get total runtime; nested calls double-count. Read the root frame's ",[14,350,24],{}," for the whole-run figure.",[39,353,354,364,365,368],{},[178,355,356,359,360,363],{},[14,357,358],{},"callers"," \u002F ",[14,361,362],{},"callees"," views."," Use ",[14,366,367],{},"stats.print_callers(\"parse_record\")"," to see who drives a hotspot — essential when a leaf is called from several paths.",[39,370,371,374,375,377],{},[178,372,373],{},"Threaded code under-reports."," ",[14,376,16],{}," only profiles the thread that started it; spawned threads are invisible. 
Profile each thread, or sample with py-spy.",[31,379,381],{"id":380},"frequently-asked-questions","Frequently Asked Questions",[10,383,384,387,389,390,392,393,395,396,233],{},[178,385,386],{},"What is the difference between tottime and cumtime in cProfile?",[14,388,20],{}," (total time) is the time spent in a function's own body, excluding calls it makes. ",[14,391,24],{}," (cumulative time) includes that plus all time spent in functions it called. A function that mostly delegates has low ",[14,394,20],{}," and high ",[14,397,24],{},[10,399,400,403,404,406,407,409,410,412,413,415],{},[178,401,402],{},"Which column should I sort by to find a hotspot?","\nSort by ",[14,405,20],{}," to find the leaf functions actually burning CPU, and by ",[14,408,24],{}," to find which high-level entry points dominate the run. Start with ",[14,411,24],{}," to locate the hot path, then ",[14,414,20],{}," to find the line doing the work.",[10,417,418,421,422,424],{},[178,419,420],{},"What does ncalls show when it reads like 240\u002F2?","\nThe first number is total calls and the second is primitive (non-recursive) calls. 
A split like ",[14,423,186],{}," means the function recursed: it was entered 240 times overall but only 2 of those were the original non-recursive entries.",[10,426,427,428],{},"← Back to ",[71,429,430],{"href":73},"CPU Profiling with cProfile and py-spy",[432,433,434],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":89,"searchDepth":102,"depth":102,"links":436},[437,438,439,440,441],{"id":33,"depth":102,"text":34},{"id":78,"depth":102,"text":79},{"id":277,"depth":102,"text":278},{"id":305,"depth":102,"text":306},{"id":380,"depth":102,"text":381},"Read cProfile output correctly: what tottime, cumtime, ncalls, and percall mean, why a dispatcher has high cumtime but low tottime, and how to sort pstats.","md",{"slug":445,"type":446,"breadcrumb":447,"datePublished":448,"dateModified":448,"faq":449,"howto":456},"interpreting-cprofile-cumulative-vs-total-time","long_tail","cumtime vs tottime","2026-06-18",[450,452,454],{"q":386,"a":451},"tottime (total time) is the time spent in a function's own body, excluding calls it makes. cumtime (cumulative time) includes that plus all time spent in functions it called. 
A function that mostly delegates has low tottime and high cumtime.",{"q":402,"a":453},"Sort by tottime to find the leaf functions actually burning CPU, and by cumtime to find which high-level entry points dominate the run. Start with cumtime to locate the hot path, then tottime to find the line doing the work.",{"q":420,"a":455},"The first number is total calls and the second is primitive (non-recursive) calls. A split like 240\u002F2 means the function recursed: it was entered 240 times overall but only 2 of those were the original non-recursive entries.",{"name":457,"description":458,"steps":459},"How to interpret cProfile output","Load a cProfile stats file and read tottime, cumtime, ncalls, and percall to locate the real hotspot.",[460,463,466,469],{"name":461,"text":462},"Load the stats file","Read the .prof file with pstats.Stats and call strip_dirs to remove path noise from function names.",{"name":464,"text":465},"Sort by cumulative time","Sort by SortKey.CUMULATIVE to see which entry points account for the most total time including subcalls.",{"name":467,"text":468},"Sort by total time","Sort by SortKey.TIME to surface leaf functions whose own body burns the most CPU.",{"name":470,"text":471},"Read ncalls and percall","Interpret ncalls for call frequency and recursion, and percall for average cost per call to decide between reducing calls or speeding each one.","\u002Fsystematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Finterpreting-cprofile-cumulative-vs-total-time",{"title":5,"description":442},"systematic-debugging-performance-profiling\u002Fcpu-profiling-with-cprofile-and-py-spy\u002Finterpreting-cprofile-cumulative-vs-total-time\u002Findex","mXSHUzfynJU1P8zlMVdZQIbvCpsdOqjuzuX1xXeBy0Q",1781793487406]