Skip to content

Tachyon sampling profiler overcounts recursive function cumulative time #142374

@jbosboom

Description

@jbosboom

Bug report

Bug description:

CPython main at c5b3722, ./configure --enable-optimizations --with-lto, Linux 6.17.9-arch1-1

Consider the following small program that does some CPU-bound work in a recursive function:

import sys, random

def foo(x):
    for _ in range(x):
        random.sample(range(x), x)
    if x > 0:
        foo(x-1)
    for _ in range(x):
        random.sample(range(x), x)

foo(int(sys.argv[1]))

When I profile this program with the command line python -m profiling.sampling run --sort cumul-pct tachyon-recursive-cumul.py 500, I get the following output:

Captured 100001 samples in 10.00 seconds
Sample rate: 10000.06 samples/sec
Error rate: 16.89%
Profile Stats:
       nsamples   sample%   tottime (s)    cumul%   cumtime (s)  filename:lineno(function)
   256/13217760       0.3         0.026   15903.2      1321.776  tachyon-recursive-cumul.py:7(foo)
     620/113462       0.7         0.062     136.5        11.346  tachyon-recursive-cumul.py:5(foo)
        0/83110       0.0         0.000     100.0         8.311  tachyon-recursive-cumul.py:11(<module>)
        0/83110       0.0         0.000     100.0         8.311  _sync_coordinator.py:193(_execute_script)
        0/83110       0.0         0.000     100.0         8.311  _sync_coordinator.py:234(main)
        0/83110       0.0         0.000     100.0         8.311  _sync_coordinator.py:251(<module>)
        0/83110       0.0         0.000     100.0         8.311  <frozen runpy>:88(_run_code)
        0/83110       0.0         0.000     100.0         8.311  <frozen runpy>:198(_run_module_as_main)
    52397/70127      63.0         5.240      84.4         7.013  random.py:446(Random.sample)
       150/6409       0.2         0.015       7.7         0.641  tachyon-recursive-cumul.py:9(foo)
      5553/5736       6.7         0.555       6.9         0.574  random.py:448(Random.sample)
      4965/4965       6.0         0.496       6.0         0.496  random.py:249(Random._randbelow_with_getrandbits)
      4513/4513       5.4         0.451       5.4         0.451  random.py:250(Random._randbelow_with_getrandbits)
      2849/2849       3.4         0.285       3.4         0.285  random.py:251(Random._randbelow_with_getrandbits)
      2501/2501       3.0         0.250       3.0         0.250  random.py:252(Random._randbelow_with_getrandbits)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  1.000 direct/cumulative ratio, 17.8% direct samples: random.py:(Random._randbelow_with_getrandbits)
  0.764 direct/cumulative ratio, 69.7% direct samples: random.py:(Random.sample)
  0.000 direct/cumulative ratio, 1.2% direct samples: tachyon-recursive-cumul.py:(foo)

Functions with Highest Call Frequency (Indirect Calls):
  13336605 indirect calls, 16047.4% total stack presence: tachyon-recursive-cumul.py:(foo)
  83110 indirect calls, 100.0% total stack presence: tachyon-recursive-cumul.py:(<module>)
  83110 indirect calls, 100.0% total stack presence: _sync_coordinator.py:(_execute_script)

Functions with Highest Call Magnification (Cumulative/Direct):
  12999.6x call magnification, 13336605 indirect calls from 1026 direct: tachyon-recursive-cumul.py:(foo)
  1.3x call magnification, 17913 indirect calls from 57950 direct: random.py:(Random.sample)

Per the legend "cumul%: Percentage of total samples when this function was on the call stack", I expect foo to have about 100% cumul%. Instead it has 15903.2% and/or 136.5%. The module correctly has 100%, so the problem seems to relate to recursion. Perhaps the profiler is counting each stack frame in each sample?

The actual program I want to profile has a highly recursive part and a nonrecursive part. I want to use the cumulative percent/time to compare the amount of time spent in each part.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.15new features, bugs and security fixesstdlibStandard Library Python modules in the Lib/ directorytopic-profilingtype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions