WIP: ci: run asan over sharness#7558

Closed
chu11 wants to merge 7 commits into
flux-framework:masterfrom
chu11:issue7537_run_asan_sharness

Conversation

@chu11

@chu11 chu11 commented Apr 22, 2026

Member

Problem: The asan CI tests only run against unit tests in the src/ directory. This is because tests did not pass or some tests hung.
This is no longer the case.

Run asan over the entire testsuite under t/. Increase timeouts as the tests are expected to take more time.


WIP: we'll see how much time asan takes. We could adjust those timeouts as needed.

Built on top of #7538

Edit: I did not add any log file checks to check-annotate.sh yet. Wanted to see how far this builder went first. It's possible it'll take so long we will decide not to run.

chu11 added 7 commits April 22, 2026 12:37
Problem: A test in t2714-python-cli-batch.t assumes a prior test
has been executed.  But that prior test will not run if ASAN is
configured.

Set NO_ASAN on the test to ensure it isn't executed like prior
dependent ones.
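
The prerequisite gating described in this commit can be sketched without loading sharness at all. The block below is a minimal stand-in, not the real sharness implementation: `set_prereq`, `have_prereq`, and `run_test` are hypothetical names, and detecting ASAN by looking for `libasan` in `LD_PRELOAD` is an assumption rather than something the commit states.

```shell
#!/bin/sh
# Minimal stand-in for sharness prerequisite gating (NOT the real sharness).
# Assumption: ASAN is considered active when libasan appears in LD_PRELOAD;
# the real test suite may decide when to grant NO_ASAN differently.

prereqs=""

# Record a prerequisite as satisfied.
set_prereq() { prereqs="$prereqs $1 "; }

# Succeed if the named prerequisite was recorded.
have_prereq() {
    case "$prereqs" in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# run_test PREREQ DESCRIPTION COMMAND...
# Skips the command when PREREQ is missing, otherwise reports ok / not ok.
run_test() {
    prereq=$1; desc=$2; shift 2
    if ! have_prereq "$prereq"; then
        echo "skip: $desc (missing $prereq)"
        return 0
    fi
    if "$@"; then
        echo "ok: $desc"
    else
        echo "not ok: $desc"
        return 1
    fi
}

# Grant NO_ASAN only when libasan is not preloaded.
case "${LD_PRELOAD-}" in
    *libasan*) ;;
    *) set_prereq NO_ASAN ;;
esac

# Both the prior test and the test that depends on it carry the same
# prerequisite, so they are skipped or run together.
run_test NO_ASAN "prior test" true
run_test NO_ASAN "dependent test" true
```

In real sharness the same gating is written by passing the prerequisite as the first argument, as in `test_expect_success NO_ASAN 'description' 'script'`.
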
Problem: Tests in t0006-module-exec.t and t2614-job-shell-doom.t
test how segfaults are reported.  Under ASAN, segfault signals may be
handled / reported differently.

Skip tests that expect a specific message when a segfault occurs.  Under
ASAN, that specific message cannot be expected.
Problem: Tests in t0006-module-exec.t and t3306-system-routercrash.t
simulate a segfault by sending a SIGSEGV signal to crash a module / broker.
With ASAN, the signal could be captured, the expected result may not happen,
and an asan log will be generated indicating a segfault happened.  Follow up
tests depend on the SIGSEGV crashing a module / broker, so we can't just skip
the test.  These tests are specifically covering SIGSEGV, so we don't want to
just change the signal.

Under ASAN, instead send a SIGKILL to crash the broker / module.  This will
ensure follow on tests continue to work as expected under ASAN.  We will still
get ample coverage of the SIGSEGV case under non-ASAN workflows.
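
A rough sketch of the signal switch this commit describes follows. `crash_signal` is a hypothetical helper, the `libasan` check on `LD_PRELOAD` is an assumed way to detect ASAN, and a throwaway `sleep` stands in for the module / broker; the actual tests may be structured quite differently.

```shell
#!/bin/sh
# Hypothetical helper: pick the signal used to crash a module or broker.
# Assumption: ASAN is active when libasan appears in LD_PRELOAD.
crash_signal() {
    case "${LD_PRELOAD-}" in
        *libasan*) echo KILL ;;  # ASAN may intercept SIGSEGV, so use SIGKILL
        *) echo SEGV ;;          # default: exercise the real SIGSEGV path
    esac
}

# Example: crash a throwaway background process with the chosen signal.
sleep 30 &
pid=$!
kill -s "$(crash_signal)" "$pid"

# wait reports 128 + signal number for a process terminated by a signal.
status=0
wait "$pid" || status=$?
echo "signal: $(crash_signal), exit status: $status"
```

Keeping the choice in one helper means follow-on tests only need the process to be gone and do not have to care which signal was used.
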
Problem: Several tests are skipped when ASAN is enabled, but they
have no comments / explanation why.

Add comments explaining that the tests in t0005-exec.t are skipped
because ASAN causes a segfault to be reported differently than we
normally would expect.  Tests in t0016-cron-faketime.t and
t3001-mpi-personalities.t are skipped because they change LD_PRELOAD.
Tests in t2714-python-cli-batch.t are skipped because of slowness.
Problem: The 'ps' and 'pkill' commands hang when ASAN is enabled.

Unset LD_PRELOAD when the 'ps' or 'pkill' command will be run.  This
will effectively disable ASAN when they are run.
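
One way to scope the `LD_PRELOAD` removal to a single command is a small wrapper. This is a sketch only: `without_preload` is a hypothetical name, and the commit may simply unset the variable inline.

```shell
#!/bin/sh
# Hypothetical wrapper: run an external command with LD_PRELOAD removed from
# its environment, leaving the caller's environment untouched.
without_preload() {
    (
        unset LD_PRELOAD
        exec "$@"
    )
}

# Example: the child process no longer sees LD_PRELOAD.
# In a test this would wrap ps or pkill, e.g. `without_preload ps ...`.
without_preload sh -c 'echo "LD_PRELOAD in child: ${LD_PRELOAD-unset}"'
```

The subshell keeps the `unset` from leaking into later commands in the same test, so everything else still runs with the sanitizer loaded.
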
Problem: A few tests fail or hang under ASAN for test specific reasons.

Set NO_ASAN on those tests.
Problem: The asan CI tests only run against unit tests in the src/
directory.  This is because tests did not pass or some tests hung.
This is no longer the case.

Run asan over the entire testsuite under t/.  Increase timeouts as
the tests are expected to take more time.
@codecov

codecov Bot commented Apr 22, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.11%. Comparing base (14352ad) to head (b82de64).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7558   +/-   ##
=======================================
  Coverage   84.11%   84.11%           
=======================================
  Files         569      569           
  Lines       96925    96925           
=======================================
+ Hits        81524    81531    +7     
+ Misses      15401    15394    -7     

see 12 files with indirect coverage changes


@wihobbs

wihobbs commented Apr 22, 2026

Member

It's possible it'll take so long we will decide not to run.

We could always schedule it to run nightly if we're concerned about holding up PRs.

@chu11

chu11 commented Apr 22, 2026

Member Author

We could always schedule it to run nightly if we're concerned about holding up PRs.

Was thinking of that!

First run here on CI took ~80m. So not horrific ...

Edit: sorry didn't notice this .... Error: The action 'docker-run-checks with ASan' has timed out after 80 minutes.

@chu11

chu11 commented Apr 22, 2026

Member Author

Whoa, I did not expect so many errors on CI. It did at least finish.

Warning: Found 69 errors from 467 tests in testsuite

I'm guessing an issue with different image + maybe different asan. Will have to try the fedora image (previous work on #7538 was on RHEL8).

With this many errors, and especially so many related to "simple things" (like dd failing to make some content).

High level skimming

  • seems like anything related to color output or display or tty had an issue. Perhaps something specific to asan not liking it.
  • seems like anything related to dd had issues with running out of memory. Might be an asan + ci workflow memory thing (in one test: dd: memory exhausted by input buffer of size 64 bytes (64 B))
  • every python test had the trace I'll post at the bottom.

It does make me pause for a moment. If "simple" things like dd or anything tty related will have problems with asan in our ci workflow, we may end up not running a mountain of sharness tests as a result. So running against sharness with asan suddenly has less value (work on #7538 is useful b/c we can still test within beefier non-ci).

Note: check-annotate.sh already outputs asan logs, so don't have to add that.
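
For readers unfamiliar with how asan logs end up in files, here is a rough sketch of collecting them after a run. It is not the real check-annotate.sh: `dump_asan_logs` is a hypothetical function, and the `asan.log` prefix assumes the tests were run with `ASAN_OPTIONS=log_path=<dir>/asan.log`, which makes the sanitizer write one `asan.log.<pid>` file per reporting process.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real check-annotate.sh: print every ASAN log
# found in a directory. Assumes logs were written with
# ASAN_OPTIONS=log_path=<dir>/asan.log, giving files named asan.log.<pid>.
dump_asan_logs() {
    dir=$1
    found=0
    for log in "$dir"/asan.log.*; do
        [ -f "$log" ] || continue   # the glob matched nothing
        found=$((found + 1))
        echo "==> $log <=="
        cat "$log"
    done
    echo "asan logs found: $found"
}

# Example with a placeholder log in a scratch directory (not real output).
scratch=$(mktemp -d)
echo "placeholder asan report" > "$scratch/asan.log.12345"
dump_asan_logs "$scratch"
rm -rf "$scratch"
```
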

==236154==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7ffff4ecc030 at pc 0x7ffff79ea676 bp 0x7fffffff6f50 sp 0x7fffffff6710
  READ of size 174752 at 0x7ffff4ecc030 thread T0
      #0 0x7ffff79ea675 in memcpy (/usr/lib64/libasan.so.8+0xf5675) (BuildId: c1431025b5d8af781c22c9ceea71f065c547d32d)
      #1 0x7ffff7485b0d in dictresize (/usr/lib64/libpython3.12.so.1.0+0x160b0d) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #2 0x7ffff74862cf in PyDict_SetDefault (/usr/lib64/libpython3.12.so.1.0+0x1612cf) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #3 0x7ffff747e17d in _PyUnicode_InternInPlace (/usr/lib64/libpython3.12.so.1.0+0x15917d) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #4 0x7ffff74938eb in r_object (/usr/lib64/libpython3.12.so.1.0+0x16e8eb) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #5 0x7ffff7493a7a in r_object (/usr/lib64/libpython3.12.so.1.0+0x16ea7a) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #6 0x7ffff7493cc4 in r_object (/usr/lib64/libpython3.12.so.1.0+0x16ecc4) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #7 0x7ffff7493ab9 in r_object (/usr/lib64/libpython3.12.so.1.0+0x16eab9) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #8 0x7ffff7493cac in r_object (/usr/lib64/libpython3.12.so.1.0+0x16ecac) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #9 0x7ffff75237b9 in read_object (/usr/lib64/libpython3.12.so.1.0+0x1fe7b9) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #10 0x7ffff754aa49 in marshal_loads (/usr/lib64/libpython3.12.so.1.0+0x225a49) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #11 0x7ffff7497494 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x172494) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #12 0x7ffff74b5e3a in object_vacall (/usr/lib64/libpython3.12.so.1.0+0x190e3a) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #13 0x7ffff74d5359 in PyObject_CallMethodObjArgs (/usr/lib64/libpython3.12.so.1.0+0x1b0359) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #14 0x7ffff74d4c55 in PyImport_ImportModuleLevelObject (/usr/lib64/libpython3.12.so.1.0+0x1afc55) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #15 0x7ffff749d41b in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x17841b) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #16 0x7ffff7522ef3 in PyEval_EvalCode (/usr/lib64/libpython3.12.so.1.0+0x1fdef3) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #17 0x7ffff75400f6 in builtin_exec (/usr/lib64/libpython3.12.so.1.0+0x21b0f6) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #18 0x7ffff74adfdb in cfunction_vectorcall_FASTCALL_KEYWORDS (/usr/lib64/libpython3.12.so.1.0+0x188fdb) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #19 0x7ffff749be44 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x176e44) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #20 0x7ffff74b5e3a in object_vacall (/usr/lib64/libpython3.12.so.1.0+0x190e3a) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #21 0x7ffff74d5359 in PyObject_CallMethodObjArgs (/usr/lib64/libpython3.12.so.1.0+0x1b0359) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #22 0x7ffff74d4c55 in PyImport_ImportModuleLevelObject (/usr/lib64/libpython3.12.so.1.0+0x1afc55) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #23 0x7ffff749d41b in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x17841b) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #24 0x7ffff7522ef3 in PyEval_EvalCode (/usr/lib64/libpython3.12.so.1.0+0x1fdef3) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #25 0x7ffff75400f6 in builtin_exec (/usr/lib64/libpython3.12.so.1.0+0x21b0f6) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #26 0x7ffff74adfdb in cfunction_vectorcall_FASTCALL_KEYWORDS (/usr/lib64/libpython3.12.so.1.0+0x188fdb) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #27 0x7ffff749be44 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x176e44) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #28 0x7ffff74b5e3a in object_vacall (/usr/lib64/libpython3.12.so.1.0+0x190e3a) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #29 0x7ffff74d5359 in PyObject_CallMethodObjArgs (/usr/lib64/libpython3.12.so.1.0+0x1b0359) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #30 0x7ffff74d4c55 in PyImport_ImportModuleLevelObject (/usr/lib64/libpython3.12.so.1.0+0x1afc55) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #31 0x7ffff749d41b in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x17841b) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #32 0x7ffff7522ef3 in PyEval_EvalCode (/usr/lib64/libpython3.12.so.1.0+0x1fdef3) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #33 0x7ffff75400f6 in builtin_exec (/usr/lib64/libpython3.12.so.1.0+0x21b0f6) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #34 0x7ffff74adfdb in cfunction_vectorcall_FASTCALL_KEYWORDS (/usr/lib64/libpython3.12.so.1.0+0x188fdb) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #35 0x7ffff749be44 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x176e44) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #36 0x7ffff74b5e3a in object_vacall (/usr/lib64/libpython3.12.so.1.0+0x190e3a) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #37 0x7ffff74d5359 in PyObject_CallMethodObjArgs (/usr/lib64/libpython3.12.so.1.0+0x1b0359) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #38 0x7ffff74d4c55 in PyImport_ImportModuleLevelObject (/usr/lib64/libpython3.12.so.1.0+0x1afc55) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #39 0x7ffff749d41b in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x17841b) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #40 0x7ffff7522ef3 in PyEval_EvalCode (/usr/lib64/libpython3.12.so.1.0+0x1fdef3) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #41 0x7ffff75400f6 in builtin_exec (/usr/lib64/libpython3.12.so.1.0+0x21b0f6) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #42 0x7ffff74adfdb in cfunction_vectorcall_FASTCALL_KEYWORDS (/usr/lib64/libpython3.12.so.1.0+0x188fdb) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #43 0x7ffff749be44 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x176e44) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #44 0x7ffff74b5e3a in object_vacall (/usr/lib64/libpython3.12.so.1.0+0x190e3a) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #45 0x7ffff74d5359 in PyObject_CallMethodObjArgs (/usr/lib64/libpython3.12.so.1.0+0x1b0359) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #46 0x7ffff74d4c55 in PyImport_ImportModuleLevelObject (/usr/lib64/libpython3.12.so.1.0+0x1afc55) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #47 0x7ffff749d41b in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x17841b) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #48 0x7ffff7522ef3 in PyEval_EvalCode (/usr/lib64/libpython3.12.so.1.0+0x1fdef3) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #49 0x7ffff75400f6 in builtin_exec (/usr/lib64/libpython3.12.so.1.0+0x21b0f6) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #50 0x7ffff74adfdb in cfunction_vectorcall_FASTCALL_KEYWORDS (/usr/lib64/libpython3.12.so.1.0+0x188fdb) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #51 0x7ffff749be44 in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x176e44) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #52 0x7ffff74b5e3a in object_vacall (/usr/lib64/libpython3.12.so.1.0+0x190e3a) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #53 0x7ffff74d5359 in PyObject_CallMethodObjArgs (/usr/lib64/libpython3.12.so.1.0+0x1b0359) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #54 0x7ffff74d4c55 in PyImport_ImportModuleLevelObject (/usr/lib64/libpython3.12.so.1.0+0x1afc55) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #55 0x7ffff749d41b in _PyEval_EvalFrameDefault (/usr/lib64/libpython3.12.so.1.0+0x17841b) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #56 0x7ffff7522ef3 in PyEval_EvalCode (/usr/lib64/libpython3.12.so.1.0+0x1fdef3) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #57 0x7ffff7547159 in run_eval_code_obj (/usr/lib64/libpython3.12.so.1.0+0x222159) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #58 0x7ffff75418bd in run_mod (/usr/lib64/libpython3.12.so.1.0+0x21c8bd) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #59 0x7ffff755c0f2 in pyrun_file (/usr/lib64/libpython3.12.so.1.0+0x2370f2) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #60 0x7ffff755b3eb in _PyRun_SimpleFileObject (/usr/lib64/libpython3.12.so.1.0+0x2363eb) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #61 0x7ffff755b00e in _PyRun_AnyFileObject (/usr/lib64/libpython3.12.so.1.0+0x23600e) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #62 0x7ffff75537c2 in Py_RunMain (/usr/lib64/libpython3.12.so.1.0+0x22e7c2) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #63 0x7ffff750b06b in Py_BytesMain (/usr/lib64/libpython3.12.so.1.0+0x1e606b) (BuildId: 4eb4ec979b0f0c63567af48fe88d1591c571bad5)
      #64 0x7ffff715e087 in __libc_start_call_main (/usr/lib64/libc.so.6+0x2a087) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
      #65 0x7ffff715e14a in __libc_start_main_alias_2 (/usr/lib64/libc.so.6+0x2a14a) (BuildId: 8f53abaad945a669f2bdcd25f471d80e077568ef)
      #66 0x555555555094 in _start (/usr/bin/python3.12+0x1094) (BuildId: 3382022a8b02ec2cee9e04ba8326741e93cf928e)
  
  Address 0x7ffff4ecc030 is a wild pointer inside of access range of size 0x00000002aaa0.
  SUMMARY: AddressSanitizer: heap-buffer-overflow (/usr/lib64/libasan.so.8+0xf5675) (BuildId: c1431025b5d8af781c22c9ceea71f065c547d32d) in memcpy
  Shadow bytes around the buggy address:
    0x7ffff4ecbd80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecbe00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecbe80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecbf00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecbf80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  =>0x7ffff4ecc000: fa fa fa fa fa fa[fa]fa fa fa fa fa fa fa fa fa
    0x7ffff4ecc080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecc100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecc180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecc200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ffff4ecc280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07 
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  ==236154==ABORTING
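
As a reading aid for the report above, the numbers in it can be cross-checked with a little shell arithmetic. All values are copied from the trace; the only assumption is that the row labels in this shadow dump are application addresses, which is consistent with each row of 16 shadow bytes advancing by 0x80.

```shell
#!/bin/sh
# Values copied from the ASAN report above.
addr=0x7ffff4ecc030   # faulting address
row=0x7ffff4ecc000    # start of the shadow row marked with "=>"
range=0x2aaa0         # "access range of size" from the report

# Each shadow byte covers 8 application bytes, so the bracketed entry
# sits at this zero-based position within the marked row.
index=$(( ($addr - $row) / 8 ))
echo "shadow index in row: $index"    # prints 6

# The access range in decimal, for comparison with "READ of size 174752".
echo "access range bytes: $(($range))"    # prints 174752
```

Index 6 matches the bracketed `[fa]` being the seventh entry of the marked row, and `fa` is listed in the legend as heap left redzone.
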

@chu11

chu11 commented May 13, 2026

Member Author

Closing. We're going to go with adding just a new sharness test, t5001-asan.t, which will be the only thing run under asan.

2 participants