Fix ASAN issues in sharness tests#7538
Conversation
11d4f72 to
e84367c
Compare
grondo
left a comment
There was a problem hiding this comment.
LGTM! However, since the testsuite still doesn't run under ASan, I'd say keep the PR open and wait until we have a fully passing make check with ASan enabled, then title the PR "testsuite: allow entire testsuite to run with address sanitizers enabled"
| grep "size=2 nodes=2" flux-${id4}.out && | ||
| grep "size=2 nodes=2" flux-${id5}.out | ||
| ' | ||
| test_expect_success MULTICORE 'flux batch: exclusive flag worked' ' |
There was a problem hiding this comment.
Commit message is difficult to parse: perhaps
t: set NO_ASAN on tests that depend on ASAN-skipped tests
| grep "^test two commands" help2.out && | ||
| grep "a test two" help2.out | ||
| ' | ||
| # N.B. Set NO_ASAN, as there appears to be errors in 'man', which is executed |
There was a problem hiding this comment.
Commit message suggestion: Remove interesting/quirky from problem description - "test specific reasons" is sufficient to fully describe the problem being fixed.
|
Uhoh, the restart check is failing now that we're actually generating a restart dump from the last tag. I highly doubt we've broken restarts since v0.84.0, so this is probably a testing issue. |
|
The test passed locally, so I'm not sure what went wrong. Unfortunately there was no detailed |
e84367c to
494c415
Compare
79d97fe to
c5951f2
Compare
|
just re-pushed, this lot of fixes
Note: exception with issue #7556, which we can workaround in CI / workflow we setup Note that I did not setup asan CI workflow to run sharness. We can do that in another PR. Due to the fact ASAN takes a long time to run, it's possible we may want to debate if we want to run in CI or not. Or perhaps we want to architect it to run a unique way in CI (i.e. split it up into multiple runners so it collectively runs faster). |
2ff9148 to
892f8c6
Compare
|
side note, #7558 shows that asan on other systems (maybe other versions of libasan) may have troubles. So perhaps we should consider this PR one that fixes asan issues so that sharness can succeed on rhel8 w/ asan. |
Problem: A test in t2714-python-cli-batch.t assumes a prior test has been executed. But that prior test will not run if ASAN is configured. Set NO_ASAN on the test to ensure it isn't executed like prior dependent ones.
Problem: Tests in t0006-module-exec.t and t2614-job-shell-doom.t test how segfaults are reported. Under ASAN, segfault signals may be handled / reported differently. Skip tests that expect a specific message when a segfault occurs. Under ASAN, that specific message cannot be expected.
Problem: A test in t0006-module-exec.t and t3306-system-routercrash.t simulate a segfault by sending a SIGSEGV signal to crash a module / broker. With ASAN, the signal could be captured, the expected result may not happen, and an asan log will be generated indicating a segfault happened. Follow up tests depend on the SIGSEGV crashing a module / broker, so we can't just skip the test. These tests are specifically covering SIGSEGV, so we don't want to just change the signal. Under ASAN, instead send a SIGKILL to crash the broker / module. This will ensure follow on tests continue to work as expected under ASAN. We will still get ample coverage of the SIGSEGV case under non-ASAN workflows.
Problem: Several tests are skipped when ASAN is enabled, but they have no comments / explanation why. Add comments explaining that the tests in t0005-exec.t are skipped because ASAN causes a segfault to be reported differently than we normally would expect. Tests in t0016-cron-faketime.t and t3001-mpi-personalities.t are skipped because they change LD_PRELOAD. Tests in t2714-python-cli-batch.t are skipped because of slowness.
Problem: Some tests file under ASAN due to problems with underlying commands. This can be worked around if ASAN was not set on those specific commands. Add a no-asan-wrapper.sh script. The simple script will unset LD_PRELOAD and ASAN_OPTIONS before running a command.
Problem: The 'ps' and 'pkill' command hangs when ASAN is enabled. Run 'ps' and 'pkill' under the no-asan script.
Problem: The 'dd' command has shown to have problems when run under asan. Wrap all calls to 'dd' under the no-asan-wrapper.sh script. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem: A few tests fail or hang under ASAN for test specific reasons. Set NO_ASAN on those tests.
892f8c6 to
c52774f
Compare
|
re-pushed with a lot more fixes, most notably to make sharness pass on fedora40 with asan, not counting the python tests that fail with #7624. The big change is the addition of a new util script And for problematic commands, we call I went back on forth if I should replace I settled on doing "$NOASAN dd". It was partially influenced by the fact I didn't want to make a mountain of |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7538 +/- ##
==========================================
+ Coverage 83.99% 84.01% +0.02%
==========================================
Files 576 576
Lines 97215 97215
==========================================
+ Hits 81651 81678 +27
+ Misses 15564 15537 -27 🚀 New features to boost your workflow:
|
Fix various ASAN failures in sharness tests. Most fixes are by adding NO_ASAN to tests, in which ASAN leads to unexpected behavior changes.
One use-after-free was corrected in libterminus.(Edit: Changed my mind, will move that fix to #7539)Note that a number of hangs still exist in sharness with ASAN. Those are being dealt with separately.