WIP: ci: run asan over sharness #7558
Problem: A test in t2714-python-cli-batch.t assumes a prior test has been executed, but that prior test does not run when ASAN is configured. Set NO_ASAN on the dependent test so that it is skipped along with the test it depends on.
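A minimal sketch of the idea, not the actual sharness code: a test that depends on a skipped test needs the same prerequisite, otherwise it runs without its setup. The helpers `set_prereq`, `have_prereq`, `run_if`, and `run_suite`, the `ASAN_ENABLED` variable, and the two test names are all hypothetical stand-ins for illustration.

```shell
#!/bin/sh
# Hypothetical stand-ins for a prerequisite mechanism like the one sharness
# provides. None of these helpers are the real sharness functions.
PREREQS=" "

set_prereq() {
    PREREQS="${PREREQS}$1 "
}

have_prereq() {
    case "$PREREQS" in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# usage: run_if PREREQ NAME
run_if() {
    if have_prereq "$1"; then
        echo "ok - $2"
    else
        echo "skip - $2 (missing $1)"
    fi
}

# usage: run_suite ASAN_FLAG   (1 means an ASAN build)
run_suite() {
    PREREQS=" "
    # NO_ASAN is only set when ASAN is not in use
    if [ "$1" != "1" ]; then
        set_prereq NO_ASAN
    fi
    # The second test consumes state created by the first, so both
    # carry the same prerequisite and are skipped together.
    run_if NO_ASAN "submit batch job"
    run_if NO_ASAN "check output of batch job"
}

# ASAN_ENABLED is an assumed variable name for this sketch
run_suite "${ASAN_ENABLED:-0}"
```

If only the first test carried `NO_ASAN`, the second would run under ASAN and fail for lack of the state the first one creates.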
Problem: Tests in t0006-module-exec.t and t2614-job-shell-doom.t check how segfaults are reported. Under ASAN, segfault signals may be handled and reported differently, so a specific message cannot be expected. Skip tests that expect a specific message when a segfault occurs.
Problem: Tests in t0006-module-exec.t and t3306-system-routercrash.t simulate a segfault by sending SIGSEGV to crash a module / broker. Under ASAN, the signal may be caught, the expected crash may not happen, and an asan log is generated indicating that a segfault occurred. Follow-up tests depend on the module / broker crashing, so we can't just skip the test. These tests specifically cover SIGSEGV, so we don't want to change the signal unconditionally. Under ASAN only, send SIGKILL instead to crash the broker / module. This ensures follow-on tests continue to work as expected under ASAN, and we still get ample coverage of the SIGSEGV case under non-ASAN workflows.
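The signal selection described above could look roughly like this. It is a sketch under assumptions: `crash_signal` and `ASAN_ENABLED` are hypothetical names, and a background `sleep` stands in for the module / broker being crashed.

```shell
#!/bin/sh
# usage: crash_signal ASAN_FLAG   (1 means an ASAN build)
# Prints the signal name to use when simulating a crash.
crash_signal() {
    if [ "$1" = "1" ]; then
        # ASAN may intercept SIGSEGV; SIGKILL cannot be caught
        echo KILL
    else
        # keep covering the real SIGSEGV path in non-ASAN runs
        echo SEGV
    fi
}

# Demo: a background sleep stands in for the process under test.
sleep 30 &
pid=$!
sig=$(crash_signal "${ASAN_ENABLED:-0}")
kill -s "$sig" "$pid"

# wait reports a non-zero status for a process terminated by a signal
status=0
wait "$pid" 2>/dev/null || status=$?
echo "sent SIG$sig, wait status: $status"
```

Keeping the choice in one helper means the non-ASAN path is untouched, which matches the goal of preserving SIGSEGV coverage there.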
Problem: Several tests are skipped when ASAN is enabled, but have no comments explaining why. Add comments explaining that the tests in t0005-exec.t are skipped because ASAN causes a segfault to be reported differently than we would normally expect. Tests in t0016-cron-faketime.t and t3001-mpi-personalities.t are skipped because they change LD_PRELOAD. Tests in t2714-python-cli-batch.t are skipped because of slowness.
Problem: The 'ps' and 'pkill' commands hang when ASAN is enabled. Unset LD_PRELOAD when running 'ps' or 'pkill'. This effectively disables ASAN for those commands.
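One way to scope the unset to a single command is a subshell wrapper, sketched below. It assumes, as the commit message implies, that the ASAN runtime reaches child processes through LD_PRELOAD. The name `without_preload` is hypothetical.

```shell
#!/bin/sh
# usage: without_preload COMMAND [ARGS...]
# Runs COMMAND with LD_PRELOAD removed from its environment. The subshell
# keeps the caller's LD_PRELOAD intact for everything else.
without_preload() {
    (
        unset LD_PRELOAD
        "$@"
    )
}

# Demo: the child reports whether it can see LD_PRELOAD at all.
without_preload sh -c 'echo "LD_PRELOAD in child: ${LD_PRELOAD+set}"'

# In a test this would wrap the hanging commands, for example:
#   without_preload ps ...
#   without_preload pkill ...
```

The subshell is a deliberate choice here: a plain `unset LD_PRELOAD` in the test script would disable the preload for every later command in that test, not just for `ps` or `pkill`.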
Problem: A few tests fail or hang under ASAN for test-specific reasons. Set NO_ASAN on those tests.
Problem: The asan CI tests only run against unit tests in the src/ directory. This is because tests did not pass or some tests hung. This is no longer the case. Run asan over the entire testsuite under t/. Increase timeouts as the tests are expected to take more time.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##   master   #7558   +/-   ##
=======================================
  Coverage   84.11%   84.11%
=======================================
  Files         569      569
  Lines       96925    96925
=======================================
+ Hits        81524    81531    +7
+ Misses      15401    15394    -7
We could always schedule it to run nightly if we're concerned about holding up PRs.
Was thinking of that! First run here on CI took ~80m. So not horrific ... Edit: sorry didn't notice this ....
Whoa, I did not expect so many errors on CI. It did at least finish.
I'm guessing an issue with a different image + maybe a different asan. Will have to try the fedora image (previous work on #7538 was on RHEL8). With this many errors, and especially so many related to "simple things" (like High level skimming
It does make me pause a moment. If "simple" things like Note:
closing, we're going to go with additions of just a new
Problem: The asan CI tests only run against unit tests in the src/ directory. This is because tests did not pass or some tests hung.
This is no longer the case.
Run asan over the entire testsuite under t/. Increase timeouts as the tests are expected to take more time.
WIP: we'll see how much time asan takes. could adjust those timeouts as needed.
Built on top of #7538
Edit: I did not add any log file checks to check-annotate.sh yet. Wanted to see how far this builder went first. It's possible it'll take so long we will decide not to run.