Skip to content

Fix rtt_target blocking panic handler.#491

Merged
jordens merged 6 commits intoquartiq:mainfrom
reitermarkus:rtt-target-panic-handler
Apr 8, 2026
Merged

Fix rtt_target blocking panic handler.#491
jordens merged 6 commits intoquartiq:mainfrom
reitermarkus:rtt-target-panic-handler

Conversation

@reitermarkus
Copy link
Copy Markdown
Contributor

@reitermarkus reitermarkus commented Apr 28, 2025

In release mode, the panic_persist::report_panic_info was never reached because the panic handler would block when writing to the RTT target (since none is connected).

Only set BlockIfFull in debug mode, and replace unsafe UpChannel::conjure with with_terminal_channel while we're at it.

@jordens
Copy link
Copy Markdown
Member

jordens commented Apr 28, 2025

Why would this only be triggered in release?

IIRC the unsafe conjure is there to avoid deadlock/borrow failure. with_terminal_channel looks like it would panic when nested.

@jordens
Copy link
Copy Markdown
Member

jordens commented Apr 28, 2025

And a hang on panic doesn't sound too bad here. IIRC the watchdog should kick in.
I can see the peristent panic stuff being moved earlier though.

@reitermarkus
Copy link
Copy Markdown
Contributor Author

Why would this only be triggered in release?

Right, it isn't. It's triggered when no debug probe is connected. However there is already an assumption in the panic_handler that when debug_assertions are disabled, the device should reset rather than loop.

And a hang on panic doesn't sound too bad here. IIRC the watchdog should kick in.

The watchdog does indeed kick in, but I'd like the device to immediately reset without delay rather than wait for 4 seconds for the watchdog.

Maybe adding an rtt-target feature that can be disabled is the better option?

@jordens
Copy link
Copy Markdown
Member

jordens commented Apr 29, 2025

Ack the rtt feature.

  • Ensure that with rtt we don't enable the watchdog. It makes debugging difficult
  • Early panic persist
  • Blocking panic dump over rtt with unsafe conjure (or argue that there is no deadlock).

The reboot on panic never actually solves any problem or bug. I'm fine with keeping it as is. It seems slightly better than the alternative. But I won't spend time on "optimizing the panic experience". Rather fix the cause of the panic.

@reitermarkus reitermarkus force-pushed the rtt-target-panic-handler branch 2 times, most recently from e9ec34b to 8c6c052 Compare April 29, 2025 09:20
Comment thread src/hardware/platform.rs

#[cfg(debug_assertions)]
loop {}
cortex_m::asm::bkpt();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this, combined with watchdog.stop_on_debug(...), enough, or do you additionally want to completely disable the watchdog when the rtt feature is enabled?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Just watchdog.stop_on_debug is fine. Good find.

@reitermarkus
Copy link
Copy Markdown
Contributor Author

@jordens, anything still missing here?

@reitermarkus reitermarkus requested a review from jordens August 6, 2025 10:22
@jordens jordens added this pull request to the merge queue Aug 6, 2025
@jordens
Copy link
Copy Markdown
Member

jordens commented Aug 6, 2025

The review request was.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Aug 6, 2025
@jordens jordens added this pull request to the merge queue Aug 6, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 6, 2025
@reitermarkus reitermarkus requested a review from jordens August 13, 2025 10:21
Copy link
Copy Markdown
Member

@jordens jordens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linker script needs fixing AFAICT

@reitermarkus
Copy link
Copy Markdown
Contributor Author

Seems like rust-lld got more strict about whitespace. Should be fixed now.

@reitermarkus reitermarkus requested a review from jordens August 18, 2025 10:27
@jordens jordens enabled auto-merge August 18, 2025 11:10
@jordens jordens added this pull request to the merge queue Aug 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 18, 2025
@jordens
Copy link
Copy Markdown
Member

jordens commented Aug 18, 2025

+ probe-rs download --chip STM32F407ZGTx --log-file /dev/null --probe 0483:3754:003C002F5632500A20313236 target/thumbv7em-none-eabihf/release/booster
     Finished in 49.05s
+ probe-rs reset --chip STM32F407ZGTx --log-file /dev/null --probe 0483:3754:003C002F5632500A20313236 --connect-under-reset
 WARN probe_rs::architecture::arm::core::armv7m: Core is running, but we expected it to be halted
+ sleep 30
+ ping -c 5 -w 20 booster-hitl
+ python3 -m miniconf --broker mqtt.ber.quartiq.de --discover dt/sinara/booster/+ /telemetry_period=5
+ python3 hitl/basic.py --broker mqtt.ber.quartiq.de --prefix dt/sinara/booster/+
Traceback (most recent call last):
  File "/home/runner/actions-runner/_work/hitl/hitl/hitl/basic.py", line 45, in channel_on
    await booster.miniconf.set(f'/channel/{channel}/state', initial_state)
  File "/home/runner/actions-runner/_work/hitl/hitl/vpy/lib/python3.9/site-packages/miniconf/miniconf.py", line 144, in set
    ret = await self._do(
  File "/home/runner/actions-runner/_work/hitl/hitl/vpy/lib/python3.9/site-packages/miniconf/miniconf.py", line 134, in _do
    return await fut
miniconf.miniconf.MiniconfException: ('Error', 'Variant absent (depth: 2)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/actions-runner/_work/hitl/hitl/hitl/basic.py", line 162, in <module>
    main()
  File "/home/runner/actions-runner/_work/hitl/hitl/hitl/basic.py", line 151, in main
    sys.exit(loop.run_until_complete(test()))
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/runner/actions-runner/_work/hitl/hitl/hitl/basic.py", line 147, in test
    await test_channel(booster, channel, tele.messages)
  File "/home/runner/actions-runner/_work/hitl/hitl/hitl/basic.py", line 63, in test_channel
    async with channel_on(booster, channel, 'Powered'):
  File "/usr/lib/python3.9/contextlib.py", line 175, in __aenter__
    return await self.gen.__anext__()
  File "/home/runner/actions-runner/_work/hitl/hitl/hitl/basic.py", line 49, in channel_on
    await booster.miniconf.set(f'/channel/{channel}/state', 'Off')
miniconf.miniconf.MiniconfException: ('Error', 'Variant absent (depth: 2)')
+ finish
+ mosquitto_pub -h mqtt.ber.quartiq.de -t cmnd/tasmota_4F64D0/POWER -m OFF
Error: Process completed with exit code 1.

@reitermarkus reitermarkus force-pushed the rtt-target-panic-handler branch from 63b24b7 to d9501af Compare March 11, 2026 11:52
@reitermarkus
Copy link
Copy Markdown
Contributor Author

@jordens, I finally got around to testing this again. I ran the HITL tests locally but couldn't reproduce the error you posted. I did have to comment out the test_bias step though, since I tested this on a Bosster v1.6 HL.

I saw the miniconf version for Python was not in sync with the Rust version, so maybe that caused it?

@reitermarkus reitermarkus force-pushed the rtt-target-panic-handler branch from d9501af to b3d3f90 Compare March 11, 2026 12:11
@jordens jordens marked this pull request as ready for review March 13, 2026 08:48
@jordens
Copy link
Copy Markdown
Member

jordens commented Mar 13, 2026

Could you dummy-update the branch? CI got disabled and doesn't pick this up until it's updated.

@reitermarkus reitermarkus force-pushed the rtt-target-panic-handler branch from b3d3f90 to e4d533f Compare March 16, 2026 08:11
@reitermarkus reitermarkus requested a review from jordens March 16, 2026 09:47
@jordens jordens added this pull request to the merge queue Mar 16, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 16, 2026
@jordens
Copy link
Copy Markdown
Member

jordens commented Mar 16, 2026

With newer miniconf in this context it should target the specific booster by id, not discover by wildcard. I think we do it similarly over in stabilizer hitl.

+ python3 -m miniconf --broker mqtt.ber.quartiq.de --discover dt/sinara/booster/+ /telemetry_period=5
Traceback (most recent call last):
  File "/home/runner/actions-runner/_work/hitl/hitl/vpy/lib/python3.9/site-packages/miniconf/discover.py", line 78, in discover_one
    (device,) = devices.items()
ValueError: too many values to unpack (expected 1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/actions-runner/_work/hitl/hitl/vpy/lib/python3.9/site-packages/miniconf/__main__.py", line 128, in <module>
    main()
  File "/home/runner/actions-runner/_work/hitl/hitl/vpy/lib/python3.9/site-packages/miniconf/__main__.py", line 124, in main
    asyncio.run(run())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/runner/actions-runner/_work/hitl/hitl/vpy/lib/python3.9/site-packages/miniconf/__main__.py", line 76, in run
    prefix, _alive = await discover_one(client, args.prefix)
  File "/home/runner/actions-runner/_work/hitl/hitl/vpy/lib/python3.9/site-packages/miniconf/discover.py", line 80, in discover_one
    raise MiniconfException(
miniconf.miniconf.MiniconfException: ('Discover', "No unique Miniconf device (found `{'dt/sinara/booster/80-34-28-1a-8e-76': 0, 'dt/sinara/booster/80-34-28-1b-45-41': 1}`).")

@reitermarkus
Copy link
Copy Markdown
Contributor Author

Ah, so the CI environment has two boosters and fails because it only expects one?

Which of them should be used then, 80-34-28-1a-8e-76 or 80-34-28-1b-45-41?

@reitermarkus reitermarkus requested a review from jordens March 16, 2026 14:56
@jordens jordens added this pull request to the merge queue Mar 23, 2026
Copy link
Copy Markdown
Member

@jordens jordens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try. But I'm pretty sure it's the live one. 80-34-28-1b-45-41

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 23, 2026
@jordens
Copy link
Copy Markdown
Member

jordens commented Mar 23, 2026

+ probe-rs download --chip STM32F407ZGTx --log-file /dev/null --probe 0483:3754:003C002F5632500A20313236 target/thumbv7em-none-eabihf/release/booster
     Finished in 56.78s
+ probe-rs reset --chip STM32F407ZGTx --log-file /dev/null --probe 0483:3754:003C002F5632500A20313236 --connect-under-reset
 WARN probe_rs::architecture::arm::core::armv7m: Core is running, but we expected it to be halted
+ sleep 30
+ ping -c 5 -w 20 booster-hitl
+ python3 -m miniconf --broker mqtt.ber.quartiq.de --discover dt/sinara/booster/80-34-28-1a-8e-76 /telemetry_period=5
##[debug]RESULT='Unknown'
Error: The action 'Execute Testing' has timed out after 5 minutes.
##[debug]Finishing: Execute Testing

Comment thread hitl/run.sh Outdated
@reitermarkus reitermarkus force-pushed the rtt-target-panic-handler branch from 7ccd1d1 to 5f72c44 Compare April 8, 2026 11:36
@reitermarkus reitermarkus requested a review from jordens April 8, 2026 11:36
@jordens jordens added this pull request to the merge queue Apr 8, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2026
@jordens
Copy link
Copy Markdown
Member

jordens commented Apr 8, 2026

It times out in the calibration. Let's just calibrate on two channels.

...
+ ping -c 5 -w 20 booster-hitl
+ python3 -m miniconf --broker mqtt.ber.quartiq.de --discover dt/sinara/booster/80-34-28-1b-45-41 /telemetry_period=5
+ python3 hitl/basic.py --broker mqtt.ber.quartiq.de --prefix dt/sinara/booster/80-34-28-1b-45-41
##[debug]RESULT='Unknown'
Error: The action 'Execute Testing' has timed out after 5 minutes.
##[debug]Finishing: Execute Testing

But I've also seen it timeout in a different way. This may be a different problem:

...
Vgs = -1.311 V, Ids = 51.67 mA
Vgs = -1.312 V, Ids = 51.47 mA
Vgs = -1.313 V, Ids = 51.08 mA
Vgs = -1.314 V, Ids = 50.89 mA
Vgs = -1.315 V, Ids = 50.50 mA
Vgs = -1.316 V, Ids = 50.30 mA
Vgs = -1.317 V, Ids = 49.91 mA
Channel 0 bias tuning: Vgs = -1.317, Ids = 0.04991279
Commanding channel 0 off
Setting output interlock threshold to 30 dB
Commanding channel 0 into Enabled
Commanding channel 0 off
Traceback (most recent call last):
  File "/home/rj/src/booster/hitl/basic.py", line 162, in <module>
    main()
    ~~~~^^
  File "/home/rj/src/booster/hitl/basic.py", line 151, in main
    sys.exit(loop.run_until_complete(test()))
             ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/usr/lib/python3.13/asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/home/rj/src/booster/hitl/basic.py", line 147, in test
    await test_channel(booster, channel, tele.messages)
  File "/home/rj/src/booster/hitl/basic.py", line 89, in test_channel
    await periodic_check(is_enabled, timeout=5)
  File "/home/rj/src/booster/hitl/basic.py", line 31, in periodic_check
    raise Exception('Condition did not occur within expected timeout')
Exception: Condition did not occur within expected timeout

@reitermarkus reitermarkus requested a review from jordens April 8, 2026 14:50
@jordens jordens added this pull request to the merge queue Apr 8, 2026
Merged via the queue into quartiq:main with commit 93b0766 Apr 8, 2026
3 checks passed
@reitermarkus reitermarkus deleted the rtt-target-panic-handler branch April 8, 2026 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants