i2c::slave: implement embedded-mcu-hal i2c::target trait#565
i2c::slave: implement embedded-mcu-hal i2c::target trait#565felipebalbi wants to merge 22 commits into
Conversation
Adds embedded_mcu_hal::i2c::target::blocking::I2c and ::asynch::I2c trait impls to the existing I2cSlave driver. The inherent listen / respond_to_write / respond_to_read API and the Command / Response enums are preserved verbatim — the trait impls are layered on top via private helpers that return richer InternalEvent / InternalTermination enums. Both flavours are implemented twice, once for SevenBitAddress and once for TenBitAddress; a runtime address-mode check returns ErrorKind::Other when the wrong impl is driven against the configured address. Adds a public recover() method on both I2cSlave<'_, Blocking> and I2cSlave<'_, Async> that aborts any in-flight DMA, NAKs a pending byte, disables DMA arming, clears the deselect latch, and drops any pending remediation flags. The configured address(es) and slven bit are preserved across recover, matching the trait contract. Surfaces a RepeatedStart edge from a previous respond_to_* as a separate listen event for the trait impls (via the new PendingEdge bookkeeping field). The inherent API ignores this field, so existing callers are unaffected. A probe (0-byte write) surfaces as Request::Stop(addr) with no preceding Write — documented in the module rustdoc. Adds two rt685s-evk examples that drive the new trait API end-to-end: i2c-slave-async-target-trait.rs and i2c-slave-target-trait.rs. Pin-out matches the existing i2c-slave-async.rs example so the Pi5 master test rig referenced in that example works unchanged. Includes a host-side unit test module covering the Error::kind() mapping for every TransferError variant + UnsupportedConfiguration, and the AddressKind helper round-trips. Tests compile via cargo check --tests; running them requires the same out-of-tree test harness setup the rest of the crate already needs (cortex-m-rt is pulled in unconditionally). Bumps the embedded-mcu-hal dependency from 0.2.0 to 0.3.0, which is the first release containing the i2c::target trait module.
The async respond_to_read_internal poll_fn was missing the symmetric counterpart of the write path's '!dma_channel.is_active() && is_slave_receive()' wake condition. When DMA finished pushing the supplied buffer but the master kept clocking SCL for more bytes, the hardware did not raise slvpending (it just floated SDA so the master clocked 0xFF from the bus pull-ups), and the future slept until something external aborted the transaction. Symptom from a 16-byte controller read against an 8-byte slave buffer: the host received [0..7, 0xFF*8] after a ~9 s timeout instead of [0..7, 0..7], and the slave never logged NeedMore. Add the missing wake arm to poll_fn and the matching post-await branch that returns InternalTermination::Continued(buf_len) → ReadStatus::NeedMore so the caller can refill the buffer and resume the read.
Cargo Vet Audit Passed
|
There was a problem hiding this comment.
Pull request overview
Implements the embedded_mcu_hal::i2c::target I2C target/slave traits for the existing I2cSlave driver (blocking + async, 7-bit + 10-bit), adds a recover() API to re-baseline the peripheral after cancelled/wedged transfers, and introduces new on-target examples demonstrating the trait-based API. Also bumps embedded-mcu-hal to 0.3.0 to pick up the new target trait module.
Changes:
- Add
embedded_mcu_hal::i2c::target::{blocking, asynch}::I2cimplementations forI2cSlave(7-bit and 10-bit) and supporting internal event/termination plumbing. - Add
recover()for both blocking and async flavours, plus pending-edge bookkeeping to surfaceRepeatedStartvia the trait API. - Add two rt685s-evk examples using the target trait API; bump
embedded-mcu-haldependency to 0.3.0 (and lockfiles accordingly).
Reviewed changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/i2c/slave.rs |
Adds target-trait implementations, internal event/termination mapping, pending edge tracking, and recover() for blocking/async. |
examples/rt685s-evk/src/bin/i2c-slave-target-trait.rs |
New blocking example exercising the target trait API. |
examples/rt685s-evk/src/bin/i2c-slave-async-target-trait.rs |
New async example exercising the target trait API. |
examples/rt685s-evk/Cargo.toml |
Bumps embedded-mcu-hal to 0.3.0 for the examples workspace. |
examples/rt685s-evk/Cargo.lock |
Lockfile updates due to the dependency bump. |
Cargo.toml |
Bumps embedded-mcu-hal to 0.3.0 for the main crate. |
Cargo.lock |
Lockfile updates due to the dependency bump. |
Runs cargo vet to a clean 'Vetting Succeeded (111 fully audited, 1 partially audited, 9 exempted)' state on top of the embedded-mcu-hal 0.2.0 -> 0.3.0 bump and the dependency churn since the last vet pass. Audits added: - embedded-mcu-hal 0.2.0 -> 0.3.0 — first-party delta covering the new i2c::target trait module (which I authored) and the new smbus::bus module. Trait-only additions, no new unsafe, no new transitive deps. - defmt-rtt 1.0.0 -> 1.1.0 — single new public query function in_blocking_mode() (relaxed atomic read of the RTT up-channel flags). Pulled in transitively via the ariel-os import set. - typenum 1.20.0 -> 1.20.1 — macro hygiene fix using \\::tarr![...]\ inside the recursive expansion; doc_root URL bump. Trust entries added (publishers cargo-vet already covers via other audit sets): - dtolnay → syn, thiserror, thiserror-impl - joshlf → zerocopy, zerocopy-derive (already on the OpenDevicePartnership trust list) - cuviper → autocfg - KodrAus → log - rust-lang-owner → cc (matches existing libc trust entry) Imports added: - ariel-os — gives us the upstream defmt-rtt 1.0.0 -> 1.1.0 audit instead of the previously-unavailable 1.0.0 -> 1.2.0 jump. Exemption changes: - chrono bumped 0.4.40 -> 0.4.44 (430-line diff, same djc publisher). - generator bumped 0.8.8 -> 0.8.9 (loom dev-dep, safe-to-run only). - cc bumped 1.2.59 -> 1.2.63 (build-time only, GitHub Actions publisher). - New shlex 2.0.1 exemption (transitive build-only dep of cc). - Other previously-exempted crates (az, bitfield, cordyceps, embassy-embedded-hal, embassy-futures, embassy-hal-internal, embassy-sync, embassy-time-driver, embedded-hal*, embedded-io-async, embedded-storage*, find-msvc-tools, fixed, hash32, heapless, ident_case, mimxrt600-fcb, portable-atomic, syn, thiserror*, tracing*, typenum 1.18.0, valuable) were pruned automatically by cargo vet because they are now covered by trust entries, imports, or audits.
kurtjd
left a comment
There was a problem hiding this comment.
There seems to be quite a bit of code duplication. Looks like you've defined several *_inner methods which are mostly copy and paste of the existing methods they mirror with slightly different return types to be used by the trait impls. Is it not possible to just use the exusting methods and just map their output in the trait impls instead of duplicating code?
| // hardware does not raise slvpending once DMA is armed and | ||
| // running dry — SDA just floats and the master clocks 0xFF | ||
| // until something else aborts the transaction. | ||
| if !dma_channel.is_active() && stat.slvstate().is_slave_transmit() { |
There was a problem hiding this comment.
I had to drop the partial read condition from the original implementation due to a HW race condition, in stress testing, the slv_state() would go to addressed instead of transmitted so the async operation gets stuck:
felipebalbi@62dc59c, and I cannot find a workaround even after working with NXP.
There was a problem hiding this comment.
The change here is on the read side, not write. Haven't seen this particular race condition. I'll try to add a quick test to my hacky tests (with pico de gallo).
There was a problem hiding this comment.
Yeah, this is in respond_to_read_internal(). I was hitting this race condition consistently in long term stressing test with release binary.
There was a problem hiding this comment.
unable to hit it. Has been running for quite a while now.
There was a problem hiding this comment.
I'll bring Mole tomorrow for testing, I guess it's really the only way to give this a solid shot. Really cannot reproduce with pico de gallo.
There was a problem hiding this comment.
Mole has been running since 6AM today, no errors.
There was a problem hiding this comment.
Running i2c-loopback-async.rs with this changes to trigger case where i2c target does not provide enough data on single call of respond_to_read() on IMXRT in loopback mode produces a panic:
diff --git a/examples/rt685s-evk/src/bin/i2c-loopback-async.rs b/examples/rt685s-evk/src/bin/i2c-loopback-async.rs
index fbdbf50..1086100 100644
--- a/examples/rt685s-evk/src/bin/i2c-loopback-async.rs
+++ b/examples/rt685s-evk/src/bin/i2c-loopback-async.rs
@@ -16,7 +16,7 @@ const ADDR: u8 = 0x20;
const MASTER_BUFLEN: usize = 8;
// slave buffer has to be 1 bigger than master buffer because master does not
// handle end of read properly
-const SLAVE_BUFLEN: usize = MASTER_BUFLEN + 1;
+const SLAVE_BUFLEN: usize = MASTER_BUFLEN / 2;
const SLAVE_ADDR: Option<Address> = Address::new(ADDR);
bind_interrupts!(struct Irqs {
9140.448752 [INFO ] Read @ "SevenBit(32)" (i2c_loopback_async src/bin/i2c-loopback-async.rs:43)
9140.448841 [INFO ] Response to read got 4 bytes, more bytes to fill (i2c_loopback_async src/bin/i2c-loopback-async.rs:54)
9140.448934 [INFO ] Response to read got 4 bytes, more bytes to fill (i2c_loopback_async src/bin/i2c-loopback-async.rs:54)
9140.485323 [ERROR] panicked at src/bin/i2c-loopback-async.rs:45:68:
called `Result::unwrap()` on an `Err` value: Transfer(WriteFail) (panic_probe panic-probe-1.0.0/src/lib.rs:104)
Firmware exited unexpectedly: Exception
WARN probe_rs_debug::debug_info: UNWIND: Error while checking for exception context. The stack trace will not include the calling frames. : Probe(Register("No Stack Pointer register. Please report this as a bug."))
Core 0
Frame 0: HardFault_ @ 0x08007a7e
/home/jxie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/cortex-m-rt-0.7.5/src/lib.rs:1103:1
Frame 1: UNWIND: Error while checking for exception context. The stack trace will not include the calling frames. : Probe(Register("No Stack Pointer register. Please report this as a bug.")) @ 0x08007a7e
The line that triggered this error
// Verify that we are ready for transmit
if !i2c.stat().read().slvstate().is_slave_transmit() {
return Err(TransferError::WriteFail.into());
}
This corresponds to the hardware race condition that we engaged with NXP before:
We are seeing some inconsistencies on the SLVSTATE behavior:
* In some instance, for I2C writes where slave is in transmit mode. After DMA is completed, we get the DMA interrupt. Master has not NACK any data. SLVSTATE is addressed (0x0) instead of transmit (0x2).
* In some instances, for I2C writes where slave is in transmit mode. After DMA is completed, we get the DMA interrupt. Master is generating NACK and STOP with a little bit of delay. Initially we observed SLVSTATE is set to transmit (2) and SLVSEL is set, then SLVSTATE changes to addressed (0) briefly and SLVSEL is set, then finally SLVSTATE stays as addressed and then SLVDESEL is set and SLVSEL is cleared.
We don't have a 100% guarantee that SLVSTATE will be in transmit
poll_fn(|cx| {
let i2c = self.info.regs;
self.info.waker.register(cx.waker());
dma_channel.get_waker().register(cx.waker());
let stat = i2c.stat().read();
// Master sent a stop or nack
if stat.slvdesel().is_deselected() {
return Poll::Ready(());
}
// We need SW intervention
if stat.slvpending().is_pending() {
return Poll::Ready(());
}
// DMA drained but the master still expects more bytes.
// Without this guard the future sleeps forever because the
// hardware does not raise slvpending once DMA is armed and
// running dry — SDA just floats and the master clocks 0xFF
// until something else aborts the transaction.
if !dma_channel.is_active() && stat.slvstate().is_slave_transmit() { <-- this slvstate is not guaranteed to be in the state we expect
return Poll::Ready(());
}
Poll::Pending
})
.await;
Maybe we should consider not supporting Response::ReadContinued(xfer_count) case on imxrt?
I'll do it, but it will be a breaking change for the existing API due to |
Addresses the duplication kurtjd flagged on the PR by promoting the inherent Command and Response types into the rich shape and letting the trait impls map down from them. The state-machine bodies that used to be split between the inherent methods and four _internal helpers are now the single source of truth on the inherent path; the target trait impls are thin wrappers.
Public API changes (both enums now #[non_exhaustive] so future variants do not break callers):
Command::Probe { addr: Address } // was: Command::Probe
Command::Read { addr: Address } // was: Command::Read
Command::Write { addr: Address } // was: Command::Write
Response::Stopped(usize) // STOP terminator
Response::Restarted(usize) // Sr terminator
Response::WriteContinued(usize) // write side: buffer drained, master still sending
Response::ReadContinued(usize) // read side: buffer exhausted, master still clocking
Response::ReadComplete(usize) // buffer exactly consumed at NACK+STOP
Response::ReadEarlyStop(usize) // master STOP'd with bytes left in buffer
Plus convenience helpers Response::bytes() and Response::is_terminal() so callers that just want the byte count and a yes/no don't have to match six variants by hand.
Reuses the existing public Address enum (SevenBit(u8) / TenBit(u16)) for the Command address payload instead of introducing a new AddressKind type. AddressKind, InternalEvent, InternalTermination, write_termination_to_status, read_termination_to_status, and all four *_internal helpers (blocking + async × listen/respond_to_write/respond_to_read) are deleted.
The pending: Cell<PendingEdge> field stays — it carries the RepeatedStart edge from a previous respond_to_* to the next target-trait listen() call so the trait surfaces Request::RepeatedStart as its own event. The inherent API folds the restart edge into the preceding Response::Restarted instead and never reads pending.
Trait impls disambiguate inherent-vs-trait method calls via small private _impl shims (recover_impl, listen_impl, respond_to_write_impl, respond_to_read_impl). The public inherent methods delegate to those shims so the inherent surface stays public; the trait impls call the shims directly so resolution is unambiguous when both I2c<SevenBitAddress> and I2c<TenBitAddress> are in scope.
Inherent recover() stays pub on both flavours (Blocking sync, Async async).
Slave-side examples updated:
- i2c-slave.rs / i2c-slave-async.rs / i2c-loopback-async.rs / i2c-loopback-async-10bit.rs — match on Command::{Probe,Read,Write} { addr } and use Response::is_terminal() + Response::bytes() in the respond loops.
- i2c-slave-async-target-trait.rs — comment updated to reflect that the inherent API now surfaces probes as Command::Probe { addr } too (not collapsed).
Verification:
- cargo build --target thumbv8m.main-none-eabihf with 5 representative feature combos + 8 minimal sample combos: clean.
- cargo build --release for both mimxrt685s and mimxrt633s with full features: clean.
- cargo clippy --locked -- -Dwarnings on both -os-timer and -rtc time-driver feature combos: clean.
- cargo clippy --locked -- -Dwarnings -D clippy::style on the examples workspace: clean.
- cargo check --tests: clean (3 new unit tests cover Response helpers and cmd_addr widening).
- cargo +nightly fmt --check: clean on slave.rs and on every touched example.
- cargo vet: still 'Vetting Succeeded (111 fully audited, 1 partially audited, 9 exempted)'.
Net delta on src/i2c/slave.rs: -236 lines.
Replaces the four private *_impl wrappers (listen_impl, respond_to_write_impl, respond_to_read_impl, recover_impl) on both Blocking and Async with direct UFCS calls in the trait impl bodies:
I2cSlave::<Blocking>::listen(self)
I2cSlave::<Async>::respond_to_read(self, buf).await
I2cSlave::<Async>::recover(self).await
The shims existed only to disambiguate inherent-vs-trait method resolution when both I2c<SevenBitAddress> and I2c<TenBitAddress> impls are in scope. Spelling the inherent receiver type explicitly via UFCS removes that ambiguity at the call site without an extra layer of indirection. The public inherent methods (listen, respond_to_write, respond_to_read, recover) are now the only definitions; the trait impls call them directly.
Verification: cargo build (debug + release, mimxrt685s + mimxrt633s, full and minimal feature combos), cargo clippy --locked -- -Dwarnings on both -os-timer and -rtc time-driver flavours, examples workspace clippy --locked with style/perf/correctness lints, cargo check --tests, cargo +nightly fmt --check, cargo vet --locked --frozen — all clean.
Net delta on src/i2c/slave.rs: -43 lines (1436 -> 1393); -279 lines vs the original PR baseline.
CI's cargo-vet bumped from 0.10.1 to 0.10.2, which has stricter TOML string-literal normalisation: single-quoted literal strings are preferred over double-quoted-with-escaping for notes that contain bare double quotes, and triple-quoted multi-line strings whose interior contains no special characters get unescaped. All twelve affected lines are inside notes the Google audit aggregator imports verbatim (proc-macro-error / proc-macro-error-attr / proc-macro2 review notes); no audit semantics or trust entries change. Generated by running 'cargo vet fmt' against the existing store.
CI's nightly rustfmt (1.9.0-nightly, rustc 1.98.0-nightly 6bdf43094 2026-06-01) prefers one-import-per-line for the throwaway 'use _ as _;' bindings used to anchor defmt-rtt, panic-probe, and embassy-imxrt-examples into the binary. The older nightly that was pinned locally collapsed those into braced groups; the current CI nightly expands them again. Pure formatter output across 47 example binaries + examples/rt633/src/lib.rs. No behavioural changes. Generated by running 'cargo +nightly fmt' in examples/rt685s-evk and examples/rt633 with rustfmt 1.9.0-nightly (6bdf43094f 2026-06-01).
The embedded_mcu_hal::i2c::target impls already check the configured address mode in listen(), but respond_to_read() and respond_to_write() trusted the caller to go through listen() first. A misbehaving caller that drives the wrong trait impl (e.g. I2c<SevenBitAddress> on a 10-bit-configured slave) directly into respond_to_* would have skipped the guard and driven the peripheral with mismatched semantics. Add self.require_address_kind(want_ten_bit)? as the first statement of every respond_to_read() and respond_to_write() across all four trait impls (Blocking SevenBit/TenBit, Async SevenBit/TenBit). The guard is O(1) — one Option::is_some + one equality compare — and returns Error::UnsupportedConfiguration on mismatch, which maps to ErrorKind::Other at the trait surface (per the existing Error::kind() implementation tested in error_kind_mapping_covers_every_transfer_variant). Of the eight respond_to_* methods, one (Blocking SevenBitAddress::respond_to_read) was already guarded by the auto-merged Copilot autofix in 273ec2e; this commit adds the remaining seven. The result is uniform address-mode rejection across the full trait API surface, regardless of call ordering.
273ec2e to
04cfe79
Compare
The async/blocking i2c-slave-*-target-trait examples now emit two `warn!` lines when the slave observes event shapes associated with the `slv_state -> addressed` mid-DMA HW race tracked on PR OpenDevicePartnership#565: 1. `WriteStatus::Restarted(0)` — a zero-byte restart should not occur on a healthy bus; a real repeated START is preceded by at least one ACKed payload byte. Seeing it strongly suggests the slave mis-classified an in-progress receive as a restart. 2. `Request::RepeatedStart(_)` arriving when the prior `respond_to_*` did not report `Restarted(_)`. The queued edge that surfaces as `RepeatedStart` should always have a matching upstream `Restarted`. Both signatures are tracked via a single local `expect_repeated_start: bool` flipped on `Restarted` and consumed on the next `listen()`. `info!` `(consumed expected RepeatedStart edge before Read/Write)` is also logged on the healthy combined-format path so the RTT capture has a clear positive marker, not just the negative warning. Drives the race-watch surface complementary to the host-side `write-read-soak` harness: the host call times out and prints the iteration that wedged; the on-target RTT log identifies which mis-classification the slave made when it stopped clocking. Also drops the stale module-rustdoc bullet on the async example that claimed `recover()` was called from the `Restarted` branch as a no-op demonstration — the actual code (rightly) does not call it, because a `Restarted` is a healthy mid-transaction continuation. Replaced with explicit `recover()` is not called on the happy path language, plus a new `Race-watching telemetry` section in the rustdoc that documents the two warnings and points at the soak tool. Addresses Copilot review thread PRRT_kwDOMUqVSs6Gc48a on PR OpenDevicePartnership#565. No behavioural change to the trait API or the inherent `I2cSlave` driver. Verified locally: cargo +stable build --release --bin i2c-slave-async-target-trait --bin i2c-slave-target-trait (clean), cargo +stable clippy --locked -- -Dwarnings -D clippy::suspicious -D clippy::correctness -D clippy::perf -D clippy::style on examples/rt685s-evk (clean), cargo +nightly fmt --check (clean).
… respond_to_read The async `respond_to_read` termination handler treated `slvpending().is_pending() || slvstate().is_slave_address()` as a single `Restarted` case, queueing a `RepeatedStart` edge for any trait caller. `slvpending` is overloaded though — it asserts for *any* SW-intervention condition the FC peripheral cannot resolve on its own, not just a repeated START. The common case on the read path is a controller-side NACK ending the transaction: the FC raises `slvpending` first, then propagates `slvdesel` one peripheral cycle later. If the poll loop wakes on the `slvpending` arm before `slvdesel` lands, we synthesise a phantom `Restarted` plus a queued `RepeatedStart` edge — and the next `listen()` reports a `RepeatedStart` event that has no matching Sr on the wire. Split the branch to discriminate on the actual state, matching what the blocking flavour already does at line 665: * `slvstate:().is_slave_address()` — a genuine new address phase has been latched, queue `EdgeKind::RepeatedStart` and report `Response::Restarted(xfer_count)`. * `slvpending:().is_pending()` (bare, without `slvdesel` or new address) — treat as a normal read termination: NAK to settle the peripheral, then report `ReadComplete(xfer_count)` or `ReadEarlyStop(xfer_count)` based on whether the buffer was fully drained. No edge queued. This also hardens against the `slv_state -> addressed` mid-DMA HW race jerrysxie has been chasing on the read path: if the FC transiently mis-classifies during a stress soak, `slvpending` may fire without a real address transition. After this change the worst-case observable becomes a `ReadEarlyStop` instead of a fabricated trait-side `RepeatedStart` followed by a wedge on the next `listen()` waiting for a phantom Sr that never came. Addresses Copilot review thread PRRT_kwDOMUqVSs6GcUOx on PR on-target race-watch `warn!` lines added in 62937dd — the soak now has a fix to prove, not just a probe to fire.
5239e78 to
edf3ec4
Compare
| extern crate embassy_imxrt_perf_examples; | ||
|
|
||
| use core::sync::atomic::{AtomicBool, Ordering}; | ||
|
|
| match r { | ||
| Response::ReadComplete(n) => mcu_target::ReadStatus::Complete(n), | ||
| Response::ReadContinued(n) => mcu_target::ReadStatus::NeedMore(n), | ||
| Response::ReadEarlyStop(n) | Response::Restarted(n) | Response::Stopped(n) => { |
There was a problem hiding this comment.
There is no possible way for resp_read to returned Response::Stopped(n) in the current code, right?
Why is Response::Restarted(n) -> mcu_target::ReadStatus::EarlyStop(n)? We can technically exhaust the DMA and still get a restart? Just trying to understand the terminology here.
| self.require_address_kind(false)?; | ||
|
|
||
| // Drain any queued edge first. | ||
| if let Some(req) = self.drain_edge::<SevenBitAddress>(cmd_addr_u8) { |
There was a problem hiding this comment.
So we are skipping the listen() step if we restart? So this would be an optimization?
Wouldn't we not skip this transition in listen() to advance the i2c state machine if that is the case?
// Disable DMA
i2c.slvctl().write(|w| w.slvdma().disabled());
// Check whether we already have a matched address and just waiting
// for software ack/nack
if !i2c.stat().read().slvpending().is_pending() {
self.poll_sw_action().await;
}
if i2c.stat().read().slvstate().is_slave_address() {
i2c.slvctl().write(|w| w.slvcontinue().continue_());
} else {
// If we are already past the addressed phase and in transmit or receive, that means we are already in the
// next state, most likely due to calling listen() before the previous transaction is completed and leading
// to state transition out of order. We can tolerate that, so we just move onto the next state.
}
// Poll for HW to transitioning from addressed to receive/transmit
self.poll_sw_action().await;
I guess in this case, we left the DMA enabled and the i2c state machine auto transition to the valid state anyway?
There was a problem hiding this comment.
What if after we restarted, we change direction? If we skip listen() and check for directions, do we even know that we change direction?
There was a problem hiding this comment.
Oh, I see, for repeated start, the client would have call listen() again to get the direction. Nevermind.
There was a problem hiding this comment.
Hmmm, this feels a bit weird, is this the expected sequence of events for start -> read -> start -> read
- listen() -> Request::Read(addr)
- respond_to_read() -> Response::Restarted(n) -> ReadStatus::EarlyStop(n)
- listen() -> Request::RepeatedStart(prev_addr)
- listen() -> Request::Read(addr)
| // synthesise a `RepeatedStart` edge here — that would | ||
| // fabricate a phantom trait-side event with no matching | ||
| // upstream Sr on the wire. | ||
| i2c.slvctl().write(|w| w.slvnack().nack()); |
There was a problem hiding this comment.
Slave is sending data to the master, so this nack here does not make sense.
Do we ever hit this branch in testing? With DMA enabled, the slv pending bit is usually never set in my observation.
I left the pending flag check in there previously just as a catch-all in case if something strange ever happens.
| self.require_address_kind(false)?; | ||
|
|
||
| // Drain any queued edge first. | ||
| if let Some(req) = self.drain_edge::<SevenBitAddress>(cmd_addr_u8) { |
There was a problem hiding this comment.
Hmmm, this feels a bit weird, is this the expected sequence of events for start -> read -> start -> read
- listen() -> Request::Read(addr)
- respond_to_read() -> Response::Restarted(n) -> ReadStatus::EarlyStop(n)
- listen() -> Request::RepeatedStart(prev_addr)
- listen() -> Request::Read(addr)
1af80e4 to
176ad80
Compare
Long-running soak example targeting the specific master pattern that surfaces `WriteStatus::Restarted(0)` on the slave trait API: master.write_read(addr, &w[..k*S], &mut r[..m]) where `S` is the slave receive-buffer length and the master write payload is always an exact `k*S` multiple. The slave's first `respond_to_write` returns `BufferFull(S)`; the caller loops and the second call returns `Restarted(0)` because the controller has issued Sr on the wire in between. The existing `i2c-slave-async-target-trait` example flags that shape as `RACE WATCH` on the suspicion of a HW race. This soak tests the alternative hypothesis: `Restarted(0)` is a buffer-boundary surfacing artifact, not a hardware race. The example: * Uses the `embedded_mcu_hal::i2c::target::asynch::I2c` trait so the reproducer matches the harness that produced the original log. * Sweeps `k in 1..=4` and `m in 1..=S-1` round-robin (12 unique pairs per cycle), back-to-back with no master pacing. * Wraps every awaited transfer on both tasks in `with_timeout(250 ms)`; any timeout panics with a distinct message identifying the call site and the offending iteration/parameters so RTT freezes with the wedge state captured. * Refills the slave TX buffer with `t_buf[j] = (j + iter) mod 256` each iteration; the master verifies the increment in the read buffer and panics on mismatch (catches cross-iteration buffer bleed without requiring shared state). * Counts every terminator branch on the slave (BufferFull, Stopped(0)/n, Restarted(0)/n, ReadComplete/NeedMore/EarlyStop, RepeatedStart, Probe) and emits a one-line summary every 10 000 iterations. Pass signal for the hypothesis: many millions of iterations with no `defmt::panic!` and no read mismatch -> the `Restarted(0)` shape is benign API surfacing, not a HW race. Any timeout or mismatch promotes the artifact to a real bug with the failing iteration captured in RTT. Wiring matches the existing i2c-loopback-async example (FC2 slave on PIO0_18/PIO0_17 + DMA0_CH4, FC4 master on PIO0_29/PIO0_30 + DMA0_CH9, Fast mode at 50 percent duty).
176ad80 to
94dd08b
Compare
The async `respond_to_write` and `respond_to_read` entry guards both required `slvstate` to be `slave_receive` / `slave_transmit` and treated everything else as `TransferError::ReadFail` / `TransferError::WriteFail` (the only exception was an entry-time `slvdesel`, which `respond_to_write` mapped to `Stopped(0)`). That is correct for the straight-through case where the caller has just returned from `listen()`, but wrong for the continuation case where the caller is looping back with a fresh buffer after a prior call returned `Response::WriteContinued` / `Response::ReadContinued`. Between the previous call returning and the re-entry being polled the executor still runs `info!` lines, sets up `with_timeout`, polls other ready futures, etc. At 400 kHz the controller can finish its remaining bytes, ACK its last byte, and issue Sr+ADDR+R/W in that gap. When the re-entry then samples `stat`, `slvstate` is `slave_address` (with `slvpending` asserted waiting for SW to ACK/NACK the new address phase) rather than `slave_receive` / `slave_transmit`. The old guard fabricated a `ReadFail` / `WriteFail`, which the i2c-write-read- boundary-soak example surfaced as a `defmt::panic!` from its `Ok(Err(e))` arm — a HardFault visible to probe-rs as `Firmware exited unexpectedly: Exception` because the defmt panic message and breadcrumb dump were dropped by defmt-rtt's saturated host-side buffer before the `udf #0` halt landed. The blocking flavour was unaffected because it polls inside the for-loop and naturally `break`s out of the loop when `slave_address` fires, then the post-loop arm queues the restart edge and returns `Response::Restarted(xfer_count)`. Replace both async entry guards with re-entry classification that mirrors what the post-poll path already emits, so the continuation case sees uniform `Response` shapes regardless of when the controller terminated: * `slvdesel` set — clear the latch and return `Response::Stopped(0)` / `Response::ReadEarlyStop(0)`. * `slvstate == slave_address` — queue `EdgeKind::RepeatedStart` and return `Response::Restarted(0)`. The next `listen()` then continues the address phase and surfaces the new direction normally. * other non-receive / non-transmit state — keep returning `TransferError::ReadFail` / `TransferError::WriteFail`, the caller must `recover()`. `respond_to_read` got the symmetric treatment for completeness; the current i2c-write-read-boundary-soak example keeps `READ_LEN_MAX < SLAVE_BUFLEN` so the read leg never triggers `ReadContinued` and the race is not directly exercised on the read path, but the same window exists. Validated on rt685s-evk with the i2c-write-read-boundary-soak example (FC2 slave ↔ FC4 master loopback, Fast/400 kHz, S=4, k∈[1..=4], m∈[1..=3]). Pre-fix: HardFault after ~28 minutes of MCU uptime at slv_iter ≈ 150 000. Post-fix: ran cleanly past 76 minutes of MCU uptime, slv_iter ≈ 11.2 million, no panic / wedge / mismatch / fault anywhere in the RTT log.
Yesterday's classify-bus-state fix (286df1f) reduced the mean time between failures on the i2c-write-read-boundary-soak from ~28 minutes to ~3 hours, but did not eliminate the wedge. The remaining failure mode was a master-side `WEDGE: master.write_read timed out` that landed with the slave stuck inside `respond_to_write`'s `poll_fn` await, between bc[0x18] ("about to await respond_to_write") and the 0x19 ("returned") that never came. Root cause is a window-of-vulnerability between the entry guard's `stat.read()` and the `i2c.slvctl().write(|w| w.slvdma().enabled())` that arms DMA. The flow inside the async `respond_to_write` is: 1. entry guard reads `stat` -> sees `slvstate == slave_receive` 2. enable SLVDMA 3. install NakGuard / dma_guard 4. construct the DMA Transfer 5. await poll_fn If the controller finishes its current data phase and issues Sr+ADDR+R between steps 1 and 5, the SLV peripheral transitions out of `slave_receive` (to `slave_address`, and -- in DMA mode -- auto-acknowledges the address and lands in `slave_transmit`). Per the mimxrt633s-pac STAT documentation (matching UM11147 25.7.2.2.2), `SLVPENDING is not set when the DMA is handling an event`. With SLVDMA now set, this suppression covers the auto-handled address phase, so `slvpending` does not raise on the bus state change. The DMA channel also never completes (it is armed for receive into a buffer that will never receive bytes, the bus having moved past the receive phase). Neither of the future's wake sources fires, and the await wedges until its outer `with_timeout` cancels it. Yesterday's fix only checked the entry guard. The window between that read and DMA arm -- a few-dozen-clock instruction sequence under release builds -- is just long enough at 400 kHz to lose the race against a tight write_read pattern on a fast controller. This commit adds a second `stat` read AFTER all the guards are installed but BEFORE awaiting. If the bus has advanced past the expected direction, drop into the same classification the post-poll path uses: * slvdesel set -> clear the latch, return Stopped(0) / ReadEarlyStop(0) * slvstate latched in slave_address -> queue the RepeatedStart edge for any trait caller, return Restarted(0) The dropped guards (`_transfer`, `_dma_guard`) handle the SLVDMA teardown and DMA channel abort on the early return; `nak_guard` is defused explicitly on the write path because we never moved any bytes and there is no in-flight transmission to NAK. Validated on rt685s-evk with the i2c-write-read-boundary-soak example (FC2 slave <-> FC4 master loopback, Fast/400 kHz, S=4, k in [1..=4], m in [1..=3]). Pre-fix MTBF was a few hours. Post-fix run achieved 19h 57min of continuous clean MCU uptime with zero panic / wedge / mismatch / fault markers across the full rotated log set (~226 MB of RTT output lefover from the 3x 100MiB rotated logs) and zero entries in the forensic Err-path capture ring before the runner was stopped for cleanup.
Summary
This PR implements the
embedded_mcu_hal::i2c::targettrait on the existingI2cSlavedriver and adds two on-target examples that exercise it.The driver gains two trait implementations:
embedded_mcu_hal::i2c::target::blocking::I2cforI2cSlave<'_, Blocking>embedded_mcu_hal::i2c::target::asynch::I2cforI2cSlave<'_, Async>Each is implemented twice — once for
SevenBitAddressand once forTenBitAddress— with a runtime guard that returnsErrorKind::Otherwhen the wrong impl is driven against the configured address.The existing inherent
listen/respond_to_write/respond_to_readAPI and theCommand/Responseenums are preserved verbatim. The trait impls are layered on top via private helpers that return richer internal events / terminations, mapped down to either flavour at the boundary. The existingi2c-slave.rsandi2c-slave-async.rsexamples build and behave unchanged.What's new
recover()as an inherent method on both flavours: aborts in-flight DMA, NAKs any pending byte, disables DMA arming, clears the deselect latch, and drops any pending remediation flag. The configured address(es) andslvenbit are preserved across recover, matching the trait contract.PendingEdgebookkeeping that surfaces aSr(repeated START) terminator from a previousrespond_to_*as aRequest::RepeatedStart(prev_addr)on the nextlisten()call — for trait callers only; the inherent API is unaffected.Probe → Stop(addr)mapping: a zero-byte write surfaces asRequest::Stop(addr)with no precedingWrite, documented in the module rustdoc.Error::kind()mapping for everyTransferErrorvariant +UnsupportedConfiguration, with unit tests in#[cfg(test)] mod tests.Bug fix included
Mid-implementation HIL testing surfaced a pre-existing-style bug in the new async DMA read helper that this PR introduces: the
poll_fnwas missing the symmetric counterpart of the write path's!dma_channel.is_active() && is_slave_receive()wake condition. When DMA finished draining the supplied buffer but the controller kept clocking SCL for more bytes, the hardware did not raiseslvpending(it just floated SDA so the controller clocked0xFFfrom the bus pull-ups), and the future slept until something external aborted the transaction.The fix adds the missing
!dma_channel.is_active() && is_slave_transmit()wake arm plus the matching post-awaitInternalTermination::Continued(buf_len) → ReadStatus::NeedMoremapping. This is the second commit in the PR for ease of review.Examples
Two new rt685s-evk examples (FC2 /
PIO0_18SCL,PIO0_17SDA — same wiring asi2c-slave-async.rs, so any existing controller test rig works unchanged):examples/rt685s-evk/src/bin/i2c-slave-async-target-trait.rsexamples/rt685s-evk/src/bin/i2c-slave-target-trait.rsEach example wires the trait's
listen → match Request → respond_* → looploop and handles everyRequest/WriteStatus/ReadStatusvariant.HIL validation
Verified against an RT685S-EVK driven by a Pico de Gallo USB-to-I2C bridge. The full test matrix passes:
0x20.WriteStatus::Stopped(8).ReadStatus::Complete(8), payload matches[0..7].Sr→ 8-byte read in a single transaction →Restarted(4)+RepeatedStart(0x20)+Read(0x20)+Complete(8).BufferFull(8)cycles +Stopped(0).EarlyStop(4).NeedMore(8)+Complete(8)(the bug-fix scenario above).Dep bump
embedded-mcu-halis bumped from 0.2.0 to 0.3.0, which is the first release containing thei2c::targettrait module.Other notes
unsafeblocks, clock setup, or pin muxing.mimxrt685sandmimxrt633sfeature combos.examples/rt685s-evk/still build.