Skip to content

i2c::slave: implement embedded-mcu-hal i2c::target trait#565

Open
felipebalbi wants to merge 22 commits into
OpenDevicePartnership:mainfrom
felipebalbi:i2c-slave-trait
Open

i2c::slave: implement embedded-mcu-hal i2c::target trait#565
felipebalbi wants to merge 22 commits into
OpenDevicePartnership:mainfrom
felipebalbi:i2c-slave-trait

Conversation

@felipebalbi

Copy link
Copy Markdown
Contributor

Summary

This PR implements the embedded_mcu_hal::i2c::target trait on the existing I2cSlave driver and adds two on-target examples that exercise it.

The driver gains two trait implementations:

  • embedded_mcu_hal::i2c::target::blocking::I2c for I2cSlave<'_, Blocking>
  • embedded_mcu_hal::i2c::target::asynch::I2c for I2cSlave<'_, Async>

Each is implemented twice — once for SevenBitAddress and once for TenBitAddress — with a runtime guard that returns ErrorKind::Other when the wrong impl is driven against the configured address.

The existing inherent listen / respond_to_write / respond_to_read API and the Command / Response enums are preserved verbatim. The trait impls are layered on top via private helpers that return richer internal events / terminations, mapped down to either flavour at the boundary. The existing i2c-slave.rs and i2c-slave-async.rs examples build and behave unchanged.

What's new

  • recover() as an inherent method on both flavours: aborts in-flight DMA, NAKs any pending byte, disables DMA arming, clears the deselect latch, and drops any pending remediation flag. The configured address(es) and slven bit are preserved across recover, matching the trait contract.
  • PendingEdge bookkeeping that surfaces a Sr (repeated START) terminator from a previous respond_to_* as a Request::RepeatedStart(prev_addr) on the next listen() call — for trait callers only; the inherent API is unaffected.
  • Probe → Stop(addr) mapping: a zero-byte write surfaces as Request::Stop(addr) with no preceding Write, documented in the module rustdoc.
  • Error::kind() mapping for every TransferError variant + UnsupportedConfiguration, with unit tests in #[cfg(test)] mod tests.

Bug fix included

Mid-implementation HIL testing surfaced a pre-existing-style bug in the new async DMA read helper that this PR introduces: the poll_fn was missing the symmetric counterpart of the write path's !dma_channel.is_active() && is_slave_receive() wake condition. When DMA finished draining the supplied buffer but the controller kept clocking SCL for more bytes, the hardware did not raise slvpending (it just floated SDA so the controller clocked 0xFF from the bus pull-ups), and the future slept until something external aborted the transaction.

The fix adds the missing !dma_channel.is_active() && is_slave_transmit() wake arm plus the matching post-await InternalTermination::Continued(buf_len) → ReadStatus::NeedMore mapping. This is the second commit in the PR for ease of review.

Examples

Two new rt685s-evk examples (FC2 / PIO0_18 SCL, PIO0_17 SDA — same wiring as i2c-slave-async.rs, so any existing controller test rig works unchanged):

  • examples/rt685s-evk/src/bin/i2c-slave-async-target-trait.rs
  • examples/rt685s-evk/src/bin/i2c-slave-target-trait.rs

Each example wires the trait's listen → match Request → respond_* → loop loop and handles every Request / WriteStatus / ReadStatus variant.

HIL validation

Verified against an RT685S-EVK driven by a Pico de Gallo USB-to-I2C bridge. The full test matrix passes:

  • 7-bit address scan finds the slave at 0x20.
  • 8-byte write → WriteStatus::Stopped(8).
  • 8-byte read → ReadStatus::Complete(8), payload matches [0..7].
  • 4-byte write → Sr → 8-byte read in a single transaction → Restarted(4) + RepeatedStart(0x20) + Read(0x20) + Complete(8).
  • 24-byte write → three BufferFull(8) cycles + Stopped(0).
  • 4-byte read of an 8-byte buffer → EarlyStop(4).
  • 16-byte read of an 8-byte buffer → NeedMore(8) + Complete(8) (the bug-fix scenario above).

Dep bump

embedded-mcu-hal is bumped from 0.2.0 to 0.3.0, which is the first release containing the i2c::target trait module.

Other notes

  • No new feature flags.
  • No changes to register access, unsafe blocks, clock setup, or pin muxing.
  • Clippy clean on both the mimxrt685s and mimxrt633s feature combos.
  • All existing examples in examples/rt685s-evk/ still build.

Felipe Balbi added 2 commits June 1, 2026 12:54
Adds embedded_mcu_hal::i2c::target::blocking::I2c and ::asynch::I2c
trait impls to the existing I2cSlave driver. The inherent listen /
respond_to_write / respond_to_read API and the Command / Response enums
are preserved verbatim — the trait impls are layered on top via private
helpers that return richer InternalEvent / InternalTermination enums.

Both flavours are implemented twice, once for SevenBitAddress and once
for TenBitAddress; a runtime address-mode check returns ErrorKind::Other
when the wrong impl is driven against the configured address.

Adds a public recover() method on both I2cSlave<'_, Blocking> and
I2cSlave<'_, Async> that aborts any in-flight DMA, NAKs a pending byte,
disables DMA arming, clears the deselect latch, and drops any pending
remediation flags. The configured address(es) and slven bit are
preserved across recover, matching the trait contract.

Surfaces a RepeatedStart edge from a previous respond_to_* as a separate
listen event for the trait impls (via the new PendingEdge bookkeeping
field). The inherent API ignores this field, so existing callers are
unaffected. A probe (0-byte write) surfaces as Request::Stop(addr) with
no preceding Write — documented in the module rustdoc.

Adds two rt685s-evk examples that drive the new trait API end-to-end:
i2c-slave-async-target-trait.rs and i2c-slave-target-trait.rs. Pin-out
matches the existing i2c-slave-async.rs example so the Pi5 master test
rig referenced in that example works unchanged.

Includes a host-side unit test module covering the Error::kind() mapping
for every TransferError variant + UnsupportedConfiguration, and the
AddressKind helper round-trips. Tests compile via cargo check --tests;
running them requires the same out-of-tree test harness setup the rest
of the crate already needs (cortex-m-rt is pulled in unconditionally).

Bumps the embedded-mcu-hal dependency from 0.2.0 to 0.3.0, which is the
first release containing the i2c::target trait module.
The async respond_to_read_internal poll_fn was missing the symmetric
counterpart of the write path's '!dma_channel.is_active() &&
is_slave_receive()' wake condition. When DMA finished pushing the
supplied buffer but the master kept clocking SCL for more bytes, the
hardware did not raise slvpending (it just floated SDA so the master
clocked 0xFF from the bus pull-ups), and the future slept until
something external aborted the transaction.

Symptom from a 16-byte controller read against an 8-byte slave buffer:
the host received [0..7, 0xFF*8] after a ~9 s timeout instead of
[0..7, 0..7], and the slave never logged NeedMore.

Add the missing wake arm to poll_fn and the matching post-await branch
that returns InternalTermination::Continued(buf_len) →
ReadStatus::NeedMore so the caller can refill the buffer and resume
the read.
Copilot AI review requested due to automatic review settings June 1, 2026 19:57
@felipebalbi felipebalbi requested a review from a team as a code owner June 1, 2026 19:57
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Cargo Vet Audit Passed

cargo vet has passed in this PR. No new unvetted dependencies were found.

@github-actions github-actions Bot added the cargo vet PR requiring auditor review label Jun 1, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the embedded_mcu_hal::i2c::target I2C target/slave traits for the existing I2cSlave driver (blocking + async, 7-bit + 10-bit), adds a recover() API to re-baseline the peripheral after cancelled/wedged transfers, and introduces new on-target examples demonstrating the trait-based API. Also bumps embedded-mcu-hal to 0.3.0 to pick up the new target trait module.

Changes:

  • Add embedded_mcu_hal::i2c::target::{blocking, asynch}::I2c implementations for I2cSlave (7-bit and 10-bit) and supporting internal event/termination plumbing.
  • Add recover() for both blocking and async flavours, plus pending-edge bookkeeping to surface RepeatedStart via the trait API.
  • Add two rt685s-evk examples using the target trait API; bump embedded-mcu-hal dependency to 0.3.0 (and lockfiles accordingly).

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/i2c/slave.rs Adds target-trait implementations, internal event/termination mapping, pending edge tracking, and recover() for blocking/async.
examples/rt685s-evk/src/bin/i2c-slave-target-trait.rs New blocking example exercising the target trait API.
examples/rt685s-evk/src/bin/i2c-slave-async-target-trait.rs New async example exercising the target trait API.
examples/rt685s-evk/Cargo.toml Bumps embedded-mcu-hal to 0.3.0 for the examples workspace.
examples/rt685s-evk/Cargo.lock Lockfile updates due to the dependency bump.
Cargo.toml Bumps embedded-mcu-hal to 0.3.0 for the main crate.
Cargo.lock Lockfile updates due to the dependency bump.

Comment thread src/i2c/slave.rs Outdated
Comment thread examples/rt685s-evk/src/bin/i2c-slave-async-target-trait.rs Outdated
Comment thread Cargo.toml
Runs cargo vet to a clean 'Vetting Succeeded (111 fully audited, 1 partially audited, 9 exempted)' state on top of the embedded-mcu-hal 0.2.0 -> 0.3.0 bump and the dependency churn since the last vet pass.

Audits added:
- embedded-mcu-hal 0.2.0 -> 0.3.0 — first-party delta covering the new i2c::target trait module (which I authored) and the new smbus::bus module. Trait-only additions, no new unsafe, no new transitive deps.
- defmt-rtt 1.0.0 -> 1.1.0 — single new public query function in_blocking_mode() (relaxed atomic read of the RTT up-channel flags). Pulled in transitively via the ariel-os import set.
- typenum 1.20.0 -> 1.20.1 — macro hygiene fix using \\::tarr![...]\ inside the recursive expansion; doc_root URL bump.

Trust entries added (publishers cargo-vet already covers via other audit sets):
- dtolnay → syn, thiserror, thiserror-impl
- joshlf → zerocopy, zerocopy-derive (already on the OpenDevicePartnership trust list)
- cuviper → autocfg
- KodrAus → log
- rust-lang-owner → cc (matches existing libc trust entry)

Imports added:
- ariel-os — gives us the upstream defmt-rtt 1.0.0 -> 1.1.0 audit instead of the previously-unavailable 1.0.0 -> 1.2.0 jump.

Exemption changes:
- chrono bumped 0.4.40 -> 0.4.44 (430-line diff, same djc publisher).
- generator bumped 0.8.8 -> 0.8.9 (loom dev-dep, safe-to-run only).
- cc bumped 1.2.59 -> 1.2.63 (build-time only, GitHub Actions publisher).
- New shlex 2.0.1 exemption (transitive build-only dep of cc).
- Other previously-exempted crates (az, bitfield, cordyceps, embassy-embedded-hal, embassy-futures, embassy-hal-internal, embassy-sync, embassy-time-driver, embedded-hal*, embedded-io-async, embedded-storage*, find-msvc-tools, fixed, hash32, heapless, ident_case, mimxrt600-fcb, portable-atomic, syn, thiserror*, tracing*, typenum 1.18.0, valuable) were pruned automatically by cargo vet because they are now covered by trust entries, imports, or audits.
@felipebalbi felipebalbi requested a review from a team as a code owner June 1, 2026 20:11

@kurtjd kurtjd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be quite a bit of code duplication. Looks like you've defined several *_inner methods which are mostly copy and paste of the existing methods they mirror with slightly different return types to be used by the trait impls. Is it not possible to just use the exusting methods and just map their output in the trait impls instead of duplicating code?

Copilot AI review requested due to automatic review settings June 1, 2026 20:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 13 changed files in this pull request and generated 6 comments.

Comment thread src/i2c/slave.rs Outdated
Comment thread src/i2c/slave.rs Outdated
Comment thread src/i2c/slave.rs Outdated
Comment thread src/i2c/slave.rs Outdated
Comment thread src/i2c/slave.rs Outdated
Comment thread examples/rt685s-evk/src/bin/i2c-slave-async-target-trait.rs Outdated
Comment thread src/i2c/slave.rs Outdated
// hardware does not raise slvpending once DMA is armed and
// running dry — SDA just floats and the master clocks 0xFF
// until something else aborts the transaction.
if !dma_channel.is_active() && stat.slvstate().is_slave_transmit() {

@jerrysxie jerrysxie Jun 1, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to drop the partial read condition from the original implementation due to a HW race condition, in stress testing, the slv_state() would go to addressed instead of transmitted so the async operation gets stuck:
felipebalbi@62dc59c, and I cannot find a workaround even after working with NXP.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here is on the read side, not write. Haven't seen this particular race condition. I'll try to add a quick test to my hacky tests (with pico de gallo).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is in respond_to_read_internal(). I was hitting this race condition consistently in long term stressing test with release binary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unable to hit it. Has been running for quite a while now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll bring Mole tomorrow for testing, I guess it's really the only way to give this a solid shot. Really cannot reproduce with pico de gallo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mole has been running since 6AM today, no errors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running i2c-loopback-async.rs with this changes to trigger case where i2c target does not provide enough data on single call of respond_to_read() on IMXRT in loopback mode produces a panic:

diff --git a/examples/rt685s-evk/src/bin/i2c-loopback-async.rs b/examples/rt685s-evk/src/bin/i2c-loopback-async.rs
index fbdbf50..1086100 100644
--- a/examples/rt685s-evk/src/bin/i2c-loopback-async.rs
+++ b/examples/rt685s-evk/src/bin/i2c-loopback-async.rs
@@ -16,7 +16,7 @@ const ADDR: u8 = 0x20;
const MASTER_BUFLEN: usize = 8;
// slave buffer has to be 1 bigger than master buffer because master does not
// handle end of read properly
-const SLAVE_BUFLEN: usize = MASTER_BUFLEN + 1;
+const SLAVE_BUFLEN: usize = MASTER_BUFLEN / 2;
const SLAVE_ADDR: Option<Address> = Address::new(ADDR);
bind_interrupts!(struct Irqs {
9140.448752 [INFO ] Read @ "SevenBit(32)" (i2c_loopback_async src/bin/i2c-loopback-async.rs:43)
9140.448841 [INFO ] Response to read got 4 bytes, more bytes to fill (i2c_loopback_async src/bin/i2c-loopback-async.rs:54)
9140.448934 [INFO ] Response to read got 4 bytes, more bytes to fill (i2c_loopback_async src/bin/i2c-loopback-async.rs:54)
9140.485323 [ERROR] panicked at src/bin/i2c-loopback-async.rs:45:68:
called `Result::unwrap()` on an `Err` value: Transfer(WriteFail) (panic_probe panic-probe-1.0.0/src/lib.rs:104)
Firmware exited unexpectedly: Exception
WARN probe_rs_debug::debug_info: UNWIND: Error while checking for exception context. The stack trace will not include the calling frames. : Probe(Register("No Stack Pointer register. Please report this as a bug."))
Core 0
    Frame 0: HardFault_ @ 0x08007a7e
       /home/jxie/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/cortex-m-rt-0.7.5/src/lib.rs:1103:1
    Frame 1: UNWIND: Error while checking for exception context. The stack trace will not include the calling frames. : Probe(Register("No Stack Pointer register. Please report this as a bug.")) @ 0x08007a7e

The line that triggered this error

        // Verify that we are ready for transmit
        if !i2c.stat().read().slvstate().is_slave_transmit() {
            return Err(TransferError::WriteFail.into());
        }

This corresponds to the hardware race condition that we engaged with NXP before:

We are seeing some inconsistencies on the SLVSTATE behavior:
* In some instance, for I2C writes where slave is in transmit mode. After DMA is completed, we get the DMA interrupt. Master has not NACK any data. SLVSTATE is addressed (0x0) instead of transmit (0x2).
* In some instances, for I2C writes where slave is in transmit mode. After DMA is completed, we get the DMA interrupt. Master is generating NACK and STOP with a little bit of delay. Initially we observed SLVSTATE is set to transmit (2) and SLVSEL is set, then SLVSTATE changes to addressed (0) briefly and SLVSEL is set, then finally SLVSTATE stays as addressed and then SLVDESEL is set and SLVSEL is cleared.

We don't have a 100% guarantee that SLVSTATE will be in transmit

        poll_fn(|cx| {
            let i2c = self.info.regs;

            self.info.waker.register(cx.waker());
            dma_channel.get_waker().register(cx.waker());

            let stat = i2c.stat().read();
            // Master sent a stop or nack
            if stat.slvdesel().is_deselected() {
                return Poll::Ready(());
            }
            // We need SW intervention
            if stat.slvpending().is_pending() {
                return Poll::Ready(());
            }
            // DMA drained but the master still expects more bytes.
            // Without this guard the future sleeps forever because the
            // hardware does not raise slvpending once DMA is armed and
            // running dry — SDA just floats and the master clocks 0xFF
            // until something else aborts the transaction.
            if !dma_channel.is_active() && stat.slvstate().is_slave_transmit() {  <-- this slvstate is not guaranteed to be in the state we expect
                return Poll::Ready(());
            }

            Poll::Pending
        })
        .await;

Maybe we should consider not supporting Response::ReadContinued(xfer_count) case on imxrt?

@felipebalbi

Copy link
Copy Markdown
Contributor Author

There seems to be quite a bit of code duplication. Looks like you've defined several *_inner methods which are mostly copy and paste of the existing methods they mirror with slightly different return types to be used by the trait impls. Is it not possible to just use the exusting methods and just map their output in the trait impls instead of duplicating code?

I'll do it, but it will be a breaking change for the existing API due to Request not carrying the address and usize fields we need.

Felipe Balbi added 2 commits June 2, 2026 06:25
Addresses the duplication kurtjd flagged on the PR by promoting the inherent Command and Response types into the rich shape and letting the trait impls map down from them. The state-machine bodies that used to be split between the inherent methods and four _internal helpers are now the single source of truth on the inherent path; the target trait impls are thin wrappers.

Public API changes (both enums now #[non_exhaustive] so future variants do not break callers):

  Command::Probe { addr: Address }    // was: Command::Probe
  Command::Read  { addr: Address }    // was: Command::Read
  Command::Write { addr: Address }    // was: Command::Write

  Response::Stopped(usize)            // STOP terminator
  Response::Restarted(usize)          // Sr terminator
  Response::WriteContinued(usize)     // write side: buffer drained, master still sending
  Response::ReadContinued(usize)      // read side: buffer exhausted, master still clocking
  Response::ReadComplete(usize)       // buffer exactly consumed at NACK+STOP
  Response::ReadEarlyStop(usize)      // master STOP'd with bytes left in buffer

Plus convenience helpers Response::bytes() and Response::is_terminal() so callers that just want the byte count and a yes/no don't have to match six variants by hand.

Reuses the existing public Address enum (SevenBit(u8) / TenBit(u16)) for the Command address payload instead of introducing a new AddressKind type. AddressKind, InternalEvent, InternalTermination, write_termination_to_status, read_termination_to_status, and all four *_internal helpers (blocking + async × listen/respond_to_write/respond_to_read) are deleted.

The pending: Cell<PendingEdge> field stays — it carries the RepeatedStart edge from a previous respond_to_* to the next target-trait listen() call so the trait surfaces Request::RepeatedStart as its own event. The inherent API folds the restart edge into the preceding Response::Restarted instead and never reads pending.

Trait impls disambiguate inherent-vs-trait method calls via small private _impl shims (recover_impl, listen_impl, respond_to_write_impl, respond_to_read_impl). The public inherent methods delegate to those shims so the inherent surface stays public; the trait impls call the shims directly so resolution is unambiguous when both I2c<SevenBitAddress> and I2c<TenBitAddress> are in scope.

Inherent recover() stays pub on both flavours (Blocking sync, Async async).

Slave-side examples updated:
- i2c-slave.rs / i2c-slave-async.rs / i2c-loopback-async.rs / i2c-loopback-async-10bit.rs — match on Command::{Probe,Read,Write} { addr } and use Response::is_terminal() + Response::bytes() in the respond loops.
- i2c-slave-async-target-trait.rs — comment updated to reflect that the inherent API now surfaces probes as Command::Probe { addr } too (not collapsed).

Verification:
- cargo build --target thumbv8m.main-none-eabihf with 5 representative feature combos + 8 minimal sample combos: clean.
- cargo build --release for both mimxrt685s and mimxrt633s with full features: clean.
- cargo clippy --locked -- -Dwarnings on both -os-timer and -rtc time-driver feature combos: clean.
- cargo clippy --locked -- -Dwarnings -D clippy::style on the examples workspace: clean.
- cargo check --tests: clean (3 new unit tests cover Response helpers and cmd_addr widening).
- cargo +nightly fmt --check: clean on slave.rs and on every touched example.
- cargo vet: still 'Vetting Succeeded (111 fully audited, 1 partially audited, 9 exempted)'.

Net delta on src/i2c/slave.rs: -236 lines.
Replaces the four private *_impl wrappers (listen_impl, respond_to_write_impl, respond_to_read_impl, recover_impl) on both Blocking and Async with direct UFCS calls in the trait impl bodies:

    I2cSlave::<Blocking>::listen(self)
    I2cSlave::<Async>::respond_to_read(self, buf).await
    I2cSlave::<Async>::recover(self).await

The shims existed only to disambiguate inherent-vs-trait method resolution when both I2c<SevenBitAddress> and I2c<TenBitAddress> impls are in scope. Spelling the inherent receiver type explicitly via UFCS removes that ambiguity at the call site without an extra layer of indirection. The public inherent methods (listen, respond_to_write, respond_to_read, recover) are now the only definitions; the trait impls call them directly.

Verification: cargo build (debug + release, mimxrt685s + mimxrt633s, full and minimal feature combos), cargo clippy --locked -- -Dwarnings on both -os-timer and -rtc time-driver flavours, examples workspace clippy --locked with style/perf/correctness lints, cargo check --tests, cargo +nightly fmt --check, cargo vet --locked --frozen — all clean.

Net delta on src/i2c/slave.rs: -43 lines (1436 -> 1393); -279 lines vs the original PR baseline.
Copilot AI review requested due to automatic review settings June 2, 2026 13:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 55 out of 60 changed files in this pull request and generated 1 comment.

Comment thread src/i2c/slave.rs Outdated
Felipe Balbi added 2 commits June 2, 2026 06:54
CI's cargo-vet bumped from 0.10.1 to 0.10.2, which has stricter TOML string-literal normalisation: single-quoted literal strings are preferred over double-quoted-with-escaping for notes that contain bare double quotes, and triple-quoted multi-line strings whose interior contains no special characters get unescaped.

All twelve affected lines are inside notes the Google audit aggregator imports verbatim (proc-macro-error / proc-macro-error-attr / proc-macro2 review notes); no audit semantics or trust entries change. Generated by running 'cargo vet fmt' against the existing store.
CI's nightly rustfmt (1.9.0-nightly, rustc 1.98.0-nightly 6bdf43094 2026-06-01) prefers one-import-per-line for the throwaway 'use _ as _;' bindings used to anchor defmt-rtt, panic-probe, and embassy-imxrt-examples into the binary. The older nightly that was pinned locally collapsed those into braced groups; the current CI nightly expands them again.

Pure formatter output across 47 example binaries + examples/rt633/src/lib.rs. No behavioural changes.

Generated by running 'cargo +nightly fmt' in examples/rt685s-evk and examples/rt633 with rustfmt 1.9.0-nightly (6bdf43094f 2026-06-01).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 18 changed files in this pull request and generated 10 comments.

Comment thread src/i2c/slave.rs
Comment thread src/i2c/slave.rs
Comment thread src/i2c/slave.rs
Comment thread src/i2c/slave.rs
Comment thread src/i2c/slave.rs
Comment thread examples/rt685s-evk/src/bin/i2c-slave-async-target-trait.rs Outdated
Comment thread src/i2c/slave.rs
Comment thread src/i2c/slave.rs
Comment thread src/i2c/slave.rs
Comment thread src/i2c/slave.rs
The embedded_mcu_hal::i2c::target impls already check the configured
address mode in listen(), but respond_to_read() and respond_to_write()
trusted the caller to go through listen() first. A misbehaving caller
that drives the wrong trait impl (e.g. I2c<SevenBitAddress> on a
10-bit-configured slave) directly into respond_to_* would have skipped
the guard and driven the peripheral with mismatched semantics.

Add self.require_address_kind(want_ten_bit)? as the first statement of
every respond_to_read() and respond_to_write() across all four trait
impls (Blocking SevenBit/TenBit, Async SevenBit/TenBit). The guard is
O(1) — one Option::is_some + one equality compare — and returns
Error::UnsupportedConfiguration on mismatch, which maps to
ErrorKind::Other at the trait surface (per the existing Error::kind()
implementation tested in
error_kind_mapping_covers_every_transfer_variant).

Of the eight respond_to_* methods, one (Blocking
SevenBitAddress::respond_to_read) was already guarded by the
auto-merged Copilot autofix in 273ec2e; this commit adds the remaining
seven. The result is uniform address-mode rejection across the full
trait API surface, regardless of call ordering.
Felipe Balbi added 2 commits June 2, 2026 07:38
The async/blocking i2c-slave-*-target-trait examples now emit two
`warn!` lines when the slave observes event shapes associated with the
`slv_state -> addressed` mid-DMA HW race tracked on PR OpenDevicePartnership#565:

1. `WriteStatus::Restarted(0)` — a zero-byte restart should not occur
on a healthy bus; a real repeated START is preceded by at least one
ACKed payload byte. Seeing it strongly suggests the slave
mis-classified an in-progress receive as a restart.

2. `Request::RepeatedStart(_)` arriving when the prior `respond_to_*`
did not report `Restarted(_)`. The queued edge that surfaces as
`RepeatedStart` should always have a matching upstream `Restarted`.

Both signatures are tracked via a single local `expect_repeated_start:
bool` flipped on `Restarted` and consumed on the next
`listen()`. `info!` `(consumed expected RepeatedStart edge before
Read/Write)` is also logged on the healthy combined-format path so the
RTT capture has a clear positive marker, not just the negative
warning.

Drives the race-watch surface complementary to the host-side
`write-read-soak` harness: the host call times out and prints the
iteration that wedged; the on-target RTT log identifies which
mis-classification the slave made when it stopped clocking.

Also drops the stale module-rustdoc bullet on the async example that
claimed `recover()` was called from the `Restarted` branch as a no-op
demonstration — the actual code (rightly) does not call it, because a
`Restarted` is a healthy mid-transaction continuation. Replaced with
explicit `recover()` is not called on the happy path language, plus a
new `Race-watching telemetry` section in the rustdoc that documents
the two warnings and points at the soak tool. Addresses Copilot review
thread PRRT_kwDOMUqVSs6Gc48a on PR OpenDevicePartnership#565.

No behavioural change to the trait API or the inherent `I2cSlave`
driver. Verified locally: cargo +stable build --release --bin
i2c-slave-async-target-trait --bin i2c-slave-target-trait (clean),
cargo +stable clippy --locked -- -Dwarnings -D clippy::suspicious -D
clippy::correctness -D clippy::perf -D clippy::style on
examples/rt685s-evk (clean), cargo +nightly fmt --check (clean).
… respond_to_read

The async `respond_to_read` termination handler treated
`slvpending().is_pending() || slvstate().is_slave_address()` as a
single `Restarted` case, queueing a `RepeatedStart` edge for any trait
caller. `slvpending` is overloaded though — it asserts for *any*
SW-intervention condition the FC peripheral cannot resolve on its own,
not just a repeated START. The common case on the read path is a
controller-side NACK ending the transaction: the FC raises
`slvpending` first, then propagates `slvdesel` one peripheral cycle
later. If the poll loop wakes on the `slvpending` arm before
`slvdesel` lands, we synthesise a phantom `Restarted` plus a queued
`RepeatedStart` edge — and the next `listen()` reports a
`RepeatedStart` event that has no matching Sr on the wire.

Split the branch to discriminate on the actual state, matching what
the blocking flavour already does at line 665:

* `slvstate:().is_slave_address()` — a genuine new address phase has
been latched, queue `EdgeKind::RepeatedStart` and report
`Response::Restarted(xfer_count)`.

* `slvpending:().is_pending()` (bare, without `slvdesel` or new
address) — treat as a normal read termination: NAK to settle the
peripheral, then report `ReadComplete(xfer_count)` or
`ReadEarlyStop(xfer_count)` based on whether the buffer was fully
drained. No edge queued.

This also hardens against the `slv_state -> addressed` mid-DMA HW race
jerrysxie has been chasing on the read path: if the FC transiently
mis-classifies during a stress soak, `slvpending` may fire without a
real address transition. After this change the worst-case observable
becomes a `ReadEarlyStop` instead of a fabricated trait-side
`RepeatedStart` followed by a wedge on the next `listen()` waiting for
a phantom Sr that never came.

Addresses Copilot review thread PRRT_kwDOMUqVSs6GcUOx on PR
on-target race-watch `warn!` lines added in 62937dd — the soak now has
a fix to prove, not just a probe to fire.
@felipebalbi felipebalbi enabled auto-merge (squash) June 2, 2026 14:49
RobertZ2011
RobertZ2011 previously approved these changes Jun 2, 2026
@jerrysxie jerrysxie added the BREAKING CHANGE PR causes a breaking change label Jun 4, 2026
extern crate embassy_imxrt_perf_examples;

use core::sync::atomic::{AtomicBool, Ordering};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline?

Comment thread src/i2c/slave.rs
match r {
Response::ReadComplete(n) => mcu_target::ReadStatus::Complete(n),
Response::ReadContinued(n) => mcu_target::ReadStatus::NeedMore(n),
Response::ReadEarlyStop(n) | Response::Restarted(n) | Response::Stopped(n) => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no possible way for resp_read to returned Response::Stopped(n) in the current code, right?

Why is Response::Restarted(n) -> mcu_target::ReadStatus::EarlyStop(n)? We can technically exhaust the DMA and still get a restart? Just trying to understand the terminology here.

Comment thread src/i2c/slave.rs
self.require_address_kind(false)?;

// Drain any queued edge first.
if let Some(req) = self.drain_edge::<SevenBitAddress>(cmd_addr_u8) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are skipping the listen() step if we restart? So this would be an optimization?

Wouldn't we not skip this transition in listen() to advance the i2c state machine if that is the case?

        // Disable DMA
        i2c.slvctl().write(|w| w.slvdma().disabled());

        // Check whether we already have a matched address and just waiting
        // for software ack/nack
        if !i2c.stat().read().slvpending().is_pending() {
            self.poll_sw_action().await;
        }

        if i2c.stat().read().slvstate().is_slave_address() {
            i2c.slvctl().write(|w| w.slvcontinue().continue_());
        } else {
            // If we are already past the addressed phase and in transmit or receive, that means we are already in the
            // next state, most likely due to calling listen() before the previous transaction is completed and leading
            // to state transition out of order. We can tolerate that, so we just move onto the next state.
        }

        // Poll for HW to transitioning from addressed to receive/transmit
        self.poll_sw_action().await;

I guess in this case, we left the DMA enabled and the i2c state machine auto transition to the valid state anyway?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if after we restarted, we change direction? If we skip listen() and check for directions, do we even know that we change direction?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, for repeated start, the client would have call listen() again to get the direction. Nevermind.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this feels a bit weird, is this the expected sequence of events for start -> read -> start -> read

  1. listen() -> Request::Read(addr)
  2. respond_to_read() -> Response::Restarted(n) -> ReadStatus::EarlyStop(n)
  3. listen() -> Request::RepeatedStart(prev_addr)
  4. listen() -> Request::Read(addr)

@jerrysxie jerrysxie requested review from bramsdell-ms and gjpmsft June 5, 2026 01:15
Comment thread src/i2c/slave.rs
// synthesise a `RepeatedStart` edge here — that would
// fabricate a phantom trait-side event with no matching
// upstream Sr on the wire.
i2c.slvctl().write(|w| w.slvnack().nack());

@jerrysxie jerrysxie Jun 5, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slave is sending data to the master, so this nack here does not make sense.

Do we ever hit this branch in testing? With DMA enabled, the slv pending bit is usually never set in my observation.

I left the pending flag check in there previously just as a catch-all in case if something strange ever happens.

Comment thread src/i2c/slave.rs
self.require_address_kind(false)?;

// Drain any queued edge first.
if let Some(req) = self.drain_edge::<SevenBitAddress>(cmd_addr_u8) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this feels a bit weird, is this the expected sequence of events for start -> read -> start -> read

  1. listen() -> Request::Read(addr)
  2. respond_to_read() -> Response::Restarted(n) -> ReadStatus::EarlyStop(n)
  3. listen() -> Request::RepeatedStart(prev_addr)
  4. listen() -> Request::Read(addr)

Long-running soak example targeting the specific master pattern that
surfaces `WriteStatus::Restarted(0)` on the slave trait API:

  master.write_read(addr, &w[..k*S], &mut r[..m])

where `S` is the slave receive-buffer length and the master write
payload is always an exact `k*S` multiple. The slave's first
`respond_to_write` returns `BufferFull(S)`; the caller loops and the
second call returns `Restarted(0)` because the controller has issued Sr
on the wire in between. The existing `i2c-slave-async-target-trait`
example flags that shape as `RACE WATCH` on the suspicion of a HW
race.

This soak tests the alternative hypothesis: `Restarted(0)` is a
buffer-boundary surfacing artifact, not a hardware race. The example:

* Uses the `embedded_mcu_hal::i2c::target::asynch::I2c` trait so the
  reproducer matches the harness that produced the original log.
* Sweeps `k in 1..=4` and `m in 1..=S-1` round-robin (12 unique
  pairs per cycle), back-to-back with no master pacing.
* Wraps every awaited transfer on both tasks in `with_timeout(250 ms)`;
  any timeout panics with a distinct message identifying the call site
  and the offending iteration/parameters so RTT freezes with the wedge
  state captured.
* Refills the slave TX buffer with `t_buf[j] = (j + iter) mod 256`
  each iteration; the master verifies the increment in the read buffer
  and panics on mismatch (catches cross-iteration buffer bleed without
  requiring shared state).
* Counts every terminator branch on the slave (BufferFull, Stopped(0)/n,
  Restarted(0)/n, ReadComplete/NeedMore/EarlyStop, RepeatedStart,
  Probe) and emits a one-line summary every 10 000 iterations.

Pass signal for the hypothesis: many millions of iterations with no
`defmt::panic!` and no read mismatch -> the `Restarted(0)` shape is
benign API surfacing, not a HW race. Any timeout or mismatch promotes
the artifact to a real bug with the failing iteration captured in RTT.

Wiring matches the existing i2c-loopback-async example (FC2 slave on
PIO0_18/PIO0_17 + DMA0_CH4, FC4 master on PIO0_29/PIO0_30 + DMA0_CH9,
Fast mode at 50 percent duty).
Felipe Balbi added 3 commits June 9, 2026 07:37
The async `respond_to_write` and `respond_to_read` entry guards both
required `slvstate` to be `slave_receive` / `slave_transmit` and
treated everything else as `TransferError::ReadFail` /
`TransferError::WriteFail` (the only exception was an entry-time
`slvdesel`, which `respond_to_write` mapped to `Stopped(0)`). That is
correct for the straight-through case where the caller has just
returned from `listen()`, but wrong for the continuation case where the
caller is looping back with a fresh buffer after a prior call returned
`Response::WriteContinued` / `Response::ReadContinued`.

Between the previous call returning and the re-entry being polled the
executor still runs `info!` lines, sets up `with_timeout`, polls other
ready futures, etc. At 400 kHz the controller can finish its remaining
bytes, ACK its last byte, and issue Sr+ADDR+R/W in that gap. When the
re-entry then samples `stat`, `slvstate` is `slave_address` (with
`slvpending` asserted waiting for SW to ACK/NACK the new address phase)
rather than `slave_receive` / `slave_transmit`. The old guard
fabricated a `ReadFail` / `WriteFail`, which the i2c-write-read-
boundary-soak example surfaced as a `defmt::panic!` from its
`Ok(Err(e))` arm — a HardFault visible to probe-rs as
`Firmware exited unexpectedly: Exception` because the defmt panic
message and breadcrumb dump were dropped by defmt-rtt's saturated
host-side buffer before the `udf #0` halt landed.

The blocking flavour was unaffected because it polls inside the
for-loop and naturally `break`s out of the loop when `slave_address`
fires, then the post-loop arm queues the restart edge and returns
`Response::Restarted(xfer_count)`.

Replace both async entry guards with re-entry classification that
mirrors what the post-poll path already emits, so the continuation
case sees uniform `Response` shapes regardless of when the controller
terminated:

* `slvdesel` set — clear the latch and return
  `Response::Stopped(0)` / `Response::ReadEarlyStop(0)`.
* `slvstate == slave_address` — queue
  `EdgeKind::RepeatedStart` and return `Response::Restarted(0)`. The
  next `listen()` then continues the address phase and surfaces the
  new direction normally.
* other non-receive / non-transmit state — keep returning
  `TransferError::ReadFail` / `TransferError::WriteFail`, the caller
  must `recover()`.

`respond_to_read` got the symmetric treatment for completeness; the
current i2c-write-read-boundary-soak example keeps
`READ_LEN_MAX < SLAVE_BUFLEN` so the read leg never triggers
`ReadContinued` and the race is not directly exercised on the read
path, but the same window exists.

Validated on rt685s-evk with the i2c-write-read-boundary-soak example
(FC2 slave ↔ FC4 master loopback, Fast/400 kHz, S=4, k∈[1..=4],
m∈[1..=3]). Pre-fix: HardFault after ~28 minutes of MCU uptime at
slv_iter ≈ 150 000. Post-fix: ran cleanly past 76 minutes of MCU
uptime, slv_iter ≈ 11.2 million, no panic / wedge / mismatch / fault
anywhere in the RTT log.
Yesterday's classify-bus-state fix (286df1f) reduced the mean time
between failures on the i2c-write-read-boundary-soak from ~28 minutes to
~3 hours, but did not eliminate the wedge. The remaining failure mode
was a master-side `WEDGE: master.write_read timed out` that landed with
the slave stuck inside `respond_to_write`'s `poll_fn` await, between
bc[0x18] ("about to await respond_to_write") and the 0x19 ("returned")
that never came.

Root cause is a window-of-vulnerability between the entry guard's
`stat.read()` and the `i2c.slvctl().write(|w| w.slvdma().enabled())`
that arms DMA. The flow inside the async `respond_to_write` is:

  1. entry guard reads `stat` -> sees `slvstate == slave_receive`
  2. enable SLVDMA
  3. install NakGuard / dma_guard
  4. construct the DMA Transfer
  5. await poll_fn

If the controller finishes its current data phase and issues Sr+ADDR+R
between steps 1 and 5, the SLV peripheral transitions out of
`slave_receive` (to `slave_address`, and -- in DMA mode --
auto-acknowledges the address and lands in `slave_transmit`). Per the
mimxrt633s-pac STAT documentation (matching UM11147 25.7.2.2.2),
`SLVPENDING is not set when the DMA is handling an event`. With SLVDMA
now set, this suppression covers the auto-handled address phase, so
`slvpending` does not raise on the bus state change. The DMA channel
also never completes (it is armed for receive into a buffer that will
never receive bytes, the bus having moved past the receive
phase). Neither of the future's wake sources fires, and the await wedges
until its outer `with_timeout` cancels it.

Yesterday's fix only checked the entry guard. The window between
that read and DMA arm -- a few-dozen-clock instruction sequence
under release builds -- is just long enough at 400 kHz to lose the
race against a tight write_read pattern on a fast controller.

This commit adds a second `stat` read AFTER all the guards are installed
but BEFORE awaiting. If the bus has advanced past the expected
direction, drop into the same classification the post-poll path uses:

  * slvdesel set    -> clear the latch, return Stopped(0) /
                       ReadEarlyStop(0)
  * slvstate latched in slave_address -> queue the RepeatedStart
                       edge for any trait caller, return Restarted(0)

The dropped guards (`_transfer`, `_dma_guard`) handle the SLVDMA
teardown and DMA channel abort on the early return; `nak_guard` is
defused explicitly on the write path because we never moved any bytes
and there is no in-flight transmission to NAK.

Validated on rt685s-evk with the i2c-write-read-boundary-soak
example (FC2 slave <-> FC4 master loopback, Fast/400 kHz, S=4, k in
[1..=4], m in [1..=3]). Pre-fix MTBF was a few hours.  Post-fix run
achieved 19h 57min of continuous clean MCU uptime with zero panic /
wedge / mismatch / fault markers across the full rotated log set (~226
MB of RTT output lefover from the 3x 100MiB rotated logs) and zero
entries in the forensic Err-path capture ring before the runner was
stopped for cleanup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BREAKING CHANGE PR causes a breaking change cargo vet PR requiring auditor review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants