soundwire: cadence: Fixes for three interrupt races #5659

rfvirgil · 2026-01-30T13:57:19Z

This series is to fix three places in the cadence IRQ handling where interrupt evens could be lost.
Two are variations of the same problem that the code cleared interrupts AFTER it handled them. This is not the correct way, because any interrupt that re-triggered while the irq handler function is running would then be cleared at the end of the function without ever having been handled.

The other case is to avoid masking the PING status change interrupt while waiting for the work function to handle it. There is a potential problem here if state changes that happen while masked are lost,

…thread Clear the CDNS_MCP_INT_RX_WL interrupt before signalling completion. This is to prevent the possible race: - completion is signalled. - waiting thread is immediately scheduled and starts a new message. - new message completes and CDNS_MCP_INT_RX_WL is asserted. - reschedule to sdw_cdns_irq() thread, which then clears the new interrupt. - interrupt is never handled. Before this change, this error message was sometimes seen on kernels that have large amounts of debugging enabled: SCP Msg trf timed out This error indicates that the completion has not been signalled after 500ms. Signed-off-by: Richard Fitzgerald <[email protected]> Fixes: 956baa1 ("soundwire: cdns: Add sdw_master_ops and IO transfer support")

Remove the masking of CDNS_MCP_INT_SLAVE_MASK while the work thread runs. This is to avoid losing interrupts for state changes that happen while the work is running. The code assumed that any status change events that happened while CDNS_MCP_INT_SLAVE_MASK is masked would be latched and would assert the main interrupt when CDNS_MCP_INT_SLAVE_MASK is unmasked. We can avoid relying on this assumption by leaving CDNS_MCP_INT_SLAVE_MASK unmasked. There is no need to mask the interrupt while waiting for the work to run. The kernel will already filter out schedule_work() for work that is already pending. The worst that can happen is that the work can be re-queued while it is running, and so will run again when there are no more status changes. But the core code already has to filter out duplicate states because sdw_handle_slave_status() is given the current status of all devices, so some of those will usually be unchanged. Signed-off-by: Richard Fitzgerald <[email protected]> Fixes: 4a98a6b ("soundwire: intel/cadence: merge Soundwire interrupt handlers/threads")

Clear interrupts before handling them in sdw_cdns_irq() to prevent loasing events that re-triggered before they can be handled. Clearing interrupts at the end of sdw_cdns_irq() is a potential race. If the event re-triggered while sdw_cdns_irq() is still running it would be cleared when sdw_cdns_irq() ends, so the irq handler would not be re-triggered to handle it. Signed-off-by: Richard Fitzgerald <[email protected]> Fixes: 2f52a51 ("soundwire: cdns: Add cadence library")

rfvirgil · 2026-01-30T16:56:10Z

@bardliao @shumingfan Do you have soak/stress tests you could run on Realtek so we can get more confidence this doesn't cause any new problems?

bardliao · 2026-02-02T03:10:27Z

@rfvirgil There are some Peripheral 0 status: 1 error from https://sof-ci.01.org/linuxpr/PR5659/build10036/devicetest/index.html

[ 466.649563] kernel: soundwire_intel soundwire_intel.link.1: Peripheral 0 status: 1

That means Device 0 is attached. Not sure if it is a real issue.

bardliao · 2026-02-02T06:02:22Z

drivers/soundwire/cadence_master.c

-			     CDNS_MCP_INT_SLAVE_MASK, 0);
-
-		int_status &= ~CDNS_MCP_INT_SLAVE_MASK;
-


@rfvirgil @shumingfan Is it possible that a Peripheral can generate a new event while handling a special event and lead to an infinite loop in a corner case?

rfvirgil added 3 commits January 30, 2026 13:49

bardliao reviewed Feb 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

soundwire: cadence: Fixes for three interrupt races #5659

soundwire: cadence: Fixes for three interrupt races #5659

Uh oh!

rfvirgil commented Jan 30, 2026

Uh oh!

rfvirgil commented Jan 30, 2026

Uh oh!

bardliao commented Feb 2, 2026

Uh oh!

bardliao Feb 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		CDNS_MCP_INT_SLAVE_MASK, 0);

		int_status &= ~CDNS_MCP_INT_SLAVE_MASK;

soundwire: cadence: Fixes for three interrupt races #5659

Are you sure you want to change the base?

soundwire: cadence: Fixes for three interrupt races #5659

Uh oh!

Conversation

rfvirgil commented Jan 30, 2026

Uh oh!

rfvirgil commented Jan 30, 2026

Uh oh!

bardliao commented Feb 2, 2026

Uh oh!

bardliao Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants