Skip to content

Conversation

@rfvirgil
Copy link

This series is to fix three places in the cadence IRQ handling where interrupt evens could be lost.
Two are variations of the same problem that the code cleared interrupts AFTER it handled them. This is not the correct way, because any interrupt that re-triggered while the irq handler function is running would then be cleared at the end of the function without ever having been handled.

The other case is to avoid masking the PING status change interrupt while waiting for the work function to handle it. There is a potential problem here if state changes that happen while masked are lost,

…thread

Clear the CDNS_MCP_INT_RX_WL interrupt before signalling completion.

This is to prevent the possible race:
- completion is signalled.
- waiting thread is immediately scheduled and starts a new message.
- new message completes and CDNS_MCP_INT_RX_WL is asserted.
- reschedule to sdw_cdns_irq() thread, which then clears the new
  interrupt.
- interrupt is never handled.

Before this change, this error message was sometimes seen on kernels
that have large amounts of debugging enabled:

   SCP Msg trf timed out

This error indicates that the completion has not been signalled after
500ms.

Signed-off-by: Richard Fitzgerald <[email protected]>
Fixes: 956baa1 ("soundwire: cdns: Add sdw_master_ops and IO transfer support")
Remove the masking of CDNS_MCP_INT_SLAVE_MASK while the work thread runs.
This is to avoid losing interrupts for state changes that happen while
the work is running.

The code assumed that any status change events that happened while
CDNS_MCP_INT_SLAVE_MASK is masked would be latched and would assert the
main interrupt when CDNS_MCP_INT_SLAVE_MASK is unmasked.

We can avoid relying on this assumption by leaving
CDNS_MCP_INT_SLAVE_MASK unmasked.

There is no need to mask the interrupt while waiting for the work to run.
The kernel will already filter out schedule_work() for work that is
already pending. The worst that can happen is that the work can be
re-queued while it is running, and so will run again when there are no
more status changes. But the core code already has to filter out duplicate
states because sdw_handle_slave_status() is given the current status of
all devices, so some of those will usually be unchanged.

Signed-off-by: Richard Fitzgerald <[email protected]>
Fixes: 4a98a6b ("soundwire: intel/cadence: merge Soundwire interrupt handlers/threads")
Clear interrupts before handling them in sdw_cdns_irq() to prevent
loasing events that re-triggered before they can be handled.

Clearing interrupts at the end of sdw_cdns_irq() is a potential race.
If the event re-triggered while sdw_cdns_irq() is still running it
would be cleared when sdw_cdns_irq() ends, so the irq handler would
not be re-triggered to handle it.

Signed-off-by: Richard Fitzgerald <[email protected]>
Fixes: 2f52a51 ("soundwire: cdns: Add cadence library")
@rfvirgil
Copy link
Author

@bardliao @shumingfan Do you have soak/stress tests you could run on Realtek so we can get more confidence this doesn't cause any new problems?

@bardliao
Copy link
Collaborator

bardliao commented Feb 2, 2026

@rfvirgil There are some Peripheral 0 status: 1 error from https://sof-ci.01.org/linuxpr/PR5659/build10036/devicetest/index.html

[ 466.649563] kernel: soundwire_intel soundwire_intel.link.1: Peripheral 0 status: 1

That means Device 0 is attached. Not sure if it is a real issue.

CDNS_MCP_INT_SLAVE_MASK, 0);

int_status &= ~CDNS_MCP_INT_SLAVE_MASK;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rfvirgil @shumingfan Is it possible that a Peripheral can generate a new event while handling a special event and lead to an infinite loop in a corner case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants