UseStorageLock does not throttle failed-message retries across multiple instances — poisoned messages are retried at FailedRetryInterval/replicas cadence #1797

@tamazbagdavadzespacege

Description

Summary

With UseStorageLock = true and N pods of the same service, a message whose subscriber consistently throws is retried roughly every FailedRetryInterval / N seconds instead of once per FailedRetryInterval. With 12 replicas and FailedRetryInterval = 60 we observe retries spaced ~2–15 s apart on the same MessageId, with Retries incrementing 1‑per‑attempt, originating from different pods.
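The FailedRetryInterval / N arithmetic can be illustrated with a toy model. This is only a sketch under stated assumptions, not CAP's actual scheduler: it assumes each replica runs its own retry timer with a random phase offset and that every tick re-executes the failed message (that is, the storage lock does not hold other replicas back). The function names and parameters are hypothetical.

```python
import random


def simulate_retry_gaps(replicas, interval_s, periods, seed=0):
    """Return the gaps (seconds) between consecutive retries of one message.

    Toy model (assumption): every replica fires a timer every `interval_s`
    seconds, each with its own random phase, and every tick retries the message.
    """
    rng = random.Random(seed)
    # Each replica started at a different moment, so its timer has its own phase.
    phases = [rng.uniform(0, interval_s) for _ in range(replicas)]
    # Merge the ticks of all replicas into one cluster-wide timeline.
    ticks = sorted(p + k * interval_s for p in phases for k in range(periods))
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


def mean(values):
    return sum(values) / len(values)


gaps_single = simulate_retry_gaps(replicas=1, interval_s=60, periods=100)
gaps_twelve = simulate_retry_gaps(replicas=12, interval_s=60, periods=100)

print(f"1 replica:   mean gap {mean(gaps_single):.1f} s")
print(f"12 replicas: mean gap {mean(gaps_twelve):.1f} s, "
      f"min {min(gaps_twelve):.1f} s, max {max(gaps_twelve):.1f} s")
```

Under this model the mean gap with 12 replicas comes out close to 60 / 12 = 5 s, while individual gaps vary with the replicas' relative phases, which is in line with the irregular spacing described above.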

Steps to Reproduce

  1. Configure CAP with PostgreSQL + Kafka, defaults, UseStorageLock = true.
  2. Add a subscriber that always throws.
  3. Start ≥ 2 replicas of the same service.
  4. Publish one message to its topic.
  5. Once Retries >= 3 (the in-process inner retry loop is exhausted) and the message's Added timestamp is older than FallbackWindowLookbackSeconds, observe in the logs that the subscriber is invoked roughly every 60 / N seconds rather than every 60 seconds.
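To check step 5 against real logs, the gaps between subscriber invocations for a single MessageId can be computed with a small helper. This is a minimal sketch: the function name is hypothetical, and the timestamps below are made-up placeholders in ISO 8601 form, not actual log output from this report.

```python
from datetime import datetime


def retry_gaps_seconds(timestamps):
    """Return the gaps in seconds between consecutive invocation timestamps."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Placeholder values for illustration only; substitute the timestamps at which
# the subscriber was invoked for one MessageId, collected from all pods.
invocations = [
    "2024-01-01T12:00:00",
    "2024-01-01T12:00:04",
    "2024-01-01T12:00:11",
    "2024-01-01T12:00:13",
]

failed_retry_interval_s = 60
gaps = retry_gaps_seconds(invocations)
too_fast = [g for g in gaps if g < failed_retry_interval_s]

print("gaps (s):", gaps)
print(f"{len(too_fast)} of {len(gaps)} gaps are shorter than FailedRetryInterval")
```

If retries were throttled cluster-wide, no gap would be shorter than FailedRetryInterval; a large share of short gaps indicates the behavior described in this report.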

Expected Behavior

With UseStorageLock = true, a single failed message should be retried at most once per FailedRetryInterval cluster-wide, regardless of replica count. (Or, equivalently, there should be a documented mechanism to achieve this.)

Actual Behavior

No response

Log Output

CAP Configuration

Transport Used

Kafka

Storage Provider

PostgreSQL

Environment

CAP 8.4.1

Additional Context

No response
