Skip to content

SPOC-484: TRANSDISCARD/SUB_DISABLE: switch from subtransaction replay to read-only dry-run logging#400

Open
danolivo wants to merge 1 commit intomainfrom
spoc-484
Open

SPOC-484: TRANSDISCARD/SUB_DISABLE: switch from subtransaction replay to read-only dry-run logging#400
danolivo wants to merge 1 commit intomainfrom
spoc-484

Conversation

@danolivo
Copy link
Contributor

  • In TRANSDISCARD and SUB_DISABLE modes, replace per-DML subtransaction replay with a single read-only transaction. Each DML handler logs a row to spock.exception_log instead of attempting the operation. Only the record that originally caused the error carries the real error message; other records get an empty string.
  • Identify the failed record by xact_action_counter (saved as failed_action in SpockExceptionLog), which is unique per DML within a transaction.
  • In DISCARD mode, each DML still runs inside a subtransaction; only the failed ones are logged.
  • Add dry_run_logging regression test covering all three exception modes with different breakage methods (absent table, truncated table, deleted row) and an INSERT_EXISTS conflict to verify that dry-run modes leave spock.resolutions empty while DISCARD populates it.

…-run replay

In TRANSDISCARD and SUB_DISABLE modes, the second (retry) pass replays
the transaction as read-only.  Each DML handler logs a row to
spock.exception_log instead of attempting the operation.  The record
that originally caused the error carries the real error message; other
records get an empty string.

The failed record is identified by xact_action_counter (saved as
failed_action in SpockExceptionLog), which is unique per DML within a
transaction.

In DISCARD mode, each DML still runs inside a subtransaction; only the
failed ones are logged.

Add dry_run_logging regression test covering TRANSDISCARD (absent
table), DISCARD (truncated table), and SUB_DISABLE (deleted row), with
INSERT_EXISTS conflict on drl_t1 to verify that dry-run modes leave
spock.resolutions empty while DISCARD populates it.
@danolivo danolivo self-assigned this Mar 24, 2026
@danolivo danolivo added the enhancement New feature or request label Mar 24, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

This pull request enhances exception handling and retry replay logic in the replication system. It adds a failed_action field to track error attribution, implements read-only enforcement during retry scenarios, refactors commit-time control flow, and introduces comprehensive regression tests for exception behavior modes DISCARD, TRANSDISCARD, and SUB_DISABLE.

Changes

Cohort / File(s) Summary
Exception Handler Infrastructure
include/spock_exception_handler.h, src/spock_exception_handler.c
Added failed_action field to SpockExceptionLog struct for error attribution tracking. Updated NULL error message handling to insert empty string instead of "unknown".
Apply Worker Enhancement
src/spock_apply.c
Reworked retry replay logic: enforces read-only mode for TRANSDISCARD/SUB_DISABLE retries, adjusts LSN advancement behavior, refactored commit-time control flow, broadened subscription disable triggering, and implemented operation handlers that skip DML while logging exceptions with proper error message scoping.
Test Coverage
tests/regress/sql/dry_run_logging.sql, tests/regress/sql/replication_set.sql
New comprehensive regression test covering exception behavior modes (DISCARD, TRANSDISCARD, SUB_DISABLE) with multi-table transactional scenarios. Enhanced replication_set test with exception log cleanup and additional validation queries for consistent test behavior.
Build Configuration
Makefile
Added dry_run_logging to the REGRESS test suite.

Poem

🐰 Bouncing through the retry lanes,
Where read-only rules now reign!
Exceptions tracked with action-care,
Replication flows beyond compare.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: switching from subtransaction replay to read-only dry-run logging for TRANSDISCARD/SUB_DISABLE modes, which aligns with the core modifications across all changed files.
Description check ✅ Passed The description accurately explains the key changes: read-only transaction replacement, exception logging instead of DML execution, error message attribution via failed_action, and the addition of regression tests covering multiple exception modes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spoc-484

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/spock_apply.c (1)

2263-2334: ⚠️ Potential issue | 🟠 Major

Potential duplicate exception logging in TRANSDISCARD/SUB_DISABLE mode.

In the SQL handler, for TRANSDISCARD/SUB_DISABLE modes:

  1. Lines 2281-2288 log the exception directly
  2. Then failed is set to false at line 2289
  3. The code falls through to line 2314 where should_log_exception(false) is called
  4. should_log_exception(false) returns true when exception_behaviour == TRANSDISCARD || SUB_DISABLE (lines 310-312)
  5. Lines 2326-2333 log the exception again

This could result in duplicate entries in spock.exception_log for SQL operations in dry-run modes.

🐛 Proposed fix to prevent duplicate logging

Add an early return or skip the second logging block for TRANSDISCARD/SUB_DISABLE:

 		if (exception_behaviour == TRANSDISCARD ||
 			exception_behaviour == SUB_DISABLE)
 		{
 			/*
 			 * TRANSDISCARD and SUB_DISABLE: skip the DDL and log the
 			 * discarded operation directly to spock.exception_log.
 			 */
 			char	   *error_msg =
 				(my_exception_log_index >= 0 &&
 				 xact_action_counter ==
 				 exception_log_ptr[my_exception_log_index].failed_action &&
 				 exception_log_ptr[my_exception_log_index].initial_error_message[0] != '\0') ?
 				exception_log_ptr[my_exception_log_index].initial_error_message :
 				NULL;

 			sql = JsonbToCString(NULL,
 								 &queued_message->message->root, 0);
 			exception_command_counter++;
 			add_entry_to_exception_log(remote_origin_id,
 									   replorigin_session_origin_timestamp,
 									   remote_xid,
 									   0, 0,
 									   NULL, NULL, NULL, NULL,
 									   sql, queued_message->role,
 									   "SQL",
 									   error_msg);
-			failed = false;
+			/* Already logged above; skip the should_log_exception block below */
 		}
 		else
 		{
 			/* DISCARD MODE needs hard way - try block and subtransactions */
 			PG_TRY();
 			{
 				exception_command_counter++;
 				BeginInternalSubTransaction(NULL);
 				handle_sql(queued_message, tx_just_started, &sql);
 			}
 			PG_CATCH();
 			{
 				failed = true;
 				RollbackAndReleaseCurrentSubTransaction();
 				edata = CopyErrorData();
 				xact_had_exception = true;
 			}
 			PG_END_TRY();

 			if (!failed)
 				ReleaseCurrentSubTransaction();
-		}

-		/* Let's create an exception log entry if true. */
-		if (should_log_exception(failed))
-		{
-			/*
-			 * Use current error message if operation failed, otherwise use
-			 * initial_error_message for context (e.g., in DISCARD mode when
-			 * SQL succeeds but we're logging it because of a previous error).
-			 */
-			char	   *error_msg = failed ? edata->message :
-				(my_exception_log_index >= 0 &&
-				 exception_log_ptr[my_exception_log_index].initial_error_message[0] != '\0' ?
-				 exception_log_ptr[my_exception_log_index].initial_error_message : NULL);
-
-			add_entry_to_exception_log(remote_origin_id,
-									   replorigin_session_origin_timestamp,
-									   remote_xid,
-									   0, 0,
-									   NULL, NULL, NULL, NULL,
-									   sql, queued_message->role,
-									   "SQL",
-									   error_msg);
+
+			/* Let's create an exception log entry if true (DISCARD mode only). */
+			if (should_log_exception(failed))
+			{
+				/*
+				 * Use current error message if operation failed, otherwise use
+				 * initial_error_message for context (e.g., in DISCARD mode when
+				 * SQL succeeds but we're logging it because of a previous error).
+				 */
+				char	   *error_msg = failed ? edata->message :
+					(my_exception_log_index >= 0 &&
+					 exception_log_ptr[my_exception_log_index].initial_error_message[0] != '\0' ?
+					 exception_log_ptr[my_exception_log_index].initial_error_message : NULL);
+
+				add_entry_to_exception_log(remote_origin_id,
+										   replorigin_session_origin_timestamp,
+										   remote_xid,
+										   0, 0,
+										   NULL, NULL, NULL, NULL,
+										   sql, queued_message->role,
+										   "SQL",
+										   error_msg);
+			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2263 - 2334, The TRANSDISCARD/SUB_DISABLE
branch logs an exception via add_entry_to_exception_log and then falls through
to the generic logging block which calls should_log_exception and logs again,
causing duplicates; fix by recording that the exception was already logged
(e.g., set a local flag like exception_logged = true after the first
add_entry_to_exception_log in the TRANSDISCARD/SUB_DISABLE branch) and then skip
the later add_entry_to_exception_log call when exception_logged is true (or
check exception_behaviour directly before calling add_entry_to_exception_log in
the final block); reference symbols: exception_behaviour, TRANSDISCARD,
SUB_DISABLE, add_entry_to_exception_log, should_log_exception, failed,
exception_log_ptr, my_exception_log_index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/spock_apply.c`:
- Around line 2263-2334: The TRANSDISCARD/SUB_DISABLE branch logs an exception
via add_entry_to_exception_log and then falls through to the generic logging
block which calls should_log_exception and logs again, causing duplicates; fix
by recording that the exception was already logged (e.g., set a local flag like
exception_logged = true after the first add_entry_to_exception_log in the
TRANSDISCARD/SUB_DISABLE branch) and then skip the later
add_entry_to_exception_log call when exception_logged is true (or check
exception_behaviour directly before calling add_entry_to_exception_log in the
final block); reference symbols: exception_behaviour, TRANSDISCARD, SUB_DISABLE,
add_entry_to_exception_log, should_log_exception, failed, exception_log_ptr,
my_exception_log_index.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ee09d8b3-4d1d-4de2-92e6-415759235dd9

📥 Commits

Reviewing files that changed from the base of the PR and between a116148 and f359f4d.

⛔ Files ignored due to path filters (2)
  • tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (6)
  • Makefile
  • include/spock_exception_handler.h
  • src/spock_apply.c
  • src/spock_exception_handler.c
  • tests/regress/sql/dry_run_logging.sql
  • tests/regress/sql/replication_set.sql

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant