Persist data to Cloud Storage before processing to prevent OOM data loss by jodeleeuw · Pull Request #146 · jspsych/datapipe

jodeleeuw · 2026-03-30T22:56:06Z

Summary

- Writes incoming data to Cloud Storage (pending-data/ prefix) immediately after validation but before heavy processing (metadata, OSF upload).
- If the function OOM-crashes during processing, the data survives in Cloud Storage and can be recovered.
- On successful OSF upload or successful queue, the pending copy is automatically cleaned up.
- Adds a DATA_PERSIST_ERROR message for the (rare) case where the initial persist fails.
- Enables the storage emulator in firebase.json for testing.
- Adds emulator tests verifying the persist/cleanup cycle.

Context

Issue #102 — OOM crashes during metadata or OSF upload processing kill the function instantly, bypassing all catch blocks. The researcher's data in the request body is permanently lost. This change ensures data is safely persisted before any memory-intensive work begins.

How it works

1. After parameter validation and experiment checks, persistPending() writes the raw data to Cloud Storage.
2. Heavy processing continues as before (metadata, token resolution, OSF upload).
3. On success: cleanupPending() removes the temporary file.
4. On queue (existing retry paths): cleanup also runs, since queueUpload writes its own copy.
5. On OOM crash: the pending file survives, and a future recovery process can scan pending-data/ for stale files.
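
The ordering described in the steps above can be illustrated with a minimal sketch. It is not the code from this PR: the PendingStore interface, the InMemoryStore class, handleSubmission, and the heavyProcessing callback are hypothetical stand-ins, and the store is synchronous and in-memory, whereas real Cloud Storage calls would be asynchronous. Only the names persistPending, cleanupPending, and the pending-data/ prefix come from the description. A thrown error stands in for the crash; a real OOM kills the process outright, which is why cleanup must be the last step instead of living in a catch or finally block.

```typescript
// Hypothetical storage interface standing in for Cloud Storage (assumption).
interface PendingStore {
  save(path: string, body: string): void;
  remove(path: string): void;
  list(prefix: string): string[];
}

// In-memory stand-in so the sketch runs without any cloud dependency.
class InMemoryStore implements PendingStore {
  private files: Record<string, string> = {};

  save(path: string, body: string): void {
    this.files[path] = body;
  }

  remove(path: string): void {
    delete this.files[path];
  }

  list(prefix: string): string[] {
    return Object.keys(this.files).filter((p) => p.indexOf(prefix) === 0);
  }
}

const PENDING_PREFIX = "pending-data/";

// Step 1: write the raw payload before any memory-intensive work starts.
function persistPending(store: PendingStore, id: string, rawData: string): string {
  const path = PENDING_PREFIX + id;
  store.save(path, rawData);
  return path;
}

// Steps 3 and 4: remove the temporary copy once the data is safe elsewhere.
function cleanupPending(store: PendingStore, path: string): void {
  store.remove(path);
}

// Hypothetical handler: persist, then process, then clean up.
// If heavyProcessing never returns normally, the pending file stays behind.
function handleSubmission(
  store: PendingStore,
  id: string,
  rawData: string,
  heavyProcessing: (data: string) => void
): void {
  const pendingPath = persistPending(store, id, rawData);
  heavyProcessing(rawData);
  cleanupPending(store, pendingPath);
}
```

In this sketch a submission whose processing completes leaves nothing under pending-data/, while one whose processing is interrupted leaves exactly its own pending file for a later recovery pass.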

Test plan

- Emulator test: pending files are cleaned up after successful upload
- Emulator test: no pending files created for requests that fail before persist step (missing params, inactive experiment)
- Emulator test: multiple submissions don't leave orphaned files
- Verify existing data/metadata emulator tests still pass

🤖 Generated with Claude Code

…t data loss

When the Cloud Function OOM-crashes during metadata processing or OSF upload, the researcher's data payload is lost because no catch block executes. This change writes the data to Cloud Storage immediately after validation, before any heavy processing begins. If the function crashes, the data survives in the pending-data/ prefix and can be recovered. On successful OSF upload (or successful queue), the pending copy is cleaned up.

Also adds the storage emulator config to firebase.json so tests can exercise the persist/cleanup cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds scheduledPendingRecovery that runs every 15 minutes to scan the pending-data/ prefix for stale files (older than 15 min). For each orphaned file, it replays the full processing pipeline: token resolution, metadata processing (if active), and OSF upload. This handles the case where api-data OOM-crashed after persisting but before completing.

Also updates persist-pending to store the full request envelope (including metadataOptions) so the recovery function can replay metadata processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add PUT /endpoint to mock server so OSF upload succeeds in tests
- Update early-persist test to use mock server
- Fix skip-metadata test assertion to check property existence instead of non-empty value (metadata errors return empty string without mock)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The metadata-emulator test already uses port 3000 for its mock server. Use port 3001 with an inline mock server for the early-persist test to avoid port conflicts when Jest runs tests in parallel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The data-emulator test was flaky on CI due to resource contention when running all test files in parallel. Increase the polling timeout from 10s to 30s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of reimplementing OSF upload logic, the recovery function now promotes orphaned pending-data/ files into the existing uploadQueue system. This means recovered data immediately appears in the researcher's dashboard QueuePanel and follows the same retry/download lifecycle as normal upload failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
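
The recovery pass described in the commits above (select pending files older than 15 minutes, then hand them to the upload queue) might look roughly like the following. This is a sketch under stated assumptions, not the PR's implementation: the PendingFile and QueueEntry shapes, the findStale and promoteToQueue functions, and the "crash-recovery" reason label are all hypothetical, and the queue is a plain array in place of the real uploadQueue system.

```typescript
// Hypothetical shape of a listed pending file (assumption).
interface PendingFile {
  path: string;
  createdAtMs: number;
}

// Hypothetical shape of an upload queue entry (assumption).
interface QueueEntry {
  sourcePath: string;
  reason: string;
}

// Staleness threshold taken from the commit message: older than 15 minutes.
const STALE_AFTER_MS = 15 * 60 * 1000;

// A file is stale when it has outlived the window in which a healthy
// request would already have cleaned it up.
function findStale(files: PendingFile[], nowMs: number): PendingFile[] {
  return files.filter((f) => nowMs - f.createdAtMs > STALE_AFTER_MS);
}

// Promote every stale file into the queue and report how many were promoted.
// The "crash-recovery" label is illustrative only.
function promoteToQueue(files: PendingFile[], nowMs: number, queue: QueueEntry[]): number {
  const stale = findStale(files, nowMs);
  stale.forEach((f) => {
    queue.push({ sourcePath: f.path, reason: "crash-recovery" });
  });
  return stale.length;
}
```

Files younger than the threshold are left alone because their original request may still be running; what happens to the pending copy after promotion is not stated in the commit message, so the sketch leaves it out.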

- Expand "Why am I seeing this?" to cover OOM/crash recoveries alongside OSF errors and config issues
- Map raw failure reasons to plain-language descriptions so researchers understand what happened without technical jargon

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Pending uploads are no longer shown in the alert panel. A light text indicator near the header badges shows retry count and next retry time instead, so there is no alarm for things the system handles.
- The full alert panel (FailedUploadsPanel) only appears when uploads have exhausted all retries and the researcher needs to download.
- Failure reasons get their own REASON column instead of tiny text under the filename.
- Replace ATTEMPTS column with AUTO-CLEANUP (time until data expires).
- Add UploadsResolvedNotice: brief success confirmation when all queued uploads complete, so the panel doesn't just vanish.
- Remove error log mixing from the queue table (ErrorPanel handles those separately).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Researchers should be able to see and download queued data files as soon as they appear, not after 30 hours of retries. The panel now shows all entries (pending + failed) in a single table with:

- STATUS column with badge and next retry time for pending items
- REASON column with human-readable failure explanation
- STORED FOR column showing time until auto-cleanup
- Download button available immediately for every entry

The panel uses warning tone for pending items (retries still running) and error tone when all retries are exhausted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
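
The tone rule in the last commit can be sketched as a small pure function. This is one possible reading, not the component code from the PR: the PanelEntry shape, the status values, and the panelTone function are hypothetical, and the sketch assumes that a single entry still retrying is enough to keep the panel in the warning tone.

```typescript
// Hypothetical entry status and shape (assumptions, not the PR's types).
type EntryStatus = "pending" | "failed";

interface PanelEntry {
  filename: string;
  status: EntryStatus;
}

type PanelTone = "warning" | "error";

// Returns null when there is nothing to show, "warning" while at least one
// entry is still being retried, and "error" once every entry has exhausted
// its retries.
function panelTone(entries: PanelEntry[]): PanelTone | null {
  if (entries.length === 0) {
    return null;
  }
  const anyPending = entries.some((e) => e.status === "pending");
  return anyPending ? "warning" : "error";
}
```

Keeping the rule in a pure function like this would let the table rendering and the tone choice be tested separately.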

jodeleeuw and others added 9 commits March 30, 2026 18:55

fix: increase waitForLog timeout to 30s for CI reliability

2d045f1


jodeleeuw merged commit 2873cae into test Mar 31, 2026
1 check passed

jodeleeuw merged 9 commits into test from fix/early-persist-data-loss-prevention
