Merge test into main by jodeleeuw · Pull Request #147 · jspsych/datapipe

jodeleeuw · 2026-03-31T19:45:59Z

Summary

Add upload queue feature: cache failed OSF uploads in Firestore/Storage and retry automatically via scheduled function
Add QueuePanel dashboard component with download, retry, and ZIP export for queued files
Improve error logging across all OSF upload and metadata failure paths
Extract shared token resolution helper; fix OAuth token refresh fallback to PAT
Skip metadata processing when metadata is inactive (performance optimization)
Increase memory limit for data upload functions to 512MiB
Add Firestore and Storage security rules for uploadQueue
Add emulator tests for upload queue and skip-metadata behavior
Replace fixed sleep with polling in emulator tests for CI reliability

Test plan

Verify upload queue retry works end-to-end with OSF
Confirm QueuePanel displays queued files with download/retry/ZIP functionality
Check that metadata skip optimization doesn't affect active metadata experiments
Run emulator test suite (npm run test-ci in functions/)

🤖 Generated with Claude Code

The apiData and apiBase64 functions were running with the default 256MiB memory limit, which is insufficient for the Node.js runtime + Firebase SDK baseline (~150MiB) plus multiple copies of the data payload held in memory during upload. This caused OOM kills that returned 503 responses without CORS headers, leading users to report CORS errors (see #102).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
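The headroom argument in this commit message can be made concrete with some rough arithmetic. The sketch below is illustrative only: the ~150MiB baseline is taken from the message, while the 32MiB payload size and the helper name `maxPayloadCopies` are assumptions introduced for the example.

```typescript
// Rough memory budget for a single upload request (illustrative arithmetic).
// Assumption: memory use is modeled as a fixed baseline plus whole copies
// of the payload held in memory at the same time.
function maxPayloadCopies(
  limitMiB: number,
  baselineMiB: number,
  payloadMiB: number
): number {
  const headroomMiB = limitMiB - baselineMiB;
  if (headroomMiB <= 0 || payloadMiB <= 0) return 0;
  return Math.floor(headroomMiB / payloadMiB);
}

const BASELINE_MIB = 150; // Node.js runtime + Firebase SDK, per the commit message
const PAYLOAD_MIB = 32; // hypothetical large payload, chosen for illustration

// 256 - 150 = 106 MiB of headroom; 512 - 150 = 362 MiB of headroom.
const copiesAt256 = maxPayloadCopies(256, BASELINE_MIB, PAYLOAD_MIB);
const copiesAt512 = maxPayloadCopies(512, BASELINE_MIB, PAYLOAD_MIB);

console.log({ copiesAt256, copiesAt512 });
```

Under these assumptions the default limit leaves room for only about three simultaneous copies of a large payload, which is plausibly too few once parsing, encoding, and upload buffers are counted.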

fix: increase memory limit for data upload functions

When an experiment has metadataActive=false, the blockMetadata function was still called, performing unnecessary token decryption, potential OAuth refresh, and Firestore document reference creation. This change skips the entire metadata block when metadata is disabled, reducing function execution time and avoiding unnecessary OSF API calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
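A minimal sketch of the guard this commit describes, assuming a simplified handler shape. The `Experiment` and `MetadataResult` types and the `processMetadata` stand-in are hypothetical; only the `metadataActive` flag and the empty `metadataMessage` come from the PR text.

```typescript
interface Experiment {
  metadataActive: boolean;
}

interface MetadataResult {
  metadataMessage: string;
}

// Counts how often the expensive path runs, so the skip is observable.
let metadataCalls = 0;

// Hypothetical stand-in for the real metadata step (token decryption,
// possible OAuth refresh, Firestore reference creation).
function processMetadata(data: string): MetadataResult {
  metadataCalls += 1;
  return { metadataMessage: `processed ${data.length} characters` };
}

function runMetadataStep(exp: Experiment, data: string): MetadataResult {
  // Early return: none of the expensive work happens when metadata is off.
  if (!exp.metadataActive) {
    return { metadataMessage: "" };
  }
  return processMetadata(data);
}

const skippedResult = runMetadataStep({ metadataActive: false }, "a,b\n1,2");
const activeResult = runMetadataStep({ metadataActive: true }, "a,b\n1,2");
console.log(skippedResult, activeResult, metadataCalls);
```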

Verifies that when metadataActive is false:
- metadataMessage is empty in the response
- no metadata document is created in Firestore
Also verifies that metadata processing is still attempted when metadataActive is true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

perf: skip metadata block when inactive

…t data loss

When the Cloud Function OOM-crashes during metadata processing or OSF upload, the researcher's data payload is lost because no catch block executes. This change writes the data to Cloud Storage immediately after validation, before any heavy processing begins. If the function crashes, the data survives in the pending-data/ prefix and can be recovered. On successful OSF upload (or successful queue), the pending copy is cleaned up. Also adds the storage emulator config to firebase.json so tests can exercise the persist/cleanup cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
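The persist-then-process ordering described here can be sketched with an in-memory stand-in for the bucket. `FakeBucket`, `handleUpload`, and the thrown error are assumptions for illustration: a real OOM kill terminates the process without throwing, so the exception only simulates "the code after the upload never runs".

```typescript
// In-memory stand-in for a Cloud Storage bucket (assumption for this sketch).
class FakeBucket {
  private files = new Map<string, string>();
  save(path: string, body: string): void {
    this.files.set(path, body);
  }
  remove(path: string): void {
    this.files.delete(path);
  }
  has(path: string): boolean {
    return this.files.has(path);
  }
}

function handleUpload(
  bucket: FakeBucket,
  id: string,
  data: string,
  upload: (payload: string) => void
): void {
  const pendingPath = `pending-data/${id}`;
  bucket.save(pendingPath, data); // 1. persist before any heavy processing
  upload(data); // 2. heavy work; may never return if the instance dies
  bucket.remove(pendingPath); // 3. clean up only after success
}

const demoBucket = new FakeBucket();

// Successful upload: the pending copy is cleaned up.
handleUpload(demoBucket, "ok-1", "payload", () => undefined);

// Simulated crash: the pending copy survives for later recovery.
try {
  handleUpload(demoBucket, "crash-1", "payload", () => {
    throw new Error("simulated crash");
  });
} catch {
  // In production nothing would catch this; the file is simply left behind.
}

console.log(demoBucket.has("pending-data/ok-1"), demoBucket.has("pending-data/crash-1"));
```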

Adds scheduledPendingRecovery that runs every 15 minutes to scan the pending-data/ prefix for stale files (older than 15 min). For each orphaned file, it replays the full processing pipeline: token resolution, metadata processing (if active), and OSF upload. This handles the case where api-data OOM-crashed after persisting but before completing. Also updates persist-pending to store the full request envelope (including metadataOptions) so the recovery function can replay metadata processing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
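The stale-file scan can be sketched as a pure filter over a file listing. The `PendingFile` shape and the `findStaleFiles` name are hypothetical; the pending-data/ prefix and the 15-minute threshold come from the commit message.

```typescript
interface PendingFile {
  path: string;
  createdAtMs: number; // creation time in epoch milliseconds
}

const STALE_AFTER_MS = 15 * 60 * 1000; // "older than 15 min"

// Returns files under pending-data/ that have been sitting longer than the
// threshold; younger files may still belong to an in-flight request.
function findStaleFiles(files: PendingFile[], nowMs: number): PendingFile[] {
  return files.filter(
    (f) =>
      f.path.startsWith("pending-data/") &&
      nowMs - f.createdAtMs > STALE_AFTER_MS
  );
}

const scanTimeMs = 1_700_000_000_000;
const listing: PendingFile[] = [
  { path: "pending-data/old", createdAtMs: scanTimeMs - 16 * 60 * 1000 },
  { path: "pending-data/fresh", createdAtMs: scanTimeMs - 5 * 60 * 1000 },
  { path: "other/old", createdAtMs: scanTimeMs - 60 * 60 * 1000 },
];

const staleFiles = findStaleFiles(listing, scanTimeMs);
console.log(staleFiles.map((f) => f.path));
```

Keeping the age check separate from storage access makes the threshold easy to test without an emulator.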

- Add PUT /endpoint to mock server so OSF upload succeeds in tests
- Update early-persist test to use mock server
- Fix skip-metadata test assertion to check property existence instead of non-empty value (metadata errors return empty string without mock)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The metadata-emulator test already uses port 3000 for its mock server. Use port 3001 with an inline mock server for the early-persist test to avoid port conflicts when Jest runs tests in parallel.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The data-emulator test was flaky on CI due to resource contention when running all test files in parallel. Increase the polling timeout from 10s to 30s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of reimplementing OSF upload logic, the recovery function now promotes orphaned pending-data/ files into the existing uploadQueue system. This means recovered data immediately appears in the researcher's dashboard QueuePanel and follows the same retry/download lifecycle as normal upload failures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
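One way to picture the promotion step is as an idempotent insert into the queue. In this sketch the queue is an in-memory `Map`; the `QueueEntry` fields and the `"recovered-after-crash"` reason string are hypothetical, not the project's actual schema.

```typescript
interface QueueEntry {
  id: string;
  status: "pending" | "failed";
  reason: string;
}

// Promotes each orphaned file into the queue unless an entry already exists,
// so running recovery twice does not create duplicates.
function promoteOrphans(
  orphanIds: string[],
  queue: Map<string, QueueEntry>
): number {
  let promoted = 0;
  for (const id of orphanIds) {
    if (queue.has(id)) continue; // already queued by an earlier run
    queue.set(id, { id, status: "pending", reason: "recovered-after-crash" });
    promoted += 1;
  }
  return promoted;
}

const demoQueue = new Map<string, QueueEntry>();
const firstRun = promoteOrphans(["file-a", "file-b"], demoQueue);
const secondRun = promoteOrphans(["file-a", "file-b", "file-c"], demoQueue);
console.log({ firstRun, secondRun, queued: demoQueue.size });
```

Because recovered files become ordinary queue entries, the retry and download paths need no special case for crash recoveries.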

- Expand "Why am I seeing this?" to cover OOM/crash recoveries alongside OSF errors and config issues
- Map raw failure reasons to plain-language descriptions so researchers understand what happened without technical jargon
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Pending uploads are no longer shown in the alert panel. A light text indicator near the header badges shows retry count and next retry time instead, so there is no alarm for things the system handles.
- The full alert panel (FailedUploadsPanel) only appears when uploads have exhausted all retries and the researcher needs to download.
- Failure reasons get their own REASON column instead of tiny text under the filename.
- Replace ATTEMPTS column with AUTO-CLEANUP (time until data expires).
- Add UploadsResolvedNotice: brief success confirmation when all queued uploads complete, so the panel doesn't just vanish.
- Remove error log mixing from the queue table (ErrorPanel handles those separately).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Researchers should be able to see and download queued data files as soon as they appear, not after 30 hours of retries. The panel now shows all entries (pending + failed) in a single table with:
- STATUS column with badge and next retry time for pending items
- REASON column with human-readable failure explanation
- STORED FOR column showing time until auto-cleanup
- Download button available immediately for every entry
The panel uses warning tone for pending items (retries still running) and error tone when all retries are exhausted.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
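The tone rule in this message could be expressed as a small pure function. This is a sketch under one reading of the text: the panel is a warning while any entry is still being retried and an error only once every entry has exhausted its retries. The type and function names are hypothetical.

```typescript
type UploadStatus = "pending" | "failed";

interface QueuedUpload {
  file: string;
  status: UploadStatus;
}

type PanelTone = "none" | "warning" | "error";

function panelTone(entries: QueuedUpload[]): PanelTone {
  if (entries.length === 0) return "none"; // nothing queued, nothing to show
  const anyPending = entries.some((e) => e.status === "pending");
  // Retries still running -> warning; all retries exhausted -> error.
  return anyPending ? "warning" : "error";
}

const mixedTone = panelTone([
  { file: "a.csv", status: "pending" },
  { file: "b.csv", status: "failed" },
]);
const exhaustedTone = panelTone([{ file: "b.csv", status: "failed" }]);
console.log(mixedTone, exhaustedTone, panelTone([]));
```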

…ention Persist data to Cloud Storage before processing to prevent OOM data loss

- Clean up pending file on metadata failure path (api-data.ts)
- Add early-persist to apiBase64 for OOM crash protection (api-base64.ts)
- Use Firestore transaction for atomic deduplication in pending recovery
- Use random port (port 0) in early-persist test to avoid EADDRINUSE
- Improve DATA_PERSIST_ERROR message for live experiment context
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Logs process.memoryUsage() at four points during request processing:
- request-received: after body parsing, before any processing
- after-persist: after writing to Cloud Storage
- after-metadata: after metadata processing
- after-osf-upload: after successful OSF upload
Each log line includes data payload size, RSS, heap used/total, and external memory. This will help determine what payload sizes approach the 512MiB function memory limit. This instrumentation is temporary and should be removed after testing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
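A log line of the kind described might be built as follows. This is a sketch, not the removed logMemory helper: the `formatMemoryLog` name, the `[memory]` prefix, and the MiB formatting are assumptions. The `MemorySnapshot` fields match the `rss`, `heapUsed`, `heapTotal`, and `external` properties that Node's `process.memoryUsage()` returns.

```typescript
// Subset of the object returned by Node's process.memoryUsage(), in bytes.
interface MemorySnapshot {
  rss: number;
  heapUsed: number;
  heapTotal: number;
  external: number;
}

const toMiB = (bytes: number): string => (bytes / (1024 * 1024)).toFixed(1);

// Formats one log line for a named stage such as "after-persist".
function formatMemoryLog(
  stage: string,
  payloadBytes: number,
  m: MemorySnapshot
): string {
  return (
    `[memory] ${stage} payload=${toMiB(payloadBytes)}MiB ` +
    `rss=${toMiB(m.rss)}MiB heapUsed=${toMiB(m.heapUsed)}MiB ` +
    `heapTotal=${toMiB(m.heapTotal)}MiB external=${toMiB(m.external)}MiB`
  );
}

const MIB = 1024 * 1024;
const sampleLine = formatMemoryLog("after-persist", 1 * MIB, {
  rss: 200 * MIB,
  heapUsed: 50 * MIB,
  heapTotal: 100 * MIB,
  external: 2 * MIB,
});
console.log(sampleLine);
```

In a Node.js function the third argument would be `process.memoryUsage()` captured at each of the four stages.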

Ensures each Cloud Function instance handles only one request at a time. This eliminates the risk of concurrent large payloads sharing memory and pushing past the 512MiB limit. The tradeoff (more cold starts under burst traffic) is negligible for DataPipe's usage pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The logMemory instrumentation was added to measure OOM thresholds during testing. Results confirmed 512MiB with concurrency:1 is safe for all payloads up to the 32MB Cloud Run limit. Removing before merge to main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The log increment tests used waitForLog(), which polled Firestore in a loop for up to 30s. Under CI load with parallel test files, the combined time for two requests + two polling cycles often exceeded the 30s Jest timeout, causing flaky failures. Since writeLog() is awaited inside apiData before the response is sent, the log document is guaranteed to exist by the time saveData() returns. Replace the polling with a simple direct read after a small delay, and remove the now-unused waitForLog helper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jodeleeuw and others added 18 commits March 30, 2026 16:21

Merge pull request #144 from jspsych/fix/increase-function-memory

3631a34

fix: increase memory limit for data upload functions

ci: run test workflow on PRs against test branch

07996fe

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ci: run test workflow on PRs against test branch

639356b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #145 from jspsych/fix/skip-metadata-when-inactive

cf8c957

perf: skip metadata block when inactive

fix: increase waitForLog timeout to 30s for CI reliability

2d045f1

The data-emulator test was flaky on CI due to resource contention when running all test files in parallel. Increase the polling timeout from 10s to 30s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #146 from jspsych/fix/early-persist-data-loss-prev…

2873cae

…ention Persist data to Cloud Storage before processing to prevent OOM data loss

jodeleeuw mentioned this pull request Apr 1, 2026

Document and address 32MB request body size limit #148

Open

jodeleeuw and others added 4 commits April 1, 2026 09:47

jodeleeuw merged commit 775b299 into main Apr 1, 2026
3 checks passed


jodeleeuw merged 22 commits into main from test
