fix: increase memory limit for data upload functions

The apiData and apiBase64 functions were running with the default 256MiB memory limit, which is insufficient for the Node.js runtime and Firebase SDK baseline (~150MiB) plus the multiple copies of the data payload held in memory during upload. This caused OOM kills that returned 503 responses without CORS headers, leading users to report CORS errors (see #102).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
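A minimal sketch of the corresponding function options, assuming the 2nd-gen `firebase-functions` HTTPS API (the handler body and response are illustrative; the actual DataPipe definitions may differ):

```typescript
// Sketch: raising the memory limit on a 2nd-gen HTTPS function.
// The default is 256MiB; 512MiB leaves headroom for the runtime baseline
// plus payload copies held during upload.
import { onRequest } from "firebase-functions/v2/https";

export const apiData = onRequest(
  { memory: "512MiB" },
  async (req, res) => {
    // ...validate, process, and upload the payload...
    res.status(200).send("ok");
  }
);
```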
perf: skip metadata block when inactive

When an experiment has metadataActive=false, the blockMetadata function was still called, performing unnecessary token decryption, a potential OAuth refresh, and Firestore document-reference creation. This change skips the entire metadata block when metadata is disabled, reducing function execution time and avoiding unnecessary OSF API calls.

Verifies that:
- metadataMessage is empty in the response when metadataActive is false
- no metadata document is created in Firestore when metadataActive is false
- metadata processing is still attempted when metadataActive is true
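The guard described above can be sketched as a single early check before any expensive work; the type and function names here are hypothetical, not DataPipe's actual API:

```typescript
// Sketch (hypothetical shape): gate the metadata block on the experiment
// flag before any token decryption or Firestore access happens.
interface ExperimentConfig {
  metadataActive: boolean;
}

function shouldProcessMetadata(experiment: ExperimentConfig): boolean {
  // When metadata is disabled, skip decryption, OAuth refresh,
  // and document-reference creation entirely.
  return experiment.metadataActive === true;
}
```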
…t data loss

When the Cloud Function OOM-crashes during metadata processing or OSF upload, the researcher's data payload is lost because no catch block executes. This change writes the data to Cloud Storage immediately after validation, before any heavy processing begins. If the function crashes, the data survives under the pending-data/ prefix and can be recovered. On a successful OSF upload (or successful queue), the pending copy is cleaned up.

Also adds the storage emulator config to firebase.json so tests can exercise the persist/cleanup cycle.
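The persist-early/clean-up-late cycle can be sketched with an in-memory stand-in for Cloud Storage; the path scheme mirrors the pending-data/ prefix above, but the function names are illustrative:

```typescript
// Sketch of the persist-early pattern, using a Map as a stand-in for
// Cloud Storage (the real implementation writes to a bucket).
const pendingStore = new Map<string, string>();

function persistPending(sessionId: string, payload: string): string {
  const path = `pending-data/${sessionId}`;
  // Written before any heavy processing, so the payload survives a crash.
  pendingStore.set(path, payload);
  return path;
}

function cleanupPending(path: string): void {
  // Called only after a successful OSF upload (or successful queue).
  pendingStore.delete(path);
}
```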
Adds scheduledPendingRecovery, which runs every 15 minutes and scans the pending-data/ prefix for stale files (older than 15 minutes). For each orphaned file, it replays the full processing pipeline: token resolution, metadata processing (if active), and OSF upload. This handles the case where api-data OOM-crashed after persisting but before completing.

Also updates persist-pending to store the full request envelope (including metadataOptions) so the recovery function can replay metadata processing.
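The staleness check at the heart of the scan can be sketched as a pure filter; the 15-minute threshold mirrors the schedule interval above, while the file shape is an assumption:

```typescript
// Sketch: select pending files old enough to be considered orphaned
// rather than still in flight.
interface PendingFile {
  path: string;
  createdAt: number; // epoch millis
}

const STALE_MS = 15 * 60 * 1000;

function findStaleFiles(files: PendingFile[], now: number): PendingFile[] {
  return files.filter((f) => now - f.createdAt > STALE_MS);
}
```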
- Add PUT /endpoint to the mock server so the OSF upload succeeds in tests
- Update the early-persist test to use the mock server
- Fix the skip-metadata test assertion to check property existence instead of a non-empty value (metadata errors return an empty string without the mock)
The metadata-emulator test already uses port 3000 for its mock server. Use port 3001 with an inline mock server for the early-persist test to avoid port conflicts when Jest runs tests in parallel.
The data-emulator test was flaky on CI due to resource contention when running all test files in parallel. Increase the polling timeout from 10s to 30s.
Instead of reimplementing OSF upload logic, the recovery function now promotes orphaned pending-data/ files into the existing uploadQueue system. This means recovered data immediately appears in the researcher's dashboard QueuePanel and follows the same retry/download lifecycle as normal upload failures.
- Expand "Why am I seeing this?" to cover OOM/crash recoveries alongside OSF errors and config issues
- Map raw failure reasons to plain-language descriptions so researchers understand what happened without technical jargon
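The reason mapping can be sketched as a lookup table with a fallback; the reason codes and wording here are illustrative, not DataPipe's actual values:

```typescript
// Sketch (hypothetical codes): translate raw failure reasons into
// plain-language text for the researcher-facing panel.
const REASON_TEXT: Record<string, string> = {
  OOM_RECOVERY:
    "The server ran out of memory while processing; the data was recovered automatically.",
  OSF_ERROR: "The OSF API rejected the upload.",
  CONFIG_ERROR: "The experiment's OSF configuration is incomplete.",
};

function describeReason(code: string): string {
  // Unknown codes fall back to a generic explanation.
  return REASON_TEXT[code] ?? "An unexpected error occurred during upload.";
}
```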
- Pending uploads are no longer shown in the alert panel. A light text indicator near the header badges shows the retry count and next retry time instead; no alarm for things the system handles.
- The full alert panel (FailedUploadsPanel) only appears when uploads have exhausted all retries and the researcher needs to download.
- Failure reasons get their own REASON column instead of tiny text under the filename.
- Replace the ATTEMPTS column with AUTO-CLEANUP (time until the data expires).
- Add UploadsResolvedNotice: a brief success confirmation when all queued uploads complete, so the panel doesn't just vanish.
- Remove error-log mixing from the queue table (ErrorPanel handles those separately).
Researchers should be able to see and download queued data files as soon as they appear, not after 30 hours of retries. The panel now shows all entries (pending + failed) in a single table with:
- a STATUS column with a badge and next retry time for pending items
- a REASON column with a human-readable failure explanation
- a STORED FOR column showing time until auto-cleanup
- a Download button available immediately for every entry

The panel uses a warning tone for pending items (retries still running) and an error tone when all retries are exhausted.
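The tone rule above can be sketched as a small pure function; the entry shape and tone names are assumptions for illustration:

```typescript
// Sketch: pick the panel tone from the queue contents — warning while
// retries are still running, error once every entry has exhausted them.
interface QueueEntry {
  retriesExhausted: boolean;
}

function panelTone(entries: QueueEntry[]): "warning" | "error" {
  // An empty queue never reaches the panel, so only non-empty lists matter.
  return entries.length > 0 && entries.every((e) => e.retriesExhausted)
    ? "error"
    : "warning";
}
```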
…ention

Persist data to Cloud Storage before processing to prevent OOM data loss
- Clean up the pending file on the metadata failure path (api-data.ts)
- Add early-persist to apiBase64 for OOM crash protection (api-base64.ts)
- Use a Firestore transaction for atomic deduplication in pending recovery
- Use a random port (port 0) in the early-persist test to avoid EADDRINUSE
- Improve the DATA_PERSIST_ERROR message for the live experiment context
Logs process.memoryUsage() at four points during request processing:
- request-received: after body parsing, before any processing
- after-persist: after writing to Cloud Storage
- after-metadata: after metadata processing
- after-osf-upload: after a successful OSF upload

Each log line includes the data payload size, RSS, heap used/total, and external memory. This will help determine which payload sizes approach the 512MiB function memory limit. This instrumentation is temporary and should be removed after testing.
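A sketch of such an instrumentation helper, using the standard Node.js process.memoryUsage() API; the log shape and stage names follow the commit description, but the exact fields are assumed:

```typescript
// Sketch: log memory pressure at a named checkpoint as structured JSON.
// process.memoryUsage() is a standard Node.js API.
function logMemory(stage: string, payloadBytes: number): void {
  const m = process.memoryUsage();
  console.log(
    JSON.stringify({
      stage,        // e.g. "request-received", "after-persist"
      payloadBytes, // size of the data payload for this request
      rss: m.rss,
      heapUsed: m.heapUsed,
      heapTotal: m.heapTotal,
      external: m.external,
    })
  );
}
```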
Ensures each Cloud Function instance handles only one request at a time. This eliminates the risk of concurrent large payloads sharing memory and pushing past the 512MiB limit. The tradeoff (more cold starts under burst traffic) is negligible for DataPipe's usage pattern.
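Assuming the 2nd-gen `firebase-functions` API, the single-request setting is one option alongside the memory limit; handler and names are illustrative:

```typescript
// Sketch: concurrency: 1 pins each instance to a single in-flight request,
// so two large payloads never share one instance's 512MiB.
import { onRequest } from "firebase-functions/v2/https";

export const apiBase64 = onRequest(
  { memory: "512MiB", concurrency: 1 },
  async (req, res) => {
    // ...handle one upload at a time per instance...
    res.status(200).send("ok");
  }
);
```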
The logMemory instrumentation was added to measure OOM thresholds during testing. Results confirmed 512MiB with concurrency:1 is safe for all payloads up to the 32MB Cloud Run limit. Removing before merge to main.
The log increment tests used waitForLog(), which polled Firestore in a loop for up to 30s. Under CI load with parallel test files, the combined time for two requests + two polling cycles often exceeded the 30s Jest timeout, causing flaky failures.

Since writeLog() is awaited inside apiData before the response is sent, the log document is guaranteed to exist by the time saveData() returns. Replace the polling with a simple direct read after a small delay, and remove the now-unused waitForLog helper.
Summary
Test plan
npm run test-ci (in functions/)

🤖 Generated with Claude Code