ruvbrain (pi.ruv.io): reclassify route missing, 3 schedulers timing out on /v1/pipeline/optimize, firestore backup invalid #464

@ruvnet

Summary

A routine GCP review on 2026-05-15 found that the pi.ruv.io brain is up, but most of its Cloud Scheduler optimization jobs have been failing for days. The root causes are mixed: one dead route, three jobs racing for the same overloaded endpoint, one unrelated Firestore backup misconfiguration, and one unrelated Cloud Run service that never starts.

Findings

1. brain-reclassify-daily → dead route (NOT_FOUND, every 4h)

Scheduler job brain-reclassify-daily POSTs to https://pi.ruv.io/v1/reclassify, but that route does not exist in crates/mcp-brain-server/src/routes.rs (verified against the router build at lines 325–365 of routes.rs). The job has returned 404 on every run, at least as far back as 2026-05-14. Either restore the handler or delete/pause the scheduler job.
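A minimal sketch for confirming the 404 and then pausing the job. It assumes the job lives in us-central1 (per Environment below) and that a bare POST is enough to exercise the router. DRY_RUN and the run helper are conventions local to this sketch, so by default it only prints the commands.

```shell
# Sketch: confirm the dead route, then pause the scheduler job.
# Assumption: DRY_RUN is local to this sketch; with DRY_RUN=1 (the default)
# commands are printed for review instead of being executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # printed form is for review, not re-quoted for copy-paste
  else
    "$@"
  fi
}

# Print only the HTTP status code returned by a POST to the reclassify route.
check_route() {
  run curl -sS -o /dev/null -w '%{http_code}\n' -X POST "https://pi.ruv.io/v1/reclassify"
}

# Pause a scheduler job so it stops firing against the missing route.
pause_job() {
  run gcloud scheduler jobs pause "$1" --location=us-central1
}

check_route
pause_job brain-reclassify-daily
```

Run with DRY_RUN=0 to execute. Pausing is reversible with gcloud scheduler jobs resume once the handler is restored.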

2. Three schedulers hammer /v1/pipeline/optimize and time out

  • brain-train — every 30m, mix of DEADLINE_EXCEEDED / RESOURCE_EXHAUSTED / UNKNOWN
  • brain-full-optimize — daily 03:00, DEADLINE_EXCEEDED
  • brain-graph — every 6h, DEADLINE_EXCEEDED / UNKNOWN

All three POST to the same endpoint. The endpoint exists (routes.rs:365 → pipeline_optimize), but on revision ruvbrain-00203-brv (deployed 2026-04-14, 2 vCPU / 4Gi) the graph rebuild (22k nodes, 1.2M edges) does not finish within Cloud Scheduler's attempt deadline.

Options:

  • Raise scheduler attemptDeadline on the three jobs (the default for HTTP targets is 3 minutes; the maximum is 30 minutes)
  • Split /v1/pipeline/optimize into work-class-specific paths so train/optimize/graph don't compete
  • Bump Cloud Run memory/CPU and/or set --no-cpu-throttling
  • Move to async job pattern (scheduler kicks, returns 202; worker drains a queue)
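The first option can be sketched as follows, assuming the three jobs are HTTP-target jobs in us-central1. Because the Cloud Run request timeout also caps how long a request may run (its default is 300 seconds), the sketch raises that too. The service's current timeout was not checked here, so treat that step as an assumption. DRY_RUN, DEADLINE, and the helper functions are local to this sketch.

```shell
# Sketch: raise the attempt deadline on the three optimize jobs.
# Assumptions: DRY_RUN and DEADLINE are variables local to this sketch,
# not gcloud settings. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"
DEADLINE="${DEADLINE:-1800s}"   # 30 min, the upper end of the range proposed in this issue

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

raise_deadlines() {
  for job in brain-train brain-full-optimize brain-graph; do
    run gcloud scheduler jobs update http "$job" \
      --location=us-central1 \
      --attempt-deadline="$DEADLINE"
  done
}

# The Cloud Run request timeout must be at least as long as the scheduler
# deadline; --timeout takes seconds, so the trailing "s" is stripped.
raise_run_timeout() {
  run gcloud run services update ruvbrain \
    --region=us-central1 \
    --timeout="${DEADLINE%s}"
}

raise_deadlines
raise_run_timeout
```

One caveat on the design: brain-train fires every 30m, so a 30-minute deadline lets consecutive attempts run back to back or overlap. A shorter deadline for that job, or the async option, may fit better.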

3. firestore-daily-backup / firestore-weekly-backup → INVALID_ARGUMENT

The daily and weekly Firestore exports have failed on every run for at least 5 days (most recent: 2026-05-15 02:00Z). The likely cause is stale arguments (bucket, collectionIds, or output URI). This is not a brain-server problem, but it breaks DR. The fix should be a one-line job redefinition.
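To see which argument is stale, it may help to dump what each job actually sends. This sketch assumes the backup jobs are HTTP-target scheduler jobs, so the request body sits base64-encoded in httpTarget.body. That has not been verified for these jobs. DRY_RUN and the helper names are local to this sketch.

```shell
# Sketch: print the target URI and decoded request body of each backup job.
# Assumption: HTTP-target jobs with a base64-encoded httpTarget.body.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

show_uri() {
  run gcloud scheduler jobs describe "$1" --location=us-central1 \
    --format='value(httpTarget.uri)'
}

show_body() {
  run gcloud scheduler jobs describe "$1" --location=us-central1 \
    --format='value(httpTarget.body)'
}

# Decode a base64 request body read from stdin.
decode_body() {
  base64 -d
}

for job in firestore-daily-backup firestore-weekly-backup; do
  show_uri "$job"
  if [ "$DRY_RUN" = "1" ]; then
    show_body "$job"
  else
    show_body "$job" | decode_body
    echo
  fi
done
```

With DRY_RUN=0, compare the decoded body's bucket, collectionIds, and output URI against what currently exists in the project.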

4. rvagent-learning Cloud Run service won't start

Revision rvagent-learning-00001-kld never bound PORT=8080 within the health-check window. Scheduler job rvagent-learning-daily keeps invoking it. The source isn't in this repo, so the service needs to be traced to its owner (or the scheduler job paused).
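The startup logs of the failing revision are the quickest lead on why the port was never bound. A minimal sketch, assuming the logs are still within retention and the active gcloud project is ruv-dev. The 7-day freshness window and the helper names are arbitrary choices for this sketch.

```shell
# Sketch: pull recent log lines for the revision that never became healthy.
# Assumption: DRY_RUN is local to this sketch; DRY_RUN=1 (default) only prints.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build a Cloud Logging filter that matches a single Cloud Run revision.
revision_filter() {
  printf 'resource.type="cloud_run_revision" AND resource.labels.revision_name="%s"' "$1"
}

run gcloud logging read "$(revision_filter rvagent-learning-00001-kld)" \
  --limit=50 --freshness=7d --format='value(timestamp,textPayload)'
```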

5. Stale brain deployment + paused optimization jobs

  • Latest deployed revision ruvbrain-00203-brv is 31 days old (2026-04-14). Worth a fresh deploy.
  • CLAUDE.md says "7 optimization jobs running", but brain-drift, brain-transfer, brain-attractor, and brain-partition-cache are PAUSED. Either the pause is intentional and CLAUDE.md is stale, or the jobs should be re-enabled. Worth a one-line decision.
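For the paused-jobs decision, a short sketch that lists what is currently paused and, if the decision is to re-enable, resumes the named jobs. It assumes all jobs live in us-central1. DRY_RUN and the helper names are local to this sketch.

```shell
# Sketch: list paused scheduler jobs and optionally resume them.
# Assumption: DRY_RUN is local to this sketch; DRY_RUN=1 (default) only prints.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

list_paused() {
  run gcloud scheduler jobs list --location=us-central1 \
    --filter='state=PAUSED' --format='value(name.basename())'
}

# Resume every job named on the command line.
resume_jobs() {
  for job in "$@"; do
    run gcloud scheduler jobs resume "$job" --location=us-central1
  done
}

list_paused
# Only if the decision is to re-enable:
# resume_jobs brain-drift brain-transfer brain-attractor brain-partition-cache
```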

6. Embedding engine

/v1/status reports embedding_engine=ruvllm::HashEmbedder, embedding_dim=128, embedding_corpus=0. These are hash embeddings, not real semantic vectors. Replacing them is separate (existing) work, but the current state should be documented so consumers know brain_search recall is bounded.

Reproduction

curl -sS https://pi.ruv.io/v1/status | jq '.embedding_engine,.graph_nodes,.graph_edges'
gcloud scheduler jobs list --location=us-central1 \
  --format="table(name.basename(),state,schedule,status.code)"
gcloud logging read 'resource.type="cloud_scheduler_job" AND severity>=ERROR' \
  --limit=20 --freshness=24h --format=json | jq -r '.[].jsonPayload | "\(.targetType) \(.status) \(.url)"'

Proposed fix order (low-risk → high-risk)

  1. Pause brain-reclassify-daily until route is restored or job is deleted (stops 404 spam)
  2. Pause rvagent-learning-daily until owner re-attaches it (stops invoking broken service)
  3. Bump attemptDeadline on brain-train / brain-full-optimize / brain-graph to 20–30 min
  4. Re-deploy ruvbrain from current main so the 31-day staleness goes away
  5. Fix firestore-daily-backup / firestore-weekly-backup arguments
  6. Decide on the paused jobs (re-enable them or remove them from CLAUDE.md)

Environment

  • GCP project: ruv-dev, region: us-central1
  • Cloud Run: ruvbrain rev 00203-brv from gcr.io/ruv-dev/ruvbrain:latest@sha256:9f71c28f...
  • Domain mappings healthy: pi.ruv.io → ruvbrain, mcp.pi.ruv.io → ruvbrain-sse
