ruvbrain (pi.ruv.io): reclassify route missing, 3 schedulers timing out on /v1/pipeline/optimize, firestore backup invalid

## Summary

A routine GCP review on 2026-05-15 found that the `pi.ruv.io` brain is up, but **most of its Cloud Scheduler optimization jobs have been failing for days**. The root causes are mixed: one dead route, three jobs racing the same overloaded endpoint, one unrelated Firestore backup misconfiguration, and one unrelated Cloud Run service that never starts.

## Findings

### 1. `brain-reclassify-daily` → dead route (NOT_FOUND, every 4h)
Scheduler `brain-reclassify-daily` POSTs to `https://pi.ruv.io/v1/reclassify`, but that route does not exist in `crates/mcp-brain-server/src/routes.rs` (verified against the router build at lines 325–365 of `routes.rs`). The job has returned 404 on every run, at least as far back as 2026-05-14. Either restore the handler or delete/pause the scheduler.
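
A quick way to confirm the dead route from outside is to POST to it and print only the status code. This is a minimal sketch: `probe_post` and `BASE_URL` are hypothetical names, and it assumes the route is reachable without auth headers, as `/v1/status` is.

```bash
# Print only the HTTP status code for a POST to the given path.
# BASE_URL defaults to the public brain endpoint named in this issue.
probe_post() {
  curl -sS -o /dev/null -w '%{http_code}\n' --max-time 30 \
    -X POST "${BASE_URL:-https://pi.ruv.io}$1"
}
```

Running `probe_post /v1/reclassify` should print `404` while the handler is missing. Avoid pointing it at `/v1/pipeline/optimize`, since a POST there starts real work.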

### 2. Three schedulers hammer `/v1/pipeline/optimize` and time out
- `brain-train`         — every 30m, mix of `DEADLINE_EXCEEDED` / `RESOURCE_EXHAUSTED` / `UNKNOWN`
- `brain-full-optimize` — daily 03:00, `DEADLINE_EXCEEDED`
- `brain-graph`         — every 6h, `DEADLINE_EXCEEDED` / `UNKNOWN`

All three POST to the same endpoint. The endpoint exists (`routes.rs:365 → pipeline_optimize`), but on revision `ruvbrain-00203-brv` (deployed 2026-04-14, 2 vCPU / 4Gi) the graph rebuild (22k nodes, 1.2M edges) does not finish within Cloud Scheduler's attempt deadline.

Options:
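- Raise scheduler `attemptDeadline` on the three jobs (the default for HTTP targets is 3 minutes; the maximum is 30 minutes). If the service still uses Cloud Run's default 300 s request timeout, that has to be raised as well.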
- Split `/v1/pipeline/optimize` into work-class-specific paths so train/optimize/graph don't compete
- Bump Cloud Run memory/cpu and/or `--no-cpu-throttling`
- Move to async job pattern (scheduler kicks, returns 202; worker drains a queue)
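
The first option can be sketched as a small loop over the three jobs. The `run` helper, the `bump_deadlines` function, and the `DRY_RUN` switch are illustrative names, not existing tooling; by default the script only prints the commands it would run. The `1800s` value is an assumption (the 30-minute ceiling for HTTP targets).

```bash
LOCATION="${LOCATION:-us-central1}"
DEADLINE="${DEADLINE:-1800s}"   # assumption: 30 min, the maximum for HTTP targets
DRY_RUN="${DRY_RUN:-1}"         # default: print commands instead of running them

# Echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Raise the attempt deadline on the three jobs that share /v1/pipeline/optimize.
bump_deadlines() {
  local job
  for job in brain-train brain-full-optimize brain-graph; do
    run gcloud scheduler jobs update http "$job" \
      --location="$LOCATION" --attempt-deadline="$DEADLINE"
  done
}

bump_deadlines
```

Set `DRY_RUN=0` to apply the change. A longer deadline only hides the contention: the 30-minute `brain-train` cadence can still overlap with the other two jobs, so the split-path or async options remain the structural fix.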

### 3. `firestore-daily-backup` / `firestore-weekly-backup` → `INVALID_ARGUMENT`
Daily and weekly Firestore exports have failed every run for at least 5 days (most recent 2026-05-15 02:00Z). The likely cause is stale arguments (bucket, collectionIds, or output URI). This is not a brain-server problem, but it breaks disaster recovery (DR). Needs a one-line job redefinition.
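
To see which argument is stale, it helps to print what the job actually sends. The sketch below assumes both backups are HTTP-target jobs and that `gcloud` reports the request body base64-encoded; `show_job_target` is a hypothetical helper name. It is read-only.

```bash
# Print the target URI and the decoded request body of a Scheduler HTTP job.
show_job_target() {
  local loc="${LOCATION:-us-central1}"
  gcloud scheduler jobs describe "$1" --location="$loc" \
    --format='value(httpTarget.uri)'
  gcloud scheduler jobs describe "$1" --location="$loc" \
    --format='value(httpTarget.body)' | base64 --decode
  echo   # the decoded body usually has no trailing newline
}
```

Call it as `show_job_target firestore-daily-backup` and compare the bucket, collection IDs, and output URI in the body against what currently exists in the project.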

### 4. `rvagent-learning` Cloud Run service won't start
Revision `rvagent-learning-00001-kld` never bound `PORT=8080` within the health-check window. Scheduler `rvagent-learning-daily` keeps invoking it. Source isn't in this repo — needs to be traced to its owner (or the scheduler paused).
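
Before tracing the owner, the revision's own status and logs usually say why the container never listened on the port. A minimal, read-only sketch follows; `revision_diagnostics` is a hypothetical helper, and the 7-day log window is an arbitrary choice.

```bash
# Show the revision's status conditions, then its most recent log lines.
revision_diagnostics() {
  local rev="$1" region="${REGION:-us-central1}"
  gcloud run revisions describe "$rev" --region="$region" \
    --format='yaml(status.conditions)'
  gcloud logging read \
    "resource.type=\"cloud_run_revision\" AND resource.labels.revision_name=\"$rev\"" \
    --limit=50 --freshness=7d --format='value(timestamp,severity,textPayload)'
}
```

Usage: `revision_diagnostics rvagent-learning-00001-kld`.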

### 5. Stale brain deployment + paused optimization jobs
- Latest deployed revision `ruvbrain-00203-brv` is 31 days old (2026-04-14). Worth a fresh deploy.
- CLAUDE.md says "7 optimization jobs running", but `brain-drift`, `brain-transfer`, `brain-attractor`, and `brain-partition-cache` are PAUSED. Either the pause is intentional and CLAUDE.md is stale, or the jobs should be re-enabled. Worth a one-line decision.

### 6. Embedding engine
`/v1/status` reports `embedding_engine=ruvllm::HashEmbedder`, `embedding_dim=128`, `embedding_corpus=0`. These are hash embeddings, not real semantic vectors. Fixing that is separate (existing) work, but the current state should be documented so consumers know `brain_search` recall is bounded.

## Reproduction

```bash
curl -sS https://pi.ruv.io/v1/status | jq '.embedding_engine,.graph_nodes,.graph_edges'
gcloud scheduler jobs list --location=us-central1 \
  --format="table(name.basename(),state,schedule,status.code)"
gcloud logging read 'resource.type="cloud_scheduler_job" AND severity>=ERROR' \
  --limit=20 --freshness=24h --format=json | jq -r '.[].jsonPayload | "\(.targetType) \(.status) \(.url)"'
```

## Proposed fix order (low-risk → high-risk)

1. Pause `brain-reclassify-daily` until route is restored or job is deleted (stops 404 spam)
2. Pause `rvagent-learning-daily` until owner re-attaches it (stops invoking broken service)
3. Bump `attemptDeadline` on `brain-train` / `brain-full-optimize` / `brain-graph` to 20–30 min
4. Re-deploy `ruvbrain` from current `main` so the 31-day staleness goes away
5. Fix `firestore-daily-backup` / `firestore-weekly-backup` arguments
6. Decide on the paused jobs (re-enable them or remove them from CLAUDE.md)
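
Steps 1 and 2 are reversible and can be sketched as one small script. The `run` helper, `pause_broken_jobs`, and the `DRY_RUN` switch are illustrative names; the script assumes both jobs live in `us-central1` and prints the commands by default instead of executing them.

```bash
LOCATION="${LOCATION:-us-central1}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands without running them

# Echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Pause the two jobs whose targets cannot succeed right now.
pause_broken_jobs() {
  local job
  for job in brain-reclassify-daily rvagent-learning-daily; do
    run gcloud scheduler jobs pause "$job" --location="$LOCATION"
  done
}

pause_broken_jobs
```

Set `DRY_RUN=0` to apply. `gcloud scheduler jobs resume` undoes a pause once the route or the service is back.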

## Environment
- GCP project: `ruv-dev`, region: `us-central1`
- Cloud Run: `ruvbrain` rev `00203-brv` from `gcr.io/ruv-dev/ruvbrain:latest@sha256:9f71c28f...`
- Domain mappings healthy: `pi.ruv.io → ruvbrain`, `mcp.pi.ruv.io → ruvbrain-sse`
