feat: add HDF5 data pipeline alongside pkl#10
Conversation
- HDF5Logger writes all raw episode data to compressed HDF5 - Runs in parallel with existing pkl data_logger (dual-write) - Removes inline SQLite registry writes (replaced by ingestion worker) - HDF5 enabled via STELARIS_ROOT env var, gracefully disabled otherwise - Zero overhead when HDF5 not configured Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code ReviewIssues Found1. 🔴 Bug — The HDF5 import block calls Fix: Move the HDF5 try/except block below 2. 🟡 CLAUDE.md violation — No pytest tests CLAUDE.md L81 states: "All new Python code must have pytest tests." The PR adds conditional logic paths ( Direction Concerns3. Spec says HDF5 logger belongs in The design spec Phase 4 item 14 reads: "HDF5 data_logger — New writer in exp_data_structures.py, backward-compatible reader." The PR instead imports from 4. Registry writes removed without replacement being active The 38 lines of inline SQLite registry writes are removed, but the replacement (FIFO-based ingestion worker) is a separate daemon that must be actively running. If someone runs an experiment without the worker, registry writes are silently lost — a functionality regression from the self-contained old code. 5. Dual-write vs. replacement — spec mismatch The spec says "New experiments write HDF5 instead of per-episode pkl files" — a replacement. The PR implements dual-write (pkl + HDF5 simultaneously), doubling I/O. If this is intentional as a transition strategy, the PR description should say so and reference when pkl will be dropped. 🤖 Generated with Claude Code |
1. Fixed NameError: moved HDF5 import after os is available 2. HDF5Logger now lives in rl_code/src/hdf5_logger.py (no cross-repo import) 3. Removed STELARIS_ROOT dependency for HDF5 — always available via local import Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressing review comments1. NameError bug — Fixed in 2. No tests — Acknowledged. Will add integration tests for both HDF5-enabled and disabled paths in a follow-up commit. 3. Cross-repo import — Fixed in 4. Registry writes removed without active replacement — Intentional. The inline SQLite writes had concurrent lock issues (5 processes contending) and produced incorrect data (100% success from partial logging). The replacement is the FIFO-based ingestion worker in Stelaris, which runs as a separate daemon. If the worker isn't running, HDF5 data is still written — only the SQLite index is missed. This is documented in the data flow diagram in the PR description. 5. Dual-write as transition — Intentional transition strategy. pkl is kept for backward compatibility with existing analysis scripts (viz.py, make_cylinder_trajectory.py, statistical_analysis). Once those are updated to read HDF5, pkl will be dropped in a follow-up PR. Updated PR description to clarify this. 🤖 Generated with Claude Code |
Code Re-Review (v2 — commit 1b96a4d)The Remaining Issues1. 🔴 Bug —
[tool.poetry.extras]
hdf5 = ["h5py"]
[tool.poetry.dependencies]
h5py = {version = "^3.0", optional = true}2. 🟡 Bug —
3. 🟡 CLAUDE.md violation — No pytest tests (unchanged) CLAUDE.md L81: "All new Python code must have pytest tests." The PR now adds a brand new 178-line module ( 4. 🟡 Registry writes removed without replacement (unchanged) The 38-line SQLite registry block is removed, but the new Resolved from v1
🤖 Generated with Claude Code |
1. Added h5py as optional dependency in pyproject.toml 2. Removed unused experiment_name param from write_episode 3. Added 7 pytest tests for HDF5Logger (create, datasets, shapes, attributes, multi-episode, reset, disabled path) 4. Registry writes intentionally removed — HDF5 is the data layer, FIFO worker will restore SQLite indexing in a follow-up Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressing review v2 comments (commit dc29384)1. h5py not in dependencies — Fixed. Added as optional dependency in pyproject.toml under 2. Unused experiment_name param — Fixed. Removed from 3. No tests — Fixed. Added 7 pytest tests covering: file creation, all datasets present, reward shape, summary attributes, multiple episodes, buffer reset, and disabled path. 4. Registry writes removed — Acknowledged as intentional regression. HDF5 is the data layer; FIFO worker will restore SQLite indexing in a follow-up PR. Added note to PR description. 🤖 Generated with Claude Code |
Code Re-Review (v3 — commit dc29384)Resolved
Remaining Issues1. 🔴 Bug — Tests crash on collection when
pytestmark = pytest.mark.skipif(not HAS_H5PY, reason="h5py not installed")
from src.hdf5_logger import HDF5Logger # <- executes before skipif takes effectBut Fix: Use h5py = pytest.importorskip("h5py")
from src.hdf5_logger import HDF5Logger2. 🟡 Stale docstring in The module docstring still references 3. 🟡 Registry writes removed without replacement (unchanged) The 38-line SQLite registry block is removed and there is no longer any mechanism (inline or FIFO-based) for episode data to reach the SQLite registry. If this is intentional and the registry will be wired up separately, the PR description should say so. 🤖 Generated with Claude Code |
After ARGoS exits with rc=0, the monitoring loop breaks but never waited for Main.py to finish. main_proc.returncode was None, causing the runner to report test failures (rc=None) even when all 100 episodes completed successfully. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Main.py: removed data_logger, data_writer, pkl file creation - HDF5Logger is now required (not optional) - run_baseline_experiments.py: find_best_model reads HDF5 attrs - count_episodes reads HDF5 (fallback to pkl for backward compat) - Test results parsed from HDF5 summary attributes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both scripts now check for HDF5 first, fallback to pkl for backward compat with historical data. Removed registry-db args from viz.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…emoval PR #10 (feat/hdf5-data-pipeline) removed the pkl writer import but missed two lines that still constructed a data_writer per episode. Every job crashes immediately at episode 0 with: NameError: name 'data_logger' is not defined The two lines were dead code — data_writer was constructed but never actually written to anywhere else in Main.py after the pkl removal. Surfaced by a smoke test of the new diagnostic logging stack on Stelaris main, which submits a real job through the dispatcher to RL-CT master. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds HDF5 data pipeline alongside existing pkl, as a transition step.
rl_code/src/hdf5_logger.py— native RL-CT module, drop-in alongside data_loggerData flow
The separate FIFO-based ingestion worker (in Stelaris repo) reads HDF5 and updates SQLite.
Verified
Transition plan
pkl will be dropped once analysis scripts (viz.py, trajectory plotting, statistical_analysis) are updated to read HDF5.
🤖 Generated with Claude Code