
feat: Depth perception from mono camera #18

Merged
MJohnson459 merged 31 commits into main from l1-depth-perception
Jul 4, 2026

MJohnson459 commented Jun 29, 2026

Contributor

TL;DR

The robot navigates on a single 2D lidar plane ~13 cm up; everything below it (cables, thresholds, chair/table legs, a robot vacuum) is invisible — exactly what a low robot snags on. This PR turns the existing RGB camera into a PointCloud2 of those low/thin obstacles (/camera_obstacles) via off-board monocular depth, feeding a dedicated Nav2 layer. The lidar stays the primary, fast obstacle+clearing source; this is an additive near-band marker that can never clear a lidar hit — worst case is a spurious mark, never a missed wall.

See it (live, on the robot)

Detection vs. clean floor

Detection (right) vs. raw camera (left). The stool legs, the bin, and a transparent storage box all mark — the transparent box is something the lidar sees straight through. The open floor stays clean: no false positives.

Go-under gate

Go-under height gate. Green (≤ 0.18 m) marks, so Nav2 avoids the legs; red is above the robot's height and passable — it paths through the gap under a seat or tabletop instead of treating the whole object as a wall.
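
The gate described above can be illustrated with a minimal sketch. It is not the node's code: the function name and the constant names are hypothetical, and the values (0.02 m floor threshold, 0.18 m gate, 1.2 m near band) are the ones quoted elsewhere in this PR. It assumes points are already expressed in a leveled base frame.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in a leveled base frame, metres

# Values quoted in this PR; the names are illustrative only.
Z_FLOOR = 0.02    # at or below this height a point counts as floor
Z_GATE = 0.18     # go-under gate: robot height plus margin
RANGE_MAX = 1.2   # near band in which the monocular depth is trusted


def split_by_gate(points: Iterable[Point]) -> Tuple[List[Point], List[Point]]:
    """Return (marking, passable): points that should mark vs. points above the gate."""
    marking: List[Point] = []
    passable: List[Point] = []
    for x, y, z in points:
        horizontal_range = (x * x + y * y) ** 0.5
        if horizontal_range > RANGE_MAX or z <= Z_FLOOR:
            continue  # outside the near band, or floor: neither marks nor colours
        (marking if z <= Z_GATE else passable).append((x, y, z))
    return marking, passable
```

A chair leg at 0.10 m lands in `marking`, while a seat at 0.45 m lands in `passable`, which is why the robot can path between the legs.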

(A third camera-vs-lidar BEV figure was removed in review: it was rendered by an uncommitted script and showed the lidar misaligned — consistent with plotting raw scan coordinates without the scan→base transform, which is yawed 90°. tools/depth_obstacles.py is now the committed, frame-correct generator for its replacement.)

Review guide

Suggested reading order — integration first, then the pipeline core, then supporting pieces:

  1. mote_bringup/config/nav2_params.yaml — the Nav2 integration: a self-clearing VoxelLayer on the local costmap only, separate from the lidar layer, near-band (≤ 1.2 m), max_obstacle_height: 0.18 go-under gate. The comments carry the design rationale; this file is where a config mistake would matter most.
  2. mote_perception/mote_perception/depth_obstacle_node.py — the robot-side node (torch-free): compressed image → depth server → lidar-anchored metric rescale → back-project → ground-level → z/range gates → PointCloud2. Lazy: does zero work (including inference) unless an output is subscribed.
  3. mote_perception/mote_perception/depth_wire.py — the server↔node wire protocol in one shared module: spec, framing helpers, DepthClient, and the rationale for a hand-rolled length-prefixed TCP protocol over gRPC/ROS for this link.
  4. mote_perception/mote_perception/lidar_rescale.py — the metric-scale core: per-frame Theil-Sen affine-in-disparity fit anchored to lidar range returns; holds the last good fit when a scan can't constrain it. The module docstring explains the estimator choice (and why the earlier floor-plane anchor was removed).
  5. mote_perception/mote_perception/ground_projection.py — camera geometry: GroundProjector (shared back-projection via pixel_rays/back_project), floor-plane fit, cloud leveling.
  6. mote_perception/tools/depth_server.py + depth_workstation.sh + the depth pixi feature — the off-board torch server (own env, no-default-feature, never touches the robot/Pi solve) and the one-command bring-up (pixi run depth).
  7. mote_perception/tools/ — offline bag harnesses on shared bag_utils.py: depth_bag_replay (pipeline replay + fit diagnostics), depth_bag_eval (model accuracy/speed vs lidar), depth_obstacles (decision-level overlay + BEV), bag_overlay (geometry sanity), measure_camera_pitch (mount calibration). Dev-only, lower review priority.
  8. mote.rviz, the README.md files, package.xml/setup.py — wiring and docs.

Design choices

  • Relative Depth Anything V2-Small, not a metric model: we refit scale every frame anyway, and the relative model measured both more accurate and faster. (A DA3 server was evaluated and removed — it needed its own Python ≤ 3.13 venv and duplicated the serve loop; depth_bag_eval can still compare any server speaking the wire protocol.)
  • Lidar-anchored metric rescale only: the lidar gives metric truth through a chassis-fixed transform, invariant to body/floor tilt. The floor-plane scale anchor was removed in review — floor gradients and resting pitch shift it, and a second scale path made failures harder to attribute. When lidar can't constrain a frame the last good correction is held; before the first fit the frame is skipped, loudly.
  • Off-board, two processes: a resident torch server + a torch-free rclpy node, so torch never enters the robot/ROS env. The link is a deliberate hand-rolled protocol (depth_wire.py docstring has the trade-off against ROS/gRPC/HTTP).
  • Cloud stamped at capture time, so Nav2 tf-places it correctly and the ~0.6 s latency is absorbed without a speed cap.
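
The "fit, else hold, else skip" policy in the lidar-anchored bullet can be sketched as below. This is an illustration under assumptions, not the code in lidar_rescale.py: `ScaleHolder` and `rescale_depth` are hypothetical names, and the correction is assumed to act on disparity (1/depth) as `a * disparity + b`.

```python
from typing import Optional, Tuple

Fit = Tuple[float, float]  # (a, b): corrected_disparity = a * disparity + b


class ScaleHolder:
    """Keeps the last good lidar fit so an unconstrained frame can reuse it."""

    def __init__(self) -> None:
        self.last_good: Optional[Fit] = None

    def update(self, new_fit: Optional[Fit]) -> Optional[Fit]:
        """Pass None when the scan could not constrain a fit.

        Returns the fit to apply, or None when the frame must be skipped
        (no fit has ever succeeded).
        """
        if new_fit is not None:
            self.last_good = new_fit
        return self.last_good


def rescale_depth(depth_m: float, fit: Fit) -> Optional[float]:
    """Apply the affine-in-disparity correction to one depth sample."""
    if depth_m <= 0.0:
        return None
    a, b = fit
    corrected_disparity = a * (1.0 / depth_m) + b
    # A non-positive disparity would invert or explode the depth: drop it.
    return 1.0 / corrected_disparity if corrected_disparity > 0.0 else None
```

A caller would log a warning whenever `update` returns None, which matches the "skipped, loudly" behaviour described above.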

Measured, not asserted

  • False positives on clean floor ≈ 0, including a bright specular sun-glare floor — the exact case that defeated the earlier classical-CV attempt.
  • Latency (capture→publish) is inference-bound, ~0.63 s on a quiet workstation (queue 140 ms + inference 458 ms + post 41 ms); honestly up to ~2 s under CPU contention (mitigations scoped as follow-ups).
  • Costmap: the live run contributed ~40 camera-only lethal cells in the local costmap (confirmed by toggling the layer) — real marks from obstacles the lidar never saw.

Review hardening (later commits)

Two review passes are folded into the branch:

  • Correctness (3b8f406, f9845ee, c2ec883, 5cade87): forced-lidar mode no longer silently falls back; degenerate fits reject the frame with a warning instead of publishing confident garbage; the floor offset is subtracted after leveling; off-FOV lidar returns can't fold back through the distortion polynomial and bias the fit; one corrupt JPEG no longer kills the inference server; the offline tools run from a clean checkout.
  • Consolidation (6dad7b5, 200de85): the wire protocol moved into one shared module (depth_wire.py) with a DepthClient used everywhere; the unreliable floor-plane scale anchor was deleted outright; back-projection has one implementation (GroundProjector.back_project — one tool's private copy had drifted and ignored distortion); the rosbag2 boilerplate collapsed into tools/bag_utils.py; bag_overlay's lidar overlay now transforms through /tf_static (the scan frame is yawed 90° from base — skipping it was wrong, not approximate); the superseded motion-parallax prototype (bev_motion.py) and the misleading BEV figure were removed.

Risk / rollout

Additive and isolated: local costmap only, near-band, a separate layer that can't clear lidar marks, self-clearing from its own dense frames. Merging changes nothing until pixi run depth is running. Known caveat (documented in the layer config): a phantom mark over open floor with nothing behind it in range gets no clearing ray until the rolling window scrolls past; measured ≈ 0 on clean floor, and spatio_temporal_voxel_layer is the swap-in if it shows up live.

Remaining gate

Developed and validated offline against recorded bags (tools/depth_bag_eval.py accuracy/speed, tools/depth_obstacles.py decision overlays vs lidar) plus the live cross-network run shown above. The one open gate is a live driving run confirming the controller avoids a marked low obstacle (tracked as a follow-up).

Follow-ups

MJohnson459 and others added 9 commits June 29, 2026 17:31
Monocular obstacle-detection spike under mote_perception (offline; no ROS nodes
or deps added yet):
- ground_projection.py: shared pixel<->floor geometry (camera->base via static TF)
- free_space.py: classical appearance floor segmentation (spike — fast but
  false-positive prone under variable lighting)
- depth_rescale.py: robust per-frame metric rescaling of learned mono-depth
  against the known floor plane (RANSAC affine-in-disparity) — the chosen L1
  direction; inlier fraction gates seed contamination
- tools/: offline bag harnesses (geometry overlay, classical/BEV/depth eval,
  segmentation video) for evaluating approaches against recorded bags

Depth + floor-rescale gives ~0.19 m median range vs lidar and is lighting-robust;
findings drove the decision to pursue learned depth off-board.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The L1 obstacle pipeline as two processes so torch stays out of the ROS/robot env:
- tools/depth_server.py: keeps Depth Anything V2 (metric, indoor) resident and
  serves depth over a socket. Runs in a throwaway torch venv on the workstation.
- depth_obstacle_node (rclpy, no torch; runs anywhere): forwards each compressed
  frame to the server, metrically rescales the returned depth against the known
  floor plane (depth_rescale), back-projects, keeps points above z_obstacle
  (default 0.02 m), and publishes /camera_obstacles. The cloud is stamped at image
  capture time so Nav2 places it via tf at the moment it was seen — how the
  off-board (~0.6 s, inference-bound) latency is absorbed without inflation. Lidar
  stays the primary, low-latency obstacle/clearing source; this is a supplementary
  marker for the low/thin things the 2D scan misses.

z_obstacle=0.02 chosen from a floor-noise sweep across the bag: floor height noise
is ~1.5 cm p99, so a threshold below 1.5 cm false-positives on the floor; 2 cm is clean and the
lidar already covers >=6 cm. depth_obstacles.py gains an obstacle-tint overlay.
Validated end-to-end against a recorded bag via ros2 bag play. README documents it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The workstation depth task runs in the default/ROS env, so the nested `pixi run depth-server` inherited a PYTHONPATH pointing at the ROS Python 3.12 site-packages, and the depth env's Python 3.14 then loaded those incompatible numpy C-extensions. Drop PYTHONPATH for the server child only; the ROS node still needs it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Diagnostics for the L1 monocular-depth obstacle pipeline:
- depth_obstacle_node gains a publish_debug param (default true) that publishes
  the rescaled metric depth as a 32FC1 Image (/camera_depth) and the unfiltered,
  floor-inclusive cloud (/camera_cloud_full) for geometry checks.
- mote.rviz: Camera Obstacles + Camera Cloud (full) PointCloud2 displays
  (AxisColor by height) and a Depth image display.
- tools/measure_camera_pitch.py: with the calibration checkerboard laid on the
  floor, runs solvePnP on its plane to read the camera's pitch/roll/height relative
  to the floor it sits on, folding in chassis tilt and local floor slope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The floor-plane rescale fits a narrow near-floor band and extrapolates to far
walls, and depends on the camera->floor angle, which was measured to wander ~1.5
deg across rest positions (floor slope + how the robot rests) — so the obstacle
cloud over-ranged past the walls even stationary. Lidar gives metric range on the
walls themselves through the body-fixed lidar->camera transform, which is immune
to chassis/floor tilt.

LidarDepthRescaler matches each scan return to its camera pixel, samples the model
depth there, and fits the shared affine-in-disparity correction on those pairs.
The node buffers scans and matches the one nearest the image *capture* stamp
(absorbing the ~0.6 s off-board latency). rescale_source = auto|lidar|floor; auto
holds the last good lidar (a,b) when a scan can't constrain it rather than falling
back to the floor fit it replaces. Logs scale source, pair count, and scan dt.
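
The scan-to-image pairing described above can be sketched as follows. The helper name and the tuple layout are assumptions for illustration; only the idea (pick the buffered scan nearest the capture stamp, and reject it if it is too far away in time) comes from the text, and `scan_max_dt` is the tolerance named in a later commit.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Scan = Tuple[float, object]  # (stamp in seconds, scan payload)


def nearest_scan(
    buffer: Deque[Scan], capture_stamp: float, scan_max_dt: float
) -> Optional[Scan]:
    """Return the buffered scan closest in time to the image capture stamp.

    Returns None when the buffer is empty or the closest scan is further
    than scan_max_dt from the capture time.
    """
    snapshot = list(buffer)  # search a fixed copy rather than the live buffer
    if not snapshot:
        return None
    best = min(snapshot, key=lambda scan: abs(scan[0] - capture_stamp))
    if abs(best[0] - capture_stamp) > scan_max_dt:
        return None
    return best
```

Matching against the capture stamp rather than the arrival time is what lets the pairing ignore the off-board inference delay.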

Scale only; the cloud is still back-projected through the level-URDF transform, so
residual pitch can still skew floor-point z-classification — a follow-up plane-fit
on the lidar-scaled floor points will recover that.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The single-threaded executor was monopolized by the ~0.5 s blocking inference in
_on_image, so _on_scan was starved, the scan buffer went stale, and no scan
landed within scan_max_dt of an image's capture stamp — the node fit lidar once
at startup then held that one (bad) (a,b) forever. Run on a MultiThreadedExecutor
with the scan subscription in its own callback group so scans keep buffering
during inference; snapshot the deque when matching (read on the image thread,
appended on the scan thread).

Diagnostics for chasing fit quality: log scan-buffer depth, matched dt, pair
count, and the fitted (a,b); publish the raw pre-rescale model depth
(/camera_depth_raw) next to the rescaled one to separate model noise from a
runaway rescale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Reconcile the lock with the merged pixi.toml so it carries both the depth env
(rebased) and the bag-recorder deps (from main #17).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
MJohnson459 force-pushed the l1-depth-perception branch from 66a032a to d6c1feb on June 29, 2026 16:56
MJohnson459 and others added 20 commits June 30, 2026 09:42
Offline replay of the recorded bags (depth server + the real pairing/fit) showed
the raw model depth is clean every frame, but the lidar affine fit was RANSAC-
bistable: the scan's returns include a large cluster at one range (a wall filling
the view) that supports a near-flat degenerate line (slope ~0, depth inverted)
with inlier support equal to the true line, so the unconstrained fit flipped
between them frame to frame -- ~40% of frames collapsed to inverted/exploded depth,
which is the noise. Good fits had a in [1.6, 3.1]; degenerate ones a in [-0.32, 0.10].

fit_affine_disparity takes optional a_min/disp_floor: when set, RANSAC only scores
physically valid models (slope >= a_min, and a*disp_floor + b > 0 so corrected
disparity can't blow up), and keeps the valid seed if the least-squares refit drifts
off it. Default stays unconstrained, so the floor path is untouched (verified: exact
recovery). LidarDepthRescaler passes a_min=0.5 and rejects a residual invalid fit so
the node holds last-good. On both bags this takes degenerate frames 19/40 and 15/40
-> 0/40 while leaving the good-frame inliers (69%) and median depth (0.64 m) exactly
as the baseline; a pure-scale (b=0) alternative also reached 0 but shifted the median
to 0.74 m, so the 2-DOF constraint won on accuracy.
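
The two validity conditions named above (slope >= a_min, and a*disp_floor + b > 0) reduce to a small predicate. The function name is hypothetical and the real check lives inside the fit routine; this sketch only restates the conditions from the commit message.

```python
def is_physically_valid(a: float, b: float, a_min: float, disp_floor: float) -> bool:
    """True when an affine-in-disparity model (a, b) is physically plausible.

    a >= a_min rules out the near-flat or inverted line; a * disp_floor + b > 0
    keeps the corrected disparity positive at the smallest disparity of
    interest, so the resulting depth cannot blow up or change sign.
    """
    return a >= a_min and a * disp_floor + b > 0.0
```

With a_min=0.5, a slope of 2.5 passes while the degenerate slopes reported above (a in [-0.32, 0.10]) are rejected.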

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Replays a recorded bag through the depth server + the real lidar pairing/fit and
reports per-frame raw depth, pair count/spread, fitted (a,b)/inliers, and rescaled
depth, saving colorized raw/corrected maps. A collapsed fit prints DEGENERATE. This
is the rig that localised the RANSAC bistability; keep it for analysing future
'depth goes to noise' bags without the robot.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
After the a_min constraint (which rejects the a <= ~0 fits), the cloud still flickered on a static scene: the fit was
bimodal between two physically valid lines (a~2.5 median depth 0.58 m, and a~1.1
median 0.95 m) with near-equal inlier support, and the count-RANSAC flipped between
them frame to frame. Offline replay showed the pairs actually follow one line
(near-constant local slope ~1.9); the bug is the inlier-*count* objective, which is
multimodal when the scatter is two-sided -- a steeper or shallower line catches the
same count.

Fit the lidar pairs with Theil-Sen (median of pairwise slopes, then median
intercept) -- a unique, deterministic central estimate. Across the three bags this
cuts the per-frame median-depth std from 0.18 m to 0.04-0.09 m and a-std from 0.66 to
0.14-0.22, with the mean unchanged, so it's stabilizing, not biasing. Theil-Sen is
naturally positive (no inverted line) with intercept ~0 (no blow-up), so the a_min
constraint added last commit is now redundant -- kept only as a defensive reject for a
pathological scan, and the in-RANSAC constraint is reverted. The floor seed keeps
count-RANSAC (its one-sided obstacle rejection is a different need; exact recovery
verified). No temporal smoothing -- Theil-Sen is stable per-frame with zero lag, which
EMA would trade away under motion.
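
For reference, the estimator described above (median of pairwise slopes, then median intercept) fits in a few lines of standard-library Python. This is a generic Theil-Sen sketch, not the project's fit_affine_disparity_theilsen; the function name and the plain-list inputs are assumptions.

```python
from itertools import combinations
from statistics import median
from typing import Sequence, Tuple


def theil_sen(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y ~ a * x + b robustly.

    a is the median of the slopes over all point pairs with distinct x;
    b is the median of the residual intercepts y_i - a * x_i.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x")
    a = median(slopes)
    b = median(yi - a * xi for xi, yi in zip(x, y))
    return a, b
```

Because the result is a median rather than the winner of an inlier count, the same pairs always give the same line, which is the property the commit relies on. The all-pairs loop is quadratic in the number of pairs, which may matter for dense scans.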

Still scale-only and not yet validated on a moving/on-robot bag (occlusion-edge
parallax can exceed Theil-Sen's ~29% breakdown); the cloud is also still the full
image (the Phase-2 over-range/plane-fit gap).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Model-agnostic eval that talks to whatever depth server is on --port (V2 now, a V3
server later), so models are compared by pointing it at each. Per sampled frame it
measures, against the time-nearest lidar scan: held-out AbsRel/RMSE/delta1 after the
best affine alignment (the model's in-band shape/scale fidelity, comparable across
models) and server round-trip latency. Saves three views to inspect by eye -- the
depth map with lidar returns overprinted, a side elevation (range vs height) that
shows whether vertical edges lean into the distance, and a top-down BEV.
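
The three accuracy figures named above have standard definitions in the depth-estimation literature; a minimal sketch follows. It covers only the metrics, not the affine alignment or the held-out split the harness applies first, and the function name is an assumption.

```python
from math import sqrt
from typing import Sequence, Tuple


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float, float]:
    """Return (AbsRel, RMSE, delta1) for predicted vs. reference depths in metres.

    AbsRel = mean(|pred - gt| / gt)
    RMSE   = sqrt(mean((pred - gt)^2))
    delta1 = fraction of samples with max(pred/gt, gt/pred) < 1.25
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0.0 and g > 0.0]
    if not pairs:
        raise ValueError("no valid (positive) depth pairs")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return abs_rel, rmse, delta1
```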

V2 baseline on bag 20260630_103318: ~470 ms/frame (CPU), AbsRel 0.231, delta1 57.5%,
and the side view shows the scene ramping up with range -- i.e. the slant is largely
in the depth itself, motivating the V2-vs-V3 comparison.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
depth_server takes --model and --disparity so the eval harness can compare models
(metric outputs depth; relative/SSI models output disparity -> invert to depth, the
pipeline refits scale either way). depth_bag_replay was still calling
fit_affine_disparity(a_min=...), removed when the lidar path moved to Theil-Sen --
point it at fit_affine_disparity_theilsen.

Finding (rescale-anyway pipeline, 3 bags): relative V2-Small beats V2-Metric-Indoor
on every bag -- delta1 90.6/74.3/76.9% vs 57.5/70.9/72.7%, lower AbsRel/RMSE, ~40 ms
faster, and a stable non-bimodal fit. The metric model's absolute scale is discarded
by our per-frame affine, so it buys nothing and is worse-conditioned. The slant
persists across both, so it is geometric (back-projection/pitch), not the model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
depth_server_da3.py serves Depth Anything 3 over the same socket protocol so the rig
compares it like any other model. DA3 needs Python<=3.13 (the pixi depth env is 3.14),
so it runs in its own uv venv (documented in the file); its export path pulls heavy
3D/video deps (open3d/moviepy/pycolmap) that single-image depth never uses, so those
are stubbed at import -- the actual install is a venv + CPU torch + a few small deps.
Takes --model and --intrinsics (metric variants use a canonical-focal transform).

depth_bag_eval now reports raw (no-rescale) AND affine-aligned accuracy, so 'does a
metric model need rescaling' is answerable directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The node refits a full affine in disparity against lidar every frame, so a metric
model's absolute scale is discarded; measured over 3 bags, relative V2-Small beats
V2-Metric-Indoor on accuracy (aligned delta1 ~91 vs ~57% on one bag, better on all),
is faster, and gives a stable non-bimodal fit. Make it the default MODEL. Relative
models output disparity, so invert to depth by default; --metric passes a metric
model through unchanged. Bare `pixi run depth-server` (used by depth_workstation.sh)
now serves the relative model -- no node change needed.

Note: swapping the model id alone (without the inversion) yields a garbled depth map,
since the node expects depth and the relative model emits disparity -- the inversion
is the missing piece.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Pack each point's camera pixel into the camera_obstacles and
camera_cloud_full clouds as an RGB8 field and switch the RViz displays
to RGB8, so the depth cloud renders in real camera colour rather than a
Z-axis gradient.

Make depth_obstacle_node lazy: skip the off-board inference and all
downstream work unless an output is subscribed, gating each stage (raw
and corrected depth images, full cloud, obstacle cloud) on its own
subscription count.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Fit the floor plane (z = a*x + b*y + c, RANSAC over near-floor points)
each frame and rotate the whole cloud about the camera origin so the
floor is level. This removes the residual camera tilt the level URDF
transform misses (dynamic pitch / floor slope), which previously left a
range-dependent z-ramp: floor points drifted above the obstacle gate
(false positives) and verticals leaned into the distance.

It is a rotation, not a z-only shear: the lean is a rotation of the
cloud about the camera, so it sits in x as well as z; rotating the
normal onto +z straightens verticals and flattens the floor together.
Over-large or unconstrained fits are rejected and the last good rotation
held, so the cloud can't snap flat<->tilted frame to frame.

Add ground_projection.fit_ground_plane + level_rotation (shared, pure),
a plane_fit toggle, and a plane-levelled side view to depth_bag_eval for
before/after inspection. Validated on a synthetic known-tilt cloud
(exact recovery) and a recorded bag (1-3 deg recovered, floor flat after).
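
The leveling step can be illustrated with Rodrigues' formula: given the fitted plane z = a*x + b*y + c, build the rotation that carries the plane normal onto +z. This sketch is not the repository's level_rotation; the signature, the pure-Python matrix type, and the omission of the RANSAC fit and the "reject over-large fits" guard are all simplifications.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def level_rotation(a: float, b: float) -> Matrix:
    """Rotation R such that R @ n = +z, where n is the upward unit normal
    of the plane z = a*x + b*y + c."""
    s = sqrt(a * a + b * b + 1.0)
    nx, ny, nz = -a / s, -b / s, 1.0 / s
    vx, vy = ny, -nx              # rotation axis n x z = (ny, -nx, 0)
    k = 1.0 / (1.0 + nz)          # nz > 0 for any finite a, b, so never singular
    # R = I + K + k * K^2 with K the cross-product matrix of (vx, vy, 0)
    return [
        [1.0 - k * vy * vy, k * vx * vy, vy],
        [k * vx * vy, 1.0 - k * vx * vx, -vx],
        [-vy, vx, 1.0 - k * (vx * vx + vy * vy)],
    ]


def rotate(R: Matrix, p: Sequence[float]) -> Tuple[float, float, float]:
    """Apply R to a 3D point (rotation about the origin)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))
```

After rotation, every point of the fitted plane has the same height, c / sqrt(a^2 + b^2 + 1). That leftover constant is the floor offset a later commit subtracts after leveling.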

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The lidar affine slope a maps the model's disparity onto the true
(lidar) disparity, so its magnitude absorbs the model's arbitrary
disparity units. The pipeline default is now the relative (SSI) V2-Small
model, whose disparity scale puts a valid fit near 0.25-0.5 -- roughly an order
of magnitude below the metric-model fits that a_min=0.5 was picked for. That
threshold rejected every valid relative-model fit, so the node never
obtained a lidar scale and silently fell back to the floor fit (the
over-ranging path lidar anchoring exists to replace).

Drop a_min to 0.05 so it guards only the sign/near-flat degeneracy
(a <= ~0) it was meant to, independent of the model's disparity scale.
Verified on recorded bags: valid fits land a=0.25-0.47 (corrected depth
AbsRel 0.13-0.33, delta1 79-92% vs lidar); with a_min=0.5 all were
rejected, with 0.05 all pass and the cloud is lidar-anchored again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
Wire the /camera_obstacles cloud into a dedicated VoxelLayer on the local
costmap, so the low/near things the single lidar plane misses (cables,
thresholds, chair/table legs, a clothes-horse cross-bar) actually stop
the robot. Until now the cloud was produced but nothing consumed it.

- Clamp the published cloud to the mount's validated near band by
  dropping the node range_max default 3.0 -> 1.2 m. Past ~1.2 m the
  monocular depth compresses into false positives; the near band is both
  where this layer is trusted and what the goal cares about.
- Add camera_layer (VoxelLayer) to the local costmap only, separate from
  the lidar obstacle_layer so the camera can never clear a lidar mark and
  a laggy source never touches the global plan. It marks and clears from
  its own dense observations. sensor_frame pins the clearing-ray origin to
  the camera height (cloud stays leveled base_footprint) so rays descend
  onto the floor instead of rising through the low-obstacle band.

Offline validation on the recorded low-obstacle bags: the near band
(<=1.2 m) detects the dumbbell, clothes-horse legs, lamp base and chair
legs while clean floor marks ~0 points. Live nav2/robot behaviour --
including standstill decay of any phantom over open floor -- is the
documented remaining gate (see README caveat).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
All validation so far is offline against recorded bags. Record the
workstation+Pi co-run, the ordered live checks (camera_layer actually
marks / off-board latency vs transform_tolerance / drive past a low
obstacle / motion-only FP from a stale held level rotation), and note the
bright-glare-floor FP result, so the first robot session doesn't
rediscover them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
The camera obstacle band had no meaningful upper bound (marked up to
0.8 m), so a chair seat or tabletop the robot fits beneath was marked as
an obstacle and the robot would refuse to path under it.

Cap the obstacle band at the robot's height plus a margin. Because the
camera layer is a 3D VoxelLayer, capping the marking height means the
legs of the furniture (which reach the floor) still mark while the seat
or top above the cap does not -- so the robot avoids the legs but drives
through the clear gap between them.

- Nav2 camera_layer max_obstacle_height 0.8 -> 0.18 m (the authoritative
  gate: current ~0.13 m chassis + margin), voxel grid retuned to a 0.20 m
  top (z_resolution 0.025, z_voxels 8). Documented how to reconfigure for
  the ~0.30 m arm build (raise gate + grid; the benefit shrinks there).
- Node: split the obstacle-cloud ceiling (new z_obstacle_max, default
  0.5 m) from the debug-cloud ceiling (z_ceiling stays 1.6 m), so it stops
  streaming points Nav2 discards while the full debug view is unaffected.
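
The voxel-grid retune above is simple arithmetic: the grid top is the vertical resolution times the voxel count, and it should sit at or above the marking gate. The sketch below checks that relation; the helper name is hypothetical and `origin_z` defaulting to 0.0 is an assumption about the layer's configuration.

```python
def voxel_grid_top(z_resolution: float, z_voxels: int, origin_z: float = 0.0) -> float:
    """Height of the top of the voxel grid in metres."""
    return origin_z + z_resolution * z_voxels


# Values from this commit: 8 voxels of 0.025 m give a 0.20 m top,
# which covers the 0.18 m max_obstacle_height gate.
GRID_TOP = voxel_grid_top(0.025, 8)
GATE_COVERED = GRID_TOP >= 0.18
```

For the ~0.30 m arm build mentioned above, the same check would need both the gate and either the resolution or the voxel count raised.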

Verified offline (bag 145232 f1532): marks drop 26984 -> 5060 between a
0.8 m and 0.15 m cap, keeping the floor-reaching chair legs while the
seats/backs above the cap stop marking (go_under_height_gate.png).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
Inference is slower than the camera frame rate, so a deeper image queue
only ever feeds stale frames. Depth 1 drops the backlog and keeps the
published obstacle cloud as fresh as the pipeline allows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
The README's pipeline paragraph still described the old metric model +
floor-plane rescale; update it to the relative model with lidar-anchored
scale, and add live output images.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
Make every degenerate path fail safe and loud instead of publishing a
wrong or silently-empty obstacle cloud:

- rescale_source:=lidar no longer falls back to the floor fit when the
  lidar is unavailable; it skips the frame as the forced mode promises.
- An empty or degenerate floor seed returns inlier fraction 0 (frame
  rejected with a warning) instead of a maximally-confident garbage fit
  that published empty clouds forever.
- _ground_correct now subtracts the fitted floor offset c after
  leveling, so a depth-scale error can't shift the whole floor across
  the 2 cm obstacle threshold.
- Lidar returns outside the pinhole FOV are rejected before projection;
  the distortion polynomial could fold far-off-axis points back into
  image bounds and bias the Theil-Sen scale fit.
- The depth map is resized to the camera_info resolution before any
  consumer indexes it, and a corrupt color frame is skipped instead of
  crashing the callback.
- The server replies with an H=0,W=0 sentinel for a frame it cannot
  process and survives decode/inference/socket errors; previously one
  corrupt JPEG killed the process for the rest of the mission. The node
  validates msg.format (image_transport style, e.g. "rgb8; jpeg
  compressed bgr8") before sending and handles the sentinel without
  waiting out the socket timeout.
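
The off-FOV rejection in the list above can be sketched as a test on the ideal (undistorted) pinhole projection, applied before any distortion model is evaluated. The function name, the argument layout, and the optical-frame convention (+z forward) are assumptions for illustration.

```python
def in_pinhole_fov(
    x: float, y: float, z: float,
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> bool:
    """True when a camera-frame point projects inside the image under the
    ideal pinhole model (no distortion). Assumes +z points forward."""
    if z <= 0.0:
        return False  # behind or on the camera plane
    u = fx * x / z + cx
    v = fy * y / z + cy
    return 0.0 <= u < width and 0.0 <= v < height
```

Only points that pass this test would go on to the distorted projection, so a far-off-axis return never reaches the polynomial that could fold it back into the image.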

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
depth_obstacles.py, bag_overlay.py, and bev_motion.py hardcoded output
paths under an ephemeral per-session scratch directory that no longer
exists, so the committed validation harness could not reproduce its
results. depth_obstacles.py gains argparse with ~/.mote defaults; the
other two derive their output directory from the bag path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The pipeline committed to relative Depth Anything V2-Small (measured
both more accurate and faster); the DA3 server needed its own uv venv
on Python <= 3.13 plus import stubs for unused 3D/video deps, and
duplicated the whole serve loop. depth_bag_eval can still compare
models by pointing at any server that speaks the wire protocol.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
self.grid was stored but never read after building the undistorted
rays; pixels_to_ground NaN-ed invalid points twice; sensor_msgs_py was
declared in package.xml but never imported.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cursor

cursor Bot commented Jul 4, 2026


Bugbot is not enabled for your account, so this pull request was not reviewed.

Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs.

MJohnson459 and others added 2 commits July 4, 2026 12:14
Review feedback: the code had grown as self-contained files with the
common pieces copy-pasted, the floor-plane scale anchor was still wired
in despite testing showing it unreliable (floor gradients and resting
pitch shift it), and the socket protocol existed only as inline code in
three places.

- New depth_wire.py: the wire protocol in one place — the spec, the
  framing helpers used by the server, a DepthClient (persistent
  connection, reconnect, sentinel handling) used by the node and the
  offline tools, and the rationale for a hand-rolled protocol over
  gRPC/ROS for this link. The node sheds all connection code.
- Floor-plane rescale removed (depth_rescale.py deleted): the node is
  lidar-anchored only — fit, else hold the last good correction, else
  skip the frame loudly. One scale path to debug. The Theil-Sen fit and
  the disparity-affine apply move into lidar_rescale.py; the
  rescale_source parameter goes away.
- GroundProjector gains cached pixel_rays() and back_project(), the one
  implementation of back-projection; the node, depth_bag_eval, and
  depth_obstacles all used private copies (depth_obstacles' copy had
  drifted: it ignored the distortion coefficients).
- tools/bag_utils.py: shared bag loading, base transforms, scan
  matching, colorize — previously four private copies of the rosbag2
  boilerplate.
- depth_obstacles.py rewritten server-based on the shared modules: same
  stages and gates as the live node, BEV with both point sets in
  base_footprint. bag_overlay.py's lidar overlay now transforms through
  /tf_static — the scan frame is yawed 90 deg from base, so plotting
  raw scan coordinates was wrong, not approximate.
- bev_motion.py deleted: the classical motion-parallax prototype the
  learned-depth approach replaced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The camera-vs-lidar BEV image was rendered by an ad-hoc script that was
never committed, and the lidar in it does not show a doorway that
should be visible — consistent with plotting raw scan coordinates
without the scan->base transform (the scan frame is yawed 90 deg from
base). Removed rather than defended; depth_obstacles.py is now the
committed, frame-correct generator for a replacement once regenerated
against a live bag.

The README pipeline section drops the floor-fallback description,
points at depth_wire.py for the protocol, and gains a tools inventory.
CLAUDE.md's mote_perception section now describes the L1 pipeline
instead of stopping at L0.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cursor

cursor Bot commented Jul 4, 2026


Bugbot is not enabled for your account, so this pull request was not reviewed.

Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs.

MJohnson459 merged commit 1308bfb into main on Jul 4, 2026
3 checks passed
MJohnson459 deleted the l1-depth-perception branch on July 4, 2026 15:18