feat: Depth perception from mono camera by MJohnson459 · Pull Request #18 · ClachDev/Mote

MJohnson459 · 2026-06-29T11:10:37Z

TL;DR

The robot navigates on a single 2D lidar plane ~13 cm up; everything below it (cables, thresholds, chair/table legs, a robot vacuum) is invisible — exactly what a low robot snags on. This PR turns the existing RGB camera into a PointCloud2 of those low/thin obstacles (/camera_obstacles) via off-board monocular depth, feeding a dedicated Nav2 layer. The lidar stays the primary, fast obstacle+clearing source; this is an additive near-band marker that can never clear a lidar hit — worst case is a spurious mark, never a missed wall.

See it (live, on the robot)

Detection (right) vs. raw camera (left). The stool legs, the bin, and a transparent storage box all mark — the transparent box is something the lidar sees straight through. The open floor stays clean: no false positives.

Go-under height gate. Green (≤ 0.18 m) marks, so Nav2 avoids the legs; red is above the robot's height and passable — it paths through the gap under a seat or tabletop instead of treating the whole object as a wall.

(A third camera-vs-lidar BEV figure was removed in review: it was rendered by an uncommitted script and showed the lidar misaligned — consistent with plotting raw scan coordinates without the scan→base transform, which is yawed 90°. tools/depth_obstacles.py is now the committed, frame-correct generator for its replacement.)

Review guide

Suggested reading order — integration first, then the pipeline core, then supporting pieces:

mote_bringup/config/nav2_params.yaml — the Nav2 integration: a self-clearing VoxelLayer on the local costmap only, separate from the lidar layer, near-band (≤ 1.2 m), max_obstacle_height: 0.18 go-under gate. The comments carry the design rationale; this file is where a config mistake would matter most.
mote_perception/mote_perception/depth_obstacle_node.py — the robot-side node (torch-free): compressed image → depth server → lidar-anchored metric rescale → back-project → ground-level → z/range gates → PointCloud2. Lazy: does zero work (including inference) unless an output is subscribed.
mote_perception/mote_perception/depth_wire.py — the server↔node wire protocol in one shared module: spec, framing helpers, DepthClient, and the rationale for a hand-rolled length-prefixed TCP protocol over gRPC/ROS for this link.
mote_perception/mote_perception/lidar_rescale.py — the metric-scale core: per-frame Theil-Sen affine-in-disparity fit anchored to lidar range returns; holds the last good fit when a scan can't constrain it. The module docstring explains the estimator choice (and why the earlier floor-plane anchor was removed).
mote_perception/mote_perception/ground_projection.py — camera geometry: GroundProjector (shared back-projection via pixel_rays/back_project), floor-plane fit, cloud leveling.
mote_perception/tools/depth_server.py + depth_workstation.sh + the depth pixi feature — the off-board torch server (own env, no-default-feature, never touches the robot/Pi solve) and the one-command bring-up (pixi run depth).
mote_perception/tools/ — offline bag harnesses on shared bag_utils.py: depth_bag_replay (pipeline replay + fit diagnostics), depth_bag_eval (model accuracy/speed vs lidar), depth_obstacles (decision-level overlay + BEV), bag_overlay (geometry sanity), measure_camera_pitch (mount calibration). Dev-only, lower review priority.
mote.rviz, the README.md files, package.xml/setup.py — wiring and docs.
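The node stages listed above (back-project, then z/range gates) can be sketched roughly as follows. This is a minimal illustration, not the code in depth_obstacle_node.py or GroundProjector: it assumes an ideal pinhole camera with no distortion, a level forward-facing mount, and hypothetical helper names (`back_project`, `to_base`, `gate_obstacles`). The gate defaults mirror the values quoted in the commit messages (z_obstacle 0.02 m, z_obstacle_max 0.5 m, range_max 1.2 m).

```python
import numpy as np

# Assumed rotation from the optical frame (x right, y down, z forward) to the
# base frame (x forward, y left, z up) for a level, forward-facing camera.
R_BASE_FROM_OPTICAL = np.array([[0.0, 0.0, 1.0],
                                [-1.0, 0.0, 0.0],
                                [0.0, -1.0, 0.0]])


def back_project(depth, fx, fy, cx, cy):
    """Pinhole back-projection: (H, W) metric depth -> (H*W, 3) optical-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def to_base(points_optical, cam_height):
    """Rotate into the base frame and lift by the (hypothetical) camera height."""
    return points_optical @ R_BASE_FROM_OPTICAL.T + np.array([0.0, 0.0, cam_height])


def gate_obstacles(points_base, z_min=0.02, z_max=0.5, range_max=1.2):
    """Keep points above the floor threshold, below the ceiling, and within range."""
    ground_range = np.hypot(points_base[:, 0], points_base[:, 1])
    keep = (
        (points_base[:, 2] > z_min)
        & (points_base[:, 2] <= z_max)
        & (ground_range <= range_max)
    )
    return points_base[keep]
```

In the real pipeline the metric rescale and ground leveling sit between these two steps, and the camera-to-base transform comes from TF rather than a constant.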

Design choices

Relative Depth Anything V2-Small, not a metric model: we refit scale every frame anyway, and the relative model measured both more accurate and faster. (A DA3 server was evaluated and removed — it needed its own Python ≤ 3.13 venv and duplicated the serve loop; depth_bag_eval can still compare any server speaking the wire protocol.)
Lidar-anchored metric rescale only: the lidar gives metric truth through a chassis-fixed transform, invariant to body/floor tilt. The floor-plane scale anchor was removed in review — floor gradients and resting pitch shift it, and a second scale path made failures harder to attribute. When lidar can't constrain a frame the last good correction is held; before the first fit the frame is skipped, loudly.
Off-board, two processes: a resident torch server + a torch-free rclpy node, so torch never enters the robot/ROS env. The link is a deliberate hand-rolled protocol (depth_wire.py docstring has the trade-off against ROS/gRPC/HTTP).
Cloud stamped at capture time, so Nav2 tf-places it correctly and the ~0.6 s latency is absorbed without a speed cap.

Measured, not asserted

False positives on clean floor ≈ 0, including a bright specular sun-glare floor — the exact case that defeated the earlier classical-CV attempt.
Latency (capture→publish) is inference-bound, ~0.63 s on a quiet workstation (queue 140 ms + inference 458 ms + post 41 ms); honestly up to ~2 s under CPU contention (mitigations scoped as follow-ups).
Costmap: the live run contributed ~40 camera-only lethal cells in the local costmap (confirmed by toggling the layer) — real marks from obstacles the lidar never saw.

Review hardening (later commits)

Two review passes are folded into the branch:

Correctness (3b8f406, f9845ee, c2ec883, 5cade87): forced-lidar mode no longer silently falls back; degenerate fits reject the frame with a warning instead of publishing confident garbage; the floor offset is subtracted after leveling; off-FOV lidar returns can't fold back through the distortion polynomial and bias the fit; one corrupt JPEG no longer kills the inference server; the offline tools run from a clean checkout.
Consolidation (6dad7b5, 200de85): the wire protocol moved into one shared module (depth_wire.py) with a DepthClient used everywhere; the unreliable floor-plane scale anchor was deleted outright; back-projection has one implementation (GroundProjector.back_project — one tool's private copy had drifted and ignored distortion); the rosbag2 boilerplate collapsed into tools/bag_utils.py; bag_overlay's lidar overlay now transforms through /tf_static (the scan frame is yawed 90° from base — skipping it was wrong, not approximate); the superseded motion-parallax prototype (bev_motion.py) and the misleading BEV figure were removed.

Risk / rollout

Additive and isolated: local costmap only, near-band, a separate layer that can't clear lidar marks, self-clearing from its own dense frames. Merging changes nothing until pixi run depth is running. Known caveat (documented in the layer config): a phantom mark over open floor with nothing behind it in range gets no clearing ray until the rolling window scrolls past; measured ≈ 0 on clean floor, and spatio_temporal_voxel_layer is the swap-in if it shows up live.

Remaining gate

Developed and validated offline against recorded bags (tools/depth_bag_eval.py accuracy/speed, tools/depth_obstacles.py decision overlays vs lidar) plus the live cross-network run shown above. The one open gate is a live driving run confirming the controller avoids a marked low obstacle (tracked as a follow-up).

Follow-ups

Research: evaluate replacing v4l2_camera with usb_cam #19 — evaluate replacing v4l2_camera with usb_cam (framerate + MJPEG offload).
Research: SfM / multi-view depth as an improvement or replacement for single-image Depth Anything #21 — research SfM / multi-view depth (odometry gives a metric baseline) as an improvement or replacement for single-image Depth Anything — would remove the per-frame scale refit entirely.
Regenerate the camera-vs-lidar BEV figure with tools/depth_obstacles.py from a live bag.
Contention hardening: cap torch.set_num_threads, raise the node socket_timeout (deepen the scan buffer in tandem).
Measure the model input-resolution vs. inference-time tradeoff.

Monocular obstacle-detection spike under mote_perception (offline; no ROS nodes or deps added yet):

- ground_projection.py: shared pixel<->floor geometry (camera->base via static TF)
- free_space.py: classical appearance floor segmentation (spike: fast but false-positive prone under variable lighting)
- depth_rescale.py: robust per-frame metric rescaling of learned mono-depth against the known floor plane (RANSAC affine-in-disparity), the chosen L1 direction; inlier fraction gates seed contamination
- tools/: offline bag harnesses (geometry overlay, classical/BEV/depth eval, segmentation video) for evaluating approaches against recorded bags

Depth + floor-rescale gives ~0.19 m median range vs lidar and is lighting-robust; findings drove the decision to pursue learned depth off-board.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The L1 obstacle pipeline as two processes so torch stays out of the ROS/robot env:

- tools/depth_server.py: keeps Depth Anything V2 (metric, indoor) resident and serves depth over a socket. Runs in a throwaway torch venv on the workstation.
- depth_obstacle_node (rclpy, no torch; runs anywhere): forwards each compressed frame to the server, metrically rescales the returned depth against the known floor plane (depth_rescale), back-projects, keeps points above z_obstacle (default 0.02 m), and publishes /camera_obstacles.

The cloud is stamped at image capture time so Nav2 places it via tf at the moment it was seen; this is how the off-board (~0.6 s, inference-bound) latency is absorbed without inflation. Lidar stays the primary, low-latency obstacle/clearing source; this is a supplementary marker for the low/thin things the 2D scan misses.

z_obstacle=0.02 was chosen from a floor-noise sweep across the bag: floor height noise is ~1.5 cm p99, so a threshold below 1.5 cm false-positives on the floor; 2 cm is clean and the lidar already covers >=6 cm.

depth_obstacles.py gains an obstacle-tint overlay. Validated end-to-end against a recorded bag via ros2 bag play. README documents it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The workstation depth task runs in the default/ROS env, so the nested `pixi run depth-server` inherited a PYTHONPATH pointing at the ROS Python 3.12 site-packages, and the depth env's Python 3.14 then loaded those incompatible numpy C-extensions. Drop PYTHONPATH for the server child only; the ROS node still needs it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

Diagnostics for the L1 monocular-depth obstacle pipeline:

- depth_obstacle_node gains a publish_debug param (default true) that publishes the rescaled metric depth as a 32FC1 Image (/camera_depth) and the unfiltered, floor-inclusive cloud (/camera_cloud_full) for geometry checks.
- mote.rviz: Camera Obstacles + Camera Cloud (full) PointCloud2 displays (AxisColor by height) and a Depth image display.
- tools/measure_camera_pitch.py: lays the calibration checkerboard on the floor and solvePnPs its plane to read the camera's pitch/roll/height relative to the floor it sits on, folding in chassis tilt and local floor slope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

The floor-plane rescale fits a narrow near-floor band and extrapolates to far walls, and depends on the camera->floor angle, which was measured to wander ~1.5 deg across rest positions (floor slope + how the robot rests), so the obstacle cloud over-ranged past the walls even when stationary. Lidar gives metric range on the walls themselves through the body-fixed lidar->camera transform, which is immune to chassis/floor tilt.

LidarDepthRescaler matches each scan return to its camera pixel, samples the model depth there, and fits the shared affine-in-disparity correction on those pairs. The node buffers scans and matches the one nearest the image *capture* stamp (absorbing the ~0.6 s off-board latency).

rescale_source = auto|lidar|floor; auto holds the last good lidar (a,b) when a scan can't constrain it rather than falling back to the floor fit it replaces. Logs scale source, pair count, and scan dt.

Scale only; the cloud is still back-projected through the level-URDF transform, so residual pitch can still skew floor-point z-classification. A follow-up plane-fit on the lidar-scaled floor points will recover that.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

The single-threaded executor was monopolized by the ~0.5 s blocking inference in _on_image, so _on_scan was starved, the scan buffer went stale, and no scan landed within scan_max_dt of an image's capture stamp — the node fit lidar once at startup then held that one (bad) (a,b) forever. Run on a MultiThreadedExecutor with the scan subscription in its own callback group so scans keep buffering during inference; snapshot the deque when matching (read on the image thread, appended on the scan thread). Diagnostics for chasing fit quality: log scan-buffer depth, matched dt, pair count, and the fitted (a,b); publish the raw pre-rescale model depth (/camera_depth_raw) next to the rescaled one to separate model noise from a runaway rescale. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
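The scan buffering and capture-stamp matching described in these two commits can be sketched as below. This is an illustrative stand-in, not the node's code: `nearest_scan` is a hypothetical name, stamps are plain float seconds, and only `scan_max_dt` is taken from the commit message. Copying the deque before searching corresponds to the "snapshot the deque when matching" step, since the scan thread may append while the image thread reads.

```python
from collections import deque


def nearest_scan(scan_buffer, capture_stamp, scan_max_dt):
    """Return the (stamp, scan) pair closest to capture_stamp, or None.

    None means no buffered scan lies within scan_max_dt seconds, in which
    case the caller would hold the last good fit instead of refitting.
    """
    snapshot = list(scan_buffer)  # copy so concurrent appends cannot disturb the search
    if not snapshot:
        return None
    best = min(snapshot, key=lambda item: abs(item[0] - capture_stamp))
    if abs(best[0] - capture_stamp) > scan_max_dt:
        return None
    return best


# Usage: a bounded buffer of (stamp_seconds, scan) pairs filled by the scan callback.
scans = deque(maxlen=50)
scans.append((10.0, "scan-a"))
scans.append((10.1, "scan-b"))
scans.append((10.2, "scan-c"))
match = nearest_scan(scans, capture_stamp=10.12, scan_max_dt=0.05)
```

The buffer has to be deep enough to span the off-board latency, otherwise the scan nearest the capture stamp has already been evicted by the time the depth map returns.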

Reconcile the lock with the merged pixi.toml so it carries both the depth env (rebased) and the bag-recorder deps (from main #17). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

Offline replay of the recorded bags (depth server + the real pairing/fit) showed the raw model depth is clean every frame, but the lidar affine fit was RANSAC-bistable: the scan's returns include a large cluster at one range (a wall filling the view) that supports a near-flat degenerate line (slope ~0, depth inverted) with inlier support equal to the true line, so the unconstrained fit flipped between them frame to frame -- ~40% of frames collapsed to inverted/exploded depth, which is the noise. Good fits had a in [1.6, 3.1]; degenerate ones a in [-0.32, 0.10].

fit_affine_disparity takes optional a_min/disp_floor: when set, RANSAC only scores physically valid models (slope >= a_min, and a*disp_floor + b > 0 so corrected disparity can't blow up), and keeps the valid seed if the least-squares refit drifts off it. Default stays unconstrained, so the floor path is untouched (verified: exact recovery). LidarDepthRescaler passes a_min=0.5 and rejects a residual invalid fit so the node holds last-good.

On both bags this takes degenerate frames 19/40 and 15/40 -> 0/40 while leaving the good-frame inliers (69%) and median depth (0.64 m) exactly as the baseline; a pure-scale (b=0) alternative also reached 0 but shifted the median to 0.74 m, so the 2-DOF constraint won on accuracy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

Replays a recorded bag through the depth server + the real lidar pairing/fit and reports per-frame raw depth, pair count/spread, fitted (a,b)/inliers, and rescaled depth, saving colorized raw/corrected maps. A collapsed fit prints DEGENERATE. This is the rig that localised the RANSAC bistability; keep it for analysing future 'depth goes to noise' bags without the robot. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

After the a<0 constraint, the cloud still flickered on a static scene: the fit was bimodal between two physically valid lines (a~2.5 median depth 0.58 m, and a~1.1 median 0.95 m) with near-equal inlier support, and the count-RANSAC flipped between them frame to frame. Offline replay showed the pairs actually follow one line (near-constant local slope ~1.9); the bug is the inlier-*count* objective, which is multimodal when the scatter is two-sided -- a steeper or shallower line catches the same count.

Fit the lidar pairs with Theil-Sen (median of pairwise slopes, then median intercept) -- a unique, deterministic central estimate. Across the three bags this cuts the per-frame median-depth std from 0.18 m to 0.04-0.09 m and a-std from 0.66 to 0.14-0.22, with the mean unchanged, so it's stabilizing, not biasing.

Theil-Sen is naturally positive (no inverted line) with intercept ~0 (no blow-up), so the a_min constraint added last commit is now redundant -- kept only as a defensive reject for a pathological scan, and the in-RANSAC constraint is reverted. The floor seed keeps count-RANSAC (its one-sided obstacle rejection is a different need; exact recovery verified). No temporal smoothing -- Theil-Sen is stable per-frame with zero lag, which EMA would trade away under motion.

Still scale-only and not yet validated on a moving/on-robot bag (occlusion-edge parallax can exceed Theil-Sen's ~29% breakdown); the cloud is also still the full image (the Phase-2 over-range/plane-fit gap).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
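For readers unfamiliar with the estimator, the Theil-Sen fit named in this commit (median of pairwise slopes, then median intercept) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's fit_affine_disparity_theilsen: the function names are hypothetical, the target is taken to be lidar disparity (1 / range), and the O(n^2) pair enumeration is left unoptimised.

```python
from itertools import combinations
from statistics import median


def fit_theil_sen(model_disp, lidar_range):
    """Fit lidar_disparity ~= a * model_disp + b using Theil-Sen.

    model_disp: model disparity sampled at each lidar return's pixel.
    lidar_range: metric lidar range (m) for the same returns.
    Returns (a, b), or None when no two samples have distinct disparity.
    """
    target = [1.0 / r for r in lidar_range]
    pairs = list(zip(model_disp, target))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(pairs, 2)
        if x2 != x1
    ]
    if not slopes:
        return None
    a = median(slopes)
    b = median([y - a * x for x, y in pairs])
    return a, b


def apply_affine(model_disp, a, b):
    """Corrected metric depth for one disparity sample; NaN if disparity is not positive."""
    corrected = a * model_disp + b
    return 1.0 / corrected if corrected > 0.0 else float("nan")
```

Because both steps are medians, a minority of bad pairs (for example returns that landed on an occlusion edge) shifts the result far less than it would shift a least-squares line, and the same input always gives the same output, unlike a randomly seeded RANSAC.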

Model-agnostic eval that talks to whatever depth server is on --port (V2 now, a V3 server later), so models are compared by pointing it at each. Per sampled frame it measures, against the time-nearest lidar scan: held-out AbsRel/RMSE/delta1 after the best affine alignment (the model's in-band shape/scale fidelity, comparable across models) and server round-trip latency. Saves three views to inspect by eye -- the depth map with lidar returns overprinted, a side elevation (range vs height) that shows whether vertical edges lean into the distance, and a top-down BEV. V2 baseline on bag 20260630_103318: ~470 ms/frame (CPU), AbsRel 0.231, delta1 57.5%, and the side view shows the scene ramping up with range -- i.e. the slant is largely in the depth itself, motivating the V2-vs-V3 comparison. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
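The three accuracy figures reported by the eval (AbsRel, RMSE, delta1) follow the usual depth-estimation definitions. The sketch below computes them for already-aligned predictions; it is an illustration with a hypothetical function name (`depth_metrics`) and does not reproduce depth_bag_eval's held-out split or affine alignment step.

```python
import math


def depth_metrics(pred, truth):
    """Return (abs_rel, rmse, delta1) for paired positive depths in metres.

    abs_rel: mean of |pred - truth| / truth
    rmse:    root mean squared error
    delta1:  fraction of samples with max(pred/truth, truth/pred) < 1.25
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and the same length")
    n = len(pred)
    abs_rel = sum(abs(p - t) / t for p, t in zip(pred, truth)) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / n)
    delta1 = sum(1 for p, t in zip(pred, truth) if max(p / t, t / p) < 1.25) / n
    return abs_rel, rmse, delta1


# Usage: model depth sampled at lidar pixels vs the lidar ranges themselves.
abs_rel, rmse, delta1 = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
```

AbsRel and delta1 are scale-relative, which is why they stay comparable across models whose raw output units differ once each has been aligned to the lidar.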

depth_server takes --model and --disparity so the eval harness can compare models (metric outputs depth; relative/SSI models output disparity -> invert to depth, the pipeline refits scale either way). depth_bag_replay was still calling fit_affine_disparity(a_min=...), removed when the lidar path moved to Theil-Sen -- point it at fit_affine_disparity_theilsen. Finding (rescale-anyway pipeline, 3 bags): relative V2-Small beats V2-Metric-Indoor on every bag -- delta1 90.6/74.3/76.9% vs 57.5/70.9/72.7%, lower AbsRel/RMSE, ~40 ms faster, and a stable non-bimodal fit. The metric model's absolute scale is discarded by our per-frame affine, so it buys nothing and is worse-conditioned. The slant persists across both, so it is geometric (back-projection/pitch), not the model. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

depth_server_da3.py serves Depth Anything 3 over the same socket protocol so the rig compares it like any other model. DA3 needs Python<=3.13 (the pixi depth env is 3.14), so it runs in its own uv venv (documented in the file); its export path pulls heavy 3D/video deps (open3d/moviepy/pycolmap) that single-image depth never uses, so those are stubbed at import -- the actual install is a venv + CPU torch + a few small deps. Takes --model and --intrinsics (metric variants use a canonical-focal transform). depth_bag_eval now reports raw (no-rescale) AND affine-aligned accuracy, so 'does a metric model need rescaling' is answerable directly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

The node refits a full affine in disparity against lidar every frame, so a metric model's absolute scale is discarded; measured over 3 bags, relative V2-Small beats V2-Metric-Indoor on accuracy (aligned delta1 ~91 vs ~57% on one bag, better on all), is faster, and gives a stable non-bimodal fit. Make it the default MODEL. Relative models output disparity, so invert to depth by default; --metric passes a metric model through unchanged. Bare `pixi run depth-server` (used by depth_workstation.sh) now serves the relative model -- no node change needed. Note: swapping the model id alone (without the inversion) yields a garbled depth map, since the node expects depth and the relative model emits disparity -- the inversion is the missing piece. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS

Pack each point's camera pixel into the camera_obstacles and camera_cloud_full clouds as an RGB8 field and switch the RViz displays to RGB8, so the depth cloud renders in real camera colour rather than a Z-axis gradient. Make depth_obstacle_node lazy: skip the off-board inference and all downstream work unless an output is subscribed, gating each stage (raw and corrected depth images, full cloud, obstacle cloud) on its own subscription count. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
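One common way to carry per-point colour in a PointCloud2 is a single float32 field whose bits hold 0x00RRGGBB, which is the layout RViz's RGB8 colour transformer reads. The sketch below shows that packing with the standard library only. Whether the node uses exactly this layout is an assumption, and `pack_rgb` / `unpack_rgb` are hypothetical names.

```python
import struct


def pack_rgb(r, g, b):
    """Pack 8-bit r, g, b into the float32 bit pattern 0x00RRGGBB."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("colour channels must be in 0..255")
    as_uint = (r << 16) | (g << 8) | b
    # Reinterpret the 32 bits as a float; the numeric value is meaningless.
    return struct.unpack("<f", struct.pack("<I", as_uint))[0]


def unpack_rgb(packed):
    """Inverse of pack_rgb: recover (r, g, b) from the packed float."""
    as_uint = struct.unpack("<I", struct.pack("<f", packed))[0]
    return (as_uint >> 16) & 0xFF, (as_uint >> 8) & 0xFF, as_uint & 0xFF
```

The top byte is always zero, so the packed value is never a NaN bit pattern and survives a round trip through a Python float unchanged.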

Fit the floor plane (z = a*x + b*y + c, RANSAC over near-floor points) each frame and rotate the whole cloud about the camera origin so the floor is level. This removes the residual camera tilt the level URDF transform misses (dynamic pitch / floor slope), which previously left a range-dependent z-ramp: floor points drifted above the obstacle gate (false positives) and verticals leaned into the distance. It is a rotation, not a z-only shear: the lean is a rotation of the cloud about the camera, so it sits in x as well as z; rotating the normal onto +z straightens verticals and flattens the floor together. Over-large or unconstrained fits are rejected and the last good rotation held, so the cloud can't snap flat<->tilted frame to frame. Add ground_projection.fit_ground_plane + level_rotation (shared, pure), a plane_fit toggle, and a plane-levelled side view to depth_bag_eval for before/after inspection. Validated on a synthetic known-tilt cloud (exact recovery) and a recorded bag (1-3 deg recovered, floor flat after). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
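The leveling step can be illustrated as a plane fit followed by the rotation that carries the plane normal onto +z. This is a sketch under stated assumptions rather than the repository's implementation: it reuses the names fit_ground_plane and level_rotation from the commit message but the signatures are guesses, and it fits by plain least squares where the commit describes RANSAC over near-floor points.

```python
import numpy as np


def fit_ground_plane(points):
    """Least-squares fit of z = a*x + b*y + c to an (N, 3) array of floor points."""
    design = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeffs, *_ = np.linalg.lstsq(design, points[:, 2], rcond=None)
    a, b, c = coeffs
    return float(a), float(b), float(c)


def level_rotation(a, b):
    """Rotation matrix taking the plane normal (-a, -b, 1) onto +z (Rodrigues form)."""
    normal = np.array([-a, -b, 1.0])
    normal /= np.linalg.norm(normal)
    z_axis = np.array([0.0, 0.0, 1.0])
    v = np.cross(normal, z_axis)
    cos_angle = float(normal @ z_axis)  # always > 0 for a z = f(x, y) plane
    v_cross = np.array([[0.0, -v[2], v[1]],
                        [v[2], 0.0, -v[0]],
                        [-v[1], v[0], 0.0]])
    return np.eye(3) + v_cross + v_cross @ v_cross / (1.0 + cos_angle)


def level_cloud(points, a, b):
    """Rotate the whole cloud about the origin so the fitted floor is horizontal."""
    return points @ level_rotation(a, b).T
```

After the rotation every floor point sits at the same height, c / sqrt(a^2 + b^2 + 1), which is the constant offset a later step can subtract so the floor lands at z = 0.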

The lidar affine slope a maps the model's disparity onto the true (lidar) disparity, so its magnitude absorbs the model's arbitrary disparity units. The pipeline default is now the relative (SSI) V2-Small model, whose disparity scale puts a valid fit near 0.25-0.5 -- an order of magnitude below the metric model a_min=0.5 was picked for. That threshold rejected every valid relative-model fit, so the node never obtained a lidar scale and silently fell back to the floor fit (the over-ranging path lidar anchoring exists to replace). Drop a_min to 0.05 so it guards only the sign/near-flat degeneracy (a <= ~0) it was meant to, independent of the model's disparity scale. Verified on recorded bags: valid fits land a=0.25-0.47 (corrected depth AbsRel 0.13-0.33, delta1 79-92% vs lidar); with a_min=0.5 all were rejected, with 0.05 all pass and the cloud is lidar-anchored again. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW

Wire the /camera_obstacles cloud into a dedicated VoxelLayer on the local costmap, so the low/near things the single lidar plane misses (cables, thresholds, chair/table legs, a clothes-horse cross-bar) actually stop the robot. Until now the cloud was produced but nothing consumed it.

- Clamp the published cloud to the mount's validated near band by dropping the node range_max default 3.0 -> 1.2 m. Past ~1.2 m the monocular depth compresses into false positives; the near band is both where this layer is trusted and what the goal cares about.
- Add camera_layer (VoxelLayer) to the local costmap only, separate from the lidar obstacle_layer so the camera can never clear a lidar mark and a laggy source never touches the global plan. It marks and clears from its own dense observations. sensor_frame pins the clearing-ray origin to the camera height (cloud stays leveled base_footprint) so rays descend onto the floor instead of rising through the low-obstacle band.

Offline validation on the recorded low-obstacle bags: the near band (<=1.2 m) detects the dumbbell, clothes-horse legs, lamp base and chair legs while clean floor marks ~0 points. Live nav2/robot behaviour -- including standstill decay of any phantom over open floor -- is the documented remaining gate (see README caveat).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW

All validation so far is offline against recorded bags. Record the workstation+Pi co-run, the ordered live checks (camera_layer actually marks / off-board latency vs transform_tolerance / drive past a low obstacle / motion-only FP from a stale held level rotation), and note the bright-glare-floor FP result, so the first robot session doesn't rediscover them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW

The camera obstacle band had no meaningful upper bound (marked up to 0.8 m), so a chair seat or tabletop the robot fits beneath was marked as an obstacle and the robot would refuse to path under it. Cap the obstacle band at the robot's height plus a margin. Because the camera layer is a 3D VoxelLayer, capping the marking height means the legs of the furniture (which reach the floor) still mark while the seat or top above the cap does not -- so the robot avoids the legs but drives through the clear gap between them.

- Nav2 camera_layer max_obstacle_height 0.8 -> 0.18 m (the authoritative gate: current ~0.13 m chassis + margin), voxel grid retuned to a 0.20 m top (z_resolution 0.025, z_voxels 8). Documented how to reconfigure for the ~0.30 m arm build (raise gate + grid; the benefit shrinks there).
- Node: split the obstacle-cloud ceiling (new z_obstacle_max, default 0.5 m) from the debug-cloud ceiling (z_ceiling stays 1.6 m), so it stops streaming points Nav2 discards while the full debug view is unaffected.

Verified offline (bag 145232 f1532): marks drop 26984 -> 5060 between a 0.8 m and a 0.15 m cap, keeping the floor-reaching chair legs while the seats/backs above the cap stop marking (go_under_height_gate.png).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
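The arithmetic behind the retuned grid and the effect of the gate can be checked with a small sketch. The constants are the ones quoted in this commit; the function name `marks` and the sample heights are hypothetical, and the real gating is done by the Nav2 layer rather than by code like this.

```python
Z_RESOLUTION = 0.025        # m per voxel (from the commit message)
Z_VOXELS = 8                # voxels in the column (from the commit message)
MAX_OBSTACLE_HEIGHT = 0.18  # m, the go-under gate (from the commit message)

GRID_TOP = Z_RESOLUTION * Z_VOXELS  # height covered by the voxel column


def marks(point_height, gate=MAX_OBSTACLE_HEIGHT):
    """True if a point at this height (m above the floor) would mark the costmap."""
    return 0.0 <= point_height <= gate


# Hypothetical chair: leg samples near the floor, seat samples well above the gate.
leg_heights = [0.05, 0.10, 0.15]
seat_heights = [0.42, 0.45]
leg_marks = [marks(h) for h in leg_heights]
seat_marks = [marks(h) for h in seat_heights]
```

The grid top (0.20 m) sits just above the 0.18 m gate, so every height that can mark has a voxel to land in while the column stays short.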

Inference is slower than the camera frame rate, so a deeper image queue only ever feeds stale frames. Depth 1 drops the backlog and keeps the published obstacle cloud as fresh as the pipeline allows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW

The README's pipeline paragraph still described the old metric model + floor-plane rescale; update it to the relative model with lidar-anchored scale, and add live output images. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW

Make every degenerate path fail safe and loud instead of publishing a wrong or silently-empty obstacle cloud:

- rescale_source:=lidar no longer falls back to the floor fit when the lidar is unavailable; it skips the frame as the forced mode promises.
- An empty or degenerate floor seed returns inlier fraction 0 (frame rejected with a warning) instead of a maximally-confident garbage fit that published empty clouds forever.
- _ground_correct now subtracts the fitted floor offset c after leveling, so a depth-scale error can't shift the whole floor across the 2 cm obstacle threshold.
- Lidar returns outside the pinhole FOV are rejected before projection; the distortion polynomial could fold far-off-axis points back into image bounds and bias the Theil-Sen scale fit.
- The depth map is resized to the camera_info resolution before any consumer indexes it, and a corrupt color frame is skipped instead of crashing the callback.
- The server replies with an H=0,W=0 sentinel for a frame it cannot process and survives decode/inference/socket errors; previously one corrupt JPEG killed the process for the rest of the mission. The node validates msg.format (image_transport style, e.g. "rgb8; jpeg compressed bgr8") before sending and handles the sentinel without waiting out the socket timeout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

depth_obstacles.py, bag_overlay.py, and bev_motion.py hardcoded output paths under an ephemeral per-session scratch directory that no longer exists, so the committed validation harness could not reproduce its results. depth_obstacles.py gains argparse with ~/.mote defaults; the other two derive their output directory from the bag path. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The pipeline committed to relative Depth Anything V2-Small (measured both more accurate and faster); the DA3 server needed its own uv venv on Python <= 3.13 plus import stubs for unused 3D/video deps, and duplicated the whole serve loop. depth_bag_eval can still compare models by pointing at any server that speaks the wire protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

self.grid was stored but never read after building the undistorted rays; pixels_to_ground NaN-ed invalid points twice; sensor_msgs_py was declared in package.xml but never imported. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

cursor · 2026-07-04T10:21:08Z

Bugbot is not enabled for your account, so this pull request was not reviewed.

Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs.

Review feedback: the code had grown as self-contained files with the common pieces copy-pasted, the floor-plane scale anchor was still wired in despite testing showing it unreliable (floor gradients and resting pitch shift it), and the socket protocol existed only as inline code in three places.

- New depth_wire.py: the wire protocol in one place: the spec, the framing helpers used by the server, a DepthClient (persistent connection, reconnect, sentinel handling) used by the node and the offline tools, and the rationale for a hand-rolled protocol over gRPC/ROS for this link. The node sheds all connection code.
- Floor-plane rescale removed (depth_rescale.py deleted): the node is lidar-anchored only: fit, else hold the last good correction, else skip the frame loudly. One scale path to debug. The Theil-Sen fit and the disparity-affine apply move into lidar_rescale.py; the rescale_source parameter goes away.
- GroundProjector gains cached pixel_rays() and back_project(), the one implementation of back-projection; the node, depth_bag_eval, and depth_obstacles all used private copies (depth_obstacles' copy had drifted: it ignored the distortion coefficients).
- tools/bag_utils.py: shared bag loading, base transforms, scan matching, colorize; previously four private copies of the rosbag2 boilerplate.
- depth_obstacles.py rewritten server-based on the shared modules: same stages and gates as the live node, BEV with both point sets in base_footprint. bag_overlay.py's lidar overlay now transforms through /tf_static: the scan frame is yawed 90 deg from base, so plotting raw scan coordinates was wrong, not approximate.
- bev_motion.py deleted: the classical motion-parallax prototype the learned-depth approach replaced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
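A length-prefixed TCP exchange of the kind depth_wire.py centralises can be sketched with the standard library alone. The byte layout here is an assumption for illustration only (a 4-byte big-endian length prefix, and a reply payload that starts with H and W as unsigned 32-bit integers followed by H*W float32 values); the actual spec lives in the depth_wire.py docstring and may differ. Only the H=0,W=0 sentinel is taken from the commit messages, and all function names are hypothetical.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes, since a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return bytes(buf)


def send_frame(sock, payload):
    """Send one frame: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_frame(sock):
    """Receive one length-prefixed frame and return its payload."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def encode_reply(height, width, depth_bytes):
    """Assumed reply layout: H, W as uint32, then H*W float32 values."""
    return struct.pack("!II", height, width) + depth_bytes


def decode_reply(payload):
    """Return (H, W, depth_bytes), or None for the H=0,W=0 'could not process' sentinel."""
    height, width = struct.unpack("!II", payload[:8])
    if height == 0 and width == 0:
        return None
    body = payload[8:]
    if len(body) != height * width * 4:
        raise ValueError("depth payload does not match the declared H x W")
    return height, width, body
```

With explicit framing the client can tell a short read from a complete message, and the sentinel lets it skip a bad frame immediately instead of waiting out the socket timeout.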

The camera-vs-lidar BEV image was rendered by an ad-hoc script that was never committed, and the lidar in it does not show a doorway that should be visible — consistent with plotting raw scan coordinates without the scan->base transform (the scan frame is yawed 90 deg from base). Removed rather than defended; depth_obstacles.py is now the committed, frame-correct generator for a replacement once regenerated against a live bag. The README pipeline section drops the floor-fallback description, points at depth_wire.py for the protocol, and gains a tools inventory. CLAUDE.md's mote_perception section now describes the L1 pipeline instead of stopping at L0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


MJohnson459 and others added 9 commits June 29, 2026 17:31

PR feedback

8425581

Remove old CV solution. Cleanup

f55886b

MJohnson459 force-pushed the l1-depth-perception branch from 66a032a to d6c1feb Compare June 29, 2026 16:56

MJohnson459 and others added 20 commits June 30, 2026 09:42

Add live depth-obstacle output images for docs

c06aa7b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW

Drop dead state and an unused dependency

5cade87


MJohnson459 mentioned this pull request Jul 4, 2026

Research: SfM / multi-view depth as an improvement or replacement for single-image Depth Anything #21

Open

4 tasks

MJohnson459 and others added 2 commits July 4, 2026 12:14

MJohnson459 merged commit 1308bfb into main Jul 4, 2026
3 checks passed

MJohnson459 deleted the l1-depth-perception branch July 4, 2026 15:18

MJohnson459 merged 31 commits into main from l1-depth-perception
