feat: Depth perception from mono camera #18
Merged
Monocular obstacle-detection spike under mote_perception (offline; no ROS nodes or deps added yet):
- ground_projection.py: shared pixel<->floor geometry (camera->base via static TF)
- free_space.py: classical appearance floor segmentation (spike -- fast but false-positive prone under variable lighting)
- depth_rescale.py: robust per-frame metric rescaling of learned mono-depth against the known floor plane (RANSAC affine-in-disparity) -- the chosen L1 direction; inlier fraction gates seed contamination
- tools/: offline bag harnesses (geometry overlay, classical/BEV/depth eval, segmentation video) for evaluating approaches against recorded bags

Depth + floor-rescale gives ~0.19 m median range vs lidar and is lighting-robust; findings drove the decision to pursue learned depth off-board.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The L1 obstacle pipeline as two processes so torch stays out of the ROS/robot env:
- tools/depth_server.py: keeps Depth Anything V2 (metric, indoor) resident and serves depth over a socket. Runs in a throwaway torch venv on the workstation.
- depth_obstacle_node (rclpy, no torch; runs anywhere): forwards each compressed frame to the server, metrically rescales the returned depth against the known floor plane (depth_rescale), back-projects, keeps points above z_obstacle (default 0.02 m), and publishes /camera_obstacles.

The cloud is stamped at image capture time so Nav2 places it via tf at the moment it was seen; that is how the off-board (~0.6 s, inference-bound) latency is absorbed without inflation. Lidar stays the primary, low-latency obstacle/clearing source; this is a supplementary marker for the low/thin things the 2D scan misses.

z_obstacle=0.02 was chosen from a floor-noise sweep across the bag: floor height noise is ~1.5 cm p99, so a threshold below 1.5 cm produces false positives on the floor; 2 cm is clean and the lidar already covers >=6 cm. depth_obstacles.py gains an obstacle-tint overlay.

Validated end-to-end against a recorded bag via ros2 bag play. README documents it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The workstation depth task runs in the default/ROS env, so the nested `pixi run depth-server` inherited a PYTHONPATH pointing at the ROS Python 3.12 site-packages, and the depth env's Python 3.14 then loaded those incompatible numpy C-extensions. Drop PYTHONPATH for the server child only; the ROS node still needs it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
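The fix above amounts to launching one child with a scrubbed environment while the parent keeps its own. A minimal Python sketch of that idea follows; the helper names `child_env_without_pythonpath` and `run_isolated` are hypothetical, and the actual change may well live in the shell bring-up script rather than in Python.

```python
import os
import subprocess
import sys


def child_env_without_pythonpath(base=None):
    """Return a copy of the environment with PYTHONPATH removed.

    The copy means the parent's own environment (which the ROS node
    still needs) is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env.pop("PYTHONPATH", None)
    return env


def run_isolated(cmd, base=None):
    """Run cmd as a child process that cannot see the parent's PYTHONPATH."""
    return subprocess.run(
        cmd,
        env=child_env_without_pythonpath(base),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # The child reports what it sees; the parent's value is unchanged.
    probe = "import os; print(os.environ.get('PYTHONPATH'))"
    print(run_isolated([sys.executable, "-c", probe]).stdout.strip())
```

Only the child loses the variable, which matches the commit's intent: the server's interpreter stops importing the other interpreter's site-packages, and the ROS side is unaffected.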
Diagnostics for the L1 monocular-depth obstacle pipeline:
- depth_obstacle_node gains a publish_debug param (default true) that publishes the rescaled metric depth as a 32FC1 Image (/camera_depth) and the unfiltered, floor-inclusive cloud (/camera_cloud_full) for geometry checks.
- mote.rviz: Camera Obstacles + Camera Cloud (full) PointCloud2 displays (AxisColor by height) and a Depth image display.
- tools/measure_camera_pitch.py: lays the calibration checkerboard on the floor and solvePnPs its plane to read the camera's pitch/roll/height relative to the floor it sits on, folding in chassis tilt and local floor slope.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The floor-plane rescale fits a narrow near-floor band and extrapolates to far walls, and depends on the camera->floor angle, which was measured to wander ~1.5 deg across rest positions (floor slope + how the robot rests) — so the obstacle cloud over-ranged past the walls even stationary. Lidar gives metric range on the walls themselves through the body-fixed lidar->camera transform, which is immune to chassis/floor tilt. LidarDepthRescaler matches each scan return to its camera pixel, samples the model depth there, and fits the shared affine-in-disparity correction on those pairs. The node buffers scans and matches the one nearest the image *capture* stamp (absorbing the ~0.6 s off-board latency). rescale_source = auto|lidar|floor; auto holds the last good lidar (a,b) when a scan can't constrain it rather than falling back to the floor fit it replaces. Logs scale source, pair count, and scan dt. Scale only; the cloud is still back-projected through the level-URDF transform, so residual pitch can still skew floor-point z-classification — a follow-up plane-fit on the lidar-scaled floor points will recover that. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The single-threaded executor was monopolized by the ~0.5 s blocking inference in _on_image, so _on_scan was starved, the scan buffer went stale, and no scan landed within scan_max_dt of an image's capture stamp — the node fit lidar once at startup then held that one (bad) (a,b) forever. Run on a MultiThreadedExecutor with the scan subscription in its own callback group so scans keep buffering during inference; snapshot the deque when matching (read on the image thread, appended on the scan thread). Diagnostics for chasing fit quality: log scan-buffer depth, matched dt, pair count, and the fitted (a,b); publish the raw pre-rescale model depth (/camera_depth_raw) next to the rescaled one to separate model noise from a runaway rescale. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
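The buffering-and-matching half of this fix can be sketched without ROS. The following is an illustration only: `ScanBuffer` is a hypothetical name, stamps are plain floats in seconds, and the lock is an assumption of the sketch (the commit only says the deque is snapshotted when matching). `max_dt` plays the role of the node's `scan_max_dt`.

```python
import threading
from collections import deque


class ScanBuffer:
    """Scans appended on one thread, matched by stamp on another."""

    def __init__(self, maxlen=50):
        self._scans = deque(maxlen=maxlen)
        self._lock = threading.Lock()  # assumption: guards append vs snapshot

    def append(self, stamp, scan):
        # Called from the scan callback (its own callback group / thread).
        with self._lock:
            self._scans.append((stamp, scan))

    def nearest(self, image_stamp, max_dt):
        # Called from the image callback: copy first, then search the copy,
        # so a long search never blocks or races the appending thread.
        with self._lock:
            snapshot = list(self._scans)
        if not snapshot:
            return None
        stamp, scan = min(snapshot, key=lambda item: abs(item[0] - image_stamp))
        if abs(stamp - image_stamp) > max_dt:
            return None  # nothing close enough to the capture stamp
        return stamp, scan
```

Matching against the image's capture stamp rather than the time the depth reply arrives is what lets the pairing ignore the inference latency.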
Reconcile the lock with the merged pixi.toml so it carries both the depth env (rebased) and the bag-recorder deps (from main #17). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Offline replay of the recorded bags (depth server + the real pairing/fit) showed the raw model depth is clean every frame, but the lidar affine fit was RANSAC-bistable: the scan's returns include a large cluster at one range (a wall filling the view) that supports a near-flat degenerate line (slope ~0, depth inverted) with inlier support equal to the true line, so the unconstrained fit flipped between them frame to frame -- ~40% of frames collapsed to inverted/exploded depth, which is the noise. Good fits had a in [1.6, 3.1]; degenerate ones a in [-0.32, 0.10].

fit_affine_disparity takes optional a_min/disp_floor: when set, RANSAC only scores physically valid models (slope >= a_min, and a*disp_floor + b > 0 so corrected disparity can't blow up), and keeps the valid seed if the least-squares refit drifts off it. Default stays unconstrained, so the floor path is untouched (verified: exact recovery). LidarDepthRescaler passes a_min=0.5 and rejects a residual invalid fit so the node holds last-good.

On both bags this takes degenerate frames 19/40 and 15/40 -> 0/40 while leaving the good-frame inliers (69%) and median depth (0.64 m) exactly as the baseline; a pure-scale (b=0) alternative also reached 0 but shifted the median to 0.74 m, so the 2-DOF constraint won on accuracy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
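The two validity conditions named above can be written as a single predicate. This is a sketch of the stated rule only, not the signature of `fit_affine_disparity`; the function name `is_physically_valid` and the `disp_floor` value in the usage lines are made up for illustration.

```python
def is_physically_valid(a, b, a_min, disp_floor):
    """Accept an affine-in-disparity model only if it is physically plausible.

    a, b       : fitted slope and intercept (true_disp ~= a * model_disp + b)
    a_min      : smallest slope accepted; rejects flat or inverted lines
    disp_floor : smallest model disparity expected; the corrected disparity
                 there must stay positive, otherwise depth = 1/disparity
                 blows up or changes sign
    """
    return a >= a_min and (a * disp_floor + b) > 0.0


if __name__ == "__main__":
    # Slopes taken from the ranges reported in the commit message.
    print(is_physically_valid(2.0, 0.05, a_min=0.5, disp_floor=0.2))    # good fit
    print(is_physically_valid(0.10, 0.9, a_min=0.5, disp_floor=0.2))    # near-flat
    print(is_physically_valid(-0.32, 1.2, a_min=0.5, disp_floor=0.2))   # inverted
```

A RANSAC loop would call a check like this on each candidate before counting inliers, so an invalid line can never win on support alone.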
Replays a recorded bag through the depth server + the real lidar pairing/fit and reports per-frame raw depth, pair count/spread, fitted (a,b)/inliers, and rescaled depth, saving colorized raw/corrected maps. A collapsed fit prints DEGENERATE. This is the rig that localised the RANSAC bistability; keep it for analysing future 'depth goes to noise' bags without the robot. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
After the a<0 constraint, the cloud still flickered on a static scene: the fit was bimodal between two physically valid lines (a~2.5 median depth 0.58 m, and a~1.1 median 0.95 m) with near-equal inlier support, and the count-RANSAC flipped between them frame to frame. Offline replay showed the pairs actually follow one line (near-constant local slope ~1.9); the bug is the inlier-*count* objective, which is multimodal when the scatter is two-sided -- a steeper or shallower line catches the same count. Fit the lidar pairs with Theil-Sen (median of pairwise slopes, then median intercept) -- a unique, deterministic central estimate. Across the three bags this cuts the per-frame median-depth std from 0.18 m to 0.04-0.09 m and a-std from 0.66 to 0.14-0.22, with the mean unchanged, so it's stabilizing, not biasing. Theil-Sen is naturally positive (no inverted line) with intercept ~0 (no blow-up), so the a_min constraint added last commit is now redundant -- kept only as a defensive reject for a pathological scan, and the in-RANSAC constraint is reverted. The floor seed keeps count-RANSAC (its one-sided obstacle rejection is a different need; exact recovery verified). No temporal smoothing -- Theil-Sen is stable per-frame with zero lag, which EMA would trade away under motion. Still scale-only and not yet validated on a moving/on-robot bag (occlusion-edge parallax can exceed Theil-Sen's ~29% breakdown); the cloud is also still the full image (the Phase-2 over-range/plane-fit gap). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
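For reference, a Theil-Sen line fit as described (median of pairwise slopes, then median intercept) is short enough to sketch in plain Python. The function names are illustrative and this is not the repository's `fit_affine_disparity_theilsen`. The sketch assumes x is the model's disparity sampled at each lidar return's pixel and y is the lidar disparity (1/range).

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Fit y ~= a*x + b robustly: median pairwise slope, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with identical x (undefined slope)
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x")
    a = median(slopes)
    b = median(y - a * x for x, y in zip(xs, ys))
    return a, b


def corrected_depth(model_disparity, a, b):
    """Apply the affine in disparity, then invert to metric depth."""
    disparity = a * model_disparity + b
    if disparity <= 0.0:
        return None  # non-physical; the caller should reject or hold last good
    return 1.0 / disparity


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0, 2.5]
    ys = [1.1, 2.1, 10.0, 4.1, 5.1]  # one gross outlier at index 2
    a, b = theil_sen(xs, ys)
    print(a, b, corrected_depth(0.2, a, b))
```

Because the estimate is a median rather than the winner of an inlier count, the same pairs always give the same line, which is the determinism the commit relies on. The pairwise step is O(n²) in the number of pairs, which is tolerable for one scan's worth of returns.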
Model-agnostic eval that talks to whatever depth server is on --port (V2 now, a V3 server later), so models are compared by pointing it at each. Per sampled frame it measures, against the time-nearest lidar scan: held-out AbsRel/RMSE/delta1 after the best affine alignment (the model's in-band shape/scale fidelity, comparable across models) and server round-trip latency. Saves three views to inspect by eye -- the depth map with lidar returns overprinted, a side elevation (range vs height) that shows whether vertical edges lean into the distance, and a top-down BEV. V2 baseline on bag 20260630_103318: ~470 ms/frame (CPU), AbsRel 0.231, delta1 57.5%, and the side view shows the scene ramping up with range -- i.e. the slant is largely in the depth itself, motivating the V2-vs-V3 comparison. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
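The three accuracy numbers reported here have standard definitions, sketched below for paired depth samples. The function name `depth_metrics` is illustrative rather than the harness's own, and the sketch assumes any affine alignment has already been applied to `pred`.

```python
import math


def depth_metrics(pred, truth):
    """AbsRel, RMSE and delta1 over paired depths in metres.

    AbsRel : mean of |pred - truth| / truth
    RMSE   : sqrt(mean of (pred - truth)^2)
    delta1 : fraction of pairs with max(pred/truth, truth/pred) < 1.25
    Pairs with a non-positive value are skipped.
    """
    pairs = [(p, t) for p, t in zip(pred, truth) if p > 0 and t > 0]
    if not pairs:
        raise ValueError("no valid pairs")
    n = len(pairs)
    abs_rel = sum(abs(p - t) / t for p, t in pairs) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    delta1 = sum(1 for p, t in pairs if max(p / t, t / p) < 1.25) / n
    return abs_rel, rmse, delta1


if __name__ == "__main__":
    model = [1.0, 2.0, 3.0, 2.0]   # model depth at lidar pixels
    lidar = [1.0, 2.0, 2.0, 4.0]   # lidar range for the same returns
    print(depth_metrics(model, lidar))
```

AbsRel and delta1 are scale-relative, which is why they remain comparable across models once each model has been aligned to the lidar.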
depth_server takes --model and --disparity so the eval harness can compare models (metric outputs depth; relative/SSI models output disparity -> invert to depth, the pipeline refits scale either way). depth_bag_replay was still calling fit_affine_disparity(a_min=...), removed when the lidar path moved to Theil-Sen -- point it at fit_affine_disparity_theilsen. Finding (rescale-anyway pipeline, 3 bags): relative V2-Small beats V2-Metric-Indoor on every bag -- delta1 90.6/74.3/76.9% vs 57.5/70.9/72.7%, lower AbsRel/RMSE, ~40 ms faster, and a stable non-bimodal fit. The metric model's absolute scale is discarded by our per-frame affine, so it buys nothing and is worse-conditioned. The slant persists across both, so it is geometric (back-projection/pitch), not the model. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
depth_server_da3.py serves Depth Anything 3 over the same socket protocol so the rig compares it like any other model. DA3 needs Python<=3.13 (the pixi depth env is 3.14), so it runs in its own uv venv (documented in the file); its export path pulls heavy 3D/video deps (open3d/moviepy/pycolmap) that single-image depth never uses, so those are stubbed at import -- the actual install is a venv + CPU torch + a few small deps. Takes --model and --intrinsics (metric variants use a canonical-focal transform). depth_bag_eval now reports raw (no-rescale) AND affine-aligned accuracy, so 'does a metric model need rescaling' is answerable directly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
The node refits a full affine in disparity against lidar every frame, so a metric model's absolute scale is discarded; measured over 3 bags, relative V2-Small beats V2-Metric-Indoor on accuracy (aligned delta1 ~91 vs ~57% on one bag, better on all), is faster, and gives a stable non-bimodal fit. Make it the default MODEL. Relative models output disparity, so invert to depth by default; --metric passes a metric model through unchanged. Bare `pixi run depth-server` (used by depth_workstation.sh) now serves the relative model -- no node change needed. Note: swapping the model id alone (without the inversion) yields a garbled depth map, since the node expects depth and the relative model emits disparity -- the inversion is the missing piece. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
Pack each point's camera pixel into the camera_obstacles and camera_cloud_full clouds as an RGB8 field and switch the RViz displays to RGB8, so the depth cloud renders in real camera colour rather than a Z-axis gradient. Make depth_obstacle_node lazy: skip the off-board inference and all downstream work unless an output is subscribed, gating each stage (raw and corrected depth images, full cloud, obstacle cloud) on its own subscription count. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
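A common way to carry per-point colour in a PointCloud2 is a single 4-byte `rgb` field holding 0x00RRGGBB, often stored as the bit pattern of a float32. The sketch below shows that packing with the standard library only. Whether the node declares the field as uint32 or float32 is not stated in the commit, so treat the float reinterpretation as an assumption, and the helper names as hypothetical.

```python
import struct


def pack_rgb(r, g, b):
    """Pack three 0..255 channels into one integer laid out as 0x00RRGGBB."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("channel out of range")
    return (r << 16) | (g << 8) | b


def unpack_rgb(packed):
    """Inverse of pack_rgb."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF


def rgb_as_float32(r, g, b):
    """Reinterpret the packed integer's bits as a float32 (no numeric conversion)."""
    return struct.unpack("<f", struct.pack("<I", pack_rgb(r, g, b)))[0]


def float32_as_rgb(value):
    """Recover the channels from a float32 produced by rgb_as_float32."""
    return unpack_rgb(struct.unpack("<I", struct.pack("<f", value))[0])


if __name__ == "__main__":
    print(hex(pack_rgb(255, 128, 0)))
    print(float32_as_rgb(rgb_as_float32(255, 128, 0)))
```

The top byte is always zero, so the float bit pattern can never be a NaN or infinity and the round trip through a Python float is lossless.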
Fit the floor plane (z = a*x + b*y + c, RANSAC over near-floor points) each frame and rotate the whole cloud about the camera origin so the floor is level. This removes the residual camera tilt the level URDF transform misses (dynamic pitch / floor slope), which previously left a range-dependent z-ramp: floor points drifted above the obstacle gate (false positives) and verticals leaned into the distance. It is a rotation, not a z-only shear: the lean is a rotation of the cloud about the camera, so it sits in x as well as z; rotating the normal onto +z straightens verticals and flattens the floor together. Over-large or unconstrained fits are rejected and the last good rotation held, so the cloud can't snap flat<->tilted frame to frame. Add ground_projection.fit_ground_plane + level_rotation (shared, pure), a plane_fit toggle, and a plane-levelled side view to depth_bag_eval for before/after inspection. Validated on a synthetic known-tilt cloud (exact recovery) and a recorded bag (1-3 deg recovered, floor flat after). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JV1CtqERcMH4YB1gQLNDGS
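The levelling step can be sketched as follows: given the fitted plane z = a*x + b*y + c, build the rotation that carries the plane's unit normal onto +z (Rodrigues' formula) and apply it to every point. The names `rotation_to_level` and `rotate` are hypothetical, not the signatures of `fit_ground_plane` or `level_rotation`. The RANSAC fit itself is not shown, and the sketch assumes the points are expressed in a frame whose origin is the rotation centre.

```python
import math


def rotation_to_level(a, b):
    """3x3 rotation (list of rows) taking the normal of z = a*x + b*y + c onto +z."""
    s = math.sqrt(a * a + b * b + 1.0)
    nx, ny, nz = -a / s, -b / s, 1.0 / s   # unit normal, pointing up
    kx, ky = ny, -nx                        # rotation axis = n x z (its z part is 0)
    sin_t = math.hypot(kx, ky)              # sine of the tilt angle
    cos_t = nz                              # cosine of the tilt angle
    if sin_t < 1e-12:                       # already level
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky = kx / sin_t, ky / sin_t
    v = 1.0 - cos_t
    # Rodrigues: R = cos*I + (1 - cos)*k*k^T + sin*[k]x, with k = (kx, ky, 0)
    return [
        [cos_t + kx * kx * v, kx * ky * v, ky * sin_t],
        [kx * ky * v, cos_t + ky * ky * v, -kx * sin_t],
        [-ky * sin_t, kx * sin_t, cos_t],
    ]


def rotate(R, p):
    """Apply rotation matrix R to point p = (x, y, z)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))


if __name__ == "__main__":
    a, b, c = 0.05, -0.02, 0.01            # a plane tilted by roughly 3 degrees
    R = rotation_to_level(a, b)
    for x, y in [(0.0, 0.0), (1.0, 0.5), (-0.3, 2.0)]:
        print(rotate(R, (x, y, a * x + b * y + c)))  # z is the same for all three
```

After the rotation every floor point has the same height, c / sqrt(a² + b² + 1). That constant is the residual offset a later step can subtract so the floor sits at zero. Because it is a full rotation, x and y change too, which is what straightens the leaning verticals.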
The lidar affine slope a maps the model's disparity onto the true (lidar) disparity, so its magnitude absorbs the model's arbitrary disparity units. The pipeline default is now the relative (SSI) V2-Small model, whose disparity scale puts a valid fit near 0.25-0.5 -- an order of magnitude below the metric model a_min=0.5 was picked for. That threshold rejected every valid relative-model fit, so the node never obtained a lidar scale and silently fell back to the floor fit (the over-ranging path lidar anchoring exists to replace). Drop a_min to 0.05 so it guards only the sign/near-flat degeneracy (a <= ~0) it was meant to, independent of the model's disparity scale. Verified on recorded bags: valid fits land a=0.25-0.47 (corrected depth AbsRel 0.13-0.33, delta1 79-92% vs lidar); with a_min=0.5 all were rejected, with 0.05 all pass and the cloud is lidar-anchored again. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
Wire the /camera_obstacles cloud into a dedicated VoxelLayer on the local costmap, so the low/near things the single lidar plane misses (cables, thresholds, chair/table legs, a clothes-horse cross-bar) actually stop the robot. Until now the cloud was produced but nothing consumed it.
- Clamp the published cloud to the mount's validated near band by dropping the node range_max default 3.0 -> 1.2 m. Past ~1.2 m the monocular depth compresses into false positives; the near band is both where this layer is trusted and what the goal cares about.
- Add camera_layer (VoxelLayer) to the local costmap only, separate from the lidar obstacle_layer so the camera can never clear a lidar mark and a laggy source never touches the global plan. It marks and clears from its own dense observations. sensor_frame pins the clearing-ray origin to the camera height (cloud stays leveled base_footprint) so rays descend onto the floor instead of rising through the low-obstacle band.

Offline validation on the recorded low-obstacle bags: the near band (<=1.2 m) detects the dumbbell, clothes-horse legs, lamp base and chair legs while clean floor marks ~0 points. Live nav2/robot behaviour -- including standstill decay of any phantom over open floor -- is the documented remaining gate (see README caveat).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
All validation so far is offline against recorded bags. Record the workstation+Pi co-run, the ordered live checks (camera_layer actually marks / off-board latency vs transform_tolerance / drive past a low obstacle / motion-only FP from a stale held level rotation), and note the bright-glare-floor FP result, so the first robot session doesn't rediscover them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
The camera obstacle band had no meaningful upper bound (marked up to 0.8 m), so a chair seat or tabletop the robot fits beneath was marked as an obstacle and the robot would refuse to path under it. Cap the obstacle band at the robot's height plus a margin. Because the camera layer is a 3D VoxelLayer, capping the marking height means the legs of the furniture (which reach the floor) still mark while the seat or top above the cap does not -- so the robot avoids the legs but drives through the clear gap between them.
- Nav2 camera_layer max_obstacle_height 0.8 -> 0.18 m (the authoritative gate: current ~0.13 m chassis + margin), voxel grid retuned to a 0.20 m top (z_resolution 0.025, z_voxels 8). Documented how to reconfigure for the ~0.30 m arm build (raise gate + grid; the benefit shrinks there).
- Node: split the obstacle-cloud ceiling (new z_obstacle_max, default 0.5 m) from the debug-cloud ceiling (z_ceiling stays 1.6 m), so it stops streaming points Nav2 discards while the full debug view is unaffected.

Verified offline (bag 145232 f1532): marks drop 26984 -> 5060 between an 0.8 m and 0.15 m cap, keeping the floor-reaching chair legs while the seats/backs above the cap stop marking (go_under_height_gate.png).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
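The go-under logic reduces to a per-point height test. The sketch below illustrates it on a handful of made-up points. In the real system the gate is enforced by Nav2's `max_obstacle_height` on the layer, not by code like this, and `split_by_height` is a hypothetical helper. The thresholds are the values quoted in this PR.

```python
Z_OBSTACLE = 0.02     # m, floor-noise threshold from the earlier sweep
GATE_HEIGHT = 0.18    # m, camera_layer max_obstacle_height
Z_RESOLUTION = 0.025  # m per voxel
Z_VOXELS = 8          # voxel count in z


def split_by_height(points, z_min=Z_OBSTACLE, z_gate=GATE_HEIGHT):
    """Split (x, y, z) points into those that mark and those the robot can pass under."""
    marks = [p for p in points if z_min < p[2] <= z_gate]
    passable = [p for p in points if p[2] > z_gate]
    return marks, passable


def grid_top(z_resolution=Z_RESOLUTION, z_voxels=Z_VOXELS):
    """Height covered by the voxel grid; it has to reach at least the gate."""
    return z_resolution * z_voxels


if __name__ == "__main__":
    chair = [
        (0.6, 0.1, 0.01),   # floor noise: neither marks nor counts as passable
        (0.6, 0.1, 0.05),   # leg, low
        (0.6, 0.1, 0.15),   # leg, just under the gate
        (0.7, 0.2, 0.45),   # seat: above the robot, passable
    ]
    marks, passable = split_by_height(chair)
    print(len(marks), len(passable), grid_top() >= GATE_HEIGHT)
```

With these values the two leg points mark and the seat point does not, which is the behaviour the commit describes: legs block, the gap under the seat stays open.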
Inference is slower than the camera frame rate, so a deeper image queue only ever feeds stale frames. Depth 1 drops the backlog and keeps the published obstacle cloud as fresh as the pipeline allows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
The README's pipeline paragraph still described the old metric model + floor-plane rescale; update it to the relative model with lidar-anchored scale, and add live output images. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GLZtYP7Uf7uZd7XTh2XZnW
Make every degenerate path fail safe and loud instead of publishing a wrong or silently-empty obstacle cloud:
- rescale_source:=lidar no longer falls back to the floor fit when the lidar is unavailable; it skips the frame as the forced mode promises.
- An empty or degenerate floor seed returns inlier fraction 0 (frame rejected with a warning) instead of a maximally-confident garbage fit that published empty clouds forever.
- _ground_correct now subtracts the fitted floor offset c after leveling, so a depth-scale error can't shift the whole floor across the 2 cm obstacle threshold.
- Lidar returns outside the pinhole FOV are rejected before projection; the distortion polynomial could fold far-off-axis points back into image bounds and bias the Theil-Sen scale fit.
- The depth map is resized to the camera_info resolution before any consumer indexes it, and a corrupt color frame is skipped instead of crashing the callback.
- The server replies with an H=0,W=0 sentinel for a frame it cannot process and survives decode/inference/socket errors; previously one corrupt JPEG killed the process for the rest of the mission. The node validates msg.format (image_transport style, e.g. "rgb8; jpeg compressed bgr8") before sending and handles the sentinel without waiting out the socket timeout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
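To make the sentinel idea concrete, here is a sketch of a reply frame with a fixed header and an H=0,W=0 failure marker. The byte layout (two little-endian uint32 values followed by H*W float32 samples) is entirely an assumption for illustration; the PR does not show the actual wire format, and `encode_depth` and `decode_depth` are hypothetical names.

```python
import struct

HEADER = struct.Struct("<II")  # assumed layout: height, width as uint32


def encode_depth(rows):
    """Serialise a depth map (list of equal-length rows), or a failure sentinel.

    Passing None or an empty list yields the H=0,W=0 sentinel with no payload.
    """
    if not rows:
        return HEADER.pack(0, 0)
    h, w = len(rows), len(rows[0])
    flat = [value for row in rows for value in row]
    return HEADER.pack(h, w) + struct.pack(f"<{h * w}f", *flat)


def decode_depth(buf):
    """Parse a reply; returns None for the sentinel so the caller can skip the frame."""
    h, w = HEADER.unpack_from(buf, 0)
    if h == 0 and w == 0:
        return None  # server could not process this frame
    expected = HEADER.size + 4 * h * w
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    flat = struct.unpack_from(f"<{h * w}f", buf, HEADER.size)
    return [list(flat[r * w:(r + 1) * w]) for r in range(h)]


if __name__ == "__main__":
    print(decode_depth(encode_depth([[1.0, 2.0], [0.5, 4.0]])))
    print(decode_depth(encode_depth(None)))
```

The point of the sentinel is that a failure is an explicit, immediately readable reply: the client learns at once that the frame was dropped instead of discovering it through a socket timeout.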
depth_obstacles.py, bag_overlay.py, and bev_motion.py hardcoded output paths under an ephemeral per-session scratch directory that no longer exists, so the committed validation harness could not reproduce its results. depth_obstacles.py gains argparse with ~/.mote defaults; the other two derive their output directory from the bag path. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The pipeline committed to relative Depth Anything V2-Small (measured both more accurate and faster); the DA3 server needed its own uv venv on Python <= 3.13 plus import stubs for unused 3D/video deps, and duplicated the whole serve loop. depth_bag_eval can still compare models by pointing at any server that speaks the wire protocol. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
self.grid was stored but never read after building the undistorted rays; pixels_to_ground NaN-ed invalid points twice; sensor_msgs_py was declared in package.xml but never imported. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review feedback: the code had grown as self-contained files with the common pieces copy-pasted, the floor-plane scale anchor was still wired in despite testing showing it unreliable (floor gradients and resting pitch shift it), and the socket protocol existed only as inline code in three places.
- New depth_wire.py: the wire protocol in one place -- the spec, the framing helpers used by the server, a DepthClient (persistent connection, reconnect, sentinel handling) used by the node and the offline tools, and the rationale for a hand-rolled protocol over gRPC/ROS for this link. The node sheds all connection code.
- Floor-plane rescale removed (depth_rescale.py deleted): the node is lidar-anchored only -- fit, else hold the last good correction, else skip the frame loudly. One scale path to debug. The Theil-Sen fit and the disparity-affine apply move into lidar_rescale.py; the rescale_source parameter goes away.
- GroundProjector gains cached pixel_rays() and back_project(), the one implementation of back-projection; the node, depth_bag_eval, and depth_obstacles all used private copies (depth_obstacles' copy had drifted: it ignored the distortion coefficients).
- tools/bag_utils.py: shared bag loading, base transforms, scan matching, colorize -- previously four private copies of the rosbag2 boilerplate.
- depth_obstacles.py rewritten server-based on the shared modules: same stages and gates as the live node, BEV with both point sets in base_footprint. bag_overlay.py's lidar overlay now transforms through /tf_static -- the scan frame is yawed 90 deg from base, so plotting raw scan coordinates was wrong, not approximate.
- bev_motion.py deleted: the classical motion-parallax prototype the learned-depth approach replaced.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The camera-vs-lidar BEV image was rendered by an ad-hoc script that was never committed, and the lidar in it does not show a doorway that should be visible — consistent with plotting raw scan coordinates without the scan->base transform (the scan frame is yawed 90 deg from base). Removed rather than defended; depth_obstacles.py is now the committed, frame-correct generator for a replacement once regenerated against a live bag. The README pipeline section drops the floor-fallback description, points at depth_wire.py for the protocol, and gains a tools inventory. CLAUDE.md's mote_perception section now describes the L1 pipeline instead of stopping at L0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
TL;DR
The robot navigates on a single 2D lidar plane ~13 cm up; everything below it (cables, thresholds, chair/table legs, a robot vacuum) is invisible, which is exactly what a low robot snags on. This PR turns the existing RGB camera into a `PointCloud2` of those low/thin obstacles (`/camera_obstacles`) via off-board monocular depth, feeding a dedicated Nav2 layer. The lidar stays the primary, fast obstacle+clearing source; this is an additive near-band marker that can never clear a lidar hit: worst case is a spurious mark, never a missed wall.
See it (live, on the robot)
Detection (right) vs. raw camera (left). The stool legs, the bin, and a transparent storage box all mark — the transparent box is something the lidar sees straight through. The open floor stays clean: no false positives.
Go-under height gate. Green (≤ 0.18 m) marks, so Nav2 avoids the legs; red is above the robot's height and passable — it paths through the gap under a seat or tabletop instead of treating the whole object as a wall.
(A third camera-vs-lidar BEV figure was removed in review: it was rendered by an uncommitted script and showed the lidar misaligned, consistent with plotting raw scan coordinates without the scan→base transform, which is yawed 90°. `tools/depth_obstacles.py` is now the committed, frame-correct generator for its replacement.)
Review guide
Suggested reading order — integration first, then the pipeline core, then supporting pieces:
1. `mote_bringup/config/nav2_params.yaml`: the Nav2 integration: a self-clearing `VoxelLayer` on the local costmap only, separate from the lidar layer, near-band (≤ 1.2 m), `max_obstacle_height: 0.18` go-under gate. The comments carry the design rationale; this file is where a config mistake would matter most.
2. `mote_perception/mote_perception/depth_obstacle_node.py`: the robot-side node (torch-free): compressed image → depth server → lidar-anchored metric rescale → back-project → ground-level → z/range gates → `PointCloud2`. Lazy: does zero work (including inference) unless an output is subscribed.
3. `mote_perception/mote_perception/depth_wire.py`: the server↔node wire protocol in one shared module: spec, framing helpers, `DepthClient`, and the rationale for a hand-rolled length-prefixed TCP protocol over gRPC/ROS for this link.
4. `mote_perception/mote_perception/lidar_rescale.py`: the metric-scale core: per-frame Theil-Sen affine-in-disparity fit anchored to lidar range returns; holds the last good fit when a scan can't constrain it. The module docstring explains the estimator choice (and why the earlier floor-plane anchor was removed).
5. `mote_perception/mote_perception/ground_projection.py`: camera geometry: `GroundProjector` (shared back-projection via `pixel_rays` / `back_project`), floor-plane fit, cloud leveling.
6. `mote_perception/tools/depth_server.py` + `depth_workstation.sh` + the `depth` pixi feature: the off-board torch server (own env, `no-default-feature`, never touches the robot/Pi solve) and the one-command bring-up (`pixi run depth`).
7. `mote_perception/tools/`: offline bag harnesses on shared `bag_utils.py`: `depth_bag_replay` (pipeline replay + fit diagnostics), `depth_bag_eval` (model accuracy/speed vs lidar), `depth_obstacles` (decision-level overlay + BEV), `bag_overlay` (geometry sanity), `measure_camera_pitch` (mount calibration). Dev-only, lower review priority.
8. `mote.rviz`, `README.md` files, `package.xml` / `setup.py`: wiring and docs.
Design choices
`depth_bag_eval` can still compare any server speaking the wire protocol.)
`depth_wire.py` docstring has the trade-off against ROS/gRPC/HTTP).
Measured, not asserted
Review hardening (later commits)
Two review passes are folded into the branch:
- (`3b8f406`, `f9845ee`, `c2ec883`, `5cade87`): forced-lidar mode no longer silently falls back; degenerate fits reject the frame with a warning instead of publishing confident garbage; the floor offset is subtracted after leveling; off-FOV lidar returns can't fold back through the distortion polynomial and bias the fit; one corrupt JPEG no longer kills the inference server; the offline tools run from a clean checkout.
- (`6dad7b5`, `200de85`): the wire protocol moved into one shared module (`depth_wire.py`) with a `DepthClient` used everywhere; the unreliable floor-plane scale anchor was deleted outright; back-projection has one implementation (`GroundProjector.back_project`; one tool's private copy had drifted and ignored distortion); the rosbag2 boilerplate collapsed into `tools/bag_utils.py`; `bag_overlay`'s lidar overlay now transforms through `/tf_static` (the scan frame is yawed 90° from base, so skipping it was wrong, not approximate); the superseded motion-parallax prototype (`bev_motion.py`) and the misleading BEV figure were removed.
Risk / rollout
Additive and isolated: local costmap only, near-band, a separate layer that can't clear lidar marks, self-clearing from its own dense frames. Merging changes nothing until `pixi run depth` is running. Known caveat (documented in the layer config): a phantom mark over open floor with nothing behind it in range gets no clearing ray until the rolling window scrolls past; measured ≈ 0 on clean floor, and `spatio_temporal_voxel_layer` is the swap-in if it shows up live.
Remaining gate
Developed and validated offline against recorded bags (`tools/depth_bag_eval.py` accuracy/speed, `tools/depth_obstacles.py` decision overlays vs lidar) plus the live cross-network run shown above. The one open gate is a live driving run confirming the controller avoids a marked low obstacle (tracked as a follow-up).
Follow-ups
- `v4l2_camera` with `usb_cam` (framerate + MJPEG offload).
- `tools/depth_obstacles.py` from a live bag.
- `torch.set_num_threads`, raise the node `socket_timeout` (deepen the scan buffer in tandem).