fix: resolve camera preview and export pipeline issues #1679
richiemcilroy merged 3 commits into CapSoftware:main
Conversation
```rust
let layout = infer_camera_buffer_layout(
    frame.pixel_format,
    frame.width as usize,
    frame.height as usize,
    original_data.len(),
)
.ok_or_else(|| {
    windows::core::Error::new(
        windows::core::HRESULT(-1),
        format!(
            "Failed to infer camera buffer layout for {:?} {}x{} ({} bytes)",
            frame.pixel_format,
            frame.width,
            frame.height,
            original_data.len()
        ),
    )
})?;
```
**MJPEG cameras will now hard-fail in `upload_mf_buffer_to_texture`**
`infer_camera_buffer_layout` always returns `None` for `PixelFormat::MJPEG` and `PixelFormat::H264`, so this function will now return an error immediately for any camera that streams in MJPEG format. Before this PR, the function continued (incorrectly, producing garbage), but now it fails outright.
`camera_frame_to_ffmpeg` correctly short-circuits MJPEG at the top of the function before reaching the layout inference. `upload_mf_buffer_to_texture` has no equivalent guard.
If any MJPEG cameras are used for the live preview path (which goes through texture upload), the preview will break silently. Consider adding a similar early-return guard:
```rust
// MJPEG/H264 frames cannot be uploaded as raw pixel textures
if matches!(
    frame.pixel_format,
    cap_camera_windows::PixelFormat::MJPEG | cap_camera_windows::PixelFormat::H264
) {
    return Err(windows::core::Error::new(
        windows::core::HRESULT(-1),
        "MJPEG/H264 cannot be uploaded as a raw texture".to_string(),
    ));
}
```

If MJPEG was already broken in this path before (the old `bytes_per_pixel = _ => 2` was clearly wrong for compressed data), this is acceptable fail-fast behavior — but it should be documented explicitly.
```rust
fn infer_camera_buffer_layout(
    pixel_format: cap_camera_windows::PixelFormat,
    width: usize,
    height: usize,
    data_len: usize,
) -> Option<CameraBufferLayout> {
    use cap_camera_windows::PixelFormat;

    if width == 0 || height == 0 || data_len == 0 {
        return None;
    }

    let total_rows = match pixel_format {
        PixelFormat::NV12
        | PixelFormat::NV21
        | PixelFormat::YUV420P
        | PixelFormat::YV12
        | PixelFormat::P010 => height + (height / 2),
        PixelFormat::YUYV422
        | PixelFormat::UYVY422
        | PixelFormat::GRAY16
        | PixelFormat::RGB565
        | PixelFormat::ARGB
        | PixelFormat::RGB32
        | PixelFormat::RGB24
        | PixelFormat::BGR24
        | PixelFormat::GRAY8 => height,
        PixelFormat::MJPEG | PixelFormat::H264 => return None,
    };

    if total_rows == 0 || data_len % total_rows != 0 {
        return None;
    }

    let primary_stride = data_len / total_rows;

    match pixel_format {
        PixelFormat::NV12 | PixelFormat::NV21 => {
```
**Stride inference may fail for devices with asymmetric UV plane alignment**
`infer_camera_buffer_layout` derives `primary_stride` from `data_len / total_rows`, which assumes both the luma and chroma planes in planar formats (YUV420P, YV12) use strides consistent with a single stride value. Specifically, for YUV420P it assumes `secondary_stride = primary_stride / 2`.
This assumption holds for the vast majority of Windows camera drivers (MF aligns all planes to the same width multiple), but a device that provides a Y plane with stride 1936 and U/V planes with stride 1024 (a different alignment unit) would fail the `data_len % total_rows != 0` check and cause the frame to be dropped with an error.
The resulting error is logged clearly, so failed frames are visible. This is a reasonable trade-off given that such cameras are rare, but it's worth noting as a known limitation of the stride inference approach.
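The arithmetic behind this inference can be sketched in isolation. The following is an illustrative reduction for NV12-style buffers only (`total_rows = height + height / 2`); the name `infer_primary_stride` and the simplified signature are assumptions for the sketch, not the real `win.rs` API:

```rust
// Illustrative sketch: recover the row stride from the buffer length,
// assuming every plane shares one stride (NV12-style, total_rows = h + h/2).
fn infer_primary_stride(height: usize, data_len: usize) -> Option<usize> {
    let total_rows = height + height / 2;
    if total_rows == 0 || data_len % total_rows != 0 {
        return None; // length not a whole number of rows: drop the frame
    }
    Some(data_len / total_rows)
}

fn main() {
    // Tightly packed 1920x1080 NV12: stride equals the width.
    assert_eq!(infer_primary_stride(1080, 1920 * 1620), Some(1920));
    // Driver pads every row to 2048 bytes: stride is still recovered.
    assert_eq!(infer_primary_stride(1080, 2048 * 1620), Some(2048));
    // A length that is not a multiple of the row count is rejected.
    assert_eq!(infer_primary_stride(1080, 1920 * 1620 + 1), None);
}
```

This makes the trade-off visible: uniform padding is recovered exactly, while a buffer whose length does not divide by the row count is dropped with an error rather than misinterpreted.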
```rust
if let (Some(pts), Some(dts)) = (self.packet.pts(), self.packet.dts())
    && pts < dts
{
    self.packet.set_pts(Some(dts));
}
```
**Second `pts < dts` guard is correct but worth a comment**
The existing DTS monotonicity block (lines 76–86) already adjusts PTS when DTS is bumped. This new block catches the distinct case where the encoder emits a packet with `pts < dts` without any DTS ordering violation (i.e., the first block didn't fire). This is a real scenario with libx264 at certain configurations.
A short comment would help future readers understand why two separate guards exist:
```rust
// Secondary guard: encoder may emit pts < dts even when DTS is
// already monotonic (e.g. during stream init or B-frame drain).
if let (Some(pts), Some(dts)) = (self.packet.pts(), self.packet.dts())
    && pts < dts
{
    self.packet.set_pts(Some(dts));
}
```
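For a concrete picture of what the clamp does, here is a minimal stand-alone sketch. It uses plain integers instead of real ffmpeg packets, and `clamp_pts` is an illustrative helper, not a function from the codebase:

```rust
// Mirror of the guard above: if the encoder hands us pts < dts,
// raise pts to dts so the MP4 muxer's ordering invariant holds.
fn clamp_pts(pts: i64, dts: i64) -> i64 {
    if pts < dts { dts } else { pts }
}

fn main() {
    // Normal B-frame reordering (pts >= dts): left untouched.
    assert_eq!(clamp_pts(2, 0), 2);
    // Encoder emits pts < dts (e.g. during stream init): clamped to dts.
    assert_eq!(clamp_pts(-1, 0), 0);
}
```

The clamp never rewrites well-ordered timestamps; it only repairs the invalid case that made `av_interleaved_write_frame`-style muxing fail.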
Hey @hidumou, was this an issue you have personally experienced?

@greptileai please re-review the PR
Summary
This PR fixes three issues in the desktop recording/export pipeline:
- Camera preview not re-opening when the same camera device is re-selected.
- Windows camera export artifacts caused by ignoring device buffer row padding (stride).
- `Failed to export recording: Encode: Invalid argument`, because encoded packets could reach the MP4 muxer with invalid timestamp ordering (`pts < dts`).

Changes

- Clamps packet PTS to DTS when the encoder emits `pts < dts`.

Root Cause
For the 4K60 export failure, logs showed non-monotonic timestamp ordering during software H.264 export. The encoder/muxer path could produce packets where `pts` became smaller than `dts`, which causes FFmpeg/MP4 muxing to fail with `Invalid argument`.

For the Windows camera artifact issue, camera frames were previously treated as tightly packed buffers, but real device buffers can contain row padding/stride. That caused corrupted image interpretation in export/output paths.
Validation
- `cargo fmt --all`
- `cargo check -p cap-enc-ffmpeg -p cap-export`
- `cargo check --manifest-path apps/desktop/src-tauri/Cargo.toml`

Additionally, the failing local 4K60 export project was re-run and progressed well past the previous early failure range without immediately reproducing `Encode: Invalid argument`.

Greptile Summary
This PR fixes three distinct bugs in the recording/export pipeline: the camera preview not re-opening when the same camera device is re-selected, Windows camera export artifacts caused by ignoring device buffer row-padding (stride), and 4K60 export failures due to `pts < dts` timestamp ordering violations.

Key changes:
- `lib.rs`: Adds `CapWindowId::Camera.get(&app_handle).is_none()` to the early-return branch so the preview window is re-shown only when it's actually closed; also extracts `skip_camera_window.unwrap_or(false)` to a single binding to eliminate repeated unwraps.
- `base.rs`: Introduces a secondary guard that clamps `pts` to `dts` when the encoder emits a packet with `pts < dts` independently of any DTS ordering violation — a valid scenario at stream initialisation or during B-frame drain.
- `h264.rs`: Sets `bf=0` and `b-adapt=0` in the libx264 `HighThroughput` software path, removing B-frames as the primary cause of `pts < dts` during 4K60 export.
- `win.rs`: Introduces `infer_camera_buffer_layout`, which derives `primary_stride` and `secondary_stride` from the actual buffer length, and threads both values through `copy_plane`, `flip_camera_buffer_into`, `camera_frame_to_ffmpeg`, and `upload_mf_buffer_to_texture`. All format branches (NV12, NV21, YUYV422, UYVY422, YUV420P, YV12, P010, RGB/BGR variants) now correctly skip padding bytes on both the FFmpeg conversion and D3D11 texture upload paths.
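The stride-aware copy that `win.rs` threads through its format branches can be sketched as follows. This is an illustrative stand-alone version; the real `copy_plane` signature and surrounding plumbing may differ:

```rust
// Copy `rows` rows of `row_bytes` visible bytes each out of a source plane
// whose rows are `src_stride` bytes apart, dropping the per-row padding.
fn copy_plane(src: &[u8], src_stride: usize, row_bytes: usize, rows: usize, dst: &mut Vec<u8>) {
    for row in 0..rows {
        let start = row * src_stride;
        dst.extend_from_slice(&src[start..start + row_bytes]);
    }
}

fn main() {
    // Two rows of 2 visible bytes each, padded to a stride of 4 (0x99 = padding).
    let src = [1u8, 2, 0x99, 0x99, 3, 4, 0x99, 0x99];
    let mut dst = Vec::new();
    copy_plane(&src, 4, 2, 2, &mut dst);
    assert_eq!(dst, vec![1, 2, 3, 4]);
}
```

Treating the buffer as tightly packed (stride == width) would instead have pulled the `0x99` padding bytes into the image, which is exactly the corruption the PR describes.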
Safe to merge — all changes are targeted bug fixes with correct implementations and no regressions identified.
All three bug fixes are logically sound: the camera window guard is correct, the pts<dts clamp is a standard muxer safety net, and the stride-aware buffer handling is comprehensively applied across every format branch. The two edge-case concerns raised in prior review threads are pre-existing limitations with clear error logging, not new regressions. No P0 or P1 issues remain.
crates/recording/src/output_pipeline/win.rs — largest and most complex change; worth a final read-through of the NV21 UV swap loop and the YUV420P V-plane offset arithmetic.
Important Files Changed
Reviews (2): Last reviewed commit: "fmt"