Keyboard and captions tracks #1615

Merged
richiemcilroy merged 82 commits into main from cursor/keyboard-and-captions-tracks-8d45 on Mar 25, 2026

@richiemcilroy (Member) commented Feb 18, 2026

Adds keyboard press recording and display, and migrates captions and keyboard presses to interactive timeline tracks for improved editing and visualization.


Greptile Summary

This PR adds keyboard press recording and display as a new feature, and migrates both captions and keyboard events into interactive timeline tracks — a significant UX improvement over the previous static sidebar-only controls.

Key changes:

  • New crates/project/src/keyboard.rs: Binary (magic-prefixed) + JSON-fallback serialization for KeyboardEvents, and group_key_events which converts raw key-press events into timeline segments with modifier-prefix support, Backspace handling, and command-shortcut grouping. Includes a solid unit test suite.
  • New crates/rendering/src/layers/keyboard.rs: GPU keyboard overlay renderer using glyphon for text and a dedicated WGPU pipeline for rounded backgrounds; handles fade, bounce animation, and caption-collision avoidance.
  • New CaptionsTrack / KeyboardTrack timeline components: Interactive drag-resize / drag-move / split / multi-select track rows, wired into the main Timeline component.
  • New KeyboardTab sidebar: Full font, color, position, animation, and behavior settings for keyboard overlays; includes a "Generate Keyboard Segments" button.
  • Auto-initialization: On first editor load, if recorded keyboard events exist and no segments have been generated yet, segments are auto-generated with default settings.
  • general_settings.rs: Adds capture_keyboard_events (default true) and transcription_hints fields. The default hints include placeholder example values ("My Brand Name", "mywebsite.com") that will be passed to Whisper for every new user.
  • KeyboardTab.tsx: generateSegments() silently does nothing when the backend returns 0 segments, giving no user feedback — unlike the caption generation path which emits a descriptive toast.
  • CaptionsTrack.tsx / KeyboardTrack.tsx: timelineBounds is destructured from useTimelineContext() but never used in either component.
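To illustrate the "binary (magic-prefixed) + JSON fallback" idea from the first bullet, here is a minimal Rust sketch, assuming the loader sniffs a fixed byte prefix before falling back to JSON. The magic value `CAPKB\0`, the `KeyboardFileFormat` enum and `detect_format` are hypothetical; the real prefix and decoding code in crates/project/src/keyboard.rs are not shown on this page, and actual JSON parsing (serde) is left out to keep the sketch std-only.

```rust
/// Hypothetical magic prefix; the real value used in keyboard.rs is not shown here.
const MAGIC: &[u8] = b"CAPKB\0";

/// Which encoding a keyboard events file appears to use.
#[derive(Debug, PartialEq)]
enum KeyboardFileFormat {
    Binary,
    Json,
    Unknown,
}

/// Sniffs file contents: magic prefix => binary, leading `{` => JSON fallback.
fn detect_format(bytes: &[u8]) -> KeyboardFileFormat {
    if bytes.starts_with(MAGIC) {
        return KeyboardFileFormat::Binary;
    }
    // Skip leading whitespace, then expect a JSON object such as {"presses": [...]}.
    match bytes.iter().find(|b| !b.is_ascii_whitespace()) {
        Some(&b'{') => KeyboardFileFormat::Json,
        _ => KeyboardFileFormat::Unknown,
    }
}
```

Checking the prefix first keeps older JSON files readable while new recordings use the compact binary form.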

Confidence Score: 4/5

  • The PR is close to merge-ready; two targeted fixes remain before it ships to production users.
  • The core keyboard recording, rendering, and timeline interaction are well-implemented with good test coverage. Prior concerns raised in earlier review rounds (duplicate split IDs, empty Backspace segments, missing keyboardSegments in fallback initializer, caption track enabled on failure) are all addressed. Two new issues remain: (1) generateSegments in KeyboardTab.tsx gives no user feedback on an empty result — surprising since the caption path does — and (2) the default transcription_hints contains placeholder strings that will bias Whisper for every new user. Neither is a crash-level bug, but (2) in particular is a data-quality issue that will silently affect all new installations.
  • Files needing attention: apps/desktop/src/routes/editor/KeyboardTab.tsx (silent failure on empty generation) and apps/desktop/src-tauri/src/general_settings.rs (placeholder transcription hints as defaults).

Important Files Changed

| Filename | Overview |
|---|---|
| crates/project/src/keyboard.rs | New module: keyboard event parsing, binary/JSON serialization, and group_key_events that groups raw key presses into timeline segments. Thorough unit tests included; known Space-key display issue (key stored as " " rather than "Space") is already tracked in prior review threads. |
| crates/rendering/src/layers/keyboard.rs | New GPU rendering layer for keyboard overlays using glyphon for text and a WGPU pipeline for rounded background. Handles fade, bounce animation, position resolution, and caption-collision avoidance. Logic looks solid; re-uses the existing caption background shader. |
| apps/desktop/src/routes/editor/Timeline/index.tsx | Adds caption and keyboard track rows, toggle handlers, delete/split keyboard-event dispatch, and the generateCaptionsFromTrack async function with proper toast feedback. captionSegments and keyboardSegments are now included in all timeline initializations. |
| apps/desktop/src/routes/editor/context.ts | Adds splitKeyboardSegment, deleteKeyboardSegments, splitCaptionSegment, deleteCaptionSegments project actions; initializes keyboard segments on first load via an effect; migrates the obsolete "above-captions" position string to "bottom-center". IDs for split segments are correctly unique. |
| apps/desktop/src/routes/editor/Timeline/CaptionsTrack.tsx | New interactive caption timeline track with drag-to-resize and drag-to-move handles, multi-select, split mode, and a fallback "Generate captions" button. timelineBounds destructured but unused. |
| apps/desktop/src/routes/editor/Timeline/KeyboardTrack.tsx | New interactive keyboard timeline track; mirrors CaptionsTrack structure. timelineBounds is destructured but unused (same issue as CaptionsTrack). |
| apps/desktop/src/routes/editor/KeyboardTab.tsx | New sidebar tab for keyboard settings; generateSegments() silently does nothing when 0 segments are returned, giving no user feedback, unlike the caption generation path which emits a toast. |
| apps/desktop/src-tauri/src/general_settings.rs | Adds capture_keyboard_events (default true) and transcription_hints fields. Default hints include placeholder values "My Brand Name" and "mywebsite.com" that will be fed to Whisper for every new user, potentially degrading transcription accuracy. |
| crates/project/src/configuration.rs | Adds CaptionTrackSegment, KeyboardSettings, KeyboardData structs and caption_segments/keyboard_segments fields to TimelineConfiguration. All new fields have #[serde(default)] for backward compatibility. |
| crates/project/src/meta.rs | Adds keyboard path to MultipleSegment, keyboard_events() helper with binary/JSON fallback, and a migration that populates timeline.caption_segments from captions.segments for existing projects without timeline captions. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    REC[Recording Session] -->|keyboard capture ON| KB_FILE[keyboard.bin per segment]
    KB_FILE --> META[MultipleSegment.keyboard_events]

    META -->|first editor load| AUTO_INIT{keyboardSegments empty?}
    AUTO_INIT -->|yes| GEN_CMD[generate_keyboard_segments Tauri command]
    AUTO_INIT -->|no| SKIP[skip init]

    GEN_CMD --> GROUP[group_key_events\nRust: threshold, linger, modifiers, special]
    GROUP --> KB_SEGMENTS[timeline.keyboardSegments]

    KB_SEGMENTS --> KB_TRACK[KeyboardTrack\ndrag / split / select]
    KB_TRACK -->|edit| KB_SEGMENTS

    KB_SEGMENTS --> RENDER[keyboard.rs GPU layer\nfade + bounce + position]

    CAPTIONS_DATA[captions.segments] -->|migration in meta.rs| CAP_SEGMENTS[timeline.captionSegments]
    CAP_SEGMENTS --> CAP_TRACK[CaptionsTrack\ndrag / split / select]
    CAP_TRACK -->|generate btn| TRANSCRIBE[transcribeEditorCaptions\nWhisper]
    TRANSCRIBE --> CAP_SEGMENTS

    RENDER --> OUTPUT[Exported video]
    CAP_SEGMENTS --> OUTPUT
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: apps/desktop/src/routes/editor/KeyboardTab.tsx
Line: 106-115

Comment:
**Silent failure when no keyboard segments are generated**

When `generateSegments()` finishes but the backend returns zero segments (e.g., no keyboard events were recorded, or filtering removed them all), the function returns without showing any feedback to the user. The loading state resets and the button label reverts to "Generate Keyboard Segments" with no explanation, leaving the user wondering what happened.

The parallel caption generation path in `generateCaptionsFromTrack` (`index.tsx`) correctly handles this case by emitting a toast:

```typescript
if (result.segments.length < 1) {
  toast.error("No captions were generated. The audio might be too quiet or unclear.");
  return;
}
```

The same guard should be applied here so users understand why the track stays empty:

```suggestion
		if (segments.length > 0) {
			batch(() => {
				ensureKeyboardSettings(true);
				setProject("timeline", "keyboardSegments", segments);
				setEditorState("timeline", "tracks", "keyboard", true);
			});
		} else {
			toast.error(
				"No keyboard events found. Make sure keyboard capture was enabled during recording.",
			);
		}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: apps/desktop/src/routes/editor/Timeline/KeyboardTrack.tsx
Line: 34

Comment:
**Unused destructured variable `timelineBounds`**

`timelineBounds` is destructured from `useTimelineContext()` but is never referenced anywhere in `KeyboardTrack`. The same dead destructure exists in `CaptionsTrack.tsx` (line 34).

```suggestion
	const { secsPerPixel } = useTimelineContext();
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: apps/desktop/src-tauri/src/general_settings.rs
Line: 174-181

Comment:
**Default transcription hints contain placeholder example values**

`default_transcription_hints()` ships with `"My Brand Name"` and `"mywebsite.com"` as default entries. These look like template examples left in place rather than real defaults. If these strings are fed to the Whisper model as vocabulary bias hints, they will skew transcription toward those phonemes for every new user, potentially degrading accuracy for content that doesn't match them.

Only `"Cap"` (the product name) makes sense as a shipped default. The placeholder entries should either be removed or documented as examples the user replaces in settings.

```suggestion
fn default_transcription_hints() -> Vec<String> {
    vec!["Cap".to_string()]
}
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (4): Last reviewed commit: "fix(editor): defer caption track enable ..."

cursoragent and others added 12 commits February 18, 2026 00:17
…ject

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
…kwards-compatible caption migration

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
…pipelines

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
…ildup

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
…line

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
…ll drag/resize/split support

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
…board settings store

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
… generation

Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
Co-authored-by: Richie McIlroy <richiemcilroy@users.noreply.github.com>
@cursor
cursor bot commented Feb 18, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

@richiemcilroy richiemcilroy marked this pull request as ready for review February 19, 2026 00:32
Comment on lines +186 to +188
```rust
if is_special && !show_special_keys && !is_modifier {
    continue;
}
```

Right now Backspace gets skipped when show_special_keys is false (it hits the continue before the backspace handling), so the displayed text won’t reflect deletions unless special keys are enabled.

Suggested change
```diff
-if is_special && !show_special_keys && !is_modifier {
-    continue;
-}
+if is_special && !show_special_keys && !is_modifier && event.key != "Backspace" {
+    continue;
+}
```
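A simplified Rust sketch of why the Backspace exemption matters. It assumes a stripped-down key type: `Key`, `display_text` and the `[Enter]` rendering of shown special keys are hypothetical, and the modifier-prefix handling of the real group_key_events is omitted. With the exemption, a deletion is applied to the text even when show_special_keys is false.

```rust
/// Hypothetical, stripped-down key press (the real event type carries more data).
struct Key {
    name: &'static str,
    is_special: bool,
}

/// Builds the overlay text for one group of key presses.
fn display_text(keys: &[Key], show_special_keys: bool) -> String {
    let mut text = String::new();
    for key in keys {
        // The suggested fix: hidden special keys are skipped, except Backspace,
        // which must still reach the deletion branch below.
        if key.is_special && !show_special_keys && key.name != "Backspace" {
            continue;
        }
        match key.name {
            "Backspace" => {
                text.pop();
            }
            // Hypothetical rendering for special keys that are shown.
            name if key.is_special => text.push_str(&format!("[{name}]")),
            name => text.push_str(name),
        }
    }
    text
}
```

With show_special_keys = false, typing h, i, x, Backspace, Enter yields "hi"; without the exemption the Backspace would be skipped and the overlay would show "hix".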

Comment on lines +2002 to +2006
```rust
all_events.presses.sort_by(|a, b| {
    a.time_ms
        .partial_cmp(&b.time_ms)
        .unwrap_or(std::cmp::Ordering::Equal)
});
```

partial_cmp + unwrap_or(Equal) can hide NaNs and makes ordering less explicit. Since this is f64, total_cmp is a nice drop-in here.

Suggested change
```diff
-all_events.presses.sort_by(|a, b| {
-    a.time_ms
-        .partial_cmp(&b.time_ms)
-        .unwrap_or(std::cmp::Ordering::Equal)
-});
+all_events
+    .presses
+    .sort_by(|a, b| a.time_ms.total_cmp(&b.time_ms));
```
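For reference, `f64::total_cmp` implements the IEEE 754 totalOrder predicate: positive NaNs sort after +inf and negative NaNs before -inf, so no comparison is silently treated as "equal". A minimal sketch with a hypothetical `KeyPress` struct that carries only `time_ms`:

```rust
/// Hypothetical minimal press record; only the timestamp matters here.
#[derive(Debug)]
struct KeyPress {
    time_ms: f64,
}

/// Sorts presses by timestamp using the IEEE 754 total order.
fn sort_presses(presses: &mut [KeyPress]) {
    presses.sort_by(|a, b| a.time_ms.total_cmp(&b.time_ms));
}
```

A positive NaN timestamp therefore lands at the end of the list deterministically instead of leaving the sort order unspecified.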

Comment on lines +64 to +77
```rust
fn flush_keyboard_data(output_path: &Path, presses: &[KeyPressEvent]) {
    let events = KeyboardEvents {
        presses: presses.to_vec(),
    };
    if let Ok(json) = serde_json::to_string_pretty(&events)
        && let Err(e) = std::fs::write(output_path, json)
    {
        tracing::error!(
            "Failed to write keyboard data to {}: {}",
            output_path.display(),
            e
        );
    }
}
```

This currently drops JSON serialization errors silently, and to_string_pretty allocates a full string every flush. Writing directly to a file handles both and is a bit cheaper.

Suggested change
```diff
 fn flush_keyboard_data(output_path: &Path, presses: &[KeyPressEvent]) {
     let events = KeyboardEvents {
         presses: presses.to_vec(),
     };
-    if let Ok(json) = serde_json::to_string_pretty(&events)
-        && let Err(e) = std::fs::write(output_path, json)
-    {
-        tracing::error!(
-            "Failed to write keyboard data to {}: {}",
-            output_path.display(),
-            e
-        );
-    }
+    let file = match std::fs::File::create(output_path) {
+        Ok(file) => file,
+        Err(e) => {
+            tracing::error!(
+                "Failed to open keyboard data file {}: {}",
+                output_path.display(),
+                e
+            );
+            return;
+        }
+    };
+    let mut writer = std::io::BufWriter::new(file);
+    if let Err(e) = serde_json::to_writer(&mut writer, &events) {
+        tracing::error!(
+            "Failed to write keyboard data to {}: {}",
+            output_path.display(),
+            e
+        );
+    }
 }
```
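One caveat with the BufWriter version: its Drop impl flushes the buffer but ignores any error from that flush (the std docs say to call `flush` before the writer is dropped), so a failing final write would still go unreported. A std-only sketch, assuming the JSON has already been serialized to bytes; `write_buffered` is a hypothetical helper, not code from the PR:

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes `contents` through a BufWriter and flushes explicitly, so an error
/// from the final flush is returned instead of being swallowed by Drop.
fn write_buffered(path: &Path, contents: &[u8]) -> io::Result<()> {
    let file = File::create(path)?;
    let mut writer = BufWriter::new(file);
    writer.write_all(contents)?;
    // Dropping the BufWriter would also flush, but any error there is ignored.
    writer.flush()?;
    Ok(())
}
```

In flush_keyboard_data the equivalent would be calling `writer.flush()` (with `std::io::Write` in scope) after `serde_json::to_writer` and logging its error the same way.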

Comment on lines +2008 to +2014
```rust
let grouped = cap_project::group_key_events(
    &all_events,
    grouping_threshold_ms,
    linger_duration_ms,
    show_modifiers,
    show_special_keys,
);
```

Minor robustness: if these come from the UI as floats, clamping to non-negative avoids end < start segments when values go negative.

Suggested change
```diff
+let grouping_threshold_ms = grouping_threshold_ms.max(0.0);
+let linger_duration_ms = linger_duration_ms.max(0.0);
 let grouped = cap_project::group_key_events(
     &all_events,
     grouping_threshold_ms,
     linger_duration_ms,
     show_modifiers,
     show_special_keys,
 );
```
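Worth noting: `f64::max` returns the other operand when one side is NaN, so `.max(0.0)` also maps a NaN coming from the UI to 0.0. A small sketch; `clamp_non_negative_ms` and `segment_end_secs` are hypothetical helpers mirroring the `(last_key_time + linger_duration_ms) / 1000.0` end computation used in group_key_events:

```rust
/// Clamps a UI-supplied duration in milliseconds to be non-negative.
/// `f64::max` returns the non-NaN operand, so NaN also becomes 0.0.
fn clamp_non_negative_ms(value_ms: f64) -> f64 {
    value_ms.max(0.0)
}

/// Segment end in seconds: last key time plus (clamped) linger, converted from ms.
fn segment_end_secs(last_key_time_ms: f64, linger_duration_ms: f64) -> f64 {
    (last_key_time_ms + clamp_non_negative_ms(linger_duration_ms)) / 1000.0
}
```

With the clamp, a linger of -500 ms can no longer pull a segment's end before its last key press.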

@richiemcilroy
Member Author

@greptileai please re-review the PR

@richiemcilroy
Member Author

@greptileai please re-review the pr

Comment on lines +245 to +250
```rust
let prefix = modifier_prefix(&modifiers);
if !prefix.is_empty() {
    current_group_start = Some(event.time_ms);
    current_display = prefix;
    current_keys.push(KeyPressDisplay {
        key: event.key.clone(),
```
Contributor

P1 Empty segments emitted when Backspace clears group

When a user types a character, presses Backspace to delete it (making current_display empty), then waits past grouping_threshold_ms before typing again, the should_start_new_group && let Some(start) = current_group_start branch fires and pushes a segment with an empty display_text. This produces invisible/blank keyboard overlay segments on the timeline.

The fix is to skip the push when current_display is empty, mirroring the guard already present on the final segment push at the bottom of the function:

Suggested change
```diff
-let prefix = modifier_prefix(&modifiers);
-if !prefix.is_empty() {
-    current_group_start = Some(event.time_ms);
-    current_display = prefix;
-    current_keys.push(KeyPressDisplay {
-        key: event.key.clone(),
+if should_start_new_group && let Some(start) = current_group_start {
+    if !current_display.is_empty() {
+        segment_counter += 1;
+        segments.push(KeyboardTrackSegment {
+            id: format!("kb-{segment_counter}"),
+            start: start / 1000.0,
+            end: (last_key_time + linger_duration_ms) / 1000.0,
+            display_text: current_display.clone(),
+            keys: current_keys.clone(),
+            fade_duration_override: None,
+            position_override: None,
+            color_override: None,
+            background_color_override: None,
+            font_size_override: None,
+        });
+    }
+    current_display.clear();
+    current_keys.clear();
+    current_group_start = None;
+}
```
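A simplified, hypothetical version of the grouping loop showing where the guard sits. It assumes events arrive as `(time_ms, key)` pairs sorted by time, treats every non-Backspace key as literal text, and omits modifiers, special keys and overrides; `Segment` and `group` are illustrative names, not the crate's API.

```rust
/// Hypothetical simplified segment (real segments also carry keys and overrides).
#[derive(Debug, PartialEq)]
struct Segment {
    start_s: f64,
    end_s: f64,
    text: String,
}

/// Groups time-sorted `(time_ms, key)` pairs into segments. A gap longer than
/// `threshold_ms` closes the current group; Backspace deletes the last character.
fn group(events: &[(f64, &str)], threshold_ms: f64, linger_ms: f64) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut text = String::new();
    let mut start: Option<f64> = None;
    let mut last_ms = 0.0;

    for &(time_ms, key) in events {
        if let Some(start_ms) = start {
            if time_ms - last_ms > threshold_ms {
                // The guard: a group emptied by Backspace produces no segment.
                if !text.is_empty() {
                    segments.push(Segment {
                        start_s: start_ms / 1000.0,
                        end_s: (last_ms + linger_ms) / 1000.0,
                        text: text.clone(),
                    });
                }
                text.clear();
                start = None;
            }
        }
        if key == "Backspace" {
            text.pop();
        } else {
            text.push_str(key);
        }
        if start.is_none() {
            start = Some(time_ms);
        }
        last_ms = time_ms;
    }

    // Same guard on the final group.
    if let Some(start_ms) = start {
        if !text.is_empty() {
            segments.push(Segment {
                start_s: start_ms / 1000.0,
                end_s: (last_ms + linger_ms) / 1000.0,
                text,
            });
        }
    }
    segments
}
```

For "a", Backspace, then "b" two seconds later (500 ms threshold and linger), only one segment is produced: "b" from 2.0 s to 2.5 s. Without the guard the first group would yield an empty segment from 0.0 s to 0.6 s.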
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/project/src/keyboard.rs
Line: 245-250

Comment:
**Empty segments emitted when Backspace clears group**

When a user types a character, presses Backspace to delete it (making `current_display` empty), then waits past `grouping_threshold_ms` before typing again, the `should_start_new_group && let Some(start) = current_group_start` branch fires and pushes a segment with an empty `display_text`. This produces invisible/blank keyboard overlay segments on the timeline.

The fix is to skip the push when `current_display` is empty, mirroring the guard already present on the final segment push at the bottom of the function:

```suggestion
        if should_start_new_group && let Some(start) = current_group_start {
            if !current_display.is_empty() {
                segment_counter += 1;
                segments.push(KeyboardTrackSegment {
                    id: format!("kb-{segment_counter}"),
                    start: start / 1000.0,
                    end: (last_key_time + linger_duration_ms) / 1000.0,
                    display_text: current_display.clone(),
                    keys: current_keys.clone(),
                    fade_duration_override: None,
                    position_override: None,
                    color_override: None,
                    background_color_override: None,
                    font_size_override: None,
                });
            }
            current_display.clear();
            current_keys.clear();
            current_group_start = None;
        }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +430 to +446
```typescript
			const duration = segment.end - segment.start;
			const remaining = duration - time;
			if (time < 0.3 || remaining < 0.3) return;

			segments.splice(index + 1, 0, {
				...segment,
				start: segment.start + time,
				end: segment.end,
			});
			segments[index].end = segment.start + time;
		}),
	);
},
deleteKeyboardSegments: (segmentIndices: number[]) => {
	batch(() => {
		setProject(
			"timeline",
```
Contributor

P1 Duplicate segment id after split

splitKeyboardSegment spreads ...segment into the new right-hand segment, which copies the original's id. After the split both halves share the same id string. The same issue exists in splitCaptionSegment (~line 460). Any code that uses id as a unique key (e.g., the renderer looking up per-segment overrides, undo/redo reconciliation) will be unable to distinguish the two halves.

Generate a fresh ID for the new segment on each split, e.g.:

```typescript
segments.splice(index + 1, 0, {
  ...segment,
  id: `kb-split-${Date.now()}-${Math.random().toString(36).slice(2)}`,
  start: segment.start + time,
  end: segment.end,
});
```

Apply the same fix in splitCaptionSegment.
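The same split-with-fresh-id logic as a minimal Rust sketch. The 0.3 s minimum mirrors the `time < 0.3 || remaining < 0.3` check in the quoted code; `TrackSegment`, `fresh_split_id` and `split_segment` are hypothetical. Rust's std has no UUID generator, so this sketch swaps in a process-wide atomic counter; unlike `Date.now()`/`Math.random()` or `crypto.randomUUID()`, counter IDs restart with the process and could collide with IDs persisted in an earlier session.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical track segment; the real structs carry display text and overrides.
#[derive(Clone, Debug)]
struct TrackSegment {
    id: String,
    start: f64,
    end: f64,
}

/// Process-wide counter used in place of a UUID generator.
static NEXT_SPLIT_ID: AtomicU64 = AtomicU64::new(1);

fn fresh_split_id(prefix: &str) -> String {
    let n = NEXT_SPLIT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{prefix}-split-{n}")
}

/// Splits `segments[index]` at `time` seconds after its start.
/// Returns false (and changes nothing) if either half would be under 0.3 s.
fn split_segment(segments: &mut Vec<TrackSegment>, index: usize, time: f64, prefix: &str) -> bool {
    const MIN_PART_SECS: f64 = 0.3;
    let Some(segment) = segments.get(index).cloned() else {
        return false;
    };
    let remaining = (segment.end - segment.start) - time;
    if time < MIN_PART_SECS || remaining < MIN_PART_SECS {
        return false;
    }
    let split_at = segment.start + time;
    // The right-hand half gets a fresh id instead of copying the original's.
    let right = TrackSegment {
        id: fresh_split_id(prefix),
        start: split_at,
        end: segment.end,
    };
    segments[index].end = split_at;
    segments.insert(index + 1, right);
    true
}
```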

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/routes/editor/context.ts
Line: 430-446

Comment:
**Duplicate segment `id` after split**

`splitKeyboardSegment` spreads `...segment` into the new right-hand segment, which copies the original's `id`. After the split both halves share the same `id` string. The same issue exists in `splitCaptionSegment` (~line 460). Any code that uses `id` as a unique key (e.g., the renderer looking up per-segment overrides, undo/redo reconciliation) will be unable to distinguish the two halves.

Generate a fresh ID for the new segment on each split, e.g.:

```typescript
segments.splice(index + 1, 0, {
  ...segment,
  id: `kb-split-${Date.now()}-${Math.random().toString(36).slice(2)}`,
  start: segment.start + time,
  end: segment.end,
});
```

Apply the same fix in `splitCaptionSegment`.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +559 to +565
```typescript
	}
});

const generateCaptionsFromTrack = async () => {
	if (!editorInstance) return;

	setEditorState("timeline", "tracks", "caption", true);
```
Contributor

P2 Caption track stays enabled on failed or empty generation

setEditorState("timeline", "tracks", "caption", true) runs unconditionally before the async call. When transcription throws or returns zero segments the caption track is left visible (showing the empty-state UI) with no automatic rollback. The user must manually toggle it off in the track header.

Consider only enabling the track on success, or resetting it on failure:

Suggested change
```diff
 	}
 });
 
 const generateCaptionsFromTrack = async () => {
 	if (!editorInstance) return;
 
-	setEditorState("timeline", "tracks", "caption", true);
+	setEditorState("captions", "isGenerating", true);
+
+	try {
+		const result = await transcribeEditorCaptions(editorInstance.path);
+
+		if (result.segments.length < 1) {
+			toast.error(
+				"No captions were generated. The audio might be too quiet or unclear.",
+			);
+			return;
+		}
```

Then add setEditorState("timeline", "tracks", "caption", true) only inside the successful branch, after the produce call.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/desktop/src/routes/editor/Timeline/index.tsx
Line: 559-565

Comment:
**Caption track stays enabled on failed or empty generation**

`setEditorState("timeline", "tracks", "caption", true)` runs unconditionally before the async call. When transcription throws or returns zero segments the caption track is left visible (showing the empty-state UI) with no automatic rollback. The user must manually toggle it off in the track header.

Consider only enabling the track on success, or resetting it on failure:

```suggestion
	const generateCaptionsFromTrack = async () => {
		if (!editorInstance) return;

		setEditorState("captions", "isGenerating", true);

		try {
			const result = await transcribeEditorCaptions(editorInstance.path);

			if (result.segments.length < 1) {
				toast.error(
					"No captions were generated. The audio might be too quiet or unclear.",
				);
				return;
			}
```

Then add `setEditorState("timeline", "tracks", "caption", true)` only inside the successful branch, after the `produce` call.

How can I resolve this? If you propose a fix, please make it concise.

@richiemcilroy
Member Author

@greptileai please re-review the pr

@richiemcilroy richiemcilroy merged commit 7ab2456 into main Mar 25, 2026
18 checks passed

```typescript
segments.splice(index + 1, 0, {
	...segment,
	id: `kb-split-${Date.now()}-${Math.random().toString(36).slice(2)}`,
```

Since this is running in the desktop app, would you be up for using crypto.randomUUID() (already used elsewhere in apps/desktop) instead of Date.now()/Math.random() for split IDs?

Suggested change
```diff
-id: `kb-split-${Date.now()}-${Math.random().toString(36).slice(2)}`,
+id: `kb-split-${crypto.randomUUID()}`,
```


```typescript
segments.splice(index + 1, 0, {
	...segment,
	id: `cap-split-${Date.now()}-${Math.random().toString(36).slice(2)}`,
```

Same thought here for consistency + collision resistance.

Suggested change
```diff
-id: `cap-split-${Date.now()}-${Math.random().toString(36).slice(2)}`,
+id: `cap-split-${crypto.randomUUID()}`,
```


Labels

None yet

Projects

None yet


3 participants