fix: building youtube contentHtml more reliably by lorenzozane · Pull Request #211 · kepano/defuddle

lorenzozane · 2026-03-28T18:05:22Z

Improving video data extraction to build `content` more reliably.

The lines:

	let contentHtml = `<iframe ...></iframe>${formattedDescription}`;

	if (transcript?.html) {
		contentHtml += transcript.html;
	}

inside buildResult() imply that contentHtml is expected to also contain the video's formattedDescription. Right now this only happens after SPA navigation. If the video is opened from the homepage or another source, the description inside the contentHtml body is empty, resulting in:

![](https://www.youtube.com/watch?v={{videoId}})
{{transcript}}

instead of:

![](https://www.youtube.com/watch?v={{videoId}})
{{description}}
{{transcript}}

Cause

This happens because getVideoData() returns the first VideoObject found in script[type="application/ld+json"] rather than the best one. Usually the first VideoObject matched holds information about the first comment rendered on the page (it includes some video properties but lacks the description), for example:

{
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "@id": "https://www.youtube.com/watch?v=cz-wqROuHIM",
    "name": "#114 First Cabin Finished",
    "thumbnailUrl": "https://i.ytimg.com/vi/cz-wqROuHIM/maxresdefault.jpg",
    "uploadDate": "2024-05-26T08:08:22-07:00",
    "comment": [
        {
            "@type": "https://schema.org/Comment",
            ...,
        }
    ]
}

and only later is the VideoObject containing the main video metadata block matched, for example:

{
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "description": "Ollie is still here. {...}",
    "duration": "PT2354S",
    "embedUrl": "https://www.youtube.com/embed/TCHFvt4W-j8",
    "name": "#115 Start Building the Greenhouse",
    "thumbnailUrl": [
        "https://i.ytimg.com/vi/TCHFvt4W-j8/maxresdefault.jpg"
    ],
    "uploadDate": "2024-06-02T01:01:18-07:00",
    "@id": "https://www.youtube.com/watch?v=TCHFvt4W-j8",
    "interactionStatistic": [
        ...,
    ],
    "genre": "Travel & Events",
    "author": "Martijn Doolaard"
}

This leaves videoData.description empty, so const description = videoData.description || ''; evaluates to an empty string and the description is not added to the contentHtml body.
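To make the effect concrete, here is a minimal sketch of the assembly step, assuming it works as the snippet quoted at the top suggests. `buildContentHtml` and `embedHtml` are hypothetical names used only for illustration (`embedHtml` stands in for the elided iframe markup); this is not the actual buildResult() code.

```typescript
// Hypothetical minimal shape of the transcript object used below.
interface Transcript {
  html?: string;
}

// Sketch of the assembly step: embed markup, then description, then transcript.
// `embedHtml` is an assumption standing in for the elided iframe markup.
function buildContentHtml(
  embedHtml: string,
  formattedDescription: string,
  transcript?: Transcript
): string {
  let contentHtml = `${embedHtml}${formattedDescription}`;

  // The transcript is appended only when it carries HTML.
  if (transcript?.html) {
    contentHtml += transcript.html;
  }

  return contentHtml;
}
```

With an empty `formattedDescription` the function still returns a valid string, so the missing description causes no error; the section is simply absent from the output.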

In defuddle.ts, the empty description is later populated from the metadata:

description: extracted.variables?.description || metadata.description,

Fix

getVideoData() now returns the best match rather than the first one, i.e. the VideoObject that also contains the description.
It iterates over all matched VideoObjects, keeps the less promising ones as fallbackVideoObject (preferring one with the comment property over any other), and returns immediately only when description is present.
The comments in the updated getVideoData() code explain the logic further.
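The selection logic described above could look roughly like the following. This is a sketch of one reading of the description, not the code in the PR: `pickBestVideoObject` and the `VideoObject` interface are hypothetical, and the input is assumed to be JSON-LD blocks that have already been parsed.

```typescript
// Hypothetical shape: only the fields discussed in this PR are named.
interface VideoObject {
  "@type"?: string;
  name?: string;
  description?: string;
  comment?: unknown[];
  [key: string]: unknown;
}

// Return the VideoObject with a description if there is one,
// otherwise the most promising fallback (or undefined if none matched).
function pickBestVideoObject(candidates: VideoObject[]): VideoObject | undefined {
  let fallback: VideoObject | undefined;
  let fallbackHasComment = false;

  for (const candidate of candidates) {
    if (candidate["@type"] !== "VideoObject") continue;

    // A non-empty description marks the main metadata block: return at once.
    if (typeof candidate.description === "string" && candidate.description.trim() !== "") {
      return candidate;
    }

    // Keep the first candidate as fallback, but let one carrying `comment`
    // replace a fallback that lacks it (assumed priority order).
    const hasComment = Array.isArray(candidate.comment);
    if (fallback === undefined || (hasComment && !fallbackHasComment)) {
      fallback = candidate;
      fallbackHasComment = hasComment;
    }
  }

  return fallback;
}
```

Given the two example blocks from the Cause section in page order, this returns the second one, because it is the only one with a description. When no candidate has a description, the caller still gets a fallback object.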

Extra

I also changed the comment:

// Fall back to og:* meta tags. YouTube updates these after SPA navigation,
// so they reliably reflect the current video.

to:

// Fall back to og:* meta tags. YouTube usually does not update these after SPA navigation.

after verifying that the original was not correct.

Tests

All tests passed.
Tested the solution by building defuddle and using it in obsidian-clipper.

fix: fetch videoData from more reliable source

8502465

Improved getVideoData() to not return the first match but the best one, also containing the description

lorenzozane changed the title ~~fix: fetch videoData from more reliable source~~ fix: building youtube contentHtml more reliably Mar 28, 2026

lorenzozane wants to merge 1 commit into kepano:main from lorenzozane:fix-youtube-content-description
