
fix: building youtube contentHtml more reliably #211

Open
lorenzozane wants to merge 1 commit into kepano:main from lorenzozane:fix-youtube-content-description

Contributor

@lorenzozane lorenzozane commented Mar 28, 2026

Improving video data extraction to build content more reliably.

The lines:

	let contentHtml = `<iframe ...></iframe>${formattedDescription}`;

	if (transcript?.html) {
		contentHtml += transcript.html;
	}

inside buildResult() imply that contentHtml is expected to also contain the video's formattedDescription. Currently, this only happens after SPA navigation. If the video is opened from the homepage or another source, the description inside the contentHtml body is empty, which results in:

![](https://www.youtube.com/watch?v={{videoId}})
{{transcript}}

instead of:

![](https://www.youtube.com/watch?v={{videoId}})
{{description}}
{{transcript}}

Cause

This happens because getVideoData() returns the first VideoObject found among the script[type="application/ld+json"] elements rather than the best one. Usually, the first VideoObject matched carries information about the first comment rendered on the page (it includes some video properties but lacks the description), for example:

{
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "@id": "https://www.youtube.com/watch?v=cz-wqROuHIM",
    "name": "#114 First Cabin Finished",
    "thumbnailUrl": "https://i.ytimg.com/vi/cz-wqROuHIM/maxresdefault.jpg",
    "uploadDate": "2024-05-26T08:08:22-07:00",
    "comment": [
        {
            "@type": "https://schema.org/Comment",
            ...,
        }
    ]
}

and only later is the VideoObject containing the main video metadata block matched, for example:

{
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "description": "Ollie is still here. {...}",
    "duration": "PT2354S",
    "embedUrl": "https://www.youtube.com/embed/TCHFvt4W-j8",
    "name": "#115 Start Building the Greenhouse",
    "thumbnailUrl": [
        "https://i.ytimg.com/vi/TCHFvt4W-j8/maxresdefault.jpg"
    ],
    "uploadDate": "2024-06-02T01:01:18-07:00",
    "@id": "https://www.youtube.com/watch?v=TCHFvt4W-j8",
    "interactionStatistic": [
        ...,
    ],
    "genre": "Travel & Events",
    "author": "Martijn Doolaard"
}

This leaves videoData.description empty, so const description = videoData.description || ''; evaluates to an empty string and the description is not added to the contentHtml body.

In defuddle.ts, the empty description is later populated from the metadata:

description: extracted.variables?.description || metadata.description,
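As a rough illustration of why the top-level description still ends up filled while the contentHtml body does not, here is a minimal sketch of the same `||` fallback. The function name `resolveDescription` and its parameters are hypothetical and are not part of defuddle.

```typescript
// Hypothetical helper mirroring the `||` fallback quoted above.
// `extracted` stands in for extracted.variables?.description,
// `metadata` stands in for metadata.description.
function resolveDescription(extracted: string | undefined, metadata: string): string {
  // `||` treats both undefined and the empty string as missing,
  // so an empty extractor description falls through to the metadata value.
  return extracted || metadata;
}
```

Because this fallback only affects the description field, the description embedded in contentHtml stays empty unless the extractor itself finds it.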

Fix

getVideoData() no longer returns the first match but the best one, i.e. the one that also contains the description.
It iterates over all matched VideoObjects, returns immediately only when a description is present, and keeps the less promising candidates as fallbackVideoObject (preferring one that contains the comment property, then any other).
The comments in the updated getVideoData() code explain the new logic in more detail.
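The selection logic described above can be sketched roughly as follows. This is not the code from the PR: `VideoObjectLike` and `pickBestVideoObject` are hypothetical names, and the sketch assumes the JSON-LD blocks have already been parsed into plain objects.

```typescript
// Hypothetical shape: only the fields used for selection are listed.
interface VideoObjectLike {
  "@type"?: string;
  description?: string;
  comment?: unknown;
  [key: string]: unknown;
}

// Hypothetical helper: choose the most complete VideoObject from parsed JSON-LD blocks.
function pickBestVideoObject(candidates: VideoObjectLike[]): VideoObjectLike | null {
  let fallback: VideoObjectLike | null = null;
  let fallbackHasComment = false;

  for (const candidate of candidates) {
    if (candidate["@type"] !== "VideoObject") continue;

    // A non-empty description marks the main metadata block: return it right away.
    if (typeof candidate.description === "string" && candidate.description.trim() !== "") {
      return candidate;
    }

    // Otherwise keep the first candidate seen as a fallback, but let a
    // comment-bearing block replace a fallback that has no comment property.
    const hasComment = candidate.comment !== undefined;
    if (fallback === null || (hasComment && !fallbackHasComment)) {
      fallback = candidate;
      fallbackHasComment = hasComment;
    }
  }

  // No candidate had a description: return the best fallback, or null if none matched.
  return fallback;
}
```

With the two example blocks shown earlier, the comment-only block would be kept as the fallback and the block carrying the description would be returned as soon as it is reached.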

Extra

I also changed the comment:

// Fall back to og:* meta tags. YouTube updates these after SPA navigation,
// so they reliably reflect the current video.

to:

// Fall back to og:* meta tags. YouTube usually do not updates these after SPA navigation.

after verifying that the original was not correct.

Tests

All tests passed.
The fix was also tested by building defuddle and using it in obsidian-clipper.

Improved getVideoData() to not return the first match but the best one, also containing the description
@lorenzozane lorenzozane changed the title fix: fetch videoData from more reliable source fix: building youtube contentHtml more reliably Mar 28, 2026