fix: building youtube contentHtml more reliably #211
Open
lorenzozane wants to merge 1 commit into kepano:main
This PR improves video data extraction so that `content` is built more reliably.
The code inside `buildResult()` implies that `contentHtml` is expected to also contain the `formattedDescription` of the video. Right now, this only happens after SPA navigation. If the video is opened from the homepage or another source, the description inside the `contentHtml` body is empty.
Cause
This happens because `getVideoData()` returns the first `VideoObject` matched with `script[type="application/ld+json"]`, not the best one. Usually, the first `VideoObject` matched describes the first comment rendered on the page: it includes some video properties but lacks the description. Only later is the `VideoObject` containing the main video metadata block matched.
This leaves `videoData.description` empty. Consequently, `const description = videoData.description || '';` evaluates to an empty string, and the description is not added to the `contentHtml` body.
In `defuddle.ts`, the empty `description` is later populated by fetching it from the metadata.
Fix
`getVideoData()` no longer returns the first match but the best one, meaning the one that also contains the `description`.
It iterates over all the matched `VideoObject`s, stores the less promising ones as `fallbackVideoObject` (prioritizing the one containing the `comment` property, then all the others), and returns immediately only if `description` is present.
The comments in the updated code of `getVideoData()` further explain the updated logic.
Extra
I also corrected a code comment, after verifying that the original was not correct.
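For reviewers who want the selection rule from the Fix section in isolation, here is a minimal sketch. It is not the code in this PR: `pickBestVideoObject`, `isVideoObject`, and the `VideoObject` interface are hypothetical names, and the function takes the raw text of each `script[type="application/ld+json"]` block as a string so it can run without a DOM. The fallback ordering (prefer an object with a `comment` property, otherwise keep the first one seen) is one reading of the description above.

```typescript
// Hypothetical shape: only the fields this sketch inspects.
interface VideoObject {
  "@type": string;
  name?: string;
  description?: string;
  comment?: unknown;
}

// Type guard: true for any object whose "@type" is "VideoObject".
function isVideoObject(value: unknown): value is VideoObject {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as Record<string, unknown>)["@type"] === "VideoObject"
  );
}

// Returns the first VideoObject with a non-empty description.
// If none has one, returns the best fallback, or null if there is no VideoObject.
function pickBestVideoObject(jsonLdBlocks: string[]): VideoObject | null {
  let fallback: VideoObject | null = null;

  for (const raw of jsonLdBlocks) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // skip malformed JSON-LD blocks
    }

    // A JSON-LD block may hold a single object or an array of objects.
    const candidates: unknown[] = Array.isArray(parsed) ? parsed : [parsed];

    for (const candidate of candidates) {
      if (!isVideoObject(candidate)) continue;

      // Best case: a description is present, so return right away.
      if (typeof candidate.description === "string" && candidate.description.trim() !== "") {
        return candidate;
      }

      // Otherwise remember it as a fallback. An object with a `comment`
      // property replaces a fallback that has none (assumed ordering).
      const fallbackHasComment = fallback !== null && fallback.comment !== undefined;
      if (fallback === null || (!fallbackHasComment && candidate.comment !== undefined)) {
        fallback = candidate;
      }
    }
  }

  return fallback;
}

// Illustrative input: a description-less object first, the main video second.
const sampleBlocks: string[] = [
  JSON.stringify({ "@type": "VideoObject", name: "Comment snippet" }),
  JSON.stringify({ "@type": "VideoObject", name: "Main video", description: "Full description" }),
];
const best = pickBestVideoObject(sampleBlocks);
```

With a first-match strategy, `sampleBlocks` would yield the description-less object. Scanning every candidate and returning early only when `description` is present selects the second one instead.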
Tests
All tests passed.
The solution was also tested by building `defuddle` and using it in obsidian-clipper.