Bugfix: DiffMatchPatch Surrogate Pairs Support#581
Bugfix: DiffMatchPatch Surrogate Pairs Support#581jleandroperez wants to merge 6 commits intodevelopfrom
Conversation
| Boolean lineBreak2 = whitespace2 && CFCharacterSetIsCharacterMember(controlSet, char2); | ||
| Boolean blankLine1 = lineBreak1 && diff_regExMatch(one, &blankLineEndRegEx); | ||
| Boolean blankLine2 = lineBreak2 && diff_regExMatch(two, &blankLineStartRegEx); | ||
|
|
There was a problem hiding this comment.
Can we remove the whitespace changes? It's making it hard to scan and see if any actual changes occurred here.
There was a problem hiding this comment.
Only two new lines there...
Boolean char1IsSurrogate = CFStringIsSurrogateLowCharacter(char1) || CFStringIsSurrogateHighCharacter(char1);
Boolean char2IsSurrogate = CFStringIsSurrogateLowCharacter(char2) || CFStringIsSurrogateHighCharacter(char1);
I've relocated all the things because it was next to impossible to understand what was going on. Lemme see if i can revert that... relatively easily!
There was a problem hiding this comment.
@dmsnell there you go! The changes themselves are very low footprint. And the beauty of it? if there are no surrogate pairs anywhere... this will run the same way it ever did.
| return MIN(i - 2, 0); | ||
| } | ||
|
|
||
| return i - 1; |
There was a problem hiding this comment.
did you see google/diff-match-patch#69 (comment) where new bugs appeared by using this approach?
|
Is the way we're modifying this here going to lend itself well to pushing a patch to the upstream repository? |
|
@dmsnell there is no upstream repository on this one. When this was built. DMP wasn't on Github, and (I'm pretty sure) it has a couple bugfixes. Okay to patch right here! |
|
Closed in favor of #582 |
Details:
In this PR we're upgrading DiffMatchPatch so that surrogate pairs are not cut in half.
This avoids invalid unicode strings to be formed.
Simplenote sibling here
Closes #580
Testing:
[TBD]