Skip to content

Hotfix bbcode regex#5

Open
alexdryden wants to merge 19 commits intomasterfrom
hotfix_bbcode_regex
Open

Hotfix bbcode regex#5
alexdryden wants to merge 19 commits intomasterfrom
hotfix_bbcode_regex

Conversation

@alexdryden
Copy link
Copy Markdown

No description provided.

This fixes a number of issues with the substitution parsing. If
further work is needed, we should investigate a library like
github.com/chriskonnertz/bbcode.

This commit includes fixes for:
-match only matching tags
-fix substitution syntax
-allow https
Comment thread packages/core/lib/archonobject.inc.php Outdated
Comment thread packages/core/lib/archonobject.inc.php Outdated
Comment thread test_bbcode.php Outdated
Comment thread test_bbcode.php Outdated
Comment thread test_bbcode.php Outdated
alexdryden and others added 2 commits March 22, 2026 13:31
Co-authored-by: graykr <graykr@users.noreply.github.com>
@alexdryden
Copy link
Copy Markdown
Author

I think ideally we would not call this bbcode_to_html because it doesn't produce html, but that probably isn't really strictly necessary.

@graykr After applying your suggestions I was getting some double mailto:mailto. Probably the easiest way to make sure this is really doing what you expect is to look at the test cases, which I made a bit easier to read. And adding/correcting any mistakes in the test cases might be the clearest way to indicate the expected behavior.

Comment thread test_bbcode.php Outdated
alexdryden and others added 2 commits March 23, 2026 11:43
These are some other ways that email links can appear

Co-authored-by: graykr <graykr@users.noreply.github.com>
@alexdryden alexdryden requested a review from graykr March 23, 2026 15:53
Copy link
Copy Markdown

@graykr graykr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I added some comments/suggestions to the test version, but they also need to be copied to the one to be actually used in Archon.)

Also, since this is actually bbcode to xml, it looks like we'd need to encode any & in the URL to &amp; -- this is sometimes, though not always causing issues with the EAD. Since it is causing issues sometimes, it might just be worth adding into this (poorly named) bbcode_to_html function?

Comment thread test_bbcode.php Outdated
Comment thread test_bbcode.php Outdated
Comment thread test_bbcode.php Outdated
Comment thread test_bbcode.php Outdated
Co-authored-by: graykr <graykr@users.noreply.github.com>
@alexdryden alexdryden force-pushed the hotfix_bbcode_regex branch from 12df735 to d014e6c Compare March 25, 2026 23:27
@alexdryden
Copy link
Copy Markdown
Author

alexdryden commented Mar 26, 2026

It is a little hard to say without a deeper dive into Archon, but it sure seems like there are things that should be escaping for XML. I just added a function to preg_replace that uses the builtin special character escape (&<>' ' "). That will replace any of those characters that appear in any of the capture groups, which I thought should be okay because it seems like it would throw an error anywhere they showed up in this kind of content. But if you need it to just target only the url, we can do that instead (or alternatively, if you want to escape it absolutely everywhere we can just escape the entire string before doing the url substitutions). Edit: this will break nested tags, so just substituting the & during the replacements now.

I can not at all guarantee that the inline comments actually represent the target thing to find or the target thing to replace, but I tried to make it a little bit easier to read while removing the two redundant email substitutes. Double-check the examples I used in the tests to make sure I'm doing what you really want me to.

escaping all xml characters will escape the brackets of nested tags
@alexdryden alexdryden force-pushed the hotfix_bbcode_regex branch from 3720ebd to 0ff5d2d Compare March 26, 2026 01:39
Copy link
Copy Markdown

@graykr graykr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Alex! I got a syntax error on the new php you added for the child class but was able to fix. Other than that, I am seeing the tests come back cleanly. Thank you so much for your work on this!

Comment thread test_bbcode.php Outdated
Comment on lines +4 to +7
$test_instance = new class() extends ArchonObject {
public function __construct() {
}
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$test_instance = new class() extends ArchonObject {
public function __construct() {
}
};
class TestArchonObject extends ArchonObject {
public function __construct() {
}
};
$test_instance = new TestArchonObject();

The earlier format is giving me a syntax error.

graykr added 3 commits April 15, 2026 15:57
…for comparison, add json encoding to mimic use in collection-list.inc.php and others, adjust output for display in browser
@graykr
Copy link
Copy Markdown

graykr commented Apr 15, 2026

@alexdryden I did some work on this in relation to how the data is sent to the migration code. Nothing was working in ArchonRecordInspector in the migration code, even though all the tests you wrote for Archon were passing.

Eventually I realized that before archon calls bbcode_to_html it encodes the string (or, array of strings) as json, and json_encode in PHP escapes all backslashes by default. This explains why the new patterns you wrote weren't working (and the old version of the code wasn't passing any of the tests you had written, even though it should have been working for some of the basic patterns). So, as it turns out, the patterns do all seem to need those 5 repeated backslashes in them!

I will push my edits to the branch shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants