
Does not parse HTML properly #18

@wosc


Our production application contains quite a few inline `<script>` tags with accumulated JavaScript inside. An excerpt looks like this:

```html
<head>
<script>
// snip
                            if ( something < other ) {
// snip
                            // explanatory comment: we replace " and ' as late as possible
// snip
</script>

<esi:remove>This directive is not executed</esi:remove>
</head>
```

When processing this kind of content, the esi crate does not execute any ESI directives, at least not those inside `<head>` in the example (directives later in `<body>` are picked up). I suspect this is because the crate uses quick_xml as its parser. quick_xml expects XML, where e.g. the `<` inside the script tag would have to be escaped as `&lt;`, but it is getting HTML, where the escaping rules are much more relaxed. Conversely, applying XML-style escapes in an HTML document results in JavaScript syntax errors, so that's not a solution. I think we really need an HTML-aware parser here.
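To make the failure mode concrete: HTML classifies `<script>` and `<style>` as raw text elements, so a tokenizer has to stop looking for tags after `<script>` until it reaches the matching `</script`. An XML parser has no such state, so the `<` in `something < other` derails it. Below is a minimal sketch, assuming we only need to locate `<esi:` start tags. The name `find_esi_tags` is hypothetical and is not part of the esi crate or quick_xml. It is deliberately simplified: it ignores the HTML spec's script-escape states, the escapable raw text elements `<textarea>`/`<title>`, and HTML comments, so it is not a substitute for a real HTML tokenizer such as html5ever.

```rust
/// HTML raw text elements: a `<` inside them is plain text, not a tag.
/// (Simplified sketch: `textarea`/`title` and script-escape states are ignored.)
const RAW_TEXT_ELEMENTS: [&str; 2] = ["script", "style"];

/// Hypothetical helper: returns the byte offsets of every `<esi:` start tag
/// that appears outside raw text elements.
fn find_esi_tags(html: &str) -> Vec<usize> {
    // ASCII lowercasing keeps byte offsets identical to `html`.
    let lower = html.to_ascii_lowercase();
    let bytes = lower.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while let Some(rel) = lower[i..].find('<') {
        let start = i + rel;
        let rest = &lower[start + 1..];
        if rest.starts_with("esi:") {
            found.push(start);
            i = start + 1;
            continue;
        }
        // Does this `<` open a raw text element such as `<script>`?
        let raw = RAW_TEXT_ELEMENTS.iter().find(|name| {
            rest.starts_with(**name)
                && matches!(
                    bytes.get(start + 1 + name.len()),
                    Some(b'>' | b' ' | b'\t' | b'\n' | b'\r' | b'/')
                )
        });
        match raw {
            Some(name) => {
                // Skip to the matching close tag; any `<` in between is text.
                let close = format!("</{}", name);
                match lower[start + 1..].find(&close) {
                    Some(end_rel) => i = start + 1 + end_rel + close.len(),
                    None => break, // unterminated raw text element: the rest is text
                }
            }
            None => i = start + 1,
        }
    }
    found
}
```

Run against the excerpt above, the `<` in `something < other` is treated as script text, and the `<esi:remove>` after `</script>` is still found. Conversely, an `<esi:` string that only appears inside a JavaScript string literal is not reported as a tag.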
