fix(xml): reject invalid XML element and attribute names#536
fix(xml): reject invalid XML element and attribute names#536lawrence3699 wants to merge 1 commit intoTomWright:masterfrom
Conversation
Previously, keys containing characters not valid in XML names (e.g. <, >, &, spaces) were passed directly to Go's encoding/xml encoder, producing invalid XML that could not be parsed back. Validate element and attribute names against the XML 1.0 (Fifth Edition) specification before writing. Return a clear error for names containing characters not permitted by the spec. Fixes TomWright#425
There was a problem hiding this comment.
Pull request overview
Adds XML 1.0 (Fifth Edition) name validation to prevent emitting malformed XML when map keys contain invalid characters, returning an explicit error instead.
Changes:
- Introduces XML Name/NameChar validation helpers and validates element names before encoding.
- Validates attribute names (after stripping the
-prefix) before encoding attributes. - Adds tests covering invalid element names and unit tests for
isValidXMLName.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| parsing/xml/writer.go | Adds XML name validation for elements/attributes and returns errors for invalid names. |
| parsing/xml/writer_test.go | Adds integration tests asserting invalid element names cause write errors. |
| parsing/xml/writer_internal_test.go | Adds unit tests for the XML name validator helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if strings.HasPrefix(kv.Key, "-") { | ||
| attrName := kv.Key[1:] | ||
| if !isValidXMLName(attrName) { | ||
| return fmt.Errorf("key %q is not a valid XML attribute name", attrName) |
There was a problem hiding this comment.
The attribute-name error message reports the stripped attribute name as a “key” (e.g. input key "-lang" would produce an error about key "lang"). That can be confusing for users debugging map keys. Consider wording the message as an attribute name error and/or include the original map key (including the leading "-") in the error details.
| return fmt.Errorf("key %q is not a valid XML attribute name", attrName) | |
| return fmt.Errorf("invalid XML attribute name %q from map key %q", attrName, kv.Key) |
| if strings.HasPrefix(kv.Key, "-") { | ||
| attrName := kv.Key[1:] | ||
| if !isValidXMLName(attrName) { | ||
| return fmt.Errorf("key %q is not a valid XML attribute name", attrName) | ||
| } |
There was a problem hiding this comment.
New behavior rejects invalid attribute names, but there isn’t a test that exercises this error path (e.g. a map containing an attribute key like "-<" or "-foo bar"). Adding an integration test for invalid attribute names would help prevent regressions and ensure the writer fails with a clear error when encountering invalid attribute keys.
| t.Run("invalid element name", func(t *testing.T) { | ||
| w, err := xml.XML.NewWriter(parsing.DefaultWriterOptions()) | ||
| if err != nil { | ||
| t.Fatalf("Unexpected error: %s", err) | ||
| } | ||
|
|
||
| toEncode := model.NewMapValue() | ||
| _ = toEncode.SetMapKey("<", model.NewStringValue("value")) | ||
| _, err = w.Write(toEncode) | ||
| if err == nil { | ||
| t.Fatal("Expected error for invalid XML element name, got nil") | ||
| } | ||
| }) |
There was a problem hiding this comment.
These tests only assert that an error occurs, but they don’t verify the error message includes the offending key and indicates it’s an element-name validation failure. Since the PR’s goal is to return a clear error for invalid XML names, asserting on key/message content would help keep the error user-friendly over time.
Summary
Problem
When converting from other formats (e.g., JSON) to XML, keys containing characters that are not valid in XML names (such as
<,>,&, spaces) are passed directly to Go'sencoding/xmlencoder. The encoder does not validate element names, so it produces invalid XML that cannot be parsed back.This output is not valid XML. Even dasel itself cannot parse it:
Solution
Add XML Name validation per the XML 1.0 (Fifth Edition) specification §2.3. The
toElementfunction now checks element names, andextractAttrsAndTextchecks attribute names, before they are used. Invalid names produce a clear error:Test plan
<,&, and space in element names all return errorsisValidXMLNamecovering valid names (foo,_bar,ns:local,über, etc.) and invalid names (<,>,&,foo bar,1abc, etc.)go test ./...passesgo vetclean