Skip to content

Fixed: Bracketed IPv6 hosts, JS IPv6 detection#913

Open
ahankinson wants to merge 8 commits intogleam-lang:mainfrom
ahankinson:fixed-881-bracketed-ipv6
Open

Fixed: Bracketed IPv6 hosts, JS IPv6 detection#913
ahankinson wants to merge 8 commits intogleam-lang:mainfrom
ahankinson:fixed-881-bracketed-ipv6

Conversation

@ahankinson
Copy link
Copy Markdown

Previously, IPv6 hosts were not parsed and returned correctly in the URI parser due to two different issues.

  1. For JS targets, there was a bug in the char detection that prevented IPv6 addresses from being correctly identified. This was corrected and now JS targets are parsed correctly. The issue for this was twofold:

a. Bracketed IPv6 addresses were being rejected out of hand because the opening bracket, [, char 91, was not included as a valid character.
b. Logic errors in the char ranges meant that the ranges were exclusive and not inclusive. For example, 48 >= char == char <= 48, so that would only match characters NOT in the range of 48 - 57, which was the intended behaviour.

  1. For Erlang targets, the Erlang behaviour was not RFC-compliant and parsed IPv6 addresses were not being produced as bracketed versions. A decision was made to diverge from Erlang and introduce RFC-compliant bracketed formatting, so once hosts were parsed, brackets were re-introduced to the result. Note that IPv6 detection is done by looking for a : in the Host value.

AI Disclosure: OpenAI Codex was used to diagnose and propose the code changes, but these were verified, reviewed, and the tests run by me to ensure they do what they say. The commit messages and PR text are also non-AI.

Fixes #881

Fixes IPv6 parsing in JS where the char ranges were not specified correctly.

`48 >= char` is the same as `char <= 48`, so the range 48..57 was invalid.
Additionally, the first char in this string is `[`, which is 91, but this is not marked as a valid char so it was automatically rejected.

Refs gleam-lang#881
Refs gleam-lang/httpc#34
Adds a set of tests for bracketed IPv6 addresses in both full and collapsed form.

Refs gleam-lang#881
Fixes an issue where IPv6 hosts were not being produced in RFC-compliant bracketed forms. It parses the URI using the Erlang FFI, but then wraps the Host return in brackets to make it RFC-compliant. It looks for a `:` in the host component to identify whether a host is an IPv6 address.

Fixes gleam-lang#881
Refs gleam-lang/httpc#34  (should also fix that)
@lpil
Copy link
Copy Markdown
Member

lpil commented Mar 10, 2026

Hello! I think this will now permit [[[[[[[[[[] as a host? I think the bug in the original code was to leave the [ on the host on line 205. Could we pop it off there instead?

@ahankinson
Copy link
Copy Markdown
Author

Ah, ok, let me test this and will update.

@lpil
Copy link
Copy Markdown
Member

lpil commented Mar 10, 2026

Thank you! Good catch on finding the bug btw 💜

@ahankinson
Copy link
Copy Markdown
Author

So I've added a few tests:

pub fn parse_ipv6_host_with_path_query_fragment_test() {
  let assert Ok(parsed) = uri.parse("http://[2001:db8::2:1]/foo/bar?baz=bif#blah")
  assert parsed.scheme == Some("http")
  assert parsed.host == Some("[2001:db8::2:1]")
  assert parsed.path == "/foo/bar"
  assert parsed.query == Some("baz=bif")
  assert parsed.fragment == Some("blah")
  assert parsed.port == None
  assert parsed.userinfo == None
}

pub fn parse_malformed_many_opening_brackets_in_host_test() {
  assert uri.parse("http://[[[[[[[[[[]/") == Error(Nil)
}

pub fn parse_malformed_nested_opening_bracket_in_host_test() {
  assert uri.parse("http://[[::1]/") == Error(Nil)
}

pub fn parse_malformed_unclosed_bracket_host_test() {
  assert uri.parse("http://[::1[/") == Error(Nil)
}

pub fn parse_malformed_question_mark_within_bracket_host_test() {
  assert uri.parse("http://[::1?bad]/") == Error(Nil)
}

pub fn parse_malformed_slash_within_bracket_host_test() {
  assert uri.parse("http://[::1/bad]/") == Error(Nil)
}

The question is what to do with malformed URIs. The current behaviour is actually to accept them!

"[" <> _ -> parse_host_within_brackets(uri_string, pieces) (L205) (as far as I can tell) doesn't pop the first bracket. So we can change this to "[" <> rest -> parse_host_within_brackets(rest, pieces).

But parse_host_within_brackets_loop (L229 & ff) doesn't reject the URI with an error if it doesn't detect an end ] before detecting other characters (e.g., /, ?, etc.). Currently, these are parsed as part of the host component, as far as I can tell?

The tests that are added are an attempt to try and determine the behaviour of the parsing. I have a patch that will also fix it and reject malformed IPv6 hosts, but I want to make sure I'm not missing something. So I will commit the tests and the patch separately so the patch can be reverted.

Adds rejection tests to ensure malformed IPv6 Hosts are rejected instead of passing.
As best I can tell, these matches should be rejected since they do not form part of a valid IPv6 bracketed host.

The opening bracket is now popped off so that detection can continue on the rest of the host. Since it is popped off, it is no longer considered a valid char in is_valid_host_within_brackets_char.
However, if a `/`, `?`, `#` are detected BEFORE a closing `]` the host is considered malformed and a parsing error is now raised. (This would also align with `is_valid_host_within_brackets_char` since these characters are not in the ranges checked.)

The comment:

"if we find a special one then we know that we're actually parsing the other format for the host and we switch to that!" would indicate that there was a mixup in the code, since for IPv6 in brackets (which is what this code is parsing) there should only be one format, AFAIK?
@ahankinson
Copy link
Copy Markdown
Author

OK, two more commits added. I think there was a deeper bug with parsing IPv6 hosts within brackets; now, chars that are not strictly alphanumeric or : and . will raise an error. I couldn't quite determine what "other format" the comment was trying to communicate, since the only format in this function should be an IPv6 host within brackets, and other characters shouldn't be valid within those brackets.

@lupodevelop
Copy link
Copy Markdown

lupodevelop commented Mar 10, 2026

@ahankinson Looking at your implementation, you could also extend the tests to include "weird" cases like:

pub fn parse_ipv6_with_invalid_char_test() {
  // 'g' is not a hex digit
  assert uri.parse("http://[::g]/") == Error(Nil)
}

pub fn parse_bracket_followed_by_text_error_test() {
  // characters immediately after closing bracket and before slash should error
  assert uri.parse("http://[::1]foo") == Error(Nil)
}

pub fn parse_double_closing_bracket_test() {
  assert uri.parse("http://[::1]]/") == Error(Nil)
}

nothing transcendental, but since you're at it, they're fine 😆


EDIT: New little ideas

pub fn ipv6_uppercase_test() {
  // ensure A–F upper case are accepted
  let assert Ok(parsed) = uri.parse("http://[2001:DB8::1]")
  assert parsed.host == Some("[2001:DB8::1]")
  assert uri.to_string(parsed) == "http://[2001:DB8::1]/"
}

pub fn ipv6_mixedcase_test() {
  let assert Ok(parsed) = uri.parse("http://[2001:dB8:ABcd::]")
  assert parsed.host == Some("[2001:dB8:ABcd::]")

@ahankinson
Copy link
Copy Markdown
Author

ahankinson commented Mar 10, 2026

Oh good call! Will do that after lunch

Fixes the detection for valid chars in brackets to only allow A-Fa-f, given that it will only ever be hex digits.

Credit to @upodevelop
@ahankinson
Copy link
Copy Markdown
Author

OK; fixed

Copy link
Copy Markdown
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nice work! I've left one tiny request inline 🙏

"]" <> _ if size == 0 -> Error(Nil)
"]" <> rest -> {
let host = codeunit_slice(original, at_index: 0, length: size + 1)
let host = "[" <> codeunit_slice(original, at_index: 0, length: size) <> "]"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slice one character before and one character after rather than slicing and then appending please 🙏

Copy link
Copy Markdown
Author

@ahankinson ahankinson Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure if I've fixed it the way you were thinking. I've extended size by 1 to capture the closing bracket on original, but the opening bracket is already consumed (L205 -- [ <> rest, and rest is what is passed in to the loop.) So I've had to re-add it back in.

If I missed something obvious, though, please let me know.

The `is_valid_host_within_brackets` already checks whether the character is valid, so these case branches were redundant with that check.
@ahankinson
Copy link
Copy Markdown
Author

I noticed that the extra character checks were now redundant with is_valid_host_within_brackets_char so I removed those case branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IPv6 hosts are not correctly represented in the URI module

3 participants