Fixed: Bracketed IPv6 hosts, JS IPv6 detection#913
Fixed: Bracketed IPv6 hosts, JS IPv6 detection#913ahankinson wants to merge 8 commits intogleam-lang:mainfrom
Conversation
Fixes IPv6 parsing in JS where the char ranges were not specified correctly. `48 >= char` is the same as `char <= 48`, so the range 48..57 was invalid. Additionally, the first char in this string is `[`, which is 91, but this is not marked as a valid char so it was automatically rejected. Refs gleam-lang#881 Refs gleam-lang/httpc#34
Adds a set of tests for bracketed IPv6 addresses in both full and collapsed form. Refs gleam-lang#881
Fixes an issue where IPv6 hosts were not being produced in RFC-compliant bracketed forms. It parses the URI using the Erlang FFI, but then wraps the Host return in brackets to make it RFC-compliant. It looks for a `:` in the host component to identify whether a host is an IPv6 address. Fixes gleam-lang#881 Refs gleam-lang/httpc#34 (should also fix that)
|
Hello! I think this will now permit |
|
Ah, ok, let me test this and will update. |
|
Thank you! Good catch on finding the bug btw 💜 |
|
So I've added a few tests: The question is what to do with malformed URIs. The current behaviour is actually to accept them!
But The tests that are added are an attempt to try and determine the behaviour of the parsing. I have a patch that will also fix it and reject malformed IPv6 hosts, but I want to make sure I'm not missing something. So I will commit the tests and the patch separately so the patch can be reverted. |
Adds rejection tests to ensure malformed IPv6 Hosts are rejected instead of passing.
As best I can tell, these matches should be rejected since they do not form part of a valid IPv6 bracketed host. The opening bracket is now popped off so that detection can continue on the rest of the host. Since it is popped off, it is no longer considered a valid char in is_valid_host_within_brackets_char. However, if a `/`, `?`, `#` are detected BEFORE a closing `]` the host is considered malformed and a parsing error is now raised. (This would also align with `is_valid_host_within_brackets_char` since these characters are not in the ranges checked.) The comment: "if we find a special one then we know that we're actually parsing the other format for the host and we switch to that!" would indicate that there was a mixup in the code, since for IPv6 in brackets (which is what this code is parsing) there should only be one format, AFAIK?
|
OK, two more commits added. I think there was a deeper bug with parsing IPv6 hosts within brackets; now, chars that are not strictly alphanumeric or |
|
@ahankinson Looking at your implementation, you could also extend the tests to include "weird" cases like: pub fn parse_ipv6_with_invalid_char_test() {
// 'g' is not a hex digit
assert uri.parse("http://[::g]/") == Error(Nil)
}
pub fn parse_bracket_followed_by_text_error_test() {
// characters immediately after closing bracket and before slash should error
assert uri.parse("http://[::1]foo") == Error(Nil)
}
pub fn parse_double_closing_bracket_test() {
assert uri.parse("http://[::1]]/") == Error(Nil)
}nothing transcendental, but since you're at it, they're fine 😆 EDIT: New little ideas pub fn ipv6_uppercase_test() {
// ensure A–F upper case are accepted
let assert Ok(parsed) = uri.parse("http://[2001:DB8::1]")
assert parsed.host == Some("[2001:DB8::1]")
assert uri.to_string(parsed) == "http://[2001:DB8::1]/"
}
pub fn ipv6_mixedcase_test() {
let assert Ok(parsed) = uri.parse("http://[2001:dB8:ABcd::]")
assert parsed.host == Some("[2001:dB8:ABcd::]") |
|
Oh good call! Will do that after lunch |
Fixes the detection for valid chars in brackets to only allow A-Fa-f, given that it will only ever be hex digits. Credit to @upodevelop
|
OK; fixed |
lpil
left a comment
There was a problem hiding this comment.
Super nice work! I've left one tiny request inline 🙏
src/gleam/uri.gleam
Outdated
| "]" <> _ if size == 0 -> Error(Nil) | ||
| "]" <> rest -> { | ||
| let host = codeunit_slice(original, at_index: 0, length: size + 1) | ||
| let host = "[" <> codeunit_slice(original, at_index: 0, length: size) <> "]" |
There was a problem hiding this comment.
Slice one character before and one character after rather than slicing and then appending please 🙏
There was a problem hiding this comment.
I'm not quite sure if I've fixed it the way you were thinking. I've extended size by 1 to capture the closing bracket on original, but the opening bracket is already consumed (L205 -- [ <> rest, and rest is what is passed in to the loop.) So I've had to re-add it back in.
If I missed something obvious, though, please let me know.
The `is_valid_host_within_brackets` already checks whether the character is valid, so these case branches were redundant with that check.
|
I noticed that the extra character checks were now redundant with |
Previously, IPv6 hosts were not parsed and returned correctly in the URI parser due to two different issues.
a. Bracketed IPv6 addresses were being rejected out of hand because the opening bracket,
[, char 91, was not included as a valid character.b. Logic errors in the char ranges meant that the ranges were exclusive and not inclusive. For example,
48 >= char==char <= 48, so that would only match characters NOT in the range of 48 - 57, which was the intended behaviour.:in the Host value.AI Disclosure: OpenAI Codex was used to diagnose and propose the code changes, but these were verified, reviewed, and the tests run by me to ensure they do what they say. The commit messages and PR text are also non-AI.
Fixes #881