ahttp_client.erl: Add chunked transfer encoding#2272
ahttp_client.erl: Add chunked transfer encoding#2272bettio wants to merge 12 commits intoatomvm:release-0.7from
Conversation
|
Amp: PR Review:
|
| Scenario | Expected |
|---|---|
Transfer-Encoding: gzip, chunked |
{error, {parser, {unsupported_transfer_encoding, _}}} |
Invalid chunk size (e.g. "ZZ\r\n") |
{error, {parser, {invalid_chunk_size, _}}} |
| Truncated chunk body (server closes mid-chunk) | {error, {parser, unexpected_eof}} |
Malformed Content-Length (e.g. "abc") |
{error, {parser, {invalid_content_length, _}}} |
Empty Transfer-Encoding value |
{error, {parser, {unsupported_transfer_encoding, _}}} |
Missing final 0\r\n\r\n (connection closes) |
{error, {parser, unexpected_eof}} |
Review generated: 2026-04-17
I fixed all the issues mentioned here. |
|
AMP: it seems confused re the "chunked" - pick and choose.. PR Review: ahttp_client chunked encoding & bug fixesCommits reviewed:
Overall: The chunked transfer encoding implementation is solid — the state machine High Severity1.
|
| # | Issue | Severity | Fix size |
|---|---|---|---|
| 1 | Content-Length: 0 hangs / false truncation |
High | ~3 lines |
| 2 | Fixed-length body overrun past CL | High | ~6 lines |
| 3 | Duplicate Content-Length silently accepted |
High | ~8 lines |
| 4 | Stacked TE rejected (gzip, chunked) |
Medium | Doc or ~15 lines |
| 5 | TE + CL coexistence undefined | Medium | ~5 lines + test |
| 6 | Obs-fold header continuation too permissive | Medium | Design decision |
| 7 | Trailer denylist comment overstates RFC scope | Medium | Comment fix |
| 8 | Sub-binary retention on embedded | Low | Monitor |
| 9 | No header line size caps | Low | ~5 lines |
| 10 | Asymmetric close behavior | Low | Doc |
| false -> | ||
| loop_collect_active(UpdatedConn, NewAcc) | ||
| end; | ||
| {ok, _Conn, closed} -> |
There was a problem hiding this comment.
We need to fix the warning here...
| is_forbidden_trailer_field(<<"Transfer-Encoding">>) -> true; | ||
| is_forbidden_trailer_field(_) -> false. | ||
|
|
||
| trim_left_spaces(Bin, Count) -> |
There was a problem hiding this comment.
So I see that our string:trim/2 is incomplete (spec is wrong).
This implementation is slower and bigger on both BEAM and AtomVM than a naive recursive implementation, at least when compiled with OTP 28:
trim_recursive_left(<<C, Tail/binary>>) when C =:= $\t orelse C =:= $\s ->
trim_recursive_left(Tail);
trim_recursive_left(Other) ->
Other.
And we may want to factor this into string:trim/3
There was a problem hiding this comment.
string:trim/3 cannot be used with http, e.g. it trims out characters that should not be trimmed according http spec.
I have a plan for future fixes outside this PR.
There was a problem hiding this comment.
Do you mean string:trim/2 ? I think string:trim/3 can be passed the characters we want to trim.
There was a problem hiding this comment.
This still means assuming that the binary we are going to trim is utf8, that is not a possible assumption with http.
All the issues in the second AMP round are fixed. |
petermm
left a comment
There was a problem hiding this comment.
Some suggestions, might be worth doing at least #1?
Final PR Review: ahttp_client chunked encoding & hardening
Status: ✅ APPROVED
Commits reviewed (12):
| # | Commit | Description |
|---|---|---|
| 1 | fbbd02141 |
Add chunked transfer encoding |
| 2 | 226b09bf2 |
Fix crash on bad Content-Length |
| 3 | cd2f9b695 |
Fix empty header value crash |
| 4 | 5f23c65e3 |
Fix silent truncation on close |
| 5 | 077ef1b62 |
Fix Content-Length conflict |
| 6 | 050f1a69a |
Fix hang on empty response body |
| 7 | 31c5bdff3 |
Fix Content-Length overrun |
| 8 | 1068dfc0f |
Document data binary retention |
| 9 | 8427d31ea |
Remove obs-fold support |
| 10 | aad449849 |
Add 16 KiB parsed line cap |
| 11 | 2c542cd79 |
Refactor OWS trim helpers |
| 12 | 24a1a4c35 |
Document close return shapes |
Previous Review Issues — All Resolved
| # | Issue | Severity | Resolution |
|---|---|---|---|
| 1 | Content-Length: 0 hangs / false truncation |
High | ✅ Fixed in 050f1a69a — emits done at end-of-headers |
| 2 | Fixed-length body overrun past CL | High | ✅ Fixed in 31c5bdff3 — splits at N bytes, discards excess |
| 3 | Duplicate Content-Length silently accepted |
High | ✅ Fixed in 077ef1b62 — same value accepted, conflict rejected |
| 4 | Stacked TE rejected (gzip, chunked) |
Medium | ✅ Documented as intentional in te_is_chunked/1 comment |
| 5 | TE + CL coexistence undefined | Medium | ✅ Strict reject at end-of-headers (content_length_with_transfer_encoding) |
| 6 | Obs-fold header continuation too permissive | Medium | ✅ Removed in 8427d31ea — returns deprecated_obs_fold error |
| 7 | Trailer denylist comment overstates RFC scope | Medium | ✅ Comment updated to "subset of RFC 9110 §6.5.1" |
| 8 | Sub-binary retention on embedded | Low | ✅ Documented in stream/2 edoc |
| 9 | No header line size caps | Low | ✅ 16 KiB cap in aad449849 with line_too_long error |
| 10 | Asymmetric close behavior | Low | ✅ Documented in both stream/2 and recv/2 edoc |
Verification of Fixes
CL:0 completion (commit 050f1a69a)
The end-of-headers clause ordering is correct. When remaining_body_bytes = 0, the parser
transitions straight to done instead of entering body state:
%% Lines 431-440: clause ordering matters
parse_line(... body_encoding = chunked, remaining_body_bytes = N ..., <<>>) when is_integer(N) ->
{error, ...}; % TE+CL conflict — checked first
parse_line(... body_encoding = chunked ..., <<>>) ->
{ok, ... chunked_size ...}; % pure chunked
parse_line(... remaining_body_bytes = 0 ..., <<>>) ->
{ok, ... done ..., done}; % CL:0 — emits done immediately
parse_line(... headers ..., <<>>) ->
{consume_bytes, ... body ...}. % normal CL bodyTested by test_content_length_zero/0 using a 204 No Content response.
CL overrun (commit 31c5bdff3)
consume_bytes/2 now has a clause that fires when byte_size(Chunk) >= N:
%% Lines 331-336
consume_bytes(
#parser_state{acc = Chunk, remaining_body_bytes = N} = Parser, ParsedAcc
) when is_binary(Chunk) andalso is_integer(N) andalso N > 0 andalso byte_size(Chunk) >= N ->
<<Data:N/binary, _Rest/binary>> = Chunk,
UpdatedParser = Parser#parser_state{state = done, acc = undefined, remaining_body_bytes = 0},
{ok, UpdatedParser, [done, {data, Data} | ParsedAcc]};This mirrors the chunked path's consume_chunk_data/2 split logic. Excess bytes are
discarded, which is safe since this client doesn't support pipelining.
Tested by test_content_length_overrun/0.
Duplicate CL (commit 077ef1b62)
apply_header_semantics/3 compares parsed integers, so "5" and "05" are treated
as the same value (safer than raw-binary comparison):
%% Lines 503-518
case Parser#parser_state.remaining_body_bytes of
undefined -> {ok, Parser#parser_state{remaining_body_bytes = N}};
N -> {ok, Parser}; % same value, accept per RFC 9112
_Other -> {error, {conflicting_content_length, Value}}
end;Tested by test_duplicate_content_length_same_value/0 and test_conflicting_content_length/0.
TE+CL coexistence (commit 050f1a69a)
Detection is deferred to end-of-headers, making header order irrelevant:
%% Lines 431-435
parse_line(
#parser_state{state = headers, body_encoding = chunked, remaining_body_bytes = N} = Parser,
<<>>
) when is_integer(N) ->
{error, Parser, {content_length_with_transfer_encoding, N}};Tested by test_content_length_and_transfer_encoding/0.
Non-Blocking Suggestions
1. trim_ows_right/1 — double reverse is correct but allocates more than necessary
The current implementation reverses the entire binary twice:
trim_ows_right(Bin) ->
binary_reverse(trim_ows_left(binary_reverse(Bin))).This is correct and bounded by the 16 KiB line cap, so it's not a blocker. For a
future micro-optimization on AtomVM's constrained runtime, a right-edge scan avoids
the copies:
-%% Trim HTTP OWS from the right.
-trim_ows_right(Bin) ->
- binary_reverse(trim_ows_left(binary_reverse(Bin))).
-
-binary_reverse(Bin) ->
- binary_reverse(Bin, <<>>).
-binary_reverse(<<C, Rest/binary>>, Acc) ->
- binary_reverse(Rest, <<C, Acc/binary>>);
-binary_reverse(<<>>, Acc) ->
- Acc.
+%% Trim HTTP OWS from the right.
+trim_ows_right(Bin) ->
+ trim_ows_right_scan(Bin, byte_size(Bin)).
+
+trim_ows_right_scan(_Bin, 0) ->
+ <<>>;
+trim_ows_right_scan(Bin, Len) ->
+ Pos = Len - 1,
+ case binary:at(Bin, Pos) of
+ C when C =:= $\s orelse C =:= $\t ->
+ trim_ows_right_scan(Bin, Pos);
+ _ ->
+ binary:part(Bin, 0, Len)
+ end.2. Missing test: obs-fold rejection in trailers
test_obs_fold_rejected/0 only tests folded lines in headers. The trailer path has
the same rejection logic (line 479-482) but no dedicated test:
%% Suggested test
test_obs_fold_rejected_trailer() ->
Segments = [
+ <<
+ "HTTP/1.1 200 OK\r\n"
+ "Transfer-Encoding: chunked\r\n\r\n"
+ "5\r\nHello\r\n"
+ "0\r\n"
+ "X-Trailer: value\r\n"
+ "\t continuation\r\n"
+ "\r\n"
+ >>
],
{ServerPid, Port} = start_chunked_server(Segments),
{ok, Conn} = ahttp_client:connect(http, "localhost", Port, [
{active, false}, {parse_headers, [<<"X-Trailer">>]}
]),
{ok, Conn2, _Ref} = ahttp_client:request(Conn, <<"GET">>, <<"/">>, [], undefined),
{error, {parser, {deprecated_obs_fold, <<"\t continuation">>}}} =
drain_until_error(Conn2),
ahttp_client:close(Conn2),
wait_server(ServerPid),
ok.This also covers the HTAB variant (the existing test only uses space).
3. Missing test: line cap boundary
The existing test_header_line_too_long/0 uses a 20,000 byte line. A boundary test at
exactly the limit would be more precise:
%% Suggested test
test_header_line_at_boundary() ->
%% 16384 bytes is exactly the limit — should be accepted.
ExactLine = binary:copy(<<$A>>, 16384 - byte_size(<<"X-Big: ">>)),
AcceptSegment = <<"HTTP/1.1 200 OK\r\nX-Big: ",
ExactLine/binary, "\r\nContent-Length: 0\r\n\r\n">>,
{ServerPid, Port} = start_chunked_server([AcceptSegment]),
{ok, Conn} = ahttp_client:connect(http, "localhost", Port, [{active, false}]),
{ok, Conn2, _Ref} = ahttp_client:request(Conn, <<"GET">>, <<"/">>, [], undefined),
Acc = loop_collect_passive(Conn2, #{}),
#{status := 200, done := true} = Acc,
wait_server(ServerPid),
%% 16385 bytes exceeds the limit — should be rejected.
OverLine = binary:copy(<<$A>>, 16385 - byte_size(<<"X-Big: ">>)),
RejectSegment = <<"HTTP/1.1 200 OK\r\nX-Big: ",
OverLine/binary, "\r\nContent-Length: 0\r\n\r\n">>,
{ServerPid2, Port2} = start_chunked_server([RejectSegment]),
{ok, Conn3} = ahttp_client:connect(http, "localhost", Port2, [{active, false}]),
{ok, Conn4, _Ref2} = ahttp_client:request(Conn3, <<"GET">>, <<"/">>, [], undefined),
{error, {parser, {line_too_long, _Prefix}}} = drain_until_error(Conn4),
ahttp_client:close(Conn4),
wait_server(ServerPid2),
ok.4. Missing test: TE before CL header order
test_content_length_and_transfer_encoding/0 sends Content-Length first, then
Transfer-Encoding. The reverse order should also be rejected:
%% Suggested test
test_transfer_encoding_before_content_length() ->
Segments = [
<<
"HTTP/1.1 200 OK\r\n"
"Transfer-Encoding: chunked\r\n"
"Content-Length: 5\r\n\r\n"
"5\r\nHello\r\n0\r\n\r\n"
>>
],
{ServerPid, Port} = start_chunked_server(Segments),
{ok, Conn} = ahttp_client:connect(http, "localhost", Port, [{active, false}]),
{ok, Conn2, _Ref} = ahttp_client:request(Conn, <<"GET">>, <<"/">>, [], undefined),
{error, {parser, {content_length_with_transfer_encoding, _}}} =
ahttp_client:recv(Conn2, 0),
ahttp_client:close(Conn2),
wait_server(ServerPid),
ok.Test Coverage Summary
22 new test cases added, covering:
| Category | Tests | Status |
|---|---|---|
| Chunked basic (passive/active) | 2 | ✅ |
| Chunked split across segments | 1 | ✅ |
| Chunk extensions | 1 | ✅ |
| Trailers (with framing filtering) | 2 | ✅ |
| Bad Transfer-Encoding (3 variants) | 3 | ✅ |
| TE + CL coexistence | 1 | ✅ |
| Obs-fold rejection (headers) | 1 | ✅ |
| Header line too long | 1 | ✅ |
| Bad Content-Length (non-numeric, negative) | 2 | ✅ |
| Duplicate CL (same, conflicting) | 2 | ✅ |
| Content-Length: 0 | 1 | ✅ |
| Content-Length overrun | 1 | ✅ |
| Empty header value | 1 | ✅ |
| Chunked truncated | 1 | ✅ |
| Active-mode close | 1 | ✅ |
| Passive-mode close | 1 | ✅ |
Optional additions (non-blocking):
- ❌ Obs-fold rejection in trailers / with HTAB
- ❌ Line cap boundary (exactly 16384 vs 16385)
- ❌ TE-before-CL header order
Final Assessment
The PR is well-structured across 12 focused commits with clear separation between the
feature (chunked encoding), targeted bug fixes, hardening (line cap, obs-fold removal),
and documentation. Each commit has a descriptive message with rationale.
The chunked state machine is correct. All three high-severity pre-existing bugs in the
fixed-length path have been fixed. Error reporting is consistent under the {error, {parser, Reason}} namespace. The CHANGELOG entries are accurate and well-organized under
Added/Changed/Removed/Fixed sections.
No blocking issues found. Ready to merge.
Parse `Transfer-Encoding: chunked` response bodies so callers can
consume responses from servers that stream without a `Content-Length`
header. The parser now emits `trailer_header` and
`trailer_header_continuation` responses, and surfaces malformed or
unsupported encodings as `{error, {parser, Reason}}`.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Non-numeric or negative Content-Length values previously crashed the parser process with badarg from binary_to_integer/1. Surface them as a parser error so callers can recover; per RFC 9112 §6.3 the field value is 1*DIGIT, anything else is malformed framing. Signed-off-by: Davide Bettio <davide@uninstall.it>
A header whose value is empty or all-whitespace (e.g. "X-Foo: \r\n") reached trim_right_spaces/2 with Count = 0 and raised badarg on a negative-length binary pattern, crashing the parser process. Signed-off-by: Davide Bettio <davide@uninstall.it>
Premature socket close used to be indistinguishable from a normal
end of response: stream/2 returned ok-closed, recv/2 returned the
backend-closed tuple, and ssl_closed had no handler at all.
RFC 9112 §6.3 requires incomplete messages to be treated as an
error; return {error, {parser, incomplete_response}} when the
parser is still inside headers, a Content-Length body with bytes
remaining, or any chunked state. Close after a complete response
and HTTP/1.0 close-delimited bodies stay on the normal path.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Conflicting Content-Length values were silently accepted with the
last one winning, a classic request-smuggling primitive. Per
RFC 9112 §6.3 such a message is unrecoverable; accept duplicates
only when values match, otherwise return
{error, {parser, {conflicting_content_length, V}}}.
Signed-off-by: Davide Bettio <davide@uninstall.it>
A Content-Length: 0 response left the parser in body state forever: the transition to done only fired from consume_bytes after processing bytes, which never ran for an empty body. Passive callers hung and truncation detection misflagged a legitimate close as incomplete_response. Transition to done at end-of-headers when remaining_body_bytes is 0, matching the message-completion rule in RFC 9112 §6.3. Signed-off-by: Davide Bettio <davide@uninstall.it>
consume_bytes/2 emitted the entire buffered chunk without bounding it by remaining_body_bytes, handing bytes past Content-Length to the caller as body and stalling the parser. Per RFC 9112 §6.3, bytes beyond Content-Length are not part of the current message: cap emission at the promised count, discard the rest (this client does not pipeline), and transition to done. Signed-off-by: Davide Bettio <davide@uninstall.it>
{data, Ref, Binary} responses carry sub-binaries of the socket
receive buffer, not copies. Piping chunks straight into the
next parser is the common case and fine; callers that retain a
chunk beyond the current processing step should binary:copy/1
first so the underlying receive buffer can be released.
Signed-off-by: Davide Bettio <davide@uninstall.it>
RFC 9112 §5.2 deprecated line folding and different recipients
merge continuations differently (with or without the required
SP octet), making obs-fold a known header-smuggling primitive.
Reject folded header and trailer lines with
{error, {parser, {deprecated_obs_fold, Line}}} and drop the
header_continuation and trailer_header_continuation response
events, no in-tree caller relied on them, and the API grammar
is materially smaller without them.
Signed-off-by: Davide Bettio <davide@uninstall.it>
consume_lines/2 appended incoming data to acc without bound, so
a hostile peer could pressure memory by streaming a long status,
header or chunk-size line that never closed with CRLF. Cap each
parsed line at 16 KiB; a longer line returns
{error, {parser, {line_too_long, Prefix}}}. Not RFC-mandated;
matches the defaults used by nginx and Apache.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Replace the Count-based trim_left_spaces/2 and trim_right_spaces/2 with the idiomatic recursive byte-wise pattern. Rename to match the HTTP spec term (RFC 9110 §5.6.3: OWS = *( SP / HTAB )). Same observable behavior; smaller and faster on BEAM through match- context reuse. Stays byte-wise rather than calling string:trim/3 because HTTP field values are not UTF-8 (RFC 9110 §5.5 admits obs-text = %x80-FF) and OTP string functions require valid unicode:chardata. Signed-off-by: Davide Bettio <davide@uninstall.it>
The active-mode stream/2 returns {ok, Conn, closed} on a normal
peer close; the passive-mode recv/2 returns
{error, {SocketType, closed}} in the same situation. This
asymmetry is intentional: active mode surfaces end-of-stream as
a distinct ok marker so a receive loop has something to match
against, and passive mode inherits the native gen_tcp:recv/2 /
ssl:recv/2 contract so callers can keep their case expression a
single match on {ok, _, _}. Both paths return
{error, {parser, incomplete_response}} when the close is
premature.
Document the shapes on both functions so the asymmetry is not
mistaken for a bug, and pin the behaviour with test_active_close
and test_passive_close.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Parse
Transfer-Encoding: chunkedresponse bodies so callers canconsume servers that stream without a
Content-Lengthheader. Theseries also lands the review-driven robustness, framing-safety and
documentation fixes to
ahttp_clientthat the chunked featureexposed.
Highlights:
Content-Lengthand empty headervalues.
{error, {parser, unexpected_eof}};ssl_closedis handled.Content-Length,CL + Transfer-Encoding,stacked
Transfer-Encoding, and obs-fold (RFC 9110 §6.5.2,RFC 9112 §5.2, §6.1, §6.3).
{data, _, _}binary retention and the asymmetric closeshapes of
stream/2vsrecv/2.These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later