
ahttp_client.erl: Add chunked transfer encoding #2272

Open
bettio wants to merge 12 commits into atomvm:release-0.7 from bettio:http-chunked

Conversation

@bettio
Collaborator

@bettio bettio commented Apr 16, 2026

Parse Transfer-Encoding: chunked response bodies so callers can
consume responses from servers that stream without a Content-Length header. The
series also lands the review-driven robustness, framing-safety and
documentation fixes to ahttp_client that the chunked feature
exposed.

Highlights:

  • Chunked responses with HTTP trailers (RFC 9112 §7.1).
  • Parser crash fixes on malformed Content-Length and empty header
    values.
  • Premature close now returns {error, {parser, unexpected_eof}};
    ssl_closed is handled.
  • Reject conflicting Content-Length, CL + Transfer-Encoding,
    stacked Transfer-Encoding, and obs-fold (RFC 9110 §6.5.2,
    RFC 9112 §5.2, §6.1, §6.3).
  • Cap parsed status, header and chunk-size lines at 16 KiB.
  • Document {data, _, _} binary retention and the asymmetric close
    shapes of stream/2 vs recv/2.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@petermm
Contributor

petermm commented Apr 17, 2026

Amp:

PR Review: ahttp_client.erl – Chunked Transfer Encoding

Commit: 3b09f76 "ahttp_client.erl: Add chunked transfer encoding"
Files changed: ahttp_client.erl, test_ahttp_client.erl, CHANGELOG.md
Reviewer: Amp (oracle-assisted)


Summary

Good, well-structured addition. The chunked state machine (chunked_size → chunked_body → chunked_crlf → chunked_size, and 0 → chunked_trailers → done) is correctly modelled. Tests cover happy-path passive, active, split-across-segments, extensions, trailers, and bad TE header. Solid foundation.
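The transitions summarized above can be sketched as a small decision function; the module, state, and event names below are illustrative, taken from the review's description rather than the actual ahttp_client code:

```erlang
-module(chunked_fsm_sketch).
-export([next/2]).

%% next(State, Event) -> NextState
next(chunked_size, {size, 0}) -> chunked_trailers;        % last-chunk line: 0\r\n
next(chunked_size, {size, N}) when N > 0 -> chunked_body; % chunk-size line parsed
next(chunked_body, body_consumed) -> chunked_crlf;        % chunk data fully read
next(chunked_crlf, crlf) -> chunked_size;                 % back for the next chunk
next(chunked_trailers, end_of_trailers) -> done.
```

In other words, each non-empty chunk loops through size → body → crlf, and a zero-size chunk diverts to trailer parsing before the terminal done state.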

However, there are several issues to address before merging.


🔴 Must-Fix Issues

1. te_is_chunked/1 is too permissive — accepts gzip, chunked

File: ahttp_client.erl:469-477

The function checks only whether the last comma-separated token is "chunked". Per RFC 7230, Transfer-Encoding: gzip, chunked means the body is first gzip-encoded, then chunked. This client only strips chunked framing — it does not decompress gzip. Accepting this silently delivers garbled body data to the caller.

Current code:

te_is_chunked(Value) ->
    Tokens = binary:split(Value, <<",">>, [global]),
    Last0 = lists:last(Tokens),
    LTrim = trim_left_spaces(Last0, 0),
    Last = trim_right_spaces(LTrim, byte_size(LTrim)),
    case byte_size(Last) =:= byte_size(<<"chunked">>) of
        true -> icmp(Last, <<"chunked">>);
        false -> false
    end.

Suggested fix — only accept exactly chunked (with optional whitespace):

te_is_chunked(Value) ->
    LTrim = trim_left_spaces(Value, 0),
    Trimmed = trim_right_spaces(LTrim, byte_size(LTrim)),
    case byte_size(Trimmed) =:= byte_size(<<"chunked">>) of
        true -> icmp(Trimmed, <<"chunked">>);
        false -> false
    end.

Add a test:

test_bad_transfer_encoding_stacked() ->
    Segments = [
        <<"HTTP/1.1 200 OK\r\n"
          "Transfer-Encoding: gzip, chunked\r\n\r\n"
          "whatever">>
    ],
    {ServerPid, Port} = start_chunked_server(Segments),
    {ok, Conn} = ahttp_client:connect(http, "localhost", Port, [{active, false}]),
    {ok, Conn2, _Ref} = ahttp_client:request(Conn, <<"GET">>, <<"/">>, [], undefined),
    {error, {parser, {unsupported_transfer_encoding, <<"gzip, chunked">>}}} =
        ahttp_client:recv(Conn2, 0),
    ahttp_client:close(Conn2),
    wait_server(ServerPid),
    ok.

2. Content-Length parsing can crash on malformed values

File: ahttp_client.erl:459-460

binary_to_integer/1 will throw badarg on non-numeric input (e.g. Content-Length: abc), crashing the caller process instead of returning a parser error.

Current code:

apply_header_semantics(<<"Content-Length">>, Value, Parser) ->
    {ok, Parser#parser_state{remaining_body_bytes = binary_to_integer(Value)}};

Suggested fix:

apply_header_semantics(<<"Content-Length">>, Value, Parser) ->
    try binary_to_integer(Value) of
        N when N >= 0 ->
            {ok, Parser#parser_state{remaining_body_bytes = N}};
        _ ->
            {error, {invalid_content_length, Value}}
    catch
        error:badarg -> {error, {invalid_content_length, Value}}
    end;

3. Premature socket close during chunked parsing returns {ok, Conn, closed} instead of error

File: ahttp_client.erl:265-266 (stream/2) and 577-580 (recv/2)

If the peer closes the connection mid-chunk, the caller receives {ok, Conn, closed} (active) or {error, {gen_tcp, closed}} (passive). For chunked responses, this should be a parser error — the response is incomplete.

Current code (active mode):

stream(#http_client{socket = {gen_tcp, TSocket}} = Conn, {tcp_closed, TSocket}) ->
    {ok, Conn, closed};

Suggested fix:

stream(#http_client{socket = {gen_tcp, TSocket}, parser = Parser} = Conn, {tcp_closed, TSocket}) ->
    case Parser#parser_state.state of
        done -> {ok, Conn, closed};
        undefined -> {ok, Conn, closed};
        _ -> {error, {parser, unexpected_eof}}
    end;

Similarly for the ssl ssl_closed case (if it exists or is added), and in recv/2 when socket_recv returns {error, closed}:

recv(#http_client{socket = {SocketType, _}, parser = Parser} = Conn, Len) ->
    case socket_recv(Conn, Len) of
        {ok, Data} -> stream_data(Conn, Data);
        {error, closed} ->
            case Parser#parser_state.state of
                done -> {ok, Conn, []};
                undefined -> {error, {SocketType, closed}};
                _ -> {error, {parser, unexpected_eof}}
            end;
        {error, Reason} -> {error, {SocketType, Reason}}
    end.

4. trim_right_spaces/2 crashes on empty binary

File: ahttp_client.erl:487-494

trim_right_spaces(<<>>, 0) will fail because Len = Count - 1 becomes -1, and the binary match <<_LBin:-1/binary, ...>> is invalid. This can be hit when a header has an empty or all-whitespace value.

Current code:

trim_right_spaces(Bin, Count) ->
    Len = Count - 1,
    case Bin of
        <<_LBin:Len/binary, C, _TrimmedSpaces/binary>> when C == $\s orelse C == $\t ->
            trim_right_spaces(Bin, Count - 1);
        <<LBin:Count/binary, _TrimmedSpaces/binary>> ->
            LBin
    end.

Suggested fix — add a base case:

trim_right_spaces(_Bin, 0) ->
    <<>>;
trim_right_spaces(Bin, Count) ->
    Len = Count - 1,
    case Bin of
        <<_LBin:Len/binary, C, _TrimmedSpaces/binary>> when C == $\s orelse C == $\t ->
            trim_right_spaces(Bin, Count - 1);
        <<LBin:Count/binary, _TrimmedSpaces/binary>> ->
            LBin
    end.

🟡 Minor / Improvement Suggestions

5. Trailers can expose forbidden framing headers

RFC 7230 §4.1.2 forbids Transfer-Encoding, Content-Length, and a few others in trailers. Because ?DEFAULT_WANTED_HEADERS includes both of these, the parser will currently surface them as trailer_header events if a server sends them. Consider filtering them out during trailer parsing, or at least documenting the limitation.
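A minimal sketch of such trailer filtering, using hypothetical helper names (filter_trailers/1, is_forbidden_trailer/1) rather than the module's actual ones, and matching only the canonical header spellings for brevity:

```erlang
-module(trailer_filter_sketch).
-export([filter_trailers/1]).

%% Drop framing-affecting fields from a parsed trailer list before
%% surfacing trailer_header events to the caller.
filter_trailers(Trailers) ->
    [KV || {Name, _Value} = KV <- Trailers, not is_forbidden_trailer(Name)].

is_forbidden_trailer(<<"Content-Length">>) -> true;
is_forbidden_trailer(<<"Transfer-Encoding">>) -> true;
is_forbidden_trailer(<<"Trailer">>) -> true;
is_forbidden_trailer(_) -> false.
```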

6. Missing ssl_closed handler in stream/2

There is a handler for {tcp_closed, TSocket} but no matching clause for {ssl_closed, SSLSocket}. If an SSL connection closes, it will fall through to unknown instead of reporting the close.
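One way to keep tcp_closed and ssl_closed consistent is to route both through a shared state check; a sketch under assumed state names, where done and undefined mean no response is mid-flight:

```erlang
-module(close_policy_sketch).
-export([on_close/1]).

%% An {ssl_closed, Socket} clause in stream/2 could delegate here,
%% exactly as the suggested {tcp_closed, _} clause does.
on_close(done) -> {ok, closed};       % response fully parsed: clean close
on_close(undefined) -> {ok, closed};  % no response in flight: clean close
on_close(_MidParse) -> {error, {parser, unexpected_eof}}.
```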

7. Test helper lists:append/2 in accumulate/2 for trailers

accumulate([{trailer_header, _, KV} | T], Acc) ->
    Ts = maps:get(trailers, Acc, []),
    accumulate(T, Acc#{trailers => lists:append(Ts, [KV])});

This is O(n²) for many trailers. Not a real issue in practice, but [KV | Ts] with a final lists:reverse/1 would be more idiomatic. (Test-only, very low priority.)
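A sketch of the prepend-and-reverse variant (the module name and the bare [] clause are illustrative; the real test helper handles other event tuples too):

```erlang
-module(trailer_acc_sketch).
-export([accumulate/2]).

%% Prepend each trailer, reverse once at the end: O(n) instead of O(n^2).
accumulate([{trailer_header, _Ref, KV} | T], Acc) ->
    Ts = maps:get(trailers, Acc, []),
    accumulate(T, Acc#{trailers => [KV | Ts]});
accumulate([], Acc) ->
    %% restore arrival order with a single reverse
    Acc#{trailers => lists:reverse(maps:get(trailers, Acc, []))}.
```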


✅ What Looks Good

  • State machine design is clean and correct for the happy path
  • consume_chunk_data/2 correctly handles bodies split across TCP segments (partial chunk, exact chunk, chunk-plus-overflow)
  • Chunk extension parsing (binary:split(Line, <<";">>)) correctly strips extensions per RFC
  • dispatch_feed/2 refactor properly propagates parser errors to callers
  • Test coverage of split-across-segments, extensions, trailers, and bad TE is excellent for a first pass
  • CHANGELOG entry is clear and accurate

Test Coverage Gaps

Missing test cases that should be added:

Scenario | Expected
Transfer-Encoding: gzip, chunked | {error, {parser, {unsupported_transfer_encoding, _}}}
Invalid chunk size (e.g. "ZZ\r\n") | {error, {parser, {invalid_chunk_size, _}}}
Truncated chunk body (server closes mid-chunk) | {error, {parser, unexpected_eof}}
Malformed Content-Length (e.g. "abc") | {error, {parser, {invalid_content_length, _}}}
Empty Transfer-Encoding value | {error, {parser, {unsupported_transfer_encoding, _}}}
Missing final 0\r\n\r\n (connection closes) | {error, {parser, unexpected_eof}}

Review generated: 2026-04-17

@bettio
Collaborator Author

bettio commented Apr 20, 2026

Amp:

PR Review: ahttp_client.erl – Chunked Transfer Encoding

I fixed all the issues mentioned here.

@petermm
Contributor

petermm commented Apr 20, 2026

AMP: it seems confused re the "chunked" - pick and choose..

PR Review: ahttp_client chunked encoding & bug fixes

Commits reviewed:

  • ac3e9e58a — Add chunked transfer encoding
  • 75a23d8ea — Fix crash on bad Content-Length
  • 2efe67d8c — Fix empty header value crash
  • 5b1c120a9 — Fix silent truncation on close

Overall: The chunked transfer encoding implementation is solid — the state machine
(chunked_size → chunked_body → chunked_crlf → chunked_trailers → done) is correct,
segment-split handling works, and trailer separation from headers is RFC-sound. The three
bug-fix commits are good targeted patches. However, the new truncation detection exposes
some pre-existing issues in the fixed-length body path.


High Severity

1. Content-Length: 0 responses never complete

When a response has Content-Length: 0, the parser transitions to body state but with
no bytes buffered it never emits done. It stays in body with remaining_body_bytes = 0.

Effects:

  • Passive mode hangs waiting for data that will never come
  • If the server closes, is_parser_truncated/1 incorrectly returns true, reporting
    {error, {parser, unexpected_eof}} for a valid complete response

Suggested fix — handle zero remaining bytes at end-of-headers:

+parse_line(#parser_state{state = headers, body_encoding = chunked} = Parser, <<>>) ->
+    {ok, Parser#parser_state{state = chunked_size, last_header = undefined}};
+parse_line(#parser_state{state = headers, remaining_body_bytes = 0} = Parser, <<>>) ->
+    {ok, Parser#parser_state{state = done, last_header = undefined}, done};
 parse_line(#parser_state{state = headers} = Parser, <<>>) ->
     {consume_bytes, Parser#parser_state{state = body}};

Note: the chunked clause must remain first since chunked encoding takes precedence.


2. Fixed-length bodies can overrun Content-Length

consume_bytes/2 emits the entire buffered chunk as body data without capping at the
remaining byte count. If the socket delivers more than Content-Length bytes, extra data
is emitted as body and remaining_body_bytes can go negative.

The chunked path already handles this correctly via <<Data:N/binary, Rest/binary>> in
consume_chunk_data/2.

Suggested fix — mirror the chunked split logic in consume_bytes/2:

 consume_bytes(#parser_state{acc = undefined} = Parser, ParsedAcc) ->
     {ok, Parser, ParsedAcc};
+consume_bytes(#parser_state{acc = Chunk, remaining_body_bytes = N} = Parser, ParsedAcc)
+  when is_binary(Chunk) andalso is_integer(N) andalso byte_size(Chunk) >= N andalso N > 0 ->
+    <<Data:N/binary, _Rest/binary>> = Chunk,
+    UpdatedParser = Parser#parser_state{state = done, acc = undefined, remaining_body_bytes = 0},
+    {ok, UpdatedParser, [{data, Data}, done | ParsedAcc]};
 consume_bytes(#parser_state{acc = Chunk} = Parser, ParsedAcc) when is_binary(Chunk) ->
     ...

Since this client does not support pipelining, discarding Rest and marking done is the
safest behavior.


3. Duplicate/conflicting Content-Length is silently accepted

If multiple Content-Length headers appear, the last one silently wins. Per RFC 9112 §6.3,
a message with conflicting Content-Length values is unrecoverable.

Suggested fix — reject conflicting values in apply_header_semantics/3:

 apply_header_semantics(<<"Content-Length">>, Value, Parser) ->
     try binary_to_integer(Value) of
         N when N >= 0 ->
-            {ok, Parser#parser_state{remaining_body_bytes = N}};
+            case Parser#parser_state.remaining_body_bytes of
+                undefined ->
+                    {ok, Parser#parser_state{remaining_body_bytes = N}};
+                N ->
+                    {ok, Parser};  % same value, accept per RFC 9112
+                _Other ->
+                    {error, {conflicting_content_length, Value}}
+            end;
         _ ->
             {error, {invalid_content_length, Value}}
     catch
         error:badarg -> {error, {invalid_content_length, Value}}
     end;

Medium Severity

4. te_is_chunked/1 rejects valid stacked transfer-codings

te_is_chunked/1 only accepts an exact case-insensitive match for "chunked". Per
RFC 9112 §6.1, Transfer-Encoding: gzip, chunked is valid when chunked is the final
coding.

This is acceptable for an embedded client if documented. The test
test_bad_transfer_encoding_stacked currently asserts this is an error, which is
consistent with the implementation but not with RFC.

Recommendation: Either:

  • Simple path: Keep current behavior but add a comment/doc note:
    %% Only bare "Transfer-Encoding: chunked" is supported.
  • Better RFC path: Parse the TE list and accept when the final coding is chunked.
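A sketch of the "better RFC path", with a hypothetical helper name; note that accepting gzip, chunked this way still hands the caller gzip-compressed bytes unless a decoder is added on top:

```erlang
-module(te_sketch).
-export([te_final_is_chunked/1]).

%% Accept when the final transfer-coding is "chunked" (RFC 9112 §6.1).
%% string:trim/1 and string:lowercase/1 are used here for brevity; they
%% expect UTF-8 chardata, which is fine for the ASCII tokens handled in
%% this sketch, though the real module stays byte-wise for obs-text safety.
te_final_is_chunked(Value) ->
    Tokens = binary:split(Value, <<",">>, [global]),
    case lists:reverse(Tokens) of
        [Last | _] ->
            string:lowercase(string:trim(Last)) =:= <<"chunked">>;
        [] ->
            false
    end.
```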

5. No policy for Transfer-Encoding + Content-Length coexistence

When both headers are present, behavior depends on header order — there is no explicit
policy. Per RFC 9112, Transfer-Encoding overrides Content-Length for framing.

Recommendation: Pick one:

  • Strict: Reject responses containing both (safest)
  • Pragmatic: Ignore Content-Length when Transfer-Encoding: chunked is present

Either way, add a test for the chosen behavior.


6. Header continuation (folded lines) is permissive

The parser treats whitespace-prefixed lines as continuations of the previous header field.
RFC 9112 §2.2 says recipients should reject or ignore obs-fold lines rather than interpret
them. Permissive folding has a history of enabling smuggling/splitting attacks.

Recommendation: If broad compatibility is not required, consider rejecting
whitespace-prefixed lines instead of folding them. At minimum, do not expand continuation
behavior further.


7. Trailer denylist is narrower than the RFC comment implies

is_forbidden_trailer_field/1 only blocks Content-Length and Transfer-Encoding. The
comment references RFC 9110 §6.5.2, but the actual forbidden list in RFC 9110 §6.5.1 is
larger (includes Host, Content-Encoding, authentication fields, etc.).

The current approach is sufficient for framing safety since trailers are kept separate from
headers and parse_headers is opt-in. The comment just overstates compliance.

Suggested fix — clarify the comment:

-%% Framing-affecting fields forbidden in trailers per RFC 9110 §6.5.2.
+%% Framing-affecting fields filtered from trailers (subset of RFC 9110 §6.5.1).
 is_forbidden_trailer_field(<<"Content-Length">>) -> true;
 is_forbidden_trailer_field(<<"Transfer-Encoding">>) -> true;
 is_forbidden_trailer_field(_) -> false.

Low Severity

8. Sub-binary retention in consume_chunk_data/2

The split <<Data:N/binary, Rest/binary>> = Chunk creates sub-binaries that keep the
parent binary alive. On a memory-constrained embedded target this could increase retention
if callers hold small data chunks. Not a correctness issue — use binary:copy/1 if it
becomes a problem.

9. No size caps on header/chunk-size line accumulation

The parser accumulates arbitrarily long header lines or chunk-size lines before finding
\r\n. If this client is exposed to untrusted peers, a malicious server could cause
unbounded memory growth. Consider adding conservative caps (e.g. 8 KB per header line).
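A sketch of such a cap, with an assumed 8 KiB limit and error shape (the PR as merged later settles on 16 KiB and a {line_too_long, Prefix} error):

```erlang
-module(line_cap_sketch).
-export([check_line_cap/1]).

-define(MAX_LINE, 8192).  % conservative assumed cap, not the PR's final value

%% Called on the accumulated bytes before searching for the next CRLF.
check_line_cap(Acc) when byte_size(Acc) > ?MAX_LINE ->
    %% keep only a short prefix in the error so it stays cheap to report
    {error, {parser, {line_too_long, binary:part(Acc, 0, 64)}}};
check_line_cap(Acc) ->
    {ok, Acc}.
```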

10. Asymmetric active vs passive close behavior

  • Active close: stream/2 returns {ok, Conn, closed}
  • Passive close: recv/2 returns {error, {SocketType, closed}}

This is fine if intentional, but worth documenting and testing both paths explicitly.


Test Coverage Assessment

The new tests are well-structured and cover the main chunked scenarios:

  • ✅ Basic passive/active chunked
  • ✅ Chunk split across TCP segments
  • ✅ Chunk extensions
  • ✅ Trailers (including framing-field filtering)
  • ✅ Bad transfer encodings (3 variants)
  • ✅ Bad Content-Length (non-numeric, negative)
  • ✅ Empty header values
  • ✅ Truncated chunked response

Missing tests:

  • ❌ Content-Length: 0 response
  • ❌ Body overrun past Content-Length
  • ❌ Duplicate Content-Length (same value and conflicting)
  • ❌ Transfer-Encoding: chunked + Content-Length together
  • ❌ Active-mode tcp_closed / ssl_closed truncation
  • ❌ Empty header value actually returned as <<>>

Summary

# | Issue | Severity | Fix size
1 | Content-Length: 0 hangs / false truncation | High | ~3 lines
2 | Fixed-length body overrun past CL | High | ~6 lines
3 | Duplicate Content-Length silently accepted | High | ~8 lines
4 | Stacked TE rejected (gzip, chunked) | Medium | Doc or ~15 lines
5 | TE + CL coexistence undefined | Medium | ~5 lines + test
6 | Obs-fold header continuation too permissive | Medium | Design decision
7 | Trailer denylist comment overstates RFC scope | Medium | Comment fix
8 | Sub-binary retention on embedded | Low | Monitor
9 | No header line size caps | Low | ~5 lines
10 | Asymmetric close behavior | Low | Doc

false ->
loop_collect_active(UpdatedConn, NewAcc)
end;
{ok, _Conn, closed} ->
Collaborator

We need to fix the warning here...

Collaborator Author

Fixed.

Comment thread: libs/avm_network/src/ahttp_client.erl (Outdated)
is_forbidden_trailer_field(<<"Transfer-Encoding">>) -> true;
is_forbidden_trailer_field(_) -> false.

trim_left_spaces(Bin, Count) ->
Collaborator

So I see that our string:trim/2 is incomplete (spec is wrong).

This implementation is slower and bigger on both BEAM and AtomVM than a naive recursive implementation, at least when compiled with OTP 28:

trim_recursive_left(<<C, Tail/binary>>) when C =:= $\t orelse C =:= $\s ->
    trim_recursive_left(Tail);
trim_recursive_left(Other) ->
    Other.

And we may want to factor this into string:trim/3

Collaborator Author

string:trim/3 cannot be used with HTTP: it trims out characters that should not be trimmed according to the HTTP spec.
I have a plan for future fixes outside this PR.

Collaborator

Do you mean string:trim/2 ? I think string:trim/3 can be passed the characters we want to trim.

Collaborator Author

This still means assuming that the binary we are going to trim is UTF-8, which is not a safe assumption with HTTP.

@bettio
Collaborator Author

bettio commented Apr 21, 2026

PR Review: ahttp_client chunked encoding & bug fixes

All the issues in the second AMP round are fixed.

Contributor

@petermm petermm left a comment

Some suggestions, might be worth doing at least #1?

Final PR Review: ahttp_client chunked encoding & hardening

Status: ✅ APPROVED

Commits reviewed (12):

# | Commit | Description
1 | fbbd02141 | Add chunked transfer encoding
2 | 226b09bf2 | Fix crash on bad Content-Length
3 | cd2f9b695 | Fix empty header value crash
4 | 5f23c65e3 | Fix silent truncation on close
5 | 077ef1b62 | Fix Content-Length conflict
6 | 050f1a69a | Fix hang on empty response body
7 | 31c5bdff3 | Fix Content-Length overrun
8 | 1068dfc0f | Document data binary retention
9 | 8427d31ea | Remove obs-fold support
10 | aad449849 | Add 16 KiB parsed line cap
11 | 2c542cd79 | Refactor OWS trim helpers
12 | 24a1a4c35 | Document close return shapes

Previous Review Issues — All Resolved

# | Issue | Severity | Resolution
1 | Content-Length: 0 hangs / false truncation | High | ✅ Fixed in 050f1a69a — emits done at end-of-headers
2 | Fixed-length body overrun past CL | High | ✅ Fixed in 31c5bdff3 — splits at N bytes, discards excess
3 | Duplicate Content-Length silently accepted | High | ✅ Fixed in 077ef1b62 — same value accepted, conflict rejected
4 | Stacked TE rejected (gzip, chunked) | Medium | ✅ Documented as intentional in te_is_chunked/1 comment
5 | TE + CL coexistence undefined | Medium | ✅ Strict reject at end-of-headers (content_length_with_transfer_encoding)
6 | Obs-fold header continuation too permissive | Medium | ✅ Removed in 8427d31ea — returns deprecated_obs_fold error
7 | Trailer denylist comment overstates RFC scope | Medium | ✅ Comment updated to "subset of RFC 9110 §6.5.1"
8 | Sub-binary retention on embedded | Low | ✅ Documented in stream/2 edoc
9 | No header line size caps | Low | ✅ 16 KiB cap in aad449849 with line_too_long error
10 | Asymmetric close behavior | Low | ✅ Documented in both stream/2 and recv/2 edoc

Verification of Fixes

CL:0 completion (commit 050f1a69a)

The end-of-headers clause ordering is correct. When remaining_body_bytes = 0, the parser
transitions straight to done instead of entering body state:

%% Lines 431-440: clause ordering matters
parse_line(... body_encoding = chunked, remaining_body_bytes = N ..., <<>>) when is_integer(N) ->
    {error, ...};                                    % TE+CL conflict — checked first
parse_line(... body_encoding = chunked ..., <<>>) ->
    {ok, ... chunked_size ...};                      % pure chunked
parse_line(... remaining_body_bytes = 0 ..., <<>>) ->
    {ok, ... done ..., done};                        % CL:0 — emits done immediately
parse_line(... headers ..., <<>>) ->
    {consume_bytes, ... body ...}.                   % normal CL body

Tested by test_content_length_zero/0 using a 204 No Content response.

CL overrun (commit 31c5bdff3)

consume_bytes/2 now has a clause that fires when byte_size(Chunk) >= N:

%% Lines 331-336
consume_bytes(
    #parser_state{acc = Chunk, remaining_body_bytes = N} = Parser, ParsedAcc
) when is_binary(Chunk) andalso is_integer(N) andalso N > 0 andalso byte_size(Chunk) >= N ->
    <<Data:N/binary, _Rest/binary>> = Chunk,
    UpdatedParser = Parser#parser_state{state = done, acc = undefined, remaining_body_bytes = 0},
    {ok, UpdatedParser, [done, {data, Data} | ParsedAcc]};

This mirrors the chunked path's consume_chunk_data/2 split logic. Excess bytes are
discarded, which is safe since this client doesn't support pipelining.

Tested by test_content_length_overrun/0.

Duplicate CL (commit 077ef1b62)

apply_header_semantics/3 compares parsed integers, so "5" and "05" are treated
as the same value (safer than raw-binary comparison):

%% Lines 503-518
case Parser#parser_state.remaining_body_bytes of
    undefined -> {ok, Parser#parser_state{remaining_body_bytes = N}};
    N         -> {ok, Parser};                           % same value, accept per RFC 9112
    _Other    -> {error, {conflicting_content_length, Value}}
end;

Tested by test_duplicate_content_length_same_value/0 and test_conflicting_content_length/0.

TE+CL coexistence (commit 050f1a69a)

Detection is deferred to end-of-headers, making header order irrelevant:

%% Lines 431-435
parse_line(
    #parser_state{state = headers, body_encoding = chunked, remaining_body_bytes = N} = Parser,
    <<>>
) when is_integer(N) ->
    {error, Parser, {content_length_with_transfer_encoding, N}};

Tested by test_content_length_and_transfer_encoding/0.


Non-Blocking Suggestions

1. trim_ows_right/1 — double reverse is correct but allocates more than necessary

The current implementation reverses the entire binary twice:

trim_ows_right(Bin) ->
    binary_reverse(trim_ows_left(binary_reverse(Bin))).

This is correct and bounded by the 16 KiB line cap, so it's not a blocker. For a
future micro-optimization on AtomVM's constrained runtime, a right-edge scan avoids
the copies:

-%% Trim HTTP OWS from the right.
-trim_ows_right(Bin) ->
-    binary_reverse(trim_ows_left(binary_reverse(Bin))).
-
-binary_reverse(Bin) ->
-    binary_reverse(Bin, <<>>).
-binary_reverse(<<C, Rest/binary>>, Acc) ->
-    binary_reverse(Rest, <<C, Acc/binary>>);
-binary_reverse(<<>>, Acc) ->
-    Acc.
+%% Trim HTTP OWS from the right.
+trim_ows_right(Bin) ->
+    trim_ows_right_scan(Bin, byte_size(Bin)).
+
+trim_ows_right_scan(_Bin, 0) ->
+    <<>>;
+trim_ows_right_scan(Bin, Len) ->
+    Pos = Len - 1,
+    case binary:at(Bin, Pos) of
+        C when C =:= $\s orelse C =:= $\t ->
+            trim_ows_right_scan(Bin, Pos);
+        _ ->
+            binary:part(Bin, 0, Len)
+    end.

2. Missing test: obs-fold rejection in trailers

test_obs_fold_rejected/0 only tests folded lines in headers. The trailer path has
the same rejection logic (line 479-482) but no dedicated test:

%% Suggested test
test_obs_fold_rejected_trailer() ->
    Segments = [
        <<
            "HTTP/1.1 200 OK\r\n"
            "Transfer-Encoding: chunked\r\n\r\n"
            "5\r\nHello\r\n"
            "0\r\n"
            "X-Trailer: value\r\n"
            "\t continuation\r\n"
            "\r\n"
        >>
    ],
    {ServerPid, Port} = start_chunked_server(Segments),
    {ok, Conn} = ahttp_client:connect(http, "localhost", Port, [
        {active, false}, {parse_headers, [<<"X-Trailer">>]}
    ]),
    {ok, Conn2, _Ref} = ahttp_client:request(Conn, <<"GET">>, <<"/">>, [], undefined),
    {error, {parser, {deprecated_obs_fold, <<"\t continuation">>}}} =
        drain_until_error(Conn2),
    ahttp_client:close(Conn2),
    wait_server(ServerPid),
    ok.

This also covers the HTAB variant (the existing test only uses space).

3. Missing test: line cap boundary

The existing test_header_line_too_long/0 uses a 20,000 byte line. A boundary test at
exactly the limit would be more precise:

%% Suggested test
test_header_line_at_boundary() ->
    %% 16384 bytes is exactly the limit — should be accepted.
    ExactLine = binary:copy(<<$A>>, 16384 - byte_size(<<"X-Big: ">>)),
    AcceptSegment = <<"HTTP/1.1 200 OK\r\nX-Big: ",
        ExactLine/binary, "\r\nContent-Length: 0\r\n\r\n">>,
    {ServerPid, Port} = start_chunked_server([AcceptSegment]),
    {ok, Conn} = ahttp_client:connect(http, "localhost", Port, [{active, false}]),
    {ok, Conn2, _Ref} = ahttp_client:request(Conn, <<"GET">>, <<"/">>, [], undefined),
    Acc = loop_collect_passive(Conn2, #{}),
    #{status := 200, done := true} = Acc,
    wait_server(ServerPid),

    %% 16385 bytes exceeds the limit — should be rejected.
    OverLine = binary:copy(<<$A>>, 16385 - byte_size(<<"X-Big: ">>)),
    RejectSegment = <<"HTTP/1.1 200 OK\r\nX-Big: ",
        OverLine/binary, "\r\nContent-Length: 0\r\n\r\n">>,
    {ServerPid2, Port2} = start_chunked_server([RejectSegment]),
    {ok, Conn3} = ahttp_client:connect(http, "localhost", Port2, [{active, false}]),
    {ok, Conn4, _Ref2} = ahttp_client:request(Conn3, <<"GET">>, <<"/">>, [], undefined),
    {error, {parser, {line_too_long, _Prefix}}} = drain_until_error(Conn4),
    ahttp_client:close(Conn4),
    wait_server(ServerPid2),
    ok.

4. Missing test: TE before CL header order

test_content_length_and_transfer_encoding/0 sends Content-Length first, then
Transfer-Encoding. The reverse order should also be rejected:

%% Suggested test
test_transfer_encoding_before_content_length() ->
    Segments = [
        <<
            "HTTP/1.1 200 OK\r\n"
            "Transfer-Encoding: chunked\r\n"
            "Content-Length: 5\r\n\r\n"
            "5\r\nHello\r\n0\r\n\r\n"
        >>
    ],
    {ServerPid, Port} = start_chunked_server(Segments),
    {ok, Conn} = ahttp_client:connect(http, "localhost", Port, [{active, false}]),
    {ok, Conn2, _Ref} = ahttp_client:request(Conn, <<"GET">>, <<"/">>, [], undefined),
    {error, {parser, {content_length_with_transfer_encoding, _}}} =
        ahttp_client:recv(Conn2, 0),
    ahttp_client:close(Conn2),
    wait_server(ServerPid),
    ok.

Test Coverage Summary

22 new test cases added, covering:

Category | Tests
Chunked basic (passive/active) | 2
Chunked split across segments | 1
Chunk extensions | 1
Trailers (with framing filtering) | 2
Bad Transfer-Encoding (3 variants) | 3
TE + CL coexistence | 1
Obs-fold rejection (headers) | 1
Header line too long | 1
Bad Content-Length (non-numeric, negative) | 2
Duplicate CL (same, conflicting) | 2
Content-Length: 0 | 1
Content-Length overrun | 1
Empty header value | 1
Chunked truncated | 1
Active-mode close | 1
Passive-mode close | 1

Optional additions (non-blocking):

  • ❌ Obs-fold rejection in trailers / with HTAB
  • ❌ Line cap boundary (exactly 16384 vs 16385)
  • ❌ TE-before-CL header order

Final Assessment

The PR is well-structured across 12 focused commits with clear separation between the
feature (chunked encoding), targeted bug fixes, hardening (line cap, obs-fold removal),
and documentation. Each commit has a descriptive message with rationale.

The chunked state machine is correct. All three high-severity pre-existing bugs in the
fixed-length path have been fixed. Error reporting is consistent under the {error, {parser, Reason}} namespace. The CHANGELOG entries are accurate and well-organized under
Added/Changed/Removed/Fixed sections.

No blocking issues found. Ready to merge.

bettio added 12 commits April 25, 2026 14:44
Parse `Transfer-Encoding: chunked` response bodies so callers can
consume responses from servers that stream without a `Content-Length`
header. The parser now emits `trailer_header` and
`trailer_header_continuation` responses, and surfaces malformed or
unsupported encodings as `{error, {parser, Reason}}`.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Non-numeric or negative Content-Length values previously crashed
the parser process with badarg from binary_to_integer/1. Surface
them as a parser error so callers can recover; per RFC 9112 §6.3
the field value is 1*DIGIT, and anything else is malformed framing.

Signed-off-by: Davide Bettio <davide@uninstall.it>
A header whose value is empty or all-whitespace (e.g.
"X-Foo:   \r\n") reached trim_right_spaces/2 with Count = 0 and
raised badarg on a negative-length binary pattern, crashing the
parser process.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Premature socket close used to be indistinguishable from a normal
end of response: stream/2 returned ok-closed, recv/2 returned the
backend-closed tuple, and ssl_closed had no handler at all.
RFC 9112 §6.3 requires incomplete messages to be treated as an
error; return {error, {parser, incomplete_response}} when the
parser is still inside headers, a Content-Length body with bytes
remaining, or any chunked state. Close after a complete response
and HTTP/1.0 close-delimited bodies stay on the normal path.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Conflicting Content-Length values were silently accepted with the
last one winning, a classic request-smuggling primitive. Per
RFC 9112 §6.3 such a message is unrecoverable; accept duplicates
only when values match, otherwise return
{error, {parser, {conflicting_content_length, V}}}.

Signed-off-by: Davide Bettio <davide@uninstall.it>
A Content-Length: 0 response left the parser in body state
forever: the transition to done only fired from consume_bytes
after processing bytes, which never ran for an empty body.
Passive callers hung and truncation detection misflagged a
legitimate close as incomplete_response. Transition to done at
end-of-headers when remaining_body_bytes is 0, matching the
message-completion rule in RFC 9112 §6.3.

Signed-off-by: Davide Bettio <davide@uninstall.it>
consume_bytes/2 emitted the entire buffered chunk without
bounding it by remaining_body_bytes, handing bytes past
Content-Length to the caller as body and stalling the parser.
Per RFC 9112 §6.3, bytes beyond Content-Length are not part of
the current message: cap emission at the promised count,
discard the rest (this client does not pipeline), and
transition to done.

Signed-off-by: Davide Bettio <davide@uninstall.it>
{data, Ref, Binary} responses carry sub-binaries of the socket
receive buffer, not copies. Piping chunks straight into the
next parser is the common case and fine; callers that retain a
chunk beyond the current processing step should binary:copy/1
first so the underlying receive buffer can be released.

Signed-off-by: Davide Bettio <davide@uninstall.it>
RFC 9112 §5.2 deprecated line folding and different recipients
merge continuations differently (with or without the required
SP octet), making obs-fold a known header-smuggling primitive.
Reject folded header and trailer lines with
{error, {parser, {deprecated_obs_fold, Line}}} and drop the
header_continuation and trailer_header_continuation response
events; no in-tree caller relied on them, and the API grammar
is materially smaller without them.

Signed-off-by: Davide Bettio <davide@uninstall.it>
consume_lines/2 appended incoming data to acc without bound, so
a hostile peer could pressure memory by streaming a long status,
header or chunk-size line that never closed with CRLF. Cap each
parsed line at 16 KiB; a longer line returns
{error, {parser, {line_too_long, Prefix}}}. Not RFC-mandated;
matches the defaults used by nginx and Apache.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Replace the Count-based trim_left_spaces/2 and trim_right_spaces/2
with the idiomatic recursive byte-wise pattern. Rename to match
the HTTP spec term (RFC 9110 §5.6.3: OWS = *( SP / HTAB )). Same
observable behavior; smaller and faster on BEAM through match-
context reuse.

Stays byte-wise rather than calling string:trim/3 because HTTP
field values are not UTF-8 (RFC 9110 §5.5 admits obs-text =
%x80-FF) and OTP string functions require valid unicode:chardata.

Signed-off-by: Davide Bettio <davide@uninstall.it>
The active-mode stream/2 returns {ok, Conn, closed} on a normal
peer close; the passive-mode recv/2 returns
{error, {SocketType, closed}} in the same situation. This
asymmetry is intentional: active mode surfaces end-of-stream as
a distinct ok marker so a receive loop has something to match
against, and passive mode inherits the native gen_tcp:recv/2 /
ssl:recv/2 contract so callers can keep their case expression a
single match on {ok, _, _}. Both paths return
{error, {parser, incomplete_response}} when the close is
premature.

Document the shapes on both functions so the asymmetry is not
mistaken for a bug, and pin the behaviour with test_active_close
and test_passive_close.

Signed-off-by: Davide Bettio <davide@uninstall.it>