fix(readable): install a StringDecoder when setEncoding is called by SAY-5 · Pull Request #5086 · nodejs/undici

SAY-5 · 2026-04-21T18:57:30Z

What kind of change does this PR introduce?

bug fix

What is the current behavior?

BodyReadable.setEncoding only wrote _readableState.encoding and skipped the Node-standard step of installing a StringDecoder. With an encoding set but no decoder, the Readable machinery falls through to Buffer.prototype.toString(encoding) on each individual chunk, which corrupts any multi-byte character (CJK, emoji, or any other UTF-8 sequence of two or more bytes) that straddles a chunk boundary. The partial bytes at the end of one chunk decode to U+FFFD, and so do the leading bytes of the next chunk.
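The failure mode can be reproduced outside undici with plain Buffers. The following is a minimal sketch, not undici's actual code path: the chunk boundary after the second byte is an assumption chosen for illustration, and the variable names are hypothetical.

```javascript
const { StringDecoder } = require('node:string_decoder')

// '中' is U+4E2D, which UTF-8 encodes as the three bytes E4 B8 AD.
const bytes = Buffer.from('中', 'utf8')

// Assumed chunk boundary: the first two bytes arrive in one chunk, the last byte in the next.
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)]

// Decoding each chunk in isolation: neither chunk holds a complete sequence,
// so both decode to replacement characters.
const perChunk = chunks.map((chunk) => chunk.toString('utf8')).join('')

// StringDecoder holds back the incomplete prefix until the remaining byte arrives.
const decoder = new StringDecoder('utf8')
const buffered = chunks.map((chunk) => decoder.write(chunk)).join('') + decoder.end()

console.log(perChunk.includes('\uFFFD')) // true
console.log(buffered === '中') // true
```

The same input bytes produce corrupt or lossless text depending only on whether decoding state is carried across chunks.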

Fixes #5002.

What is the new behavior?

Match Node's Readable.prototype.setEncoding: build a StringDecoder from the supplied encoding and wire it into _readableState. The decoder buffers unfinished byte sequences between chunks, so callers iterating for await (const chunk of body) with setEncoding('utf8') get lossless text.
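As a rough illustration of the intended behavior, the sketch below uses a plain node:stream Readable as a stand-in for BodyReadable (an assumption; undici's class is not used here). The helper name readAll and the chunk split are hypothetical.

```javascript
const { Readable } = require('node:stream')

// Collects a stream's text after calling the stock Readable.prototype.setEncoding.
async function readAll (chunks) {
  // Stand-in for a response body: a Readable that is fed manually via push().
  const body = new Readable({ read () {} })
  body.setEncoding('utf8') // installs a StringDecoder on the stream

  for (const chunk of chunks) body.push(chunk)
  body.push(null) // signal end of stream

  let text = ''
  for await (const chunk of body) text += chunk
  return text
}

// '中' (E4 B8 AD) split across two chunks.
const split = [Buffer.from([0xe4, 0xb8]), Buffer.from([0xad])]
readAll(split).then((text) => console.log(text === '中')) // true
```

Because the decoder lives on the stream, the consumer never sees the partial bytes; it only receives complete characters.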

Other information

Added test/readable-set-encoding-utf8.js: pushes the 3-byte sequence for 中 split across two chunks and asserts no U+FFFD in the decoded output. Pre-fix the test fails with "\uFFFD\uFFFD"; post-fix the decoded string is exactly "中".

codecov-commenter · 2026-04-21T19:12:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.13%. Comparing base (2a6f9c7) to head (6db976f).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #5086   +/-   ##
=======================================
  Coverage   93.13%   93.13%           
=======================================
  Files         110      110           
  Lines       35816    35824    +8     
=======================================
+ Hits        33356    33364    +8     
  Misses       2460     2460

mcollina

Can you check why this was done this way? Also, why are we overriding setEncoding now? Shouldn't removing the override do the same?

SAY-5 · 2026-04-22T07:45:47Z

Good point @mcollina, you're right that removing the override is the cleaner fix. node:stream's Readable.prototype.setEncoding already installs a StringDecoder and buffers partial byte sequences between chunks; the pre-existing override only set _readableState.encoding and nothing else, which is exactly what caused #5002. Force-pushed to just delete the override (and the now-unused node:string_decoder import). The regression test still passes.

@mcollina

Per @mcollina's review: the original BodyReadable.setEncoding override only wrote `_readableState.encoding`, which made the base Readable fall back to Buffer.prototype.toString(encoding) per chunk. That is the exact failure mode that produced U+FFFD for multi-byte UTF-8 characters straddling chunk boundaries (nodejs#5002). node:stream's Readable.prototype.setEncoding already installs a StringDecoder and buffers partial byte sequences between chunks. Removing the override is the minimal fix and inherits the correct behaviour. Dropped the now-unused `node:string_decoder` import. The regression test test/readable-set-encoding-utf8.js still passes.

mcollina reviewed Apr 22, 2026

SAY-5 closed this Apr 23, 2026

SAY-5 force-pushed the fix/readable-set-encoding-utf8-5002 branch from d881946 to c7f1904 Compare April 23, 2026 04:19

SAY-5 wants to merge 1 commit into nodejs:main from SAY-5:fix/readable-set-encoding-utf8-5002
