fix(decoder): GzipDecoder fallback should decompress when headers lack gzip content type (AI-Triage PR) #895
Summary
One-line fix in `create_gzip_decoder()`: changes the `fallback_parser` from `gzip_parser.inner_parser` (e.g., `CsvParser`) to `gzip_parser` (the full `GzipParser` wrapping the inner parser).

Problem: When `GzipDecoder` is explicitly configured (e.g., as `download_decoder` in `AsyncRetriever`) and the HTTP response lacks `Content-Encoding: gzip` or `Content-Type: application/gzip` headers, the `_select_parser()` method falls back to the inner parser directly (e.g., `CsvParser`), skipping decompression entirely. This causes `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b` because raw gzip bytes are fed to a text parser.

This is common for S3 pre-signed URL downloads (e.g., CircleCI Usage Export, Amazon Ads reports) where files are gzip-compressed but served as `Content-Type: binary/octet-stream`.

Fix: Since the user has explicitly declared `GzipDecoder`, the fallback should still decompress. Change `fallback_parser=gzip_parser.inner_parser` → `fallback_parser=gzip_parser`.

Related issues: airbytehq/airbyte#56988, airbytehq/airbyte#66208, airbytehq/oncall#11173
Introduced in airbytehq/airbyte-python-cdk#378.
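
To make the failure mode concrete, here is a minimal, self-contained sketch of header-based parser selection with a fallback. It is not the CDK's code: `SketchCsvParser`, `SketchGzipParser`, and `select_parser` are hypothetical stand-ins that only mirror the names used above, and the header matching is simplified to an exact, case-sensitive lookup.

```python
import csv
import gzip
import io
from typing import BinaryIO, Dict, Iterator

# Headers that mark a response as gzip in this sketch (simplified assumption).
GZIP_HEADERS = (("Content-Encoding", "gzip"), ("Content-Type", "application/gzip"))


class SketchCsvParser:
    """Hypothetical text parser: decodes bytes as UTF-8 and yields CSV rows."""

    def parse(self, data: BinaryIO) -> Iterator[Dict[str, str]]:
        text = io.TextIOWrapper(data, encoding="utf-8", newline="")
        yield from csv.DictReader(text)


class SketchGzipParser:
    """Hypothetical wrapper: decompresses, then delegates to the inner parser."""

    def __init__(self, inner_parser: SketchCsvParser) -> None:
        self.inner_parser = inner_parser

    def parse(self, data: BinaryIO) -> Iterator[Dict[str, str]]:
        with gzip.GzipFile(fileobj=data, mode="rb") as decompressed:
            yield from self.inner_parser.parse(decompressed)


def select_parser(headers: Dict[str, str], gzip_parser, fallback_parser):
    """Return the gzip parser when a gzip header is present, else the fallback."""
    if any(headers.get(name) == value for name, value in GZIP_HEADERS):
        return gzip_parser
    return fallback_parser


# A gzip-compressed CSV served the way S3 pre-signed URLs often serve it.
body = gzip.compress(b"id,name\n1,alpha\n")
headers = {"Content-Type": "binary/octet-stream"}
gzip_parser = SketchGzipParser(SketchCsvParser())

# Before: the fallback is the inner parser, so raw gzip bytes reach the CSV parser.
old_parser = select_parser(headers, gzip_parser, fallback_parser=gzip_parser.inner_parser)
try:
    list(old_parser.parse(io.BytesIO(body)))
    old_failed = False
except UnicodeDecodeError:
    old_failed = True

# After: the fallback is the full gzip parser, so decompression still happens.
new_parser = select_parser(headers, gzip_parser, fallback_parser=gzip_parser)
rows = list(new_parser.parse(io.BytesIO(body)))
```

In this sketch the "before" selection raises `UnicodeDecodeError` on the second gzip magic byte (`0x8b`), while the "after" selection yields the decoded CSV row.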
Review & Testing Checklist for Human
- `GzipParser.parse()` will be invoked even when there are no gzip headers. If any connector uses `GzipDecoder` in a context where some responses are genuinely non-gzipped AND lack gzip headers, `GzipParser` will now fail on those responses instead of gracefully parsing them. Check whether `GzipParser.parse()` needs a try/except fallback to `inner_parser` for non-gzip data, or whether this scenario is not possible when the user explicitly configures `GzipDecoder`.
- … `Content-Type: binary/octet-stream`.
- … `Nested Decoders` to decode the streaming responses, instead of `ResponseToFileExtractor` #378 intent: The original implementation may have had a reason for the header-based fallback behavior. Confirm with @bazarnov or @maxi297 if needed.

Notes
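
On the first checklist item: if non-gzipped responses without gzip headers turn out to be possible, one option is to inspect the payload instead of relying on headers. The sketch below checks the two-byte gzip magic number (`0x1f 0x8b`) before deciding whether to decompress. This is a different technique from the try/except fallback mentioned in the checklist, chosen here because it avoids rewinding a partially consumed stream. `open_maybe_gzip` is a hypothetical helper, not an existing CDK function.

```python
import gzip
import io
from typing import BinaryIO

# Every gzip member starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(raw: BinaryIO) -> BinaryIO:
    """Return a decompressing reader if the stream looks gzipped, else a plain one.

    Hypothetical helper: peeks at the first bytes without consuming them.
    """
    buffered = io.BufferedReader(raw)
    # peek() may return more or fewer bytes than requested, so slice explicitly.
    if buffered.peek(2)[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=buffered, mode="rb")
    return buffered


plain = b"id,name\n1,alpha\n"

# Gzipped payload: decompressed transparently.
from_gzip = open_maybe_gzip(io.BytesIO(gzip.compress(plain))).read()

# Plain payload: passed through unchanged.
from_plain = open_maybe_gzip(io.BytesIO(plain)).read()
```

A magic-byte check can misfire on non-gzip data that happens to start with the same two bytes, so whether it is safer than a strict "always decompress" fallback depends on the connectors involved.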