perf: cache type introspection in _transform_recursive to eliminate redundant dispatch #1216
Open
giulio-leone wants to merge 3 commits into anthropics:main from
…edundant dispatch

The _transform_recursive function and its async variant performed type introspection (strip_annotated_type, get_origin, is_typeddict, is_list_type, is_union_type, etc.) on every recursive call, even though the type annotation is the same for all values of a given field. On large payloads (~90K messages), this consumed ~6.6% of total CPU time with zero transformation output since Messages API types have no PropertyInfo annotations.

Changes:
- Add _cached_transform_dispatch(): LRU-cached function that precomputes the dispatch path (typeddict/dict/sequence/union/other) and extracts type args once per annotation type. Subsequent calls are O(1) dict lookups instead of re-running type introspection.
- Add _get_field_key_map(): LRU-cached function that precomputes the key alias mapping for each TypedDict type, replacing per-field _maybe_transform_key calls with a single dict.get() lookup.
- Expand _no_transform_needed() to include str and bool, allowing lists of strings/bools to skip per-element recursion.
- Apply same optimizations to _async_transform_recursive and _async_transform_typeddict.

Fixes anthropics#1195
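To make the key-alias precomputation concrete, here is a minimal sketch of the idea rather than the PR's actual code. `Alias`, `Params`, `field_key_map`, and `rename_keys` are hypothetical names; `Alias` only stands in for the alias metadata that the SDK attaches through `PropertyInfo`.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Annotated, Any, Dict, TypedDict, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Alias:
    """Hypothetical stand-in for alias metadata (not the SDK's PropertyInfo)."""

    name: str


class Params(TypedDict, total=False):
    card_id: Annotated[str, Alias("cardID")]
    note: str


@lru_cache(maxsize=None)
def field_key_map(typed_dict: type) -> Dict[str, str]:
    """Compute {python_key: wire_key} once per TypedDict class."""
    mapping: Dict[str, str] = {}
    hints = get_type_hints(typed_dict, include_extras=True)
    for key, hint in hints.items():
        if get_origin(hint) is Annotated:
            # get_args(Annotated[T, m1, m2]) returns (T, m1, m2).
            for meta in get_args(hint)[1:]:
                if isinstance(meta, Alias):
                    mapping[key] = meta.name
    # The cached dict is shared between calls, so callers must not mutate it.
    return mapping


def rename_keys(data: Dict[str, Any], typed_dict: type) -> Dict[str, Any]:
    """Rename keys with one dict.get() per field instead of re-inspecting hints."""
    key_map = field_key_map(typed_dict)
    return {key_map.get(key, key): value for key, value in data.items()}
```

Calling `rename_keys({"card_id": "abc", "note": "hi"}, Params)` inspects the type hints on the first call only; later calls for the same class reuse the cached mapping.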
Pull request overview
This PR optimizes the Python SDK’s transform machinery by caching type-introspection-driven dispatch decisions and precomputing TypedDict key-alias mappings, reducing repeated work during deep recursive walks of large payloads (per #1195).
Changes:
- Add an `@lru_cache`d dispatch function to avoid repeating type introspection on every recursive call in `_transform_recursive` / `_async_transform_recursive`.
- Cache TypedDict field key-alias mappings to replace per-field alias computation with a single lookup.
- Expand “no transform needed” fast-path to include `str` and `bool`, plus add tests for cache behavior and large-payload scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/anthropic/_utils/_transform.py` | Introduces cached dispatch + cached TypedDict key map; updates sync/async recursion paths to use cached results. |
| `tests/test_transform.py` | Adds tests for passthrough behavior, cache hits, key-map caching, and a large-payload performance check. |
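For readers unfamiliar with how cache hits can be asserted, the sketch below shows one possible approach using `functools.lru_cache`'s `cache_info()`. It does not reproduce the PR's tests; `element_type` and `check_cache_hits` are hypothetical names.

```python
from functools import lru_cache
from typing import Any, List, get_args, get_origin


@lru_cache(maxsize=None)
def element_type(annotation: Any) -> Any:
    """Hypothetical cached helper: the item type of List[T], else None."""
    if get_origin(annotation) is list:
        return get_args(annotation)[0]
    return None


def check_cache_hits() -> None:
    """Assert that repeated lookups of one annotation miss the cache only once."""
    element_type.cache_clear()  # start from a known state
    for _ in range(1000):
        element_type(List[str])
    info = element_type.cache_info()
    assert info.misses == 1
    assert info.hits == 999
```

Counting hits and misses is deterministic, which makes it a useful complement to a wall-clock assertion that can vary with CI load.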
Address review feedback:

1. The dict branch in _async_transform_recursive called the synchronous _transform_recursive, defeating async benefits. Changed to await _async_transform_recursive.
2. Relaxed wall-clock assertion in performance test from 2s to 10s to avoid flakiness in CI environments with variable load.
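The first fix can be illustrated with a small, self-contained sketch. `walk_async` and `transform_leaf` are hypothetical stand-ins, not the SDK's functions; the point is only that the dict branch awaits the async walker for each value instead of delegating to a synchronous twin.

```python
import asyncio
from typing import Any


async def transform_leaf(value: Any) -> Any:
    """Hypothetical async leaf step, standing in for work such as async file reads."""
    await asyncio.sleep(0)  # yield to the event loop, as real async I/O would
    return value.upper() if isinstance(value, str) else value


async def walk_async(data: Any) -> Any:
    """Hypothetical async recursive walk over dicts and lists."""
    if isinstance(data, dict):
        # Await the async walker for every value. Calling a synchronous twin
        # here would bypass the awaitable leaf step for nested values.
        return {key: await walk_async(value) for key, value in data.items()}
    if isinstance(data, list):
        return [await walk_async(item) for item in data]
    return await transform_leaf(data)


result = asyncio.run(walk_async({"a": {"b": ["x", 1]}}))
```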
Summary
Fixes #1195
`_transform_recursive` performs type introspection (`strip_annotated_type`, `get_origin`, `is_typeddict`, `is_list_type`, `is_union_type`, etc.) on every recursive call, even though the type annotation is the same for all values of a given field. On large payloads (~90K messages), this consumes ~6.6% of total CPU time with zero transformation output since Messages API types have no `PropertyInfo` annotations.
Changes
1. Cached dispatch via `_cached_transform_dispatch()`
Adds an `@lru_cache`-decorated function that precomputes the dispatch path (typeddict / dict / sequence / union / other) and extracts type args once per annotation type. Subsequent calls are O(1) dict lookups instead of re-running type introspection.
2. Cached key mapping via `_get_field_key_map()`
Precomputes the key alias mapping for each TypedDict type. Replaces per-field `_maybe_transform_key()` calls with a single `dict.get()` lookup.
3. Expanded `_no_transform_needed()`
Now includes `str` and `bool` in addition to `int` and `float`, allowing lists of strings/bools to skip per-element recursion entirely.
4. Async parity
Same optimizations applied to `_async_transform_recursive` and `_async_transform_typeddict`.
Performance Impact
For a 10,000-message payload with no `PropertyInfo` annotations (the standard Messages API case):
The optimization is purely internal: all existing behavior and correctness are preserved.
Tests
- `str` and `bool` list skip optimization
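The list skip optimization can be pictured with the sketch below. It is a simplified illustration, not the SDK's code: `no_transform_needed`, `transform_list`, and the `transform_item` callback are hypothetical names, and returning the input list unchanged is an assumption of this sketch.

```python
from typing import Any, Callable, List, get_args

# Item types whose values never change during transformation (assumed set,
# mirroring the int/float/str/bool list given in the description).
_PASSTHROUGH_TYPES = (int, float, str, bool)


def no_transform_needed(item_type: Any) -> bool:
    """Return True when values of this type can be passed through untouched."""
    return item_type in _PASSTHROUGH_TYPES


def transform_list(
    data: List[Any],
    annotation: Any,
    transform_item: Callable[[Any], Any],
) -> List[Any]:
    """Skip per-element recursion when the annotated item type is a passthrough type."""
    args = get_args(annotation)  # List[str] -> (str,), bare list -> ()
    if args and no_transform_needed(args[0]):
        return data  # nothing can change, so no per-element work is done
    return [transform_item(item) for item in data]
```

With `List[str]` as the annotation, `transform_item` is never called; with an item type outside the passthrough set, every element still goes through it.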