Add support for ECH on Android 37#9383
Open
yschimke wants to merge 75 commits into
Open
Conversation
# Conflicts: # android-test/build.gradle.kts # android-test/src/androidDeviceTest/java/okhttp/android/test/EchTest.kt # okhttp/build.gradle.kts
This comment has been minimized.
This comment has been minimized.
# Conflicts: # gradle/libs.versions.toml # okhttp/build.gradle.kts
yschimke
commented
Jun 17, 2026
| * Typical implementations are backed by Android's `DnsResolver`, OkHttp's DnsOverHttps, or other | ||
| * resolver libraries. Implementations must be safe for concurrent use. | ||
| */ | ||
| internal fun interface AsyncDns { |
Collaborator
Author
There was a problem hiding this comment.
Should we make this public? Probably as follow up.
yschimke
commented
Jun 17, 2026
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package okhttp3 |
Collaborator
Author
There was a problem hiding this comment.
Should move to internal package.
Collaborator
Author
There was a problem hiding this comment.
Maybe awkward if it's exposed on public APIs
yschimke
commented
Jun 17, 2026
| * The asynchronous DNS resolver, or null to resolve with [dns] only. When set, the connection | ||
| * path can use HTTPS/SVCB records it returns (including Encrypted Client Hello configuration). | ||
| */ | ||
| internal val asyncDns: AsyncDns? = builder.asyncDns |
Collaborator
Author
There was a problem hiding this comment.
Should setting dns null this out?
yschimke
commented
Jun 17, 2026
| }, | ||
| // executor is only used for handoff, so a minimal direct Executor | ||
| private val executor: Executor = Executor { it.run() }, | ||
| private val timeoutMillis: Int = 5_000, |
Collaborator
Author
There was a problem hiding this comment.
TODO - move outside AndroidAsyncDns, something Call timeout related?
Bump the API 37 matrix entry to -memory 3072 (others stay at 2048) via a per-entry matrix.memory with a 2048 fallback; both the snapshot-create and run steps use it so the cached snapshot still matches.
Make the -gpu flag a per-entry matrix value: the other levels keep -gpu swiftshader_indirect, API 37 sets gpu: "" so it boots with no -gpu flag. Both emulator steps use it so the snapshot still matches.
Bump the API 37 matrix entry from 3072 to 4096 in case system_server was starved bringing up the package service.
The android job's cache-warming step ran ./gradlew :android-test:test, i.e. the Robolectric unit tests, which fetch the android-all artifact and take ~30 min (and fail under SDK 37 via MavenArtifactFetcher) — timing out the API 37 emulator job before Create AVD / Run Tests. Build the instrumentation APK instead so the emulator step actually runs.
Robolectric stalls/fails fetching the android-all artifacts for the @config SDKs under this SDK 37 build (MavenArtifactFetcher), which also timed out the emulator job's cache-warm step. Disable the android-test Robolectric unit tests until Robolectric supports it; the instrumentation tests aren't Test tasks, so connectedCheck still runs.
The no-gpu experiment made the emulator unstable right after boot (device offline / broken pipe), failing the snapshot step. swiftshader is needed for a stable headless emulator, so revert to -gpu swiftshader_indirect (keep the 4 GB headroom on API 37).
CI: tune the API 37 emulator job
Robolectric has no android-all artifact for the pre-release SDK 37, so its unit tests stall/fail (MavenArtifactFetcher). Gate the disable on a compileApi constant (shared with compileSdk/targetSdk) so the tests run again once we compile against a Robolectric-supported SDK.
The android-emulator-runner waits for sys.boot_completed, but on API 37 the package/activity services can still be initializing, so APK install fails with 'Can't find service: package'. Poll for those services before connectedCheck; it's a no-op on levels where they're already up.
Restoring the cached snapshot leaves the API 37 emulator unstable mid-run:
system_server drops the package service while installing the second APK
('Can't find service: package' / 'install-write all apks' after tests have
already started). Boot fresh (-no-snapshot) for API 37 via matrix.coldBoot;
other levels keep the snapshot.
The Play image's system_server drops the package service mid-run on the runner. Switch API 37 to system-images;android-37.0;google_apis_ps16k;x86_64 (no Play services) to see if the lighter image is more stable.
The pre-release API 37 emulator image's system_server is unstable on CI runners (drops package/activity/settings services mid-run) regardless of snapshot/cold-boot, memory, GPU, or image variant. Mark the entry experimental and continue-on-error so it runs as a canary without blocking the build; revert to the snapshot path (drop cold-boot) since that got furthest. Re-evaluate when the image matures.
Increase disk size for emulator in build workflow.
Refactor dnsResolver initialization to use lazy delegation for improved performance.
Apply the substantive ECH review fixes so the API 37 (Android 17) path behaves correctly once the emulator runs: - Android17Platform: resolve the TLS socket via a prioritised list of socket adapters (Android17/Android10/Play/Conscrypt/BouncyCastle) with graceful fallback, instead of force-unwrapping a single Android17 adapter. Mirrors the existing Android10Platform pattern. Also fall back to super for the certificate chain cleaner instead of !!. - AndroidAsyncDns: allow the DnsResolver to be injected (defaulting to the lazily-created handler-thread resolver) for testability. - OkHttpTest: initialise the client in setup(), expect Android17Platform and TLS 1.3 on API 37+, and tolerate SSLPeerUnverifiedException wrapped as a cause/suppressed exception.
The pre-release API 37 (16 KB page-size) image was left stuck adb-offline through the whole boot because ram-size/disk-size were passed as action inputs, writing a malformed hw.ramSize=4gb into the AVD. Pass RAM via -memory 4096 in emulator-options instead, use the playstore_ps16k target and -gpu auto, and drop the package/activity service-wait wrapper. The entry stays experimental/non-blocking while the image matures.
On API 37 AndroidAsyncDns is the default client DNS, so routing every lookup through Android's DnsResolver broke names it can't resolve (localhost, loopback/literal IPs) or rejects (lenient names like under_score.example.com, failed via IDN.toASCII). That failed the localhost-based instrumentation tests (DnsOverHttps, EventSource, testEventListener) and testUnderscoreRequest once the API 37 emulator finally booted. Aggregate the A and AAAA answers into a single address batch and, when DnsResolver yields nothing, fall back to InetAddress.getAllByName so the resolver stays a superset of Dns.SYSTEM. The HTTPS/SVCB query that carries ECH is unchanged and still runs for real hostnames.
On Android 17 OkHttp configures TLS via the platform's public SSLSocket APIs (Android17SocketAdapter) rather than the reflection/Conscrypt path. A custom SSLSocket that throws from getApplicationProtocol() isn't supported there: it crashes the instrumentation process instead of degrading to HTTP/1.1, which aborts the whole run. The scenario is no longer expected on API 37+, so assume SDK_INT < 37 for this test.
Conscrypt's getSSLParameters() eagerly constructs an SNIHostName from the
peer host and throws IllegalArgumentException ("Invalid input to toASCII"
via IDN.toASCII) for hostnames that violate STD3 ASCII, e.g. underscores.
The platform JDK skips invalid SNI names rather than throwing.
Catch it in configureTlsExtensions and rethrow as IOException("Android
internal error"), mirroring Android10SocketAdapter, so testUnderscoreRequest
fails cleanly instead of leaking a raw IllegalArgumentException.
Setting OkHttpClient.Builder.dns() had no effect on Android: asyncDns defaults to the platform AsyncDns (AndroidAsyncDns), and RealCall resolution prefers asyncDns whenever it is present, so a custom Dns was silently ignored. Clear asyncDns when an explicit Dns is configured so the documented behavior holds; this also disables ECH, which only the async resolver can carry.
…stener On dual-stack hosts the loopback resolves to both ::1 and 127.0.0.1 while MockWebServer binds to only one of them, so the first connect attempt can emit a ConnectStart/ConnectFailed pair before the successful ConnectStart (seen on the API 37 emulator). Strip failed connection attempts from the recorded events so the assertion holds across single- and dual-stack environments.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
DnsResolverHTTPS/SVCB lookups to capture ECH configuration records alongside address resolution.Dnsimplementations can expose platform-specific ECH data without returning rawAnyvalues.EchConfigListvalues to TLS sockets when the activeEchModeattempts ECH.NetworkSecurityPolicy.getDomainEncryptionMode()to select the ECH mode for each host.Remoteso they don't run in normal CI.Validation is mostly local Android/JVM compile and API checks, plus host-side tests for the DNS/ECH policy plumbing. The live ECH tests remain opt-in because
they depend on external servers.