Skip to content

Add support for ECH on Android 37#9383

Open
yschimke wants to merge 75 commits into
square:masterfrom
yschimke:testing_ech
Open

Add support for ECH on Android 37#9383
yschimke wants to merge 75 commits into
square:masterfrom
yschimke:testing_ech

Conversation

@yschimke

@yschimke yschimke commented Mar 19, 2026

Copy link
Copy Markdown
Collaborator
  • Adds an Android 17 platform path that uses public Android APIs for ALPN, session tickets, domain encryption policy, and ECH socket configuration.
  • Uses Android DnsResolver HTTPS/SVCB lookups to capture ECH configuration records alongside address resolution.
  • Introduces typed ECH DNS records so Dns implementations can expose platform-specific ECH data without returning raw Any values.
  • Applies Android EchConfigList values to TLS sockets when the active EchMode attempts ECH.
  • Uses NetworkSecurityPolicy.getDomainEncryptionMode() to select the ECH mode for each host.
  • Retries once without ECH when the platform reports an ECH configuration mismatch and the active mode permits fallback.
  • Tags live external ECH checks as Remote so they don't run in normal CI.
  • Includes CI cleanup for the AGP source-set API change and Android localhost cleartext test traffic.

Validation is mostly local Android/JVM compile and API checks, plus host-side tests for the DNS/ECH policy plumbing. The live ECH tests remain opt-in because
they depend on external servers.

yschimke and others added 13 commits January 16, 2026 11:23
@github-advanced-security

This comment has been minimized.

@yschimke yschimke changed the title Testing ech Add support for ECH on Android 37 May 4, 2026
@yschimke yschimke marked this pull request as ready for review May 4, 2026 10:19
@yschimke yschimke requested a review from swankjesse May 4, 2026 10:51
* Typical implementations are backed by Android's `DnsResolver`, OkHttp's DnsOverHttps, or other
* resolver libraries. Implementations must be safe for concurrent use.
*/
internal fun interface AsyncDns {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this public? Probably as follow up.

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package okhttp3

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move to internal package.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe awkward if it's exposed on public APIs

* The asynchronous DNS resolver, or null to resolve with [dns] only. When set, the connection
* path can use HTTPS/SVCB records it returns (including Encrypted Client Hello configuration).
*/
internal val asyncDns: AsyncDns? = builder.asyncDns

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should setting dns null this out?

},
// executor is only used for handoff, so a minimal direct Executor
private val executor: Executor = Executor { it.run() },
private val timeoutMillis: Int = 5_000,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO - move outside AndroidAsyncDns, something Call timeout related?

yschimke added 10 commits June 17, 2026 12:47
Bump the API 37 matrix entry to -memory 3072 (others stay at 2048) via a
per-entry matrix.memory with a 2048 fallback; both the snapshot-create and
run steps use it so the cached snapshot still matches.
Make the -gpu flag a per-entry matrix value: the other levels keep
-gpu swiftshader_indirect, API 37 sets gpu: "" so it boots with no -gpu
flag. Both emulator steps use it so the snapshot still matches.
Bump the API 37 matrix entry from 3072 to 4096 in case system_server was
starved bringing up the package service.
The android job's cache-warming step ran ./gradlew :android-test:test, i.e.
the Robolectric unit tests, which fetch the android-all artifact and take
~30 min (and fail under SDK 37 via MavenArtifactFetcher) — timing out the
API 37 emulator job before Create AVD / Run Tests. Build the instrumentation
APK instead so the emulator step actually runs.
Robolectric stalls/fails fetching the android-all artifacts for the @config
SDKs under this SDK 37 build (MavenArtifactFetcher), which also timed out the
emulator job's cache-warm step. Disable the android-test Robolectric unit
tests until Robolectric supports it; the instrumentation tests aren't Test
tasks, so connectedCheck still runs.
The no-gpu experiment made the emulator unstable right after boot (device
offline / broken pipe), failing the snapshot step. swiftshader is needed for
a stable headless emulator, so revert to -gpu swiftshader_indirect (keep the
4 GB headroom on API 37).
Robolectric has no android-all artifact for the pre-release SDK 37, so its
unit tests stall/fail (MavenArtifactFetcher). Gate the disable on a compileApi
constant (shared with compileSdk/targetSdk) so the tests run again once we
compile against a Robolectric-supported SDK.
The android-emulator-runner waits for sys.boot_completed, but on API 37 the
package/activity services can still be initializing, so APK install fails with
'Can't find service: package'. Poll for those services before connectedCheck;
it's a no-op on levels where they're already up.
Restoring the cached snapshot leaves the API 37 emulator unstable mid-run:
system_server drops the package service while installing the second APK
('Can't find service: package' / 'install-write all apks' after tests have
already started). Boot fresh (-no-snapshot) for API 37 via matrix.coldBoot;
other levels keep the snapshot.
yschimke and others added 15 commits June 17, 2026 16:04
The Play image's system_server drops the package service mid-run on the
runner. Switch API 37 to system-images;android-37.0;google_apis_ps16k;x86_64
(no Play services) to see if the lighter image is more stable.
The pre-release API 37 emulator image's system_server is unstable on CI
runners (drops package/activity/settings services mid-run) regardless of
snapshot/cold-boot, memory, GPU, or image variant. Mark the entry
experimental and continue-on-error so it runs as a canary without blocking
the build; revert to the snapshot path (drop cold-boot) since that got
furthest. Re-evaluate when the image matures.
Increase disk size for emulator in build workflow.
Refactor dnsResolver initialization to use lazy delegation for improved performance.
Apply the substantive ECH review fixes so the API 37 (Android 17) path
behaves correctly once the emulator runs:

- Android17Platform: resolve the TLS socket via a prioritised list of
  socket adapters (Android17/Android10/Play/Conscrypt/BouncyCastle) with
  graceful fallback, instead of force-unwrapping a single Android17
  adapter. Mirrors the existing Android10Platform pattern. Also fall back
  to super for the certificate chain cleaner instead of !!.
- AndroidAsyncDns: allow the DnsResolver to be injected (defaulting to the
  lazily-created handler-thread resolver) for testability.
- OkHttpTest: initialise the client in setup(), expect Android17Platform
  and TLS 1.3 on API 37+, and tolerate SSLPeerUnverifiedException wrapped
  as a cause/suppressed exception.
The pre-release API 37 (16 KB page-size) image was left stuck adb-offline
through the whole boot because ram-size/disk-size were passed as action
inputs, writing a malformed hw.ramSize=4gb into the AVD. Pass RAM via
-memory 4096 in emulator-options instead, use the playstore_ps16k target
and -gpu auto, and drop the package/activity service-wait wrapper.

The entry stays experimental/non-blocking while the image matures.
On API 37 AndroidAsyncDns is the default client DNS, so routing every
lookup through Android's DnsResolver broke names it can't resolve
(localhost, loopback/literal IPs) or rejects (lenient names like
under_score.example.com, failed via IDN.toASCII). That failed the
localhost-based instrumentation tests (DnsOverHttps, EventSource,
testEventListener) and testUnderscoreRequest once the API 37 emulator
finally booted.

Aggregate the A and AAAA answers into a single address batch and, when
DnsResolver yields nothing, fall back to InetAddress.getAllByName so the
resolver stays a superset of Dns.SYSTEM. The HTTPS/SVCB query that carries
ECH is unchanged and still runs for real hostnames.
On Android 17 OkHttp configures TLS via the platform's public SSLSocket
APIs (Android17SocketAdapter) rather than the reflection/Conscrypt path.
A custom SSLSocket that throws from getApplicationProtocol() isn't
supported there: it crashes the instrumentation process instead of
degrading to HTTP/1.1, which aborts the whole run. The scenario is no
longer expected on API 37+, so assume SDK_INT < 37 for this test.
Conscrypt's getSSLParameters() eagerly constructs an SNIHostName from the
peer host and throws IllegalArgumentException ("Invalid input to toASCII"
via IDN.toASCII) for hostnames that violate STD3 ASCII, e.g. underscores.
The platform JDK skips invalid SNI names rather than throwing.

Catch it in configureTlsExtensions and rethrow as IOException("Android
internal error"), mirroring Android10SocketAdapter, so testUnderscoreRequest
fails cleanly instead of leaking a raw IllegalArgumentException.
Setting OkHttpClient.Builder.dns() had no effect on Android: asyncDns
defaults to the platform AsyncDns (AndroidAsyncDns), and RealCall
resolution prefers asyncDns whenever it is present, so a custom Dns was
silently ignored. Clear asyncDns when an explicit Dns is configured so
the documented behavior holds; this also disables ECH, which only the
async resolver can carry.
…stener

On dual-stack hosts the loopback resolves to both ::1 and 127.0.0.1 while
MockWebServer binds to only one of them, so the first connect attempt can
emit a ConnectStart/ConnectFailed pair before the successful ConnectStart
(seen on the API 37 emulator). Strip failed connection attempts from the
recorded events so the assertion holds across single- and dual-stack
environments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants