feat(nodejs): add base images and build infrastructure for Node.js weblogs#6903
feat(nodejs): add base images and build infrastructure for Node.js weblogs#6903rochdev wants to merge 46 commits into
Conversation
|
|
|
✨ Fix all issues with BitsAI or with Cursor
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e84743ef90
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
robertomonteromiguel
left a comment
There was a problem hiding this comment.
There are CI failures related with this PR
Pre-build node_modules into versioned base images hosted on datadog/system-tests, following the same pattern as Python and PHP. Regular Dockerfiles now inherit from the base and only add source code and ddtrace, keeping rebuilds fast. - Add *.base.Dockerfile for all 9 Node.js weblogs - Add docker-bake.hcl and build_nodejs_base_images.sh - Update all 10 regular Dockerfiles to FROM datadog/system-tests:X.base-v1 - Wire build-nodejs-base-images CI label into ci.yml and system-tests.yml - Document the new label in docs/edit/update-docker-images.md Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
e84743e to
c05de90
Compare
Push Node.js weblog base images to ghcr.io/datadog/system-tests instead of Docker Hub, triggered automatically when the build-nodejs-base-images label is present on a PR. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Tag pushed base images with the PR number instead of a static version to prevent accidental overwrites across PRs. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The reusable workflow job must explicitly declare packages:write to push to GHCR, even when the caller grants it. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Pushing to GHCR requires packages:write on the job in system-tests.yml, which would break all external callers of that reusable workflow. Instead, push from a dedicated job in ci.yml that external callers never invoke. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Use ghcr.io/datadog/system-tests/weblog/nodejs:<name>-v<PR> instead of a flat tag on the system-tests package. One scoped package, cleaner tags without dots, and extensible for other languages if they move to GHCR. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
@robertomonteromiguel Actually I couldn't figure out how to prevent PRs from being reopened to overwrite the image, so I'm reverting to the original approach, not sure why that was failing to begin with as the image should be available locally. |
Moves the GHCR push logic to publish-nodejs-base-images.yml so it can later be protected with a dd-octo-sts claim (see PR #6957) to restrict pushes to main-branch runs only. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Use Docker Hub (datadog/system-tests) instead of GHCR - Build with --load inside build_end_to_end (same runner as weblog build) - Push to Docker Hub at merge time via get_pr_merged_labels.sh - Remove dedicated publish job and reusable workflow from ci.yml Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
openai-js, anthropic-js and google_genai-js rebuild for every framework version test anyway, so base images add complexity without benefit. Revert them to build from node:22-alpine directly, same as Python equivalents. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Use --output type=docker,dest=... directly from the buildx build step instead of --load then docker save | gzip. BuildKit writes already- compressed layer blobs to the tar, avoiding the decompress/recompress cycle that caused the ~25s save time. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…river Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
pigz uses all available CPU cores, cutting the ~25s docker save | gzip step down to ~6-12s with no format or compatibility changes. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
zstd is ~3-5x faster than gzip on both compress and decompress sides. Both zstd and the decompression path (zstd -d -c | docker load) are available on GitHub Actions Ubuntu runners. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Switch express4 and uds-express4 base images to use bun instead of npm ci, with hoisted linker and network concurrency of 8. Also use bun add for dd-trace installation via install_ddtrace_bun.sh. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
bun's hoisted linker flattens dd-trace's internal package structure, breaking relative path resolution within the package. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
dd-trace generates vendor files via a postinstall script; bun requires --trust to run lifecycle scripts for packages. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Replace npm ci with bun install across all Node.js base Dockerfiles - Copy bun binary from oven/bun Alpine/Debian image instead of running installer - Generate bun.lock files for all weblogs - Switch install_ddtrace.sh to use bun add (drop _bun suffix, remove from shellcheck TODO) - Switch all weblog Dockerfiles to use bun run build instead of npm run build - Replace artifact-based weblog image sharing with GHCR (all languages) - Remove docker save | zstd and artifact load paths from build.sh - Add scheduled cleanup for ephemeral GHCR run images (3-day retention) - Add packages:write to ci.yml and system-tests.yml build job - Add _weblog_ghcr_image input to run-end-to-end.yml for GHCR pull Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
GHCR hit rate limits (2000 req/min) with parallel builds. Artifacts don't have this constraint. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Node.js weblog builds take ~15s with base images — same as loading an artifact. Skip the separate build job and artifact round-trip entirely by rebuilding directly in each run job. Other languages keep artifacts. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Without a build job, the weblog-specific artifact never gets uploaded. Point to the upstream binaries artifact instead so the download doesn't fail. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
For Node.js weblogs, save only the specific base image to the artifact instead of the full weblog image. Run jobs load the base image then rebuild the weblog in ~15s. Filename includes the weblog variant to avoid conflicts when building multiple variants locally. Loading is handled by a shared load_base_image.sh sourced by both build.sh and run.sh, keeping infrastructure concerns in Bash. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
run.sh has set -u; use default-empty expansion so the function is a safe no-op when called in contexts where these vars aren't set. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…tall CI runners have fast datacenter networking; 16 concurrent downloads is faster than 8 without risking rate limits. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
get_image_list() was parsing the Dockerfile and trying to pre-pull the base image from Docker Hub. When the base image artifact is present in binaries/, the build handles loading itself so no pre-pull is needed. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
For nodejs with -s, the base image is already in the daemon. Use continue to save only the base image and skip the weblog build, since run jobs rebuild the weblog themselves from the artifact. Also skip pre-pulling when the base image artifact is present. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
TEST_LIBRARY is not a standalone variable in run.sh; use libraries[0] so load_base_image constructs the correct artifact filename. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The parametric test client rebuilds on every run (like Python/PHP). Using a custom base image requires it to be on Docker Hub, which it isn't yet. Revert to node:18.10-slim so it works the same as other languages. The optimization can come after the base images are published. Also removes load_base_image.sh (no longer needed in run.sh) and the extra artifact download step from run-parametric.yml. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Parametric tests rebuild on every run like Python/PHP, using a standard public base image. No need for the custom base image or its associated infrastructure. Clean up load_base_image.sh, run.sh and bake.hcl. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
install_ddtrace.sh uses bun but node:18.10-slim doesn't have it. Copy bun from oven/bun:1.3.13 and switch npm install to bun install. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
When false (default), Node.js weblogs have no build_end_to_end job and run jobs rebuild from Docker Hub base images — no artifact needed. External callers use _build_nodejs_base_images; internally mapped to the generic build_weblog_images parameter in compute-workflow-parameters. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…01/FBT002 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ithout label build_weblog_images=false should only apply to nodejs when the label is not set. All other languages always need build jobs. Fix by defaulting to true and computing false only for nodejs via the expression in system-tests.yml. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…s step The step was using inputs._build_weblog_images (not a declared input) instead of inputs._build_nodejs_base_images, causing it to never run, docker save to produce an empty artifact, and docker load to fail. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
… description Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Use $BINARIES_FILENAME for nodejs base image saves (same as other languages) instead of a separate -base-image.tar.zst name. After loading the artifact, check if system_tests/weblog is in the daemon to decide whether to rebuild. Removes load_base_image function. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Summary
*.base.Dockerfilefor 6 Node.js weblogs (express4, express5, fastify, express4-typescript, nextjs, uds-express4) — each pre-installsnode_modulesviabun installdocker-bake.hclandbuild_nodejs_base_images.shto build all base images in paralleldatadog/system-testson Docker Hub, same as Python and PHP (manual push via#apm-shared-testingafter merge, or via thebuild-nodejs-base-imageslabel in CI)docker image inspect system_tests/weblogafter loading from the artifact — if not present, they rebuild from the base imagedocs/edit/update-docker-images.mdto document the Node.js workflow (same as Python/PHP)_build_weblog_imagesgeneric parameter tocompute-workflow-parametersto allow external callers to skip build jobs when base images are already on Docker Hubzstdinstead ofgzipfor weblog artifact compression (faster on both save and load)How it works
When
build-nodejs-base-imageslabel is set:--load(into Docker daemon)build_end_to_endjob saves the base image as the weblog artifactAfter merge (base images on Docker Hub): run jobs pull directly from Docker Hub and rebuild — no label needed.
Test plan
build-nodejs-base-imageslabel to trigger base image build in CIexpress4weblog builds successfully against the base imageHuman notes
In dd-trace-js, we were spending almost half or our CI run time building weblog images. I tried simply switching to building base images, but it turned out not to be enough because the way that is done is extremely inefficient. System tests include a custom Docker layer save/load system that is so slow it's actually slower than doing a full rebuild every time. So in this PR I also made sure to skip this entirely. When the base images are available for Node, the weblog is built in the "run end-to-end" instead of in previous jobs to avoid all of this slow custom logic.
With this change, building the weblog goes from 2+ minutes to 10 seconds with no need for any separate prior jobs.