Skip to content

feat(nodejs): add base images and build infrastructure for Node.js weblogs#6903

Draft
rochdev wants to merge 46 commits into
mainfrom
rochdev/nodejs-base-images
Draft

feat(nodejs): add base images and build infrastructure for Node.js weblogs#6903
rochdev wants to merge 46 commits into
mainfrom
rochdev/nodejs-base-images

Conversation

@rochdev
Copy link
Copy Markdown
Member

@rochdev rochdev commented May 12, 2026

Summary

  • Add *.base.Dockerfile for 6 Node.js weblogs (express4, express5, fastify, express4-typescript, nextjs, uds-express4) — each pre-installs node_modules via bun install
  • Switch all Node.js weblogs to bun instead of npm for dependency installation and dd-trace
  • Add docker-bake.hcl and build_nodejs_base_images.sh to build all base images in parallel
  • Base images published to datadog/system-tests on Docker Hub, same as Python and PHP (manual push via #apm-shared-testing after merge, or via the build-nodejs-base-images label in CI)
  • Build job saves the base image as the weblog artifact; run job loads it and rebuilds the weblog (~10s). No pre-built weblog artifact is transferred — only the base image
  • Run jobs now check docker image inspect system_tests/weblog after loading from the artifact — if not present, they rebuild from the base image
  • Update docs/edit/update-docker-images.md to document the Node.js workflow (same as Python/PHP)
  • Add _build_weblog_images generic parameter to compute-workflow-parameters to allow external callers to skip build jobs when base images are already on Docker Hub
  • Use zstd instead of gzip for weblog artifact compression (faster on both save and load)

How it works

When build-nodejs-base-images label is set:

  1. CI builds base images with --load (into Docker daemon)
  2. Each build_end_to_end job saves the base image as the weblog artifact
  3. Run jobs download the artifact, load the base image, rebuild the weblog in ~10s
  4. At merge: GitLab script pushes base images to Docker Hub (same as Python/PHP)

After merge (base images on Docker Hub): run jobs pull directly from Docker Hub and rebuild — no label needed.

Test plan

  • Add build-nodejs-base-images label to trigger base image build in CI
  • Verify express4 weblog builds successfully against the base image
  • Verify all Node.js weblogs pass end-to-end tests
  • Verify parametric tests pass

Human notes

In dd-trace-js, we were spending almost half or our CI run time building weblog images. I tried simply switching to building base images, but it turned out not to be enough because the way that is done is extremely inefficient. System tests include a custom Docker layer save/load system that is so slow it's actually slower than doing a full rebuild every time. So in this PR I also made sure to skip this entirely. When the base images are available for Node, the weblog is built in the "run end-to-end" instead of in previous jobs to avoid all of this slow custom logic.

With this change, building the weblog goes from 2+ minutes to 10 seconds with no need for any separate prior jobs.

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

CODEOWNERS have been resolved as:

utils/build/build_nodejs_base_images.sh                                 @DataDog/system-tests-core
utils/build/docker/nodejs/anthropic_app/bun.lock                        @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/docker-bake.hcl                               @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express4-typescript.base.Dockerfile           @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express4-typescript/bun.lock                  @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express4.base.Dockerfile                      @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express4/bun.lock                             @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express5.base.Dockerfile                      @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express5/bun.lock                             @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/fastify.base.Dockerfile                       @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/fastify/bun.lock                              @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/google_genai_app/bun.lock                     @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/nextjs.base.Dockerfile                        @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/nextjs/bun.lock                               @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/openai_app/bun.lock                           @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/parametric/bun.lock                           @DataDog/dd-trace-js @DataDog/system-tests-core
.github/workflows/ci.yml                                                @DataDog/system-tests-core
.github/workflows/compute-workflow-parameters.yml                       @DataDog/system-tests-core
.github/workflows/system-tests.yml                                      @DataDog/system-tests-core
.shellcheck                                                             @DataDog/system-tests-core
docs/edit/update-docker-images.md                                       @DataDog/system-tests-core
utils/_context/containers.py                                            @DataDog/system-tests-core
utils/build/build.sh                                                    @DataDog/system-tests-core
utils/build/docker/nodejs/anthropic-js.Dockerfile                       @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express4-typescript.Dockerfile                @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express4.Dockerfile                           @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/express5.Dockerfile                           @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/fastify.Dockerfile                            @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/google_genai-js.Dockerfile                    @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/install_ddtrace.sh                            @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/nextjs.Dockerfile                             @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/openai-js.Dockerfile                          @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/parametric/Dockerfile                         @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/uds-express4.Dockerfile                       @DataDog/dd-trace-js @DataDog/system-tests-core
utils/scripts/ci_orchestrators/workflow_data.py                         @DataDog/system-tests-core
utils/scripts/compute-workflow-parameters.py                            @DataDog/system-tests-core
utils/scripts/get_pr_merged_labels.sh                                   @DataDog/system-tests-core

@datadog-datadog-prod-us1
Copy link
Copy Markdown

datadog-datadog-prod-us1 Bot commented May 12, 2026

Tests

Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🧪 1 Test failed

tests.debugger.test_debugger_symdb.Test_Debugger_SymDb.test_symdb_upload[net-http] from system_tests_suite   View in Datadog   (Fix with Cursor)
AssertionError: No symbol files were found
assert 0 > 0
 +  where 0 = len([])
 +    where [] = <tests.debugger.test_debugger_symdb.Test_Debugger_SymDb object at 0x7f443c36afc0>.symbols

self = <tests.debugger.test_debugger_symdb.Test_Debugger_SymDb object at 0x7f443c36afc0>

    def test_symdb_upload(self):
>       self._assert()

...

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 3334ae1 | Docs | Datadog PR Page | Give us feedback!

@rochdev rochdev marked this pull request as ready for review May 14, 2026 13:56
@rochdev rochdev requested review from a team as code owners May 14, 2026 13:56
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e84743ef90

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread .github/workflows/ci.yml
Copy link
Copy Markdown
Collaborator

@robertomonteromiguel robertomonteromiguel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are CI failures related with this PR

Pre-build node_modules into versioned base images hosted on
datadog/system-tests, following the same pattern as Python and PHP.
Regular Dockerfiles now inherit from the base and only add source
code and ddtrace, keeping rebuilds fast.

- Add *.base.Dockerfile for all 9 Node.js weblogs
- Add docker-bake.hcl and build_nodejs_base_images.sh
- Update all 10 regular Dockerfiles to FROM datadog/system-tests:X.base-v1
- Wire build-nodejs-base-images CI label into ci.yml and system-tests.yml
- Document the new label in docs/edit/update-docker-images.md

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@rochdev rochdev force-pushed the rochdev/nodejs-base-images branch from e84743e to c05de90 Compare May 16, 2026 21:19
rochdev and others added 8 commits May 16, 2026 17:34
Push Node.js weblog base images to ghcr.io/datadog/system-tests instead
of Docker Hub, triggered automatically when the build-nodejs-base-images
label is present on a PR.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Tag pushed base images with the PR number instead of a static version
to prevent accidental overwrites across PRs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The reusable workflow job must explicitly declare packages:write to push
to GHCR, even when the caller grants it.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Pushing to GHCR requires packages:write on the job in system-tests.yml,
which would break all external callers of that reusable workflow. Instead,
push from a dedicated job in ci.yml that external callers never invoke.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Use ghcr.io/datadog/system-tests/weblog/nodejs:<name>-v<PR> instead of
a flat tag on the system-tests package. One scoped package, cleaner tags
without dots, and extensible for other languages if they move to GHCR.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@rochdev
Copy link
Copy Markdown
Member Author

rochdev commented May 17, 2026

@robertomonteromiguel Tests are passing now as the images are now pushed before running the workflow so that the workflow then runs exactly as it would on the main branch or from an external caller. This makes it more robust since the push can't fail on merge, and what is tested is the real final published image. Using the PR number as the version also avoid overwriting an existing version by accident. I will just need an admin to make the system-tests/weblog/nodejs image public so that it can be used by dd-trace-js.

Actually I couldn't figure out how to prevent PRs from being reopened to overwrite the image, so I'm reverting to the original approach, not sure why that was failing to begin with as the image should be available locally.

Moves the GHCR push logic to publish-nodejs-base-images.yml so it can
later be protected with a dd-octo-sts claim (see PR #6957) to restrict
pushes to main-branch runs only.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@rochdev rochdev marked this pull request as draft May 17, 2026 01:15
rochdev and others added 8 commits May 16, 2026 21:22
- Use Docker Hub (datadog/system-tests) instead of GHCR
- Build with --load inside build_end_to_end (same runner as weblog build)
- Push to Docker Hub at merge time via get_pr_merged_labels.sh
- Remove dedicated publish job and reusable workflow from ci.yml

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
openai-js, anthropic-js and google_genai-js rebuild for every framework
version test anyway, so base images add complexity without benefit. Revert
them to build from node:22-alpine directly, same as Python equivalents.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Use --output type=docker,dest=... directly from the buildx build step
instead of --load then docker save | gzip. BuildKit writes already-
compressed layer blobs to the tar, avoiding the decompress/recompress
cycle that caused the ~25s save time.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…river

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
pigz uses all available CPU cores, cutting the ~25s docker save | gzip
step down to ~6-12s with no format or compatibility changes.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
zstd is ~3-5x faster than gzip on both compress and decompress sides.
Both zstd and the decompression path (zstd -d -c | docker load) are
available on GitHub Actions Ubuntu runners.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Switch express4 and uds-express4 base images to use bun instead of
npm ci, with hoisted linker and network concurrency of 8. Also use
bun add for dd-trace installation via install_ddtrace_bun.sh.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
rochdev and others added 28 commits May 16, 2026 23:05
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
bun's hoisted linker flattens dd-trace's internal package structure,
breaking relative path resolution within the package.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
dd-trace generates vendor files via a postinstall script; bun requires
--trust to run lifecycle scripts for packages.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Replace npm ci with bun install across all Node.js base Dockerfiles
- Copy bun binary from oven/bun Alpine/Debian image instead of running installer
- Generate bun.lock files for all weblogs
- Switch install_ddtrace.sh to use bun add (drop _bun suffix, remove from shellcheck TODO)
- Switch all weblog Dockerfiles to use bun run build instead of npm run build
- Replace artifact-based weblog image sharing with GHCR (all languages)
- Remove docker save | zstd and artifact load paths from build.sh
- Add scheduled cleanup for ephemeral GHCR run images (3-day retention)
- Add packages:write to ci.yml and system-tests.yml build job
- Add _weblog_ghcr_image input to run-end-to-end.yml for GHCR pull

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
GHCR hit rate limits (2000 req/min) with parallel builds. Artifacts
don't have this constraint.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Node.js weblog builds take ~15s with base images — same as loading an
artifact. Skip the separate build job and artifact round-trip entirely
by rebuilding directly in each run job. Other languages keep artifacts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Without a build job, the weblog-specific artifact never gets uploaded.
Point to the upstream binaries artifact instead so the download doesn't fail.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
For Node.js weblogs, save only the specific base image to the artifact
instead of the full weblog image. Run jobs load the base image then
rebuild the weblog in ~15s. Filename includes the weblog variant to
avoid conflicts when building multiple variants locally. Loading is
handled by a shared load_base_image.sh sourced by both build.sh and
run.sh, keeping infrastructure concerns in Bash.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
run.sh has set -u; use default-empty expansion so the function is
a safe no-op when called in contexts where these vars aren't set.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…tall

CI runners have fast datacenter networking; 16 concurrent downloads
is faster than 8 without risking rate limits.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
get_image_list() was parsing the Dockerfile and trying to pre-pull the
base image from Docker Hub. When the base image artifact is present in
binaries/, the build handles loading itself so no pre-pull is needed.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
For nodejs with -s, the base image is already in the daemon. Use
continue to save only the base image and skip the weblog build, since
run jobs rebuild the weblog themselves from the artifact.

Also skip pre-pulling when the base image artifact is present.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
TEST_LIBRARY is not a standalone variable in run.sh; use libraries[0]
so load_base_image constructs the correct artifact filename.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The parametric test client rebuilds on every run (like Python/PHP).
Using a custom base image requires it to be on Docker Hub, which it
isn't yet. Revert to node:18.10-slim so it works the same as other
languages. The optimization can come after the base images are published.

Also removes load_base_image.sh (no longer needed in run.sh) and
the extra artifact download step from run-parametric.yml.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Parametric tests rebuild on every run like Python/PHP, using a standard
public base image. No need for the custom base image or its associated
infrastructure. Clean up load_base_image.sh, run.sh and bake.hcl.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
install_ddtrace.sh uses bun but node:18.10-slim doesn't have it.
Copy bun from oven/bun:1.3.13 and switch npm install to bun install.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
When false (default), Node.js weblogs have no build_end_to_end job
and run jobs rebuild from Docker Hub base images — no artifact needed.
External callers use _build_nodejs_base_images; internally mapped to
the generic build_weblog_images parameter in compute-workflow-parameters.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…01/FBT002

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ithout label

build_weblog_images=false should only apply to nodejs when the label is
not set. All other languages always need build jobs. Fix by defaulting
to true and computing false only for nodejs via the expression in
system-tests.yml.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…s step

The step was using inputs._build_weblog_images (not a declared input)
instead of inputs._build_nodejs_base_images, causing it to never run,
docker save to produce an empty artifact, and docker load to fail.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
… description

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Use $BINARIES_FILENAME for nodejs base image saves (same as other
languages) instead of a separate -base-image.tar.zst name. After
loading the artifact, check if system_tests/weblog is in the daemon
to decide whether to rebuild. Removes load_base_image function.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@rochdev rochdev changed the title feat(nodejs): add base images for Node.js weblogs feat(nodejs): add base images and build infrastructure for Node.js weblogs May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants