fix(process): give backend workers a parent-death safety net #10639

Merged: mudler merged 2 commits into master from fix/backend-parent-death-signal on Jul 2, 2026
localai-bot (Collaborator) commented Jul 1, 2026

Symptom

A backend model-worker subprocess (the per-model gRPC server that LocalAI spawns, e.g. a llama.cpp or whisper worker) can be orphaned and linger, holding VRAM and its listen port, if the LocalAI process is killed non-gracefully before its own teardown runs. A typical case: a supervising process's graceful-shutdown grace period elapses and LocalAI is SIGKILLed. This was hit by a downstream project that supervises LocalAI as a child process.

Root cause

LocalAI does have a working-by-design graceful teardown:

  • pkg/signals/handler.go installs signal.Notify(c, SIGINT, SIGTERM), runs registered handlers, then exits.
  • The serve path registers app.Shutdown() (core/cli/run.go), which calls ModelLoader.StopAllGRPC() → process.Stop() (pkg/model/process.go).

That teardown only runs if LocalAI receives a catchable signal and survives long enough to run its handlers. If LocalAI is SIGKILLed, none of it runs.

Backends are spawned via github.com/mudler/go-processmanager v0.1.1. Its getSysProcAttr() (in the library's process_unix.go) sets Setpgid: true — intentional, so the graceful path can signal the backend's whole process group — but it never sets PR_SET_PDEATHSIG/Pdeathsig, and the library exposes no Config field or functional option to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (pkg/model/process.go calls process.New(...).Run(); it never builds the exec.Cmd itself), so LocalAI cannot set a kernel parent-death signal at the spawn site. When LocalAI dies without cleaning up, the backend is reparented to init and keeps running. There is no fallback that makes an orphaned backend self-terminate.

Fix

Add a best-effort, backend-side safety net that detects reparenting: on startup each backend captures getppid() and polls it. When the process is reparented (getppid() changes or becomes 1, the standard POSIX indication that the original parent has died), it logs and self-terminates. getppid() detection is portable across Linux and macOS, unlike the Linux-only PR_SET_PDEATHSIG, which also has a false positive with a Go parent: the signal fires when the spawning thread exits, and the Go runtime may retire that thread while the process is still alive.

The same mechanism, env vars and semantics are now applied across all three backend languages LocalAI ships:

  • Go: pkg/grpc/parentwatch.go, armed in the shared grpc.StartServer / grpc.RunServer choke point that every out-of-process Go backend routes through.
  • C++: backend/cpp/llama-cpp/parent_watch.h, a dependency-free header wired into grpc-server.cpp's main() (and copied at build time via prepare.sh).
  • Python: backend/python/common/parent_watch.py, armed from common/grpc_auth.py's get_auth_interceptors(), the single shared helper every Python backend invokes while building its gRPC server.

Shared configuration (identical across all three):

  • LOCALAI_BACKEND_PARENT_WATCH — default on; falsy values false/0/no/off (case-insensitive) disable it; automatically off on Windows (different reparenting semantics).
  • LOCALAI_BACKEND_PARENT_WATCH_INTERVAL — poll interval, default 2s; accepts Go-style durations (500ms, 2s, 1m) in every language for parity.
  • Skips entirely when already orphaned at startup (getppid() <= 1).

This is strictly a backstop alongside the existing graceful SIGTERM → grace → SIGKILL teardown, which is unchanged in all three languages. No shutdown timing, GracefulTimeout, or IsBusy() polling was touched.

Test coverage

Each language has a real process-tree reparent test (test → middle → grandchild): the middle process exits to orphan the grandchild (running the real watcher), and the test asserts the watcher detects the reparent and self-terminates.

Go (pkg/grpc/parentwatch_test.go):

$ go test ./pkg/grpc/ -run TestParentDeathWatcherDetectsReparent -v -count=1
=== RUN   TestParentDeathWatcherDetectsReparent
--- PASS: TestParentDeathWatcherDetectsReparent (0.06s)
PASS
ok  	github.com/mudler/LocalAI/pkg/grpc	0.069s

C++ (backend/cpp/llama-cpp/parent_watch_test.cpp; uses fork(2) and the standard library only, so it runs via the existing standalone backend/cpp/run-unit-tests.sh runner with no CUDA/gRPC build needed; also buildable under ctest with -DLLAMA_GRPC_BUILD_TESTS=ON):

$ bash backend/cpp/run-unit-tests.sh
==> backend/cpp/llama-cpp/parent_watch_test.cpp
ok:   interval default 2000ms
ok:   interval 500ms / 2s / 1m / bare-3 / garbage-fallback
ok:   enabled by default; disabled by false/0/no/off/OFF/' False '
ok:   grandchild signaled readiness
ok:   watcher detected parent death and self-terminated
All parent_watch tests passed.
Ran 2 standalone C++ unit test file(s)   # exit 0

(The full backend build needs the llama.cpp + gRPC toolchain, so the watcher is verified by compiling and running its own translation unit standalone — the header is intentionally dependency-free precisely so this is possible.)

Python (backend/python/common/parent_watch_test.py; uses os.fork, standard library only):

$ cd backend/python/common && python3 -m unittest parent_watch_test -v
test_detects_reparent ... ok
test_disabled_by_falsey ... ok
test_enabled_by_truthy ... ok
test_enabled_default ... ok
test_interval_default ... ok
test_interval_garbage_falls_back ... ok
test_interval_units ... ok
Ran 7 tests in 0.062s
OK

Known limitations / follow-ups (not overclaiming)

  • C++ coverage is the llama-cpp backend only. C++ backends have no shared server scaffolding (each backend/cpp/*/grpc-server.cpp has its own main/RunServer), so the watcher was added to the originally-reported, most-used backend (llama.cpp). The other C++ backends — ds4, ik-llama-cpp, privacy-filter — are not yet covered; each would need the same one-line #include "parent_watch.h" + start_parent_death_watcher() as a follow-up (the header is reusable as-is).
  • Python coverage is all backends via the shared common/ choke point, with no per-backend edits.
  • The fully general fix would be for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language. That is a change to a separate repo and is intentionally out of scope for this LocalAI-only PR — suggested as an upstream follow-up.
  • Windows is not covered in any language (different reparenting model); the watcher is a no-op there.

🤖 Generated with Claude Code

mudler and others added 2 commits July 2, 2026 07:31
…fully

Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI
spawns) can be orphaned and linger — holding VRAM and its listen port — if the
LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown
grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.

Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the
SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown ->
ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when
LocalAI receives a catchable signal and survives long enough to run its
handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1,
whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be
signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or
option for a caller to inject/extend SysProcAttr. LocalAI fully delegates
spawning to that library (it never builds the exec.Cmd itself), so it cannot set
a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing
tells the backend to exit and it is reparented to init.

Fix: add a best-effort, backend-side safety net at the one shared choke point
every out-of-process Go backend routes through — grpc.StartServer / RunServer in
pkg/grpc. On startup it captures getppid() and polls; when the process is
reparented (getppid changes / becomes 1 — the standard POSIX signal the original
parent died) it logs and self-terminates. getppid() reparent detection is
portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via
LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and
LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the
existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.

Scope/limitations: covers Go-based backends (everything using pkg/grpc). The
C++ backends (e.g. llama-cpp) and Python backends do not route through
pkg/grpc and are not covered by this mechanism — they would each need an
equivalent parent-death check (follow-up). The fully general fix is for
go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig
at spawn for every backend regardless of language (suggested upstream follow-up;
out of scope for this LocalAI-only PR).

Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild
process tree, lets the middle process exit to orphan the grandchild running the
real watchParentDeath, and asserts it detects the reparent and self-terminates.
Unix-only (build-tagged), runs in CI (Linux).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435)
only protects backends that route through pkg/grpc. C++ and Python
backends don't, so the originally-reported case — the llama.cpp gRPC
worker surviving a non-graceful LocalAI death — was still uncovered.

Extend the same best-effort backstop to both languages, reusing the
exact mechanism and semantics:

- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting
  (getppid() != orig || == 1), portable across Linux/macOS, no-op on
  Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy
  false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL
  (default 2s; accepts Go-style durations like 500ms/2s/1m)

C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++
backend) as a dependency-free header parent_watch.h, wired into
grpc-server.cpp's main() and copied at build time via prepare.sh. C++
backends have no shared server scaffolding, so other C++ backends
(ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would
each need the same one-line include+call as follow-ups.

Python: implemented once in the shared common/parent_watch.py and armed
from common/grpc_auth.py's get_auth_interceptors() — the single helper
every one of the 35 Python backends invokes while building its gRPC
server — so all Python backends (and future ones) are covered with no
per-backend edits and no duplicated implementation.

Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler force-pushed the fix/backend-parent-death-signal branch from 04b474b to 94e3e06 on July 2, 2026 07:32
mudler merged commit a4e6e01 into master on Jul 2, 2026 (70 checks passed)
mudler deleted the fix/backend-parent-death-signal branch on July 2, 2026 17:16