Gate service ffmpeg install at runtime#2052
Conversation
Greptile SummaryThis PR makes
|
| Filename | Overview |
|---|---|
| Dockerfile | Removes build-time ffmpeg install; adds sudo, copies entrypoint and install script, creates a narrowly scoped sudoers rule for nemo user, validates it with visudo. |
| docker/scripts/retriever_install_ffmpeg.sh | New wrapper script that rejects all arguments, uses full absolute paths for apt-get, and is the sole target of the sudoers NOPASSWD rule. |
| docker/scripts/retriever_service_entrypoint.sh | New service entrypoint: lowercases INSTALL_FFMPEG, skips install if binaries already present, delegates to the restricted wrapper via sudo, and exec-replaces itself with CMD arguments. |
| nemo_retriever/src/nemo_retriever/audio/media_interface.py | Replaces monolithic _FFMPEG_AVAILABLE flag with granular per-tool availability checks, adds centralized media_dependency_error_message(), narrows import guard to except ImportError, adds explicit RuntimeError/OSError handlers. |
| nemo_retriever/helm/templates/deployment.yaml | Adds installFfmpeg rendering and a duplicate-env guard placed before the topology branch so it applies to both standalone and split modes. |
| nemo_retriever/tests/test_container_ffmpeg_install.py | Good coverage of the new runtime install contract. One assertion uses a whitespace-sensitive literal to detect sudo in the Dockerfile. |
| nemo_retriever/tests/test_media_dependency_availability.py | Comprehensive unit tests for each availability check and per-operation dependency requirements. |
| docs/docs/extraction/deployment-options.md | Updated to describe the runtime install path via service.installFfmpeg=true; previously incorrect Kubernetes claims have been corrected. |
Sequence Diagram
sequenceDiagram
participant User
participant Container as Service Container (nemo user)
participant Entrypoint as retriever-service-entrypoint
participant Sudo as sudo
participant Wrapper as retriever-install-ffmpeg (root)
participant APT as apt-get
User->>Container: "docker run -e INSTALL_FFMPEG=true ..."
Container->>Entrypoint: "exec entrypoint $@"
Entrypoint->>Entrypoint: lowercase INSTALL_FFMPEG
alt ffmpeg/ffprobe already present
Entrypoint->>Entrypoint: skip install
else not present
Entrypoint->>Sudo: sudo /usr/local/sbin/retriever-install-ffmpeg
Sudo->>Wrapper: exec as root (NOPASSWD sudoers rule)
Wrapper->>Wrapper: reject if args present
Wrapper->>APT: /usr/bin/apt-get update
APT-->>Wrapper: ok
Wrapper->>APT: /usr/bin/apt-get install -y ffmpeg
APT-->>Wrapper: installs ffmpeg + ffprobe
Wrapper->>APT: /usr/bin/apt-get clean
Wrapper-->>Sudo: exit 0
Sudo-->>Entrypoint: ok
end
Entrypoint->>Container: exec retriever service start ...
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
nemo_retriever/src/nemo_retriever/audio/media_interface.py:424-426
`is_media_available()` manually replicates the three-check conjunction instead of delegating to `missing_media_dependencies()`. If `MEDIA_DEPENDENCIES` ever gains a fourth entry (e.g. a new codec tool), `is_media_available()` would silently fall out of sync and return `True` even when the new dependency is absent, while `media_dependency_error_message()` would correctly report it missing.
```suggestion
def is_media_available() -> bool:
"""True if the full audio/video media pipeline can run."""
return not missing_media_dependencies()
```
### Issue 2 of 2
nemo_retriever/tests/test_container_ffmpeg_install.py:38
This assertion checks for `sudo` via a whitespace-sensitive literal that embeds the exact six-space indentation and the continuation backslash from the `apt-get` block. Any reformatting of that RUN layer (e.g. switching to four-space indentation) would silently break this test while the actual Dockerfile stays correct. Checking for the `sudo` package in the package list without caring about surrounding whitespace is more robust.
```suggestion
self.assertIn("sudo", dockerfile)
```
Reviews (14): Last reviewed commit: "Fix ffmpeg installer policy test in serv..." | Re-trigger Greptile
| @@ -26,11 +26,59 @@ | |||
|
|
|||
| try: | |||
There was a problem hiding this comment.
couldnt we do all these checks here, that way they are not peppered throughout the code modules. This way all the gating logic is always at the top of the module that is affected. Wdyt?
There was a problem hiding this comment.
Think the biggest thing we'd want to be careful around is collapsing the import-check into just "are ffmpeg-python & ffmpeg/ffprobe available?" as it looks like there are methods of extraction (i.e. frame extraction) that exclusively make use of ffmpeg but not ffprobe
IMO, I would feel better about keeping this PR restricted mostly to media interface import contract stuff (making it non-default, ensuring users an informative error message when it's not available), and following up with a media interface clean-up PR post-release that cleanly centralizes the import checks
jperez999
left a comment
There was a problem hiding this comment.
Seems like this PR does not actually allow for the user to activate the installation on running the container. We want this to actually pull down the built container and before it starts running the service, it should look for this env var and it should install ffmpeg if activated. docker run --env INSTALL_FFMPEG=true ...
00e6fa9 to
ea0e927
Compare
|
Just a reminder to merge this into the 26.05 branch as well when ready. |
(cherry picked from commit 13ee35b)
Description
Make ffmpeg/ffprobe optional in the bundled service image by removing the custom source-build installer and replacing the old build-time path with a runtime install flow.
The service image now keeps ffmpeg/ffprobe out by default. When audio or video extraction is needed, users can set
INSTALL_FFMPEG=truefor the service container, or setservice.installFfmpeg=truein Helm. The service entrypoint installs the Ubuntuffmpegpackage at startup through narrowly scoped passwordlesssudo, which provides bothffmpegandffprobe.This PR also:
ffmpeg-python,ffmpeg, andffprobechecks so missing dependencies produce actionable install guidance.ffmpeg-pythonplus theffmpegCLI, while media splitting/probing still requiresffprobe.service.installFfmpegvalue and a duplicate-env guard so users cannot set bothservice.installFfmpeg=trueandINSTALL_FFMPEGmanually inservice.env.Validation:
env UV_CACHE_DIR=/localhome/charlesb/.cache/tmp/uv-cache uv run --directory nemo_retriever --extra dev pytest tests/test_container_ffmpeg_install.py tests/test_media_dependency_availability.py -qgit diff --checkbash -n docker/scripts/retriever_service_entrypoint.shblack --checkandflake8on touched Python filesChecklist