Llmflux using api#98
…ompatible service on a SLURM compute node
- Add `llmflux connect <job_id>` command that waits for the model to finish loading, pings the endpoint, and prints connection details with example code
- Extend vLLM and Ollama batch scripts with `mode="serve"` to handle dynamic port finding, API key injection, and connection file writing after the health check passes
- Update `llmflux jobs` and `llmflux status` to display job type (serve/batch) and show serve-specific fields (endpoint, API key, email) instead of input/output paths
Pull request overview
Adds a new “serve” workflow to LLMFlux’s SLURM integration, enabling long-running OpenAI-compatible endpoints (vLLM/Ollama) and a companion “connect” command to discover/ping the service and print usage info.
Changes:
- Introduces `SlurmRunner.serve()` to submit a long-running SLURM job, generate an API key, and record serve-job metadata in the local job registry.
- Extends SLURM job script generators (vLLM/Ollama) with a `mode="serve"` branch that finds a free port, starts the server with an API key, and writes `connection.json` + sends a readiness email.
- Adds `llmflux connect`, adds a TYPE column to `llmflux jobs`, and updates `llmflux status` to show endpoint/API key/email for serve jobs.
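The "finds a free port" step mentioned above can be sketched in Python as follows. This is only an illustration of one common technique (binding to port 0 and letting the OS choose); the PR's job scripts are shell and may do this differently, and `find_free_port` is a hypothetical name.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the kernel to pick any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(f"candidate port: {port}")
```

The port is only known to be free at the moment of the check; another process could claim it before the server binds, so a launcher would typically retry on a bind failure.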
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| src/llmflux/slurm/runner.py | Adds serve() to submit long-running serving jobs and persist API key/email/type metadata in the registry. |
| src/llmflux/slurm/engine/vllm.py | Adds serve-mode branching: port selection, API-key flag, connection file writing, and readiness email. |
| src/llmflux/slurm/engine/ollama.py | Adds serve-mode branching: port selection, API-key env injection, connection file writing, and readiness email. |
| src/llmflux/slurm/connection.py | New helper module to read/poll connection info and ping the endpoint for llmflux connect. |
| src/llmflux/slurm/__init__.py | Re-exports new connection helpers. |
| src/llmflux/cli.py | Adds serve/connect subcommands and updates jobs/status output to handle serve jobs. |
joshfactorial
left a comment
Had Claude do its own review. It found similar things to Copilot, so they are probably worth addressing. They sound similar to work we did on another project.

@Ralhazmy1 Can you also update the README with your notes on how to run this?
- Validate job_id with isdigit() in _connection_file_path to prevent path traversal (e.g. ../../.ssh) from user-supplied arguments
- Quote email address with shlex.quote() before shell interpolation and reference via $LLMFLUX_EMAIL to prevent injection
- Write connection.json with umask 077 + chmod 600 and parent dir chmod 700 to prevent API key leakage on shared filesystems
- Enforce 0600/0700 permissions on ~/.llmflux/jobs.json and its parent directory; atomic tmp-file write in save_jobs()
- Check st_uid == os.getuid() before reading connection file to reject attacker-planted files on world-writable paths
- Sanitize node/model/engine/api_key with _sanitize() before printing to prevent ANSI terminal injection from malicious connection files

Error handling:
- serve() raises ValueError instead of returning "1" on config/validation errors; CLI catches ValueError and CalledProcessError and returns exit code 1 without printing a success message
- serve() returns None (not "unknown") when sbatch output is empty; CLI guards against falsy job_id before printing submission details
- Validate required fields (node, port) in connection file before use and return exit code 1 with a clear message instead of crashing
- Wrap int(info["port"]) in try/except to handle non-numeric values

Behaviour:
- Change --mail-type from END,FAIL to FAIL only; custom ready email handles notification, avoiding a confusing second email on shutdown
- Make --time a required argument for llmflux serve

Tests:
- Add TestSlurmRunnerServe (7 tests) covering job ID return, API key env injection, registry metadata, invalid model, engine mismatch, serve-mode script content, and empty sbatch output
- Add TestServeCommand, TestConnectCommand, TestJobsServeType, and TestStatusServeView in test_cli.py covering argument parsing, registry checks, state gating, and serve vs batch display logic

Docs:
- Add Interactive Serving section to README covering serve, connect, status, and cancel workflows; update --help output to include new subcommands
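Two of the hardening steps listed in this commit message, rejecting non-numeric job IDs and writing secrets atomically with owner-only permissions, might look roughly like the sketch below. The function names, the directory layout, and the use of `tempfile.mkstemp` are assumptions for illustration; the actual `_connection_file_path` and `save_jobs()` implementations are not shown in this thread.

```python
import json
import os
import tempfile
from pathlib import Path


def connection_file_path(base_dir: Path, job_id: str) -> Path:
    """Hypothetical helper: accept only all-digit job IDs before building a path."""
    if not job_id.isdigit():
        raise ValueError(f"invalid job id: {job_id!r}")
    return base_dir / job_id / "connection.json"


def save_private_json(path: Path, data: dict) -> None:
    """Hypothetical helper: write JSON atomically, accessible to the owner only."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)
    # mkstemp creates the file with mode 0600; keeping it in the target
    # directory makes os.replace() a same-filesystem atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Readers of the file never observe a half-written document, because the rename swaps in the finished file in one step.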
Vismayak
left a comment
Just a couple of comments for now; I'm unable to test it in CampusCluster.

    def __init__(self, registry_file: str = os.path.expanduser("~/.llmflux/jobs.json")):
        self.registry_file = Path(registry_file)
        self.registry_file.parent.mkdir(parents=True, exist_ok=True)
        self.registry_file.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
Do we need the mode parameter here?

    self.registry_file.parent.mkdir(parents=True, exist_ok=True)
    self.registry_file.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    if not self.registry_file.exists():
        self.registry_file.touch(mode=0o600)
It is there so that no one else can read the job IDs and API keys of all the serve requests.
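One detail relevant to this exchange: `Path.mkdir(mode=...)` only applies the mode when it actually creates the directory, so with `exist_ok=True` an existing, more permissive directory keeps its old permissions. The sketch below demonstrates that on a throwaway temp directory (standing in for the home directory, an assumption for the demo) on a POSIX system.

```python
import os
import tempfile
from pathlib import Path

# A throwaway directory stands in for the user's home directory.
home = Path(tempfile.mkdtemp())
registry_dir = home / ".llmflux"

# Simulate a directory that already exists with open permissions.
registry_dir.mkdir()
os.chmod(registry_dir, 0o755)

# mode= is honoured only on creation; the existing directory is left untouched.
registry_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
mode_after_mkdir = registry_dir.stat().st_mode & 0o777

# An explicit chmod is what tightens a pre-existing directory.
os.chmod(registry_dir, 0o700)
mode_after_chmod = registry_dir.stat().st_mode & 0o777

print(oct(mode_after_mkdir), oct(mode_after_chmod))
```

So the `mode` argument protects fresh installs, while an explicit `chmod` is needed to cover directories and files created before the change.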

    print()
    print("Example usage:")
    print()
    print(" from openai import OpenAI")

wrote tests for slurm/connection.py
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 33056202 | Triggered | Generic High Entropy Secret | f80f170 | tests/slurm/test_connection.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
joshfactorial
left a comment
A couple of small issues found by Claude. I merged in my connect.py tests. I think the trap for job cancellation is something we need for sure, as I have run into this issue and it could represent a real security flaw.

    type=int,
    default=8000,
    help="Local port to bind on the head node (default: 8000)",
    )
Given the above comment in connect.py - "local_port: Unused — kept for CLI compatibility. Access is direct to node:port." - make this help say the same thing so users aren't misled.

    """
    env = self._setup_environment()

    try:
"bool" of any value never raises an error, so we can eliminate try/except

    # input_file/output_file are unused in serve mode but required by the
    # engine function signatures (they only appear in the batch branch)
    dummy = Path("")
Using an empty path may cause downstream problems. None or a clearly fake file name would be more clear later if we need to modify this or files that use this.
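To make the concern concrete: `Path("")` is not an "empty" value but normalizes to the current directory, so any later existence check or string interpolation quietly succeeds. The sketch below shows that behaviour, followed by one possible alternative using `Optional[Path]`; `describe_io` is a hypothetical stand-in, not one of LLMFlux's engine functions.

```python
from pathlib import Path
from typing import Optional

dummy = Path("")
print(repr(str(dummy)))    # the empty path renders as the current directory
print(dummy == Path("."))  # and compares equal to it
print(dummy.exists())      # so existence checks pass


def describe_io(input_file: Optional[Path], output_file: Optional[Path]) -> str:
    """Hypothetical stand-in for how a script generator could treat missing paths."""
    if input_file is None or output_file is None:
        return "serve mode: no input/output files"
    return f"batch mode: {input_file} -> {output_file}"


print(describe_io(None, None))
print(describe_io(Path("in.jsonl"), Path("out.jsonl")))
```

With `None`, an accidental use in the batch branch fails loudly instead of silently pointing at the working directory.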

    "",
    "# Cleanup",
    *( ["rm -f \"$CONNECTION_FILE\""] if mode == "serve" else [] ),
    "pkill -f \"vllm serve\" || true",
If the job is cancelled via scancel, SLURM sends SIGTERM and the script exits immediately without running the cleanup section. The connection.json (which contains the API key) will be left on disk. A trap handles this:
trap 'rm -f "$CONNECTION_FILE"; pkill -f "vllm serve" || true' EXIT TERM INT
The GitGuardian warning is a false positive on test data. Cleared it.
…rd failure

Clusters like UIUC Campus Cluster firewall compute node ports from login nodes, so the direct HTTP ping fails even though the server is running. Now connect prints all endpoint info regardless and, when unreachable, shows the SSH tunnel command needed to forward the port.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ping failure is now a warning rather than a hard failure, so connect() returns 0. Also adds a test that the SSH tunnel hint is printed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
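The SSH tunnel hint described in these commits could be assembled along the following lines. This is a sketch under assumptions: the helper name, the login host, and the node name are made up for illustration, and the exact command LLMFlux prints is not shown in this thread. Only the standard OpenSSH options `-N` (no remote command) and `-L` (local port forward) are used.

```python
import shlex


def ssh_tunnel_command(node: str, port: int, login_host: str, local_port: int = 8000) -> str:
    """Build an `ssh -L` command forwarding local_port to node:port via login_host.

    Hypothetical helper for illustration; not LLMFlux's actual function.
    """
    return shlex.join(
        ["ssh", "-N", "-L", f"{local_port}:{node}:{port}", login_host]
    )


# Example values are placeholders, not real hosts.
hint = ssh_tunnel_command("gpu-node-01", 8123, "user@login.cluster.example")
print(hint)
```

With such a tunnel open, a client on the local machine would point its base URL at `http://localhost:8000/v1` instead of the compute node's address.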
Fake API keys in test fixtures trigger false-positive secret detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds a new serving mode to LLMFlux that lets users start a model as a long-running OpenAI-compatible endpoint on a compute node, without managing SLURM themselves.
- `llmflux serve`: submits a SLURM job that starts vLLM or Ollama as a persistent service, generates a unique API key, and emails the user when the model is ready with full connection details
- `llmflux connect <job_id>`: waits for the model to finish loading, pings the endpoint, and prints the URL, API key, and copy-paste Python code
- `llmflux jobs`: now shows a TYPE column (serve / batch)
- `llmflux status <job_id>`: detects serve jobs and shows Endpoint, API Key, and Email instead of Input/Output

Serve command:

    llmflux serve --model gemma-3-1b-it --engine vllm --email you@email.com --time 02:00:00

Note the job ID printed.

Show endpoint:

    llmflux connect <job_id>   # prints endpoint, API key, and example code

Verify the endpoint works in a test Python script:

    from openai import OpenAI

    # <node>, <port>, and <key> are placeholders: substitute the values
    # printed by `llmflux connect <job_id>`.
    client = OpenAI(
        base_url="http://<node>:<port>/v1",
        api_key="llmflux-<key>",
    )
    response = client.chat.completions.create(
        model="google/gemma-3-1b-it",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

Check job:

    llmflux jobs              # TYPE column shows "serve"
    llmflux status <job_id>   # shows Endpoint, API Key, Email

Cancel job:

    llmflux cancel <job_id>