Llmflux using api by Ralhazmy1 · Pull Request #98 · Center-for-AI-Innovation/LLMFlux

Ralhazmy1 · 2026-05-20T16:56:50Z

Adds a new serving mode to LLMFlux that lets users start a model as a long-running OpenAI-compatible endpoint on a compute node, without managing SLURM themselves.

llmflux serve — submits a SLURM job that starts vLLM or Ollama as a persistent service, generates a unique API key, and emails the user when the model is ready with full connection details
llmflux connect <job_id> — waits for the model to finish loading, pings the endpoint, and prints the URL, API key, and copy-paste Python code
llmflux jobs — now shows a TYPE column (serve / batch)
llmflux status <job_id> — detects serve jobs and shows Endpoint, API Key, and Email instead of Input/Output

llmflux serve generates a secrets.token_hex(16) API key at submission time and stores it in the local job registry alongside the email
The SLURM job finds a free port on the compute node using ss, starts the engine with the API key injected via environment, then runs a health check loop
Once the health check passes, the job writes ~/.llmflux/serve/<job_id>/connection.json with the node, port, model, and API key — this file's existence signals readiness
A custom email is sent from the job script at this point with the full endpoint details and example code
llmflux connect polls for the connection file, then does a live HTTP ping before displaying connection info — no SSH tunnel needed since the head node reaches compute nodes directly on the cluster's internal network
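The readiness handshake above can be sketched in Python as follows. This is an illustration under assumptions, not the PR's code: the real job script writes connection.json from bash. The helper names `generate_api_key`, `write_connection_file`, and `wait_for_connection_file` are hypothetical, and so is the `llmflux-` key prefix, which is inferred from the example client code in this description.

```python
import json
import os
import secrets
import tempfile
import time
from pathlib import Path
from typing import Optional


def generate_api_key() -> str:
    # 16 random bytes -> 32 hex characters; the "llmflux-" prefix is an assumption.
    return "llmflux-" + secrets.token_hex(16)


def write_connection_file(path: Path, info: dict) -> None:
    """Writer side (compute node): publish connection details once the server is healthy."""
    path.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)  # owner-only directory
    # mkstemp creates the temp file readable/writable only by its owner (0600).
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as fh:
        json.dump(info, fh)
    os.replace(tmp, path)  # atomic rename: readers see no file or a complete file


def wait_for_connection_file(path: Path, timeout: float, interval: float = 2.0) -> Optional[dict]:
    """Reader side (`llmflux connect`): poll until the file appears or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            with path.open() as fh:
                return json.load(fh)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Writing to a temporary file and renaming it is what lets the file's mere existence double as the readiness signal: a poller can never observe a half-written JSON document.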

Serve command:
`llmflux serve --model gemma-3-1b-it --engine vllm --email you@email.com --time 02:00:00`
Note the job ID that is printed.

Show endpoint:
`llmflux connect <job_id>`
Prints the endpoint, API key, and example code.

Verify the endpoint works with a test Python script:
```python
from openai import OpenAI

# Fill in the node, port, and API key printed by `llmflux connect <job_id>`.
client = OpenAI(
    base_url="http://:/v1",
    api_key="llmflux-",
)
response = client.chat.completions.create(
    model="google/gemma-3-1b-it",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
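If the `openai` package is not installed, a stdlib-only check, similar in spirit to the live ping that `llmflux connect` performs, might look like this. It assumes the server enforces the key as an `Authorization: Bearer` header and exposes the OpenAI-compatible `GET /v1/models` route; `ping_endpoint` is a hypothetical name, not the helper in src/llmflux/slurm/connection.py.

```python
import json
import urllib.error
import urllib.request

# Ignore HTTP(S)_PROXY settings: compute nodes sit on the cluster's internal network.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ping_endpoint(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/models answers 200 with JSON for this key.

    base_url is expected to end in /v1, as in the client example above.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with _OPENER.open(req, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            json.load(resp)  # a healthy OpenAI-compatible server returns JSON
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (e.g. 401 for a wrong key), refused connections, timeouts,
        # and malformed JSON all count as "not reachable".
        return False
```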

Check job:
`llmflux jobs  # TYPE column shows "serve"`
`llmflux status <job_id>  # shows Endpoint, API Key, Email`
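A sketch of the serve-versus-batch branching that `llmflux status` performs. The registry keys (`type`, `endpoint`, `api_key`, `email`, `input`, `output`) and the `status_lines` helper are assumptions, and treating entries without a `type` as batch jobs is a guess at how registry entries created before this PR would be handled.

```python
from typing import List


def status_lines(job_id: str, job: dict) -> List[str]:
    """Hypothetical rendering of one registry entry for `llmflux status`."""
    job_type = job.get("type", "batch")  # assumed default for older entries
    lines = [f"Job ID:   {job_id}", f"Type:     {job_type}"]
    if job_type == "serve":
        # Serve jobs have no input/output files; show how to reach the model instead.
        lines += [
            f"Endpoint: {job.get('endpoint', 'pending')}",
            f"API Key:  {job.get('api_key', '')}",
            f"Email:    {job.get('email', '')}",
        ]
    else:
        lines += [
            f"Input:    {job.get('input', '')}",
            f"Output:   {job.get('output', '')}",
        ]
    return lines
```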

Cancel job:
`llmflux cancel <job_id>`

…ompatible service on a SLURM compute node.
Add llmflux connect <job_id> command that waits for the model to finish loading, pings the endpoint, and prints connection details with example code.
Extend vLLM and Ollama batch scripts with mode="serve" to handle dynamic port finding, API key injection, and connection file writing after the health check passes.
Update llmflux jobs and llmflux status to display job type (serve/batch) and show serve-specific fields (endpoint, API key, email) instead of input/output paths.
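For comparison, here is a Python way to pick a free port. This is a swapped-in technique: the PR's job script uses `ss`, whereas this sketch asks the kernel for an ephemeral port by binding to port 0. `find_free_port` is a hypothetical helper.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 = "pick any free port for me"
        return s.getsockname()[1]
```

Both approaches share the same race: the port is only known to be free at the moment it is checked, so server start-up should still be prepared for a bind failure.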

Copilot

Pull request overview

Adds a new “serve” workflow to LLMFlux’s SLURM integration, enabling long-running OpenAI-compatible endpoints (vLLM/Ollama) and a companion “connect” command to discover/ping the service and print usage info.

Changes:

Introduces SlurmRunner.serve() to submit a long-running SLURM job, generate an API key, and record serve-job metadata in the local job registry.
Extends SLURM job script generators (vLLM/Ollama) with a mode="serve" branch that finds a free port, starts the server with an API key, and writes connection.json + sends a readiness email.
Adds llmflux connect, adds a TYPE column to llmflux jobs, and updates llmflux status to show endpoint/API key/email for serve jobs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.


| File | Description |
| --- | --- |
| `src/llmflux/slurm/runner.py` | Adds `serve()` to submit long-running serving jobs and persist API key/email/type metadata in the registry. |
| `src/llmflux/slurm/engine/vllm.py` | Adds serve-mode branching: port selection, API-key flag, connection file writing, and readiness email. |
| `src/llmflux/slurm/engine/ollama.py` | Adds serve-mode branching: port selection, API-key env injection, connection file writing, and readiness email. |
| `src/llmflux/slurm/connection.py` | New helper module to read/poll connection info and ping the endpoint for `llmflux connect`. |
| `src/llmflux/slurm/__init__.py` | Re-exports new connection helpers. |
| `src/llmflux/cli.py` | Adds `serve`/`connect` subcommands and updates jobs/status output to handle serve jobs. |


joshfactorial

Had Claude do its own review. It found issues similar to Copilot's, so they are probably worth addressing. They resemble work we did on another project.

joshfactorial · 2026-05-28T19:23:44Z

@Ralhazmy1 Can you also update the readme with your notes on how to run this?

- Validate job_id with isdigit() in _connection_file_path to prevent path traversal (e.g. ../../.ssh) from user-supplied arguments
- Quote email address with shlex.quote() before shell interpolation and reference via $LLMFLUX_EMAIL to prevent injection
- Write connection.json with umask 077 + chmod 600 and parent dir chmod 700 to prevent API key leakage on shared filesystems
- Enforce 0600/0700 permissions on ~/.llmflux/jobs.json and its parent directory; atomic tmp-file write in save_jobs()
- Check st_uid == os.getuid() before reading connection file to reject attacker-planted files on world-writable paths
- Sanitize node/model/engine/api_key with _sanitize() before printing to prevent ANSI terminal injection from malicious connection files

Error handling:
- serve() raises ValueError instead of returning "1" on config/validation errors; CLI catches ValueError and CalledProcessError and returns exit code 1 without printing a success message
- serve() returns None (not "unknown") when sbatch output is empty; CLI guards against falsy job_id before printing submission details
- Validate required fields (node, port) in connection file before use and return exit code 1 with a clear message instead of crashing
- Wrap int(info["port"]) in try/except to handle non-numeric values

Behaviour:
- Change --mail-type from END,FAIL to FAIL only; custom ready email handles notification, avoiding a confusing second email on shutdown
- Make --time a required argument for llmflux serve

Tests:
- Add TestSlurmRunnerServe (7 tests) covering job ID return, API key env injection, registry metadata, invalid model, engine mismatch, serve-mode script content, and empty sbatch output
- Add TestServeCommand, TestConnectCommand, TestJobsServeType, and TestStatusServeView in test_cli.py covering argument parsing, registry checks, state gating, and serve vs batch display logic

Docs:
- Add Interactive Serving section to README covering serve, connect, status, and cancel workflows; update --help output to include new subcommands
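A minimal sketch of three of the hardening steps listed above: job-ID validation, sanitizing untrusted values before printing, and quoting the email for the shell. The function names and the `~/.llmflux/serve` root are modeled on the commit message and the PR description, but the code is an illustrative assumption, not the merged implementation.

```python
import os
import re
import shlex
from pathlib import Path

# Directory layout taken from the PR description; expanduser leaves "~" as-is if HOME is unknown.
SERVE_ROOT = Path(os.path.expanduser("~/.llmflux/serve"))

# CSI escape sequences (e.g. "\x1b[31m") plus any remaining C0/C1 control characters.
_CONTROL_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]|[\x00-\x1f\x7f-\x9f]")


def connection_file_path(job_id: str, root: Path = SERVE_ROOT) -> Path:
    # str.isdigit() alone also accepts non-ASCII digits such as "²", so this
    # sketch requires ASCII too; either check rejects "../../.ssh".
    if not (job_id.isascii() and job_id.isdigit()):
        raise ValueError(f"invalid job id: {job_id!r}")
    return root / job_id / "connection.json"


def sanitize(value: object) -> str:
    """Strip terminal escape sequences before echoing values read from disk."""
    return _CONTROL_RE.sub("", str(value))


def email_export_line(email: str) -> str:
    # shlex.quote turns the address into one inert shell word; the rest of the
    # job script then only references "$LLMFLUX_EMAIL".
    return f"export LLMFLUX_EMAIL={shlex.quote(email)}"
```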

Vismayak

Just a couple of comments for now; I am unable to test it on Campus Cluster.

Vismayak · 2026-06-17T17:42:40Z

    def __init__(self, registry_file: str = os.path.expanduser("~/.llmflux/jobs.json")):
        self.registry_file = Path(registry_file)
-        self.registry_file.parent.mkdir(parents=True, exist_ok=True)
+        self.registry_file.parent.mkdir(mode=0o700, parents=True, exist_ok=True)


Do we need the mode parameter here?

Vismayak · 2026-06-17T17:42:49Z

-        self.registry_file.parent.mkdir(parents=True, exist_ok=True)
+        self.registry_file.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
        if not self.registry_file.exists():
+            self.registry_file.touch(mode=0o600)


Is this line needed?

It is, so that no one else can read the job IDs and API keys of all the serve requests.
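On the question of whether `mode=` is needed, here is a sketch (assuming a POSIX filesystem) of why explicit permissions matter. `Path.mkdir(mode=...)` and `Path.touch(mode=...)` only take effect when the path is newly created, the mode is further masked by the process umask, and `mkdir(parents=True)` creates intermediate directories with default permissions. An explicit `os.chmod` therefore enforces 0700/0600 even on a registry that already existed, while passing `mode=` closes the window in which a freshly created file would otherwise be world-readable under a typical 022 umask. The helper names are hypothetical; the atomic `save_jobs` mirrors the "atomic tmp-file write" described in the commit message.

```python
import json
import os
import tempfile
from pathlib import Path


def ensure_private_registry(registry_file: Path) -> None:
    """Create the registry with owner-only permissions and re-apply them every time."""
    registry_file.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(registry_file.parent, 0o700)  # also fixes a pre-existing, looser directory
    registry_file.touch(mode=0o600, exist_ok=True)
    os.chmod(registry_file, 0o600)  # also fixes a pre-existing, looser file


def save_jobs(registry_file: Path, jobs: dict) -> None:
    """Atomically replace the registry so readers never see a partial write."""
    fd, tmp = tempfile.mkstemp(dir=registry_file.parent)  # created as 0600
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(jobs, fh, indent=2)
        os.replace(tmp, registry_file)  # the renamed file keeps its 0600 mode
    except BaseException:
        os.unlink(tmp)
        raise
```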

Vismayak · 2026-06-17T18:28:04Z

+    print()
+    print("Example usage:")
+    print()
+    print("  from openai import OpenAI")


Suggestion: removing the two-space prefix before every line would keep the indentation consistent, so the example can be copied directly into a file and run.
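One way to apply this suggestion is to keep the snippet as a single flush-left template and print it verbatim. `EXAMPLE_TEMPLATE` and `print_example_usage` are hypothetical names; the doubled braces escape the dict literal for `str.format`.

```python
import textwrap

# dedent strips the common 4-space source indent, so the printed code starts at column 0.
EXAMPLE_TEMPLATE = textwrap.dedent(
    """\
    from openai import OpenAI

    client = OpenAI(
        base_url="{base_url}",
        api_key="{api_key}",
    )
    response = client.chat.completions.create(
        model="{model}",
        messages=[{{"role": "user", "content": "Hello!"}}],
    )
    print(response.choices[0].message.content)
    """
)


def print_example_usage(base_url: str, api_key: str, model: str) -> None:
    print()
    print("Example usage:")
    print()
    # No leading indent: the snippet can be pasted into a .py file as-is.
    print(EXAMPLE_TEMPLATE.format(base_url=base_url, api_key=api_key, model=model))
```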

wrote tests for slurm/connection.py

gitguardian · 2026-06-18T14:51:32Z

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 33056202 | Triggered | Generic High Entropy Secret | `f80f170` | tests/slurm/test_connection.py |

🛠 Guidelines to remediate hardcoded secrets

Understand the implications of revoking this secret by investigating where it is used in your code.
Replace and store your secret safely. Learn the best practices here.
Revoke and rotate this secret.
If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

following these best practices for managing and storing secrets, including API keys and other credentials;
installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

joshfactorial

A couple of small issues found by Claude. I merged in my connect.py tests. The trap for job cancellation is, I think, something we definitely need: I have run into this issue, and it could represent a real security flaw.

joshfactorial · 2026-06-18T14:44:39Z

+        type=int,
+        default=8000,
+        help="Local port to bind on the head node (default: 8000)",
+    )


Given the above comment in connect.py - "local_port: Unused — kept for CLI compatibility. Access is direct to node:port." - make this help say the same thing so users aren't misled.
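A sketch of the help text this comment asks for. The `--local-port` flag name and the `add_connect_parser` helper are assumptions; only the type, the default, and the meaning of the option come from the diff and the connection.py comment.

```python
import argparse


def add_connect_parser(subparsers) -> argparse.ArgumentParser:
    """Hypothetical registration of the `connect` subcommand."""
    p = subparsers.add_parser("connect", help="Show connection details for a serve job")
    p.add_argument("job_id", help="SLURM job ID printed by `llmflux serve`")
    p.add_argument(
        "--local-port",
        type=int,
        default=8000,
        # Mirror the connection.py docstring so users are not misled.
        help="Unused; kept for CLI compatibility. The endpoint is reached directly at node:port.",
    )
    return p
```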

joshfactorial · 2026-06-18T14:46:01Z

+        """
+        env = self._setup_environment()
+
+        try:


"bool" of any value never raises an error, so we can eliminate try/except

joshfactorial · 2026-06-18T14:48:26Z

+
+        # input_file/output_file are unused in serve mode but required by the
+        # engine function signatures (they only appear in the batch branch)
+        dummy = Path("")


Using an empty path may cause downstream problems. Passing None or a clearly fake file name would be clearer later if we need to modify this code or the files that use it.
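To illustrate the concern: `pathlib.Path("")` does not raise; it normalizes to the current directory, so a dummy path is indistinguishable from a real reference to `.`. Here is a sketch using `Optional[Path] = None` instead, with a hypothetical `build_script` standing in for the engine functions.

```python
from pathlib import Path
from typing import Optional

# An "empty" path silently means "the current directory".
assert Path("") == Path(".")


def build_script(
    mode: str,
    input_file: Optional[Path] = None,
    output_file: Optional[Path] = None,
) -> str:
    """Hypothetical engine function: file arguments are only required in batch mode."""
    if mode == "batch":
        if input_file is None or output_file is None:
            raise ValueError("batch mode requires input_file and output_file")
        return f"process {input_file} -> {output_file}"
    return "serve"
```

With `None` as the sentinel, a forgotten argument fails loudly in batch mode instead of quietly reading from or writing to `.`.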

joshfactorial · 2026-06-18T14:49:22Z

        "",
        "# Cleanup",
+        *( ["rm -f \"$CONNECTION_FILE\""] if mode == "serve" else [] ),
        "pkill -f \"vllm serve\" || true",


If the job is cancelled via scancel, SLURM sends SIGTERM and the script exits immediately without running the cleanup section. The connection.json (which contains the API key) will be left on disk. A trap handles this:

`trap 'rm -f "$CONNECTION_FILE"; pkill -f "vllm serve" || true' EXIT TERM INT`
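A sketch of how the script generator could emit this trap. It is a variation on the suggestion above, and the helper name `cleanup_lines` is hypothetical. Cleanup lives in one function run from the EXIT trap, and the TERM/INT traps just call `exit`, which then fires the EXIT trap. Cleanup therefore runs exactly once, and the script stops instead of resuming after the signal (with `... EXIT TERM INT` on a single trap, the handler runs on TERM, execution continues, and it runs again on exit).

```python
from typing import List


def cleanup_lines(mode: str) -> List[str]:
    """Hypothetical helper that emits the bash cleanup/trap lines for a job script."""
    body = []
    if mode == "serve":
        # The connection file holds the API key, so it must not outlive the job.
        body.append('    rm -f "$CONNECTION_FILE"')
    body.append('    pkill -f "vllm serve" || true')
    return [
        "cleanup() {",
        *body,
        "}",
        # Run cleanup exactly once, whenever the script exits.
        "trap cleanup EXIT",
        # On a signal, exit with the conventional 128+N status; exiting fires the EXIT trap.
        "trap 'exit 143' TERM",  # 128 + 15 (SIGTERM)
        "trap 'exit 130' INT",   # 128 + 2 (SIGINT)
    ]
```

One caveat from the bash manual: a trap is not run until the current foreground command finishes, but a blocking `wait` returns immediately when a trapped signal arrives. Starting the server in the background and blocking in `wait` therefore lets the trap fire promptly on `scancel`.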

joshfactorial · 2026-06-18T14:53:47Z

The gitguardian warning is a false positive on test data. Cleared it.


…rd failure

Clusters like UIUC Campus Cluster firewall compute node ports from login nodes, so the direct HTTP ping fails even though the server is running. Now connect prints all endpoint info regardless and, when the endpoint is unreachable, shows the SSH tunnel command needed to forward the port. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
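A sketch of the kind of tunnel hint this commit describes, using standard OpenSSH local forwarding (`-L local_port:node:port`, with `-N` to skip running a remote command). `ssh_tunnel_hint` is hypothetical, as is `login_host`, a placeholder for whatever login node the user normally connects to; the exact command the PR prints may differ.

```python
import shlex


def ssh_tunnel_hint(node: str, port: int, login_host: str, local_port: int = 8000) -> str:
    """Build the port-forwarding command shown when node:port is unreachable."""
    forward = f"{local_port}:{node}:{port}"  # local port -> compute node:port via login host
    return shlex.join(["ssh", "-N", "-L", forward, login_host])
```

With the tunnel open, the client would point at `http://localhost:8000/v1` instead of the compute node's address.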



Raghid Alhazmy added 2 commits May 19, 2026 12:52

Send email when ready

33dc9a6

Ralhazmy1 self-assigned this May 20, 2026

Ralhazmy1 linked an issue May 20, 2026 that may be closed by this pull request

LLMFLUX API #96

Open

2 tasks

Vismayak requested a review from Copilot May 20, 2026 19:46

Copilot started reviewing on behalf of Vismayak May 20, 2026 19:46

Copilot AI reviewed May 20, 2026


joshfactorial self-requested a review May 20, 2026 21:54

joshfactorial added 2 commits May 20, 2026 23:33

wrote tests for slurm/connection.py

f80f170

Fixing 'secret' (fake) exposure flagged by gitguardian

e8c31a1

joshfactorial requested changes May 21, 2026


Comment thread src/llmflux/slurm/connection.py Outdated

Comment thread src/llmflux/slurm/connection.py

Comment thread src/llmflux/slurm/connection.py Outdated

Comment thread src/llmflux/slurm/connection.py

Comment thread src/llmflux/slurm/connection.py

Raghid Alhazmy added 2 commits June 12, 2026 10:45

make time required and check if job id is valid

fdbcdb4

Vismayak reviewed Jun 17, 2026


Merge pull request #99 from Center-for-AI-Innovation/api_add_tests

2571133

wrote tests for slurm/connection.py

joshfactorial mentioned this pull request Jun 18, 2026

Security: SSRF risk via attacker-controlled node/port in connection.json #109

Open

joshfactorial reviewed Jun 18, 2026


Raghid Alhazmy and others added 4 commits June 18, 2026 13:32

Fix vLLM serve endpoint failing on clusters with broken prometheus mi…

69cbc58

…ddleware

test: update connect ping-failure test to expect zero exit code

4a40290

Ping failure is now a warning rather than a hard failure, so connect() returns 0. Also adds a test that the SSH tunnel hint is printed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: exclude tests/ from GitGuardian scanning

642e4ae

Fake API keys in test fixtures trigger false-positive secret detection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


5 participants
