Llmflux using api #98

Open
Ralhazmy1 wants to merge 11 commits into main from llmflux_using_api

@Ralhazmy1 Ralhazmy1 commented May 20, 2026

Contributor

Adds a new serving mode to LLMFlux that lets users start a model as a long-running OpenAI-compatible endpoint on a compute node, without managing SLURM themselves.

  • llmflux serve: submits a SLURM job that starts vLLM or Ollama as a persistent service, generates a unique API key, and emails the user the full connection details once the model is ready
  • llmflux connect <job_id>: waits for the model to finish loading, pings the endpoint, and prints the URL, API key, and copy-paste Python code
  • llmflux jobs: now shows a TYPE column (serve / batch)
  • llmflux status <job_id>: detects serve jobs and shows Endpoint, API Key, and Email instead of Input/Output

  1. llmflux serve generates a secrets.token_hex(16) API key at submission time and stores it in the local job registry alongside the email
  2. The SLURM job finds a free port on the compute node using ss, starts the engine with the API key injected via environment, then runs a health check loop
  3. Once the health check passes, the job writes ~/.llmflux/serve/<job_id>/connection.json with the node, port, model, and API key — this file's existence signals readiness
  4. A custom email is sent from the job script at this point with the full endpoint details and example code
  5. llmflux connect polls for the connection file, then does a live HTTP ping before displaying connection info — no SSH tunnel needed since the head node reaches compute nodes directly on the cluster's internal network
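
Steps 1, 3, and 5 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `new_api_key` and `wait_for_connection` are hypothetical names, the `llmflux-` prefix is assumed from the example key further down, and the timeout and interval defaults are arbitrary.

```python
import json
import secrets
import time
from pathlib import Path
from typing import Optional


def new_api_key(prefix: str = "llmflux-") -> str:
    # Assumption: the key is the prefix plus 32 hex characters from token_hex(16).
    return prefix + secrets.token_hex(16)


def wait_for_connection(
    path: Path, timeout: float = 600.0, interval: float = 5.0
) -> Optional[dict]:
    """Poll until the connection file exists and parses, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return json.loads(path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # Not written yet, or read while half-written: retry until the deadline.
            if time.monotonic() >= deadline:
                return None
            time.sleep(interval)
```

Treating a JSON parse error the same as a missing file is a design choice here: because the file's existence signals readiness, a reader may otherwise see it before the job has finished writing it.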

Serve command:
`llmflux serve --model gemma-3-1b-it --engine vllm --email you@email.com --time 02:00:00`
note the job ID printed

show endpoint
llmflux connect <job_id>
prints endpoint, API key, and example code

Verify the endpoint works in a test python script
`from openai import OpenAI

client = OpenAI(
    base_url="http://:/v1",
    api_key="llmflux-"
)
response = client.chat.completions.create(
    model="google/gemma-3-1b-it",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)`

check job
llmflux jobs # TYPE column shows "serve"
llmflux status <job_id> # shows Endpoint, API Key, Email

cancel job
llmflux cancel <job_id>

Raghid Alhazmy added 2 commits May 19, 2026 12:52
…ompatible service on a SLURM compute node

Add llmflux connect <job_id> command that waits for the model to finish loading, pings the endpoint, and prints connection details with example code

Extend vLLM and Ollama batch scripts with mode="serve" to handle dynamic port finding, API key injection, and connection file writing after health check passes

Update llmflux jobs and llmflux status to display job type (serve/batch) and show serve-specific fields (endpoint, API key, email) instead of input/output paths
@Ralhazmy1 Ralhazmy1 self-assigned this May 20, 2026
@Ralhazmy1 Ralhazmy1 linked an issue May 20, 2026 that may be closed by this pull request
2 tasks
@Vismayak Vismayak requested a review from Copilot May 20, 2026 19:46

Copilot AI left a comment

Pull request overview

Adds a new “serve” workflow to LLMFlux’s SLURM integration, enabling long-running OpenAI-compatible endpoints (vLLM/Ollama) and a companion “connect” command to discover/ping the service and print usage info.

Changes:

  • Introduces SlurmRunner.serve() to submit a long-running SLURM job, generate an API key, and record serve-job metadata in the local job registry.
  • Extends SLURM job script generators (vLLM/Ollama) with a mode="serve" branch that finds a free port, starts the server with an API key, and writes connection.json + sends a readiness email.
  • Adds llmflux connect, adds a TYPE column to llmflux jobs, and updates llmflux status to show endpoint/API key/email for serve jobs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| src/llmflux/slurm/runner.py | Adds serve() to submit long-running serving jobs and persist API key/email/type metadata in the registry. |
| src/llmflux/slurm/engine/vllm.py | Adds serve-mode branching: port selection, API-key flag, connection file writing, and readiness email. |
| src/llmflux/slurm/engine/ollama.py | Adds serve-mode branching: port selection, API-key env injection, connection file writing, and readiness email. |
| src/llmflux/slurm/connection.py | New helper module to read/poll connection info and ping the endpoint for llmflux connect. |
| src/llmflux/slurm/__init__.py | Re-exports new connection helpers. |
| src/llmflux/cli.py | Adds serve/connect subcommands and updates jobs/status output to handle serve jobs. |

Comment thread src/llmflux/slurm/runner.py Outdated
Comment thread src/llmflux/slurm/runner.py Outdated
Comment thread src/llmflux/slurm/runner.py
Comment thread src/llmflux/slurm/engine/vllm.py
Comment thread src/llmflux/slurm/engine/vllm.py Outdated
Comment thread src/llmflux/slurm/engine/ollama.py Outdated
Comment thread src/llmflux/slurm/connection.py Outdated
Comment thread src/llmflux/cli.py Outdated
Comment thread src/llmflux/cli.py
Comment thread src/llmflux/cli.py
@joshfactorial joshfactorial self-requested a review May 20, 2026 21:54

@joshfactorial joshfactorial left a comment

Collaborator

Had Claude do its own review. It found similar things to Copilot, so they are probably worth addressing. They sound similar to work we did on another project.

Comment thread src/llmflux/slurm/connection.py Outdated
Comment thread src/llmflux/slurm/connection.py
Comment thread src/llmflux/slurm/connection.py Outdated
Comment thread src/llmflux/slurm/connection.py
Comment thread src/llmflux/slurm/connection.py
@joshfactorial

Collaborator

@Ralhazmy1 Can you also update the readme with your notes on how to run this?

Raghid Alhazmy added 2 commits June 12, 2026 10:45
- Validate job_id with isdigit() in _connection_file_path to prevent
  path traversal (e.g. ../../.ssh) from user-supplied arguments
- Quote email address with shlex.quote() before shell interpolation
  and reference via $LLMFLUX_EMAIL to prevent injection
- Write connection.json with umask 077 + chmod 600 and parent dir
  chmod 700 to prevent API key leakage on shared filesystems
- Enforce 0600/0700 permissions on ~/.llmflux/jobs.json and its
  parent directory; atomic tmp-file write in save_jobs()
- Check st_uid == os.getuid() before reading connection file to
  reject attacker-planted files on world-writable paths
- Sanitize node/model/engine/api_key with _sanitize() before printing
  to prevent ANSI terminal injection from malicious connection files
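
As a rough illustration of three of the checks listed above (numeric job IDs, file ownership, and output sanitising), here is a minimal sketch. The function names and the `SERVE_DIR` constant are hypothetical, and the regular expressions are one possible way to strip escape sequences, not necessarily what `_sanitize()` in this PR does.

```python
import os
import re
from pathlib import Path

# Hypothetical base directory, following the path given in the PR description.
SERVE_DIR = Path.home() / ".llmflux" / "serve"

# CSI escape sequences (colours, cursor movement) and remaining control characters.
_ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
_CTRL_RE = re.compile(r"[\x00-\x1f\x7f]")


def connection_file_path(job_id: str, base: Path = SERVE_DIR) -> Path:
    # str.isdigit() alone also accepts some non-ASCII digits such as "²",
    # so isascii() is added here to narrow the check to 0-9.
    if not (job_id.isascii() and job_id.isdigit()):
        raise ValueError(f"invalid job id: {job_id!r}")
    return base / job_id / "connection.json"


def owned_by_current_user(path: Path) -> bool:
    # Reject files planted by another user on a shared filesystem (POSIX only).
    return path.stat().st_uid == os.getuid()


def sanitize(value: object) -> str:
    # Drop escape sequences first, then any leftover control characters.
    text = _ANSI_RE.sub("", str(value))
    return _CTRL_RE.sub("", text)
```

With these helpers, `connection_file_path("../../.ssh")` raises `ValueError` before any path is built, and a node name carrying colour codes is printed as plain text.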
Error handling:
- serve() raises ValueError instead of returning "1" on config/
  validation errors; CLI catches ValueError and CalledProcessError
  and returns exit code 1 without printing a success message
- serve() returns None (not "unknown") when sbatch output is empty;
  CLI guards against falsy job_id before printing submission details
- Validate required fields (node, port) in connection file before use
  and return exit code 1 with a clear message instead of crashing
- Wrap int(info["port"]) in try/except to handle non-numeric values
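
The connection-file validation described in the last two items might look something like the sketch below. `parse_endpoint` is a hypothetical helper name; only the `node` and `port` fields and the `int(info["port"])` conversion come from the commit message.

```python
from typing import Tuple


def parse_endpoint(info: dict) -> Tuple[str, int]:
    """Return (node, port) or raise ValueError with a readable message."""
    missing = [key for key in ("node", "port") if not info.get(key)]
    if missing:
        raise ValueError(f"connection file is missing: {', '.join(missing)}")
    try:
        port = int(info["port"])
    except (TypeError, ValueError):
        # Re-raise with a clearer message; the caller maps this to exit code 1.
        raise ValueError(f"port is not a number: {info['port']!r}") from None
    return str(info["node"]), port
```

A CLI entry point can then catch `ValueError` in one place, print the message, and return 1 instead of crashing with a traceback.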
Behaviour:
- Change --mail-type from END,FAIL to FAIL only; custom ready email
  handles notification, avoiding a confusing second email on shutdown
- Make --time a required argument for llmflux serve
Tests:
- Add TestSlurmRunnerServe (7 tests) covering job ID return, API key
  env injection, registry metadata, invalid model, engine mismatch,
  serve-mode script content, and empty sbatch output
- Add TestServeCommand, TestConnectCommand, TestJobsServeType, and
  TestStatusServeView in test_cli.py covering argument parsing,
  registry checks, state gating, and serve vs batch display logic
Docs:
- Add Interactive Serving section to README covering serve, connect,
  status, and cancel workflows; update --help output to include new subcommands

@Vismayak Vismayak left a comment

Contributor

Just a couple of comments for now; I'm unable to test it on CampusCluster.

def __init__(self, registry_file: str = os.path.expanduser("~/.llmflux/jobs.json")):
    self.registry_file = Path(registry_file)
    self.registry_file.parent.mkdir(parents=True, exist_ok=True)
    self.registry_file.parent.mkdir(mode=0o700, parents=True, exist_ok=True)

Contributor

Do we need the mode parameter here?

    self.registry_file.parent.mkdir(parents=True, exist_ok=True)
    self.registry_file.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    if not self.registry_file.exists():
        self.registry_file.touch(mode=0o600)

Contributor

Is this line needed?

Contributor Author

It is there so that no one else can read the job IDs and API keys of all the serve requests.
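
For context on the two questions in this thread: the `mode` passed to `Path.mkdir()` and `Path.touch()` is filtered by the process umask and is not applied to a directory or file that already exists, so an explicit `chmod` is the usual way to enforce it. The sketch below shows one way to combine that with an atomic write. `ensure_private_dir` is a hypothetical helper, and this `save_jobs` is an illustration of the idea rather than the implementation in this PR.

```python
import json
import os
import tempfile
from pathlib import Path


def ensure_private_dir(directory: Path) -> None:
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    # mkdir's mode is masked by the umask and skipped when the directory
    # already exists, so set the permissions explicitly as well.
    directory.chmod(0o700)


def save_jobs(registry_file: Path, jobs: dict) -> None:
    ensure_private_dir(registry_file.parent)
    # mkstemp creates the file readable and writable only by the current user.
    fd, tmp_name = tempfile.mkstemp(
        dir=registry_file.parent, prefix=".jobs-", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(jobs, handle, indent=2)
        # Same-directory rename: readers see the old file or the new one.
        os.replace(tmp_name, registry_file)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Writing the temporary file in the same directory keeps the final `os.replace` on one filesystem, which is what makes the swap atomic on POSIX systems.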

Comment thread src/llmflux/slurm/connection.py Outdated
print()
print("Example usage:")
print()
print(" from openai import OpenAI")

Contributor

Suggestion: Removing the little space before every line can help keep a consistent indent so we can copy paste it directly to a file and run it.

@gitguardian

gitguardian Bot commented Jun 18, 2026


⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 33056202 | Triggered | Generic High Entropy Secret | f80f170 | tests/slurm/test_connection.py |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@joshfactorial joshfactorial left a comment

Collaborator

A couple of small issues found by Claude. I merged in my connect.py tests. I think the trap for job cancellation is something we need for sure, as I have run into this issue and it could represent a real security flaw.

Comment thread src/llmflux/cli.py
type=int,
default=8000,
help="Local port to bind on the head node (default: 8000)",
)

Collaborator

Given the above comment in connect.py - "local_port: Unused — kept for CLI compatibility. Access is direct to node:port." - make this help say the same thing so users aren't misled.

"""
env = self._setup_environment()

try:

Collaborator

"bool" of any value never raises an error, so we can eliminate try/except


# input_file/output_file are unused in serve mode but required by the
# engine function signatures (they only appear in the batch branch)
dummy = Path("")

Collaborator

Using an empty path may cause downstream problems. None or a clearly fake file name would be more clear later if we need to modify this or files that use this.
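
One concrete reason an empty path can mislead later readers: `pathlib` normalises `Path("")` to the current directory, so the "dummy" value is indistinguishable from a real `.` argument. A short demonstration:

```python
from pathlib import Path

dummy = Path("")

# The empty string is normalised to the current directory.
print(str(dummy))           # .
print(dummy == Path("."))   # True
```

Passing `None` (with an `Optional[Path]` signature) or an obviously fake name avoids that ambiguity.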

"",
"# Cleanup",
*( ["rm -f \"$CONNECTION_FILE\""] if mode == "serve" else [] ),
"pkill -f \"vllm serve\" || true",

Collaborator

If the job is cancelled via scancel, SLURM sends SIGTERM and the script exits immediately without running the cleanup section. The connection.json (which contains the API key) will be left on disk. A trap handles this:

trap 'rm -f "$CONNECTION_FILE"; pkill -f "vllm serve" || true' EXIT TERM INT
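
Since the job script is assembled as a list of lines in Python, one way to apply this suggestion is to register the trap near the top of the script, before the server starts, instead of relying on the trailing cleanup section. The sketch below assumes the script is a plain list of strings; `with_cleanup_trap` is a hypothetical helper, and the trap line is the one proposed above.

```python
from typing import List

TRAP_LINE = (
    "trap 'rm -f \"$CONNECTION_FILE\"; pkill -f \"vllm serve\" || true' EXIT TERM INT"
)


def with_cleanup_trap(script_lines: List[str], mode: str) -> List[str]:
    """Insert the cleanup trap right after the shebang and #SBATCH header."""
    if mode != "serve":
        return list(script_lines)
    insert_at = 0
    for index, line in enumerate(script_lines):
        # #SBATCH directives must precede the first executable command,
        # so the trap goes immediately after the last header line.
        if line.startswith("#!") or line.startswith("#SBATCH"):
            insert_at = index + 1
    return script_lines[:insert_at] + [TRAP_LINE] + script_lines[insert_at:]
```

One caveat worth checking on the cluster: bash runs a trap only after the current foreground command returns, so a server started in the foreground may delay the handler. Starting it in the background and using `wait` is a common workaround.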

Comment thread src/llmflux/slurm/connection.py
@joshfactorial

Collaborator

The gitguardian warning is a false positive on test data. Cleared it.

Raghid Alhazmy and others added 4 commits June 18, 2026 13:32
…rd failure

Clusters like UIUC Campus Cluster firewall compute node ports from login
nodes, so the direct HTTP ping fails even though the server is running.
Now connect prints all endpoint info regardless and, when unreachable,
shows the SSH tunnel command needed to forward the port.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ping failure is now a warning rather than a hard failure, so connect()
returns 0. Also adds a test that the SSH tunnel hint is printed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fake API keys in test fixtures trigger false-positive secret detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
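
The "warn instead of fail" behaviour described in these commits could be sketched as below. This is an illustration only: `ping`, `tunnel_hint`, and `report` are hypothetical names, `/v1/models` is assumed to be available because the server is OpenAI-compatible, and `login_host` stands in for whatever login node the user normally connects to.

```python
import urllib.error
import urllib.request


def ping(node: str, port: int, api_key: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers; any network error counts as unreachable."""
    request = urllib.request.Request(
        f"http://{node}:{port}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def tunnel_hint(node: str, port: int, login_host: str, local_port: int = 8000) -> str:
    # Forward a local port through the login node to the compute node.
    return f"ssh -N -L {local_port}:{node}:{port} {login_host}"


def report(node: str, port: int, api_key: str, login_host: str) -> int:
    print(f"Endpoint: http://{node}:{port}/v1")
    if not ping(node, port, api_key):
        print("Warning: endpoint not reachable from this host. Try a tunnel:")
        print("  " + tunnel_hint(node, port, login_host))
    return 0  # An unreachable endpoint is a warning, not a failure.
```

With the tunnel running, the same client code works against `http://localhost:8000/v1` in place of the compute node address.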
Development

Successfully merging this pull request may close these issues.

LLMFLUX API

5 participants