Skip to content

feat: enable auth in Flame.#390

Open
k82cn wants to merge 2 commits into
xflops:mainfrom
k82cn:flm_389
Open

feat: enable auth in Flame.#390
k82cn wants to merge 2 commits into
xflops:mainfrom
k82cn:flm_389

Conversation

@k82cn
Copy link
Copy Markdown
Contributor

@k82cn k82cn commented Mar 29, 2026

No description provided.

Signed-off-by: Klaus Ma <klausm@nvidia.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive design for JWT-based authentication in Flame, aimed at providing workspace isolation and secure access control across the cluster. The proposal includes specifications for user and session tokens, local validation logic, and updates to the CLI and SDKs. The review feedback identifies several critical improvements: implementing support for multiple public keys to allow zero-downtime key rotation, resolving contradictory pseudocode in the cache authorization section, and clarifying the configuration for symmetric encryption. Other suggestions focus on reducing redundancy in token claims, handling tilde expansion in file paths, and defining allowed character sets for workspaces to prevent parsing errors.

**Limitations:**
- Tokens must be manually distributed to users (no built-in user management)
- No automatic token rotation; users must regenerate expired tokens
- Single signing key per cluster (key rotation requires cluster restart)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The limitation "Single signing key per cluster (key rotation requires cluster restart)" presents a significant operational challenge, as it would cause downtime or service disruption during key rotation.

To support graceful key rotation, I suggest modifying the design to allow for multiple public keys for verification. The configuration could accept a list of public keys, for example:

auth:
  config:
    # ...
    public_key_files:
      - "/etc/flame/jwt-public-v1.pem"
      - "/etc/flame/jwt-public-v2.pem"

During verification, the server would try each key until one succeeds. This allows an administrator to add a new key, perform a rolling restart of components, start signing new tokens with the new private key, and eventually remove the old public key after all old tokens have expired, all without service interruption.

Comment on lines +1018 to +1049
Scope::Cluster => Ok(()),

// Workspace scope can only access objects in their workspaces
Scope::Workspace => {
if auth.workspaces.is_empty() {
return Err(AuthError::InvalidToken("workspaces required"));
}

if auth.workspaces.contains(&object_key.workspace) {
Ok(())
} else {
Err(AuthError::PermissionDenied(format!(
"Access denied to workspace {}", object_key.workspace
)))
}
}

// System scope (executors) can access workspaces they're bound to
Scope::System => {
// Executors have bound_workspaces in their context
// (set during executor registration)
if auth.can_access_bound_workspace(&object_key.workspace) {
Ok(())
} else {
Err(AuthError::PermissionDenied(format!(
"System scope not bound to workspace {}", object_key.workspace
)))
}
}
}
}
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This block of pseudocode appears to be a copy-paste error and contradicts the section's goal of only allowing session tokens for cache access. The preceding block (lines 988-1017) correctly implements the desired logic. This block introduces confusion by handling user-scoped tokens (workspace, cluster, system), which should be rejected. Please remove this block to maintain design consistency.

Comment on lines +57 to +58
config:
algorithm: "EdDSA" # Default: EdDSA. Options: EdDSA, RS256, HS256
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The design lists HS256 as a supported algorithm, but the configuration (private_key_file, public_key_file) and the flmadm token init command are described in terms of asymmetric key pairs. HS256 is a symmetric algorithm and uses a single shared secret.

The design should be updated to clarify how HS256 is handled:

  • How is the shared secret configured? A secret_key_file option in auth.config would be appropriate.
  • How does flmadm token init generate a secret for HS256?
  • How does flmadm token sign use the secret?

provider: "jwt" # Default: "jwt". Future: "oidc", "custom"
config:
algorithm: "EdDSA" # Default: EdDSA. Options: EdDSA, RS256, HS256
private_key_file: "/etc/flame/jwt-private.pem" # Private key for signing tokens (flmadm only)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment for private_key_file states it's for flmadm only. However, the "Session Token Flow" section (lines 145-173) and the implementation detail for create_session (lines 895-935) correctly state that the Session Manager also needs the private key to sign session-scoped tokens. The comment should be updated to reflect this.

Suggested change
private_key_file: "/etc/flame/jwt-private.pem" # Private key for signing tokens (flmadm only)
private_key_file: "/etc/flame/jwt-private.pem" # Private key for signing tokens (used by flmadm and session-manager)

"iat": 1711670400, // Issued at timestamp
"scope": "session", // Scope: always "session"
"workspace": "team-ml", // Workspace this session belongs to
"session_id": "session-abc123" // Session identifier
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In the Session Token claims structure, both sub (subject) and session_id are present and appear to hold the same session identifier. The sub claim is the standard way to identify the principal that is the subject of the JWT. Having a duplicate session_id claim is redundant. To simplify the token structure and adhere to JWT best practices, I recommend removing the session_id claim and using sub exclusively to carry the session ID. This would also require updating the SessionClaims struct (lines 621-638) and its usage accordingly.

endpoint: "https://flame.example.com:8080"
token: "eyJhbGciOiJFZERTQSI..."
# Or reference a file
token_file: "~/.flame/token"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The example configuration for flmctl uses token_file: "~/.flame/token". The tilde (~) is a shell construct and is not automatically expanded to the user's home directory by most application code, including Rust's standard file I/O.

The design should acknowledge this and specify that the implementation must explicitly handle tilde expansion (e.g., by using a library like shellexpand in Rust) to provide a good user experience. Otherwise, users might be confused when the path doesn't work as expected.

Comment on lines +790 to +808
pub fn parse(key: &str) -> Result<Self, FlameError> {
let parts: Vec<&str> = key.split('/').collect();
if parts.len() != 3 {
return Err(FlameError::InvalidConfig(format!(
"Invalid key format '{}': expected workspace/session_id/object_id",
key
)));
}
Ok(Self {
workspace: parts[0].to_string(),
session_id: parts[1].to_string(),
object_id: parts[2].to_string(),
})
}

pub fn to_string(&self) -> String {
format!("{}/{}/{}", self.workspace, self.session_id, self.object_id)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ObjectKey::parse implementation (lines 790-791) splits the key string by / to extract workspace, session_id, and object_id. This assumes these components do not contain /. While the design includes validation functions for session_id and object_id that prevent this, it's not explicitly mentioned for workspace.

To prevent ambiguity and potential parsing errors, the design document should explicitly define the allowed character set for workspace names and confirm that / is disallowed. This makes the parsing logic more robust and secure.

Signed-off-by: Klaus Ma <klausm@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant