
Conversation

@ilopezluna
Contributor

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the data types used for context-size and reasoning-budget parameters, standardizing them to int32 across the application. This change improves type consistency, especially when interacting with backend systems like llama.cpp which may expect specific integer sizes. It also introduces more robust handling of optional configuration values through the use of pointers and simplifies the configuration flow for certain backends and API endpoints.

Highlights

  • Type Refactoring: The data types for context-size and reasoning-budget parameters have been consistently changed from uint64 or int64 to int32 across the codebase. This affects various CLI commands, internal model configuration, and backend inference settings.
  • Pointer Usage for Configuration: Many configuration fields, such as ContextSize in BackendConfiguration and ModelPackageRequest, are now pointers to int32 (*int32). This allows for explicit differentiation between a zero value and an unset value, providing more precise control over configuration application.
  • MLX Backend Simplification: The MLX backend's GetMaxTokens function has been simplified to always return nil, indicating that MLX will now rely on its internal model defaults for context size rather than deriving it from external configurations. The associated test file (mlx_config_test.go) has been removed.
  • Ollama API Configuration Flow: The configureModel function in the Ollama API handler no longer returns the context size and has been streamlined. The logic for preloading models with a specific context size in handleGenerate has been removed, as configuration is now handled directly within configureModel.
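The pointer-based optional pattern described in the highlights can be sketched in a few lines. The type and field names below are illustrative stand-ins for the PR's actual definitions, not code from the repository:

```go
package main

import "fmt"

// BackendConfiguration stands in for the PR's config struct; a *int32
// field distinguishes "unset" (nil) from an explicit zero value.
type BackendConfiguration struct {
	ContextSize *int32
}

// effectiveContextSize applies the configured value only when it is
// explicitly set, and otherwise falls back to a default.
func effectiveContextSize(cfg BackendConfiguration, fallback int32) int32 {
	if cfg.ContextSize != nil {
		return *cfg.ContextSize
	}
	return fallback
}

func main() {
	unset := BackendConfiguration{}
	zero := int32(0)
	explicitZero := BackendConfiguration{ContextSize: &zero}

	fmt.Println(effectiveContextSize(unset, 4096))        // unset: falls back to 4096
	fmt.Println(effectiveContextSize(explicitZero, 4096)) // explicitly 0: stays 0
}
```

With a plain int32 field both cases would read as 0, which is exactly the ambiguity the *int32 change removes.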
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in your IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request primarily refactors the handling of context size and reasoning budget parameters across the codebase, standardizing their types from uint64 or int64 to *int32. This change impacts various CLI commands (compose, configure, package, mdltool), internal distribution logic (builder, mutate, store), and inference backend configurations (backend.go, llamacpp, vllm). Specifically, context size is now passed as a pointer to int32 and is only applied if explicitly set by a flag or a valid positive/unlimited value. The llamacpp backend now has an UnlimitedContextSize constant and defaults to 4096 if no context size is specified.

The mlx backend's GetMaxTokens function was simplified to always return nil, effectively removing its ability to configure context size. The ollama HTTP handler was updated to use the new int32 types for context size and reasoning budget, and its model configuration logic was streamlined, removing a redundant preload step.

Review comments highlighted a potential regression in the mlx backend's context size configuration, suggesting the previous logic be restored and adapted. Additionally, the review pointed out potential integer overflow issues when converting uint64 to int32 for context size in package.go and main.go, recommending validation checks. Minor suggestions included removing a redundant type cast in llamacpp.go and refining the context size validation logic in llamacpp_config.go and vllm_config.go to ensure consistency and prevent invalid values.

Comment on lines 67 to 69
func GetMaxTokens(modelCfg types.Config, backendCfg *inference.BackendConfiguration) *uint64 {
	// Model config takes precedence
	if modelCfg.ContextSize != nil {
		return modelCfg.ContextSize
	}
	// else use backend config
	if backendCfg != nil && backendCfg.ContextSize > 0 {
		val := uint64(backendCfg.ContextSize)
		return &val
	}
	// Return nil to let MLX use model defaults
	return nil
}

high

This change removes the ability to configure the context size (--max-tokens) for the MLX backend. The GetMaxTokens function now always returns nil. This appears to be a regression, as the functionality existed before and the corresponding test file (mlx_config_test.go) has been deleted. If this removal is intentional, it should be documented. Otherwise, the previous logic should be restored and adapted for the new *int32 type for ContextSize.

Suggested change
-func GetMaxTokens(modelCfg types.Config, backendCfg *inference.BackendConfiguration) *uint64 {
-	// Model config takes precedence
-	if modelCfg.ContextSize != nil {
-		return modelCfg.ContextSize
-	}
-	// else use backend config
-	if backendCfg != nil && backendCfg.ContextSize > 0 {
-		val := uint64(backendCfg.ContextSize)
-		return &val
-	}
-	// Return nil to let MLX use model defaults
-	return nil
-}
+func GetMaxTokens(modelCfg types.Config, backendCfg *inference.BackendConfiguration) *uint64 {
+	// Model config takes precedence
+	if modelCfg.ContextSize != nil && *modelCfg.ContextSize > 0 {
+		val := uint64(*modelCfg.ContextSize)
+		return &val
+	}
+	// Fallback to backend config
+	if backendCfg != nil && backendCfg.ContextSize != nil && *backendCfg.ContextSize > 0 {
+		val := uint64(*backendCfg.ContextSize)
+		return &val
+	}
+	// Return nil to let MLX use model defaults
+	return nil
+}

Comment on lines +287 to 290
 if cmd.Flags().Changed("context-size") {
 	cmd.PrintErrf("Setting context size %d\n", opts.contextSize)
-	pkg = pkg.WithContextSize(opts.contextSize)
+	pkg = pkg.WithContextSize(int32(opts.contextSize))
 }

medium

The conversion from uint64 to int32 for contextSize could lead to an integer overflow if a user provides a value larger than math.MaxInt32. While unlikely for a context size, adding a validation check would make the code more robust.

if cmd.Flags().Changed("context-size") {
	if opts.contextSize > 2147483647 { // math.MaxInt32
		return fmt.Errorf("context size %d is too large, must be less than or equal to 2147483647", opts.contextSize)
	}
	cmd.PrintErrf("Setting context size %d\n", opts.contextSize)
	pkg = pkg.WithContextSize(int32(opts.contextSize))
}
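As an aside, the bounds check in this suggestion can be factored into a small reusable helper. The sketch below is illustrative only (the helper name and error wording are not from the PR), using math.MaxInt32 instead of the hard-coded literal:

```go
package main

import (
	"fmt"
	"math"
)

// toInt32 converts a uint64 flag value to int32, rejecting values that
// would overflow. Hypothetical helper, not code from the PR under review.
func toInt32(v uint64) (int32, error) {
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("context size %d is too large, must be less than or equal to %d", v, math.MaxInt32)
	}
	return int32(v), nil
}

func main() {
	if cs, err := toInt32(4096); err == nil {
		fmt.Println("context size:", cs) // in-range value converts cleanly
	}
	if _, err := toInt32(math.MaxInt32 + 1); err != nil {
		fmt.Println("rejected:", err) // out-of-range value is refused
	}
}
```

Centralizing the check keeps package.go and main.go from duplicating the same magic number.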

Comment on lines 322 to 325
 if contextSize > 0 {
 	fmt.Println("Setting context size:", contextSize)
-	b = b.WithContextSize(contextSize)
+	b = b.WithContextSize(int32(contextSize))
 }

medium

Similar to another file, the conversion from uint64 to int32 for contextSize could overflow. It's best to add a check to ensure the value is within the valid range for an int32.

if contextSize > 0 {
	if contextSize > 2147483647 { // math.MaxInt32
		fmt.Fprintf(os.Stderr, "context size %d is too large, must be less than or equal to 2147483647\n", contextSize)
		return 1
	}
	fmt.Println("Setting context size:", contextSize)
	b = b.WithContextSize(int32(contextSize))
}

configuredContextSize := GetContextSize(mdlConfig, config)
contextSize := int32(4096) // default context size
if configuredContextSize != nil {
	contextSize = int32(*configuredContextSize)

medium

The type cast int32(*configuredContextSize) is redundant because configuredContextSize is of type *int32, so dereferencing it with * already yields an int32 value. You can remove the explicit cast to improve code clarity.

Suggested change
-	contextSize = int32(*configuredContextSize)
+	contextSize = *configuredContextSize

 // Model config takes precedence
-if modelCfg.ContextSize != nil {
+if modelCfg.ContextSize != nil && (*modelCfg.ContextSize == UnlimitedContextSize || *modelCfg.ContextSize > 0) {
 	return *modelCfg.ContextSize

medium

The condition *modelCfg.ContextSize == UnlimitedContextSize is also checked for the backendCfg. It might be cleaner to check for *modelCfg.ContextSize < 0 to handle all negative values, assuming only -1 is a special valid negative value. However, the current implementation is correct if -1 is the only special value intended.
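The condition under discussion can be isolated into a predicate to keep the model-config and backend-config paths consistent. This is a sketch under the assumption that the sentinel constant is -1 (its value is not shown in the diff); the helper name is hypothetical:

```go
package main

import "fmt"

// UnlimitedContextSize mirrors the sentinel the review discusses; its
// actual value in the codebase is assumed to be -1 here.
const UnlimitedContextSize int32 = -1

// validContextSize reports whether a configured *int32 context size
// should be applied: it must be set, and either positive or exactly
// the unlimited sentinel. Other non-positive values are rejected.
func validContextSize(cs *int32) bool {
	if cs == nil {
		return false
	}
	return *cs == UnlimitedContextSize || *cs > 0
}

func main() {
	for _, v := range []int32{-1, 0, 4096, -2} {
		v := v
		fmt.Printf("%d -> %v\n", v, validContextSize(&v))
	}
	fmt.Println("nil ->", validContextSize(nil))
}
```

Sharing one predicate between llamacpp_config.go and vllm_config.go would also address the inconsistency flagged in the next comment.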

Comment on lines 81 to 83
if modelCfg.ContextSize != nil {
	return modelCfg.ContextSize
}

medium

The logic for retrieving ContextSize from modelCfg doesn't check if the value is positive. This is inconsistent with the check for backendCfg and could lead to passing 0 or a negative value to vLLM if it's set in the model configuration. It's safer to ensure the context size is a positive value.

Suggested change
-if modelCfg.ContextSize != nil {
-	return modelCfg.ContextSize
-}
+if modelCfg.ContextSize != nil && *modelCfg.ContextSize > 0 {
+	return modelCfg.ContextSize
+}
