LLMUnity lets you integrate, run and deploy LLMs (Large Language Models) in the Unity engine.
It is built on top of the awesome llama.cpp and llamafile libraries.
- 💻 Cross-platform! Supports Windows, Linux and macOS (supported versions)
- 🏠 Runs locally without internet access but also supports remote servers
- ⚡ Fast inference on CPU and GPU
- 🤗 Supports the major LLM models (supported models)
- 🔧 Easy to set up, call with a single line of code
- 💰 Free to use for both personal and commercial purposes
To install the package you can follow the typical asset / package process in Unity:

Method 1: Install the asset using the asset store
- Open the LLMUnity asset page and click Add to My Assets
- Open the Package Manager: Window > Package Manager
- Select the Packages: My Assets option from the drop-down
- Select the LLMUnity package, click Download and then Import

Method 2: Install the asset using the GitHub repo:
- Open the Package Manager: Window > Package Manager
- Click the + button and select Add package from git URL
- Use the repository URL https://github.com/undreamai/LLMUnity.git and click Add

For a step-by-step tutorial you can have a look at our guide:
Create a GameObject for the LLM ♟️:
- Create an empty GameObject. In the GameObject Inspector click Add Component and select the LLM script (Scripts>LLM).
- Download the default model with the Download Model button (this will take a while as it is ~4GB).
  You can also load your own model in .gguf format with the Load model button (see Use your own model).
- Define the role of your AI in the Prompt. You can also define the name of the AI (AI Name) and the player (Player Name).
- (Optional) By default the LLM script is set up to receive the reply from the model as it is produced in real-time (recommended). If you prefer to receive the full reply in one go, you can deselect the Stream option.
- (Optional) Adjust the server or model settings to your preference (see Options).

In your script you can then use it as follows 🦄:

    using UnityEngine;
    using LLMUnity;

    // a MonoBehaviour, so that the llm property can be set in the Inspector
    public class MyScript : MonoBehaviour {
      public LLM llm;

      void HandleReply(string reply){
        // do something with the reply from the model
        Debug.Log(reply);
      }

      void Game(){
        // your game function
        // ...
        string message = "Hello bot!";
        _ = llm.Chat(message, HandleReply);
        // ...
      }
    }

You can also specify a function to call when the model reply has been completed.
This is useful if the Stream option is selected for continuous output from the model (default behaviour):

    void ReplyCompleted(){
      // do something when the reply from the model is complete
      Debug.Log("The AI replied");
    }

    void Game(){
      // your game function
      // ...
      string message = "Hello bot!";
      _ = llm.Chat(message, HandleReply, ReplyCompleted);
      // ...
    }

- Finally, in the Inspector of the GameObject of your script, select the LLM GameObject created above as the llm property.

That's all ✨!
You can also:
Wait for the reply before proceeding to the next lines of code
For this you can use the async/await functionality:

    async void Game(){
      // your game function
      // ...
      string message = "Hello bot!";
      // execution continues past this line only after the reply is complete
      await llm.Chat(message, HandleReply, ReplyCompleted);
      // ...
    }

Process the prompt at the beginning of your app for faster initial processing time

    void WarmupCompleted(){
      // do something when the warmup is complete
      Debug.Log("The AI is warm");
    }

    void Game(){
      // your game function
      // ...
      _ = llm.Warmup(WarmupCompleted);
      // ...
    }

The Samples~ folder contains several examples of interaction 🤖:
- SimpleInteraction: Demonstrates simple interaction between a player and an AI
- ServerClient: Demonstrates simple interaction between a player and multiple AIs using an LLM and an LLMClient
- ChatBot: Demonstrates interaction between a player and an AI with a UI similar to a messaging app (see image below)

If you install the package as an asset, the samples will already be in the Assets/Samples folder.
Otherwise, if you installed it with the GitHub URL, install a sample as follows:
- Open the Package Manager: Window > Package Manager
- Select the LLMUnity Package. From the Samples Tab, click Import next to the sample you want to install.

The samples can be run with the Scene.unity scene they contain inside their folder.
In the scene, select the LLM GameObject and click the Download Model button to download the default model.
You can also load your own model in .gguf format with the Load model button (see Use your own model).
Save the scene, run and enjoy!
Alternative models can be downloaded from HuggingFace.
The required model format is .gguf as defined by llama.cpp.
The easiest way is to download gguf models directly from TheBloke, who has converted an astonishing number of models 🌈!
Other model formats can be converted to gguf with the convert.py script of llama.cpp as described here.
❕ Before using any model make sure you check its license ❕
In addition to the LLM server functionality, LLMUnity defines the LLMClient client class that handles the client functionality.
The LLMClient contains a subset of the options of the LLM class described in Options.
It can be used to set up multiple clients with different options, e.g. different prompts, that use the same server.
This is important as multiple server instances would require additional compute resources.
To use multiple instances, you can define one LLM GameObject (as described in How to use) and then multiple LLMClient objects.
See the ServerClient sample for a server-client example.
The LLMClient can be configured to connect to a remote instance by providing the IP address of the server in the host property.
The server can be either an LLMUnity server or a standard llama.cpp server.
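
As a rough illustration of the multi-client setup described above, the following sketch drives two LLMClient objects that share one server. It assumes that LLMClient offers the same Chat(message, callback) call shown earlier for LLM; this is not confirmed by the text here, so check the ServerClient sample for the exact usage. The class name TwoCharacters, the field names shopkeeper and guard, and the method AskBoth are hypothetical and exist only for this example.

```csharp
using UnityEngine;
using LLMUnity;

// Hypothetical example script: two characters with different prompts
// that send their requests to the same LLM server.
public class TwoCharacters : MonoBehaviour
{
    // Assign both in the Inspector. Each LLMClient carries its own
    // Prompt and AI Name, and (for a remote server) its own host.
    public LLMClient shopkeeper;
    public LLMClient guard;

    void HandleShopkeeperReply(string reply)
    {
        Debug.Log("Shopkeeper: " + reply);
    }

    void HandleGuardReply(string reply)
    {
        Debug.Log("Guard: " + reply);
    }

    // Call this from your game logic, e.g. from a UI button.
    // Assumption: LLMClient.Chat has the same signature as the
    // llm.Chat(message, callback) call used earlier in this document.
    public async void AskBoth()
    {
        await shopkeeper.Chat("What do you sell?", HandleShopkeeperReply);
        await guard.Chat("May I enter the city?", HandleGuardReply);
    }
}
```

Only one LLM GameObject is needed in the scene for this; the two clients differ only in their own options, so no second server instance has to be started.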
- Show/Hide Advanced Options: toggle to show/hide the advanced options below

- Num Threads: number of threads to use (default: -1 = all)
- Num GPU Layers: number of model layers to offload to the GPU. If set to 0 the GPU is not used. Use a large number, e.g. >30, to utilise the GPU as much as possible. If the user's GPU is not supported, the LLM will fall back to the CPU
- Stream: select to receive the reply from the model as it is produced (recommended!). If it is not selected, the full reply from the model is received in one go
- Advanced options:
  - Parallel Prompts: number of prompts that can happen in parallel (default: -1 = number of LLM/LLMClient objects)
  - Debug: select to log the output of the model in the Unity Editor
  - Port: port to run the server

- Download model: click to download the default model (Mistral 7B Instruct)
- Load model: click to load your own model in .gguf format
- Load lora: click to load a LORA model in .bin format
- Model: the model being used (inside the Assets/StreamingAssets folder)
- Lora: the LORA model being used (inside the Assets/StreamingAssets folder)
- Advanced options:
  - Context Size: size of the prompt context (0 = context size of the model)
  - Batch Size: batch size for prompt processing (default: 512)
  - Seed: seed for reproducibility. For random results every time, select -1
  - Temperature: LLM temperature, lower values give more deterministic answers
  - Top K: top-k sampling (default: 40, 0 = disabled)
  - Top P: top-p sampling (default: 0.9, 1.0 = disabled)
  - Num Predict: number of tokens to predict (default: 256, -1 = infinity, -2 = until context filled)

- Player Name: the name of the player
- AI Name: the name of the AI
- Prompt: a description of the AI role

LLMUnity is licensed under MIT (LICENSE.md) and uses third-party software with MIT and Apache licenses (Third Party Notices.md).


