Ollama is great, but it supports neither authentication nor caching. This small wrapper provides API-key-based authentication in front of it.
- Install NVIDIA GPU Docker support on your host. Instructions are on the Ollama Docker page; follow the setup steps but skip the final `docker run` command (Compose will start the container for you).
- Create a `.env` file based on the example.
  - If you already have a set of salted keys, add the local path of the file containing the keys, one per line, to the `API_KEY_FILE` variable, and add the salt to the `API_KEY_SALT` variable.
  - If you only have plain-text keys, keep the `API_KEY_SALT` variable empty and a salt will be created for you.
  - If you don't have any API keys yet, you can either create some random ones and add them to the key file, or provide none and let the proxy generate one for you (see below).
  - Make sure the `OLLAMA_HOST` and `OLLAMA_PORT` variables point to your Ollama server. The values included in the example work with a default Ollama setup on Linux.
- Launch with `docker compose up`.
- If you didn't provide any keys, check the terminal output for a generated API key.
- Once the stack is running, launch your models (they will be pulled if needed), e.g. `docker exec -it ollama_server ollama run gpt-oss:20b`
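If you need to create some random keys for the key file, a minimal sketch using Python's standard `secrets` module (the file name `keys.txt` is an illustration; use whatever path your `API_KEY_FILE` variable points to):

```python
import secrets

def generate_keys(n: int, nbytes: int = 32) -> list[str]:
    """Generate n random, URL-safe API keys."""
    return [secrets.token_urlsafe(nbytes) for _ in range(n)]

keys = generate_keys(3)
# One key per line, the format the key file expects.
with open("keys.txt", "w") as f:
    f.write("\n".join(keys) + "\n")
print(keys)
```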
You can now call your Ollama's `/api/generate` endpoint by POSTing to the ollprox container's `call_model` endpoint, mapped by default to `http://localhost:8000/call_model`. Requests have the same format Ollama expects according to its documentation, but you must now add the API key to the request in a header called `apikey`.
This can be done with cURL:

```shell
curl -X POST http://localhost:8000/call_model \
  -H "APIKEY: secretgardenkey" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:20b",
    "prompt": "What is the capital of France?"
  }'
```
Or with Python `requests`:

```python
import requests

response = requests.post(
    "http://localhost:8000/call_model",
    headers={"APIKEY": "secretgardenkey"},
    json={"model": "llama2", "prompt": "What is the capital of France?"},
)
print(response.json())
```
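Note that Ollama's `/api/generate` streams newline-delimited JSON chunks by default. Whether the proxy forwards the stream unchanged is an assumption here, but if it does, `response.json()` will fail and you will need to either pass `"stream": false` in the payload or join the chunks yourself. A sketch of the joining step, using chunk shapes like those in Ollama's streaming output:

```python
import json

def join_stream(lines):
    """Concatenate the 'response' fields of streamed NDJSON chunks."""
    return "".join(json.loads(line)["response"] for line in lines if line.strip())

# Example chunks shaped like Ollama's streaming responses.
chunks = [
    '{"response": "Par", "done": false}',
    '{"response": "is.", "done": false}',
    '{"response": "", "done": true}',
]
print(join_stream(chunks))  # → Paris.
```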
- To add or rotate keys, just edit the keys file.
- If you revoke a key, it may take up to `KEY_REFRESH` seconds for it to be invalidated.
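The refresh lag can be pictured as a small cache that rereads the key file at most once every `KEY_REFRESH` seconds. The class below is an illustration of that idea, not the proxy's actual code:

```python
import time

class KeyCache:
    """Reload a set of API keys from disk at most once per `refresh` seconds."""

    def __init__(self, path: str, refresh: float):
        self.path = path
        self.refresh = refresh
        self._keys: set[str] = set()
        self._loaded_at = float("-inf")  # force a load on first use

    def valid(self, key: str) -> bool:
        now = time.monotonic()
        if now - self._loaded_at >= self.refresh:
            # Cache expired: reread the key file (one key per line).
            with open(self.path) as f:
                self._keys = {line.strip() for line in f if line.strip()}
            self._loaded_at = now
        return key in self._keys
```

Until the cache expires, a revoked key still validates, which is why invalidation can lag by up to `KEY_REFRESH` seconds.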