LLM Providers Guide
LibreFang ships with a comprehensive model catalog covering 3 native LLM drivers, 49 providers, 230+ builtin models, and 23 aliases. Every provider uses one of three battle-tested drivers: the native Anthropic driver, the native Gemini driver, or the universal OpenAI-compatible driver. This guide is the single source of truth for configuring, selecting, and managing LLM providers in LibreFang.
The model catalog also supports dynamic loading — you can add custom provider definitions by placing TOML files in ~/.librefang/providers/. Any file matching ~/.librefang/providers/*.toml is merged into the catalog at boot, allowing you to add private endpoints, on-premises deployments, or new providers without modifying the core configuration.
Table of Contents
- Quick Setup
- Provider Reference
- Model Catalog
- Model Aliases
- Per-Agent Model Override
- Model Routing
- Cost Tracking
- Fallback Providers
- API Endpoints
- Channel Commands
Quick Setup
The fastest path from zero to running:
# Pick ONE provider — set its env var — done.
export GEMINI_API_KEY="your-key" # Free tier available
# OR
export GROQ_API_KEY="your-key" # Free tier available
# OR
export ANTHROPIC_API_KEY="your-key"
# OR
export OPENAI_API_KEY="your-key"
LibreFang auto-detects which providers have API keys configured at boot. Any model whose provider is authenticated becomes immediately available. Local providers (Ollama, vLLM, LM Studio) require no key at all.
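As a rough sketch of what that boot-time detection implies (an illustration, not LibreFang's actual code): a provider counts as authenticated when its key variable is set and non-empty.

```shell
# Illustrative only: report which cloud providers would be considered
# authenticated, based on whether their key env var is non-empty.
for var in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY GOOGLE_API_KEY GROQ_API_KEY; do
  val="$(printenv "$var" || true)"
  if [ -n "$val" ]; then
    echo "available: $var"
  fi
done
```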
For Gemini specifically, either GEMINI_API_KEY or GOOGLE_API_KEY will work.
Provider Reference
1. Anthropic
| Display Name | Anthropic |
| Driver | Native Anthropic (Messages API) |
| Env Var | ANTHROPIC_API_KEY |
| Base URL | https://api.anthropic.com |
| Key Required | Yes |
| Free Tier | No |
| Auth | x-api-key header |
| Models | 7 |
Available Models:
- claude-opus-4-20250514 (Frontier)
- claude-sonnet-4-20250514 (Smart)
- claude-haiku-4-5-20251001 (Fast)
Setup:
- Sign up at console.anthropic.com
- Create an API key under Settings > API Keys
export ANTHROPIC_API_KEY="sk-ant-..."
2. OpenAI
| Display Name | OpenAI |
| Driver | OpenAI-compatible |
| Env Var | OPENAI_API_KEY |
| Base URL | https://api.openai.com/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
| Models | 18 |
Available Models:
- gpt-4.1 (Frontier)
- gpt-4o (Smart)
- o3-mini (Smart)
- gpt-4.1-mini (Balanced)
- gpt-4o-mini (Fast)
- gpt-4.1-nano (Fast)
Setup:
- Sign up at platform.openai.com
- Create an API key under API Keys
export OPENAI_API_KEY="sk-..."
3. Google Gemini
| Display Name | Google Gemini |
| Driver | Native Gemini (generateContent API) |
| Env Var | GEMINI_API_KEY (or GOOGLE_API_KEY) |
| Base URL | https://generativelanguage.googleapis.com |
| Key Required | Yes |
| Free Tier | Yes (generous free tier) |
| Auth | x-goog-api-key header |
| Models | 10 |
Available Models:
- gemini-2.5-pro (Frontier)
- gemini-2.5-flash (Smart)
- gemini-2.0-flash (Fast)
Setup:
- Go to aistudio.google.com
- Get an API key (free tier included)
export GEMINI_API_KEY="AIza..."
# or
export GOOGLE_API_KEY="AIza..."
Notes: The Gemini driver is a fully native implementation, not an OpenAI-compatible shim. The model name goes in the URL path, the system prompt is sent via systemInstruction, tools via functionDeclarations, and streaming uses streamGenerateContent?alt=sse.
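For reference, a raw call to the Gemini API (as documented by Google; the v1beta path is shown) looks like this:

```shell
# Model name in the URL path; API key via the x-goog-api-key header.
MODEL="gemini-2.5-flash"
URL="https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent"
curl -s "$URL" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "systemInstruction": { "parts": [{ "text": "Be terse." }] },
        "contents": [{ "parts": [{ "text": "Say hello" }] }]
      }'
```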
4. DeepSeek
| Display Name | DeepSeek |
| Driver | OpenAI-compatible |
| Env Var | DEEPSEEK_API_KEY |
| Base URL | https://api.deepseek.com/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
| Models | 4 |
Available Models:
- deepseek-chat (Smart) -- DeepSeek V3
- deepseek-reasoner (Smart) -- DeepSeek R1, no tool support
Setup:
- Sign up at platform.deepseek.com
- Create an API key
export DEEPSEEK_API_KEY="sk-..."
5. Groq
| Display Name | Groq |
| Driver | OpenAI-compatible |
| Env Var | GROQ_API_KEY |
| Base URL | https://api.groq.com/openai/v1 |
| Key Required | Yes |
| Free Tier | Yes (rate-limited) |
| Auth | Authorization: Bearer header |
| Models | 10 |
Available Models:
- llama-3.3-70b-versatile (Balanced)
- mixtral-8x7b-32768 (Balanced)
- llama-3.1-8b-instant (Fast)
- gemma2-9b-it (Fast)
Setup:
- Sign up at console.groq.com
- Create an API key
export GROQ_API_KEY="gsk_..."
Notes: Groq runs open-source models on custom LPU hardware. Extremely fast inference. Free tier has rate limits but is very usable.
6. OpenRouter
| Display Name | OpenRouter |
| Driver | OpenAI-compatible |
| Env Var | OPENROUTER_API_KEY |
| Base URL | https://openrouter.ai/api/v1 |
| Key Required | Yes |
| Free Tier | Yes (8 free models including Step 3.5 Flash, DeepSeek R1, Llama 3.1 8B, etc.) |
| Auth | Authorization: Bearer header |
| Models | 17 |
Available Models:
- openrouter/google/gemini-2.5-flash (Smart) -- cheap, fast, 1M context (default)
- openrouter/anthropic/claude-sonnet-4 (Smart) -- strong reasoning + tools
- openrouter/openai/gpt-4o (Smart) -- GPT-4o via OpenRouter
- openrouter/deepseek/deepseek-chat (Smart) -- DeepSeek V3
- openrouter/meta-llama/llama-3.3-70b-instruct (Balanced) -- Llama 3.3 70B
- openrouter/qwen/qwen-2.5-72b-instruct (Balanced) -- Qwen 2.5 72B
- openrouter/google/gemini-2.5-pro (Frontier) -- Gemini 2.5 Pro
- openrouter/mistralai/mistral-large-latest (Smart) -- Mistral Large
- openrouter/google/gemma-2-9b-it (Fast) -- Gemma 2 9B, free
- openrouter/deepseek/deepseek-r1 (Frontier) -- DeepSeek R1 reasoning
Setup:
- Sign up at openrouter.ai
- Create an API key under Keys
export OPENROUTER_API_KEY="sk-or-..."
Notes: OpenRouter is a unified gateway to 200+ models from many providers. Model IDs use the upstream format (e.g. google/gemini-2.5-flash). You can use any model from OpenRouter's catalog by specifying the full model path with the openrouter/ prefix.
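A raw OpenRouter request illustrates the upstream model-ID convention (standard OpenAI chat-completions format, per OpenRouter's API docs):

```shell
# OpenRouter speaks the OpenAI chat-completions format; the model field
# uses the upstream ID (the openrouter/ prefix is a LibreFang-side name).
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "google/gemini-2.5-flash",
        "messages": [{ "role": "user", "content": "Say hello" }]
      }'
```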
7. Mistral AI
| Display Name | Mistral AI |
| Driver | OpenAI-compatible |
| Env Var | MISTRAL_API_KEY |
| Base URL | https://api.mistral.ai/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
| Models | 6 |
Available Models:
- mistral-large-latest (Smart)
- codestral-latest (Smart)
- mistral-small-latest (Fast)
Setup:
- Sign up at console.mistral.ai
- Create an API key
export MISTRAL_API_KEY="..."
8. Together AI
| Display Name | Together AI |
| Driver | OpenAI-compatible |
| Env Var | TOGETHER_API_KEY |
| Base URL | https://api.together.xyz/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited credits on signup) |
| Auth | Authorization: Bearer header |
| Models | 8 |
Available Models:
- meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo (Frontier)
- Qwen/Qwen2.5-72B-Instruct-Turbo (Smart)
- mistralai/Mixtral-8x22B-Instruct-v0.1 (Balanced)
Setup:
- Sign up at api.together.ai
- Create an API key
export TOGETHER_API_KEY="..."
9. Fireworks AI
| Display Name | Fireworks AI |
| Driver | OpenAI-compatible |
| Env Var | FIREWORKS_API_KEY |
| Base URL | https://api.fireworks.ai/inference/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited credits on signup) |
| Auth | Authorization: Bearer header |
| Models | 5 |
Available Models:
- accounts/fireworks/models/llama-v3p1-405b-instruct (Frontier)
- accounts/fireworks/models/mixtral-8x22b-instruct (Balanced)
Setup:
- Sign up at fireworks.ai
- Create an API key
export FIREWORKS_API_KEY="..."
10. Ollama
| Display Name | Ollama |
| Driver | OpenAI-compatible |
| Env Var | OLLAMA_API_KEY (not required) |
| Base URL | http://localhost:11434/v1 |
| Key Required | No |
| Free Tier | Free (local) |
| Auth | None (local) |
| Models | 3 builtin + auto-discovered |
Available Models (builtin):
- llama3.2 (Local)
- mistral:latest (Local)
- phi3 (Local)
Setup:
- Install Ollama from ollama.com
- Pull a model: ollama pull llama3.2
- Start the server: ollama serve
- No env var needed -- Ollama is always available
Notes: LibreFang auto-discovers models from a running Ollama instance and merges them into the catalog with Local tier and zero cost. Any model you pull becomes usable immediately.
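You can see what auto-discovery would pick up by querying Ollama's /api/tags endpoint yourself:

```shell
# List the models a local Ollama server exposes via /api/tags;
# these are the entries LibreFang merges into the catalog at boot.
curl -s http://localhost:11434/api/tags | python3 -c '
import json, sys
for m in json.load(sys.stdin)["models"]:
    print(m["name"])
'
```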
11. vLLM
| Display Name | vLLM |
| Driver | OpenAI-compatible |
| Env Var | VLLM_API_KEY (not required) |
| Base URL | http://localhost:8000/v1 |
| Key Required | No |
| Free Tier | Free (self-hosted) |
| Auth | None (local) |
| Models | 1 builtin + auto-discovered |
Available Models (builtin):
- vllm-local (Local)
Setup:
- Install vLLM: pip install vllm
- Start the server: python -m vllm.entrypoints.openai.api_server --model <model-name>
- No env var needed
12. LM Studio
| Display Name | LM Studio |
| Driver | OpenAI-compatible |
| Env Var | LMSTUDIO_API_KEY (not required) |
| Base URL | http://localhost:1234/v1 |
| Key Required | No |
| Free Tier | Free (local) |
| Auth | None (local) |
| Models | 1 builtin + auto-discovered |
Available Models (builtin):
- lmstudio-local (Local)
Setup:
- Download LM Studio from lmstudio.ai
- Download a model from the built-in model browser
- Start the local server from the "Local Server" tab
- No env var needed
13. Perplexity AI
| Display Name | Perplexity AI |
| Driver | OpenAI-compatible |
| Env Var | PERPLEXITY_API_KEY |
| Base URL | https://api.perplexity.ai |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
| Models | 2 |
Available Models:
- sonar-pro (Smart) -- online search-augmented
- sonar (Balanced) -- online search-augmented
Setup:
- Sign up at perplexity.ai
- Go to API settings and generate a key
export PERPLEXITY_API_KEY="pplx-..."
Notes: Perplexity models have built-in web search. They do not support tool use.
14. Cohere
| Display Name | Cohere |
| Driver | OpenAI-compatible |
| Env Var | COHERE_API_KEY |
| Base URL | https://api.cohere.com/v2 |
| Key Required | Yes |
| Free Tier | Yes (rate-limited trial) |
| Auth | Authorization: Bearer header |
| Models | 2 |
Available Models:
- command-r-plus (Smart)
- command-r (Balanced)
Setup:
- Sign up at dashboard.cohere.com
- Create an API key
export COHERE_API_KEY="..."
15. AI21 Labs
| Display Name | AI21 Labs |
| Driver | OpenAI-compatible |
| Env Var | AI21_API_KEY |
| Base URL | https://api.ai21.com/studio/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited credits) |
| Auth | Authorization: Bearer header |
| Models | 1 |
Available Models:
- jamba-1.5-large (Smart)
Setup:
- Sign up at studio.ai21.com
- Create an API key
export AI21_API_KEY="..."
16. Cerebras
| Display Name | Cerebras |
| Driver | OpenAI-compatible |
| Env Var | CEREBRAS_API_KEY |
| Base URL | https://api.cerebras.ai/v1 |
| Key Required | Yes |
| Free Tier | Yes (generous free tier) |
| Auth | Authorization: Bearer header |
| Models | 2 |
Available Models:
- cerebras/llama3.3-70b (Balanced)
- cerebras/llama3.1-8b (Fast)
Setup:
- Sign up at cloud.cerebras.ai
- Create an API key
export CEREBRAS_API_KEY="..."
Notes: Cerebras runs inference on wafer-scale chips. Ultra-fast and ultra-cheap ($0.06/M tokens for both input and output on the 70B model).
17. SambaNova
| Display Name | SambaNova |
| Driver | OpenAI-compatible |
| Env Var | SAMBANOVA_API_KEY |
| Base URL | https://api.sambanova.ai/v1 |
| Key Required | Yes |
| Free Tier | Yes (3 free models) |
| Auth | Authorization: Bearer header |
| Models | 3 |
Available Models:
- sambanova/llama-3.3-70b (Balanced)
Setup:
- Sign up at cloud.sambanova.ai
- Create an API key
export SAMBANOVA_API_KEY="..."
18. Hugging Face
| Display Name | Hugging Face |
| Driver | OpenAI-compatible |
| Env Var | HF_API_KEY |
| Base URL | https://api-inference.huggingface.co/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
| Models | 1 |
Available Models:
- hf/meta-llama/Llama-3.3-70B-Instruct (Balanced)
Setup:
- Sign up at huggingface.co
- Create a token under Settings > Access Tokens
export HF_API_KEY="hf_..."
19. xAI
| Display Name | xAI |
| Driver | OpenAI-compatible |
| Env Var | XAI_API_KEY |
| Base URL | https://api.x.ai/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited free credits) |
| Auth | Authorization: Bearer header |
| Models | 2 |
Available Models:
- grok-2 (Smart) -- supports vision
- grok-2-mini (Fast)
Setup:
- Sign up at console.x.ai
- Create an API key
export XAI_API_KEY="xai-..."
20. Replicate
| Display Name | Replicate |
| Driver | OpenAI-compatible |
| Env Var | REPLICATE_API_TOKEN |
| Base URL | https://api.replicate.com/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
| Models | 1 |
Available Models:
- replicate/meta-llama-3.3-70b-instruct (Balanced)
Setup:
- Sign up at replicate.com
- Go to Account > API Tokens
export REPLICATE_API_TOKEN="r8_..."
21. Claude Code
| Display Name | Claude Code |
| Driver | Native Anthropic (Messages API) |
| Env Var | ANTHROPIC_API_KEY |
| Base URL | https://api.anthropic.com |
| Key Required | Yes |
| Free Tier | No |
| Auth | x-api-key header |
| Models | Claude models with extended tool use |
Notes: Claude Code is an Anthropic model variant optimized for agentic coding tasks. It uses the same API key and base URL as Anthropic but targets models tuned for long-horizon tool-use workflows.
22. NVIDIA NIM
| Display Name | NVIDIA NIM |
| Driver | OpenAI-compatible |
| Env Var | NVIDIA_API_KEY |
| Base URL | https://integrate.api.nvidia.com/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited credits) |
| Auth | Authorization: Bearer header |
| Models | Llama, Mistral, and NVIDIA-optimized models |
Setup:
- Sign up at build.nvidia.com
- Create an API key
export NVIDIA_API_KEY="nvapi-..."
23. Voyage AI
| Display Name | Voyage AI |
| Driver | OpenAI-compatible |
| Env Var | VOYAGE_API_KEY |
| Base URL | https://api.voyageai.com/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited credits) |
| Auth | Authorization: Bearer header |
| Models | Embedding and reranking models |
Notes: Voyage AI specializes in embedding and reranking models used for semantic search and RAG pipelines.
Setup:
- Sign up at voyageai.com
- Create an API key
export VOYAGE_API_KEY="pa-..."
24. Anyscale
| Display Name | Anyscale |
| Driver | OpenAI-compatible |
| Env Var | ANYSCALE_API_KEY |
| Base URL | https://api.endpoints.anyscale.com/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited credits) |
| Auth | Authorization: Bearer header |
| Models | Open-source models (Llama, Mistral, etc.) |
Setup:
- Sign up at anyscale.com
- Create an API key
export ANYSCALE_API_KEY="esecret_..."
25. DeepInfra
| Display Name | DeepInfra |
| Driver | OpenAI-compatible |
| Env Var | DEEPINFRA_API_KEY |
| Base URL | https://api.deepinfra.com/v1/openai |
| Key Required | Yes |
| Free Tier | Yes (limited credits) |
| Auth | Authorization: Bearer header |
| Models | Open-source models at low cost |
Setup:
- Sign up at deepinfra.com
- Create an API key
export DEEPINFRA_API_KEY="..."
26. Azure OpenAI
| Display Name | Azure OpenAI |
| Driver | OpenAI-compatible |
| Env Var | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT |
| Base URL | https://<your-resource>.openai.azure.com/openai/deployments/<deployment> |
| Key Required | Yes |
| Free Tier | No |
| Auth | api-key header |
| Models | GPT-4o, GPT-4, and other Azure-hosted models |
Setup:
- Create an Azure OpenAI resource in the Azure Portal
- Deploy a model in Azure OpenAI Studio
- Set environment variables:
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
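A raw Azure OpenAI request shows the deployment-based routing (per Microsoft's REST API docs; the deployment name my-gpt4o and the api-version value are illustrative):

```shell
# Azure routes by deployment name rather than model ID; the api-version
# query parameter is required (the value here is an example).
curl -s "$AZURE_OPENAI_ENDPOINT/openai/deployments/my-gpt4o/chat/completions?api-version=2024-02-01" \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "messages": [{ "role": "user", "content": "Say hello" }] }'
```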
27. Amazon Bedrock
| Display Name | Amazon Bedrock |
| Driver | OpenAI-compatible (via Bedrock Converse API) |
| Env Var | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION |
| Base URL | AWS regional endpoint |
| Key Required | Yes (AWS credentials) |
| Free Tier | No |
| Auth | AWS Signature v4 |
| Models | Claude, Llama, Titan, Mistral via Bedrock |
Setup:
- Enable model access in the AWS Bedrock console
- Configure AWS credentials:
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"
28. GitHub Copilot
| Display Name | GitHub Copilot |
| Driver | OpenAI-compatible (via Copilot token exchange) |
| Env Var | GITHUB_TOKEN |
| Base URL | https://api.githubcopilot.com |
| Key Required | Yes (GitHub PAT or OAuth token) |
| Free Tier | Included with GitHub Copilot subscription |
| Auth | OAuth PKCE flow; exchanges GitHub PAT for short-lived Copilot API token |
| Models | GitHub Copilot-hosted models (GPT-4o, Claude, etc.) |
Setup:
- Subscribe to GitHub Copilot
- Create a Personal Access Token with the copilot scope
export GITHUB_TOKEN="ghp_..."
Notes: The Copilot driver handles OAuth PKCE token exchange automatically — it obtains a short-lived Copilot API token from https://api.github.com/copilot_internal/v2/token and caches it with auto-refresh. The Copilot API uses OpenAI-compatible chat completions format. Tokens are refreshed 5 minutes before expiry.
29. Aider
| Display Name | Aider |
| Type | CLI Provider |
| Driver | Subprocess (CLI) |
| Env Var | None (uses its own provider env vars) |
| Binary | aider (must be on PATH) |
| Key Required | No (uses Aider's own auth) |
| Free Tier | Depends on Aider's configured backend |
Setup:
- Install Aider: pip install aider-install && aider-install
- Configure Aider's LLM provider via its own env vars (e.g. OPENAI_API_KEY)
- No additional LibreFang configuration needed
Notes: CLI Provider — LibreFang spawns the aider binary as a subprocess in non-interactive mode (--message). Aider handles its own LLM provider authentication via standard environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.). Aider's --yes-always, --no-auto-commits, and --no-git flags are applied automatically. Use AIDER_CLI_PATH to override the binary path.
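Putting the flags from the notes together, a one-shot non-interactive Aider call looks roughly like this (the prompt and file path are illustrative):

```shell
# One-shot Aider invocation: --message runs non-interactively, and the
# three flags below match what LibreFang applies automatically.
aider --message "Add a docstring to main()" \
      --yes-always --no-auto-commits --no-git \
      src/main.py
```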
30. Claude Code CLI
| Display Name | Claude Code CLI |
| Type | CLI Provider |
| Driver | Subprocess (CLI) |
| Env Var | None (uses its own OAuth session) |
| Binary | claude (must be on PATH) |
| Key Required | No (uses Claude Code's own session auth) |
| Free Tier | Depends on Claude Code subscription |
Setup:
- Install Claude Code: npm install -g @anthropic-ai/claude-code
- Authenticate: claude auth login
- No additional LibreFang configuration needed
Notes: CLI Provider — LibreFang spawns the claude binary as a subprocess in print mode (-p). The driver strips other providers' API keys from the subprocess environment to prevent leakage. Active subprocess PIDs are tracked and message timeouts (default 5 minutes) prevent hung processes from blocking agents. Vision input is supported via base64-encoded images.
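The timeout behaviour described above can be approximated in plain shell (illustrative; LibreFang implements this internally, not via the coreutils timeout command):

```shell
# Bound a one-shot print-mode call at 300 seconds (LibreFang's default
# message timeout is 5 minutes); report failure instead of hanging.
timeout 300 claude -p "Summarise the failing test output in one line" \
  || echo "claude CLI timed out or failed"
```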
31. Codex CLI
| Display Name | Codex CLI |
| Type | CLI Provider |
| Driver | Subprocess (CLI) |
| Env Var | OPENAI_API_KEY (used by the Codex CLI itself) |
| Binary | codex (must be on PATH) |
| Key Required | Yes (OpenAI API key for Codex CLI) |
| Free Tier | No |
Setup:
- Install Codex CLI: npm install -g @openai/codex
- Set your OpenAI API key: export OPENAI_API_KEY="sk-..."
- No additional LibreFang configuration needed
Notes: CLI Provider — LibreFang spawns the codex binary as a subprocess in quiet mode (-q). The driver strips other providers' API keys from the subprocess environment (preserving only OPENAI_API_KEY and CODEX_* variables). This allows users with Codex CLI installed to use it as an LLM provider without additional configuration.
32. Gemini CLI
| Display Name | Gemini CLI |
| Type | CLI Provider |
| Driver | Subprocess (CLI) |
| Env Var | None (uses Google OAuth by default) |
| Binary | gemini (must be on PATH) |
| Key Required | No (uses Google OAuth) |
| Free Tier | Yes (via Google account) |
Setup:
- Install Gemini CLI: npm install -g @google/gemini-cli
- Authenticate: gemini auth login
- No additional LibreFang configuration needed
Notes: CLI Provider — LibreFang spawns the gemini binary as a subprocess in print mode (-p). The driver preserves GEMINI_* and GOOGLE_* environment variables while stripping other providers' secrets. No separate API key is needed when using Google OAuth authentication.
33. Qwen Code
| Display Name | Qwen Code |
| Type | CLI Provider |
| Driver | Subprocess (CLI) |
| Env Var | None (uses Qwen OAuth by default) |
| Binary | qwen (must be on PATH) |
| Key Required | No (uses Qwen OAuth) |
| Free Tier | Yes (via Alibaba Cloud account) |
Setup:
- Install Qwen Code: npm install -g @alibaba/qwen-code
- Authenticate: qwen auth login
- No additional LibreFang configuration needed
Notes: CLI Provider — LibreFang spawns the qwen binary as a subprocess in print mode (-p). The driver preserves QWEN_* environment variables while stripping other providers' secrets. Supports streaming JSON output from the Qwen Code CLI. No separate API key is needed when using Qwen OAuth authentication.
34. Qwen (DashScope)
| Display Name | Qwen |
| Driver | OpenAI-compatible |
| Env Var | DASHSCOPE_API_KEY |
| Base URL | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| Aliases | dashscope, model_studio |
| Key Required | Yes |
| Free Tier | Yes (limited credits on signup) |
| Auth | Authorization: Bearer header |
Regions:
| Region | Endpoint | API Key Env |
|---|---|---|
| (default) | dashscope.aliyuncs.com | DASHSCOPE_API_KEY |
| intl | dashscope-intl.aliyuncs.com | DASHSCOPE_API_KEY |
| us | dashscope-us.aliyuncs.com | DASHSCOPE_API_KEY |
Setup:
- Sign up at DashScope Console
- Create an API key
export DASHSCOPE_API_KEY="sk-..."
- Optionally select a region in config.toml:
[provider_regions]
qwen = "intl" # or "us"
Notes: Qwen uses Alibaba Cloud's DashScope API. The default endpoint serves mainland China; use the intl or us region for lower latency outside China. Models are defined in the registry TOML and loaded at boot.
35. MiniMax
| Display Name | MiniMax |
| Driver | OpenAI-compatible |
| Env Var | MINIMAX_API_KEY |
| Base URL | https://api.minimax.io/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
Regions:
| Region | Endpoint | API Key Env |
|---|---|---|
| (default) | api.minimax.io | MINIMAX_API_KEY |
| china | api.minimaxi.com | MINIMAX_CN_API_KEY |
Setup:
- Sign up at minimax.io (international) or minimaxi.com (China)
- Create an API key
export MINIMAX_API_KEY="..."
- For the China region:
[provider_regions]
minimax = "china"
export MINIMAX_CN_API_KEY="..."
Notes: MiniMax international (minimax.io) and China (minimaxi.com) use separate API keys. When selecting the china region, LibreFang automatically reads from MINIMAX_CN_API_KEY instead of MINIMAX_API_KEY.
36. Vertex AI
| Display Name | Google Vertex AI |
| Driver | Native Gemini (generateContent API via Vertex) |
| Config Section | [vertex_ai] |
| Env Var | GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION |
| Base URL | https://<location>-aiplatform.googleapis.com |
| Key Required | Yes (service account JSON or gcloud CLI) |
| Free Tier | No |
| Auth | OAuth2 service account or gcloud auth print-access-token |
| Models | Gemini models via Google Cloud Vertex AI enterprise endpoint |
Setup:
- Enable Vertex AI API in Google Cloud Console
- Either create a service account key file, or authenticate with
gcloud auth application-default login
- Set environment variables:
# Option A: Service account key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export VERTEX_PROJECT="your-gcp-project"
export VERTEX_LOCATION="us-central1"
# Option B: gcloud CLI (no key file needed)
gcloud auth application-default login
export VERTEX_PROJECT="your-gcp-project"
export VERTEX_LOCATION="us-central1"
- Configure in config.toml:
[vertex_ai]
project = "your-gcp-project"
location = "us-central1"
Notes: Vertex AI uses the same Gemini generateContent API format as the native Gemini driver but authenticates via Google Cloud OAuth2 instead of an API key. Access tokens are cached with a ~50-minute TTL and auto-refreshed before expiry. The endpoint format is https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent.
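Filling in that endpoint format, a raw Vertex call looks like this (gcloud auth print-access-token supplies the OAuth2 bearer token):

```shell
# Vertex uses an OAuth2 bearer token instead of an API key; the model sits
# in the same URL-path position as in the native Gemini driver.
LOCATION="us-central1"
MODEL="gemini-2.5-flash"
URL="https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${VERTEX_PROJECT}/locations/${LOCATION}/publishers/google/models/${MODEL}:generateContent"
curl -s "$URL" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{ "contents": [{ "parts": [{ "text": "Say hello" }] }] }'
```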
Dynamic Provider Loading
Place custom provider definitions in ~/.librefang/providers/. Each .toml file defines one provider:
# ~/.librefang/providers/my-endpoint.toml
id = "my-endpoint"
display_name = "My Private Endpoint"
driver = "openai_compatible"
base_url = "https://llm.internal.company.com/v1"
api_key_env = "MY_ENDPOINT_KEY"
key_required = true
[[models]]
id = "my-model-7b"
display_name = "My Model 7B"
tier = "Balanced"
context_window = 32768
max_output_tokens = 4096
input_cost_per_m = 0.0
output_cost_per_m = 0.0
supports_tools = true
supports_vision = false
Files in ~/.librefang/providers/ are loaded at startup and merged into the catalog alongside the builtin providers.
Provider Regions
Some providers offer region-specific endpoints. Regions are defined in registry TOML files with an optional api_key_env override:
# In a provider's registry TOML:
[provider.regions.intl]
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
[provider.regions.china]
base_url = "https://api.minimaxi.com/v1"
api_key_env = "MINIMAX_CN_API_KEY" # Optional: override the default API key env var
Select a region in config.toml:
[provider_regions]
qwen = "intl"
minimax = "china"
Priority: region selections are resolved first, then any explicit [provider_urls] entry is applied on top; if both are set for the same provider, provider_urls wins.
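The precedence rule can be sketched as a tiny resolver (illustrative only, not LibreFang's code):

```shell
# Effective base URL: an explicit provider_urls entry wins, then the
# selected region's URL, then the builtin default.
resolve_base_url() {
  explicit_url="$1"; region_url="$2"; default_url="$3"
  if [ -n "$explicit_url" ]; then
    echo "$explicit_url"
  elif [ -n "$region_url" ]; then
    echo "$region_url"
  else
    echo "$default_url"
  fi
}

# qwen = "intl" selected, no provider_urls override:
resolve_base_url "" \
  "https://dashscope-intl.aliyuncs.com/compatible-mode/v1" \
  "https://dashscope.aliyuncs.com/compatible-mode/v1"
# → https://dashscope-intl.aliyuncs.com/compatible-mode/v1
```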
Model Catalog
The catalog below lists builtin models grouped by provider (a representative subset of the 230+ total; see the notes after the table). Pricing is per million tokens.
| # | Model ID | Display Name | Provider | Tier | Context Window | Max Output | Input $/M | Output $/M | Tools | Vision |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | claude-opus-4-20250514 | Claude Opus 4 | anthropic | Frontier | 200,000 | 32,000 | $15.00 | $75.00 | Yes | Yes |
| 2 | claude-sonnet-4-20250514 | Claude Sonnet 4 | anthropic | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes |
| 3 | claude-haiku-4-5-20251001 | Claude Haiku 4.5 | anthropic | Fast | 200,000 | 8,192 | $0.25 | $1.25 | Yes | Yes |
| 4 | gpt-4.1 | GPT-4.1 | openai | Frontier | 1,047,576 | 32,768 | $2.00 | $8.00 | Yes | Yes |
| 5 | gpt-4o | GPT-4o | openai | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes |
| 6 | o3-mini | o3-mini | openai | Smart | 200,000 | 100,000 | $1.10 | $4.40 | Yes | No |
| 7 | gpt-4.1-mini | GPT-4.1 Mini | openai | Balanced | 1,047,576 | 32,768 | $0.40 | $1.60 | Yes | Yes |
| 8 | gpt-4o-mini | GPT-4o Mini | openai | Fast | 128,000 | 16,384 | $0.15 | $0.60 | Yes | Yes |
| 9 | gpt-4.1-nano | GPT-4.1 Nano | openai | Fast | 1,047,576 | 32,768 | $0.10 | $0.40 | Yes | No |
| 10 | gemini-2.5-pro | Gemini 2.5 Pro | gemini | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes |
| 11 | gemini-2.5-flash | Gemini 2.5 Flash | gemini | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes |
| 12 | gemini-2.0-flash | Gemini 2.0 Flash | gemini | Fast | 1,048,576 | 8,192 | $0.10 | $0.40 | Yes | Yes |
| 13 | deepseek-chat | DeepSeek V3 | deepseek | Smart | 64,000 | 8,192 | $0.27 | $1.10 | Yes | No |
| 14 | deepseek-reasoner | DeepSeek R1 | deepseek | Smart | 64,000 | 8,192 | $0.55 | $2.19 | No | No |
| 15 | llama-3.3-70b-versatile | Llama 3.3 70B | groq | Balanced | 128,000 | 32,768 | $0.059 | $0.079 | Yes | No |
| 16 | mixtral-8x7b-32768 | Mixtral 8x7B | groq | Balanced | 32,768 | 4,096 | $0.024 | $0.024 | Yes | No |
| 17 | llama-3.1-8b-instant | Llama 3.1 8B | groq | Fast | 128,000 | 8,192 | $0.05 | $0.08 | Yes | No |
| 18 | gemma2-9b-it | Gemma 2 9B | groq | Fast | 8,192 | 4,096 | $0.02 | $0.02 | No | No |
| 19 | openrouter/google/gemini-2.5-flash | Gemini 2.5 Flash (OpenRouter) | openrouter | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes |
| 20 | openrouter/anthropic/claude-sonnet-4 | Claude Sonnet 4 (OpenRouter) | openrouter | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes |
| 21 | openrouter/openai/gpt-4o | GPT-4o (OpenRouter) | openrouter | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes |
| 22 | openrouter/deepseek/deepseek-chat | DeepSeek V3 (OpenRouter) | openrouter | Smart | 128,000 | 32,768 | $0.14 | $0.28 | Yes | No |
| 23 | openrouter/meta-llama/llama-3.3-70b-instruct | Llama 3.3 70B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.39 | $0.39 | Yes | No |
| 24 | openrouter/qwen/qwen-2.5-72b-instruct | Qwen 2.5 72B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.36 | $0.36 | Yes | No |
| 25 | openrouter/google/gemini-2.5-pro | Gemini 2.5 Pro (OpenRouter) | openrouter | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes |
| 26 | openrouter/mistralai/mistral-large-latest | Mistral Large (OpenRouter) | openrouter | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No |
| 27 | openrouter/google/gemma-2-9b-it | Gemma 2 9B (OpenRouter) | openrouter | Fast | 8,192 | 4,096 | $0.00 | $0.00 | No | No |
| 28 | openrouter/deepseek/deepseek-r1 | DeepSeek R1 (OpenRouter) | openrouter | Frontier | 128,000 | 32,768 | $0.55 | $2.19 | No | No |
| 29 | mistral-large-latest | Mistral Large | mistral | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No |
| 30 | codestral-latest | Codestral | mistral | Smart | 32,000 | 8,192 | $0.30 | $0.90 | Yes | No |
| 31 | mistral-small-latest | Mistral Small | mistral | Fast | 128,000 | 8,192 | $0.10 | $0.30 | Yes | No |
| 32 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 405B (Together) | together | Frontier | 130,000 | 4,096 | $3.50 | $3.50 | Yes | No |
| 33 | Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B (Together) | together | Smart | 32,768 | 4,096 | $0.20 | $0.60 | Yes | No |
| 34 | mistralai/Mixtral-8x22B-Instruct-v0.1 | Mixtral 8x22B (Together) | together | Balanced | 65,536 | 4,096 | $0.60 | $0.60 | Yes | No |
| 35 | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B (Fireworks) | fireworks | Frontier | 131,072 | 16,384 | $3.00 | $3.00 | Yes | No |
| 36 | accounts/fireworks/models/mixtral-8x22b-instruct | Mixtral 8x22B (Fireworks) | fireworks | Balanced | 65,536 | 4,096 | $0.90 | $0.90 | Yes | No |
| 37 | llama3.2 | Llama 3.2 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | Yes | No |
| 38 | mistral:latest | Mistral (Ollama) | ollama | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 39 | phi3 | Phi-3 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | No | No |
| 40 | vllm-local | vLLM Local Model | vllm | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 41 | lmstudio-local | LM Studio Local Model | lmstudio | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 42 | sonar-pro | Sonar Pro | perplexity | Smart | 200,000 | 8,192 | $3.00 | $15.00 | No | No |
| 43 | sonar | Sonar | perplexity | Balanced | 128,000 | 8,192 | $1.00 | $5.00 | No | No |
| 44 | command-r-plus | Command R+ | cohere | Smart | 128,000 | 4,096 | $2.50 | $10.00 | Yes | No |
| 45 | command-r | Command R | cohere | Balanced | 128,000 | 4,096 | $0.15 | $0.60 | Yes | No |
| 46 | jamba-1.5-large | Jamba 1.5 Large | ai21 | Smart | 256,000 | 4,096 | $2.00 | $8.00 | Yes | No |
| 47 | cerebras/llama3.3-70b | Llama 3.3 70B (Cerebras) | cerebras | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 48 | cerebras/llama3.1-8b | Llama 3.1 8B (Cerebras) | cerebras | Fast | 128,000 | 8,192 | $0.01 | $0.01 | Yes | No |
| 49 | sambanova/llama-3.3-70b | Llama 3.3 70B (SambaNova) | sambanova | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 50 | grok-2 | Grok 2 | xai | Smart | 131,072 | 32,768 | $2.00 | $10.00 | Yes | Yes |
| 51 | grok-2-mini | Grok 2 Mini | xai | Fast | 131,072 | 32,768 | $0.30 | $0.50 | Yes | No |
| 52 | hf/meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B (HF) | huggingface | Balanced | 128,000 | 4,096 | $0.30 | $0.30 | No | No |
| 53 | replicate/meta-llama-3.3-70b-instruct | Llama 3.3 70B (Replicate) | replicate | Balanced | 128,000 | 4,096 | $0.40 | $0.40 | No | No |
Model Tiers:
| Tier | Description | Typical Use |
|---|---|---|
| Frontier | Most capable, highest cost | Orchestration, architecture, security audits |
| Smart | Strong reasoning, moderate cost | Coding, code review, research, analysis |
| Balanced | Good cost/quality tradeoff | Planning, writing, DevOps, day-to-day tasks |
| Fast | Cheapest cloud inference | Ops, translation, simple Q&A, health checks |
| Local | Self-hosted, zero cost | Privacy-first, offline, development |
Notes:
- Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve is merged into the catalog with Local tier and zero cost.
- The entries above are a representative subset of the 230+ builtin models. The full catalog includes additional models per provider, plus runtime auto-discovered models that vary per installation.
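Since pricing is per million tokens, per-call cost is a simple pro-rata calculation. For example, a claude-sonnet-4-20250514 call ($3.00/M input, $15.00/M output):

```shell
# 10,000 input tokens and 2,000 output tokens:
# 0.01M * $3.00 + 0.002M * $15.00 = $0.03 + $0.03 = $0.06
awk -v in_tok=10000 -v out_tok=2000 -v in_price=3.00 -v out_price=15.00 \
  'BEGIN { printf "$%.4f\n", in_tok/1e6*in_price + out_tok/1e6*out_price }'
# → $0.0600
```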
Model Aliases
All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.
| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-20250514 |
| claude-sonnet | claude-sonnet-4-20250514 |
| haiku | claude-haiku-4-5-20251001 |
| claude-haiku | claude-haiku-4-5-20251001 |
| opus | claude-opus-4-20250514 |
| claude-opus | claude-opus-4-20250514 |
| gpt4 | gpt-4o |
| gpt4o | gpt-4o |
| gpt4-mini | gpt-4o-mini |
| flash | gemini-2.5-flash |
| gemini-flash | gemini-2.5-flash |
| gemini-pro | gemini-2.5-pro |
| deepseek | deepseek-chat |
| llama | llama-3.3-70b-versatile |
| llama-70b | llama-3.3-70b-versatile |
| mixtral | mixtral-8x7b-32768 |
| mistral | mistral-large-latest |
| codestral | codestral-latest |
| grok | grok-2 |
| grok-mini | grok-2-mini |
| sonar | sonar-pro |
| jamba | jamba-1.5-large |
| command-r | command-r-plus |
You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
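The case-insensitive lookup can be sketched as follows. This is a minimal illustration, not LibreFang's actual resolver: `resolve_model` is a hypothetical helper name, and `ALIASES` holds only a subset of the table above.

```python
# Illustrative sketch of case-insensitive alias resolution.
# resolve_model is a hypothetical name; ALIASES is a subset of
# the alias table, not the full 23-entry catalog.
ALIASES = {
    "sonnet": "claude-sonnet-4-20250514",
    "haiku": "claude-haiku-4-5-20251001",
    "flash": "gemini-2.5-flash",
    "grok": "grok-2",
}

def resolve_model(name: str) -> str:
    """Return the canonical model ID; non-aliases pass through unchanged."""
    return ALIASES.get(name.strip().lower(), name)

print(resolve_model("Sonnet"))        # resolves despite the capital S
print(resolve_model("my-local-model"))  # unknown names pass through
```

Passing unknown names through unchanged is what lets the same lookup serve both aliases and canonical IDs.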
Per-Agent Model Override
Each agent in your config.toml can specify its own model, overriding the global default:
```toml
# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus"                      # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"   # cheap Groq model for simple ops

[[agents]]
name = "coder"
model = "gemini-2.5-flash"          # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"                 # Perplexity with built-in web search

# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514"  # never auto-routed
```
When `pinned_model` is set in an agent manifest, that agent always uses the specified model, regardless of routing configuration. This is used in Stabilisation mode (`KernelMode::Stable`), where the model is frozen for production reliability.
Model Routing
LibreFang can automatically select the cheapest model capable of handling each query. This is configured per agent via `ModelRoutingConfig`.
How It Works
1. The `ModelRouter` scores each incoming `CompletionRequest` based on heuristics
2. The score maps to a `TaskComplexity` tier: `Simple`, `Medium`, or `Complex`
3. Each tier has a pre-configured model
Scoring Heuristics
| Signal | Weight | Logic |
|---|---|---|
| Total message length | 1 point per ~4 chars | Rough token proxy |
| Tool availability | +20 per tool defined | Tools imply multi-step work |
| Code markers | +30 per marker found | Backticks, `fn`, `def`, `class`, `import`, `function`, `async`, `await`, `struct`, `impl`, `return` |
| Conversation depth | +15 per message > 10 | Deep context = harder reasoning |
| System prompt length | +1 per 10 chars > 500 | Long system prompts imply complex tasks |
Thresholds
| Complexity | Score Range | Default Model |
|---|---|---|
| Simple | score < 100 | claude-haiku-4-5-20251001 |
| Medium | 100 <= score < 500 | claude-sonnet-4-20250514 |
| Complex | score >= 500 | claude-sonnet-4-20250514 |
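The heuristics and thresholds above can be sketched roughly as follows. The weights come from the tables, but `score_request` and `pick_tier` are illustrative names, not the actual `ModelRouter` implementation, and the marker check (presence, not occurrence count) is an assumption.

```python
# Rough sketch of the routing heuristics described above.
# score_request / pick_tier are illustrative names; counting each
# marker once (presence, not occurrences) is an assumption.
CODE_MARKERS = ["`", "fn ", "def ", "class ", "import ", "function ",
                "async ", "await ", "struct ", "impl ", "return "]

def score_request(messages, tools=(), system_prompt=""):
    text = " ".join(messages)
    score = len(text) // 4                        # ~1 point per 4 chars
    score += 20 * len(tools)                      # tools imply multi-step work
    score += 30 * sum(1 for m in CODE_MARKERS if m in text)
    score += 15 * max(0, len(messages) - 10)      # deep context = harder
    score += max(0, len(system_prompt) - 500) // 10
    return score

def pick_tier(score, simple_threshold=100, complex_threshold=500):
    if score < simple_threshold:
        return "Simple"
    return "Medium" if score < complex_threshold else "Complex"

print(pick_tier(score_request(["What time is it?"])))  # Simple
```

A short chat with no tools and no code markers scores near zero and routes to the cheap `Simple` model; a long, tool-heavy, code-laden request climbs past 500 and gets the `Complex` model.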
Configuration
```toml
# In agent manifest or config.toml
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500
```
The router also integrates with the model catalog:
- `validate_models()` checks that all configured model IDs exist in the catalog
- `resolve_aliases()` expands aliases to canonical IDs (e.g., `"sonnet"` becomes `"claude-sonnet-4-20250514"`)
Cost Tracking
LibreFang tracks the cost of every LLM call and can enforce per-agent spending quotas.
Per-Response Cost Estimation
After each LLM call, cost is calculated as:
```
cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate
```
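In code, the same calculation is a one-liner. This is an illustrative sketch, not the `MeteringEngine` itself; rates are in dollars per million tokens.

```python
# Illustrative cost estimator, not the actual MeteringEngine.
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD; rates are dollars per million tokens."""
    return ((input_tokens / 1_000_000) * input_rate
            + (output_tokens / 1_000_000) * output_rate)

# e.g. 1,200 input + 340 output tokens on a $3/$15-per-million model:
print(round(estimate_cost(1200, 340, 3.00, 15.00), 4))  # 0.0087
```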
The `MeteringEngine` first checks the model catalog for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.
Cost Rates (per million tokens)
| Model Pattern | Input $/M | Output $/M |
|---|---|---|
| `*haiku*` | $0.25 | $1.25 |
| `*sonnet*` | $3.00 | $15.00 |
| `*opus*` | $15.00 | $75.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4.1-nano` | $0.10 | $0.40 |
| `gpt-4.1-mini` | $0.40 | $1.60 |
| `gpt-4.1` | $2.00 | $8.00 |
| `o3-mini` | $1.10 | $4.40 |
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.5-flash` | $0.15 | $0.60 |
| `gemini-2.0-flash` | $0.10 | $0.40 |
| `deepseek-reasoner` / `deepseek-r1` | $0.55 | $2.19 |
| `*deepseek*` | $0.27 | $1.10 |
| `*cerebras*` | $0.06 | $0.06 |
| `*sambanova*` | $0.06 | $0.06 |
| `*replicate*` | $0.40 | $0.40 |
| `*llama*` / `*mixtral*` | $0.05 | $0.10 |
| `*qwen*` | $0.20 | $0.60 |
| `mistral-large*` | $2.00 | $6.00 |
| `*mistral*` (other) | $0.10 | $0.30 |
| `command-r-plus` | $2.50 | $10.00 |
| `command-r` | $0.15 | $0.60 |
| `sonar-pro` | $3.00 | $15.00 |
| `*sonar*` (other) | $1.00 | $5.00 |
| `grok-2-mini` / `grok-mini` | $0.30 | $0.50 |
| `*grok*` (other) | $2.00 | $10.00 |
| `*jamba*` | $2.00 | $8.00 |
| Default (unknown) | $1.00 | $3.00 |
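The fallback lookup can be approximated with ordered substring matching, where specific patterns (like `sonar-pro`) must be checked before generic ones (like `*sonar*`). This is a sketch with a subset of the table above; `lookup_rates` is an illustrative name, not the real heuristic's API.

```python
# Ordered (pattern, input_rate, output_rate) triples; specific
# patterns come before generic ones. Subset of the table above;
# lookup_rates is an illustrative name, not the real API.
FALLBACK_RATES = [
    ("gpt-4o-mini", 0.15, 0.60),
    ("gpt-4o", 2.50, 10.00),
    ("sonar-pro", 3.00, 15.00),
    ("sonar", 1.00, 5.00),
    ("haiku", 0.25, 1.25),
    ("sonnet", 3.00, 15.00),
    ("opus", 15.00, 75.00),
]
DEFAULT_RATES = (1.00, 3.00)  # unknown models

def lookup_rates(model_id: str):
    """First matching pattern wins; unknown models get the default."""
    for pattern, inp, out in FALLBACK_RATES:
        if pattern in model_id:
            return (inp, out)
    return DEFAULT_RATES
```

Ordering matters: if `sonar` were listed before `sonar-pro`, every Perplexity model would be priced at the generic sonar rate.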
Quota Enforcement
Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a `QuotaExceeded` error.
```toml
# Per-agent quota in config.toml
[[agents]]
name = "chatbot"

[agents.resources]
max_cost_per_hour_usd = 5.00  # cap at $5/hour
```
The usage footer (when enabled) appends cost information to each response:
> Cost: $0.0042 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514
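Conceptually, the hourly check amounts to a sliding-window sum over recent call costs. The sketch below is a simplification under that assumption; the real enforcement lives in the kernel and raises the `QuotaExceeded` error, for which a plain `RuntimeError` stands in here.

```python
import time

class HourlyQuota:
    """Simplified sliding-window cost quota (illustrative only)."""
    def __init__(self, max_cost_per_hour_usd: float):
        self.limit = max_cost_per_hour_usd
        self.events = []  # (timestamp, cost) pairs

    def spent_last_hour(self, now=None) -> float:
        now = time.time() if now is None else now
        return sum(c for t, c in self.events if now - t < 3600)

    def charge(self, cost: float, now=None):
        now = time.time() if now is None else now
        if self.spent_last_hour(now) + cost > self.limit:
            raise RuntimeError("QuotaExceeded")  # stand-in for the real error
        self.events.append((now, cost))
```

Because the window slides, an agent that burns its whole budget in one burst regains capacity an hour later rather than at a fixed wall-clock reset.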
Fallback Providers
The `FallbackDriver` wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.
Behavior
- On success: returns immediately
- On rate-limit / overload errors (`429`, `529`): bubbles up for retry logic (does NOT fail over, because the primary should be retried after backoff)
- On all other errors: logs a warning and tries the next driver in the chain
- If all drivers fail: returns the last error
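The failover logic above can be sketched in a few lines. This is illustrative Python, not the real `FallbackDriver` (which is a driver wrapper with richer error types); drivers are modeled as plain callables and errors as a status code.

```python
# Illustrative sketch of the FallbackDriver failover policy.
# Drivers are modeled as callables; ProviderError is a stand-in
# for the real error types.
RETRYABLE = {429, 529}  # rate limit / overload: bubble up, don't fail over

class ProviderError(Exception):
    def __init__(self, status):
        super().__init__(f"provider error {status}")
        self.status = status

def complete_with_fallback(drivers, request):
    """Try each driver in order; re-raise retryable errors immediately."""
    last_error = None
    for driver in drivers:
        try:
            return driver(request)   # success: return immediately
        except ProviderError as err:
            if err.status in RETRYABLE:
                raise                # let backoff/retry handle the primary
            last_error = err         # otherwise try the next driver
    raise last_error                 # all drivers failed: surface last error
```

The key design choice is that transient `429`/`529` errors never trigger failover: retrying the primary after backoff is cheaper and keeps traffic on the preferred model.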
Configuration
Fallback chains are configured in your agent manifest or config.toml. The `FallbackDriver` is used automatically when an agent is in Stabilisation mode (`KernelMode::Stable`) or when multiple providers are configured for reliability.
```toml
# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]
```
The fallback driver creates a chain: `AnthropicDriver` -> `GeminiDriver` -> `OpenAIDriver(Groq)`.
API Endpoints
List All Models
```
GET /api/models
```
Returns the complete model catalog with metadata, pricing, and feature flags.
Response:
```json
[
  {
    "id": "claude-sonnet-4-20250514",
    "display_name": "Claude Sonnet 4",
    "provider": "anthropic",
    "tier": "Smart",
    "context_window": 200000,
    "max_output_tokens": 64000,
    "input_cost_per_m": 3.0,
    "output_cost_per_m": 15.0,
    "supports_tools": true,
    "supports_vision": true,
    "supports_streaming": true,
    "aliases": ["sonnet", "claude-sonnet"]
  }
]
```
Get Specific Model
```
GET /api/models/{id}
```
Returns a single model entry. Supports both canonical IDs and aliases.
```
GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514
```
List Aliases
```
GET /api/models/aliases
```
Returns a map of all alias-to-canonical-ID mappings.
Response:
```json
{
  "sonnet": "claude-sonnet-4-20250514",
  "haiku": "claude-haiku-4-5-20251001",
  "flash": "gemini-2.5-flash",
  "grok": "grok-2"
}
```
List Providers
```
GET /api/providers
```
Returns all 49 providers with auth status and model counts.
Response:
```json
[
  {
    "id": "anthropic",
    "display_name": "Anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": "https://api.anthropic.com",
    "key_required": true,
    "auth_status": "Configured",
    "model_count": 3
  },
  {
    "id": "ollama",
    "display_name": "Ollama",
    "api_key_env": "OLLAMA_API_KEY",
    "base_url": "http://localhost:11434/v1",
    "key_required": false,
    "auth_status": "NotRequired",
    "model_count": 5
  }
]
```
Auth status values: `Configured`, `Missing`, `NotRequired`.
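Since auth detection only tests for the presence of the environment variable, the status mapping can be sketched as below. This mirrors the described behavior but is not LibreFang's `detect_auth()`; the function name and signature here are illustrative.

```python
import os

def auth_status(api_key_env: str, key_required: bool) -> str:
    """Presence-only check; the secret value is never logged or returned."""
    if not key_required:
        return "NotRequired"  # local providers such as Ollama / vLLM
    return "Configured" if os.environ.get(api_key_env) else "Missing"
```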
Set Provider API Key
```
POST /api/providers/{name}/key
Content-Type: application/json

{ "api_key": "sk-..." }
```
Configures an API key for a provider at runtime (stored as a `Zeroizing<String>`, wiped from memory on drop).
Remove Provider API Key
```
DELETE /api/providers/{name}/key
```
Removes the configured API key for a provider.
Test Provider Connection
```
POST /api/providers/{name}/test
```
Sends a minimal test request to verify the provider is reachable and the API key is valid.
Channel Commands
Two chat commands are available in any channel for inspecting models and providers:
/models
Lists all available models with their tier, provider, and context window. Only shows models from providers that have authentication configured (or do not require it).
```
/models
```
Example output:
```
Available models (12):

Frontier:
  claude-opus-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-pro (Google Gemini) — 1M ctx

Smart:
  claude-sonnet-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-flash (Google Gemini) — 1M ctx
  deepseek-chat (DeepSeek) — 64K ctx

Balanced:
  llama-3.3-70b-versatile (Groq) — 128K ctx

Fast:
  claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
  gemini-2.0-flash (Google Gemini) — 1M ctx

Local:
  llama3.2 (Ollama) — 128K ctx
```
/providers
Lists all 49 providers with their authentication status.
```
/providers
```
Example output:
```
LLM Providers (49):
  Anthropic        ANTHROPIC_API_KEY   Configured   3 models
  OpenAI           OPENAI_API_KEY      Missing      6 models
  Google Gemini    GEMINI_API_KEY      Configured   3 models
  DeepSeek         DEEPSEEK_API_KEY    Missing      2 models
  Groq             GROQ_API_KEY        Configured   4 models
  Ollama           (no key needed)     Ready        3 models
  vLLM             (no key needed)     Ready        1 model
  LM Studio        (no key needed)     Ready        1 model
  ...
```
Environment Variables Summary
Quick reference for all provider environment variables:
| Provider | Env Var | Required |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Yes |
| OpenAI | OPENAI_API_KEY | Yes |
| Google Gemini | GEMINI_API_KEY or GOOGLE_API_KEY | Yes |
| DeepSeek | DEEPSEEK_API_KEY | Yes |
| Groq | GROQ_API_KEY | Yes |
| OpenRouter | OPENROUTER_API_KEY | Yes |
| Mistral AI | MISTRAL_API_KEY | Yes |
| Together AI | TOGETHER_API_KEY | Yes |
| Fireworks AI | FIREWORKS_API_KEY | Yes |
| Ollama | OLLAMA_API_KEY | No |
| vLLM | VLLM_API_KEY | No |
| LM Studio | LMSTUDIO_API_KEY | No |
| Perplexity AI | PERPLEXITY_API_KEY | Yes |
| Cohere | COHERE_API_KEY | Yes |
| AI21 Labs | AI21_API_KEY | Yes |
| Cerebras | CEREBRAS_API_KEY | Yes |
| SambaNova | SAMBANOVA_API_KEY | Yes |
| Hugging Face | HF_API_KEY | Yes |
| xAI | XAI_API_KEY | Yes |
| Replicate | REPLICATE_API_TOKEN | Yes |
| Claude Code | ANTHROPIC_API_KEY | Yes |
| NVIDIA NIM | NVIDIA_API_KEY | Yes |
| Voyage AI | VOYAGE_API_KEY | Yes |
| Anyscale | ANYSCALE_API_KEY | Yes |
| DeepInfra | DEEPINFRA_API_KEY | Yes |
| Azure OpenAI | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT | Yes |
| Amazon Bedrock | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION | Yes |
| Google Vertex AI | GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION | Yes |
Security Notes
- All API keys are stored as `Zeroizing<String>`: the key material is automatically overwritten with zeros when the value is dropped from memory.
- Auth detection (`detect_auth()`) only checks `std::env::var()` for presence; it never reads or logs the actual secret value.
- Provider API keys set via the REST API (`POST /api/providers/{name}/key`) follow the same zeroization policy.
- The health endpoint (`/api/health`) never exposes provider auth status or API keys. Detailed info is behind `/api/health/detail`, which requires authentication.
- All `DriverConfig` and `KernelConfig` structs implement `Debug` with secret redaction: API keys are printed as `"***"` in logs.