LLM Providers Guide

LibreFang ships with a comprehensive model catalog covering 3 native LLM drivers, 49 providers, 230+ builtin models, and 23 aliases. Every API-based provider uses one of three battle-tested drivers: the native Anthropic driver, the native Gemini driver, or the universal OpenAI-compatible driver (CLI providers such as Aider and Claude Code CLI use a separate subprocess driver, described in their sections below). This guide is the single source of truth for configuring, selecting, and managing LLM providers in LibreFang.

The model catalog also supports dynamic loading — you can add custom provider definitions by placing TOML files in ~/.librefang/providers/. Any file matching ~/.librefang/providers/*.toml is merged into the catalog at boot, allowing you to add private endpoints, on-premises deployments, or new providers without modifying the core configuration.


Table of Contents

  1. Quick Setup
  2. Provider Reference
  3. Model Catalog
  4. Model Aliases
  5. Per-Agent Model Override
  6. Model Routing
  7. Cost Tracking
  8. Fallback Providers
  9. API Endpoints
  10. Channel Commands

Quick Setup

The fastest path from zero to running:

# Pick ONE provider — set its env var — done.
export GEMINI_API_KEY="your-key"        # Free tier available
# OR
export GROQ_API_KEY="your-key"          # Free tier available
# OR
export ANTHROPIC_API_KEY="your-key"
# OR
export OPENAI_API_KEY="your-key"

LibreFang auto-detects which providers have API keys configured at boot. Any model whose provider is authenticated becomes immediately available. Local providers (Ollama, vLLM, LM Studio) require no key at all.
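Conceptually, boot-time detection amounts to checking each provider's documented env var(s); a minimal sketch (the mapping is a subset of the provider tables below, and the function name is illustrative, not LibreFang's actual API):

```python
import os

# Illustrative subset of the provider -> env var mapping documented below.
PROVIDER_ENV_VARS = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY", "GOOGLE_API_KEY"],  # either key works
    "groq": ["GROQ_API_KEY"],
    "ollama": [],  # local providers need no key at all
}

def detect_available_providers(env=os.environ):
    """Return providers that are usable: keyless, or any of their env vars is set."""
    return [
        provider
        for provider, keys in PROVIDER_ENV_VARS.items()
        if not keys or any(env.get(k) for k in keys)
    ]
```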

For Gemini specifically, either GEMINI_API_KEY or GOOGLE_API_KEY will work.


Provider Reference

1. Anthropic

  Display Name: Anthropic
  Driver: Native Anthropic (Messages API)
  Env Var: ANTHROPIC_API_KEY
  Base URL: https://api.anthropic.com
  Key Required: Yes
  Free Tier: No
  Auth: x-api-key header
  Models: 7

Available Models:

  • claude-opus-4-20250514 (Frontier)
  • claude-sonnet-4-20250514 (Smart)
  • claude-haiku-4-5-20251001 (Fast)

Setup:

  1. Sign up at console.anthropic.com
  2. Create an API key under Settings > API Keys
  3. export ANTHROPIC_API_KEY="sk-ant-..."

2. OpenAI

  Display Name: OpenAI
  Driver: OpenAI-compatible
  Env Var: OPENAI_API_KEY
  Base URL: https://api.openai.com/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 18

Available Models:

  • gpt-4.1 (Frontier)
  • gpt-4o (Smart)
  • o3-mini (Smart)
  • gpt-4.1-mini (Balanced)
  • gpt-4o-mini (Fast)
  • gpt-4.1-nano (Fast)

Setup:

  1. Sign up at platform.openai.com
  2. Create an API key under API Keys
  3. export OPENAI_API_KEY="sk-..."

3. Google Gemini

  Display Name: Google Gemini
  Driver: Native Gemini (generateContent API)
  Env Var: GEMINI_API_KEY (or GOOGLE_API_KEY)
  Base URL: https://generativelanguage.googleapis.com
  Key Required: Yes
  Free Tier: Yes (generous free tier)
  Auth: x-goog-api-key header
  Models: 10

Available Models:

  • gemini-2.5-pro (Frontier)
  • gemini-2.5-flash (Smart)
  • gemini-2.0-flash (Fast)

Setup:

  1. Go to aistudio.google.com
  2. Get an API key (free tier included)
  3. export GEMINI_API_KEY="AIza..." or export GOOGLE_API_KEY="AIza..."

Notes: The Gemini driver is a fully native implementation; it is not OpenAI-compatible. The model name goes in the URL path, the system prompt is sent via systemInstruction, tools via functionDeclarations, and streaming via streamGenerateContent?alt=sse.
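To make that wire format concrete, here is a sketch of how a generateContent request is shaped (an illustrative helper, not LibreFang code; for streaming, the path suffix becomes :streamGenerateContent?alt=sse):

```python
def build_generate_content_request(model, user_text, system_prompt=None, api_key="..."):
    """Build URL, headers, and body for Gemini's generateContent endpoint.

    Note: the model is part of the URL path (not the JSON body), auth is the
    x-goog-api-key header, and the system prompt rides in systemInstruction.
    """
    url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    body = {"contents": [{"role": "user", "parts": [{"text": user_text}]}]}
    if system_prompt:
        body["systemInstruction"] = {"parts": [{"text": system_prompt}]}
    return url, headers, body
```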


4. DeepSeek

  Display Name: DeepSeek
  Driver: OpenAI-compatible
  Env Var: DEEPSEEK_API_KEY
  Base URL: https://api.deepseek.com/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 4

Available Models:

  • deepseek-chat (Smart) -- DeepSeek V3
  • deepseek-reasoner (Smart) -- DeepSeek R1, no tool support

Setup:

  1. Sign up at platform.deepseek.com
  2. Create an API key
  3. export DEEPSEEK_API_KEY="sk-..."

5. Groq

  Display Name: Groq
  Driver: OpenAI-compatible
  Env Var: GROQ_API_KEY
  Base URL: https://api.groq.com/openai/v1
  Key Required: Yes
  Free Tier: Yes (rate-limited)
  Auth: Authorization: Bearer header
  Models: 10

Available Models:

  • llama-3.3-70b-versatile (Balanced)
  • mixtral-8x7b-32768 (Balanced)
  • llama-3.1-8b-instant (Fast)
  • gemma2-9b-it (Fast)

Setup:

  1. Sign up at console.groq.com
  2. Create an API key
  3. export GROQ_API_KEY="gsk_..."

Notes: Groq runs open-source models on custom LPU hardware. Extremely fast inference. Free tier has rate limits but is very usable.


6. OpenRouter

  Display Name: OpenRouter
  Driver: OpenAI-compatible
  Env Var: OPENROUTER_API_KEY
  Base URL: https://openrouter.ai/api/v1
  Key Required: Yes
  Free Tier: Yes (8 free models including Step 3.5 Flash, DeepSeek R1, Llama 3.1 8B, etc.)
  Auth: Authorization: Bearer header
  Models: 17

Available Models:

  • openrouter/google/gemini-2.5-flash (Smart) -- cheap, fast, 1M context (default)
  • openrouter/anthropic/claude-sonnet-4 (Smart) -- strong reasoning + tools
  • openrouter/openai/gpt-4o (Smart) -- GPT-4o via OpenRouter
  • openrouter/deepseek/deepseek-chat (Smart) -- DeepSeek V3
  • openrouter/meta-llama/llama-3.3-70b-instruct (Balanced) -- Llama 3.3 70B
  • openrouter/qwen/qwen-2.5-72b-instruct (Balanced) -- Qwen 2.5 72B
  • openrouter/google/gemini-2.5-pro (Frontier) -- Gemini 2.5 Pro
  • openrouter/mistralai/mistral-large-latest (Smart) -- Mistral Large
  • openrouter/google/gemma-2-9b-it (Fast) -- Gemma 2 9B, free
  • openrouter/deepseek/deepseek-r1 (Frontier) -- DeepSeek R1 reasoning

Setup:

  1. Sign up at openrouter.ai
  2. Create an API key under Keys
  3. export OPENROUTER_API_KEY="sk-or-..."

Notes: OpenRouter is a unified gateway to 200+ models from many providers. Model IDs use the upstream format (e.g. google/gemini-2.5-flash). You can use any model from OpenRouter's catalog by specifying the full model path with the openrouter/ prefix.


7. Mistral AI

  Display Name: Mistral AI
  Driver: OpenAI-compatible
  Env Var: MISTRAL_API_KEY
  Base URL: https://api.mistral.ai/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 6

Available Models:

  • mistral-large-latest (Smart)
  • codestral-latest (Smart)
  • mistral-small-latest (Fast)

Setup:

  1. Sign up at console.mistral.ai
  2. Create an API key
  3. export MISTRAL_API_KEY="..."

8. Together AI

  Display Name: Together AI
  Driver: OpenAI-compatible
  Env Var: TOGETHER_API_KEY
  Base URL: https://api.together.xyz/v1
  Key Required: Yes
  Free Tier: Yes (limited credits on signup)
  Auth: Authorization: Bearer header
  Models: 8

Available Models:

  • meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo (Frontier)
  • Qwen/Qwen2.5-72B-Instruct-Turbo (Smart)
  • mistralai/Mixtral-8x22B-Instruct-v0.1 (Balanced)

Setup:

  1. Sign up at api.together.ai
  2. Create an API key
  3. export TOGETHER_API_KEY="..."

9. Fireworks AI

  Display Name: Fireworks AI
  Driver: OpenAI-compatible
  Env Var: FIREWORKS_API_KEY
  Base URL: https://api.fireworks.ai/inference/v1
  Key Required: Yes
  Free Tier: Yes (limited credits on signup)
  Auth: Authorization: Bearer header
  Models: 5

Available Models:

  • accounts/fireworks/models/llama-v3p1-405b-instruct (Frontier)
  • accounts/fireworks/models/mixtral-8x22b-instruct (Balanced)

Setup:

  1. Sign up at fireworks.ai
  2. Create an API key
  3. export FIREWORKS_API_KEY="..."

10. Ollama

  Display Name: Ollama
  Driver: OpenAI-compatible
  Env Var: OLLAMA_API_KEY (not required)
  Base URL: http://localhost:11434/v1
  Key Required: No
  Free Tier: Free (local)
  Auth: None (local)
  Models: 3 builtin + auto-discovered

Available Models (builtin):

  • llama3.2 (Local)
  • mistral:latest (Local)
  • phi3 (Local)

Setup:

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start the server: ollama serve
  4. No env var needed -- Ollama is always available

Notes: LibreFang auto-discovers models from a running Ollama instance and merges them into the catalog with Local tier and zero cost. Any model you pull becomes usable immediately.


11. vLLM

  Display Name: vLLM
  Driver: OpenAI-compatible
  Env Var: VLLM_API_KEY (not required)
  Base URL: http://localhost:8000/v1
  Key Required: No
  Free Tier: Free (self-hosted)
  Auth: None (local)
  Models: 1 builtin + auto-discovered

Available Models (builtin):

  • vllm-local (Local)

Setup:

  1. Install vLLM: pip install vllm
  2. Start the server: python -m vllm.entrypoints.openai.api_server --model <model-name>
  3. No env var needed

12. LM Studio

  Display Name: LM Studio
  Driver: OpenAI-compatible
  Env Var: LMSTUDIO_API_KEY (not required)
  Base URL: http://localhost:1234/v1
  Key Required: No
  Free Tier: Free (local)
  Auth: None (local)
  Models: 1 builtin + auto-discovered

Available Models (builtin):

  • lmstudio-local (Local)

Setup:

  1. Download LM Studio from lmstudio.ai
  2. Download a model from the built-in model browser
  3. Start the local server from the "Local Server" tab
  4. No env var needed

13. Perplexity AI

  Display Name: Perplexity AI
  Driver: OpenAI-compatible
  Env Var: PERPLEXITY_API_KEY
  Base URL: https://api.perplexity.ai
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • sonar-pro (Smart) -- online search-augmented
  • sonar (Balanced) -- online search-augmented

Setup:

  1. Sign up at perplexity.ai
  2. Go to API settings and generate a key
  3. export PERPLEXITY_API_KEY="pplx-..."

Notes: Perplexity models have built-in web search. They do not support tool use.


14. Cohere

  Display Name: Cohere
  Driver: OpenAI-compatible
  Env Var: COHERE_API_KEY
  Base URL: https://api.cohere.com/v2
  Key Required: Yes
  Free Tier: Yes (rate-limited trial)
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • command-r-plus (Smart)
  • command-r (Balanced)

Setup:

  1. Sign up at dashboard.cohere.com
  2. Create an API key
  3. export COHERE_API_KEY="..."

15. AI21 Labs

  Display Name: AI21 Labs
  Driver: OpenAI-compatible
  Env Var: AI21_API_KEY
  Base URL: https://api.ai21.com/studio/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: 1

Available Models:

  • jamba-1.5-large (Smart)

Setup:

  1. Sign up at studio.ai21.com
  2. Create an API key
  3. export AI21_API_KEY="..."

16. Cerebras

  Display Name: Cerebras
  Driver: OpenAI-compatible
  Env Var: CEREBRAS_API_KEY
  Base URL: https://api.cerebras.ai/v1
  Key Required: Yes
  Free Tier: Yes (generous free tier)
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • cerebras/llama3.3-70b (Balanced)
  • cerebras/llama3.1-8b (Fast)

Setup:

  1. Sign up at cloud.cerebras.ai
  2. Create an API key
  3. export CEREBRAS_API_KEY="..."

Notes: Cerebras runs inference on wafer-scale chips. Ultra-fast and ultra-cheap ($0.06/M tokens for both input and output on the 70B model).


17. SambaNova

  Display Name: SambaNova
  Driver: OpenAI-compatible
  Env Var: SAMBANOVA_API_KEY
  Base URL: https://api.sambanova.ai/v1
  Key Required: Yes
  Free Tier: Yes (3 free models)
  Auth: Authorization: Bearer header
  Models: 3

Available Models:

  • sambanova/llama-3.3-70b (Balanced)

Setup:

  1. Sign up at cloud.sambanova.ai
  2. Create an API key
  3. export SAMBANOVA_API_KEY="..."

18. Hugging Face

  Display Name: Hugging Face
  Driver: OpenAI-compatible
  Env Var: HF_API_KEY
  Base URL: https://api-inference.huggingface.co/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 1

Available Models:

  • hf/meta-llama/Llama-3.3-70B-Instruct (Balanced)

Setup:

  1. Sign up at huggingface.co
  2. Create a token under Settings > Access Tokens
  3. export HF_API_KEY="hf_..."

19. xAI

  Display Name: xAI
  Driver: OpenAI-compatible
  Env Var: XAI_API_KEY
  Base URL: https://api.x.ai/v1
  Key Required: Yes
  Free Tier: Yes (limited free credits)
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • grok-2 (Smart) -- supports vision
  • grok-2-mini (Fast)

Setup:

  1. Sign up at console.x.ai
  2. Create an API key
  3. export XAI_API_KEY="xai-..."

20. Replicate

  Display Name: Replicate
  Driver: OpenAI-compatible
  Env Var: REPLICATE_API_TOKEN
  Base URL: https://api.replicate.com/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 1

Available Models:

  • replicate/meta-llama-3.3-70b-instruct (Balanced)

Setup:

  1. Sign up at replicate.com
  2. Go to Account > API Tokens
  3. export REPLICATE_API_TOKEN="r8_..."

21. Claude Code

  Display Name: Claude Code
  Driver: Native Anthropic (Messages API)
  Env Var: ANTHROPIC_API_KEY
  Base URL: https://api.anthropic.com
  Key Required: Yes
  Free Tier: No
  Auth: x-api-key header
  Models: Claude models with extended tool use

Notes: Claude Code is an Anthropic model variant optimized for agentic coding tasks. It uses the same API key and base URL as Anthropic but targets models tuned for long-horizon tool-use workflows.


22. NVIDIA NIM

  Display Name: NVIDIA NIM
  Driver: OpenAI-compatible
  Env Var: NVIDIA_API_KEY
  Base URL: https://integrate.api.nvidia.com/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Llama, Mistral, and NVIDIA-optimized models

Setup:

  1. Sign up at build.nvidia.com
  2. Create an API key
  3. export NVIDIA_API_KEY="nvapi-..."

23. Voyage AI

  Display Name: Voyage AI
  Driver: OpenAI-compatible
  Env Var: VOYAGE_API_KEY
  Base URL: https://api.voyageai.com/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Embedding and reranking models

Notes: Voyage AI specializes in embedding and reranking models used for semantic search and RAG pipelines.

Setup:

  1. Sign up at voyageai.com
  2. Create an API key
  3. export VOYAGE_API_KEY="pa-..."

24. Anyscale

  Display Name: Anyscale
  Driver: OpenAI-compatible
  Env Var: ANYSCALE_API_KEY
  Base URL: https://api.endpoints.anyscale.com/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Open-source models (Llama, Mistral, etc.)

Setup:

  1. Sign up at anyscale.com
  2. Create an API key
  3. export ANYSCALE_API_KEY="esecret_..."

25. DeepInfra

  Display Name: DeepInfra
  Driver: OpenAI-compatible
  Env Var: DEEPINFRA_API_KEY
  Base URL: https://api.deepinfra.com/v1/openai
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Open-source models at low cost

Setup:

  1. Sign up at deepinfra.com
  2. Create an API key
  3. export DEEPINFRA_API_KEY="..."

26. Azure OpenAI

  Display Name: Azure OpenAI
  Driver: OpenAI-compatible
  Env Var: AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT
  Base URL: https://<your-resource>.openai.azure.com/openai/deployments/<deployment>
  Key Required: Yes
  Free Tier: No
  Auth: api-key header
  Models: GPT-4o, GPT-4, and other Azure-hosted models

Setup:

  1. Create an Azure OpenAI resource in the Azure Portal
  2. Deploy a model in Azure OpenAI Studio
  3. Set environment variables:
    export AZURE_OPENAI_API_KEY="..."
    export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
    

27. Amazon Bedrock

  Display Name: Amazon Bedrock
  Driver: OpenAI-compatible (via Bedrock Converse API)
  Env Var: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
  Base URL: AWS regional endpoint
  Key Required: Yes (AWS credentials)
  Free Tier: No
  Auth: AWS Signature v4
  Models: Claude, Llama, Titan, Mistral via Bedrock

Setup:

  1. Enable model access in the AWS Bedrock console
  2. Configure AWS credentials:
    export AWS_ACCESS_KEY_ID="AKIA..."
    export AWS_SECRET_ACCESS_KEY="..."
    export AWS_REGION="us-east-1"
    

28. GitHub Copilot

  Display Name: GitHub Copilot
  Driver: OpenAI-compatible (via Copilot token exchange)
  Env Var: GITHUB_TOKEN
  Base URL: https://api.githubcopilot.com
  Key Required: Yes (GitHub PAT or OAuth token)
  Free Tier: Included with GitHub Copilot subscription
  Auth: OAuth PKCE flow; exchanges GitHub PAT for short-lived Copilot API token
  Models: GitHub Copilot-hosted models (GPT-4o, Claude, etc.)

Setup:

  1. Subscribe to GitHub Copilot
  2. Create a Personal Access Token with copilot scope
  3. export GITHUB_TOKEN="ghp_..."

Notes: The Copilot driver handles OAuth PKCE token exchange automatically — it obtains a short-lived Copilot API token from https://api.github.com/copilot_internal/v2/token and caches it with auto-refresh. The Copilot API uses OpenAI-compatible chat completions format. Tokens are refreshed 5 minutes before expiry.
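The refresh-before-expiry behaviour described above is a standard token-cache pattern. A hedged sketch of the idea (names and the 300-second buffer mirror the description above; this is not LibreFang's actual implementation):

```python
import time

REFRESH_BUFFER_SECS = 300  # refresh 5 minutes before expiry, per the note above

class TokenCache:
    """Cache a short-lived API token, refreshing it shortly before it expires."""

    def __init__(self, fetch_token):
        # fetch_token() -> (token, expires_at_unix_secs), e.g. the Copilot
        # token-exchange call; injected here so the cache stays testable.
        self._fetch = fetch_token
        self._token = None
        self._expires_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        if self._token is None or now >= self._expires_at - REFRESH_BUFFER_SECS:
            self._token, self._expires_at = self._fetch()
        return self._token
```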


29. Aider

  Display Name: Aider
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses its own provider env vars)
  Binary: aider (must be on PATH)
  Key Required: No (uses Aider's own auth)
  Free Tier: Depends on Aider's configured backend

Setup:

  1. Install Aider: pip install aider-install && aider-install
  2. Configure Aider's LLM provider via its own env vars (e.g. OPENAI_API_KEY)
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the aider binary as a subprocess in non-interactive mode (--message). Aider handles its own LLM provider authentication via standard environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.). Aider's --yes-always, --no-auto-commits, and --no-git flags are applied automatically. Use AIDER_CLI_PATH to override the binary path.


30. Claude Code CLI

  Display Name: Claude Code CLI
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses its own OAuth session)
  Binary: claude (must be on PATH)
  Key Required: No (uses Claude Code's own session auth)
  Free Tier: Depends on Claude Code subscription

Setup:

  1. Install Claude Code: npm install -g @anthropic-ai/claude-code
  2. Authenticate: claude auth login
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the claude binary as a subprocess in print mode (-p). The driver strips other providers' API keys from the subprocess environment to prevent leakage. Active subprocess PIDs are tracked and message timeouts (default 5 minutes) prevent hung processes from blocking agents. Vision input is supported via base64-encoded images.


31. Codex CLI

  Display Name: Codex CLI
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: OPENAI_API_KEY (used by the Codex CLI itself)
  Binary: codex (must be on PATH)
  Key Required: Yes (OpenAI API key for Codex CLI)
  Free Tier: No

Setup:

  1. Install Codex CLI: npm install -g @openai/codex
  2. export OPENAI_API_KEY="sk-..."
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the codex binary as a subprocess in quiet mode (-q). The driver strips other providers' API keys from the subprocess environment (preserving only OPENAI_API_KEY and CODEX_* variables). This allows users with Codex CLI installed to use it as an LLM provider without additional configuration.


32. Gemini CLI

  Display Name: Gemini CLI
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses Google OAuth by default)
  Binary: gemini (must be on PATH)
  Key Required: No (uses Google OAuth)
  Free Tier: Yes (via Google account)

Setup:

  1. Install Gemini CLI: npm install -g @google/gemini-cli
  2. Authenticate: gemini auth login
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the gemini binary as a subprocess in print mode (-p). The driver preserves GEMINI_* and GOOGLE_* environment variables while stripping other providers' secrets. No separate API key is needed when using Google OAuth authentication.


33. Qwen Code

  Display Name: Qwen Code
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses Qwen OAuth by default)
  Binary: qwen (must be on PATH)
  Key Required: No (uses Qwen OAuth)
  Free Tier: Yes (via Alibaba Cloud account)

Setup:

  1. Install Qwen Code: npm install -g @alibaba/qwen-code
  2. Authenticate: qwen auth login
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the qwen binary as a subprocess in print mode (-p). The driver preserves QWEN_* environment variables while stripping other providers' secrets. Supports streaming JSON output from the Qwen Code CLI. No separate API key is needed when using Qwen OAuth authentication.


34. Qwen (DashScope)

  Display Name: Qwen
  Driver: OpenAI-compatible
  Env Var: DASHSCOPE_API_KEY
  Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
  Aliases: dashscope, model_studio
  Key Required: Yes
  Free Tier: Yes (limited credits on signup)
  Auth: Authorization: Bearer header

Regions:

Region | Endpoint | API Key Env
(default) | dashscope.aliyuncs.com | DASHSCOPE_API_KEY
intl | dashscope-intl.aliyuncs.com | DASHSCOPE_API_KEY
us | dashscope-us.aliyuncs.com | DASHSCOPE_API_KEY

Setup:

  1. Sign up at DashScope Console
  2. Create an API key
  3. export DASHSCOPE_API_KEY="sk-..."
  4. Optionally select a region in config.toml:
    [provider_regions]
    qwen = "intl"    # or "us"
    

Notes: Qwen uses Alibaba Cloud's DashScope API. The default endpoint serves mainland China; use the intl or us region for lower latency outside China. Models are defined in the registry TOML and loaded at boot.


35. MiniMax

  Display Name: MiniMax
  Driver: OpenAI-compatible
  Env Var: MINIMAX_API_KEY
  Base URL: https://api.minimax.io/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header

Regions:

Region | Endpoint | API Key Env
(default) | api.minimax.io | MINIMAX_API_KEY
china | api.minimaxi.com | MINIMAX_CN_API_KEY

Setup:

  1. Sign up at minimax.io (international) or minimaxi.com (China)
  2. Create an API key
  3. export MINIMAX_API_KEY="..."
  4. For China region:
    [provider_regions]
    minimax = "china"
    
    export MINIMAX_CN_API_KEY="..."
    

Notes: MiniMax international (minimax.io) and China (minimaxi.com) use separate API keys. When selecting the china region, LibreFang automatically reads from MINIMAX_CN_API_KEY instead of MINIMAX_API_KEY.


36. Vertex AI

  Display Name: Google Vertex AI
  Driver: Native Gemini (generateContent API via Vertex)
  Config Section: [vertex_ai]
  Env Var: GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION
  Base URL: https://<location>-aiplatform.googleapis.com
  Key Required: Yes (service account JSON or gcloud CLI)
  Free Tier: No
  Auth: OAuth2 service account or gcloud auth print-access-token
  Models: Gemini models via Google Cloud Vertex AI enterprise endpoint

Setup:

  1. Enable Vertex AI API in Google Cloud Console
  2. Either create a service account key file, or authenticate with gcloud auth application-default login
  3. Set environment variables:
    # Option A: Service account key file
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
    export VERTEX_PROJECT="your-gcp-project"
    export VERTEX_LOCATION="us-central1"
    
    # Option B: gcloud CLI (no key file needed)
    gcloud auth application-default login
    export VERTEX_PROJECT="your-gcp-project"
    export VERTEX_LOCATION="us-central1"
    
  4. Configure in config.toml:
    [vertex_ai]
    project = "your-gcp-project"
    location = "us-central1"
    

Notes: Vertex AI uses the same Gemini generateContent API format as the native Gemini driver but authenticates via Google Cloud OAuth2 instead of an API key. Access tokens are cached with a ~50-minute TTL and auto-refreshed before expiry. The endpoint format is https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent.


Dynamic Provider Loading

Place custom provider definitions in ~/.librefang/providers/. Each .toml file defines one provider:

# ~/.librefang/providers/my-endpoint.toml
id = "my-endpoint"
display_name = "My Private Endpoint"
driver = "openai_compatible"
base_url = "https://llm.internal.company.com/v1"
api_key_env = "MY_ENDPOINT_KEY"
key_required = true

[[models]]
id = "my-model-7b"
display_name = "My Model 7B"
tier = "Balanced"
context_window = 32768
max_output_tokens = 4096
input_cost_per_m = 0.0
output_cost_per_m = 0.0
supports_tools = true
supports_vision = false

Files in ~/.librefang/providers/ are loaded at startup and merged into the catalog alongside the builtin providers.

Provider Regions

Some providers offer region-specific endpoints. Regions are defined in registry TOML files with an optional api_key_env override:

# In a provider's registry TOML:
[provider.regions.intl]
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

[provider.regions.china]
base_url = "https://api.minimaxi.com/v1"
api_key_env = "MINIMAX_CN_API_KEY"    # Optional: override the default API key env var

Select a region in config.toml:

[provider_regions]
qwen = "intl"
minimax = "china"

Priority: Region selections are applied before explicit [provider_urls] entries. If both are set for the same provider, provider_urls wins.
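Under those rules, base-URL resolution for a provider reduces to a three-step precedence check; a sketch (field names follow the TOML snippets above, the helper itself is illustrative):

```python
def resolve_base_url(provider, provider_urls, provider_regions, registry):
    """Resolve a provider's base URL: explicit [provider_urls] beats the region
    selection, which beats the registry default."""
    if provider in provider_urls:               # explicit provider_urls wins
        return provider_urls[provider]
    region = provider_regions.get(provider)
    if region:                                  # then the selected region's endpoint
        return registry[provider]["regions"][region]["base_url"]
    return registry[provider]["base_url"]       # else the builtin default
```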


Model Catalog

The catalog of builtin models, grouped by provider. The table shows a representative subset of the 230+ entries; pricing is per million tokens.

# | Model ID | Display Name | Provider | Tier | Context Window | Max Output | Input $/M | Output $/M | Tools | Vision
1 | claude-opus-4-20250514 | Claude Opus 4 | anthropic | Frontier | 200,000 | 32,000 | $15.00 | $75.00 | Yes | Yes
2 | claude-sonnet-4-20250514 | Claude Sonnet 4 | anthropic | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes
3 | claude-haiku-4-5-20251001 | Claude Haiku 4.5 | anthropic | Fast | 200,000 | 8,192 | $0.25 | $1.25 | Yes | Yes
4 | gpt-4.1 | GPT-4.1 | openai | Frontier | 1,047,576 | 32,768 | $2.00 | $8.00 | Yes | Yes
5 | gpt-4o | GPT-4o | openai | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes
6 | o3-mini | o3-mini | openai | Smart | 200,000 | 100,000 | $1.10 | $4.40 | Yes | No
7 | gpt-4.1-mini | GPT-4.1 Mini | openai | Balanced | 1,047,576 | 32,768 | $0.40 | $1.60 | Yes | Yes
8 | gpt-4o-mini | GPT-4o Mini | openai | Fast | 128,000 | 16,384 | $0.15 | $0.60 | Yes | Yes
9 | gpt-4.1-nano | GPT-4.1 Nano | openai | Fast | 1,047,576 | 32,768 | $0.10 | $0.40 | Yes | No
10 | gemini-2.5-pro | Gemini 2.5 Pro | gemini | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes
11 | gemini-2.5-flash | Gemini 2.5 Flash | gemini | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes
12 | gemini-2.0-flash | Gemini 2.0 Flash | gemini | Fast | 1,048,576 | 8,192 | $0.10 | $0.40 | Yes | Yes
13 | deepseek-chat | DeepSeek V3 | deepseek | Smart | 64,000 | 8,192 | $0.27 | $1.10 | Yes | No
14 | deepseek-reasoner | DeepSeek R1 | deepseek | Smart | 64,000 | 8,192 | $0.55 | $2.19 | No | No
15 | llama-3.3-70b-versatile | Llama 3.3 70B | groq | Balanced | 128,000 | 32,768 | $0.059 | $0.079 | Yes | No
16 | mixtral-8x7b-32768 | Mixtral 8x7B | groq | Balanced | 32,768 | 4,096 | $0.024 | $0.024 | Yes | No
17 | llama-3.1-8b-instant | Llama 3.1 8B | groq | Fast | 128,000 | 8,192 | $0.05 | $0.08 | Yes | No
18 | gemma2-9b-it | Gemma 2 9B | groq | Fast | 8,192 | 4,096 | $0.02 | $0.02 | No | No
19 | openrouter/google/gemini-2.5-flash | Gemini 2.5 Flash (OpenRouter) | openrouter | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes
20 | openrouter/anthropic/claude-sonnet-4 | Claude Sonnet 4 (OpenRouter) | openrouter | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes
21 | openrouter/openai/gpt-4o | GPT-4o (OpenRouter) | openrouter | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes
22 | openrouter/deepseek/deepseek-chat | DeepSeek V3 (OpenRouter) | openrouter | Smart | 128,000 | 32,768 | $0.14 | $0.28 | Yes | No
23 | openrouter/meta-llama/llama-3.3-70b-instruct | Llama 3.3 70B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.39 | $0.39 | Yes | No
24 | openrouter/qwen/qwen-2.5-72b-instruct | Qwen 2.5 72B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.36 | $0.36 | Yes | No
25 | openrouter/google/gemini-2.5-pro | Gemini 2.5 Pro (OpenRouter) | openrouter | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes
26 | openrouter/mistralai/mistral-large-latest | Mistral Large (OpenRouter) | openrouter | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No
27 | openrouter/google/gemma-2-9b-it | Gemma 2 9B (OpenRouter) | openrouter | Fast | 8,192 | 4,096 | $0.00 | $0.00 | No | No
28 | openrouter/deepseek/deepseek-r1 | DeepSeek R1 (OpenRouter) | openrouter | Frontier | 128,000 | 32,768 | $0.55 | $2.19 | No | No
29 | mistral-large-latest | Mistral Large | mistral | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No
30 | codestral-latest | Codestral | mistral | Smart | 32,000 | 8,192 | $0.30 | $0.90 | Yes | No
31 | mistral-small-latest | Mistral Small | mistral | Fast | 128,000 | 8,192 | $0.10 | $0.30 | Yes | No
32 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 405B (Together) | together | Frontier | 130,000 | 4,096 | $3.50 | $3.50 | Yes | No
33 | Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B (Together) | together | Smart | 32,768 | 4,096 | $0.20 | $0.60 | Yes | No
34 | mistralai/Mixtral-8x22B-Instruct-v0.1 | Mixtral 8x22B (Together) | together | Balanced | 65,536 | 4,096 | $0.60 | $0.60 | Yes | No
35 | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B (Fireworks) | fireworks | Frontier | 131,072 | 16,384 | $3.00 | $3.00 | Yes | No
36 | accounts/fireworks/models/mixtral-8x22b-instruct | Mixtral 8x22B (Fireworks) | fireworks | Balanced | 65,536 | 4,096 | $0.90 | $0.90 | Yes | No
37 | llama3.2 | Llama 3.2 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | Yes | No
38 | mistral:latest | Mistral (Ollama) | ollama | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
39 | phi3 | Phi-3 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | No | No
40 | vllm-local | vLLM Local Model | vllm | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
41 | lmstudio-local | LM Studio Local Model | lmstudio | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
42 | sonar-pro | Sonar Pro | perplexity | Smart | 200,000 | 8,192 | $3.00 | $15.00 | No | No
43 | sonar | Sonar | perplexity | Balanced | 128,000 | 8,192 | $1.00 | $5.00 | No | No
44 | command-r-plus | Command R+ | cohere | Smart | 128,000 | 4,096 | $2.50 | $10.00 | Yes | No
45 | command-r | Command R | cohere | Balanced | 128,000 | 4,096 | $0.15 | $0.60 | Yes | No
46 | jamba-1.5-large | Jamba 1.5 Large | ai21 | Smart | 256,000 | 4,096 | $2.00 | $8.00 | Yes | No
47 | cerebras/llama3.3-70b | Llama 3.3 70B (Cerebras) | cerebras | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No
48 | cerebras/llama3.1-8b | Llama 3.1 8B (Cerebras) | cerebras | Fast | 128,000 | 8,192 | $0.01 | $0.01 | Yes | No
49 | sambanova/llama-3.3-70b | Llama 3.3 70B (SambaNova) | sambanova | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No
50 | grok-2 | Grok 2 | xai | Smart | 131,072 | 32,768 | $2.00 | $10.00 | Yes | Yes
51 | grok-2-mini | Grok 2 Mini | xai | Fast | 131,072 | 32,768 | $0.30 | $0.50 | Yes | No
52 | hf/meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B (HF) | huggingface | Balanced | 128,000 | 4,096 | $0.30 | $0.30 | No | No
53 | replicate/meta-llama-3.3-70b-instruct | Llama 3.3 70B (Replicate) | replicate | Balanced | 128,000 | 4,096 | $0.40 | $0.40 | No | No

Model Tiers:

Tier | Description | Typical Use
Frontier | Most capable, highest cost | Orchestration, architecture, security audits
Smart | Strong reasoning, moderate cost | Coding, code review, research, analysis
Balanced | Good cost/quality tradeoff | Planning, writing, DevOps, day-to-day tasks
Fast | Cheapest cloud inference | Ops, translation, simple Q&A, health checks
Local | Self-hosted, zero cost | Privacy-first, offline, development

Notes:

  • Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve will be merged into the catalog with Local tier and zero cost.
  • The entries above are a representative subset of the 230+ builtin models. The full catalog includes additional models per provider and runtime auto-discovered models that vary per installation.

Model Aliases

All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.

Alias | Resolves To
sonnet | claude-sonnet-4-20250514
claude-sonnet | claude-sonnet-4-20250514
haiku | claude-haiku-4-5-20251001
claude-haiku | claude-haiku-4-5-20251001
opus | claude-opus-4-20250514
claude-opus | claude-opus-4-20250514
gpt4 | gpt-4o
gpt4o | gpt-4o
gpt4-mini | gpt-4o-mini
flash | gemini-2.5-flash
gemini-flash | gemini-2.5-flash
gemini-pro | gemini-2.5-pro
deepseek | deepseek-chat
llama | llama-3.3-70b-versatile
llama-70b | llama-3.3-70b-versatile
mixtral | mixtral-8x7b-32768
mistral | mistral-large-latest
codestral | codestral-latest
grok | grok-2
grok-mini | grok-2-mini
sonar | sonar-pro
jamba | jamba-1.5-large
command-r | command-r-plus

You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
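Because aliases are case-insensitive, resolution is essentially a lowercased dictionary lookup; a sketch (alias data is a subset of the table above, the function name is illustrative):

```python
# Subset of the alias table above: alias -> canonical model ID.
ALIASES = {
    "sonnet": "claude-sonnet-4-20250514",
    "opus": "claude-opus-4-20250514",
    "haiku": "claude-haiku-4-5-20251001",
    "flash": "gemini-2.5-flash",
    "deepseek": "deepseek-chat",
}

def resolve_model_id(name):
    """Map an alias (case-insensitively) to its canonical model ID;
    pass anything that is not an alias through unchanged."""
    return ALIASES.get(name.lower(), name)
```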


Per-Agent Model Override

Each agent in your config.toml can specify its own model, overriding the global default:

# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus"                      # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"   # cheap Groq model for simple ops

[[agents]]
name = "coder"
model = "gemini-2.5-flash"          # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"                 # Perplexity with built-in web search

# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514"  # never auto-routed

When pinned_model is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilisation mode (KernelMode::Stable) where the model is frozen for production reliability.
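The resulting precedence (pinned model, then the agent's own model, then the global default) can be sketched as (an illustrative helper, not LibreFang's API):

```python
def select_model(agent, default_model):
    """Pick an agent's effective model: pinned_model always wins, then the
    agent's own model field, then the global default from [agents.defaults]."""
    if agent.get("pinned_model"):       # frozen, e.g. in Stabilisation mode
        return agent["pinned_model"]
    return agent.get("model") or default_model
```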


Model Routing

LibreFang can automatically select the cheapest model capable of handling each query. This is configured per-agent via ModelRoutingConfig.

How It Works

  1. The ModelRouter scores each incoming CompletionRequest based on heuristics
  2. The score maps to a TaskComplexity tier: Simple, Medium, or Complex
  3. Each tier has a pre-configured model

Scoring Heuristics

Signal | Weight | Logic
Total message length | 1 point per ~4 chars | Rough token proxy
Tool availability | +20 per tool defined | Tools imply multi-step work
Code markers | +30 per marker found | Backticks, fn, def, class, import, function, async, await, struct, impl, return
Conversation depth | +15 per message > 10 | Deep context = harder reasoning
System prompt length | +1 per 10 chars > 500 | Long system prompts imply complex tasks

Thresholds

Complexity   Score Range          Default Model
Simple       score < 100          claude-haiku-4-5-20251001
Medium       100 <= score < 500   claude-sonnet-4-20250514
Complex      score >= 500         claude-sonnet-4-20250514
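The score-to-tier mapping then reduces to two threshold comparisons. The enum mirrors the TaskComplexity tiers named above; the function itself is an illustrative sketch:

```rust
// Maps a heuristic score to a TaskComplexity tier using the configurable
// simple/complex thresholds. Sketch only, not the real router code.
#[derive(Debug, PartialEq)]
enum TaskComplexity {
    Simple,
    Medium,
    Complex,
}

fn classify(score: u32, simple_threshold: u32, complex_threshold: u32) -> TaskComplexity {
    if score < simple_threshold {
        TaskComplexity::Simple
    } else if score < complex_threshold {
        TaskComplexity::Medium
    } else {
        TaskComplexity::Complex
    }
}
```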

Configuration

# In agent manifest or config.toml
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

The router also integrates with the model catalog:

  • validate_models() checks that all configured model IDs exist in the catalog
  • resolve_aliases() expands aliases to canonical IDs (e.g., "sonnet" becomes "claude-sonnet-4-20250514")
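Alias expansion can be sketched as a simple map lookup, assuming the catalog exposes a flat alias-to-canonical-ID map like the one returned by GET /api/models/aliases (function names here are illustrative):

```rust
use std::collections::HashMap;

// Expand an alias to its canonical model ID; unknown names pass through
// unchanged so canonical IDs keep working. Illustrative sketch.
fn resolve_alias(aliases: &HashMap<String, String>, model: &str) -> String {
    aliases.get(model).cloned().unwrap_or_else(|| model.to_string())
}

// A few entries from the alias map shown later in this guide.
fn example_aliases() -> HashMap<String, String> {
    HashMap::from([
        ("sonnet".to_string(), "claude-sonnet-4-20250514".to_string()),
        ("haiku".to_string(), "claude-haiku-4-5-20251001".to_string()),
        ("flash".to_string(), "gemini-2.5-flash".to_string()),
    ])
}
```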

Cost Tracking

LibreFang tracks the cost of every LLM call and can enforce per-agent spending quotas.

Per-Response Cost Estimation

After each LLM call, cost is calculated as:

cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate

The MeteringEngine first checks the model catalog for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.
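The formula is simple enough to state directly. This is a sketch; rates are USD per million tokens, as in the catalog's input_cost_per_m / output_cost_per_m fields:

```rust
// Per-response cost estimate; rates are USD per million tokens.
fn estimate_cost(input_tokens: u64, output_tokens: u64, input_rate: f64, output_rate: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_rate
        + (output_tokens as f64 / 1_000_000.0) * output_rate
}
```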

Cost Rates (per million tokens)

Model Pattern                     Input $/M   Output $/M
*haiku*                           $0.25       $1.25
*sonnet*                          $3.00       $15.00
*opus*                            $15.00      $75.00
gpt-4o-mini                       $0.15       $0.60
gpt-4o                            $2.50       $10.00
gpt-4.1-nano                      $0.10       $0.40
gpt-4.1-mini                      $0.40       $1.60
gpt-4.1                           $2.00       $8.00
o3-mini                           $1.10       $4.40
gemini-2.5-pro                    $1.25       $10.00
gemini-2.5-flash                  $0.15       $0.60
gemini-2.0-flash                  $0.10       $0.40
deepseek-reasoner / deepseek-r1   $0.55       $2.19
*deepseek*                        $0.27       $1.10
*cerebras*                        $0.06       $0.06
*sambanova*                       $0.06       $0.06
*replicate*                       $0.40       $0.40
*llama* / *mixtral*               $0.05       $0.10
*qwen*                            $0.20       $0.60
mistral-large*                    $2.00       $6.00
*mistral* (other)                 $0.10       $0.30
command-r-plus                    $2.50       $10.00
command-r                         $0.15       $0.60
sonar-pro                         $3.00       $15.00
*sonar* (other)                   $1.00       $5.00
grok-2-mini / grok-mini           $0.30       $0.50
*grok* (other)                    $2.00       $10.00
*jamba*                           $2.00       $8.00
Default (unknown)                 $1.00       $3.00

Quota Enforcement

Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a QuotaExceeded error.

# Per-agent quota in config.toml
[[agents]]
name = "chatbot"
[agents.resources]
max_cost_per_hour_usd = 5.00   # cap at $5/hour
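The enforcement step amounts to a pre-flight check before each call. This is a hypothetical sketch: only the QuotaExceeded name comes from the text above; everything else is illustrative:

```rust
// Hypothetical pre-flight quota check: reject a call that would push the
// agent past its hourly spend limit.
#[derive(Debug, PartialEq)]
enum QuotaError {
    QuotaExceeded { spent_usd: f64, limit_usd: f64 },
}

fn check_quota(spent_this_hour: f64, estimated_call_cost: f64, limit: f64) -> Result<(), QuotaError> {
    if spent_this_hour + estimated_call_cost > limit {
        Err(QuotaError::QuotaExceeded { spent_usd: spent_this_hour, limit_usd: limit })
    } else {
        Ok(())
    }
}
```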

The usage footer (when enabled) appends cost information to each response:

> Cost: $0.0042 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514

Fallback Providers

The FallbackDriver wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.

Behavior

  • On success: returns immediately
  • On rate limit / overload errors (429, 529): bubbles up for retry logic (does NOT failover, because the primary should be retried after backoff)
  • On all other errors: logs a warning and tries the next driver in the chain
  • If all drivers fail: returns the last error
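The behavior above amounts to a short loop over the chain. This sketch uses closures in place of real driver objects, and the error enum is illustrative rather than the actual FallbackDriver types:

```rust
// Fallback chain sketch: rate-limit errors (429/529) bubble up so the caller
// can retry the primary after backoff; other errors advance to the next driver.
#[derive(Debug, Clone, PartialEq)]
enum LlmError {
    RateLimited(u16),
    Other(String),
}

fn complete_with_fallback(
    drivers: &[&dyn Fn() -> Result<String, LlmError>],
) -> Result<String, LlmError> {
    let mut last_err = LlmError::Other("empty driver chain".into());
    for driver in drivers {
        match driver() {
            Ok(resp) => return Ok(resp),                        // success: return immediately
            Err(e @ LlmError::RateLimited(_)) => return Err(e), // do NOT fail over
            Err(e) => last_err = e,                             // warn and try next driver
        }
    }
    Err(last_err) // all drivers failed: return the last error
}
```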

Configuration

Fallback chains are configured in your agent manifest or config.toml. The FallbackDriver is used automatically when an agent is in Stabilisation mode (KernelMode::Stable) or when multiple providers are configured for reliability.

# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]

The fallback driver creates a chain: AnthropicDriver -> GeminiDriver -> OpenAIDriver(Groq).


API Endpoints

List All Models

GET /api/models

Returns the complete model catalog with metadata, pricing, and feature flags.

Response:

[
  {
    "id": "claude-sonnet-4-20250514",
    "display_name": "Claude Sonnet 4",
    "provider": "anthropic",
    "tier": "Smart",
    "context_window": 200000,
    "max_output_tokens": 64000,
    "input_cost_per_m": 3.0,
    "output_cost_per_m": 15.0,
    "supports_tools": true,
    "supports_vision": true,
    "supports_streaming": true,
    "aliases": ["sonnet", "claude-sonnet"]
  }
]

Get Specific Model

GET /api/models/{id}

Returns a single model entry. Supports both canonical IDs and aliases.

GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514

List Aliases

GET /api/models/aliases

Returns a map of all alias-to-canonical-ID mappings.

Response:

{
  "sonnet": "claude-sonnet-4-20250514",
  "haiku": "claude-haiku-4-5-20251001",
  "flash": "gemini-2.5-flash",
  "grok": "grok-2"
}

List Providers

GET /api/providers

Returns all 49 providers with auth status and model counts.

Response:

[
  {
    "id": "anthropic",
    "display_name": "Anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": "https://api.anthropic.com",
    "key_required": true,
    "auth_status": "Configured",
    "model_count": 3
  },
  {
    "id": "ollama",
    "display_name": "Ollama",
    "api_key_env": "OLLAMA_API_KEY",
    "base_url": "http://localhost:11434/v1",
    "key_required": false,
    "auth_status": "NotRequired",
    "model_count": 5
  }
]

Auth status values: Configured, Missing, NotRequired.

Set Provider API Key

POST /api/providers/{name}/key
Content-Type: application/json

{ "api_key": "sk-..." }

Configures an API key for a provider at runtime (stored as a Zeroizing<String>, wiped from memory on drop).

Remove Provider API Key

DELETE /api/providers/{name}/key

Removes the configured API key for a provider.

Test Provider Connection

POST /api/providers/{name}/test

Sends a minimal test request to verify the provider is reachable and the API key is valid.


Channel Commands

Two chat commands are available in any channel for inspecting models and providers:

/models

Lists all available models with their tier, provider, and context window. Only shows models from providers that have authentication configured (or do not require it).


Example output:

Available models (12):

Frontier:
  claude-opus-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-pro (Google Gemini) — 1M ctx

Smart:
  claude-sonnet-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-flash (Google Gemini) — 1M ctx
  deepseek-chat (DeepSeek) — 64K ctx

Balanced:
  llama-3.3-70b-versatile (Groq) — 128K ctx

Fast:
  claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
  gemini-2.0-flash (Google Gemini) — 1M ctx

Local:
  llama3.2 (Ollama) — 128K ctx

/providers

Lists all 49 providers with their authentication status.


Example output:

LLM Providers (49):

  Anthropic          ANTHROPIC_API_KEY       Configured    3 models
  OpenAI             OPENAI_API_KEY          Missing       6 models
  Google Gemini      GEMINI_API_KEY          Configured    3 models
  DeepSeek           DEEPSEEK_API_KEY        Missing       2 models
  Groq               GROQ_API_KEY            Configured    4 models
  Ollama             (no key needed)         Ready         3 models
  vLLM               (no key needed)         Ready         1 model
  LM Studio          (no key needed)         Ready         1 model
  ...

Environment Variables Summary

Quick reference for all provider environment variables:

Provider           Env Var                                                          Required
Anthropic          ANTHROPIC_API_KEY                                                Yes
OpenAI             OPENAI_API_KEY                                                   Yes
Google Gemini      GEMINI_API_KEY or GOOGLE_API_KEY                                 Yes
DeepSeek           DEEPSEEK_API_KEY                                                 Yes
Groq               GROQ_API_KEY                                                     Yes
OpenRouter         OPENROUTER_API_KEY                                               Yes
Mistral AI         MISTRAL_API_KEY                                                  Yes
Together AI        TOGETHER_API_KEY                                                 Yes
Fireworks AI       FIREWORKS_API_KEY                                                Yes
Ollama             OLLAMA_API_KEY                                                   No
vLLM               VLLM_API_KEY                                                     No
LM Studio          LMSTUDIO_API_KEY                                                 No
Perplexity AI      PERPLEXITY_API_KEY                                               Yes
Cohere             COHERE_API_KEY                                                   Yes
AI21 Labs          AI21_API_KEY                                                     Yes
Cerebras           CEREBRAS_API_KEY                                                 Yes
SambaNova          SAMBANOVA_API_KEY                                                Yes
Hugging Face       HF_API_KEY                                                       Yes
xAI                XAI_API_KEY                                                      Yes
Replicate          REPLICATE_API_TOKEN                                              Yes
Claude Code        ANTHROPIC_API_KEY                                                Yes
NVIDIA NIM         NVIDIA_API_KEY                                                   Yes
Voyage AI          VOYAGE_API_KEY                                                   Yes
Anyscale           ANYSCALE_API_KEY                                                 Yes
DeepInfra          DEEPINFRA_API_KEY                                                Yes
Azure OpenAI       AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT                      Yes
Amazon Bedrock     AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION             Yes
Google Vertex AI   GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION  Yes

Security Notes

  • All API keys are stored as Zeroizing<String> -- the key material is automatically overwritten with zeros when the value is dropped from memory.
  • Auth detection (detect_auth()) only checks std::env::var() for presence -- it never reads or logs the actual secret value.
  • Provider API keys set via the REST API (POST /api/providers/{name}/key) follow the same zeroization policy.
  • The health endpoint (/api/health) never exposes provider auth status or API keys. Detailed info is behind /api/health/detail which requires authentication.
  • All DriverConfig and KernelConfig structs implement Debug with secret redaction -- API keys are printed as "***" in logs.