LLM Providers Guide

LibreFang ships with a comprehensive model catalog covering 3 native LLM drivers, 49 providers, 230+ builtin models, and 23 aliases. Every API-based provider uses one of three battle-tested drivers: the native Anthropic driver, the native Gemini driver, or the universal OpenAI-compatible driver (CLI providers such as Aider and Claude Code CLI use a separate subprocess driver, described in their sections below). This guide is the single source of truth for configuring, selecting, and managing LLM providers in LibreFang.

The model catalog also supports dynamic loading — you can add custom provider definitions by placing TOML files in ~/.librefang/providers/. Any file matching ~/.librefang/providers/*.toml is merged into the catalog at boot, allowing you to add private endpoints, on-premises deployments, or new providers without modifying the core configuration.


Table of Contents

  1. Quick Setup
  2. Provider Reference
  3. Model Catalog
  4. Model Aliases
  5. Per-Agent Model Override
  6. Model Routing
  7. Cost Tracking
  8. Fallback Providers
  9. API Endpoints
  10. Channel Commands

Quick Setup

The fastest path from zero to running:

# Pick ONE provider — set its env var — done.
export GEMINI_API_KEY="your-key"        # Free tier available
# OR
export GROQ_API_KEY="your-key"          # Free tier available
# OR
export ANTHROPIC_API_KEY="your-key"
# OR
export OPENAI_API_KEY="your-key"

LibreFang auto-detects which providers have API keys configured at boot. Any model whose provider is authenticated becomes immediately available. Local providers (Ollama, vLLM, LM Studio) require no key at all.
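Conceptually, boot-time detection amounts to checking each provider's documented env var(s); a minimal sketch (the mapping is a subset of the provider tables below, and the function name is illustrative, not LibreFang's actual API):

```python
import os

# Illustrative subset of the provider -> env var mapping documented below.
PROVIDER_ENV_VARS = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY", "GOOGLE_API_KEY"],  # either key works
    "groq": ["GROQ_API_KEY"],
    "ollama": [],  # local providers need no key at all
}

def detect_available_providers(env=os.environ):
    """Return providers that are usable: keyless, or any of their env vars is set."""
    return [
        provider
        for provider, keys in PROVIDER_ENV_VARS.items()
        if not keys or any(env.get(k) for k in keys)
    ]
```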

For Gemini specifically, either GEMINI_API_KEY or GOOGLE_API_KEY will work.


Provider Reference

1. Anthropic

  Display Name: Anthropic
  Driver: Native Anthropic (Messages API)
  Env Var: ANTHROPIC_API_KEY
  Base URL: https://api.anthropic.com
  Key Required: Yes
  Free Tier: No
  Auth: x-api-key header
  Models: 7

Available Models:

  • claude-opus-4-20250514 (Frontier)
  • claude-sonnet-4-20250514 (Smart)
  • claude-haiku-4-5-20251001 (Fast)

Setup:

  1. Sign up at console.anthropic.com
  2. Create an API key under Settings > API Keys
  3. export ANTHROPIC_API_KEY="sk-ant-..."

2. OpenAI

  Display Name: OpenAI
  Driver: OpenAI-compatible
  Env Var: OPENAI_API_KEY
  Base URL: https://api.openai.com/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 18

Available Models:

  • gpt-4.1 (Frontier)
  • gpt-4o (Smart)
  • o3-mini (Smart)
  • gpt-4.1-mini (Balanced)
  • gpt-4o-mini (Fast)
  • gpt-4.1-nano (Fast)

Setup:

  1. Sign up at platform.openai.com
  2. Create an API key under API Keys
  3. export OPENAI_API_KEY="sk-..."

3. Google Gemini

  Display Name: Google Gemini
  Driver: Native Gemini (generateContent API)
  Env Var: GEMINI_API_KEY (or GOOGLE_API_KEY)
  Base URL: https://generativelanguage.googleapis.com
  Key Required: Yes
  Free Tier: Yes (generous free tier)
  Auth: x-goog-api-key header
  Models: 10

Available Models:

  • gemini-2.5-pro (Frontier)
  • gemini-2.5-flash (Smart)
  • gemini-2.0-flash (Fast)

Setup:

  1. Go to aistudio.google.com
  2. Get an API key (free tier included)
  3. export GEMINI_API_KEY="AIza..." or export GOOGLE_API_KEY="AIza..."

Notes: The Gemini driver is a fully native implementation; it is not OpenAI-compatible. The model name goes in the URL path, the system prompt is sent via systemInstruction, tools via functionDeclarations, and streaming via streamGenerateContent?alt=sse.
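To make that wire format concrete, here is a sketch of how a generateContent request is shaped (an illustrative helper, not LibreFang code; for streaming, the path suffix becomes :streamGenerateContent?alt=sse):

```python
def build_generate_content_request(model, user_text, system_prompt=None, api_key="..."):
    """Build URL, headers, and body for Gemini's generateContent endpoint.

    Note: the model is part of the URL path (not the JSON body), auth is the
    x-goog-api-key header, and the system prompt rides in systemInstruction.
    """
    url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    body = {"contents": [{"role": "user", "parts": [{"text": user_text}]}]}
    if system_prompt:
        body["systemInstruction"] = {"parts": [{"text": system_prompt}]}
    return url, headers, body
```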


4. DeepSeek

  Display Name: DeepSeek
  Driver: OpenAI-compatible
  Env Var: DEEPSEEK_API_KEY
  Base URL: https://api.deepseek.com/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 4

Available Models:

  • deepseek-chat (Smart) -- DeepSeek V3
  • deepseek-reasoner (Smart) -- DeepSeek R1, no tool support

Setup:

  1. Sign up at platform.deepseek.com
  2. Create an API key
  3. export DEEPSEEK_API_KEY="sk-..."

5. Groq

  Display Name: Groq
  Driver: OpenAI-compatible
  Env Var: GROQ_API_KEY
  Base URL: https://api.groq.com/openai/v1
  Key Required: Yes
  Free Tier: Yes (rate-limited)
  Auth: Authorization: Bearer header
  Models: 10

Available Models:

  • llama-3.3-70b-versatile (Balanced)
  • mixtral-8x7b-32768 (Balanced)
  • llama-3.1-8b-instant (Fast)
  • gemma2-9b-it (Fast)

Setup:

  1. Sign up at console.groq.com
  2. Create an API key
  3. export GROQ_API_KEY="gsk_..."

Notes: Groq runs open-source models on custom LPU hardware. Extremely fast inference. Free tier has rate limits but is very usable.


6. OpenRouter

  Display Name: OpenRouter
  Driver: OpenAI-compatible
  Env Var: OPENROUTER_API_KEY
  Base URL: https://openrouter.ai/api/v1
  Key Required: Yes
  Free Tier: Yes (8 free models including Step 3.5 Flash, DeepSeek R1, Llama 3.1 8B, etc.)
  Auth: Authorization: Bearer header
  Models: 17

Available Models:

  • openrouter/google/gemini-2.5-flash (Smart) -- cheap, fast, 1M context (default)
  • openrouter/anthropic/claude-sonnet-4 (Smart) -- strong reasoning + tools
  • openrouter/openai/gpt-4o (Smart) -- GPT-4o via OpenRouter
  • openrouter/deepseek/deepseek-chat (Smart) -- DeepSeek V3
  • openrouter/meta-llama/llama-3.3-70b-instruct (Balanced) -- Llama 3.3 70B
  • openrouter/qwen/qwen-2.5-72b-instruct (Balanced) -- Qwen 2.5 72B
  • openrouter/google/gemini-2.5-pro (Frontier) -- Gemini 2.5 Pro
  • openrouter/mistralai/mistral-large-latest (Smart) -- Mistral Large
  • openrouter/google/gemma-2-9b-it (Fast) -- Gemma 2 9B, free
  • openrouter/deepseek/deepseek-r1 (Frontier) -- DeepSeek R1 reasoning

Setup:

  1. Sign up at openrouter.ai
  2. Create an API key under Keys
  3. export OPENROUTER_API_KEY="sk-or-..."

Notes: OpenRouter is a unified gateway to 200+ models from many providers. Model IDs use the upstream format (e.g. google/gemini-2.5-flash). You can use any model from OpenRouter's catalog by specifying the full model path with the openrouter/ prefix.


7. Mistral AI

  Display Name: Mistral AI
  Driver: OpenAI-compatible
  Env Var: MISTRAL_API_KEY
  Base URL: https://api.mistral.ai/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 6

Available Models:

  • mistral-large-latest (Smart)
  • codestral-latest (Smart)
  • mistral-small-latest (Fast)

Setup:

  1. Sign up at console.mistral.ai
  2. Create an API key
  3. export MISTRAL_API_KEY="..."

8. Together AI

  Display Name: Together AI
  Driver: OpenAI-compatible
  Env Var: TOGETHER_API_KEY
  Base URL: https://api.together.xyz/v1
  Key Required: Yes
  Free Tier: Yes (limited credits on signup)
  Auth: Authorization: Bearer header
  Models: 8

Available Models:

  • meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo (Frontier)
  • Qwen/Qwen2.5-72B-Instruct-Turbo (Smart)
  • mistralai/Mixtral-8x22B-Instruct-v0.1 (Balanced)

Setup:

  1. Sign up at api.together.ai
  2. Create an API key
  3. export TOGETHER_API_KEY="..."

9. Fireworks AI

  Display Name: Fireworks AI
  Driver: OpenAI-compatible
  Env Var: FIREWORKS_API_KEY
  Base URL: https://api.fireworks.ai/inference/v1
  Key Required: Yes
  Free Tier: Yes (limited credits on signup)
  Auth: Authorization: Bearer header
  Models: 5

Available Models:

  • accounts/fireworks/models/llama-v3p1-405b-instruct (Frontier)
  • accounts/fireworks/models/mixtral-8x22b-instruct (Balanced)

Setup:

  1. Sign up at fireworks.ai
  2. Create an API key
  3. export FIREWORKS_API_KEY="..."

10. Ollama

  Display Name: Ollama
  Driver: OpenAI-compatible
  Env Var: OLLAMA_API_KEY (not required)
  Base URL: http://localhost:11434/v1
  Key Required: No
  Free Tier: Free (local)
  Auth: None (local)
  Models: 3 builtin + auto-discovered

Available Models (builtin):

  • llama3.2 (Local)
  • mistral:latest (Local)
  • phi3 (Local)

Setup:

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start the server: ollama serve
  4. No env var needed -- Ollama is always available

Notes: LibreFang auto-discovers models from a running Ollama instance and merges them into the catalog with Local tier and zero cost. Any model you pull becomes usable immediately.


11. vLLM

  Display Name: vLLM
  Driver: OpenAI-compatible
  Env Var: VLLM_API_KEY (not required)
  Base URL: http://localhost:8000/v1
  Key Required: No
  Free Tier: Free (self-hosted)
  Auth: None (local)
  Models: 1 builtin + auto-discovered

Available Models (builtin):

  • vllm-local (Local)

Setup:

  1. Install vLLM: pip install vllm
  2. Start the server: python -m vllm.entrypoints.openai.api_server --model <model-name>
  3. No env var needed

12. LM Studio

  Display Name: LM Studio
  Driver: OpenAI-compatible
  Env Var: LMSTUDIO_API_KEY (not required)
  Base URL: http://localhost:1234/v1
  Key Required: No
  Free Tier: Free (local)
  Auth: None (local)
  Models: 1 builtin + auto-discovered

Available Models (builtin):

  • lmstudio-local (Local)

Setup:

  1. Download LM Studio from lmstudio.ai
  2. Download a model from the built-in model browser
  3. Start the local server from the "Local Server" tab
  4. No env var needed

13. Perplexity AI

  Display Name: Perplexity AI
  Driver: OpenAI-compatible
  Env Var: PERPLEXITY_API_KEY
  Base URL: https://api.perplexity.ai
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • sonar-pro (Smart) -- online search-augmented
  • sonar (Balanced) -- online search-augmented

Setup:

  1. Sign up at perplexity.ai
  2. Go to API settings and generate a key
  3. export PERPLEXITY_API_KEY="pplx-..."

Notes: Perplexity models have built-in web search. They do not support tool use.


14. Cohere

  Display Name: Cohere
  Driver: OpenAI-compatible
  Env Var: COHERE_API_KEY
  Base URL: https://api.cohere.com/v2
  Key Required: Yes
  Free Tier: Yes (rate-limited trial)
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • command-r-plus (Smart)
  • command-r (Balanced)

Setup:

  1. Sign up at dashboard.cohere.com
  2. Create an API key
  3. export COHERE_API_KEY="..."

15. AI21 Labs

  Display Name: AI21 Labs
  Driver: OpenAI-compatible
  Env Var: AI21_API_KEY
  Base URL: https://api.ai21.com/studio/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: 1

Available Models:

  • jamba-1.5-large (Smart)

Setup:

  1. Sign up at studio.ai21.com
  2. Create an API key
  3. export AI21_API_KEY="..."

16. Cerebras

  Display Name: Cerebras
  Driver: OpenAI-compatible
  Env Var: CEREBRAS_API_KEY
  Base URL: https://api.cerebras.ai/v1
  Key Required: Yes
  Free Tier: Yes (generous free tier)
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • cerebras/llama3.3-70b (Balanced)
  • cerebras/llama3.1-8b (Fast)

Setup:

  1. Sign up at cloud.cerebras.ai
  2. Create an API key
  3. export CEREBRAS_API_KEY="..."

Notes: Cerebras runs inference on wafer-scale chips. Ultra-fast and ultra-cheap ($0.06/M tokens for both input and output on the 70B model).


17. SambaNova

  Display Name: SambaNova
  Driver: OpenAI-compatible
  Env Var: SAMBANOVA_API_KEY
  Base URL: https://api.sambanova.ai/v1
  Key Required: Yes
  Free Tier: Yes (3 free models)
  Auth: Authorization: Bearer header
  Models: 3

Available Models:

  • sambanova/llama-3.3-70b (Balanced)

Setup:

  1. Sign up at cloud.sambanova.ai
  2. Create an API key
  3. export SAMBANOVA_API_KEY="..."

18. Hugging Face

  Display Name: Hugging Face
  Driver: OpenAI-compatible
  Env Var: HF_API_KEY
  Base URL: https://api-inference.huggingface.co/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 1

Available Models:

  • hf/meta-llama/Llama-3.3-70B-Instruct (Balanced)

Setup:

  1. Sign up at huggingface.co
  2. Create a token under Settings > Access Tokens
  3. export HF_API_KEY="hf_..."

19. xAI

  Display Name: xAI
  Driver: OpenAI-compatible
  Env Var: XAI_API_KEY
  Base URL: https://api.x.ai/v1
  Key Required: Yes
  Free Tier: Yes (limited free credits)
  Auth: Authorization: Bearer header
  Models: 2

Available Models:

  • grok-2 (Smart) -- supports vision
  • grok-2-mini (Fast)

Setup:

  1. Sign up at console.x.ai
  2. Create an API key
  3. export XAI_API_KEY="xai-..."

20. Replicate

  Display Name: Replicate
  Driver: OpenAI-compatible
  Env Var: REPLICATE_API_TOKEN
  Base URL: https://api.replicate.com/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header
  Models: 1

Available Models:

  • replicate/meta-llama-3.3-70b-instruct (Balanced)

Setup:

  1. Sign up at replicate.com
  2. Go to Account > API Tokens
  3. export REPLICATE_API_TOKEN="r8_..."

21. Claude Code

  Display Name: Claude Code
  Driver: Native Anthropic (Messages API)
  Env Var: ANTHROPIC_API_KEY
  Base URL: https://api.anthropic.com
  Key Required: Yes
  Free Tier: No
  Auth: x-api-key header
  Models: Claude models with extended tool use

Notes: Claude Code is an Anthropic model variant optimized for agentic coding tasks. It uses the same API key and base URL as Anthropic but targets models tuned for long-horizon tool-use workflows.


22. NVIDIA NIM

  Display Name: NVIDIA NIM
  Driver: OpenAI-compatible
  Env Var: NVIDIA_API_KEY
  Base URL: https://integrate.api.nvidia.com/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Llama, Mistral, and NVIDIA-optimized models

Setup:

  1. Sign up at build.nvidia.com
  2. Create an API key
  3. export NVIDIA_API_KEY="nvapi-..."

23. Voyage AI

  Display Name: Voyage AI
  Driver: OpenAI-compatible
  Env Var: VOYAGE_API_KEY
  Base URL: https://api.voyageai.com/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Embedding and reranking models

Notes: Voyage AI specializes in embedding and reranking models used for semantic search and RAG pipelines.

Setup:

  1. Sign up at voyageai.com
  2. Create an API key
  3. export VOYAGE_API_KEY="pa-..."

24. Anyscale

  Display Name: Anyscale
  Driver: OpenAI-compatible
  Env Var: ANYSCALE_API_KEY
  Base URL: https://api.endpoints.anyscale.com/v1
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Open-source models (Llama, Mistral, etc.)

Setup:

  1. Sign up at anyscale.com
  2. Create an API key
  3. export ANYSCALE_API_KEY="esecret_..."

25. DeepInfra

  Display Name: DeepInfra
  Driver: OpenAI-compatible
  Env Var: DEEPINFRA_API_KEY
  Base URL: https://api.deepinfra.com/v1/openai
  Key Required: Yes
  Free Tier: Yes (limited credits)
  Auth: Authorization: Bearer header
  Models: Open-source models at low cost

Setup:

  1. Sign up at deepinfra.com
  2. Create an API key
  3. export DEEPINFRA_API_KEY="..."

26. Azure OpenAI

  Display Name: Azure OpenAI
  Driver: OpenAI-compatible
  Env Var: AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT
  Base URL: https://<your-resource>.openai.azure.com/openai/deployments/<deployment>
  Key Required: Yes
  Free Tier: No
  Auth: api-key header
  Models: GPT-4o, GPT-4, and other Azure-hosted models

Setup:

  1. Create an Azure OpenAI resource in the Azure Portal
  2. Deploy a model in Azure OpenAI Studio
  3. Set environment variables:
    export AZURE_OPENAI_API_KEY="..."
    export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
    

27. Amazon Bedrock

  Display Name: Amazon Bedrock
  Driver: OpenAI-compatible (via Bedrock Converse API)
  Env Var: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
  Base URL: AWS regional endpoint
  Key Required: Yes (AWS credentials)
  Free Tier: No
  Auth: AWS Signature v4
  Models: Claude, Llama, Titan, Mistral via Bedrock

Setup:

  1. Enable model access in the AWS Bedrock console
  2. Configure AWS credentials:
    export AWS_ACCESS_KEY_ID="AKIA..."
    export AWS_SECRET_ACCESS_KEY="..."
    export AWS_REGION="us-east-1"
    

28. GitHub Copilot

  Display Name: GitHub Copilot
  Driver: OpenAI-compatible (via Copilot token exchange)
  Env Var: GITHUB_TOKEN
  Base URL: https://api.githubcopilot.com
  Key Required: Yes (GitHub PAT or OAuth token)
  Free Tier: Included with GitHub Copilot subscription
  Auth: OAuth PKCE flow; exchanges GitHub PAT for short-lived Copilot API token
  Models: GitHub Copilot-hosted models (GPT-4o, Claude, etc.)

Setup:

  1. Subscribe to GitHub Copilot
  2. Create a Personal Access Token with copilot scope
  3. export GITHUB_TOKEN="ghp_..."

Notes: The Copilot driver handles OAuth PKCE token exchange automatically — it obtains a short-lived Copilot API token from https://api.github.com/copilot_internal/v2/token and caches it with auto-refresh. The Copilot API uses OpenAI-compatible chat completions format. Tokens are refreshed 5 minutes before expiry.
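The refresh-before-expiry behaviour described above is a standard token-cache pattern. A hedged sketch of the idea (names and the 300-second buffer mirror the description above; this is not LibreFang's actual implementation):

```python
import time

REFRESH_BUFFER_SECS = 300  # refresh 5 minutes before expiry, per the note above

class TokenCache:
    """Cache a short-lived API token, refreshing it shortly before it expires."""

    def __init__(self, fetch_token):
        # fetch_token() -> (token, expires_at_unix_secs), e.g. the Copilot
        # token-exchange call; injected here so the cache stays testable.
        self._fetch = fetch_token
        self._token = None
        self._expires_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        if self._token is None or now >= self._expires_at - REFRESH_BUFFER_SECS:
            self._token, self._expires_at = self._fetch()
        return self._token
```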


29. Aider

  Display Name: Aider
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses its own provider env vars)
  Binary: aider (must be on PATH)
  Key Required: No (uses Aider's own auth)
  Free Tier: Depends on Aider's configured backend

Setup:

  1. Install Aider: pip install aider-install && aider-install
  2. Configure Aider's LLM provider via its own env vars (e.g. OPENAI_API_KEY)
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the aider binary as a subprocess in non-interactive mode (--message). Aider handles its own LLM provider authentication via standard environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.). Aider's --yes-always, --no-auto-commits, and --no-git flags are applied automatically. Use AIDER_CLI_PATH to override the binary path.


30. Claude Code CLI

  Display Name: Claude Code CLI
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses its own OAuth session)
  Binary: claude (must be on PATH)
  Key Required: No (uses Claude Code's own session auth)
  Free Tier: Depends on Claude Code subscription

Setup:

  1. Install Claude Code: npm install -g @anthropic-ai/claude-code
  2. Authenticate: claude auth login
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the claude binary as a subprocess in print mode (-p). The driver strips other providers' API keys from the subprocess environment to prevent leakage. Active subprocess PIDs are tracked and message timeouts (default 5 minutes) prevent hung processes from blocking agents. Vision input is supported via base64-encoded images.


31. Codex CLI

  Display Name: Codex CLI
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: OPENAI_API_KEY (used by the Codex CLI itself)
  Binary: codex (must be on PATH)
  Key Required: Yes (OpenAI API key for Codex CLI)
  Free Tier: No

Setup:

  1. Install Codex CLI: npm install -g @openai/codex
  2. export OPENAI_API_KEY="sk-..."
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the codex binary as a subprocess in quiet mode (-q). The driver strips other providers' API keys from the subprocess environment (preserving only OPENAI_API_KEY and CODEX_* variables). This allows users with Codex CLI installed to use it as an LLM provider without additional configuration.


32. Gemini CLI

  Display Name: Gemini CLI
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses Google OAuth by default)
  Binary: gemini (must be on PATH)
  Key Required: No (uses Google OAuth)
  Free Tier: Yes (via Google account)

Setup:

  1. Install Gemini CLI: npm install -g @google/gemini-cli
  2. Authenticate: gemini auth login
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the gemini binary as a subprocess in print mode (-p). The driver preserves GEMINI_* and GOOGLE_* environment variables while stripping other providers' secrets. No separate API key is needed when using Google OAuth authentication.


33. Qwen Code

  Display Name: Qwen Code
  Type: CLI Provider
  Driver: Subprocess (CLI)
  Env Var: None (uses Qwen OAuth by default)
  Binary: qwen (must be on PATH)
  Key Required: No (uses Qwen OAuth)
  Free Tier: Yes (via Alibaba Cloud account)

Setup:

  1. Install Qwen Code: npm install -g @alibaba/qwen-code
  2. Authenticate: qwen auth login
  3. No additional LibreFang configuration needed

Notes: CLI Provider — LibreFang spawns the qwen binary as a subprocess in print mode (-p). The driver preserves QWEN_* environment variables while stripping other providers' secrets. Supports streaming JSON output from the Qwen Code CLI. No separate API key is needed when using Qwen OAuth authentication.


34. Qwen (DashScope)

  Display Name: Qwen
  Driver: OpenAI-compatible
  Env Var: DASHSCOPE_API_KEY
  Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
  Aliases: dashscope, model_studio
  Key Required: Yes
  Free Tier: Yes (limited credits on signup)
  Auth: Authorization: Bearer header

Regions:

Region | Endpoint | API Key Env
(default) | dashscope.aliyuncs.com | DASHSCOPE_API_KEY
intl | dashscope-intl.aliyuncs.com | DASHSCOPE_API_KEY
us | dashscope-us.aliyuncs.com | DASHSCOPE_API_KEY

Setup:

  1. Sign up at DashScope Console
  2. Create an API key
  3. export DASHSCOPE_API_KEY="sk-..."
  4. Optionally select a region in config.toml:
    [provider_regions]
    qwen = "intl"    # or "us"
    

Notes: Qwen uses Alibaba Cloud's DashScope API. The default endpoint serves mainland China; use the intl or us region for lower latency outside China. Models are defined in the registry TOML and loaded at boot.


35. MiniMax

  Display Name: MiniMax
  Driver: OpenAI-compatible
  Env Var: MINIMAX_API_KEY
  Base URL: https://api.minimax.io/v1
  Key Required: Yes
  Free Tier: No
  Auth: Authorization: Bearer header

Regions:

Region | Endpoint | API Key Env
(default) | api.minimax.io | MINIMAX_API_KEY
china | api.minimaxi.com | MINIMAX_CN_API_KEY

Setup:

  1. Sign up at minimax.io (international) or minimaxi.com (China)
  2. Create an API key
  3. export MINIMAX_API_KEY="..."
  4. For China region:
    [provider_regions]
    minimax = "china"
    
    export MINIMAX_CN_API_KEY="..."
    

Notes: MiniMax international (minimax.io) and China (minimaxi.com) use separate API keys. When selecting the china region, LibreFang automatically reads from MINIMAX_CN_API_KEY instead of MINIMAX_API_KEY.


36. Vertex AI

  Display Name: Google Vertex AI
  Driver: Native Gemini (generateContent API via Vertex)
  Config Section: [vertex_ai]
  Env Var: GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION
  Base URL: https://<location>-aiplatform.googleapis.com
  Key Required: Yes (service account JSON or gcloud CLI)
  Free Tier: No
  Auth: OAuth2 service account or gcloud auth print-access-token
  Models: Gemini models via Google Cloud Vertex AI enterprise endpoint

Setup:

  1. Enable Vertex AI API in Google Cloud Console
  2. Either create a service account key file, or authenticate with gcloud auth application-default login
  3. Set environment variables:
    # Option A: Service account key file
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
    export VERTEX_PROJECT="your-gcp-project"
    export VERTEX_LOCATION="us-central1"
    
    # Option B: gcloud CLI (no key file needed)
    gcloud auth application-default login
    export VERTEX_PROJECT="your-gcp-project"
    export VERTEX_LOCATION="us-central1"
    
  4. Configure in config.toml:
    [vertex_ai]
    project = "your-gcp-project"
    location = "us-central1"
    

Notes: Vertex AI uses the same Gemini generateContent API format as the native Gemini driver but authenticates via Google Cloud OAuth2 instead of an API key. Access tokens are cached with a ~50-minute TTL and auto-refreshed before expiry. The endpoint format is https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent.


Dynamic Provider Loading

Place custom provider definitions in ~/.librefang/providers/. Each .toml file defines one provider:

# ~/.librefang/providers/my-endpoint.toml
id = "my-endpoint"
display_name = "My Private Endpoint"
driver = "openai_compatible"
base_url = "https://llm.internal.company.com/v1"
api_key_env = "MY_ENDPOINT_KEY"
key_required = true

[[models]]
id = "my-model-7b"
display_name = "My Model 7B"
tier = "Balanced"
context_window = 32768
max_output_tokens = 4096
input_cost_per_m = 0.0
output_cost_per_m = 0.0
supports_tools = true
supports_vision = false

Files in ~/.librefang/providers/ are loaded at startup and merged into the catalog alongside the builtin providers.

Provider Regions

Some providers offer region-specific endpoints. Regions are defined in registry TOML files with an optional api_key_env override:

# In a provider's registry TOML:
[provider.regions.intl]
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

[provider.regions.china]
base_url = "https://api.minimaxi.com/v1"
api_key_env = "MINIMAX_CN_API_KEY"    # Optional: override the default API key env var

Select a region in config.toml:

[provider_regions]
qwen = "intl"
minimax = "china"

Priority: Region selections are applied before explicit [provider_urls] entries. If both are set for the same provider, provider_urls wins.
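Under those rules, base-URL resolution for a provider reduces to a three-step precedence check; a sketch (field names follow the TOML snippets above, the helper itself is illustrative):

```python
def resolve_base_url(provider, provider_urls, provider_regions, registry):
    """Resolve a provider's base URL: explicit [provider_urls] beats the region
    selection, which beats the registry default."""
    if provider in provider_urls:               # explicit provider_urls wins
        return provider_urls[provider]
    region = provider_regions.get(provider)
    if region:                                  # then the selected region's endpoint
        return registry[provider]["regions"][region]["base_url"]
    return registry[provider]["base_url"]       # else the builtin default
```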


Model Catalog

The catalog of builtin models, grouped by provider. The table shows a representative subset of the 230+ entries; pricing is per million tokens.

# | Model ID | Display Name | Provider | Tier | Context Window | Max Output | Input $/M | Output $/M | Tools | Vision
1 | claude-opus-4-20250514 | Claude Opus 4 | anthropic | Frontier | 200,000 | 32,000 | $15.00 | $75.00 | Yes | Yes
2 | claude-sonnet-4-20250514 | Claude Sonnet 4 | anthropic | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes
3 | claude-haiku-4-5-20251001 | Claude Haiku 4.5 | anthropic | Fast | 200,000 | 8,192 | $0.25 | $1.25 | Yes | Yes
4 | gpt-4.1 | GPT-4.1 | openai | Frontier | 1,047,576 | 32,768 | $2.00 | $8.00 | Yes | Yes
5 | gpt-4o | GPT-4o | openai | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes
6 | o3-mini | o3-mini | openai | Smart | 200,000 | 100,000 | $1.10 | $4.40 | Yes | No
7 | gpt-4.1-mini | GPT-4.1 Mini | openai | Balanced | 1,047,576 | 32,768 | $0.40 | $1.60 | Yes | Yes
8 | gpt-4o-mini | GPT-4o Mini | openai | Fast | 128,000 | 16,384 | $0.15 | $0.60 | Yes | Yes
9 | gpt-4.1-nano | GPT-4.1 Nano | openai | Fast | 1,047,576 | 32,768 | $0.10 | $0.40 | Yes | No
10 | gemini-2.5-pro | Gemini 2.5 Pro | gemini | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes
11 | gemini-2.5-flash | Gemini 2.5 Flash | gemini | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes
12 | gemini-2.0-flash | Gemini 2.0 Flash | gemini | Fast | 1,048,576 | 8,192 | $0.10 | $0.40 | Yes | Yes
13 | deepseek-chat | DeepSeek V3 | deepseek | Smart | 64,000 | 8,192 | $0.27 | $1.10 | Yes | No
14 | deepseek-reasoner | DeepSeek R1 | deepseek | Smart | 64,000 | 8,192 | $0.55 | $2.19 | No | No
15 | llama-3.3-70b-versatile | Llama 3.3 70B | groq | Balanced | 128,000 | 32,768 | $0.059 | $0.079 | Yes | No
16 | mixtral-8x7b-32768 | Mixtral 8x7B | groq | Balanced | 32,768 | 4,096 | $0.024 | $0.024 | Yes | No
17 | llama-3.1-8b-instant | Llama 3.1 8B | groq | Fast | 128,000 | 8,192 | $0.05 | $0.08 | Yes | No
18 | gemma2-9b-it | Gemma 2 9B | groq | Fast | 8,192 | 4,096 | $0.02 | $0.02 | No | No
19 | openrouter/google/gemini-2.5-flash | Gemini 2.5 Flash (OpenRouter) | openrouter | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes
20 | openrouter/anthropic/claude-sonnet-4 | Claude Sonnet 4 (OpenRouter) | openrouter | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes
21 | openrouter/openai/gpt-4o | GPT-4o (OpenRouter) | openrouter | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes
22 | openrouter/deepseek/deepseek-chat | DeepSeek V3 (OpenRouter) | openrouter | Smart | 128,000 | 32,768 | $0.14 | $0.28 | Yes | No
23 | openrouter/meta-llama/llama-3.3-70b-instruct | Llama 3.3 70B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.39 | $0.39 | Yes | No
24 | openrouter/qwen/qwen-2.5-72b-instruct | Qwen 2.5 72B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.36 | $0.36 | Yes | No
25 | openrouter/google/gemini-2.5-pro | Gemini 2.5 Pro (OpenRouter) | openrouter | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes
26 | openrouter/mistralai/mistral-large-latest | Mistral Large (OpenRouter) | openrouter | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No
27 | openrouter/google/gemma-2-9b-it | Gemma 2 9B (OpenRouter) | openrouter | Fast | 8,192 | 4,096 | $0.00 | $0.00 | No | No
28 | openrouter/deepseek/deepseek-r1 | DeepSeek R1 (OpenRouter) | openrouter | Frontier | 128,000 | 32,768 | $0.55 | $2.19 | No | No
29 | mistral-large-latest | Mistral Large | mistral | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No
30 | codestral-latest | Codestral | mistral | Smart | 32,000 | 8,192 | $0.30 | $0.90 | Yes | No
31 | mistral-small-latest | Mistral Small | mistral | Fast | 128,000 | 8,192 | $0.10 | $0.30 | Yes | No
32 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 405B (Together) | together | Frontier | 130,000 | 4,096 | $3.50 | $3.50 | Yes | No
33 | Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B (Together) | together | Smart | 32,768 | 4,096 | $0.20 | $0.60 | Yes | No
34 | mistralai/Mixtral-8x22B-Instruct-v0.1 | Mixtral 8x22B (Together) | together | Balanced | 65,536 | 4,096 | $0.60 | $0.60 | Yes | No
35 | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B (Fireworks) | fireworks | Frontier | 131,072 | 16,384 | $3.00 | $3.00 | Yes | No
36 | accounts/fireworks/models/mixtral-8x22b-instruct | Mixtral 8x22B (Fireworks) | fireworks | Balanced | 65,536 | 4,096 | $0.90 | $0.90 | Yes | No
37 | llama3.2 | Llama 3.2 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | Yes | No
38 | mistral:latest | Mistral (Ollama) | ollama | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
39 | phi3 | Phi-3 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | No | No
40 | vllm-local | vLLM Local Model | vllm | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
41 | lmstudio-local | LM Studio Local Model | lmstudio | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
42 | sonar-pro | Sonar Pro | perplexity | Smart | 200,000 | 8,192 | $3.00 | $15.00 | No | No
43 | sonar | Sonar | perplexity | Balanced | 128,000 | 8,192 | $1.00 | $5.00 | No | No
44 | command-r-plus | Command R+ | cohere | Smart | 128,000 | 4,096 | $2.50 | $10.00 | Yes | No
45 | command-r | Command R | cohere | Balanced | 128,000 | 4,096 | $0.15 | $0.60 | Yes | No
46 | jamba-1.5-large | Jamba 1.5 Large | ai21 | Smart | 256,000 | 4,096 | $2.00 | $8.00 | Yes | No
47 | cerebras/llama3.3-70b | Llama 3.3 70B (Cerebras) | cerebras | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No
48 | cerebras/llama3.1-8b | Llama 3.1 8B (Cerebras) | cerebras | Fast | 128,000 | 8,192 | $0.01 | $0.01 | Yes | No
49 | sambanova/llama-3.3-70b | Llama 3.3 70B (SambaNova) | sambanova | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No
50 | grok-2 | Grok 2 | xai | Smart | 131,072 | 32,768 | $2.00 | $10.00 | Yes | Yes
51 | grok-2-mini | Grok 2 Mini | xai | Fast | 131,072 | 32,768 | $0.30 | $0.50 | Yes | No
52 | hf/meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B (HF) | huggingface | Balanced | 128,000 | 4,096 | $0.30 | $0.30 | No | No
53 | replicate/meta-llama-3.3-70b-instruct | Llama 3.3 70B (Replicate) | replicate | Balanced | 128,000 | 4,096 | $0.40 | $0.40 | No | No

Model Tiers:

Tier | Description | Typical Use
Frontier | Most capable, highest cost | Orchestration, architecture, security audits
Smart | Strong reasoning, moderate cost | Coding, code review, research, analysis
Balanced | Good cost/quality tradeoff | Planning, writing, DevOps, day-to-day tasks
Fast | Cheapest cloud inference | Ops, translation, simple Q&A, health checks
Local | Self-hosted, zero cost | Privacy-first, offline, development

Notes:

  • Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve will be merged into the catalog with Local tier and zero cost.
  • The entries above are a representative subset of the 230+ builtin models. The full catalog includes additional models per provider and runtime auto-discovered models that vary per installation.

Model Aliases

All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.

Alias | Resolves To
sonnet | claude-sonnet-4-20250514
claude-sonnet | claude-sonnet-4-20250514
haiku | claude-haiku-4-5-20251001
claude-haiku | claude-haiku-4-5-20251001
opus | claude-opus-4-20250514
claude-opus | claude-opus-4-20250514
gpt4 | gpt-4o
gpt4o | gpt-4o
gpt4-mini | gpt-4o-mini
flash | gemini-2.5-flash
gemini-flash | gemini-2.5-flash
gemini-pro | gemini-2.5-pro
deepseek | deepseek-chat
llama | llama-3.3-70b-versatile
llama-70b | llama-3.3-70b-versatile
mixtral | mixtral-8x7b-32768
mistral | mistral-large-latest
codestral | codestral-latest
grok | grok-2
grok-mini | grok-2-mini
sonar | sonar-pro
jamba | jamba-1.5-large
command-r | command-r-plus

You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
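Because aliases are case-insensitive, resolution is essentially a lowercased dictionary lookup; a sketch (alias data is a subset of the table above, the function name is illustrative):

```python
# Subset of the alias table above: alias -> canonical model ID.
ALIASES = {
    "sonnet": "claude-sonnet-4-20250514",
    "opus": "claude-opus-4-20250514",
    "haiku": "claude-haiku-4-5-20251001",
    "flash": "gemini-2.5-flash",
    "deepseek": "deepseek-chat",
}

def resolve_model_id(name):
    """Map an alias (case-insensitively) to its canonical model ID;
    pass anything that is not an alias through unchanged."""
    return ALIASES.get(name.lower(), name)
```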


Per-Agent Model Override

Each agent in your config.toml can specify its own model, overriding the global default:

# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus"                      # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"   # cheap Groq model for simple ops

[[agents]]
name = "coder"
model = "gemini-2.5-flash"          # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"                 # Perplexity with built-in web search

# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514"  # never auto-routed

When pinned_model is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilisation mode (KernelMode::Stable) where the model is frozen for production reliability.
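The resulting precedence (pinned model, then the agent's own model, then the global default) can be sketched as (an illustrative helper, not LibreFang's API):

```python
def select_model(agent, default_model):
    """Pick an agent's effective model: pinned_model always wins, then the
    agent's own model field, then the global default from [agents.defaults]."""
    if agent.get("pinned_model"):       # frozen, e.g. in Stabilisation mode
        return agent["pinned_model"]
    return agent.get("model") or default_model
```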


Model Routing

LibreFang can automatically select the cheapest model capable of handling each query. This is configured per-agent via ModelRoutingConfig.

How It Works

  1. The ModelRouter scores each incoming CompletionRequest based on heuristics
  2. The score maps to a TaskComplexity tier: Simple, Medium, or Complex
  3. Each tier has a pre-configured model

Scoring Heuristics

Signal | Weight | Logic
Total message length | 1 point per ~4 chars | Rough token proxy
Tool availability | +20 per tool defined | Tools imply multi-step work
Code markers | +30 per marker found | Backticks, fn, def, class, import, function, async, await, struct, impl, return
Conversation depth | +15 per message > 10 | Deep context = harder reasoning
System prompt length | +1 per 10 chars > 500 | Long system prompts imply complex tasks

Thresholds

Complexity   Score Range          Default Model
Simple       score < 100          claude-haiku-4-5-20251001
Medium       100 <= score < 500   claude-sonnet-4-20250514
Complex      score >= 500         claude-sonnet-4-20250514
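The score-to-tier mapping then reduces to two threshold comparisons. The enum mirrors the TaskComplexity tiers named above; the function itself is an illustrative sketch:

```rust
// Maps a heuristic score to a TaskComplexity tier using the configurable
// simple/complex thresholds. Sketch only, not the real router code.
#[derive(Debug, PartialEq)]
enum TaskComplexity {
    Simple,
    Medium,
    Complex,
}

fn classify(score: u32, simple_threshold: u32, complex_threshold: u32) -> TaskComplexity {
    if score < simple_threshold {
        TaskComplexity::Simple
    } else if score < complex_threshold {
        TaskComplexity::Medium
    } else {
        TaskComplexity::Complex
    }
}
```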

Configuration

# In agent manifest or config.toml
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

The router also integrates with the model catalog:

  • validate_models() checks that all configured model IDs exist in the catalog
  • resolve_aliases() expands aliases to canonical IDs (e.g., "sonnet" becomes "claude-sonnet-4-20250514")
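Alias expansion can be sketched as a simple map lookup, assuming the catalog exposes a flat alias-to-canonical-ID map like the one returned by GET /api/models/aliases (function names here are illustrative):

```rust
use std::collections::HashMap;

// Expand an alias to its canonical model ID; unknown names pass through
// unchanged so canonical IDs keep working. Illustrative sketch.
fn resolve_alias(aliases: &HashMap<String, String>, model: &str) -> String {
    aliases.get(model).cloned().unwrap_or_else(|| model.to_string())
}

// A few entries from the alias map shown later in this guide.
fn example_aliases() -> HashMap<String, String> {
    HashMap::from([
        ("sonnet".to_string(), "claude-sonnet-4-20250514".to_string()),
        ("haiku".to_string(), "claude-haiku-4-5-20251001".to_string()),
        ("flash".to_string(), "gemini-2.5-flash".to_string()),
    ])
}
```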

Cost Tracking

LibreFang tracks the cost of every LLM call and can enforce per-agent spending quotas.

Per-Response Cost Estimation

After each LLM call, cost is calculated as:

cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate

The MeteringEngine first checks the model catalog for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.
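The formula is simple enough to state directly. This is a sketch; rates are USD per million tokens, as in the catalog's input_cost_per_m / output_cost_per_m fields:

```rust
// Per-response cost estimate; rates are USD per million tokens.
fn estimate_cost(input_tokens: u64, output_tokens: u64, input_rate: f64, output_rate: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_rate
        + (output_tokens as f64 / 1_000_000.0) * output_rate
}
```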

Cost Rates (per million tokens)

Model Pattern                     Input $/M   Output $/M
*haiku*                           $0.25       $1.25
*sonnet*                          $3.00       $15.00
*opus*                            $15.00      $75.00
gpt-4o-mini                       $0.15       $0.60
gpt-4o                            $2.50       $10.00
gpt-4.1-nano                      $0.10       $0.40
gpt-4.1-mini                      $0.40       $1.60
gpt-4.1                           $2.00       $8.00
o3-mini                           $1.10       $4.40
gemini-2.5-pro                    $1.25       $10.00
gemini-2.5-flash                  $0.15       $0.60
gemini-2.0-flash                  $0.10       $0.40
deepseek-reasoner / deepseek-r1   $0.55       $2.19
*deepseek*                        $0.27       $1.10
*cerebras*                        $0.06       $0.06
*sambanova*                       $0.06       $0.06
*replicate*                       $0.40       $0.40
*llama* / *mixtral*               $0.05       $0.10
*qwen*                            $0.20       $0.60
mistral-large*                    $2.00       $6.00
*mistral* (other)                 $0.10       $0.30
command-r-plus                    $2.50       $10.00
command-r                         $0.15       $0.60
sonar-pro                         $3.00       $15.00
*sonar* (other)                   $1.00       $5.00
grok-2-mini / grok-mini           $0.30       $0.50
*grok* (other)                    $2.00       $10.00
*jamba*                           $2.00       $8.00
Default (unknown)                 $1.00       $3.00

Quota Enforcement

Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a QuotaExceeded error.

# Per-agent quota in config.toml
[[agents]]
name = "chatbot"
[agents.resources]
max_cost_per_hour_usd = 5.00   # cap at $5/hour
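The enforcement step amounts to a pre-flight check before each call. This is a hypothetical sketch: only the QuotaExceeded name comes from the text above; everything else is illustrative:

```rust
// Hypothetical pre-flight quota check: reject a call that would push the
// agent past its hourly spend limit.
#[derive(Debug, PartialEq)]
enum QuotaError {
    QuotaExceeded { spent_usd: f64, limit_usd: f64 },
}

fn check_quota(spent_this_hour: f64, estimated_call_cost: f64, limit: f64) -> Result<(), QuotaError> {
    if spent_this_hour + estimated_call_cost > limit {
        Err(QuotaError::QuotaExceeded { spent_usd: spent_this_hour, limit_usd: limit })
    } else {
        Ok(())
    }
}
```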

The usage footer (when enabled) appends cost information to each response:

> Cost: $0.0042 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514

Fallback Providers

The FallbackDriver wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.

Behavior

  • On success: returns immediately
  • On rate limit / overload errors (429, 529): bubbles up for retry logic (does NOT failover, because the primary should be retried after backoff)
  • On all other errors: logs a warning and tries the next driver in the chain
  • If all drivers fail: returns the last error
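The behavior above amounts to a short loop over the chain. This sketch uses closures in place of real driver objects, and the error enum is illustrative rather than the actual FallbackDriver types:

```rust
// Fallback chain sketch: rate-limit errors (429/529) bubble up so the caller
// can retry the primary after backoff; other errors advance to the next driver.
#[derive(Debug, Clone, PartialEq)]
enum LlmError {
    RateLimited(u16),
    Other(String),
}

fn complete_with_fallback(
    drivers: &[&dyn Fn() -> Result<String, LlmError>],
) -> Result<String, LlmError> {
    let mut last_err = LlmError::Other("empty driver chain".into());
    for driver in drivers {
        match driver() {
            Ok(resp) => return Ok(resp),                        // success: return immediately
            Err(e @ LlmError::RateLimited(_)) => return Err(e), // do NOT fail over
            Err(e) => last_err = e,                             // warn and try next driver
        }
    }
    Err(last_err) // all drivers failed: return the last error
}
```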

Configuration

Fallback chains are configured in your agent manifest or config.toml. The FallbackDriver is used automatically when an agent is in Stabilisation mode (KernelMode::Stable) or when multiple providers are configured for reliability.

# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]

The fallback driver creates a chain: AnthropicDriver -> GeminiDriver -> OpenAIDriver(Groq).


API Endpoints

List All Models

GET /api/models

Returns the complete model catalog with metadata, pricing, and feature flags.

Response:

[
  {
    "id": "claude-sonnet-4-20250514",
    "display_name": "Claude Sonnet 4",
    "provider": "anthropic",
    "tier": "Smart",
    "context_window": 200000,
    "max_output_tokens": 64000,
    "input_cost_per_m": 3.0,
    "output_cost_per_m": 15.0,
    "supports_tools": true,
    "supports_vision": true,
    "supports_streaming": true,
    "aliases": ["sonnet", "claude-sonnet"]
  }
]

Get Specific Model

GET /api/models/{id}

Returns a single model entry. Supports both canonical IDs and aliases.

GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514

List Aliases

GET /api/models/aliases

Returns a map of all alias-to-canonical-ID mappings.

Response:

{
  "sonnet": "claude-sonnet-4-20250514",
  "haiku": "claude-haiku-4-5-20251001",
  "flash": "gemini-2.5-flash",
  "grok": "grok-2"
}

List Providers

GET /api/providers

Returns all 49 providers with auth status and model counts.

Response:

[
  {
    "id": "anthropic",
    "display_name": "Anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": "https://api.anthropic.com",
    "key_required": true,
    "auth_status": "Configured",
    "model_count": 3
  },
  {
    "id": "ollama",
    "display_name": "Ollama",
    "api_key_env": "OLLAMA_API_KEY",
    "base_url": "http://localhost:11434/v1",
    "key_required": false,
    "auth_status": "NotRequired",
    "model_count": 5
  }
]

Auth status values: Configured, Missing, NotRequired.

Set Provider API Key

POST /api/providers/{name}/key
Content-Type: application/json

{ "api_key": "sk-..." }

Configures an API key for a provider at runtime (stored as a Zeroizing<String>, wiped from memory on drop).

Remove Provider API Key

DELETE /api/providers/{name}/key

Removes the configured API key for a provider.

Test Provider Connection

POST /api/providers/{name}/test

Sends a minimal test request to verify the provider is reachable and the API key is valid.


Channel Commands

Two chat commands are available in any channel for inspecting models and providers:

/models

Lists all available models with their tier, provider, and context window. Only shows models from providers that have authentication configured (or do not require it).


Example output:

Available models (12):

Frontier:
  claude-opus-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-pro (Google Gemini) — 1M ctx

Smart:
  claude-sonnet-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-flash (Google Gemini) — 1M ctx
  deepseek-chat (DeepSeek) — 64K ctx

Balanced:
  llama-3.3-70b-versatile (Groq) — 128K ctx

Fast:
  claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
  gemini-2.0-flash (Google Gemini) — 1M ctx

Local:
  llama3.2 (Ollama) — 128K ctx

/providers

Lists all 49 providers with their authentication status.


Example output:

LLM Providers (49):

  Anthropic          ANTHROPIC_API_KEY       Configured    3 models
  OpenAI             OPENAI_API_KEY          Missing       6 models
  Google Gemini      GEMINI_API_KEY          Configured    3 models
  DeepSeek           DEEPSEEK_API_KEY        Missing       2 models
  Groq               GROQ_API_KEY            Configured    4 models
  Ollama             (no key needed)         Ready         3 models
  vLLM               (no key needed)         Ready         1 model
  LM Studio          (no key needed)         Ready         1 model
  ...

Environment Variables Summary

Quick reference for all provider environment variables:

Provider           Env Var                                                          Required
Anthropic          ANTHROPIC_API_KEY                                                Yes
OpenAI             OPENAI_API_KEY                                                   Yes
Google Gemini      GEMINI_API_KEY or GOOGLE_API_KEY                                 Yes
DeepSeek           DEEPSEEK_API_KEY                                                 Yes
Groq               GROQ_API_KEY                                                     Yes
OpenRouter         OPENROUTER_API_KEY                                               Yes
Mistral AI         MISTRAL_API_KEY                                                  Yes
Together AI        TOGETHER_API_KEY                                                 Yes
Fireworks AI       FIREWORKS_API_KEY                                                Yes
Ollama             OLLAMA_API_KEY                                                   No
vLLM               VLLM_API_KEY                                                     No
LM Studio          LMSTUDIO_API_KEY                                                 No
Perplexity AI      PERPLEXITY_API_KEY                                               Yes
Cohere             COHERE_API_KEY                                                   Yes
AI21 Labs          AI21_API_KEY                                                     Yes
Cerebras           CEREBRAS_API_KEY                                                 Yes
SambaNova          SAMBANOVA_API_KEY                                                Yes
Hugging Face       HF_API_KEY                                                       Yes
xAI                XAI_API_KEY                                                      Yes
Replicate          REPLICATE_API_TOKEN                                              Yes
Claude Code        ANTHROPIC_API_KEY                                                Yes
NVIDIA NIM         NVIDIA_API_KEY                                                   Yes
Voyage AI          VOYAGE_API_KEY                                                   Yes
Anyscale           ANYSCALE_API_KEY                                                 Yes
DeepInfra          DEEPINFRA_API_KEY                                                Yes
Azure OpenAI       AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT                      Yes
Amazon Bedrock     AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION             Yes
Google Vertex AI   GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION  Yes

Security Notes

  • All API keys are stored as Zeroizing<String> -- the key material is automatically overwritten with zeros when the value is dropped from memory.
  • Auth detection (detect_auth()) only checks std::env::var() for presence -- it never reads or logs the actual secret value.
  • Provider API keys set via the REST API (POST /api/providers/{name}/key) follow the same zeroization policy.
  • The health endpoint (/api/health) never exposes provider auth status or API keys. Detailed info is behind /api/health/detail which requires authentication.
  • All DriverConfig and KernelConfig structs implement Debug with secret redaction -- API keys are printed as "***" in logs.