LLM Settings

Configure language model providers, manage API keys, set up fallback models, and connect custom endpoints for your AI agents.

app.8bit-ai.com
LLM Settings

Supported Providers

8bit-ai supports multiple LLM providers, giving you the flexibility to choose the best model for each use case. You can configure multiple providers and assign different models to different agents.

ProviderModels AvailableStrengthsPricing Model
OpenAIGPT-4 Turbo, GPT-4o, GPT-4o MiniStrong reasoning, broad ecosystemPay-per-token
AnthropicClaude Sonnet 4, Claude HaikuLong context, nuanced outputPay-per-token
GoogleGemini Pro, Gemini FlashMultimodal, cost-effectivePay-per-token
Open SourceLlama 3, Mistral, MixtralSelf-hosted, no data leakageHosting cost only

Provider Comparison

Each provider excels in different areas. OpenAI offers the broadest model range; Anthropic provides the longest context windows; Google's Gemini is cost-effective for high volume; open-source models give you full control over data privacy.

Provider Configuration

Configure each provider with your API keys and preferred default models. API keys are encrypted at rest and never exposed in logs or API responses.

OpenAI Configuration

Anthropic Configuration

Google Configuration

API Key Security

Never hardcode API keys in client-side code or version control. Use environment variables or the 8bit-ai dashboard's secure credential storage. Keys are encrypted with AES-256 at rest.

Fallback Models

Configure fallback models to ensure high availability. If the primary model is unavailable or rate-limited, the system automatically routes to the fallback model.

Automatic Fallback Chain

Define an ordered list of fallback models. The system attempts the primary model first, then falls through the chain based on configurable conditions.

Fallback Conditions

any_error:Fall back on any API or network error
rate_limited:Only when rate limits are exceeded
timeout:When response exceeds timeout threshold
latency:When p95 latency exceeds configured limit

High Availability

With fallback models configured, your agents maintain uptime even during provider outages. The system logs all fallback events for monitoring and analysis.

Custom Endpoints

Connect self-hosted or third-party LLM endpoints that are compatible with the OpenAI API format. This includes local deployments of Llama, Mistral, vLLM, and other open-source models.

Self-Hosted Model Requirements

  • Endpoint must expose an OpenAI-compatible chat completions API
  • HTTPS is required for production endpoints
  • Minimum throughput of 10 tokens/second recommended
  • Support for streaming responses is optional but recommended

Supported Formats

openai-compatible:Standard OpenAI chat completions format
anthropic-compatible:Anthropic messages API format
tgi:Hugging Face Text Generation Inference

Latency Considerations

Self-hosted endpoints may introduce additional latency. Ensure your infrastructure can handle the expected request volume. Monitor endpoint health through the dashboard.

Usage Monitoring

Track token usage and costs across all configured providers to optimize spending.