LLM Settings

Configure language model providers, manage API keys, set up fallback models, and connect custom endpoints for your AI agents.

app.8bit-ai.com

Supported Providers

8bit-ai supports multiple LLM providers, giving you the flexibility to choose the best model for each use case. You can configure multiple providers and assign different models to different agents.

Provider	Models Available	Strengths	Pricing Model
OpenAI	GPT-4 Turbo, GPT-4o, GPT-4o Mini	Strong reasoning, broad ecosystem	Pay-per-token
Anthropic	Claude Sonnet 4, Claude Haiku	Long context, nuanced output	Pay-per-token
Google	Gemini Pro, Gemini Flash	Multimodal, cost-effective	Pay-per-token
Open Source	Llama 3, Mistral, Mixtral	Self-hosted, no data leakage	Hosting cost only

Provider Comparison

Each provider excels in different areas. OpenAI offers the broadest model range; Anthropic provides the longest context windows; Google's Gemini is cost-effective for high volume; open-source models give you full control over data privacy.

Provider Configuration

Configure each provider with your API keys and preferred default models. API keys are encrypted at rest and never exposed in logs or API responses.

OpenAI Configuration

Anthropic Configuration

Google Configuration

API Key Security

Never hardcode API keys in client-side code or version control. Use environment variables or the 8bit-ai dashboard's secure credential storage. Keys are encrypted with AES-256 at rest.

Fallback Models

Configure fallback models to ensure high availability. If the primary model is unavailable or rate-limited, the system automatically routes to the fallback model.

Automatic Fallback Chain

Define an ordered list of fallback models. The system attempts the primary model first, then falls through the chain based on configurable conditions.

Fallback Conditions

any_error:Fall back on any API or network error

rate_limited:Only when rate limits are exceeded

timeout:When response exceeds timeout threshold

latency:When p95 latency exceeds configured limit

High Availability

With fallback models configured, your agents maintain uptime even during provider outages. The system logs all fallback events for monitoring and analysis.

Custom Endpoints

Connect self-hosted or third-party LLM endpoints that are compatible with the OpenAI API format. This includes local deployments of Llama, Mistral, vLLM, and other open-source models.

Self-Hosted Model Requirements

Endpoint must expose an OpenAI-compatible chat completions API
HTTPS is required for production endpoints
Minimum throughput of 10 tokens/second recommended
Support for streaming responses is optional but recommended

Supported Formats

openai-compatible:Standard OpenAI chat completions format

anthropic-compatible:Anthropic messages API format

tgi:Hugging Face Text Generation Inference

Latency Considerations

Self-hosted endpoints may introduce additional latency. Ensure your infrastructure can handle the expected request volume. Monitor endpoint health through the dashboard.

Usage Monitoring

Track token usage and costs across all configured providers to optimize spending.

Agent Configuration

Parameters and model selection

API Authentication

API key management and security