Model Fallback¶
Automatic failover across multiple LLM providers. If the primary model fails, the next one in the chain handles the request — no downtime, no manual switching.
from promptise import build_agent, FallbackChain
agent = await build_agent(
model=FallbackChain([
"openai:gpt-5-mini", # Primary
"anthropic:claude-sonnet-4-20250514", # Fallback 1
"ollama:llama3", # Fallback 2 (local, always up)
]),
servers=servers,
)
# If OpenAI is down → Claude handles it. If both down → local Llama.
Why¶
Single-provider agents are a single point of failure. When OpenAI has an outage at 3am, your agent goes down with it. With a fallback chain, the next provider picks up seamlessly. With 3 providers at 99.9% uptime each, compound availability reaches 99.9999%.
How It Works¶
FallbackChain is a BaseChatModel — LangChain and Promptise treat it like any other model. Under the hood:
- Primary model receives the request
- If it fails (error, timeout, rate limit), the next model is tried
- Each model has an independent circuit breaker — after N consecutive failures, the model is skipped entirely for a recovery period
- When the recovery period elapses, one test request is sent (half-open state)
- If the test succeeds, the circuit closes and the model resumes normal traffic
Configuration¶
FallbackChain(
models=["openai:gpt-5-mini", "anthropic:claude-sonnet-4-20250514", "ollama:llama3"],
timeout_per_model=15.0, # 15s max per model attempt (0 = no limit)
global_timeout=30.0, # 30s max across ALL attempts (0 = no limit)
failure_threshold=3, # Circuit opens after 3 consecutive failures
recovery_timeout=60.0, # 60s before testing a tripped circuit
on_fallback=my_callback, # Optional: called on each fallback activation
)
| Parameter | Type | Default | Description |
|---|---|---|---|
models |
list[str \| BaseChatModel] |
required | Ordered list of models. First is primary. |
timeout_per_model |
float |
0 |
Max seconds per model attempt. 0 = provider default. |
global_timeout |
float |
0 |
Max seconds across all attempts combined. |
failure_threshold |
int |
3 |
Consecutive failures before circuit opens. |
recovery_timeout |
float |
60.0 |
Seconds before a tripped circuit allows a test. |
on_fallback |
callable \| None |
None |
(primary_id, fallback_id, error) callback. |
Circuit Breaker States¶
| State | Behavior |
|---|---|
| Closed | Normal operation. Requests go to this model. |
| Open | Model is skipped entirely. Too many recent failures. |
| Half-open | Recovery period elapsed. One test request is allowed. If it succeeds → closed. If it fails → open again. |
Monitoring¶
chain = FallbackChain([...])
status = chain.get_chain_status()
# [
# {"model_id": "openai:gpt-5-mini", "state": "closed", "failures": 0, "is_primary": True},
# {"model_id": "claude-sonnet", "state": "open", "failures": 5, "is_primary": False},
# ]
chain.active_model # "openai:gpt-5-mini" (first non-skipped model)
Fallback Notifications¶
Combine with the Events system to get notified when fallbacks activate:
from promptise import FallbackChain, EventNotifier, WebhookSink
def on_fallback(primary, fallback, error):
print(f"Switched from {primary} to {fallback}: {error}")
agent = await build_agent(
model=FallbackChain(
["openai:gpt-5-mini", "anthropic:claude-sonnet-4-20250514"],
on_fallback=on_fallback,
),
events=EventNotifier(sinks=[
WebhookSink("https://hooks.slack.com/...", events=["invocation.error"]),
]),
servers=servers,
)
What's Next?¶
- Events — get notified when models fail or fallbacks activate
- Observability — track which model served each request
- Building Agents — full
build_agent()parameter reference