Building Agentic Multi-User Systems¶
What You'll Build¶
A production-ready multi-user AI application where authenticated users each get isolated conversations, personalized memory, scoped cache, and guardrailed responses — backed by MCP servers with JWT authentication, role-based access control, tamper-evident audit logging, and per-user session state. The full stack: user authenticates → agent processes their request with their identity → MCP tools enforce their permissions → response is guardrailed → conversation persists to their session.
Concepts¶
A multi-user agentic system has three layers of identity:
- User identity — who the human is (JWT claims, user_id, roles, scopes). Carried by
CallerContexton every agent invocation. - Agent-to-server identity — the agent authenticating to MCP servers. Carried by
HTTPServerSpec.bearer_tokenon every tool call. - Session identity — which conversation this message belongs to. Carried by
session_idanduser_idon everychat()call.
These three layers work together. The user's JWT flows from your backend through the agent to the MCP server. The MCP server validates the token, extracts roles/scopes, and applies per-tool guards. The conversation store enforces ownership. The cache isolates per-user. Guardrails scan every input and output. Observability records every decision with the authenticated identity attached.
User (JWT) → Your Backend → build_agent(bearer_token=jwt)
↓
CallerContext(user_id, roles, scopes)
↓
┌───────────────┼───────────────┐
↓ ↓ ↓
Guardrails Semantic Cache Conversation Store
(scan I/O) (per_user scope) (ownership enforced)
↓ ↓ ↓
└───────────────┼───────────────┘
↓
MCP Server (JWT validated)
↓
Guards (HasRole, HasScope)
↓
Tool Handler (ctx.client has identity)
↓
Audit Log (client_id + tool + args + result)
Step 1: Authenticate Users on Your MCP Server¶
Start with the MCP server — the tool layer that enforces permissions. Every tool call arrives with the caller's identity.
from promptise.mcp.server import MCPServer, JWTAuth, AuthMiddleware, HasRole, HasScope
auth = JWTAuth(secret="${JWT_SECRET}")
server = MCPServer(name="customer-api", auth=auth)
The JWTAuth provider validates Bearer tokens on every request. It extracts sub (client_id), roles, and scope claims from the JWT payload. Verified tokens are cached in an LRU (256 entries by default) so repeated calls with the same token skip cryptographic validation.
A real MCP server with 30 tools¶
Production servers have many tools across domains. Here's a customer operations server with 30 tools organized into routers — each domain has its own namespace, guards, and middleware:
from promptise.mcp.server import (
MCPServer, MCPRouter, JWTAuth, AuthMiddleware,
HasRole, HasScope, HasAllRoles, RequireAuth,
LoggingMiddleware, TimeoutMiddleware, RateLimitMiddleware,
AuditMiddleware, RequestContext,
)
auth = JWTAuth(secret="${JWT_SECRET}")
server = MCPServer(name="customer-ops", auth=auth)
server.add_middleware(LoggingMiddleware())
server.add_middleware(TimeoutMiddleware(default_timeout=30.0))
server.add_middleware(RateLimitMiddleware(requests_per_second=20))
server.add_middleware(AuditMiddleware(secret="${AUDIT_SECRET}"))
# ── Customer Management (7 tools) ────────────────────────────
customers = MCPRouter(prefix="customers")
@customers.tool(guards=[HasScope("read")])
async def get_customer(customer_id: str, ctx: RequestContext) -> dict:
"""Retrieve customer profile."""
return await db.get_customer(customer_id, tenant=ctx.client_id)
@customers.tool(guards=[HasScope("read")])
async def search_customers(query: str, limit: int = 10) -> list[dict]:
"""Search customers by name, email, or phone."""
return await db.search_customers(query, limit=limit)
@customers.tool(guards=[HasScope("write")])
async def update_customer(customer_id: str, updates: dict) -> dict:
"""Update customer profile fields."""
return await db.update_customer(customer_id, updates)
@customers.tool(guards=[HasAllRoles("admin", "support")])
async def delete_customer(customer_id: str) -> str:
"""Permanently delete a customer. Requires admin + support roles."""
await db.delete_customer(customer_id)
return f"Customer {customer_id} deleted"
@customers.tool(guards=[HasScope("read")])
async def get_customer_history(customer_id: str) -> list[dict]:
"""Get interaction history for a customer."""
return await db.get_interactions(customer_id)
@customers.tool(guards=[HasScope("write")])
async def add_customer_note(customer_id: str, note: str) -> dict:
"""Add an internal note to a customer profile."""
return await db.add_note(customer_id, note)
@customers.tool(guards=[HasScope("read")])
async def get_customer_segments(customer_id: str) -> list[str]:
"""Get marketing segments for a customer."""
return await db.get_segments(customer_id)
server.mount(customers)
# ── Orders & Billing (8 tools) ───────────────────────────────
orders = MCPRouter(prefix="orders")
@orders.tool(guards=[HasScope("read")])
async def get_order(order_id: str) -> dict:
"""Retrieve order details."""
return await db.get_order(order_id)
@orders.tool(guards=[HasScope("read")])
async def list_orders(customer_id: str, status: str = "all") -> list[dict]:
"""List orders for a customer, optionally filtered by status."""
return await db.list_orders(customer_id, status=status)
@orders.tool(guards=[HasScope("read")])
async def track_shipment(order_id: str) -> dict:
"""Get real-time shipment tracking for an order."""
return await shipping.track(order_id)
@orders.tool(guards=[HasScope("billing:read")])
async def get_invoice(invoice_id: str) -> dict:
"""Retrieve invoice details."""
return await billing.get_invoice(invoice_id)
@orders.tool(guards=[HasScope("billing:write")])
async def process_refund(order_id: str, amount: float, reason: str) -> dict:
"""Process a refund. Requires billing:write scope."""
return await billing.refund(order_id, amount, reason)
@orders.tool(guards=[HasScope("billing:write")])
async def apply_discount(order_id: str, code: str) -> dict:
"""Apply a discount code to an order."""
return await billing.apply_discount(order_id, code)
@orders.tool(guards=[HasScope("write")])
async def cancel_order(order_id: str, reason: str) -> str:
"""Cancel a pending order."""
return await db.cancel_order(order_id, reason)
@orders.tool(guards=[HasScope("billing:read")])
async def get_payment_methods(customer_id: str) -> list[dict]:
"""List saved payment methods for a customer."""
return await billing.get_payment_methods(customer_id)
server.mount(orders)
# ── Support Tickets (8 tools) ────────────────────────────────
tickets = MCPRouter(prefix="tickets")
@tickets.tool(guards=[HasScope("read")])
async def get_ticket(ticket_id: str) -> dict:
"""Retrieve a support ticket."""
return await db.get_ticket(ticket_id)
@tickets.tool(guards=[HasScope("read")])
async def list_tickets(customer_id: str = None, status: str = "open") -> list[dict]:
"""List support tickets, optionally filtered by customer and status."""
return await db.list_tickets(customer_id=customer_id, status=status)
@tickets.tool(guards=[HasScope("write")])
async def create_ticket(customer_id: str, subject: str, description: str, priority: str = "medium") -> dict:
"""Create a new support ticket."""
return await db.create_ticket(customer_id, subject, description, priority)
@tickets.tool(guards=[HasScope("write")])
async def update_ticket(ticket_id: str, updates: dict) -> dict:
"""Update ticket fields (status, priority, assignee)."""
return await db.update_ticket(ticket_id, updates)
@tickets.tool(guards=[HasScope("write")])
async def add_ticket_comment(ticket_id: str, comment: str, internal: bool = False) -> dict:
"""Add a comment to a ticket. Set internal=True for agent-only notes."""
return await db.add_ticket_comment(ticket_id, comment, internal)
@tickets.tool(guards=[HasRole("support")])
async def assign_ticket(ticket_id: str, agent_id: str) -> str:
"""Assign a ticket to a support agent."""
return await db.assign_ticket(ticket_id, agent_id)
@tickets.tool(guards=[HasRole("support")])
async def escalate_ticket(ticket_id: str, reason: str) -> str:
"""Escalate a ticket to tier-2 support."""
return await db.escalate_ticket(ticket_id, reason)
@tickets.tool(guards=[HasRole("support")])
async def close_ticket(ticket_id: str, resolution: str) -> str:
"""Close a resolved ticket."""
return await db.close_ticket(ticket_id, resolution)
server.mount(tickets)
# ── Analytics & Reporting (4 tools) ──────────────────────────
analytics = MCPRouter(prefix="analytics")
@analytics.tool(guards=[HasRole("analyst")])
async def customer_lifetime_value(customer_id: str) -> dict:
"""Calculate customer lifetime value."""
return await analytics_db.clv(customer_id)
@analytics.tool(guards=[HasRole("analyst")])
async def churn_risk_score(customer_id: str) -> dict:
"""Get churn risk prediction for a customer."""
return await ml.predict_churn(customer_id)
@analytics.tool(guards=[HasRole("analyst")])
async def revenue_report(period: str = "monthly") -> dict:
"""Generate revenue summary report."""
return await analytics_db.revenue(period)
@analytics.tool(guards=[HasRole("analyst")])
async def support_metrics(period: str = "weekly") -> dict:
"""Get support team performance metrics."""
return await analytics_db.support_metrics(period)
server.mount(analytics)
# ── Internal Operations (3 tools) ────────────────────────────
ops = MCPRouter(prefix="ops")
@ops.tool(guards=[HasRole("admin")])
async def system_health() -> dict:
"""Check system health across all services."""
return await monitoring.health_check()
@ops.tool(guards=[HasAllRoles("admin", "ops")])
async def feature_toggle(feature: str, enabled: bool) -> str:
"""Enable or disable a feature flag. Requires admin + ops."""
await config_db.set_feature(feature, enabled)
return f"Feature {feature} {'enabled' if enabled else 'disabled'}"
@ops.tool(guards=[HasRole("admin")])
async def audit_log(hours: int = 24) -> list[dict]:
"""Retrieve recent audit log entries."""
return await db.get_audit_log(hours=hours)
server.mount(ops)
That's 30 tools across 5 domains. Each domain has its own router with appropriate guards. The agent discovers all 30 automatically — but sending all 30 tool descriptions to the LLM on every message wastes thousands of tokens. That's where semantic tool optimization comes in.
For asymmetric tokens from identity providers (Auth0, Keycloak, Okta):
from promptise.mcp.server import AsymmetricJWTAuth
auth = AsymmetricJWTAuth(
public_key_pem="${IDP_PUBLIC_KEY}",
algorithms=["RS256"],
issuer="https://auth.example.com",
audience="my-api",
)
For simpler deployments, API key authentication with role mappings:
from promptise.mcp.server import APIKeyAuth
auth = APIKeyAuth(keys={
"key-admin-abc": {"client_id": "admin-agent", "roles": ["admin"]},
"key-readonly-xyz": {"client_id": "reader-agent", "roles": ["reader"]},
})
Step 2: Protect Tools with Guards¶
Guards enforce what an authenticated client can do. Authentication tells you who. Guards decide whether they're allowed.
@server.tool(guards=[HasRole("analyst")])
async def get_customer_data(customer_id: str) -> dict:
"""Retrieve customer information."""
return await db.get_customer(customer_id)
@server.tool(guards=[HasRole("admin")])
async def delete_customer(customer_id: str) -> str:
"""Delete a customer record. Requires admin role."""
await db.delete_customer(customer_id)
return f"Customer {customer_id} deleted"
@server.tool(guards=[HasScope("billing:write")])
async def process_refund(order_id: str, amount: float) -> dict:
"""Process a refund. Requires billing:write scope."""
return await billing.refund(order_id, amount)
Six built-in guards:
| Guard | What it checks |
|---|---|
RequireAuth() |
Client is authenticated (any identity) |
HasRole("admin") |
Client has at least one of the given roles |
HasAllRoles("admin", "billing") |
Client has every given role |
HasScope("read") |
Client has at least one of the given OAuth2 scopes |
HasAllScopes("read", "write") |
Client has every given scope |
RequireClientId("agent-007") |
Client ID matches exactly |
Custom guards implement the Guard protocol — any callable that takes a RequestContext and returns allow/deny:
class RequirePremiumPlan:
async def check(self, ctx):
plan = ctx.client.extra.get("plan")
return plan in ("pro", "enterprise")
def describe_denial(self, ctx):
return "This tool requires a Pro or Enterprise plan."
@server.tool(guards=[RequirePremiumPlan()])
async def advanced_analytics(query: str) -> dict: ...
Step 3: Access Client Identity in Tool Handlers¶
Inside any tool handler, ctx.client carries the full authenticated identity:
from promptise.mcp.server import RequestContext
@server.tool(auth=True)
async def my_tool(query: str, ctx: RequestContext) -> dict:
user_id = ctx.client_id # "user-42"
roles = ctx.client.roles # {"analyst", "data-team"}
scopes = ctx.client.scopes # {"read", "write"}
issuer = ctx.client.issuer # "https://auth.example.com"
ip = ctx.client.ip_address # "192.168.1.100"
claims = ctx.client.claims # Full JWT payload
org = ctx.client.extra.get("org_id") # Custom enrichment
# Use identity for data isolation
return await db.query(query, tenant_id=user_id)
The ClientContext carries everything your handler needs to scope data access, log actions, and enforce tenant isolation.
Enrich client context with custom data¶
Use the on_authenticate hook to add application-specific fields (plan, org, tenant) from your database:
async def enrich_client(ctx):
user = await db.get_user(ctx.client_id)
ctx.client.extra["org_id"] = user.org_id
ctx.client.extra["plan"] = user.plan
return ctx
middleware = AuthMiddleware(auth, on_authenticate=enrich_client)
server.add_middleware(middleware)
Step 4: Enable Tamper-Evident Audit Logging¶
Every tool call is recorded with the authenticated identity — who called what, when, with what arguments, and what happened:
from promptise.mcp.server import AuditMiddleware
audit = AuditMiddleware(secret="${AUDIT_HMAC_SECRET}")
server.add_middleware(audit)
Each audit entry includes client_id, tool_name, request_id, timestamp, duration, status, and optionally args and result. Every entry is HMAC-chained — each entry includes a hash of the previous one. If anyone tampers with the log (edit, delete, reorder), the chain breaks and the tampering is detectable.
Step 5: Per-Client Session State on the Server¶
MCP session state lets tools store per-client data that persists across tool calls within the same session:
from promptise.mcp.server import SessionState
@server.tool()
async def track_search(query: str, state: SessionState) -> str:
"""Track user searches within this session."""
history = state.get("search_history", [])
history.append(query)
state.set("search_history", history)
return f"Tracked. {len(history)} searches this session."
Each client session gets its own SessionState — isolated automatically. No cross-client leakage.
Step 6: Connect the Agent with User Identity¶
Now connect the agent to your authenticated MCP server. The user's JWT flows through:
from promptise import build_agent
from promptise.config import HTTPServerSpec
# Your backend receives the user's JWT from their request
user_jwt = request.headers["Authorization"].split(" ")[1]
agent = await build_agent(
model="openai:gpt-5-mini",
servers={
"customer_api": HTTPServerSpec(
url="http://localhost:8000/mcp",
bearer_token=user_jwt, # User's JWT flows to MCP server
),
},
instructions="You are a customer support agent.",
)
Every tool call the agent makes includes this JWT in the Authorization header. The MCP server validates it, extracts roles/scopes, and applies guards. If the user has the analyst role, they can call get_customer_data. If they don't have admin, delete_customer is blocked.
Semantic tool optimization — 30 tools without the token cost¶
With 30 tools, every LLM call sends ~3,000-5,000 tokens of tool descriptions. Most messages only need 3-5 tools. Semantic optimization analyzes the user's message, matches it against tool descriptions using a local embedding model, and sends only the relevant tools:
agent = await build_agent(
model="openai:gpt-5-mini",
servers={
"customer_ops": HTTPServerSpec(
url="http://customer-ops:8000/mcp",
bearer_token=user_jwt,
),
},
optimize_tools="semantic", # 40-70% fewer tokens per request
)
"What's the status of order #12345?" → the LLM sees get_order, track_shipment, list_orders — not the 27 other tools about tickets, analytics, and admin operations. "Show me the churn risk for customer 42" → the LLM sees churn_risk_score, customer_lifetime_value, get_customer — not billing or ticket tools.
The embedding model runs locally (no API calls, no data leaving your infrastructure). For custom models or air-gapped deployments:
from promptise.tools import ToolOptimizationConfig, OptimizationLevel
agent = await build_agent(
...,
optimize_tools=ToolOptimizationConfig(
level=OptimizationLevel.SEMANTIC,
max_tools=8, # Send at most 8 tools per request
embedding_model="/models/local/all-MiniLM-L6-v2", # Fully offline
),
)
Step 7: Carry User Identity Through the Agent¶
CallerContext carries the user's identity through every layer of the agent — guardrails, cache, conversation store, observability:
from promptise.agent import CallerContext
caller = CallerContext(
user_id="user-42",
bearer_token=user_jwt,
roles={"analyst", "support"},
scopes={"read", "write"},
metadata={"ip": request.remote_addr, "session_id": "sess_abc123"},
)
result = await agent.ainvoke(
{"messages": [{"role": "user", "content": "Show me customer 99's data"}]},
caller=caller,
)
Every downstream system sees this caller:
- Guardrails —
get_current_caller()accesses the caller for audit enrichment - Semantic cache —
_build_scope_key(caller)isolates cache per user - Observability — timeline events include the authenticated identity
- Conversation store — ownership enforcement uses
caller.user_id
Step 8: Add Conversation Persistence with Ownership¶
Users expect to resume conversations. The conversation store persists messages across sessions with ownership enforcement — User A cannot access User B's conversations:
from promptise import build_agent
from promptise.conversations import PostgresConversationStore
store = PostgresConversationStore(dsn="postgresql://localhost/myapp")
agent = await build_agent(
...,
conversation_store=store,
conversation_max_messages=100,
)
# Each user's conversations are isolated
response = await agent.chat(
"What's my order status?",
session_id="sess_abc123",
user_id="user-42",
)
# Different user trying to access this session → SessionAccessDenied
try:
await agent.chat("Hello", session_id="sess_abc123", user_id="user-99")
except SessionAccessDenied as e:
print(f"Blocked: user-99 tried to access user-42's session")
Four built-in stores:
| Store | Use case |
|---|---|
InMemoryConversationStore |
Testing, development |
SQLiteConversationStore |
Single-node local apps |
PostgresConversationStore |
Production, distributed |
RedisConversationStore |
Ephemeral, caching layer |
Session IDs are generated with generate_session_id() — cryptographically random, non-enumerable (secrets.token_hex(16)). Ownership is assigned on the first message and enforced on every subsequent access.
Session management¶
# List a user's sessions
sessions = await agent.list_sessions(user_id="user-42")
# Get session metadata (enforces ownership)
session = await agent.get_session("sess_abc123", user_id="user-42")
# Delete a session (enforces ownership)
await agent.delete_session("sess_abc123", user_id="user-42")
Step 9: Add Semantic Caching with User Isolation¶
Identical questions from the same user get instant responses from cache. Different users never see each other's cached responses:
from promptise.cache import SemanticCache
cache = SemanticCache(
scope="per_user", # Default — each user gets isolated cache
similarity_threshold=0.92,
default_ttl=3600,
)
agent = await build_agent(..., cache=cache)
# User 42 asks a question → LLM call → response cached under user:user-42
await agent.ainvoke(input, caller=CallerContext(user_id="user-42"))
# User 42 asks a similar question → cache hit (instant response, zero tokens)
await agent.ainvoke(similar_input, caller=CallerContext(user_id="user-42"))
# User 99 asks the same question → cache miss (different user scope)
await agent.ainvoke(same_input, caller=CallerContext(user_id="user-99"))
Three isolation levels:
| Scope | Isolation | Use case |
|---|---|---|
per_user (default) |
Each user has own cache | Any personalized agent |
per_session |
Each session has own cache | Context-heavy conversations |
shared |
Everyone shares cache | Public FAQ, documentation bots |
GDPR compliance: await cache.purge_user("user-42") removes all cached entries for that user.
Step 10: Add Guardrails for Input/Output Security¶
Protect every user's input from injection attacks and every output from PII/credential leakage:
from promptise import PromptiseSecurityScanner
scanner = PromptiseSecurityScanner.default()
scanner.warmup()
agent = await build_agent(
...,
guardrails=scanner,
)
- Input: Prompt injection attempts are blocked before reaching the LLM
- Output: Credit card numbers, SSNs, API keys, and 160+ sensitive patterns are redacted before reaching the user
- Cached responses: Output guardrails run on every cache serve — new rules catch old cached PII
Guardrails work on every invocation path: ainvoke(), astream(), and chat().
Step 11: Add Memory with Auto-Injection¶
Give the agent persistent memory that's automatically searched and injected before every invocation:
from promptise.memory import ChromaProvider
memory = ChromaProvider(
collection_name="support_memory",
persist_directory=".promptise/memory",
)
agent = await build_agent(
...,
memory=memory,
memory_auto_store=True, # Auto-persist exchanges
)
Memory is agent-level (shared across all users of this agent). For per-user memory, use the Mem0Provider with user_id scoping, or build a custom provider that filters by user.
Step 12: Full Observability with Identity Tracking¶
Every invocation is traced with the authenticated identity:
from promptise.observability_config import ObservabilityConfig, ObserveLevel, TransporterType
agent = await build_agent(
...,
observe=ObservabilityConfig(
level=ObserveLevel.STANDARD,
transporters=[
TransporterType.STRUCTURED_LOG, # ELK/Splunk/Datadog
TransporterType.PROMETHEUS, # Grafana dashboards
],
log_file="./logs/agent.jsonl",
correlation_id=request_id, # Tie to your HTTP request
),
)
Timeline events include tool calls, token counts, latencies, cache hits/misses, and guardrail results — all tagged with the correlation ID for end-to-end tracing.
Complete Example¶
A multi-user customer support system with every feature integrated:
import asyncio
from promptise import build_agent, PromptiseSecurityScanner
from promptise.agent import CallerContext
from promptise.config import HTTPServerSpec
from promptise.cache import SemanticCache
from promptise.conversations import PostgresConversationStore
from promptise.memory import ChromaProvider
# Shared across all requests (stateless, thread-safe)
scanner = PromptiseSecurityScanner.default()
scanner.warmup()
conversation_store = PostgresConversationStore(dsn="${DATABASE_URL}")
async def handle_user_message(user_jwt: str, user_id: str, session_id: str, message: str):
"""Handle a single user message with full multi-user isolation.
This is what your FastAPI/Django/Flask endpoint calls.
The user's JWT flows end-to-end: backend → agent → MCP servers.
"""
# Build agent with user's JWT for MCP auth
# The agent discovers all 30 tools automatically
agent = await build_agent(
model="openai:gpt-5-mini",
servers={
# All 5 routers from one server, or split across services
"customer_ops": HTTPServerSpec(
url="http://customer-ops:8000/mcp",
bearer_token=user_jwt, # User's JWT → MCP server validates → guards enforce
),
},
instructions=(
"You are a customer support agent for Acme Corp. "
"Help users with their accounts, orders, billing, and support tickets. "
"Be concise and professional. Never share data from other customers."
),
# 30 tools discovered, but only 5-8 sent per request (40-70% token savings)
optimize_tools="semantic",
# Each user gets isolated conversation history
conversation_store=conversation_store,
conversation_max_messages=50,
# Each user gets isolated cache (same question, different user = cache miss)
cache=SemanticCache(scope="per_user", default_ttl=1800),
# Shared memory across all users (agent learns from all interactions)
memory=ChromaProvider(persist_directory=".promptise/memory"),
# Block injection attacks on input, redact PII/credentials on output
guardrails=scanner,
observe=True,
)
try:
# CallerContext carries identity through every layer:
# guardrails, cache scoping, observability, conversation ownership
caller = CallerContext(
user_id=user_id,
bearer_token=user_jwt,
roles={"support_user"},
metadata={"session_id": session_id},
)
# chat() handles the full pipeline:
# 1. Ownership check (is this user_id allowed to access this session?)
# 2. Load conversation history from PostgreSQL
# 3. Input guardrails → memory search → cache check
# 4. Semantic tool selection (pick 5-8 of 30 tools)
# 5. LLM invocation with enriched context
# 6. MCP tool calls (JWT validated, guards enforced, audit logged)
# 7. Output guardrails (PII/credential redaction)
# 8. Cache store (under user:user-42 scope)
# 9. Persist messages to PostgreSQL with ownership
response = await agent.chat(
message,
session_id=session_id,
user_id=user_id,
caller=caller,
)
return response
finally:
await agent.shutdown()
What happens on each message:
- User's JWT is attached to MCP server connections
CallerContextis stored in async context for all downstream access- Input guardrails scan for injection attacks → block if detected
- Memory searches for relevant context → injected into system prompt
- Semantic cache checks for similar query from this user → instant return if hit
- Conversation history loaded from PostgreSQL → ownership verified
- Semantic tool selection picks 5-8 relevant tools from 30 → saves 40-70% tokens
- LLM invoked with enriched context (memory + history + selected tools)
- Agent calls MCP tools → JWT validated → guards enforced → audit logged
- Output guardrails scan response → PII/credentials redacted
- Response cached under
user:{user_id}scope - Messages persisted to PostgreSQL with ownership stamp
- Observability records everything with correlation ID
Security Architecture Summary¶
| Layer | Feature | What it protects |
|---|---|---|
| Transport | JWT/OAuth/API key auth | Identifies every caller |
| Authorization | Guards (role, scope, client ID) | Controls who can call which tools |
| Session isolation | SessionState per client | Prevents cross-client state leakage |
| Conversation ownership | _enforce_ownership() |
Prevents cross-user conversation access |
| Cache isolation | per_user scope key |
Prevents cross-user cache hits |
| Input protection | Prompt injection detection | Blocks LLM manipulation attempts |
| Output protection | PII/credential redaction | Prevents sensitive data leakage |
| Audit trail | HMAC-chained audit log | Tamper-evident record of every action |
| Secret management | Per-process SecretScope | Credentials isolated, TTL-expired, zero-filled |
What's Next¶
Reference documentation:
- Guardrails — all detector types, 165+ patterns, ML models, content safety
- Semantic Cache — backends, embedding providers, scope modes, GDPR purge
- Conversations — all 4 stores, ownership model, custom stores
- Memory — 3 providers, auto-injection, sanitization
MCP server reference:
- Auth & Security — JWT, OAuth, API keys, guards, audit logging
- Routers & Middleware — middleware chain, DI, session state
Other guides:
- Building AI Agents — the core agent that powers every process
- Building Production MCP Servers — the tool servers your agents connect to
- Building Agentic Runtime Systems — autonomous agents with triggers, governance, distributed coordination