# Building Agents
Create intelligent agents that connect to MCP servers, discover tools automatically, and execute tasks with full observability.
## Quick Example
```python
import asyncio

from promptise import build_agent
from promptise.config import HTTPServerSpec


async def main():
    agent = await build_agent(
        servers={
            "weather": HTTPServerSpec(url="http://localhost:8000/mcp"),
        },
        model="openai:gpt-5-mini",
    )
    result = await agent.ainvoke({
        "messages": [{"role": "user", "content": "What is the weather in Zurich?"}]
    })
    print(result["messages"][-1].content)
    await agent.shutdown()


asyncio.run(main())
```
## Concepts

Promptise agents are built around three ideas:

- **MCP-first tool discovery** -- You point the agent at one or more MCP servers via the `servers` dict. On startup it connects to every server, lists all available tools, and converts them into LangChain-compatible tools automatically.
- **Opt-in capabilities** -- Observability, memory, sandbox execution, cross-agent delegation, and prompt flows are all disabled by default. Enable each one with a single parameter and the agent wires everything together.
- **Unified return type** -- `build_agent()` always returns a `PromptiseAgent`. It wraps the underlying LangGraph ReAct agent and exposes a consistent interface regardless of which capabilities are active.
## Walkthrough

### The `build_agent()` Function

`build_agent()` is the primary entry point for creating agents. It is an async function that connects to MCP servers, discovers tools, and returns a ready-to-use agent.
```python
from promptise import build_agent
from promptise.config import StdioServerSpec, HTTPServerSpec

agent = await build_agent(
    # Required -----------------------------------------------
    servers={
        "files": StdioServerSpec(command="python", args=["-m", "file_server"]),
        "api": HTTPServerSpec(url="https://api.example.com/mcp"),
    },
    model="openai:gpt-5-mini",
    # Optional -----------------------------------------------
    instructions="You are a helpful data analyst.",
    trace_tools=True,          # print every tool call to stdout
    observe=True,              # enable observability
    memory=None,               # MemoryProvider instance
    memory_auto_store=False,   # auto-persist exchanges
    sandbox=True,              # sandboxed code execution
    observer=None,             # shared ObservabilityCollector
    observer_agent_id=None,    # agent id for shared observer
    cross_agents=None,         # peer agents for delegation
    extra_tools=[],            # additional BaseTool instances
    flow=None,                 # ConversationFlow for multi-turn prompts
)
```
### Parameter Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `servers` | `Mapping[str, ServerSpec]` | required | Named MCP server connections. See Server Configuration. |
| `model` | `str \| BaseChatModel \| Runnable` | required | LangChain model string (e.g. `"openai:gpt-5-mini"`), a chat model instance, or any Runnable. |
| `instructions` | `str \| Prompt \| PromptSuite \| None` | Built-in prompt | System prompt. Accepts a plain string, a `Prompt`, or a `PromptSuite`. |
| `trace_tools` | `bool` | `False` | Print each tool invocation and result to stdout. |
| `observe` | `bool \| ObservabilityConfig \| None` | `None` | Enable observability. Pass `True` for defaults or an `ObservabilityConfig` for full control. |
| `memory` | `MemoryProvider \| dict \| None` | `None` | Memory backend. Automatically searches and injects relevant context before each invocation. |
| `memory_auto_store` | `bool` | `False` | When `True`, automatically stores each user/assistant exchange in memory. |
| `sandbox` | `bool \| dict \| None` | `None` | Enable sandboxed code execution. `True` uses defaults; a dict provides custom config. |
| `observer` | `Any \| None` | `None` | Pass an existing `ObservabilityCollector` to reuse across multiple agents. Mutually exclusive with `observe`. |
| `observer_agent_id` | `str \| None` | `None` | Agent identifier for the shared observer's timeline entries. |
| `cross_agents` | `Mapping[str, CrossAgent] \| None` | `None` | Peer agents exposed as `ask_agent_<name>` tools. See Cross-Agent Delegation. |
| `extra_tools` | `list[BaseTool] \| None` | `None` | Additional LangChain tools appended alongside MCP-discovered tools. |
| `flow` | `ConversationFlow \| None` | `None` | A conversation flow that evolves the system prompt across turns. |
| `guardrails` | `PromptiseSecurityScanner \| None` | `None` | Security scanner for input/output. Blocks injection attacks, redacts PII and credentials. See Guardrails. |
| `optimize_tools` | `str \| ToolOptimizationConfig \| None` | `None` | Semantic tool selection to reduce token costs. Pass `"semantic"` for defaults. See Tool Optimization. |
| `conversation_store` | `ConversationStore \| None` | `None` | Persistent conversation history. See Conversations. |
| `conversation_max_messages` | `int` | `0` | Max messages to retain per session (`0` = unlimited). |
| `cache` | `SemanticCache \| None` | `None` | Semantic response cache. Serves similar queries from cache, reducing LLM costs by 30-50%. See Cache. |
| `approval` | `ApprovalPolicy \| None` | `None` | Human-in-the-loop approval for sensitive tools. See Approval. |
| `events` | `EventNotifier \| None` | `None` | Webhook/event notifications. Emits structured events on invocation, tool, guardrail, budget, and process events. See Events. |
| `max_invocation_time` | `float` | `0` | Maximum seconds per invocation. Raises `TimeoutError` and emits an `invocation.timeout` event when exceeded. `0` = unlimited. |
### The `PromptiseAgent` Class

`build_agent()` returns a `PromptiseAgent`. This is the unified agent object with opt-in capabilities -- disabled features no-op or return sensible defaults, so you never need to check what is active.
#### Invocation Methods

```python
# Async invocation (recommended)
result = await agent.ainvoke({
    "messages": [{"role": "user", "content": "Summarize sales.csv"}]
})

# Async streaming
async for chunk in agent.astream({
    "messages": [{"role": "user", "content": "Explain quantum computing"}]
}):
    print(chunk)

# Synchronous invocation (convenience wrapper)
result = agent.invoke({
    "messages": [{"role": "user", "content": "Hello"}]
})
```
> **Sync invocation and memory**
>
> `invoke()` delegates to `ainvoke()` internally when memory is enabled because memory search requires async I/O. If a running event loop is already active (e.g. inside Jupyter), memory injection is skipped for the sync path. Use `ainvoke()` in async contexts to ensure memory always works.
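The "running event loop" check that such a sync wrapper relies on can be sketched with the standard library alone. This is an illustration of the mechanism, not promptise's actual implementation:

```python
import asyncio


def in_running_loop() -> bool:
    """True when called from inside an active event loop (e.g. Jupyter).

    A sync wrapper can only asyncio.run() its async counterpart when this
    returns False -- which is why memory injection is skipped for the sync
    path inside an already-running loop.
    """
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False
```

At module level `in_running_loop()` is `False`; inside any coroutine driven by `asyncio.run()` it is `True`.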
#### Observability Methods

When `observe=True` is passed to `build_agent()`, the agent records every LLM turn, tool call, and token count.
```python
agent = await build_agent(
    servers=servers, model="openai:gpt-5-mini", observe=True
)
result = await agent.ainvoke({"messages": [{"role": "user", "content": "Hello"}]})

# Retrieve aggregate statistics
stats = agent.get_stats()
print(stats)

# Generate an interactive HTML report
report_path = agent.generate_report("report.html", title="My Agent Report")
```
`get_stats()` returns an empty dict when observability is disabled. `generate_report()` raises `RuntimeError` if observability is not enabled.
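Because `get_stats()` degrades to an empty dict rather than raising, downstream code can handle both cases with one branch. A hypothetical helper, using the stat keys documented under Programmatic Metrics:

```python
def summarize_stats(stats: dict) -> str:
    """One-line summary that tolerates the empty dict returned when
    observability is disabled."""
    if not stats:
        return "observability disabled"
    return (
        f"{stats.get('llm_call_count', 0)} LLM calls, "
        f"{stats.get('tool_call_count', 0)} tool calls, "
        f"{stats.get('total_tokens', 0)} tokens"
    )
```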
#### Programmatic Metrics

When `observe=True`, the agent internally creates a `PromptiseCallbackHandler` that captures all LLM and tool events. Use `get_stats()` to access aggregate metrics:
```python
agent = await build_agent(
    servers=servers, model="openai:gpt-5-mini", observe=True
)
result = await agent.ainvoke({"messages": [{"role": "user", "content": "Hello"}]})

stats = agent.get_stats()
stats["total_tokens"]             # Total tokens used (prompt + completion)
stats["total_prompt_tokens"]      # Total input tokens
stats["total_completion_tokens"]  # Total output tokens
stats["llm_call_count"]           # Number of LLM calls
stats["tool_call_count"]          # Number of tool calls
stats["error_count"]              # Number of errors
stats["retry_count"]              # Number of retries
```
See Observability for full details on the callback handler, transporter configuration, and HTML report generation.
#### Lifecycle: `shutdown()`

Always call `shutdown()` when the agent is no longer needed. It closes MCP connections, flushes observability transporters, and releases the memory provider.

`shutdown()` is always safe to call -- it no-ops for features that are not enabled.
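One way to guarantee the call even when the body raises is a small async context manager around any async factory. This is a sketch, not part of the promptise API:

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def agent_session(build):
    """Await `build()` (e.g. a build_agent(...) call), yield the agent,
    and always await shutdown() on exit -- even if the body raises."""
    agent = await build()
    try:
        yield agent
    finally:
        await agent.shutdown()


# Usage (sketch):
#   async with agent_session(lambda: build_agent(servers=servers, model=model)) as agent:
#       result = await agent.ainvoke({"messages": [...]})
```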
## Combining Capabilities
Capabilities compose naturally. Here is an agent with observability, memory, and cross-agent delegation all enabled at once:
```python
from promptise import build_agent
from promptise.config import HTTPServerSpec
from promptise.cross_agent import CrossAgent
from promptise.memory import InMemoryProvider

# Build a peer agent first
researcher = await build_agent(
    servers={"search": HTTPServerSpec(url="http://localhost:8001/mcp")},
    model="openai:gpt-5-mini",
)

# Build the main agent with all capabilities
agent = await build_agent(
    servers={"files": HTTPServerSpec(url="http://localhost:8002/mcp")},
    model="openai:gpt-5-mini",
    observe=True,
    memory=InMemoryProvider(),
    memory_auto_store=True,
    cross_agents={
        "researcher": CrossAgent(agent=researcher, description="Web research"),
    },
)

result = await agent.ainvoke({
    "messages": [{"role": "user", "content": "Research and summarize recent AI papers"}]
})

await agent.shutdown()
await researcher.shutdown()
```
## Security Guardrails

Protect your agent from prompt injection attacks on input and PII/credential leakage on output. The `guardrails` parameter accepts a `PromptiseSecurityScanner` — a composable scanner built from detection heads.
### Basic — block injections, redact PII and credentials

```python
from promptise import build_agent, PromptiseSecurityScanner

scanner = PromptiseSecurityScanner.default()
scanner.warmup()  # Pre-load models at startup, not on first message

agent = await build_agent(
    servers=servers, model="openai:gpt-5-mini", guardrails=scanner,
)
```
With this configuration:

- **Input:** Prompt injection attempts are blocked before reaching the LLM; `GuardrailViolation` is raised.
- **Output:** Credit card numbers, SSNs, API keys, and other sensitive data are replaced with labels like `[CREDIT_CARD_VISA]` and `[AWS_ACCESS_KEY]`.
### Custom — pick specific detectors

```python
from promptise import (
    PromptiseSecurityScanner,
    InjectionDetector, PIIDetector, CredentialDetector, CustomRule,
    PIICategory, CredentialCategory,
)

scanner = PromptiseSecurityScanner(
    detectors=[
        InjectionDetector(threshold=0.9),
        PIIDetector(categories={PIICategory.CREDIT_CARDS, PIICategory.SSN, PIICategory.EMAIL}),
        CredentialDetector(categories={CredentialCategory.AWS, CredentialCategory.OPENAI}),
    ],
    custom_rules=[
        CustomRule(name="internal_id", pattern=r"EMP-\d{6}", description="Employee ID"),
    ],
)

agent = await build_agent(
    servers=servers, model="openai:gpt-5-mini", guardrails=scanner,
)
```
### Handling blocked input

```python
from promptise.guardrails import GuardrailViolation

try:
    result = await agent.ainvoke({"messages": [{"role": "user", "content": user_input}]})
except GuardrailViolation as v:
    print(f"Blocked: {v.report.blocked[0].description}")
```
See Guardrails for the full reference — all 6 detector types, 165+ built-in patterns, model configuration, air-gapped deployments, and content safety classification.
## CallerContext — Per-Request Identity

Every `ainvoke()` and `chat()` call can carry a `CallerContext` that identifies the user making the request. This is the foundation for multi-user systems.
```python
from promptise.agent import CallerContext

caller = CallerContext(
    user_id="user-alice-001",      # Scopes memory, cache, conversations
    bearer_token="eyJhbGciOi...",  # Forwarded to MCP servers as Authorization header
    roles={"analyst", "viewer"},   # Agent-side role info
    scopes={"read", "write"},      # Agent-side scope info
    metadata={"team": "finance"},  # Custom metadata
)

result = await agent.ainvoke(input, caller=caller)
```
### What CallerContext Does

| Field | Agent side | MCP server side |
|---|---|---|
| `user_id` | Scopes memory search, conversation history, and semantic cache to this user | Not sent (stays on agent) |
| `bearer_token` | Forwarded as `Authorization: Bearer <token>` to every MCP server | Validated by `AuthMiddleware`, extracted into `ClientContext` with roles/scopes/claims |
| `roles` | Available for agent-side logic via `get_current_caller()` | Not sent — server extracts roles from the JWT |
| `scopes` | Available for agent-side logic | Not sent — server extracts scopes from the JWT |
| `metadata` | Custom data available to guardrails, hooks, events | Not sent |
### Identity Flow: Agent → MCP Server

```text
CallerContext(bearer_token="eyJ...")
  → MCPClient sets Authorization header
  → HTTP request to MCP server
  → AuthMiddleware validates JWT
  → Extracts roles, scopes, claims
  → Builds ClientContext
  → Guards check HasRole, HasScope
  → Handler receives ctx.client
```

Only the `bearer_token` crosses the wire. Everything else is extracted server-side from the JWT payload.
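To see what the server has to work with, you can decode a JWT's payload segment by hand. This is purely illustrative and not part of the promptise API; the real `AuthMiddleware` also verifies the signature before trusting any claims:

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT signature verification --
    for inspection only. Never skip verification in real code."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```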
### Per-User Isolation

When a `CallerContext` is passed, these features automatically isolate per user:

```python
agent = await build_agent(
    ...,
    memory=ChromaProvider(...),           # Memory scoped to caller.user_id
    conversation_store=SQLiteStore(...),  # Sessions owned by caller.user_id
    cache=SemanticCache(),                # Cache keyed by caller.user_id
    observe=True,                         # Traces tagged with caller.user_id
)

# Alice and Bob have completely separate state
await agent.chat("question", session_id="s1", caller=alice_ctx)
await agent.chat("question", session_id="s2", caller=bob_ctx)
```
### Accessing CallerContext in Custom Code

```python
from promptise.agent import get_current_caller

# Inside guardrails, hooks, or custom tools:
caller = get_current_caller()
if caller:
    print(caller.user_id)  # "user-alice-001"
    print(caller.roles)    # {"analyst", "viewer"}
```
See CallerContext: Agent to MCP Identity for the complete end-to-end guide with JWT structure, guard reference, and server-side handler examples.
## From Agent to Production: Governance & Runtime

`build_agent()` creates a request-response agent — you call it, it replies. That's perfect for chatbots and synchronous workflows. For autonomous agents that run continuously, respond to triggers, and need safety rails, you wrap the agent in an `AgentProcess` managed by the `AgentRuntime`.

The runtime adds three capability layers on top of any agent you build with `build_agent()`:
### Triggers — how the agent wakes up

Instead of calling `ainvoke()` manually, you configure triggers that fire automatically:
- Cron — run on a schedule (every 5 minutes, hourly, daily at midnight)
- Webhook — listen for HTTP POSTs from monitoring systems, CI/CD pipelines, or APIs
- File Watch — react to filesystem changes
- Event — subscribe to an in-process EventBus shared with other agents
- Message — receive topic-based messages from a MessageBroker
Multiple triggers can fire on the same agent. See Triggers.
### Governance — guardrails against runaway behavior

Four opt-in subsystems, each disabled by default with zero overhead:
| Subsystem | What it prevents | Docs |
|---|---|---|
| Autonomy Budget | Tool-call runaway loops, unexpected costs, unbounded irreversible actions | Autonomy Budget |
| Behavioral Health | Stuck agents, repeating loops, empty responses, elevated error rates | Behavioral Health |
| Mission Model | Long-horizon trajectory drift from the agent's objective | Mission Model |
| Secret Scoping | Credential leakage, secrets in logs, keys stored as plaintext in memory | Secret Scoping |
```python
from promptise.runtime import (
    AgentRuntime, ProcessConfig, TriggerConfig,
    BudgetConfig, HealthConfig, MissionConfig,
)

config = ProcessConfig(
    model="openai:gpt-5-mini",
    instructions="Monitor the data pipeline.",
    servers={"tools": {"type": "http", "url": "http://localhost:8000/mcp"}},
    triggers=[
        TriggerConfig(type="webhook", webhook_path="/alerts", webhook_port=9090),
    ],
    # Budget — prevent runaway tool calls
    budget=BudgetConfig(
        enabled=True,
        max_tool_calls_per_run=20,
        max_cost_per_day=50.0,
        on_exceeded="pause",
    ),
    # Health — catch stuck agents
    health=HealthConfig(
        enabled=True,
        stuck_threshold=3,
        on_anomaly="escalate",
    ),
    # Mission — evaluate progress every 10 invocations
    mission=MissionConfig(
        enabled=True,
        objective="Keep all services above 99.9% uptime.",
        success_criteria="No unresolved P1 incidents for >15 minutes.",
        eval_every=10,
    ),
)

async with AgentRuntime() as runtime:
    await runtime.add_process("pipeline-observer", config)
    await runtime.start_all()
    # Process runs autonomously until stopped.
```
### Persistence — journal, crash recovery, audit trail

The `FileJournal` records every state transition, trigger event, invocation, and tool call to an append-only JSONL file on disk. On crash, the `ReplayEngine` reconstructs the last known state so the agent can pick up where it left off.
The journal is also your compliance audit trail. Every decision the agent made is timestamped, structured, and greppable.
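Because the journal is plain JSONL, ad-hoc auditing needs nothing beyond the standard library. A sketch -- the entry field names here (`type`, `tool`) are assumptions for illustration, not the documented journal schema:

```python
import json


def read_journal(path):
    """Yield parsed entries from an append-only JSONL journal file,
    skipping blank lines."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def tool_call_counts(path):
    """Count tool-call entries per tool name. The 'type' and 'tool'
    keys are assumed field names, not the documented schema."""
    counts = {}
    for entry in read_journal(path):
        if entry.get("type") == "tool_call":
            tool = entry.get("tool", "<unknown>")
            counts[tool] = counts.get(tool, 0) + 1
    return counts
```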
### When to use the runtime
| If you're building... | Use |
|---|---|
| A chatbot or Q&A interface | build_agent() alone + chat() with conversation_store |
| A one-shot automation script | build_agent() alone + ainvoke() |
| A background worker that wakes on a schedule | AgentProcess with cron trigger |
| An incident responder that reacts to webhooks | AgentProcess with webhook trigger |
| A file processor that reacts to new files | AgentProcess with file_watch trigger |
| A multi-agent pipeline | AgentRuntime with multiple processes + event triggers |
| Any agent that spends real money autonomously | Always add BudgetConfig with on_exceeded="pause" |
See the full Agent Runtime docs for architecture, state machine, and deployment patterns.
## API Summary

| Symbol | Import | Description |
|---|---|---|
| `build_agent()` | `from promptise import build_agent` | Async factory that discovers tools and returns a `PromptiseAgent`. |
| `PromptiseAgent` | `from promptise.agent import PromptiseAgent` | Unified agent with `ainvoke()`, `astream()`, `invoke()`, `shutdown()`, `get_stats()`, `generate_report()`. |
| `ServerSpec` | `from promptise.config import ServerSpec` | Union type: `StdioServerSpec \| HTTPServerSpec`. |
| `CrossAgent` | `from promptise.cross_agent import CrossAgent` | Peer agent wrapper with `agent` and `description` fields. |
| `CallerContext` | `from promptise.agent import CallerContext` | Per-request identity: `user_id`, `bearer_token`, `roles`, `scopes`, `metadata`. |
| `get_current_caller()` | `from promptise.agent import get_current_caller` | Get the `CallerContext` for the current invocation (from a contextvar). |
| `PromptiseSecurityScanner` | `from promptise import PromptiseSecurityScanner` | Composable security scanner for input/output guardrails. |
> **Model strings**
>
> Promptise uses LangChain's `init_chat_model()` under the hood, so any provider string it accepts works here: `"openai:gpt-5-mini"`, `"anthropic:claude-sonnet-4-20250514"`, `"ollama:llama3"`, etc.

> **Empty server dict**
>
> Passing an empty `servers={}` is valid. The agent runs without MCP tools -- useful when you only need `extra_tools` or `cross_agents`.

> **Always await shutdown**
>
> Forgetting `await agent.shutdown()` can leak open connections to MCP servers. Use a try/finally block or wrap the agent lifecycle in an async context manager pattern.
## What's Next?

**Core agent capabilities**

- Server Configuration -- `StdioServerSpec` and `HTTPServerSpec` in detail.
- SuperAgent Files -- define agents declaratively in YAML.
- Cross-Agent Delegation -- multi-agent collaboration.
- Guardrails -- injection blocking, PII redaction, content safety, custom rules.
- Memory -- persistent memory with vector search.
- Conversations -- multi-user session persistence.
- Tool Optimization -- semantic tool selection for token savings.
- Observability -- track token usage and agent behavior.

**Autonomous agents (Runtime)**

- Agent Runtime -- architecture and concepts.
- Agent Processes -- `ProcessConfig` reference, lifecycle methods.
- Triggers -- cron, webhook, event, file watch, message.
- Autonomy Budget -- limit tool calls, cost, irreversible actions.
- Behavioral Health -- detect stuck loops and anomalies.
- Mission Model -- mission-oriented execution.
- Journal and Recovery -- crash recovery via replay.