Building AI Agents

Build a production-ready AI agent in 10 incremental steps. Each step adds one capability -- from a bare agent that calls tools, to a fully observable agent with memory, sandboxed code execution, and cross-agent delegation.

This is the recommended starting point for building agents

This guide walks you through building a complete agent step by step. For deep reference on individual features, see the Building Agents, Memory, Sandbox, and Observability pages.

What You'll Build

An AI agent that connects to MCP servers, discovers tools automatically, remembers context across conversations, executes code safely in a sandbox, observes every action for debugging and compliance, and delegates work to peer agents. One function call creates it. Every capability is opt-in.

Concepts

Promptise agents are built around three ideas:

  1. MCP-first tool discovery -- You point the agent at one or more MCP servers. On startup it connects to every server, lists all available tools, and converts them into typed tools automatically. No manual wiring.
  2. Opt-in capabilities -- Observability, memory, sandbox execution, cross-agent delegation, and prompt flows are all disabled by default. Enable each one with a single parameter.
  3. Model independence -- The model can be any LLM model string ("openai:gpt-5-mini", "anthropic:claude-sonnet-4.5", "ollama:llama3"), any LangChain BaseChatModel, or any Runnable. Change one string, nothing else moves.

Step 1: Minimal Agent

Start with the simplest possible agent -- a model connected to an MCP server:

import asyncio
from promptise import build_agent
from promptise.config import HTTPServerSpec

async def main():
    agent = await build_agent(
        servers={
            "weather": HTTPServerSpec(url="http://localhost:8000/mcp"),
        },
        model="openai:gpt-5-mini",
    )

    result = await agent.ainvoke({
        "messages": [{"role": "user", "content": "What is the weather in Zurich?"}]
    })
    print(result["messages"][-1].content)
    await agent.shutdown()

asyncio.run(main())

build_agent() connects to the MCP server, discovers all available tools, converts their schemas, and returns a ready-to-use PromptiseAgent. The agent decides which tools to call based on the user's message.


Step 2: Multiple Servers and Instructions

Connect to multiple MCP servers and provide a system prompt:

from promptise.config import HTTPServerSpec, StdioServerSpec

agent = await build_agent(
    servers={
        "weather": HTTPServerSpec(url="http://localhost:8000/mcp"),
        "files": StdioServerSpec(command="python", args=["-m", "file_server"]),
        "database": HTTPServerSpec(
            url="https://db-api.internal/mcp",
            bearer_token="your-jwt-token",
        ),
    },
    model="openai:gpt-5-mini",
    instructions=(
        "You are a data analyst. Use the weather API for forecasts, "
        "the file server for reading reports, and the database for queries. "
        "Always cite your data sources."
    ),
)

The agent discovers tools from all three servers and has them available simultaneously. Tool names are automatically namespaced to avoid conflicts.
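
As a rough illustration of why namespacing matters, the sketch below merges tool lists from two servers that both expose a tool called `search`. The `namespace_tools` helper and the `<server>_<tool>` naming scheme are assumptions made for this example; the names Promptise actually generates may differ.

```python
def namespace_tools(tools_by_server: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Map each namespaced tool name to its (server, original_name) pair."""
    registry: dict[str, tuple[str, str]] = {}
    for server, tools in tools_by_server.items():
        for tool in tools:
            # Hypothetical scheme: prefix the tool with its server name.
            registry[f"{server}_{tool}"] = (server, tool)
    return registry


# Both servers expose "search"; the prefix keeps the two apart.
discovered = {
    "weather": ["search", "get_forecast"],
    "files": ["search", "read"],
}
registry = namespace_tools(discovered)
print(sorted(registry))
```

Without the prefix, the second `search` would silently overwrite the first in any flat registry.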

Switch models with one line:

# OpenAI
agent = await build_agent(servers=servers, model="openai:gpt-5-mini")

# Anthropic
agent = await build_agent(servers=servers, model="anthropic:claude-sonnet-4.5")

# Google
agent = await build_agent(servers=servers, model="google_genai:gemini-2.0-flash")

# Local Ollama
agent = await build_agent(servers=servers, model="ollama:llama3")

# Any LangChain model instance
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-5-mini", temperature=0)
agent = await build_agent(servers=servers, model=model)

Step 3: Memory

Give your agent persistent memory. Before every invocation, the agent automatically searches for relevant context and injects it into the system prompt.

from promptise.memory import ChromaProvider

# ChromaDB for local persistent vector search
memory = ChromaProvider(
    collection_name="analyst_memory",
    persist_directory=".promptise/chroma",
)

agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    memory=memory,
    memory_auto_store=True,  # Automatically store each exchange
)

# First conversation
result = await agent.ainvoke({
    "messages": [{"role": "user", "content": "The Q3 report shows 15% revenue growth."}]
})

# Later conversation -- agent remembers Q3 data
result = await agent.ainvoke({
    "messages": [{"role": "user", "content": "How did Q3 compare to projections?"}]
})
# Agent automatically retrieves the Q3 context from memory

Three providers ship with the framework:

| Provider | Use case | Persistence |
|---|---|---|
| InMemoryProvider | Testing and prototyping | In-process only |
| ChromaProvider | Local vector search | Disk (configurable) |
| Mem0Provider | Enterprise graph search | External service |

# In-memory (testing)
from promptise.memory import InMemoryProvider
memory = InMemoryProvider()

# ChromaDB (local, persistent)
from promptise.memory import ChromaProvider
memory = ChromaProvider(collection_name="my_agent", persist_directory="./data")

# Mem0 (enterprise)
from promptise.memory import Mem0Provider
memory = Mem0Provider(api_key="...", user_id="analyst-1")
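
To make the "search, then inject" step concrete, here is a toy sketch that does not use Promptise. The `score` and `inject_memories` helpers are hypothetical, and simple word overlap stands in for the vector search a real provider would perform.

```python
def score(query: str, text: str) -> int:
    """Toy relevance: number of words shared by the query and a stored memory."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def inject_memories(system_prompt: str, query: str, memories: list[str], top_k: int = 2) -> str:
    """Append the most relevant memories to the system prompt."""
    ranked = sorted(memories, key=lambda m: score(query, m), reverse=True)
    relevant = [m for m in ranked[:top_k] if score(query, m) > 0]
    if not relevant:
        return system_prompt  # nothing relevant, leave the prompt untouched
    context = "\n".join(f"- {m}" for m in relevant)
    return f"{system_prompt}\n\nRelevant context from memory:\n{context}"


memories = [
    "The Q3 report shows 15% revenue growth.",
    "The user prefers metric units.",
]
prompt = inject_memories(
    "You are a data analyst.",
    "How did Q3 compare to projections?",
    memories,
)
print(prompt)
```

Only the Q3 memory shares a word with the question, so it is the only one injected.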

Step 4: Observability

Enable full observability with a single flag. Every LLM turn, tool call, token count, latency, retry, and error is captured automatically.

agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    observe=True,  # That's it
)

result = await agent.ainvoke({
    "messages": [{"role": "user", "content": "Analyze the sales data"}]
})

# Get aggregate statistics
stats = agent.get_stats()
print(f"Total tokens: {stats['total_tokens']}")
print(f"Tool calls: {stats['tool_calls']}")
print(f"Duration: {stats['total_duration_ms']}ms")

# Generate an interactive HTML report
agent.generate_report("report.html", title="Sales Analysis")

For full control, pass an ObservabilityConfig:

from promptise.observability_config import ObservabilityConfig, ObserveLevel, TransporterType

agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    observe=ObservabilityConfig(
        level=ObserveLevel.FULL,           # OFF, BASIC, STANDARD, FULL
        session_name="production-audit",
        record_prompts=True,
        transporters=[
            TransporterType.HTML,           # Interactive HTML report
            TransporterType.STRUCTURED_LOG, # JSONL file
            TransporterType.CONSOLE,        # Live terminal output
            TransporterType.PROMETHEUS,     # Prometheus /metrics
            TransporterType.OTEL,           # OpenTelemetry spans
            TransporterType.WEBHOOK,        # HTTP POST on events
        ],
        output_dir="./reports",
        log_file="./logs/agent.jsonl",
    ),
)

Eight transporters available:

| Transporter | Output | Use case |
|---|---|---|
| HTML | Interactive HTML report | Post-analysis, sharing |
| JSON | JSON file | Programmatic analysis |
| STRUCTURED_LOG | JSONL file | Log aggregation (ELK, Datadog) |
| CONSOLE | Live terminal output | Development debugging |
| PROMETHEUS | Prometheus metrics | Infrastructure monitoring |
| OTEL | OpenTelemetry spans | Distributed tracing |
| WEBHOOK | HTTP POST | Real-time notifications |
| CALLBACK | Python callable | Custom processing |
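
Conceptually, each transporter is a sink that receives the same stream of events. The sketch below shows that fan-out idea with a JSONL-style sink and a callback that keeps running totals. The `Recorder` class and the event fields are assumptions for illustration, not the framework's internal types.

```python
import json
from typing import Callable

Event = dict  # hypothetical event shape, e.g. {"type": "tool_call", "tokens": 12}


class Recorder:
    """Fan each event out to every registered transporter."""

    def __init__(self, transporters: list[Callable[[Event], None]]):
        self.transporters = transporters

    def emit(self, event: Event) -> None:
        for send in self.transporters:
            send(event)


jsonl_lines: list[str] = []
totals = {"tokens": 0, "tool_calls": 0}


def jsonl_transporter(event: Event) -> None:
    # One JSON object per line, as in a JSONL log file.
    jsonl_lines.append(json.dumps(event, sort_keys=True))


def stats_callback(event: Event) -> None:
    # Custom processing: keep running totals across events.
    totals["tokens"] += event.get("tokens", 0)
    if event["type"] == "tool_call":
        totals["tool_calls"] += 1


recorder = Recorder([jsonl_transporter, stats_callback])
recorder.emit({"type": "llm_turn", "tokens": 120})
recorder.emit({"type": "tool_call", "name": "query_sales", "tokens": 0})
recorder.emit({"type": "llm_turn", "tokens": 80})
print(totals)
```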

Step 5: Sandboxed Code Execution

When agents write and execute code, run it inside a multi-layer security sandbox:

# Enable with defaults
agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    sandbox=True,
)

# Or configure resource limits and network mode
agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    sandbox={
        "network_mode": "restricted",  # "none", "restricted", "full"
        "memory_limit": "512M",
        "cpu_limit": 2,
        "timeout": 120,
    },
)

When sandbox is enabled, 5 tools are automatically injected into the agent:

| Tool | Description |
|---|---|
| execute_code | Run Python code in the sandbox |
| read_file | Read a file from the sandbox workspace |
| write_file | Write a file to the sandbox workspace |
| list_files | List files in the sandbox workspace |
| install_package | Install a pip package in the sandbox |

Security layers applied automatically:

  • Docker isolation -- code runs in a container, not on your host
  • Seccomp filtering -- blocks dangerous syscalls
  • Capability dropping -- removes ~40 Linux capabilities
  • Read-only rootfs -- only the workspace directory is writable
  • Resource limits -- CPU, memory, and time constraints
  • Network isolation -- configurable per agent (none/restricted/full)
  • Optional gVisor -- userspace kernel for additional isolation
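
For a sense of what several of these layers look like at the container level, the sketch below builds a `docker run` argument list from sandbox-style options. The flags are standard Docker CLI flags, but the `docker_args` helper, the volume name, and the mapping from options to flags are assumptions; the sandbox's real invocation is not shown in this guide.

```python
def docker_args(
    image: str,
    memory_limit: str = "512M",
    cpu_limit: int = 2,
    network_mode: str = "none",
    use_gvisor: bool = False,
) -> list[str]:
    """Build a hardened `docker run` command line (illustrative only)."""
    args = [
        "docker", "run", "--rm",
        "--read-only",                               # read-only root filesystem
        "--cap-drop", "ALL",                         # drop all Linux capabilities
        "--security-opt", "no-new-privileges",       # block privilege escalation
        "--memory", memory_limit.lower(),            # memory ceiling, e.g. "512m"
        "--cpus", str(cpu_limit),                    # CPU quota
        "--volume", "sandbox-workspace:/workspace",  # the only writable path (hypothetical name)
    ]
    # Docker applies its default seccomp profile unless one is overridden.
    if network_mode == "none":
        args += ["--network", "none"]
    elif network_mode != "full":
        # "restricted" would need an egress policy, which is outside this sketch.
        raise NotImplementedError(f"network mode {network_mode!r} not covered here")
    if use_gvisor:
        args += ["--runtime", "runsc"]               # gVisor's OCI runtime
    args.append(image)
    return args


print(" ".join(docker_args("python:3.12-slim")))
```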

Step 6: Cross-Agent Delegation

Let an agent query peer agents over HTTP, authenticated with JWT:

from promptise.cross_agent import CrossAgent

agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    cross_agents={
        "researcher": CrossAgent(
            url="http://research-agent:8001",
            jwt_secret="shared-secret",
            description="Expert at finding and summarizing research papers.",
        ),
        "coder": CrossAgent(
            url="http://code-agent:8002",
            jwt_secret="shared-secret",
            description="Expert at writing and reviewing code.",
        ),
    },
)

This injects ask_agent_researcher and ask_agent_coder tools. The agent decides when to delegate:

User: "Find recent papers on transformer architectures and write a summary script"

Agent thinks: "I need the researcher for papers and the coder for the script"
→ calls ask_agent_researcher("Find recent papers on transformer architectures")
→ gets research results
→ calls ask_agent_coder("Write a Python script that summarizes these papers: ...")
→ combines results and responds
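
A shared `jwt_secret` implies symmetric signing: both sides hold the same key, so either can sign and verify. The sketch below shows HS256 signing and verification using only the standard library. The `sign_jwt` and `verify_jwt` helpers and the claim names are assumptions for illustration; the framework's actual token format is not documented here.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def sign_jwt(claims: dict, secret: str) -> str:
    """Create an HS256-signed token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_jwt(token: str, secret: str) -> dict:
    """Check the signature and expiry, then return the claims."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(signing_input, secret)):
        raise ValueError("invalid signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return claims


token = sign_jwt({"sub": "analyst", "exp": time.time() + 60}, "shared-secret")
print(verify_jwt(token, "shared-secret")["sub"])
```

In production, load the secret from the environment rather than hard-coding it, and prefer a maintained JWT library over hand-rolled code.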

Broadcast to multiple peers:

# From code, send a question to all peers simultaneously
results = await agent.broadcast(
    "What is the current status of your subsystem?",
    timeout=30.0,
)
# Returns dict of agent_name → response, with graceful degradation on timeout
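
Graceful degradation on timeout can be pictured as a per-peer timeout rather than one global deadline. The sketch below is a standalone illustration with `asyncio`. The `broadcast` function here and its `None` placeholder for late peers are assumptions, not the agent's actual method.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Peer = Callable[[str], Awaitable[str]]


async def broadcast(peers: dict[str, Peer], question: str, timeout: float) -> dict[str, Optional[str]]:
    """Ask every peer concurrently; a slow peer yields None instead of failing the call."""

    async def ask_one(name: str, ask: Peer) -> tuple[str, Optional[str]]:
        try:
            return name, await asyncio.wait_for(ask(question), timeout)
        except asyncio.TimeoutError:
            return name, None  # degrade gracefully for this peer only

    pairs = await asyncio.gather(*(ask_one(name, ask) for name, ask in peers.items()))
    return dict(pairs)


async def fast_peer(question: str) -> str:
    return "all systems nominal"


async def slow_peer(question: str) -> str:
    await asyncio.sleep(1.0)
    return "too late"


results = asyncio.run(
    broadcast({"researcher": fast_peer, "coder": slow_peer}, "Status?", timeout=0.05)
)
print(results)
```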

Step 7: SuperAgent Files

Define an entire agent declaratively in a .superagent YAML file:

# analyst.superagent
name: data-analyst
model: openai:gpt-5-mini
instructions: |
  You are a senior data analyst. Use available tools to query databases,
  generate visualizations, and produce reports. Always cite data sources.

servers:
  database:
    type: http
    url: http://localhost:8080/mcp
    bearer_token: "${DB_TOKEN}"
  files:
    type: http
    url: http://localhost:8081/mcp

memory:
  provider: chroma
  collection: analyst_memory
  persist_directory: .promptise/chroma
  auto_store: true

observability:
  level: standard
  transporters: [html, structured_log]
  output_dir: ./reports

sandbox:
  enabled: true
  network_mode: restricted
  memory_limit: 1G

cross_agents:
  researcher:
    url: http://research-agent:8001
    jwt_secret: "${CROSS_AGENT_SECRET}"
    description: Expert at finding research papers.

Load and run:

from promptise.superagent import load_superagent

agent = await load_superagent("analyst.superagent")
result = await agent.ainvoke({
    "messages": [{"role": "user", "content": "Analyze Q3 revenue trends"}]
})

Environment variables resolve automatically with ${VAR} and ${VAR:-default} syntax.
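
The two substitution forms behave like their shell counterparts. A minimal resolver might look like the sketch below. The `resolve_env` helper is hypothetical and is not the loader's actual code; it treats an empty value like an unset one for the `:-` form, as POSIX shells do.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${VAR} and ${VAR:-default}
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} and ${VAR:-default} placeholders in text."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:
            return value
        if default is not None:
            return default  # unset or empty, and a default was supplied
        if value is None:
            raise KeyError(f"environment variable {name} is not set")
        return value  # set but empty, no default given

    return _VAR.sub(replace, text)


print(resolve_env("url: http://localhost:${PORT:-8080}/mcp", env={}))
```

Failing loudly on a missing variable without a default keeps a misconfigured secret from turning into an empty token.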


Step 8: Conversation Flows

Evolve the system prompt across conversation phases for sophisticated multi-turn agents:

from promptise.prompts import ConversationFlow, Phase

flow = ConversationFlow(
    phases={
        "greeting": Phase(
            blocks=["identity", "greeting_instructions"],
            transitions={"investigation": lambda ctx: ctx.turn > 1},
        ),
        "investigation": Phase(
            blocks=["identity", "investigation_rules", "tool_context"],
            transitions={"resolution": lambda ctx: ctx.state.get("has_diagnosis")},
        ),
        "resolution": Phase(
            blocks=["identity", "resolution_rules", "output_format"],
        ),
    },
    initial_phase="greeting",
)

agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    flow=flow,
)

The system prompt the agent sees on turn 1 is different from turn 5 and turn 10. Blocks appear and disappear as the conversation progresses. See the Prompt Engineering guide for the full prompt system.
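
The phase mechanics can be approximated as a small state machine: after each turn, the current phase's transition predicates are checked in order and the first one that fires selects the next phase. The `Context`, `SimplePhase`, and `next_phase` names below are stand-ins written for this sketch, not the library's classes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    turn: int = 0
    state: dict = field(default_factory=dict)


@dataclass
class SimplePhase:
    blocks: list[str]
    transitions: dict[str, Callable[[Context], bool]] = field(default_factory=dict)


def next_phase(current: str, phases: dict[str, SimplePhase], ctx: Context) -> str:
    """Return the first phase whose predicate fires, else stay in the current one."""
    for target, predicate in phases[current].transitions.items():
        if predicate(ctx):
            return target
    return current


phases = {
    "greeting": SimplePhase(
        blocks=["identity", "greeting_instructions"],
        transitions={"investigation": lambda ctx: ctx.turn > 1},
    ),
    "investigation": SimplePhase(
        blocks=["identity", "investigation_rules", "tool_context"],
        transitions={"resolution": lambda ctx: bool(ctx.state.get("has_diagnosis"))},
    ),
    "resolution": SimplePhase(blocks=["identity", "resolution_rules", "output_format"]),
}

phase = "greeting"
ctx = Context(turn=1)
phase = next_phase(phase, phases, ctx)   # still "greeting" on turn 1
ctx.turn = 2
phase = next_phase(phase, phases, ctx)   # moves to "investigation"
print(phase, phases[phase].blocks)
```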


Step 9: Conversation Persistence

Every chat application needs to persist conversations. The conversation store handles loading history, saving new exchanges, and managing sessions, all enabled with one parameter:

from promptise import build_agent
from promptise.config import HTTPServerSpec
from promptise.conversations import SQLiteConversationStore, generate_session_id

# Pick a backend: SQLite for dev, Postgres for production, Redis for ephemeral
store = SQLiteConversationStore("conversations.db")

agent = await build_agent(
    model="openai:gpt-5-mini",
    servers={
        "tasks": HTTPServerSpec(url="http://localhost:8080/mcp"),
    },
    conversation_store=store,
    conversation_max_messages=200,  # Rolling window (0 = unlimited)
)

# Use secure session IDs — never user-controlled or predictable
sid = generate_session_id()  # "sess_a1b2c3d4e5f6..."

# chat() handles everything: ownership check → load history → invoke → persist
response = await agent.chat(
    "Create a task to review the PR",
    session_id=sid,
    user_id="user-42",  # Locks this session to user-42
)
response = await agent.chat(
    "What task did I just create?",
    session_id=sid,
    user_id="user-42",  # Same user — allowed
)

# Session management (all operations verify ownership)
sessions = await agent.list_sessions(user_id="user-42")
await agent.update_session(sid, calling_user_id="user-42", title="PR Review")
await agent.delete_session(sid, user_id="user-42")

await agent.shutdown()  # Closes store connections

Four built-in stores, or implement the ConversationStore protocol for any backend:

| Store | Backend | Use case |
|---|---|---|
| InMemoryConversationStore | Dict | Testing |
| SQLiteConversationStore | aiosqlite | Local dev |
| PostgresConversationStore | asyncpg | Production |
| RedisConversationStore | redis.asyncio | Ephemeral sessions |

Conversation persistence works alongside memory (Step 3). Memory provides semantic search across all sessions ("what do I know about this user?"). The conversation store provides exact message replay within a session ("what did they say 3 messages ago?"). See Conversation Persistence for the full reference.
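
Three ideas from this step can be sketched in a few lines: unpredictable session IDs, a session locked to its first user, and a rolling message window. The `ToySessionStore` class and `new_session_id` helper below are illustrative assumptions and do not implement the ConversationStore protocol.

```python
import secrets


def new_session_id() -> str:
    """Unpredictable ID from a CSPRNG (the "sess_" prefix is an assumption)."""
    return f"sess_{secrets.token_hex(16)}"


class ToySessionStore:
    """In-memory store with ownership checks and a rolling window."""

    def __init__(self, max_messages: int = 200):
        self.max_messages = max_messages  # 0 means unlimited
        self._owners: dict[str, str] = {}
        self._messages: dict[str, list[str]] = {}

    def append(self, session_id: str, user_id: str, message: str) -> None:
        # The first writer becomes the owner; everyone else is rejected.
        owner = self._owners.setdefault(session_id, user_id)
        if owner != user_id:
            raise PermissionError("session belongs to another user")
        history = self._messages.setdefault(session_id, [])
        history.append(message)
        if self.max_messages and len(history) > self.max_messages:
            del history[: -self.max_messages]  # keep only the newest messages

    def history(self, session_id: str, user_id: str) -> list[str]:
        if self._owners.get(session_id) != user_id:
            raise PermissionError("session belongs to another user")
        return list(self._messages[session_id])


store = ToySessionStore(max_messages=2)
sid = new_session_id()
for text in ["first", "second", "third"]:
    store.append(sid, "user-42", text)
print(store.history(sid, "user-42"))
```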


Complete Example

A fully-featured agent with MCP tools, memory, observability, sandbox, and cross-agent delegation:

import asyncio
from promptise import build_agent
from promptise.config import HTTPServerSpec
from promptise.memory import ChromaProvider
from promptise.cross_agent import CrossAgent
from promptise.observability_config import ObservabilityConfig, ObserveLevel, TransporterType

async def main():
    # Memory provider
    memory = ChromaProvider(
        collection_name="analyst",
        persist_directory=".promptise/chroma",
    )

    # Build the agent
    agent = await build_agent(
        servers={
            "database": HTTPServerSpec(
                url="http://localhost:8080/mcp",
                bearer_token="your-jwt-token",
            ),
            "files": HTTPServerSpec(url="http://localhost:8081/mcp"),
        },
        model="openai:gpt-5-mini",
        instructions=(
            "You are a senior data analyst. Query databases, analyze data, "
            "write scripts when needed, and produce clear reports. "
            "Always cite your data sources."
        ),
        memory=memory,
        memory_auto_store=True,
        observe=ObservabilityConfig(
            level=ObserveLevel.STANDARD,
            transporters=[TransporterType.HTML, TransporterType.STRUCTURED_LOG],
            output_dir="./reports",
        ),
        sandbox={"network_mode": "restricted", "memory_limit": "1G"},
        cross_agents={
            "researcher": CrossAgent(
                url="http://research-agent:8001",
                jwt_secret="shared-secret",
                description="Expert at finding research papers.",
            ),
        },
    )

    # Run a conversation
    result = await agent.ainvoke({
        "messages": [{"role": "user", "content": "Analyze Q3 revenue by region"}]
    })
    print(result["messages"][-1].content)

    # Check stats
    stats = agent.get_stats()
    print(f"\nTokens used: {stats['total_tokens']}")
    print(f"Tool calls: {stats['tool_calls']}")

    # Generate report
    agent.generate_report("analysis_report.html")

    await agent.shutdown()

asyncio.run(main())

CLI

The Promptise CLI provides quick access to common agent operations:

# Run a SuperAgent file
promptise agent analyst.superagent

# Validate a SuperAgent file
promptise validate analyst.superagent

# List tools discovered from MCP servers
promptise list-tools --server http://localhost:8080/mcp

# Run an agent interactively
promptise run --model openai:gpt-5-mini --server http://localhost:8080/mcp

# Serve an agent over HTTP
promptise serve analyst.superagent --port 8001

Step 10: Custom Reasoning Patterns

Replace the default ReAct loop with a specialized reasoning pattern:

from promptise.engine import PromptGraph, PromptNode, NodeFlag
from promptise.engine.reasoning_nodes import PlanNode, ThinkNode, SynthesizeNode

agent = await build_agent(
    model="openai:gpt-5-mini",
    servers=my_servers,
    # Use a built-in pattern
    agent_pattern="deliberate",  # Think → Plan → Act → Observe → Reflect
)

# Or build a custom graph with reasoning nodes
graph = PromptGraph("my-agent", nodes=[
    PlanNode("plan", is_entry=True),
    PromptNode("act", inject_tools=True, flags={NodeFlag.RETRYABLE}),
    ThinkNode("think"),
    SynthesizeNode("answer", is_terminal=True),
])

agent = await build_agent(
    model="openai:gpt-5-mini",
    servers=my_servers,
    agent_pattern=graph,
)

7 built-in patterns: react (default), peoatr, research, autonomous, deliberate, debate, pipeline. See Reasoning Patterns.


Error Handling & Troubleshooting

Agent Invocation Errors

try:
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": "Do something complex"}]}
    )
except Exception as exc:
    print(f"Agent failed: {exc}")
    # Common causes: LLM API down, all MCP servers unreachable, timeout

MCP Server Connection Failures

# Check which tools were discovered
stats = agent.get_stats()
print(f"Tools available: {stats.get('tools_count', 0)}")

# If 0 tools — server connection failed. Check:
# 1. Is the server running? (stdio: is the command correct? http: is the URL reachable?)
# 2. Are credentials valid? (bearer_token, api_key)
# 3. Is the server healthy? (check server logs)

Timeout Handling

agent = await build_agent(
    model="openai:gpt-5-mini",
    servers=my_servers,
    max_agent_iterations=25,   # Limit total reasoning steps
    timeout=120.0,             # Hard timeout in seconds
)
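
The two limits guard against different failures: the iteration cap stops a loop that keeps reasoning without finishing, and the timeout stops a step that hangs. The sketch below combines both with `asyncio.wait_for`. The `run_with_limits` helper is hypothetical and only illustrates the idea, not how the agent enforces these parameters.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_limits(
    step: Callable[[int], Awaitable[bool]],
    max_iterations: int,
    timeout: float,
) -> int:
    """Run step(i) until it returns True; return the number of iterations used."""

    async def loop() -> int:
        for i in range(max_iterations):
            if await step(i):
                return i + 1
        raise RuntimeError("iteration limit reached")

    # The hard timeout wraps the whole loop, not each step.
    return await asyncio.wait_for(loop(), timeout)


async def finishes_on_third_step(i: int) -> bool:
    return i == 2


used = asyncio.run(run_with_limits(finishes_on_third_step, max_iterations=25, timeout=1.0))
print(used)
```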

Guardrail Rejections

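# Avoid naming the payload `input`, which shadows the Python builtin
request = {"messages": [{"role": "user", "content": "Do something complex"}]}
result = await agent.ainvoke(request)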
last_msg = result["messages"][-1].content

# If guardrails blocked the input, the response will contain the rejection reason
# Check observability for details:
report = agent.generate_report()
print(report)

What's Next

Go deeper on each feature:

| Feature used in this guide | Deep reference |
|---|---|
| build_agent(), PromptiseAgent | Building Agents |
| Server specs and connections | Server Configuration |
| Reasoning patterns and custom graphs | Reasoning Patterns |
| Reasoning Graph engine | Reasoning Graph |
| Memory providers and auto-injection | Memory |
| Observability levels and transporters | Observability |
| Sandbox configuration and security | Sandbox |
| Cross-agent delegation | Cross-Agent Delegation |
| SuperAgent YAML files | SuperAgent Files |
| CLI commands | CLI Reference |
