Tool Optimization¶
Reduce the token cost of MCP tool definitions sent to the LLM with every invocation.
```python
from promptise import build_agent
from promptise.config import HTTPServerSpec

agent = await build_agent(
    servers={"tools": HTTPServerSpec(url="http://localhost:8000/mcp")},
    model="openai:gpt-5-mini",
    optimize_tools=True,  # ~40% token savings on tool definitions
)
```
The Problem¶
When an agent connects to MCP servers, every tool's full name, description, and JSON Schema is sent to the LLM on every invocation via function-calling. With 20-50+ tools, this costs 5,000-15,000+ tokens per call — just for tool definitions. This is the single largest token cost after the conversation itself.
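To get a feel for the scale, here is a rough back-of-the-envelope sketch using the common heuristic of roughly 4 characters per token. The tool schema below is a hypothetical example, not one from the library:

```python
import json

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English/JSON text.
    return len(text) // 4

# One fairly typical tool definition (hypothetical example).
tool = {
    "name": "search_orders",
    "description": "Search customer orders by status, date range, and customer id. "
                   "Returns a paginated list of matching orders.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Unique customer identifier"},
            "status": {"type": "string", "description": "Order status filter, e.g. 'shipped'"},
            "start_date": {"type": "string", "description": "ISO 8601 start of date range"},
            "end_date": {"type": "string", "description": "ISO 8601 end of date range"},
            "page": {"type": "integer", "description": "Page number, starting at 1"},
        },
        "required": ["customer_id"],
    },
}

per_tool = estimate_tokens(json.dumps(tool))
print(per_tool)       # a couple of hundred tokens for a single tool
print(per_tool * 50)  # multiplied across 50 tools, on every single call
```

Multiply a few hundred tokens by dozens of tools and every invocation, and the overhead quickly dominates.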
How It Works¶
Tool optimization operates at two layers:
Layer 1: Static optimization (applied once at build time) — reduces the per-tool token cost without changing which tools are available:
- **Schema minification** — strips `description` metadata from Pydantic Field schemas. The LLM still sees field names, types, and required status, but not verbose per-field descriptions.
- **Description truncation** — caps tool-level descriptions at N characters, cutting on a word boundary.
- **Depth flattening** — replaces deeply nested objects with `dict` beyond a configurable depth.
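The transforms above can be sketched in a few lines. These helpers are illustrative, assuming a JSON-Schema-shaped dict; the library's internals may differ:

```python
def truncate_description(text: str, max_len: int) -> str:
    """Cap a description at max_len characters, cutting on a word boundary."""
    if len(text) <= max_len:
        return text
    return text[:max_len].rsplit(" ", 1)[0] + "..."

def minify_schema(schema: dict, max_depth: int, depth: int = 0) -> dict:
    """Drop field-level descriptions; flatten objects nested beyond max_depth."""
    if schema.get("type") == "object" and depth > max_depth:
        return {"type": "object"}  # stand-in for a generic dict
    out = {}
    for key, value in schema.items():
        if key == "description" and depth > 0:
            continue  # keep the top-level description, drop per-field ones
        out[key] = minify_schema(value, max_depth, depth + 1) if isinstance(value, dict) else value
    return out

schema = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "description": "The user record",
            "properties": {"name": {"type": "string", "description": "Full name"}},
        }
    },
}
minified = minify_schema(schema, max_depth=3)   # descriptions stripped, shape kept
flattened = minify_schema(schema, max_depth=1)  # nested "user" object collapsed
```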
Layer 2: Semantic tool selection (applied per invocation) — the biggest optimization. Instead of sending all 50 tools, only the most relevant tools are selected for each query:
- At build time, all tool descriptions are embedded using a lightweight local model
- Before each `ainvoke()`, the user's query is embedded and compared against tool descriptions
- Only the top-K most relevant tools are included in the LLM call
- A `request_more_tools` fallback tool is included so the agent can self-recover if the semantic selection missed something
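The selection step can be sketched with a toy bag-of-words similarity standing in for real sentence-transformers embeddings. Everything here (`embed`, `select_tools`, the tool catalog) is hypothetical, purely to show the ranking-plus-fallback shape:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the library uses a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query: str, tool_descriptions: dict, top_k: int) -> list:
    # Rank tools by similarity between the query and each description,
    # keep the top-K, and always append the self-recovery fallback.
    q = embed(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(q, embed(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k] + ["request_more_tools"]

tools = {
    "get_weather": "Fetch the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "create_invoice": "Create a billing invoice for a customer",
}
selected = select_tools("what is the weather forecast in Paris?", tools, top_k=1)
print(selected)  # ['get_weather', 'request_more_tools']
```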
Quick Start¶
One-liner¶
```python
agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    optimize_tools=True,  # Uses the "minimal" preset
)
```
Preset levels¶
```python
# Static optimization only — safe, no behavioral change
agent = await build_agent(servers=servers, model="openai:gpt-5-mini", optimize_tools="minimal")

# Deeper minification + nested description stripping
agent = await build_agent(servers=servers, model="openai:gpt-5-mini", optimize_tools="standard")

# Full semantic selection — biggest savings, per-invocation tool filtering
agent = await build_agent(servers=servers, model="openai:gpt-5-mini", optimize_tools="semantic")
```
Fine-grained control¶
```python
from promptise import ToolOptimizationConfig, OptimizationLevel

agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    optimize_tools=ToolOptimizationConfig(
        level=OptimizationLevel.SEMANTIC,
        max_description_length=100,
        semantic_top_k=5,
        preserve_tools={"critical_payment_tool", "auth_tool"},
    ),
)
```
Three Preset Levels¶
| Setting | `minimal` | `standard` | `semantic` |
|---|---|---|---|
| Schema minification | Yes | Yes | Yes |
| Max description length | 200 chars | 150 chars | 100 chars |
| Strip nested descriptions | No | Yes | Yes |
| Max schema depth | No limit | 3 | 2 |
| Semantic selection | No | No | Yes |
| Semantic top-K | — | — | 8 |
| Fallback tool | — | — | Yes |
| Estimated savings | ~40% | ~55% | ~85% |
Configuration Reference¶
ToolOptimizationConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
| `level` | `OptimizationLevel \| None` | `None` | Preset level. Any explicit field overrides the preset. |
| `minify_schema` | `bool \| None` | from preset | Strip `description` from Pydantic Field metadata |
| `max_description_length` | `int \| None` | from preset | Truncate tool descriptions at N chars |
| `strip_nested_descriptions` | `bool \| None` | from preset | Remove descriptions from nested model fields |
| `max_schema_depth` | `int \| None` | from preset | Flatten nested objects beyond this depth to `dict` |
| `semantic_selection` | `bool \| None` | from preset | Enable per-invocation semantic tool selection |
| `semantic_top_k` | `int \| None` | from preset | Number of tools to select per invocation |
| `always_include_fallback` | `bool \| None` | from preset | Include the `request_more_tools` fallback |
| `embedding_model` | `str \| None` | `"all-MiniLM-L6-v2"` | Model name or local path for sentence-transformers |
| `preserve_tools` | `set[str] \| None` | `None` | Tool names that are never optimized and always selected |
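The "any explicit field overrides the preset" rule can be illustrated as a simple merge. The preset values mirror the table above; the `resolve` helper and `PRESETS` dict are a hypothetical sketch, not the library's actual resolution code:

```python
# Hypothetical mirror of preset resolution: start from the preset's defaults,
# then let any explicitly set field win.
PRESETS = {
    "minimal":  {"max_description_length": 200, "strip_nested_descriptions": False, "semantic_selection": False},
    "standard": {"max_description_length": 150, "strip_nested_descriptions": True,  "semantic_selection": False},
    "semantic": {"max_description_length": 100, "strip_nested_descriptions": True,  "semantic_selection": True},
}

def resolve(level: str, **overrides) -> dict:
    resolved = dict(PRESETS[level])
    # Fields left as None fall through to the preset's value.
    resolved.update({k: v for k, v in overrides.items() if v is not None})
    return resolved

cfg = resolve("semantic", max_description_length=140)
print(cfg["max_description_length"])  # 140: the explicit value wins over the preset's 100
print(cfg["semantic_selection"])      # True: inherited from the "semantic" preset
```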
OptimizationLevel¶
| Level | Value |
|---|---|
| `OptimizationLevel.MINIMAL` | `"minimal"` |
| `OptimizationLevel.STANDARD` | `"standard"` |
| `OptimizationLevel.SEMANTIC` | `"semantic"` |
Semantic Selection Details¶
Embedding model¶
Semantic selection embeds tool descriptions using sentence-transformers. The default model is all-MiniLM-L6-v2 (384 dimensions, no API key needed). On first use it downloads from HuggingFace Hub and caches locally at ~/.cache/huggingface/. After that, it runs fully offline.
Local / air-gapped deployments¶
For enterprises that cannot access external networks, download the model files once and point to a local directory:
```bash
# Download the model on a machine with internet
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2').save('/models/all-MiniLM-L6-v2')"
```

```python
# Use the local model — zero network calls
agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    optimize_tools=ToolOptimizationConfig(
        level=OptimizationLevel.SEMANTIC,
        embedding_model="/models/all-MiniLM-L6-v2",
    ),
)
```
You can also use any other sentence-transformers-compatible model:
```python
optimize_tools=ToolOptimizationConfig(
    level=OptimizationLevel.SEMANTIC,
    embedding_model="BAAI/bge-small-en-v1.5",  # or any local path
)
```
The request_more_tools fallback¶
When semantic selection is active, a special fallback tool is automatically included:
```
Tool: request_more_tools
Description: "If you need a tool that is not currently available, call this
to see all available tools and their descriptions."
```
If the semantic search missed a relevant tool, the agent can self-recover by calling this and then retrying. This ensures the agent is never stuck.
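A minimal sketch of what such a fallback can do: return every tool name with its description so the model can ask for the one that was filtered out. The catalog and function body here are illustrative, not the library's implementation:

```python
# Hypothetical full tool catalog kept on the agent side after optimization.
ALL_TOOLS = {
    "get_weather": "Fetch the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "create_invoice": "Create a billing invoice for a customer",
}

def request_more_tools() -> str:
    # List every available tool so the LLM can spot the one that
    # semantic selection missed, then retry its original call.
    return "\n".join(f"{name}: {desc}" for name, desc in ALL_TOOLS.items())

listing = request_more_tools()
print(listing)
```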
preserve_tools¶
Tools listed in preserve_tools are:
- Never optimized (full descriptions and schemas are preserved)
- Always included in semantic selection (regardless of relevance score)
Use this for critical tools that the agent must always have access to:
```python
optimize_tools=ToolOptimizationConfig(
    level=OptimizationLevel.SEMANTIC,
    preserve_tools={"process_payment", "verify_identity"},
)
```
Combining with Other Features¶
Tool optimization composes with all other agent features:
```python
agent = await build_agent(
    servers=servers,
    model="openai:gpt-5-mini",
    optimize_tools="standard",
    observe=True,     # observability still tracks all tool calls
    memory=provider,  # memory injection happens before tool selection
    sandbox=True,     # sandbox tools are added after optimization
)
```
FAQ¶
**Does optimization affect tool call quality?**

Schema minification removes per-field descriptions but keeps field names, types, and required status. Most LLMs infer field purpose from well-named parameters (e.g., `user_id`, `email`, `start_date`). For tools with ambiguous parameter names, use `preserve_tools` to exempt them.

**What if semantic selection picks the wrong tools?**

The `request_more_tools` fallback lets the agent self-recover. It lists all available tools, and the agent can retry with the right one. In practice, semantic selection with top-K=8 covers most use cases.

**Does this work with all LLM providers?**

Yes. Tool optimization modifies the tool definitions before they reach LangChain — it works with OpenAI, Anthropic, Ollama, and any other provider.
What's Next?¶
- **Building Agents** — the `build_agent()` function and all its parameters
- **Memory** — persistent memory with vector search
- **Observability** — track token usage and see exactly what the LLM receives