Prompt Testing

A purpose-built test framework for validating prompt behavior. PromptTestCase provides mocked LLM calls, context injection, and specialized assertions for latency, schema compliance, and guard behavior.

Quick Example

import pytest
from promptise.prompts import prompt
from promptise.prompts.testing import PromptTestCase
from promptise.prompts.context import UserContext

@prompt(model="openai:gpt-5-mini")
async def analyze_sentiment(text: str) -> str:
    """Analyze the sentiment of this text: {text}"""

class TestSentiment(PromptTestCase):
    prompt = analyze_sentiment

    async def test_positive(self):
        with self.mock_llm("positive"):
            result = await self.run_prompt("I love this product!")
            self.assert_contains(result, "positive")

    async def test_with_context(self):
        with self.mock_context(user=UserContext(expertise_level="expert")):
            with self.mock_llm("expert analysis: strongly positive"):
                result = await self.run_prompt("Great product")
                self.assert_contains(result, "expert")

Concepts

PromptTestCase is a base class for prompt tests. It works with both pytest and unittest.TestCase patterns. Key features:

mock_llm(response) -- mock the LLM call to return a fixed string, eliminating API costs during testing.
mock_context(**contexts) -- temporarily inject world contexts into the prompt under test.
Specialized assertions -- purpose-built for prompt validation (schema, latency, guards, content).
Stats access -- retrieve PromptStats from the last call for latency assertions.

Setting Up Tests

Configure the Prompt Under Test

Set the prompt class attribute to the Prompt you want to test:

from promptise.prompts.testing import PromptTestCase

class TestMyPrompt(PromptTestCase):
    prompt = my_analysis_prompt

Running Prompts

Use run_prompt() to execute the prompt:

async def test_basic(self):
    with self.mock_llm("expected response"):
        result = await self.run_prompt("input text")
        assert result == "expected response"

Use run_with_stats() to get both the result and execution stats:

async def test_with_stats(self):
    with self.mock_llm("response"):
        result, stats = await self.run_with_stats("input text")
        self.assert_latency(stats, max_ms=5000)

Mocking

Mocking LLM Calls

mock_llm(response) patches the LLM to return a fixed string. No API calls are made:

async def test_mocked(self):
    with self.mock_llm("The sentiment is positive."):
        result = await self.run_prompt("I love it!")
        self.assert_contains(result, "positive")
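
To make the idea concrete, here is a minimal, self-contained sketch of how a context manager can swap a model call for a canned reply. It does not use Promptise: `FakeClient`, `call_model`, and `fixed_reply` are hypothetical stand-ins built on the standard library's `unittest.mock`, and the real `mock_llm` may be implemented differently.

```python
import asyncio
from contextlib import contextmanager
from unittest.mock import AsyncMock, patch


class FakeClient:
    """Hypothetical stand-in for an LLM client; not part of Promptise."""

    async def call_model(self, prompt: str) -> str:
        raise RuntimeError("would hit the network")


@contextmanager
def fixed_reply(client: FakeClient, response: str):
    """Patch client.call_model so it returns `response` without any I/O."""
    with patch.object(client, "call_model", AsyncMock(return_value=response)) as mock:
        yield mock


async def demo() -> str:
    client = FakeClient()
    with fixed_reply(client, "The sentiment is positive.") as mock:
        reply = await client.call_model("I love it!")
        # The mock records the call, so a test can also check the prompt text.
        mock.assert_awaited_once_with("I love it!")
    return reply


reply = asyncio.run(demo())
```

Because the patch is undone when the `with` block exits, the stand-in client reverts to its original behavior afterwards.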

Mocking Context

mock_context(**contexts) temporarily adds world contexts to the prompt:

from promptise.prompts.context import UserContext, BaseContext

async def test_with_user(self):
    with self.mock_context(
        user=UserContext(name="Alice", expertise_level="expert"),
        project=BaseContext(name="Q1 Report"),
    ):
        with self.mock_llm("Expert-level analysis..."):
            result = await self.run_prompt("quarterly data")
            self.assert_contains(result, "Expert")

The original world contexts are restored when the context manager exits.
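
The save-and-restore behavior can be illustrated with a small sketch. This is not Promptise code: `temporary_contexts` and the plain `world` dictionary are hypothetical, and the sketch only shows the general `try`/`finally` pattern that guarantees restoration even when the test body raises.

```python
from contextlib import contextmanager


@contextmanager
def temporary_contexts(world: dict, **contexts):
    """Add `contexts` to `world`, then restore the original entries on exit."""
    saved = dict(world)  # shallow snapshot of the original entries
    world.update(contexts)
    try:
        yield world
    finally:
        # Runs even if the body raises, so state never leaks between tests.
        world.clear()
        world.update(saved)


world = {"project": "Q1 Report"}
with temporary_contexts(world, user="Alice", project="Q2 Report"):
    inside = dict(world)  # sees the injected and overridden entries
after = dict(world)       # back to the original mapping
```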

Combining Mocks

Stack mock_llm and mock_context for isolated tests:

async def test_full_isolation(self):
    with self.mock_context(user=UserContext(expertise_level="beginner")):
        with self.mock_llm("Simple explanation of the data."):
            result = await self.run_prompt("complex technical data")
            self.assert_contains(result, "Simple")
            self.assert_not_contains(result, "error")

Assertions

Content Assertions

# Assert result contains a substring
self.assert_contains(result, "recommendation")

# Assert result does NOT contain a substring
self.assert_not_contains(result, "error")

Schema Assertions

Validate that the result matches an expected type (works with dataclasses, Pydantic models, and basic types):

from pydantic import BaseModel

class Analysis(BaseModel):
    summary: str
    confidence: float

# Inside a test method:
self.assert_schema(result, Analysis)  # Pydantic model
self.assert_schema(result, str)       # basic type
self.assert_schema(result, dict)      # basic type
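
As a rough illustration of what a type-based schema check involves, the sketch below uses only the standard library. `check_schema` is a hypothetical helper, not the `assert_schema` implementation. The rule that a plain dict is accepted when it carries every field of a dataclass is an assumption made for this example.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Analysis:
    summary: str
    confidence: float


def check_schema(result, expected_type) -> None:
    """Hypothetical helper: raise AssertionError if `result` does not fit."""
    if isinstance(result, expected_type):
        return
    # Assumption for this sketch: a dict with all dataclass fields also passes.
    if is_dataclass(expected_type) and isinstance(result, dict):
        expected = {f.name for f in fields(expected_type)}
        missing = expected - set(result)
        assert not missing, f"missing fields: {sorted(missing)}"
        return
    raise AssertionError(
        f"expected {expected_type.__name__}, got {type(result).__name__}"
    )


check_schema("ok", str)
check_schema(Analysis(summary="fine", confidence=0.9), Analysis)
check_schema({"summary": "fine", "confidence": 0.9}, Analysis)
```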

Latency Assertions

result, stats = await self.run_with_stats("test input")

# Assert latency is within limit
self.assert_latency(stats, max_ms=5000)
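
The sketch below shows one way a latency check of this kind can work, using only the standard library. `Stats`, its `latency_ms` field, `timed`, and `check_latency` are hypothetical names; the real `PromptStats` fields may differ. Note that when the model call is mocked, a measurement like this presumably reflects local overhead only, so tight latency limits are more meaningful in integration tests.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical stand-in for PromptStats; real field names may differ."""
    latency_ms: float


async def timed(coro):
    """Await `coro` and return (result, Stats) with wall-clock latency."""
    start = time.perf_counter()
    result = await coro
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, Stats(latency_ms=elapsed_ms)


def check_latency(stats: Stats, max_ms: float) -> None:
    assert stats.latency_ms <= max_ms, (
        f"latency {stats.latency_ms:.1f} ms exceeds limit of {max_ms} ms"
    )


async def fake_call() -> str:
    await asyncio.sleep(0.01)  # simulate a short delay
    return "response"


result, stats = asyncio.run(timed(fake_call()))
check_latency(stats, max_ms=5000)
```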

Context Provider Assertions

Verify that a specific context provider contributed to the prompt:

self.assert_context_provided(stats, "ToolContextProvider")

Guard Assertions

Assert that the result was not blocked by a guard:

self.assert_guard_passed(result)

Test Patterns

Testing with Real LLM Calls

For integration tests that validate actual LLM behavior (requires an API key in the environment):

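import os
import pytest
from promptise.prompts.testing import PromptTestCase

class TestRealLLM(PromptTestCase):
    # analyze_sentiment is the prompt defined in the Quick Example
    prompt = analyze_sentiment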

    @pytest.mark.skipif(
        not os.getenv("OPENAI_API_KEY"),
        reason="Requires OPENAI_API_KEY",
    )
    async def test_real_sentiment(self):
        result = await self.run_prompt("I absolutely love this product!")
        self.assert_contains(result, "positive")

Testing Guard Behavior

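import pytest
from promptise.prompts.guards import GuardError
from promptise.prompts.testing import PromptTestCase

class TestGuards(PromptTestCase):
    # guarded_prompt is assumed to be defined elsewhere with a guard named "content_filter"
    prompt = guarded_prompt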

    async def test_blocked_input(self):
        with self.mock_llm("should not reach here"):
            with pytest.raises(GuardError) as exc_info:
                await self.run_prompt("Contains secret data")
            assert exc_info.value.guard_name == "content_filter"

    async def test_clean_input(self):
        with self.mock_llm("Analysis complete"):
            result = await self.run_prompt("Clean input data")
            self.assert_guard_passed(result)
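
The blocked-input test relies on the guard rejecting the input before the model is called, which is why the mocked response is never returned. The following sketch illustrates that ordering with standard-library code only. `BlockedError`, `content_filter`, and `guarded_call` are hypothetical stand-ins and do not reflect how Promptise guards are implemented.

```python
import asyncio


class BlockedError(Exception):
    """Hypothetical stand-in for GuardError, carrying the guard's name."""

    def __init__(self, guard_name: str, message: str):
        super().__init__(message)
        self.guard_name = guard_name


def content_filter(text: str) -> None:
    """Reject any input that mentions the word 'secret'."""
    if "secret" in text.lower():
        raise BlockedError("content_filter", "input contains a blocked term")


async def guarded_call(text: str, model) -> str:
    content_filter(text)      # input guard runs first
    return await model(text)  # only reached for clean input


calls = []


async def fake_model(text: str) -> str:
    calls.append(text)  # record which inputs actually reach the model
    return "Analysis complete"


try:
    asyncio.run(guarded_call("Contains secret data", fake_model))
    blocked_by = None
except BlockedError as exc:
    blocked_by = exc.guard_name

clean = asyncio.run(guarded_call("Clean input data", fake_model))
```

After this runs, `calls` holds only the clean input, confirming that the blocked input never reached the stand-in model.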

Testing Different User Contexts

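from promptise.prompts.context import UserContext
from promptise.prompts.testing import PromptTestCase

class TestExpertiseAdaptation(PromptTestCase):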
    prompt = adaptive_prompt

    async def test_beginner_response(self):
        with self.mock_context(user=UserContext(expertise_level="beginner")):
            with self.mock_llm("Here is a simple explanation..."):
                result = await self.run_prompt("Explain neural networks")
                self.assert_contains(result, "simple")

    async def test_expert_response(self):
        with self.mock_context(user=UserContext(expertise_level="expert")):
            with self.mock_llm("The gradient descent optimization..."):
                result = await self.run_prompt("Explain neural networks")
                self.assert_contains(result, "gradient")

API Summary

PromptTestCase

Method	Returns	Description
`run_prompt(*args, **kwargs)`	`Any`	Execute the prompt under test
`run_with_stats(*args, **kwargs)`	`tuple[Any, PromptStats]`	Execute and return result + stats

Context Managers

Method	Description
`mock_llm(response)`	Mock the LLM to return a fixed string
`mock_context(**contexts)`	Temporarily inject world contexts

Assertions

Method	Description
`assert_schema(result, expected_type)`	Assert result matches type
`assert_contains(result, substring)`	Assert substring is present
`assert_not_contains(result, substring)`	Assert substring is absent
`assert_latency(stats, max_ms)`	Assert latency within limit
`assert_context_provided(stats, name)`	Assert context provider was used
`assert_guard_passed(result)`	Assert not blocked by a guard

Mock for speed, real for confidence

Use mock_llm() in unit tests for fast, deterministic feedback loops. Use real LLM calls in integration tests (guarded by API key checks) for end-to-end validation.

pytest compatibility

PromptTestCase works with pytest. Subclass PromptTestCase, write your tests as async def test_* methods, and run them with pytest --asyncio-mode=auto. The --asyncio-mode option comes from the pytest-asyncio plugin, so that plugin must be installed.

Set the prompt attribute

Calling run_prompt() without setting the prompt class attribute raises ValueError. Always set prompt = my_prompt on your test class.
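
A fail-fast check of this kind can be sketched as follows. `BaseCase` is a hypothetical base class written for illustration, not Promptise's `PromptTestCase`, and the error message is invented for the example.

```python
import asyncio


class BaseCase:
    """Hypothetical base class; not Promptise's PromptTestCase."""

    prompt = None  # subclasses are expected to override this

    async def run_prompt(self, *args, **kwargs):
        if self.prompt is None:
            raise ValueError(
                f"{type(self).__name__}.prompt is not set; "
                "assign the prompt under test as a class attribute"
            )
        return await self.prompt(*args, **kwargs)


async def echo(text: str) -> str:
    return text.upper()


class Configured(BaseCase):
    # staticmethod keeps the plain function from being bound to `self`
    prompt = staticmethod(echo)


class Unconfigured(BaseCase):
    pass


ok = asyncio.run(Configured().run_prompt("hi"))

try:
    asyncio.run(Unconfigured().run_prompt("hi"))
    error = None
except ValueError as exc:
    error = str(exc)
```

Raising at call time with the subclass name in the message makes a forgotten `prompt = ...` assignment easy to spot in test output.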

What's Next

Guards -- Input/output validation to test against
Inspector -- Trace prompt assembly for debugging
Prompt Chaining -- Compose prompts into pipelines