
RAG API Reference

The promptise.rag module provides a pluggable foundation for retrieval-augmented generation: base classes you subclass for your loader, chunker, embedder, and vector store, plus a pipeline orchestrator and LangChain tool adapter.

See the RAG guide for architecture overview and usage patterns.


Data Types

Document

promptise.rag.Document dataclass

A raw document loaded from a data source.

A Document is the input to the RAG pipeline. It holds the full text and metadata but has not been chunked or embedded yet.

Attributes:

Name Type Description
id str

Stable identifier for the document. Used for updates and deletes. If your source doesn't have one, hash the URL or file path.

text str

The full text content of the document.

metadata dict[str, Any]

Arbitrary structured metadata. Typical fields include source, url, title, author, created_at. Metadata is copied to every chunk derived from the document so retrieval results include provenance.

Chunk

promptise.rag.Chunk dataclass

A chunk of a document ready for embedding and retrieval.

Attributes:

Name Type Description
id str

Unique chunk identifier. Typically derived from the document id and chunk index (e.g. doc-1:chunk-0).

document_id str

The parent document's identifier.

text str

The chunk text content.

embedding list[float] | None

Vector representation. None until an embedder populates it. VectorStore implementations can assume this is set when :meth:VectorStore.add is called.

metadata dict[str, Any]

Copied from the parent document, plus any chunk-level fields the chunker adds (e.g. chunk_index, page).

RetrievalResult

promptise.rag.RetrievalResult dataclass

A single hit returned by :meth:VectorStore.search.

Attributes:

Name Type Description
chunk Chunk

The matched chunk with its metadata.

score float

Similarity score in [0.0, 1.0]. Higher is better. Implementations should normalize their backend's native score (cosine similarity, dot product, etc.) into this range.

metadata property

Shortcut for self.chunk.metadata.

text property

Shortcut for self.chunk.text.
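When a backend's native metric is cosine similarity in [-1.0, 1.0], the normalization into the [0.0, 1.0] range described above can be a simple affine map with clamping. A minimal sketch under that assumption (not part of the library):

```python
def normalize_cosine(raw: float) -> float:
    # Map raw cosine similarity from [-1.0, 1.0] into the [0.0, 1.0]
    # range RetrievalResult expects; clamp to guard against float drift.
    return min(1.0, max(0.0, (raw + 1.0) / 2.0))
```

Dot-product or distance-based backends need a different map (e.g. 1 / (1 + distance)); the only contract is that higher means better and the result stays in range.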

IndexReport

promptise.rag.IndexReport dataclass

Summary of an :meth:RAGPipeline.index operation.

Attributes:

Name Type Description
documents_loaded int

Number of documents returned by the loader.

chunks_created int

Total chunks produced by the chunker.

chunks_stored int

Chunks successfully written to the vector store.

errors list[tuple[str, str]]

List of (document_id, error_message) tuples for documents that failed to index.

duration_seconds float

Total wall-clock time for the indexing run.


Base Classes

Override these to plug in your own RAG backend.

DocumentLoader

promptise.rag.DocumentLoader

Fetches raw documents from a data source.

Subclass this for each source you need to index: filesystem, S3, Confluence, Notion, Postgres, a REST API, etc. Implementations should return a list (or async iterator) of :class:Document instances with stable id fields so re-indexing is idempotent.

Example::

class MarkdownFolderLoader(DocumentLoader):
    def __init__(self, path: str):
        self.path = path

    async def load(self) -> list[Document]:
        from pathlib import Path
        docs = []
        for p in Path(self.path).rglob("*.md"):
            docs.append(Document(
                id=str(p.relative_to(self.path)),
                text=p.read_text(),
                metadata={"source": str(p), "filename": p.name},
            ))
        return docs
iter_load() async

Stream documents one at a time for memory-efficient indexing.

Default implementation calls :meth:load and yields from the result. Override for sources that are too large to fit in memory.

Yields:

Type Description
AsyncIterator[Document]

:class:Document instances.

load() async

Return all documents this loader can provide.

Returns:

Type Description
list[Document]

A list of :class:Document. For very large sources, override
:meth:iter_load instead and have this method collect the stream with
[doc async for doc in self.iter_load()] (an async iterator cannot be
passed directly to list()).
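The contract between the two methods can be sketched with stand-in types (plain strings instead of Document; the names here are illustrative, not the real base class):

```python
import asyncio
from typing import AsyncIterator

async def iter_load() -> AsyncIterator[str]:
    # Stand-in for DocumentLoader.iter_load: yield items lazily
    # so memory stays bounded for large corpora.
    for text in ("alpha", "beta", "gamma"):
        yield text

async def load() -> list[str]:
    # Stand-in for DocumentLoader.load: drain the stream. Note that an
    # async iterator must be consumed with `async for`; it cannot be
    # passed to list() directly.
    return [item async for item in iter_load()]

docs = asyncio.run(load())
```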

Chunker

promptise.rag.Chunker

Splits a document into chunks suitable for embedding.

Subclass this if you need special chunking logic (preserving code blocks, respecting markdown headers, splitting by semantic boundaries, etc.). For most use cases, :class:RecursiveTextChunker is sufficient.

Implementations must preserve the parent document's metadata on every chunk and can add chunk-level metadata (e.g. chunk_index, char_start, char_end, page).

chunk(document) async

Split a single document into chunks.

Parameters:

Name Type Description Default
document Document

The document to chunk.

required

Returns:

Type Description
list[Chunk]

List of :class:Chunk instances. Each chunk's document_id must equal
document.id, and the metadata should inherit from document.metadata.

Embedder

promptise.rag.Embedder

Converts text into vector embeddings.

Subclass this to wrap any embedding provider (OpenAI, Cohere, Voyage, sentence-transformers, a local model, etc.).

Implementations should:

  • Support batching via :meth:embed for efficiency.
  • Return vectors of consistent dimensionality (check with :attr:dimension).
  • Handle retries and rate limiting internally.

Example::

import asyncio

class SentenceTransformersEmbedder(Embedder):
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        from sentence_transformers import SentenceTransformer
        self._model = SentenceTransformer(model_name)

    async def embed(self, texts: list[str]) -> list[list[float]]:
        # encode() is synchronous; run it in a thread so the event
        # loop stays responsive.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, lambda: self._model.encode(texts).tolist()
        )

    @property
    def dimension(self) -> int:
        return self._model.get_sentence_embedding_dimension()

dimension property

Dimensionality of the embedding vectors.

Override to return the correct value for your model. Used by some vector stores to initialize collections with the right vector size.

embed(texts) async

Embed a batch of texts.

Parameters:

Name Type Description Default
texts list[str]

List of strings to embed.

required

Returns:

Type Description
list[list[float]]

A list of embedding vectors, one per input text. All vectors must have
the same length (:attr:dimension).

embed_one(text) async

Embed a single text. Default implementation calls :meth:embed.

VectorStore

promptise.rag.VectorStore

Persists embedded chunks and performs similarity search.

Subclass this for your vector database: Pinecone, Weaviate, Qdrant, Milvus, PGVector, Redis, Elasticsearch, or a custom backend.

All methods are async. Implementations are responsible for:

  • Serializing chunk metadata to whatever format the backend needs (all vector DBs have slightly different metadata constraints).
  • Converting native similarity scores into the [0.0, 1.0] range expected by :class:RetrievalResult.
  • Handling connection lifecycle (pooling, reconnection, shutdown).

Example::

class QdrantStore(VectorStore):
    def __init__(self, url: str, collection: str):
        from qdrant_client import AsyncQdrantClient
        self._client = AsyncQdrantClient(url=url)
        self._collection = collection

    async def add(self, chunks: list[Chunk]) -> None:
        from qdrant_client import models
        points = [
            models.PointStruct(
                id=c.id,
                vector=c.embedding,
                payload={"text": c.text, **c.metadata},
            )
            for c in chunks
        ]
        await self._client.upsert(self._collection, points=points)

    async def search(self, vector, *, limit=5, filter=None):
        res = await self._client.search(
            self._collection, query_vector=vector, limit=limit,
            query_filter=filter,
        )
        return [
            RetrievalResult(
                chunk=Chunk(
                    id=hit.id,
                    document_id=hit.payload.get("document_id", ""),
                    text=hit.payload.get("text", ""),
                    metadata={k: v for k, v in hit.payload.items() if k != "text"},
                ),
                score=hit.score,
            )
            for hit in res
        ]

    async def delete(self, chunk_ids: list[str]) -> None:
        from qdrant_client import models
        await self._client.delete(
            self._collection,
            points_selector=models.PointIdsList(points=chunk_ids),
        )

add(chunks) async

Store a batch of chunks with their embeddings.

Parameters:

Name Type Description Default
chunks list[Chunk]

Chunks to store. Every chunk is expected to have a non-None embedding field.

required
close() async

Release resources. Default is a no-op.

count() async

Return the number of stored chunks. Optional to implement.

delete(chunk_ids) async

Remove chunks by id.

Parameters:

Name Type Description Default
chunk_ids list[str]

List of chunk ids to remove. Implementations should be idempotent — deleting a non-existent id is not an error.

required
delete_by_document(document_id) async

Remove all chunks belonging to a document.

Default implementation raises NotImplementedError. Override if your backend supports efficient filtered deletes — otherwise the :class:RAGPipeline will fall back to tracking chunk ids per document.

Parameters:

Name Type Description Default
document_id str

Parent document id.

required

Returns:

Type Description
int

Number of chunks deleted.

search(vector, *, limit=5, filter=None) async

Find the nearest chunks to a query vector.

Parameters:

Name Type Description Default
vector list[float]

Query embedding.

required
limit int

Maximum number of results to return.

5
filter dict[str, Any] | None

Optional metadata filter. Semantics are implementation-defined — most vector DBs support simple equality filters on string metadata fields.

None

Returns:

Type Description
list[RetrievalResult]

List of :class:RetrievalResult ordered by descending score.


Built-in Implementations

RecursiveTextChunker

promptise.rag.RecursiveTextChunker

Bases: Chunker

Split text into fixed-size chunks with overlap.

Uses a hierarchy of separators — double newlines (paragraph breaks), then single newlines, then sentence boundaries, then word boundaries — to keep chunks readable. Adds chunk-level metadata (chunk_index, char_start, char_end) and preserves the parent document's metadata.

Parameters:

Name Type Description Default
chunk_size int

Target characters per chunk.

500
overlap int

Number of characters shared between consecutive chunks to preserve context across boundaries. Set to ~10-20% of chunk_size for most use cases.

50
separators list[str] | None

Ordered list of separator patterns to try. Defaults to ["\n\n", "\n", ". ", " "].

None
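The overlap mechanics reduce to a sliding character window; a minimal sketch that ignores the separator hierarchy (an illustration of the idea, not the library's implementation):

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Each chunk starts (chunk_size - overlap) characters after the
    # previous one, so consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The real chunker additionally snaps chunk boundaries to the nearest separator so chunks stay readable.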

InMemoryVectorStore

promptise.rag.InMemoryVectorStore

Bases: VectorStore

Reference vector store that keeps everything in memory.

Uses brute-force cosine similarity against a list of stored chunks. Suitable for testing, small demo corpora, and as a reference implementation to copy when building your own :class:VectorStore.

For production, adapt the methods to Pinecone, Qdrant, Weaviate, etc. The interface is the same — only the backend changes.

Parameters:

Name Type Description Default
dimension int | None

Expected embedding dimension. When set, :meth:add raises if a chunk's embedding has the wrong length.

None
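The brute-force approach it uses can be sketched in a few lines, stand-alone over bare (id, vector) pairs, with scores normalized into [0.0, 1.0] as :class:RetrievalResult expects (illustrative, not the actual implementation):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Classic cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brute_force_search(
    query: list[float],
    stored: list[tuple[str, list[float]]],
    limit: int = 5,
) -> list[tuple[str, float]]:
    # Score every stored vector, normalize into [0, 1], keep the top hits.
    scored = [(cid, (cosine(query, vec) + 1.0) / 2.0) for cid, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Brute force is O(n) per query, which is exactly why a real vector database with an ANN index replaces this class in production.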

Pipeline

RAGPipeline

promptise.rag.RAGPipeline

Orchestrates the full RAG flow: load → chunk → embed → store → retrieve.

A pipeline wires together four components (loader, chunker, embedder, store) into a single object that agents can treat as a black box. The same pipeline interface works whether you're using an in-memory store for tests or a managed Pinecone cluster in production.

Parameters:

Name Type Description Default
loader DocumentLoader | None

:class:DocumentLoader that fetches source documents.

None
chunker Chunker | None

:class:Chunker that splits documents into chunks. Defaults to :class:RecursiveTextChunker with 500-char chunks.

None
embedder Embedder

:class:Embedder that converts text to vectors. Required.

required
store VectorStore

:class:VectorStore that persists and searches vectors. Required.

required
batch_size int

Number of chunks to embed in a single call to :meth:Embedder.embed. Tune based on your embedding provider's rate limits.

64

Example::

pipeline = RAGPipeline(
    loader=MarkdownFolderLoader("./docs"),
    embedder=OpenAIEmbedder(),
    store=PineconeStore("my-index"),
)
report = await pipeline.index()
print(f"Indexed {report.chunks_stored} chunks in {report.duration_seconds:.1f}s")

hits = await pipeline.retrieve("password reset")
for hit in hits:
    print(f"[{hit.score:.2f}] {hit.text[:80]}")
close() async

Release pipeline resources (closes the vector store).

delete_document(document_id) async

Remove a document and all of its chunks from the store.

Parameters:

Name Type Description Default
document_id str

Parent document id.

required

Returns:

Type Description
int

Number of chunks removed.

index(documents=None) async

Load, chunk, embed, and store documents.

Parameters:

Name Type Description Default
documents list[Document] | None

Optional pre-loaded documents. When None, the configured :class:DocumentLoader is called.

None

Returns:

Type Description
IndexReport

:class:IndexReport with counts and any errors encountered.

Raises:

Type Description
ValueError

If no documents are provided and no loader is configured.

retrieve(query, *, limit=5, filter=None) async

Embed a query and return the top matching chunks.

Parameters:

Name Type Description Default
query str

Natural-language query string.

required
limit int

Maximum number of results.

5
filter dict[str, Any] | None

Optional metadata filter passed to the vector store.

None

Returns:

Type Description
list[RetrievalResult]

List of :class:RetrievalResult ordered by descending score.


Tool Adapter

rag_to_tool

promptise.rag.rag_to_tool(pipeline, *, name='search_knowledge_base', description='Search the knowledge base for information relevant to a query.', limit=5, format='markdown')

Wrap a :class:RAGPipeline as a LangChain tool the agent can call.

The returned tool has a single query argument. When the LLM invokes it, the pipeline embeds the query, retrieves the top limit chunks, and formats them for display.

Parameters:

Name Type Description Default
pipeline RAGPipeline

The RAG pipeline to wrap.

required
name str

Tool name the LLM sees. Make it specific — e.g. "search_product_docs" or "search_support_tickets".

'search_knowledge_base'
description str

Tool description the LLM uses to decide when to call it. Be clear about what's in the knowledge base.

'Search the knowledge base for information relevant to a query.'
limit int

Default number of results per query. The tool exposes this as a parameter so the LLM can override for broad vs. narrow searches.

5
format str

How to format results: "markdown" (default, human and LLM friendly), "json" (structured), or "text" (plain).

'markdown'

Returns:

Type Description
Any

A LangChain StructuredTool ready to pass to build_agent via the
extra_tools parameter.
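As an illustration of what a "markdown" rendering could look like, sketched over plain (score, text, source) tuples — the tool's exact output format is not documented here, so treat this as a hypothetical:

```python
def format_hits_markdown(hits: list[tuple[float, str, str]]) -> str:
    # hits: (score, text, source) tuples, highest score first.
    return "\n".join(
        f"- **[{score:.2f}]** {text} _(source: {source})_"
        for score, text, source in hits
    )
```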

Example::

from promptise import build_agent
from promptise.rag import rag_to_tool

docs_tool = rag_to_tool(
    pipeline,
    name="search_internal_docs",
    description="Search Acme Corp's internal engineering wiki.",
)

agent = await build_agent(
    model="openai:gpt-5-mini",
    servers={},
    extra_tools=[docs_tool],
)

Utilities

content_hash

promptise.rag.content_hash(text)

Return a short deterministic hash of a text string.

Useful for generating stable chunk ids when your loader doesn't provide them. Two identical strings always produce the same hash.

Parameters:

Name Type Description Default
text str

The text to hash.

required

Returns:

Type Description
str

A 12-character hex string derived from the SHA-256 of text.
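Based on the return description (the first 12 hex characters of the SHA-256 digest), an equivalent sketch would be:

```python
import hashlib

def content_hash(text: str) -> str:
    # Deterministic: identical input always yields the identical 12-char id.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
```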