RAG API Reference¶
The promptise.rag module provides a pluggable foundation for retrieval-augmented generation: base classes you subclass for your loader, chunker, embedder, and vector store, plus a pipeline orchestrator and LangChain tool adapter.
See the RAG guide for architecture overview and usage patterns.
Data Types¶
Document¶
promptise.rag.Document
dataclass
¶
A raw document loaded from a data source.
A Document is the input to the RAG pipeline. It holds the full text and metadata but has not been chunked or embedded yet.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | str | Stable identifier for the document. Used for updates and deletes. If your source doesn't have one, hash the URL or file path. |
| text | str | The full text content of the document. |
| metadata | dict[str, Any] | Arbitrary structured metadata. Typical fields include source and filename. |
Chunk¶
promptise.rag.Chunk
dataclass
¶
A chunk of a document ready for embedding and retrieval.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | str | Unique chunk identifier. Typically derived from the document id and chunk index. |
| document_id | str | The parent document's identifier. |
| text | str | The chunk text content. |
| embedding | list[float] \| None | Vector representation, or None before the embedder runs. |
| metadata | dict[str, Any] | Copied from the parent document, plus any chunk-level fields the chunker adds (e.g. chunk_index, char_start, char_end). |
RetrievalResult¶
promptise.rag.RetrievalResult
dataclass
¶
A single hit returned by :meth:VectorStore.search.
Attributes:
| Name | Type | Description |
|---|---|---|
| chunk | Chunk | The matched chunk with its metadata. |
| score | float | Similarity score in the [0.0, 1.0] range; higher means more similar. |
IndexReport¶
promptise.rag.IndexReport
dataclass
¶
Summary of an :meth:RAGPipeline.index operation.
Attributes:
| Name | Type | Description |
|---|---|---|
| documents_loaded | int | Number of documents returned by the loader. |
| chunks_created | int | Total chunks produced by the chunker. |
| chunks_stored | int | Chunks successfully written to the vector store. |
| errors | list[tuple[str, str]] | List of (document id, error message) pairs for items that failed. |
| duration_seconds | float | Total wall-clock time for the indexing run. |
Base Classes¶
Override these to plug in your own RAG backend.
DocumentLoader¶
promptise.rag.DocumentLoader
¶
Fetches raw documents from a data source.
Subclass this for each source you need to index: filesystem, S3,
Confluence, Notion, Postgres, a REST API, etc. Implementations
should return a list (or async iterator) of :class:Document
instances with stable id fields so re-indexing is idempotent.
Example::

    class MarkdownFolderLoader(DocumentLoader):
        def __init__(self, path: str):
            self.path = path

        async def load(self) -> list[Document]:
            from pathlib import Path

            docs = []
            for p in Path(self.path).rglob("*.md"):
                docs.append(Document(
                    id=str(p.relative_to(self.path)),
                    text=p.read_text(),
                    metadata={"source": str(p), "filename": p.name},
                ))
            return docs
Chunker¶
promptise.rag.Chunker
¶
Splits a document into chunks suitable for embedding.
Subclass this if you need special chunking logic (preserve code
blocks, respect markdown headers, split by semantic boundaries, etc).
For most use cases, :class:RecursiveTextChunker is sufficient.
Implementations must preserve the parent document's metadata on
every chunk and can add chunk-level metadata (e.g. chunk_index,
char_start, char_end, page).
chunk(document)
async
¶
Split a single document into chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| document | Document | The document to chunk. | required |

Returns:

| Type | Description |
|---|---|
| list[Chunk] | List of :class:Chunk objects covering the document. Each chunk's metadata should inherit from the parent document's metadata. |
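To make the contract concrete, here is the chunk-level bookkeeping reduced to a plain function (a hedged sketch, not library code): it splits on paragraph breaks and records the chunk_index, char_start, and char_end fields a Chunker implementation is expected to add.

```python
def split_paragraphs(text: str) -> list[dict]:
    """Split text on blank lines, recording the chunk-level metadata
    (chunk_index, char_start, char_end) a Chunker subclass would add."""
    chunks = []
    cursor = 0
    for para in text.split("\n\n"):
        if para.strip():
            # Offsets are positions within the original document text.
            char_start = text.index(para, cursor)
            char_end = char_start + len(para)
            chunks.append({
                "text": para,
                "chunk_index": len(chunks),
                "char_start": char_start,
                "char_end": char_end,
            })
            cursor = char_end
    return chunks
```

A real subclass would wrap each dict in a Chunk, copy the parent document's metadata, and derive the chunk id from the document id and chunk_index.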
Embedder¶
promptise.rag.Embedder
¶
Converts text into vector embeddings.
Subclass this to wrap any embedding provider (OpenAI, Cohere, Voyage, sentence-transformers, a local model, etc).
Implementations should:
- Support batching via :meth:embed for efficiency.
- Return vectors of consistent dimensionality (check with :attr:dimension).
- Handle retries and rate limiting internally.
Example::

    import asyncio

    class SentenceTransformersEmbedder(Embedder):
        def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer(model_name)

        async def embed(self, texts: list[str]) -> list[list[float]]:
            # Run the synchronous encode() off the event loop.
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(
                None, lambda: self._model.encode(texts).tolist()
            )

        @property
        def dimension(self) -> int:
            return self._model.get_sentence_embedding_dimension()
dimension
property
¶
Dimensionality of the embedding vectors.
Override to return the correct value for your model. Used by some vector stores to initialize collections with the right vector size.
embed(texts)
async
¶
Embed a batch of texts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | List of strings to embed. | required |
Returns:

| Type | Description |
|---|---|
| list[list[float]] | A list of embedding vectors, one per input text. All vectors must have the same length (:attr:dimension). |
embed_one(text)
async
¶
Embed a single text. Default implementation calls :meth:embed.
VectorStore¶
promptise.rag.VectorStore
¶
Persists embedded chunks and performs similarity search.
Subclass this for your vector database: Pinecone, Weaviate, Qdrant, Milvus, PGVector, Redis, Elasticsearch, or a custom backend.
All methods are async. Implementations are responsible for:
- Serializing chunk metadata to whatever format the backend needs (all vector DBs have slightly different metadata constraints).
- Converting native similarity scores into the [0.0, 1.0] range expected by :class:RetrievalResult.
- Handling connection lifecycle (pooling, reconnection, shutdown).
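Raw cosine similarity, for example, lies in [-1.0, 1.0]. One common normalization (an assumption for illustration; your backend may report distances instead) is a linear rescale:

```python
def normalize_cosine(score: float) -> float:
    """Map a raw cosine similarity in [-1.0, 1.0] onto the [0.0, 1.0]
    range expected by RetrievalResult.score."""
    return (score + 1.0) / 2.0
```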
Example::

    class QdrantStore(VectorStore):
        def __init__(self, url: str, collection: str):
            from qdrant_client import AsyncQdrantClient
            self._client = AsyncQdrantClient(url=url)
            self._collection = collection

        async def add(self, chunks: list[Chunk]) -> None:
            points = [
                {
                    "id": c.id,
                    "vector": c.embedding,
                    "payload": {
                        "text": c.text,
                        "document_id": c.document_id,
                        **c.metadata,
                    },
                }
                for c in chunks
            ]
            await self._client.upsert(self._collection, points=points)

        async def search(self, vector, *, limit=5, filter=None):
            res = await self._client.search(
                self._collection, query_vector=vector, limit=limit,
                query_filter=filter,
            )
            return [
                RetrievalResult(
                    chunk=Chunk(
                        id=hit.id,
                        document_id=hit.payload.get("document_id", ""),
                        text=hit.payload.get("text", ""),
                        metadata={k: v for k, v in hit.payload.items() if k != "text"},
                    ),
                    score=hit.score,
                )
                for hit in res
            ]

        async def delete(self, chunk_ids: list[str]) -> None:
            await self._client.delete(self._collection, points_selector=chunk_ids)
add(chunks)
async
¶
Store a batch of chunks with their embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | list[Chunk] | Chunks to store. Every chunk is expected to have a non-None embedding. | required |
close()
async
¶
Release resources. Default is a no-op.
count()
async
¶
Return the number of stored chunks. Optional to implement.
delete(chunk_ids)
async
¶
Remove chunks by id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunk_ids | list[str] | List of chunk ids to remove. Implementations should be idempotent; deleting a non-existent id is not an error. | required |
delete_by_document(document_id)
async
¶
Remove all chunks belonging to a document.
Default implementation raises NotImplementedError. Override
if your backend supports efficient filtered deletes — otherwise
the :class:RAGPipeline will fall back to tracking chunk ids
per document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document_id | str | Parent document id. | required |

Returns:

| Type | Description |
|---|---|
| int | Number of chunks deleted. |
search(vector, *, limit=5, filter=None)
async
¶
Find the nearest chunks to a query vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vector | list[float] | Query embedding. | required |
| limit | int | Maximum number of results to return. | 5 |
| filter | dict[str, Any] \| None | Optional metadata filter. Semantics are implementation-defined; most vector DBs support simple equality filters on string metadata fields. | None |

Returns:

| Type | Description |
|---|---|
| list[RetrievalResult] | List of :class:RetrievalResult hits, at most limit. |
Built-in Implementations¶
RecursiveTextChunker¶
promptise.rag.RecursiveTextChunker
¶
Bases: Chunker
Split text into fixed-size chunks with overlap.
Uses a hierarchy of separators — double newlines (paragraph breaks),
then single newlines, then sentence boundaries, then word boundaries
— to keep chunks readable. Adds chunk-level metadata (chunk_index,
char_start, char_end) and preserves the parent document's
metadata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_size | int | Target characters per chunk. | 500 |
| overlap | int | Number of characters shared between consecutive chunks to preserve context across boundaries. Set to roughly 10-20% of chunk_size. | 50 |
| separators | list[str] \| None | Ordered list of separator patterns to try. Defaults to the paragraph, newline, sentence, and word hierarchy described above. | None |
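The overlap arithmetic on its own looks like this (a deliberately naive sketch that ignores the separator hierarchy, so chunks may split mid-word):

```python
def fixed_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size chunking: each chunk starts chunk_size - overlap
    characters after the previous one, so consecutive chunks share
    overlap characters at their boundary."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```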
InMemoryVectorStore¶
promptise.rag.InMemoryVectorStore
¶
Bases: VectorStore
Reference vector store that keeps everything in memory.
Uses brute-force cosine similarity against a list of stored chunks.
Suitable for testing, small demo corpora, and as a reference
implementation to copy when building your own :class:VectorStore.
For production, adapt the methods to Pinecone, Qdrant, Weaviate, etc. The interface is the same — only the backend changes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dimension | int \| None | Expected embedding dimension. When set, :meth:add checks incoming embeddings against it. | None |
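The brute-force search it performs amounts to ranking stored vectors by cosine similarity, roughly as follows (a self-contained sketch of the idea, not the class's actual code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brute_force_search(query: list[float],
                       stored: list[tuple[str, list[float]]],
                       limit: int = 5) -> list[str]:
    """Rank (chunk_id, vector) pairs by similarity to the query and
    return the top `limit` chunk ids."""
    ranked = sorted(stored, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:limit]]
```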
Pipeline¶
RAGPipeline¶
promptise.rag.RAGPipeline
¶
Orchestrates the full RAG flow: load → chunk → embed → store → retrieve.
A pipeline wires together four components (loader, chunker, embedder, store) into a single object that agents can treat as a black box. The same pipeline interface works whether you're using an in-memory store for tests or a managed Pinecone cluster in production.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| loader | DocumentLoader \| None | :class:DocumentLoader used by :meth:index when no documents are passed explicitly. | None |
| chunker | Chunker \| None | :class:Chunker used to split loaded documents into chunks. | None |
| embedder | Embedder | :class:Embedder that converts chunk text and queries into vectors. | required |
| store | VectorStore | :class:VectorStore that persists chunks and serves similarity search. | required |
| batch_size | int | Number of chunks to embed in a single call to :meth:Embedder.embed. | 64 |
Example::

    pipeline = RAGPipeline(
        loader=MarkdownFolderLoader("./docs"),
        embedder=OpenAIEmbedder(),
        store=PineconeStore("my-index"),
    )

    report = await pipeline.index()
    print(f"Indexed {report.chunks_stored} chunks in {report.duration_seconds:.1f}s")

    hits = await pipeline.retrieve("password reset")
    for hit in hits:
        print(f"[{hit.score:.2f}] {hit.chunk.text[:80]}")
close()
async
¶
Release pipeline resources (closes the vector store).
delete_document(document_id)
async
¶
Remove a document and all of its chunks from the store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document_id | str | Parent document id. | required |

Returns:

| Type | Description |
|---|---|
| int | Number of chunks removed. |
index(documents=None)
async
¶
Load, chunk, embed, and store documents.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| documents | list[Document] \| None | Optional pre-loaded documents. When None, the configured loader is used. | None |

Returns:

| Type | Description |
|---|---|
| IndexReport | :class:IndexReport summarizing the run. |

Raises:

| Type | Description |
|---|---|
| ValueError | If no documents are provided and no loader is configured. |
retrieve(query, *, limit=5, filter=None)
async
¶
Embed a query and return the top matching chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Natural-language query string. | required |
| limit | int | Maximum number of results. | 5 |
| filter | dict[str, Any] \| None | Optional metadata filter passed to the vector store. | None |

Returns:

| Type | Description |
|---|---|
| list[RetrievalResult] | List of :class:RetrievalResult hits for the query. |
Tool Adapter¶
rag_to_tool¶
promptise.rag.rag_to_tool(pipeline, *, name='search_knowledge_base', description='Search the knowledge base for information relevant to a query.', limit=5, format='markdown')
¶
Wrap a :class:RAGPipeline as a LangChain tool the agent can call.
The returned tool has a single query argument. When the LLM
invokes it, the pipeline embeds the query, retrieves the top limit
chunks, and formats them for display.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | RAGPipeline | The RAG pipeline to wrap. | required |
| name | str | Tool name the LLM sees. Make it specific, e.g. search_internal_docs. | 'search_knowledge_base' |
| description | str | Tool description the LLM uses to decide when to call it. Be clear about what's in the knowledge base. | 'Search the knowledge base for information relevant to a query.' |
| limit | int | Default number of results per query. The tool exposes this as a parameter so the LLM can override it for broad vs. narrow searches. | 5 |
| format | str | How to format retrieved chunks for the LLM. | 'markdown' |

Returns:

| Type | Description |
|---|---|
| Any | A LangChain tool object (typed Any to avoid a hard dependency) that can be passed to an agent via the extra_tools argument. |
Example::

    from promptise import build_agent
    from promptise.rag import rag_to_tool

    docs_tool = rag_to_tool(
        pipeline,
        name="search_internal_docs",
        description="Search Acme Corp's internal engineering wiki.",
    )

    agent = await build_agent(
        model="openai:gpt-5-mini",
        servers={},
        extra_tools=[docs_tool],
    )
Utilities¶
content_hash¶
promptise.rag.content_hash(text)
¶
Return a short deterministic hash of a text string.
Useful for generating stable chunk ids when your loader doesn't provide them. Two identical strings always produce the same hash.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to hash. | required |

Returns:

| Type | Description |
|---|---|
| str | A 12-character hex string derived from the SHA-256 of text. |
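A plausible implementation consistent with this description (a sketch; the library's exact code may differ in detail):

```python
import hashlib

def content_hash_sketch(text: str) -> str:
    """First 12 hex characters of the SHA-256 digest of the UTF-8
    encoded text; deterministic for identical inputs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
```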