RAG API Reference¶
The promptise.rag module provides a pluggable foundation for retrieval-augmented generation: base classes you subclass for your loader, chunker, embedder, and vector store, plus a pipeline orchestrator and LangChain tool adapter.
See the RAG guide for architecture overview and usage patterns.
Data Types¶
Document¶
promptise.rag.Document
dataclass
¶
A raw document loaded from a data source.
A Document is the input to the RAG pipeline. It holds the full text and metadata but has not been chunked or embedded yet.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | str | Stable identifier for the document. Used for updates and deletes. If your source doesn't have one, hash the URL or file path. |
| text | str | The full text content of the document. |
| metadata | dict[str, Any] | Arbitrary structured metadata. Typical fields include source and filename. |
Chunk¶
promptise.rag.Chunk
dataclass
¶
A chunk of a document ready for embedding and retrieval.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | str | Unique chunk identifier. Typically derived from the document id and chunk index. |
| document_id | str | The parent document's identifier. |
| text | str | The chunk text content. |
| embedding | list[float] \| None | Vector representation, or None before the embedder runs. |
| metadata | dict[str, Any] | Copied from the parent document, plus any chunk-level fields the chunker adds (e.g. chunk_index, char_start, char_end). |
RetrievalResult¶
promptise.rag.RetrievalResult
dataclass
¶
A single hit returned by :meth:VectorStore.search.
Attributes:
| Name | Type | Description |
|---|---|---|
| chunk | Chunk | The matched chunk with its metadata. |
| score | float | Similarity score in the [0.0, 1.0] range; higher means more similar. |
IndexReport¶
promptise.rag.IndexReport
dataclass
¶
Summary of an :meth:RAGPipeline.index operation.
Attributes:
| Name | Type | Description |
|---|---|---|
| documents_loaded | int | Number of documents returned by the loader. |
| chunks_created | int | Total chunks produced by the chunker. |
| chunks_stored | int | Chunks successfully written to the vector store. |
| errors | list[tuple[str, str]] | List of (document id, error message) pairs for items that failed. |
| duration_seconds | float | Total wall-clock time for the indexing run. |
Base Classes¶
Override these to plug in your own RAG backend.
DocumentLoader¶
promptise.rag.DocumentLoader
¶
Fetches raw documents from a data source.
Subclass this for each source you need to index: filesystem, S3,
Confluence, Notion, Postgres, a REST API, etc. Implementations
should return a list (or async iterator) of :class:Document
instances with stable id fields so re-indexing is idempotent.
Example::

    class MarkdownFolderLoader(DocumentLoader):
        def __init__(self, path: str):
            self.path = path

        async def load(self) -> list[Document]:
            from pathlib import Path

            docs = []
            for p in Path(self.path).rglob("*.md"):
                docs.append(Document(
                    id=str(p.relative_to(self.path)),
                    text=p.read_text(),
                    metadata={"source": str(p), "filename": p.name},
                ))
            return docs
Chunker¶
promptise.rag.Chunker
¶
Splits a document into chunks suitable for embedding.
Subclass this if you need special chunking logic (preserve code
blocks, respect markdown headers, split by semantic boundaries, etc).
For most use cases, :class:RecursiveTextChunker is sufficient.
Implementations must preserve the parent document's metadata on
every chunk and can add chunk-level metadata (e.g. chunk_index,
char_start, char_end, page).
chunk(document)
async
¶
Split a single document into chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| document | Document | The document to chunk. | required |

Returns:

| Type | Description |
|---|---|
| list[Chunk] | List of :class:Chunk objects covering the document. Each chunk's metadata should inherit from the parent document's metadata. |
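To make the contract concrete, here is the chunk-level bookkeeping reduced to a plain function (a hedged sketch, not library code): it splits on paragraph breaks and records the chunk_index, char_start, and char_end fields a Chunker implementation is expected to add.

```python
def split_paragraphs(text: str) -> list[dict]:
    """Split text on blank lines, recording the chunk-level metadata
    (chunk_index, char_start, char_end) a Chunker subclass would add."""
    chunks = []
    cursor = 0
    for para in text.split("\n\n"):
        if para.strip():
            # Offsets are positions within the original document text.
            char_start = text.index(para, cursor)
            char_end = char_start + len(para)
            chunks.append({
                "text": para,
                "chunk_index": len(chunks),
                "char_start": char_start,
                "char_end": char_end,
            })
            cursor = char_end
    return chunks
```

A real subclass would wrap each dict in a Chunk, copy the parent document's metadata, and derive the chunk id from the document id and chunk_index.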
Embedder¶
promptise.rag.Embedder
¶
Converts text into vector embeddings.
Subclass this to wrap any embedding provider (OpenAI, Cohere, Voyage, sentence-transformers, a local model, etc).
Implementations should:
- Support batching via :meth:embed for efficiency.
- Return vectors of consistent dimensionality (check with :attr:dimension).
- Handle retries and rate limiting internally.
Example::

    import asyncio

    class SentenceTransformersEmbedder(Embedder):
        def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer(model_name)

        async def embed(self, texts: list[str]) -> list[list[float]]:
            # Run the synchronous encode() off the event loop.
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(
                None, lambda: self._model.encode(texts).tolist()
            )

        @property
        def dimension(self) -> int:
            return self._model.get_sentence_embedding_dimension()
dimension
property
¶
Dimensionality of the embedding vectors.
Override to return the correct value for your model. Used by some vector stores to initialize collections with the right vector size.
embed(texts)
async
¶
Embed a batch of texts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | List of strings to embed. | required |
Returns:

| Type | Description |
|---|---|
| list[list[float]] | A list of embedding vectors, one per input text. All vectors must have the same length (:attr:dimension). |
embed_one(text)
async
¶
Embed a single text. Default implementation calls :meth:embed.
VectorStore¶
promptise.rag.VectorStore
¶
Persists embedded chunks and performs similarity search.
Subclass this for your vector database: Pinecone, Weaviate, Qdrant, Milvus, PGVector, Redis, Elasticsearch, or a custom backend.
All methods are async. Implementations are responsible for:
- Serializing chunk metadata to whatever format the backend needs (all vector DBs have slightly different metadata constraints).
- Converting native similarity scores into the [0.0, 1.0] range expected by :class:RetrievalResult.
- Handling connection lifecycle (pooling, reconnection, shutdown).
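Raw cosine similarity, for example, lies in [-1.0, 1.0]. One common normalization (an assumption for illustration; your backend may report distances instead) is a linear rescale:

```python
def normalize_cosine(score: float) -> float:
    """Map a raw cosine similarity in [-1.0, 1.0] onto the [0.0, 1.0]
    range expected by RetrievalResult.score."""
    return (score + 1.0) / 2.0
```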
Example::

    class QdrantStore(VectorStore):
        def __init__(self, url: str, collection: str):
            from qdrant_client import AsyncQdrantClient
            self._client = AsyncQdrantClient(url=url)
            self._collection = collection

        async def add(self, chunks: list[Chunk]) -> None:
            points = [
                {
                    "id": c.id,
                    "vector": c.embedding,
                    "payload": {
                        "text": c.text,
                        "document_id": c.document_id,
                        **c.metadata,
                    },
                }
                for c in chunks
            ]
            await self._client.upsert(self._collection, points=points)

        async def search(self, vector, *, limit=5, filter=None):
            res = await self._client.search(
                self._collection, query_vector=vector, limit=limit,
                query_filter=filter,
            )
            return [
                RetrievalResult(
                    chunk=Chunk(
                        id=hit.id,
                        document_id=hit.payload.get("document_id", ""),
                        text=hit.payload.get("text", ""),
                        metadata={k: v for k, v in hit.payload.items() if k != "text"},
                    ),
                    score=hit.score,
                )
                for hit in res
            ]

        async def delete(self, chunk_ids: list[str]) -> None:
            await self._client.delete(self._collection, points_selector=chunk_ids)
add(chunks)
async
¶
Store a batch of chunks with their embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | list[Chunk] | Chunks to store. Every chunk is expected to have a non-None embedding. | required |
close()
async
¶
Release resources. Default is a no-op.
count()
async
¶
Return the number of stored chunks. Optional to implement.
delete(chunk_ids)
async
¶
Remove chunks by id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunk_ids | list[str] | List of chunk ids to remove. Implementations should be idempotent; deleting a non-existent id is not an error. | required |
delete_by_document(document_id)
async
¶
Remove all chunks belonging to a document.
Default implementation raises NotImplementedError. Override
if your backend supports efficient filtered deletes — otherwise
the :class:RAGPipeline will fall back to tracking chunk ids
per document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document_id | str | Parent document id. | required |

Returns:

| Type | Description |
|---|---|
| int | Number of chunks deleted. |
search(vector, *, limit=5, filter=None)
async
¶
Find the nearest chunks to a query vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vector | list[float] | Query embedding. | required |
| limit | int | Maximum number of results to return. | 5 |
| filter | dict[str, Any] \| None | Optional metadata filter. Semantics are implementation-defined; most vector DBs support simple equality filters on string metadata fields. | None |

Returns:

| Type | Description |
|---|---|
| list[RetrievalResult] | List of :class:RetrievalResult hits, at most limit. |
Built-in Implementations¶
RecursiveTextChunker¶
promptise.rag.RecursiveTextChunker
¶
Bases: Chunker
Split text into fixed-size chunks with overlap.
Uses a hierarchy of separators — double newlines (paragraph breaks),
then single newlines, then sentence boundaries, then word boundaries
— to keep chunks readable. Adds chunk-level metadata (chunk_index,
char_start, char_end) and preserves the parent document's
metadata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| chunk_size | int | Target characters per chunk. | 500 |
| overlap | int | Number of characters shared between consecutive chunks to preserve context across boundaries. Set to roughly 10-20% of chunk_size. | 50 |
| separators | list[str] \| None | Ordered list of separator patterns to try. Defaults to the paragraph, newline, sentence, and word hierarchy described above. | None |
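The overlap arithmetic on its own looks like this (a deliberately naive sketch that ignores the separator hierarchy, so chunks may split mid-word):

```python
def fixed_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size chunking: each chunk starts chunk_size - overlap
    characters after the previous one, so consecutive chunks share
    overlap characters at their boundary."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```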
InMemoryVectorStore¶
promptise.rag.InMemoryVectorStore
¶
Bases: VectorStore
Reference vector store that keeps everything in memory.
Uses brute-force cosine similarity against a list of stored chunks.
Suitable for testing, small demo corpora, and as a reference
implementation to copy when building your own :class:VectorStore.
For production, adapt the methods to Pinecone, Qdrant, Weaviate, etc. The interface is the same — only the backend changes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dimension | int \| None | Expected embedding dimension. When set, :meth:add checks incoming embeddings against it. | None |
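The brute-force search it performs amounts to ranking stored vectors by cosine similarity, roughly as follows (a self-contained sketch of the idea, not the class's actual code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brute_force_search(query: list[float],
                       stored: list[tuple[str, list[float]]],
                       limit: int = 5) -> list[str]:
    """Rank (chunk_id, vector) pairs by similarity to the query and
    return the top `limit` chunk ids."""
    ranked = sorted(stored, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:limit]]
```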
Pipeline¶
RAGPipeline¶
promptise.rag.RAGPipeline
¶
Orchestrates the full RAG flow: load → chunk → embed → store → retrieve.
A pipeline wires together four components (loader, chunker, embedder, store) into a single object that agents can treat as a black box. The same pipeline interface works whether you're using an in-memory store for tests or a managed Pinecone cluster in production.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| loader | DocumentLoader \| None | :class:DocumentLoader used by :meth:index when no documents are passed explicitly. | None |
| chunker | Chunker \| None | :class:Chunker used to split loaded documents into chunks. | None |
| embedder | Embedder | :class:Embedder that converts chunk text and queries into vectors. | required |
| store | VectorStore | :class:VectorStore that persists chunks and serves similarity search. | required |
| batch_size | int | Number of chunks to embed in a single call to :meth:Embedder.embed. | 64 |
Example::

    pipeline = RAGPipeline(
        loader=MarkdownFolderLoader("./docs"),
        embedder=OpenAIEmbedder(),
        store=PineconeStore("my-index"),
    )

    report = await pipeline.index()
    print(f"Indexed {report.chunks_stored} chunks in {report.duration_seconds:.1f}s")

    hits = await pipeline.retrieve("password reset")
    for hit in hits:
        print(f"[{hit.score:.2f}] {hit.chunk.text[:80]}")
close()
async
¶
Release pipeline resources (closes the vector store).
delete_document(document_id)
async
¶
Remove a document and all of its chunks from the store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document_id | str | Parent document id. | required |

Returns:

| Type | Description |
|---|---|
| int | Number of chunks removed. |
index(documents=None)
async
¶
Load, chunk, embed, and store documents.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| documents | list[Document] \| None | Optional pre-loaded documents. When None, the configured loader is used. | None |

Returns:

| Type | Description |
|---|---|
| IndexReport | :class:IndexReport summarizing the run. |

Raises:

| Type | Description |
|---|---|
| ValueError | If no documents are provided and no loader is configured. |
retrieve(query, *, limit=5, filter=None)
async
¶
Embed a query and return the top matching chunks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Natural-language query string. | required |
| limit | int | Maximum number of results. | 5 |
| filter | dict[str, Any] \| None | Optional metadata filter passed to the vector store. | None |

Returns:

| Type | Description |
|---|---|
| list[RetrievalResult] | List of :class:RetrievalResult hits for the query. |
Tool Adapter¶
rag_to_tool¶
promptise.rag.rag_to_tool(pipeline, *, name='search_knowledge_base', description='Search the knowledge base for information relevant to a query.', limit=5, format='markdown')
¶
Wrap a :class:RAGPipeline as a LangChain tool the agent can call.
The returned tool has a single query argument. When the LLM
invokes it, the pipeline embeds the query, retrieves the top limit
chunks, and formats them for display.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | RAGPipeline | The RAG pipeline to wrap. | required |
| name | str | Tool name the LLM sees. Make it specific, e.g. search_internal_docs. | 'search_knowledge_base' |
| description | str | Tool description the LLM uses to decide when to call it. Be clear about what's in the knowledge base. | 'Search the knowledge base for information relevant to a query.' |
| limit | int | Default number of results per query. The tool exposes this as a parameter so the LLM can override it for broad vs. narrow searches. | 5 |
| format | str | How to format retrieved chunks for the LLM. | 'markdown' |

Returns:

| Type | Description |
|---|---|
| Any | A LangChain tool object (typed Any to avoid a hard dependency) that can be passed to an agent via the extra_tools argument. |
Example::

    from promptise import build_agent
    from promptise.rag import rag_to_tool

    docs_tool = rag_to_tool(
        pipeline,
        name="search_internal_docs",
        description="Search Acme Corp's internal engineering wiki.",
    )

    agent = await build_agent(
        model="openai:gpt-5-mini",
        servers={},
        extra_tools=[docs_tool],
    )
Utilities¶
content_hash¶
promptise.rag.content_hash(text)
¶
Return a short deterministic hash of a text string.
Useful for generating stable chunk ids when your loader doesn't provide them. Two identical strings always produce the same hash.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to hash. | required |

Returns:

| Type | Description |
|---|---|
| str | A 12-character hex string derived from the SHA-256 of text. |
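A plausible implementation consistent with this description (a sketch; the library's exact code may differ in detail):

```python
import hashlib

def content_hash_sketch(text: str) -> str:
    """First 12 hex characters of the SHA-256 digest of the UTF-8
    encoded text; deterministic for identical inputs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
```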