rag
Production-grade Retrieval-Augmented Generation. Knowledge bases with hybrid retrieval (BM25 + semantic + RRF), cross-encoder reranking, source citations, semantic cache, multi-source ingestion, Text2SQL, multi-query expansion, CRAG fallback.
| Property | Value |
|---|---|
| Module id | rag |
| Type | shared (one instance per daemon, per-app reconfig via on_config_update) |
| Action count | 14 |
| Pip deps | fastembed, qdrant-client (bundled) |
| Optional pip deps | chromadb, lancedb, pinecone, asyncpg + pgvector, elasticsearch |
Full reference (every config field, all 6 backends, all 7 embedding models + 5 reranker models, ingestion formats, sync strategies, complete enterprise example): RAG Module. This page is a quick module summary.
The 14 actions
risk_level is mostly medium or high for ingestion and migration actions, and low for queries and stats.
| Tool | Purpose |
|---|---|
| rag.create_knowledge_base | Create a named KB. |
| rag.delete_knowledge_base | Drop a KB + its vector + BM25 indexes. |
| rag.list_knowledge_bases | Enumerate KBs with metadata. |
| rag.knowledge_base_stats | Counts, model, last sync, hit rate. |
| rag.ingest | Add raw text documents. |
| rag.ingest_file | Add a single file. |
| rag.ingest_directory | Walk a directory + index matching files (content-hash dedup). |
| rag.ingest_database | Index DB tables (rows or schema-only). |
| rag.query | Retrieve from a KB (default strategy or per-call override). |
| rag.multi_query | LLM-expanded query with RRF fusion. |
| rag.sql_query | Text2SQL - generate + execute a SELECT. |
| rag.clear_cache | Wipe the semantic cache. |
| rag.migrate_embeddings | Switch a KB to a new embedding model (re-embeds in batches). |
| rag.list_models | List available embedding + reranker shortcuts. |
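The content-hash dedup noted for rag.ingest_directory can be pictured with a short sketch. This is not the module's implementation: the function name `files_to_index`, the choice of SHA-256, and the in-memory `seen_hashes` set are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def files_to_index(root, extensions, seen_hashes):
    """Walk `root` and return files whose content has not been seen yet.

    `seen_hashes` is a hypothetical set of hex digests for content that is
    already indexed; it is updated in place.
    """
    fresh = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue  # not a file, or extension not selected for indexing
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen_hashes:
            continue  # identical content is already indexed, skip it
        seen_hashes.add(digest)
        fresh.append(path)
    return fresh
```

Calling it twice with the same `seen_hashes` set returns an empty list the second time, which is the property that makes repeated directory ingestion cheap.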
Zero-config quick start
tools:
  modules:
    rag: {}
Defaults:
| Setting | Default |
|---|---|
| Embedding | minilm-l12 (384 d, multilingual, 220 MB) |
| Backend | Qdrant in-memory |
| Strategy | Hybrid (BM25 + semantic + RRF) |
| Chunking | recursive, 500 chars, 50 overlap |
| Cache | enabled, in-memory, 1 h TTL |
| Citations | enabled, inline |
| Reranker | disabled |
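To make the default hybrid strategy concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion over a BM25 ranking and a semantic ranking. The function name `rrf_fuse`, the constant `k=60`, and the way the weights multiply each list's contribution are assumptions; the module's actual fusion code may differ.

```python
def rrf_fuse(rankings, weights, k=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    Each document scores sum(weight / (k + rank)) over the lists it appears
    in, with ranks starting at 1. `k` and the weighting scheme are assumed.
    """
    scores = {}
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


fused = rrf_fuse(
    {"bm25": ["d1", "d2", "d3"], "semantic": ["d3", "d1", "d4"]},
    weights={"bm25": 0.3, "semantic": 0.7},
)
print(fused)
```

Because fusion works on ranks rather than raw scores, BM25 and cosine scores never need to be normalised onto a common scale.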
6 vector backends
BackendConfig.type:
| Backend | Mode | Pip dep |
|---|---|---|
| Qdrant (default) | embedded / remote | bundled |
| ChromaDB | embedded / remote | chromadb |
| LanceDB | embedded (file) | lancedb, pyarrow |
| Pinecone | cloud | pinecone |
| pgvector | PostgreSQL | asyncpg, pgvector |
| Elasticsearch | remote cluster | elasticsearch |
7 embedding models
Listed in BUILTIN_MODELS. All are auto-downloaded by FastEmbed (ONNX, CPU):
| Shortcut | FastEmbed id | Dims |
|---|---|---|
| minilm-l12 (default) | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 |
| bge-m3 | BAAI/bge-m3 | 1024 |
| bge-small | BAAI/bge-small-en-v1.5 | 384 |
| bge-large | BAAI/bge-large-en-v1.5 | 1024 |
| nomic-v1.5 | nomic-ai/nomic-embed-text-v1.5 | 768 |
| jina-v3 | jinaai/jina-embeddings-v3 | 1024 |
| snowflake-xs | snowflake/snowflake-arctic-embed-xs | 384 |
Custom models: any FastEmbed-supported HuggingFace id.
5 reranker models
Listed in BUILTIN_RERANKERS. The default is minilm-l6:
| Shortcut | HF id |
|---|---|
| minilm-l6 (default) | Xenova/ms-marco-MiniLM-L-6-v2 |
| minilm-l12 | Xenova/ms-marco-MiniLM-L-12-v2 |
| bge-reranker-base | BAAI/bge-reranker-base |
| jina-reranker-v1-tiny | jinaai/jina-reranker-v1-tiny-en |
| jina-reranker-v2 | jinaai/jina-reranker-v2-base-multilingual |
config:
  reranker: true                 # default minilm-l6
  # OR
  reranker: "bge-reranker-base"
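The `reranker` setting accepts a boolean, a shortcut, or a full model id. The sketch below shows one way such a value could be resolved to a model id. The dictionary copies the shortcut table in this section; `resolve_reranker` and `DEFAULT_RERANKER` are hypothetical names, and the pass-through of unknown strings is an assumption about how custom ids are handled.

```python
# Shortcut table copied from this page; the real registry lives in the module.
BUILTIN_RERANKERS = {
    "minilm-l6": "Xenova/ms-marco-MiniLM-L-6-v2",
    "minilm-l12": "Xenova/ms-marco-MiniLM-L-12-v2",
    "bge-reranker-base": "BAAI/bge-reranker-base",
    "jina-reranker-v1-tiny": "jinaai/jina-reranker-v1-tiny-en",
    "jina-reranker-v2": "jinaai/jina-reranker-v2-base-multilingual",
}
DEFAULT_RERANKER = "minilm-l6"  # hypothetical constant name


def resolve_reranker(value):
    """Map a `reranker` config value to a model id, or None when disabled."""
    if value is False or value is None:
        return None  # reranking disabled
    if value is True:
        return BUILTIN_RERANKERS[DEFAULT_RERANKER]
    if isinstance(value, str):
        # Known shortcut, otherwise treat the string as a full model id.
        return BUILTIN_RERANKERS.get(value, value)
    raise TypeError(f"unsupported reranker value: {value!r}")
```

Checking `value is True` before the string branch keeps the boolean and string forms from being confused with each other.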
Configuration shape
tools:
  modules:
    rag:
      config:
        embedding_model: minilm-l12
        reranker: false             # true | "<shortcut>" | "<HF id>"
        backend:
          type: qdrant              # qdrant | chroma | lancedb | pinecone | pgvector | elasticsearch
          path: ""                  # "" = in-memory
          url: ""
          quantization: none        # none | int8 | binary (qdrant only)
        pipeline:
          retrieval: hybrid         # hybrid | semantic | bm25
          bm25_weight: 0.3
          semantic_weight: 0.7
          rerank_top_n: 20
          final_top_k: 5
          multi_query:
            enabled: false
            provider: ""
            num_variants: 3
        chunking:
          strategy: recursive       # fixed | sentence | paragraph | recursive
          size: 500
          overlap: 50
        sources:
          - type: file
            path: "{{workspace}}/docs"
            extensions: [.md, .txt, .pdf]
            watch: true
          - type: database
            connection_id: crm
            sync: { strategy: updated_at, interval: 30 }
            tables:
              users:
                columns: [id, name, email, bio]
                mode: embed_rows
                template: "{name} - {bio}"
              orders:
                mode: schema_only
        cache:
          enabled: true
          backend: memory           # memory | redis
          similarity_threshold: 0.95
          ttl: 3600
        citations:
          enabled: true
          format: inline            # inline | footnote | structured
          verify: false
        text2sql:
          enabled: false
          provider: ""
          example_cache: true
        crag:
          enabled: false
          confidence_threshold: 0.5
          fallback: broader_query   # broader_query | none
        adaptive:
          enabled: false
          strategies: {}
        contextual_retrieval:
          enabled: false
          provider: ""
          concurrency: 5
        max_knowledge_bases: 50
        max_documents: 100000
        persistence_dir: ""
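The `cache` block pairs a similarity threshold with a TTL. As a rough illustration of how a semantic cache can use both, the sketch below stores query vectors and returns a cached answer only when the closest entry is similar enough and not yet expired. `SemanticCache`, its methods, and the linear scan are assumptions for illustration and are not the module's implementation.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embedding."""

    def __init__(self, similarity_threshold=0.95, ttl=3600, clock=time.monotonic):
        self.threshold = similarity_threshold
        self.ttl = ttl            # seconds, matching the 1 h default
        self.clock = clock
        self._entries = []        # (query_vector, answer, stored_at)

    def put(self, vector, answer):
        self._entries.append((vector, answer, self.clock()))

    def get(self, vector):
        now = self.clock()
        # Drop expired entries before searching.
        self._entries = [e for e in self._entries if now - e[2] < self.ttl]
        best = max(self._entries, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: caller runs the full retrieval pipeline
```

A high threshold such as 0.95 keeps the cache conservative: only near-paraphrases of an earlier query are served from it.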
Shared module + per-app reconfig (gotcha)
rag has isolation = "shared": one instance per daemon.
on_start runs once at boot with an empty config, so the module starts on the default in-memory backend.
When an app activates, the bootstrap calls
module.on_config_update(cfg) with the app's config. The
overridden hook:
- Compares old vs new backend path.
- Closes the old backend if changed.
- Re-creates + initialises the new backend.
- Calls _discover_existing_collections to rebuild _kbs from existing on-disk collections.
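The steps above can be sketched roughly as follows. Only `on_config_update`, `_discover_existing_collections`, and `_kbs` come from this page; `FakeBackend`, `RagModuleSketch`, and the shape of the KB records are stand-ins invented for illustration.

```python
class FakeBackend:
    """Stand-in vector backend; an empty path means in-memory."""

    def __init__(self, path=""):
        self.path = path
        self.closed = False

    def list_collections(self):
        # Pretend an on-disk backend already holds one collection.
        return ["docs"] if self.path else []

    def close(self):
        self.closed = True


class RagModuleSketch:
    def __init__(self):
        # Roughly what on_start leaves behind when booted with empty config.
        self._backend = FakeBackend()
        self._kbs = {}

    def on_config_update(self, cfg):
        new_path = cfg.get("backend", {}).get("path", "")
        if new_path == self._backend.path:
            return  # backend path unchanged, nothing to rebuild
        self._backend.close()                  # close the old backend
        self._backend = FakeBackend(new_path)  # re-create the new one
        self._discover_existing_collections()  # rebuild the KB registry

    def _discover_existing_collections(self):
        self._kbs = {
            name: {"collection": name}
            for name in self._backend.list_collections()
        }
```

The early return matters for a shared instance: apps that reuse the same backend path do not trigger a close and re-open cycle.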
Common config bug: forgetting the config: wrapper under tools.modules.rag causes compiled.modules["rag"].config to be {}, so on_config_update is never called and every query returns "knowledge base not found". Always nest under config:.
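A small pre-flight check can catch this mistake before the daemon starts. The sketch below works on an already-parsed app config dictionary; `rag_config_problems` is a hypothetical helper, and treating every key other than `config` as suspicious is an assumption about the module block's schema.

```python
def rag_config_problems(app_config):
    """Return human-readable problems found in the tools.modules.rag block."""
    modules = app_config.get("tools", {}).get("modules", {})
    if "rag" not in modules:
        return ["tools.modules.rag is missing"]
    rag = modules["rag"] or {}  # a bare `rag:` parses to None
    # Settings such as embedding_model belong under `config:`, not beside it.
    stray = sorted(key for key in rag if key != "config")
    if stray:
        return ["keys found next to 'config:' (possibly misplaced): " + ", ".join(stray)]
    return []


bad = {"tools": {"modules": {"rag": {"embedding_model": "bge-m3"}}}}
good = {"tools": {"modules": {"rag": {"config": {"embedding_model": "bge-m3"}}}}}
print(rag_config_problems(bad))
print(rag_config_problems(good))
```

The zero-config form `rag: {}` passes the check, since it has no stray keys.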
Cross-references
- Full RAG reference (every adapter, sync strategy, ingestion format, complete enterprise example): RAG Module
- App-config block reference (tools.modules.rag.config): App Configuration → tools.modules
- Lower-level vector ops module (no RAG pipeline): vector reference
- System-level workspace index (separate from rag): index reference