rag

Production-grade Retrieval-Augmented Generation. Knowledge bases with hybrid retrieval (BM25 + semantic + RRF), cross-encoder reranking, source citations, semantic cache, multi-source ingestion, Text2SQL, multi-query expansion, CRAG fallback.

| Property | Value |
| --- | --- |
| Module id | rag |
| Type | shared (one instance per daemon, per-app reconfig via on_config_update) |
| Action count | 14 |
| Pip deps | fastembed, qdrant-client (bundled) |
| Optional pip deps | chromadb, lancedb, pinecone, asyncpg + pgvector, elasticsearch |

Full reference (every config field, all 6 backends, all 7 embedding models + 5 reranker models, ingestion formats, sync strategies, complete enterprise example): RAG Module. This page is a quick module summary.

The 14 actions

risk_level is mostly medium or high for ingestion and migration actions, and low for queries and stats.

| Tool | Purpose |
| --- | --- |
| rag.create_knowledge_base | Create a named KB. |
| rag.delete_knowledge_base | Drop a KB + its vector + BM25 indexes. |
| rag.list_knowledge_bases | Enumerate KBs with metadata. |
| rag.knowledge_base_stats | Counts, model, last sync, hit rate. |
| rag.ingest | Add raw text documents. |
| rag.ingest_file | Add a single file. |
| rag.ingest_directory | Walk a directory + index matching files (content-hash dedup). |
| rag.ingest_database | Index DB tables (rows or schema-only). |
| rag.query | Retrieve from a KB (default strategy or per-call override). |
| rag.multi_query | LLM-expanded query with RRF fusion. |
| rag.sql_query | Text2SQL - generate + execute a SELECT. |
| rag.clear_cache | Wipe the semantic cache. |
| rag.migrate_embeddings | Switch a KB to a new embedding model (re-embeds in batches). |
| rag.list_models | List available embedding + reranker shortcuts. |
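
Both hybrid retrieval and rag.multi_query rely on Reciprocal Rank Fusion (RRF) to merge several ranked result lists into one. The sketch below shows the general technique only; it is not the module's code. The function name `rrf_fuse`, the constant `k=60` (a common choice for RRF) and the optional `weights` argument are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, weights=None):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for the documents it returned.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["doc_a", "doc_b", "doc_c"]      # hypothetical keyword ranking
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector ranking
fused = rrf_fuse([bm25_hits, semantic_hits])
```

Documents that appear near the top of both lists (here `doc_b` and `doc_a`) end up ahead of documents that only one retriever found. Whether the module combines per-retriever weights with RRF in exactly this way is not stated on this page.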

Zero-config quick start

tools:
  modules:
    rag: {}

Defaults:

| Setting | Default |
| --- | --- |
| Embedding | minilm-l12 (384 d, multilingual, 220 MB) |
| Backend | Qdrant in-memory |
| Strategy | Hybrid (BM25 + semantic + RRF) |
| Chunking | recursive, 500 chars, 50 overlap |
| Cache | enabled, in-memory, 1 h TTL |
| Citations | enabled, inline |
| Reranker | disabled |
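
To make the chunking defaults concrete, here is a minimal sketch of what "500 chars, 50 overlap" means. It uses a plain sliding window, which is simpler than the recursive strategy the module defaults to (a recursive splitter would also try to break on paragraph and sentence boundaries). The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + size] for start in range(0, last_start, step)]


sample = "".join(chr(97 + i % 26) for i in range(1200))  # 1200 chars of dummy text
chunks = chunk_text(sample)
```

With these defaults a 1200-character document yields three chunks starting at offsets 0, 450 and 900, so the last 50 characters of one chunk are repeated at the start of the next.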

6 vector backends

BackendConfig.type:

| Backend | Mode | Pip dep |
| --- | --- | --- |
| Qdrant (default) | embedded / remote | bundled |
| ChromaDB | embedded / remote | chromadb |
| LanceDB | embedded (file) | lancedb, pyarrow |
| Pinecone | cloud | pinecone |
| pgvector | PostgreSQL | asyncpg, pgvector |
| Elasticsearch | remote cluster | elasticsearch |

7 embedding models

BUILTIN_MODELS. All auto-downloaded by FastEmbed (ONNX, CPU):

| Shortcut | FastEmbed id | Dims |
| --- | --- | --- |
| minilm-l12 (default) | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 |
| bge-m3 | BAAI/bge-m3 | 1024 |
| bge-small | BAAI/bge-small-en-v1.5 | 384 |
| bge-large | BAAI/bge-large-en-v1.5 | 1024 |
| nomic-v1.5 | nomic-ai/nomic-embed-text-v1.5 | 768 |
| jina-v3 | jinaai/jina-embeddings-v3 | 1024 |
| snowflake-xs | snowflake/snowflake-arctic-embed-xs | 384 |

Custom models: any FastEmbed-supported HuggingFace id.
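
The shortcut table and the custom-model rule can be read as a simple lookup with a pass-through. The sketch below is an illustration under that assumption; the dictionary `EMBEDDING_SHORTCUTS` and the function `resolve_embedding_model` are hypothetical names, and the real BUILTIN_MODELS structure may differ.

```python
# Shortcut -> (FastEmbed id, dimensions), copied from the table above.
EMBEDDING_SHORTCUTS = {
    "minilm-l12": ("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", 384),
    "bge-m3": ("BAAI/bge-m3", 1024),
    "bge-small": ("BAAI/bge-small-en-v1.5", 384),
    "bge-large": ("BAAI/bge-large-en-v1.5", 1024),
    "nomic-v1.5": ("nomic-ai/nomic-embed-text-v1.5", 768),
    "jina-v3": ("jinaai/jina-embeddings-v3", 1024),
    "snowflake-xs": ("snowflake/snowflake-arctic-embed-xs", 384),
}


def resolve_embedding_model(name="minilm-l12"):
    """Return (model_id, dims); dims is None for ids that are not built-in shortcuts."""
    if name in EMBEDDING_SHORTCUTS:
        return EMBEDDING_SHORTCUTS[name]
    # Anything else is treated as a custom HuggingFace id and passed through unchanged.
    return name, None


model_id, dims = resolve_embedding_model("bge-small")
# The resolved id could then be handed to FastEmbed, for example:
#   from fastembed import TextEmbedding
#   embedder = TextEmbedding(model_name=model_id)
```

For a custom id the dimension is unknown until the model is loaded, which is why the sketch returns None in that case.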

5 reranker models

BUILTIN_RERANKERS. Default minilm-l6:

| Shortcut | HF id |
| --- | --- |
| minilm-l6 (default) | Xenova/ms-marco-MiniLM-L-6-v2 |
| minilm-l12 | Xenova/ms-marco-MiniLM-L-12-v2 |
| bge-reranker-base | BAAI/bge-reranker-base |
| jina-reranker-v1-tiny | jinaai/jina-reranker-v1-tiny-en |
| jina-reranker-v2 | jinaai/jina-reranker-v2-base-multilingual |

config:
  reranker: true # default minilm-l6
  # OR
  reranker: "bge-reranker-base"

Configuration shape

tools:
  modules:
    rag:
      config:
        embedding_model: minilm-l12
        reranker: false # true | "<shortcut>" | "<HF id>"
        backend:
          type: qdrant # qdrant | chroma | lancedb | pinecone | pgvector | elasticsearch
          path: "" # "" = in-memory
          url: ""
          quantization: none # none | int8 | binary (qdrant only)
        pipeline:
          retrieval: hybrid # hybrid | semantic | bm25
          bm25_weight: 0.3
          semantic_weight: 0.7
          rerank_top_n: 20
          final_top_k: 5
          multi_query:
            enabled: false
            provider: ""
            num_variants: 3
        chunking:
          strategy: recursive # fixed | sentence | paragraph | recursive
          size: 500
          overlap: 50
        sources:
          - type: file
            path: "{{workspace}}/docs"
            extensions: [.md, .txt, .pdf]
            watch: true
          - type: database
            connection_id: crm
            sync: { strategy: updated_at, interval: 30 }
            tables:
              users:
                columns: [id, name, email, bio]
                mode: embed_rows
                template: "{name} - {bio}"
              orders:
                mode: schema_only
        cache:
          enabled: true
          backend: memory # memory | redis
          similarity_threshold: 0.95
          ttl: 3600
        citations:
          enabled: true
          format: inline # inline | footnote | structured
          verify: false
        text2sql:
          enabled: false
          provider: ""
          example_cache: true
        crag:
          enabled: false
          confidence_threshold: 0.5
          fallback: broader_query # broader_query | none
        adaptive:
          enabled: false
          strategies: {}
        contextual_retrieval:
          enabled: false
          provider: ""
          concurrency: 5
        max_knowledge_bases: 50
        max_documents: 100000
        persistence_dir: ""
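
The cache settings (similarity_threshold: 0.95, ttl: 3600) describe a semantic cache: a stored answer is reused when a new query's embedding is close enough to a cached one and the entry has not expired. The sketch below illustrates that idea with cosine similarity. The class `SemanticCache`, its methods and the choice of cosine similarity are assumptions for illustration, not the module's implementation.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, similarity_threshold=0.95, ttl=3600):
        self.similarity_threshold = similarity_threshold
        self.ttl = ttl  # seconds
        self._entries = []  # list of (embedding, answer, stored_at)

    def put(self, embedding, answer, now=None):
        stored_at = time.time() if now is None else now
        self._entries.append((list(embedding), answer, stored_at))

    def get(self, embedding, now=None):
        now = time.time() if now is None else now
        # Drop expired entries before searching.
        self._entries = [e for e in self._entries if now - e[2] < self.ttl]
        best_answer, best_score = None, self.similarity_threshold
        for cached_embedding, answer, _ in self._entries:
            score = cosine(embedding, cached_embedding)
            if score >= best_score:
                best_answer, best_score = answer, score
        return best_answer


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer", now=0)
hit = cache.get([0.99, 0.05], now=10)   # nearly the same direction
miss = cache.get([0.0, 1.0], now=10)    # orthogonal query
```

A high threshold such as 0.95 keeps the cache conservative: only near-paraphrases of an earlier query return the cached answer.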

Shared module + per-app reconfig (gotcha)

rag has isolation = "shared" - one instance per daemon. on_start runs once at boot with empty config → default in-memory backend.

When an app activates, the bootstrap calls module.on_config_update(cfg) with the app's config. The overridden hook then:

  1. Compares old vs new backend path.
  2. Closes the old backend if changed.
  3. Re-creates + initialises the new backend.
  4. Calls _discover_existing_collections to rebuild _kbs from existing on-disk collections.
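
The four steps above can be sketched as follows. Everything here is a toy stand-in written for illustration: `DictBackend`, `SharedRagModule` and the `store` mapping are hypothetical, and the collection listing is inlined where the real module calls _discover_existing_collections.

```python
class DictBackend:
    """Stand-in backend: `store` maps a path to the collection names found there."""

    def __init__(self, path, store):
        self.path = path
        self._store = store
        self.closed = False

    def list_collections(self):
        return list(self._store.get(self.path, []))

    def close(self):
        self.closed = True


class SharedRagModule:
    def __init__(self, store):
        self._store = store
        # on_start: empty config, so the backend starts in-memory (path "").
        self._backend = DictBackend("", store)
        self._kbs = {}

    def on_config_update(self, cfg):
        new_path = cfg.get("backend", {}).get("path", "")
        if new_path == self._backend.path:
            return  # step 1: backend path unchanged, nothing to do
        self._backend.close()  # step 2: close the old backend
        self._backend = DictBackend(new_path, self._store)  # step 3: re-create
        # step 4: rebuild the KB registry from collections that already exist
        self._kbs = {name: {"name": name} for name in self._backend.list_collections()}


store = {"/data/rag": ["docs", "crm"]}  # hypothetical on-disk collections
module = SharedRagModule(store)
boot_backend = module._backend
module.on_config_update({"backend": {"path": "/data/rag"}})
```

After the update the boot-time in-memory backend is closed and the registry contains the KBs that were already persisted at the new path.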

Common config bug: forgetting the config: wrapper under tools.modules.rag causes compiled.modules["rag"].config to be {}, so on_config_update is never called and every query returns "knowledge base not found". Always nest under config:.
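
A small pre-flight check can catch this mistake before the daemon starts. The sketch below works on an app config that has already been parsed into a dict. The function name `rag_config_problems` is hypothetical, and it assumes that config is the only key expected directly under tools.modules.rag.

```python
def rag_config_problems(app_config):
    """Return a list of human-readable problems with the rag module declaration."""
    modules = (app_config.get("tools") or {}).get("modules") or {}
    if "rag" not in modules:
        return ["rag is not declared under tools.modules"]
    rag = modules["rag"] or {}  # `rag: {}` is the valid zero-config form
    # Assumption: every setting belongs under the `config:` wrapper.
    stray = sorted(key for key in rag if key != "config")
    if stray:
        return [f"move {stray} under tools.modules.rag.config"]
    return []


misnested = {"tools": {"modules": {"rag": {"embedding_model": "bge-m3"}}}}
correct = {"tools": {"modules": {"rag": {"config": {"embedding_model": "bge-m3"}}}}}
problems = rag_config_problems(misnested)
```

Running the check on `misnested` reports the stray embedding_model key, while `correct` and the zero-config form produce no problems.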

Cross-references