RAG — Retrieval-Augmented Generation
RAG grounds an agent's answers in your own documents instead of the LLM's parametric memory. In Chidori, RAG is built on two host functions: `chidori.memory()` for built-in vector storage, and `chidori.tool()` for custom retrieval over your own data source.
Strategy
- Ingest: chunk documents and store them with `chidori.memory("set", ...)` or in your own vector DB.
- Retrieve: at query time, call `chidori.memory("search", ...)` or a custom tool to fetch relevant chunks.
- Generate: pass the retrieved chunks into `chidori.prompt()` with clear instructions to cite them.
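The ingest agent below slices fixed-width windows, and its own comment suggests a smarter splitter for production. One common alternative packs whole paragraphs into chunks, so chunk boundaries usually fall between paragraphs rather than mid-sentence. This is a sketch that assumes paragraphs are separated by blank lines; `splitByParagraph` is a hypothetical helper, not part of Chidori.

```typescript
/**
 * Hypothetical paragraph-aware splitter (not part of Chidori).
 * Packs whole paragraphs into chunks of at most `maxChars` characters;
 * a paragraph longer than `maxChars` is hard-split into fixed-width pieces.
 */
function splitByParagraph(doc: string, maxChars = 800): string[] {
  const paragraphs = doc
    .split(/\n\s*\n/) // blank lines separate paragraphs
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  // Emit the chunk being built, if any, and start a fresh one.
  const flush = () => {
    if (current.length > 0) chunks.push(current);
    current = "";
  };

  for (const p of paragraphs) {
    if (p.length > maxChars) {
      // Oversized paragraph: emit what we have, then slice it directly.
      flush();
      for (let i = 0; i < p.length; i += maxChars) {
        chunks.push(p.slice(i, i + maxChars));
      }
      continue;
    }
    // Paragraphs inside one chunk are rejoined with a blank line.
    const candidate = current.length === 0 ? p : current + "\n\n" + p;
    if (candidate.length > maxChars) {
      flush();
      current = p;
    } else {
      current = candidate;
    }
  }
  flush();
  return chunks;
}
```

You could call it in place of the slicing loop (`const chunks = splitByParagraph(document);`). Note that this version has no overlap between chunks; add it back if your retrieval benefits from it.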
Option A — built-in chidori.memory()
For small-to-medium corpora, Chidori's built-in memory is enough. It supports key-value and semantic search in the same store.
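This page doesn't describe how the built-in store works internally, so treat the following purely as a mental model: a toy in-memory store where one map serves both exact key lookups and similarity search. It assumes the caller supplies an embedding vector for each entry; a real store would compute embeddings with a model. `TinyMemory` and `cosine` are illustrative names, not Chidori APIs.

```typescript
// Hypothetical toy store (not Chidori's implementation).
type Entry = { key: string; value: string; embedding: number[] };

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class TinyMemory {
  private entries = new Map<string, Entry>();

  set(key: string, value: string, embedding: number[]): void {
    this.entries.set(key, { key, value, embedding });
  }

  // Key-value access: exact match on the key.
  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }

  // Semantic access: rank every entry by cosine similarity to the query vector.
  search(queryEmbedding: number[], topK: number): { key: string; value: string; score: number }[] {
    return Array.from(this.entries.values())
      .map((e) => ({ key: e.key, value: e.value, score: cosine(queryEmbedding, e.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}
```

A brute-force scan like this is fine for a mental model; larger stores typically use an approximate nearest-neighbor index instead.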
Ingest
A one-shot agent that chunks and stores a document:
agents/ingest.ts
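```typescript
import type { Chidori } from "chidori";

export async function agent(input: { document: string; sourceId: string }, chidori: Chidori) {
  const { document, sourceId } = input;

  // Chunk naively: 800-character windows that advance by 700, so consecutive
  // chunks overlap by 100 characters. Swap in a smarter splitter for production.
  const CHUNK_SIZE = 800;
  const STRIDE = 700;
  const chunks: string[] = [];
  for (let i = 0; i < document.length; i += STRIDE) {
    chunks.push(document.slice(i, i + CHUNK_SIZE));
    // Stop once a window reaches the end; otherwise the next window would sit
    // entirely inside this one and store a redundant tail chunk.
    if (i + CHUNK_SIZE >= document.length) break;
  }

  // Store each chunk under "<sourceId>:<idx>", with metadata for citations.
  for (let idx = 0; idx < chunks.length; idx++) {
    await chidori.memory("set", sourceId + ":" + idx, chunks[idx], {
      metadata: { source: sourceId, idx },
    });
  }

  return { storedChunks: chunks.length, source: sourceId };
}
```
Retrieve + generate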
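One refinement to the retrieve step: `agents/rag.ts` joins chunk text without labels, so the model has nothing concrete to cite. A hedged sketch of tagging each chunk with `[n]` and its source, so the answer can reference tags that line up with the returned citations. The `Hit` type is an assumption that mirrors the `value`, `metadata.source`, and `metadata.idx` fields the agent reads; `buildContext` is hypothetical, and you would also need to tell the model in the template to cite by tag.

```typescript
// Assumed hit shape, mirroring the fields agents/rag.ts reads from each result.
type Hit = { value: string; metadata: { source: string; idx: number } };
type Citation = { tag: string; source: string; idx: number };

// Hypothetical helper: prefixes each chunk with "[n] (source#idx)" and returns
// a citation list whose positions match the tags.
function buildContext(hits: Hit[]): { context: string; citations: Citation[] } {
  const citations = hits.map((h, i) => ({
    tag: `[${i + 1}]`,
    source: h.metadata.source,
    idx: h.metadata.idx,
  }));
  const context = hits
    .map((h, i) => `${citations[i].tag} (${h.metadata.source}#${h.metadata.idx})\n${h.value}`)
    .join("\n\n---\n\n"); // same separator the agent uses
  return { context, citations };
}
```

The resulting `context` string can be passed to `chidori.template()` exactly as the agent does today.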
agents/rag.ts
```typescript
import type { Chidori } from "chidori";

export async function agent(input: { question: string; topK?: number }, chidori: Chidori) {
  const topK = input.topK ?? 5;

  // Semantic search over the chunks stored by agents/ingest.ts
  const results = await chidori.memory("search", input.question, { topK });

  // Join chunk text into one context block; keep metadata for citations
  const context = results.map((r) => r.value).join("\n\n---\n\n");
  const citations = results.map((r) => ({ source: r.metadata.source, idx: r.metadata.idx }));

  // Low temperature favors faithfulness to the retrieved context
  const answer = await chidori.prompt(
    await chidori.template("prompts/rag.jinja", { question: input.question, context }),
    { model: "claude-sonnet", temperature: 0.2, type: "final" },
  );

  return { answer, citations };
}
```
prompts/rag.jinja
```jinja
You are a research assistant. Answer the question **only** from the context below.
If the context doesn't contain the answer, say so instead of guessing.

## Context
{{ context }}

## Question
{{ question }}

## Answer
```
Option B — custom retrieval tool
For production corpora, wrap your own vector DB as a tool. The prompt template and generation step stay the same; only the retrieval call changes.
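Whatever vector DB sits behind the tool, `agents/rag_custom.ts` reads `id`, `text`, and `score` from each match, so it can help to validate the response before returning it. A sketch that assumes only those three fields and nothing about the DB itself; `normalizeMatches` is a hypothetical helper.

```typescript
// Shape agents/rag_custom.ts expects from each match.
type Match = { id: string; text: string; score: number };

// Hypothetical guard: drops malformed entries from an untyped vector-DB
// response so downstream code can rely on the Match shape.
function normalizeMatches(raw: unknown): Match[] {
  if (!Array.isArray(raw)) return [];
  return raw.filter(
    (m): m is Match =>
      typeof m === "object" &&
      m !== null &&
      typeof (m as Match).id === "string" &&
      typeof (m as Match).text === "string" &&
      typeof (m as Match).score === "number",
  );
}
```

Inside `run()`, returning `normalizeMatches(resp.body.matches)` instead of the raw value would turn a malformed response into an empty or shorter list rather than a crash in `matches.map(...)`.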
tools/retrieve.ts
```typescript
import type { Chidori, ToolDefinition } from "chidori";

export const tool: ToolDefinition = {
  name: "retrieve",
  description: "Retrieve the most relevant document chunks for the query.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string" },
      topK: { type: "number" },
      namespace: { type: "string" },
    },
    required: ["query"],
  },
};

export async function run(
  args: { query: string; topK?: number; namespace?: string },
  chidori: Chidori,
) {
  // Forward the query to your vector DB's search endpoint
  const resp = await chidori.http("http://vectors.internal:8080/search", {
    method: "POST",
    body: { query: args.query, top_k: args.topK ?? 5, namespace: args.namespace ?? "default" },
  });
  // Each match should carry id, text, and score (read by agents/rag_custom.ts)
  return resp.body.matches;
}
```
agents/rag_custom.ts
```typescript
import type { Chidori } from "chidori";

export async function agent(input: { question: string; namespace?: string }, chidori: Chidori) {
  const namespace = input.namespace ?? "default";

  // Call the custom tool directly instead of chidori.memory("search", ...)
  const matches = await chidori.tool("retrieve", { query: input.question, topK: 5, namespace });
  const context = matches.map((m) => m.text).join("\n\n---\n\n");

  // Same template and generation settings as Option A
  const answer = await chidori.prompt(
    await chidori.template("prompts/rag.jinja", { question: input.question, context }),
    { model: "claude-sonnet", temperature: 0.2, type: "final" },
  );

  return {
    answer,
    sources: matches.map((m) => ({ id: m.id, score: m.score })),
  };
}
```
Letting the LLM retrieve on demand
Instead of retrieving unconditionally, expose the retrieval tool to the LLM and let it decide when (and with what queries) to search:
```typescript
// Inside an agent: the question comes from the agent's input
const answer = await chidori.prompt(
  "Answer this question, searching our knowledge base as needed:\n" + input.question,
  { tools: ["retrieve"] },
);
```
This is especially useful for multi-hop questions, where the LLM needs to refine its query based on what it has already found.
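Passing `tools` to `chidori.prompt()` lets the model drive retrieval itself. The control flow that multi-hop retrieval implies can still be sketched in plain TypeScript, independent of Chidori's API. Here `nextQuery` stands in for an LLM call that reads what has been found and either proposes a follow-up query or returns `null` to stop; `multiHop`, `Chunk`, and both callbacks are hypothetical.

```typescript
type Chunk = { id: string; text: string };

// Hypothetical multi-hop loop: retrieve, ask `nextQuery` for a follow-up,
// and stop when it returns null or after `maxHops` rounds.
async function multiHop(
  question: string,
  retrieve: (query: string) => Promise<Chunk[]>,
  nextQuery: (question: string, found: Chunk[]) => Promise<string | null>,
  maxHops = 3,
): Promise<Chunk[]> {
  const found = new Map<string, Chunk>(); // dedupe chunks by id across hops
  let query: string | null = question;
  for (let hop = 0; hop < maxHops && query !== null; hop++) {
    for (const c of await retrieve(query)) {
      if (!found.has(c.id)) found.set(c.id, c);
    }
    query = await nextQuery(question, Array.from(found.values()));
  }
  return Array.from(found.values());
}
```

The `maxHops` cap bounds cost even if the query-proposing step never decides to stop.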
Tips
- Keep `temperature` low (0.0–0.3) when generating from retrieved context; you want faithfulness, not creativity.
- Always store `metadata` alongside each chunk so citations survive retrieval.
- Use `chidori.parallel()` for multi-query RAG: if you generate several search queries, fan them out concurrently.
- Replay matters for evals: check a checkpoint of each canonical question into git, then diff new traces to catch regressions when you swap models or retrievers.
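The multi-query tip can be sketched in plain TypeScript. `Promise.all` stands in for `chidori.parallel()`, whose signature this page doesn't show; `multiQuerySearch` and the `Scored` shape are hypothetical. Merging by best raw score assumes every query's scores are on the same scale (for example, cosine similarity from one index); if they aren't, a rank-based merge such as reciprocal rank fusion is safer.

```typescript
type Scored = { id: string; text: string; score: number };

// Hypothetical multi-query merge: run every query concurrently, keep each
// chunk once (at its best score), and return the overall top K.
async function multiQuerySearch(
  queries: string[],
  search: (q: string) => Promise<Scored[]>,
  topK: number,
): Promise<Scored[]> {
  // Promise.all stands in for chidori.parallel() here.
  const perQuery = await Promise.all(queries.map((q) => search(q)));
  const best = new Map<string, Scored>();
  for (const hits of perQuery) {
    for (const h of hits) {
      const prev = best.get(h.id);
      if (prev === undefined || h.score > prev.score) best.set(h.id, h);
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```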
