RAG — Retrieval-Augmented Generation

RAG grounds an agent's answers in your own documents instead of the LLM's parametric memory. In Chidori, RAG is built on two host functions: chidori.memory() for built-in vector storage, and chidori.tool() for custom retrieval over your own data source.

Strategy

  1. Ingest — chunk documents and store them with chidori.memory("set", ...) or in your own vector DB.
  2. Retrieve — at query time, call chidori.memory("search", ...) or a custom tool to fetch relevant chunks.
  3. Generate — pass the retrieved chunks into chidori.prompt() with clear instructions to cite them.
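To make the three steps concrete, here is a self-contained sketch that runs them end to end without Chidori. The in-memory `store`, the word-count `embed()` function, and `buildPrompt()` are hypothetical stand-ins for chidori.memory() and chidori.prompt(); a real deployment would use an embedding model and the host functions instead.

```typescript
// Hypothetical sketch of ingest -> retrieve -> generate. It does NOT call
// Chidori: the store, embed(), and buildPrompt() are illustrative stand-ins.

type Chunk = { id: string; text: string; source: string };

// Toy "embedding": lowercase word counts. Real systems use an embedding model.
function embed(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  a.forEach((v, w) => {
    dot += v * (b.get(w) ?? 0);
    na += v * v;
  });
  b.forEach((v) => {
    nb += v * v;
  });
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// 1. Ingest: keep each chunk with its vector and source metadata.
const store: { chunk: Chunk; vec: Map<string, number> }[] = [];
function ingest(chunk: Chunk): void {
  store.push({ chunk, vec: embed(chunk.text) });
}

// 2. Retrieve: rank stored chunks by similarity to the question.
function retrieve(question: string, topK: number): Chunk[] {
  const q = embed(question);
  return store
    .map((e) => ({ chunk: e.chunk, score: cosine(q, e.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((e) => e.chunk);
}

// 3. Generate: build the prompt the model would receive, labeling chunks by id.
function buildPrompt(question: string, chunks: Chunk[]): string {
  const context = chunks.map((c) => `[${c.id}] ${c.text}`).join("\n\n---\n\n");
  return `Answer only from the context below and cite chunk ids.\n\n## Context\n${context}\n\n## Question\n${question}`;
}

ingest({ id: "faq:0", text: "Refunds are issued within 14 days of purchase.", source: "faq" });
ingest({ id: "faq:1", text: "Shipping to Canada takes five business days.", source: "faq" });
const top = retrieve("How long do refunds take?", 1);
console.log(top[0].id); // faq:0
console.log(buildPrompt("How long do refunds take?", top));
```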

Option A — built-in chidori.memory()

For small-to-medium corpora, Chidori's built-in memory is enough. It supports both key-value lookup and semantic search in the same store.

Ingest

A one-shot agent that chunks and stores a document:

agents/ingest.ts

import type { Chidori } from "chidori";

export async function agent(input: { document: string; sourceId: string }, chidori: Chidori) {
  const { document, sourceId } = input;

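  // Chunk naively: 800-character windows every 700 characters, so consecutive
  // chunks overlap by 100 characters. Swap in a smarter splitter for production.
  const size = 800;
  const overlap = 100;
  const chunks: string[] = [];
  for (let i = 0; i < document.length; i += size - overlap) {
    chunks.push(document.slice(i, i + size));
    if (i + size >= document.length) break; // last window already reaches the end
  }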

  for (let idx = 0; idx < chunks.length; idx++) {
    await chidori.memory("set", sourceId + ":" + idx, chunks[idx], {
      metadata: { source: sourceId, idx },
    });
  }

  return { storedChunks: chunks.length, source: sourceId };
}
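The ingest agent's comment suggests a smarter splitter for production. One possible approach, sketched below under the assumption that paragraphs are separated by blank lines, packs whole paragraphs into chunks of up to `maxChars` and only falls back to overlapping fixed windows for paragraphs that are too long on their own. `splitDocument` is a hypothetical helper, not part of Chidori.

```typescript
// Hypothetical paragraph-aware splitter (one possible "smarter splitter").
// Packs whole paragraphs into chunks of at most maxChars; paragraphs longer
// than maxChars are cut into fixed windows that overlap by `overlap` chars.
function splitDocument(doc: string, maxChars = 800, overlap = 100): string[] {
  if (overlap >= maxChars) throw new Error("overlap must be smaller than maxChars");
  const paragraphs = doc
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    if (para.length > maxChars) {
      // Flush the pending chunk, then window the oversized paragraph.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars - overlap) {
        chunks.push(para.slice(i, i + maxChars));
        if (i + maxChars >= para.length) break;
      }
      continue;
    }
    const candidate = current ? current + "\n\n" + para : para;
    if (candidate.length <= maxChars) {
      current = candidate; // paragraph still fits in the current chunk
    } else {
      chunks.push(current); // start a new chunk with this paragraph
      current = para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Inside the ingest agent you could then write `const chunks = splitDocument(document);` in place of the windowing loop. Keeping paragraphs intact tends to give the retriever more coherent units than cutting mid-sentence.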

Retrieve + generate

agents/rag.ts

import type { Chidori } from "chidori";

export async function agent(input: { question: string; topK?: number }, chidori: Chidori) {
  const topK = input.topK ?? 5;
  const results = await chidori.memory("search", input.question, { topK });

  const context = results.map((r) => r.value).join("\n\n---\n\n");
  const citations = results.map((r) => ({ source: r.metadata.source, idx: r.metadata.idx }));

  const answer = await chidori.prompt(
    await chidori.template("prompts/rag.jinja", { question: input.question, context }),
    { model: "claude-sonnet", temperature: 0.2, type: "final" },
  );
  return { answer, citations };
}
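Step 3 of the strategy asks the model to cite what it used. A minimal sketch of one way to support that: label each retrieved chunk with a number and its source, and cap the total context size. `formatContext` and the 6000-character default budget are hypothetical; the result shape is assumed to match the `r.value` and `r.metadata` fields used in agents/rag.ts.

```typescript
// Hypothetical helper: number each retrieved chunk so the model can cite
// "[1]", "[2]", ... and map those numbers back to source metadata.
// Assumes results shaped like { value, metadata: { source, idx } }.
type MemoryResult = { value: string; metadata: { source: string; idx: number } };

function formatContext(results: MemoryResult[], maxChars = 6000) {
  const parts: string[] = [];
  const citations: { ref: number; source: string; idx: number }[] = [];
  let used = 0;
  for (const r of results) {
    const ref = citations.length + 1;
    const block = `[${ref}] (source: ${r.metadata.source}#${r.metadata.idx})\n${r.value}`;
    // Stop before exceeding the budget (separators are not counted).
    if (used + block.length > maxChars) break;
    parts.push(block);
    citations.push({ ref, source: r.metadata.source, idx: r.metadata.idx });
    used += block.length;
  }
  return { context: parts.join("\n\n---\n\n"), citations };
}
```

In agents/rag.ts you could then write `const { context, citations } = formatContext(results);` and tell the model in prompts/rag.jinja to cite chunks by their bracketed number.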

prompts/rag.jinja

You are a research assistant. Answer the question **only** from the context below.
If the context doesn't contain the answer, say so instead of guessing.

## Context
{{ context }}

## Question
{{ question }}

## Answer

Option B — custom retrieval tool

For production corpora, wrap your own vector DB as a tool. The prompt template and generation step stay the same; only where chunks are stored and how they are retrieved changes.

tools/retrieve.ts

import type { Chidori, ToolDefinition } from "chidori";

export const tool: ToolDefinition = {
  name: "retrieve",
  description: "Retrieve the most relevant document chunks for the query.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string" },
      topK: { type: "number" },
      namespace: { type: "string" },
    },
    required: ["query"],
  },
};

export async function run(
  args: { query: string; topK?: number; namespace?: string },
  chidori: Chidori,
) {
  const resp = await chidori.http("http://vectors.internal:8080/search", {
    method: "POST",
    body: { query: args.query, top_k: args.topK ?? 5, namespace: args.namespace ?? "default" },
  });
  return resp.body.matches;
}
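The tool above returns `resp.body.matches` unchecked, so if the vector service changes its response shape, the agent could silently put `undefined` text into the prompt. A hedged sketch of a validator, assuming the service responds with `{ matches: [{ id, score, text }] }` (the fields agents/rag_custom.ts reads); `parseMatches` is a hypothetical helper.

```typescript
// Hypothetical response shape: { matches: [{ id, score, text }] }.
type Match = { id: string; score: number; text: string };

// Validate the vector service's response body before handing it to the agent.
function parseMatches(body: unknown): Match[] {
  if (
    typeof body !== "object" ||
    body === null ||
    !Array.isArray((body as { matches?: unknown }).matches)
  ) {
    throw new Error("vector service response is missing a `matches` array");
  }
  return (body as { matches: unknown[] }).matches.map((m, i) => {
    const { id, score, text } = (m ?? {}) as Record<string, unknown>;
    if (typeof id !== "string" || typeof score !== "number" || typeof text !== "string") {
      throw new Error(`match ${i} is malformed`);
    }
    return { id, score, text };
  });
}
```

In `run`, `return parseMatches(resp.body);` would replace the unchecked return, so a malformed response fails at the tool boundary with a clear message.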

agents/rag_custom.ts

import type { Chidori } from "chidori";

export async function agent(input: { question: string; namespace?: string }, chidori: Chidori) {
  const namespace = input.namespace ?? "default";
  const matches = await chidori.tool("retrieve", { query: input.question, topK: 5, namespace });

  const context = matches.map((m) => m.text).join("\n\n---\n\n");
  const answer = await chidori.prompt(
    await chidori.template("prompts/rag.jinja", { question: input.question, context }),
    { model: "claude-sonnet", temperature: 0.2, type: "final" },
  );
  return {
    answer,
    sources: matches.map((m) => ({ id: m.id, score: m.score })),
  };
}
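For intuition about what a `/search` endpoint like the one behind the `retrieve` tool does, here is an exact top-K cosine-similarity search over an in-memory list, filtered by namespace. The `Indexed` record shape and `searchIndex` are hypothetical, and the query vector is assumed to come from the same embedding model as the stored vectors. Production vector databases typically replace this linear scan with an approximate nearest-neighbor index such as HNSW.

```typescript
// Hypothetical record shape for an in-memory index.
type Indexed = { id: string; text: string; namespace: string; vector: number[] };

// Cosine similarity between two equal-length dense vectors.
function cosineSim(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector dimensions differ");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Exact top-K search: score every vector in the namespace, keep the best K.
function searchIndex(index: Indexed[], queryVec: number[], topK = 5, namespace = "default") {
  return index
    .filter((d) => d.namespace === namespace)
    .map((d) => ({ id: d.id, text: d.text, score: cosineSim(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const index: Indexed[] = [
  { id: "a", text: "alpha", namespace: "default", vector: [1, 0] },
  { id: "b", text: "beta", namespace: "default", vector: [0, 1] },
  { id: "c", text: "gamma", namespace: "default", vector: [0.9, 0.1] },
  { id: "d", text: "delta", namespace: "other", vector: [1, 0] },
];
console.log(searchIndex(index, [1, 0], 2).map((m) => m.id)); // [ 'a', 'c' ]
```

The returned objects carry `id`, `text`, and `score`, the same fields agents/rag_custom.ts reads from each match.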

Letting the LLM retrieve on demand

Instead of retrieving unconditionally, expose the retrieval tool to the LLM and let it decide when (and with what queries) to search:

// Inside an agent body, where `question` holds the user's question:
const answer = await chidori.prompt(
  "Answer this question, searching our knowledge base as needed:\n" + question,
  { tools: ["retrieve"] },
);

This is especially useful for multi-hop questions where the LLM needs to refine its query based on what it found.
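Conceptually, on-demand retrieval is a loop: the model either requests a search or answers, and each search result is fed back so the next query can build on it. The sketch below illustrates that loop with a scripted stand-in for the model; it is not how Chidori implements `tools`, and `Model`, `Step`, and `answerWithRetrieval` are hypothetical names. The `maxSteps` cap keeps a model that never stops searching from looping forever.

```typescript
// Conceptual sketch only; not Chidori's implementation.
type Step =
  | { kind: "search"; query: string } // the model asks for more context
  | { kind: "answer"; text: string }; // the model is done

type Model = (transcript: string[]) => Step; // hypothetical LLM stand-in
type Retriever = (query: string) => string[];

function answerWithRetrieval(question: string, model: Model, retrieve: Retriever, maxSteps = 4): string {
  const transcript = [`Question: ${question}`];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(transcript);
    if (next.kind === "answer") return next.text;
    // Run the requested search and feed the results back, so the next
    // query can be refined from what was found (multi-hop).
    const hits = retrieve(next.query);
    transcript.push(`Search "${next.query}" returned:\n${hits.join("\n")}`);
  }
  return "Stopped: step budget exhausted before the model produced an answer.";
}

// Usage with a scripted "model" that needs two hops.
const docs: Record<string, string[]> = {
  "author of Dune": ["Dune was written by Frank Herbert."],
  "Frank Herbert birthplace": ["Frank Herbert was born in Tacoma, Washington."],
};
const scripted: Model = (t) => {
  if (t.length === 1) return { kind: "search", query: "author of Dune" };
  if (t.length === 2) return { kind: "search", query: "Frank Herbert birthplace" };
  return { kind: "answer", text: "Tacoma, Washington" };
};
const result = answerWithRetrieval("Where was the author of Dune born?", scripted, (q) => docs[q] ?? []);
console.log(result); // Tacoma, Washington
```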

Tips

  • Keep temperature low (0.0–0.3) when generating from retrieved context; you want faithfulness, not creativity.
  • Always store metadata alongside each chunk so citations survive retrieval.
  • Use chidori.parallel() for multi-query RAG: if you generate several search queries, fan them out concurrently.
  • Replay matters for evals: commit a checkpoint of each canonical question's run to git, then diff new traces against it to catch regressions when you swap models or retrievers.
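The multi-query tip can be sketched as follows: run one search per generated query concurrently, then merge the ranked lists. `Promise.all` stands in for chidori.parallel() here because this sketch does not assume that function's signature, `search` is a hypothetical retriever, and the merge uses reciprocal rank fusion (RRF) with the commonly used constant k = 60, a technique this page does not prescribe.

```typescript
// Sketch: concurrent multi-query search merged with reciprocal rank fusion.
type Hit = { id: string; text: string };

// RRF: each list contributes 1 / (k + rank) per hit (rank is 1-based);
// hits found by several queries accumulate a higher fused score.
function reciprocalRankFusion(lists: Hit[][], k = 60): (Hit & { score: number })[] {
  const fused = new Map<string, Hit & { score: number }>();
  for (const list of lists) {
    list.forEach((hit, rank) => {
      const entry = fused.get(hit.id) ?? { ...hit, score: 0 };
      entry.score += 1 / (k + rank + 1); // forEach rank is 0-based
      fused.set(hit.id, entry);
    });
  }
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}

// Fan out one search per query (Promise.all as a stand-in), then fuse.
async function multiQuerySearch(
  queries: string[],
  search: (q: string) => Promise<Hit[]>,
): Promise<Hit[]> {
  const lists = await Promise.all(queries.map((q) => search(q)));
  return reciprocalRankFusion(lists);
}
```

Because RRF only uses ranks, it merges results from different queries (or different retrievers) without needing their similarity scores to be comparable.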
