RAG — Retrieval-Augmented Generation

RAG grounds an agent's answers in your own documents instead of the LLM's parametric memory. In Chidori, the retrieval side comes down to two host functions: memory() for the built-in vector store, and tool() for custom retrieval over your own data source.

Strategy

  1. Ingest — chunk documents and store them with memory("store", ...) or in your own vector DB.
  2. Retrieve — at query time, call memory("search", ...) or a custom tool to fetch relevant chunks.
  3. Generate — pass the retrieved chunks into prompt() with clear instructions to cite them.
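
A minimal sketch of steps 2 and 3 in one agent, using the same memory() and prompt() calls as the full agents below (which add the model config, a prompt template, and citations; step 1 runs as its own ingest agent):

def agent(question):
    # 2. Retrieve: semantic search over previously ingested chunks
    results = memory("search", query = question, top_k = 5)
    context = "\n\n---\n\n".join([r["value"] for r in results])
    # 3. Generate: answer strictly from the retrieved context
    return prompt("Answer only from this context:\n\n" + context + "\n\nQuestion: " + question)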

Option A — built-in memory()

For small-to-medium corpora, Chidori's built-in memory is enough. It supports key-value and semantic search in the same store.

Ingest

A one-shot agent that chunks and stores a document:

agents/ingest.star

def agent(document, source_id):
    # Naive fixed-size chunks (800 chars with 100-char overlap); swap in a smarter splitter for production
    chunks = [document[i:i+800] for i in range(0, len(document), 700)]

    for idx, chunk in enumerate(chunks):
        memory("store",
            key   = source_id + ":" + str(idx),
            value = chunk,
            metadata = {"source": source_id, "idx": idx},
        )

    return {"stored_chunks": len(chunks), "source": source_id}

Retrieve + generate

agents/rag.star

config(model = "claude-sonnet")

def agent(question, top_k = 5):
    results = memory("search", query = question, top_k = top_k)

    context = "\n\n---\n\n".join([r["value"] for r in results])
    citations = [{"source": r["metadata"]["source"], "idx": r["metadata"]["idx"]} for r in results]

    answer = prompt(
        template("prompts/rag.jinja", question = question, context = context),
        temperature = 0.2,
    )
    return {"answer": answer, "citations": citations}

prompts/rag.jinja

You are a research assistant. Answer the question **only** from the context below.
If the context doesn't contain the answer, say so instead of guessing.

## Context
{{ context }}

## Question
{{ question }}

## Answer
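
If you also want the model to cite sources inline in the answer, not just in the separate citations list, one option is to label each chunk before joining. A sketch of a small helper for agents/rag.star:

def label_chunks(results):
    """Prefix each retrieved chunk with a bracketed source id, e.g. "[handbook:3]",
    so the model can reference individual chunks in its answer."""
    return [
        "[" + r["metadata"]["source"] + ":" + str(r["metadata"]["idx"]) + "]\n" + r["value"]
        for r in results
    ]

In the agent, build the context with "\n\n---\n\n".join(label_chunks(results)), and add a line such as "Cite the bracketed chunk ids you used." to prompts/rag.jinja.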

Option B — custom retrieval tool

For production corpora, wrap your own vector DB as a tool. Only the retrieval call changes; the prompt template and generation step stay the same.

tools/retrieve.star

def retrieve(query, top_k = 5, namespace = "default"):
    """Retrieve the top_k most relevant document chunks for the query."""
    resp = http("POST", "http://vectors.internal:8080/search",
        json = {"query": query, "top_k": top_k, "namespace": namespace},
    )
    return resp["matches"]

agents/rag_custom.star

config(model = "claude-sonnet")

def agent(question, namespace = "default"):
    matches = tool("retrieve", query = question, top_k = 5, namespace = namespace)

    context = "\n\n---\n\n".join([m["text"] for m in matches])
    answer = prompt(
        template("prompts/rag.jinja", question = question, context = context),
        temperature = 0.2,
    )
    return {
        "answer": answer,
        "sources": [{"id": m["id"], "score": m["score"]} for m in matches],
    }

Letting the LLM retrieve on demand

Instead of retrieving unconditionally, expose the retrieval tool to the LLM and let it decide when (and with what queries) to search:

answer = prompt(
    "Answer this question, searching our knowledge base as needed:\n" + question,
    tools     = ["retrieve"],
    max_turns = 4,
)

This is especially useful for multi-hop questions where the LLM needs to refine its query based on what it found.

Tips

  • Keep temperature low — 0.0–0.3 — when generating from retrieved context; you want faithfulness, not creativity.
  • Always store metadata alongside each chunk so citations survive retrieval.
  • Use parallel() for multi-query RAG — if you generate several search queries, fan them out concurrently (a sketch of the pattern follows this list).
  • Replay matters for evals — check a checkpoint for each canonical question into git, then diff new traces against it to catch regressions when you swap models or retrievers.
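
The multi-query tip, sketched as a hypothetical agents/rag_multiquery.star. It is written as a plain loop and assumes prompt() returns the completion as a plain string; the per-query searches are independent, so they are the part you would hand to parallel():

config(model = "claude-sonnet")

def agent(question):
    # Rephrase the question a few ways to widen recall (one variant per line)
    variants = prompt(
        "Rewrite this question 3 different ways, one per line, no numbering:\n" + question,
        temperature = 0.7,
    ).splitlines()

    # Each search is independent; this loop is what parallel() would fan out concurrently
    seen = {}
    for q in [question] + variants:
        for r in memory("search", query = q, top_k = 3):
            seen[r["metadata"]["source"] + ":" + str(r["metadata"]["idx"])] = r["value"]

    context = "\n\n---\n\n".join(seen.values())
    answer = prompt(
        template("prompts/rag.jinja", question = question, context = context),
        temperature = 0.2,
    )
    return {"answer": answer, "retrieved": len(seen)}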
