# Tracing with Tael
Chidori emits a full OpenTelemetry trace for every agent run — one parent span per run, one child span per host function call, with model names, token counts, durations, and error status attached as attributes. Point it at Tael and every run becomes queryable from a CLI that returns structured JSON — built for agents like Claude Code to inspect their own telemetry.
This works against any OTLP/gRPC backend — Jaeger, Tempo, Honeycomb, Datadog, a vendor collector, etc. Tael is the one we reach for because its CLI-first design fits agent development.
## Why Tael
- CLI-first, JSON-native. Every command prints structured JSON by default; tables are opt-in. An agent can run `tael query traces --status error --format json` and parse the result directly (see the example after this list).
- Single binary. `cargo install tael-server`. No Docker, no cluster.
- Zero SDK lock-in. Ingests standard OTLP/gRPC on port 4317, so if you ever want to swap backends, you change one env var.
- Trace comments. Agents can annotate failing traces with `tael comment add`, useful for audit trails and collaborative on-call work.
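
For example, counting recent failures needs nothing beyond `jq`. This assumes `--format json` returns a JSON array of trace objects, as the correlation example later on this page suggests:

```sh
# Number of errored traces in the last hour, parsed straight from the JSON output.
tael query traces --status error --last 1h --format json | jq 'length'
```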
## Install Tael

```sh
cargo install tael-server
cargo install tael-cli
```

Start the server (OTLP on :4317, REST API on :7701):

```sh
tael-server
```

## Wire Chidori up
Two environment variables and every run is instrumented. No code changes.
```sh
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317
export OTEL_SERVICE_NAME=my-agent  # optional; defaults to "app-agent"
```

Run an agent:

```sh
chidori run agents/researcher.star --input question="What is Rust?"
```

Spans arrive in Tael within the batch-export window (a few seconds by default).
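
To confirm they landed, list recent traces (the same flags the querying section below uses):

```sh
tael query traces --last 5m --format table
```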
## What you'll see
For every run Chidori emits:
- One parent span named `agent.run <agent_name>` with attributes:
  - `agent.name`
  - `agent.run_id`
- One child span per host function call, named `host.<function>` (e.g. `host.prompt`, `host.tool`, `host.http`, `host.memory`). Each child span carries:
  - `call.seq`: the sequence number shared with Chidori's checkpoint log
  - `call.function`
  - `call.duration_ms`
  - `gen_ai.request.model`: for `prompt` calls
  - `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens`: for `prompt` calls
  - OTEL `Status::Error(message)` when the host function raised
Span kinds follow the OTEL semantic conventions: `prompt`, `http`, `tool`, `agent`, `exec`, and `memory` are `CLIENT`; everything else is `INTERNAL`.
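
For instance, the token-usage attributes on prompt spans can be pulled out of a single trace with `jq`. A sketch, assuming the `tael get trace` JSON shape used in the correlation example below (spans under `.spans[]`, attributes keyed by name):

```sh
tael get trace <trace-id> | jq '.spans[]
  | select(.name == "host.prompt")
  | {model: .attributes["gen_ai.request.model"],
     input_tokens: .attributes["gen_ai.usage.input_tokens"],
     output_tokens: .attributes["gen_ai.usage.output_tokens"],
     duration_ms: .attributes["call.duration_ms"]}'
```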
## Querying
```sh
# All recent traces
tael query traces --last 1h --format table

# Everything that errored in the last 10 minutes
tael query traces --status error --last 10m

# Slow prompt calls
tael query traces --operation host.prompt --min-duration 500ms

# Pull the full span hierarchy for one run
tael get trace <trace-id>

# Aggregate health over a window: service-level error rate, top ops, log/metric volume
tael summarize --last 15m
```

## Correlating with the Chidori call log
The `call.seq` attribute on each OTEL span is the same sequence number Chidori writes into the checkpoint JSON for that call, so a production trace in Tael can be linked to a local checkpoint file by matching `agent.run_id` and `call.seq`:
```sh
# In Tael: find the failing run
tael query traces --status error --last 1h --format json | jq '.[0].trace_id'

# Get the full span hierarchy and attributes
tael get trace <trace-id> | jq '.spans[] | {seq: .attributes["call.seq"], fn: .name, err: .status.message}'

# Locally: replay the exact same run from its checkpoint (zero LLM cost)
chidori resume agents/researcher.star <run_id>
```

The checkpoint has the deterministic inputs and the full call log; Tael has the wall-clock performance and cross-service context. Use both.
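
The glue step, going from a trace ID to the `run_id` that `chidori resume` needs, can be scripted as well. A sketch, assuming the parent span's name starts with `agent.run` and the JSON shape shown above:

```sh
# Trace ID in, local replay out.
trace_id=$(tael query traces --status error --last 1h --format json | jq -r '.[0].trace_id')
run_id=$(tael get trace "$trace_id" \
  | jq -r '.spans[] | select(.name | startswith("agent.run")) | .attributes["agent.run_id"]')
chidori resume agents/researcher.star "$run_id"
```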
## Trace comments from an agent
Tael lets agents annotate their own traces. This is the natural pairing with Chidori's human-in-the-loop and `try_call` patterns: when an agent recovers from an error, it can record why for future audits:
```python
def agent(event):
    result = try_call(lambda: tool("flaky_api"))
    if result.error:
        run_id = env("CHIDORI_RUN_ID")  # surfaced automatically in serve mode
        http("POST", "http://localhost:7701/comments", json = {
            "trace_id": run_id,
            "author": "my-agent",
            "body": "Retried after flaky_api timeout; succeeded on fallback.",
        })
        result = prompt("Fallback answer:\n" + event["body"]["question"])
    return {"answer": result}
```
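
The same annotation can be posted from a terminal against the REST API on :7701, mirroring the request the agent makes above:

```sh
curl -X POST http://localhost:7701/comments \
  -H 'Content-Type: application/json' \
  -d '{"trace_id": "<run-id>", "author": "oncall", "body": "Known flaky upstream; retried OK."}'
```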
## Alternatives

Because Chidori emits standard OTLP, you can point `OTEL_EXPORTER_OTLP_ENDPOINT` at anything OTLP-compatible:
- Jaeger: `http://localhost:4317` after starting the all-in-one image.
- Grafana Tempo: same endpoint format; visualize in Grafana.
- OpenTelemetry Collector: fan traces out to multiple backends simultaneously.
- Honeycomb / Datadog / New Relic: use their OTLP ingest URL with the appropriate auth header via `OTEL_EXPORTER_OTLP_HEADERS` (example after this list).
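
For example, pointing the exporter at Honeycomb takes two variables (Honeycomb's public OTLP/gRPC endpoint and its team-key header; substitute your own API key):

```sh
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io:443
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=<your-api-key>"
```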
## Troubleshooting
- No spans appearing. Check that `tael-server` is running and listening on `:4317`. Set `APP_AGENT_OTEL_DEBUG=1` when running Chidori to print OTLP flush/shutdown errors (quick checks below).
- Spans lagging. The default batch exporter buffers for up to a few seconds. For one-shot CLI runs, Chidori flushes before exiting, so you shouldn't need to wait.
- Short runs missing trailing spans. If the process exits before the final batch is flushed, spans can be dropped. Chidori calls `force_flush` + `shutdown` automatically at exit; if you're still losing spans, confirm no panic is happening before `main()` returns.
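
A quick first pass at those checks from a shell (assumes `nc` is available):

```sh
# Is anything listening on the OTLP port?
nc -z 127.0.0.1 4317 && echo "OTLP port open" || echo "nothing on :4317"

# Re-run with exporter debug output to surface flush/shutdown errors.
APP_AGENT_OTEL_DEBUG=1 chidori run agents/researcher.star --input question="ping"
```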