Writing on deterministic agents, agent-native observability, and the patterns we're discovering as we build Thousand Birds. Releases, deep dives, and case studies — pulled from the same call logs we use to debug production.
A ground-up rewrite: a single Rust binary, agents authored in plain TypeScript, and replay as the foundation for tests, debugging, resume, and human-in-the-loop.
The hard part of shipping agents isn't getting them to work once. It's getting the exact same run back when something goes wrong.
A research agent passed a thousand evals and then started failing once a week in production. Here is how we found the bug without spending another token.