testing

Here Comes the Judge
agents

Here Comes the Judge

Your agent says it completed the goal. How do you know? You need judges — composable, tiered, and often free. Here's how to build a jury system for your agents.

When You Come to a Fork in the Code, Take It
agents

When You Come to a Fork in the Code, Take It

The agent read the skill. Then it read the existing code. Then it wrote with the older pattern. Six runs, both variants, every time. Your codebase is the agent's primary teacher — no skill can override it.

I Read My Agent's Diary
agents

I Read My Agent's Diary

Every agent run leaves a trail. Record it, judge it, find the hotspots, fix them, run again. That's the loop that turns unpredictable agents into reliable ones.