agents
Here Comes the Judge
Your agent says it completed the goal. How do you know? You need judges — composable, tiered, and often free. Here's how to build a jury system for your agents.
testing
The agent read the skill. Then it read the existing code. Then it wrote new code in the older pattern. Six runs, both variants, every time. Your codebase is the agent's primary teacher, and no skill can override it.
Every agent run leaves a trail. Record it, judge it, find the hotspots, fix them, run again. That's the loop that turns unpredictable agents into reliable ones.