Here Comes the Judge
Your agent says it completed the goal. How do you know? You need judges — composable, tiered, and often free. Here's how to build a jury system for your agents.
agents
Your agent says it completed the goal. How do you know? You need judges — composable, tiered, and often free. Here's how to build a jury system for your agents.
Most agent code is a prompt and a loop. agent-workflow gives it explicit flow — a pure-Java DSL that compiles to a graph you can inspect, gate, and replay.
One dependency, a few properties, an annotated bean. Spring Boot handles the rest - transport, lifecycle, graceful shutdown.
The agent read the skill. Then it read the existing code. Then it wrote with the older pattern. Six runs, both variants, every time. Your codebase is the agent's primary teacher — no skill can override it.
I built this over three months. Then Karpathy posted about the same pattern. Here's the systematized, repeatable version - born from downloading papers, organizing them, and realizing the index file was the whole trick.
ACP standardizes how code editors talk to AI agents — LSP for the agent era. The Java SDK gives you annotations, a sync API, and in-memory testing. Here's what the programming model looks like.
Every agent run leaves a trail. Record it, judge it, find the hotspots, fix them, run again. That's the loop that turns unpredictable agents into reliable ones.