
How Does Your @Agent Flow?

Most agent code is a prompt and a loop. agent-workflow gives it explicit flow — a pure-Java DSL that compiles to a graph you can inspect, gate, and replay.

Mark Pollack

15 Apr 2026 — 7 min read

When someone asks how your agent works, what do you point to? A prompt string? A while(true) loop? A six-hundred-line method that calls the LLM, parses the response, maybe retries, maybe doesn't?

Most agent code doesn't have a flow you can point to. It has behavior that emerges from a prompt and a prayer. That's fine for a demo. It stops being fine when you need to gate quality mid-pipeline, fan out to three reviewers in parallel, or restart a crashed workflow from where it left off.

agent-workflow is a Java library that gives agents explicit, composable, durable flow. A fluent DSL that compiles to a graph, runs on pluggable runtimes, and treats durability as a deployment concern — not a code concern.

TL;DR: agent-workflow is a pure-Java DSL for composing multi-step agent pipelines. Java 21, no framework required. Steps are semantic units — an entire agentic loop, not a single LLM call. The DSL compiles to a graph IR: pure data, inspectable, replayable. Durability graduates from in-process to JDBC checkpointing to Temporal with one bean swap. Quality gates, parallel fan-out, LLM-driven routing, and error recovery are first-class primitives. Available at lab.pollack.ai.

The Step

Every workflow starts with steps. A Step<I, O> is the atomic unit of work — a function from input to output with access to shared context:

@FunctionalInterface
public interface Step<I, O> {
    O execute(AgentContext ctx, I input);
}

Three ways to define one:

// Lambda — quick and local
Step.named("fetch-diff", (ctx, prNumber) -> github.fetchDiff(prNumber))

// Class — production, injectable
public class AnalyzeDiffStep implements Step<String, String> {
    @Override
    public String execute(AgentContext ctx, String diff) {
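        // deterministic analysis, pattern matching, whatever you need;
        // placeholder body so the method compiles: count the added lines
        return "added lines: " + diff.lines().filter(line -> line.startsWith("+")).count();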
    }
}

// Agentic — a full multi-turn Claude session behind one step
ClaudeStep.of("Analyze this diff and identify concerns: {input}")

The granularity matters. A ClaudeStep might internally run 50 LLM turns and 200 tool calls — the workflow sees it as one step. This is the opposite of frameworks that model each API call as a separate activity. "Analyze the diff" is one step. "Run the build" is one step. What happens inside is the step's business.

The Workflow DSL

Steps alone are a flat list. The DSL composes them into topology.

Sequential — output flows forward:

Workflow.<String, String>define("write-edit-publish")
    .step(writer)
    .then(editor)
    .then(publisher)
    .run("dragons and wizards");
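
To make "output flows forward" concrete, here is a minimal sketch of sequential composition. It does not reproduce the library's implementation: Ctx, SketchStep, and the then() combinator are hypothetical stand-ins that only mirror the shape of Step<I, O>.

```java
import java.util.Locale;

final class SequentialSketch {

    // Stand-in for AgentContext; it exists only so the signatures line up.
    record Ctx(String runId) {}

    // Same shape as Step<I, O>, plus a hypothetical then() combinator.
    @FunctionalInterface
    interface SketchStep<I, O> {
        O execute(Ctx ctx, I input);

        // Feeds this step's output into the next step's input.
        // The compiler rejects a chain whose types do not line up.
        default <R> SketchStep<I, R> then(SketchStep<? super O, ? extends R> next) {
            return (ctx, input) -> next.execute(ctx, execute(ctx, input));
        }
    }

    // Three toy steps chained into one composite step.
    static SketchStep<String, String> writeEditPublish() {
        SketchStep<String, String> writer = (ctx, topic) -> "draft about " + topic;
        SketchStep<String, String> editor = (ctx, draft) -> draft.toUpperCase(Locale.ROOT);
        SketchStep<String, String> publisher = (ctx, text) -> "published: " + text;
        return writer.then(editor).then(publisher);
    }

    public static void main(String[] args) {
        String result = writeEditPublish().execute(new Ctx("run-1"), "dragons and wizards");
        System.out.println(result);
    }
}
```

Because the composite is itself a SketchStep, it can be dropped into another chain, which is the same idea as nesting one workflow inside another.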

Parallel — concurrent fan-out:

Workflow.<String, Object>define("gather-reviews")
    .parallel(codeReview, securityAudit, perfCheck)
    .then(combineResults)
    .run(submission);
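
A fan-out like this can be pictured as "submit every branch, then wait for all of them." The sketch below uses a plain thread pool from java.util.concurrent; how the library actually schedules branches is not stated here, so fanOut and gatherReviews are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelFanOutSketch {

    // Runs every branch on the same input concurrently and returns the
    // results in the order the branches were declared.
    // Assumes at least one branch (a fixed pool needs a positive size).
    static <I, O> List<O> fanOut(I input, List<Function<I, O>> branches) {
        ExecutorService pool = Executors.newFixedThreadPool(branches.size());
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (Function<I, O> branch : branches) {
                Callable<O> task = () -> branch.apply(input);
                futures.add(pool.submit(task));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : futures) {
                results.add(future.get()); // blocks until that branch finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a branch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A branch failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    // Toy reviewers standing in for codeReview, securityAudit, and perfCheck.
    static List<String> gatherReviews(String submission) {
        Function<String, String> codeReview = s -> "code:" + s;
        Function<String, String> securityAudit = s -> "security:" + s;
        Function<String, String> perfCheck = s -> "perf:" + s;
        return fanOut(submission, List.of(codeReview, securityAudit, perfCheck));
    }

    public static void main(String[] args) {
        System.out.println(gatherReviews("pr-42"));
    }
}
```

The list of results is what a combining step would receive as its input.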

Branch — predicate routing:

Workflow.<String, Object>define("route-by-topic")
    .step(classify)
    .branch(output -> "medical".equals(output))
        .then(medicalExpert)
        .otherwise(legalExpert)
    .run("I broke my leg, what should I do?");

Loop — repeat until satisfied:

Workflow.<String, Object>define("humor-loop")
    .repeatUntilOutput(score -> score instanceof Double d && d >= 0.6)
        .step(editor)
        .step(scorer)
    .end()
    .run("A dragon walked into a bar.");

Decision — LLM picks the route:

Workflow.<String, Object>define("intelligent-dispatch")
    .decision(chatClient)
        .option("security-fix", securityStep)
        .option("performance", optimizationStep)
        .option("documentation", docStep)
    .end()
    .run(analysisResult);

The DSL generates the routing prompt from option names — you declare what the choices are, it handles dispatch.
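
As a rough illustration of "declare the choices, let the DSL dispatch," the sketch below builds a routing prompt from option names and looks up the handler the model picked. The prompt wording, the dispatch method, and the fake model are all assumptions; the library's real prompt and its chatClient integration are not shown in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class DecisionSketch {

    // Builds a routing prompt that lists the option names and asks for exactly one.
    static String routingPrompt(String input, Iterable<String> optionNames) {
        return "Choose exactly one option for the input below.\n"
             + "Options: " + String.join(", ", optionNames) + "\n"
             + "Input: " + input + "\n"
             + "Reply with the option name only.";
    }

    // Sends the prompt to the model (a plain function here) and runs the matching handler.
    static String dispatch(String input,
                           Map<String, UnaryOperator<String>> options,
                           UnaryOperator<String> model) {
        String choice = model.apply(routingPrompt(input, options.keySet())).trim();
        UnaryOperator<String> handler = options.get(choice);
        if (handler == null) {
            throw new IllegalStateException("Model chose an unknown option: " + choice);
        }
        return handler.apply(input);
    }

    static String demo(String analysisResult) {
        Map<String, UnaryOperator<String>> options = new LinkedHashMap<>();
        options.put("security-fix", s -> "patched: " + s);
        options.put("performance", s -> "optimized: " + s);
        options.put("documentation", s -> "documented: " + s);
        // Fake model: a keyword match stands in for the LLM call.
        UnaryOperator<String> fakeModel =
                prompt -> prompt.contains("slow") ? "performance" : "documentation";
        return dispatch(analysisResult, options, fakeModel);
    }

    public static void main(String[] args) {
        System.out.println(demo("query is slow"));
    }
}
```

Rejecting an unknown reply instead of guessing keeps a hallucinated option name from silently taking a route.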

Supervisor — autonomous agent delegation:

Workflow.<String, Object>supervisor("delegate", routingClient)
    .agents(codeReview, securityAudit, docUpdate)
    .until(ctx -> ctx.get(AgentContext.ITERATION_COUNT).orElse(0) >= 5)
    .run(event);

Each iteration, the LLM reads the agent descriptions and picks which one to invoke next. The supervisor loops until the termination condition fires.

Ten primitives total — sequential, parallel, branch, loop (while-do and do-while), decision, gate, supervisor, error recovery, back-edge, terminate. They compose freely. A branch can contain a loop. A loop can contain a gate. A parallel fan-out can contain branches. The DSL enforces structural validity at build time.

Quality Gates

Sometimes you don't want the next step. You want a verdict first. A gate evaluates output mid-pipeline and routes based on quality:

Workflow.<String, String>define("gated-pipeline")
    .step(generateDraft)
    .gate(new JudgeGate(jury, 0.85))
        .onPass(publishStep)
        .onFail(revisionStep)
        .withReflector(feedbackStep)
        .maxRetries(2)
    .end()
    .run("a heroic knight");

On failure, the full Verdict (score, reasoning, per-judge judgments) flows to AgentContext.JUDGE_VERDICT. The reflector step transforms that verdict into actionable feedback for the retry. Once maxRetries is exhausted, the fail path executes.
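
One plausible reading of that retry behavior, sketched with stand-in types: judge the draft, and on failure turn the verdict into feedback, revise, and judge again until the retry budget runs out. Verdict, Outcome, and runGate below are hypothetical; the library's actual retry semantics may differ.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

final class GateRetrySketch {

    record Verdict(double score, String reasoning) {}
    record Outcome(String output, boolean passed, int retriesUsed) {}

    // Judges the draft; on failure, reflects on the verdict, revises, and retries.
    // Returns passed=false once maxRetries revisions have been spent.
    static Outcome runGate(String draft,
                           Function<String, Verdict> judge,
                           double threshold,
                           Function<Verdict, String> reflector,
                           BiFunction<String, String, String> revise,
                           int maxRetries) {
        String current = draft;
        for (int retries = 0; ; retries++) {
            Verdict verdict = judge.apply(current);
            if (verdict.score() >= threshold) {
                return new Outcome(current, true, retries);   // caller takes the pass path
            }
            if (retries == maxRetries) {
                return new Outcome(current, false, retries);  // caller takes the fail path
            }
            String feedback = reflector.apply(verdict);
            current = revise.apply(current, feedback);
        }
    }

    static Outcome demo(String draft) {
        // Toy judge: the score grows with length, capped at 1.0.
        Function<String, Verdict> judge =
                text -> new Verdict(Math.min(1.0, text.length() / 20.0), "toy length-based score");
        Function<Verdict, String> reflector = v -> "scored " + v.score() + ", add detail";
        BiFunction<String, String, String> revise =
                (text, feedback) -> text + " [revised after: " + feedback + "]";
        return runGate(draft, judge, 0.85, reflector, revise, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo("a heroic knight"));
    }
}
```

With maxRetries set to 2, a draft that never passes is judged three times and revised twice before the fail path runs.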

Three gate types ship out of the box:

Gate	When to use
`JudgeGate`	Automated quality threshold (LLM-as-judge)
`HumanGate`	Human approval with timeout
`TieredGate`	Auto-approve above 0.9, escalate 0.7–0.9, reject below 0.7

Same DSL surface for all three. The workflow doesn't know or care whether a human or an LLM is behind the gate.
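
The tiered thresholds from the table boil down to a three-way comparison. This is only a sketch: the Route enum and route method are made-up names, and treating exactly 0.9 and exactly 0.7 as "escalate" is an assumption about the boundaries.

```java
final class TieredGateSketch {

    enum Route { AUTO_APPROVE, ESCALATE_TO_HUMAN, REJECT }

    // Maps a judge score to a route using the thresholds from the table.
    static Route route(double score) {
        if (score > 0.9) {
            return Route.AUTO_APPROVE;
        }
        if (score >= 0.7) {
            return Route.ESCALATE_TO_HUMAN;
        }
        return Route.REJECT;
    }

    public static void main(String[] args) {
        for (double score : new double[] {0.95, 0.8, 0.4}) {
            System.out.println(score + " -> " + route(score));
        }
    }
}
```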

The Graph IR

Here's what makes the DSL more than syntactic sugar. Every workflow compiles to a WorkflowGraph — a pure data structure of nodes and edges:

WorkflowGraph graph = Workflow.<String, String>define("review-pipeline")
    .step(analyze)
    .branch(output -> output.contains("critical"))
        .then(escalate)
        .otherwise(approve)
    .compile();

The graph contains five nodes: StepNode("analyze"), GatewayNode("branch-1"), StepNode("escalate"), StepNode("approve"), and JoinNode("join-1"). Five edges connect them: one into the gateway, two out of it, and one from each arm into the join. A branch is never an opaque lambda. It's real nodes with real edges.

The nodes are a sealed interface — Java 21 pattern matching enforces exhaustiveness:

public sealed interface WorkflowNode permits
        StepNode, GatewayNode, DecisionNode, GateNode,
        LoopEntryNode, LoopCheckNode, LoopExitNode,
        ForkNode, JoinNode {
    String name();
    NodeType type();  // DETERMINISTIC or AGENT
}

Every control-flow construct expands into explicit topology. A loop is entry + check + exit nodes with a back-edge. A parallel fan-out is a fork node, branch steps, and a join node. Nothing hides behind a closure.
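
Here is a reduced sketch of what "topology as data" buys you. It models only three of the node kinds, with stand-in records, and assumes each branch arm gets its own edge into the join. The switch has no default case, so adding a new permitted node type would stop it from compiling until the new case is handled.

```java
import java.util.List;

final class GraphIrSketch {

    // Reduced stand-in for the sealed WorkflowNode hierarchy.
    sealed interface Node permits StepNode, GatewayNode, JoinNode {
        String name();
    }
    record StepNode(String name) implements Node {}
    record GatewayNode(String name) implements Node {}
    record JoinNode(String name) implements Node {}

    record Edge(String from, String to) {}
    record Graph(List<Node> nodes, List<Edge> edges) {}

    // Exhaustive pattern-matching switch (Java 21): no default branch needed.
    static String describe(Node node) {
        return switch (node) {
            case StepNode s -> "step " + s.name();
            case GatewayNode g -> "gateway " + g.name();
            case JoinNode j -> "join " + j.name();
        };
    }

    // The review-pipeline branch written out by hand as plain data.
    static Graph reviewPipeline() {
        return new Graph(
                List.of(new StepNode("analyze"), new GatewayNode("branch-1"),
                        new StepNode("escalate"), new StepNode("approve"),
                        new JoinNode("join-1")),
                List.of(new Edge("analyze", "branch-1"),
                        new Edge("branch-1", "escalate"),
                        new Edge("branch-1", "approve"),
                        new Edge("escalate", "join-1"),
                        new Edge("approve", "join-1")));
    }

    public static void main(String[] args) {
        Graph graph = reviewPipeline();
        graph.nodes().forEach(node -> System.out.println(describe(node)));
        graph.edges().forEach(edge -> System.out.println(edge.from() + " -> " + edge.to()));
    }
}
```

Since the graph is just records and lists, printing it, diffing it, or serializing it needs no knowledge of the lambdas inside the steps.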

Why does this matter?

Tracing — Every step transition is recorded. Answer "which steps cost the most?" and "which transitions happen most often?" without instrumenting your step code.
Replay — The graph is serializable. Temporal can replay from checkpoints because the topology is data, not code.
Analysis — Feed transition logs into Markov analysis. Find the hotspots. See which steps should be deterministic instead of LLM-driven.
Composition — A Workflow implements Step<I, O>. Nest workflows inside workflows.
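
The Markov idea in the Analysis point can be sketched in a few lines: count how often each node is followed by each other node, then normalize per source node. The trace format here (a flat list of node names for one run) is an assumption, not the library's trace schema.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TransitionAnalysisSketch {

    // Estimates P(next | current) from one trace of visited node names.
    static Map<String, Map<String, Double>> transitionProbabilities(List<String> trace) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (int i = 0; i + 1 < trace.size(); i++) {
            counts.computeIfAbsent(trace.get(i), k -> new TreeMap<>())
                  .merge(trace.get(i + 1), 1, Integer::sum);
        }
        Map<String, Map<String, Double>> probabilities = new TreeMap<>();
        counts.forEach((from, row) -> {
            int total = row.values().stream().mapToInt(Integer::intValue).sum();
            Map<String, Double> normalized = new TreeMap<>();
            row.forEach((to, n) -> normalized.put(to, n / (double) total));
            probabilities.put(from, normalized);
        });
        return probabilities;
    }

    public static void main(String[] args) {
        // A run in which the gate sent the draft back for revision twice.
        List<String> trace = List.of("draft", "gate", "revise", "gate", "revise", "gate", "publish");
        System.out.println(transitionProbabilities(trace));
    }
}
```

A gate that routes to "revise" far more often than to "publish" is the kind of hotspot this analysis is meant to surface.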

The Durability Seam

Most frameworks force you to choose your durability strategy up front — before you've written your first step. agent-workflow treats durability as a deployment concern. Same workflow code, different runtime:

// Development — direct in-process, zero overhead
@Bean
StepRunner stepRunner() {
    return new LocalStepRunner();
}

// Staging — JDBC crash recovery
@Bean
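StepRunner stepRunner(DataSource dataSource) {
    // readRepo and writeRepo stand for the checkpoint read/write repositories,
    // assumed to be built on top of dataSource (their construction is not shown here)
    return new CheckpointingStepRunner(readRepo, writeRepo);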
}

// Production — distributed durable execution
@Bean
StepRunner stepRunner() {
    return new TemporalStepRunner("my-queue", Duration.ofMinutes(10));
}

One @Bean. That's the change. The workflow code doesn't move.

The CheckpointingStepRunner writes (runId, stepName) → output to a JDBC table after each step completes. On restart, it skips completed steps and resumes from the last checkpoint. The TemporalStepRunner wraps each step as a Temporal Activity — distributed durability, audit-grade replay, zero workflow code changes.
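
The skip-and-resume behavior is easy to picture with an in-memory map standing in for the JDBC table. This sketch is not the CheckpointingStepRunner itself: NamedStep, Key, and run are hypothetical, and the "crash" is a step that throws on its first attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.UnaryOperator;

final class CheckpointSketch {

    record NamedStep(String name, UnaryOperator<String> body) {}
    record Key(String runId, String stepName) {}
    record DemoResult(String output, List<String> executed) {}

    // Stand-in for the JDBC table: (runId, stepName) -> output.
    private final Map<Key, String> table = new HashMap<>();
    // Names of the step bodies that actually ran, in order.
    private final List<String> executed = new ArrayList<>();

    // Runs steps in order, reusing saved output for any step already checkpointed.
    String run(String runId, String input, List<NamedStep> steps) {
        String value = input;
        for (NamedStep step : steps) {
            Key key = new Key(runId, step.name());
            String saved = table.get(key);
            if (saved != null) {
                value = saved;          // completed earlier: skip the body
                continue;
            }
            value = step.body().apply(value);
            executed.add(step.name());
            table.put(key, value);      // checkpoint only after the step completes
        }
        return value;
    }

    static DemoResult demo() {
        CheckpointSketch runner = new CheckpointSketch();
        AtomicBoolean crashOnce = new AtomicBoolean(true);
        List<NamedStep> steps = List.of(
                new NamedStep("fetch", s -> s + ">fetched"),
                new NamedStep("analyze", s -> {
                    if (crashOnce.getAndSet(false)) {
                        throw new IllegalStateException("simulated crash");
                    }
                    return s + ">analyzed";
                }),
                new NamedStep("report", s -> s + ">reported"));
        try {
            runner.run("run-1", "pr", steps);
        } catch (IllegalStateException simulatedCrash) {
            // the first attempt dies inside "analyze"
        }
        String output = runner.run("run-1", "pr", steps);   // restart with the same runId
        return new DemoResult(output, List.copyOf(runner.executed));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On the restart, "fetch" is served from the checkpoint and only "analyze" and "report" execute, so each step body runs to completion exactly once across both attempts.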

Level	What you get	What you add
`LocalStepRunner`	Trace, cost tracking, quality gates	Nothing
`CheckpointingStepRunner`	+ JDBC crash recovery	A `DataSource`
`TemporalStepRunner`	+ distributed durability, replay	Temporal Server

Write your workflow once. Graduate durability when you need it.

The Annotation Model

The @Agent annotation names an agent and registers it in an AgentRegistry — an immutable name-to-handler map that protocol layers (MCP, A2A, HTTP) use to address agents by name:

@Agent("code-review")
public class CodeReviewAgent implements AgentHandler<String, String> {

    private final AnalyzeStep analyze;
    private final ReviewStep review;

    public CodeReviewAgent(AnalyzeStep analyze, ReviewStep review) {
        this.analyze = analyze;
        this.review = review;
    }

    @Override
    public String handle(AgentContext ctx, String diff) {
        return Workflow.<String, String>define("code-review")
            .step(analyze)
            .then(review)
            .run(diff, ctx);
    }
}

AgentHandler<I, O> is the external-facing contract. Step<I, O> is internal plumbing. The handler owns the workflow and faces the outside world; the steps are the work inside.

Cross-cutting concerns get @AgentAdvice and @ExceptionHandler:

@AgentAdvice
public class GlobalErrorHandler {
    @ExceptionHandler(RateLimitException.class)
    public Object handleRateLimit(RateLimitException ex, AgentContext ctx) {
        return "Rate limited after " + ctx.get(ACCUMULATED_TOKENS).orElse(0L) + " tokens";
    }
}

The pattern will feel familiar if you've used Spring MVC — @Agent mirrors @Controller, @AgentAdvice mirrors @ControllerAdvice — but the annotations themselves are pure Java with no framework dependency.
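
To show that an annotation-driven registry needs nothing beyond the JDK, here is a minimal sketch. SketchAgent, Handler, and registryOf are hypothetical stand-ins for @Agent, AgentHandler, and AgentRegistry, and reading the annotation through reflection is an assumption about how discovery could work, not a description of the library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AgentRegistrySketch {

    // Plain Java annotation, readable at runtime.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SketchAgent {
        String value();
    }

    interface Handler {
        String handle(String input);
    }

    @SketchAgent("code-review")
    static final class CodeReview implements Handler {
        @Override
        public String handle(String diff) {
            return "reviewed: " + diff;
        }
    }

    @SketchAgent("doc-update")
    static final class DocUpdate implements Handler {
        @Override
        public String handle(String change) {
            return "documented: " + change;
        }
    }

    // Builds an immutable name-to-handler map from the annotation values.
    static Map<String, Handler> registryOf(List<Handler> handlers) {
        Map<String, Handler> byName = new LinkedHashMap<>();
        for (Handler handler : handlers) {
            SketchAgent meta = handler.getClass().getAnnotation(SketchAgent.class);
            if (meta == null) {
                throw new IllegalArgumentException("Missing @SketchAgent on " + handler.getClass());
            }
            if (byName.putIfAbsent(meta.value(), handler) != null) {
                throw new IllegalArgumentException("Duplicate agent name: " + meta.value());
            }
        }
        return Map.copyOf(byName);
    }

    static Map<String, Handler> demoRegistry() {
        return registryOf(List.of(new CodeReview(), new DocUpdate()));
    }

    public static void main(String[] args) {
        System.out.println(demoRegistry().get("code-review").handle("diff"));
    }
}
```

A protocol layer would then only need the agent's name to find and invoke its handler.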

In Production: The PR Review Agent

The PR review agent is the first real pipeline built on agent-workflow. It reviews pull requests for spring-projects/spring-ai.

Phase 1 — Deterministic context gathering:

FetchPrContextStep — GitHub API: PR metadata, changed files, existing reviews
RebaseStep — Rebase onto main, capture conflict files
ConflictDetectionStep — Classify conflicts as SIMPLE (build files, version bumps) or COMPLEX (logic changes)
RunTestsStep — Maven build on the rebased branch

Phase 2 — Judge cascade:

BuildJudge (T0) — Build pass? Complex conflicts? Deterministic gate — FAIL blocks everything downstream.
VersionPatternJudge (T1) — Boot 3→4 migration anti-patterns? Deterministic scan.
QualityJudge (T2) — LLM meta-judge. Only runs if T0 and T1 pass.

Phase 3 — AI assessment (skipped if judges fail):

AssessCodeQualityStep — Claude analyzes the diff for quality issues
AssessBackportStep — Claude assesses version compatibility

Each step reads from and writes to AgentContext via typed ContextKey<T>:

static final ContextKey<ReviewReport> REVIEW =
    ContextKey.of("review", ReviewReport.class);

// Producing step publishes
@Override
public AgentContext updateContext(AgentContext ctx, ReviewReport output) {
    return ctx.mutate().with(REVIEW, output).build();
}

// Consuming step reads
ReviewReport report = ctx.require(REVIEW);

No shared mutable state. AgentContext is immutable — mutations produce new instances. Parallel branches get isolated copies; results merge at join. The BuildJudge reads RebaseResult, ConflictReport, and BuildResult from context without a reference to the steps that produced them.
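
An immutable, typed context can be sketched as a copy-on-write map keyed by typed keys. Key and Ctx below are stand-ins for ContextKey<T> and AgentContext, with a simplified with() in place of the mutate() builder; the real classes are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

final class ImmutableContextSketch {

    // Typed key: the Class token lets get() return T without an unchecked cast.
    record Key<T>(String name, Class<T> type) {}

    static final class Ctx {
        private final Map<Key<?>, Object> values;

        private Ctx(Map<Key<?>, Object> values) {
            this.values = Map.copyOf(values);   // defensive, unmodifiable copy
        }

        static Ctx empty() {
            return new Ctx(Map.of());
        }

        // Returns a new context; the receiver is left untouched.
        <T> Ctx with(Key<T> key, T value) {
            Map<Key<?>, Object> copy = new HashMap<>(values);
            copy.put(key, value);
            return new Ctx(copy);
        }

        <T> Optional<T> get(Key<T> key) {
            return Optional.ofNullable(values.get(key)).map(value -> key.type().cast(value));
        }

        <T> T require(Key<T> key) {
            return get(key).orElseThrow(
                    () -> new NoSuchElementException("Missing context key: " + key.name()));
        }
    }

    static final Key<String> REVIEW = new Key<>("review", String.class);

    public static void main(String[] args) {
        Ctx before = Ctx.empty();
        Ctx after = before.with(REVIEW, "LGTM");
        System.out.println(before.get(REVIEW).isPresent());
        System.out.println(after.require(REVIEW));
    }
}
```

Because with() never touches the original, handing the same context to several parallel branches is safe by construction.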

The pattern that emerges: front-load deterministic work. Let the build break before you spend tokens on LLM analysis. Gate early, gate often. Every judge verdict is traced.
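
The "cheap checks first" cascade can be sketched as a short-circuiting loop over tiers. Facts, Judge, and cascade are illustrative names, and the third tier here is a constant stand-in for the LLM judge; only the ordering idea comes from the pipeline described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class JudgeCascadeSketch {

    record Facts(boolean buildPassed, boolean complexConflicts, boolean antiPatternFound) {}
    record Judge(String name, Predicate<Facts> passes) {}
    record Result(boolean passed, List<String> evaluated) {}

    // Evaluates tiers in order and stops at the first failure,
    // so later (more expensive) judges never run on a broken build.
    static Result cascade(Facts facts, List<Judge> tiers) {
        List<String> evaluated = new ArrayList<>();
        for (Judge judge : tiers) {
            evaluated.add(judge.name());
            if (!judge.passes().test(facts)) {
                return new Result(false, List.copyOf(evaluated));
            }
        }
        return new Result(true, List.copyOf(evaluated));
    }

    static List<Judge> reviewTiers() {
        return List.of(
                new Judge("BuildJudge", f -> f.buildPassed() && !f.complexConflicts()),
                new Judge("VersionPatternJudge", f -> !f.antiPatternFound()),
                // Stand-in for the LLM meta-judge: always passes in this sketch.
                new Judge("QualityJudge", f -> true));
    }

    public static void main(String[] args) {
        System.out.println(cascade(new Facts(false, false, false), reviewTiers()));
        System.out.println(cascade(new Facts(true, false, false), reviewTiers()));
    }
}
```

The evaluated list doubles as a trace of which judges ran, which is what makes the token savings visible after the fact.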

Bottom Line

Steps are semantic, not mechanical. A step wraps an entire agentic session — dozens of LLM calls, hundreds of tool invocations — not a single API call. The workflow sees "analyze the diff," not "call Claude, parse response, call Claude again."
The DSL compiles to a graph. Ten primitives — sequential, parallel, branch, loop, decision, gate, supervisor, error recovery, back-edge, terminate — compose freely into a WorkflowGraph of typed nodes and edges. No opaque lambdas. Every path is visible.
Durability is a deployment concern. Write your workflow once. Swap LocalStepRunner for CheckpointingStepRunner or TemporalStepRunner with a single @Bean change. Zero workflow code changes.
Gates belong in the topology. Quality gates, human approval, tiered auto-approve — same DSL surface. The verdict flows back as context for retries. Gate early, gate often.
Pure Java. Step, AgentHandler, AgentContext, Workflow, WorkflowGraph, @Agent, @ExceptionHandler — all plain Java 21 with no framework dependency. The patterns will feel familiar if you know Spring MVC, but the library doesn't require it.

How does your agent flow? If the answer is "I'm not sure," that's the problem this library solves. The flow should be something you can point to, inspect, gate, and replay — not something that emerges from a prompt and a prayer.

With silver bells and cockle shells, and pretty steps all in a row.

All tooling is Java 21 on lab.pollack.ai.