In late 2024, Anthropic published a piece called "Building Effective Agents" that sounded boring and turned out to be one of the clearest things written about applied LLMs in the past two years. The core argument is uncomfortable for anyone selling agentic frameworks. Most production AI features are not agents, do not benefit from being agents, and would ship sooner if the people building them were honest about that.
Two years on, the piece reads even better than it did at publication. The market got noisier and the frameworks got fancier; the patterns that actually shipped stayed small. What follows is a working summary, with the pieces of the original that have aged worst quietly cut.
§01 The vocabulary problem
Two different things travel under the same word, and most of the confusion in the agent discourse comes from people using "agent" to mean either of them without saying which.
A workflow is a system where LLMs and tools follow a predetermined path. The author has decided, in advance, what happens at each step. An agent is a system where the LLM dynamically chooses its own tools and paths. The author has decided what tools exist, but the LLM decides how to use them.
Almost everything in production today is a workflow. Almost everything labeled "agent" in a startup pitch deck is also a workflow. This is not a problem. It is a relief. Workflows are easier to debug, cheaper to run, more predictable, and produce better results when the task is well-scoped. The mistake is reaching for the word agent because it sounds more advanced.
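The distinction fits in a few lines of code. A hypothetical sketch, with `call_model` standing in for any LLM client (stubbed here so the example runs, and all names illustrative):

```python
def call_model(prompt: str) -> str:
    # Stub: a real system would call an LLM API here.
    return f"[model output for: {prompt[:30]}]"

# Workflow: the author fixed the path in advance. The model fills in
# each step, but it never chooses what the next step is.
def triage_workflow(ticket: str) -> str:
    category = call_model(f"Classify this ticket: {ticket}")
    return call_model(f"Draft a reply for a {category} ticket: {ticket}")

# Agent: the model's own output picks the next tool; the author only
# decided what the toolbox contains.
def agent_turn(context: str, tool_names: list) -> str:
    return call_model(f"Context: {context}. Choose one tool from {tool_names}")
```

The structural difference is who holds the control flow: in the workflow it lives in the author's code, in the agent it lives in the model's output.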
§02 The five patterns
Anthropic's piece names five patterns, in roughly increasing order of complexity. The simplest is prompt chaining, where sequential calls pass each step's output to the next step's input. Translate, then summarize. Most things people call "AI features" are this and nothing more.
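In code, prompt chaining is just function composition over model calls. A minimal sketch of the translate-then-summarize case, with the model stubbed and the helper names illustrative:

```python
def call_model(prompt: str) -> str:
    # Stub: echoes the instruction so the chain is visible in the output.
    return f"[output for: {prompt.splitlines()[0]}]"

def translate_then_summarize(text: str) -> str:
    # Step 1: translate. The output of this call becomes...
    english = call_model(f"Translate to English:\n{text}")
    # Step 2: ...the input of this one. That handoff is the whole pattern.
    return call_model(f"Summarize in two sentences:\n{english}")
```

Each step can also be validated or post-processed before the handoff, which is where most of the real engineering in a chain lives.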
Next is routing, a classifier that picks which prompt or model handles the input. It is cheap, fast, and surprisingly effective for support triage. Then parallelization, which fans out the same input to multiple LLMs and merges the results, useful for voting or sectioning a long document. Then orchestrator-workers, where a planning LLM decomposes the task and delegates to specialists, sitting closer to an agent but keeping the orchestrator's job scoped. Last is evaluator-optimizer, which pairs a producer LLM with a critic LLM and loops until the critic is satisfied, useful when quality outranks speed.
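Routing, to take the second pattern, can be this small. A sketch in which the classifier is a keyword stub where a real system would use a cheap model call; the route names and prompts are made up for illustration:

```python
# One prompt (or model choice) per route, decided by the author in advance.
ROUTES = {
    "billing": "You are a billing specialist. Answer: {q}",
    "technical": "You are a support engineer. Debug: {q}",
    "general": "You are a helpful assistant. Answer: {q}",
}

def classify(question: str) -> str:
    # Stub classifier: a real router would use a small, fast model here.
    q = question.lower()
    if "invoice" in q or "refund" in q:
        return "billing"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

def route(question: str) -> str:
    # Pick the prompt that handles this input, then fill it in.
    return ROUTES[classify(question)].format(q=question)
```

The payoff is that each route's prompt stays short and specialized instead of one prompt trying to cover every case.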
The useful exercise, when you are about to build something, is to try expressing it as the simplest pattern on this list. If prompt chaining works, ship it. If routing works, ship it. The right pattern is usually one step simpler than the one you were tempted to reach for.
§03 When you actually need an agent
The honest test for "do I need an agent?" is whether the task has open-ended steps that the system cannot enumerate in advance. Browsing a codebase to fix a bug is open-ended. Researching a topic to write a report is open-ended. Refactoring a feature without a known endpoint is open-ended. If the steps can be written down at design time, the system is a workflow, even if it is using a fancy framework.
Agents pay for themselves when four things are true at once. The environment is rich enough that no static plan covers it. Errors are recoverable, so the agent can retry rather than fail. Latency and cost can absorb a longer loop. A human is in the loop somewhere, even if only for review. Most production tasks do not satisfy more than two of those four at once, which is the quiet finding the original piece never quite states out loud.
§04 The smallest agent that works
When the task does justify an agent, the surface area should stay small. The original puts it almost this plainly: an agent is just an LLM in a loop with tools. No graph framework. No DAG. No abstraction layer that hides which prompt is running.
loop:
    decision = LLM.call(context, tools)
    if decision.is_done:
        break
    result = run_tool(decision.tool)
    context = update(context, result)
That is the entire pattern. It fits on a single page and can be reasoned about at a glance. Anything more elaborate should earn its complexity by removing pain that this simple loop has actually produced, not pain you imagine it might produce eventually.
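Fleshed out with stubs so it runs, the pseudocode above might look like the sketch below. `fake_llm`'s policy, which calls each tool once and then declares itself done, stands in for a real model, and every name here is illustrative rather than from any framework:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    is_done: bool
    tool: str = ""

def fake_llm(context: list, tools: dict) -> Decision:
    # Stub policy: call each tool once, in order, then stop.
    # A real agent would make an LLM call here and parse its output.
    used = {step["tool"] for step in context}
    for name in tools:
        if name not in used:
            return Decision(is_done=False, tool=name)
    return Decision(is_done=True)

def run_tool(name: str) -> str:
    # Stub: a real tool would read a file, run a search, etc.
    return f"result of {name}"

def agent(tools: dict) -> list:
    context: list = []
    while True:
        decision = fake_llm(context, tools)
        if decision.is_done:
            break
        result = run_tool(decision.tool)
        context.append({"tool": decision.tool, "result": result})
    return context
```

Everything else an agent needs, retries, context trimming, human checkpoints, attaches to this skeleton without replacing it.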
The frameworks are not wrong, but they are usually premature. Build the loop yourself first. Once you understand which abstraction would have helped, the framework choice becomes obvious. Until then, every wrapper is a layer between you and your bug.
§05 The discipline
The discipline of building effective agents is mostly the discipline of not building agents. Every step on the pattern list above buys you something, and rules out something else. The sooner you can name what your system actually is, the sooner you stop debating which framework to adopt and start shipping the thing.
The single paragraph worth taking from the original is the one about restraint. Do the simplest thing that could possibly work, and only add complexity when you have watched the simple version fail at something specific. The bar is not "could I imagine a case where this is too simple?" The bar is "did I actually see it fail?" That is the line between effective agents and impressive demos.