ReAct, Self-Refine, and Flow Engineering: The Three Paradigms Behind Modern AI Agents
A breakdown of the three foundational AI reasoning paradigms: ReAct, Self-Refine, and Flow Engineering, and how modern AI frameworks let you build all of them in production.
Beyond basic prompting, there are three foundational paradigms that define how AI agents reason and act: ReAct, Self-Refine, and Flow Engineering. Each solves a different problem, uses a different feedback loop, and targets a different type of task. Understanding them is the difference between blindly chaining LLM calls and deliberately engineering reliable AI systems.

The Three Paradigms at a Glance
| Paradigm | Core Idea | Feedback Source | Best Suited For |
|---|---|---|---|
| ReAct | Interleave reasoning with external actions | External environment (API calls, search) | QA, fact-checking, web agents |
| Self-Refine | Iteratively critique and improve your own output | Internal (the LLM itself) | Creative writing, code cleanup, text editing |
| Flow Engineering | Multi-stage pipeline with specialized nodes and test-driven iteration | Code execution (test pass/fail) | Competitive and complex programming |
ReAct: Reason + Act
Core concept: Synergize internal reasoning with external actions.
Think of it as an AI using a web browser. It thinks, clicks a tool, reads the result, and decides what to do next - exactly like a human would. The Action/Observation loop is the core.
ReAct forces the model to generate both a verbal reasoning trace ("thought") and a task-specific action in an interleaved loop. The model acts, receives an observation from the external environment, reasons about it, and decides the next step.
The loop:
Thought → Action → Observation → Thought → Action → ...
What makes it powerful: It grounds the model's reasoning in real-world, up-to-date observations instead of relying on internal static knowledge. This directly reduces hallucinations on knowledge-intensive tasks.
External dependency: High. ReAct requires an environment to interact with - a search API, a database, a web browser, a tool call. Without external grounding, the loop has nothing to observe.
Primary use cases: Multi-hop question answering, fact verification, web navigation, tool-use agents.
Source: ReAct: Synergizing Reasoning and Acting in Language Models
Self-Refine: Iterative Self-Improvement
Core concept: A single LLM generates output, critiques it, and refines it - with no external tools.
Think of it as a writer who is also their own editor. The model writes a draft, reviews its own work, and rewrites it until it meets its own quality bar - no outside input needed. The Critique/Refine loop is the core.
The model produces a first draft, then explicitly prompts itself to find flaws and generate actionable feedback, then uses that feedback to produce a better version. The cycle repeats until a stopping condition is met.
The loop:
Draft → Critique → Refined Draft → Critique → ...
What makes it powerful: LLMs frequently know better than their first attempt suggests. Self-Refine surfaces that latent capability by explicitly asking the model to critique and fix its own work.
External dependency: None. It runs entirely within the LLM - no tools, no training data, no APIs required.
Primary use cases: Dialogue response generation, text style transfer, code readability improvements, open-ended generation tasks where quality can be iterated.
Source: Self-Refine: Iterative Refinement with Self-Feedback
Flow Engineering
Core concept: Replace prompt engineering with a structured, multi-stage pipeline of distinct, specialized nodes.
Think of it as a full software agency. One persona plans the approach, another writes the code, and a strict testing system forces revisions until the output is programmatically verified correct. The specialized nodes are the core - each stage has one job, and the pipeline does not advance until that job passes.
AlphaCodium is the paper that introduced and named this paradigm. It operates in two phases:
Phase 1 - Pre-processing:
- Reflect on the problem statement
- Reason about public test cases
- Generate multiple candidate solutions
- Generate additional AI-authored tests
Phase 2 - Code Iterations:
- Write an initial solution
- Iteratively run and fix against public tests
- Iteratively run and fix against AI-generated tests (using "test anchors" to prevent regressions)
The loop:
Pre-process → Write → Run Tests → Fix → Run Tests → Fix → ...
What makes it powerful: It stops trying to write perfect code on the first attempt. Instead, it applies test-driven development as a feedback mechanism, using failure signals to converge on correct solutions.
External dependency: Medium. Requires a code execution environment to run tests and feed error logs back to the model.
Primary use cases: Competitive programming, complex algorithmic challenges, production code generation where correctness is non-negotiable.
Source: Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
How They Compare
| Feature | ReAct | Self-Refine | Flow Engineering |
|---|---|---|---|
| Mental model | Human using a web browser | Writer and self-editor | Full software agency |
| Key loop | Action/Observation | Critique/Refine | Specialized nodes + test gates |
| Feedback source | External environment | The LLM itself | Code execution |
| External tools required | Yes | No | Yes (executor) |
| Loop trigger | Observation from action | Internal critique | Test pass/fail |
| Solves the problem of | Hallucinations & outdated knowledge | Sub-optimal first drafts | Syntax errors & edge-case failures |
| Complexity to implement | Medium | Low | High |
Modern AI Frameworks: The Construction Kit
No modern AI framework invented these paradigms - but they were all built to make implementing them practical. The paradigms are the blueprint; the framework is the construction kit.
Early LLM pipeline tools were too linear to handle complex, iterative, multi-step workflows. Modern frameworks solved this by modeling AI workflows as directed graphs with persistent state, making cycles (loops), conditional routing, and parallel branches first-class primitives.
1. ReAct out of the box
The most basic agent in any modern framework is a ReAct agent. Most ship a built-in helper that sets up the full Thought → Action → Observation loop with your chosen LLM and tools - it is the standard starting point for any tool-using agent.
2. Self-Refine as a natural loop
The defining feature of modern AI orchestration frameworks over simple pipeline tools is native support for cycles. A Self-Refine system is simply a graph with an edge from the "Critique Node" back to the "Generate Node," with a conditional exit when quality is sufficient. Before graph-based frameworks, implementing this reliably required significant boilerplate. In frameworks designed for it, it is a first-class pattern.
3. Flow Engineering
For complex, multi-stage pipelines, graph-based frameworks give you full control. You define a "Planner Node," a "Coder Node," and a "Tester Node," then specify exact state transitions between them. The framework handles routing - including looping back to fix code after a failing test - without the developer having to manage state manually.
What Most Production Systems Actually Use
The three paradigms above are powerful - but they are also the ceiling, not the floor. The majority of production AI systems shipping today do not use full Flow Engineering or Self-Refine loops. They rely on simpler, faster, and more predictable patterns:
Advanced RAG (Retrieval-Augmented Generation): A highly optimized, linear pipeline. The system receives a query, searches a vector database, retrieves the most relevant documents, and passes them to the LLM for a single-shot answer. No loops, no self-critique. Just a well-engineered retrieval step feeding a constrained generation step. This covers the vast majority of enterprise knowledge base, search, and Q&A applications.
Semantic Routing: A fast classifier - either a small model or an embedding-based similarity check - inspects the user's prompt and routes it to a specific, constrained prompt template or a traditional software function. The LLM never sees irrelevant instructions. Latency stays low and behavior stays predictable. Many production chatbots work this way without anyone calling them "agents."
Single-Step Tool Calling: The LLM is given a set of tools (making it look like a ReAct agent), but is heavily constrained to make one or two predictable calls before returning a final answer. There is no open-ended observation loop. The developer controls which tools exist, what they return, and when the model stops. This is ReAct with guardrails - and it is what most function-calling integrations in production actually are.
| Pattern | Loop | Complexity | Predictability | Where you see it |
|---|---|---|---|---|
| Advanced RAG | None | Low | Very High | Knowledge bases, search, Q&A |
| Semantic Routing | None | Low | Very High | Chatbots, intent classification |
| Single-Step Tool Calling | Minimal | Medium | High | CRM integrations, form automation |
| ReAct | Open | Medium | Medium | Research agents, web tasks |
| Self-Refine | Internal | Medium | Medium | Content generation, code review |
| Flow Engineering | Complex | High | Low-Medium | Coding assistants, multi-agent systems |
The lesson: reach for complexity only when simpler patterns provably fail your requirements. A well-tuned RAG pipeline beats a poorly-designed multi-agent loop on latency, cost, and debuggability every time.
The Practical Progression
These patterns form a natural progression as system requirements grow:
| Stage | Pattern | When to reach for it |
|---|---|---|
| Start here | Advanced RAG or Semantic Routing | Fixed knowledge domain, predictable intents |
| Add tools | Single-Step Tool Calling | Need one or two external lookups |
| Add autonomy | ReAct | Tasks require open-ended tool use |
| Add quality gates | ReAct + Self-Refine | Output correctness is critical |
| Add orchestration | Full Flow Engineering | Multi-step, multi-agent, verifiable pipelines |
Start with the simplest pattern that solves the problem. Most applications never leave the first two rows. When reliability requirements push you toward more autonomy, add one layer of complexity at a time - and always ask whether a constrained version of the next pattern is sufficient before building the full one.
The paradigms are the theory. The simpler patterns are where most production systems live. Modern AI frameworks are the tools that let you move between all of them without rewriting your architecture.
