Agent CLI Framework Differences: Sequential vs Batch Tool Calling
Most agent CLI frameworks offer two modes of tool execution: sequential and batch (parallel). Sequential runs one tool at a time, waiting for the result before deciding the next step. Batch fires off multiple tool calls in the same LLM turn and processes all results at once.
The right choice depends heavily on what your agent is actually doing. For data retrieval across independent APIs, batch can cut latency by 3-4x. For desktop automation, sequential almost always wins - and using batch on stateful desktop tasks leads to subtle, hard-to-debug failures.
This post breaks down how each major framework implements both modes, with code examples showing the difference, and a decision matrix for picking the right one.
How Each Framework Implements Tool Calling
Claude (Anthropic)
Claude's API returns tool calls as tool_use content blocks in an assistant message. By default, Claude may return multiple tool calls in a single response - this is its natural "batch" mode. You can enforce sequential behavior by setting disable_parallel_tool_use to true in the tool_choice parameter, or encourage it more loosely by returning results one at a time and structuring your system prompt accordingly.
As of early 2025, Anthropic added a token-efficient-tools-2025-02-19 beta header that reduces the output-token overhead of tool calls. For Claude Code specifically (the agentic CLI), the claude-agent-sdk provides first-class support for orchestrating sub-agents either sequentially or in parallel via asyncio.gather().
Sequential pattern with Claude:
```python
import anthropic

client = anthropic.Anthropic()

def run_sequential_agent(tools, initial_message):
    messages = [{"role": "user", "content": initial_message}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            return response.content
        # Process exactly one tool at a time, wait for the result.
        # (Assumes the response contains a single tool_use block; the API
        # expects a tool_result for every tool_use in the assistant turn.)
        tool_use = next(
            block for block in response.content
            if block.type == "tool_use"
        )
        result = execute_tool(tool_use.name, tool_use.input)  # your tool runner
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": result,
            }],
        })
```
Batch pattern with Claude:
```python
import asyncio

def run_batch_agent(tools, initial_message):
    messages = [{"role": "user", "content": initial_message}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            return response.content
        # Collect ALL tool calls from this turn
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        # Execute all of them concurrently (execute_tools_parallel is yours to define)
        results = asyncio.run(execute_tools_parallel(tool_calls))
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tc.id, "content": r}
                for tc, r in zip(tool_calls, results)
            ],
        })
```
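The execute_tools_parallel helper used above is left undefined. A minimal sketch, assuming a hypothetical TOOL_REGISTRY mapping tool names to plain synchronous functions, and using asyncio.to_thread so a slow blocking tool doesn't stall the others:

```python
import asyncio

# Hypothetical registry of synchronous tool functions (illustrative names).
TOOL_REGISTRY = {
    "get_weather": lambda inp: f"weather for {inp['city']}",
    "get_time": lambda inp: f"time in {inp['tz']}",
}

async def execute_tools_parallel(tool_calls):
    """Run every tool call concurrently; results come back in call order."""
    async def run_one(tc):
        fn = TOOL_REGISTRY[tc.name]
        # to_thread keeps a blocking tool from holding up the event loop
        return await asyncio.to_thread(fn, tc.input)
    return await asyncio.gather(*(run_one(tc) for tc in tool_calls))
```

asyncio.gather preserves argument order, so zipping results back to tool_calls in the agent loop stays correct even when the tools finish out of order.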
OpenAI (GPT-4o, o1, o3)
OpenAI's API exposes a parallel_tool_calls parameter that controls this directly. When parallel_tool_calls=True (the default for most models), the model can return multiple tool_calls in a single assistant message. Setting it to false enforces a strict one-tool-per-turn pattern.
```python
from openai import OpenAI

client = OpenAI()

# Force sequential - one tool per turn
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=False,  # sequential
)

# Allow batch - the model decides how many tools to call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=True,  # batch (default)
)
```
OpenAI's Agents SDK adds a higher-level abstraction. You can run multiple sub-agents in parallel using asyncio.gather(), or chain them sequentially with explicit handoffs. The SDK documentation notes that the "agent as tool" pattern (where one agent calls another as a tool) adds planner latency overhead but lets the orchestrator dynamically decide execution order - useful when dependencies aren't known ahead of time.
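The shape of that fan-out can be shown without the SDK itself. A sketch in plain asyncio, where run_subagent is a hypothetical stand-in for an SDK sub-agent run (e.g. a Runner call), and the sub-agent names are illustrative:

```python
import asyncio

# Hypothetical stand-in for one sub-agent run; the sleep represents the
# LLM round trip an SDK call would perform.
async def run_subagent(name: str, task: str) -> str:
    await asyncio.sleep(0)
    return f"{name} finished: {task}"

async def fan_out(task: str) -> list[str]:
    # Independent sub-agents run concurrently via asyncio.gather;
    # a sequential handoff would instead await each call in turn.
    return await asyncio.gather(
        run_subagent("researcher", task),
        run_subagent("fact_checker", task),
    )
```

The sequential alternative is just two awaits in a row, with the second call free to use the first's output - exactly the dependency the gather version cannot express.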
LangChain and LangGraph
LangChain's AgentExecutor runs tools sequentially by default. LangGraph, their newer graph-based orchestration layer, makes the choice explicit: nodes with no dependency edge between them can run in parallel, while a node whose input is another node's output runs after it.
```python
from langgraph.graph import StateGraph, END

# Sequential: each step depends on the previous
workflow = StateGraph(AgentState)
workflow.add_node("search", search_node)
workflow.add_node("read", read_node)        # waits for search
workflow.add_node("summarize", sum_node)    # waits for read
workflow.add_edge("search", "read")
workflow.add_edge("read", "summarize")
workflow.add_edge("summarize", END)

# Parallel: independent nodes run concurrently
workflow.add_node("fetch_docs", fetch_node)
workflow.add_node("check_status", status_node)  # runs at the same time
workflow.add_node("aggregate", aggregate_node)  # waits for both
workflow.add_edge(["fetch_docs", "check_status"], "aggregate")
```
LangGraph makes the dependency graph visible in code - a clear advantage over implicit parallelism. When a node produces state that another node consumes, LangGraph serializes them automatically.
LlamaIndex
LlamaIndex's ReActAgent is sequential by default. Its ParallelAgentRunner (available in llama_index.core.agent) lets you distribute independent sub-tasks across multiple agent workers. LlamaIndex's async-first design makes parallel retrieval from multiple data sources straightforward - a common pattern for RAG pipelines where multiple indexes need to be queried before synthesis.
Open-source: AutoGen, CrewAI, smolagents
AutoGen (Microsoft) models agents as conversational units. Sequential execution is the default - agents take turns. Parallel execution requires explicitly spawning multiple agents and using a custom aggregator. The framework is designed around human-in-the-loop workflows, so sequential is the natural pattern.
CrewAI assigns agents to tasks with explicit sequential or parallel process modes:
```python
from crewai import Crew, Process

# Sequential: task 2 gets task 1's output
crew = Crew(agents=[agent1, agent2], tasks=[task1, task2],
            process=Process.sequential)

# Hierarchical (parallel with manager): a manager LLM coordinates workers
crew = Crew(agents=[agent1, agent2], tasks=[task1, task2],
            process=Process.hierarchical, manager_llm=gpt4)
```
smolagents (Hugging Face) is explicitly sequential - it was designed for single-agent code-writing workflows. Parallelism is handled by writing Python code that uses concurrent.futures within a code-execution tool step.
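A sketch of what such a generated code step might look like, assuming a hypothetical lookup function standing in for whatever tool the agent's code would actually call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical independent lookup; in a real smolagents code step this
# would call one of the agent's registered tools.
def lookup(source: str) -> str:
    return f"result from {source}"

def gather_sources(sources: list[str]) -> list[str]:
    # A thread pool fans the independent reads out inside one
    # code-execution step, so the LLM still sees a single tool turn.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lookup, sources))
```

Because the parallelism lives inside the generated code rather than the agent loop, the model pays one planning turn for the whole fan-out.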
Latency: What the Numbers Actually Show
Research on LLMCompiler (ICML 2024) - a framework specifically designed for parallel function calling - showed:
- 1.4x to 3.7x latency reduction on tasks with multiple independent tool calls
- Up to 6x cost reduction in some scenarios by reducing total LLM turns
- ~9% accuracy improvement over ReAct (sequential) on certain benchmarks, because parallel information gathering reduces the chance of the model hallucinating intermediate states
The math is straightforward. Four independent API calls at 300ms each:
| Approach | Total latency |
|---|---|
| Sequential | ~1,200ms |
| Parallel (batch) | ~300ms + overhead |
| Mixed (reads parallel, writes sequential) | ~400-600ms |
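The table's arithmetic can be reproduced with a toy benchmark, using asyncio.sleep as a stand-in for an API call (delays scaled down to keep the demo fast):

```python
import asyncio
import time

async def fake_api_call(delay: float) -> str:
    # Stand-in for a network call that takes `delay` seconds
    await asyncio.sleep(delay)
    return "ok"

async def sequential(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_api_call(delay)  # each call waits for the previous
    return time.perf_counter() - start

async def batch(n: int, delay: float) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_api_call(delay) for _ in range(n)))
    return time.perf_counter() - start
```

With n=4 and delay=0.05, sequential takes roughly 4x the delay while batch takes roughly one delay plus scheduler overhead - the same ratio as the 300ms rows above.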
SimpleTool (2025) demonstrated 3-6x speedup with sub-100ms function calling for 4B-scale models, suggesting the latency advantage of parallel calling is a consistent finding, not an edge case.
However, these benchmarks measure tasks where tool calls are genuinely independent. On desktop automation tasks, where most calls are state-dependent, the sequential vs. parallel framing changes completely.
Why Batch Fails on Desktop
Desktop tasks have a property that most benchmark tasks don't: every action changes the state that the next action depends on.
Consider a simple form-filling task:
- Find the application window
- Click the input field
- Read the current value
- Clear the field
- Type the new value
- Click submit
- Verify the confirmation appeared
Steps 2-7 each depend on the previous step completing successfully. A batch agent that tries to parallelize steps 2, 3, and 4 will query the accessibility tree for a field that hasn't been clicked yet, get a stale result, and either fail silently or write to the wrong element.
This isn't a theoretical problem. In practice, batch desktop agents fail in ways that are hard to detect:
- Stale element references: The accessibility tree ref from step 2 may no longer be valid after step 3 changes UI state. macOS accessibility element refs are snapshot-scoped.
- Race conditions on UI transitions: A click that opens a modal changes what's visible. A parallel read that fires before the modal appears returns empty results.
- Write ordering violations: Two concurrent writes to overlapping state (clipboard, focus, selection) corrupt each other.
The parallel execution model was designed for API calls to independent services. The assumption - that you can fire multiple requests and they won't interfere with each other - breaks completely once any of those requests modifies shared state.
A concrete failing example:
```python
import asyncio

# This looks fine but breaks on desktop
async def fill_form_batch(window_ref):
    # Batch: try to do all reads at once before acting
    title_field, body_field, submit_btn = await asyncio.gather(
        find_element(window_ref, "title input"),
        find_element(window_ref, "body textarea"),
        find_element(window_ref, "submit button"),
    )
    # Problem: window_ref is a snapshot. If any prior action changed the
    # DOM/accessibility tree, these refs are stale. fill_element will
    # either fail or write to the wrong element.
    await fill_element(title_field, "New title")
    await fill_element(body_field, "New content")

# Sequential version handles state transitions correctly
async def fill_form_sequential(window_ref):
    title_field = await find_element(window_ref, "title input")
    await fill_element(title_field, "New title")
    # Re-snapshot after the action - element refs may have changed
    window_ref = await snapshot_window()
    body_field = await find_element(window_ref, "body textarea")
    await fill_element(body_field, "New content")
```
Decision Matrix
| Scenario | Recommended mode | Reason |
|---|---|---|
| Fetching data from multiple independent APIs | Batch/parallel | No shared state, latency savings are real |
| Reading multiple files before acting | Batch/parallel | Read-only, order-independent |
| Checking system status across services | Batch/parallel | Observation, no side effects |
| Writing to a file | Sequential | File system state changes on write |
| Any desktop UI interaction | Sequential | Accessibility tree state changes after every action |
| Opening app, then finding element | Sequential | Each step depends on previous |
| Multi-step form fill | Sequential | Focus, value, and DOM state all change |
| Gathering context before a plan | Batch/parallel | Pure reads to inform next decision |
| Executing a plan | Sequential | Each step modifies world state |
The general rule: batch for reads, sequential for writes. When in doubt about whether something modifies state, treat it as a write.
For desktop agents specifically, there is a further constraint: even reads after a write need to re-query state rather than reuse cached results. A click that dismisses a dialog invalidates every accessibility tree ref from before the click.
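That rule can be encoded directly in a dispatcher. A sketch, assuming a hypothetical tool table where each entry marks whether the tool mutates state: consecutive reads are batched with asyncio.gather, and every write acts as a barrier that runs alone, in order:

```python
import asyncio

# Hypothetical tool table (illustrative names); "writes" marks state mutation.
TOOLS = {
    "read_file":  {"fn": lambda a: f"contents of {a}", "writes": False},
    "get_status": {"fn": lambda a: f"status of {a}",   "writes": False},
    "click":      {"fn": lambda a: f"clicked {a}",     "writes": True},
}

async def dispatch(calls: list[tuple[str, str]]) -> list[str]:
    """Batch consecutive reads; run each write alone, in order."""
    results: list[str] = []
    reads: list[tuple[str, str]] = []

    async def run(name: str, arg: str) -> str:
        return await asyncio.to_thread(TOOLS[name]["fn"], arg)

    async def flush() -> None:
        nonlocal reads
        if reads:
            # gather preserves order, so results stay aligned with calls
            results.extend(await asyncio.gather(*(run(n, a) for n, a in reads)))
            reads = []

    for name, arg in calls:
        if TOOLS[name]["writes"]:
            await flush()                         # finish pending reads first
            results.append(await run(name, arg))  # writes go one at a time
        else:
            reads.append((name, arg))
    await flush()
    return results
```

Tools missing the "writes" flag should default to being treated as writes, matching the when-in-doubt rule above.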
The Latency Tradeoff Is Real - But Context Matters
The argument for batch is speed. LLMCompiler and similar work show genuine 2-4x latency improvements. On server-side agents doing research, data aggregation, or multi-source retrieval, these savings compound across many tool calls in a session.
But on the desktop, the cost of a wrong assumption is much higher than the cost of an extra LLM round trip. A failed batch operation on a form fill might partially write data, leaving the application in an inconsistent state that requires manual recovery. A failed sequential step is a clean retry of exactly one action.
The best frameworks expose both modes explicitly and let you choose per-operation. Sequential for actions that modify world state, batch for pure observations. Claude's agent SDK, LangGraph, and CrewAI all support this hybrid pattern - it's the right default for any agent that interacts with stateful environments.
Fazm is an open-source macOS AI agent built around sequential tool execution for desktop tasks; the source is on GitHub.