Agent CLI Framework Differences: Sequential vs Batch Tool Calling

Matthew Diakonov

Most agent CLI frameworks offer two modes of tool execution: sequential and batch (parallel). Sequential runs one tool at a time, waiting for the result before deciding the next step. Batch fires off multiple tool calls in the same LLM turn and processes all results at once.

The right choice depends heavily on what your agent is actually doing. For data retrieval across independent APIs, batch can cut latency by 3-4x. For desktop automation, sequential almost always wins - and using batch on stateful desktop tasks leads to subtle, hard-to-debug failures.

This post breaks down how each major framework implements both modes, with code examples showing the difference, and a decision matrix for picking the right one.


How Each Framework Implements Tool Calling

Claude (Anthropic)

Claude's API returns tool calls as tool_use content blocks in an assistant message. By default, Claude may return multiple tool calls in a single response - this is its natural "batch" mode. You can enforce fully sequential behavior by setting disable_parallel_tool_use on the tool_choice parameter, and reinforce it by returning results one at a time and structuring your system prompt accordingly.

As of early 2025, Anthropic added a token-efficient-tools-2025-02-19 beta header that reduces the output-token overhead of tool calls. For Claude Code specifically (the agentic CLI), the claude-agent-sdk provides first-class support for orchestrating sub-agents either sequentially or in parallel via asyncio.gather().

Sequential pattern with Claude:

import anthropic

client = anthropic.Anthropic()

def run_sequential_agent(tools, initial_message):
    messages = [{"role": "user", "content": initial_message}]

    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            # Ask Claude to emit at most one tool call per turn
            tool_choice={"type": "auto", "disable_parallel_tool_use": True},
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            return response.content

        # Process exactly one tool at a time, wait for result
        tool_use = next(
            block for block in response.content
            if block.type == "tool_use"
        )

        result = execute_tool(tool_use.name, tool_use.input)

        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{"type": "tool_result",
                         "tool_use_id": tool_use.id,
                         "content": result}]
        })

Batch pattern with Claude:

import asyncio

def run_batch_agent(tools, initial_message):
    messages = [{"role": "user", "content": initial_message}]

    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            return response.content

        # Collect ALL tool calls from this turn
        tool_calls = [b for b in response.content if b.type == "tool_use"]

        # Execute all of them concurrently; execute_tools_parallel is an async
        # helper that fans the calls out (e.g. with asyncio.gather)
        results = asyncio.run(execute_tools_parallel(tool_calls))

        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": tc.id, "content": r}
                for tc, r in zip(tool_calls, results)
            ]
        })

OpenAI (GPT-4o, o1, o3)

OpenAI's API exposes a parallel_tool_calls parameter that controls this directly. When parallel_tool_calls=True (the default for most models), the model can return multiple tool_calls in a single assistant message. Setting it to false enforces a strict one-tool-per-turn pattern.

from openai import OpenAI

client = OpenAI()

# Force sequential - one tool per turn
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=False,   # sequential
)

# Allow batch - model decides how many tools to call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=True,    # batch (default)
)

OpenAI's Agents SDK adds a higher-level abstraction. You can run multiple sub-agents in parallel using asyncio.gather(), or chain them sequentially with explicit handoffs. The SDK documentation notes that the "agent as tool" pattern (where one agent calls another as a tool) adds planner latency overhead but lets the orchestrator dynamically decide execution order - useful when dependencies aren't known ahead of time.
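
A minimal sketch of that parallel fan-out with the Agents SDK (the agent names, instructions, and inputs are placeholders, not from the SDK docs):

import asyncio
from agents import Agent, Runner

news_agent = Agent(name="news", instructions="Summarize recent AI agent news.")
repo_agent = Agent(name="repos", instructions="List notable new agent frameworks.")

async def gather_context(topic: str):
    # Both sub-agents are independent reads, so they can run concurrently
    news, repos = await asyncio.gather(
        Runner.run(news_agent, topic),
        Runner.run(repo_agent, topic),
    )
    return news.final_output, repos.final_output

asyncio.run(gather_context("agent CLI frameworks"))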

LangChain and LangGraph

LangChain's AgentExecutor runs tools sequentially by default. LangGraph, their newer graph-based orchestration layer, makes the choice explicit: nodes with no dependency edge between them can run in parallel, while nodes run sequentially when one node's output feeds into the next.

from langgraph.graph import StateGraph, START, END

# Sequential: each step depends on the previous
workflow = StateGraph(AgentState)   # AgentState is your shared state schema
workflow.add_node("search", search_node)
workflow.add_node("read", read_node)        # waits for search
workflow.add_node("summarize", sum_node)    # waits for read
workflow.add_edge(START, "search")
workflow.add_edge("search", "read")
workflow.add_edge("read", "summarize")
workflow.add_edge("summarize", END)

# Parallel: independent nodes run concurrently, then fan in
workflow = StateGraph(AgentState)
workflow.add_node("fetch_docs", fetch_node)
workflow.add_node("check_status", status_node)   # runs at the same time
workflow.add_node("aggregate", aggregate_node)   # waits for both
workflow.add_edge(START, "fetch_docs")
workflow.add_edge(START, "check_status")
workflow.add_edge(["fetch_docs", "check_status"], "aggregate")
workflow.add_edge("aggregate", END)

LangGraph makes the dependency graph visible in code - a clear advantage over implicit parallelism. When a node produces state that another node consumes, LangGraph serializes them automatically.

LlamaIndex

LlamaIndex's ReActAgent is sequential by default. Its ParallelAgentRunner (available in llama_index.core.agent) lets you distribute independent sub-tasks across multiple agent workers. LlamaIndex's async-first design makes parallel retrieval from multiple data sources straightforward - a common pattern for RAG pipelines where multiple indexes need to be queried before synthesis.
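
A sketch of that parallel-retrieval pattern, assuming a set of already-built indexes (index construction and the question are illustrative):

import asyncio
from llama_index.core import VectorStoreIndex

async def retrieve_all(question: str, indexes: list[VectorStoreIndex]) -> list[str]:
    # Each index is an independent, read-only source, so the queries can fan out
    engines = [idx.as_query_engine() for idx in indexes]
    responses = await asyncio.gather(*(eng.aquery(question) for eng in engines))
    return [str(r) for r in responses]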

Open-source: AutoGen, CrewAI, smolagents

AutoGen (Microsoft) models agents as conversational units. Sequential execution is the default - agents take turns. Parallel execution requires explicitly spawning multiple agents and using a custom aggregator. The framework is designed around human-in-the-loop workflows, so sequential is the natural pattern.
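
A sketch of that default turn-taking setup with the classic AssistantAgent/UserProxyAgent pair (the llm_config and message are placeholders):

from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    "assistant",
    llm_config={"config_list": [{"model": "gpt-4o"}]},  # placeholder config
)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",     # fully automated turn-taking
    code_execution_config=False,  # no local code execution in this sketch
)

# Agents alternate turns: each reply waits for the previous one to finish
user_proxy.initiate_chat(assistant, message="Summarize the failing test output.")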

CrewAI assigns agents to tasks with explicit sequential or parallel process modes:

from crewai import Crew, Process

# Sequential: task 2 gets task 1's output
crew = Crew(agents=[agent1, agent2], tasks=[task1, task2],
            process=Process.sequential)

# Hierarchical (parallel with manager): manager coordinates workers
crew = Crew(agents=[agent1, agent2], tasks=[task1, task2],
            process=Process.hierarchical, manager_llm=gpt4)

smolagents (Hugging Face) is explicitly sequential - it was designed for single-agent code-writing workflows. Parallelism is handled by writing Python code that uses concurrent.futures within a code-execution tool step.
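
The code a single code-execution step emits to parallelize independent reads might look like this (the URLs and helper are placeholders):

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch_page(url: str) -> str:
    # Plain blocking fetch; concurrency comes from the thread pool below
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# Independent fetches run concurrently inside one agent step
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    pages = list(pool.map(fetch_page, urls))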


Latency: What the Numbers Actually Show

Research on LLMCompiler (ICML 2024) - a framework specifically designed for parallel function calling - showed:

  • 1.4x to 3.7x latency reduction on tasks with multiple independent tool calls
  • Up to 6x cost reduction in some scenarios by reducing total LLM turns
  • ~9% accuracy improvement over ReAct (sequential) on certain benchmarks, because parallel information gathering reduces the chance of the model hallucinating intermediate states

The math is straightforward. Four independent API calls at 300ms each:

Approach                                     Total latency
Sequential                                   ~1,200ms
Parallel (batch)                             ~300ms + overhead
Mixed (reads parallel, writes sequential)    ~400-600ms
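
A toy simulation makes the arithmetic concrete, using asyncio.sleep as a stand-in for a 300ms API call:

import asyncio
import time

async def fake_api_call():
    await asyncio.sleep(0.3)  # stand-in for a 300ms network request

async def sequential():
    for _ in range(4):
        await fake_api_call()  # ~1,200ms total

async def parallel():
    await asyncio.gather(*(fake_api_call() for _ in range(4)))  # ~300ms total

for runner in (sequential, parallel):
    start = time.perf_counter()
    asyncio.run(runner())
    print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")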

SimpleTool (2025) demonstrated 3-6x speedup with sub-100ms function calling for 4B-scale models, suggesting the latency advantage of parallel calling is a consistent finding, not an edge case.

However, these benchmarks measure tasks where tool calls are genuinely independent. On desktop automation tasks, where most calls are state-dependent, the sequential vs. parallel framing changes completely.


Why Batch Fails on Desktop

Desktop tasks have a property that most benchmark tasks don't: every action changes the state that the next action depends on.

Consider a simple form-filling task:

  1. Find the application window
  2. Click the input field
  3. Read the current value
  4. Clear the field
  5. Type the new value
  6. Click submit
  7. Verify the confirmation appeared

Steps 2-7 each depend on the previous step completing successfully. A batch agent that tries to parallelize steps 2, 3, and 4 will query the accessibility tree for a field that hasn't been clicked yet, get a stale result, and either fail silently or write to the wrong element.

This isn't a theoretical problem. In practice, batch desktop agents fail in ways that are hard to detect:

  • Stale element references: The accessibility tree ref from step 2 may no longer be valid after step 3 changes UI state. macOS accessibility element refs are snapshot-scoped.
  • Race conditions on UI transitions: A click that opens a modal changes what's visible. A parallel read that fires before the modal appears returns empty results.
  • Write ordering violations: Two concurrent writes to overlapping state (clipboard, focus, selection) corrupt each other.

The parallel execution model was designed for API calls to independent services. The assumption - that you can fire multiple requests and they won't interfere with each other - breaks completely once any of those requests modifies shared state.

A concrete failing example:

import asyncio

# This looks fine but breaks on desktop
async def fill_form_batch(window_ref):
    # Batch: try to do all reads at once before acting
    title_field, body_field, submit_btn = await asyncio.gather(
        find_element(window_ref, "title input"),
        find_element(window_ref, "body textarea"),
        find_element(window_ref, "submit button"),
    )
    # Problem: window_ref is a snapshot. If any prior action
    # changed the DOM/accessibility tree, these refs are stale.
    # fill_element will either fail or write to the wrong element.
    await fill_element(title_field, "New title")
    await fill_element(body_field, "New content")

# Sequential version handles state transitions correctly
async def fill_form_sequential(window_ref):
    title_field = await find_element(window_ref, "title input")
    await fill_element(title_field, "New title")
    # Re-snapshot after action - element refs may have changed
    window_ref = await snapshot_window()
    body_field = await find_element(window_ref, "body textarea")
    await fill_element(body_field, "New content")

Decision Matrix

Scenario                                       Recommended mode   Reason
Fetching data from multiple independent APIs   Batch/parallel     No shared state, latency savings are real
Reading multiple files before acting           Batch/parallel     Read-only, order-independent
Checking system status across services         Batch/parallel     Observation, no side effects
Writing to a file                              Sequential         File system state changes on write
Any desktop UI interaction                     Sequential         Accessibility tree state changes after every action
Opening app, then finding element              Sequential         Each step depends on previous
Multi-step form fill                           Sequential         Focus, value, and DOM state all change
Gathering context before a plan                Batch/parallel     Pure reads to inform next decision
Executing a plan                               Sequential         Each step modifies world state

The general rule: batch for reads, sequential for writes. When in doubt about whether something modifies state, treat it as a write.

For desktop agents specifically, there is a further constraint: even reads after a write need to re-query state rather than reuse cached results. A click that dismisses a dialog invalidates every accessibility tree ref from before the click.


The Latency Tradeoff Is Real - But Context Matters

The argument for batch is speed. LLMCompiler and similar work show genuine 2-4x latency improvements. On server-side agents doing research, data aggregation, or multi-source retrieval, these savings compound across many tool calls in a session.

But on the desktop, the cost of a wrong assumption is much higher than the cost of an extra LLM round trip. A failed batch operation on a form fill might partially write data, leaving the application in an inconsistent state that requires manual recovery. A failed sequential step is a clean retry of exactly one action.

The best frameworks expose both modes explicitly and let you choose per-operation. Sequential for actions that modify world state, batch for pure observations. Claude's agent SDK, LangGraph, and CrewAI all support this hybrid pattern - it's the right default for any agent that interacts with stateful environments.
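
A minimal sketch of that hybrid dispatch, assuming each tool call carries a name and an id, and that execute_tool is your own async dispatcher (all names here are illustrative):

import asyncio

READ_ONLY_TOOLS = {"read_file", "get_status", "search_docs"}

async def run_tool_calls(tool_calls):
    results = {}
    reads = [tc for tc in tool_calls if tc.name in READ_ONLY_TOOLS]
    writes = [tc for tc in tool_calls if tc.name not in READ_ONLY_TOOLS]

    # Pure observations: safe to fan out concurrently
    read_results = await asyncio.gather(*(execute_tool(tc) for tc in reads))
    results.update({tc.id: r for tc, r in zip(reads, read_results)})

    # State-modifying actions: strictly one at a time, in order
    for tc in writes:
        results[tc.id] = await execute_tool(tc)

    return results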


Fazm is an open-source macOS AI agent built around sequential tool execution for desktop tasks. The source is on GitHub.
