Codex Tools and Toolcalls¶
OpenAI tools extend the model's capabilities: the LLM understands the available tools and decides when and where to use them. You can read the openai tools docs for more details.
The previous post, Getting Started, covered the basic interaction with the LLM server and the data flow process.
This post continues along that data flow and focuses on the tool call part.
Overview of Tools¶
LLM Tools¶
There are various tools listed at openai docs:
- Function calling
- Shell
- MCP Server
- Skills
- ...
These tools are natively supported on the LLM server side, so the server may ask clients to execute them.
Codex Tool Definition¶
Codex defines the ToolCall struct at codex-rs/core/src/tools/router.rs:L28, which serves as a unified interface.
Codex supports only a limited set of tool kinds: it defines itself strictly as a code agent, so some tools don't make sense here.
pub struct ToolCall {
pub tool_name: String,
pub call_id: String,
pub payload: ToolPayload,
}
pub enum ToolPayload {
Function { arguments: String },
Custom { input: String },
LocalShell { params: ShellToolCallParams },
Mcp { server: String, tool: String, raw_arguments: String },
}
Toolcall Event Item Process¶
Event Processing from Raw Response¶
In the previous post, I introduced how data is dispatched in the data flow process section.
Data is dispatched via event processing:
let outcome: CodexResult<SamplingRequestResult> = loop {
let event = match stream.next().await { ... };
match event {
ResponseEvent::OutputItemAdded(item) => {
sess.emit_turn_item_started(...).await;
}
ResponseEvent::OutputTextDelta(delta) => {
sess.send_event(..., EventMsg::AgentMessageContentDelta(...)).await;
}
ResponseEvent::ReasoningContentDelta { .. } => {
sess.send_event(..., EventMsg::ReasoningRawContentDelta(...)).await;
}
ResponseEvent::OutputItemDone(item) => {
let output_result = handle_output_item_done(...).await?;
if let Some(tool_future) = output_result.tool_future {
in_flight.push_back(tool_future);
}
needs_follow_up |= output_result.needs_follow_up;
}
ResponseEvent::Completed { .. } => {
break Ok(SamplingRequestResult { needs_follow_up, ... });
}
_ => {}
}
};
The toolcall check happens at OutputItemDone, but before stepping further it helps to understand what OutputItemDone means. A streamed LLM response contains various kinds of items, and some items are sent as deltas to reduce message length.
OutputItemDone is defined by the server side, and is parsed by Codex from raw event directly inside process_responses_event(codex-rs/codex-api/src/sse/responses.rs:L231).
pub fn process_responses_event(
event: ResponsesStreamEvent,
) -> std::result::Result<Option<ResponseEvent>, ResponsesEventError> {
match event.kind.as_str() {
"response.output_item.done" => {
if let Some(item_val) = event.item {
if let Ok(item) = serde_json::from_value::<ResponseItem>(
item_val
) {
return Ok(Some(ResponseEvent::OutputItemDone(item)));
}
}
}
"response.output_text.delta" => {...}
"response.reasoning_summary_text.delta" => {...}
"response.reasoning_text.delta" => {...}
"response.created" => {...}
"response.failed" => {...}
"response.incomplete" => {...}
"response.completed" => {...}
"response.output_item.added" => {...}
"response.reasoning_summary_part.added" => {...}
_ => {...}
    }
}
OutputItemDone means an item (message, toolcall, etc.) is complete and ready to use, so the event handler checks whether it is a toolcall and, if so, dispatches it.
Note that a single LLM request turn can produce multiple OutputItemDone events: if the LLM returns 1 message and 2 toolcalls, the handler receives 3 OutputItemDone events, one per item.
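To make this concrete, here is a toy, self-contained simulation of that rule. The enum and helper below are illustrative stand-ins, not the real codex-rs types: one message plus two toolcalls yields exactly three OutputItemDone events.

```rust
// Illustrative stand-in for the streamed response events; the real
// ResponseEvent in codex-rs carries much richer payloads.
enum ResponseEvent {
    OutputItemDone(String), // payload simplified to the item's name
    OutputTextDelta(String),
    Completed,
}

/// Count how many items completed in a drained stream.
/// Each message or tool call yields exactly one OutputItemDone.
fn count_done_items(events: &[ResponseEvent]) -> usize {
    events
        .iter()
        .filter(|e| matches!(e, ResponseEvent::OutputItemDone(_)))
        .count()
}

fn main() {
    // One assistant message and two tool calls => three OutputItemDone events.
    let events = vec![
        ResponseEvent::OutputTextDelta("Hel".into()),
        ResponseEvent::OutputTextDelta("lo".into()),
        ResponseEvent::OutputItemDone("message".into()),
        ResponseEvent::OutputItemDone("tool_call_1".into()),
        ResponseEvent::OutputItemDone("tool_call_2".into()),
        ResponseEvent::Completed,
    ];
    assert_eq!(count_done_items(&events), 3);
    println!("done items: {}", count_done_items(&events));
}
```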
LLM Server Tool to Codex Toolcall¶
Converting from event to toolcall happens in build_tool_call (codex-rs/core/src/tools/router.rs:L67), which maps the different possible toolcall shapes into Codex's unified ToolCall. Function calls, custom tool calls, and local shell calls are all well understood by the LLM server side.
pub async fn build_tool_call(
session: &Session,
item: ResponseItem,
) -> Result<Option<ToolCall>, FunctionCallError> {
match item {
ResponseItem::FunctionCall { name, arguments, call_id, .. } => {...}
ResponseItem::CustomToolCall { name, input, call_id, .. } => {...}
ResponseItem::LocalShellCall { id, call_id, action, .. } => {...}
_ => {...}
}
}
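A minimal sketch of this mapping, with the types reimplemented and heavily simplified for illustration (the real ResponseItem and ToolCall carry more fields and variants):

```rust
// Simplified stand-ins for the codex-rs types; fields are trimmed down.
struct ToolCall {
    tool_name: String,
    call_id: String,
    payload: ToolPayload,
}

enum ToolPayload {
    Function { arguments: String },
    Custom { input: String },
}

enum ResponseItem {
    FunctionCall { name: String, arguments: String, call_id: String },
    CustomToolCall { name: String, input: String, call_id: String },
    Message { text: String },
}

/// Map a completed response item to a unified ToolCall, or None when
/// the item is not a tool call (e.g. a plain assistant message).
fn build_tool_call(item: ResponseItem) -> Option<ToolCall> {
    match item {
        ResponseItem::FunctionCall { name, arguments, call_id } => Some(ToolCall {
            tool_name: name,
            call_id,
            payload: ToolPayload::Function { arguments },
        }),
        ResponseItem::CustomToolCall { name, input, call_id } => Some(ToolCall {
            tool_name: name,
            call_id,
            payload: ToolPayload::Custom { input },
        }),
        ResponseItem::Message { .. } => None,
    }
}

fn main() {
    let item = ResponseItem::FunctionCall {
        name: "shell".into(),
        arguments: r#"{"command":["ls"]}"#.into(),
        call_id: "call_1".into(),
    };
    let call = build_tool_call(item).expect("function calls map to ToolCall");
    assert_eq!(call.tool_name, "shell");
    assert!(build_tool_call(ResponseItem::Message { text: "hi".into() }).is_none());
}
```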
Toolcall Event Item Process and Dispatch¶
ToolCall is constructed by handle_output_item_done inside the event processing OutputItemDone match branch.
match event {
ResponseEvent::OutputItemDone(item) => {
let output_result = handle_output_item_done(
&mut ctx,
item,
previously_active_item
)
.instrument(handle_responses)
.await?;
if let Some(tool_future) = output_result.tool_future {
in_flight.push_back(tool_future);
}
}
}
Function handle_output_item_done (codex-rs/core/src/stream_events_utils.rs:L158) inspects the item; the two most notable branches handle toolcalls and messages.
pub(crate) async fn handle_output_item_done(
    ctx: &mut HandleOutputCtx,
    item: ResponseItem,
    previously_active_item: Option<TurnItem>,
) -> Result<OutputItemResult> {
    let mut output = OutputItemResult::default(); // simplified: the real construction is elided here
    match ToolRouter::build_tool_call(
        ctx.sess.as_ref(),
        item.clone()
    ).await {
        // The model emitted a tool call; log it, persist the item immediately,
        // and queue the tool execution.
        Ok(Some(call)) => {
            let cancellation_token = ctx.cancellation_token.child_token();
            let tool_future: InFlightFuture<'static> = Box::pin(
                ctx.tool_runtime
                    .clone()
                    .handle_tool_call(call, cancellation_token),
            );
            output.needs_follow_up = true;
            output.tool_future = Some(tool_future);
        }
        Ok(None) => { ... }
        Err(...) => { ... }
    }
    Ok(output)
}
Toolcall Dispatch¶
For a toolcall, a future is constructed and dispatched via handle_tool_call (codex-rs/core/src/stream_events_utils.rs:L184). handle_tool_call spawns a Tokio task and leaves its execution to the Tokio scheduler:
pub(crate) fn handle_tool_call(...) {
let lock = Arc::clone(&self.parallel_execution);
...
let handle: AbortOnDropHandle<
Result<ResponseInputItem, FunctionCallError>
> = AbortOnDropHandle::new(tokio::spawn(async move {
let _guard = if supports_parallel {
Either::Left(lock.read().await)
} else {
Either::Right(lock.write().await)
};
router.dispatch_tool_call(...).await
}));
async move { match handle.await { ... } }
}
Toolcall Concurrent Limit¶
A single LLM turn may request multiple tool calls, so Codex uses a read-write lock to guard execution: tools that support parallel execution take the read lock and may run concurrently, while the rest take the write lock and run exclusively.
Semantically, the tool calls should not depend on each other, as there is no guarantee of the order in which Codex will execute them.
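A minimal sketch of this gating, using std::sync::RwLock and OS threads instead of Tokio's async primitives (names and behavior are illustrative, not the real codex-rs implementation):

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Tools that support parallel execution take the read lock (many at once);
// exclusive tools take the write lock and run alone.
fn run_tool(lock: Arc<RwLock<()>>, supports_parallel: bool, name: &str) -> String {
    if supports_parallel {
        let _guard = lock.read().unwrap();
        format!("{name}: ran concurrently")
    } else {
        let _guard = lock.write().unwrap();
        format!("{name}: ran exclusively")
    }
}

fn main() {
    let gate = Arc::new(RwLock::new(()));
    // Spawn one thread per tool call, sharing the same gate.
    let handles: Vec<_> = [("read_file", true), ("apply_patch", false)]
        .into_iter()
        .map(|(name, parallel)| {
            let gate = Arc::clone(&gate);
            thread::spawn(move || run_tool(gate, parallel, name))
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```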
Toolcall Execution¶
Tool call execution goes through the router's dispatch_tool_call. It first constructs a ToolInvocation, attaching the session and turn as the execution context, and then dispatches the invocation to the router's registry.
pub async fn dispatch_tool_call(
&self,
session: Arc<Session>,
turn: Arc<TurnContext>,
tracker: SharedTurnDiffTracker,
call: ToolCall,
source: ToolCallSource,
) -> Result<ResponseInputItem, FunctionCallError> {
    let ToolCall { tool_name, call_id, payload } = call;
    let invocation = ToolInvocation {
session,
turn,
tracker,
call_id,
tool_name,
payload,
};
match self.registry.dispatch(invocation).await {
Ok(response) => Ok(response),
...
}
}
The dispatcher uses a standard name-to-handler registry pattern. Each handler exposes a method typed async fn handle(&self, invocation: ToolInvocation) -> Result<ToolOutput, FunctionCallError>;.
pub async fn dispatch(
&self,
invocation: ToolInvocation,
) -> Result<ResponseInputItem, FunctionCallError> {
...
let handler = match self.handler(tool_name.as_ref()) {
Some(handler) => handler,
None => { ... }
};
let result = otel.log_tool_result_with_tags(
...,
|| {
let handler = handler.clone();
let output_cell = &output_cell;
async move {
if is_mutating {
tracing::trace!("waiting for tool gate");
invocation_for_tool
.turn
.tool_call_gate
.wait_ready()
.await;
tracing::trace!("tool gate released");
}
match handler.handle(invocation_for_tool).await {
Ok(output) => {
let preview = output.log_preview();
let success = output.success_for_logging();
let mut guard = output_cell.lock().await;
*guard = Some(output);
Ok((preview, success))
}
Err(err) => Err(err),
}
}
},
);
...
}
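The registry pattern above can be sketched in a few lines. This is a synchronous toy version under my own names (the real Codex registry stores async handlers and richer invocation types):

```rust
use std::collections::HashMap;

// A handler knows how to execute one named tool.
trait ToolHandler {
    fn handle(&self, arguments: &str) -> Result<String, String>;
}

struct EchoHandler;
impl ToolHandler for EchoHandler {
    fn handle(&self, arguments: &str) -> Result<String, String> {
        Ok(format!("echo: {arguments}"))
    }
}

struct Registry {
    handlers: HashMap<String, Box<dyn ToolHandler>>,
}

impl Registry {
    fn new() -> Self {
        Self { handlers: HashMap::new() }
    }
    fn register(&mut self, name: &str, handler: Box<dyn ToolHandler>) {
        self.handlers.insert(name.to_string(), handler);
    }
    /// Look up the handler by tool name and delegate; unknown tools error out.
    fn dispatch(&self, tool_name: &str, arguments: &str) -> Result<String, String> {
        match self.handlers.get(tool_name) {
            Some(handler) => handler.handle(arguments),
            None => Err(format!("unknown tool: {tool_name}")),
        }
    }
}

fn main() {
    let mut registry = Registry::new();
    registry.register("echo", Box::new(EchoHandler));
    assert_eq!(registry.dispatch("echo", "hi"), Ok("echo: hi".to_string()));
    assert!(registry.dispatch("missing", "").is_err());
}
```

Registering several names for one handler (as Codex does for "shell", "container.exec", and "local_shell") is just multiple register calls pointing at the same implementation.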
The real logic lives in the registered handlers; the dispatch code above only routes the call and wraps its execution.
Toolcall Register¶
File codex-rs/core/src/tools/spec.rs contains the handler registering to the registry.
builder.register_handler("shell", shell_handler.clone());
builder.register_handler("container.exec", shell_handler.clone());
builder.register_handler("local_shell", shell_handler);
builder.register_handler("shell_command", shell_command_handler);
All tools should implement the trait ToolHandler.
pub enum ToolKind {
Function,
Mcp,
}
pub trait ToolHandler: Send + Sync {
fn kind(&self) -> ToolKind;
fn matches_kind(&self, payload: &ToolPayload) -> bool {
matches!(
(self.kind(), payload),
(ToolKind::Function, ToolPayload::Function { .. })
| (ToolKind::Mcp, ToolPayload::Mcp { .. })
)
}
async fn is_mutating(&self, _invocation: &ToolInvocation) -> bool { false }
async fn handle(
&self,
invocation: ToolInvocation
) -> Result<ToolOutput, FunctionCallError>;
}
Looking at the ToolHandler trait, we can see what is expected of a Codex tool:
- kind and matches_kind: used for payload validation.
- is_mutating: a flag indicating whether the tool may modify the code project.
- handle: the real logic of the tool execution.
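The trait's default matches_kind can be sketched as a free function to show the validation it performs (types redeclared and trimmed here for illustration): a function-kind handler rejects an MCP payload and vice versa.

```rust
// Trimmed-down stand-ins for the codex-rs types.
#[derive(Clone, Copy)]
enum ToolKind { Function, Mcp }

enum ToolPayload {
    Function { arguments: String },
    Mcp { server: String, raw_arguments: String },
}

/// A handler only accepts payloads whose shape matches its kind.
fn matches_kind(kind: ToolKind, payload: &ToolPayload) -> bool {
    matches!(
        (kind, payload),
        (ToolKind::Function, ToolPayload::Function { .. })
            | (ToolKind::Mcp, ToolPayload::Mcp { .. })
    )
}

fn main() {
    let func = ToolPayload::Function { arguments: "{}".into() };
    assert!(matches_kind(ToolKind::Function, &func));
    assert!(!matches_kind(ToolKind::Mcp, &func));
}
```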
Case Study: How Shell Handler is Implemented¶
File codex-rs/core/src/tools/handlers/shell.rs defines the struct ShellHandler, which implements ToolHandler (L143) to serve as the shell tool.
is_mutating¶
async fn is_mutating(&self, invocation: &ToolInvocation) -> bool {
match &invocation.payload {
ToolPayload::Function { arguments } => {
serde_json::from_str::<ShellToolCallParams>(
arguments
)
.map(|params| !is_known_safe_command(&params.command))
.unwrap_or(true)
}
ToolPayload::LocalShell { params } => {
!is_known_safe_command(&params.command)
}
_ => true, // unknown payloads => assume mutating
}
}
There is a list of known commands treated as safe; everything else defaults to true, meaning the call is assumed to possibly mutate.
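A hypothetical sketch of this whitelist-with-default-true logic (the real is_known_safe_command in codex-rs is far richer and inspects arguments too; the list below is invented):

```rust
// Invented whitelist for illustration only; the real list is much larger
// and argument-aware.
fn is_known_safe_command(command: &[String]) -> bool {
    const SAFE: &[&str] = &["ls", "cat", "pwd", "echo"];
    match command.first() {
        Some(first) => SAFE.contains(&first.as_str()),
        None => false,
    }
}

/// Mirrors the default in is_mutating: anything that fails to parse or
/// is not provably safe is treated as potentially mutating.
fn is_mutating(parsed_command: Option<&[String]>) -> bool {
    parsed_command
        .map(|cmd| !is_known_safe_command(cmd))
        .unwrap_or(true)
}

fn main() {
    let ls = vec!["ls".to_string()];
    let rm = vec!["rm".to_string(), "-rf".to_string()];
    assert!(!is_mutating(Some(ls.as_slice())));
    assert!(is_mutating(Some(rm.as_slice())));
    assert!(is_mutating(None)); // unparseable => assume mutating
}
```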
handle¶
handle is the real implementation of the shell tool: it ultimately spawns a shell, after some setup such as argument processing and environment variable configuration.
So, beyond simply executing the command, what additional logic does the shell handler provide?
It also performs the steps below, covered in the following sub-sections (most are large components out of scope for this post, so I only summarize them).
Intercept Specific Commands¶
Some commands already have native implementations in Codex that are more capable than shelling out, so Codex intercepts these commands and uses the native implementation instead of calling the shell.
The logic lives in intercept_apply_patch (codex-rs/core/src/tools/handlers/apply_patch.rs:L194), which checks whether the command should be replaced and performs the replacement when possible.
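A hypothetical sketch of the interception idea (the ExecPlan enum and matching rule are invented for illustration; the real check parses the invocation more carefully):

```rust
// Invented types: either run the built-in apply_patch or fall back to a shell.
enum ExecPlan {
    NativeApplyPatch { patch: String },
    RunInShell(Vec<String>),
}

/// If the model shells out to apply_patch, route it to the native
/// implementation instead of spawning a shell.
fn plan_exec(command: Vec<String>) -> ExecPlan {
    match command.as_slice() {
        [prog, patch] if prog.as_str() == "apply_patch" => ExecPlan::NativeApplyPatch {
            patch: patch.clone(),
        },
        _ => ExecPlan::RunInShell(command),
    }
}

fn main() {
    let plan = plan_exec(vec!["apply_patch".into(), "*** Begin Patch".into()]);
    assert!(matches!(plan, ExecPlan::NativeApplyPatch { .. }));
    let plan = plan_exec(vec!["ls".into(), "-la".into()]);
    assert!(matches!(plan, ExecPlan::RunInShell(_)));
}
```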
UI Interaction¶
Before and after running a command, run_exec_like notifies the UI so users can see what happened.
Command Approval¶
When using Codex, triggering a tool may require the user's approval via the UI. This happens at codex-rs/core/src/tools/orchestrator.rs:L117:
let mut already_approved = false;
let requirement = tool
.exec_approval_requirement(req)
.unwrap_or_else(|| {
default_exec_approval_requirement(
approval_policy,
&turn_ctx.sandbox_policy
)
});
Sandbox¶
The sandbox is designed to ensure that shell commands are safe to execute. Inside run_exec_like (codex-rs/core/src/tools/handlers/shell.rs:L307), orchestrator.run() runs the shell inside the sandbox.
async fn run_exec_like(
args: RunExecLikeArgs
) -> Result<ToolOutput, ...> {
let mut orchestrator = ToolOrchestrator::new();
let mut runtime = ShellRuntime::new();
let out = orchestrator
.run(...)
.await
.map(|result| result.output);
}
How the sandbox works is out of scope for this blog, so we won't go further here. In short, the sandbox aims for both security and transparency, ensuring that shell invocations can't cause damage.
Network Approval¶
On Linux, Codex uses seccomp filters to block network access, ensuring scripts can't reach the network silently.
Conclusion¶
This blog has shown how tool calls are extracted from the LLM server response, and how Codex processes and finally executes them to produce a response.
You should also now know how a Codex tool is implemented, and how the shell tool runs with extra steps to ensure security.