1. Getting Started¶
This series of articles records my process of reading the Codex source code to deeply understand how an AI code agent works.
I focus on the high-level overview first and intentionally skip some technical details, even when they touch on high-level design, in order to build a basic understanding of an AI code agent.
Assumed background:
- You have tried LLMs via chat interfaces such as ChatGPT or Gemini.
- You have tried AI code agents such as Claude Code, Cursor, or Codex.
The fundamental idea: an LLM can understand, reason, and produce results from human-language input, while a code agent extends it with defined abilities (file read/write, shell execution, etc.).
Basic Components and Flow¶
Codex generally consists of three components: the UI, the Daemon, and the Model.
- UI: The term "UI" refers to the application driving Codex. This may be the CLI/TUI chat-like interface that users operate, or a GUI such as a VSCode extension. The UI is external to Codex, as Codex is intended to be operated by arbitrary UI implementations.
- Daemon: Contains the real implementation of Codex and handles the core logic of the code agent. It is called a daemon because it is fully decoupled from the UI and supports a variety of UIs.
- Model: The LLM side, which acts as the brain that thinks and makes decisions. It hosts models (such as gpt-5.3).
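The decoupling can be sketched as three minimal interfaces. This is a toy model, not Codex's actual types: `Submission`, `Event`, and the class shapes here are all illustrative.

```python
from dataclasses import dataclass

# Hypothetical message types; Codex's real protocol types differ.
@dataclass
class Submission:   # UI -> Daemon
    text: str

@dataclass
class Event:        # Daemon -> UI
    text: str

class Model:
    """Stand-in for the LLM API: text in, decision out."""
    def complete(self, prompt: str) -> str:
        return f"(model reply to: {prompt})"

class Daemon:
    """Core logic: mediates between any UI and the Model."""
    def __init__(self, model: Model):
        self.model = model

    def handle(self, sub: Submission) -> Event:
        return Event(self.model.complete(sub.text))

# Any UI (TUI, VSCode extension, ...) only ever talks to the Daemon:
daemon = Daemon(Model())
print(daemon.handle(Submission("count Go LOC")).text)
```

The point of the shape: swapping the UI never touches `Daemon`, and swapping the model provider only touches `Model`.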
The diagram below shows how an AI code agent handles the task "how many Go code lines are under a project?".
sequenceDiagram
autonumber
actor User
participant Daemon
participant Model
User ->>+ Daemon: Query Go LOC
Note over Daemon: Append prompts & tools
Daemon ->>+ Model: Forward query
Model -->>- Daemon: Tool Call: shell (wc)
Note over Daemon: Exec: find . -name '*.go' | xargs wc -l
Daemon ->>+ Model: Output: 1006 total
Model -->>- Daemon: Return formatted response
Note right of Model: Task finished. There are total<br>1006 go code lines under<br>this project.
Daemon -->>- User: Deliver final message
Note right of Daemon: There are total 1006 go code<br>lines under this project.

Communications Between Components¶
The UI and the Daemon communicate, as do the Daemon and the Model, but the UI never talks to the Model directly.
UI and Daemon¶
- Custom Protocol for CLI/TUI
  Codex defines protocol v1 for internal usage. It defines an SQ (Submission Queue) for UI-to-Daemon messages and an EQ (Event Queue) for Daemon-to-UI messages. See the docs and code for more details.
- App-Server for Extensions
  JSON-RPC 2.0 is used for communication; see the codex app-server docs for details.
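For a feel of the wire format, a JSON-RPC 2.0 exchange has this shape. The method name and params below are illustrative, not the app-server's actual method set; only the `jsonrpc`/`id`/`method`/`params`/`result` framing comes from the JSON-RPC 2.0 spec.

```python
import json

# Request from the extension to the app-server.
request = {
    "jsonrpc": "2.0",
    "id": 1,                          # correlates request and response
    "method": "sendUserMessage",      # illustrative method name
    "params": {"text": "how many go code lines under a project"},
}

# Response from the app-server; "id" must match the request.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"status": "accepted"},
}

wire = json.dumps(request)            # what actually travels over the transport
print(wire)
```

Because the framing is plain JSON with request/response correlation by `id`, any editor extension that speaks JSON-RPC can drive the Daemon.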
Daemon and Model¶
Daemon talks to the Model via OpenAI's Responses API at /v1/responses.
- HTTP + SSE
  The Daemon calls POST /v1/responses (the response create API). The request is JSON; the response is an SSE stream.
- WebSocket
  When supported, the Daemon may use WebSocket for the same endpoint to reuse connections and send incremental requests. It falls back to HTTP if WebSocket fails.
See the OpenAI Responses API and codex-rs/codex-api/README.md for details.
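A minimal sketch of consuming such a stream, assuming the standard SSE framing (`event:`/`data:` fields, events separated by a blank line). The event names follow the Responses API's streaming events, but the payloads here are simplified, not the exact schema:

```python
import json

def parse_sse(stream: str):
    """Yield (event, data) pairs from raw SSE text.
    Per the SSE format, an empty line terminates each event."""
    event, data_lines = None, []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = None, []

raw = (
    "event: response.output_text.delta\n"
    'data: {"delta": "1006"}\n'
    "\n"
    "event: response.completed\n"
    'data: {"status": "completed"}\n'
    "\n"
)
events = list(parse_sse(raw))
print(events)
```

Streaming is what lets the UI render text token by token instead of waiting for the whole model response.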
Model: LLM API Provider¶
The LLM is the brain; the code agent receives instructions from it and extends the LLM's abilities. Codex uses OpenAI's /v1/responses (response create) API to interact with the LLM.
Unlike plain chat, the /v1/responses API allows flexible inputs and outputs. Some extra concepts to understand in the API:
- Tool call: Function calling lets you connect models to external tools and APIs. Instead of generating a text response, the model determines when to call specific functions and provides the parameters needed to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data.
- Structured response: A structured response asks the LLM to generate output that adheres to a provided JSON Schema. This ensures predictable, type-safe results and simplifies extracting structured data from unstructured text.
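A sketch of what these two look like as payloads. The shapes follow the Responses API's function tools and `json_schema` output format in simplified form; the tool and schema names are illustrative:

```python
import json

# A function tool definition the Daemon would advertise to the model.
shell_tool = {
    "type": "function",
    "name": "shell",
    "description": "Run a shell command and return its output",
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}

# Instead of text, the model may answer with a tool call like this;
# the arguments arrive as a JSON-encoded string the Daemon must parse.
tool_call = {
    "type": "function_call",
    "name": "shell",
    "arguments": json.dumps({"command": "find . -name '*.go' | xargs wc -l"}),
}
args = json.loads(tool_call["arguments"])

# A structured-response schema forcing a typed final answer.
loc_schema = {
    "type": "json_schema",
    "name": "loc_report",                 # illustrative schema name
    "schema": {
        "type": "object",
        "properties": {"total_lines": {"type": "integer"}},
        "required": ["total_lines"],
        "additionalProperties": False,
    },
}
print(args["command"])
```

Tool calls are how the model reaches out into the world; structured responses are how the world gets a machine-readable answer back.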
Daemon: Codex Core Engine Flow¶
The Codex docs have a comprehensive explanation of the terms inside Codex for developers.
The Daemon's core engine is Codex, and it runs locally, either in a background thread or a separate process. It takes user input, makes requests to the Model, executes commands, and applies patches.
Session, Task, and Turn¶
Note: Be careful not to confuse the core engine Turn with the Turn defined in the internal protocol. In Codex's internal protocol representation, a Turn might conceptually imply a user-agent exchange loop, but at the core execution level, Task and Turn are strictly 1:1.
- Session: One UI window = one Session. Holds the entire conversation context and state.
- Task: A single user request. It is essentially a tokio::spawn JoinHandle wrapper that tracks the cancellation and lifecycle of a user request.
- Turn: The actual execution unit driving the LLM interactions. A Turn handles the multi-round sampling loop (prompting, parsing tool calls, executing tools, re-prompting) until the LLM decides the job is complete.
Relationship: \(Session \supset Tasks \equiv Turns\)
Since a Task delegates its entire lifecycle to a single Turn, their relationship is strictly 1:1. When a Turn ends (the sampling loop finishes), the Task finishes. When the user inputs again, it creates a new Task (and thus a new Turn), rather than creating a new Turn inside an existing Task.
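The strict 1:1 relationship can be sketched with Python's asyncio standing in for tokio. This is a toy model under that analogy, not Codex's actual code; `on_user_input` and `run_turn` are invented names:

```python
import asyncio

async def run_turn(text: str) -> str:
    """The single Turn driving the sampling loop for this Task."""
    return f"done: {text}"

class Session:
    """One UI window. Spawns exactly one Task per user input."""
    def __init__(self):
        self.current_task = None

    async def on_user_input(self, text: str) -> str:
        # New input aborts any still-running previous Task.
        if self.current_task and not self.current_task.done():
            self.current_task.cancel()
        # A NEW Task wraps a NEW Turn; the Task ends when its Turn ends.
        self.current_task = asyncio.create_task(run_turn(text))
        return await self.current_task

async def main():
    s = Session()
    first = await s.on_user_input("input 1")    # Task 1 / Turn 1
    second = await s.on_user_input("input 2")   # Task 2 / Turn 2
    return first, second

print(asyncio.run(main()))
```

Note that the second input does not resume Task 1; it spawns a fresh Task whose whole lifecycle is its single Turn, which is exactly the 1:1 claim above.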
sequenceDiagram
autonumber
participant User
participant Codex
participant Session
participant Task as Task (JoinHandle)
participant Turn as Turn (Execution)
participant Model
User ->> Codex: Configure
Codex ->> Session: Initialize Session
Note over Session: Session is maintained for the current UI state
User ->> Session: Provide Input 1
Note over Session: Abort any previous Tasks
Session ->>+ Task: Spawn Task 1
Task ->>+ Turn: Start Turn 1
Note over Turn, Model: Turn 1 manages the multi-round LLM sampling loop
%% Turn Sampling Loop
loop Multi-round Tool Call / Sampling
Turn ->>+ Model: Request (Prompt/Context)
Model -->>- Turn: Response (Stream: Text or Tool Call)
opt Tool Called
Turn ->> Turn: Execute Local Tool (e.g. bash, read_file)
end
end
Turn -->>- Task: Turn 1 completes
Task -->>- Session: Task 1 ends (JoinHandle resolves)
Note over Session: Session waits for new user input
User ->> Session: Provide Input 2
Note over Session: New user input starts a NEW Task (and NEW Turn)
Session ->>+ Task: Spawn Task 2
Task ->>+ Turn: Start Turn 2
Note over Turn: Executes sampling loop...
Turn -->>- Task: Turn 2 completes
Task -->>- Session: Task 2 ends
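The multi-round sampling loop inside a Turn can be reduced to a toy sketch. The model here is a stub that scripts the LOC example from earlier; the real Daemon streams its decisions from /v1/responses and enforces sandbox/approval policy before executing anything:

```python
import subprocess

def fake_model(history):
    """Stub model: first asks for a shell tool call, then finishes.
    The real Daemon gets these decisions from the Model."""
    if not any(m["role"] == "tool" for m in history):
        return {
            "type": "tool_call",
            "name": "shell",
            "arguments": {"command": "echo 1006 total"},
        }
    total = history[-1]["content"].split()[0]
    return {"type": "message", "text": f"There are {total} Go lines."}

def run_turn(user_input: str) -> str:
    history = [{"role": "user", "content": user_input}]
    while True:  # the multi-round sampling loop
        decision = fake_model(history)
        if decision["type"] == "tool_call":
            # Execute the tool locally and feed the output back in.
            out = subprocess.run(
                decision["arguments"]["command"],
                shell=True, capture_output=True, text=True,
            ).stdout.strip()
            history.append({"role": "tool", "content": out})
        else:
            # The model decided the job is complete: the Turn ends.
            return decision["text"]

print(run_turn("how many go code lines under a project"))
```

Everything in the diagrams above is elaboration of this loop: request the model, execute any tool it asks for, append the result, and repeat until the model emits a final message.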