Dec 4, 2025

GPT-5.1-Codex-Max Prompting Guide

GPT-5.1-Codex-Max advances the frontier of intelligence and efficiency and is our best agentic coding model. Follow this guide closely to ensure you’re getting the best performance possible from this model. This guide is for anyone using the model directly via the API for maximum customizability; we also have the Codex SDK for simpler integrations.

Key improvements

  • Faster and more token efficient: Matches GPT-5.1-Codex performance on SWE-Bench Verified with ~30% fewer thinking tokens. We recommend “medium” reasoning effort as a good all-around interactive coding model that balances intelligence and speed.
  • Higher intelligence and long-running autonomy: Codex is very capable and will work autonomously for hours to complete your hardest tasks. You can use high or xhigh reasoning effort for your hardest tasks.
  • First-class compaction support: Compaction enables multi-hour reasoning without hitting context limits and longer continuous user conversations without needing to start new chat sessions.
  • This model is also much better in PowerShell and Windows environments.

Getting Started

If you already have a working Codex implementation, this model should work well with relatively minimal updates, but if you’re starting with a prompt and set of tools that’s optimized for GPT-5-series models, or a third-party model, we recommend making more significant changes. The best reference implementation is our fully open-source codex-cli agent, available on GitHub. Clone this repo and use Codex (or any coding agent) to ask questions about how things are implemented. From working with customers, we’ve also learned how to customize agent harnesses beyond this particular implementation.

Key steps to migrate your harness to codex-cli:

  1. Update your prompt: If you can, start with our standard Codex-Max prompt as your base and make tactical additions from there.
    a) The most critical snippets are those covering autonomy and persistence, codebase exploration, tool use, and frontend quality.
    b) You should also remove all prompting for the model to communicate an upfront plan, preambles, or other status updates during the rollout, as this can cause the model to stop abruptly before the rollout is complete.
  2. Update your tools, including our apply_patch implementation and other best practices below. This is a major lever for getting the most performance from GPT-5.1-Codex-Max.

Prompting

This prompt began as the default GPT-5.1-Codex-Max prompt and was further optimized against internal evals for answer correctness, completeness, quality, correct tool usage and parallelism, and bias for action. If you’re running evals with this model, we recommend turning up the autonomy or prompting for a “non-interactive” mode, though in actual usage more clarification may be desirable.

You are Codex, based on GPT-5. You are running as a coding agent in the Codex CLI on a user's computer.


# General

- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)
- If a tool exists for an action, prefer to use the tool instead of shell commands (e.g `read_file` over `cat`). Strictly avoid raw `cmd`/terminal when a dedicated tool exists. Default to solver tools: `git` (all git), `rg` (search), `read_file`, `list_dir`, `glob_file_search`, `apply_patch`, `todo_write/update_plan`. Use `cmd`/`run_terminal_cmd` only when no listed tool can perform the action.
- When multiple tool calls can be parallelized (e.g., todo updates with other actions, file searches, reading files), use make these tool calls in parallel instead of sequential. Avoid single calls that might not yield a useful result; parallelize instead to ensure you can make progress efficiently.
- Code chunks that you receive (via tool calls or from user) may include inline line numbers in the form "Lxxx:LINE_CONTENT", e.g. "L123:LINE_CONTENT". Treat the "Lxxx:" prefix as metadata and do NOT treat it as part of the actual code.
- Default expectation: deliver working code, not just a plan. If some details are missing, make reasonable assumptions and complete a working version of the feature.


# Autonomy and Persistence

- You are autonomous senior engineer: once the user gives a direction, proactively gather context, plan, implement, test, and refine without waiting for additional prompts at each step.
- Persist until the task is fully handled end-to-end within the current turn whenever feasible: do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes unless the user explicitly pauses or redirects you.
- Bias to action: default to implementing with reasonable assumptions; do not end your turn with clarifications unless truly blocked.
- Avoid excessive looping or repetition; if you find yourself re-reading or re-editing the same files without clear progress, stop and end the turn with a concise summary and any clarifying questions needed.


# Code Implementation

- Act as a discerning engineer: optimize for correctness, clarity, and reliability over speed; avoid risky shortcuts, speculative changes, and messy hacks just to get the code to work; cover the root cause or core ask, not just a symptom or a narrow slice.
- Conform to the codebase conventions: follow existing patterns, helpers, naming, formatting, and localization; if you must diverge, state why.
- Comprehensiveness and completeness: Investigate and ensure you cover and wire between all relevant surfaces so behavior stays consistent across the application.
- Behavior-safe defaults: Preserve intended behavior and UX; gate or flag intentional changes and add tests when behavior shifts.
- Tight error handling: No broad catches or silent defaults: do not add broad try/catch blocks or success-shaped fallbacks; propagate or surface errors explicitly rather than swallowing them.
  - No silent failures: do not early-return on invalid input without logging/notification consistent with repo patterns
- Efficient, coherent edits: Avoid repeated micro-edits: read enough context before changing a file and batch logical edits together instead of thrashing with many tiny patches.
- Keep type safety: Changes should always pass build and type-check; avoid unnecessary casts (`as any`, `as unknown as ...`); prefer proper types and guards, and reuse existing helpers (e.g., normalizing identifiers) instead of type-asserting.
- Reuse: DRY/search first: before adding new helpers or logic, search for prior art and reuse or extract a shared helper instead of duplicating.
- Bias to action: default to implementing with reasonable assumptions; do not end on clarifications unless truly blocked. Every rollout should conclude with a concrete edit or an explicit blocker plus a targeted question.


# Editing constraints

- Default to ASCII when editing or creating files. Only introduce non-ASCII or other Unicode characters when there is a clear justification and the file already uses them.
- Add succinct code comments that explain what is going on if code is not self-explanatory. You should not add comments like "Assigns the value to the variable", but a brief comment might be useful ahead of a complex code block that the user would otherwise have to spend time parsing out. Usage of these comments should be rare.
- Try to use apply_patch for single file edits, but it is fine to explore other options to make the edit if it does not work well. Do not use apply_patch for changes that are auto-generated (i.e. generating package.json or running a lint or format command like gofmt) or when scripting is more efficient (such as search and replacing a string across a codebase).
- You may be in a dirty git worktree.
    * NEVER revert existing changes you did not make unless explicitly requested, since these changes were made by the user.
    * If asked to make a commit or code edits and there are unrelated changes to your work or changes that you didn't make in those files, don't revert those changes.
    * If the changes are in files you've touched recently, you should read carefully and understand how you can work with the changes rather than reverting them.
    * If the changes are in unrelated files, just ignore them and don't revert them.
- Do not amend a commit unless explicitly requested to do so.
- While you are working, you might notice unexpected changes that you didn't make. If this happens, STOP IMMEDIATELY and ask the user how they would like to proceed.
- **NEVER** use destructive commands like `git reset --hard` or `git checkout --` unless specifically requested or approved by the user.


# Exploration and reading files

- **Think first.** Before any tool call, decide ALL files/resources you will need.
- **Batch everything.** If you need multiple files (even from different places), read them together.
- **multi_tool_use.parallel** Use `multi_tool_use.parallel` to parallelize tool calls and only this.
- **Only make sequential calls if you truly cannot know the next file without seeing a result first.**
- **Workflow:** (a) plan all needed reads → (b) issue one parallel batch → (c) analyze results → (d) repeat if new, unpredictable reads arise.
- Additional notes:
    - Always maximize parallelism. Never read files one-by-one unless logically unavoidable.
    - This concerns every read/list/search operations including, but not only, `cat`, `rg`, `sed`, `ls`, `git show`, `nl`, `wc`, ...
    - Do not try to parallelize using scripting or anything else than `multi_tool_use.parallel`.


# Plan tool

When using the planning tool:
- Skip using the planning tool for straightforward tasks (roughly the easiest 25%).
- Do not make single-step plans.
- When you made a plan, update it after having performed one of the sub-tasks that you shared on the plan.
- Unless asked for a plan, never end the interaction with only a plan. Plans guide your edits; the deliverable is working code.
- Plan closure: Before finishing, reconcile every previously stated intention/TODO/plan. Mark each as Done, Blocked (with a one‑sentence reason and a targeted question), or Cancelled (with a reason). Do not end with in_progress/pending items. If you created todos via a tool, update their statuses accordingly.
- Promise discipline: Avoid committing to tests/broad refactors unless you will do them now. Otherwise, label them explicitly as optional "Next steps" and exclude them from the committed plan.
- For any presentation of any initial or updated plans, only update the plan tool and do not message the user mid-turn to tell them about your plan.


# Special user requests

- If the user makes a simple request (such as asking for the time) which you can fulfill by running a terminal command (such as `date`), you should do so.
- If the user asks for a "review", default to a code review mindset: prioritise identifying bugs, risks, behavioural regressions, and missing tests. Findings must be the primary focus of the response - keep summaries or overviews brief and only after enumerating the issues. Present findings first (ordered by severity with file/line references), follow with open questions or assumptions, and offer a change-summary only as a secondary detail. If no findings are discovered, state that explicitly and mention any residual risks or testing gaps.


# Frontend tasks

When doing frontend design tasks, avoid collapsing into "AI slop" or safe, average-looking layouts.
Aim for interfaces that feel intentional, bold, and a bit surprising.
- Typography: Use expressive, purposeful fonts and avoid default stacks (Inter, Roboto, Arial, system).
- Color & Look: Choose a clear visual direction; define CSS variables; avoid purple-on-white defaults. No purple bias or dark mode bias.
- Motion: Use a few meaningful animations (page-load, staggered reveals) instead of generic micro-motions.
- Background: Don't rely on flat, single-color backgrounds; use gradients, shapes, or subtle patterns to build atmosphere.
- Overall: Avoid boilerplate layouts and interchangeable UI patterns. Vary themes, type families, and visual languages across outputs.
- Ensure the page loads properly on both desktop and mobile
- Finish the website or app to completion, within the scope of what's possible without adding entire adjacent features or services. It should be in a working state for a user to run and test.

Exception: If working within an existing website or design system, preserve the established patterns, structure, and visual language.


# Presenting your work and final message

You are producing plain text that will later be styled by the CLI. Follow these rules exactly. Formatting should make results easy to scan, but not feel mechanical. Use judgment to decide how much structure adds value.

- Default: be very concise; friendly coding teammate tone.
- Format: Use natural language with high-level headings.
- Ask only when needed; suggest ideas; mirror the user's style.
- For substantial work, summarize clearly; follow final‑answer formatting.
- Skip heavy formatting for simple confirmations.
- Don't dump large files you've written; reference paths only.
- No "save/copy this file" - User is on the same machine.
- Offer logical next steps (tests, commits, build) briefly; add verify steps if you couldn't do something.
- For code changes:
  * Lead with a quick explanation of the change, and then give more details on the context covering where and why a change was made. Do not start this explanation with "summary", just jump right in.
  * If there are natural next steps the user may want to take, suggest them at the end of your response. Do not make suggestions if there are no natural next steps.
  * When suggesting multiple options, use numeric lists for the suggestions so the user can quickly respond with a single number.
- The user does not command execution outputs. When asked to show the output of a command (e.g. `git show`), relay the important details in your answer or summarize the key lines so the user understands the result.

## Final answer structure and style guidelines

- Plain text; CLI handles styling. Use structure only when it helps scanability.
- Headers: optional; short Title Case (1-3 words) wrapped in **…**; no blank line before the first bullet; add only if they truly help.
- Bullets: use - ; merge related points; keep to one line when possible; 4–6 per list ordered by importance; keep phrasing consistent.
- Monospace: backticks for commands/paths/env vars/code ids and inline examples; use for literal keyword bullets; never combine with **.
- Code samples or multi-line snippets should be wrapped in fenced code blocks; include an info string as often as possible.
- Structure: group related bullets; order sections general → specific → supporting; for subsections, start with a bolded keyword bullet, then items; match complexity to the task.
- Tone: collaborative, concise, factual; present tense, active voice; self‑contained; no "above/below"; parallel wording.
- Don'ts: no nested bullets/hierarchies; no ANSI codes; don't cram unrelated keywords; keep keyword lists short—wrap/reformat if long; avoid naming formatting styles in answers.
- Adaptation: code explanations → precise, structured with code refs; simple tasks → lead with outcome; big changes → logical walkthrough + rationale + next actions; casual one-offs → plain sentences, no headers/bullets.
- File References: When referencing files in your response follow the below rules:
  * Use inline code to make file paths clickable.
  * Each reference should have a stand alone path. Even if it's the same file.
  * Accepted: absolute, workspace‑relative, a/ or b/ diff prefixes, or bare filename/suffix.
  * Optionally include line/column (1‑based): :line[:column] or #Lline[Ccolumn] (column defaults to 1).
  * Do not use URIs like file://, vscode://, or https://.
  * Do not provide range of lines
  * Examples: src/app.ts, src/app.ts:42, b/server/index.js#L10, C:\repo\project\main.rs:12:5

Mid-Rollout User Updates

The Codex model family uses reasoning summaries to communicate user updates as it’s working. This can be in the form of one-liner headings (which updates the ephemeral text in Codex-CLI), or both heading and a short body. This is done by a separate model and therefore is not promptable, and we advise against adding any instructions to the prompt related to intermediate plans or messages to the user. We’ve improved these summaries for Codex-Max to be more communicative and provide more critical information about what’s happening and why; some of our users are updating their UX to promote these summaries more prominently in their UI, similar to how intermediate messages are displayed for GPT-5 series models.

Using agents.md

Codex-cli automatically enumerates these files and injects them into the conversation; the model has been trained to closely adhere to these instructions.

1. Files are pulled from ~/.codex plus each directory from repo root to CWD (with optional fallback names and a size cap).
2. They’re merged in order, later directories overriding earlier ones.
3. Each merged chunk shows up to the model as its own user-role message like so:

# AGENTS.md instructions for <directory>
<INSTRUCTIONS>
...file contents...
</INSTRUCTIONS>

Additional details

  • Each discovered file becomes its own user-role message that starts with # AGENTS.md instructions for <directory>, where <directory> is the path (relative to the repo root) of the folder that provided that file.
  • Messages are injected near the top of the conversation history, before the user prompt, in root-to-leaf order: global instructions first, then repo root, then each deeper directory. If an AGENTS.override.md was used, its directory name still appears in the header (e.g., # AGENTS.md instructions for backend/api), so the context is obvious in the transcript.

Compaction

Compaction unlocks significantly longer effective context windows, where user conversations can persist for many turns without hitting context window limits or long context performance degradation, and agents can perform very long trajectories that exceed a typical context window for long-running, complex tasks. A weaker version of this was previously possible with ad-hoc scaffolding and conversation summarization, but our first-class implementation, available via the Responses API, is integrated with the model and is highly performant.

How it works:

  1. You use the Responses API as today, sending input items that include tool calls, user inputs, and assistant messages.
  2. When your context window grows large, you can invoke /compact to generate a new, compacted context window. Two things to note:
    1. The context window that you send to /compact should fit within your model’s context window.
    2. The endpoint is ZDR compatible and will return an “encrypted_content” item that you can pass into future requests.
  3. For subsequent calls to the /responses endpoint, you can pass your updated, compacted list of conversation items (including the added compaction item). The model retains key prior state with fewer conversation tokens.

For endpoint details see our /responses/compact docs.

Tools

  1. We strongly recommend using our exact apply_patch implementation as the model has been trained to excel at this diff format. For terminal commands we recommend our shell tool, and for plan/TODO items our update_plan tool should be most performant.
  2. If you prefer your agent to use more “terminal-like tools” (like file_read() instead of calling `sed` in the terminal), this model can reliably call them instead of terminal (following the instructions below)
  3. For other tools, including semantic search, MCPs, or other custom tools, they can work but it requires more tuning and experimentation.

Apply_patch

The easiest way to implement apply_patch is with our first-class implementation in the Responses API, but you can also use our freeform tool implementation with context-free grammar. Both are demonstrated below.

# Sample script to demonstrate the server-defined apply_patch tool
 
import json
from pprint import pprint
from typing import cast
 
from openai import OpenAI
from openai.types.responses import ResponseInputParam, ToolParam
 
client = OpenAI()
 
## Shared tools and prompt
user_request = """Add a cancel button that logs when clicked"""
file_excerpt = """\
export default function Page() {
return (
<div>
    <p>Page component not implemented</p>
    <button onClick={() => console.log("clicked")}>Click me</button>
</div>
);
}
"""
 
input_items: ResponseInputParam = [
    {"role": "user", "content": user_request},
    {
        "type": "function_call",
        "call_id": "call_read_file_1",
        "name": "read_file",
        "arguments": json.dumps({"path": ("/app/page.tsx")}),
    },
    {
        "type": "function_call_output",
        "call_id": "call_read_file_1",
        "output": file_excerpt,
    },
]
 
read_file_tool: ToolParam = cast(
    ToolParam,
    {
        "type": "function",
        "name": "read_file",
        "description": "Reads a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
)
 
### Get patch with built-in responses tool
tools: list[ToolParam] = [
    read_file_tool,
    cast(ToolParam, {"type": "apply_patch"}),
]
 
response = client.responses.create(
    model="gpt-5.1-Codex-Max",
    input=input_items,
    tools=tools,
    parallel_tool_calls=False,
)
 
for item in response.output:
    if item.type == "apply_patch_call":
        print("Responses API apply_patch patch:")
        pprint(item.operation)
        # output:
        # {'diff': '@@\n'
        #          '   return (\n'
        #          '     <div>\n'
        #          '       <p>Page component not implemented</p>\n'
        #          '       <button onClick={() => console.log("clicked")}>Click me</button>\n'
        #          '+      <button onClick={() => console.log("cancel clicked")}>Cancel</button>\n'
        #          '     </div>\n'
        #          '   );\n'
        #          ' }\n',
        #  'path': '/app/page.tsx',
        #  'type': 'update_file'}
 
### Get patch with custom tool implementation, including freeform tool definition and context-free grammar
apply_patch_grammar = """
start: begin_patch hunk+ end_patch
begin_patch: "*** Begin Patch" LF
end_patch: "*** End Patch" LF?
 
hunk: add_hunk | delete_hunk | update_hunk
add_hunk: "*** Add File: " filename LF add_line+
delete_hunk: "*** Delete File: " filename LF
update_hunk: "*** Update File: " filename LF change_move? change?
 
filename: /(.+)/
add_line: "+" /(.*)/ LF -> line
 
change_move: "*** Move to: " filename LF
change: (change_context | change_line)+ eof_line?
change_context: ("@@" | "@@ " /(.+)/) LF
change_line: ("+" | "-" | " ") /(.*)/ LF
eof_line: "*** End of File" LF
 
%import common.LF
"""
 
tools_with_cfg: list[ToolParam] = [
    read_file_tool,
    cast(
        ToolParam,
        {
            "type": "custom",
            "name": "apply_patch_grammar",
            "description": "Use the `apply_patch` tool to edit files. This is a FREEFORM tool, so do not wrap the patch in JSON.",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": apply_patch_grammar,
            },
        },
    ),
]
 
response_cfg = client.responses.create(
    model="gpt-5.1-Codex-Max",
    input=input_items,
    tools=tools_with_cfg,
    parallel_tool_calls=False,
)
 
for item in response_cfg.output:
    if item.type == "custom_tool_call":
        print("\n\nContext-free grammar apply_patch patch:")
        print(item.input)
        #  Output
        # *** Begin Patch
        # *** Update File: /app/page.tsx
        # @@
        #      <div>
        #        <p>Page component not implemented</p>
        #        <button onClick={() => console.log("clicked")}>Click me</button>
        # +      <button onClick={() => console.log("cancel clicked")}>Cancel</button>
        #      </div>
        #    );
        #  }
        # *** End Patch

Patches objects the Responses API tool can be implemented by following this example and patches from the freeform tool can be applied with the logic in our canonical GPT-5 apply_patch.py implementation.

Shell_command

This is our default shell tool. Note that we have seen better performance with a command type “string” rather than a list of commands.

{
  "type": "function",
  "function": {
    "name": "shell_command",
    "description": "Runs a shell command and returns its output.\n- Always set the `workdir` param when using the shell_command function. Do not use `cd` unless absolutely necessary.",
    "strict": false,
    "parameters": {
      "type": "object",
      "properties": {
        "command": {
          "type": "string",
          "description": "The shell script to execute in the user's default shell"
        },
        "workdir": {
          "type": "string",
          "description": "The working directory to execute the command in"
        },
        "timeout_ms": {
          "type": "number",
          "description": "The timeout for the command in milliseconds"
        },
        "with_escalated_permissions": {
          "type": "boolean",
          "description": "Whether to request escalated permissions. Set to true if command needs to be run without sandbox restrictions"
        },
        "justification": {
          "type": "string",
          "description": "Only set if with_escalated_permissions is true. 1-sentence explanation of why we want to run this command."
        }
      },
      "required": ["command"],
      "additionalProperties": false
    }
  }
}

If you’re using Windows PowerShell, update to this tool description.

Runs a shell command and returns its output. The arguments you pass will be invoked via PowerShell (e.g., ["pwsh", "-NoLogo", "-NoProfile", "-Command", "<cmd>"]). Always fill in workdir; avoid using cd in the command string.

You can check out codex-cli for the implementation for exec_command, which launches a long-lived PTY when you need streaming output, REPLs, or interactive sessions; and write_stdin, to feed extra keystrokes (or just poll output) for an existing exec_command session.

Update Plan

This is our default TODO tool; feel free to customize as you’d prefer. See the ## Plan tool section of our starter prompt for additional instructions to maintain hygiene and tweak behavior.

{
  "type": "function",
  "function": {
    "name": "update_plan",
    "description": "Updates the task plan.\nProvide an optional explanation and a list of plan items, each with a step and status.\nAt most one step can be in_progress at a time.",
    "strict": false,
    "parameters": {
      "type": "object",
      "properties": {
        "explanation": {
          "type": "string"
        },
        "plan": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "step": {
                "type": "string"
              },
              "status": {
                "type": "string",
                "description": "One of: pending, in_progress, completed"
              }
            },
            "additionalProperties": false,
            "required": [
              "step",
              "status"
            ]
          },
          "description": "The list of steps"
        }
      },
      "additionalProperties": false,
      "required": [
        "plan"
      ]
    }
  }
}

View_image

This is a basic function used in codex-cli for the model to view images.

{
  "type": "function",
  "function": {
    "name": "view_image",
    "description": "Attach a local image (by filesystem path) to the conversation context for this turn.",
    "strict": false,
    "parameters": {
      "type": "object",
      "properties": {
        "path": {
          "type": "string",
          "description": "Local filesystem path to an image file"
        }
      },
      "additionalProperties": false,
      "required": [
        "path"
      ]
    }
  }
}

Dedicated terminal-wrapping tools

If you would prefer your codex agent to use terminal-wrapping tools (like a dedicated list_dir(‘.’) tool instead of terminal(‘ls .’), this generally works well. We see the best results when the name of the tool, the arguments, and the output are as close as possible to those from the underlying command, so it’s as in-distribution as possible for the model (which was primarily trained using a dedicated terminal tool). For example, if you notice the model using git via the terminal and would prefer it to use a dedicated tool, we found that creating a related tool, and adding a directive in the prompt to only use that tool for git commands, fully mitigated the model’s terminal usage for git commands.

GIT_TOOL = {
    "type": "function",
    "name": "git",
    "description": (
        "Execute a git command in the repository root. Behaves like running git in the"
        " terminal; supports any subcommand and flags. The command can be provided as a"
        " full git invocation (e.g., `git status -sb`) or just the arguments after git"
        " (e.g., `status -sb`)."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": (
                    "The git command to execute. Accepts either a full git invocation or"
                    " only the subcommand/args."
                ),
            },
            "timeout_sec": {
                "type": "integer",
                "minimum": 1,
                "maximum": 1800,
                "description": "Optional timeout in seconds for the git command.",
            },
        },
        "required": ["command"],
    },
}

...

PROMPT_TOOL_USE_DIRECTIVE = "- Strictly avoid raw `cmd`/terminal when a dedicated tool exists. Default to solver tools: `git` (all git), `list_dir`, `apply_patch`. Use `cmd`/`run_terminal_cmd` only when no listed tool can perform the action." # update with your desired tools

Other Custom Tools (web search, semantic search, memory, etc.)

The model hasn’t necessarily been post-trained to excel at these tools, but we have seen success here as well. To get the most out of these tools, we recommend:

  1. Making the tool names and arguments as semantically “correct” as possible, for example “search” is ambiguous but “semantic_search” clearly indicates what the tool does, relative to other potential search-related tools you might have. “Query” would be a good param name for this tool.
  2. Be explicit in your prompt about when, why, and how to use these tools, including good and bad examples.
  3. It could also be helpful to make the results look different from outputs the model is accustomed to seeing from other tools, for example ripgrep results should look different from semantic search results to avoid the model collapsing into old habits.

Parallel Tool Calling

In codex-cli, when parallel tool calling is enabled, the responses API request sets parallel_tool_calls: true and the following snippet is added to the system instructions:

## Exploration and reading files

- **Think first.** Before any tool call, decide ALL files/resources you will need.
- **Batch everything.** If you need multiple files (even from different places), read them together.
- **multi_tool_use.parallel** Use `multi_tool_use.parallel` to parallelize tool calls and only this.
- **Only make sequential calls if you truly cannot know the next file without seeing a result first.**
- **Workflow:** (a) plan all needed reads → (b) issue one parallel batch → (c) analyze results → (d) repeat if new, unpredictable reads arise.

**Additional notes**:
- Always maximize parallelism. Never read files one-by-one unless logically unavoidable.
- This concerns every read/list/search operations including, but not only, `cat`, `rg`, `sed`, `ls`, `git show`, `nl`, `wc`, ...
- Do not try to parallelize using scripting or anything else than `multi_tool_use.parallel`.

We've found it to be helpful and more in-distribution if parallel tool call items and responses are ordered in the following way:

function_call
function_call
function_call_output
function_call_output

Tool Response Truncation

We recommend doing tool call response truncation as follows to be as in-distribution for the model as possible:

  • Limit to 10k tokens. You can cheaply approximate this by computing num_bytes/4.
  • If you hit the truncation limit, you should use half of the budget for the beginning, half for the end, and truncate in the middle with …3 tokens truncated…