Comments Page - Unrolling the Codex agent loop

« Back Unrolling the Codex agent loopopenai.comSubmitted by tosh 3 hours ago

postalcoder 14 minutes ago
The best part about this blog post is that none of it is a surprise – Codex CLI is open source. It's nice to be able to understand the internals without reverse engineering it. My only gripe is that they don't allow Pro users to run builds of Codex that have custom developer (system) instructions.
Their communication is exceptional, too. Eric Traut (of Pyright fame) is all over the issues and PRs.
https://github.com/openai/codex
westoncb 13 minutes ago
Interesting that compaction is done using an encrypted message that "preserves the model's latent understanding of the original conversation":
> Since then, the Responses API has evolved to support a special /responses/compact endpoint (opens in a new window) that performs compaction more efficiently. It returns a list of items (opens in a new window) that can be used in place of the previous input to continue the conversation while freeing up the context window. This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation. Now, Codex automatically uses this endpoint to compact the conversation when the auto_compact_limit (opens in a new window) is exceeded.
jumploops 2 hours ago
One thing that surprised me when diving into the Codex internals was that the reasoning tokens persist during the agent tool call loop, but are discarded after every user turn.
This helps preserve context over many turns, but it can also mean some context is lost between two related user turns.
A strategy that's helped me here, is having the model write progress updates (along with general plans/specs/debug/etc.) to markdown files, acting as a sort of "snapshot" that works across many context windows.
- olliepro 23 minutes ago
  I made a skill that reflects on past conversations via parallel headless codex sessions. Its great for context building. Repo: https://github.com/olliepro/Codex-Reflect-Skill
- CjHuber an hour ago
  It depends on the API path. Chat completions does what you describe, however isn't it legacy?
  I've only used codex with the responses v1 API and there it's the complete opposite. Already generated reasoning tokens even persist when you send another message (without rolling back) after cancelling turns before they have finished the thought process
  Also with responses v1 xhigh mode eats through the context window multiples faster than the other modes, which does check out with this.
- ljm 37 minutes ago
  I’ve been using agent-shell in emacs a lot and it stores transcripts of the entire interaction. It’s helped me out lot of times because I can say ‘look at the last transcript here’.
  It’s not the responsibility of the agent to write this transcript, it’s emacs, so I don’t have to worry about the agent forgetting to log something. It’s just writing the buffer to disk.
- crorella 2 hours ago
  Same here! I think it would be good if this could be made by default by the tooling. I've seen others using SQL for the same and even the proposal for a succinct way of representing this handoff data in the most compact way.
- behnamoh an hour ago
  but that's why I like Codex CLI, it's so bare bone and lightweight that I can build lots tools on top of it. persistent thinking tokens? let me have that using a separate file the AI writes to. the reasoning tokens we see aren't the actual tokens anyway; the model does a lot more behind the scenes but the API keeps them hidden (all providers do that).
  postalcoder 39 minutes ago
  Codex is wicked efficient with context windows with the tradeoff of time spent. It hurts the flow state, but overall I've found that it's the best at having long conversations/coding sessions.
  behnamoh 35 minutes ago
  yeah it throws me out of the "flow", which I don't like. maybe the cerebras deal helps with that.
  postalcoder 27 minutes ago
  It's worth it at the end of the day because it tends to properly scope out changes and generate complete edits, whereas I always have to bring Opus around to fix things it didn't fix or manually loop in some piece of context that it didn't find before.
  That said, faster inference can't come soon enough.
  behnamoh 23 minutes ago
  > That said, faster inference can't come soon enough.
  why is that? technical limits? I know cerebras struggles with compute and they stopped their coding plan (sold out!). their arch also hasn't been used with large models like gpt-5.2. the largest they support (if not quantized) is glm 4.7 which is <500B params.
- sdwr 2 hours ago
  That could explain the "churn" when it gets stuck. Do you think it needs to maintain an internal state over time to keep track of longer threads, or are written notes enough to bridge the gap?
- EnPissant an hour ago
  I don't think this is true.
  I'm pretty sure that Codex uses reasoning.encrypted_content=true and store=false with the responses API.
  reasoning.encrypted_content=true - The server will return all the reasoning tokens in an encrypted blob you can pass along in the next call. Only OpenaAI can decrypt them.
  store=false - The server will not persist anything about the conversation on the server. Any subsequent calls must provide all context.
  Combined the two above options turns the responses API into a stateless one. Without these options it will still persist reasoning tokens in a agentic loop, but it will be done statefully without the client passing the reasoning along each time.
- vmg12 2 hours ago
  I think this explains why I'm not getting the most out of codex, I like to interrupt and respond to things i see in reasoning tokens.
  behnamoh an hour ago
  that's the main gripe I have with codex; I want better observability into what the AI is doing to stop it if I see it going down the wrong path. in CC I can see it easily and stop and steer the model. in codex, the model spends 20m only for it to do something I didn't agree on. it burns OpenAI tokens too; they could save money by supporting this feature!
  zeroxfe an hour ago
  You're in luck -- /experimetal -> enable steering.
  behnamoh an hour ago
  I first need to see real time AI thoughts before I can steer it tho! Codex hides most of them
mohsen1 5 minutes ago
Tool call during thinking is something similar to this I am guessing. Deepseek has a paper on this.
Or am I not understanding this right?
coffeeaddict1 an hour ago
What I really want from Codex is checkpoints ala Copilot. There are a couple of issues [0][1] opened about on GitHub, but it doesn't seem a priority for the team.
[0] https://github.com/openai/codex/issues/2788
[1] https://github.com/openai/codex/issues/3585
- wahnfrieden an hour ago
  They routinely mention in GitHub that they heavily prioritize based on "upvotes" (emoji reacts) in GitHub issues, and they close issues that don't receive many. So if you want this, please "upvote" those issues.
tecoholic an hour ago
I use 2 cli - Codex and Amp. Almost every time I need a quick change, Amp finishes the task in the time it takes Codex to build context. I think it’s got a lot to do with the system prompt and a the “read loop” as well, amp would read multiple files in one go and get to the task, but codex would crawl the files almost one by one. Anyone noticed this?
- sumedh an hour ago
  Which Gpt model and reasoning level did you use in Codex and Amp?
  Generally I have noticed Gpt 5.2 codex is slower compared to Sonnet 4.5 in Claude Code.
mkw5053 2 hours ago
I guess nothing super surprising or new but still valuable read. I wish it was easier/native to reflect on the loop and/or histories while using agentic coding CLIs. I've found some success with an MCP that let's me query my chat histories, but I have to be very explicit about it's use. Also, like many things, continuous learning would probably solve this.
dfajgljsldkjag 2 hours ago
The best part about this is how the program acts like a human who is learning by doing. It is not trying to be perfect on the first try, it is just trying to make progress by looking at the results. I think this method is going to make computers much more helpful because they can now handle the messy parts of solving a problem.
rvnx an hour ago
Codex agent loop:
```
    Call the model. If it asks for a tool, run the tool and call again (with the new result appended). Otherwise, done
```
https://i.ytimg.com/vi/74U04h9hQ_s/maxresdefault.jpg
- jmkni an hour ago
  I think this should be called the Homer Simpson loop, it seems more apt
  rvnx an hour ago
  They sadly renamed the Ralph Wiggum loop due to copyright concerns so little hope for Homer :(
  https://github.com/anthropics/claude-plugins-official/commit...
  jmkni an hour ago
  ha I didn't know that, very interesting
written-beyond 2 hours ago
Has anyone seriously used codex cli? I was using LLMs for code gen usually through the vscode codex extension, Gemini cli and Claude Code cli. The performance of all 3 of them is utter dog shit, Gemini cli just randomly breaks and starts spamming content trying to reorient itself after a while.
However, I decided to try codex cli after hearing they rebuilt it from the ground up and used rust(instead of JS, not implying Rust==better). It's performance is quite literally insane, its UX is completely seamless. They even added small nice to haves like ctrl+left/right to skip your cursor to word boundaries.
If you haven't I genuinely think you should give it a try you'll be very surprised. Saw Theo(yc ping labs) talk about how open ai shouldn't have wasted their time optimizing the cli and made a better model or something. I highly disagree after using it.
- ewoodrich 2 hours ago
  OpenCode also has an extremely fast and reliable UI compared to the other CLIs. I’ve been using Codex more lately since I’m cancelling my Claude Pro plan and it’s solid but haven’t spent nearly as much time compared to Claude Code or Gemini CLI yet.
  But tbh OpenAI openly supporting OpenCode is the bigger draw for me on the plan but do want to spend more time with native Codex as a base of comparison against OpenCode when using the same model.
  I’m just happy to have so many competitive options, for now at least.
  behnamoh an hour ago
  Seconded. I find codex lacks only two things:
  - hooks (this is a big one)
  - better UI to show me what changes are going to be made.
  the second one makes a huge diff and it's the main reason I stopped using opencode (lots of other reasons too). in CC, I am shown a nice diff that I can approve/reject. in codex, the AI makes lots of changes but doesn't pin point what changes it's doing or going to make.
  written-beyond an hour ago
  Yeah it's really weird with automatically making changes. I read in it's chain of thought that it's going to request approval for something from the user, the next message was approval granted doing it. Very weird...
- georgeven an hour ago
  I found codex cli to be significantly better than claude code. It follows instructions and executes the exact change I want without going off on an "adventure" like Claude code. Also the 20 dollars per month sub tier gives very generous limits of the most powerful model option (5.2 codex high).
  I work on SSL bio acoustic models as context.
  behnamoh an hour ago
  codex the model (not the cli) is the big thing here. I've used it in CC and w/ my claude setup, it can handle things Opus could never. it's really a secret weapon not a lot of people talk about. I'm not even using xhigh most of the time.
  copperx an hour ago
  When you say CC is it Codex CLI or Claude Code?
  behnamoh 37 minutes ago
  claude code
  wahnfrieden an hour ago
  No, the codex harness is also optimized for the codex models. Highly recommend using first-party OpenAI harnesses for codex.
  behnamoh 36 minutes ago
  I used that too, but CC currently has features like hooks that codex team has refused to add far too many times.
- estimator7292 37 minutes ago
  It's pretty good, yeah. I get coherent results >95% of the time (on well-known problems).
  However, it seems to really only be good at coding tasks. Anything even slightly out of the ordinary, like planning dialogue and plot lines it almost immediately starts producing garbage.
  I did get it stuck in a loop the other day. I half-assed a git rebase and asked codex to fix it. It did eventually resolve all debased commits, but it just kept going. I don't really know what it was doing, I think it made up some directive after the rebase completed and it just kept chugging until I pulled the plug.
  The only other tool I've tried is Aider, which I have found to be nearly worthless garbage
- williamstein an hour ago
  I strongly agree. The memory and cpu usage of codex-cli is also extremely good. That codex-cli is open source is also valuable because you can easily get definitive answers to any questions about its behavior.
  I also was annoyed by Theo saying that.
- CuriouslyC an hour ago
  The problem with codex right now is it doesn't have hook support. It's hard to understate how big of a deal hooks are, the Ralph loop that the newer folks are losing their shit over is like the level 0, most rudimentary use of hooks.
  I have a tool that reduces agent token consumption by 30%, and it's only viable because I can hook the harness and catch agents being stupid, then prompt them to be smarter on the fly. More at https://sibylline.dev/articles/2026-01-22-scribe-swebench-be...
- procinct 2 hours ago
  Same goes for Claude Code. Literally has vim bindings for editing prompts if you want them.
  behnamoh an hour ago
  CC is the clunkiest PoS software I've ever used in terminal; feels like it was vibe coded and anthroshit doesn't give a shit
  estimator7292 36 minutes ago
  All of these agentic UIs are vibe coded. They advertise the percent of AI written code in the tool.
  behnamoh 25 minutes ago
  which begs the question: which came first—agentic AI tools or the AI that vibe coded them?
MultifokalHirn 2 hours ago
thx :)
ppeetteerr an hour ago
I asked Claude to summarize the article and it was blocked haha. Fortunately, I have the Claude plugin in chrome installed and it used the plugin to read the contents of the page.
- sdwvit an hour ago
  Great achievement. What did you learn?
  ppeetteerr an hour ago
  Nothing particularly insightful other than avoiding messing with previous messages so as not to mess with the cache.
  rvnx an hour ago
  Summary by Claude:
  Codex works by repeatedly sending a growing prompt to the model, executing any tool calls it requests, appending the results, and repeating until the model returns a text response