Transcript: Codex Just Became THE BEST Long Running Agentic Harness


Chase AI · 17m 15s · Added May 15, 12:40 am GMT+8

Source video ID: nOFordZCyzs

Transcript

0:00 — Codex might have Claude Code beat here. With the release of the brand new experimental goals feature, Codex is now the easiest way to execute long-running autonomous coding tasks without having to include any sort of additional orchestration layers. Goals acts like a more sophisticated, integrated Ralph loop. You give it some sort of objective and it will work for potentially hours upon hours to solve that problem without you needing to intervene at all. And today, I’m going to show you how it works, how you can set it up, and we’ll go through a real
0:30 — demo so you can see this thing in action. So today, we’ll be creating Rift Salvage, our 2D combat video game that uses completely original assets and that we build strictly through goals. The goals feature is one of the real differentiators with Codex right now and it is hilariously simple to use. We’re talking about a single slash command. So there’s a ton of value to be had here. So whether you’re using the Codex desktop app or the Codex CLI, you have to enable goals because it’s an experimental feature. Now you can prompt Codex to do that or you can do it
1:00 — yourself very quickly. Inside of the Codex app, I’m just going to go to settings and then I am going to go to configuration. Right here where it says open config.toml, I’m going to click that. Going to open it up in VS Code. And down here, you need to add two lines if they’re not already there: features, and then goals equals true. That’s it. It should take you like 2 seconds. If that’s too complicated, you can also tell Codex, “Hey, can you enable goals for me?” So features goals
1:31 — equal true. That’s it. Now to actually use goals inside of the desktop app and inside the CLI, you have to do forward slash goal. Now, for whatever reason, I think it’s cuz it’s new and experimental, when you do forward slash goal, you’re not going to get any like notification that it’s actually working. And you’ll see once we give it a proper prompt that we will actually get a little badge so that we know goal is working. So if you enable it, make sure you restart Codex after you do that just to make sure the changes hit. But when you do /goal you’re not going to see anything like you normally would like if you did, you
2:01 — know, a skill or something where you get like some proper, you know, feedback that it’s working, but this is good. But before we actually demo goal inside of the app, let me explain how it actually is working under the hood. But first, a quick word from today’s sponsor, me. So as you know, inside of Chase AI Plus I have the Claude Code Masterclass, but I also just released the Codex Masterclass. So you now have two tools that can help bring you from zero to AI dev, and this is the best place to learn how to do that because I assume you have no technical knowledge and we focus on real use
2:32 — cases. So if you want to get your hands on this, or if you want to listen to my free webinar that I’m running in a couple days, the link will be down in the pinned comment. Hope to see you there. So like I said in the intro, Codex goals is basically a more sophisticated, integrated Ralph loop. Now, what is a Ralph loop, you ask? Well, we’ll do a quick review for those of you who don’t remember. At its core, a Ralph loop, if we were using it in something like Claude Code, is simply one line of code. It’s just a bash loop. It’s
3:02 — exactly what you see right here. And the idea is I run this line of code, and what’s going to happen is it’s going to spin up Claude Code, or spin up Codex, or any AI system, and it’s going to take a look at a prompt.md file. And this prompt is going to say, “Hey, here’s what we’re trying to do. Here’s how I want to do it. By the way, here’s the criteria that will consider it complete.” So in this example, we want to lift coverage on authentication files, which basically means we need to create more tests, and we will stop when
3:32 — coverage is at 75%. So that’s the end goal. And so the way it would work is you would start this loop, and then the loop takes a look at the prompt. It then injects that into the AI session. The session runs a single turn. It reads the prompt, and it also reads a state.md file. The state file is basically a file that it can take a look at saying, “Okay, if we have task one, two, and three, what
4:02 — have we done so far and is it working?” So, say the first few turns, it completes task one, and then the next turn it’s going to go take a look at the state file and say, “Hey, task two isn’t complete. Guess what we’re going to do in this session? Well, we’re going to do task two.” And then maybe it doesn’t work for the first turn. It says, “Hey, here’s what I tried.” Next guy comes, etc., etc., until it completes all the tasks. And so, after that agent runs its turn, it updates the file, the turn ends, and the loop continues. So, you get this sort of like continual loop where it’s constantly checking a couple different
4:33 — files to see what have we done, what do we need to do, what is the end state. And eventually, once it reaches the completion criteria, it says, “Hey, we’re done.” All autonomous. That’s the idea of Ralph loops. Now, if you want Ralph loops to do more things, it requires additional scaffolding. You know, things to do with like billing. Does it do any sort of like smart token usage? Not necessarily. What happens if it shuts down, right? The agent crashes, you hit Ctrl+C. How does it know it’s actually done? Is there actually like a built-in third party
5:03 — that verifies everything’s done? Not really, because at its core, again, it’s just a single line of code. Now, compare that to goals. Goals, big picture, works the same. We’re telling it to do something, it has an idea of how it’s going to do it, and it’s constantly updating internal files saying, “Here’s what I’ve done, here’s what we still need to do.” And it’s trying to reach that end state. So, big picture, it’s pretty much the same. However, there’s a few differences. First of all, we have these two markdown files, which are essentially invisible to you. It’s continuation
5:33 — and budget limit. What are these two things doing? Well, these things allow Codex to act in a different manner if you’re about to bump up against usage limits, which is important. So, there’s actually sort of a graceful ending for how your system will handle a task in a goals loop versus a Ralph loop. Ralph loop, you hit your budget, you’re done. Codex, not necessarily. It will figure out a good way to sort of like get you to a spot that you can work on later. And the way that happens in reality is Codex runs its turn in its goals loop or Ralph loop,
6:05 — however you want to think about it. And when it reaches the end of turn, it really has four paths it can go down. One, if it still has work to do and the budget is good, hey, we’re just going to keep on trucking. Two, if we are near our token cap, what it’s going to do is it’s going to inject that budget limit.md file and it’s going to essentially wrap up the turn gracefully and give you a final report for what’s been done and what you need to do moving forward if you update your limit. Three, if we have finished the project, it’s going to make an update goal tool call. So, it’s going to add
6:35 — and change its status. It’s going to make sure all the deliverables are audited and if everything comes back thumbs up, hey, goal complete, we’re done. Lastly, we have ways to pause or edit the goal, or deal with crashes. So, in the event something goes wrong while we’re doing our loop, well, it’s not like a traditional Ralph loop where we’re kind of just like boned. So, a little more sophisticated than the Ralph loop, very similar big picture, and we don’t have to do any additional orchestration. This whole thing should sound very familiar to you if you’ve ever worked with something
7:05 — like GSD. GSD, Superpowers, all these tools are orchestration layers that sit on Claude Code to essentially do what we’re doing with a single slash command inside of Codex with goals. And because it’s literally just a single slash command, it makes it super easy to execute. You don’t need to watch a 40-minute demo on all the intricacies of GSD. You just kind of do /goal and Codex goes forth and conquers. And so, with that in mind, let’s actually put it to the test.
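The four end-of-turn paths described above (keep going, wrap up near the token cap, mark the goal complete, or stop on a pause/edit/crash) can be sketched roughly as below. This is a minimal Python simulation under stated assumptions, not how Codex is actually implemented: the names `GoalState`, `Outcome`, `end_of_turn`, and `run_goal`, the token numbers, and the order in which the checks run are all hypothetical, and a "turn" here just marks one task as done.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    CONTINUE = "continue"        # work remains and the budget is fine
    BUDGET_WRAP_UP = "budget"    # near the token cap: wrap up and report
    COMPLETE = "complete"        # every task is done
    PAUSED = "paused"            # goal was paused/edited, or the run was interrupted


@dataclass
class GoalState:
    # Hypothetical stand-in for whatever state the harness tracks internally.
    tasks: list[str]
    done: set[str] = field(default_factory=set)
    tokens_used: int = 0
    token_cap: int = 1000
    paused: bool = False


def end_of_turn(state: GoalState, wrap_up_margin: int = 100) -> Outcome:
    """Pick one of the four paths at the end of a turn."""
    if state.paused:
        return Outcome.PAUSED
    if all(task in state.done for task in state.tasks):
        return Outcome.COMPLETE
    if state.token_cap - state.tokens_used <= wrap_up_margin:
        return Outcome.BUDGET_WRAP_UP
    return Outcome.CONTINUE


def run_goal(state: GoalState, tokens_per_turn: int = 250) -> list[Outcome]:
    """Run turns until the outcome is anything other than CONTINUE."""
    history: list[Outcome] = []
    while True:
        # One simulated turn: finish the first task that is not done yet.
        for task in state.tasks:
            if task not in state.done:
                state.done.add(task)
                break
        state.tokens_used += tokens_per_turn

        outcome = end_of_turn(state)
        history.append(outcome)
        if outcome is not Outcome.CONTINUE:
            return history
```

With three tasks and the default cap, `run_goal(GoalState(tasks=["a", "b", "c"]))` continues twice and then completes. Lowering `token_cap` to 600 makes the second turn end in the budget wrap-up path instead, which is the "graceful ending" contrasted with a plain Ralph loop.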
7:35 — So, first of all, we’re going to put this guy in plan mode cuz we can go from plan mode to goals very easily, and we’re going to have it create essentially a top-down arcade survival game for us. And we’re going to have it create all of its own assets. The cool thing about Codex versus something like Claude Code, for example, is because it’s an OpenAI product, we have access to Image 2, the GPT image model, too. So, it’s going to create all of its own assets for this game. I want a player drone sprite, I want three enemies, I want a boss creature, energy core, hazard mine, rift
8:05 — background, badges, two UI flavor assets. So, I’m going to have it create quite a bit. Okay. So, the prompt is relatively sophisticated because this can go on for a long, long time. Like, I should have shown you the screenshot already of the guy who’s like, “I’m having it run for 50 straight hours.” I mean, who knows if 50 straight hours is really the best way to do this, but the idea is we have a fuzzy idea, we go into plan mode, we get something very, very tight, and very importantly,
8:35 — with something like this, you need to be extremely specific about what the end result needs to be. Because if we don’t have a very specific end result we are shooting for, a very quantifiable set of things it must hit in order for it to complete the loop, you’re going to get an outcome that is kind of mediocre, it might be half-baked. So, I highly suggest you go through plan mode and you take the time to actually flesh out the plan and not say like, “/goal, make me a SaaS product that makes a
9:06 — billion dollars.” And so, here’s the plan for our game. And when it comes to verification, this is what it’s going to be looking at, right? This is what it’s actually going to test before it says it’s complete. Obviously, it needs to run npm run build and fix all the errors, start the dev server, and provide the local URL. Add and run an automated Playwright verification script that opens the app, confirms everything loads, checks the canvas is non-blank, simulates keyboard movements, simulates a collectible event, forces damage, confirms health changes, boss and win state UIs, on and on and on
9:37 — and on. So, this is what you really want to take a look at. You know, if you look at the verification and you say, “Hey, if all that is completed, I will be happy.” Well, then you’re good to move forward. Now, when it says implement the plan, you’re going to want to go to “No, I’ll tell you what to do.” You’re going to do /goal, “use goal to implement this plan.” And we’re going to submit. And so, right up here, what do you see? You have this little badge that says goal. So, now I know we’re doing goal
10:07 — and it says it right here as well. So, like I told you before, when you do /goal, you’re not going to get any commands, but it’s working. I think it’s just sort of a UI bug for it being an experimental feature. So, it says it’s still in plan mode, so we’ll cancel that goal. Use goal to implement this plan. So, little rough around the edges still, but let’s see what it actually does for us. The idea is now I’m completely hands-off. You know, it’s going to execute its little Ralph loop, its little goal
10:38 — thing, and at the end, we’re going to have a final product. So, it’s been working for about 12 minutes now, and you can see it’s already in the process of creating all the different assets using the ImageGen 2 model, which is like pretty sweet. And again, the nice thing is when you’re using the desktop app versus just working in the raw terminal, like all of this is presented to you inline, which is nice. I personally have been very impressed with the Codex desktop app. Um not to say I don’t still love Claude Code. Thing is, I use both
11:08 — of these tools interchangeably. You can kind of watch my last video for my whole bit on that, where I think the idea that we need to choose between these two tools is kind of stupid. Like, why are we not just using both and often both of them in tandem? Um but with Claude Code, I’m very much pure terminal, but with Codex, I’ve really enjoyed the desktop app. And part of that might just be it’s a nice change of pace sometimes, too, versus always being in the terminal all the time. So far I’ve really liked it. So, after about 30 minutes, it said it was done, and actually finished it up faster than
11:39 — I thought it would. So, let’s see how it did on the first pass. And because it did this so quickly, I’ll probably ask it to do some stuff at the end. So, says it implemented Rift Salvage, local dev server’s running here. It’s a canvas game with keyboard and touch controls, spawning enemies, mine scoring, shield power-ups, boss phase, win, lose, pause, and restart. 11 image gen bitmap assets with alpha cutouts, automated Playwright verifier, and then shows us
12:09 — all the things it built, which is pretty cool. So, let’s see if it works and what we can add to kind of push to the limit a little bit more. Oh, let’s actually do it in the real browser. Okay, so have a little loading screen. And contrast is a little low, kind of hard to see it. Might be kind of hard for you to see it, but I have my little spaceship. So, that’s a mine, I think. I’m supposed to like grab these things.
12:41 — While it spawns enemies that chase me. So, you know, it works. It looks kind of cool. I think we could probably work on the graphics a little bit, but it is kind of neat that everything here was created um like as unique images. I think what we could do is we could add... Well, first of all, I want to see what the boss fight looks like if we could kind of speed that up and also add some sort of like shooting system, either with like lasers or something cool like that. So, let’s actually do that. Let’s have it do
13:12 — that before we sit here any longer. So, I’m going to throw it in plan mode and see if we can make it work a little bit harder. Okay, so I think that was a pretty good first pass. Everything’s working, um but I’d like to make it a bit more complicated. Can we add some sort of like combat system, um whether that’s like lasers shooting at, you know, different enemies and they shoot back at us. Could we also have the boss phase come a little bit quicker or include some sort of button that I can press to just have the boss phase start? Could we also change the contrast a little bit cuz right now
13:43 — everything kind of blends into the background. And if you have any other ideas to just sort of make this a little bit more complicated and push you to your limits, um let me see those ideas. So, this is the plan it came up with. Now, one thing you want to note when you’re using the goals system: each goal run is tied to the thread or the session that you are using at that time. So, we’ve been in the same chat, which means we’re in the same goal thread. If I want to do goals again, I want to do a second goals
14:15 — run on the same project, we can do that, but we have to do it in a second thread or a second chat, like opening up another terminal. So, all I’m going to do is copy this plan. I’m going to open up another chat. And we’re going to do /goal. And we’re going to paste this in there. So, after 15 minutes we completed the second goal pass, so it implemented the combat upgrade. So, let’s see what this game looks like now. So, here’s the loading screen again, very similar to
14:45 — what we saw the first time except it added a few sort of widgets up top here. So, we have target combo as well as the boss signal now. So, if we launch it right away, kind of shooting my gun, the enemies are able to shoot back and they have sort of hit points. I can also hit the boss signal. So, there is the boss. Um pretty sick-looking actually. I think the coolest thing about this game and what it did was just all the unique assets, right? The fact that everything is an original asset and that it did all of
15:16 — this using the image gen 2, which I think was pretty sweet. Um and I know obviously this only took about 45 minutes total between the two runs and we saw some people doing runs for like 3 days from their screenshots, but I think the best part about this is how simple it is to execute these goals. And you know, you kind of just give it a goal and it’s going to go nuts assuming you have some sort of locked in... Did we win? I don’t know if we died or not, but as I was saying, the cool thing about this and about the goals in
15:47 — general is the idea that if you have a clear North Star and you have clear criteria for what success looks like, you can get a ton out of this. And this can kind of just run forever. So, instead of having to set up your own sort of Ralph loop and your own scaffolding or using something outside as an orchestration layer like GSD or Superpowers, it’s kind of just built in for you. And like we demoed here, you can add a lot of neat stuff that is harder to implement, but still possible, inside of Claude Code. Like if we used Claude Code for this, we could have definitely done
16:17 — this. We would have just had it implement something like the Higgsfield CLI or the Higgsfield MCP to do all that image generation for us rather than it being this one integrated holistic system. So, I hope you were able to get something out of this video and I highly suggest you check out Codex, guys. I’ve really enjoyed the desktop app like I’ve been talking about before. I think this goals thing is really cool. And again, we could have done this in tandem with Claude Code as well. We could have had the plan be created in Claude Code and then thrown it into Codex for goals,
16:47 — had, you know, Claude Code take a look at what work it did and kind of have this back and forth, which is where I think you get the most value. It’s kind of like, you know, um the whole being greater than the sum of its parts type deal. So, as always, let me know what you thought. Make sure to check out Chase AI Plus. There is a link to that down in the pinned comment. Also running a webinar in a few days. There’ll be a link there as well. So, hope to see you there. And other than that, I’ll see you around.