CI/CD Is Dead, Agents Need Continuous Compute and Computers — Hugo Santos and Madison Faulkner
Actionable Insights
- Add agent-workload rate limits before scaling agents. Cap concurrent agent PRs per repo, per service owner, and per CI capacity. Start with max_agent_branches=3 and require green preflight tests before PR creation. Evaluate queue time, merge conflicts, and reviewer load.
- Move fast checks into the agent inner loop. Give agents commands for lint, unit tests, typecheck, and affected-test selection before they open PRs, e.g. npm test -- --changedSince=main, pytest -q, and pnpm typecheck. Benefit: CI remains the gate, but fewer broken diffs hit shared runners.
- Invest in remote cache and ephemeral environments. Use GitHub Actions cache, Bazel remote cache, Turborepo/Nx remote cache, Namespace (https://namespace.so), Depot (https://depot.dev), or BuildJet-style runners. Evaluate p50/p95 CI duration and cache-hit ratio separately for human vs agent branches.
- Keep CI/CD as the safety dam; modernize it instead of declaring it dead. The commenters are right: tests become more important when agents produce more changes. Add policy checks, required reviews, canaries, and automated rollback. Do not let agent throughput bypass production gates.
- Batch, shard, and merge with conflict awareness. For high-volume agent work, shard work by ownership boundaries and use merge queues. Track conflict rate and revert rate. Reject agent PRs that touch too many unrelated files.
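As a rough sketch of the rate-limit and "too many unrelated files" bullets above, the admission check below combines the branch cap, the green-preflight requirement, and a limit on how many ownership areas one PR may touch. It is illustrative Python only: AgentPullRequest, admit, MAX_OWNERSHIP_AREAS, and the use of top-level directories as a stand-in for ownership are assumptions made for this example, not something taken from the talk or from any CI product.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Tuple

# Hypothetical limits; tune them per repository.
MAX_AGENT_BRANCHES = 3   # mirrors the max_agent_branches=3 starting point
MAX_OWNERSHIP_AREAS = 2  # assumed cap on distinct top-level directories


@dataclass(frozen=True)
class AgentPullRequest:
    branch: str
    changed_files: Tuple[str, ...]
    preflight_green: bool  # lint, unit tests, and typecheck passed before PR creation


def ownership_areas(changed_files):
    """Approximate ownership by top-level directory.

    This is an assumption; a real setup might map paths to owners
    through a CODEOWNERS file instead.
    """
    return {PurePosixPath(path).parts[0] for path in changed_files if path}


def admit(pr, open_agent_branches):
    """Return (admitted, reason) for a proposed agent PR."""
    if not pr.preflight_green:
        return False, "preflight checks are not green"
    if open_agent_branches >= MAX_AGENT_BRANCHES:
        return False, "agent branch cap reached"
    areas = ownership_areas(pr.changed_files)
    if len(areas) > MAX_OWNERSHIP_AREAS:
        return False, f"touches {len(areas)} ownership areas"
    return True, "ok"
```

A caller would run admit before opening the PR and requeue or split the work when it returns False. The thresholds are starting points to be checked against queue time, conflict rate, and reviewer load, not recommended values.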
Core thesis
Agent-generated code increases PR/change volume and stresses existing CI queues, caches, merge workflows, and review loops; the “CI/CD is dead” framing is overhyped, but CI infrastructure needs more concurrency-aware orchestration.
Big ideas / key insights
- The valuable pattern is not “let the agent run longer”; it is to make the work inspectable, measurable, and interruptible.
- The transcript evidence points to concrete workflow design: artifacts, traces, evals, policies, or specs that survive a single chat context.
- The comment evidence is used as a sanity check: where practitioners push back, the verdicts below are deliberately more conservative.
- The strongest practical takeaway is to convert the creator’s idea into a small pilot with explicit success/failure criteria before standardizing it.
Best timestamped moments
- 2:40 — Agent scale: N PRs across repos, running through systems designed for one or two human diffs a week.
- 3:11 — Thousands of short-lived branches pull the same codebase in different directions.
- 3:41 — GitHub activity spike is used as evidence for load growth.
- 4:12 — Cache becomes an orchestration layer over existing CI.
- 7:16 — Evaluation moves into the inner loop as agents iterate.
- 8:47 — PRs are designed for human review and delayed feedback.
- 10:19 — Spec -> agent harness -> validation loop resembles continuous compute.
Practical takeaways / recommended workflow
- Create the durable artifact first. Write the spec/rubric/policy/trace schema before letting agents perform expensive work.
- Run a constrained pilot. Pick one repository, one team, or one workflow; record baseline cost, latency, failure rate, and review time.
- Instrument the loop. Capture traces, commands, tool calls, test results, and human corrections so the workflow can be evaluated later.
- Add gates. Require acceptance tests, human approval for sensitive actions, and rollback paths before allowing broader automation.
- Review after 5-10 runs. Keep the practice only if it improves measurable outcomes, not just because the demo felt compelling.
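The pilot review described above can be sketched as a comparison of a recorded baseline against the pilot runs. This is a minimal Python illustration under stated assumptions: the run record shape (ci_seconds, failed), the function names, and the "no tracked metric got worse" rule are hypothetical choices for the example, and the percentile uses the simple nearest-rank method.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(runs):
    """Summarize runs given as dicts with 'ci_seconds' and 'failed' keys."""
    durations = [run["ci_seconds"] for run in runs]
    return {
        "p50_ci_seconds": percentile(durations, 50),
        "p95_ci_seconds": percentile(durations, 95),
        "failure_rate": mean(1.0 if run["failed"] else 0.0 for run in runs),
    }


def keep_practice(baseline, pilot):
    """Keep the practice only if no tracked metric got worse (assumed rule)."""
    before, after = summarize(baseline), summarize(pilot)
    return all(after[key] <= before[key] for key in before)
```

With only 5-10 runs the percentiles are coarse, so a result like this is a prompt for discussion rather than proof. Cost and review time can be added as further keys in the same way.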
Comment insights
The comment section is strongly skeptical. Top comments say CI is more important than ever, tests are mandatory, and the talk describes a change in human code review more than the death of CI/CD. This pushback is credible and should temper the headline.
Deep research
- GitHub Actions / merge queue docs. Modern CI/CD already includes merge queues, required checks, caching, and larger runners; these are adaptation points, not death.
- Bazel/Nx/Turborepo remote caching. Remote caching and affected-test selection are established ways to reduce repeated CI work.
- DORA / software delivery research. Delivery performance depends on lead time, change failure rate, deployment frequency, and recovery time; agent throughput must be judged against these outcomes, not raw PR volume.
- Namespace / CI acceleration vendors. Namespace and similar vendors provide faster ephemeral compute and caching, but vendor claims need workload-specific validation.
Evidence quality note: research here uses named public documentation, standards, and widely known project sources where available. Some vendor claims are treated as product claims unless independently benchmarked in the user’s environment.
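To make the delivery-research point concrete, the four measures named above (lead time, change failure rate, deployment frequency, recovery time) can be computed from plain deployment records. The Python below is a sketch under assumptions: the Deployment record, its field names, and the choice of medians are hypothetical and are not an official DORA definition or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # caused an incident or rollback
    restored_at: Optional[datetime] = None  # when service was restored


def delivery_metrics(deployments, window_days):
    """Compute the four delivery measures over a window of deployments."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "deploys_per_day": len(deployments) / window_days,
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(recoveries) if recoveries else None,
    }
```

Running this separately for agent-authored and human-authored changes would show whether higher PR volume actually improves these outcomes, which is the comparison the research framing calls for.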
Verdicts
- CI/CD is dead: Disagree / high confidence. The evidence supports pressure on CI/CD, not its death.
- Agentic coding stresses CI infrastructure: Agree / medium-high confidence. More parallel branches and generated changes increase queue/cache/review pressure.
- Continuous compute will eclipse CI/CD: Mixed / low-medium confidence. Continuous compute is a useful label for inner-loop agent validation, but production delivery still needs gates.
Screen-level insights
Frames show slides contrasting human PR cadence with agent-scale PRs, GitHub activity spike charts, cache/orchestration architecture, and agent validation loops. The visual step matters because the strongest evidence is operational topology, not speaker rhetoric.
Representative extracted frame anchors checked against transcript context:
- 2:40 — image youtube-extract/VktrqzQgytY/frames/000_000160.jpg; transcript context: bunch of time to review. Then you have to go through GitHub actions and run build, test, and deploy steps. And then finally, you’re addressing those failed test cases and maybe you’re iterating on the diff. So in that in that scenario, it was really uh just one or two a week. So now, how do we think about this at agent scale? You’ve got agents using the exac
- 3:41 — image youtube-extract/VktrqzQgytY/frames/001_000221.jpg; transcript context: huge problem. So let’s look at in real time, GitHub activity has gotten absolutely crazy. The white line here is the actual number of commits in the last couple of months. And then the number of uh lines added versus deleted. I mean, this is just an unbelievable spike. So how do we start with replacing CICD? Well, the starting point should be at the accelera
- 6:14 — image youtube-extract/VktrqzQgytY/frames/005_000374.jpg; transcript context: me and my team, we we spend a lot of time with companies today that are going from how traditional CICD look like into how we think it’s going to look it into the future. And uh giving a little bit of a hint, it’s it’s agent all the way down. So we work with with companies like Fall and uh Zed and Ramp and many others that are really at the forefront of uh e
- 6:45 — image youtube-extract/VktrqzQgytY/frames/006_000405.jpg; transcript context: uh up to 6 months, humans were writing all the code very slowly uh and some of them actually fairly quickly, but in hindsight fairly slowly. We package uh all these changes in PRs. We do validation as part of those PRs. And uh behind the scenes um the machines are a little bit slow, but all of that is hidden behind the human latency. And uh many of you might
- 7:47 — image youtube-extract/VktrqzQgytY/frames/007_000467.jpg; transcript context: Uh now your changes uh are in the PR, the tests are running, they fail. You need to go and change something in the code. You’re back to the loop. Uh a human reviewer comes back and says, well, you know, you didn’t quite use the right API, please go and change it. You’re back in the loop. And then when you go and get your code, you’re you’re finally done and
- 9:17 — image youtube-extract/VktrqzQgytY/frames/010_000557.jpg; transcript context: Uh are there other changes that are going on that would be conflicting with this change? Uh is this change allowed? So all of that is kind of part of this validation process that is automated. Um human reviewers are overwhelmed. You’ve heard this many times. I don’t have to repeat it. And the interesting thing is that this the the act of merging um is starti
- 10:19 — image youtube-extract/VktrqzQgytY/frames/012_000619.jpg; transcript context: is what we want to achieve and we codify it. That’s the spec. Someone writes it down. It might be in a linear ticket. It might be on Slack. It’s somewhere. Somewhere you have written down what is the goal. What are you trying to achieve? That goes into a loop and this loop is a typical agent harness. So, it might be your might be your cloud code. Might be We
My read / why it matters
This video is useful if you convert it into an operating procedure rather than copying the headline. The durable lesson is about control surfaces for AI work: specs humans read, traces teams audit, evals that catch regressions, identity policies that revoke access, or graphs that preserve provenance. The risky version is adopting the slogan without the measurement and governance layer.
Verification notes
- Source/evidence audit: Checked the extracted transcript/comment packet and named external sources/docs relevant to the main claims. Vendor/tool links are identified as vendor/project sources, not neutral proof of effectiveness.
- Transcript/comment/frame fidelity audit: Timestamped moments and comment insights were kept close to extracted evidence in youtube-extract/VktrqzQgytY/ and the draft packet. Screen claims are limited to the extracted key-frame metadata and visible UI descriptions; for -QFHIoCo-Ko, no frame-derived claims are made because key frames were not extracted.
- Hallucination/overclaim audit: Headline claims were softened where evidence was insufficient. Verdicts explicitly mark mixed/low-confidence claims and separate practical heuristics from proven facts.
- Actionable Insights audit: The top section was checked for executable first steps, tools/commands or links where available, evaluation criteria, and cautions. Generic summary bullets were rewritten as workflow steps.
- Residual uncertainty: I did not have independent benchmark results for the specific demos, and several claims would need local measurement before adoption. Transcript extraction status was marked unknown by the extractor, so the analysis relies on the processor’s excerpted transcript evidence rather than a full raw transcript page.