How To De-Slop A Codebase Ruined By AI (with one skill)
Actionable Insights
Install or inspect Matt Pocock’s improve-codebase-architecture skill before trying the workflow. It is in the GitHub repo mattpocock/skills (https://github.com/mattpocock/skills) and on its Skillstore page (https://skillstore.io/skills/mattpocock-improve-codebase-architecture). Use it as an architecture-review guide, not an auto-refactor button. Start with a small, reversible pilot: before changing the wider workflow, write down the exact input, expected output, owner, and success metric. Treat the first run as an evaluation, not a migration. Capture before/after examples, note where the method saves time or improves quality, and keep the old path available until the new one passes repeated checks. Watch for the main failure mode, which is overgeneralizing the creator’s demo beyond the evidence. If the video or comments only showed a narrow case, keep the rollout narrow and require fresh proof before broad adoption.
Run the skill as a diagnosis-only first pass. Ask it to find “deepening opportunities” and to back each one with file evidence (duplicated rules, unclear seams, shallow modules, missing tests) and a proposed module interface. Do not let it edit until you have ranked the findings. Agents that change a codebase without understanding its module boundaries create duplicated rules, weak seams, and shallow abstractions. In this pass, treat the agent as a codebase scout while the human makes the architectural call. The same pilot rules apply: evaluate before migrating, keep the rollout narrow, and require fresh proof before broad adoption.
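A diagnosis-only pass is easier to rank when each finding is recorded as data before anyone edits code. The TypeScript sketch below is hypothetical. The Finding shape, its field names, and the scoring weights are assumptions for illustration, not the skill's actual output format. It drops findings that cite no files and pre-sorts the rest for a human to review.

```typescript
// Hypothetical record for one "deepening opportunity" from a diagnosis-only run.
// Field names and scoring weights are assumptions, not the skill's output format.
type FindingKind = "duplicated-rule" | "unclear-seam" | "shallow-module" | "missing-test";

interface Finding {
  title: string;
  kind: FindingKind;
  files: string[];         // repo files cited as evidence
  changeFrequency: number; // e.g. recent commits touching those files
  hasSeamTests: boolean;   // do tests already exist at the proposed boundary?
}

// Pre-sorts findings for human review. Findings that cite no files are dropped
// (send them back to the agent for evidence). The input array is not mutated.
function rankFindings(findings: Finding[]): Finding[] {
  const score = (f: Finding): number =>
    f.changeFrequency +                      // prefer high-change logic
    (f.kind === "duplicated-rule" ? 5 : 0) + // duplicated rules are the ones that drift
    (f.hasSeamTests ? 0 : 3);                // no seam tests yet: more risk, more value
  return findings
    .filter((f) => f.files.length > 0)
    .sort((a, b) => score(b) - score(a));
}
```

The score only orders the review queue. Choosing which boundary to change remains a human decision.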
Create a repo-local architecture vocabulary file first (CONTEXT.md, LANGUAGE.md, or equivalent). It should define module, interface, implementation, depth, seam, adapter, locality, leverage, and your project-specific domain terms, so the skill uses the same language as your team. The author spends real time on these definitions because vague refactor prompts produce vague refactors.
Convert the best finding into a small refactor ticket/PRD. Target one boundary, write down the desired interface and invariants, list the tests at the seam, and keep the implementation in a separate small-diff branch. As with the diagnosis pass, treat the first ticket as an evaluation and keep the old path available until the new one passes repeated checks.
Add these items to your PR checklist for AI-generated changes: link the skill/run output, note which repo files and evidence justified the refactor, confirm that tests preserve behavior, and state why the change reduces future maintenance instead of adding another abstraction. “Code is cheap” is a half-truth. AI can produce code quickly, but low-quality changes increase entropy, so the checklist should reward complexity removed rather than volume produced.
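The checklist can be enforced mechanically once the PR metadata lives somewhere machine-readable. The following is a minimal sketch under that assumption. The AiPullRequest fields and the function name are hypothetical, and how a CI job would collect the fields depends on your tooling.

```typescript
// Hypothetical PR metadata for an AI-generated change; all field names are assumptions.
interface AiPullRequest {
  skillRunLink?: string;         // link to the skill/run output
  evidenceFiles: string[];       // repo files that justified the refactor
  behaviorTestsPass: boolean;    // tests pinning existing behavior still pass
  maintenanceRationale?: string; // why this reduces future maintenance
}

// Returns the checklist items that are still missing; an empty array means the gate passes.
function missingChecklistItems(pr: AiPullRequest): string[] {
  const missing: string[] = [];
  if (!pr.skillRunLink) missing.push("link the skill/run output");
  if (pr.evidenceFiles.length === 0) missing.push("cite the repo files that justified the refactor");
  if (!pr.behaviorTestsPass) missing.push("confirm tests preserve behavior");
  // A blank or whitespace-only rationale counts as missing.
  if (!pr.maintenanceRationale?.trim()) missing.push("explain why this reduces future maintenance");
  return missing;
}
```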
Creator’s main claims
- AI accelerates software entropy and can ruin codebases faster than humans alone.
- The cure is not “less AI,” but stronger architecture: deep modules, clear boundaries, and intentional refactoring.
- A focused skill/checklist can help agents de-slop a codebase by enforcing architectural principles.
- Refactoring must preserve behavior through tests and small vertical changes.
- Developers need to review AI-generated architecture, not just generated syntax.
Deep research verdicts
1. AI accelerates entropy when unchecked
Verdict: Strong agree, high confidence. Cheap code generation increases the volume of mediocre code unless review pressure rises too.
Supporting evidence: this claim aligns with Matt Pocock’s fundamentals talk, Mario Zechner’s OSS slop warning, and observed agent failures: overbroad edits, duplicate abstractions, shallow modules, and tests added after the fact.
Contradicting / limiting evidence: agents can also reduce entropy when used for test generation, dead-code removal, and mechanical cleanup under constraints.
Practical takeaway: do not measure AI success by lines changed. Measure complexity removed, tests preserved, and interfaces clarified.
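One way to act on this takeaway is to report a refactor in terms of interface size, duplicated rules, and passing seam tests, and to leave lines changed out of the report on purpose. The snapshot fields below are hypothetical. How you collect the numbers (linters, coverage reports, manual counts) is left open.

```typescript
// Hypothetical before/after snapshot of one module; the metric names are assumptions.
interface ModuleSnapshot {
  exportedSymbols: number; // size of the public interface callers must learn
  duplicatedRules: number; // copies of the same business rule found in the codebase
  passingTests: number;    // tests that pin behavior at this module's seam
}

interface RefactorReport {
  interfaceShrunk: number;
  duplicatesRemoved: number;
  testsPreserved: boolean;
}

// Note there is no "lines changed" field: volume is not the success metric.
function compareSnapshots(before: ModuleSnapshot, after: ModuleSnapshot): RefactorReport {
  return {
    interfaceShrunk: before.exportedSymbols - after.exportedSymbols,
    duplicatesRemoved: before.duplicatedRules - after.duplicatedRules,
    // Crude proxy: the count of passing seam tests must not drop.
    testsPreserved: after.passingTests >= before.passingTests,
  };
}
```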
2. Deep modules and boundaries are the right antidote
Verdict: Mostly agree, high confidence. Architectural constraints are especially important when code is easy to generate.
Supporting evidence: Claude Code memory docs emphasize concrete project instructions and conventions. Those instructions work best when the codebase already has stable boundaries and clear places for new behavior. Source: https://docs.anthropic.com/en/docs/claude-code/memory
Contradicting / limiting evidence: “deep module” can become an aesthetic slogan if not tied to real behavior and tests.
Practical takeaway: ask the agent to identify shallow modules, duplicated concepts, and boundary leaks before editing.
3. Skills/checklists can improve refactoring outcomes
Verdict: Positive, medium confidence. A skill helps if it encodes concrete rules and gates, not vague taste.
Supporting evidence: the broader skill ecosystem shows persistent instructions can shape agent behavior. Anthropic docs also warn that instructions are context, not enforced configuration, so specificity matters. Source: https://docs.anthropic.com/en/docs/claude-code/memory
Contradicting / limiting evidence: skills can be ignored, misapplied, or drift over long sessions without tests and review.
Practical takeaway: combine a de-slop skill with tests, dependency graphs, and small commits.
Core thesis
AI does not make code architecture irrelevant. It makes architecture debt compound faster. If agents repeatedly change a codebase without understanding its module boundaries, they create duplicated rules, weak seams, and shallow abstractions. The cure is not “use less AI”; it is to make the architecture more legible to both humans and agents through deep modules, explicit interfaces, named seams, adapters, locality, and leverage.
The practical move in this video is to use an improve-codebase-architecture skill as a structured architecture-review partner: first teach the agent a shared vocabulary, then have it search for “deepening opportunities,” then let the human choose and refine the refactor before delegating implementation.
Big ideas / key insights
- “Code is cheap” is a half-truth. AI can produce code quickly, but low-quality changes increase entropy. The expensive part becomes understanding, testing, changing, and safely integrating the code later.
- Deep modules are the antidote to AI slop. A deep module gives callers lots of capability through a small interface. That makes the codebase easier for humans and agents to reason about.
- The vocabulary matters. The author spends real time defining module, interface, implementation, depth, seam, adapter, leverage, and locality because vague refactor prompts produce vague refactors.
- The AI is tactical; the human stays strategic. The skill finds candidates and frames tradeoffs, but the programmer must decide which module boundaries are worth changing.
- Architecture review becomes a repeatable workflow. Rather than occasionally doing heroic cleanup, the author recommends regularly asking an agent to surface places where logic is duplicated, seams are unclear, or modules are too shallow.
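Here is a minimal TypeScript sketch that makes “deep module” concrete. The config-loading domain, the default values, and every name are hypothetical and chosen only for illustration. Three steps that a shallow design would export separately sit behind one small function.

```typescript
interface Config {
  port: number;
  retries: number;
}

// Implementation details. In a shallow design each helper would be exported,
// and every caller would need to know to call them in this order.
function parseObject(text: string): Record<string, unknown> {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config must be a JSON object");
  }
  return raw as Record<string, unknown>;
}

function withDefaults(raw: Record<string, unknown>): Record<string, unknown> {
  return { port: 8080, retries: 3, ...raw }; // hypothetical defaults
}

function validate(raw: Record<string, unknown>): Config {
  const port = raw["port"];
  const retries = raw["retries"];
  if (typeof port !== "number" || typeof retries !== "number") {
    throw new Error("port and retries must be numbers");
  }
  return { port, retries };
}

// Deep interface: one function and one concept for callers, substantial behavior behind it.
function loadConfig(text: string): Config {
  return validate(withDefaults(parseObject(text)));
}
```

Callers learn one concept (loadConfig) instead of the order of three calls, and the helpers can change without touching any caller. This is the locality and leverage payoff described above.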
Best timestamped moments with interpretation
- 0:00 — The opening claim is the whole point: AI has accelerated software entropy. The author is reacting to “code is cheap” rhetoric by reframing code as liability when it lands without architectural context.
- 1:02 — He introduces the architecture-improvement skill and its glossary. This is important because the skill is less a magic refactor button and more a shared language protocol between human and AI.
- 2:04 — The interface/implementation split becomes the central lens. The visual module diagram makes clear that architecture quality depends on what callers must know versus what the module hides.
- 3:05 — “Depth” is defined visually: a small interface over substantial behavior. This is the video’s most useful mental model for judging whether an abstraction is helping or just adding ceremony.
- 3:36 — Seams are tied to testing. The author connects architecture to verification: good seams are where mocks, adapters, unit tests, and integration tests can safely attach.
- 5:09 — Locality and leverage are presented as the two payoffs. Maintainers get concentrated changes; callers get more power per concept learned.
- 6:10 — The demo moves into Claude Code running /improve-codebase-architecture against a real React Router / Effect TS codebase. This is where theory becomes a workflow.
- 6:41 — Claude identifies “Insertion Point has no single seam,” a concrete issue where frontend and backend logic live in parallel and can drift. This is exactly the kind of AI-generated entropy the skill is meant to catch.
- 7:12 — The interaction becomes a “grilling session,” not a blind apply. The agent proposes code-grounded evidence and questions; the human shapes the target module.
- 9:13 — The author emphasizes that the skill demands judgment from the programmer. Agents are good tactical programmers, but they still need strategic oversight.
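Seams are the points where behavior can be swapped out for testing. The sketch below assumes a hypothetical clip-storage interface. Clip, ClipStore, renameClip, and InMemoryClipStore are illustrative names, not code from the demo repo. The service depends only on the interface, so a test can plug in an in-memory adapter where production would use a database adapter.

```typescript
// Hypothetical seam: the service depends on this interface, not on a database client.
interface Clip {
  id: string;
  title: string;
}

interface ClipStore {
  get(id: string): Promise<Clip | undefined>;
  save(clip: Clip): Promise<void>;
}

// The business rule (trim, reject empty titles) lives in one place, behind the seam.
async function renameClip(store: ClipStore, id: string, title: string): Promise<void> {
  const trimmed = title.trim();
  if (trimmed.length === 0) throw new Error("title must not be empty");
  const clip = await store.get(id);
  if (clip === undefined) throw new Error(`clip ${id} not found`);
  await store.save({ ...clip, title: trimmed });
}

// In-memory adapter: tests plug in here instead of a real database adapter.
class InMemoryClipStore implements ClipStore {
  private readonly clips = new Map<string, Clip>();

  async get(id: string): Promise<Clip | undefined> {
    return this.clips.get(id);
  }

  async save(clip: Clip): Promise<void> {
    this.clips.set(clip.id, clip);
  }
}
```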
Screen-level insights: what the visuals add
- 0:00 — Talking-head setup. The author opens directly to camera in a home-office setting, not with code. That matters because the claim is conceptual and cultural first: he is pushing back against executive/social-media narratives before showing tooling.
- 1:02 — Markdown skill documentation. The frame shows an improve-codebase-architecture document with a glossary and “deepening opportunities.” Visually, this shows the “skill” is not a black box; it is an instruction artifact that teaches Claude how to inspect architecture.
- 1:34 — GitHub file viewer for LANGUAGE.md. The visible glossary includes depth, seam, adapter, leverage, and locality. The author is literally pointing the model and the viewer at a domain language. The screen matters because the workflow depends on making implicit software-design concepts explicit.
- 2:04 — Module/interface/implementation diagram. A rectangular module is split into a small top “Interface” and a larger “Implementation.” This makes the key architectural criterion visible: reduce what callers must learn while preserving capability behind the boundary.
- 3:05 — Deep versus shallow module diagram. The deep module appears as a tall block with a thin interface; the shallow module as a flatter shape. This visual anchors the entire “de-slop” process: find shallow abstractions and deepen them.
- 3:36 — Dependency graph and seams. The frame shows modules connected by arrows, with dashed seam lines. It connects the spoken testing point to a visual place in the dependency graph: seams are where behavior can be swapped, mocked, or stabilized.
- 5:09 — Locality/leverage slide. The diagram labels what maintainers and callers get from depth. This is the bridge from abstract design to business value: fewer scattered changes and more reusable capability.
- 6:10 — VS Code terminal running Claude Code. The author runs /improve-codebase-architecture, and Claude explores the repository with shell commands. This shows the skill operating inside the real repo rather than giving generic advice.
- 6:41 — Claude’s “deepening opportunities” report. The screen lists candidate architectural problems with files, problem statements, proposed solutions, and benefits. This is the key visual evidence that the tool can surface refactor targets from actual code structure.
- 7:12 — Candidate selection and grilling. Claude lists another issue: fractional index logic appears in multiple places. The prompt asks which opportunity to explore. The visual step matters because the human is choosing among architectural bets rather than accepting a monolithic refactor.
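A typical fix for duplicated fractional index logic is a single shared module that both frontend and backend import. This is a hypothetical sketch, not code from the demo project. It uses plain numeric midpoints for readability. Many real fractional-indexing implementations use string keys instead, because repeated inserts between the same two neighbors eventually exhaust floating-point precision.

```typescript
// Hypothetical single home for the ordering rule, imported by both client and server
// instead of keeping parallel copies that can drift apart.
// `before` and `after` are the order keys of the neighbors; undefined means "no neighbor".
function indexBetween(before: number | undefined, after: number | undefined): number {
  if (before === undefined) return after === undefined ? 0 : after - 1; // empty list or insert at start
  if (after === undefined) return before + 1;                          // insert at end
  if (!(before < after)) throw new Error("before must be less than after");
  return (before + after) / 2;                                         // insert between neighbors
}
```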
Comment-derived insights
The comment section is split between strong agreement, jokes about rediscovering basic software engineering, and skepticism about using yet another AI skill to repair AI-created mess.
- Agreement pattern: The most-liked comments support the premise: AI-driven coding is pushing teams back toward fundamentals. Several commenters say the content is valuable precisely because it re-centers architecture, testing, and modularity.
- Recurring joke: “Vibe coders discover modularity: circa 2026” captures the audience’s amused frustration. The video is teaching old ideas—interfaces, seams, listeners, modularity—but they feel newly urgent because AI makes the lack of fundamentals more visible.
- Pushback/caveat: Some commenters worry this is “using the slop machine to fix slop.” That is fair if the workflow becomes blind automation. The video’s answer is human-in-the-loop strategy: use the agent to inspect and propose, not to autonomously rewrite core architecture without review.
- Practitioner additions: One commenter suggests explicitly referencing an AskUserQuestion tool for interview steps so the agent can collect structured decisions during design. Another mentions turning lessons from these videos into rules/coding standards for Cursor and other agents.
- Useful question: A commenter asks whether the skill was evaluated against just plainly asking an agent to improve architecture. That points to a real next step: if this becomes part of a serious workflow, compare structured skill output against baseline prompting.
- Memorable concern: “The best code is the code which is not written at all” reframes AI productivity as a liability problem. More generated code is not automatically more progress.
Practical workflow to steal
- Write down your architecture vocabulary. Define what “module,” “interface,” “implementation,” “seam,” “adapter,” “locality,” and “leverage” mean in your codebase.
- Ask the agent to inspect, not edit. First pass should only find deepening opportunities with evidence: files, duplicated rules, unclear seams, shallow modules, missing tests.
- Rank candidates manually. Prefer refactors that improve locality and create a testable seam around high-change logic.
- Interrogate the design. Ask the agent to propose the module interface, invariants, adapters, test cases, and migration path.
- Turn the chosen refactor into an issue/PRD. Keep implementation separate from diagnosis so another agent or future session can execute with clear boundaries.
- Add tests at the seam. The whole point of finding seams is to create a harness that prevents future AI changes from silently reintroducing drift.
- Repeat periodically, but do not outsource judgment. Run architecture review often in fast-moving codebases, but treat the output as candidate strategy, not ground truth.
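A contract test at the seam is what stops future AI changes from silently reintroducing drift, because the same assertions run against every implementation. In this hypothetical sketch, Orderer, checkOrdererContract, and both adapters are illustrative names. The adapters differ internally but must produce the same ordering behavior.

```typescript
// Hypothetical seam for ordering items; every implementation must honor one contract.
interface Orderer {
  between(before: number | undefined, after: number | undefined): number;
}

// Shared contract test: run it against every adapter behind the seam, so a later
// change that breaks ordering fails here instead of drifting silently.
function checkOrdererContract(label: string, orderer: Orderer): void {
  const first = orderer.between(undefined, undefined);
  const later = orderer.between(first, undefined);
  const earlier = orderer.between(undefined, first);
  const middle = orderer.between(first, later);
  if (!(earlier < first && first < middle && middle < later)) {
    throw new Error(`${label} violates the ordering contract`);
  }
}

// Two hypothetical adapters with different internals but the same observable behavior.
const unitGapOrderer: Orderer = {
  between(before, after) {
    if (before === undefined) return after === undefined ? 0 : after - 1;
    if (after === undefined) return before + 1;
    return (before + after) / 2;
  },
};

const wideGapOrderer: Orderer = {
  between(before, after) {
    if (before === undefined) return after === undefined ? 0 : after - 1024;
    if (after === undefined) return before + 1024;
    return (before + after) / 2;
  },
};
```

The contract checks behavior, not internals. Either adapter can be rewritten freely as long as it still passes.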
Visible tools / code / artifacts
- Matt Pocock skills GitHub repository: https://github.com/mattpocock/skills
- improve-codebase-architecture skill listing: https://skillstore.io/skills/mattpocock-improve-codebase-architecture
- Markdown documentation for the improve-codebase-architecture skill
- LANGUAGE.md glossary for architecture terms
- Claude Code in a VS Code terminal
- A real project named course-video-manager
- React Router and Effect TS mentioned in the demo
- Architecture diagrams for modules, depth, seams, locality, and leverage
- Proposed refactor targets including “Insertion Point has no single seam” and duplicated fractional index logic
My read / why it matters
This is one of the more useful agent-coding patterns because it does not pretend the agent is an architect. It treats the agent as a tireless codebase scout that can find duplicated logic, missing seams, and suspicious module boundaries, then asks the human to make the architectural call.
The strongest lesson is that AI coding raises the value of old-school software design. Deep modules, small interfaces, test seams, and locality are not academic concerns; they are what let agents make changes without wrecking the system. If you want agents to move fast safely, you need architecture that gives them clear handles.
Comment insights
- Top audience signal: @SergiySev (132 likes) said: “You are absolutely right!”. This is the highest-salience community reaction and should be weighted as audience evidence, not proof.
- Audience reaction: @cdov-q1n (78 likes) — We’re going back to SWE basics soon, once everybody was dumbed down enough. Then, something like the listener pattern will look like magic and everybody will find it “a genius solution”.
- Audience reaction: @Cyphlix (60 likes) — Vibe coders discover modularity: circa 2026
- Audience reaction: @shreyashc (56 likes) — one more skill bro, just one more skill, we will have AGI 😂
- Audience reaction: @smaranh (32 likes) — That ended on a cliffhanger 😄
- Audience reaction: @Parsecter (18 likes) — Vibecoders are like, “Yo, dude, what are you talking about? The next model will definitely fix everything. We just need to regenerate it again. Oh, damn, we’re out of quotas.” Meanwhile, the bosses of Anthropic, Open AI, and others are rubbing their palms and continuing to tell tales of AI’s imminen
- Synthesis: Treat the comments as an adoption-risk check: if commenters ask for proof, cost controls, setup details, or safety boundaries, the workflow should include those checks before production use.