I Tried Every Popular Claude Skills System, Here is the Best

Rob Shocks · 15m 32s · Added May 23, 8:40 pm GMT+8

Actionable Insights

Build a five-step agent loop before installing another mega-library. Start with ask/interview → prototype → plan → build → test/review, then only promote repeated fixes into SKILL.md files. In this video, the creator compares Addy Osmani, Matt Pocock, Garry Tan/gstack, ECC, BMAD, Superpowers and others, then lands on a lightweight bespoke loop. First experiment: choose one active repo and create five local commands or prompts named ask.md, prototype.md, plan.md, build.md, and review.md; evaluate by whether PRs get smaller, have clearer acceptance criteria, and require fewer rework comments. Caution: do not add skills for one-off preferences; stale skills become misleading documentation.
Use existing skill repos as source material, not as unquestioned operating systems. Useful references surfaced in the video/research include Addy Osmani’s agent-skills, Matt Pocock’s skills and docs at AI Hero, Garry Tan’s gstack, affaan-m/ECC, and curated indexes like VoltAgent/awesome-agent-skills. Pull one skill at a time, rewrite it for your repo, and add an exit criterion: tests run, issue created, prototype branch produced, PR review completed. Evaluate each imported skill after three uses: keep it only if it changed agent behavior measurably.
Codify prototyping as a separate mode. The video’s strongest practical workflow is the design-mode prompt: front end only, dummy JSON for backend data, linked navigation, responsive buttons, no backend logic. Use it when the UI, state, or product shape is uncertain. First step: create a prototype skill that requires a disposable branch, mock data, and a screenshot or Playwright smoke test before implementation starts. Success criterion: product questions get answered before backend work begins. Caution: throwaway prototypes should not silently become production code without a hardening pass.
Make code review layered, not just file-by-file. The sponsor segment shows CodeRabbit Atlas organizing PR review by layers such as data shape, business logic, and tests, with AI summaries next to diffs. You can try CodeRabbit or reproduce the pattern manually: ask an agent to classify changed files into schema/API, domain logic, UI, tests, security, and migration risk before reviewing individual hunks. Evaluate by whether reviewers find architectural risks earlier. Caution: AI review summaries are triage aids, not proof; still run static checks and human smoke tests.
Use TDD selectively as a verification scaffold, not a religion. The video notes that many systems converge on tests and that agents can cheat tests. Practical version: for risky logic, first ask the agent to write a failing test, run it, implement the smallest passing change, then run the whole relevant suite. Keep evidence in the PR body: failing output, passing output, and files touched. Caution: require negative tests and behavior assertions; shallow snapshot tests can give false confidence.
Separate project-pipeline skills from task-pipeline skills. A top comment correctly points out that /spec and /plan can mix project-level and issue-level workflows. Create two lanes: project discovery (spec, architecture, features-contract) and task execution (build, test, review, ship). Evaluate by whether a single issue can be completed without reopening project-wide strategy debates.
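The first experiment above (five local prompt files, each with an exit criterion) can be sketched as a small scaffold script. This is a minimal sketch, not the creator's setup: the goal and exit-criterion wording, the `WORKFLOWS` table, and the function names are hypothetical placeholders to rewrite for your own repo.

```python
from pathlib import Path

# Hypothetical goals and exit criteria; rewrite these for your own repo.
WORKFLOWS = {
    "ask": ("Interview the user until the work is clear.",
            "Agent restates goals, non-goals, success criteria, and risks."),
    "prototype": ("Build a front-end-only prototype with mock JSON data.",
                  "Prototype branch produced with a screenshot or smoke test."),
    "plan": ("Split the accepted work into small vertical slices.",
             "Each slice has acceptance criteria and a test or smoke check."),
    "build": ("Implement one slice at a time.",
              "Tests run and results recorded for the slice."),
    "review": ("Classify the diff by layer, then review individual hunks.",
               "PR review completed with findings listed."),
}


def render_prompt(name: str) -> str:
    """Return the text of one prompt file: title, goal, and exit criterion."""
    goal, exit_criterion = WORKFLOWS[name]
    return f"# {name}\n\n{goal}\n\nExit criterion: {exit_criterion}\n"


def scaffold(root: Path) -> list[Path]:
    """Write one <name>.md per workflow under root, skipping existing files."""
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for name in WORKFLOWS:
        path = root / f"{name}.md"
        if path.exists():
            continue  # never overwrite a prompt someone has already tuned
        path.write_text(render_prompt(name), encoding="utf-8")
        written.append(path)
    return written
```

Calling `scaffold(Path("some/commands/dir"))` once per repo is enough; where your harness expects these files to live is harness-specific, so the target directory is left to the caller.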

Core thesis

The video argues that popular Claude/agent skill systems mostly rediscover the same software-development loop: clarify the work, prototype where useful, plan small slices, build incrementally, test, review, simplify, and ship. The creator’s distinctive claim is that the best skill system is not someone else’s repo; it is the lightweight, evolving harness you build from your own repeated agent failures, codebase context, and team process.

Big ideas / key insights

Skills are workflow prompts with packaging. Around 0:30, the creator explains skills as natural-language prompts in SKILL.md with front matter plus optional scripts, reference material, and assets.
Convergence matters more than brand. Addy Osmani, Matt Pocock, BMAD, Superpowers, ECC, and gstack differ in opinionation, but most include some form of spec/discuss, plan, build, test, review, and ship.
Context is a scarce resource. The creator repeatedly favors short, bespoke skills over large context-heavy systems. External support: Addy Osmani’s blog also distinguishes workflow skills with exit criteria from long essays that agents ignore.
Agent harnesses are becoming a differentiator. The video frames personal/team harness design as a compounding software-development advantage.
Review and verification become more important as AI increases throughput. The CodeRabbit segment is sponsored, but the underlying point is valid: if agents create more PRs, review structure and smaller commits matter more.
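To make the "workflow prompts with packaging" idea concrete, here is a minimal sketch of a SKILL.md as front matter plus markdown instructions. The field names `name` and `description` are assumptions, since the video only says "front matter" and details vary by harness. The parser is deliberately naive (flat `key: value` lines only) and is not a YAML implementation.

```python
def render_skill(name: str, description: str, body: str) -> str:
    """Return SKILL.md text: a front matter block, then markdown instructions."""
    # Assumed fields: `name` and `description`; check what your harness expects.
    return f"---\nname: {name}\ndescription: {description}\n---\n\n{body.strip()}\n"


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split text into (front matter fields, body); flat key: value lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter block at the top
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated block: treat everything as body
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body
```

Keeping the description to one line and the body to a short numbered workflow fits the point about context being scarce: the front matter tells the agent when to load the skill, and the body should stay small enough to be read in full.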

Best timestamped moments with interpretation

0:30 — Skills are described as SKILL.md files with front matter and bundled resources. Interpretation: this is a useful operational definition, but implementation details vary by harness.
1:30 — Star-history graph is used to argue that skill libraries are exploding. Interpretation: developer interest is real, but stars are not a quality metric.
2:01 — Addy Osmani’s lifecycle is presented: spec, plan, build, test, review, simplify, ship. Interpretation: this is a classic SDLC mapped into slash commands.
3:34 — The creator praises vertical slicing and prototyping. Interpretation: this is the most durable advice in the video; it reduces agent scope and gives visual feedback early.
5:06 — ECC is positioned as the biggest and broadest harness, with memory, verification, subagents, and security. Interpretation: powerful, but likely overkill unless you need a full operating system.
6:36–8:38 — CodeRabbit Atlas review walkthrough. Interpretation: treat as sponsored demonstration; the useful pattern is layered review, not the claim that one tool solves review.
9:09–11:42 — The final recommended loop: ask, prototype, plan, build, test. Interpretation: good default for individual builders and small teams.
12:12–14:44 — Build your own skill system over time. Interpretation: strong advice; local process knowledge is where generic skill packs run out.

Practical takeaways / recommended workflow

Create a minimal local skills/ or .agent/ folder with five workflows: ask, prototype, plan, build, review.
For every new feature, start in ask/interview mode until the agent can restate goals, non-goals, success criteria, and risks.
Use prototype mode only when UI, state, or interaction design is uncertain; keep it frontend-only with mock data.
Convert the accepted plan into small vertical slices, each with tests or smoke checks.
During build, require evidence after every slice: command run, result, screenshot if UI, and changed files.
During review, categorize the diff by layer before reading hunks: data contracts, business logic, UI, tests, security, migration/deploy.
Promote a new skill only after the same failure repeats at least twice.
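The review step above (categorize the diff by layer before reading hunks) can be approximated with a path-based classifier. This is a rough sketch under stated assumptions: the glob patterns are hypothetical and tied to an imagined repo layout, and path matching is only a first-pass triage that an agent or reviewer should correct by reading the code.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical patterns for an imagined layout; the first matching layer wins,
# so order matters. Note that fnmatch's "*" also matches "/" in paths.
LAYER_PATTERNS = [
    ("tests", ["tests/*", "*_test.py", "*.test.ts", "*.spec.ts"]),
    ("migration/deploy", ["migrations/*", "*.sql", "Dockerfile", ".github/*"]),
    ("security", ["*auth*", "*permission*", "*secret*"]),
    ("data contracts", ["*schema*", "api/*", "*.proto"]),
    ("ui", ["*.tsx", "*.css", "components/*"]),
    ("business logic", ["src/*", "lib/*"]),
]


def classify(path: str) -> str:
    """Return the first layer whose patterns match path, else 'unclassified'."""
    for layer, patterns in LAYER_PATTERNS:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return layer
    return "unclassified"


def group_by_layer(paths: list[str]) -> dict[str, list[str]]:
    """Group changed file paths by layer, preserving input order per layer."""
    groups = defaultdict(list)
    for path in paths:
        groups[classify(path)].append(path)
    return dict(groups)
```

Feeding it the changed-file list from a PR gives a reading order: data contracts and migrations first, then business logic, then UI and tests. Anything landing in "unclassified" is a prompt to extend the patterns or inspect the file by hand.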

Comment insights

The comments add useful counterweights. One commenter says the harness is personal and mentions switching from Claude Code to the Pi harness, which reinforces the video’s bespoke-harness thesis. Another argues that the proposed slash-command flow mixes project-level and issue-level pipelines, a real process-design caveat: teams should separate product discovery from task execution. Several commenters complain that links were missing from the description, which matters because the video references many repos and tools; the analysis above names the likely canonical repos and docs. A few comments push for specific missing skills, especially frontend/backend design skills, suggesting the audience wants concrete templates more than another survey.

Deep research on the main claims

Claim: Addy Osmani’s agent skills encode a lifecycle of spec/plan/build/test/review/simplify/ship. Support: the agent-skills GitHub README lists seven slash commands mapping to those phases and says the pack includes lifecycle skills with verification gates. Addy’s Agent Skills blog post argues skills should be workflows with checkpoints and exit criteria, not reference essays. Contradiction/caution: the repo’s popularity and structure do not prove it improves outcomes in every codebase.
Claim: Matt Pocock’s skills emphasize practical engineering moves such as grill-with-docs, TDD, PRD/issues, handoff, and prototype. Support: AI Hero’s skills page lists /grill-with-docs, /to-prd, /to-issues, /tdd, /handoff, /prototype, and /review, and links to the source repo mattpocock/skills. Contradiction/caution: comments and community posts note that some patterns are common knowledge repackaged; the value is consistency, not novelty.
Claim: gstack is a highly opinionated virtual engineering team for Claude Code. Support: garrytan/gstack describes 23 specialists and slash commands including office hours, plan reviews, design, review, QA, security, and release. Contradiction/caution: its own README includes strong productivity claims; those are self-reported and context-specific, so treat them as anecdotal unless reproduced in your environment.
Claim: ECC (Everything Claude Code) is a broad harness with memory, security, verification, and cross-harness support. Support: affaan-m/ECC describes skills, instincts, memory optimization, continuous learning, security scanning, research-first development, and compatibility across Claude Code, Codex, Cursor, OpenCode, Gemini and others. Contradiction/caution: broad systems can add cognitive overhead; small teams may get more value by extracting one workflow at a time.
Claim: AI-assisted code review tools can reduce PR review burden. Support: CodeRabbit docs describe AI-generated summaries, bug detection, security/quality analysis, one-click fixes, code graph analysis, and PR walkthroughs. Contradiction/caution: independent reporting such as Help Net Security’s 2025 coverage of AI-assisted PR risks argues AI-generated PRs can increase logic, security, and quality issues, so review automation should augment rather than replace human review.

Verdict

“Most skill systems converge on the same SDLC loop.” — Agree, high confidence. The transcript and external sources consistently show spec/interview, plan, build, test, review, and ship patterns. Practical takeaway: start with the loop, not the library.
“The best skill system is the one you develop yourself.” — Agree with caveats, medium-high confidence. Local harnesses capture codebase-specific decisions and repeated failures better than generic repos. Overclaim: not everyone should start from scratch; proven public skills are good seed material. Practical takeaway: fork ideas, not whole ideologies.
“Simple natural-language prompting gets you 90% of what libraries offer.” — Mixed, medium confidence. For solo builders and small features, yes. For regulated teams, multi-repo systems, security gates, or onboarding many developers, explicit skills and policies matter more.
“CodeRabbit Atlas solves code-review pain.” — Mixed, medium confidence. Layered PR walkthroughs are useful, and CodeRabbit documents related features. But the segment is sponsored and tool claims need local validation. Practical takeaway: test it on a few PRs with known defects and compare against human review.
“Large skill libraries’ star counts imply quality.” — Disagree if taken literally, high confidence. Stars show attention, not correctness. Use stars as discovery, then inspect the actual workflow, exit criteria, maintenance, and fit.

Screen-level insights

0:00 — Abstract bokeh intro; no technical evidence. It sets production tone only.
0:30 — Talking-head frame with studio microphone while defining skills. Visual value: confirms this is conceptual setup, not screen evidence.
1:00 — Diagram of SKILL.md structure: YAML front matter, markdown instructions, scripts, references, and assets. This matters because it shows the concrete packaging pattern behind the verbal definition.
1:30 — Star-history graph comparing repos such as addyosmani/agent-skills, mattpocock/skills, affaan-m/everything-claude-code, and garrytan/gstack. Visual value: supports the claim that skill libraries are trending, while also warning us not to equate stars with quality.
2:01 — GitHub README for Agent Skills with the Spec/Plan/Build/Test/Review/Simplify/Ship table. This directly anchors the SDLC comparison.
3:34 — Markdown/README showing skills such as triage, tdd, to-issues, to-prd, prototype, and references to CONTEXT.md/ADRs. This matters because it shows Matt Pocock’s workflow is documentation- and issue-driven, not just prompt snippets.
4:36 — gstack README with contribution graphs and specialist roles. The visual supports the claim that gstack is role-heavy and productivity-branded, but not the causal claim that gstack alone created the output.
6:36 — Code editor with many uncommitted changes and a “smaller commits” pro tip. This visually supports the review-bottleneck warning.
7:07 — GitHub PR page titled feat(circle): generate post titles, with “Made with Cursor” and a test plan. This matters because the workflow ends in a conventional PR artifact.
7:38 — CodeRabbit diff/review UI showing Zod schema and LLM prompt changes. Visual value: demonstrates layered AI review around concrete code, schemas, and prompts.

My read / why it matters

The video is useful because it de-hypes skill libraries without dismissing them. The real leverage is not memorizing slash commands; it is turning repeated engineering judgment into small, testable, maintainable workflows. For technical users, the best next step is to audit your last five AI-agent failures and convert only the recurring ones into skills with exit criteria.
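The audit suggested above can be as simple as tagging each recent agent failure and counting repeats. A minimal sketch, assuming you keep a short free-text tag per failure; the function name and the threshold of two occurrences are illustrative choices, not something the video specifies.

```python
from collections import Counter


def skills_to_promote(failure_tags: list[str], min_occurrences: int = 2) -> list[str]:
    """Return failure tags seen at least min_occurrences times, most frequent first."""
    # Normalize case and whitespace so "Skipped tests" and "skipped tests" match.
    counts = Counter(tag.strip().lower() for tag in failure_tags)
    return [tag for tag, n in counts.most_common() if n >= min_occurrences]
```

For example, a log of `["skipped tests", "invented API", "Skipped tests", "huge PR", "skipped tests"]` yields only `["skipped tests"]`: the one-off failures stay out of the skill set, which is what keeps it small.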

Verification notes

Four verification passes were applied.
Source/evidence audit: external links were checked for Addy Osmani agent-skills, Addy’s blog, AI Hero/Matt Pocock skills, gstack, ECC, VoltAgent, and CodeRabbit docs; unsupported star/productivity claims were softened.
Transcript/comment/frame fidelity audit: timestamped claims were matched to the extracted transcript, top comments, and keyframe visual analysis.
Hallucination/overclaim audit: sponsor claims and repo-star claims were marked as caveated, and verdicts separate evidence from interpretation.
Actionable Insights audit: the top section was expanded into concrete workflows with first steps, evaluation criteria, links, and cautions rather than generic summaries.
Residual uncertainty: current repo star counts and CodeRabbit Atlas-specific feature naming may change quickly; validate against live repos/docs before making purchase or rollout decisions.