Why this Claude Code engineer uses HTML files as AI specs | Thariq Shihipar (Anthropic)

How I AI | 35:58 | Transcript available | Added May 19, 2:40 am GMT+8

Actionable Insights

Prototype an HTML spec artifact before spending agent runtime. Create specs/<feature>.html or docs/agent-plan.html with sections for goal, user flow, mockups, data table rules, risks, and acceptance tests. Ask Claude Code/Cursor to generate the first version, then review it in a browser before implementation. Benefit: richer layout and mockups make it harder to blindly approve a 1,000-line plan. Evaluate by whether a human reviewer can spot missing states and wrong assumptions within 5 minutes. Caution: HTML costs more tokens than terse Markdown; reserve it for visual/product-heavy planning, not simple API refactors.
Keep the instruction source of truth both machine-readable and human-readable. Pair the HTML page with a compact SPEC.md or frontmatter block that contains non-negotiable constraints, acceptance criteria, and links. Let the HTML carry visual hierarchy while Markdown/YAML carries stable instructions. First experiment: convert one existing Markdown PRD into HTML and measure review comments, token usage, and rework. Tools to try: Claude Code (https://docs.anthropic.com/en/docs/claude-code), Cursor (https://cursor.com), and static preview with python3 -m http.server.
Encode UI rendering rules directly in the spec. For data-heavy features, include a table mapping data type -> component -> empty/error/loading state -> formatting rule. The demo’s strongest moment is the HTML plan containing rendering/visualization rules, which gives the agent a reusable design contract rather than vague prose. Evaluate by generating two implementations from the same spec and checking consistency. Caution: do not put secrets or production data in these artifacts.
Treat product managers and tech leads as compute allocators. Before launching a long agent run, estimate budget: expected wall time, model/API cost, review time, and rollback risk. Add a compute budget box to the HTML plan: max run time, allowed tools, when to stop, and required checkpoints. This operationalizes the video’s “Claude can run 8 hours = Claude can spend money” point.
Use HTML selectively, with a token-cost gate. Commenters correctly push back that HTML is token-heavy. Add a decision rule: Markdown for terse instructions, Mermaid/Markdown for architecture, HTML for visual brainstorming, dashboards, demo scripts, design systems, and stakeholder-facing plans. Track prompt+completion tokens and defect rate; if HTML does not reduce review misses, revert.
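The decision rule and token-cost gate described above can be sketched in a few lines of Python. This is a minimal illustration, not a method shown in the video: the purpose labels, the `ReviewStats` fields, and the function names are all hypothetical, and the gate simply encodes "if HTML does not reduce review misses, revert."

```python
from dataclasses import dataclass

# Hypothetical purpose labels; the mapping mirrors the decision rule in the text.
FORMAT_BY_PURPOSE = {
    "terse_instructions": "markdown",
    "architecture": "markdown+mermaid",
    "visual_brainstorm": "html",
    "dashboard": "html",
    "demo_script": "html",
    "design_system": "html",
    "stakeholder_plan": "html",
}


def choose_format(purpose: str) -> str:
    """Return the spec format for a purpose; unknown purposes fall back to Markdown."""
    return FORMAT_BY_PURPOSE.get(purpose, "markdown")


@dataclass
class ReviewStats:
    tokens: int         # prompt + completion tokens across the sampled runs
    review_misses: int  # defects that slipped past human review
    runs: int           # number of runs sampled


def token_overhead(markdown: ReviewStats, html: ReviewStats) -> float:
    """How many times more tokens the HTML workflow used than the Markdown one."""
    return html.tokens / markdown.tokens


def keep_html(markdown: ReviewStats, html: ReviewStats) -> bool:
    """Keep HTML only if it lowers the review-miss rate; otherwise revert."""
    md_rate = markdown.review_misses / markdown.runs
    html_rate = html.review_misses / html.runs
    return html_rate < md_rate
```

A team would fill `ReviewStats` from its own tracked runs; `token_overhead` reports the cost side so the miss-rate improvement can be weighed against it.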

Core thesis

Use HTML, not just Markdown, for agent plans/specs when the audience is a human who must actually review, compare, and steer long-running coding work.

Big ideas / key insights

The valuable pattern is not “let the agent run longer”; it is to make the work inspectable, measurable, and interruptible.
The transcript evidence points to concrete workflow design: artifacts, traces, evals, policies, or specs that survive a single chat context.
The comment evidence is used as a sanity check: where practitioners push back, the verdicts below are deliberately more conservative.
The strongest practical takeaway is to convert the creator’s idea into a small pilot with explicit success/failure criteria before standardizing it.

Best timestamped moments

0:00 — The hook: long Markdown plans are now so long the speaker stopped reading them, which is framed as a mistake.
3:03 — Thariq argues that agents got longer-running, but humans still need to stay in the loop.
4:04 — HTML gives the model a richer medium for mockups and scrollable plans than ASCII-style Markdown approximations.
5:06 — The “compute allocator” framing: planning is where teams decide whether a long model run is worth the cost.
20:21 — The plan contains a table of rendering and visualization rules; this is the most operationally useful artifact in the demo.
32:05 — The design-system claim: store design guidance as design.html in the repo so agents can reference it.
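The rendering-rules table from the 20:21 moment, and the idea of keeping design guidance as HTML in the repo, could be approximated as follows. This is a sketch under stated assumptions: the `RenderRule` fields, the example rows, and the `rendering-rules` table id are invented for illustration and are not taken from the demo.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class RenderRule:
    data_type: str
    component: str
    empty_state: str
    formatting: str


# Illustrative rows only; the actual rules in the demo are not reproduced here.
RULES = [
    RenderRule("currency", "right-aligned table cell", "show a dash",
               "two decimals with currency code"),
    RenderRule("time series", "line chart", "show a 'no data yet' placeholder",
               "ISO 8601 dates on the x-axis"),
]


def rules_to_html(rules: list[RenderRule]) -> str:
    """Render the rules as an HTML table, escaping every cell value."""
    header = ("<tr><th>Data type</th><th>Component</th>"
              "<th>Empty state</th><th>Formatting rule</th></tr>")
    rows = []
    for r in rules:
        cells = (r.data_type, r.component, r.empty_state, r.formatting)
        rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    return '<table id="rendering-rules">\n' + "\n".join([header, *rows]) + "\n</table>"
```

Writing the result of `rules_to_html(RULES)` into a section of a spec page gives both the reviewer and the agent one table to check implementations against.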

Practical takeaways / recommended workflow

Create the durable artifact first. Write the spec/rubric/policy/trace schema before letting agents perform expensive work.
Run a constrained pilot. Pick one repository, one team, or one workflow; record baseline cost, latency, failure rate, and review time.
Instrument the loop. Capture traces, commands, tool calls, test results, and human corrections so the workflow can be evaluated later.
Add gates. Require acceptance tests, human approval for sensitive actions, and rollback paths before allowing broader automation.
Review after 5-10 runs. Keep the practice only if it improves measurable outcomes, not just because the demo felt compelling.
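The pilot-and-review loop above can be made concrete with a small comparison script. This is a hedged sketch: the `Run` fields, the `min_runs` default, and the keep criterion (no metric worse than baseline, at least one better) are assumptions chosen for illustration, and a real team may weight the metrics differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    cost_usd: float        # model/API cost of the run
    wall_minutes: float    # latency from start to finished change
    review_minutes: float  # human review time
    failed: bool           # True if the run missed its acceptance tests


def summarize(runs: list[Run]) -> dict[str, float]:
    """Average each metric over a set of runs."""
    return {
        "cost_usd": mean(r.cost_usd for r in runs),
        "wall_minutes": mean(r.wall_minutes for r in runs),
        "review_minutes": mean(r.review_minutes for r in runs),
        "failure_rate": sum(r.failed for r in runs) / len(runs),
    }


def should_keep(baseline: list[Run], pilot: list[Run], min_runs: int = 5) -> bool:
    """Assumed rule: keep the practice if nothing regressed and something improved."""
    if len(pilot) < min_runs:
        raise ValueError(f"need at least {min_runs} pilot runs before reviewing")
    b, p = summarize(baseline), summarize(pilot)
    return all(p[k] <= b[k] for k in b) and any(p[k] < b[k] for k in b)
```

Recording one `Run` per agent task during the pilot is enough to feed this check; the stricter the criterion, the less likely a compelling demo alone drives adoption.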

Comment insights

Comments split between enthusiasm for human-readable visual specs and concern about token bloat. The best caveat: HTML is for humans more than for Claude; terse Markdown plus Mermaid may be better for model-facing instructions. One practitioner says they already used an interactive HTML travel dashboard successfully; another suggests converting Markdown to HTML with a script.
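The commenter's "convert Markdown to HTML with a script" idea, together with the token-bloat concern, can be illustrated with a toy converter. This is a minimal sketch that handles only a tiny Markdown subset (`#` headings, `- ` bullets, plain paragraphs); a real workflow would use a full Markdown library. Character count is used here as a crude stand-in for token count, which is an assumption, since actual tokenization depends on the model.

```python
from html import escape


def md_to_html(md: str) -> str:
    """Convert a tiny Markdown subset: '#' headings, '- ' bullets, paragraphs."""
    out: list[str] = []
    in_list = False
    for raw in md.splitlines():
        line = raw.strip()
        is_item = line.startswith("- ")
        if in_list and not is_item:
            out.append("</ul>")  # close the list when a non-item line appears
            in_list = False
        if not line:
            continue
        if is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{escape(line[2:])}</li>")
        elif line.startswith("#"):
            level = min(len(line) - len(line.lstrip("#")), 6)
            text = line.lstrip("#").strip()
            out.append(f"<h{level}>{escape(text)}</h{level}>")
        else:
            out.append(f"<p>{escape(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


def size_ratio(md: str) -> float:
    """Characters of generated HTML per character of Markdown (rough cost proxy)."""
    return len(md_to_html(md)) / len(md)
```

Keeping Markdown as the model-facing source and generating HTML only for human review is one way to get the readable artifact without paying the larger size on every agent prompt.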

Deep research

Anthropic Claude Code docs. Claude Code is Anthropic’s agentic coding tool and supports repo-aware workflows; this supports the general context that plans/specs can drive coding agents. Source: https://docs.anthropic.com/en/docs/claude-code
HTML Living Standard / MDN. HTML is a standard presentation medium with semantic structure, tables, forms, and styling; it plausibly improves human scanning for visual specs. Source: https://developer.mozilla.org/en-US/docs/Web/HTML
OpenAI / Anthropic prompt caching and token-pricing docs. More verbose HTML can materially increase token cost; the commenters’ cost objection is real. Source families: provider pricing/token docs and prompt-caching docs.
Mermaid docs. Markdown plus Mermaid is a lower-token alternative for architecture diagrams. Source: https://mermaid.js.org/

Evidence quality note: research here uses named public documentation, standards, and widely known project sources where available. Some vendor claims are treated as product claims unless independently benchmarked in the user’s environment.

Verdicts

HTML is better than Markdown for agent specs: Mixed / medium confidence. Better for human review of visual plans; not inherently better for model instruction. Overclaimed if treated as a universal replacement.
Plans/PRDs still matter for strong agents: Agree / high confidence. The video evidence, normal software practice, and cost/risk management all support this.
Product roles become compute allocators: Agree with framing / medium confidence. It is useful shorthand, but it understates the human product judgment still required.

Screen-level insights

Frames show an interview plus Claude Code/browser-like HTML artifacts: idea cards, visual mockups, a detailed rendering-rules table, and a design.html/design-system discussion. The visual step matters because the entire claim is about readability: the page layout, cards, and tables demonstrate why HTML can compress many choices into a scan-friendly artifact.

Representative extracted frame anchors checked against transcript context:

0:00 — image youtube-extract/Qrpm7E80wQ0/frames/000_000000.jpg; transcript context: Markdown became a really popular way of interacting with agents, but the plans are so long, I honestly have stopped reading them. And this was honestly a mistake. I think that you still need to be really in the loop. » Plans matter, PRDs matter, spec matters. » When you say, okay, Claude can run for 8 hours. What you’re really saying is Claude can spend 500
1:00 — image youtube-extract/Qrpm7E80wQ0/frames/001_000060.jpg; transcript context: welcome back to How I AI. I’m Claire Vo, product leader and AI obsessive here on a mission to help you build better with these new tools. Recently, I was able to attend Code with Claude, Anthropic’s first developer conference. And as part of that, I got to spend a little time with Thariq, who works on Claude Code, and taught me something that has blown my min
2:03 — image youtube-extract/Qrpm7E80wQ0/frames/002_000123.jpg; transcript context: measurable results. Celigo makes this possible. And now with Celigo Aura, it’s never been easier. Celigo Aura gives you access to the entire platform through natural language, connecting your systems and turning intent into action. All of it under your control. Companies like Databricks, PayPal, and Olipop rely on Celigo to run critical busi
4:04 — image youtube-extract/Qrpm7E80wQ0/frames/004_000244.jpg; transcript context: like they can uh have a lot more information. » They’re a lot more scrollable. And when you’re talking about implementation like you know sometimes you see Claude make these like little ASCII Markdown things where you’re like oh like here’s a little you know little mockup and it’s trying really hard. In HTML it doesn’t need to try nearly as hard r
5:36 — image youtube-extract/Qrpm7E80wQ0/frames/005_000336.jpg; transcript context: me all the time, Claire, you said product management is dead. What’s next? I’m going to say you’re a compute allocator, babe. Like that’s [laughter] that’s the job now. You’re still doing the same thing, though. You’re writing documents to decide whether or not something else should do do work in the shape of that work. Okay. So, you’ve convinced me HTML is
7:36 — image youtube-extract/Qrpm7E80wQ0/frames/007_000456.jpg; transcript context: thought was really cool. This is such a cute like little thing. » Extremely cute. And I It’s what’s really funny is just this morning a ChatPRD user messaged me and they’re like, » I love the mockups in ChatPRD and I’m like, what in the world? What are you What are you talking about? Because I have something very similar to this in code review right
14:12 — image youtube-extract/Qrpm7E80wQ0/frames/013_000852.jpg; transcript context: of this demo. Y » you want it all in one piece. You want the product idea. You want the I guess you want a 12-minute walkthrough of how you’re going to demo it. Exactly. You want code snippets. You want style guide. You want that all-in-one thing because this is a self-contained little project that’s easier to have it all at once. But what I can imagine in la
17:17 — image youtube-extract/Qrpm7E80wQ0/frames/016_001037.jpg; transcript context: to part two » pressure him to do it um in the in the comments please please tell us. This episode is brought to you by Persona. You’re learning to build with AI, but there’s an important question you need to ask. Who is actually using your product? Is it a legitimate user, a bot, or a fraudster? Brex, Figma, Etsy, and Twilio trust Persona to answer that que
20:21 — image youtube-extract/Qrpm7E80wQ0/frames/019_001221.jpg; transcript context: it back into the um the output. » Okay. I want to pause because people are going to totally miss what you just did. So, I’m going to repeat it, » which is you have this HTML plan. » Yep. And there’s a section in the HTML plan that is a pretty like specific table of rendering and visualization rules. Yes. » Per data type that you could predict would be in
32:05 — image youtube-extract/Qrpm7E80wQ0/frames/030_001925.jpg; transcript context: use Claude design, make a design system, but not only that, use HTML to encode that design system in your repo so it can be referenced at any time. design.md is dead. Long live design.html. Did I get it right? » Yeah, I think that’s right. I think that’s right. » Throw it. I’m pretty good. Okay. Well, this was so fun. Before we get you out of here and bac

My read / why it matters

This video is useful if you convert it into an operating procedure rather than copying the headline. The durable lesson is about control surfaces for AI work: specs humans read, traces teams audit, evals that catch regressions, identity policies that revoke access, or graphs that preserve provenance. The risky version is adopting the slogan without the measurement and governance layer.

Verification notes

Source/evidence audit: Checked the extracted transcript/comment packet and named external sources/docs relevant to the main claims. Vendor/tool links are identified as vendor/project sources, not neutral proof of effectiveness.
Transcript/comment/frame fidelity audit: Timestamped moments and comment insights were kept close to extracted evidence in youtube-extract/Qrpm7E80wQ0/ and the draft packet. Screen claims are limited to the extracted key-frame metadata and visible UI descriptions.
Hallucination/overclaim audit: Headline claims were softened where evidence was insufficient. Verdicts explicitly mark mixed/low-confidence claims and separate practical heuristics from proven facts.
Actionable Insights audit: The top section was checked for executable first steps, tools/commands or links where available, evaluation criteria, and cautions. Generic summary bullets were rewritten as workflow steps.
Residual uncertainty: I did not have independent benchmark results for the specific demos, and several claims would need local measurement before adoption. Transcript extraction status was marked unknown by the extractor, so the analysis relies on the processor’s excerpted transcript evidence rather than a full raw transcript page.