
By AI Engineer · 19:18


LLM codegen fails and how to stop 'em — Danilo Campos, PostHog

Video: https://youtu.be/juoNbJiZUi0?si=KH_0mKpvZvIUwfww

Video ID: `juoNbJiZUi0`

Duration: 19:18

Source quality: direct transcript extracted successfully; comments extracted from the top available YouTube comments.

Core thesis

Autonomous codegen works when you stop treating the model as a magic programmer and start treating it as a capable but context-hungry agent that needs fresh documentation, good examples, sequenced instructions, constrained tools, and feedback loops.

Danilo’s strongest claim is that the PostHog Wizard succeeds not because it is mostly clever code, but because it is mostly high-quality prose and context engineering: “90% markdown files, 8% tools for delivering and processing markdown files, and the rest agent harness stuff.”

Big ideas / key insights

1. Model rot is unavoidable for fast-moving software

At 2:18–4:25, Danilo explains that LLMs are snapshots of the web from months ago. For fast-moving libraries and APIs, that means the model is often confidently wrong: inventing keys, making up APIs, and applying stale integration patterns.

His practical answer is not a fancy retrieval pipeline; it is fresh markdown context. With today’s large context windows, PostHog lets the agent select up-to-date documentation and slide it directly into context.
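
To make that concrete, here is a minimal sketch of the fresh-docs pattern, assuming a local `docs/` folder refreshed alongside releases; the tool names (`listDocs`, `readDoc`) are illustrative, not PostHog’s actual interface.

```ts
// Sketch: expose fresh markdown docs to an agent as two small tools.
// The agent scans the index, picks what is relevant, and the chosen
// docs land verbatim in its context window, bypassing stale pretraining.
import { readdir, readFile } from "node:fs/promises";
import path from "node:path";

const DOCS_DIR = "./docs"; // assumed: refreshed from the docs repo on each release

// Tool 1: a cheap index the agent can scan to decide what is relevant.
export async function listDocs(): Promise<string[]> {
  const entries = await readdir(DOCS_DIR);
  return entries.filter((f) => f.endsWith(".md"));
}

// Tool 2: deliver the chosen doc verbatim; no summarization, no staleness.
export async function readDoc(name: string): Promise<string> {
  // Guard against path traversal: only serve files inside DOCS_DIR.
  const resolved = path.resolve(DOCS_DIR, name);
  if (!resolved.startsWith(path.resolve(DOCS_DIR) + path.sep)) {
    throw new Error(`refusing to read outside ${DOCS_DIR}`);
  }
  return readFile(resolved, "utf8");
}
```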

2. Give agents “model airplanes,” not full production apps

At 4:55–6:28, he introduces “model airplanes”: thin example projects that have the right shape of a real app without the complexity. They include PostHog integrations across frameworks and languages, with simplified features such as auth that is “auth-shaped” but not production-auth-complete.

This gives the model a concrete pattern for where integration code belongs while keeping the example token-efficient.
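
A hedged sketch of what one file in such a model airplane might look like, using the real `posthog-node` capture API around a deliberately placeholder login flow; this is not PostHog’s actual example code.

```ts
// Sketch of a "model airplane" file: an auth-shaped login handler that
// is deliberately not production-complete, but shows exactly where the
// PostHog integration belongs.
import { PostHog } from "posthog-node";

const posthog = new PostHog(process.env.POSTHOG_API_KEY!, {
  host: "https://us.i.posthog.com",
});

// "Auth-shaped": accepts credentials and returns a user id, but skips
// hashing, sessions, rate limits, and every other production concern.
async function login(email: string, _password: string): Promise<string> {
  const userId = `user_${email}`; // placeholder lookup, not real auth

  // The part the agent should imitate: capture a business-value event.
  posthog.capture({
    distinctId: userId,
    event: "user_logged_in",
    properties: { method: "password" },
  });

  return userId;
}
```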

3. Breadcrumb the agent to limit improvisation

At 6:58–9:31, Danilo warns that if 15,000 monthly integrations produce 15,000 different implementation styles, support becomes a nightmare. The solution is to sequence the task.

Instead of telling the agent “integrate PostHog” up front, the Wizard first asks it to find files with business value: login, Stripe, churn signals, and other meaningful product events. Then it asks what events are worth tracking. Only after the agent has built that intermediate understanding does it implement the integration.

The lesson: don’t over-specify the destination too early. Shape the path.
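
A sketch of that sequencing, with `runAgent` standing in for whatever harness actually drives the model; the prompts are paraphrased from the talk, not the Wizard’s real ones.

```ts
// Sketch of breadcrumbing: sequence the agent through discovery and
// event design before it writes any integration code. `runAgent` is a
// hypothetical wrapper around your agent harness.
declare function runAgent(prompt: string): Promise<string>;

async function integratePosthog(): Promise<void> {
  // Step 1: discovery only; the final integration is not mentioned yet.
  const hotspots = await runAgent(
    "Find the files in this repo with business value: login, billing, " +
      "Stripe, churn signals. List them with a one-line reason each."
  );

  // Step 2: event design, grounded in what step 1 actually found.
  const eventPlan = await runAgent(
    `Given these files:\n${hotspots}\n` +
      "Which product events are worth tracking? Name and describe each."
  );

  // Step 3: only now implement, against the agent's own plan.
  await runAgent(
    `Implement PostHog tracking for exactly these events:\n${eventPlan}`
  );
}
```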

4. The biggest source of agent failure is often human error

At 9:31–12:06, he makes a funny but serious point: humans have context limits too. Teams change prompts, tool definitions, docs, and instructions, then forget contradictions or missing pieces.

PostHog catches this with inference-time interrogation at the stop hook: after every run, they ask the agent what could have been done better to set it up for success. This surfaced missing tool permissions, contradictory tool instructions, and language-mismatched guidance such as JavaScript instructions inside a Python project.
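
A minimal version of that stop-hook interrogation might look like the following; `runAgent` is again a hypothetical wrapper, and the JSONL log is an assumed aggregation mechanism, not PostHog’s.

```ts
// Sketch of inference-time interrogation at the stop hook: after every
// run, ask the agent where the harness failed it and log the answer
// for later aggregation.
import { appendFile } from "node:fs/promises";

declare function runAgent(prompt: string): Promise<string>;

async function onStop(runId: string): Promise<void> {
  const critique = await runAgent(
    "The task is finished. What could we have done better to set you up " +
      "for success? Mention missing tool permissions, contradictory " +
      "instructions, or docs that did not match this project."
  );
  // One JSON line per run; grep or aggregate these to find harness defects.
  await appendFile(
    "harness-feedback.jsonl",
    JSON.stringify({ runId, critique, at: new Date().toISOString() }) + "\n"
  );
}
```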

5. Tool permissions need to prevent “successful but creepy” behavior

At 12:06–13:38, Danilo describes an early Wizard version reading `.env` files because file writes mechanically require reads. That solved the integration task but risked sending sensitive environment contents into cloud inference logs.

They fixed it by locking down reads around env files and giving the agent a narrow tool that could only check whether a key exists and write a new value. The principle: the agent can fulfill the user’s request and still violate trust if tool access is too broad.
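
A sketch of such a narrow tool: it can confirm a key exists and append a new one, but never returns a value, so secrets never enter the context window. The function names and refuse-to-overwrite policy are assumptions, not PostHog’s implementation.

```ts
// Sketch of a narrow secrets tool for .env files: existence checks and
// appends only, with no code path that returns a secret value.
import { readFile, appendFile } from "node:fs/promises";

const ENV_PATH = ".env";

export async function envKeyExists(key: string): Promise<boolean> {
  const raw = await readFile(ENV_PATH, "utf8").catch(() => "");
  // Inspect keys only; values are discarded before anything is returned.
  return raw.split("\n").some((line) => line.trim().startsWith(`${key}=`));
}

export async function setEnvKey(key: string, value: string): Promise<void> {
  if (await envKeyExists(key)) {
    // Assumed policy: never clobber an existing secret.
    throw new Error(`${key} already set; refusing to overwrite`);
  }
  await appendFile(ENV_PATH, `${key}=${value}\n`);
}
```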

6. Prose is becoming a compounding asset

At 14:09–16:15, he argues that code depreciates, but good prose/context can appreciate as models improve. The Wizard’s value lives mostly in markdown, model examples, and sequencing rather than elaborate scaffolding.

The agent metaphor is an octopus: it can wriggle around problems if you give it enough information and sequence that information well. Overconstraining it with code can reduce its ability to adapt.

Best timestamped moments

- 2:18–4:25: model rot, and why LLMs are confidently wrong about fast-moving APIs
- 4:55–6:28: “model airplanes” as token-efficient reference apps
- 6:58–9:31: breadcrumbing the agent through discovery before implementation
- 9:31–12:06: human error and inference-time interrogation at the stop hook
- 12:06–13:38: locking down `.env` access with a narrow key-checking tool
- 14:09–16:15: prose as a compounding asset and the octopus metaphor

Practical takeaways / recommended workflow

1. Treat stale model knowledge as a default failure mode. Assume the model does not know your current API.

2. Serve fresh docs as markdown. Let the agent choose relevant docs and insert them into context.

3. Maintain reference implementations. Create small “model airplane” apps for each major framework/language/integration shape.

4. Flatten examples into agent-readable context. PostHog’s Q&A describes generated skill files that include docs plus model airplanes as references.

5. Breadcrumb the task. Sequence the agent through discovery → event design → implementation instead of asking for the final integration immediately.

6. Capture intermediate artifacts. Event names and descriptions are written into a small file before implementation, giving the agent a stable plan (see the sketch after this list).

7. Interrogate every run. At the stop hook, ask: “What could we have done better to set you up for success?” Aggregate those answers to find prompt/tool/doc defects.

8. Constrain sensitive tools. Replace broad file access with narrow operations, especially around secrets such as `.env` files.

9. Invest in prose. Keep high-quality markdown instructions, docs, examples, and skill files as first-class production assets.

10. Avoid over-scaffolding. Give the agent enough context and constraints to succeed, but leave room for adaptive problem-solving.
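
As a worked example of takeaway 6, here is one plausible shape for that intermediate artifact; the `PlannedEvent` fields and the `EVENT_PLAN.md` filename are illustrative, not the Wizard’s actual format.

```ts
// Sketch of takeaway 6: persist the event plan as a small artifact the
// agent can re-read during the implementation step.
import { writeFile } from "node:fs/promises";

interface PlannedEvent {
  name: string;        // e.g. "user_logged_in"
  description: string; // why this event has business value
  file: string;        // where the capture call should live
}

export async function writeEventPlan(events: PlannedEvent[]): Promise<void> {
  // Markdown keeps the plan human-reviewable and cheap to re-inject
  // into context before implementation begins.
  const body = events
    .map((e) => `- \`${e.name}\` in \`${e.file}\`: ${e.description}`)
    .join("\n");
  await writeFile("EVENT_PLAN.md", `# Event plan\n\n${body}\n`);
}
```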

Comment insights

The extracted comment set is tiny: one visible top comment.

Agreement / enthusiasm

The only extracted commenter reacts strongly to Danilo’s delivery: “One minute in and I already love this guy!” That suggests the talk’s humor and plainspoken style landed immediately, even before the technical content developed.

Disagreement patterns

No disagreement was present in the extracted comments. There is no comment-side pushback on the architecture, security model, or “prose over code” thesis in the available data.

Practitioner additions

No commenter added additional implementation patterns or field experience. The useful practitioner detail comes from the Q&A instead: PostHog uses generated skill files, a context service, flattened model-airplane markdown, the Claude Agent SDK, a CLI wrapper, and an LLM gateway for inference.

Memorable phrases from comments

With only one comment extracted, the lone candidate is the line quoted above: “One minute in and I already love this guy!”

Pushback / caveats

No comment-derived caveats were extracted. Caveats from the talk itself are important: broad file access can leak secrets, stale docs cause hallucinated APIs, and unconstrained integrations create support burden even when they technically work.

Concrete tools/workflows mentioned by commenters

None in the comments. Concrete tools/workflows mentioned in the talk and Q&A include:

- generated skill files that combine docs with flattened model-airplane markdown
- a context service for delivering fresh documentation
- the Claude Agent SDK
- a CLI wrapper around the agent
- an LLM gateway for inference

My read / why it matters

This is one of the more useful production-agent talks because it avoids the vague “agents are the future” layer and gets into the boring parts that actually make codegen reliable: context freshness, examples, sequencing, feedback, and permissions.

The key inversion is that the valuable artifact is not necessarily more code. It is a maintained body of prose and examples that tells the agent what “good” looks like today. That is especially relevant for fast-moving products where model pretraining will always lag reality.

The most transferable pattern is the stop-hook question. It is cheap, humble, and powerful: ask the agent where your harness failed it. That turns every failed or awkward run into feedback about missing docs, broken permissions, contradictory instructions, or bad sequencing.

The security section also deserves attention. Agent UX can look magical while quietly doing unacceptable things under the hood. A production agent should not merely complete the task; it should complete it in a way that preserves trust.