The AI Skill I Rely On Daily — Priscila Andre de Oliveira, Sentry

Channel: AI Engineer

Speaker: Priscila Andre de Oliveira, Sentry
Duration: 16:53

Actionable Insights

Turn repeated codebase-comprehension prompts into a reusable skill file. Priscila’s strongest concrete workflow is her private “catch me up” skill: a markdown-style prompt that structures repository exploration into modes such as architecture, convention, feature trace, syntax, testing, and history. First step: look through your last 50–100 AI coding sessions and tag each prompt as comprehension, modification, process, review, generation, or other. Any prompt shape that repeats more than a few times should become a reusable skill, command, or prompt template. Evaluate success by whether a new contributor can answer “what does this repo do, where is the relevant flow, what conventions matter, and what should I not touch?” without rereading half the repository.
Use AI first to build your mental model, not just to generate code. Her personal measurement was surprising: 67% of her AI usage was comprehension and only 2% was code generation. Copy the pattern for serious production repositories: before asking the agent to implement, ask it to map the architecture, trace the feature path, identify ownership/conventions, summarize relevant tests, and explain the change history. The practical gate is simple: if you cannot explain the plan and the files being changed, do not ship the AI’s code. This is especially important in large, fast-moving codebases where stale components, removed feature flags, new lint rules, and merged PRs can invalidate an apparently plausible agent answer.
Add an explicit “understand” step between research and planning. The talk cites the common research → planning → implementation flow, then argues that the missing step is human understanding of the agent’s research. A useful workflow is: (0) write the problem statement, (1) have the agent research the repo and produce a source-backed map, (2) ask follow-up questions until you understand the map, (3) ask for a plan, (4) implement with small checkpoints, (5) review the diff against the original mental model. Evaluate by asking: “Could I defend this change in code review without saying ‘the agent said so’?”
Create a “catch me up” command for PR review, not only onboarding. Priscila uses the skill when reviewing colleagues’ PRs because she may have enough context to be responsible for approval but not enough to be confident. Turn this into a repeatable PR-review checklist: ask the agent to identify the feature path, summarize touched modules, compare the diff with existing conventions, list tests that should exist, and flag risky assumptions. Expected benefit: faster review without turning approval into rubber-stamping. Caution: the agent can miss project-specific social/contextual constraints, so use it to ask better questions rather than to replace judgment.
Instrument AI usage by category before deciding what to automate. Her “116 sessions → six categories” analysis is a good lightweight analytics pattern. Export or summarize recent agent sessions, classify them into comprehension/modification/process/review/generation/other, and then build tools for the dominant category. If comprehension dominates, invest in repository maps, dependency graphs, file-role summaries, test guides, and history explainers. If modification dominates, invest in patch templates and test harnesses. The point is to avoid buying or building “code generation” automation when the real bottleneck is understanding.
Keep codebase quality work as a prerequisite for agent productivity. Sentry’s quality quarter—removing TypeScript any, clearing stale TODOs, simplifying code, and removing unused feature flags—matters because agents perform better when code and conventions are clean. First step: pick a narrow cleanup metric that directly affects agent reliability, such as reducing ambiguous types, obsolete components, or duplicated patterns. Evaluate whether AI-generated plans become shorter, diffs become smaller, and review comments shift from “wrong convention” to real product/logic issues.
Treat internal bots as workflow products with observability, not toys. The Sentry examples in the talk—Abacus for AI usage tracking, Warden for code review, Junior for turning Slack bug reports into PRs, and an AI SDK testing repository—show a mature pattern: every agentic tool has a narrow job, an integration point, and some quality loop. If copying this, start with one high-friction workflow like “Slack bug report → triaged issue/PR draft,” then log who invoked it, what context it used, what it changed, whether a human accepted it, and where it failed. Do not let it perform irreversible external actions without review.
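The session-tagging step described above (classify recent sessions, then build for the dominant category) can be sketched in a few lines. This is a minimal illustration, not Priscila's method: the talk does not say how her 116 sessions were labeled, so the keyword hints and the function names `classify` and `category_shares` are assumptions made for this example. Keyword matching is crude, and manual tagging or an LLM-based labeler would likely be more accurate.

```python
from collections import Counter

# Hypothetical keyword hints per category (an assumption for illustration;
# the talk does not describe how sessions were actually labeled).
CATEGORY_HINTS = {
    "comprehension": ("explain", "how does", "where is", "catch me up", "why"),
    "modification": ("refactor", "rename", "fix", "change"),
    "process": ("commit", "rebase", "open a pr", "deploy"),
    "review": ("review", "diff"),
    "generation": ("write a new", "generate", "scaffold"),
}


def classify(prompt: str) -> str:
    """Return the first category whose hint appears in the prompt, else 'other'."""
    text = prompt.lower()
    for category, hints in CATEGORY_HINTS.items():
        if any(hint in text for hint in hints):
            return category
    return "other"


def category_shares(prompts: list[str]) -> dict[str, float]:
    """Return the percentage of prompts per category, largest share first."""
    counts = Counter(classify(p) for p in prompts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.most_common()}
```

Running `category_shares` over the opening prompt of each recent session gives a rough version of the breakdown the talk reports, which is enough to decide whether comprehension tooling or modification tooling deserves the first investment.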

Core thesis

The biggest unlock from AI in a large production codebase is not raw code generation; it is comprehension. AI becomes valuable when it helps engineers build, update, and verify their mental model of a complex, fast-moving system before they plan or ship changes.

Big ideas / key insights

“Agent manager” is closer to technical lead than prompt typist. Priscila jokes that she no longer codes and only prompts, but the actual workflow is active steering: asking clarifying questions, checking whether the agent understood the repository, and deciding whether the plan is safe.
Large codebases make comprehension a daily tax. Sentry’s codebase is described as 15+ years old, used by 100k organizations, with roughly 100 PRs merged per day. In that environment, the bottleneck is often “what changed, why, and what conventions matter now?”
AI coding without understanding is the failure mode. She agrees with warnings from Jake Nations and with Armin Ronacher-style critiques: if engineers no longer know what is in their own codebase, something is wrong.
Skills are a practical way to package personal workflow knowledge. Her catch-me-up skill is not magic; it is a detailed markdown prompt with exploration modes and visual/table output preferences. Its value is consistency and repeatability.
Comprehension scales beyond onboarding. The same skill helps with unfamiliar repositories, PR review, incident/regression tracing, and product-history questions that previously required Slack archaeology or waiting for colleagues in other time zones.

Best timestamped moments with interpretation

1:08–1:40 — “Agent manager” setup. The three-monitor image is funny, but it frames the actual behavior: one engineer orchestrating multiple agents and treating them like reports that need direction.
3:14–3:45 — Why quality matters at Sentry scale. She emphasizes that Sentry is a real business with a complex codebase and 100k organizations depending on it. This is the guardrail against casual “ship slop code.”
3:45–5:15 — Internal AI tools. Abacus, Warden, Junior, and the AI SDK testing repo show Sentry experimenting with bounded, workflow-specific agents rather than one generic magic bot.
5:46–6:46 — Quality quarter. The TypeScript any, TODO, simplification, and feature-flag cleanup work is important because agents inherit the ambiguity and messiness of the repository.
9:40–10:13 — Usage analysis result. The headline datapoint—67% comprehension and 2% generation—is the talk’s pivot. It turns “AI coding” from a generation story into an understanding story.
10:44–11:14 — Catch-me-up skill structure. The six exploration modes make the skill actionable: architecture, convention, feature trace, syntax, testing, and history are exactly the lenses a human reviewer/onboarder needs.
12:15–13:16 — Demo on an unfamiliar repository. The prompt “I am a new contributor, catch me up…” is the reusable workflow. It converts a vague onboarding problem into a structured repository explanation.
13:48–15:52 — Missing step: understand. Her critique of research/planning/implementation is the best conceptual takeaway: agent research is not useful until the human understands and can steer it.

Practical workflow

Export or review recent AI coding sessions and classify them by intent: comprehension, modification, process, review, generation, other.
Identify the most repeated comprehension prompt shape.
Create a local skill/command markdown file with: goal, when to use it, expected inputs, exploration modes, required output sections, evidence requirements, and follow-up questions.
For a new repo or PR, run the skill before asking for code changes.
Require the output to include source anchors: files, functions, tests, commits, or documentation pages.
Ask a second prompt that challenges the explanation: “What might you have misunderstood? Which files contradict this? What tests would prove it?”
Only then ask for a plan and implementation.
After merge/review, update the skill with any convention the agent missed.
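The skill file from the workflow above can be drafted from the listed sections. The sketch below is an inferred skeleton, not Priscila's private file, which was not available. The section wording, the skill name `catch-me-up`, and the helper `render_skill` are assumptions for illustration. Only the six exploration modes come from the talk. It also does not follow any particular tool's skill format, so adapt headings or metadata to whatever your agent expects.

```python
# The six exploration modes named in the talk.
MODES = ("architecture", "convention", "feature trace", "syntax", "testing", "history")

# Hypothetical section text; adjust to your repository and team conventions.
EXAMPLE_SECTIONS = {
    "Goal": "Build a source-backed mental model of the repository before any code change.",
    "When to use": "New repository, unfamiliar PR, or regression tracing.",
    "Expected inputs": "The reader's role and the specific question to answer.",
    "Required output": "Summary tables plus the flow for the feature in question.",
    "Evidence requirements": "Every claim cites a file, function, test, commit, or doc page.",
    "Follow-up questions": "What might be misunderstood? Which files contradict this?",
}


def render_skill(name: str, sections: dict[str, str], modes=MODES) -> str:
    """Render a markdown skill file: a title, one heading per section, then the modes."""
    lines = [f"# {name}", ""]
    for title, body in sections.items():
        lines += [f"## {title}", body, ""]
    lines.append("## Exploration modes")
    lines += [f"- {mode}" for mode in modes]
    return "\n".join(lines) + "\n"


skill_text = render_skill("catch-me-up", EXAMPLE_SECTIONS)
```

Writing `skill_text` to a local markdown file gives a starting point that can be revised after each review, in line with the last step of the workflow.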

Comment insights

The useful comments cluster around five themes:

Demand for the actual skill. The top comment and another highly liked comment ask Priscila to publish the catch-me-up skill. That reinforces that the audience saw the skill artifact—not the general “AI helps me code” claim—as the valuable reusable piece.
“This is detailed requirements, like the old days.” A commenter summarizes the workflow as writing detailed requirements. That is partly right: the skill works because it turns vague intent into explicit exploration criteria. The difference is that the output is not just a spec; it is a continuously refreshed repository mental model.
Problem statement before research. One commenter adds that research/plan/implement misses hidden step 0: write the problem statement. This is a strong addition to Priscila’s “understand” step. The safest workflow is problem statement → research → understand → plan → implement.
Pushback on agentic-content hype. Some comments complain that the channel is drifting into agentic-startup funnel content or joke about “97% vibe-coded.” That pushback is worth noting: practitioners are skeptical of broad AI claims unless the workflow is concrete, source-backed, and quality-preserving.
Presentation dynamics vs engineering value. A few comments criticize delivery, but others praise the point that AI is useful for understanding existing systems. For a technical viewer, the reusable pattern matters more than stage polish.

Deep research on the main claims

Claim 1: The main AI unlock in large codebases is comprehension, not generation.

Support: The transcript’s strongest direct evidence is Priscila’s analysis of 116 sessions: 67% comprehension, 2% code generation. This is personal data rather than a universal benchmark, but it matches the lived reality of large systems where onboarding, ownership, dependencies, and history dominate change cost.
External context: Public discussion around AI coding agents increasingly distinguishes “vibe coding” from disciplined engineering workflows. Jake Nations’ talk “The Infinite Software Crisis” is framed around “vibecoding our way to disaster” and draws on Rich Hickey’s “Simple Made Easy” argument about complexity. Armin Ronacher has also been publicly associated with warnings about engineers losing understanding of their code under agentic workflows.
Nuance: In greenfield or prototype work, generation may be the dominant benefit. In mature production codebases, comprehension usually becomes more valuable because the risk of wrong changes is higher.
Verdict: Agree, medium-high confidence. The claim is strongest for large, long-lived repositories with frequent merges and high reliability expectations.

Claim 2: A reusable skill/prompt file is an effective way to package codebase-comprehension workflow.

Support: The demo shows the skill asking structured questions and producing summaries/tables for a new repository. Anthropic’s Claude Code skills documentation and public anthropics/skills repository support the general pattern: skills are markdown instructions/examples that extend Claude with repeatable procedures.
Nuance: A skill is only as good as its evidence discipline. If it summarizes without file/function/test anchors, it can become confident fiction. For production use, the skill should require citations to source files, tests, docs, commits, or traces.
Verdict: Agree, high confidence. The practical artifact should be a repo-specific or organization-specific skill with mandatory evidence and review checks.
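One way to enforce the evidence discipline described in this claim is to check an agent's summary mechanically before trusting it. The sketch below is an assumption-laden illustration: the regular expression, the list of file extensions, and the function names `audit_anchors` and `unanchored_lines` are all hypothetical. It only checks that cited paths exist and says nothing about whether the summary describes them correctly.

```python
import re

# Matches path-like tokens ending in a few common extensions.
# The extension list is an assumption; extend it for your repository.
PATH_PATTERN = re.compile(r"\b[\w./-]+\.(?:py|ts|tsx|js|md)\b")


def audit_anchors(summary: str, repo_files: set[str]) -> dict[str, list[str]]:
    """Split the paths cited in a summary into those that exist and those that do not."""
    cited = set(PATH_PATTERN.findall(summary))
    return {
        "verified": sorted(cited & repo_files),
        "missing": sorted(cited - repo_files),
    }


def unanchored_lines(summary: str) -> list[str]:
    """Return non-empty lines that cite no file at all."""
    return [
        line for line in summary.splitlines()
        if line.strip() and not PATH_PATTERN.search(line)
    ]
```

A non-empty `missing` list suggests the summary cites files that are not in the repository, and `unanchored_lines` surfaces statements worth challenging with a follow-up prompt. In practice `repo_files` could be built from the output of `git ls-files`.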

Claim 3: Research → planning → implementation is incomplete without human understanding.

Support: The transcript explicitly argues that the agent’s research can go down the wrong path unless the engineer understands and steers it. This aligns with the critique of vibe coding: outsourcing not just typing but also comprehension creates hidden risk.
External context: Rich Hickey’s “Simple Made Easy” is widely cited for separating ease from simplicity and warning against complexity accumulation. AI agents can make difficult work feel easy while still adding complexity if humans do not understand the result.
Nuance: Some low-risk tasks can be delegated more aggressively. But once changes affect product behavior, data, security, observability, or public APIs, understanding should be a gate.
Verdict: Strong agree, high confidence. Add explicit “understanding checks” before implementation.
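An explicit understanding check can be as simple as a list of questions that must have written answers before implementation starts. The sketch below is a minimal illustration under stated assumptions: the four questions are taken from the evaluation criteria in the Actionable Insights section, while the names `UNDERSTANDING_CHECKS`, `open_checks`, and `ready_to_implement` are hypothetical. It verifies only that answers exist, not that they are correct, so a human still has to judge them.

```python
# Questions a contributor should be able to answer before asking for a plan.
UNDERSTANDING_CHECKS = (
    "What does this repo do?",
    "Where is the relevant flow?",
    "What conventions matter?",
    "What should I not touch?",
)


def open_checks(answers: dict[str, str]) -> list[str]:
    """Return the questions that still have no non-blank answer."""
    return [q for q in UNDERSTANDING_CHECKS if not answers.get(q, "").strip()]


def ready_to_implement(answers: dict[str, str]) -> bool:
    """Gate: proceed to planning and implementation only when nothing is open."""
    return not open_checks(answers)
```

The design choice is deliberate: the gate records the engineer's own answers rather than the agent's, which keeps the understanding step with the human.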

Claim 4: Internal AI bots should be narrow and workflow-specific.

Support: Sentry’s examples are bounded: Abacus tracks AI usage, Warden reviews PRs, Junior turns Slack reports into PRs, and the AI SDK testing repo creates tests for integrations. These are not a single all-purpose agent; they are mapped to existing workflow points.
External context: Sentry’s public docs describe AI Agent Monitoring for tracing/debugging agent workflows, which is consistent with treating agents as observable production systems rather than informal helpers.
Nuance: The exact internal tools named in the talk may not all have public docs. Treat the names as transcript evidence, not externally verified public products.
Verdict: Agree, medium confidence. The architectural pattern is sound; public verification of each internal tool is limited.
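Treating a bot as an observable workflow product implies a record per invocation. The sketch below covers the fields suggested in the Actionable Insights section (who invoked it, what context it used, what it changed, whether a human accepted it, where it failed). It is a hypothetical schema, not how Sentry's internal tools log anything: `BotInvocation`, its field names, and `acceptance_rate` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BotInvocation:
    """One run of an internal bot (hypothetical schema for illustration)."""
    bot: str
    invoked_by: str
    context_sources: list[str]
    changes: list[str]
    human_accepted: Optional[bool] = None  # None until a human has reviewed the result
    failure: Optional[str] = None          # where it failed, if it did
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def acceptance_rate(log: list[BotInvocation]) -> Optional[float]:
    """Share of reviewed invocations that a human accepted; None if none were reviewed."""
    reviewed = [r for r in log if r.human_accepted is not None]
    if not reviewed:
        return None
    return sum(r.human_accepted for r in reviewed) / len(reviewed)
```

Tracking acceptance separately from invocation count gives a simple quality loop: a bot that is invoked often but rarely accepted is a candidate for narrowing its job or improving its context.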

Claim 5: Code quality cleanup improves AI-assisted engineering.

Support: The transcript links Sentry’s quality quarter to agent effectiveness: removing any, stale TODOs, unused feature flags, and complexity makes the codebase easier for both humans and agents to reason about.
Nuance: AI can also help perform cleanup, but asking it to operate inside ambiguous, inconsistent code increases the chance of wrong conventions and brittle patches.
Verdict: Agree, high confidence. Cleanup is not separate from AI adoption; it is enablement work.

Verdict

Bottom line: strong agree with the talk’s core direction. The best use of AI in serious codebases is not “let it write code while I stop thinking.” It is “use it to interrogate the codebase until I understand enough to steer, review, and own the change.” The main limitation is that the talk’s 67%/2% breakdown is one engineer’s usage pattern, not a universal statistic. But the operational lesson is highly portable: measure your own usage, package repeated comprehension workflows into skills, and make understanding a gate before implementation.

Screen-level insights

1:08 frame — multi-agent orchestration setup. The frame shows Priscila’s “agent manager” joke: multiple agent windows across monitors. It matters because the workflow is not one prompt/one answer; it is active supervision of several AI workstreams.
5:15 frame — internal AI tooling context. The nearby transcript discusses Sentry’s AI SDK testing repository and the instruction to only prompt until a useful result emerges. This supports the idea that Sentry is experimenting with agent workflows under real engineering constraints, not just using AI for demo snippets.
11:14 frame — skill as markdown and visual output. The nearby transcript says the skill is an MD file with human language and that she prefers visual/table/organogram-style summaries. This is important: the artifact is simple enough to copy, and the output format is tuned to how the engineer understands systems.
12:15 frame — catch-me-up demo prompt. The frame corresponds to the demo prompt for a new contributor asking how a repository works and whether it simulates/intercepts Sentry envelopes. This is the concrete reusable pattern: state your role, ask the skill to orient you, and require clarification of a specific domain question.

My read / why it matters

This is a useful talk because it reframes AI coding around responsibility. The dangerous version of agentic coding is “I do not know what changed, but the diff passed.” The valuable version is “I can ask unlimited questions, trace history faster, and enter implementation with a sharper mental model.”

For teams, the immediate move is not to buy another coding agent. It is to identify the repeated comprehension questions engineers ask every day and turn them into reusable, evidence-backed skills. That compounds quickly: onboarding gets faster, PR review gets sharper, incident tracing improves, and agents become a force multiplier without eroding ownership.

Verification notes

Source/evidence audit: Main claims were checked against the extracted transcript: Sentry context, internal tools, quality quarter, 116-session analysis, 67% comprehension / 2% generation, catch-me-up skill modes, and the understand-before-implementation argument.
Comment fidelity audit: Comment insights were derived only from extracted comments, especially requests to publish the skill, the “detailed requirements” observation, the “problem statement” addition, and skepticism about agentic hype.
Frame fidelity audit: Screen-level notes were limited to the four extracted frames and nearby transcript context; no unreadable UI text was invented.
External research audit: External checks were used to verify the broader context around Claude Code skills, Sentry AI Agent Monitoring, Jake Nations/Rich Hickey-style complexity framing, and public discussion of Armin Ronacher’s AI coding critiques. Specific Sentry internal tools such as Abacus/Warden/Junior are treated as transcript-backed, not independently verified public products.
Actionable Insights audit: The top section includes first steps, expected benefits, evaluation criteria, and cautions. Remaining uncertainty: Priscila’s private catch-me-up skill itself was not publicly available in the extracted evidence, so the recommended template is inferred from her described modes and demo behavior rather than copied from the original file.