Agentic Search for Context Engineering — Leonie Monigatti, Elastic

AI Engineer | 1:03:12 | Transcript available | Added May 18, 4:40 pm GMT+8

Actionable Insights

Inventory every context source before choosing retrieval tools: List local files, skills, plans/scratchpads, databases, web sources, and memory stores, then map each to a native search tool. The supporting evidence is the 4:25–5:56 segment, where the transcript lists local files, plans, skills, databases, web, and memory as separate context sources. Fixed RAG pipelines are often insufficient here because agents need to choose tools, rewrite queries, search multiple sources, and perform multi-hop retrieval. As with every item in this list, start with a small, reversible pilot: write down the exact input, expected output, owner, and success metric before changing the wider workflow. Treat the first run as an evaluation, not a migration: capture before/after examples, note where the method saves time or improves quality, and keep the old path available until the new one passes repeated checks. The main failure mode is overgeneralizing the speaker’s demo beyond the evidence. If the video or comments only showed a narrow case, keep the rollout narrow and require fresh proof before broad adoption.
Treat shell/exec as a powerful search tool, not just a command runner: Use rg, find, SQL CLIs, curl, and small scripts when they outperform a generic retriever. Guard it carefully, because the same tool can also touch sensitive systems. In the 6:27–7:28 segment the speaker describes bash/exec as a versatile search and integration surface across files, databases, scripts, and HTTP APIs. The verdict on whether shell/exec can replace many specialized search tools is mixed: it is versatile, but less safe and less structured than domain tools.
Design search tools with parameter recovery: A top comment notes that when Claude calls a tool with wrong parameters, it may stop trying and answer anyway, proceeding with insufficient information. Make tools return actionable errors and examples for the next call, so the model can recover instead of merely failing. This matters more as RAG moves from fixed vector retrieval to agentic retrieval, where the model decides when and how to call search tools.
Use multi-hop retrieval when answers depend on discovered entities: Let the agent search once, extract new identifiers or terms, then search again instead of forcing a single vector query. This follows from the shift from fixed vector retrieval to agentic retrieval, where the model decides when and how to call search tools. Fixed RAG pipelines are often insufficient because agents need to rewrite queries, search multiple sources, and chain one search into the next.
Measure retrieval, not just final answers: Track tool-choice accuracy, invalid-parameter rate, empty-result rate, search latency, follow-up search count, and grounded-answer rate. Because agentic retrieval lets the model decide when and how to call search tools, a failure can occur at tool choice, query rewriting, or any individual hop, and a final-answer score alone does not show which step went wrong.
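
One possible way to compute the retrieval metrics listed in the last item is sketched below. The log schema (`ToolCall`), the `grounded` labels, and the exact metric definitions are assumptions made for illustration; the talk does not specify them, and a real evaluation would likely need reviewer-labeled data for `expected_tool` and groundedness.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToolCall:
    """One logged search-tool call (hypothetical log schema)."""
    question_id: str
    tool: str
    expected_tool: Optional[str]  # reviewer label; None if unlabeled
    valid_params: bool
    result_count: int
    latency_ms: float


def retrieval_metrics(calls: List[ToolCall], grounded: Dict[str, bool]) -> dict:
    """Aggregate per-call logs into the retrieval metrics named above."""
    if not calls:
        raise ValueError("need at least one logged call")
    labeled = [c for c in calls if c.expected_tool is not None]
    valid = [c for c in calls if c.valid_params]
    per_question = Counter(c.question_id for c in calls)
    return {
        # Share of labeled calls where the agent picked the expected tool.
        "tool_choice_accuracy": (
            sum(c.tool == c.expected_tool for c in labeled) / len(labeled)
            if labeled else None
        ),
        "invalid_parameter_rate": sum(not c.valid_params for c in calls) / len(calls),
        # Only valid calls can meaningfully return an empty result set.
        "empty_result_rate": (
            sum(c.result_count == 0 for c in valid) / len(valid) if valid else None
        ),
        "mean_latency_ms": sum(c.latency_ms for c in calls) / len(calls),
        # Searches beyond the first one, averaged per question.
        "mean_follow_up_searches": (
            sum(n - 1 for n in per_question.values()) / len(per_question)
        ),
        "grounded_answer_rate": (
            sum(bool(v) for v in grounded.values()) / len(grounded)
            if grounded else None
        ),
    }
```

Keeping the per-call log separate from the per-question groundedness labels makes it easier to see whether a bad answer came from tool choice, parameters, or empty results.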

Core thesis

Monigatti argues that context engineering is mostly agentic search: the core problem is deciding which information from many possible sources enters the context window. Fixed RAG pipelines are often insufficient because agents need to choose tools, rewrite queries, search multiple sources, and perform multi-hop retrieval.
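
To make the multi-hop idea concrete, here is a minimal sketch over an in-memory corpus. Everything in it is hypothetical: the documents, the identifier pattern, and the keyword matcher stand in for whatever search tools and entity extraction a real agent would use (in practice the model itself usually decides the follow-up query).

```python
import re

# Hypothetical corpus: each document points to the next via an identifier.
DOCS = {
    "incident.md": "Checkout outage traced to ticket OPS-142.",
    "ops-142.md": "OPS-142: root cause was an expired TLS certificate, fixed in change CHG-77.",
    "chg-77.md": "CHG-77: rotated the certificate and added an expiry alert.",
    "unrelated.md": "Quarterly planning notes.",
}

# Assumed identifier shape, e.g. OPS-142 or CHG-77.
ID_PATTERN = re.compile(r"\b[A-Z]{2,5}-\d+\b")


def keyword_search(query, docs):
    """Return names of docs that contain every query term (case-insensitive)."""
    terms = query.lower().split()
    return [name for name, text in docs.items()
            if all(term in text.lower() for term in terms)]


def multi_hop_search(query, docs, max_hops=3):
    """Search, extract new identifiers from the hits, then search again."""
    found = []          # document names in discovery order
    searched = set()    # queries already issued, lowercased
    frontier = [query]
    for _ in range(max_hops):
        next_frontier = []
        for q in frontier:
            if q.lower() in searched:
                continue
            searched.add(q.lower())
            for name in keyword_search(q, docs):
                if name in found:
                    continue
                found.append(name)
                # Newly discovered identifiers become next-hop queries.
                for ident in ID_PATTERN.findall(docs[name]):
                    if ident.lower() not in searched and ident not in next_frontier:
                        next_frontier.append(ident)
        if not next_frontier:
            break
        frontier = next_frontier
    return found
```

With `max_hops=1` the search behaves like a single-shot query and finds only the incident note; allowing further hops follows the ticket and change identifiers to the documents that actually contain the root cause and the fix.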

Comment insights

The comments are mostly positive, but the most useful technical pushback is about wrong tool parameters: once an agent receives an error, it may incorrectly proceed with insufficient information. That means tool design must help the model recover, not merely fail. Other comments praise the talk as informative, which supports usefulness but does not add technical evidence.
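
A tool that helps the model recover might look like the following sketch. The tool name, its parameters (`query`, `source`, `limit`), the allowed sources, and the error payload shape are all assumptions for illustration, not an API from the talk or from any specific agent framework.

```python
ALLOWED_SOURCES = ("files", "docs", "web")  # hypothetical source names
KNOWN_PARAMS = {"query", "source", "limit"}


def run_search(query, source, limit):
    """Placeholder backend; a real tool would query an index here."""
    return []


def search_tool(params):
    """Validate a search call and return results or a recoverable error."""
    problems = []
    query = params.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("'query' must be a non-empty string")
    source = params.get("source", "files")
    if source not in ALLOWED_SOURCES:
        problems.append(
            f"'source' must be one of {list(ALLOWED_SOURCES)}, got {source!r}"
        )
    limit = params.get("limit", 5)
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 50:
        problems.append("'limit' must be an integer from 1 to 50")
    unknown = sorted(set(params) - KNOWN_PARAMS)
    if unknown:
        problems.append(f"unknown parameters: {unknown}")

    if problems:
        # Say what was wrong, show a valid call, and state the next step,
        # so the model retries instead of answering without evidence.
        return {
            "ok": False,
            "error": "invalid_parameters",
            "problems": problems,
            "example_call": {"query": "expired certificate", "source": "docs", "limit": 5},
            "hint": "Fix the listed parameters and call the tool again before answering.",
        }
    return {"ok": True, "results": run_search(query, source, limit)}
```

Reporting every problem in one response, rather than only the first, is a design choice intended to reduce the number of failed retries the model needs before a valid call.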

Deep research

Supporting sources and concepts:

RAG evolved from fixed vector retrieval to agentic retrieval where the model decides when and how to call search tools.
Coding agents already use multiple context tools: file search, shell, memory, skill loading, and web search.
Elastic’s domain expertise in search/retrieval makes the talk’s emphasis on tool design credible, though the transcript remains the primary evidence here.
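
An inventory of context sources and their search tools can start as a small table in code. The mapping below is illustrative only: the source names come from the talk summary, but which tool serves which source, and the unassigned entry, are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContextSource:
    name: str
    search_tool: Optional[str]  # None means no search tool assigned yet


# Hypothetical inventory; adjust the tools to your own stack.
INVENTORY = [
    ContextSource("local files", "file search (rg / find via shell)"),
    ContextSource("skills", "skill loading"),
    ContextSource("plans/scratchpads", "file search (rg / find via shell)"),
    ContextSource("databases", "SQL CLI via shell"),
    ContextSource("web", "web search"),
    ContextSource("memory", None),  # example gap to resolve before the pilot
]


def unmapped_sources(inventory: List[ContextSource]) -> List[str]:
    """Return the names of sources that still lack a search tool."""
    return [source.name for source in inventory if source.search_tool is None]
```

Running `unmapped_sources(INVENTORY)` before a pilot gives a quick list of sources the agent cannot yet reach.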

Limiting evidence:

“80% agentic search” is a useful hot take, not a measured universal constant.
Shell tools are versatile but risky; they need sandboxing, allowlists, output limits, and secret hygiene.
More tools can confuse agents unless descriptions, parameters, and error messages are carefully designed.
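
The shell guardrails named above (allowlists, output limits, secret hygiene) can be sketched as a thin wrapper around `subprocess.run`. The function name, the redaction pattern, and the default limits are assumptions, and this is not a sandbox: checking only the program's base name is weak, so real isolation would still need containers or OS-level restrictions.

```python
import os
import re
import subprocess
import sys

# Assumed, deliberately simple pattern for key=value style secrets.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")


def run_guarded(argv, allowlist, max_chars=2000, timeout_s=10):
    """Run an allowlisted command without a shell; redact and cap its output."""
    if not argv:
        return {"ok": False, "error": "empty command"}
    program = os.path.basename(argv[0])
    if program not in allowlist:
        return {
            "ok": False,
            "error": f"{program!r} is not allowlisted; allowed: {sorted(allowlist)}",
        }
    try:
        # shell=False: arguments are passed as a list, not parsed by a shell.
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s, shell=False
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": f"timed out after {timeout_s}s"}
    except OSError as exc:
        return {"ok": False, "error": f"could not start {program!r}: {exc}"}

    # Redact before truncating so a secret cannot survive at the cut point.
    output = SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", proc.stdout)
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "output": output[:max_chars],
        "truncated": len(output) > max_chars,
    }


if __name__ == "__main__":
    # Usage example: allow only the current Python interpreter.
    allowed = {os.path.basename(sys.executable)}
    print(run_guarded([sys.executable, "-c", "print('hello')"], allowed))
```

Returning a structured error for blocked commands, instead of raising, keeps the refusal visible to the agent so it can choose an allowed tool on the next call.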

Verdict

Context engineering depends heavily on search: Agree, high confidence.
Fixed one-shot RAG is insufficient for many agent tasks: Agree, high confidence.
Shell/exec can replace many specialized search tools: Mixed. It is versatile, but less safe and less structured than domain tools.
Tool errors are a major agent failure mode: Agree, medium-high confidence, supported by the top technical comment.

Screen-level insights

1:16 context engineering diagram: The talk frames context engineering as selecting from all possible context sources into the model window.
1:48 search-arrow emphasis: The “arrow” from context sources to context window is the real system: search tools decide what the model sees.
4:25–5:56 multiple context sources: The transcript lists local files, plans, skills, databases, web, and memory. This is the strongest operational section.
6:27–7:28 shell tool section: The speaker describes bash/exec as a versatile search and integration surface across files, databases, scripts, and HTTP APIs.

Verification notes

Verification passes performed: source/evidence audit against transcript and comments; fidelity audit for RAG/search-tool evolution; hallucination audit treating “80%” as framing rather than fact; Actionable Insights audit converting the talk into a retrieval-tool checklist. Residual uncertainty: full workshop code examples were not included in the draft excerpt.

Actionable Insights audit: expanded to the newer detailed format with fuller implementation notes, evaluation checks, and cautions where the existing evidence supports elaboration.