
Segment 31: Anun Joshi (Bland): voice AI failure modes beyond the model itself

AI Engineer · 10h 9m · Transcript ✅ · Added May 29, 12:54 am GMT+8

  • Timestamp: 09:02:39
  • Duration: 12m 26s
  • Livestream range: 09:02:39 → 09:15:05
  • Transcript evidence: 23 chunks, about 1852 words

Actionable Insights

  1. Turn “voice AI failure modes beyond the model itself” into an operating checklist. For each voice workflow, make the speaker’s idea concrete: define the user, the input, the tool boundary, the review step, and the failure condition.
  2. Separate capability from accountability. The recurring lesson in this chapter is that more capable AI changes who does the work, but not who owns the outcome. When applying it to voice and relationship AI, write down what the system may do autonomously and what still requires explicit human judgment.
  3. Instrument the loop before scaling it. The useful operating loop is: capture context, let the tool act, review the result, preserve the learning, and tighten the next run. Write down acceptance criteria and review notes early so the workflow can be audited later.
  4. Design for the failure mode, not the demo. A polished voice AI demo matters less than the places where the system breaks: weak context, unsafe permissions, weak evaluation, unclear ownership, latency, or poor human review.
  5. Convert this into a checklist for personal and relationship agents. The durable takeaway from Anun Joshi (Bland) is to turn “voice AI failure modes beyond the model itself” into explicit operating rules: what the system may do, what it must prove, what evidence a reviewer needs, and where a human must stay accountable. The next useful artifact is a short checklist or eval case that someone can actually run.
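The checklist described in the list above can be sketched as a small data structure. This is a minimal illustration, not something shown in the talk: the class name `AgentChecklist`, its fields, and the example action names are all hypothetical, chosen only to mirror the items named above (user, input, tool boundary, review step, failure condition).

```python
from dataclasses import dataclass, field


@dataclass
class AgentChecklist:
    """Operating rules for one voice workflow. All names are illustrative."""
    user: str                  # who the agent is acting for
    input_source: str          # what the agent receives, e.g. a live call
    allowed_actions: set[str]  # tool boundary: what it may do autonomously
    needs_human: set[str]      # actions that require explicit human sign-off
    failure_condition: str     # what counts as a failed run
    reviewer: str = ""         # the accountable human
    review_notes: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the checklist items that are still unanswered or inconsistent."""
        problems = []
        if not self.reviewer:
            problems.append("no accountable reviewer named")
        if not self.failure_condition:
            problems.append("no failure condition defined")
        overlap = self.allowed_actions & self.needs_human
        if overlap:
            problems.append(f"both autonomous and human-gated: {sorted(overlap)}")
        return problems

    def may_run(self, action: str) -> str:
        """Classify an action as 'auto', 'human', or 'deny' (default deny)."""
        if action in self.needs_human:
            return "human"
        if action in self.allowed_actions:
            return "auto"
        return "deny"


checklist = AgentChecklist(
    user="inbound caller",
    input_source="live phone call",
    allowed_actions={"answer_faq", "book_slot"},
    needs_human={"issue_refund"},
    failure_condition="caller hangs up unresolved",
    reviewer="ops lead",
)
print(checklist.gaps())                   # []
print(checklist.may_run("issue_refund"))  # human
```

Anything not listed is denied by default, so an unclear boundary shows up as an explicit refusal rather than as an unreviewed action.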

What they actually use/show that is worth copying

  • Claude for slides/drafts: Claude is used for first drafts, speeches, and slides. The key lesson is using a frontier model to speed up expression while the human still owns the judgment and accountability.
  • container isolation: Container isolation is the safety idea worth copying. Assume the agent will make mistakes, then make sure those mistakes happen inside a boundary that limits blast radius.
  • Slack agent factory: The agent is embedded in the existing delivery workflow. That makes review, testing, and handoff happen where the team already works.
  • ChatGPT / AGI builder stack: The valuable part is preserving editability and taste. The tool is useful when it keeps design intent alive instead of producing generic one-shot output.
  • Google shopping/travel UX: This is a concrete mechanism from the talk. The useful question is whether it reduces friction, improves reliability, or makes human review easier in a real workflow.
  • Exa search primitive: The agent is embedded in the existing delivery workflow. That makes review, testing, and handoff happen where the team already works.
  • ElevenLabs speech/turn-taking stack: This is a concrete mechanism from the talk. The useful question is whether it reduces friction, improves reliability, or makes human review easier in a real workflow.
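Alongside container isolation, the transcript quote in the screen-level notes mentions another way to limit blast radius: canary deployments of versioned agent releases. The sketch below shows one common way to route a fixed share of calls to a canary version. It is an assumption for illustration, not Bland's implementation; `pick_version`, its parameters, and the version labels are hypothetical.

```python
import hashlib


def pick_version(call_id: str, stable: str, canary: str, canary_percent: int) -> str:
    """Deterministically route a call to the stable or canary agent version."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the call id so the same call always lands in the same bucket.
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return canary if bucket < canary_percent else stable


# Hypothetical usage: send roughly 10% of calls to the new release.
version = pick_version("call-12345", stable="agent-v7", canary="agent-v8", canary_percent=10)
print(version)
```

Hashing the call identifier, rather than picking at random per request, keeps a given call on one version for its whole lifetime, which makes a "why is my agent behaving differently" report easier to trace to a release.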

Core thesis

Anun Joshi (Bland) uses this chapter to make a specific argument about voice AI failure modes beyond the model itself. The useful pattern is not just the named product or institution; it is how the segment exposes the new operating model for voice and relationship AI: humans keep taste, accountability, and deployment judgment while agents or models absorb more of the execution loop.

The chapter opens with this line: “Um, I just want to say before we start, all the speakers have been amazing. So, can we just give a round of applause for all of them?” The opening places the segment within the broader AIE Singapore Day 1 theme: agentic systems are moving from novelty demos into production workflows, institutions, creative tools, infrastructure, and embodied systems. This analysis should therefore be read as a talk-level summary nested under the parent livestream, not as a generic summary of the entire livestream.

Comment insights

The extracted YouTube comments do not provide reliable speaker-specific audience reactions for Anun Joshi (Bland), so this section does not claim any detailed sentiment about the talk. The useful audience-facing read is content-based instead: this segment is valuable for viewers who care about voice AI failure modes beyond the model itself, especially the concrete implementation choices and operating constraints called out in the transcript.

Deep research

The research value of this talk is the practical architecture behind voice AI failure modes beyond the model itself. Anun Joshi (Bland) is not only making a broad claim; the useful details are the concrete mechanisms named in the transcript: Claude for slides/drafts, container isolation, Slack agent factory, ChatGPT / AGI builder stack, Google shopping/travel UX, Exa search primitive, ElevenLabs speech/turn-taking stack.

The main question to take away is how those mechanisms change the workflow. What becomes cheaper, what needs a stronger checkpoint, and what must remain human-owned? For this talk, the strongest evidence is in the speaker’s examples rather than in generic AI optimism. Use the named tools and operating choices as the starting point for further research, then validate whether the same pattern fits your own environment, security constraints, and evaluation loop.

Verdict

  • The talk contains a specific operating lesson about voice AI failure modes beyond the model itself: Agree. The speaker gives enough segment-level evidence to extract concrete implications rather than treating it as generic conference commentary.
  • The named tools/examples should be copied blindly: Disagree. They are useful design references, but each needs to be checked against local security, data, latency, cost, and human-review requirements.
  • The most valuable part is the concrete workflow detail: Agree. The strongest takeaways are the mechanisms, constraints, and examples the speaker actually names.
  • The implementation details are transcript-supported: Agree. This page cites details such as Claude for slides/drafts, container isolation, Slack agent factory, ChatGPT / AGI builder stack.
  • Human accountability disappears when agents improve: Disagree. The recurring production pattern is to move execution into tools while keeping ownership, review, and failure handling explicit.

Screen-level insights

  • 9:03:13 - opening frame: Anun Joshi (Bland) frames the talk around voice AI failure modes beyond the model itself, with the useful setup being: “in Singapore. I moved to San Francisco two years ago for Bland. And fun fact, I actually used to be a theater kid in junior college here. Um yeah, I never thought I’ll be back on stage again, but here I am. I do like storytelling a lot.”
  • 9:08:29 - Claude for slides/drafts: The transcript excerpt at this timestamp is about agent behavior changing from one day to the next: “working the same way?’ or ‘Why is my agent not working the same way it did yesterday?’ I don’t know how many of you have dealt with customers telling you that or you yourself maybe have experienced that.”
  • 9:09:30 - container isolation: The transcript excerpt at this timestamp describes a related safeguard, canary deployments of versioned agent releases: “It sucks to break the trust of your customers and that’s hard to rebuild. What we’ve built and what I’m proud of building in Bland was that we allow customers to deploy canary deployments and test out versioned agent releases.”
  • 9:04:48 - Slack agent factory: The transcript excerpt at this timestamp is about production scale: “we’re actually serving millions of calls every month. It still hasn’t struck me that someone right now is talking to our agent. That’s crazy.”
  • 9:05:20 - ChatGPT / AGI builder stack: The transcript excerpt at this timestamp is about how far the work has grown: “I didn’t know that was possible. I didn’t know we could do that. Um so yeah, all of this has grown way bigger than I could ever imagine.”
  • 9:11:34 - closing implication: The later part of the talk turns the idea into a practical takeaway: “adding commas to each of the digits in between. The reason that worked was that the LM can then now treat each digit as a separate token and we actually found like later on that uh a paper was released uh which you can look up for sync and stro 2024 [likely Singh and Strouse, 2024] that was r…”
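The digit trick in the closing quote can be sketched as a small text preprocessing step. The transcript excerpt does not say where in the pipeline the commas were added or which numbers were affected, so the function name `separate_digits`, the `min_len` parameter, and the choice to rewrite every run of two or more digits are assumptions for illustration.

```python
import re


def separate_digits(text: str, min_len: int = 2) -> str:
    """Insert ', ' between the digits of every digit run of at least min_len."""
    pattern = re.compile(rf"\d{{{min_len},}}")
    # ", ".join on a string joins its characters, i.e. the individual digits.
    return pattern.sub(lambda m: ", ".join(m.group()), text)


print(separate_digits("Your code is 4821."))  # Your code is 4, 8, 2, 1.
```

Whether each digit actually becomes its own token depends on the tokenizer in use, so a change like this is worth checking against the specific model before relying on it.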

Verification notes

Verified against the extracted transcript for Anun Joshi (Bland)’s talk on voice AI failure modes beyond the model itself. The supported claims in this page are based on concrete tools/artifacts named in the talk: Claude for slides/drafts, container isolation, Slack agent factory, ChatGPT / AGI builder stack, Google shopping/travel UX, Exa search primitive, ElevenLabs speech/turn-taking stack. I treated auto-caption wording cautiously, kept only details that are explicitly present in the segment transcript, and avoided importing claims from adjacent speakers or from the overall conference description.