Transcript: How to Leverage Domain Expertise — Chris Lovejoy, Notius Labs


AI Engineer · 24m 45s · Added May 19, 8:40 pm GMT+8

Source video ID: kfSDc2eVLo4

Transcript

0:07 — Okay, so welcome, everybody. Hi, my name is Chris Lovejoy, and I’m going to talk about how to leverage domain expertise to build better AI products. The way I believe you can do this is by building what I call a domain-native AI organization, so I’m going to talk about what that looks like. A brief background about me, because this is relevant. So I started out my
0:38 — career as a medical doctor. I trained at the University of Cambridge and then worked in the NHS for several years. Then in 2018 I moved into the AI space, training and building models and working with various organizations, including Tandem, which is the largest clinical AI product provider in the UK in terms of adoption. Also Anterior, where I was the first employee; it’s a Sequoia-backed startup performing prior authorization in the US. And then also various other
1:08 — startups as well. My challenge at all of these different companies was: we’re building some kind of product that bakes in domain expertise. How can you do that? How can you leverage that in a way that builds a differentiated AI product? I talked a bit about this at the last AI Engineer conference in San Francisco. I shared my thesis, which is that the system for
1:38 — incorporating domain insights is more important than the sophistication of your models and your pipelines. I talked about the last mile problem, which is this challenge of getting your product to really understand the specific nuances of the workflows and use cases of the customer that you’re serving. That talk was seen by a lot of people, about 100,000 across different platforms. Many of them reached out to me, and the most common question that they had was, “Okay, but how do I
2:08 — build my organization? I’m on board. I get that domain expertise is important, but how do I build my organization to actually enable this? What kind of domain expert should I hire? Where should I put them in my organization to enable this to take place?” So that’s what I’m going to talk about today. I’ve heard people say that to win in vertical AI you want to get the best model, and I actually don’t think this is true. I think that fundamentally, winning in vertical AI is an organizational problem.
2:44 — And so I have this framework, based on different organizations I’ve seen and worked at and how they’re baking in domain expertise, which is that you can do it in three main ways. You can have your domain expert as an oracle, as an evaluator, or as an architect, and I’m going to talk about those in more detail. But just stepping back, why do we care about vertical AI? Ultimately it comes down to the fact that vertical AI is a big opportunity. A lot of VCs are saying this is the next big thing, and
3:15 — there are a lot of startups raising big amounts of money on this potential and this promise. As Bessemer pointed out, we had vertical SaaS, and that was a $50 billion or so market, but now AI is moving into the labor force, and that’s a multi-trillion-dollar one. But we’ve not yet really seen success at scale. According to Gartner, about 50% of all generative AI projects were abandoned last year. I think there are many reasons for this, but my take is that one of the core reasons is that we’re
3:46 — often building AI products and AI systems without really having a deep understanding of exactly what workflows we’re automating and exactly how the domain experts would perform these processes. So just to reiterate, my belief is that frontier models are good enough, and the gap is now how organizations operationalize the expert judgment around them. The three most common mistakes that I have seen are, firstly, not hiring domain experts
4:17 — or hiring them too late; secondly, hiring the wrong kind of domain expert; and finally, not fitting them into your organization appropriately, not leveraging them correctly. This maps to three questions which I’m going to address in this talk: do you really need a domain expert, and what’s the why? If so, who do you need? And then how do you leverage them? I’ll touch on the first one, and then the oracle, evaluator, architect framework I’m going to go through answers the second two. So, do you really need a domain expert? My take is that the answer is yes.
4:49 — And it’s because appraising AI quality is something that’s very important to do in your company. You want to be able to make decisions between different approaches based on the kind of output that they give, and your company needs to have a sense of what good AI quality looks like. That ultimately requires judgment, and the best kind of judgment involves some kind of domain expertise that you can then bake into your company and into your product. This domain expertise could be specialized domain expertise. It could be that you’re building a health care or legal product and you therefore
5:21 — need doctors or lawyers to bring their domain expertise in some way. But it doesn’t have to be. There’s also informal domain expertise, and I’ll give some examples later. It’s worth saying also that I’m not making the pitch in this talk that you should go out and hire a domain expert per se. Maybe that makes sense, but in many cases you might already have somebody in the organization who essentially performs this function, and you want to empower them or understand how best to organize your organization around them.
5:51 — Okay, so then who do I actually need? I’m going to go into the framework now. I mentioned these three models for incorporating domain expertise. The first of these I call the oracle, which is where your domain expert directly embeds their domain expertise into your actual application, and I’ll talk about what that actually looks like in a second. The evaluator performs a different kind of role. They define how you measure quality: what is it that matters? What are we optimizing for? And once that’s defined, they set up a system where that can then be measured
6:22 — and you get data that you can then ultimately use. And finally, the architect builds a system that automatically improves itself, bakes in more domain expertise, and just learns from being used by users and from those interactions. So if we consider a very simplified view of what it means to build an AI product, it’s two stages, right? You assess how your AI product is doing right now, what the state of play is, and then you ask how you can improve that. In the oracle model, both of these
6:52 — steps are being performed by the domain expert. The domain expert is looking at the AI outputs, they’re playing with the product, they’re looking at the traces, and they’re seeing, “Okay, how’s it doing? Where’s it going wrong? What could be improved?” And then they’re going in and they’re improving it. In a simple case it’s just tweaking prompts, which I think is often the mechanism here, but it might be that they’re also baking in domain expertise via adding documents or tools or something else. There are a few different levers, but fundamentally they’re doing both sides of the equation. Now this changes when you think about
7:23 — the evaluator. Now the domain expert is still assessing, but assessing is actually more complex. They are defining metrics that you can store, some kind of objective way of quantifying performance, and they are then building that system. It might be that you’re going to get the metric from customers: the customer uses the product, and there’s some kind of user metric that you’re going to use as your north star, which captures what you really care about. It might be that you want to hire domain experts to perform some kind of review. So in a clinical context, you might want to hire clinicians who
7:53 — review a subset of all of your AI outputs and determine how you’re doing. You might do things like LLM-as-judge. This can really scale in different directions depending on your use case. But they’re identifying what’s not working and then working with the engineers to actually go ahead and make those changes. And then the final one, the architect: here, stepping back, the domain expert is actually designing a system that does both.
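To make the evaluator's measurement step more concrete, here is a minimal Python sketch of sampling a subset of AI outputs for expert review and turning the reviewers' verdicts into an accuracy figure plus a tally of failure modes. The `Review` record, the function names, and the failure-mode labels are all hypothetical illustrations, not something described in the talk.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """One expert verdict on one AI output (hypothetical record shape)."""
    output_id: str
    correct: bool
    failure_mode: Optional[str] = None  # label chosen by the reviewer when incorrect


def sample_for_review(output_ids, fraction, seed=0):
    """Pick a reproducible random subset of outputs to send to expert reviewers."""
    if not output_ids:
        return []
    rng = random.Random(seed)
    # Review at least one output, and never more than exist.
    k = min(len(output_ids), max(1, round(len(output_ids) * fraction)))
    return sorted(rng.sample(list(output_ids), k))


def summarize_reviews(reviews):
    """Aggregate expert verdicts into an accuracy figure and failure-mode counts."""
    total = len(reviews)
    if total == 0:
        return {"reviewed": 0, "accuracy": None, "failure_modes": {}}
    correct = sum(1 for r in reviews if r.correct)
    modes = Counter(
        r.failure_mode for r in reviews if not r.correct and r.failure_mode
    )
    return {
        "reviewed": total,
        "accuracy": correct / total,
        # Most frequent failure modes first, so engineers see priorities.
        "failure_modes": dict(modes.most_common()),
    }


if __name__ == "__main__":
    to_review = sample_for_review([f"out-{i}" for i in range(20)], fraction=0.25)
    print("send to reviewers:", to_review)
    demo = [
        Review("out-1", True),
        Review("out-2", False, "missed_guideline"),
        Review("out-3", False, "missed_guideline"),
    ]
    print(summarize_reviews(demo))
```

The point of the sketch is the division of labour: the domain expert decides what counts as correct and which failure-mode labels exist, and the resulting summary is what gets handed to engineers to prioritize fixes.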
8:24 — So the idea is that there shouldn’t be too much human in the loop in the actual process in the middle, but the domain expert here is creating the ability to do that, leaning on the different mechanisms that are available to get this automated improvement. I spoke about this before, so there’s a link at the bottom. I’m not going to detail exactly what the system looks like, but there are different levers that you can lean on. So that’s the framework. Who do you therefore need for your specific organization? I came up with this rough guide to understand, based on your
8:55 — specific use case and also your specific scale, what makes the most sense. The first question to ask yourself is: can I measure performance in metrics? Is there some objective thing that I can measure and that’s really meaningful? Or is it something where taste is actually a bit more important? If you can’t measure it, then you want to take an oracle approach. And the question is, is one person enough to do it? Maybe at my current scale it is, or maybe certain types of products are actually very
9:25 — suitable to just have one oracle, and I’ll give some examples to make it more concrete. If not, then you might have multiple oracles, who maybe handle different subsegments of your AI outputs. Now, assuming that you can measure things and you want to measure things, the following question is, “Okay, is manual iteration fast enough? Can I just have an engineer who makes some changes, and that’s fine, and I can adapt to customer needs and learn?” If so, you only really need the evaluator. You need your domain expert to assess what quality is and how you’re doing right now, and feed that into engineers to make the improvements. But if that’s not
9:56 — fast enough and you don’t want to be relying on this human iteration, then you want some approaches that an architect would come up with. It’s also worth calling out that within an organization, this can evolve over time. A common starting place is an oracle. Particularly if you’re a startup and you’re small in scale, your domain expert probably should be the oracle to begin with. But then they can progress, in some cases one way and in some cases another. There are certain criteria for when it’s actually necessary, and also for whether it’s possible. It’s necessary if things
10:28 — are currently breaking at your scale, and then, depending on your situation, you might go in one of two different directions. But you can only actually move on to an evaluator if there’s some objective metric you can measure. And likewise, as I mentioned, if manual iteration is too slow and you can identify methods to automate improvements, then you might progress to an architect model. All right, let’s make this more tangible with some case studies. Granola, for those of you who don’t know, is a company that generates AI meeting notes, and it recently passed a billion in
10:58 — valuation. My friend at Granola, Joe, has a background as a writer and journalist, and she joined Granola as the first employee. She wrote all of the prompts. And she did extensive research, many, many hours reading papers and talking to hundreds, thousands of users, to understand what it really is that makes a good meeting note. Her role has been to be the primary gatekeeper of AI quality. So, question to the room: of the three models I’ve outlined,
11:28 — which would you say this sounds like? Yes, exactly. So what she’s doing is both sides. She’s assessing the outputs, looking at the meeting notes that the current version of the product generates, and then she’s making those improvements herself directly and doing this iterative loop. In Granola’s case, my argument is that this makes sense because, firstly, there’s no objectively perfect meeting note. We can’t just say, okay, this is the best note. So it’s a bit more about human taste and really baking these sorts of things in. And then because, with meeting notes
11:59 — being the main core output of Granola’s product, it’s actually amenable even at scale to have this direct human review and improvement loop. So even as they’ve scaled, Joe is still largely playing this role. There are nuances. They are also running evals and have built out some other internal tooling to help other people contribute to prompts. But fundamentally, this is a process of working as an oracle. So let’s consider a second use case. Tandem have this medical AI
12:32 — scribe product, which basically listens to a doctor’s consultation and, again, generates meeting notes, but this time they’re medical, so that introduces some nuance. The first domain expert that they hired was someone called Roy. His background is that he’s a medical doctor, and then he went to McKinsey. What he did is he reviewed the medical notes and he updated the prompts, so he was following this oracle approach. But then as that scaled, it became impossible for one person to do this. So what they did
13:02 — was take the approach of a decentralized oracle: they hired various other doctors to basically do this in subsets. And so they updated the platform to support this long tail of prompt customizations. The challenge was that they’re serving many different specialties, many different countries, many different types of notes, many different use cases. So they needed doctors from those different countries, specialties, etc., to be able to make these tweaks and changes.
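One way to picture a "long tail of prompt customizations" is a lookup that prefers the most specific prompt variant and falls back to more general ones. The Python sketch below is purely illustrative: the key structure (country, specialty, note type), the prompt texts, and the `resolve_prompt` function are assumptions made for the example and do not describe Tandem's actual platform.

```python
from typing import Dict, Optional, Tuple

# Key: (country, specialty, note_type); None means "applies to any value".
PromptKey = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical variants, each of which could be owned by a different domain expert.
PROMPTS: Dict[PromptKey, str] = {
    (None, None, None): "Write a clear clinical note from the consultation.",
    ("UK", None, None): "Write a clinical note using UK terminology.",
    ("UK", "cardiology", None): "Write a UK cardiology note.",
    ("UK", "cardiology", "discharge_summary"): "Write a UK cardiology discharge summary.",
}


def resolve_prompt(
    country: str,
    specialty: str,
    note_type: str,
    prompts: Dict[PromptKey, str] = PROMPTS,
) -> str:
    """Return the most specific prompt variant available for this context."""
    candidates = [
        (country, specialty, note_type),  # fully customized variant
        (country, specialty, None),       # specialty-level variant
        (country, None, None),            # country-level variant
        (None, None, None),               # global default
    ]
    for key in candidates:
        if key in prompts:
            return prompts[key]
    raise KeyError("no default prompt configured")


if __name__ == "__main__":
    print(resolve_prompt("UK", "cardiology", "clinic_letter"))
    print(resolve_prompt("SE", "dermatology", "referral"))
```

Under this kind of scheme, a doctor responsible for one specialty in one country can change their variant without affecting anyone else's, which is the property the decentralized oracle model relies on.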
13:33 — And this worked well for them because you need medical expertise, hence needing domain experts. There’s some subjectivity: there’s no perfect medical note, in the same way there’s no perfect meeting note. There are these variations I described. And because there are so many customizations, you need many different domain experts, each of whom might own a particular relationship or a particular subset. You might have somebody who covers some specialty in some geographical location, who will work with the customers there, understand what they need, and tweak the prompts, and then that prompt version is live there. But then they have many
14:04 — other variations on prompts, thousands or more, that are available in different places. Okay, third and final case study, and excuse the self-reference here. I can use it as an example because I’m obviously very familiar with this one. I worked at Anterior, and our first product was prior authorization. Prior authorization, for those who are not familiar, is basically a process in the US where you go to your doctor and your doctor requests some kind of scan, let’s say an MRI. It then goes to the insurance company, and the
14:34 — insurance company has nurses and doctors inside who will determine, okay, should this treatment be approved? Is it appropriate? At Anterior, I was the first technical employee. I built the initial product, the prompts and the code, and then I was reviewing our outputs. I was putting my doctor hat on and clinically assessing: is this an appropriate decision that’s being made? Then I updated the prompts and the code to handle that. But again, that didn’t scale. As we had more customers, and we
15:04 — had more demand and more variation in what we were serving, it was necessary for that role to evolve. So I then defined metrics and failure modes. I built a review dashboard to enable clinicians to come in and look at certain outputs, and hired clinicians to perform that process. So we scaled that out, and that gave us metrics and information on performance that we could then use to collaborate with the engineering team to make these changes. But again, this also
15:34 — didn’t scale from the fixing point of view. Manual iteration by engineers didn’t work, because there’s a lot of variation in how these organizations interpret their policies and their rules. So you need a mechanism to learn that at the edge,
16:04 — and that was why I then needed to evolve more into an architect approach and design methods for automated improvement, which I talk about in another talk. So overall, it made sense for us to go from oracle to evaluator to architect because, in our case, the AI quality is clearly measurable. The AI output is either approval of the prior authorization or escalation for clinician review, so it’s either correct or incorrect based on the medical evidence. You need clinical reasoning to determine that. There’s large variation in how prior auth rules are interpreted; that was the point I was trying to
16:34 — make here. So because of this variation, you need a system that is able to adapt dynamically and learn from usage so that you can solve that. And then also this progression: understanding the failure modes as an oracle, and really having a deep understanding of the ways in which the AI is performing and the ways in which it’s failing, is very helpful for defining how to assess and improve things. Cool. So who do I need for my organization, and what skills should
17:04 — they have? To think about what kind of skills you might want for each of these different roles, it’s helpful to come back to what they are doing. So the domain expert, as an oracle, is looking at the outputs, tweaking prompts, making changes, improving the product. The core skill that they require is relevant domain expertise. And by relevant here, I particularly mean direct experience of the use case. Just being a doctor might not actually be sufficient. Let’s say your product is medical coding: have you got experience of medical coding? Do you know what that involves? Do you know where that can go
17:35 — wrong? So you want to think more granularly than just, oh, I need a doctor, I need a lawyer, I need whatever domain expert you might need. Then there’s other relevant experience: things like prompting and context engineering are nice-to-haves. I think these are a little bit more learnable, but they are helpful. Definitely attention to detail: I mentioned the example of Joe at Granola, who really went into the details, and that was very valuable. And then things like customer communication, to understand what exactly your customers want.
18:05 — For the evaluator, they’re doing something different: they’re building up this system to understand the performance. So there I think, as well as domain expertise, you really want data science intuition, because I fundamentally see a lot of this as data science skills. You’re defining metrics that you care about. You’re figuring out a way to build a system that collects those metrics and makes them usable. These are all data science skills, fundamentally. It’s also helpful to have some of the skills that we mentioned before. You might want statistical skills if
18:35 — you’re doing this at scale and you really want to analyze some of these metrics. Industry connections are helpful if you go down the line of building out a team that does these internal reviews: it’s helpful to be able to hire from that network. It’s helpful to have leadership experience to then manage that team. And it’s helpful to have product management experience, because you’re going to feed into engineers making these improvements, and you want to be able to collaborate with them effectively. And finally, what do you need if you are an architect profile? Similarly, domain expertise is obviously a given.
19:05 — And then you also want experience working on LLM-powered products, so you know what levers you can lean on to ultimately improve performance. Everything from before is also relevant, as I mentioned, and engineering skills can be helpful too, because you can lean on those to implement or steer the development of some of these improvements. So, third and final question: how should you leverage them? Okay, you’ve found the perfect domain expert. Maybe you already had them, or maybe you’ve found somebody you want to hire in.
19:35 — I think there are three principles that I believe are important. The first is that there’s a lot of value organizationally in defining a principal domain expert. What I mean by this is a single individual who’s ultimately accountable for the quality of the AI performance and can therefore make decisions. This avoids consensus by committee, where everybody’s kind of responsible, so nobody’s truly responsible, and you move a bit more slowly. And then if that’s that
20:06 — individual’s responsibility, they can really invest the time in deeply understanding how the AI is performing. That can ultimately feed into better decision-making, and it gives them the time and space to really focus on that. I think there’s a lot of value in that from a speed point of view. The second is that I think it’s important to give them ownership. You ultimately don’t want to treat them as just a consultant or somebody that you come to for advice. You want them to be in the room when you’re making these decisions. And that’s because you
20:36 — want to be able to leverage them to build a differentiated product. If they’re not in the room, then it’s much harder to do that. Concretely, one of my arguments is that you could just have a domain expert who performs reviews of the AI outputs and gives that to somebody else. But what you actually want to do is go beyond that and build out a system around it, such that organizationally you are able to measure things accurately and improve things accurately, and
21:06 — that’s an important part of building a differentiated product. I’ve seen failure modes here. For example, one company I was at had two different senior clinicians, so neither of them was really a principal domain expert. It was a bit ambiguous who had the final say. They weren’t given that much ownership in the process and were a bit more advisory. And essentially the progress was just very slow in terms of actually building this wider system for improving the AI quality and the performance. And then what ended up happening is
21:36 — that both of those two individuals left the company after about 12 to 18 months, I think in part because they maybe didn’t feel they had the ownership. That’s obviously a loss for an organization, because they have a lot of relevant context in their heads that you want to be building on top of. And the final point is hiring for breadth. We saw all the skills listed on the previous slide. I think it’s a big ask to try and find somebody who has all of those skills. So my general recommendation is to hire somebody for whom the domain expertise is the base and a given, but
22:06 — then with as many of those skills as possible. You can always pair them with somebody who complements that. Let’s say you end up needing a statistician and this person doesn’t have stats experience: you can find a statistician and they can collaborate there. But I think the failure mode here is that you hire somebody who is maybe just a domain expert. I don’t want to trivialize being a domain expert; obviously that can be a big thing. But that can then make it hard for that individual to grow from being an oracle to an evaluator to an architect, if that’s necessary. And then organizationally
22:37 — maybe you need to bring somebody else in, and that’s less desirable, I think, than having somebody who’s with you the whole way through. And also, having experience in this oracle role in the company, you have great insight into how the AI performs. You’re very well placed to then become an evaluator. You know what kind of things you should be measuring and what kind of failure modes to avoid. So it’s very complementary to have somebody who evolves through these roles. So just to summarize: I think assessing the AI quality in your product requires domain expertise. That
23:07 — can be formal or informal. There are broadly three ways that I’ve seen domain experts be baked into the organization. The first is directly bringing domain expertise into the application; this is the oracle role. The second is defining quality so that you can measure and improve it, which is the evaluator role. And the third is designing systems that improve and learn over time, which is the architect role. The appropriate approach really depends on your specific use case. It also depends on your current scale. And each of these approaches requires different adjacent skills.
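The rough guide from earlier in the talk (can quality be measured, is one expert enough, is manual iteration fast enough) can be written down as a small decision function. This is a sketch of those three questions only; the enum, the function name, and the boolean parameters are hypothetical, and the mapping of the three case studies in the usage section is a reading of how the talk describes them.

```python
from enum import Enum


class Model(Enum):
    ORACLE = "oracle"
    DECENTRALIZED_ORACLE = "decentralized oracle"
    EVALUATOR = "evaluator"
    ARCHITECT = "architect"


def recommend_model(
    measurable: bool,
    one_expert_enough: bool,
    manual_iteration_fast_enough: bool,
) -> Model:
    """Apply the talk's three questions in order and return a suggested model."""
    if not measurable:
        # Quality is a matter of taste: the expert assesses and improves directly.
        if one_expert_enough:
            return Model.ORACLE
        return Model.DECENTRALIZED_ORACLE
    if manual_iteration_fast_enough:
        # The expert defines and measures quality; engineers make the fixes.
        return Model.EVALUATOR
    # Measurable, but human iteration is too slow: design automated improvement.
    return Model.ARCHITECT


if __name__ == "__main__":
    # Meeting notes: taste-driven, one gatekeeper is enough.
    print(recommend_model(False, True, True).value)
    # Medical scribe across many specialties and countries.
    print(recommend_model(False, False, True).value)
    # Prior authorization: measurable, but manual iteration did not scale.
    print(recommend_model(True, False, False).value)
```

Note that this captures where an organization might end up, not where it starts: the talk's advice is that a small team usually begins with a single oracle and only moves on when things break at the current scale.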
23:38 — And my advice is that hiring a principal domain expert early is very effective organizationally. You want them to have the relevant domain expertise and you want this breadth of adjacent skills. You want to give them ownership. You probably want to start them as an oracle. And then as your product evolves, as your organization evolves, their role should evolve along whichever of the axes makes sense. Maybe it becomes decentralized and they’re playing a role in enabling the decentralized oracle. Maybe it’s along this evaluator-to-architect spectrum. But it’s up to you to figure it out. And ultimately the playbook on how this stuff works is still being figured out
24:09 — at the moment, because we’re still early on in this journey of building AI-powered products with domain expertise. So thank you. I’ll put the slides on my website if you want to download them. I think maybe we can do one question; I’m conscious of time. But yes, thank you. Thank you for your attention. [applause]