Transcript: Bounded Autonomy: Between Free Will and Determinism — Angus J. McLean, OLIVER


AI Engineer · 16:52 · Added May 26, 8:40 pm GMT+8

Source video ID: t4359sKBu4w

Transcript

0:14 — Thanks so much for coming, everyone. Happy Friday. My talk today is called Bounded Autonomy: Between Free Will and Determinism, and it’s about changing the way we think about our interactions with large language models. This talk could have been called a number of different things. I toyed with the idea of Between Automation and Customization, Between Oversight and Agency, and Between Possibilities and Constraints. But what
0:44 — it really is is conventional wisdom for unconventional times, and it’s based on some of my personal experience designing agents within industry. It’s for people actively experimenting with agents, both beginners who are overwhelmed by the pace of change and experts stuck in a rut and in need of a new perspective. What it is not going to be is overly technical, definitive, or prescriptive. These are my ideas, and I don’t expect you to follow them if you’re not that interested. Great. I’m Angus. I’m an AI director at
1:15 — Oliver. We’re a startup. We’ve been in the advertising industry for a few years, and we’ve now switched almost fully to gen AI. We’ve got 3,000 staff across 46 countries. You probably haven’t heard of us, but I guarantee you’ve seen our work. You probably didn’t realize it or notice. I didn’t when I joined the company, because you don’t know it’s AI when you see it. This is some of the work we’ve done for Johnnie Walker. We generate around 4,000 assets a day
1:45 — for more than 200 brands, many of which you’ve probably interacted with today, maybe even this morning or this afternoon. Unlike other gen AI content agencies, we put quite a lot of media spend behind these assets, anything from 20 grand to a few million. That enables us to measure these assets’ performance in the wild. This gives us quite a good feedback loop: we’ve got huge volumes of data, which allows much faster iteration and a deeper understanding of what actually works. Right. Who knows what an ad agency looks
2:16 — like? Anyone? Probably not. I didn’t think so. You probably think it’s a bit like Mad Men. But essentially there are three parts of an ad agency: the accounts department, the creative department, and the strategy department. Accounts typically made up 50% of the agency before. Now we have 20% creative and 20% strat. Accounts manages the client relationships and keeps the project on track. Creative turns the ideas into compelling ads and content. That’s historically been the core of the agency. And then strategy is layered on
2:46 — creative. So that’s: how do we get to those images? How do we define the insight, the audience, and the direction behind the work? Creative and strategy have previously been knowledge work, but they’re now increasingly agentic. We don’t just do image generation; we do ideation, copywriting, and content production, and these are all done with forms of agents. I’m on the strategy side. I have been for a long time, and we do audience insight, trends analysis, competitor analysis, and performance optimization, all with different agents, because we’re predominantly
3:17 — customer facing. So we operate in quite a fast-paced, high-risk environment. When we scale these images, poor reception can be just as negative for a brand as a good job is useful. Why do we use agents? We use them primarily for speed and secondarily for scale. Agents allow us to move faster and be more reactive. For creative teams, this allows us to generate content at speed, and this is
3:47 — especially important for iteration and testing. For strategy teams, agents allow us to scale our research so we can get much closer to the consumer and eventually convert them more easily down the line. Something we might typically do is campaign personalization or territory personalization: we’ll have ideal personas, and then we’ll try to deep-research each of those personas or each of those territories. A good example: we do advertising in multiple cities around the world. We might want to localize that for New York. We might
4:18 — want to localize that for Miami. In general, the goal is to do much more with much less and create more effective advertising. So my first piece of advice here today is: slow down. AI is moving very fast. There’s quite a blink-and-you’ll-miss-it mentality. Everything seems to be coming and going very quickly. There’s a lot of change. I’ve seen a lot of tools come and go in the last few years, but if you did blink and you did miss it, was it really that important? Probably not. The actual core of
4:50 — LLMs hasn’t changed, let’s say at least since the 1990s. Andris Drubel would argue even further back. But no matter how advanced LLMs seem, today’s large language models don’t actually understand the data they’re presented with. We know this because they’ve still got several clear limitations. The first one is data efficiency: humans learn from very few examples, whereas models need massive data sets to come up with relatively simple conclusions. It’s especially evident when it comes to learning. So
5:21 — models don’t continuously learn without forgetting in the way that humans do. It’s a closed box, as we’ll see in a second. One could argue that recent gains come less from material breakthroughs and more from brute-force model improvements. You just heard in the last presentation the amount of compute required to generate those images: 400 marathons. That’s maybe too much to be using. This is a famous diagram that’s a bit of a lesson.
5:51 — So what we actually end up doing is creating band-aids to get around these model constraints. Most of our tools are band-aids, and you could argue even the way the models themselves are being trained is a form of band-aid. One way to spot these band-aids is that they’re temporary: a quick fix, not a long-term solution. They’re quite superficial. They often mask the symptoms rather than fixing the problem, and they’re often inadequate and don’t fully fix the issues. So this is how I like to think of LLMs. It’s very simplistic, but it’s just a
6:22 — closed box with knowledge inside. I prefer to think of it as a flexible database capable of doing semantic math more than anything else. So it’s fully closed. I don’t expect any form of emergence, or the model to actually learn anything. This is quite evident, especially in advertising, when we use trends. You’ll probably come up against it not recognizing certain models, but for us the biggest problem is trend identification. If it’s really new, the model won’t recognize it. Most recent advances in agentic capabilities have been largely due to
6:53 — increased context windows. This is especially true of long-running agents, because longer context windows allow them to do longer-running tasks. You’ve got the things it needs to do, the history of actions, tool outputs, structure over time, goals and plans. So it has to be organized. It has to know what it’s doing. It has to be able to store and retrieve information in the short term. Without large context, the system forgets mid-task. It can’t work on long, complex, multi-step workflows. And you may have heard of this famous incident of it deleting a
7:23 — lot of emails because its context ran out. This is a model context-size comparison, GPT-2 versus Gemini 3.5 Pro. Does anyone remember 512 context windows? They’re very small. Context windows keep getting larger, but they’ll never be enough. The total knowledge in the world keeps doubling about every 12 hours, so we’re always going to want more. However big they get, it’s not going to be enough. I think that’s the clear problem. So how does that work from a developer’s perspective?
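To make the long-running agent problem above concrete, here is a minimal sketch of fitting an agent history into a fixed context budget while never dropping the goal. Everything in it is an assumption for illustration, not something shown in the talk: the `Item` type, the `fit_to_budget` helper, and the crude one-token-per-word estimate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str             # "goal", "action", or "tool_output" (hypothetical labels)
    text: str
    pinned: bool = False  # pinned items are never dropped from the context


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def fit_to_budget(items: list[Item], budget: int) -> list[Item]:
    """Keep every pinned item, then the most recent unpinned items that fit."""
    keep = {i for i, item in enumerate(items) if item.pinned}
    used = sum(estimate_tokens(items[i].text) for i in keep)
    if used > budget:
        raise ValueError("pinned items alone exceed the budget")
    # Walk from newest to oldest and stop at the first item that does not fit,
    # so the retained history stays contiguous.
    for i in range(len(items) - 1, -1, -1):
        if i in keep:
            continue
        cost = estimate_tokens(items[i].text)
        if used + cost > budget:
            break
        used += cost
        keep.add(i)
    return [items[i] for i in sorted(keep)]


history = [
    Item("goal", "Archive newsletters but never delete email", pinned=True),
    Item("tool_output", "listed 500 messages in inbox"),
    Item("action", "archived 40 newsletters"),
    Item("tool_output", "archive call returned ok"),
]

trimmed = fit_to_budget(history, budget=14)
print([item.kind for item in trimmed])
```

With a budget of 14, the oldest tool output is dropped but the pinned goal survives, which is the property that matters when a context fills up mid-task.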
7:55 — Actually, I find context constrains the models as much as guardrails do. Context is a soft constraint rather than a hard constraint: you can feed stuff in, move stuff around, and shape it. An example of this is not giving the model access to the internet and instead giving it high-quality documentation; you’ll get much better results. Typically when we do this, we find that models are really bad at spotting promotional content. So when
8:25 — we’re doing advertising and looking up competitors, it’ll soak up all of the information they wrote themselves rather than consumer information, which is ultimately what we’re after. They’re also very susceptible to SEO. As soon as you take the knowledge out of the model and give it another tool, it’s quite limited by the way it uses that tool. In the past, when we had really small context windows, I used to do TF-IDF for cluster
8:55 — labeling. You’d cluster the text corpus, take the most frequent words within each cluster, and label the cluster with those. You’d also do top-k and that sort of thing. Whereas now context assembly is much more dynamic. We’ve gone from a lack of context to too much context. So now I think we need to think more about what we can exclude. The challenge is no longer getting context in, but to a
9:25 — certain extent keeping the noise out. And constraints actually create creativity. Abundance stops you being scrappy. If progress suddenly stopped tomorrow, how would you make the most of what you have? I think it’s important to set self-imposed constraints. Everyone knows that you shouldn’t use the full context window, but how little of the context window can you use and still get the task done is maybe a more interesting question. I’m not a token billionaire yet. I know there are some people
9:55 — probably in this room. This again parallels early computing: a lack of compute, very difficult times. That’s from the Model Railroad Club. That was from yesterday. But great things come out of constraints and limitations. If you can build Spacewar with only 4,000 words of memory, that’s a pretty big deal. And I don’t know if you’ve ever seen the developers of Crash Bandicoot, but they talk a lot about how they would use the memory function in the PS2 to get massive improvements.
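As one concrete example of the small-context techniques mentioned earlier, TF-IDF cluster labeling can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the speaker's actual pipeline: the cluster assignments are assumed to be given, each cluster is treated as one document, the smoothed IDF formula is one common variant, and `label_clusters` is a hypothetical name.

```python
import math
from collections import Counter


def label_clusters(clusters: dict[str, list[str]], top_k: int = 2) -> dict[str, list[str]]:
    """Label each cluster with its top_k terms by TF-IDF, one 'document' per cluster."""
    # Term counts per cluster, using naive lowercase whitespace tokenization.
    counts = {
        name: Counter(word for doc in docs for word in doc.lower().split())
        for name, docs in clusters.items()
    }
    n_clusters = len(counts)

    # Document frequency: in how many clusters does each term appear?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    labels = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        scores = {
            # Term frequency times a smoothed inverse document frequency.
            word: (k / total) * (math.log((1 + n_clusters) / (1 + df[word])) + 1)
            for word, k in term_counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[name] = ranked[:top_k]
    return labels


clusters = {
    "cluster_0": ["the whisky cocktail", "the whisky highball", "the best whisky"],
    "cluster_1": ["the running shoes", "the trail running shoes", "the best shoes"],
}

print(label_clusters(clusters, top_k=1))
```

In the first cluster, "the" and "whisky" are equally frequent, but "the" appears in both clusters, so its IDF is lower and "whisky" wins the label. Raw frequency alone would not separate them.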
10:26 — So if they can do it, I imagine you probably can. Other things to try, and this is just experimental, I’m not suggesting you do this in production: using an older, smaller version of a model or harness. This can help you understand and connect with the model a bit more. Building your own harness, building your own memory and compaction. I also think preprocessing and archiving stuff is really important. So how do file systems work? How do you negotiate knowledge
10:56 — graphs? That sort of thing. Overall this will improve your ability to prompt and control the model. You’ll be much closer to it, and you’ll have better fundamentals and better practices. In general you’ll just have an improved understanding of the data you’re working with. You never know what’s going to come in handy later. This is Rosenblatt actually pulling out wires in the perceptron, and that later formed the basis of what became dropout. The next one is: keep it simple.
11:27 — Have you ever built anything and then realized that the model could just do it better on its own? Probably. Got a lot of nods there. It happened to me recently. I was trying to do my CV. What would win? This was my very complex CV application. It did not work. Well, it did work, but it didn’t work as well as four simple letters. Those four simple letters were HTML. I was pretty blown away by this. That was over a 10x improvement. I’d
11:58 — say it’s probably a 100x improvement. So just because you have the power of the gods, that doesn’t mean you should use it. Models are naturally verbose and they tend towards complexity. They are going to suggest the most complicated solution. So don’t waste your time, don’t waste tokens, and don’t make extra work for yourself. Your ideas will collide with reality pretty fast, so what matters most is building a simple version that works. When we’re talking about using agents, we talk about shortening that feedback loop. But maybe you should
12:29 — shorten your feedback loop with reality when you’re building products. This is, I think, probably the most interesting bit. AI at its core is just translation. This is the “Attention Is All You Need” paper. It was done on English to French initially. More interesting, I think, is the idea of being able to translate text into images, images into audio, audio into video. And I think there’s an argument that knowledge production is just in itself
13:00 — summarization. In this talk I’m trying to summarize my experiences of the last few years. We’re compacting that down into knowledge, or good knowledge. So different types of data can be converted into common internal representations and then transformed into something else. You’ve got an unstructured input and you turn it into an unstructured output. You’ve got a structured input and you turn that into an unstructured output. This is, I guess, what MCP is in
13:31 — model handoffs for long-running agents. This is all a very similar idea. You’ve got something structured on one side, something very unstructured on the other, and I think these two things can coexist. If you can manipulate something through a representation space, then the structure of that data is not an inherent property of that object. It’s actually more a property of the observer’s representation. It’s like, what do I want to see this piece of
14:02 — content in, and in what format? Maybe the easiest way of thinking about it is with slides. What can you turn into a diagram? What do you want written? What do you want voiced over? I think it’s super interesting. This has a lot of relevant implications when you’re building. Ideally, you should use multiple representation structures. You might want to use markdown for human-readable hierarchy and authoring. You might want to use graphs for relationships and references. You might want to use clustering as well
14:33 — if you’re dealing with large bodies of text, or maybe more unstructured bodies of text. You might want to use folders for stuff you need to retrieve really fast. You could also use timelines if that’s relevant to your task. There were a few more practical bits of advice, but finally, I think it’s so important to have fun and experiment. So much of what we do is for work, but a lot of this stuff you can only learn through thoughtful play and experimentation.
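The earlier point about multiple representation structures can be illustrated with a small sketch in which the same records are viewed as a markdown hierarchy, a graph of references, and a timeline. The `Note` type, its fields, and the three helper functions are hypothetical names chosen for illustration; nothing here is taken from the speaker's systems.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Note:
    id: str
    title: str
    created: date
    parent: Optional[str] = None                     # id of the parent note, if any
    links: list[str] = field(default_factory=list)   # ids this note references


def to_markdown(notes: list[Note]) -> str:
    """Human-readable hierarchy: a nested bullet list built from parent ids."""
    children: dict[Optional[str], list[Note]] = {}
    for note in notes:
        children.setdefault(note.parent, []).append(note)
    lines: list[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        for note in children.get(parent, []):
            lines.append("  " * depth + f"- {note.title}")
            walk(note.id, depth + 1)

    walk(None, 0)
    return "\n".join(lines)


def to_graph(notes: list[Note]) -> dict[str, list[str]]:
    """Relationships and references: an adjacency list keyed by note id."""
    return {note.id: sorted(note.links) for note in notes}


def to_timeline(notes: list[Note]) -> list[str]:
    """Chronological view: note ids ordered by creation date."""
    return [note.id for note in sorted(notes, key=lambda n: n.created)]


notes = [
    Note("brief", "Campaign brief", date(2024, 3, 1)),
    Note("aud", "Audience insight", date(2024, 2, 10), parent="brief", links=["trend"]),
    Note("trend", "Trend analysis", date(2024, 1, 5), parent="brief"),
]

print(to_markdown(notes))
print(to_graph(notes))
print(to_timeline(notes))
```

The underlying notes never change; only the view does, which is one way to read the claim that structure belongs to the representation rather than to the object.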
15:05 — I go to a lot of hackathons now, more than I ever did before. In my role I’m not allowed to experiment with this stuff in the way that I would like, so a lot of that then helps me even more. I’ve got one minute left. It’s really interesting. I will say one more thing. I was going to talk about how agents fit into the workplace and whether you should use more structured agents or less structured agents. I will just say, if you look back to Adam Smith and his
15:36 — pin factory, capitalism naturally breaks tasks down into small, easily repeatable chunks. And often we have found workflows to be far more effective. One of the other things I was going to say is: don’t automate a job unless you can do it yourself. This is my old job. This was social media intelligence. What you can see here is a practical report. This is done by an analytic cluster of the data. It did everything. And this is typically the
16:06 — sort of information that we would want from an advertising agency. Because we’re at AI Engineer, I’ve done it for OpenClaw, just on the last year of data. And this is all the sort of insight that we would get. This is 50,000 tweets clustered and then organized into essentially strategies, so we can create this almost instantly for our creative teams or our strategic teams. It’s got standout tweets and it’s breaking it all down for you. And I think that’s my time. So thank you very much.