Transcript: Claude Code + LightRAG = UNSTOPPABLE

Chase AI · 20m 25s · Transcript ✅ · Added May 6, 3:52 pm GMT+8

Source video ID: QHlB-RJfx8w

Transcript

0:00 — The death of RAG has been greatly exaggerated. Yes, I know large language models like Opus 4.6 have gotten way better lately at handling large contexts. But if you think that means you will never need RAG, you are going to hit a wall that you can’t just prompt your way out of. So today I’m going to explain when you need RAG, what sort of RAG actually works in 2026, because the landscape has changed a ton over the last year, and I’m going to show you how to connect Claude Code to your RAG system
0:30 — as well as give you some skills you can take home with you. So today’s goal is to give you this: a graph RAG system built on the back of LightRAG that we can use with Claude Code. And more importantly, this is going to give us a system that we can use when we need to use AI with giant corpuses of documents, right? Not just five documents, not just 10 documents like you’ll see in the demo, but 500 documents, a thousand documents. Because it’s not enough just to rely on the context window Claude Code comes with, or any other LLM, because when
1:01 — you start to operate at huge scale, which you do see in a lot of enterprises or even just smaller businesses, having a RAG system like this is actually cheaper and faster than your standard agentic grep. So with that in mind, having the skill of being able to create these sorts of RAG systems is very important. But luckily it’s pretty simple, and like I just alluded to, we will be using LightRAG today. This is an open-source repo that I absolutely love. It’s been around for a while and it’s something that’s been updated over and over again. It’s able to compete with more sophisticated graph RAG systems like Microsoft’s at literally
1:33 — a small percentage of the cost. So it’s the perfect place to actually sort of test out these graph RAG concepts if you’ve never used it before. But in order for us to get the most out of LightRAG, we need to understand how RAG actually works at a base level, because the landscape for RAG has changed. What we were doing at the end of 2024 and early 2025 was what is called naive RAG, the most base-level RAG. Remember all those n8n automations where it was like, hey, let’s go to Pinecone, let’s go to Supabase? That was naive RAG. That doesn’t work anymore. That does not cut
2:03 — it. We have to use more sophisticated versions of RAG, but we need to understand the fundamentals first. So let’s do a quick refresher of what RAG is and how it works before we dive into the LightRAG setup. So, RAG, retrieval augmented generation. The way it works is I first start with some sort of document, right? And I’m going to have thousands of these in a pretty, you know, robust RAG system. But what happens is I have this document that I want to go inside of my RAG system, inside of a vector
2:33 — database. Well, what happens isn’t, you know, the document doesn’t just get thrown into this database, right? Like it’s some sort of Google Drive system. What happens is the document goes through an embedding model and then it gets turned into a vector. But even more so than that, the document doesn’t go as one giant piece. It gets chunked up. So imagine we have this one-page document and it gets pushed into chunk one, chunk two, and chunk three. Each of these chunks then becomes a vector, which is just
3:03 — a point on a graph, a point in a vector database. Now the embedding model is what does this chunking for us. It’s in charge of the process of taking this document, figuring out what it’s all about and then turning it into a point on this graph. So the document gets chunked up. It goes through the embedding model and then our document becomes a vector on this graph. Now this is a three-dimensional graph. In reality, it is thousands of dimensions, but just think of it as a three-dimensional graph for now. Now,
3:33 — imagine this document was about warships, okay? And each chunk got turned into some sort of, you know, vector about warships. Well, where’s it going to go? Well, it’s going to go over here next to boats and ships. Obviously, it’s going to become its own little vector. And by vector I mean it’s just given a series of numbers that represent it. You can see that over here with bananas. So banana is 0.52, 5.12, 9.31, and on and on and on. This goes for thousands of numbers. So our little boat guy over
4:03 — here is like 1, 2, 3, dot dot dot, forever and ever. Easy enough. Obviously, it’s not going to be next to bananas and apples, but that is the document-to-embedding process as well as the chunking. Now, let’s say you’re over here, okay? You’re our happy little guy over here, and you ask the large language model a question about warships. Well, that question in this RAG system scenario is also going to be turned into a vector. So your question, you know, the LLM takes a look at it and
4:34 — it assigns it a series of numbers that also correspond to some sort of vector in this database. Okay? And so what it’s going to do is it’s going to compare what your question vector is to the other vectors in the graph. It’s looking at what’s called cosine similarity. But all it’s really doing is it’s saying, hey, the question was about this. We’re assigning these numbers. What vectors are closest to it? What numbers are closest to that question? Well, it’s going to be this one about warships and probably boats and ships. So, it is now going to
5:07 — retrieve all those vectors with all their information and it’s going to augment the answer it generates for you. Hence, retrieval augmented generation. So, instead of the large language model relying purely on its training data, it is able to go inside the vector database, grab the relevant vectors, bring them back, and give you your answer about warships. That’s how RAG works, right? Document ingestion, chunks turned into vectors. The vectors are then compared against the question being asked, and it brings back the closest ones. Tada!
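To make the retrieval step above concrete, here is a minimal sketch of a naive RAG lookup in Python. It is an illustration only: the three-number vectors and chunk names are made up (a real embedding model returns hundreds or thousands of dimensions), and `retrieve` is a hypothetical helper, not part of LightRAG or any other library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" standing in for chunks in a vector database.
chunk_vectors = {
    "warships chunk": [0.9, 0.8, 0.1],
    "boats and ships chunk": [0.8, 0.9, 0.2],
    "bananas chunk": [0.1, 0.2, 0.9],
}


def retrieve(question_vector, store, top_k=2):
    """Return the names of the top_k chunks closest to the question vector."""
    ranked = sorted(
        store,
        key=lambda name: cosine_similarity(question_vector, store[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embedding of a question about warships.
question_vector = [0.85, 0.75, 0.05]
closest = retrieve(question_vector, chunk_vectors)
print(closest)  # the two ship-related chunks rank ahead of bananas
```

The retrieved chunk text would then be pasted into the prompt so the model can answer from it, which is the "augmented generation" half.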
5:38 — RAG. And that is naive RAG. And that actually really doesn’t work very well at all. So smarter people than you have come up with better ways to do this, namely hybrid search and graph RAG and agentic RAG. What we’re going to focus on today is graph RAG. Now, graph RAG goes through the same process. You’re still going to have that document. It’s still going to get chunked. It’s still going to be put in this flat vector database, but it’s going to do one other thing. It’s going to create this knowledge graph as well. It’s going to create this crazy thing. So, what is all
6:09 — this? What are all these vectors and lines? What does this actually mean? Well, all these vectors, these little circles, these are what is known as entities. And the lines that connect two entities are an edge or a relationship. So going back to our document example, imagine this document is about Anthropic and Claude Code. And the entire chunk that got pulled out said Anthropic created Claude Code. It’s going to take that and it’s going to break it out into entities and relationships. What are the two
6:39 — entities? The entities are going to be Anthropic and Claude Code. And the relationship is Anthropic created Claude Code. So you have the relation. So you have Anthropic right here and you have Claude Code over here, and you can see this is an entity, this is an entity, and they have a relationship. On the visual graph, it’s just a line, but under the hood, coding-wise, that line between these two entities has a bunch of text associated with it explaining its
7:10 — relationship. And so in a graph RAG system, it does that for each and every document you add to it. Imagine this times a thousand documents. This is with 10 documents. All of these relationships and all of these entities. And you can imagine how much more sophisticated that is than a bunch of random vectors just sort of siloed in a vector database. And so with a system like LightRAG, we get the creation of a knowledge graph as well as your standard vector database. It does both of these things in
7:40 — parallel. And so when you now ask a question about whatever it is to the large language model, it not only pulls that specific vector that it finds that’s closest, it will also go down here and take a look at an entity. So let’s say you asked about Anthropic. Well, now it’s going to traverse the relationships, the edges, and find everything that it thinks is relevant. So what this means for you, the user: with a graph RAG system, I can now ask much
8:10 — deeper questions, not just about a document, essentially just doing Ctrl+F for all intents and purposes. I can now ask how different documents and different theories and different ideas relate to one another, because those relationships are mapped, right? This is what it’s all about. It’s about taking disparate information and connecting it. That is the power of graph RAG, that is the power of LightRAG, and that’s what we’re going to learn today. So installing and using LightRAG is as easy as you want it to be. I’m
8:41 — going to show you the easiest way, where we are just going to take Claude Code, give it the URL of LightRAG, and say, hey, set this up for us, and it’s going to do essentially everything. In that scenario we’re just going to need a few things. Like you saw in the breakdown of how RAG works, we need an embedding model. So that is going to require an API. I suggest using OpenAI. They have a very effective embedding model. So you will need an OpenAI key. You do have the ability with LightRAG to make this
9:12 — an entirely local thing. So you could have a local model via Ollama that’s doing all the breakdowns with the embeddings as well as the question-and-answer stuff. So understand that’s an option too, going fully local. We’re going to kind of do half and half. So we’re going to set up an OpenAI embedding model as well as the model that’s actually doing all the work. And then we also need Docker. So if you’ve never used Docker before, it’s pretty easy to set up. You’re just going to need Docker Desktop. Just download it, install it, and have it running when you run LightRAG, because it is going to need
9:44 — a container. So what you’re going to do now is you’re going to open up Claude Code and you’re going to say: clone the LightRAG repo, write the .env file configured for OpenAI with GPT-5 mini and text-embedding-3-large, use all default local storage, and start it with Docker Compose. And then give it the link to LightRAG. If you do that, it’s going to do everything for you. I will put this prompt inside of the free Skool community. Link to that in the description. Also, what’s going to be there is, I’ll show you in a little bit,
10:15 — some skills related to Claude Code and LightRAG to make it easier to sort of control it from Claude Code. So, you’ll be able to find that there as well. And you knew it was coming. Speaking of my Skool, quick plug for the Claude Code Masterclass, which is the number one way to go from zero to AI dev, especially if you don’t come from a technical background. The link to it is in the pinned comment. I update this quite literally every single week. In the last two weeks, I’ve already added like an hour and a half of additional content. So, definitely check it out if you’re serious about mastering Claude Code and AI in general. But again, if you’re new,
10:45 — and this is all a little too much, definitely check out the free Skool. Tons of great resources for you if you’re just starting out. And before you run this, just make sure you have Docker Desktop running and have that OpenAI key ready, and let Claude Code go to work. Now, once Claude Code finishes installing it and you add your OpenAI key to the .env file, you should see something like this. First of all, on your Docker Desktop, you should see a container called LightRAG up and running. And then Claude Code should also give you a link to your localhost. It should be port 9621. And it’ll take you to a page that looks like this. This is the web UI for
11:18 — LightRAG, and it’s here where we can upload documents, look at the knowledge graph, retrieve things, and also take a look at all the different API endpoints, which will come in handy later. And what you see here are the documents I’ve uploaded for this video. Uploading documents is very, very simple. We’re just going to come over here to the right where it says upload, and then you’re going to drop them in. Now understand there’s only certain types of documents we can put in here, right? Text documents, PDFs. Essentially, you’re limited to text
11:48 — documents. Now, there’s a way to get around this, namely for things like, you know, images and charts and tables and that sort of thing, and we’ll talk about that at the end because it’s a little outside the scope, but we will learn about it. So, drop whatever documents you want into here, and then you will be able to see their status as they’re uploaded. It’ll take a little bit because again, it’s building the knowledge graph as it does this. So, this can take a while. And if for whatever reason you’re on the knowledge graph page, because this can kind of happen, and it says like, “Hey, it didn’t load
12:19 — or whatever,” you just reset it by hitting this button over here on the top left. If you come over to the retrieval tab, that’s where you can ask questions about your knowledge graph to the large language model, which in this case is probably OpenAI if you use the same key for embedding. And over here on the right, we have some parameters. Honestly, off the bat, there aren’t too many you need to change. And in a second I’ll show you how Claude Code can do it. But as you ask your questions, like for example, I had a bunch of AI and RAG documents in there. I said, “Hey, what’s the full cost picture of running RAG in
12:49 — 2026?” It gives me a pretty sophisticated response. And on top of that, it also gives you the references for everything it’s doing, right? See four, three here, two, because at the bottom of the page, it will actually give you the references for the documents it grabbed. And obviously inside of our knowledge graph, right, we explained entities and relationships. If I click on one of these entities, like OpenAI for example, I can see some of the properties. So it does more than just pull relationships and entities in the embedding process with LightRAG. It
13:19 — actually goes a little deeper and it’s like, all right, what type of entity is it, right? Is it an organization or a person? It has the specific files it grabbed as well as chunk IDs. And then you can see the actual relationships down at the bottom right. I’ll move this for a second. So down here on the bottom right, if you can’t visually see it because it can get kind of clumped up on the graph, you can actually just click here and it will take you to them as well. So this server API is what we’re going to be using to actually connect this thing to Claude Code, because as great as this is, like, I’m
13:50 — not really going to be sitting here every single time I want to ask a question about my knowledge graph via the retrieval tab. That’s too much of a pain in the butt. So instead, we’re just going to use these APIs. Now, every single one of these APIs, right, has a description. You can see the parameters and stuff. Every one of these APIs can be turned into a skill, right? And that’s what I’m about to do and show you here today. That way, when you want Claude Code to use LightRAG, well, we just go inside of Claude Code wherever we are and say, “Hey, I want to use the LightRAG query skill and ask question
14:20 — blah blah blah blah blah.” It’s the same thing as if you were here in the retrieval tab and asked your question. And better yet, Claude Code will kind of take the response it gives you and summarize it, because these responses can be pretty in-depth off the rip when it comes to LightRAG. But if you just want the raw answer, you can set that up as well. Point is, even though this has a web UI, you never really have to interact with it if you don’t want to. And it’s really easy to bring it into our Claude Code ecosystem. So the four big skills I think you’ll use the most are query, upload, explore, and status. All
14:51 — four of these will be inside the free Skool as well. But what are you gonna be doing mostly? You’re going to be adding new documents and you’re going to be asking questions about those documents. And you’ll probably want to know, hey, what did I actually put in there? Because after you have a ton of documents, you kind of want to avoid putting in the same ones over and over and over again. And so if I ask this same question inside of Claude Code, right, I’ve just invoked the LightRAG query skill. It’s sending that request off to LightRAG, which again is hosted on our computer. It’s running inside that
15:21 — Docker container, and it’s going to bring the response back. Now, you aren’t limited to this, like, semi-local system. If you are someone who’s scaling really, really hard with LightRAG, right, you can host this on, like, a standard, you know, Postgres server. You have a lot of options. You could use something like Neon. So it kind of runs the full gamut, right? You can go fully local or you can push all this off to the cloud if you want to as well. LightRAG is very, very customizable. And here’s the response Claude Code came back with, which again is a summary of the raw response that LightRAG gave us,
15:54 — and it also quotes its sources. I also asked it for the raw response, because you can get that as well, because it just brings it back to Claude Code in a JSON response. So that’s all this is, and then again it also has the references if you want them. So like you just saw, super easy to install LightRAG and very simple to integrate it into your Claude Code workflow. Now the question becomes, okay, Chase, sounds great. I get conceptually that if I have a ton of documents, I should maybe be using this. Well, where’s the line in the sand? When
16:24 — should I start integrating LightRAG? Well, there’s not an exact number to this. The gray area is, I would say, somewhere between like 500 and 2,000 pages worth of documents. I don’t want to just say documents because who knows how large those are going to be, but like 500 to 2,000 text pages. At that point, at 2,000, you’re starting to get into like a million tokens. Beyond that, it probably makes sense for sure to start integrating LightRAG, because the thing is, the way
16:54 — RAG is set up, it’s going to be cheaper and faster to do that than just relying on standard grep from Claude Code. Agentic grep, the way Claude Code searches files already, is great. Like, there is a reason Claude chose to do that. However, it wasn’t under the assumption you had 2,000 pages of documents or 4,000 or 5,000, right? There is an upper limit. The nice thing is you don’t have to necessarily have that decision set in stone. As you saw, it is very easy to implement this.
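As a rough back-of-the-envelope check on that threshold, the sketch below turns the rule of thumb into arithmetic. It assumes about 500 tokens per page, which is simply what "2,000 pages is about a million tokens" implies; real pages vary a lot. The function names and the wording of the suggestions are hypothetical, only the 500 and 2,000 page cutoffs come from the video.

```python
# Assumption: ~500 tokens per text page, implied by 2,000 pages ≈ 1,000,000 tokens.
TOKENS_PER_PAGE = 500


def estimated_tokens(pages, tokens_per_page=TOKENS_PER_PAGE):
    """Rough token count for a corpus of plain-text pages."""
    return pages * tokens_per_page


def suggestion(pages, gray_low=500, gray_high=2000):
    """Map a page count onto the gray area described in the video."""
    if pages < gray_low:
        return "agentic grep is probably fine"
    if pages <= gray_high:
        return "gray area: try both and compare"
    return "RAG is likely worth setting up"


for pages in (100, 1000, 5000):
    print(pages, estimated_tokens(pages), suggestion(pages))
```

This is only a starting point for the "just experiment" advice that follows, not a hard rule.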
17:24 — So, just experiment. If you feel like you have a ton of documents, and it’s like, hey, should we be using RAG at this point? Well, I don’t know. Try it out. It doesn’t take long to do. The most painful part is the embedding process. That can take a minute for sure, but it’s not debilitating. And the cost isn’t insane, especially with LightRAG. If you compare this again to other graph RAG systems like Microsoft GraphRAG, this is a small, small percentage of the cost. And at the very large document sizes, the cost with RAG versus the cost with
17:55 — something like grep is to the tune of a thousand times cheaper. There was a study done last summer that found it was 1,250 times cheaper to use RAG in those sorts of situations. You can see that right here with textual RAG versus textual LLM, as well as the actual response time. Now, full disclosure, this was from July of last year. So, the models have changed. I highly doubt it’s as insane of a difference when we compare RAG versus your standard tech situations. And this
18:26 — was also Gemini 2.0. We weren’t talking about a harness. So, a lot of things have changed. But has it changed to, you know, close the gap by 1,250x? Maybe, maybe not. I don’t think so. Either way, just try it out. You know, I don’t think there’s much to lose. The other thing with LightRAG is the idea that, hey, if I want to upload documents, and we talked about this a little bit earlier, what do we do if we again have like tables, graphs, stuff that isn’t text? Can LightRAG handle this? Not exactly, but we can fix
18:59 — that. And the answer is RAG-Anything, from the same exact makers as LightRAG. And this is something that can essentially be multimodal, and it’s something we can pretty much plug right on top of LightRAG. Now, I hate to disappoint you, but that is going to be outside the scope of today’s video. However, tomorrow’s video, what do you think we’re going to do? Tomorrow, we’re going to be going through RAG-Anything and showing essentially how you can integrate it into what we built with LightRAG. So, it’ll be kind of a great one-two punch. So, if that’s something you’re
19:30 — interested in, like and subscribe, because we’re going to be going over it tomorrow. And on that note, this is where we’re going to kind of wrap up. Hope you enjoyed it. This is my first video, too, with this new camera setup. The lighting, I can already tell, is not exactly where I wanted it to be, so I apologize for all that. Still working out the kinks. Just glad it was working at all and the camera didn’t overheat in the middle of this thing. But yeah, all the skills are inside of the free Skool. The RAG stuff is super interesting, especially
20:01 — LightRAG. It’s been a great project. I’ve been using it for quite a while. So 100% check this thing out, and it’s so easy to integrate inside of Claude Code like you saw. So check out the free Skool for the skills as well as the prompt if you need it. To be totally honest, if you just point Claude Code at LightRAG, it will set it up just fine on its own. But other than that, make sure to check out Chase AI Plus if you want to get your hands on that masterclass.
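For reference, the query skill described in this video comes down to one HTTP call against the local LightRAG server. The sketch below is a hedged illustration, not the actual skill from the video: the `/query` path, the `query` and `mode` request fields, and the `hybrid` default are assumptions to verify against the API page in the web UI on port 9621, and both helper names are hypothetical. It uses only the Python standard library.

```python
import json
import urllib.request

# Port 9621 is the one mentioned in the video; adjust if your setup differs.
BASE_URL = "http://localhost:9621"


def build_query_request(question, mode="hybrid", base_url=BASE_URL):
    """Build (but do not send) a POST request for the assumed /query endpoint."""
    payload = json.dumps({"query": question, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_lightrag(question, mode="hybrid"):
    """Send the question to the local server and return the parsed JSON body."""
    request = build_query_request(question, mode)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Docker container running, calling `ask_lightrag("What's the full cost picture of running RAG in 2026?")` would return the raw JSON that Claude Code then summarizes.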