Transcript: Self-Training Agents: Hermes Agent, HF Traces, Skills, MCP & Finetuning — Merve Noyan, Hugging Face

Watch video

AI Engineer19:10Transcript ✅Added May 19, 2:40 am GMT+8

Source video ID: OV56RddyFuU

Transcript

0:15 — Hello everyone and welcome to this talk in open agent ecosystem and I would like to call it having an AI engineer at your fingertips. I’m Marwa and I work in the open source team of hugging face. How many of you are hugging using hugging face on daily basis? Oh, let’s change that. This is not okay. But first let’s talk a bit about open source and what it is. So when it comes to machine learning open source is absolutely differential. Basically you
0:46 — have the open weights models that go in with non-commercial licenses. We call them open weights and then we have open source models that have commercially available licenses such as this one from deep seek. It’s called MIT license or Apache 2.0 and then there is like even more open license models that have the code open. If you have like agents that the harness is open, everything is open and this matters even more by the fact that like yesterday or
1:17 — the other day it was revealed that the cloud performance was going down. So if you if you have everything in the open, nothing changes without you knowing, no performance degradation without you knowing, everything’s great. But on top of it if you have access to the weights, you can shrink them, you can quantize them. You can fine tune them if you feel like it. And it’s absolute guaranteed privacy for your end user because
1:49 — you can deploy it to edge devices, browsers without the data going somewhere else. This matters a lot in my opinion even more these days with the security breaches and everything. And there was this argument maybe a few years ago that open source models aren’t as good as close No, this is not the case. Like you see for instance the latest GLM 5.1 is absolutely crashing it and I’m actually using it in my coding setup. Uh the This is the artificial
2:20 — intelligence index and the green ones are open models meanwhile the black ones are the closed models and we are we just catched up. And we will catch up even more with the upcoming models and stuff. And let’s go back to Hugging Face Hub. So everything is facilitated through Hugging Face Hub, all of the open releases. It’s the infra layer for all of your open source workflows. And as of now it’s hosting even more
2:50 — models. I should have updated the number. It’s probably close to 3 million. A lot of data sets, spaces, and everything, but that’s not all when it comes to the agentic ecosystem and this is what we’re going to talk about today. So when you go to the models, you can filter for agentic models. They are mostly the trending ones. And there is like two types of models in my opinion. There is the vision LLMs and then there is the LLMs and the vision LLMs can also act as like a computer use
3:22 — agent over the screenshots. They know where to click etc., which is pretty cool. And one trend I have recently noticed is the fact that you have labs releasing their LLMs as vision with vision capabilities day zero. Like for instance the Gemini 4 was an omni model and still it’s an agentic model. There is Qwen 3.5, there is Chimera Chimera 2.5. These were VLMs. So I
3:53 — foresee that all of these models will be over time released day zero with vision capabilities. And it’s super easy to run this actually. Like you can just use like VLLM MLX or like Llama CPP Llama server from the get-go with like few lines of code. Like it used to be much more um frictiony, but these days this is a not a big deal. And if you want to compare open models,
4:24 — we have recently launched this feature called benchmark datasets. So, when you go to the datasets on the left-hand side, there is like on the bottom there is a bunch benchmark button. You just click it and then you can see the popular benchmarks such as SWE E Bench Pro or Humanity’s Last Exam or AIMEE and others. And when you go to for instance SWE Bench to see like how your agent is like good in coding and stuff. Uh you see the open models ranked
4:56 — according to the scores. So, like currently GLM 5.1 is top of the list. So, it’s also easy to pick an open model these days because there’s 3 million models out there and it used to be a challenge to pick different models. And if you actually want to vibe check it, Hugging Face has this service called inference providers which does routing for the best models to best providers. Like all of the
5:26 — providers out there there’s Grok, Cerebras, I don’t know Novita and everything. And then it’s super easy to compare them as well. If you see like you have the cheapest or the fastest option. Actually, I had to truncate it, but also there is the tool use column. So, you can actually pick one of the open source models for the agentic use case and stuff. And going back to agents after all of these hugging face hub shield Hugging face hub actually recently has
5:57 — shipped a ton of futures for you to use open models with agents agents and stuff. And first of like there is the MCP server where you can plug the hub into your LLM. And there is skills which allow you to even vibe train models. Like you just go to your agent and say train Q and 23.5 on this data set for me and then it just trains. Which to me is like a sci-fi at this point because it used to not exist
6:28 — and like there is so many things going on in the back and and the agent actually handles them very well. And then there is the local agent so you can run full coding agents locally from models with hugging face hub because we integrate very well to them. And coming to the first one so basically my talk will be consisting about all of these. Coming to the first one there is the local coding agents. And your options you have like actually
6:59 — many many options but like one of my favorites is pie because it’s like super simple to set up. Basically you can I think you can also use it with inference providers remotely but also if you want to serve like a local coding agent you can use llama CPP to serve it and then pie will directly consume that. And something very cool is also llama agent which is baked into llama CPP as a binary that you can just directly execute and start a model by giving
7:29 — hugging face hub ID. So it’s super easy as well to get a local agent running. I will share my slides on my Twitter account after so no need to take pictures. My One of my most favorite things these days is Hermes agent, and I will just die on this hill. So, this is like This is a bit one step even further to from the open claw by means of memory management and everything. And it’s actually super
8:01 — easy to get started with that, and it is You can either use it locally or with Hugging Face inference provider. So, for instance, I was playing with that. Uh like the setup wizard does everything for you. You just give the keys and stuff, and then integrate into your Slack or WhatsApp or whatever, and you’re good to go. And I absolutely recommend using this if you want to use it with an open model. I absolutely recommend GLM 5.1. For instance, I actually failed initially to
8:33 — integrate into Slack. I have witnesses in here, my colleague uh Niels is here. And um I asked uh GLM 5.1 to fix it uh with the Hermes agent, and it fixed on its own, and it’s uh it was a good day. Like uh I I think GLM 5.1 is a very good model, and I cannot I can’t absolutely wait to use it with Gemma 4. But also, this weekend there was like on Twitter there
9:03 — was a rumored uh Minimax model coming up, so I will also probably try with that and share my findings. So, I absolutely recommend using Hermes agent with the open models. And one more thing. So, basically, uh Hugging Face Hub now has a new data set repository type called traces, and this is basically all of your uh code X, uh cloud code, or pie traces they hosted.
9:33 — And for instance, if you go to your um if you pushed uh trace and then you go over there you will see in the data set viewer if you click on the traces column it pops up like this it is very nicely parsed and you can just explore your data and then later if you want you can even train a model on that which is pretty cool in my opinion and if you want to push your agent
10:03 — traces you can just upload your sessions from these paths and nothing else is needed and we will also probably have Hermes agent very soon for traces going back if you want to use if you want more options to serve LLM behind the agent locally so some tips and tricks in finding a good model you just go to hugging face there is an other tab under the other tab there is the apps so these apps are like LM studio
10:35 — llama CPP everything that is for local serving is over there and when you filter for them you have the models that are supported by these by these local apps so whatever you want to serve we have you covered and when you go to the model repository something very cool in my opinion is that on the left and right hand side there is GGUF section so basically GGUF if you don’t know it’s supported it it’s basically
11:05 — comes in llama CPP the file format that is supported in many things like all llama LM studio everything and you have the hardware compatibility for instance the Gemma for larger model if you quantize it to four bit it fits inside an L4 GPU with the 24 gigabytes of VRAM so I think this is very cool and this is also served to MLX repositories as well.
11:35 — And when you go to the again to the model repository, if you have absolutely zero clue on how to serve this model, on top right there is use this model, and you have the options of the local apps that the model is supported in. And when you click that, you see like only with few lines of command that you can run, you install, you get the model served, and voila. It’s very, very convenient to run the open models these days. And lastly,
12:05 — supercharging your coding agents using Hugging Face skills. So, there is we have like bunch of skills in order to get you started with training, I don’t know, inferring with the open models, using open models, exploring open datasets, using AI apps, everything. And we have this thing called the Hugging Face CLI skill, which allows coding agents to manage repositories, run jobs, launch demos, and everything.
12:35 — And this is how you can install it. You can just type HF skills on Google, and you will find the commands. But we have more skills than that. So, basically, this allows you to plug hub in into your agents, like give you all of the Hugging Face Hub exploration. But rest of the skills are super cool. There is LLM trainer skill. Basically, this is This is not only for LLMs, but also vision language models. You can just tell the model to Okay, train this model
13:06 — on this dataset, and it will just kick off the job remotely on our infra, or like locally, wherever you want. And there is Gradio skill, which allows you to build demos. And there is Hugging Face dataset skill, which allows you to explore datasets through our dataset viewer API. And you can install it very easily. Again, we come with more integrations. I just put the cloud on
13:36 — Gemini here. So, putting this into action, for instance, I asked the model to I asked cloud code to say, “Hey, can you train Qwen-2-VL on Lava-Instruct-MiX, which is like a vision-language data set?” And it asked me a few questions. It said, “Okay, which instance would you like this to go in because you have multiple options?” The model actually, like in the back end, the agent actually uh, calculates
14:07 — the amount of VRAM required to run fine-tune that model in a given batch size and everything. So, it handles everything for you. It just asks you a few questions. Okay, what is your validation split, blah, blah. And then it just launches the job, which to me is absolute sci-fi still to this day as a person who have been training models since, I don’t know, beginning of my career, like six six years. And you at the end, you just find your model on hub.
14:38 — And this is not limited to LLMs and VLMs. I have recently shipped um, skills for, for instance, training object detectors or, I don’t know, segmenting model and everything for vision. It handles, for instance, different bounding box types and everything. You just give the command and let it handle everything. And going back to MCP, what do we serve? Uh, we have models, data sets, spaces. Search for your task. Uh, semantic
15:09 — search for spaces. So, if you don’t know spaces, it’s like the App Store of AI. You have a ton of uh, apps over there for absolutely everything you could see. And also, we have something called jobs, which allows you to kick off uh, one-off jobs that ends like if it fails or if it succeeds and you pay for the amount of time it was up. And also, you can query these apps from MCP like I’m going to show you shortly, but it plays nicely
15:39 — with all of your favorite platforms. And so, for instance, in here, I asked the model generate image of a baklava made of yarn and then it will call the Hugging Face space of Qwen image, which is an image generation model hosted remotely and then it will query that and it will bring the output of that. It works very nice. Look. But you need to turn on there is a setting in the MCP called dynamic
16:11 — spaces. If you want more options of like if you want absolutely all of the spaces, you need to turn that on, which is a bit of bit experimental. And here are some few ideas that you can use spaces MCP. Uh but you’re absolutely not limited to those. And tying it all together, my colleague Niels has built uh something I which I found cool, so I wanted to share. So, basically on Hugging Face Hub, there is papers and these papers
16:41 — basically AI related papers. We want people to be able to ask questions to these papers or share. Uh but not all of the papers come with markdown uh which the model which we can index and stuff. So, we OCR 30 30,000 papers uh using Codex open OCR models and jobs all through prompting, which is a bit crazy. So, the steps to do that is firstly pick an OCR model that is cheap and nice and performance.
17:12 — Ask the LLM to kick off a processing job and actually write the code for that and then kick it off on hanging face infra and then let the skill set up the instance of hosting that model and everything without you going through the pain of the napkin math and then profits. So to pick an OCR model you need to you need you can go to all OCR bench which is a benchmark data set that I have previously shown you. The first result is Chandra OCR but don’t be
17:43 — fooled by this we have just today shipped a skill that you can just ask the model okay what is the best model on OCR for fine-tuning and it’s will also make recommendations around fine-tuning and stuff. So if you need like smaller models etc. It will handle everything for you with this skill so it’s pretty cool. Check it out. Um once you pick the model okay we in this case we use Chandra. Uh we ask model to write the script and
18:14 — it did. And then the agents just does the napkin math for the instance and the calculates the cost of the running job and everything and then these jobs will be so so basically these jobs will be rerun so we have recently launched this infra product called buckets which is like a S3 buckets but much cheaper and faster. Um that you can use with mounting and yeah basically um
18:45 — you can just use that. And you can get started in these links. I hope you like this talk. Thank you so much.