In-person
19-22 March

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: Session times are shown in Central European Time (UTC+1). The schedule is subject to change, and session seating is available on a first-come, first-served basis.
Wednesday, March 20 • 15:25 - 16:00
Strategies for Efficient LLM Deployments in Any Cluster - Angel M De Miguel Meana, VMware & Francisco Cabrera, Microsoft

Undoubtedly, Large Language Models (LLMs) are the technological advancement of 2023. These models show many capabilities, from chatting like a historical character to converting unstructured data into JSON format. However, their substantial size (gigabytes), resource demands, and management complexity present considerable challenges. At the same time, Kubernetes has emerged as the de facto technology for orchestrating workloads, and LLMs are no exception. In this talk, we will explore multiple strategies to reduce the footprint of these models in your cluster, making it possible to move them from the cloud to the edge. We will answer questions such as how to select the right model, reduce its size, and optimize resource utilization by running it in a lightweight environment provided by WebAssembly. The end goal is to find a balance between resource usage and quality. It is a challenge, but this ecosystem is moving fast, and new technologies, projects, and models are emerging.

Speakers
Angel M De Miguel Meana

Staff 2 Engineer, VMware
Angel is a Staff Engineer at VMware AI Labs working on multiple WebAssembly initiatives. His background is as a full-stack web developer working primarily with UIs, APIs, automation, and Kubernetes. Angel is an Open Source (OSS) enthusiast, both as a creator and contributor to different...
Francisco Cabrera

Senior Technical Program Manager, Microsoft
Francisco is a Technical Program Manager on the AKS Hybrid team, working on edge computing and Kubernetes at the edge. For the past couple of years, he’s been working within the open-source community, developing end-to-end IoT solutions. Since joining Microsoft, he’s been responsible...



Wednesday March 20, 2024 15:25 - 16:00 CET
Pavilion 7 | Level 7.1 | Room C
  ML/AI + Data Processing + Storage
  • Content Experience Level: Beginner
  • Presentation Slides Attached: Yes