The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Please note: This schedule is displayed in Central European Standard Time (UTC +1). The schedule is subject to change and session seating is available on a first-come, first-served basis.
Kueue is a Job-level queueing manager that addresses the challenges of managing computational resources to run batch workloads on Kubernetes. We walk you through its architecture, demonstrating how it can be used to set up quota- and priority-based sharing of resources between multiple teams. We describe how Kueue’s scheduler decides when to start or stop (preempt) a job. We showcase Kueue through its production use at CyberAgent, where it is a building block of a multi-tenant system that supports multiple engineering and ML research teams and uses multiple types of CPUs and GPUs. There, Kueue manages various types of Jobs (batch Job, MPIJob, or in-house Jobs) that use various ML frameworks (TensorFlow, PyTorch, or DeepSpeed). Finally, we discuss the challenge of running ML training jobs that require all pods to be scheduled together. We show how this is solved with Kueue at CyberAgent, and how it can be solved with Kueue in autoscaling environments using the new ProvisioningRequest API.
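As a rough illustration of the ideas named in the abstract (quota, priority, preemption, and jobs that need all of their pods at once), here is a minimal toy model in Python. It is not Kueue's scheduler or API: the `Job` and `Queue` classes, the single integer quota, and the "preempt lowest priority first" rule are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    priority: int
    pods: int  # assumption: each pod consumes one unit of quota (e.g. one GPU)


@dataclass
class Queue:
    """Hypothetical team queue with a fixed quota (not a Kueue object)."""
    quota: int
    admitted: list = field(default_factory=list)

    def used(self) -> int:
        return sum(job.pods for job in self.admitted)

    def submit(self, job: Job):
        """Admit the job whole or not at all.

        Returns the names of preempted jobs (possibly empty) if the job
        was admitted, or None if it has to wait.
        """
        free = self.quota - self.used()
        victims = []
        # Consider running jobs from lowest to highest priority and pick
        # strictly lower-priority ones until the new job would fit.
        for candidate in sorted(self.admitted, key=lambda j: j.priority):
            if free >= job.pods:
                break
            if candidate.priority < job.priority:
                victims.append(candidate)
                free += candidate.pods
        if free < job.pods:
            return None  # cannot fit all pods, so nothing is started or stopped
        for victim in victims:
            self.admitted.remove(victim)
        self.admitted.append(job)
        return [victim.name for victim in victims]


queue = Queue(quota=8)
first = queue.submit(Job("train-a", priority=1, pods=4))
second = queue.submit(Job("train-b", priority=1, pods=4))
waiting = queue.submit(Job("train-c", priority=1, pods=2))
urgent = queue.submit(Job("train-d", priority=5, pods=6))
```

In this sketch a job that cannot get quota for every pod is left waiting rather than partially started, and a higher-priority job may evict lower-priority ones to make room. How Kueue itself makes these decisions is the subject of the talk.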
Michał is a software engineer with a background in computer science, a PhD in computational biology, and 5+ years of professional experience. In his current role, he focuses on enhancing support for batch workloads in the Kubernetes ecosystem. Outside of work he enjoys playing...
Yuki is a Software Engineer at CyberAgent, Inc. He works on an internal platform for machine-learning applications and high-performance computing. He is currently a maintainer of some Kubeflow WG AutoML / Training sub-projects. He is also a WG Batch member and a Kubernetes' Kueue...