As AI and machine learning become ubiquitous, GPU acceleration is essential for model training and inference at scale. However, effectively leveraging GPUs in Kubernetes brings challenges around efficiency, configuration, extensibility, and scalability.
This talk provides an overview of the capabilities needed to address these challenges and enable seamless support for next-generation AI applications on Kubernetes:
- GPU resource-sharing mechanisms such as MPS (Multi-Process Service), time-slicing, MIG (Multi-Instance GPU), and GPU virtualization
- Flexible accelerator configuration using the traditional device plugin and the upcoming Dynamic Resource Allocation (DRA) feature (see the request sketch after this list)
- Advanced scheduling and resource management techniques, including gang scheduling, topology awareness, fault tolerance, and more
- Key learnings (and areas for improvement) needed to scale multi-node AI/ML jobs in large production clusters
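To make the contrast in the second bullet concrete, here is a minimal Go sketch (not taken from the talk itself) of how a pod requests a GPU under the traditional device-plugin model, where the GPU is an opaque extended resource. The pod name, image, and resource count are illustrative placeholders, and the DRA alternative is only noted in comments because its API surface is still evolving.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// With the traditional device-plugin model, a GPU is an opaque
	// "extended resource": the container asks for N units of
	// nvidia.com/gpu and the kubelet decides which physical devices back it.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-worker"}, // hypothetical name
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "trainer",
				Image: "nvcr.io/nvidia/pytorch:24.01-py3", // placeholder image
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						// Sharing mechanisms such as time-slicing or MIG change
						// what one "unit" of this resource means on the node,
						// not the request syntax used here.
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
		},
	}

	// Under DRA, the pod would instead reference a ResourceClaim object,
	// letting the driver express richer, parameterized GPU configurations
	// than a single opaque count.
	q := pod.Spec.Containers[0].Resources.Limits["nvidia.com/gpu"]
	fmt.Printf("requests %s nvidia.com/gpu\n", q.String())
}
```

With time-slicing enabled, for example, the NVIDIA device plugin can advertise multiple replicas per physical GPU, so the same one-unit request above would land on a shared slice rather than a dedicated device.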
Some of these capabilities are supported today; others are not. By addressing the remaining challenges, Kubernetes is poised to become the go-to platform for accelerated AI/ML in the cloud, mirroring Linux's pervasive dominance in the datacenter.
Kevin Klues is a distinguished engineer on the NVIDIA Cloud Native team. Kevin has been involved in the design and implementation of a number of Kubernetes technologies, including the Topology Manager, the Kubernetes stack for Multi-Instance GPUs, and Dynamic Resource Allocation (DRA).
Sanjay Chatterjee is an engineering manager at NVIDIA. He works on GPU compute infrastructure with a focus on GPU scheduling, enabling DL/AI and HPC workloads to scale on Kubernetes. Previously, he worked on multiple DoE/DARPA-funded advanced technology projects toward designing the...