Join the Bloomberg Data Science Platform's engineering team as they share some of their experience in providing managed ML platforms that have been built with versatile, open source CNCF projects. This session will offer an overview, beginning with how to manage a Jupyter notebook platform using Istio, OPA, WebAssembly (Wasm), and Calico to provide a secure interactive environment.
This talk will then share strategies and lessons learned for building a model training platform, utilizing Kubeflow training operators, cloud-native GPU components, and Buildpacks. The speakers will also reveal how cross-cluster batch scheduling can be a game-changer for improving resource utilization.
Moving on to Istio and KServe, this talk will also discuss how to set up a resilient model serving platform that is fit for production demands. By the end of this talk, you'll have a blueprint for building an efficient, scalable ML platform within the CNCF ecosystem. Let's build together!
Leon Zhou is a software engineer on the Data Science Platform engineering team at Bloomberg. With prior NLP experience, he is now building ML platforms to facilitate machine learning development. He is interested in ML infrastructure to enable large-scale training and complex pipelines...
Yuzhui Liu leads a strong and dynamic engineering team at Bloomberg, which is focused on providing managed solutions for model training, notebook, and HPC infrastructure. She collaborates widely in the CNCF community, was a contributor to KServe, and is the co-chair for Cloud Native...