We are excited to announce the May meetup scheduled for May 21st.
There will be two presentations on using AI on Kubernetes:
1. Scaling AI Workloads on Kubernetes: Reliability Patterns for Production Agentic Platforms
2. Getting Started with Inferencing on K8s with KServe
1st Presentation:
Scaling AI Workloads on Kubernetes: Reliability Patterns for Production Agentic Platforms
Running AI agents in production exposes failure modes that traditional microservices never surface. This session draws from real-world experience building and operating an enterprise agentic AI platform on Azure Kubernetes Service — where node pool exhaustion, autoscaling misconfigurations, and invisible queue failures collide with live inference workloads. I’ll cover how to design for horizontal scalability across heterogeneous AI workloads, what signals actually predict cluster saturation before pods go Pending, and a concrete set of reliability patterns: when to use HPA vs KEDA vs Cluster Autoscaler for AI workloads, how to benchmark inference throughput under real load, and what a production-grade control plane looks like when the workload is an agent, not a REST API.
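As one illustration of the HPA-vs-KEDA distinction the talk covers, KEDA can scale an inference deployment on queue depth rather than CPU, which lags behind real demand for agentic workloads. The sketch below is a hypothetical example, not taken from the speaker's platform; the Deployment name, queue name, and authentication resource are all assumptions.

```yaml
# Hypothetical sketch: scale an inference worker on Azure Service Bus
# queue depth with a KEDA ScaledObject, instead of CPU-based HPA.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: agent-inference-scaler
spec:
  scaleTargetRef:
    name: agent-inference-worker   # hypothetical Deployment name
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: agent-tasks     # hypothetical queue name
        messageCount: "10"         # target backlog per replica
      authenticationRef:
        name: servicebus-auth      # hypothetical TriggerAuthentication
```

Scaling on queue depth surfaces the "invisible queue failures" the abstract mentions: backlog growth triggers scale-out before pods ever go Pending.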
Speaker Bio: Jothsna Praveena Pendyala
Jothsna is an AI Platform Architect and Senior Data Scientist at Infosys, where she leads the design and operation of an enterprise-scale agentic AI platform on Azure Kubernetes Service and LangSmith. Her work spans cloud-native infrastructure, distributed systems reliability, and production observability for AI workloads. She is also an executive member of the ACM Dallas Chapter, an active conference speaker and panel moderator, and a researcher with publications at IEEE and NeurIPS venues.
2nd Presentation:
Getting Started with Inferencing on K8s with KServe
You’ve built a model. Now what? Deploying it reliably, scaling it efficiently, and managing it in production is a whole different challenge. In this talk, we’ll break down how to get started with AI inferencing on Kubernetes using KServe. From simple deployments to autoscaling and real-world considerations, you’ll leave with a clear roadmap to take your models from notebook to production.
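For a sense of how simple the starting point can be, a KServe deployment is a single InferenceService resource. The sketch below follows the pattern from KServe's getting-started examples; the service name and storage URI are illustrative assumptions.

```yaml
# Minimal sketch of a KServe InferenceService: KServe pulls the model
# from storageUri and stands up an HTTP inference endpoint for it.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris              # illustrative service name
spec:
  predictor:
    minReplicas: 1                # autoscaling bounds for the predictor
    maxReplicas: 3
    model:
      modelFormat:
        name: sklearn             # model format; KServe picks a runtime
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Applied with `kubectl apply -f`, this is roughly the "simple deployment" end of the roadmap the talk describes, with autoscaling configured declaratively in the same resource.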
Speaker Bio: Damian Igbe, PhD
Damian builds the infrastructure that makes AI work in production — not just in notebooks. With a PhD in Computer Science, Kubestronaut certification, and 20+ years of hands-on systems experience, Damian operates at the intersection of AI platform engineering, GPU orchestration, and cloud-native security. He has delivered mission-critical infrastructure for clients, including Pentagon staff, where reliability and security aren't aspirational — they're mandatory. His core focus is the layer most AI teams underestimate: the infrastructure underneath the model. Getting LLMs to production requires GPU-aware Kubernetes scheduling, high-throughput inference pipelines, zero-trust security, and the operational discipline to keep it all running at scale.
6:30 - 6:45 - Social
6:45 - 6:55 - Club Business
6:55 - 7:30 - Scaling AI Workloads on Kubernetes: Reliability Patterns for Production Agentic Platforms
7:30 - 8:30 - Getting Started with Inferencing on K8s with KServe
8:30 - 8:35 - Social/Wrap-up