Engineering Insights

Deep dives into software architecture, cloud infrastructure, and scalable system design. Technical perspectives on modern development and emerging technologies.

Autoscaling Revisited: LLMs, MCP, and the Stack

Two years ago I wrote about why reactive autoscaling falls short and what ML brings to the table. A lot has changed. LLMs are now a primary workload in most cloud fleets, and they break almost every assumption the classic autoscaling stack was built on. Here's what's actually different, and where the Model Context Protocol fits into the picture.

The Open-Source Autoscaling Stack in 2024

Part 1 and Part 2 covered the theory and one major commercial platform. Now the practical question: what does the open-source Kubernetes ecosystem actually give you for intelligent autoscaling in 2024, and where is the ML layer starting to plug in? The answer is more composable — and more interesting — than it was two years ago.

Autoscaling From the Inside: Seven Years at Turbonomic

I spent seven years at Turbonomic — back when it was still called VMTurbo, through the rebranding, through the IBM acquisition in 2021, and a few years past that. So writing about autoscaling without touching what I actually worked on every day would feel dishonest. This is the insider perspective: what Turbonomic actually does, why the economic model it's built on is genuinely clever, and where the edges of that model sit.

Why Reactive Autoscaling Isn't Enough — and How ML Changes That

Every major cloud has autoscaling. It's table stakes. But "autoscaling" usually means: observe a metric, cross a threshold, add capacity. That works — until your traffic spikes faster than your VMs boot, or you've padded your fleet so aggressively that you're paying for air. ML can do better. Here's how.
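
For context, the reactive pattern that post pushes back on looks roughly like this: a control loop polls a utilization metric and adds or removes capacity whenever a fixed threshold is crossed. A minimal sketch, with hypothetical current_avg_cpu and set_replica_count stand-ins for a real metrics backend and scaling API:

```python
# Minimal sketch of reactive, threshold-based autoscaling.
# The metric source and scaling call are hypothetical stand-ins
# for a real metrics backend and orchestrator API.

import random
import time

TARGET_CPU = 0.70            # scale out above 70% average CPU
SCALE_OUT_STEP = 2           # instances added per threshold breach
MIN_REPLICAS, MAX_REPLICAS = 2, 50


def current_avg_cpu() -> float:
    """Stand-in for a metrics query (e.g. Prometheus, CloudWatch)."""
    return random.uniform(0.2, 0.95)


def set_replica_count(n: int) -> None:
    """Stand-in for the orchestrator's scaling API."""
    print(f"scaling to {n} replicas")


replicas = MIN_REPLICAS
while True:
    cpu = current_avg_cpu()
    if cpu > TARGET_CPU and replicas < MAX_REPLICAS:
        # Capacity is added only after the spike is already visible,
        # and the new instances still have to boot.
        replicas = min(replicas + SCALE_OUT_STEP, MAX_REPLICAS)
        set_replica_count(replicas)
    elif cpu < TARGET_CPU * 0.5 and replicas > MIN_REPLICAS:
        replicas = max(replicas - 1, MIN_REPLICAS)
        set_replica_count(replicas)
    time.sleep(60)  # the polling interval adds further reaction lag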