The Swyx Mixtape | Stop Worrying About Cold Starts

Stop Worrying About Cold Starts

February 18, 2021 / 05:38 / E31

For the vast majority of developers, cold start anxiety is unwarranted.

Sources (both podcasts are fully transcribed)

- Serverless Properties with Johann Schleier-Smith: https://softwareengineeringdaily.com/2021/02/11/serverless-properties-with-johann-schleier-smith/
- Azure Functions with Jeff Hollan: https://www.serverlesschats.com/88/
- This is all you need to know about Lambda cold starts by Yan Cui: https://lumigo.io/blog/this-is-all-you-need-to-know-about-lambda-cold-starts/

---

Serverless Properties with Johann Schleier-Smith: https://softwareengineeringdaily.com/2021/02/11/serverless-properties-with-johann-schleier-smith/

"The cold start problem got a lot of attention early on. And I'm happy to say that I think that for a lot of practical purposes is something that people can either consider resolved or worked around sufficiently that they don't need to worry about it so much. But let me go into
a little bit more detail on that. So what is the cold start problem? Well, in order to provide these secure execution environments, the cloud provider needs to create a VM for your workload, because that's really how you can guarantee that you're not going to be exposed to other
clients, other tenants.

And so booting a VM traditionally means booting an operating system. Operating systems just simply aren't designed to boot up super-fast. It's not something that really matters. You're usually happy, or traditionally you'd be okay if a server booted up within a few minutes,
because it's going to run for days. So what does it matter? And traditionally also the things that happen during boot up time involve things like probing for devices and figuring out whether you've upgraded the hardware and other things that have just no role in a serverless
environment. You know what the hardware is and you want to get going as quickly as possible because you want to be able to have that ability to expand elastically. And similarly in order to keep costs low you want to have that ability to just shut things off and effectively power down.

And so what the cold start is really about is that time that it takes. And to be clear, also, what's important about a cold start versus a warm start is that once you start it up, what you can do is you can just leave that function instance running so that it can serve more than one request, so you get to amortize your startup cost over many, many requests. So there are sort of two reasons why this startup time is becoming less of an issue, and they're actually both related to the technology that's in Firecracker.

So Firecracker makes it much, much faster to boot up the VMs, in part because it sort of strips down that kernel so that it just simply has a much faster boot time. And so the boot times, instead of seconds, come down to something like 100 milliseconds or so. And there are a number of other techniques. Some of these are in Firecracker. Some of these are in research papers that are about making these boot up processes much faster. For example, one thing that you can do is once you have booted an image of a virtual machine, what you can actually do is you could just save those pages essentially, save a state of the memory. And then when you need another one, you can simply clone that and you can use sort of copy-on-write semantics for that as well so that really you're just creating a new set of page tables to reference that underlying image."
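
The snapshot-and-clone idea at the end of the quote can be illustrated with a toy model. This is only a sketch of copy-on-write semantics, not how Firecracker is implemented: the `Snapshot` and `ClonedVM` classes and the three-page "memory image" are hypothetical names invented for illustration, and a write here replaces a whole page rather than modifying part of it.

```python
from typing import Dict, List


class Snapshot:
    """Immutable memory image of an already-booted VM, stored as pages."""

    def __init__(self, pages: List[bytes]) -> None:
        self.pages = tuple(pages)  # a tuple, so shared pages cannot be reassigned


class ClonedVM:
    """A clone whose "page table" starts out pointing at the shared snapshot."""

    def __init__(self, snapshot: Snapshot) -> None:
        self._snapshot = snapshot
        # Private copies exist only for pages this clone has written to.
        self._private: Dict[int, bytes] = {}

    def read(self, page_no: int) -> bytes:
        # Prefer the private copy; otherwise fall through to the shared page.
        return self._private.get(page_no, self._snapshot.pages[page_no])

    def write(self, page_no: int, data: bytes) -> None:
        # Copy-on-write: the shared page is left untouched.
        self._private[page_no] = data

    def private_page_count(self) -> int:
        return len(self._private)


# Boot once, snapshot, then create clones without booting again.
snapshot = Snapshot([b"kernel", b"runtime", b"heap"])
vm_a = ClonedVM(snapshot)
vm_b = ClonedVM(snapshot)
vm_a.write(2, b"heap-of-a")
```

Creating a clone allocates nothing beyond an empty mapping, which is the point of the technique: the expensive boot happens once, and each new instance pays only for the pages it later changes.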

---

This is all you need to know about Lambda cold starts by Yan Cui: https://lumigo.io/blog/this-is-all-you-need-to-know-about-lambda-cold-starts/

---

Azure Functions with Jeff Hollan: https://www.serverlesschats.com/88/

In the last six months, we actually have been rolling out some machine learning, too. So we've got some folks in Microsoft Research who worked at looking at a bunch of historical data for functions. It's actually all open source. It's anonymized. But if you go to GitHub, you can actually see a bunch of Azure Functions anonymized data.

And they trained a bunch of models. So that hopefully, Jeremy, if you were using Azure Functions and it's Monday at 8:00 AM, that our model, hopefully, would get smart enough over time to say, "Oh, there's a 70% chance that on Monday at 8:00 AM, Jeremy's about to hit this thing. We're actually just going to warm it up before he even executes it." So that's something that we've been rolling with for a while. But even then the ... And then just trying to make progress on the underlying technology, the underlying platform. There's a lot of components to building a multi-tenant secured service that all add a little bit of additional latency.
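
The predictive warm-up described above can be approximated with a simple frequency count. This is a minimal sketch and not the model Microsoft Research trained: the function names, the per-(weekday, hour) bucketing, and the sample timestamps are all assumptions made for illustration, and the 0.7 threshold merely echoes the "70% chance" in the quote.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

Slot = Tuple[int, int]  # (weekday, hour), where Monday == 0


def slot_probabilities(
    invocations: Iterable[datetime], weeks_observed: int
) -> Dict[Slot, float]:
    """Fraction of observed weeks in which each (weekday, hour) slot saw a call."""
    weeks_seen: Dict[Slot, Set[Tuple[int, int]]] = defaultdict(set)
    for ts in invocations:
        iso_year, iso_week, _ = ts.isocalendar()
        weeks_seen[(ts.weekday(), ts.hour)].add((iso_year, iso_week))
    return {slot: len(weeks) / weeks_observed for slot, weeks in weeks_seen.items()}


def should_prewarm(
    probs: Dict[Slot, float], when: datetime, threshold: float = 0.7
) -> bool:
    """Warm the app ahead of time if a call in this slot is likely enough."""
    return probs.get((when.weekday(), when.hour), 0.0) >= threshold


# Hypothetical history over four weeks: calls on three Mondays around 8 AM,
# plus one stray call on a Wednesday afternoon.
history = [
    datetime(2021, 1, 4, 8, 5),
    datetime(2021, 1, 11, 8, 10),
    datetime(2021, 1, 18, 8, 1),
    datetime(2021, 1, 13, 15, 0),
]
probs = slot_probabilities(history, weeks_observed=4)
warm_monday = should_prewarm(probs, datetime(2021, 2, 1, 8, 0))
warm_wednesday = should_prewarm(probs, datetime(2021, 2, 3, 15, 0))
```

With this sample history the Monday 8 AM slot was hit in three of four weeks, so it clears the threshold, while the Wednesday slot does not.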

So something we're aware of. And then I guess to the second part of that question is, we do have some options to fully mitigate it or partially mitigate it. The one is the faithful pinger. We have folks, I mentioned, you can create this Function app concept. You can have multiple functions in there. One thing that even I have done, and I would say don't quote me on this, but I'm on a podcast. Now, my name's right there. You can create another function in that same app that triggers on a timer. So a timer is a first-class concept in Functions. Just have that thing trigger once every 10 minutes, and your whole app is going to get poked every 10 minutes by us. You don't even have to poke it. We'll poke it ourselves on that interval and keep it warm.
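
To see why a 10-minute timer keeps an app warm, here is a small simulation. It is a sketch under stated assumptions rather than a description of the platform's internals: the 20-minute idle timeout is a hypothetical value chosen for illustration, the request times are made up, and `user_cold_starts` is an invented helper. The schedule string is the six-field NCRONTAB form that an Azure Functions timer trigger accepts for "every 10 minutes".

```python
from typing import List, Optional, Tuple

# NCRONTAB expression for a timer trigger that fires every 10 minutes.
KEEP_WARM_SCHEDULE = "0 */10 * * * *"


def user_cold_starts(
    user_minutes: List[int],
    idle_timeout: int,
    ping_interval: Optional[int] = None,
) -> int:
    """Count user requests that arrive after the app has idled past the timeout."""
    # Each event is (minute, is_user_request); timer pings are not user requests.
    events: List[Tuple[int, bool]] = [(t, True) for t in user_minutes]
    if ping_interval is not None:
        horizon = max(user_minutes, default=0)
        events += [(t, False) for t in range(0, horizon + 1, ping_interval)]

    cold = 0
    last_activity: Optional[int] = None
    for minute, is_user in sorted(events):
        was_cold = last_activity is None or minute - last_activity > idle_timeout
        if was_cold and is_user:
            cold += 1  # only cold starts that a user actually waits on
        last_activity = minute  # pings and requests both reset the idle clock
    return cold


# Hypothetical traffic: three requests, each more than an hour apart.
requests = [5, 65, 180]
without_pinger = user_cold_starts(requests, idle_timeout=20)
with_pinger = user_cold_starts(requests, idle_timeout=20, ping_interval=10)
```

In this sample every user request is cold without the timer and none are with it; the timer function absorbs the single cold start itself. The trade-off is that the timer executions are billed like any other invocation.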
