Adding EC2 Instance Recovery Alarms with CloudFormation
2019-09-11
Instance Recovery is a little-advertised, little-used feature of EC2. It doesn’t take long to set up and promises to recover your instance on the rare occasion that the underlying hardware fails. Recovery resumes the instance on new hardware, retaining its instance ID, private IP addresses, Elastic IP addresses, and all instance metadata.
I’ve deployed it on “snowflake” instances that don’t have the luxury of using Auto Scaling Groups. This gives me a little extra uptime assurance. I don’t think I’ve actually ever seen any EC2 instance get auto-recovered though.
Maybe I’m cargo-culting it, but it’s not much work to set up, so it feels like an easy (potential) win.
Update (2019-09-13): I asked on Reddit for examples and Redditron-2000-4 replied. They have 1800 EC2 instances and see 1-3 automatic recoveries a month. That is a monthly recovery rate of roughly 0.06%-0.17% of instances. Small, but significant if you’re aiming for even 99.9% uptime!
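Working the reported figures through, the share of instances recovered in a given month is:

```latex
\frac{1}{1800} \approx 0.056\%, \qquad \frac{3}{1800} = \frac{1}{600} \approx 0.167\%
```

This treats each recovery as hitting a distinct instance, which is an assumption; the reply only gives the monthly count.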
You can set up a recovery alarm for an instance by clicking through the EC2 console, as described in the documentation.
I like automating with CloudFormation, though. Some time ago I converted the resulting manually created alarm into a template snippet, and I copy-paste it between projects. Let’s take a look at it.
If you have an EC2 instance in your CloudFormation template’s Resources like so:
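A minimal sketch of such a resource. The logical ID Instance is an assumption, and the AMI ID and instance type are placeholders to replace with your own; only instance types that support recovery can use the alarm.

```yaml
Resources:
  # Hypothetical logical ID; an alarm can refer to it with !Ref Instance.
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      # Placeholder values: substitute a real AMI ID and your instance type.
      ImageId: ami-0123456789abcdef0
      InstanceType: t3.micro
```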
…then a basic recovery alarm for it would look like this:
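Here is a sketch of such an alarm, assuming the instance’s logical ID is Instance (a hypothetical name). It watches the StatusCheckFailed_System metric, which reflects problems with the underlying AWS hardware, and triggers the EC2 recover action. The 60-second Period with two evaluation periods is one reasonable choice rather than a required value.

```yaml
Resources:
  # ... the instance resource (logical ID: Instance) goes here ...

  InstanceRecoveryAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Recover the instance if it fails system status checks
      Namespace: AWS/EC2
      # System status checks cover failures of the underlying host hardware.
      MetricName: StatusCheckFailed_System
      Dimensions:
        - Name: InstanceId
          # Ref on an AWS::EC2::Instance returns its instance ID.
          Value: !Ref Instance
      # The metric is 0 (passing) or 1 (failing). A Minimum above 0 means
      # every data point in the period failed.
      Statistic: Minimum
      Period: 60
      # Two consecutive failing 60-second periods: about 2 minutes.
      EvaluationPeriods: 2
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      # The built-in EC2 "recover" action for the stack's region.
      AlarmActions:
        - !Sub arn:aws:automate:${AWS::Region}:ec2:recover
```

Using !Sub with ${AWS::Region} keeps the action ARN correct in whichever region the stack is deployed, so the snippet can be copied between projects unchanged.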
The EvaluationPeriods setting gives you 2 minutes of a failing instance before it’s recovered, as recommended in the manual setup documentation.
That documentation page also shows how to create alarms to stop, terminate, or reboot instances. I’ve not needed to do any of those, but if you do, you should be able to adapt this template snippet to match.
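As an illustration of that adaptation, here is a hedged sketch of a reboot alarm. It assumes the same hypothetical Instance logical ID and timing as a recovery alarm. It switches to the StatusCheckFailed_Instance metric, since a reboot targets problems inside the instance rather than hardware failure, and changes the action ARN. Check the AWS documentation for the thresholds it recommends.

```yaml
Resources:
  # ... the instance resource (logical ID: Instance) goes here ...

  InstanceRebootAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Reboot the instance if it fails instance status checks
      Namespace: AWS/EC2
      # Instance status checks cover problems such as an unresponsive OS.
      MetricName: StatusCheckFailed_Instance
      Dimensions:
        - Name: InstanceId
          Value: !Ref Instance
      Statistic: Minimum
      Period: 60
      EvaluationPeriods: 2
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0
      # Swap "reboot" for "stop" or "terminate" to get those actions instead.
      AlarmActions:
        - !Sub arn:aws:automate:${AWS::Region}:ec2:reboot
```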
Hope this helps you recover,
© 2019 All rights reserved.