We once faced a production issue where the memory of an EC2 instance was exhausted. Unfortunately, we had not implemented any alerting or recovery mechanism for it.
As a result, some of the critical operations configured on the instance did not execute.
We came to know about the issue only later. To avoid such issues in the future, I implemented a solution in which the health of the EC2 instance is monitored and the necessary actions are taken automatically.
In this guide, we will learn how to set up health monitoring for an EC2 instance and recover the instance in case of failure, using the following:
- CloudWatch Alarm
- CloudWatch Event Rule
- SNS Topic with Subscriber
Configuring CloudWatch Alarm
I have covered CloudWatch and how to monitor EC2 instances using CloudWatch alarms in my previous articles.
Let's go ahead and configure a CloudWatch alarm for an existing EC2 instance.
Log in to the EC2 console, select Instances, and choose the EC2 instance.
Under Actions –> CloudWatch Monitoring –> Add/Edit Alarms
Click Create Alarm.
First, we need to configure an alerting mechanism so that we are notified with a message when there is an issue.
For that, we have to configure an SNS topic with a subscriber. If you want to know more about SNS,
I have published a separate article on how to configure an SNS topic with a subscriber.
Otherwise, you can create an SNS topic and add a subscriber on the go. To do so,
make sure Send a notification to is checked, click create topic, and give a topic name.
Under With these recipients, enter an email address. You will get a confirmation email from Amazon SNS at that address; make sure you confirm the subscription.
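The same topic and email subscriber can also be created from a script. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the topic name and email address are hypothetical placeholders, not values from this setup.

```python
import re

# Hypothetical values, used only for illustration.
TOPIC_NAME = "ec2-health-alerts"
SUBSCRIBER_EMAIL = "ops@example.com"


def build_subscription_request(topic_arn: str, email: str) -> dict:
    """Build the keyword arguments for sns.subscribe() for an email subscriber."""
    # Loose sanity check only; SNS performs the real validation.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        raise ValueError(f"not a valid email address: {email!r}")
    return {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": email}


def create_topic_with_subscriber(topic_name: str, email: str) -> str:
    """Create the topic, subscribe the address, and return the topic ARN."""
    import boto3  # imported here so the helper above works without boto3

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    # SNS emails a confirmation link; the subscription stays pending
    # until the recipient confirms it.
    sns.subscribe(**build_subscription_request(topic_arn, email))
    return topic_arn
```

Calling create_topic_with_subscriber(TOPIC_NAME, SUBSCRIBER_EMAIL) returns the topic ARN, which the alarm and the event rule can then use as their notification target.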
Check Take the action. You can let the CloudWatch alarm take one of the available actions if the instance is impaired or unreachable.
Note: The Recover this instance option is available only for the instance types below.
A1, C3, C4, C5, C5n, Inf1, M3, M4, M5, M5a, M5n, P3, R3, R4, R5, R5a, R5n, T2, T3, T3a, X1, or X1e
If your EC2 instance does not fall under one of the above instance types, the auto-recovery option won't be available. In this case, you can choose to restart the EC2 instance instead.
Choose Status check failed (Any), for at least 2 consecutive periods of 1 minute.
Then click Create Alarm.
If your instance is one of the above instance types, select Recover this instance, give the alarm a name, and click Create Alarm.
The CloudWatch alarm is now created successfully.
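For reference, the alarm configured above can be sketched with boto3 as well. This is an illustration under assumptions, not the exact alarm the console creates: the alarm name format is made up, and the instance ID, region, and topic ARN are supplied by the caller. The sketch picks recover or reboot from the instance type list above and, for recover, uses the StatusCheckFailed_System metric, since AWS documents the recover action for the system status check.

```python
# Instance families listed above as supporting "Recover this instance".
RECOVERABLE_FAMILIES = {
    "a1", "c3", "c4", "c5", "c5n", "inf1", "m3", "m4", "m5", "m5a", "m5n",
    "p3", "r3", "r4", "r5", "r5a", "r5n", "t2", "t3", "t3a", "x1", "x1e",
}


def choose_action(instance_type: str) -> str:
    """Return "recover" if the family is in the list above, else "reboot"."""
    family = instance_type.split(".")[0].lower()
    return "recover" if family in RECOVERABLE_FAMILIES else "reboot"


def build_alarm_params(instance_id: str, instance_type: str,
                       region: str, topic_arn: str) -> dict:
    """Build the keyword arguments for cloudwatch.put_metric_alarm()."""
    action = choose_action(instance_type)
    # Recover uses the system status check; reboot uses the combined (Any) check.
    metric = "StatusCheckFailed_System" if action == "recover" else "StatusCheckFailed"
    return {
        "AlarmName": f"{instance_id}-status-check-{action}",  # hypothetical naming
        "Namespace": "AWS/EC2",
        "MetricName": metric,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,              # 1-minute periods
        "EvaluationPeriods": 2,    # 2 consecutive failing periods
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # First the EC2 action, then the SNS topic for the alert.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}", topic_arn],
    }


def create_alarm(instance_id: str, instance_type: str,
                 region: str, topic_arn: str) -> None:
    """Create the alarm; assumes boto3 and AWS credentials are available."""
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **build_alarm_params(instance_id, instance_type, region, topic_arn)
    )
```

Keeping the parameter building separate from the API call makes it easy to print and review the alarm definition before anything is created in the account.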
Configuring CloudWatch Event Rule
After the previous step, if the EC2 instance is impaired, it will be recovered automatically by the CloudWatch alarm and an alert will be sent through SNS.
But the alert sent by SNS does not carry enough information about the automatic recovery action.
For that, we need to monitor the AWS Health events for EC2 instances. By doing this, you will get the exact results of the automatic recovery actions.
Log in to the CloudWatch console. In the left pane, select Events and click Rules.
Choose Create Rule. Under Event Source,
select Event Pattern,
- Service name –> Health
- Event Type –> Specific health events
Select the Specific service as EC2, Specific event type category as issue, Specific event type code as
On the right-hand side, under Targets, click Add Target.
In the drop-down menu, select SNS Topic. Under Topic, choose the name of the topic you created.
Click Configure details, give the event rule a name, and click Create rule.
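The rule built in the console above roughly corresponds to the event pattern in the sketch below. It assumes boto3 and AWS credentials are available; the rule name and target ID are hypothetical. The pattern matches on the service and the event type category only, so it is broader than a rule that also selects a specific event type code.

```python
import json


def build_health_event_pattern() -> str:
    """Return an event pattern (as JSON) for AWS Health issues on EC2."""
    pattern = {
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
        "detail": {
            "service": ["EC2"],
            "eventTypeCategory": ["issue"],
            # A specific event type code, if wanted, would be added here
            # as "eventTypeCode": [...]. It is left out of this sketch.
        },
    }
    return json.dumps(pattern)


def create_health_rule(rule_name: str, topic_arn: str) -> None:
    """Create the rule and point it at the SNS topic."""
    import boto3  # imported here so the pattern helper works without boto3

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=build_health_event_pattern(),
        State="ENABLED",
    )
    # "sns-topic" is an arbitrary target ID chosen for this example.
    events.put_targets(Rule=rule_name, Targets=[{"Id": "sns-topic", "Arn": topic_arn}])
```

When a rule is created outside the console, the access policy of the SNS topic usually also has to allow events.amazonaws.com to publish to it; the console normally adds that permission for you.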
Once we have everything in place,
automatic recovery results will be mailed to the email address of the main AWS account as well as to the subscribed email address.
The email will have one of the following as its message title.
[Auto recovery] Amazon EC2 instance recovery: Success
[Auto recovery] Amazon EC2 instance recovery: Failure
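If these notifications are handled by a script rather than read by hand, the two titles can be mapped to an outcome. A minimal sketch, assuming the title text arrives exactly as shown above; the function name is made up for illustration.

```python
from typing import Optional

TITLE_PREFIX = "[Auto recovery] Amazon EC2 instance recovery:"


def recovery_outcome(title: str) -> Optional[bool]:
    """Return True for Success, False for Failure, None for any other title."""
    title = title.strip()
    if not title.startswith(TITLE_PREFIX):
        return None
    status = title[len(TITLE_PREFIX):].strip().lower()
    if status == "success":
        return True
    if status == "failure":
        return False
    return None
```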
We have successfully configured a CloudWatch alarm, an SNS topic, and a CloudWatch event rule to automatically recover impaired EC2 instances and send us an email with the results of the recovery actions.
I hope this is helpful for you. Thanks for reading this article.
Please do check out my other publications.