Monitoring Lambda functions using CloudWatch

Throughout the book, we have been checking and monitoring our Lambda functions using CloudWatch. It is not difficult to set up, and once you have the base ready, you can reuse the same setup to monitor almost all of your functions. So, let us quickly recap how to monitor Lambda functions using CloudWatch!

To start off, we first need to prepare the base that we talked about. The base here is nothing more than the correct set of policies that allow your function to send its logs to CloudWatch. In most cases, your functions will require rights to create log groups and streams in CloudWatch, as well as to put log events into that particular stream. With these permissions in place, the log group and the log stream are created automatically on the function's behalf, so you do not have to create them yourself. Here is a simple IAM policy that allows your functions to write their logs to CloudWatch. Remember, this is just a template, so you should always follow the practice of creating specific IAM policies and roles for your functions, especially if they are going to run in a live production environment:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
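Since the template above grants access to every log resource, a tighter variant may be preferable in production. The following is a minimal sketch, not a definitive policy: the helper name build_logging_policy, the region, the account ID, and the function name are all hypothetical values chosen for illustration. It generates a policy whose stream and event permissions are limited to a single function's log group.

```python
import json


def build_logging_policy(region, account_id, function_name):
    """Return an IAM policy (as a dict) scoped to one function's log group.

    All three arguments are illustrative assumptions; substitute your own.
    """
    # Log group creation is allowed anywhere in this account and region.
    account_arn = f"arn:aws:logs:{region}:{account_id}:*"
    # Stream creation and log writes are limited to the function's log group.
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:"
        f"log-group:/aws/lambda/{function_name}:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": account_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": log_group_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and function name, for illustration only.
    policy = build_logging_policy("us-east-1", "123456789012", "calculator")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or attached to the function's execution role with whichever tooling you already use.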

Once the base is ready, you can write, package, and upload your functions to Lambda. When your functions get triggered, they should start sending logs to CloudWatch. You can view your function's logs by selecting the Logs option from the CloudWatch dashboard and typing the name of your function in the filter text as follows:

/aws/lambda/<Name_Of_Your_Function>

Select your function and you should see a log stream already created for you. If you don't see a log stream, it is probably because the function's IAM role does not grant the permissions needed to write logs to CloudWatch.
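To make the logging path concrete, here is a minimal sketch of a Python handler whose log statements would end up in such a log stream. The handler body, the event keys a and b, and the helper log_group_name are hypothetical and are not the calculator function from the earlier chapter. The sketch relies only on the standard logging module, whose output the Lambda Python runtime forwards to CloudWatch Logs.

```python
import json
import logging

# The root logger's output is what ends up in the function's log stream.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def log_group_name(function_name):
    """Return the log group name used for a given function."""
    return f"/aws/lambda/{function_name}"


def lambda_handler(event, context):
    """Hypothetical handler: adds two numbers and logs what it did."""
    logger.info("Received event: %s", json.dumps(event))
    a = event.get("a", 0)
    b = event.get("b", 0)
    result = a + b
    logger.info("Computed result: %s", result)
    return {"result": result}
```

Invoking the handler locally, for example lambda_handler({"a": 2, "b": 3}, None), returns the same dictionary it would return in Lambda, which makes it easy to check the logic before uploading.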

You can then use the CloudWatch Logs dashboard to scroll and filter your application logs as you see fit. Here's a sample CloudWatch Logs dashboard view for one of our calculator functions that we created in the previous chapter:
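The same filtering can also be done programmatically. The sketch below is one possible approach, under stated assumptions: the helper names build_filter_request and fetch_messages are hypothetical, and logs_client is expected to be a CloudWatch Logs client such as the one returned by boto3.client("logs"). It builds a request for the filter_log_events call and returns only the message text of the matching events.

```python
import time


def build_filter_request(function_name, pattern="ERROR", minutes=15,
                         limit=50, now=None):
    """Build keyword arguments for the CloudWatch Logs filter_log_events call.

    `now` is seconds since the epoch; it is injectable to ease testing.
    """
    now = time.time() if now is None else now
    end_ms = int(now * 1000)  # the API expects milliseconds since the epoch
    start_ms = end_ms - minutes * 60 * 1000
    return {
        "logGroupName": f"/aws/lambda/{function_name}",
        "filterPattern": pattern,
        "startTime": start_ms,
        "endTime": end_ms,
        "limit": limit,
    }


def fetch_messages(logs_client, function_name, **options):
    """Return the messages of log events that match the filter pattern."""
    request = build_filter_request(function_name, **options)
    response = logs_client.filter_log_events(**request)
    return [event["message"] for event in response.get("events", [])]
```

With boto3 installed and credentials configured, a call such as fetch_messages(boto3.client("logs"), "calculator") would return up to 50 recent messages containing ERROR. Passing the client in as an argument keeps the helper easy to exercise with a stub.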

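Apart from the standard logs, you can also use CloudWatch metrics to view and analyze a few of your function's runtime parameters, such as Errors, Invocations, Duration, and Throttles. Invocations counts the number of times the function is invoked; Errors counts the invocations that fail because of errors in the function; Duration measures the time, in milliseconds, that the function's code spends executing; and Throttles counts the invocation attempts that are rejected because they exceed the concurrency limits.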

To view your function's metrics, simply select the Metrics option from the CloudWatch dashboard. Next, search for the Lambda metrics group from the All metrics tab. You can now drill further down to your individual functions by selecting either the By Resource or By Function Name options. Alternatively, you can view the collective metrics for all your functions using the Across All Functions option.
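The By Function Name view corresponds to querying the AWS/Lambda namespace with a FunctionName dimension. As a rough sketch, assuming a CloudWatch client such as boto3.client("cloudwatch") is passed in, the hypothetical helpers below build a get_metric_statistics request for one metric of one function and return its datapoints in time order.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(function_name, metric_name, statistic="Sum",
                         hours=1, period_seconds=300, end_time=None):
    """Build keyword arguments for the CloudWatch get_metric_statistics call."""
    end_time = end_time or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,  # e.g. Errors, Invocations, Duration
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period_seconds,  # width of each datapoint, in seconds
        "Statistics": [statistic],
    }


def fetch_datapoints(cloudwatch_client, function_name, metric_name, **options):
    """Return the metric's datapoints sorted by timestamp."""
    request = build_metric_request(function_name, metric_name, **options)
    response = cloudwatch_client.get_metric_statistics(**request)
    return sorted(response.get("Datapoints", []), key=lambda p: p["Timestamp"])
```

For Duration, a statistic such as Average or Maximum is usually more informative than Sum, so you might call fetch_datapoints(client, "calculator", "Duration", statistic="Average").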

In this case, I have opted for the By Function Name option and selected the Errors, Throttles, Invocations, and Duration metrics for the calculator function that we deployed in an earlier chapter. You can select any of the function metrics as you see fit. Once the metrics are selected, you will automatically be shown a simple Line graph that depicts the overall duration of the function's execution, as well as whether there were any error or throttle events. You can switch between Line graphs and Stacked area graphs by selecting the Graph options tab provided beneath your graph area:

You can also configure CloudWatch alarms by selecting an individual metric from the Graphed metrics tab and clicking on the adjoining alarm icon, as depicted in the previous image.
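An alarm can be defined in code as well as from the console. The following sketch assumes a CloudWatch client such as boto3.client("cloudwatch"); the helper names, the alarm name, and the default threshold are hypothetical choices made for illustration. It prepares the arguments for a put_metric_alarm call that fires when a function reports errors within a five-minute window.

```python
def build_error_alarm(function_name, threshold=1, period_seconds=300,
                      evaluation_periods=1, sns_topic_arn=None):
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    alarm = {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Periods with no invocations should not trigger the alarm.
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        # Optional: notify an SNS topic when the alarm fires.
        alarm["AlarmActions"] = [sns_topic_arn]
    return alarm


def create_error_alarm(cloudwatch_client, function_name, **options):
    """Create or update the alarm using the supplied CloudWatch client."""
    request = build_error_alarm(function_name, **options)
    cloudwatch_client.put_metric_alarm(**request)
    return request["AlarmName"]
```

Keeping the request builder separate from the API call lets you inspect or log the alarm definition before anything is created in your account.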

Although CloudWatch provides a good assortment of services for monitoring your Lambda functions, it still has some chinks in its armor. First up, as we know, Lambda functions are more or less designed around the principles of microservices, where each service gets its own functional container for hosting. However, unlike traditional EC2 instances that host monolithic apps, thousands of containers can be spun up within fractions of a second using Lambda. This, along with the large number of other moving parts in the form of AWS services such as DynamoDB and API Gateway, can prove too much for even CloudWatch to handle. A specialized tool was required that could effectively trace each request made by functions against other services, and that could also be used to analyze performance bottlenecks and remediate them. Enter the newest kid on the block: AWS X-Ray!