Mapping Lambda with CloudWatch events

CloudWatch offers simple event mapping integrations with Lambda: you can execute Lambda functions in response to triggered events, or schedule their execution using CloudWatch Events.

The following use case uses CloudWatch events to take periodic backups of data stored in a DynamoDB table over to S3. There are other ways to export data from a DynamoDB table, such as using Data Pipeline or EMR, but those approaches make sense when you have a really large table consisting of millions of rows of data. What if you have a small DynamoDB table with only 100 or 200 rows? In that case, it makes sense to write a simple Lambda function that executes periodically, collects the data from the table into a CSV file, and uploads it to S3 for archival.

To get started with the use case, we once again create the necessary project directory for APEX:

# mkdir ~/workdir/apex/event_driven/functions/myCWScheduleToLambdaFunc
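
If you have been following the earlier APEX use cases, the event_driven project directory will already contain a project file at its root. For reference, here is a minimal sketch of what a project.dev.json could look like; the runtime, memory, and timeout values below are assumptions and should be adjusted to match your own setup:

{
  "name": "event_driven",
  "description": "Event driven Lambda functions deployed using APEX",
  "runtime": "nodejs6.10",
  "memory": 128,
  "timeout": 300
}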

Next, we create the function.dev.json file that contains a few descriptive elements with respect to the function's code:

{
  "description": "Node.js lambda function using CloudWatch Scheduled events as a trigger to export a dynamodb table to s3",
  "role": "arn:aws:iam::<account_id>:role/myLambdaCWScheduleFuncRole",
  "handler": "index.handler",
  "environment": {}
}

Once created, go ahead and create the required IAM role as well. Remember to name the IAM role myLambdaCWScheduleFuncRole, as done in the earlier step:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "myLogsPermissions",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "myS3Permissions",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::dynamodb-backup-s3*"
      ]
    },
    {
      "Sid": "myDynamodbPermissions",
      "Effect": "Allow",
      "Action": [
        "dynamodb:Scan"
      ],
      "Resource": [
        "arn:aws:dynamodb:us-east-1:<account_id>:table/LambdaExportToS3*"
      ]
    }
  ]
}
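
Along with the permissions policy, the IAM role also needs a trust relationship that allows the Lambda service to assume it. This is the standard Lambda trust policy document:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}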

Finally, we create the index.js file that will house the actual function's code. You can download the entire code and its associated support files at https://github.com/PacktPublishing/Mastering-AWS-Lambda:

console.log('Loading function');
exports.handler = function(event, context, callback) {
  // The dynamodbexportcsv module handles the table scan and CSV streaming for us
  var csvExport = require('dynamodbexportcsv');
  var exporter = new csvExport(null, null, 'us-east-1');

  // Export the LambdaExportToS3 table's userName column to the
  // dynamodb-backup-s3 bucket under the 04-17-2017 prefix
  exporter.exportTable('LambdaExportToS3', ['userName'], 1, true, 250,
    'dynamodb-backup-s3', '04-17-2017', function(err) {
      if (err) {
        console.log("An error occurred while exporting the table to s3. The error is: " + err);
        return callback(err);
      }
      console.log("Successfully exported the table to S3!");
      callback(null, "success");
  });
};

The function code is extremely streamlined and simple to use. Internally, we make use of a third-party npm module called dynamodbexportcsv that exports a DynamoDB table's records to a CSV file and then writes it to the local file system or streams it to S3, as we are doing in this case. The module exposes an exportTable function that takes the following parameters:

  • table: The name of the DynamoDB table from which we need to export the contents.
  • columns: The column name or names from which the data has to be extracted.
  • totalSegments: The number of parallel scans to run on the table.
  • compressed: Compresses the CSV file using GZIP compression.
  • filesize: The maximum size of each file in megabytes. Once a file hits this size, it is closed and a new file is created.
  • s3Bucket: The name of the S3 bucket to which you wish to stream the CSV file. If no value is provided, the file is written to a local directory instead (see the sketch after this list).
  • s3Path: Used as a prefix for the files created.
  • callback(err): A callback that is executed when finished, including any errors that occurred.
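
As a quick illustration of the s3Bucket parameter, the export can also be written to the local file system by leaving the bucket out. The following is only a sketch based on the parameter descriptions above, useful for trying the export out on your workstation before deploying to Lambda; it assumes valid AWS credentials are available in your environment:

// local-export.js: a local test of the same export, with no S3 involved
var csvExport = require('dynamodbexportcsv');
var exporter = new csvExport(null, null, 'us-east-1');

// Passing null for s3Bucket (and s3Path) should write the CSV locally instead
exporter.exportTable('LambdaExportToS3', ['userName'], 1, true, 250, null, null,
  function(err) {
    if (err) {
      return console.log("Export failed: " + err);
    }
    console.log("Exported the table to a local CSV file!");
});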

Before we go ahead with the deployment, ensure that you install the necessary npm modules into the function's directory using the following command:

# npm install dynamodbexportcsv

With all the preparations completed, you can now go ahead and deploy your function to Lambda using the following command:

# apex --env dev deploy myCWScheduleToLambdaFunc

With the function deployed, let us move on to the creation of the CloudWatch event rule that will schedule and execute the function on our behalf. To do this, log in to the AWS Management Console and select the CloudWatch option from the main page.

In the CloudWatch dashboard, select Events to get started. Click on Create rule to bring up the rule creation wizard. Here, you can configure the event source based on which you want your function to be triggered. In this case, I've opted to configure a Schedule with a Fixed rate of execution set to 1 day. You can, optionally, configure an equivalent cron expression instead:
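
If you prefer working from the command line, the same schedule can be created with the AWS CLI as well. The rule name used below is only an example:

# aws events put-rule --name myCWScheduleToLambdaRule --schedule-expression "rate(1 day)"

The equivalent cron expression for a daily run at midnight UTC would be "cron(0 0 * * ? *)".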

With the Event Source configured, we move on to configuring the Targets, that is, the Lambda function. From the drop-down list, select Lambda function as the Target. Next, furnish the necessary details, such as the name of the Function to trigger and the particular version/alias of the function that you wish to invoke. Once completed, select Next to proceed. In the final step of the wizard, provide a suitable Name for the CloudWatch event rule and make sure the State option is set to Enabled before creating the rule.
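
The same target wiring can also be done from the CLI. The sketch below reuses the example rule name from earlier and assumes the deployed function is named myCWScheduleToLambdaFunc (APEX may prefix the function name with the project name, so verify the actual name in the Lambda console); the statement-id is arbitrary:

# aws events put-targets --rule myCWScheduleToLambdaRule --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:<account_id>:function:myCWScheduleToLambdaFunc"

# aws lambda add-permission --function-name myCWScheduleToLambdaFunc --statement-id myCWScheduleRulePermission --action "lambda:InvokeFunction" --principal events.amazonaws.com --source-arn arn:aws:events:us-east-1:<account_id>:rule/myCWScheduleToLambdaRule

The add-permission call is what allows CloudWatch Events to invoke the function; the console wizard adds this permission for you automatically.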

You can now go ahead and run a few tests to make sure the code works as expected. First up, create a simple DynamoDB table named LambdaExportToS3 containing a column called userName. Fill in a few rows of the column, and based on the scheduled time that you specified during the CloudWatch event rule configuration, the associated Lambda function will get triggered and export the contents of the table to a CSV file in S3.
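
If you would rather set up the test data from the command line, the following sketch creates the table with userName as its partition key, inserts a sample row, and creates the destination bucket. The capacity values and the sample item are just placeholders, and since S3 bucket names are globally unique you may need to pick a different bucket name (and update the function code and IAM policy accordingly):

# aws dynamodb create-table --table-name LambdaExportToS3 --attribute-definitions AttributeName=userName,AttributeType=S --key-schema AttributeName=userName,KeyType=HASH --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1

# aws dynamodb put-item --table-name LambdaExportToS3 --item '{"userName": {"S": "john.doe"}}'

# aws s3 mb s3://dynamodb-backup-s3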

You can always verify the results by using the function's CloudWatch Logs, as shown below:

The important point to note here, once again, is that this technique is only useful if you have a small set of records, or at least no more records than the function can process within its maximum runtime limit of five minutes.