Running a job on the cluster

With connectivity established, you can now run jobs on your cluster as one or more steps. This section demonstrates a step using a simple example that processes a few Amazon CloudFront logs. Details of the sample data and script are available at https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs-prepare-data-and-script.html. You can use the same approach to create and run your own jobs:

  1. To get started, select your cluster's name from the Cluster list page of the EMR dashboard. This brings up the newly created cluster's details page. Here, select the Steps tab.
  1. Since this is our first step, click on the Add step option. This brings up the Add step dialog, as shown in the following screenshot. Fill in the required information as described.
  1. Once all the required fields are filled in, click on Add to complete the step's creation.
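The same step can also be submitted programmatically rather than through the console. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured. The helper names `build_step` and `add_step`, the cluster ID, and the S3 paths are hypothetical, and the `hive-script` arguments assume the sample script is a Hive script, which the text above does not state.

```python
def build_step(name, command_args, action_on_failure="CONTINUE"):
    """Return one step definition in the shape EMR's AddJobFlowSteps expects."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # command-runner.jar runs the given command on the master node.
            "Jar": "command-runner.jar",
            "Args": list(command_args),
        },
    }


def add_step(cluster_id, step, region_name=None):
    """Submit the step to a running cluster and return its step ID."""
    import boto3  # imported lazily so build_step works without the SDK

    emr = boto3.client("emr", region_name=region_name)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Hypothetical values: replace them with your own script and bucket.
step = build_step(
    "Process CloudFront logs",
    [
        "hive-script", "--run-hive-script", "--args",
        "-f", "s3://my-bucket/scripts/my-script.q",
        "-d", "OUTPUT=s3://my-bucket/output/",
    ],
)
# step_id = add_step("j-XXXXXXXXXXXXX", step)  # requires a real cluster ID
```

Keeping the step definition separate from the API call makes it easy to inspect or reuse the definition before anything is sent to the cluster.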

The step now starts executing the supplied script on the EMR cluster. You can track its progress by watching the step's status change from Pending to Running to Completed, as shown in the following screenshot:
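The status transitions can also be followed from a script. The sketch below polls a state-returning callable until the step reaches a terminal state. The function names `wait_for_step` and `emr_step_state` are hypothetical, and the boto3-backed helper assumes the SDK is installed and credentials are configured. The API reports states in upper case (for example, `PENDING`, `RUNNING`, `COMPLETED`).

```python
import time

# States after which a step will not change any further.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(get_state, poll_seconds=30, sleep=time.sleep):
    """Poll get_state() until the step reaches a terminal state.

    Returns the distinct states observed, in the order they appeared.
    """
    seen = []
    while True:
        state = get_state()
        if not seen or seen[-1] != state:
            seen.append(state)
        if state in TERMINAL_STATES:
            return seen
        sleep(poll_seconds)


def emr_step_state(cluster_id, step_id):
    """Build a get_state callable backed by EMR's DescribeStep call."""
    import boto3  # imported lazily; assumes boto3 and credentials are set up

    emr = boto3.client("emr")

    def get_state():
        response = emr.describe_step(ClusterId=cluster_id, StepId=step_id)
        return response["Step"]["Status"]["State"]

    return get_state


# Usage with hypothetical IDs:
# history = wait_for_step(emr_step_state("j-XXXXXXXXXXXXX", "s-XXXXXXXXXXXXX"))
```

Passing the state lookup in as a callable keeps the polling loop independent of AWS, so it can be exercised with a stub before being pointed at a real cluster.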

Once the job completes, head back to your Amazon S3 output bucket to view the results of the processing. In this case, the output contains the number of access requests made to CloudFront, sorted by operating system.
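If you prefer to check the output location from a script, the following sketch lists the objects the job wrote. It assumes a boto3 S3 client is passed in; the function name `list_output_keys`, the bucket name, and the prefix are hypothetical.

```python
def list_output_keys(s3_client, bucket, prefix):
    """Return the keys of all objects stored under prefix in bucket."""
    keys = []
    # The paginator handles listings longer than a single response page.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects carry no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage with hypothetical names (requires boto3 and AWS credentials):
# import boto3
# print(list_output_keys(boto3.client("s3"), "my-bucket", "output/"))
```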