-
Open Terminal and run the following command:
ssh -o ServerAliveInterval=10 -i ~/.ssh/tableau-cookbook.pem -N -L 10000:localhost:10000 hadoop@ec2-18-215-157-216.compute-1.amazonaws.com
Your command will differ: substitute the path to your own key file and the public DNS name of your EMR master node.
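To make the parts of that command easier to adapt, here is a minimal sketch that assembles it from variables and prints it for review. The variable names and the `tunnel_cmd` helper are hypothetical, introduced only for illustration; the key path and host name are the example values from the command above and should be replaced with your own. The sketch assumes HiveServer2 is listening on its default port, 10000, on the master node.

```shell
# Placeholders: replace with your own key file and master node DNS name.
KEY_FILE="$HOME/.ssh/tableau-cookbook.pem"
MASTER_DNS="ec2-18-215-157-216.compute-1.amazonaws.com"
LOCAL_PORT=10000    # port Tableau will connect to on this machine
REMOTE_PORT=10000   # HiveServer2 port on the master node (assumed default)

# Hypothetical helper: prints the tunnel command instead of running it.
tunnel_cmd() {
  # -o ServerAliveInterval=10  send a keepalive after 10 seconds of silence
  # -i  private key file used to authenticate
  # -N  do not run a remote command, only forward ports
  # -L  forward LOCAL_PORT here to localhost:REMOTE_PORT as seen from the remote host
  printf 'ssh -o ServerAliveInterval=10 -i %s -N -L %s:localhost:%s hadoop@%s\n' \
    "$KEY_FILE" "$LOCAL_PORT" "$REMOTE_PORT" "$MASTER_DNS"
}

tunnel_cmd   # print the assembled command so it can be checked before use
```

Once the printed command looks right, paste it into Terminal to open the tunnel. Because of `-N`, the session shows no prompt and simply stays open until you interrupt it.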
-
Open Tableau Desktop and create a new connection using the Amazon EMR Hadoop Hive connector.
-
Download and install ODBC drivers for your OS from: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-bi-tools.html
-
Now we can connect to Hive through the SSH tunnel, using localhost as the server and 10000 as the port.
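If the connection fails, it can help to confirm first that the SSH tunnel from the earlier step is still listening locally. The following is a rough sketch, assuming the `nc` (netcat) utility is installed on your machine; the `check_tunnel` function name is hypothetical. It only reports whether something accepts connections on the port, not whether Hive itself is healthy.

```shell
# Hypothetical helper: report whether a TCP port accepts connections.
# Assumes `nc` is available; if it is missing, the result is "closed".
check_tunnel() {
  host="$1"
  port="$2"
  # -z: probe the port without sending data; -w 3: give up after 3 seconds
  if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo "open"
  else
    echo "closed"
  fi
}

# Port 10000 is the local end of the tunnel opened earlier.
check_tunnel localhost 10000
```

A result of `closed` usually means the `ssh` session has ended and the tunnel command needs to be run again.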
-
Then choose default as the schema and cloudfront_logs as the table, and go to the sheet. Every drag and drop generates a SQL query and sends it to the EMR cluster. To avoid this, pause Auto Updates while you build the report, then resume updates so that only the final query runs.
As a result, we were able to connect Tableau to Hive on EMR. You can connect to the larger eng_1M_1gram table as well. There are many techniques for accessing data in Hadoop, and the right one often depends on the use case.