17 Big Data: Hadoop, Spark, NoSQL and IoT

Objectives

In this chapter you’ll:

  • Understand what big data is and how quickly it’s getting bigger.

  • Manipulate a SQLite relational database using Structured Query Language (SQL).

  • Understand the four major types of NoSQL databases.

  • Store tweets in a MongoDB NoSQL JSON document database and visualize them on a Folium map.

  • Understand Apache Hadoop and how it’s used in big-data batch-processing applications.

  • Build a Hadoop MapReduce application on Microsoft’s Azure HDInsight cloud service.

  • Understand Apache Spark and how it’s used in high-performance, real-time big-data applications.

  • Use Spark Streaming to process data in mini-batches.

  • Understand the Internet of Things (IoT) and the publish/subscribe model.

  • Publish messages from a simulated Internet-connected device and visualize its messages in a dashboard.

  • Subscribe to PubNub’s live Twitter and IoT streams and visualize the data.