A primer on Stream Analytics

Event Hubs is only an event ingestion platform, so we need another service that can process these events as a stream rather than just as stored data. Stream Analytics processes and examines streams of big data, and Stream Analytics jobs carry out the processing of events.

Stream Analytics can process millions of events per second, and it is quite easy to get started with. Azure Stream Analytics is a PaaS that is completely managed by Azure. Customers of Stream Analytics do not have to manage the underlying hardware and platform.

Each job comprises multiple inputs, outputs, and a query, which does the transformation of incoming data into new output. The whole architecture of Stream Analytics is shown in the following diagram:

In the diagram, on the extreme left are the event sources. These are the sources that produce the events. They could be IoT devices, custom applications written in any programming language, or events coming from other Azure platforms such as Log Analytics or Application Insights.

These events must first be ingested into the system, and there are numerous Azure services that can help ingest this data. We've already looked at Event Hubs and how it helps in ingesting data. There are other services, such as IoT Hub, that also help in ingesting device- and sensor-specific data. IoT Hub and ingestion are covered in detail in the chapter related to IoT Hub. This ingested data undergoes processing as it arrives in a stream, and this processing is done using Stream Analytics. The output from Stream Analytics could be a presentation platform such as Power BI, showing real-time data to stakeholders, or a storage platform such as Cosmos DB, Data Lake Storage, or Azure Storage, from which the data can be read and acted on later by Azure Functions and Service Bus queues, for instance.

Stream Analytics is capable of ingesting millions of events per second and can execute queries on top of them. At the time of writing, Stream Analytics supports the three sources of events listed here:

Input data is supported in any of the three following formats:

{
"SensorId" : 2,
"humidity" : 60,
"temperature" : 26
}
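As a minimal sketch of how a producer or a test harness might check such an event before sending it, the following Python snippet parses a JSON payload and verifies that the expected fields are present. The function name parse_event and the set of required fields are assumptions made for illustration, and the sketch assumes the temperature is sent as a plain number; none of this is part of Stream Analytics itself.

```python
import json

# Hypothetical required fields, taken from the sample event above.
REQUIRED_FIELDS = {"SensorId", "humidity", "temperature"}


def parse_event(payload: str) -> dict:
    """Parse a JSON event and check that the expected fields exist."""
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    event = json.loads(payload)
    missing = REQUIRED_FIELDS - set(event)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return event


raw_event = '{"SensorId": 2, "humidity": 60, "temperature": 26}'
event = parse_event(raw_event)
print(event["temperature"])  # prints 26
```

Validating on the sending side is a design choice: a malformed event is cheaper to reject before ingestion than to diagnose after it has entered the stream.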

Not only can Stream Analytics receive events, but it also provides advanced query capability for the data that it receives. The queries can extract important insights from the temporal data streams and output them. As shown in the following screenshot, there is an input and an output.

The query moves events from input to output. The INTO clause refers to the output location, and the FROM clause refers to the input location. The queries are very similar to SQL queries, so the learning curve is not steep for SQL programmers:
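To make the input-to-output flow concrete, here is a rough Python sketch that imitates what a simple filtering query does to a stream of events. It is not Stream Analytics code: the function name run_query, the alias names in the comment, and the threshold of 25 are all assumptions for illustration. The query it imitates appears in the leading comment.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, float]

# The query being imitated (aliases are hypothetical):
#   SELECT SensorId, temperature
#   INTO [output-alias]
#   FROM [input-alias]
#   WHERE temperature > 25


def run_query(input_stream: Iterable[Event], threshold: float = 25.0) -> Iterator[Event]:
    """Handle events one at a time as they arrive, as a stream would."""
    for event in input_stream:                      # FROM: read from the input
        if event["temperature"] > threshold:        # WHERE: keep matching events
            yield {                                 # SELECT: project two fields
                "SensorId": event["SensorId"],
                "temperature": event["temperature"],
            }


input_events = [
    {"SensorId": 1, "humidity": 55, "temperature": 22},
    {"SensorId": 2, "humidity": 60, "temperature": 26},
    {"SensorId": 3, "humidity": 48, "temperature": 31},
]

# INTO: collect whatever the query emits at the output.
output_events: List[Event] = list(run_query(input_events))
print(output_events)
```

The generator stands in for the streaming behavior: each event is evaluated as it arrives rather than after the whole data set has been stored.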

Stream Analytics also provides mechanisms for sending outputs from queries to target destinations. At the time of writing, Stream Analytics supports multiple destinations for events and query outputs. These destinations are shown in the next screenshot:

It is also possible to define custom functions that can be reused within queries. There are three options provided to define custom functions: