Architecture of Event Hubs

There are three main components of the Event Hubs architecture: Event Producers, the Event Hub, and Event Consumers, as shown in the following diagram:

Event Producers generate events and send them to the Event Hub. The Event Hub stores the ingested events and makes that data available to Event Consumers. An Event Consumer is any application or service that is interested in those events; consumers connect to the Event Hub to fetch the data.
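To make the flow concrete, here is a minimal in-memory sketch in plain Python (not the Azure SDK; the class and method names are hypothetical). Producers push events into the hub, and the consumer pulls them, keeping track of its own read position while the hub simply retains what it has ingested.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryEventHub:
    """Toy stand-in for an event hub (hypothetical): stores events and serves them on request."""
    events: list = field(default_factory=list)

    def send(self, event: str) -> None:
        # Producers push events; the hub just appends and retains them.
        self.events.append(event)

    def fetch(self, from_index: int) -> list:
        # Consumers pull: they ask for everything from a position they track themselves.
        return self.events[from_index:]


class Consumer:
    def __init__(self, hub: InMemoryEventHub) -> None:
        self.hub = hub
        self.position = 0  # the consumer, not the hub, remembers how far it has read

    def poll(self) -> list:
        batch = self.hub.fetch(self.position)
        self.position += len(batch)
        return batch


hub = InMemoryEventHub()
hub.send("temperature=21")  # from producer 1
hub.send("temperature=22")  # from producer 2
consumer = Consumer(hub)
print(consumer.poll())  # ['temperature=21', 'temperature=22']
print(consumer.poll())  # []
```

Because the hub keeps events and each consumer tracks its own position, several independent consumers can read the same data without affecting one another.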

An event hub cannot be created without an Event Hubs namespace. The namespace acts as a container and can host multiple event hubs. Each Event Hubs namespace provides a unique endpoint, reachable over AMQP or HTTPS (REST), that clients use to send data to Event Hubs. This namespace is similar to the namespace used for Service Bus artifacts, such as topics and queues, and it shares the same servicebus.windows.net domain.

The connection string of an Event Hubs namespace is composed of its URL, policy name, and the key. A sample connection string is shown next:

Endpoint=sb://demoeventhubnsbook.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=M/E4eeBsr7DAlXcvw6ziFqlSDNbFX6E49Jfti8CRkbA=
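The connection string is a series of semicolon-separated Key=Value pairs. The following sketch (the helper name parse_connection_string is hypothetical) splits the sample above into its parts. In practice, client libraries such as azure-eventhub accept the whole string directly, for example through EventHubProducerClient.from_connection_string, so this is only for illustration.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dict (hypothetical helper).

    Each segment is split on the first '=' only, because base64 keys can end in '='.
    """
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


sample = (
    "Endpoint=sb://demoeventhubnsbook.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=M/E4eeBsr7DAlXcvw6ziFqlSDNbFX6E49Jfti8CRkbA="
)
parts = parse_connection_string(sample)
print(parts["Endpoint"])             # sb://demoeventhubnsbook.servicebus.windows.net/
print(parts["SharedAccessKeyName"])  # RootManageSharedAccessKey
```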

This string can be found under the Shared access policies menu item of the namespace (these policies back Shared Access Signature, or SAS, authentication). Multiple policies can be defined for a namespace, each granting a different level of access to it. The three levels of access are as follows:

Manage: Grants administrative control over the event hub. It also includes the rights to send and listen to events.
Send: Grants the right to write events to Event Hubs.
Listen: Grants the right to read events from Event Hubs.
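The relationship between the three access levels can be modelled with a flag set. This is a hypothetical sketch, not an Azure API: it only encodes the rule stated above that Manage also carries the Send and Listen rights.

```python
from enum import Flag, auto


class Rights(Flag):
    """Hypothetical model of the three policy access levels."""
    SEND = auto()
    LISTEN = auto()
    MANAGE = auto()


def effective_rights(granted: Rights) -> Rights:
    # Manage also includes the rights to send and listen to events.
    if Rights.MANAGE in granted:
        return granted | Rights.SEND | Rights.LISTEN
    return granted


def can(granted: Rights, needed: Rights) -> bool:
    return needed in effective_rights(granted)


print(can(Rights.MANAGE, Rights.LISTEN))  # True
print(can(Rights.SEND, Rights.LISTEN))    # False
```

A policy for a producer application would typically hold only Send, and one for a consumer only Listen, which is the granular access control that separate policies make possible.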

By default, the RootManageSharedAccessKey policy is created when the Event Hubs namespace is created, as shown in the following screenshot. Policies help in creating granular access control on Event Hubs. Clients use the key associated with a policy to authenticate, typically by signing a SAS token with it. Additional policies can be created:
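As an illustration of how a policy key proves a client's identity, the following sketch builds a SAS token using the format described in the Azure Event Hubs SAS documentation: the URL-encoded resource URI and an expiry time are signed with HMAC-SHA256 using the policy key, and the policy name travels alongside the signature. The resource URI, policy name, and key passed in are placeholders, and the SDKs normally generate these tokens for you.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri: str, key_name: str, key: str, ttl_seconds: int = 3600) -> str:
    """Sign a SAS token for resource_uri with a policy's name and key."""
    expiry = str(int(time.time()) + ttl_seconds)  # Unix time when the token stops being valid
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    # The key string itself (as UTF-8 bytes) is the HMAC secret.
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, urllib.parse.quote_plus(signature), expiry, key_name
    )


# Placeholder values for illustration only.
token = generate_sas_token(
    "https://demoeventhubnsbook.servicebus.windows.net/myeventhub",
    "EventhubPolicy",
    "placeholder-key",
)
print(token.split("&")[0])
```

The service recomputes the same HMAC with its copy of the key, so only a holder of that key can produce a valid signature, and the skn field tells it which policy, and therefore which rights, to apply.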

Event hubs can be created from the Event Hubs namespace by clicking Event Hubs in the left-hand menu and then clicking + Event Hub on the resulting screen:

Provide a name for the event hub, along with the Partition Count and Message Retention values. Set Capture to Off for now:

Separate policies can be assigned to event hubs by adding a new policy at the event hub level.

After the policy is created, its connection string is available from the Shared access policies menu item in the Azure portal.

Since a namespace can contain multiple event hubs, the connection string for an individual event hub looks similar to the following. The differences are the policy name and key values, and the addition of EntityPath, which holds the name of the event hub:

Endpoint=sb://azuretwittereventdata.servicebus.windows.net/;SharedAccessKeyName=EventhubPolicy;SharedAccessKey=rxEu5K4Y2qsi5wEeOKuOvRnhtgW8xW35UBex4VlIKqg=;EntityPath=myeventhub
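The EntityPath field is what narrows a connection string from the whole namespace down to a single event hub. This hypothetical helper (resource_uri is not an Azure API) derives the HTTPS resource URI that each kind of string is scoped to, using the two sample strings from this section.

```python
from urllib.parse import urlparse


def resource_uri(conn_str: str) -> str:
    """Return the https URI a connection string is scoped to (hypothetical helper)."""
    # Split on the first '=' of each segment so base64 keys ending in '=' stay intact.
    parts = dict(seg.split("=", 1) for seg in conn_str.split(";") if seg)
    host = urlparse(parts["Endpoint"]).netloc  # e.g. azuretwittereventdata.servicebus.windows.net
    entity = parts.get("EntityPath")
    # Namespace-level strings have no EntityPath; event-hub-level strings do.
    return f"https://{host}/{entity}" if entity else f"https://{host}/"


hub_level = (
    "Endpoint=sb://azuretwittereventdata.servicebus.windows.net/;"
    "SharedAccessKeyName=EventhubPolicy;"
    "SharedAccessKey=rxEu5K4Y2qsi5wEeOKuOvRnhtgW8xW35UBex4VlIKqg=;"
    "EntityPath=myeventhub"
)
print(resource_uri(hub_level))  # https://azuretwittereventdata.servicebus.windows.net/myeventhub
```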

We kept the Capture option Off while creating the event hub; it can be switched on afterward. Capture automatically saves events to an Azure Blob Storage or Azure Data Lake Storage account, based on a configurable size window and time window, whichever limit is reached first:
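The size and time settings work as a "whichever comes first" window. The sketch below simulates that rule with a hypothetical CaptureWindow class; flushed batches stand in for files written to storage. It is a simplification: the limits are only checked when an event arrives, whereas the real service also closes a window when time passes without new events.

```python
from dataclasses import dataclass, field


@dataclass
class CaptureWindow:
    """Hypothetical simulation of a size-or-time capture window."""
    max_bytes: int        # size window
    max_seconds: float    # time window
    buffer: list = field(default_factory=list)
    buffered_bytes: int = 0
    window_start: float = 0.0
    flushed: list = field(default_factory=list)  # stands in for files written to storage

    def add(self, event: bytes, now: float) -> None:
        if not self.buffer:
            self.window_start = now  # a new window opens with its first event
        self.buffer.append(event)
        self.buffered_bytes += len(event)
        # Whichever limit is reached first triggers a write.
        if self.buffered_bytes >= self.max_bytes or now - self.window_start >= self.max_seconds:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flushed.append(list(self.buffer))
            self.buffer.clear()
            self.buffered_bytes = 0


window = CaptureWindow(max_bytes=10, max_seconds=60)
window.add(b"aaaa", now=0)
window.add(b"bbbbbbb", now=5)  # 11 bytes buffered: the size limit triggers a write
window.add(b"c", now=6)
window.add(b"d", now=70)       # 64 s since the window opened: the time limit triggers a write
print(window.flushed)  # [[b'aaaa', b'bbbbbbb'], [b'c', b'd']]
```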

We did not cover the concepts of partitions and message retention options when creating an event hub.

Partitioning is an important concept for the scalability of any data store. Events are retained within an event hub for a specific period of time (the message retention value). If all the events were stored in a single data store, it would become extremely difficult to scale that store, because every event producer would connect to it and send its events to it. Compare this with a data store that splits the same data into multiple smaller data stores, each uniquely identified by an ID. Each smaller data store is called a partition, and the value used to decide which partition an event belongs to is known as the partition key. The partition key is sent along with the event data.

Now, the event producers can connect to the event hub, and based on the value of the partition key, the event hub stores each event in the appropriate partition. This allows the event hub to ingest multiple events in parallel.
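A common way to route keys to partitions is to hash the key and take the result modulo the partition count. Event Hubs uses its own internal hashing, so the SHA-256-modulo-N function below (partition_for is a hypothetical name) is only a stand-in that shows the key property: the same key always lands in the same partition.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    """Stand-in routing function: stable hash of the key modulo the partition count."""
    # Python's built-in hash() is randomized per process, so use a stable digest instead.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


for key in ["device-1", "device-2", "device-1"]:
    print(key, "->", partition_for(key, 4))  # device-1 maps to the same partition both times
```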

Deciding on the number of partitions is crucial for the scalability of an event hub. The following diagram shows how Event Hubs internally stores ingested data in the appropriate partition by using the partition key:

It is important to understand that the same partition can hold events for multiple partition keys. The user decides how many partitions are required, and the event hub internally decides how to distribute partition keys among them. Each partition stores events in the order in which they arrive: every event receives an increasing sequence number along with an enqueued timestamp, and newer events are appended to the end of the partition.
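The following sketch models each partition as an append-only log, using the same stand-in hash routing as a hypothetical assumption. With three keys and only two partitions, at least two keys must share a partition, yet each partition's sequence numbers still increase in arrival order and each key's events keep the order in which they were sent.

```python
import hashlib
from collections import defaultdict

PARTITION_COUNT = 2  # hypothetical; deliberately fewer partitions than keys


def partition_for(partition_key: str, partition_count: int) -> int:
    # Stand-in routing: stable hash of the key modulo the partition count.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# partition id -> append-only list of (sequence_number, partition_key, body)
partitions = defaultdict(list)


def send(partition_key: str, body: str) -> int:
    pid = partition_for(partition_key, PARTITION_COUNT)
    log = partitions[pid]
    log.append((len(log), partition_key, body))  # newest event goes at the end
    return pid


for i in range(3):
    for key in ("sensor-a", "sensor-b", "sensor-c"):
        send(key, f"{key} reading {i}")

for pid, log in sorted(partitions.items()):
    keys = sorted({key for _, key, _ in log})
    print(f"partition {pid}: keys={keys}, sequence numbers={[seq for seq, _, _ in log]}")
```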

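It is important to note that, in the Standard tier, the number of partitions cannot be changed once the event hub is created. (The Premium and Dedicated tiers allow the partition count to be increased, but never decreased.)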
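One way to see why the partition count deserves careful thought up front: if keys are routed by a hash modulo the partition count (a hypothetical stand-in, not the documented Event Hubs algorithm), changing the count moves many existing keys to different partitions. A key's older events would then sit in one partition and its newer events in another, so per-key ordering within a single partition would no longer hold.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    # Stand-in routing: stable hash of the key modulo the partition count.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


keys = [f"device-{i}" for i in range(1000)]  # hypothetical partition keys
moved = sum(partition_for(k, 4) != partition_for(k, 8) for k in keys)
print(f"{moved} of {len(keys)} keys would change partition going from 4 to 8 partitions")
```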

It is also important to remember that partitions bring parallelism and concurrency to applications that read the events. With 10 partitions, up to 10 readers can each read from their own partition in parallel, without competing with one another for the same events.
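The sketch below illustrates the one-reader-per-partition pattern with a thread pool over 10 hypothetical in-memory partitions. It shows the structure only; the Event Hubs SDKs provide processor clients that assign and balance partitions across readers for you.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each is an ordered list of events.
partitions = {pid: [f"p{pid}-e{i}" for i in range(3)] for pid in range(10)}


def read_partition(pid: int) -> tuple:
    # One reader owns one partition, so readers never contend for the same events.
    events = []
    for event in partitions[pid]:
        events.append(event)  # a real reader would process each event here
    return pid, events


# One worker per partition: 10 partitions, 10 concurrent readers.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    results = dict(pool.map(read_partition, partitions))

print(len(results))  # 10
```

Adding an eleventh reader would not help here, because every partition already has a dedicated reader; read parallelism is bounded by the partition count.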