11 Cloud and Fog Topologies

Without the cloud, IoT growth and its market would be nonexistent. Essentially, billions of endpoint devices that were historically dumb and not connected would need to manage themselves without the ability to share or aggregate data. Billions of such small embedded systems add no marginal value for customers. The value of the IoT is in the data it produces—not at a single endpoint but in thousands or millions of nodes. The cloud provides the ability to have simple sensors, cameras, switches, beacons, and actuators participate in a common language with each other. The cloud is the common denominator of the data currency.

The ubiquitous cloud metaphor refers to an infrastructure of computing services that are generally on-demand. The pool of resources (computing, networking, storage, and the associated software services) can dynamically scale up or down based on load average or quality of service. Clouds are typically large data centers that provide outward facing services to customers on a pay-for-use model. These centers provide the illusion of a single cloud resource while, in fact, there may be many geographically dispersed resources being used. This gives the user a sense of location independence. Resources are elastic (meaning scalable), and services are on-demand, generating a recurring revenue stream for the provider. Services that run in the cloud differ in their construction and deployment from traditional software. Cloud-based applications can be developed and deployed faster and with fewer degrees of environmental variability. Thus, cloud deployment enjoys rapid feature velocity.

There are accounts that the first description of the cloud originated at Compaq in the mid-1990s, where technology futurists predicted a computing model that moved computing to the web versus on host platforms. Essentially, this was the basis of cloud computing, but it wasn't until the advent of certain other technologies that cloud computing became practical for the industry.

The telecommunications industry traditionally was built on a point-to-point system of circuits. The creation of VPNs allows for secure and controlled access to clusters and has enabled private-public cloud hybrids to exist.

This chapter studies cloud architecture and the following areas:

A formal definition of cloud topologies and a vernacular
An architectural overview of the OpenStack cloud
A study of the fundamental problem with cloud-only architecture
An overview of fog computing
OpenFog reference architecture
Fog computing topologies and use cases

Throughout the chapter, several use cases will be talked about so you can understand the impact of big data semantics on IoT sensor environments.

Cloud services model

Cloud providers typically support a range of Everything as a Service (XaaS) products; that is, as a pay-for-use software service. Services include Networking as a Service (NaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Each model introduces more and more cloud vendor services. These service offerings are the value-add of cloud computing. At a minimum, these services should offset the capital expense a customer faces for purchasing and maintaining such data center equipment and replace it as an operational expense. The standard definition of cloud computing can be found via the National Institute of Standards and Technology: Peter M. Mell and Timothy Grance. 2011. SP 800-145. The NIST Definition of Cloud Computing. Technical Report. NIST, Gaithersburg, MD, United States (https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf).

The following figure illustrates the differences in management of cloud models that will be described in the subsequent sections:

Figure 1: Cloud architecture models. On-premises is where all services, infrastructure, and storage are managed by the owner.

NaaS includes services like SDP and SDN. IaaS pushes the hardware systems and storage to the cloud. PaaS includes the infrastructure but also manages the OS and system runtime or containers in the cloud. Finally, SaaS pushes all services, infrastructure, and services to the cloud provider.

NaaS

Services such as software-defined networking (SDN) and software-defined perimeters (SDP) are typical of NaaS. These products are cloud-managed and organized mechanisms for providing overlay networks and enterprise security. Rather than building a worldwide infrastructure and capital to support a corporation's communications, a cloud approach can be used to form a virtual network. This allows for the network to optimally scale resources up or down based on demand, and new network features can be purchased and deployed rapidly. This topic will be covered in depth in the related SDN chapter.

SaaS

SaaS is the foundation of cloud computing. A provider usually has applications or services on offer that expose themselves to the end user through clients such as mobile devices, thin clients, or frameworks on other clouds. From a user point of view, the SaaS layer is virtually running on their client. This software abstraction has enabled the industry to achieve substantial growth in the cloud. SaaS services include such name brand appliances as Google Apps, Salesforce, and Microsoft Office 365.

PaaS

PaaS refers to the underlying hardware and lower-layer software facilities provided by the cloud. In this case, the end user simply uses a provider's data center hardware, OS, middleware, and assorted frameworks to host their private application or services.

Middleware may be composed of database systems. Many industries have been built using cloud provider commodity hardware such as Swedbank, Trek Bicycles, and Toshiba.

Examples of public PaaS providers are IBM Bluemix, Google App Engine, and Microsoft Azure. The value of a PaaS deployment versus IaaS is the fact that you get the benefits of scalability and operating expenses (OPEX) with a cloud infrastructure, but you also have proven middleware and operating systems from the provider. This is the realm of systems such as Docker, where software is deployed as containers. If your overall application stays within the constraints of the vendor-provided framework and infrastructure, you can achieve a faster time to market since most of the componentry, OS, and middleware are guaranteed to be available.

IaaS

IaaS was the original concept of cloud services. In this model, the provider builds scalable hardware services in the cloud and provides a modicum of software frameworks to build client virtual machines. This offers the most flexibility in deployment but requires more lifting on the part of the customer.

Public, private, and hybrid cloud

Within the cloud environment stand three different models of cloud topologies that are generally used: private cloud, public cloud, and hybrid cloud. Regardless of the model, cloud frameworks should all provide the ability to dynamically scale, develop, and deploy rapidly, and have the appearance of locality regardless of proximity:

Figure 2: Left: Public cloud. Middle: Private versus public cloud. Right: Hybrid cloud.

Private clouds also imply on-premises managed components. Modern enterprise systems tend to use a hybrid architecture to ensure the safety of mission-critical applications and data on-premises, and use the public cloud for connectivity, deployment ease, and rapid development.

Private cloud

In a private cloud, the infrastructure is provisioned for a single organization or corporation. There is no concept of resource sharing or pooling outside of the owner's own infrastructure. Within the premises, sharing and pooling are common. A private cloud exists for a number of reasons, including security and assurance—that is, to guarantee information is confined solely to systems managed by the customer. To be considered a cloud, however, certain aspects of cloud-like services must exist, such as virtualization and load balancing. A private cloud may be on-premises, or it may be dedicated machinery provided by a third party exclusively for their use.

Public cloud

A public cloud is the opposite situation. Here, the infrastructure is provisioned on-demand for a multitude of customers and applications. The infrastructure is a pool of resources anyone can use at any time as part of their service-level agreements. The advantage here is that the sheer scale of cloud data centers allows for unprecedented scalability for many customers, who are only limited to how much of the service they wish to purchase. An example of public cloud is Microsoft Azure or Amazon AWS.

Hybrid cloud

The hybrid architectural model is a combination of private and public clouds. Such combinations may be multiple public clouds used simultaneously or a combination of public and private cloud infrastructure. Organizations tend to favor a hybrid model if sensitive data needs unique management, while the frontend interface can make use of the reach and scale of the cloud. Another use case is maintaining a public cloud agreement to offset conditions where scalability exceeds the corporation's private cloud infrastructure. In this case, the public cloud will be used as a load balancer until the swell of data and usage drops back to the private cloud constraints. This use case is called cloud bursting and refers to the use of clouds as contingent resource provisions.

Many corporations have public and private cloud infrastructure. This is especially prevalent in situations where frontend services and web portals may be services in the public cloud for scalability, while customer data is located in private systems for security.

The OpenStack cloud architecture

OpenStack is an open source Apache 2.0 licensed framework used to build cloud platforms. It is primarily an IaaS and has been in the developer community since 2010. The OpenStack Foundation manages the software and has support from more than 500 companies, including Intel, IBM, Red Hat, and Ericsson.

OpenStack started as a joint project between NASA and Rackspace around 2010. The architecture has all the major components of other cloud systems, including compute and load balancing; storage components, including backup and recovery; networking components, dashboards, security and identity, data and analytics packages, deployment tools, monitors and meters, and application services. These are the components that an architect would look for when choosing a cloud service.

Rather than focus on a single commercial cloud architecture, we will examine OpenStack in depth as many of the artifacts of OpenStack are used or analogous to components in commercial cloud services like Microsoft Azure.

Architecturally, OpenStack is an interwoven layer of components. The basic form of an OpenStack cloud is shown in the following figure. Each service has a particular function and a unique name (such as Nova). The system acts as a whole, providing a scalable enterprise-class cloud functionality:

All communication within the OpenStack components is done through Advanced Message Queueing Protocol (AMQP) message queues, specifically, the RabbitMQ or Qpid. Messages can either be non-blocking or blocking depending on how the message was sent. A message would be sent as a JSON object into RabbitMQ, and receivers would find and fetch their messages from the same service. This is a loosely coupled remote procedure call (RPC) method of communication between the major subsystems. The benefit in a cloud environment is that the client and server are completely decoupled, and this allows the servers to dynamically scale up or down. Messages are not broadcast but directed, which keeps traffic to a minimum. You may also recall that AMQP is a common messaging protocol used in the IoT space.

Figure 3: OpenStack top-level architectural diagram

Keystone – identity and service management

Keystone is the identity manager service of the OpenStack cloud. An identity manager establishes user credentials and login authorization. It is essentially the starting point or entry point into the cloud. This resource will maintain a central directory of users and their access rights. This is the top level of security to ensure that user environments are mutually exclusive and secure. Keystone can interface with services like LDAP in an enterprise-level directory. Keystone also maintains a token database and delivers temporary tokens to users similarly to how Amazon Web Services (AWS) establishes credentials. A service registry is used to query what products or services are available to the user programmatically.

Glance – image service

Glance provides the heart of virtual machine management for OpenStack. Most cloud services will provide some degree of virtualization and have an analog resource similar to Glance. The image service API is a RESTful service and lets a customer develop VM templates, discover available VMs, clone images to other servers, register VMs, and even move running virtual machines to different physical servers without interruption. Glance calls into Swift (the object store) to retrieve or store different images. Glance supports different styles of virtual images:

raw: Unstructured images
vhd: VMWare, Xen, Oracle VirtualBox
vmdk: Common disk format
vdi: QEMU emulator image
iso: Optical drive image (CD ROM)
aki/ari/ami: Amazon image

A virtual machine consists of the entire hard drive volume image content, including guest operating systems, runtimes, applications, and services.

Nova compute

Nova is the heart of the OpenStack compute resource management service. Its purpose is to identify and appropriate compute resources based on demand. It also has the responsibility of controlling the system hypervisor and virtual machines. Nova will work with several VMs, as mentioned, such as VMware or Xen, or it can manage containers. On-demand scaling is part and parcel of any cloud offering.

Nova is based on a RESTful API web service for simplified control.

To get a list of servers, you would GET the following into Nova through the API:

{your_compute_service_url}/servers

To create a bank of servers (ranging from a minimum of 10 to a maximum of 20), you would POST the following:

{
  "server": {
    "name": "IoT-Server-Array",
    "imageRef": "8a9a114e-71e1-aa7e-4181-92cc41c72721", "flavorRef": "1",
    "metadata": {
      "My Server Name": "IoT"
    },
    "return_reservation_id": "True", "min_count": "10",
    "max_count": "20"
  }
}

Nova would respond with a reservation_id:

{
  "reservation_id": "84.urcyplh"
}

Thus, the programming model is fairly simple in order to manage the infrastructure.

The Nova database is needed to maintain the current state of all objects in the cluster. For example, a few of the states the various servers in the cluster can include are as follows:

ACTIVE: The server is actively running.
BUILD: The server is in the process of being built and is not completed.
DELETED: The server has been deleted.
MIGRATING: The server is migrating to a new host.

Nova relies on a scheduler to determine which task to execute and where to execute it. The scheduler can associate affinity randomly or use filters to choose a set of hosts that best match some sets of parameters. The filter end product will be an ordered list of host servers to use from best to worst (with incompatible hosts removed from the list).

The following is the default filter used to assign server affinity:

scheduler_available_filters = nova.scheduler.filters.all_filters

A custom filter can be created (for example, a Python or JSON filter called IoTFilter.IoTFilter) and attached to the scheduler as follows:

scheduler_available_filters = IoTFilter.IoTFilter

To set a filter to find servers that have 16 VCPUs programmatically through the API, we construct a JSON file as follows:

{
  "server": {
    "name": "IoT_16",
    "imageRef": "8a9a114e-71e1-aa7e-4181-92cc41c72721", "flavorRef": "1"
  },
  "os:scheduler_hints": {
    "query": "[&gt;=,$vcpus_used,16]"
  }
}

Alternatively, OpenStack also allows you to control the cloud through a command-line interface:

$ openstack server create --image 8a9a114e-71e1-aa7e-4181-92cc41c72721 \
  --flavor 1 --hint query='["&gt;=","$vcpus_used",16]' IoT_16

OpenStack has a rich set of filters to allow for the custom allocation of servers and services. This allows for very explicit control of server provisioning and scaling. This is a classic and very important aspect of cloud design. Such filters include, but are not limited to:

RAM size
Disk capacity and type
IOPS levels
CPU allocation
Group affinities
CIDR affinity

Swift – object storage

Swift provides a redundant storage system for the OpenStack data center. Swift allows clusters to scale by adding new servers. The object storage will contain things such as the accounts and containers. A user's virtual machine may be stored or cached in Swift. A Nova compute node can call directly into Swift and download the image on the first run.

Neutron – networking services

Neutron is the OpenStack network management and VLAN service. The entire network is configurable and provides services such as:

Domain name services
DHCP
Gateway functions
VLAN management
L2 connectivity
SDN
Overlay and tunneling protocols
VPN
NAT (SNAT and DNAT)
Intrusion detection systems
Load balancing
Firewalls

Cinder – block storage

Cinder provides OpenStack with the persistent block storage services needed for a cloud. It acts as a storage as a service for uses cases such as databases and dynamically growing file systems, including data lakes, which are of particular importance in streaming IoT scenarios. Like other components in OpenStack, the storage system is itself dynamic and scales as needed. The architecture is built on high availability and open standards.

The functionality provided by Cinder includes the following:

Creation, deletion, and binding of storage devices to Nova compute instances
Multiple storage vendor interoperability (HP 3PAR, EMC, IBM, Ceph, CloudByte, Scality)
Support for multiple interfaces (Fibre Channel, NFS, Shared SAS, IBM GPFS, iSCSI)
Backup and retrieval of disk images
Snapshot images of a point in time
Alternative storage for VM images

Horizon

The final element covered here is Horizon. Horizon is the OpenStack dashboard. It is the single-pane-of-glass view into OpenStack for the customer. It provides a web-based view of the various components that comprise OpenStack (Nova, Cinder, Neutron, and others).

Horizon provides a user interface view of the cloud system as an alternative means over the API. Horizon is extensible so a third party can add their widgets or tools to the dashboard. A new billing component may be added, and a Horizon dashboard element can then be instantiated for customers.

Most IoTs that use cloud deployments will include some form of dashboard with similar features.

Heat – orchestration (optional)

Heat can launch multiple composite cloud applications and manage cloud infrastructure based on templates on an OpenStack instance. Heat integrates with telemetry to autoscale a system to match load needs. Templates in Heat try to comply with AWS CloudFormation formats, and relationships between resources can be specified in a similar manner (for example, this volume is connected to this server).

A Heat template may resemble the following:

heat_template_version: 2015-04-30 description: 

example template

resources: 
  my_instance:
    type: OS::Nova::Server 
    properties:
      key_name: { get_param: key_name } image: { 
      get_param: image } flavor: { get_param: flavor }
      admin_pass: { get_param: admin_pass } user_data:
        str_replace: 
        template: |
          #!/bin/bash
          echo hello_world

Ceilometer – telemetry (optional)

OpenStack provides an optional service called Ceilometer that can be used for telemetry data gathering and the metering of resources used by each service. Metering is used to collect information about usage and convert that into customer bills. Ceilometer also provides rating and billing tools. Rating is converting a billed value into equivalent currency, and billing is used to start a payment process.

Ceilometer monitors and meters different events, such as starting a service, attaching a volume, and stopping an instance. Metrics are gathered on CPU usage, the number of cores, memory usage, and data movement. All this is collected and stored in a MongoDB database.

Constraints of cloud architectures for IoT

A cloud service provider sits outside the IoT edge device and presides over the wide area network. One particular trait of the IoT architecture is that the PAN and WPAN devices may not be IP-compliant. Protocols such as Bluetooth Low Energy (BLE) and Zigbee are not IP-based, while everything on the WAN, including the cloud, is IP-based.

Thus, the role of the edge gateway is to perform that level of translation:

Figure 4: Latency effects in the cloud. Hard real-time response is critical in many IoT applications and forces processing to move closer to the endpoint device

Latency effect

Another effect is the latency and response time for events. As you get closer to the sensor, you enter the realm of hard real-time requirements. These systems are typically deeply embedded systems or microcontrollers that have latency set by real-world events. For example, a video camera is sensitive to the frame rate (typically 30 or 60 fps) and must perform a number of sequential tasks in a data flow pipeline (demosaicing, denoting, white balance and gamma adjusting, gamut mapping, scaling, and compression). The amount of data flowing through a video imaging pipeline (1080p video using 8-bits per channel at 60 fps) is roughly 1.5 GB/s. Every frame must flow through this pipeline in real time; therefore, most video image signal processors perform these transforms in silicon.

If we move up the stack, the gateway has the next best response time and is usually in the single-digit millisecond range. The gating factor in the response time is the WPAN latency and the load on the gateway. As previously mentioned in the WPAN chapter, most WPANs, such as BLE, are variable and dependent on the number of BLE devices under the gateway, scan intervals, advertisement intervals, and so on. BLE connection intervals can be as low as a few milliseconds but can vary depending on how the customer adjusts the advertisement intervals to minimize power usage. Wi-Fi signals typically have a 1.5 ms latency. Latency at this level requires a physical interface to the PAN. You wouldn't expect to pass raw BLE packets to the cloud with any hope of near real-time control.

The cloud component introduces another degree of latency over the WAN. The route between the gateway and cloud provider can take multiple paths based on the locations of the data centers and the gateway. Cloud providers usually provide a set of regional data centers to normalize traffic. To understand the true latency impact of a cloud provider, you must sample the latency ping over the course of weeks or months and across regions:

Region	Latency
US-East (Virginia)	91 ms
US-East (Ohio)	80 ms
US-West (California)	50 ms
US-West (Oregon)	37 ms
Canada (Central)	90 ms
Europe (Ireland)	177 ms
Europe (London)	168 ms
Europe (Frankfurt)	180 ms
Europe (Paris)	172 ms
Europe (Stockholm)	192 ms
Middle East (Bahrain)	309 ms
Asia Pacific (Hong Kong)	216 ms
Asia Pacific (Mumbai)	281 ms
Asia Pacific (Osaka-Local)	170 ms
Asia Pacific (Seoul)	192 ms
Asia Pacific (Singapore)	232 ms
Asia Pacific (Sydney)	219 ms
Asia Pacific (Tokyo)	161 ms
South America (São Paulo)	208 ms
China (Beijing)	205 ms
China (Ningxia)	227 ms
AWS GovCloud (US-East)	80 ms
AWS GovCloud (US)	37 ms

Data was analyzed using a service called CloudPing on Amazon AWS data centers and utilizing the US-West Client on CloudPing.info (for more information, visit http://www.cloudping.info)

An exhaustive analysis of cloud latency and response times is kept by CLAudit (for more information, visit http://claudit.feld.cvut.cz/index.php#). Other tools exist to analyze latency, such as Azurespeed.com, Fathom, and SmokePing (for more information, visit https://oss.oetiker.ch/smokeping/). These sites research, monitor, and archive TCP, HTTP, and SQL database latency across AWS and Microsoft Azure on a daily basis across many regions worldwide. This produces the best visibility to the overall latency impact you can expect from a cloud solution. For example, the following figure illustrates the round-trip times (RTT) for a test client in the US communicating with a leading cloud solution provider and its various global data centers. It is also useful to note the variability in RTT. While a spike of 5 ms may be tolerable in many applications, it may lead to failure in a hard real-time control system or factory automation:

A screenshot of a computer Description automatically generated

Figure 5: RTT and latency for a leading cloud provider from a client in the US Mountain time zone to various data centers globally

Typically, cloud latency will be in the order of tens, if not hundreds, of milliseconds without accounting for any overhead of processing for the incoming data. This should now set an expectation for the varying levels of response when building a cloud-based architecture for the IoT. Near-device architectures allow for sub-10 ms responses and also enjoy the benefit of being repeatable and deterministic. Cloud solutions can introduce variability into response times as well as an order of magnitude/greater response time than a near-edge device. An architect needs to consider where to deploy portions of the solution based on these two effects.

Cloud providers should also be chosen based on their data center deployment models. If an IoT solution is being deployed worldwide or perhaps will grow to cover multiple regions, the cloud service should have data centers located in geographically similar areas to assist in normalizing the response time. The chart reveals the great variance in latency based on a single client reaching data centers worldwide. This is not an optimal architecture.

Fog computing

Fog computing is the evolutionary extension of cloud computing at the edge. Fog represents a system-level horizontal architecture that distributes resources and services across a network fabric. These services and resources include storage components, computing devices, networking functions, and so on. The nodes can be located anywhere between the cloud and the "things" (sensors). This section details the difference between fog and edge computing and provides the various topologies and architectural references for fog computing.

The Hadoop philosophy for fog computing

Fog computing draws its analogy from the success of Hadoop and MapReduce, and to better understand the importance of fog computing, it is worth taking some time to think about how Hadoop works. MapReduce is a method of mapping, and Hadoop is an open source framework based on the MapReduce algorithm.

MapReduce has three steps: map, shuffle, and reduce. In the map phase, computing functions are applied to local data. The shuffle step redistributes the data as needed. This is a critical step as the system attempts to collocate all dependent data to one node. The final step is the reduce phase, where processing across all the nodes occurs in parallel.

The general takeaway here is that MapReduce attempts to bring processing to where the data is and not to move the data to where the processors are. This scheme effectively removes communication overhead and a natural bottleneck in systems that have extremely large structured or unstructured datasets. This paradigm applies to IoT as well. In the IoT space, data (possibly a very large amount of data) is produced in real time as a stream of data. This is the big data in IoT's case. It's not static data like a database or a Google storage cluster, but an endless live stream of data from every corner of the world. A fog-based design is the natural way to resolve this new big data problem.

Comparing fog, edge, cloud, and mist computing

We have already defined edge computing as moving processing close to where data is being generated. In the case of IoT, an edge device could be the sensor itself with a small microcontroller or embedded system capable of WAN communication. Other times, the edge will be a gateway in architectures with particularly constrained endpoints hanging off the gateway. Edge processing is also usually referred to in a machine-to-machine context where there is a tight correlation between the edge (client) and a server located elsewhere. Edge computing exists, as stated, to resolve issues with latency and unnecessary bandwidth consumption, and to add services such as denaturing and security close to the data source. An edge device may have a relationship with a cloud service at the cost of latency and a carrier; it doesn't actively participate in the cloud infrastructure.

Fog computing is slightly different from an edge computing paradigm. Fog computing, first and foremost, shares a framework API and communications standard with other fog nodes and/or an overlay cloud service. Fog nodes are extensions of the cloud, whereas edge devices may or may not involve a cloud whatsoever. Another key tenet of fog computing is that the fog can exist in layers of hierarchy. Fog computing can also load balance and steer data east-west and north-south to assist in resource balancing. From the previous section's definition of the cloud and the services it offers, you can think of these fog nodes as simply more (albeit less powerful) infrastructures in a hybrid cloud.

Mist computing is sometimes called "cloudlets". The mist serves the extreme edge of a network, usually on low-cost and low-power microcontrollers or embedded computers. They are as physically close to the sensors as possible to gather data and perform near-source computing. They interface to fog nodes through standard protocols, and mist components are usually part of the overall network topology. An example of a mist device is a smart thermostat:

Figure 6: Relationship between cloud, edge, fog, and mist components

OpenFog reference architecture

A fog architectural framework such as a cloud framework is necessary to understand the interworking and data contracts between various layers. Here, we explore the OpenFog Consortium reference architecture: https://www.openfogconsortium.org/wp-content/uploads/OpenFog_Reference_Architecture_2_09_17-FINAL.pdf. The OpenFog Consortium is a nonprofit industry group chartered with defining the interoperability standards for fog computing. While not a standards body, they influence the direction of other organizations through liaison and industry influence. The OpenFog reference architecture is a model to assist architects and business leaders in creating hardware, building software, and acquiring infrastructure for fog computing. OpenFog realizes the benefit of cloud-based solutions and the desire to move that level of computing, storage, networking, and scaling to the edge without sacrificing latency and bandwidth.

The OpenFog reference architecture consists of a layered approach from edge sensors and actuators at the bottom to application services at the top. The architecture has some similarities with typical cloud architecture such as OpenStack, but extends this further since it is more analogous to a PaaS than an IaaS. For that matter, OpenFog provides a full stack and is generally hardware agnostic, or at least abstracts the platform from the rest of the system:

Figure 7: OpenFog reference architecture

Application services

The role of the service layer is to provide the pane-of-glass and custom services needed for the mission. This includes providing connectors to other services, hosting data analytics packages, providing a user interface if needed, and providing core services.

The connectors in the application layer connect the services to the support layer. The protocol abstraction layer provides the pathway for a connector to talk directly to a sensor. Each service should be thought of as a microservice in a container. The OpenFog Consortium advocates container deployment as the proper method for software to be deployed at the edge. This makes sense when we think of the edge devices as extensions of the cloud. An example of a container deployment could look like the following diagram.

Each cylinder represents an individual container that can be deployed and managed separately. Each service then exposes APIs to reach between containers and layers:

Figure 8: Example OpenFog application. Here, multiple containers can be deployed, each providing different services and support functions.

Application support

This is infrastructure componentry to help build the final customer solution. This layer may have dependencies in terms of how it was deployed (for example, as a container). Support comes in many forms, including:

Application management (image identification, image verification, image deployment, and authentication)
Logging tools
Registration of components and services
Runtime engines (containers, VMs)
Runtime languages (Node.js, Java, Python)
Application servers (Tomcat)
Message buses (RabbitMQ)
Databases and archives (SQL, NoSQL, Cassandra)
Analytic frameworks (Spark)
Security services
Web servers (Apache)
Analytics tools (Spark, Drool)

OpenFog suggests these services should be packaged as containers, as shown in the preceding diagram. The reference architecture isn't a strict guideline, and the architect must choose the correct amount of support that can be enabled on a constrained edge device. For example, processing and resources may only allow for a simple rules engine and disallow anything like a stream processor, let alone a recurrent neural network.

Node management and software backplane

This refers to in-band (IB) management and governs how a fog node communicates with other nodes in its domain. Nodes are also managed through this interface for upgrades, status, and deployment. The backplane can include the operating system of the node, custom drivers and firmware, communication protocols and management, file system control, virtualization software, and containerization for microservices.

This level of the software stack touches nearly every other layer in the OpenFog reference architecture. Typical features of the backplane include:

Service discovery: Allows for ad hoc, fog-to-fog trust models.

Node discovery: Allows for fog nodes to be added and for joins in a cluster similar to cloud-clustering techniques.

State management: Allows for different computational models for stateful and stateless computes with multiple nodes.

Pub/sub management: Allows for data to be pushed rather than pulled. Additionally, it allows for levels of abstraction in software construction.

The OpenFog reference architecture, or, for that matter, any fog-based architecture, should allow for tiers of deployment. That is, a fog architecture isn't simply limited to a cloud connected to a fog gateway connected to a handful of sensors. In fact, there are multiple topologies dependent on the scale, bandwidth, processing load, and economies that can be designed. The reference architecture should provision itself for multiple topologies, just as the real cloud can dynamically scale and load balance itself based on demand.

Hardware virtualization

Like typical cloud systems, OpenFog defines the hardware as a virtualization layer. Applications should have no bindings to a particular set of hardware. Here, the system should load balance across the fog and migrate or add resources as needed. All hardware components are virtualized at this level, including compute, network, and storage.

OpenFog node security

The consortium defines this level as the hardware security portion of the stack. Higher-level fog nodes should be able to monitor lower-level fog nodes as part of the hierarchy in the topology (covered later in this chapter). Peer nodes should be able to monitor their east-west neighbors.

This layer also has the following responsibilities:

Encryption
Tamper and physical security monitors
Packet inspection and monitoring (east-west and north-south)

Network

This is the first component of the hardware system layer. The network module is the east-west and north-south communication module. The network layer is cognizant of the fog topology and routing. It has the role of routing physically to other nodes. This is a major difference from a traditional cloud network that virtualizes all internal interfaces. Here the network has meaning and geographical presence in an IoT deployment. For example, a parent node hosting four child nodes all attached to cameras may have the responsibility of aggregating video data from the four sources and sticking (fusing) the image content together to create a 360-degree field of view. To do this, it must know which child node pertains to which direction, and it cannot do that arbitrarily or randomly.

The network component requirements include:

Resilience in case a communication link goes down. In effect, it may need to understand how to rebuild a mesh to keep data flowing.

Translating and repackaging data from non-IP sensors to IP protocols. Examples of this are Bluetooth, Z-Wave, and wired sensors.

Handling failover cases.

Binding to a variety of communication fabrics (Wi-Fi, wired, 5G).

Providing the typical network infrastructure needed for an enterprise deployment (security, routing, and so on).

Accelerators

Another aspect of OpenFog that differs from other cloud schemas is the notion of accelerator services. Accelerators are commonplace now in the form of GPGPUs and even FPGA to provide services for imaging, machine learning, computer vision and perception, signal processing, and encryption/decryption.

OpenFog envisions fog nodes that are able to be resourced and allocated on an as-needed basis. You can force a second- or third-level node farm in the hierarchy that could provide additional computing facilities dynamically as needed.

We can even force other forms of acceleration into the fog, for example:

Nodes dedicated to bulk mass storage in the event a large data lake needs to be generated
Nodes that include alternative communication links such as satellite radio in the catastrophic event that all land-based communication is lost

Compute

The compute portion of the stack is similar to the compute functionality of the Nova layer in OpenStack. The principal functions include:

Task execution
Resource monitoring and provisioning
Load balancing
Capability queries

Storage

The storage slice of the architecture maintains the low-level interface to the fog storage. The types of storage we spoke of earlier, such as data lakes or workspace memory, may be needed at the edge for hard real-time analysis. The storage layer will also manage all the traditional types of storage devices, such as:

RAM arrays
Rotating disks
Flash
RAID
Data encryption

Hardware platform infrastructure

The infrastructure layer isn't so much an actual layer between software and hardware, but more of the physical and mechanical structure of the fog node. As fog devices often will be in harsh and remote locations, they must be rugged and resilient as well as self-reliant.

OpenFog defines the cases that need to be considered in a fog deployment, including:

Size, power, and weight characteristics
Cooling systems
Mechanical support and retention
Serviceability mechanics
Physical attack resistance and reporting

Protocol abstraction

The protocol abstraction layer binds the lowest elements of the IoT system (sensors) with other layers of the fog node, other fog nodes, and the cloud. OpenFog advocates an abstraction model to identify and communicate with a sensor device through the protocol abstraction layer. By abstracting the interface to sensors and edge devices, a heterogeneous mixture of sensors can be deployed on a single fog node, for example, analog devices that pass through digital-to-analog converters or digital sensors. Even the interfaces to the sensors can be individualized, such as Bluetooth to temperature devices in vehicles, the CAN bus for interfacing with engine diagnostic sensors, SPI interfaced sensors on various vehicle electronics, and general purpose input/output GPIO sensors to the various door and theft sensors. By abstracting the interface, the upper layers of the software stack can access such disparate devices through a standardized approach.

Sensors, actuators, and control systems

This is the bottom end of the IoT stack: the actual sensors and devices at the edge. These devices can be smart, dumb, wired, wireless, near-range, far-field, and more. The association, however, is that they are communicating in some manner to the fog node, and the fog node has the responsibility to provision, secure, and manage that sensor.

EdgeX

OpenFog represents one fog-level architecture, and EdgeX offers another example of fog-based frameworks to consider. EdgeX represents an open source project hosted by the Linux Foundation (as the LF Edge) with similar goals as OpenFog. Members in EdgeX include silicon and software providers such as ARM, ATT, IBM, HP, Dell, Qualcomm, Red Hat, and Samsung, and consortiums such as the Industrial Internet Consortium.

The goals of EdgeX include:

Building an open platform for IoT edge computing

Ensuring components are interoperable and plug-and-play

Adding to positive business value through accelerating time-to-market, allowing users to add value, and providing easily adaptable tools

Reducing the risk of IoT adoption through certifying EdgeX components, building a global ecosystem of users and contributors, and reducing cost through economies of scale

Collaborating with other open source projects and standards groups

EdgeX architecture

EdgeX is intended to be compatible with any silicon and CPU architecture (for example, x86 or ARM), any operating system, and any application environment. Additionally, EdgeX uses cloud native concepts such as the use of microservices. Microservices are services that are structured as a loosely coupled software architecture. They are independently deployable and therefore highly maintainable and testable. This allows small teams to own a service that can be deployed to construct larger applications. Because the services are loosely coupled, EdgeX offers the ability to distribute various microservice components across different edge nodes or perhaps on a single node. The architecture is flexible to how the architect wants to adapt it for their solution.

A screenshot of a cell phone Description automatically generated

Figure 9: EdgeX reference architecture. Note the use of container-based deployment and microservice-based architectural separation.

EdgeX projects and additional components

Because of the microservice architecture, more projects are being constructed to add to the EdgeX foundation. These include:

Akraino Edge Stack: An open source software stack for high-availability cloud services for edge computing.

Baetyl: Enables cloud solutions to work seamlessly (and migrate) to edge nodes using containers and serverless computing concepts. It was intended initially for smart home appliances, wearables, and other IoT devices.

EdgeX Virtual Engine (EVE): Builds a type-1 hypervisor on bare metal hardware. This project also has runtime deployment services and a container runtime engine. Using this solution, edge devices can be quickly managed using container "drops" of code. Refer to the chapter on edge computing to further understand container-based deployment engines.

Fledge: Intended for industrial IoT. Fledge is an open source framework to perform predictive maintenance and process sensor data in factory and manufacturing settings. It uses a RESTful API to develop, manage, and secure IoT applications.

Home Edge: Built by Samsung Electronics, an open source framework and platform for home-based IoT solutions.

Amazon Greengrass and Lambda

In this section, we cover an alternative fog service called Amazon Greengrass. Amazon has provided years of world-class leading cloud services and infrastructure such as AWS, S3, EC2, Glacier, and others. Since 2016, Amazon has invested in a new style of edge computing called Greengrass. It is an extension of AWS that allows a programmer to download a client to the fog, gateway, or smart sensor device.

Similar to other fog frameworks, the intent of Greengrass is to provide a solution to reduce latency and response time, recuse bandwidth costs, and provide security for the edge. Features of Greengrass include:

Cache data in case a connection is lost
Syncing data and device state to AWS cloud on reconnect
Local security (authentication and authorization services)
Message broker on the device and outside of the device
Data filtering
Command and control of device and data
Data aggregation
Operation while offline
Iterative learning
Calling any AWS service directly from Greengrass on the edge

To use Greengrass, a program will design a cloud platform in AWS IoT and define certain Lambda functions in the cloud. These Lambda functions are then assigned to edge devices and deployed to those devices running the client and authorized to execute Greengrass Lambda functions.

Currently, the Lambda functions are written in Python 2.7. Shadows are JSON abstractions in Greengrass that represent the state of the device and Lambda functions. These are synced back to AWS when desired.

Behind the scenes, communication between Greengrass on the edge and AWS in the cloud is done through MQTT.

Note that Lambda functions are not to be confused with Lambda architectures mentioned earlier. A Lambda function in the Greengrass context refers to an event-driven compute function.

An example Lambda definition used in Greengrass would look like the following. From the console in AWS, we run a command line; we run the following tool and specify the Lambda function definition by name:

aws greengrass create-function-definition --name "sensorDefinition"

This will output the following:

{
  "LastUpdatedTimestamp": "2017-07-08T20:16:31.101Z", 
  "CreationTimestamp": "2017-07-08T20:16:31.101Z", "Id": "26309147-58a1-490e-a1a6-
  0d4894d6ca1e",
  "Arn":"arn:aws:greengrass:us-west-2:123451234510:
  /greengrass/definition/functions/26309147-58a1-490e-a1a6-0d4894d6ca1e",
  "Name": "sensorDefinition"
}

We now create a JSON object with the Lambda function definitions and use the ID provided in the preceding and call create-functiondefinition-version from the command line:

Executable is the Lambda function by name.
MemorySize is the amount of memory to allocate to the handler.
Timeout is the time in seconds before the timeout counter expires.

The following is an example of the JSON object to use with a Lambda function:

aws greengrass create-function-definition-version --function-definition-id "26309147-58a1-490e-a1a6-0d4894d6ca1e". --functions
'[
{
  "Id": "26309147-58a1-490e-a1a6-0d4894d6ca1e",
  "FunctionArn": "arn:aws:greengrass:us-west-2:123451234510:
  /greengrass/definition/functions/26309147-58a1-490e- a1a6-0d4894d6ca1e",
  "FunctionConfiguration": {
    "Executable": "sensorLambda.sensor_handler", "MemorySize": 32000,
    "Timeout": 3
  }
}]'

Several other steps are necessary to provision and create a subscription between the edge node and the cloud, but the Lambda handler will be deployed. This provides an alternative view of fog computing, as provided by Amazon. You can think of this model as a method of extending cloud services to an edge node and the edge being empowered to call on any resource provided by the cloud. This is, by definition, a true fog computing platform.

Fog topologies

Fog topologies can exist in many forms, and the architect needs to consider several aspects when designing an end-to-end fog system. In particular, constraints such as cost, processing load, manufacturer interface, and east-west trafficking all come into play when designing the topology. A fog network can be as simple as a fog-enabled edge router connecting sensors to a cloud service. It can also grow in complexity to a multitier fog hierarchy with different degrees of processing ability and roles at each tier simultaneously distributing processing loads when and where necessary (east-west and north-south). The determining factors of the models are based on:

Data volume reduction: For example, is the system collecting unstructured video data from thousands of sensors or cameras, aggregating the data, and looking for particular events in real time? If so, then the dataset reduction will be significant as thousands of cameras will be producing hundreds of GBs of data daily, and fog nodes will need to distill large amounts of data into simple yes, no, danger, and safe event tokens.

The number of edge devices: If the IoT system is simply one sensor, then the dataset is small and may not justify a fog edge node at all. However, if the number of sensors grows, or in the worst case, the number of sensors is unpredictable and dynamic, then the fog topology may need to scale up or down dynamically. A use case would be a stadium venue using Bluetooth beaconing. As the audience grows for certain venues, the system must be able to scale nonlinearly. At other times, the stadium may seat only a fraction of its space and require marginal processing and connectivity resources.

Fog node capabilities: Depending on the topology structure and cost, some nodes may be better suited for connectivity to WPAN systems, while other nodes in the hierarchy may have additional processing capabilities for machine learning, pattern recognition, or image processing. An example would be edge fog nodes that manage a secure Zigbee mesh network and have special hardware for failover situations or WPAN security. Above that fog level would exist a fog processing node that would have extra RAM and GPGPU hardware to support the processing of raw data streaming from the WPAN gateways.

System reliability: An architect may need to consider forms of failure in an IoT model. If one edge fog node were to fail, another could take its place to perform an action or service. This case is important in life-critical or real-time environments. For example, in a first responder appliance, you may need edge computing devices that must be reliable, such as in a disaster response. If a fog node were to fail or lose connectivity a sibling node could take over its functions until connectivity is restored. In the same manner, extra fog nodes may be provisioned on demand; redundant nodes may be needed in fault-tolerant situations. In the case that there are no additional redundant nodes, some processing may be shared with neighbor nodes at the cost of system resources and latency, but the system will keep functioning. The final use case is where neighbor nodes act as watchdogs for each other. In the event a fog node fails or communication to the node fails, the watchdog will signal a failure event to the cloud and may perform some life-critical actions locally. A good example is when a fog node fails to monitor traffic on a highway; a neighbor node may see the point failure, alert the cloud of the event, and signal on a billboard on the highway to reduce speed.

When referencing network topologies and fog architectures, we sometimes use terms such as north-south traffic or east-west traffic. This correlates to the direction in which the data moves. North-south implies moving data from one level of the network to a parent or a child (for example, from an edge router to the cloud). East-west implies moving to peers at the same level of the network hierarchy, such as sibling nodes in a mesh network or access points in a Wi-Fi network.

The simplest fog solution is an edge processing unit (gateway, thin clients, router) placed in proximity to a sensor array.

Here, a fog node may be used as a gateway to a WPAN network or mesh and communicate with a host:

Figure 10: Simple fog topology. The edge-fog device manages an array of sensors and may communicate in an M2M manner with another fog node.

The next basic fog topology includes the cloud as the parent over a fog network. The fog node, in this case, will aggregate data, secure the edge, and perform the processing necessary to communicate with the cloud. What separates this model from edge computing is that the service and software layers of the fog node share a relationship with the cloud framework:

Figure 11: Fog to cloud topology. Here, a fog node establishes a link to a cloud provider

The next model uses multiple fog nodes responsible for services and edge processing, and each connects to a set of sensors. A parent cloud provisions each fog node as it would a single node. Each node has a unique identity so it can provide a unique set of services based on geography. For example, each fog node may be at a different location for a retail franchise. Fog nodes may also communicate and traffic data east-west between edge nodes. An example use case would be a cold storage environment where a number of coolers and freezers need to be maintained and governed to prevent food spoilage. A retailer may have multiple coolers in multiple locations, all managed by a single cloud service, but working with fog nodes at the edge:

Figure 12: Multiple fog nodes with a single master cloud

Another model extends topology with the ability to communicate securely and privately to multiple cloud vendors from multiple fog nodes. In this model, multiple parent clouds may be deployed. For example, in smart cities, multiple geographical areas may exist and be covered by different municipalities. Each municipality may prefer one cloud provider over the other, but all municipalities are required to use one approved and budgeted camera and sensor manufacturer. In that case, the camera and sensor manufacturer would have their single cloud instance coexist with multiple municipalities.

The fog nodes must be able to steer data to multiple cloud providers:

Figure 13: Multiple fog nodes with multiple cloud providers. Clouds could be a mixture of public and private clouds.

Fog nodes also do not need a strict one-to-one relationship as regards bridging sensors to clouds. Fog nodes can be stacked, tiered, or even kept in stasis until needed. Tiering layers of fog nodes above each other may sound counterintuitive if we are trying to reduce latency, but, as mentioned previously, nodes can be specialized. For example, nodes closer to the sensors may provide hard real-time services or have cost constraints requiring them to have the minimal amount of storage and compute. A tier above them may provide the compute resources needed for aggregate storage, machine learning, or image recognition through the use of additional mass storage devices or GPGPU processors. The following example illustrates a use case in a city lighting scenario.

Here, a number of cameras sense moving pedestrians and traffic; the fog nodes closest to the cameras perform aggregation and feature extraction and pass those features upstream to the next tier. The parent fog node retrieves the features and performs the necessary image recognition through a deep learning algorithm. If an event of interest is seen (such as a pedestrian walking at night along a path), the event will be reported to the cloud. The cloud component will register the event and signal to a set of streetlights in the pedestrian's vicinity to increase illumination. This pattern will continue as long as the fog nodes see the pedestrian moving. The end goal is the overall energy saved by not illuminating every streetlamp to full intensity at all times:

Figure 14: Multitier fog topology: Fog nodes stack in a tier hierarchy to provide additional services or abstractions

Summary

Collecting, analyzing, and acting upon data and deriving meaningful conclusions from a sensor is the goal of IoT. When we scale to thousands or to millions and potentially billions of objects communicating and streaming data nonstop, we have to introduce advanced tools to ingest, store, marshal, analyze, and predict meaning from this sea of data. Cloud computing is one element in enabling that service in the form of clusters of scalable hardware and software. Fog computing brings cloud processing closer to the edge to resolve issues with latency, security, and communication costs. Both technologies work together to run analytics packages in the form of rules engines with complex event processing agents. Choosing the model of cloud providers, frameworks, fog nodes, and analytics modules is a significant task and much literature goes deep into the semantics of programming and building these services. An architect must understand the topology and the end goal of the system to build a structure that meets today's needs and scales into the future.

In the next chapter, we will discuss the data analytics portion of IoT. The cloud certainly can host a number of analytic functions. However, we need to be prepared to understand that certain analysis should be performed on the edge close to the data source (sensor) or, if it makes more sense, in the cloud (using long-term historical data).