The first part of this article discussed the device layer, gateways, and the device registry. Devices registered in the gateway receive commands from external applications, and mobile applications that control the devices connect through the gateway. That scenario represents only part of the functionality of an IoT solution. Since most devices acquire data from various sensors, they stream that data, either directly or through a field gateway, to the data processing pipeline. In this concluding part of the series, we take a closer look at the storage, processing, analysis, and presentation layers of an enterprise IoT platform.
Most of the devices connected to the gateway send data or messages that contain the state reported by their sensors. In a large industrial deployment, multiple sensors may send data at high frequency. These large, high-velocity datasets are ingested through a scalable data ingestion service. Connected devices and local gateways acquire data from a large set of sensors, and, depending on the criticality of the monitored parameters, that data is ingested in real time. Software such as Apache Kafka – an open source, high-throughput distributed messaging system – is used as the infrastructure for ingesting the data. Devices act as data publishers, while consumers read the ingested data and feed it into the processing pipeline.
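The publisher/consumer pattern described above can be sketched in a few lines of Python. This is a minimal illustration that uses an in-memory queue in place of a real Kafka topic; the `publish_reading` and `consume_readings` helpers and the JSON message schema are assumptions for this sketch, not part of any Kafka API.

```python
import json
import queue

# In-memory stand-in for a Kafka topic; in production this would be a
# partitioned, replicated topic reachable over the network.
telemetry_topic = queue.Queue()

def publish_reading(device_id, sensor, value):
    """Device side: serialize a sensor reading and publish it."""
    message = json.dumps({"device_id": device_id, "sensor": sensor, "value": value})
    telemetry_topic.put(message)

def consume_readings():
    """Consumer side: drain the topic and hand parsed readings to the pipeline."""
    readings = []
    while not telemetry_topic.empty():
        readings.append(json.loads(telemetry_topic.get()))
    return readings

publish_reading("hvac-001", "temperature_c", 23.5)
publish_reading("hvac-001", "power_kw", 4.2)

for reading in consume_readings():
    print(reading["device_id"], reading["sensor"], reading["value"])
```

The key property carried over from Kafka is the decoupling: devices only know how to publish, and consumers only know how to read, so either side can be scaled or replaced independently.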
The next step in the workflow is real-time stream processing. The raw feed originating from the device layer is processed by a stream analytics engine, which performs two roles: analysis and transformation. The raw sensor data is analyzed in real time to find anomalies or unusual patterns. For example, if the engine oil level of a connected car is reported to be dangerously low, the driver and the mechanic at the service center need to be alerted immediately. IoT platforms enable business analysts to easily define thresholds, alarms, and actions for the incoming sensor data. They implement a loosely coupled architecture in which non-technical users can adjust the thresholds and alarms, along with the actions they trigger. In some scenarios, the raw sensor data needs to be transformed before it is stored and processed. When connected cars send the latitude and longitude of their current location, those coordinates may be mapped to a particular region or zone defined by the automobile dealer. The stream processing pipeline is the best place to perform those transformations before persisting the dataset. Apache Storm and Apache Samza are preferred for IoT stream processing.
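A threshold-and-action rule of the kind an analyst might define can be modeled as plain data that the stream engine evaluates against every reading. The rule schema, sensor names, and action names below are hypothetical, chosen only to mirror the connected-car example; real engines such as Apache Storm express this inside a topology.

```python
# Each rule pairs a sensor with a threshold and the action to trigger.
# These rules are illustrative; in practice analysts define them in a UI.
RULES = [
    {"sensor": "oil_pct", "min": 10.0, "action": "alert_driver_and_mechanic"},
    {"sensor": "coolant_temp_c", "max": 110.0, "action": "alert_driver"},
]

def evaluate(reading):
    """Return the list of actions triggered by one sensor reading."""
    actions = []
    for rule in RULES:
        if rule["sensor"] != reading["sensor"]:
            continue
        if "min" in rule and reading["value"] < rule["min"]:
            actions.append(rule["action"])
        if "max" in rule and reading["value"] > rule["max"]:
            actions.append(rule["action"])
    return actions

# A dangerously low oil reading triggers the driver/mechanic alert.
print(evaluate({"sensor": "oil_pct", "value": 4.0}))
```

Because the rules are data rather than code, non-technical users can change thresholds and actions without redeploying the pipeline, which is the loose coupling the paragraph above describes.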
Not every data point needs to be analyzed in real time. Most data sets are captured for historical analysis and trend spotting. The data goes into multiple stores, ranging from data lakes and file systems to time series databases, NoSQL databases, RDBMSes, and traditional data warehouses. Devices such as sound recorders and surveillance cameras generate unstructured binary data that may be stored directly in the file system or object storage. Time-sensitive data goes into a time series database indexed by timestamp. Semi-structured data is persisted in NoSQL databases like Apache HBase for Big Data analytics. Structured data is stored in traditional RDBMSes and multi-dimensional databases.
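The storage routing described above amounts to a policy that inspects each record and picks a tier. The sketch below is one possible such policy; the record fields and store names are assumptions made for illustration, not a standard.

```python
def choose_store(record):
    """Route a record to a storage tier based on its shape (illustrative policy)."""
    if record.get("binary"):
        return "object_storage"       # audio clips, video frames
    if "timestamp" in record and record.get("time_sensitive"):
        return "time_series_db"       # indexed by timestamp
    if record.get("schema") == "fixed":
        return "rdbms"                # structured, fixed schema
    return "nosql"                    # semi-structured, e.g. Apache HBase

print(choose_store({"binary": True}))
```

In a real platform this decision is usually made per data source at design time rather than per record at runtime, but the classification criteria are the same.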
The last stage of the data processing pipeline is batch analytics with Hadoop, based on the MapReduce algorithm. Data stored in multiple sources and formats is processed by reliable Hadoop clusters. Sensor data that is not time sensitive contributes to these datasets.
The output of this analysis delivers the most valuable insights for analysts and decision makers. Thousands of HVAC systems deployed across hundreds of sites, each sending temperature and power consumption data on an hourly basis, are suitable candidates for long-term analysis. After acquiring, aggregating, storing, processing, and analyzing the datasets, the organization will be able to find the least energy-efficient air conditioning system. The output from Hadoop may be stored in one of the data sources used for business intelligence.
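The HVAC analysis follows the MapReduce pattern: map each reading to a (unit, power) pair, shuffle by unit, then reduce each group to an average and rank the results. The miniature dataset and unit names below are invented for illustration; on a Hadoop cluster the same logic would run over billions of records in parallel.

```python
from collections import defaultdict

# Hourly readings as (unit_id, power_kw) pairs -- a tiny, made-up sample.
readings = [
    ("hvac-a", 3.1), ("hvac-a", 3.3),
    ("hvac-b", 5.9), ("hvac-b", 6.1),
]

# Map + shuffle: group power readings by unit.
grouped = defaultdict(list)
for unit_id, power in readings:
    grouped[unit_id].append(power)

# Reduce: average consumption per unit, then pick the worst offender.
averages = {unit: sum(vals) / len(vals) for unit, vals in grouped.items()}
least_efficient = max(averages, key=averages.get)
print(least_efficient)  # hvac-b
```

The same map/group/reduce structure scales from this four-row sample to the thousands of systems mentioned above, which is precisely why MapReduce suits this kind of long-term trend analysis.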
The presentation tier acts as an interface to the powerful insights provided by the system. Wearables, mobile devices, the web, desktops, and TVs can act as front ends to the whole IoT deployment. The presentation layer can also control the device layer by sending commands through the gateway. For example, a plant supervisor can adjust the speed of a cooling fan interactively from a mobile application. These applications consume the REST API exposed by the gateway. Some platforms also support publish/subscribe messaging through MQTT. With wearables gaining traction, developers are now extending mobile applications to smart watches. These apps act as companions to the IoT mobile applications, offering the comfort and convenience of controlling devices from the wrist.
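The fan-speed example might translate into an HTTP call against the gateway's REST API. The gateway URL, resource path, and command schema below are hypothetical, since the article does not specify an API; the helper only assembles the request so that any HTTP client could send it.

```python
import json

GATEWAY_URL = "https://gateway.example.com/api/v1"  # hypothetical endpoint

def build_command(device_id, command, params):
    """Assemble the URL and JSON body a mobile app would POST to the gateway.

    Returned as (url, body) so the caller can hand them to any HTTP client.
    """
    url = f"{GATEWAY_URL}/devices/{device_id}/commands"
    body = json.dumps({"command": command, "params": params})
    return url, body

# The plant supervisor's app sets the cooling fan to 1200 RPM.
url, body = build_command("fan-42", "set_speed", {"rpm": 1200})
print(url)
```

On an MQTT-based platform the same command would instead be published to a device-specific topic, with the gateway subscribing on the device's behalf.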
The second part of the presentation layer provides insights to decision makers through interactive visualizations and dashboards. These dashboards combine the output of real-time stream analysis with the historic trends generated by Hadoop clusters. Commercial business intelligence tools such as Tableau, Pentaho, Microsoft Power BI, SAP BusinessObjects, and Amazon QuickSight are used to visualize these complex data sets. Developers combine these components with a custom user experience aligned with the business scenario.
Enterprise IoT presents a significant opportunity to the software ecosystem. System integrators can build the entire stack with the combination of commercial and open source software. They can also leverage IoT PaaS offered by Amazon, Microsoft, IBM, and others.
This article was written by Janakiram MSV from Forbes and was legally licensed through the NewsCred publisher network.