Why Lakes are Cool but Streams are Cooler

By its nature, the Internet of Things (IoT) generates an enormous amount of data. Data lakes play a pivotal role in capturing and organizing all of that data. However, for time-critical IoT scenarios, data lakes are not the end of the story. Streams are needed to process data in real time, before its time-sensitive value perishes. While lakes are still totally cool, streams are cooler!

Lakes are cool

IoT data is growing much faster than all other data sources combined and has become the biggest of big data. Data lakes enable the capture and storage of this vast amount of IoT-generated data. Hence, data lakes are required for IoT. But with data lakes, the processing paradigm is “store first, analyze later.” While this approach works well for historical analytics and for machine learning, it falls short for the many IoT use cases that are time sensitive, such as failure prediction, fraud detection, resource optimization, and one-to-one marketing. For these use cases, the valuable insights perish while the data simply rests in a data lake.

Streams are cooler

To capture the perishable insights hidden within IoT data, a new data processing paradigm is needed: stream processing. Stream processing analyzes data while it is still in motion, as soon as it is ingested and before it is stored in a data lake. Analytics are computed immediately, in real time, yielding time-sensitive insights while there is still time to capitalize on their value.

[Figure: data lakes vs. data streams]

Stream processing is fast because it happens entirely in memory, using special incremental algorithms. These incremental algorithms, as opposed to the bulk processing techniques traditionally used with lakes, are the key to analyzing big IoT data in real time.
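To make “incremental” concrete, here is a minimal sketch (not tied to any particular stream processing product) in which running statistics are updated in constant time per event using Welford’s method, rather than re-scanning stored data:

    # Sketch of incremental (streaming) statistics using Welford's method.
    # Each new reading updates the running count, mean, and variance in O(1) time,
    # so no stored history needs to be re-scanned -- the essence of stream processing.

    class RunningStats:
        def __init__(self):
            self.count = 0
            self.mean = 0.0
            self.m2 = 0.0  # sum of squared deviations from the running mean

        def update(self, value):
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)

        @property
        def variance(self):
            # population variance of the readings seen so far
            return self.m2 / self.count if self.count > 1 else 0.0

    # Example: process sensor readings one at a time, as they arrive.
    stats = RunningStats()
    for reading in [21.4, 21.7, 22.1, 35.9, 21.6]:
        stats.update(reading)
        print(f"n={stats.count} mean={stats.mean:.2f} variance={stats.variance:.2f}")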

How streams and lakes complement each other

Just because data lakes don’t enable real-time analytics on their own doesn’t mean they have no role in time-sensitive analytics. The historical data in lakes can be used by machine learning tools to build predictive models, and with stream processing those models can be operationalized to produce real-time predictions. Historical data in lakes can also be used to produce baseline analytics, such as standard operational KPIs for IoT devices. These historical baselines can then be compared with real-time KPIs to provide additional context for detecting unusual device behavior.
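As a hedged sketch of the baseline-vs-real-time comparison (the device IDs, KPI, baseline values, and threshold below are all hypothetical), a stream job can score each incoming reading against per-device statistics computed offline from the lake:

    # Sketch: compare a real-time KPI against a historical baseline from the lake.
    # The baseline dict (per-device mean and standard deviation) would be computed
    # offline from lake data; the values and device IDs here are made up.

    baseline = {
        "pump-17": {"mean_temp": 62.0, "std_temp": 3.5},
        "pump-18": {"mean_temp": 58.5, "std_temp": 2.8},
    }

    def is_unusual(device_id, temp, z_threshold=3.0):
        """Flag a streaming reading that deviates sharply from the device's baseline."""
        ref = baseline.get(device_id)
        if ref is None:
            return False  # no historical context for this device yet
        z = abs(temp - ref["mean_temp"]) / ref["std_temp"]
        return z > z_threshold

    # Example: incoming stream events, checked as they arrive.
    for device, temp in [("pump-17", 63.1), ("pump-17", 81.0), ("pump-18", 59.2)]:
        if is_unusual(device, temp):
            print(f"ALERT: {device} temperature {temp} is unusual vs. historical baseline")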

More ways in which streams are cooler

Stream analytics, that is, real-time analytics over streaming data, is clearly useful for “fast data” that is reported in near real time. However, it is also very useful for “slow data,” where devices report in on an hourly, daily, or weekly cycle. Despite the slow reporting cycle of any single device, a sizable IoT network generates a significant amount of new data with each passing moment, and hidden within that data may be an important trend or an early indicator of a problem or opportunity. Stream analytics can detect these trends and early indicators quickly, enabling faster action and more options to mitigate an emerging problem or capitalize on a nascent opportunity. Read The Power of Fast Analytics over Slow Data for more about this.
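As a back-of-the-envelope illustration (the fleet size, reporting cycle, window size, and threshold are assumptions, not figures from this article), even daily-reporting devices add up to a steady fleet-wide stream, and a simple sliding window over that stream can surface a rising error rate early:

    # Back-of-the-envelope: a "slow data" fleet still produces a steady stream,
    # and a sliding window over that stream can surface a rising error rate early.
    # Fleet size, reporting cycle, window size, and threshold are all assumptions.
    import random
    from collections import deque

    DEVICES = 1_000_000          # assumed fleet size
    REPORTS_PER_DAY = 1          # each device reports once per day
    rate = DEVICES * REPORTS_PER_DAY / 86_400
    print(f"~{rate:.1f} reports arrive every second across the fleet")

    window = deque(maxlen=1000)  # the 1,000 most recent fleet-wide reports

    def error_share():
        return sum(window) / len(window) if window else 0.0

    # Simulated reports in which errors gradually become more frequent.
    random.seed(0)
    for i in range(5000):
        window.append(1 if random.random() < (0.01 + i / 50_000) else 0)
        if len(window) == window.maxlen and error_share() > 0.05:
            print(f"Early warning at report {i}: {error_share():.1%} of recent reports are errors")
            break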

Stream analytics is also a great way to future-proof a business against ever-faster data cycles. For example, today’s data cycle may be daily, next year it may shrink to hourly, and the year after down to minutes. No matter how fast data cycles become, fast analytics powered by stream processing can always keep up.
