Predictive Analytics on Streaming Data: Value Proposition for Data Scientists

Data scientists often struggle to convey the business value of the predictive models they build.

This makes it difficult for them to convince business professionals (analysts; strategy, marketing, and operations managers; executives) to adopt the predictive models and put them to the best use for better business outcomes.

One of the primary reasons for this disconnect between the data science and business worlds is the lack of a common predictive analytics platform: one where data scientists can deploy predictive models and business professionals can run those models on new data for insights and visualization. In particular, data scientists lack access to a platform on which they can deploy a predictive model, score streaming event data against it, and visualize the predictions in real time.

Data scientists use a variety of tools, either open source (e.g., R, Python, Weka, Mahout, Spark MLlib) or commercially licensed (e.g., SAS, SPSS, MATLAB), to build and validate predictive models using historical data.

Most business professionals do not have a hands-on understanding of data science tools, techniques, models, and algorithms. They are often confined to business intelligence (BI) tools for data insights, and these BI tools typically lack predictive modeling and scoring capabilities.

When business professionals need specific insights from predictive analytics, they share their requirements with the data engineering and data science teams and expect the insights to be returned in some visual format. For example, they might ask for a list of customers who are predicted to churn in the coming quarter, sorted by churn probability.

In such a case, the data engineering team prepares recent customer data with all the relevant attributes and supplies it to the data science team. The data science team feeds that data as a batch to the pre-built predictive model in the modeling environment itself (e.g., R or SAS), generates the required list of customers with their churn probabilities, and hands the list over to the business professionals for business action.
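The batch scoring step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic form, the coefficient values, and the attribute names (`support_calls`, `months_inactive`) are hypothetical stand-ins for whatever pre-built model and customer attributes a team actually has.

```python
import math

# Hypothetical coefficients standing in for a pre-built churn model.
INTERCEPT = -2.0
WEIGHTS = {"support_calls": 0.6, "months_inactive": 0.9}


def churn_probability(customer):
    """Logistic score: probability that this customer churns."""
    z = INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_churn(customers):
    """Score a whole batch and sort by churn probability, highest first."""
    scored = [(c["customer_id"], churn_probability(c)) for c in customers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A small batch as the data engineering team might prepare it (made-up values).
batch = [
    {"customer_id": "A", "support_calls": 0, "months_inactive": 0},
    {"customer_id": "B", "support_calls": 4, "months_inactive": 2},
    {"customer_id": "C", "support_calls": 1, "months_inactive": 1},
]

ranked = rank_by_churn(batch)
for customer_id, probability in ranked:
    print(f"{customer_id}: {probability:.2f}")
```

The sorted list is the artifact that gets handed to the business professionals; everything before it happens inside the modeling environment, out of their view.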

In this cycle from business requirements to data preparation to actionable insights, business professionals do not have a clear view of the data scientists' efforts in building the models and generating the insights. They also have no handle on the data science tools and models, so they cannot run predictions on their own with a range of intuitive variations in data and scoring options. For example, they cannot choose to predict demand for the next 3, 7, or 14 days with appropriate confidence intervals depending on the business need, or to predict the top 5 or top 10 products to recommend to consumers depending on the business model. Because they make such limited use of the predictive models, they under-appreciate the science behind the models and the potential contributions of the data scientists.

Vitria is set to achieve a paradigm shift in this not-so-well-connected, yet necessary, relationship between the data science and business worlds. The predictive analytics capabilities within the Vitria Operational Intelligence (OI) platform enable data scientists and business professionals to collaborate more closely than before.

Data scientists can transfer predictive models from the original model building environment and quickly deploy them in the operational intelligence environment that business professionals are more familiar with. This shift to a commonly understood platform for scoring new data enables the two disparate teams to work together: data scientists can quickly demonstrate business insights by visualizing the predicted results in the OI platform, especially on continuous streaming event data, and business professionals can better understand what the predictive models make possible in real time.

Another significant benefit for data scientists is that exporting a predictive model and deploying it into Vitria OI for run-time prediction largely removes the restriction on the choice of model building tool. Data scientists can build the model in R and export it as a PMML file or as an R object, or they can use SAS, SPSS, or other tools that can export PMML. Either way, the model in any of these formats can be imported and deployed in Vitria OI for run-time prediction by business professionals, regardless of how and where the data scientist built it.
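To make the tool independence concrete, here is a minimal sketch of why a PMML export decouples scoring from the model building tool: the file itself carries the coefficients, so any consumer that understands the format can score with it. The embedded document is a hand-written, simplified PMML regression model with hypothetical fields (`price`, `promo`, `demand`), and the reader below handles only this small subset of the standard. It is not a description of how Vitria OI imports models.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML document (hypothetical fields and coefficients).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="price" optype="continuous" dataType="double"/>
    <DataField name="promo" optype="continuous" dataType="double"/>
    <DataField name="demand" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="price"/>
      <MiningField name="promo"/>
      <MiningField name="demand" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="100.0">
      <NumericPredictor name="price" coefficient="-2.0"/>
      <NumericPredictor name="promo" coefficient="15.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_4"}


def load_linear_model(pmml_text):
    """Read the intercept and numeric coefficients from a regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        node.get("name"): float(node.get("coefficient"))
        for node in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Apply the linear model to one record of input values."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


model = load_linear_model(PMML_DOC)
prediction = score(model, {"price": 10.0, "promo": 1.0})
print(prediction)
```

Nothing in the scoring code depends on whether the file was produced by R, SAS, SPSS, or another tool, which is the point of exporting to a shared format.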

Model building environments (e.g., R, SAS) consume only batches of aggregated data for modeling, testing, and prediction; they are not designed to consume real-time data for prediction. As a result, data scientists have not had the advantage of seeing how their predictive models perform on real-time streaming event data. With the predictive model deployed in Vitria OI, data scientists can also visually observe the behavior and performance of the model on streaming event data in real time.
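The contrast with batch scoring can be sketched as follows: each event is scored as it arrives, and a rolling accuracy figure is updated so that model performance can be watched over time. This is a simplified illustration, not Vitria OI code. The one-variable logistic model, the `error_count` attribute, and the event layout are hypothetical, and the sketch assumes the true label travels with each event, whereas in a live stream labels usually arrive later.

```python
import math
from collections import deque


def score_event(event):
    """Hypothetical one-variable logistic model applied to a single event."""
    z = -1.0 + 0.8 * event["error_count"]
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(events, window=3, threshold=0.5):
    """Score events one at a time and track accuracy over a sliding window."""
    recent = deque(maxlen=window)  # correctness of the last `window` predictions
    for event in events:
        probability = score_event(event)
        predicted = probability >= threshold
        recent.append(predicted == event["label"])
        yield event["id"], probability, sum(recent) / len(recent)


# Stand-in for a live event source (made-up values).
stream = iter([
    {"id": 1, "error_count": 0, "label": False},
    {"id": 2, "error_count": 3, "label": True},
    {"id": 3, "error_count": 2, "label": False},
    {"id": 4, "error_count": 0, "label": False},
])

results = list(score_stream(stream))
for event_id, probability, rolling_accuracy in results:
    print(f"event {event_id}: p={probability:.2f}, rolling accuracy={rolling_accuracy:.2f}")
```

Because `score_stream` is a generator, it never needs the full data set in memory; a dashboard could consume its output incrementally, which is what a batch-oriented modeling environment does not offer.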
