In the first part of this 4-part blog series, we presented a business use case wherein retailers engage customers at the right moment, armed with real-time insights (to the second or minute) and supported by accurately predicted individual customer preferences. We also gave an overview of the relevant technologies, which not only have to solve individual disparate problems but also have to come together and act in unison. In the second part, we discussed the details of the Streaming Analytics Engine, which is leveraged for behavioral analytics on the dynamic tracking data to obtain real-time insights about individual customer activity.

With real-time monitoring of a customer in a zone in place, the second piece of the puzzle is predictive analytics: recommendations/messages personalized at the individual customer level, to be delivered while the customer is in the zone.

Some popular approaches to product/item recommendation are based on collaborative filtering techniques. The fundamental assumption in these techniques (hence the name “collaborative”) is that an individual consumer tends to view/like/purchase the same items that other consumers with similar patterns of views/likes/purchases have viewed/liked/purchased. The various algorithms/models within collaborative filtering differ in how effectively they extract this similarity between two consumers. Among them, matrix factorization, particularly the Alternating Least Squares (ALS) method, has been a preferred choice. ALS works well on the less sparse implicit data matrix, a user-item preference matrix constructed from historical observations of individual consumers across product items. The matrix is of size *n* x *m*, with *n* users and *m* items. Each element of the matrix is a user-item affinity score, which can be built either from a user’s implicit actions (view, like, add-to-cart, etc.) or explicit feedback (ratings, reviews, etc.); most often, the implicit matrix is less sparse than the explicit one. The matrix can be expressed as a set of observations, each a tuple <user_id, item_id, affinity_score>.
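As a minimal sketch of how such a matrix could be assembled, the snippet below aggregates raw implicit-feedback events into <user_id, item_id, affinity_score> tuples. The per-action weights are purely illustrative assumptions, not values from this post; in practice they would be tuned during experimentation.

```python
from collections import defaultdict

# Hypothetical per-action weights (illustrative only; tuned in practice).
ACTION_WEIGHTS = {"view": 1.0, "like": 2.0, "add_to_cart": 3.0, "purchase": 5.0}

def build_affinity_matrix(events):
    """Aggregate implicit-feedback events into <user_id, item_id, affinity_score> tuples."""
    scores = defaultdict(float)
    for user_id, item_id, action in events:
        scores[(user_id, item_id)] += ACTION_WEIGHTS.get(action, 0.0)
    return [(u, i, s) for (u, i), s in sorted(scores.items())]

events = [
    ("u1", "i9", "view"),
    ("u1", "i9", "add_to_cart"),
    ("u2", "i9", "like"),
]
print(build_affinity_matrix(events))  # [('u1', 'i9', 4.0), ('u2', 'i9', 2.0)]
```

Only the observed (user, item) pairs are stored, which matches the sparse tuple representation described above.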

The construction of the matrix is one of the means through which cross-channel information about a particular customer can be incorporated. For example, if a customer views and expresses interest in an item while shopping in the physical store and then purchases it online, a collective affinity score has to be computed with appropriate weights for each action. Similarly, if a customer likes an item on social media and then purchases it in the physical store, again a collective affinity score has to be computed. This requires the retailer’s data science and data engineering teams to be able to merge the data sets from disparate channels, with common customer identification information resolved across the data sets.
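A hedged sketch of the cross-channel merge might look as follows. The channel weights and identifiers are assumptions for illustration; the sketch also assumes the harder problem, resolving each customer to a common identifier across channels, has already been solved upstream.

```python
# Hypothetical channel weights (illustrative only).
CHANNEL_WEIGHTS = {"in_store": 1.0, "online": 0.8, "social": 0.5}

def merge_channels(observations):
    """observations: iterable of (channel, customer_id, item_id, raw_score).
    Assumes customer_id is already a common identifier across channels."""
    merged = {}
    for channel, cust, item, score in observations:
        key = (cust, item)
        merged[key] = merged.get(key, 0.0) + CHANNEL_WEIGHTS.get(channel, 0.0) * score
    return merged

observations = [
    ("in_store", "c42", "i7", 2.0),  # expressed interest in store
    ("online",   "c42", "i7", 5.0),  # purchased online
]
print(merge_channels(observations))  # {('c42', 'i7'): 6.0}
```

The output is a single collective affinity score per (customer, item) pair, ready to feed into the user-item matrix.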

Once the user-item *n* x *m* matrix U is constructed, the ALS matrix factorization method is used to factorize it into two lower-dimensional matrices: an *n* x *k* user matrix P and an *m* x *k* item matrix Q, where *k* is the number of features/dimensions, such that U = P * Transpose(Q). The user matrix P has *n* rows, each row defining a user in terms of the *k* features/dimensions. In the conventional method, for a given user the most similar of the remaining *n*-1 users is the one with the nearest values across the *k* attributes (for simplicity, this can be the Euclidean distance between the two sets of *k* values).

The above method can be challenging for one main reason: the same scalability reason that user-user collaborative filtering models are less preferred than item-item techniques (in addition to the fact that user properties are less static than item properties). That is, for most organizations the number of users will be in the order of millions while the number of items will be in the order of tens or hundreds of thousands, so user-user computations scale worse than item-item computations. To overcome this scalability problem, another approach is to first create clusters of users based on various predefined attributes. Once the user base is divided into C clusters (using k-means or another preferred technique), one can create a user-item matrix separately for each cluster and perform the matrix factorization on each to derive its user matrix and item matrix. The user matrix can then be used to find the most similar user to any given user within the same cluster. This alleviates the scalability problem, as the user-user computations are performed on matrices of reduced size.
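The factorization and the Euclidean-distance similarity search can be sketched in a few lines of numpy. This is a toy dense-matrix version for illustration, assuming illustrative values for *k*, the regularization strength, and the iteration count; production ALS implementations (Spark MLlib, Mahout, etc.) operate on sparse data at far larger scale.

```python
import numpy as np

def als(U, k=2, n_iters=20, reg=0.1, seed=0):
    """Factor an n x m affinity matrix U into P (n x k) and Q (m x k), U ≈ P @ Q.T."""
    rng = np.random.default_rng(seed)
    n, m = U.shape
    P = rng.standard_normal((n, k)) * 0.1
    Q = rng.standard_normal((m, k)) * 0.1
    I = reg * np.eye(k)  # ridge regularization keeps the solves well-conditioned
    for _ in range(n_iters):
        # Fix Q, solve the regularized least-squares problem for P.
        P = np.linalg.solve(Q.T @ Q + I, Q.T @ U.T).T
        # Fix P, solve for Q.
        Q = np.linalg.solve(P.T @ P + I, P.T @ U).T
    return P, Q

def most_similar_user(P, u):
    """Nearest other user by Euclidean distance in the k-dimensional feature space."""
    d = np.linalg.norm(P - P[u], axis=1)
    d[u] = np.inf  # exclude the user itself
    return int(np.argmin(d))

# Toy matrix: users 0 and 1 have identical tastes, user 2 does not.
U = np.array([[5.0, 4.0, 0.0],
              [5.0, 4.0, 0.0],
              [0.0, 0.0, 5.0]])
P, Q = als(U, k=2)
print(most_similar_user(P, 0))  # 1
```

In the clustered variant discussed above, the same `als` call would simply be run once per cluster on that cluster's smaller matrix.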

As with any problem in data science, building a good model for a recommender system will depend on a lot of experimentation, rigorous offline testing, tuning the model, conducting suitable A/B testing, and then repeating for continuous improvement. The approaches discussed above are only some possible methodologies for building the model; the final model will depend on the actual data and on experimentation that involves tuning the various parameters, including the number of features *k* for matrix factorization and the number of clusters C in the second approach. One has to define the metrics, such as precision, recall, and RMSE, to measure the performance of the model, and have the patience to keep experimenting.
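For concreteness, two of the metrics named above can be sketched as follows; the function names and example values are illustrative, not part of any particular tool.

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error over the (user, item) pairs present in `actual`."""
    errs = [(predicted[key] - actual[key]) ** 2 for key in actual]
    return math.sqrt(sum(errs) / len(errs))

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations the user actually interacted with."""
    return len(set(recommended[:k]) & set(relevant)) / k

pred = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0}
truth = {("u1", "i1"): 5.0, ("u1", "i2"): 1.0}
print(rmse(pred, truth))                                   # 1.0
print(precision_at_k(["i1", "i2", "i3", "i4"], {"i2", "i4"}, 2))  # 0.5
```

RMSE scores the predicted affinity values against held-out observations, while precision@k scores the ranked recommendation list itself; offline tuning typically tracks both.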

Regardless of the approach, the recommender system will produce, for each individual user, an ordered set of recommended items based on the items not yet “touched” by the user but that have been “touched” by other similar users. It is important to note that collaborative filtering techniques work well for user-item affinity where the items (consumer products, movies, etc.) have sufficient longevity to be “touched” by a considerable number of users. But what if the items are promotions, coupons, or similar things with a lifecycle too short to accumulate that history? One approach is to leverage the triangulation of the “user <=> products” and “promotions <=> products” affinities to derive the “user <=> promotions” affinity.
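That triangulation can be expressed as a matrix product. The sketch below uses a hypothetical dot-product overlap (all scores illustrative): a user’s affinity for a promotion is high when the products the user likes overlap with the products the promotion covers.

```python
import numpy as np

# Hypothetical data: 2 users, 3 products, 2 promotions.
user_product = np.array([[5.0, 0.0, 3.0],    # user 0 likes products 0 and 2
                         [0.0, 4.0, 0.0]])   # user 1 likes product 1
promo_product = np.array([[1.0, 0.0, 0.0],   # promo 0 covers product 0
                          [0.0, 1.0, 1.0]])  # promo 1 covers products 1 and 2

# Triangulate "user <=> products" with "promotions <=> products"
# to derive "user <=> promotions".
user_promo = user_product @ promo_product.T  # shape: n_users x n_promos
print(user_promo)
```

Here user 0 scores highest on promo 0 and user 1 on promo 1, even though no user has ever interacted with a promotion directly, which is exactly what the short item lifecycle prevents.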

There are various open-source tools, such as R and Apache Mahout, as well as licensed applications, that can consume the user-item affinity matrix as input data and produce, for a large batch of users, an ordered set of recommendations at the individual user level. The set of recommendations can be updated periodically, as and when the tool completes a cycle of computation over the historical data aggregated so far. The duration of the computational cycle can depend on the size of the matrix, the computational resources (Hadoop, Spark, server instances, etc.), and other factors.

With the ordered set of predicted products/promotions ready and the EPN tool deployed to monitor the activity of an individual consumer in the zone in real-time, how do we combine the two to deliver the right message at the right time? We will discuss that in the final part of this blog series. Stay tuned….