We have released a new Universal Recommender engine in our Template Gallery under the open-source Apache 2.0 license.
Recommender systems are increasingly prevalent in everyday life, whether you are keeping up to date with friends and work colleagues, finding something you want to buy, picking a movie you might like to watch or, well, you know, Twitter.
Masters of The Universe
Just look at the predominant tech companies of our time: Facebook’s newsfeed, LinkedIn’s “People You May Know”, Amazon’s “People Who Bought This”, Netflix’s movie recommendations, Twitter’s “Who to follow”…
The list goes on. These are all powered by recommender systems. However, building a good recommender system is far from a solved problem. It brings together challenges in the domains of systems engineering, data science and user interaction design.
A (Very) Brief History of Recommender Systems
Popularised in particular by companies such as Amazon and Netflix, recommender systems have been both an area of active research and widely used in industry for over a decade now.
Famously, the one-million-dollar Netflix Prize shone a light on the area in an attempt to improve Netflix’s own Cinematch algorithm. The winning BellKor team showed how adding biases and temporal dynamics (i.e., giving more importance to more recent reviews) improved accuracy. Unfortunately, the story goes that the resulting ensemble of 100+ models was too complex to ever be deployed in production, a problem that has plagued many Kaggle data science competitions since.
For a long time companies largely relied on collaborative filtering approaches, for example item-based K-nearest neighbour, because they are domain agnostic (i.e., they work for video, news, products, etc.) and relatively simple to implement in production. Research moved away from explicit user feedback (e.g., five-star reviews) towards implicit feedback (e.g., views, purchases) with the realization that ranking results well was far more important than predicting ratings. In other words, predicting ratings is not the best way to rank recommendations. Matrix factorization approaches, optimized for example with alternating least squares (ALS), also became popular and have been implemented in libraries such as Apache Spark’s MLlib.
A recent renaissance in #RecSys is bearing fruit: the ACM Conference Series on Recommender Systems has grown in recent years (the 9th, RecSys 2015, is this September in Vienna), a new book, “Innovations in Recommendation”, has been published under O’Reilly Media’s Practical Machine Learning series, and community contributions to the Apache Mahout project include multi-modal and co-occurrence recommenders.
PredictionIO was originally built on Apache Mahout and has been re-engineered on Apache Spark since version 0.7. So it is quite fitting that we have now integrated some of the latest recommender techniques from Mahout (pronounced “muh-hout” or “muh-hoot” depending on who you speak to) into PredictionIO’s new “Universal Recommender”, released under the open-source Apache 2.0 license.
A Universal Recommender
The name “Universal” refers to its applicability in virtually any case that calls for collaborative filtering. That might be E-commerce, News, Video, Music, etc.: virtually anywhere behavioural usage data is known.
This recommender can auto-correlate different user actions and contextual information to make better recommendations. It supports both personalized recommendations (i.e., Recommended for you) and similar item recommendations. It also allows filters and boosts to be applied to results based on item properties (categories, subject, etc.).
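To make the idea of filters and boosts concrete, here is a minimal Python sketch. It is not the Universal Recommender's query API: the function name `apply_rules`, the property layout and the multiplicative boost are all assumptions chosen for illustration. A filter drops any item that lacks the required property value, while a boost multiplies the score of items that have it.

```python
def apply_rules(scored, properties, filters=None, boosts=None):
    """Filter and boost already-scored recommendations (hypothetical helper).

    scored:     list of (item, score) pairs from some recommender
    properties: item -> {field: set of values}
    filters:    {field: required value}; items without it are dropped
    boosts:     {(field, value): factor}; matching items have their score multiplied
    """
    filters = filters or {}
    boosts = boosts or {}
    kept = []
    for item, score in scored:
        props = properties.get(item, {})
        # Hard filter: every required field must contain the required value.
        if any(value not in props.get(field, set()) for field, value in filters.items()):
            continue
        # Soft boost: reward matching items without excluding the rest.
        for (field, value), factor in boosts.items():
            if value in props.get(field, set()):
                score *= factor
        kept.append((item, score))
    # Highest score first; ties are broken by item name for determinism.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Made-up example data.
scored = [("phone", 2.0), ("novel", 1.5), ("tablet", 1.0)]
properties = {
    "phone": {"category": {"electronics"}},
    "tablet": {"category": {"electronics"}, "brand": {"acme"}},
    "novel": {"category": {"books"}},
}

results = apply_rules(
    scored,
    properties,
    filters={"category": "electronics"},
    boosts={("brand", "acme"): 3.0},
)
```

In this toy data the book is filtered out and the boosted tablet overtakes the phone. Whether a real deployment should boost multiplicatively or in some other way is a design choice this sketch does not settle.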
The Universal Recommender is a co-occurrence recommender that creates correlators from several user actions, events, or even profile information and performs the recommendations query using a search engine. This allows the recommender to use any part of users’ clickstream or even profile and contextual information in making recommendations.
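As a rough illustration of the co-occurrence idea, the Python sketch below builds one correlator per action type and then scores items by matching a user's history against them. It is a simplification under stated assumptions, not the engine's implementation: raw co-occurrence counts stand in for whatever correlation test the real engine applies, a dictionary lookup stands in for the search engine query, and the names `cooccurrence` and `recommend` and the sample data are hypothetical.

```python
from collections import defaultdict


def cooccurrence(primary, other, skip_self=False):
    """Count, per primary-action item, how many users also touched each other item.

    primary, other: dict mapping user -> set of item ids for one action type.
    Returns: primary_item -> {other_item: number of users with both}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, primary_items in primary.items():
        for p in primary_items:
            for o in other.get(user, ()):
                if skip_self and p == o:
                    continue  # an item trivially co-occurs with itself
                counts[p][o] += 1
    return counts


def recommend(history, indicators, k=3):
    """Score candidates by how strongly their correlators match the user's history.

    history:    action name -> set of items the user touched with that action
    indicators: action name -> output of cooccurrence() for that action
    """
    scores = defaultdict(int)
    for action, seen in history.items():
        for candidate, correlated in indicators.get(action, {}).items():
            scores[candidate] += sum(correlated.get(item, 0) for item in seen)
    already_bought = history.get("purchase", set())
    ranked = sorted(
        ((item, s) for item, s in scores.items() if s > 0 and item not in already_bought),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [item for item, _ in ranked[:k]]


# Made-up behavioural data: "purchase" is the primary action, "view" a secondary one.
purchases = {"u1": {"phone", "case"}, "u2": {"phone", "charger"}, "u3": {"tablet"}}
views = {"u1": {"phone", "case", "tablet"}, "u2": {"phone", "charger"}, "u3": {"tablet", "case"}}

indicators = {
    "purchase": cooccurrence(purchases, purchases, skip_self=True),
    "view": cooccurrence(purchases, views),
}

# A user who bought a case and viewed a tablet.
recs = recommend({"purchase": {"case"}, "view": {"tablet"}}, indicators)
```

Here the phone ranks first because both the purchase and the view history point to it, while the tablet is supported only by the view. The point of the sketch is that each action type contributes its own evidence about what the user is likely to buy.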
Future versions of the Universal Recommender will allow for several forms of popularity-type backfill, date range filters (scheduled for the upcoming v0.2.0 release), and content-based correlators for content-based recommendation (one of the most commonly requested features for PredictionIO).
With these additions the recommender will more closely live up to the name “Universal”.
Recommender Systems in Action
Anyone interested in recommender systems today is faced with a plethora of commercial offerings for “as-a-service” recommendation and numerous libraries for implementing recommendation algorithms. It is the typical Build versus Buy dilemma.
There seems to be a trend of companies wanting in-house data science expertise. For many industries (Media, E-commerce, Finance), owning the systems that power such an important part of the customer experience seems like it should be a core competency, not something outsourced to a magic black box.
As mentioned before, the challenge is as much about systems (data pipelines, updating models, real-time serving) as it is about data science (utilising unique data attributes, hyperparameter tuning, A/B testing recommendation strategies), not forgetting user interaction design (usability testing, user-centred design). There are several cases of recommendation systems getting companies into trouble, and explaining recommendations effectively is non-trivial.
The Universal Recommender and PredictionIO’s open source machine learning server are intended to be easy to use for beginners and ultimately very flexible to configure for experts, all without customising the code. To many it will seem more like using a database than a black box or a loose collection of code.
This should go some way towards bringing world-class recommender systems within reach of many more companies, not only the masters of the RecSys universe.
Interested in the Universal Recommender for commercial use? We have an official enterprise support partner. For more details see https://docs.prediction.io/support/#enterprise-support.
Image credits: Facebook, LinkedIn, Amazon.com, Netflix, Twitter and NASA