Recommender Systems

An experiment with Amazon reviews

by Tomás Pica / @tomaspdc / github.com/tomaspdc

What is a Recommender System?

Filters information

Discovers preferences on a subject

Measures similarities between users

Real Examples

After these abstract explanations, I'd like to give you a few examples of recommender systems "in the wild".

On the movie and music side, several companies rely heavily on recommender systems, such as Netflix, Last.fm and Pandora.

Google also uses recommendations, both in its search engine and in other products like Google+. eBay and Valve's Steam both recommend items based on your historical preferences.

If what we want to recommend is not objects but other people, Twitter recommends who to follow based on your previous preferences as well. And of course the core of dating sites like Lulu, OkCupid and Match.com is recommending who to date based on your tastes and historical preferences too. I focused my attention on what most people agree is the canonical example of a recommender system at the core of a business: Amazon.com

Understanding Amazon's ecosystem

Over 144 million active customer accounts.
( ~2.27 times the population of the UK )

Over 222 million products on sale.

426 items sold per second.
(Christmas 2013)

Customers can review products they bought on a scale of 1 to 5

Where's the Data?

~34.5 million reviews.

~6.5 million users.

~2.5 million products.

Spanning from Jun 1995 to Mar 2013

Source: http://snap.stanford.edu/data/web-Amazon.html
Permission granted by Julian McAuley (jmcauley@cs.stanford.edu)
J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.

A raw review


product/productId: B00005X3U4
product/title: The voice of Bugle Ann
product/price: unknown
review/userId: A169ZYI77GT1F3
review/profileName: Janet K. May
review/helpfulness: 0/0
review/score: 5.0
review/time: 1288051200
review/summary: Childhood Memories
review/text: My husband remembered this as a little boy. He tried to find one in the library but they had none. What a surprise he had on his birthday and thoroughly enjoyed it again. Brought a lot of old memories and stories to be told.

Processed reviews


...
B000HEKTIW::A7EWCPD8COL3X::5::4.99::2/2::1292889600
B00005AQF1::A19CQRD6DIHMQL::5::unknown::0/3::1124409600
B000DZH89I::A2POGVCWFR6738::2::unknown::0/0::1358208000
B0007HEURA::A3C2A3D2KG1F1A::5::unknown::2/2::1266796800
B0002DJNNA::A1MFR5PGMZFQPX::1::5.93::0/1::1290297600
B003Y6ID2Y::ATGPAY0V61JO7::5::2.99::0/0::1178928000
B00029BM6A::A7M0T2XJM74DN::5::unknown::0/0::1333929600
B0000DD75Q::A1BKIHESLDFD95::4::9.89::3/3::1180656000
B743504704::A1IE6VWY0U0VNT::3::unknown::0/0::1204156800
B000E0C6SK::A16QQ78I8J29PA::4::unknown::3/3::1275264000
...
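
The step from a raw review to a processed line can be sketched roughly as follows. This is a minimal sketch, not the original processing script: the field order (product ID, user ID, score, price, helpfulness, time) is inferred from the sample lines above, and the function names `parse_raw_review` and `to_processed_line` are hypothetical.

```python
def parse_raw_review(record: str) -> dict:
    """Turn one block of 'key: value' lines into a dict keyed by field name."""
    fields = {}
    for line in record.strip().splitlines():
        key, _, value = line.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def to_processed_line(fields: dict) -> str:
    """Join the kept fields with '::' (order assumed from the processed sample)."""
    score = int(float(fields["review/score"]))  # "5.0" -> 5
    return "::".join([
        fields["product/productId"],
        fields["review/userId"],
        str(score),
        fields["product/price"],
        fields["review/helpfulness"],
        fields["review/time"],
    ])


raw = """product/productId: B00005X3U4
product/title: The voice of Bugle Ann
product/price: unknown
review/userId: A169ZYI77GT1F3
review/profileName: Janet K. May
review/helpfulness: 0/0
review/score: 5.0
review/time: 1288051200
review/summary: Childhood Memories
review/text: My husband remembered this as a little boy."""

line = to_processed_line(parse_raw_review(raw))
print(line)
```

The title, profile name, summary and text are dropped here because they do not appear in the processed sample.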

Dataset slicing

True Blind subset: Random sample of ~9.8 million reviews

Second Blind subset: Random sample of ~6.5 million reviews

Training/Test sets: 80%/20% splits of random incremental samples, growing in steps of 100k reviews from 100,500 reviews to 11 million reviews.

That is, a 100,500-review subset, a 200,500-review subset, etc.
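
The incremental slicing can be sketched like this. It is a sketch under assumptions: the helper names, the fixed seed and the exact upper bound of the size sequence are illustrative and not taken from the original experiment.

```python
import random


def incremental_sizes(start=100_500, step=100_000, stop=11_000_000):
    """Subset sizes: 100,500, 200,500, ... up to the stop limit."""
    return list(range(start, stop + 1, step))


def sample_and_split(reviews, size, train_fraction=0.8, seed=0):
    """Draw a random subset of `size` reviews and split it 80%/20%."""
    rng = random.Random(seed)
    subset = rng.sample(reviews, size)  # random order, no repeats
    cut = int(len(subset) * train_fraction)
    return subset[:cut], subset[cut:]


sizes = incremental_sizes()
print(sizes[:3])

# Tiny stand-in for the real review list, just to show the split.
toy_reviews = list(range(1000))
train, test = sample_and_split(toy_reviews, 500)
print(len(train), len(test))
```

With these assumed defaults, the size sequence includes 7,500,500, which matches the ID of the model selected later.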

Measuring Similarity

Cosine Similarity
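
As an illustration, cosine similarity between two users can be computed from their rating vectors. This is a minimal sketch assuming each user is a dict mapping product IDs to scores, with unrated products treated as zero; the users `alice`, `bob` and `carol` are hypothetical.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine of the angle between two sparse rating vectors."""
    shared = set(a) & set(b)  # only shared products contribute to the dot product
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_a * norm_b)


alice = {"B000HEKTIW": 5, "B00005AQF1": 4}
bob = {"B000HEKTIW": 5, "B00005AQF1": 4}
carol = {"B000DZH89I": 2}

print(cosine_similarity(alice, bob))    # identical tastes
print(cosine_similarity(alice, carol))  # no products in common
```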

Predicting Behaviour

Singular Value Decomposition

One SVD model was computed for each Train/Test subset.
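
The slides do not say which SVD implementation was used. As a rough illustration of the idea, here is a simplified matrix factorization trained with stochastic gradient descent, the style often called "SVD" in recommender systems. It is not the original model: the function names, hyperparameters and toy ratings are all assumptions, and bias terms are left out to keep it short.

```python
import random


def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit user and item factor vectors to (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    # sorted() keeps the random initialisation reproducible across runs
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + sum(a * b for a, b in zip(p[u], q[i])))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mu, p, q


def predict(model, user, item):
    """Estimate a score, clipped to the 1 to 5 star scale."""
    mu, p, q = model
    if user not in p or item not in q:
        return mu  # cold start: fall back to the global mean
    estimate = mu + sum(a * b for a, b in zip(p[user], q[item]))
    return min(5.0, max(1.0, estimate))


toy_ratings = [("u1", "a", 5), ("u1", "b", 4), ("u2", "a", 5),
               ("u2", "c", 1), ("u3", "b", 4), ("u3", "c", 2)]
model = train_mf(toy_ratings)
print(predict(model, "u1", "c"))      # unseen user/product pair
print(predict(model, "nobody", "a"))  # unknown user
```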

Training Errors

5-fold averages of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)
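
For reference, both error measures and a k-fold average can be sketched in a few lines. The helper `k_fold_average` and its `evaluate(train, test)` callback are hypothetical names, and the interleaved fold assignment is just one simple way to split the rows.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the miss, in stars."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalises large misses more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def k_fold_average(rows, evaluate, k=5):
    """Average a metric over k folds; evaluate(train, test) returns one score."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k


actual = [5, 4, 1, 3]
predicted = [4, 4, 3, 3]
print(mae(actual, predicted))
print(rmse(actual, predicted))
```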

Second Blind Errors

MAE and RMSE against the Second Blind subset.

Selecting a model

The model minimizing the Second Blind MAE was chosen.

MAE and RMSE were measured against the True Blind subset.


MODEL/ID: 7500500
MAE: 0.766031
RMSE: 1.596316

OK cool but... how good is that?

Baseline model: guess a rating at random, weighted as follows:


1 "star"   ~7.62%
2 "stars"  ~5.13%
3 "stars"  ~8.55%
4 "stars" ~19.38%
5 "stars" ~59.29%


MODEL/ID: WEIGHTED-RANDOM
MAE: 3.93757
RMSE: 4.192413
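
A weighted-random guesser of this kind can be sketched with the standard library. The weights are the percentages listed above; the function name and seed are assumptions, and the sketch only draws guesses without reproducing the reported error figures.

```python
import random
from collections import Counter

# Percentages from the slide; random.choices normalises them internally.
STAR_WEIGHTS = {1: 7.62, 2: 5.13, 3: 8.55, 4: 19.38, 5: 59.29}


def weighted_random_ratings(n, seed=0):
    """Draw n star ratings at random according to STAR_WEIGHTS."""
    rng = random.Random(seed)
    stars = list(STAR_WEIGHTS)
    weights = list(STAR_WEIGHTS.values())
    return rng.choices(stars, weights=weights, k=n)


guesses = weighted_random_ratings(10_000)
counts = Counter(guesses)
print(counts.most_common())
```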


MODEL/ID: 7500500
MAE: 0.766031
RMSE: 1.596316

By MAE, the SVD model's error is ~5 times lower than the Weighted-Random model's (~2.6 times lower by RMSE)

Conclusions

Big Data != Better Data

Model Storage Size

Predict Offline, Recommend Online

The current model can predict the review score a user will give to a product, with a mean absolute error of ~0.77 "stars".

More data doesn't necessarily mean better results, although this model needed a considerable number of reviews to overcome the "cold-start" disadvantage of recommender systems (the final model was trained with ~7.5 million reviews).

An SVD model grows non-trivially in size (this one sits at ~1.1GB), and is not able to predict in real-time without a considerable amount of processing power.

However, it's trivial to compute the predictions for users periodically and, if needed, also trivial to re-train the model in parallel, making it a viable solution for offering recommendations.

FIN

tomaspdc.github.io/amazon-recsys