Recommender Systems

Recommender systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' a user would give to an item (such as music, books, or movies) or social element (e.g. people or groups) they have not yet considered.

Recommendation is a form of personalisation related to instance-based learning, since it relies on a similarity function.

Examples: Amazon and eBay.

Content-Based Recommendation

Users are recommended items that are similar to past choices. The idea comes from information retrieval and requires a profile of the content/description of items.

c = user, s = item

(1)
\begin{equation} u(c,s) = score(profile(c), content(s)) \end{equation}

e.g.

(2)
\begin{align} u(c,s) = cosineSimilarity(\vec{w_c}, \vec{w_s}) = \frac{\vec{w_c}\cdot\vec{w_s}}{\|\vec{w_c}\|_2 \, \|\vec{w_s}\|_2} \end{align}

$\vec{w_c}$ is a vector summarising c's past choices, and $\vec{w_s}$ is a vector of the terms describing s.
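A minimal Python sketch of equations (1) and (2). It assumes items are described by sparse term-weight dictionaries and that the profile $\vec{w_c}$ is simply the sum of the vectors of the items c chose before; the data and the function names (`build_profile`, `content_score`) are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def build_profile(past_items: list[dict[str, float]]) -> dict[str, float]:
    """Assumed profile(c): the sum of the term vectors of c's past choices."""
    profile: Counter = Counter()
    for item in past_items:
        profile.update(item)  # adds each term weight to the running total
    return dict(profile)


def content_score(past_items: list[dict[str, float]], candidate: dict[str, float]) -> float:
    """u(c, s) = score(profile(c), content(s)), with cosine similarity as the score."""
    return cosine_similarity(build_profile(past_items), candidate)


# Hypothetical term weights: two books the user liked and two candidates.
liked = [{"space": 1.0, "robots": 1.0}, {"space": 1.0, "aliens": 1.0}]
candidates = {
    "sci_fi_novel": {"space": 1.0, "aliens": 1.0},
    "cookbook": {"baking": 1.0, "bread": 1.0},
}
ranked = sorted(candidates, key=lambda s: content_score(liked, candidates[s]), reverse=True)
print(ranked)  # ['sci_fi_novel', 'cookbook']
```

The cookbook shares no terms with the profile, so its score is 0. This also illustrates the over-specialisation risk: nothing outside the user's existing vocabulary can score well.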

Advantages

  • Well-understood techniques from information retrieval
  • Can extract latent features from text analysis (determine underlying themes)

Disadvantages

  • May not have access to the content
  • Can over-specialise (not branch out)
  • New-user problem: a user with no past choices has no profile to compare against

Collaborative Recommendation

Users are recommended items that other users with similar tastes have chosen.

The two main methods are memory-based and model-based collaborative filtering (CF).

Memory-Based CF

(3)
\begin{align} r_{c,s} = aggregate_{c' \in C} r_{c', s} \end{align}

Where c is the target user, c' ranges over the other users in C, and $r_{c,s}$ is the rating of item s by user c.
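The simplest choice of aggregate is a plain average of the other users' ratings for s. A minimal sketch, assuming the ratings database is a nested dictionary (user → item → rating) and that users in C who have not rated s are simply skipped; the data and `predict_mean` are hypothetical.

```python
from statistics import mean

# Hypothetical ratings database: user -> {item -> rating}.
ratings = {
    "alice": {"matrix": 5, "titanic": 1},
    "bob": {"matrix": 4, "titanic": 2, "alien": 5},
    "carol": {"matrix": 5, "alien": 4},
    "dave": {"titanic": 5},
}


def predict_mean(ratings, c, s, others):
    """Aggregate r_{c',s} over the users c' in `others` with a plain average.

    Users who have not rated s (and c itself) are skipped; returns None if
    nobody in `others` has rated s.
    """
    observed = [ratings[u][s] for u in others if u != c and s in ratings[u]]
    return mean(observed) if observed else None


print(predict_mean(ratings, "alice", "alien", ["bob", "carol"]))  # 4.5
print(predict_mean(ratings, "dave", "alien", ["alice"]))  # None
```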

Can use the weighted sum as the aggregation:

(4)
\begin{align} r_{c,s} = k\sum_{c' \in C} similarity(c, c') \times r_{c',s} \end{align}

Where k is a normalising factor, commonly $k = 1 / \sum_{c' \in C} |similarity(c, c')|$, and the similarity function can be correlation, cosine similarity, item-based similarity, etc.
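A minimal sketch of the weighted sum in equation (4), assuming cosine similarity computed only over the items both users have rated and the normaliser $k = 1 / \sum |similarity(c, c')|$. The data and function names are hypothetical.

```python
import math

# Hypothetical ratings database: user -> {item -> rating}.
ratings = {
    "alice": {"matrix": 5, "titanic": 1},
    "bob": {"matrix": 4, "titanic": 2, "alien": 5},
    "carol": {"matrix": 5, "alien": 4},
}


def similarity(ratings, a, b):
    """Cosine similarity between two users' ratings on the items both have rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = math.sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings[b][i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_weighted(ratings, c, s):
    """r_{c,s} = k * sum_{c'} similarity(c, c') * r_{c',s}, with k = 1 / sum |similarity|."""
    pairs = [
        (similarity(ratings, c, other), r[s])
        for other, r in ratings.items()
        if other != c and s in r
    ]
    total = sum(abs(sim) for sim, _ in pairs)
    if total == 0.0:
        return None  # no similar user has rated s
    k = 1.0 / total
    return k * sum(sim * r for sim, r in pairs)


print(predict_weighted(ratings, "alice", "alien"))
```

With raw positive ratings, cosine similarity is never negative. Correlation can be, which is why k divides by the sum of absolute similarities rather than the plain sum.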

Model-Based CF

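Memory-based CF works like a nearest-neighbour method directly over the stored ratings. Model-based CF instead uses other ML methods to learn a model from the ratings database, and that model is then used to predict unknown ratings.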
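As one illustration of a model-based approach, here is a minimal sketch of matrix factorisation trained by stochastic gradient descent. This is a swapped-in example technique, not necessarily the method these notes have in mind: each rating is modelled as $r_{c,s} \approx \mu + \vec{p_c}\cdot\vec{q_s}$ with a global mean $\mu$ and learned latent vectors. The data, hyperparameters and function names are hypothetical and untuned.

```python
import math
import random

# Hypothetical observed ratings as (user index, item index, rating) triples.
observed = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 2), (1, 2, 5), (2, 0, 5), (2, 2, 4), (3, 1, 5)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_mf(observed, n_users, n_items, n_factors=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Fit r_{c,s} ~ mu + p_c . q_s by stochastic gradient descent (untuned sketch)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in observed) / len(observed)  # global mean rating
    P = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    data = list(observed)
    for _ in range(epochs):
        rng.shuffle(data)
        for c, s, r in data:
            err = r - (mu + dot(P[c], Q[s]))
            for f in range(n_factors):
                p, q = P[c][f], Q[s][f]
                # Gradient step on squared error with L2 regularisation.
                P[c][f] += lr * (err * q - reg * p)
                Q[s][f] += lr * (err * p - reg * q)
    return mu, P, Q


def predict(model, c, s):
    mu, P, Q = model
    return mu + dot(P[c], Q[s])


def rmse(model, observed):
    return math.sqrt(sum((r - predict(model, c, s)) ** 2 for c, s, r in observed) / len(observed))


untrained = train_mf(observed, n_users=4, n_items=3, epochs=0)
model = train_mf(observed, n_users=4, n_items=3)
print(rmse(untrained, observed), rmse(model, observed))
print(predict(model, 0, 2))  # user 0's predicted rating for the unrated item 2
```

Once trained, the model answers queries without scanning the whole ratings database, which is the practical difference from the memory-based formulas above.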

Advantages

  • Works well in practice
  • Doesn't require content descriptions

Disadvantages

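  • Still has the new-user problem: no 'taste' has been developed yet
  • New-item problem: an item must be rated before it can be recommended
  • Grey sheep: users whose opinions do not consistently agree or disagree with any group, so they benefit little from CF
  • Black sheep: users with idiosyncratic tastes, for whom recommendations are nearly impossible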

Hybrid Recommender Systems

The key idea is to combine memory-based and model-based approaches to address two problems:

  • "cold-start" (new user) problem: the model provides default predictions before the user has any activity
  • "sparsity" problem: the model predicts missing values in the ratings matrix

Learning these models can be difficult and computationally expensive.