This is to try and answer the "how to" part of the question for those who want to practically implement sparse-SVD recommendations or inspect the source code for the details. You can use off-the-shelf FOSS software to model sparse-SVD, for example `vowpal wabbit`, `libFM`, or `redsvd`.
`vowpal wabbit` has 3 implementations of "SVD-like" algorithms (each selectable by one of 3 command-line options). Strictly speaking, these should be called "approximate, iterative matrix factorization" rather than pure classic SVD, but they are closely related to SVD. You may think of them as a very computationally efficient, approximate SVD-factorization of a sparse (mostly zeroes) matrix.
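As a rough sketch of the underlying idea (my notation, not vw's exact objective): the sparse ratings matrix is approximated by the product of two low-rank factor matrices, and a missing rating is predicted from the dot product of the corresponding user and movie factor vectors:

$$R \approx U V^\top, \qquad \hat{r}_{u,m} = \sum_{k=1}^{N} U_{u,k}\, V_{m,k}$$

where $U$ is (number of users x N), $V$ is (number of movies x N), and the factors are fitted iteratively to minimize the error on the observed (non-zero) ratings only.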
Here's a full, working recipe for doing Netflix-style movie recommendations with `vowpal wabbit` and its "low-ranked quadratic" (`--lrq`) option, which seems to work best for me.

Data-set format file `ratings.vw` (each rating on one line, by user and movie):
5 |user 1 |movie 37
3 |user 2 |movie 1019
4 |user 1 |movie 25
1 |user 3 |movie 238
...
Where the 1st number is the rating (1 to 5 stars), followed by the ID of the user who rated and the ID of the movie that was rated.
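If your raw ratings happen to be in a plain CSV, a one-liner like the following can produce this format (the file name and the user,movie,rating column order are assumptions here; adjust to your data):

awk -F',' '{printf "%s |user %s |movie %s\n", $3, $1, $2}' raw_ratings.csv > ratings.vw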
Test data is in the same format but can (optionally) omit the ratings column:
|user 1 |movie 234
|user 12 |movie 1019
...
Ratings are optional because, in order to evaluate/test predictions, we need ratings to compare the predictions to. If we omit the ratings, `vowpal wabbit` will still predict the ratings but won't be able to estimate the prediction error (predicted values vs. actual values in the data).
To train, we ask `vowpal wabbit` to find a set of N latent interaction factors between users and the movies they like (or dislike). You may think of this as finding common themes where similar users rate a subset of movies in a similar way, and using these common themes to predict how a user would rate a movie he hasn't rated yet.
`vw` options and arguments we need to use:

- `--lrq <x><y><N>`: finds "low-ranked quadratic" latent factors.
- `<x><y>`: "um" means cross the u[sers] and m[ovie] name-spaces in the data set. Note that only the 1st letter of each name-space is used with the `--lrq` option.
- `<N>`: `N=14` below is the number of latent factors we want to find.
- `-f model_filename`: write the final model into `model_filename`.
So a simple full training command would be:
vw --lrq um14 -d ratings.vw -f ratings.model
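On real data you will usually want more than one pass over the ratings. A variant using vw's standard cache/passes options and the optional dropout regularization for `--lrq` would look roughly like this (the number of passes is just an illustrative value, not a tuned one):

vw --lrq um14 --lrqdropout -c --passes 20 -d ratings.vw -f ratings.model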
Once we have the `ratings.model` model file, we can use it to predict additional ratings on a new data set `more_ratings.vw`:
vw -i ratings.model -d more_ratings.vw -p more_ratings.predicted
The predictions will be written to the file `more_ratings.predicted`.
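If `more_ratings.vw` does contain the true ratings in the first column, you can compute the MAE (mean absolute error, $\frac{1}{n}\sum_i |\hat{r}_i - r_i|$) of the predictions yourself. A small bash sketch, assuming there are no tags in the data so each line of the predictions file is just a number:

paste -d' ' <(cut -d' ' -f1 more_ratings.vw) more_ratings.predicted |
  awk '{d = $2 - $1; if (d < 0) d = -d; s += d} END {print "MAE:", s/NR}'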
Using `demo/movielens` in the `vowpalwabbit` source tree, I get ~0.693 MAE (Mean Absolute Error) after training on 1 million user/movie ratings (`ml-1m.ratings.train.vw`) with 14 latent factors (meaning that the SVD middle matrix is a 14x14 matrix) and testing on the independent test set `ml-1m.ratings.test.vw`. How good is 0.69 MAE? For the full range of possible predictions, including the unrated (0) case [0 to 5], a 0.69 error is ~13.8% (0.69/5.0) of the full range, i.e. about 86.2% accuracy (1 - 0.138).
You can find examples and a full demo for a similar data set (movielens), with documentation, in the `vowpal wabbit` source tree on GitHub.
Notes:

- The `movielens` demo uses several options I omitted (for simplicity) from my example: in particular `--loss_function quantile`, `--adaptive`, and `--invariant` (a combined invocation is sketched after these notes).
- The `--lrq` implementation in `vw` is much faster than `--rank`, in particular when storing and loading the models.
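Putting those options together, a fuller training command would look roughly like this (my reconstruction, not the exact command the demo runs; the output model name is just an example):

vw --lrq um14 --loss_function quantile --adaptive --invariant -d ml-1m.ratings.train.vw -f movielens.model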
Credits:

- The `--rank` vw option was implemented by Jake Hofman.
- The `--lrq` vw option (with optional dropout) was implemented by Paul Mineiro.
- vowpal wabbit (aka vw) is the brain child of John Langford.