Machine learning
Alexander Skakunov, 2014-10-09 16:50:34

How to implement a Bayesian recommender system?

I want to implement a simple recommender system based on Bayes' theorem.
There is a database of cars with 3 parameters: color, body type ("coupe", "sedan" or "cabriolet") and ground clearance (mm).
A person looks at a dozen pictures of cars and classifies each "like" - "dislike".
My system should build a model of his preferences based on the likes data and eventually figure out which cars to show him in the future.
A few things confuse me:
1. How do I combine all three criteria? All the examples of Bayes' formula that I have found use only one parameter.
2. Can I give them weights? (For example, body type is twice as important as the rest.)
3. How do I handle the clearance value? After all, it is an arbitrary number, not an item from a list. Should I make a discrete list, or group ranges of values into classes like "low", "medium", "high"?
Please point me to a specific book or article on how to implement this.


2 answer(s)
@barmaley_exe, 2014-10-10

Apparently, we are talking about a naive Bayesian classifier.
First, let's put the problem a little differently: instead of recommendations, we should talk about classification into 2 classes (like or dislike).
Further, the naive Bayesian classifier is based on two ideas: Bayes' theorem itself and the conditional independence of an object's features.
Let P(like|x) be the probability that a given user likes an object x described by the features x_1, ..., x_n. These features can be completely arbitrary and need not be of the same "type"; the only question is which distribution you assign to each of them. Obviously, if P(like|x) > 1/2, then the probability of a negative assessment is smaller, which means you should predict "like". Thus, our task is to estimate the probabilities P(like|x) and P(dislike|x) (the latter equals 1 - P(like|x), since the probabilities sum to 1) and choose the larger one.
This is the time to apply Bayes' theorem:
P(like|x) = P(x|like) P(like) / P(x) and P(dislike|x) = P(x|dislike) P(dislike) / P(x)
A remarkable fact is that we can ignore the denominator, because it is the same for P(like|x) and P(dislike|x), and we are only interested in how the two numbers compare, not in their exact values. So we will compare P(x|like) P(like) and P(x|dislike) P(dislike). Here P(like) expresses our prior knowledge of how likely the user is to like an object; if there is no such knowledge, you can safely take 1/2.
P(x|like), in turn, describes how likely it is to encounter such an object in the like class. All the naivety of the considered classifier lies precisely in the modeling of this distribution.
Since the probability P(x|like) can depend on x in the most bizarre and arbitrary way, we need to make some assumption. In the case of the naive Bayesian classifier, this assumption is the (generally unjustified) hypothesis of conditional independence of an object's features given the class, that is: P(x|like) = P_1(x_1|like) ... P_n(x_n|like). Here each P_k is an arbitrary distribution; it can be either discrete (color or body type, in your case) or "continuous" (clearance). These distributions must be chosen by the developer of the classifier. Usually they contain some parameters, which we then tune on the data using the maximum likelihood method. For many distributions the estimates can be written analytically: for example, for discrete features you can compute the empirical frequency of each value (plus smoothing for values the user has never seen), and for a normal distribution, the sample mean and variance.
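Here is a minimal sketch of this scheme in Python. The toy data, feature names and smoothing details are made up for illustration, not taken from your actual database: discrete features (color, body) use smoothed frequency estimates, and the clearance uses a Gaussian fitted per class.

import math
from collections import Counter

# Toy training data: (color, body, clearance_mm, liked?) -- made-up values.
cars = [
    ("red",   "coupe",     130, True),
    ("black", "sedan",     150, False),
    ("red",   "cabriolet", 120, True),
    ("white", "sedan",     160, False),
    ("red",   "sedan",     140, True),
]

def fit(data):
    """Estimate P(class), per-class frequencies of the discrete features
    and per-class mean/variance of the clearance (Gaussian assumption)."""
    model = {}
    for label in (True, False):
        rows = [r for r in data if r[3] == label]
        clearances = [r[2] for r in rows]
        mean = sum(clearances) / len(clearances)
        var = sum((c - mean) ** 2 for c in clearances) / len(clearances) or 1.0
        model[label] = {
            "prior": len(rows) / len(data),
            "colors": Counter(r[0] for r in rows),
            "bodies": Counter(r[1] for r in rows),
            "n": len(rows),
            "mean": mean,
            "var": var,
        }
    return model

def log_score(model, label, color, body, clearance, alpha=1.0):
    """log P(class) + sum of log P(feature | class), with Laplace smoothing."""
    m = model[label]
    score = math.log(m["prior"])
    for counter, value in ((m["colors"], color), (m["bodies"], body)):
        score += math.log((counter[value] + alpha) / (m["n"] + alpha * (len(counter) + 1)))
    # Gaussian log-density for the continuous feature (clearance).
    score += -0.5 * math.log(2 * math.pi * m["var"]) \
             - (clearance - m["mean"]) ** 2 / (2 * m["var"])
    return score

model = fit(cars)
test = ("red", "coupe", 125)
print("like" if log_score(model, True, *test) > log_score(model, False, *test) else "dislike")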
In summary, the answers to your questions:
1. The Bayes formula has the same relation to the Bayesian classifier as a tree has to a table. Yes, the formula is used, but that's not all.
2. I cannot point to established approaches here, but the following comes to mind: when classifying, you can compare not the product of probabilities but its logarithm (thanks to monotonicity): log(P_1(x_1|like) ... P_n(x_n|like)) = log(P_1(x_1|like)) + ... + log(P_n(x_n|like)). The larger this sum, the higher the probability of a like. You can try weighting these terms (see the sketch after this list).
3. Individual features can be assigned arbitrary distributions, for example a normal distribution for numerical values.
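A small sketch of that weighting idea in Python; the per-feature likelihoods and the weights below are entirely made up, and how to choose the weights well is a separate question:

import math

# Hypothetical per-feature likelihoods P_k(x_k | like) for a single object.
likelihoods = {"color": 0.30, "body": 0.10, "clearance": 0.55}

# Made-up importance weights: body type counts twice as much as the others.
weights = {"color": 1.0, "body": 2.0, "clearance": 1.0}

# Weighted log-likelihood; compare against the same score computed for "dislike".
score_like = sum(w * math.log(likelihoods[f]) for f, w in weights.items())
print(score_like)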
Now about the problems: the approach above is fine, but it has a significant drawback. If you apply it to all users indiscriminately, you get a model of the average user that recommends the same thing to everyone, whereas the whole point of a recommender system is personalization. On the other hand, if you build a separate Bayesian classifier for each user, you most likely will not have enough data to get meaningful results. The second problem can be addressed by exploiting the existence of similar users: if Alice is interested in objects {A, B, C, D} and Boris likes {B, C, D, E}, naive Bayes will either average them together with everyone else (imagine 1000 users who are interested in object P: its "weight" will be much larger, but only because of its popularity, and simple sorting is already enough to find the most popular objects), or it will build a separate classifier for each of them without ever noticing that these users are similar. One approach that does take this into account is collaborative filtering, which is widely used in recommendation tasks.

R
Roman Mirilaczvili, 2016-05-14
@2ord

According to the description, this is a classification task.
The input is a vector of:
- body type
- body color
- ground clearance
The output is:
- recommend / not recommend (yes / no, i.e. 1.0 / 0.0)
This can be described by a multidimensional function F(x1, x2, ..., xN). Ideally its value would be exactly 0.0 or 1.0, but in practice it ends up somewhere in between (say, a low / medium / high probability of recommending).
Input parameters can be qualitative (characteristics of the object: a person's gender, position) or quantitative (discrete: -1, 0, 1, 2; or continuous: -30.2°C .. 44.6°C). More details at www.pm298.ru/shkala3.php (data classification and measurement scales).
Let's consider each parameter.
Color is a rather broad concept, not just an HTML code like #FE33CC (magenta in RGB). Simplifying, color can be represented mathematically as a combination of hue (the shade within the palette), saturation (vivid / dull) and lightness (light / dark), i.e. the HSL color space. For meaningful results you need to obtain these three components via the conversion formulas and work with each of them separately.
Hue is measured in radians, from 0 to 2 pi, or in a normalized range from 0.0 to 1.0.
Saturation and color intensity - from weak to strong - are also scaled between 0.0 and 1.0.
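A small sketch of that decomposition using Python's standard colorsys module (the hex value is just the example from above; note that colorsys returns the components in hue-lightness-saturation order, each already normalized to 0.0..1.0):

import colorsys

# "#FE33CC" (the magenta-ish example above) -> normalized RGB components in [0, 1]
r, g, b = (int("FE33CC"[i:i + 2], 16) / 255.0 for i in (0, 2, 4))

# Hue, lightness and saturation all come back scaled to 0.0..1.0
h, l, s = colorsys.rgb_to_hls(r, g, b)
print(h, s, l)  # three separate features to feed into the model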
It seems to me that it is most difficult for a person to decide on color, since it affects him psycho-physiologically.
Body type is a nominal type (there is no ordering between its values).
A nominal type is represented as a vector of yes/no class values: the selected class gets the maximum value, 1.0 (yes), and all the others are set to 0.0 (no).
Thus, if we designate body types as A (coupe), B (sedan) and C (convertible), then in the form of a vector they will be represented as [A, B, C].
For example, if the type "sedan" is selected, the vector takes the values [0.0, 1.0, 0.0].
Ground clearance is a continuous type in the source data. It needs to be converted into categories (small, medium, high) - similar to the representation of body type.
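A short sketch of both encodings; the threshold values for the clearance categories are arbitrary and only for illustration:

BODY_TYPES = ["coupe", "sedan", "cabriolet"]  # A, B, C from the text

def one_hot(body_type):
    """Nominal feature -> vector of 0.0/1.0 values, e.g. "sedan" -> [0.0, 1.0, 0.0]."""
    return [1.0 if body_type == t else 0.0 for t in BODY_TYPES]

def clearance_category(mm):
    """Continuous clearance in mm -> "low" / "medium" / "high" (thresholds are made up)."""
    if mm < 140:
        return "low"
    if mm < 180:
        return "medium"
    return "high"

print(one_hot("sedan"))         # [0.0, 1.0, 0.0]
print(clearance_category(155))  # medium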
A much more significant parameter in choosing cars is, in my opinion, the price category (preferably no more than 3-4 categories).
Most likely, the person's gender (a nominal type: m/f, i.e. 0/1) can also influence the choice of a car.
I believe this task can be handled by more than just neural networks.
