Proper organization of the database for working with statistical data

A

Andrey Strelkov2013-07-29 18:34:55

MySQL

Andrey Strelkov, 2013-07-29 18:34:55

I'm not a database guru, so I can't solve the problem I'm facing without risking more problems later. :)

So, the initial conditions : PHP5 and MySQL with a minimum of features, the choice of MyISAM and InnoDB, but after reading about one and the other, I refused MyISAM (correct me if suddenly I was wrong).

Task : unfortunately, I cannot describe the task specifically, but I will try to state the essence as much as possible - the wording, in my opinion, is quite trivial.

In short, you can imagine that I collect statistics about fruit lovers.
On the one hand, there is Vasya, about whom I know the following: what fruits he eats, how much he eats, and which ones he loves the most. The favorite fruit is considered based on a five-point rating that the eater himself can once give, and the number of times this fruit has been eaten.
On the other hand, there is Petya and hundreds of other people about whom I know the same thing. Based on the available statistics, I need to offer Vasya those fruits that he did not eat, but which he might like.

The fruits recommended for Vasya are searched as follows: we take Vasya's favorite fruits and look for those people who also love them. Then we look for those fruits from these people that they like and that Vasya did not eat, sort them by the number of matches and issue the Top 10.

Thus, there is a table with rows of the following form:
[User] [Fruit] [How Much Eaten] [Five Points]
From these records, I need to get the very fruits that Vasya might like.

Purpose : due to limited resources, it is necessary to minimize the load on the database, which I planned to do through properly organized work with the database, that is:

Well-organized table structures.
If possible, find a clever way to optimize the search for your favorite fruit, as it involves processing a significant amount of data.

If necessary, you can use some additional data, group existing ones, and so on - everything is within reasonable limits, so long as the server does not go down at the output.

PS: All numbers like "hundreds of people" and "Top 10" are purely abstract and in reality a lot more.

Reply

Answer the question

In order to leave comments, you need to log in

3 answer(s)

R

rozhik, 2013-07-30
@alphashooter

The fruits recommended for Vasya are searched as follows: we take Vasya's favorite fruits and look for those people who also love them. Then we look for those fruits from these people that they like and that Vasya did not eat, sort them by the number of matches and issue the Top 10.

Tip: Reword the condition. On a large number of goods in shops, under the above condition, there are frequent empty answers, and even more often you get chlochmas like “those who bought a processor buy ink”.
- Recommended fruits are searched as the most frequently eaten fruits together with the fruits that Vasya ate.
Structure---
kuplenno:
userID,
fruitID recomendacii fruitID
fruitID2 cnt Pseudo code to understand

function buy( userId, fruitId ) {
    for fruit in SELECT * FROM kuplenno {
        INSERT INTO recomendacii VALUES ( fruitId, fruit, 1 ) ON DUBLICATE KEY
        UPDATE recomendacii where fruitID = fruitID AND fruitID2 = fruit
    }
   INSERT INTO kuplenno VALUES ( userId, fruitId )
}

function getBestFruits( userId ) { 
    return
       SELECT fruitID2, sum( cnt ) as ocenka FROM kuplenno, recomendacii
       WHERE userID = userId AND kuplenno.fruitID = recomendacii.fruitID AND frutID not in (SELECT fruitID FROM kuplenno where userID = userId )
       GRUP BY fruitID2
       ORDER BY ocenka 
       LIMIT 10
}

It seems like this method better predicts the next purchase.

M

mrspd, 2013-07-29
@mrspd

You need several tables:
User table
- id
- name
Fruit table
- id
- fruit name
User preference table
- user
id - fruit id
- how many times eaten
- rating
Here, of course, the question arises how the rating is calculated on a five-point scale, and when the fruit is evaluated, once or every time after eating?
For a day, such a structure will be enough to operate on data for everyone

A

Amon_Sha, 2013-07-29
@Amon_Sha

If there are hundreds (and not hundreds of thousands) of “fruit eaters”, then you don’t have to rack your brains, any base can handle it.