Is this a reasonable project topic and implementation plan?
I was given a task to write a project, and I have to choose the topic myself. I am far from mathematics, statistics, and big data, but I had an idea: find out what interests the active subscribers of group X on VKontakte have.
Who counts as an active subscriber? Someone who likes posts. (This immediately raises the question of how to measure activity. The ratio of liked posts to the total number of posts? If so, what threshold should be set? I think these questions will be resolved after the data is collected.)
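To make the ratio idea concrete, here is a minimal sketch. It assumes the likes are already available as one set of user ids per post; the function names and the 0.5 threshold are only illustrative, not something fixed by the task.

```python
from collections import Counter

def activity_ratios(post_likers):
    """post_likers: list with one set of user ids per parsed post (assumed format).
    Returns {user_id: liked_posts / total_posts}."""
    total_posts = len(post_likers)
    if total_posts == 0:
        return {}
    # Count in how many posts each user appears among the likers.
    liked_counts = Counter(user for likers in post_likers for user in set(likers))
    return {user: count / total_posts for user, count in liked_counts.items()}

def active_users(post_likers, threshold):
    """Users whose ratio is at or above the threshold (threshold is a free choice)."""
    return {u for u, r in activity_ratios(post_likers).items() if r >= threshold}

# Toy data: 4 posts, user 1 liked three of them.
posts = [{1, 2, 3}, {1, 2}, {1}, {4}]
ratios = activity_ratios(posts)
active = active_users(posts, threshold=0.5)
```

Looking at the distribution of the ratios on the real data is probably the easiest way to pick the threshold.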
Data collection method
We take the original group, parse its N latest posts, and get the list of users who liked them. Next, we get the list of groups each of these users belongs to, parse the posts of those groups, and get the list of users who liked those posts... This can be continued indefinitely, but resources are limited.
I have already collected data on 5000 posts from one group and fetched their likes (it took 2.75 hours). Surprisingly, the total number of likes is 22 * 10^6, and the number of unique users is 9 * 10^5.
From this I can build a graph of interests: for each pair of groups, count the number of active users they have in common, and use that as the edge weight. After that, the groups can be labeled manually with a topic, and from these two things I can conclude what interests the active subscribers of group X have.
What do you think of the idea? Is the study sound?
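Since the process can go on forever, one way to keep it bounded is a depth limit. The sketch below is only an outline of that idea: `get_post_likers` and `get_user_groups` are hypothetical callables standing in for whatever actually fetches the data (they are not real VK API methods), and `max_depth` is an assumed parameter.

```python
def crawl(start_group, get_post_likers, get_user_groups, max_depth=1):
    """Breadth-first crawl with a depth limit.

    get_post_likers(group) -> list of sets of user ids, one set per post (hypothetical)
    get_user_groups(user)  -> iterable of group ids (hypothetical)
    Returns {group: set of users who liked at least one parsed post}.
    """
    likers_by_group = {}
    frontier = [start_group]
    for depth in range(max_depth + 1):
        next_frontier = []
        for group in frontier:
            if group in likers_by_group:
                continue  # already parsed, do not fetch twice
            users = set().union(*get_post_likers(group))
            likers_by_group[group] = users
            if depth < max_depth:
                # Queue the groups of these users for the next level.
                for user in users:
                    for g in get_user_groups(user):
                        if g not in likers_by_group:
                            next_frontier.append(g)
        frontier = next_frontier
    return likers_by_group

# Toy stand-ins for the real data source.
fake_posts = {"X": [{1, 2}, {2, 3}], "A": [{1}], "B": [{3, 4}]}
fake_groups = {1: ["X", "A"], 2: ["X"], 3: ["X", "B"], 4: ["B"]}
result = crawl("X",
               lambda g: fake_posts.get(g, []),
               lambda u: fake_groups.get(u, []),
               max_depth=1)
```

With `max_depth=0` only the original group is parsed; each extra level multiplies the number of requests, so on real data the limit will likely have to stay small.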
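The edge weights described here could be computed roughly like this. It is a sketch that assumes the active users are already stored as a set per group; `interest_graph` and the toy data are made up for illustration.

```python
from itertools import combinations

def interest_graph(active_by_group):
    """active_by_group: {group_id: set of active user ids} (assumed format).
    Returns {(group_a, group_b): number of shared active users},
    keeping only pairs that share at least one user."""
    edges = {}
    for a, b in combinations(sorted(active_by_group), 2):
        shared = len(active_by_group[a] & active_by_group[b])
        if shared:
            edges[(a, b)] = shared
    return edges

# Toy data: three groups with overlapping active users.
toy = {"X": {1, 2, 3}, "A": {1, 2}, "B": {3, 4}}
edges = interest_graph(toy)
```

The loop visits every pair of groups, so the cost grows quadratically with the number of groups; with many groups it may be worth filtering out the small ones first.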
In my opinion, you are doing a lot of extra work. Your goal is to find out what interests the active subscribers of group X have. The first problem is finding the active subscribers, and this is where you need to focus: by what criteria will you identify them? Once you have a set of active users, parse those users rather than other groups (the Pareto principle will help you here). You need to know their interests, so there is no point in parsing other groups. The remaining problem is how you will determine a group's topic. Manually? Then it is easier to just get the list of groups from the studied users, sort it from largest to smallest, and determine the topics yourself (again, remember the Pareto principle).
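A rough sketch of that sorted list, reading "largest" as "has the most of the studied users in it" (that reading is an assumption). The input format and the name `rank_groups` are also assumptions for illustration.

```python
from collections import Counter

def rank_groups(groups_by_user, exclude=()):
    """groups_by_user: {user_id: iterable of group ids} for the active users only.
    exclude: groups to leave out, e.g. the original group X.
    Returns [(group_id, number_of_active_users)] sorted from largest to smallest."""
    counts = Counter()
    for groups in groups_by_user.values():
        # set() so a user is counted at most once per group
        counts.update(set(groups) - set(exclude))
    return counts.most_common()

# Toy data: three active users of group X and the groups they belong to.
toy_users = {
    1: ["X", "A", "B"],
    2: ["X", "A", "B"],
    3: ["X", "A", "C"],
}
ranking = rank_groups(toy_users, exclude=("X",))
# ranking == [("A", 3), ("B", 2), ("C", 1)]
```

Only the top of this list then needs a manually assigned topic, which is where the Pareto principle comes in.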