PHP
ice_kernel, 2014-03-23 21:20:06

How to create a service for collecting and analyzing statistics from a social network?

There is a task to write a web service aimed at the community of SMM specialists; at the current stage it targets VK.
- The plan is to collect statistics hourly (or on a configurable schedule) over a given period.
Question: how should the backend of such a service be organized? As I understand it, PHP + cron alone will not be enough here...
If there are knowledgeable people here, or anyone with similar experience, I would very much like to hear your advice.


2 answers
Nat Gadzhibalaev, 2014-11-22
@NaTTs

We built http://amplifr.com; the first version we released is now four months old, and in total we have been working on it for a little over half a year.
It is a service for publishing to and analyzing social networks. Publishing and analytics turned out to be roughly comparable in complexity, although at first publishing may look like a very simple thing; the difficulty is in the details.
The analytics side is fairly straightforward right now. At the moment the database holds 14 million records of social network users' actions (likes, comments, reposts) from 12+ million users, covering slightly fewer than five thousand social network accounts. We recalculate the data every hour or two, depending on the current load.
Analytics works in several layers. First you need to obtain the raw data (which is not a single operation in itself), then process it into a format that is convenient to show to users.
Data collection starts with a cron schedule. Once an hour it runs a script that only enqueues (!) data-collection tasks. The queue worker picks the tasks up (it has a quota per run, otherwise the social networks get angry when you spam them with requests) and first makes an activity request. The catch is that if there is a lot of activity, one request is not enough: the API returns a "cursor", and you keep making requests with it, not knowing how many in advance, until you have received all the data. The easiest approach is simply to dump the raw data into the database.
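A minimal sketch of that cron + queue + cursor loop, assuming a plain database table is used as the queue and fetchActivityPage() stands in for the real social network API call (all table, column, and function names here are made up for illustration):

```php
<?php
// cron.php (scheduled hourly): does no heavy work itself,
// it only enqueues one collection task per tracked community.
$db = new PDO('mysql:host=localhost;dbname=smm', 'user', 'secret');

foreach ($db->query('SELECT id FROM communities') as $community) {
    $db->prepare('INSERT INTO collect_tasks (community_id, status) VALUES (?, "queued")')
       ->execute([$community['id']]);
}
```

```php
<?php
// worker.php: processes one queued task within a request quota,
// following the API cursor until all activity has been fetched.
$db = new PDO('mysql:host=localhost;dbname=smm', 'user', 'secret');

// Placeholder for the real API call: returns one page of activity items
// plus the cursor for the next page (null when there is nothing left).
function fetchActivityPage(int $communityId, ?string $cursor): array
{
    // ... call the social network API here ...
    return ['items' => [], 'next_cursor' => null];
}

$task = $db->query('SELECT * FROM collect_tasks WHERE status = "queued" LIMIT 1')->fetch();
if ($task === false) {
    exit; // nothing queued right now
}

$quota  = 100;  // stay well under the social network's rate limits
$cursor = null;
do {
    $page = fetchActivityPage((int) $task['community_id'], $cursor);
    foreach ($page['items'] as $item) {
        // dump the raw item as-is; interpretation happens in the next layer
        $db->prepare('INSERT INTO raw_events (community_id, payload) VALUES (?, ?)')
           ->execute([$task['community_id'], json_encode($item)]);
    }
    $cursor = $page['next_cursor'];
    $quota--;
} while ($cursor !== null && $quota > 0);

// flag the raw data as complete so the second handler can pick it up
$db->prepare('UPDATE collect_tasks SET status = "complete" WHERE id = ?')
   ->execute([$task['id']]);
```

In a real deployment the table-as-queue would more likely be a dedicated queue (Gearman, beanstalkd, something Redis-based, etc.), but the shape of the loop stays the same.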
Then a second handler takes the data from the database once the "data is complete, can be read" flag is set, computes the figures for every publication in each group, and systematizes it all.
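A hedged sketch of that second stage, continuing the made-up schema from the previous snippet (the post_id and type fields inside the raw payload are assumptions about what the network returns):

```php
<?php
// aggregate.php: once a task is flagged "complete", decode its raw
// events and roll them up into per-post counters that are cheap to show.
$db = new PDO('mysql:host=localhost;dbname=smm', 'user', 'secret');

$tasks = $db->query('SELECT id, community_id FROM collect_tasks WHERE status = "complete"');

foreach ($tasks as $task) {
    $events = $db->prepare('SELECT payload FROM raw_events WHERE community_id = ?');
    $events->execute([$task['community_id']]);

    $stats = [];  // post_id => ['like' => n, 'comment' => n, 'repost' => n]
    foreach ($events as $row) {
        $e = json_decode($row['payload'], true);
        $stats[$e['post_id']][$e['type']] = ($stats[$e['post_id']][$e['type']] ?? 0) + 1;
    }

    foreach ($stats as $postId => $counters) {
        // REPLACE keeps the table idempotent when stats are recalculated
        $db->prepare('REPLACE INTO post_stats (post_id, likes, comments, reposts)
                      VALUES (?, ?, ?, ?)')
           ->execute([
               $postId,
               $counters['like']    ?? 0,
               $counters['comment'] ?? 0,
               $counters['repost']  ?? 0,
           ]);
    }

    $db->prepare('UPDATE collect_tasks SET status = "processed" WHERE id = ?')
       ->execute([$task['id']]);
}
```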
That is enough for statistics. If you also want analytics, i.e. conclusions drawn from those numbers, that is the third layer :-)

Alex Mustdie, 2014-03-24
@alexmustdie

vk.com/dev/stats.get + php + cron
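For context, a minimal sketch of that combination, assuming the script below is scheduled via crontab (`0 * * * * php /path/to/collect_stats.php`); the request parameters and response fields follow the stats.get documentation linked above and should be checked against the current VK API version:

```php
<?php
// collect_stats.php: pull community statistics from VK and append them
// to the local database; run hourly (or on any schedule) from cron.
$db    = new PDO('mysql:host=localhost;dbname=smm', 'user', 'secret');
$token = getenv('VK_ACCESS_TOKEN');   // token with access to community stats

$groupId = 12345678;                  // hypothetical community id
$query   = http_build_query([
    'group_id'     => $groupId,
    'date_from'    => date('Y-m-d', strtotime('-1 day')),
    'date_to'      => date('Y-m-d'),
    'access_token' => $token,
]);

$raw      = file_get_contents('https://api.vk.com/method/stats.get?' . $query);
$response = json_decode($raw, true);

foreach ($response['response'] ?? [] as $day) {
    $db->prepare('INSERT INTO daily_stats (group_id, day, views, visitors) VALUES (?, ?, ?, ?)')
       ->execute([$groupId, $day['day'], $day['views'], $day['visitors']]);
}
```

This covers basic reach statistics; per-post activity (likes, comments, reposts) would still need the queue-based collection described in the other answer.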
