ruby
santaux, 2014-01-29 16:09:20

What architecture should be created for writing a statistics server in Ruby?

Good afternoon everyone!
At work, we need to collect comprehensive statistics for our site: both behavioral statistics (pages visited, buttons clicked, etc.) and statistics on video playback in our Flash player (view duration, the page the view was started from, etc.).
So the first question is: are there ready-made solutions that cover this task at least partially? At first I considered Google Analytics or Yandex.Metrica, but there are two problems:
1. They cannot tell you exactly which user triggered an event (meaning which of the users registered on our site).
2. The data would have to be pulled out through the API as heavy XML files, which could come back to haunt us even though the data would be duplicated on their servers and on ours.
I have also read about Piwik, but as far as I can tell it is essentially the same Google Analytics, just self-hosted and written in PHP. Or am I wrong?
So it looks like we will have to build our own service. Hence the second question: what should it be built with? The sites it will serve are written in Ruby (RoR, Sinatra), and I would prefer not to spread myself across technologies, so I would like to build the statistics server/application in Ruby as well. But perhaps it is better to look at Erlang/Clojure/Node.js right away for performance reasons? Under high load we will probably need asynchronous request processing (EventMachine), which may be inconvenient in Ruby, and such a solution is unlikely to outperform the other technologies on this list.
The choice of database is not simple either; I would say it is even more confusing. I have often heard and read that NoSQL solutions (Redis, MongoDB, CouchDB, Riak, etc.) cope "better" with such tasks, both for storing statistics data (no fixed table structure) and for performance (map-reduce, no joins). But I have doubts about reliability (no transactions), and honestly, it is unclear how convenient and simple map-reduce will be in practice. Meanwhile, PostgreSQL has not stood still over the past few years and handles tables with millions or tens of millions of rows just fine.
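To demystify the map-reduce worry a little: for moderate volumes, the same "map, then reduce" aggregation can be sketched in plain Ruby with Enumerable, no special database required. A minimal sketch; the event names and fields below are made up for illustration:

```ruby
# Hypothetical raw events as they might arrive from the site/player.
events = [
  { name: "page_view", user_id: 1, at: "2014-01-29" },
  { name: "play",      user_id: 1, at: "2014-01-29" },
  { name: "page_view", user_id: 2, at: "2014-01-30" },
  { name: "play",      user_id: 2, at: "2014-01-30" },
  { name: "play",      user_id: 1, at: "2014-01-30" },
]

# "Map" phase: emit a (key, 1) pair per event, keyed by day and name.
pairs = events.map { |e| [[e[:at], e[:name]], 1] }

# "Reduce" phase: sum the emitted values per key.
counts = pairs.group_by(&:first)
              .transform_values { |vs| vs.sum(&:last) }

counts # => { ["2014-01-29", "page_view"] => 1, ["2014-01-29", "play"] => 1,
       #      ["2014-01-30", "page_view"] => 1, ["2014-01-30", "play"] => 2 }
```

The NoSQL engines distribute exactly this pattern across shards; the logic itself is no harder than a GROUP BY.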
I would be grateful for any advice, experience in creating similar services and links to related articles :)


4 answer(s)
Sardar, 2014-01-29
@Sardar

I was once advised to use Scribe for fast event collection. It is used by Facebook, whose volume of statistics can hardly be called small. Scribe can accept a message very quickly and later dump it, at its own pace, to another Scribe server or to a local file, so sending an event costs you practically nothing. Events can be internal (a page opened) or come from your player/application (play/pause) via a (REST) API. In the second case it is better not to use the heavy RoR stack but to take something extremely lightweight; there is no logic there anyway, just accept the data and push it straight into Scribe. I would do it in pure Python/WSGI.
Ideally, put a Scribe process on every app server; it will forward events to a single dedicated Scribe server that merges everything into a set of files. On that same server you can run whatever analytics you like: it takes the large log files as input and produces the statistics you need as output (with map-reduce, or something simpler). File rotation can be set to 10 minutes; Scribe rotates the files for you. The ready-made aggregates go into any database, and its speed is not critical, since they will only feed charts for a narrow circle of people.

vyacheslavka, 2014-01-29
@vyacheslavka

This probably doesn't quite answer your architecture question, but still:
looking at your analytics requirements, KissMetrics might help you. It is great for user behavior analytics... and more.

santaux, 2014-01-30
@santaux

Interesting stuff, never heard of it, thanks! :)
What bothers me is that this is just a layer for accepting requests and writing them to a file. That data still has to be loaded into a database and processed there. And it is not certain that in the future these statistics will only be used by a narrow circle of people, so data processing is also a critical issue.
I came across this article: kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
It compares various NoSQL databases. I liked Redis for its speed and sharding, and HBase for its ability to store a huge amount of data ("Billions of rows X millions of columns"). But somehow I can't believe it is all so cloudless :)
Simply put: does it make sense to add disk I/O load on a server with Scribe if you can write this data straight into Redis/HBase, which live in memory?
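For the Redis option specifically: real-time stats are usually kept as counters incremented on every event, e.g. with Redis's HINCRBY on a hash keyed per day. With the `redis` gem that would be something like `redis.hincrby("stats:2014-01-30", "play", 1)`. Below is a sketch of that access pattern using an in-memory stand-in (so it runs without a server); the class, key, and field names are all made up:

```ruby
# In-memory stand-in mirroring the small slice of the Redis API that
# counter-based stats use. With the real `redis` gem you would call
# the same-named methods on a Redis.new client instead.
class CounterStore
  def initialize
    # Outer hash: one inner hash of counters per key (e.g. per day).
    @data = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  # Like Redis HINCRBY: increment `field` inside hash `key` by `n`.
  def hincrby(key, field, n)
    @data[key][field] += n
  end

  # Like Redis HGETALL: all counters stored under one key.
  def hgetall(key)
    @data[key].dup
  end
end

stats = CounterStore.new
stats.hincrby("stats:2014-01-30", "play", 1)
stats.hincrby("stats:2014-01-30", "play", 1)
stats.hincrby("stats:2014-01-30", "page_view", 1)
stats.hgetall("stats:2014-01-30") # => {"play"=>2, "page_view"=>1}
```

The trade-off against the Scribe/file approach: counters give you the aggregate instantly but throw away the raw events, while log files keep everything and let you compute new metrics retroactively. Many setups do both.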

santaux, 2014-01-30
@santaux

Point taken, I agree. I'll keep it in mind :)
