What architecture should we choose for a click tracking system?

Web traffic analysis

mediadata, 2016-01-30 02:04:50

We need to build a distributed click tracker that will be used for post-analysis of advertising (PPC) campaigns.
The difficulty is that the campaigns run all over the world, so the system needs several servers located as close as possible to end users to avoid unnecessary losses. At the same time, all clicks must be accumulated in a single store that the analytics system will work with. The maximum load at each location is 1-2 million clicks per day.
So far we picture it only in the abstract: each location hosts a mid-range server whose only job is to process the click (the simplest rules based on IP and User-Agent), save the click data, and pass it "upstream". A powerful server is also needed to aggregate the data and to host the analytics system and the storage itself.
The speed of the receiving servers is critical, while the aggregator server will be used only for queries and marketing analytics, so its speed is not the top priority.
The question is: which modern technologies are best suited to such a system? What is the best way to build the technical side of the front-line (receiving) servers and the analytics server and, most importantly, how should the transfer of data from the receiving servers to the analytics server be organized so that click processing speed does not suffer? If anyone has come across similar cases, we would be grateful for links.
The solution is for internal use only.
Thank you all for the ideas.

1 answer(s)

ZurgInq, 2016-01-30
@ZurgInq

On the receiving servers, use nginx. Straight from nginx you can put the data into an in-memory database or a queue using embedded Lua or JavaScript (available in recent nginx versions). Alternatively, nginx passes the data on to a backend, which can be something very fast: EventMachine in Ruby, its analogues in Python or PHP, Node.js, or Go. For the database you can use Redis or, if RAM is limited but there is a lot of data, MongoDB, from which you can later select the data and send it to the queue.
For queues, you can pick from RabbitMQ, Apache Kafka, beanstalk, and others.
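
To make the idea concrete, here is a rough sketch of how the receiving side can be decoupled from the transfer "upstream". It is only an illustration under assumptions: an in-process `queue.Queue` stands in for Redis or a real message broker, and the names `record_click`, `drain_batch`, and `serialize_batch`, as well as the event fields, are hypothetical. The point is that the request path only enqueues, while a separate forwarder drains the queue in batches. At 1-2 million clicks per day (roughly 12 to 23 per second on average) batches of this kind stay small.

```python
import json
import queue
import time

# Stand-in for Redis / RabbitMQ / Kafka: an in-process, thread-safe queue.
click_queue = queue.Queue()


def record_click(ip, user_agent, campaign_id):
    """Hot path: build the event and enqueue it; no network or disk I/O here."""
    event = {
        "ts": time.time(),
        "ip": ip,
        "ua": user_agent,
        "campaign_id": campaign_id,
    }
    click_queue.put(event)
    return event


def drain_batch(max_items=1000):
    """Forwarder side: pull up to max_items events without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(click_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def serialize_batch(batch):
    """Newline-delimited JSON, one event per line, ready to ship upstream."""
    return "\n".join(json.dumps(event, sort_keys=True) for event in batch)
```

In a real deployment the forwarder would run in its own process or thread and send each serialized batch to the aggregator, retrying on failure so that a slow link never blocks click handling.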
On the aggregating servers, use Hadoop or other buzzwords.
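
Whatever ends up on the aggregating side, the core operation is the same: count clicks per key in each batch and merge the partial results from all locations. A minimal in-memory sketch of that step, not Hadoop itself, might look like this. The function names and the event fields `ts` (Unix timestamp) and `campaign_id` are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timezone


def aggregate_clicks(events):
    """Count clicks per (UTC date, campaign_id) for one batch of events."""
    counts = Counter()
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        counts[(day, event["campaign_id"])] += 1
    return counts


def merge_counts(*partials):
    """Combine partial counts coming from different locations."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because the partial counts are simply added together, batches can arrive from the locations in any order. Re-delivered batches would be counted twice, though, so deduplication has to be handled separately.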