Java
Twindo, 2015-02-23 20:53:08

Application architecture of a service like datanyze.com?

How is the architecture organized in projects like www.datanyze.com? (The application crawls millions of sites every day and collects information about the technologies those sites use.)
I am interested both in the architecture as a whole and in specific points: networking (whether IOCP on Windows, epoll on *nix, or some other technology is used), the database (which one suits such tasks best), frameworks for organizing and monitoring tasks, restarting tasks after abnormal termination, and so on.
If anyone has examples, that would be great.

1 answer(s)
Vitaly Pukhov, 2015-02-24
@Neuroware

It's not clear who needs this information "as a service": you collect it once a year, publish it as a picture with graphs, and after that the service loses its point. But if you really want to build it, the task is not especially hard.

In 99% of cases, collecting the data is just simple parsing that matches each page against known "fingerprints" of technologies. Storage is not a problem either: "millions of sites" is one table with a few million rows, and any database will handle that without choking.

Choose a framework only after you have determined exactly what it has to do; nobody can recommend one for a spherical horse in a vacuum. In the simplest case, your "framework" can be a single class written in a managed environment (Java or .NET) that runs each task inside a try block. A crashing task then cannot take the whole process down, because every exception is caught at the task-manager level. In C#, something like this takes 50 lines at most.
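As a rough illustration of the "task inside a try block" idea, here is a minimal Java sketch. The names `SafeTaskRunner`, `Outcome`, and the `maxAttempts` retry limit are hypothetical and not taken from any real framework; the retry-on-failure policy is only an assumption about what "restarting tasks" might mean here.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not from any library): runs a task inside a try block
// so that a failing task never takes down the process that manages it.
final class SafeTaskRunner {
    private final int maxAttempts; // assumed retry limit per task

    SafeTaskRunner(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Result of a supervised run: a value on success, the last exception on failure.
    static final class Outcome<T> {
        final T value;
        final Exception error;
        final int attempts;

        Outcome(T value, Exception error, int attempts) {
            this.value = value;
            this.error = error;
            this.attempts = attempts;
        }

        boolean succeeded() {
            return error == null;
        }
    }

    <T> Outcome<T> run(Callable<T> task) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return new Outcome<>(task.call(), null, attempt);
            } catch (InterruptedException e) {
                // Respect cancellation: restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return new Outcome<>(null, e, attempt);
            } catch (Exception e) {
                // The "crash" is caught here; the loop restarts the task.
                last = e;
            }
        }
        return new Outcome<>(null, last, maxAttempts);
    }
}
```

A caller would wrap each site check in `runner.run(...)` and inspect `succeeded()` to decide whether to log the failure or reschedule the site. The sketch deliberately catches `Exception` rather than `Throwable`, so JVM-level errors such as `OutOfMemoryError` still propagate.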
