Database
xmoonlight, 2015-01-13 21:22:10

What inexpensive crawler server would you recommend?

Hello.
We're planning to build a pool of crawlers that collect web data into a database.
The budget is tight for now, but we have to start with something...
1. What architecture would you recommend?
2. What should the server have (hardware & software) so the solution can later be scaled?
3. What hardware configuration and form factor?
(Initially one server is planned, then more as the load grows...)
Thank you.


1 answer
SilentFl, 2015-01-15
@xmoonlight

For my own projects I settled on a RabbitMQ cluster for queuing tasks, plus a simple Golang/Ruby/Python parser that talks only to the local RabbitMQ instance and depends on nothing else. The parser can write its results straight to the database.
This setup scales easily (add a RabbitMQ node, launch another parser, done), is robust (reliability is RabbitMQ's concern; if the parser cannot process a task, it simply does not send an Ack, so the task is redelivered), and is simple to deploy.
You can see the prototype in Golang here
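The ack/redelivery behavior described above can be sketched as follows. This is only an illustration of the semantics, using an in-memory queue as a stand-in for RabbitMQ; a real worker would use a client library (e.g. pika in Python) against a local broker, and the URLs and `parse_page` function here are invented for the example.

```python
from collections import deque

attempts = {}  # per-URL delivery counter, just for the simulation

def parse_page(url):
    # Hypothetical parser: the "flaky" URL fails on its first delivery,
    # mimicking a transient network error, then succeeds.
    attempts[url] = attempts.get(url, 0) + 1
    if "flaky" in url and attempts[url] == 1:
        raise RuntimeError("transient fetch error")
    return {"url": url, "status": "parsed"}

def run_worker(queue, results):
    # Consume tasks one by one. On success the task is "Acked" (dropped);
    # on failure we skip the Ack, so the task goes back on the queue and
    # the broker redelivers it later.
    while queue:
        task = queue.popleft()
        try:
            results.append(parse_page(task))  # e.g. an INSERT into the DB
        except RuntimeError:
            queue.append(task)  # no Ack -> requeued for another attempt

tasks = deque([
    "http://example.com/a",
    "http://example.com/flaky",
    "http://example.com/b",
])
results = []
run_worker(tasks, results)
# all three URLs end up parsed; the flaky one on its second delivery
```

The key point is that the worker never has to track failures itself: not acknowledging a task is enough for the queue to hand it out again, which is exactly what makes adding or removing parser instances safe.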
