Multithreading
misterkust, 2013-12-09 21:22:18

Distributed and/or multi-threaded networking?

Good day.
I can't come up with a solution to the following problem.
There are, say, 1,000,000 domains.
I need to fetch the main page of each domain using multiple threads (and ideally in a distributed way as well).
The problem is that with self-written desktop solutions, the speed is limited by the DNS servers, both the provider's and alternative ones.
The connection from an average home Internet provider doesn't help either.
Which direction should I look in?


4 answers
Sergey, 2013-12-09
@begemot_sun

1. Use many different DNS servers to resolve the names.
2. Use a multi-threaded parser that spawns 100,000 threads for parsing plus 1,000 for resolving domain names, and tie it all together. As an option, use Erlang for this ;)

Yuri Yarosh, 2013-12-09
@d00mko

I don't think it's that hard to set up a local caching DNS server.
Configuring a lot of different DNS servers for resolving is just a stupid idea...
First of all, this kind of "multi-threaded" page loading needs asynchronous processing.
The idea is not to spawn 1,000,000 threads for 1,000,000 domains, but just 8-16.
I would just take Netty and not bother...
You could try C++ with epoll(), libcurl and pthreads, or use Python with Twisted.
In general there are plenty of tools; the main thing is to use a kernel poller such as epoll() or kqueue(). Beyond that, the language and libraries don't matter much...
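To illustrate the asynchronous approach, here is a minimal sketch in Python's standard asyncio (not one of the libraries named above), whose default event loop sits on top of epoll()/kqueue() on Linux/BSD. It runs a single event-loop thread with a capped number of requests in flight. The names `crawl`, `max_in_flight` and the `fetch` callback are hypothetical; `fetch` stands in for whatever HTTP client you actually use.

```python
import asyncio


async def crawl(domains, fetch, max_in_flight=1000):
    """Fetch every domain with at most `max_in_flight` concurrent requests.

    `fetch` is an assumed coroutine function: fetch(domain) -> page bytes.
    Returns a dict mapping each domain to its page or to the exception raised.
    """
    queue = asyncio.Queue()
    for domain in domains:
        queue.put_nowait(domain)

    results = {}

    async def worker():
        # Each worker pulls domains until the queue is drained.
        while True:
            try:
                domain = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            try:
                results[domain] = await fetch(domain)
            except Exception as exc:  # one dead site must not stop the crawl
                results[domain] = exc

    # The workers are coroutines on one thread, not OS threads.
    await asyncio.gather(*(worker() for _ in range(max_in_flight)))
    return results
```

Usage would look like `asyncio.run(crawl(domain_list, my_fetch))`, where `my_fetch` is your own coroutine. To use several cores, one such loop per process (the 8-16 mentioned above) can each take a slice of the domain list.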

Vlad Zhivotnev, 2013-12-09
@inkvizitor68sl

A caching DNS server helps because it queries the authoritative name servers (the ones the domain is delegated to) directly, so you won't be limited by someone else's DNS servers. Of course, no forwarders should be configured in it, and it should be the only nameserver listed in the system's resolv.conf. On Ubuntu/Debian it is "configured" like this:
apt-get install bind9
The default config does what you are looking for. All that remains is to tell the system to use it. And it can easily resolve 10 thousand domains per second on a single core.
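Once the system points at the local resolver (for example, a single `nameserver 127.0.0.1` line in resolv.conf), bulk resolution from a script goes through it automatically. A rough sketch in Python, assuming asyncio; `resolve_all`, `limit` and the injectable `resolve` parameter are hypothetical names used only for illustration:

```python
import asyncio
import socket


async def resolve_all(names, limit=500, resolve=None):
    """Resolve many host names with at most `limit` lookups in flight.

    By default this uses the system resolver (whatever resolv.conf names)
    via loop.getaddrinfo. `resolve` can be replaced with another coroutine.
    Returns a dict: name -> list of addresses, or the OSError raised.
    """
    loop = asyncio.get_running_loop()

    if resolve is None:
        async def resolve(name):
            # getaddrinfo runs in the loop's thread pool, so real parallelism
            # is also bounded by the size of that pool.
            infos = await loop.getaddrinfo(name, 80, type=socket.SOCK_STREAM)
            return sorted({info[4][0] for info in infos})

    sem = asyncio.Semaphore(limit)

    async def one(name):
        async with sem:
            try:
                return name, await resolve(name)
            except OSError as exc:  # socket.gaierror is a subclass of OSError
                return name, exc

    # For a very large list, feed the names in batches instead of
    # creating one task per name up front.
    pairs = await asyncio.gather(*(one(n) for n in names))
    return dict(pairs)
```

Failed lookups are kept in the result rather than raised, so dead domains can be filtered out before the download stage.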

Ilya Evseev, 2013-12-10
@IlyaEvseev

Running your own DNS server, Unbound, will solve the resolving problem.
The main pages should be downloaded with a multi-threaded asynchronous downloader,
which can be written in any popular programming language in a couple of days.
