PHP
Eugene, 2019-03-14 15:47:47

Why does DNS fail to resolve domains when making a huge number of HTTP requests?

I need to check 200 million domains for availability and for whether they run a particular CMS.
I'm using PHP 7.1 and running the check across many parallel processes (a minimal sketch of such a multi-process check follows the settings list below).
HARDWARE AND SETTINGS

  • Server: multi-core CPU, 64 GB RAM, SSD disks, 500 Mbit/s dedicated bandwidth (OVH server).
  • resolv.conf points to Google DNS: 8.8.8.8 / 8.8.4.4.
  • ulimit -n is set to 655350.
  • nload is used to monitor link utilization.
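The question doesn't include the code itself, so here is a minimal sketch of how such a multi-process check might be structured, assuming the pcntl extension is available; the file name domains.txt, the worker count, and the use of checkdnsrr() as a stand-in for the real availability/CMS check are all illustrative assumptions:

  <?php
  // Minimal multi-process DNS check -- a sketch only, not the asker's actual code.
  // Assumes the pcntl extension and a plain-text list of domains in domains.txt.
  $domains = file('domains.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
  $workers = 300; // number of parallel processes
  $chunks  = array_chunk($domains, (int)ceil(count($domains) / $workers));

  foreach ($chunks as $i => $chunk) {
      $pid = pcntl_fork();
      if ($pid === -1) {
          die("fork failed\n");
      }
      if ($pid === 0) { // child process: resolve its own chunk
          foreach ($chunk as $domain) {
              // checkdnsrr() queries the resolver from /etc/resolv.conf for an A record;
              // the real check would also fetch the page and detect the CMS.
              $ok = checkdnsrr($domain, 'A');
              file_put_contents("result.$i.log", ($ok ? 'OK ' : 'FAIL ') . $domain . "\n", FILE_APPEND);
          }
          exit(0);
      }
  }

  // Parent: wait for every child to finish.
  while (pcntl_waitpid(0, $status) !== -1) {
  }

Every worker in a layout like this blocks on lookups against the resolver listed in /etc/resolv.conf, which is where the answers below place the bottleneck.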

TESTING
I tested the first 1 million domains from the database with different numbers of parallel processes. The problem: as the number of processes grows, the number of domains that fail to respond within the 30-second timeout grows sharply. Here are the results.
1. 1000 processes
Test: 1,000,000 domains, 1000 parallel processes, average channel load 85 Mbit/s, total test time 1 hour.
Result: 65% of domains were successfully resolved, 35% were not resolved due to a timeout.

2. 300 processes
Test: 1,000,000 domains, 300 parallel processes, average channel load 70 Mbit/s, total test time 2 hours.
Result: 85% of domains were successfully resolved, 15% were not resolved due to a timeout.

CONCLUSIONS
As you can see, tripling the number of processes does not triple the channel load: the overall check is only about twice as fast, while the share of unavailable/unresolved domains grows sharply.
QUESTION
Where is the bottleneck in such a check? How can I use the full bandwidth of the 500 Mbit/s link? Should I run my own DNS server, and if so, how do I configure it properly?
I would appreciate any ideas and advice!

4 answers
athacker, 2019-03-14
@athacker

Install Unbound; it will perform the recursive queries itself. How to configure it correctly is described in the documentation. In principle the default config is enough: you only need to specify the external/internal interfaces and, in the ACL, allow resolving only from the address 127.0.0.1 (a minimal sketch of such a config is shown below).
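A minimal sketch of what such an unbound.conf might look like; the thread count and cache sizes are illustrative assumptions, not values from the answer:

  # /etc/unbound/unbound.conf -- minimal sketch, values are illustrative
  server:
      interface: 127.0.0.1
      # allow recursion only from this machine
      access-control: 127.0.0.1/32 allow
      access-control: 0.0.0.0/0 refuse
      # more threads and larger caches help bulk resolving
      num-threads: 4
      msg-cache-size: 128m
      rrset-cache-size: 256m

After that, point /etc/resolv.conf at "nameserver 127.0.0.1" so the checking processes use the local resolver instead of 8.8.8.8.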

Vladimir, 2019-03-14
@MechanID

The cause of the problem is that Google DNS is rate-limiting you.
The solution is to use a different resolver, preferably your own recursive DNS, for example as athacker advised.

Sergey, 2019-03-14
@begemot_sun

I have a solution that resolves DNS in bulk through third-party DNS servers. Given a list of about 30k such servers, it can take a 4 GB file (email addresses) and resolve the MX and A records in roughly 40 minutes on an ordinary server with no bells and whistles. If you're interested, it can be adapted to your needs.

neovav, 2019-03-19
@neovav

You first need to decide what result you want to get, and then figure out how it all works.
1. For starters, I would pull the domain data via WHOIS:
https://www.imena.ua/domains/whois?domain=toster.ru
2. Then, for the domain's nameservers, e.g.:
nserver: ns1.habradns.net.
nserver: ns2.habradns.net.
nserver: ns3.habradns.net.
I would check how fresh the records are and their TTL. Based on that data, you can tell how often the information about a domain needs to be refreshed (see the PHP sketch after this answer).
3. After that you can check the site for availability (knowing its name and IP) and see which CMS, services, etc. are installed there.
For this you can use both direct access and indirect access through a search engine's cache.
P.S. For anonymity I would do it through Tor or a VPN, or, at worst, through proxies.
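This is not the answerer's code, just a minimal PHP sketch of step 2 using the built-in dns_get_record() to read a domain's NS records and their TTLs (toster.ru is only an example):

  <?php
  // Minimal sketch for step 2: list the NS records and TTLs of a domain.
  $domain  = 'toster.ru';                      // example domain from the answer
  $records = dns_get_record($domain, DNS_NS);  // array of record arrays, or false on failure

  foreach ($records ?: [] as $rr) {
      // 'target' is the nameserver, 'ttl' is how long the record may be cached.
      printf("%s -> %s (TTL %d s)\n", $rr['host'], $rr['target'], $rr['ttl']);
  }

The TTL returned here is what would drive how often the domain's information gets refreshed.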
