Nginx
Dilik Pulatov, 2020-01-31 09:45:21

How to block specific bots in Nginx?

Good afternoon!
I need to block all bots except search-engine bots and social-network bots; for example, Google, Yandex, and Yahoo bots should be let through. What I mean is that when I paste a link into a social network, the network's bot fetches data from my site (title, description, etc.), and that should keep working.

All the other bots need to be blocked, because they put the server under heavy load and sometimes it crashes. I don't understand how to do this in Nginx; please help (with an example if possible).
I found the configuration below on Stack Overflow, but I was told it blocks search bots as well (its first regex line explicitly matches google, bing, and yandex):

map $http_user_agent $limit_bots {
     default 0;
     ~*(google|bing|yandex|msnbot) 1;
     ~*(AltaVista|Googlebot|Slurp|BlackWidow|Bot|ChinaClaw|Custo|DISCo|Download|Demon|eCatch|EirGrabber|EmailSiphon|EmailWolf|SuperHTTP|Surfbot|WebWhacker) 1;
     ~*(Express|WebPictures|ExtractorPro|EyeNetIE|FlashGet|GetRight|GetWeb!|Go!Zilla|Go-Ahead-Got-It|GrabNet|Grafula|HMView|Go!Zilla|Go-Ahead-Got-It) 1;
     ~*(rafula|HMView|HTTrack|Stripper|Sucker|Indy|InterGET|Ninja|JetCar|Spider|larbin|LeechFTP|Downloader|tool|Navroad|NearSite|NetAnts|tAkeOut|WWWOFFLE) 1;
     ~*(GrabNet|NetSpider|Vampire|NetZIP|Octopus|Offline|PageGrabber|Foto|pavuk|pcBrowser|RealDownload|ReGet|SiteSnagger|SmartDownload|SuperBot|WebSpider) 1;
     ~*(Teleport|VoidEYE|Collector|WebAuto|WebCopier|WebFetch|WebGo|WebLeacher|WebReaper|WebSauger|eXtractor|Quester|WebStripper|WebZIP|Wget|Widow|Zeus) 1;
     ~*(Twengabot|htmlparser|libwww|Python|perl|urllib|scan|Curl|email|PycURL|Pyth|PyQ|WebCollector|WebCopy|webcraw) 1;
}

location / {
  if ($limit_bots = 1) {
    return 403;
  }
}

3 answers

Page-Audit.ru, 2020-01-31
@dilikpulatov

map $http_user_agent $limit_bots {
     default 0;
     # $limit_bots becomes 1 for any User-Agent matching a pattern below;
     # everything else keeps the default 0 and is served normally.
     ~*(AltaVista|BlackWidow|Bot|ChinaClaw|Custo|DISCo|Download|Demon|eCatch|EirGrabber|EmailSiphon|EmailWolf|SuperHTTP|Surfbot|WebWhacker) 1;
     ~*(Express|WebPictures|ExtractorPro|EyeNetIE|FlashGet|GetRight|GetWeb!|Go!Zilla|Go-Ahead-Got-It|GrabNet|Grafula|HMView|Go!Zilla|Go-Ahead-Got-It) 1;
     ~*(rafula|HMView|HTTrack|Stripper|Sucker|Indy|InterGET|Ninja|JetCar|Spider|larbin|LeechFTP|Downloader|tool|Navroad|NearSite|NetAnts|tAkeOut|WWWOFFLE) 1;
     ~*(GrabNet|NetSpider|Vampire|NetZIP|Octopus|Offline|PageGrabber|Foto|pavuk|pcBrowser|RealDownload|ReGet|SiteSnagger|SmartDownload|SuperBot|WebSpider) 1;
     ~*(Teleport|VoidEYE|Collector|WebAuto|WebCopier|WebFetch|WebGo|WebLeacher|WebReaper|WebSauger|eXtractor|Quester|WebStripper|WebZIP|Wget|Widow|Zeus) 1;
     ~*(Twengabot|htmlparser|libwww|Python|perl|urllib|scan|Curl|email|PycURL|Pyth|PyQ|WebCollector|WebCopy|webcraw) 1;
}

location / {
  if ($limit_bots = 1) {
    return 403;
  }
}

I'm running into a similar problem right now.
This option will roughly cut off many (but not all) bots.
If a bot does not announce itself in the User-Agent and instead pretends to be a regular user, this scheme will not stop it. You would have to analyze its behavior, and that can no longer be done at the Nginx level.
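One caveat worth adding (my reading of nginx's map semantics, not part of the original answer): regexes in a map block are tested in the order they appear, and the generic Bot token above matches Googlebot, bingbot, and YandexBot too, so the very crawlers the question wants to allow would still get a 403. A minimal whitelist-first sketch, with an illustrative allowlist and a per-IP rate limit as a safety net; all names and numbers here are assumptions to adapt:

map $http_user_agent $limit_bots {
     default 0;
     # Good crawlers first: nginx tests map regexes in order of
     # appearance, so a match here wins and the request is allowed.
     # (This allowlist is an assumption; extend it as needed.)
     ~*(googlebot|bingbot|yandexbot|facebookexternalhit|twitterbot|telegrambot) 0;
     # Generic catch-alls for scrapers and download tools.
     ~*(bot|crawl|spider|scan|wget|curl|libwww|python|httrack|webzip) 1;
}

# Safety net for bots that fake a browser User-Agent: cap each IP
# at roughly 10 requests per second (the numbers are illustrative).
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
  location / {
    if ($limit_bots = 1) {
      return 403;
    }
    limit_req zone=perip burst=20 nodelay;
  }
}

Rate limiting won't identify a disguised bot, but it at least caps how much load any single client can generate.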

Dr. Bacon, 2020-01-31
@bacon

It is better to optimize your code so that the server does not go down under such small loads.

to_east, 2020-02-02
@to_east

A PHP solution: https://github.com/JayBizzle/Crawler-Detect. But if you need it purely in nginx, it's probably better to install https://openresty.org/ and do it in Lua; that will be easier than wrangling configs, IMHO.
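For the OpenResty route, a minimal sketch of the idea; the denylist below is illustrative (it is not Crawler-Detect's list), and a real setup would also allowlist known good crawlers:

location / {
  access_by_lua_block {
    -- Reject requests whose User-Agent contains a denylisted token.
    local ua = (ngx.var.http_user_agent or ""):lower()
    local blocked = { "httrack", "webzip", "emailsiphon", "blackwidow" }
    for _, token in ipairs(blocked) do
      -- plain-text find (no Lua patterns), hence the 'true' flag
      if ua:find(token, 1, true) then
        return ngx.exit(ngx.HTTP_FORBIDDEN)
      end
    end
  }
  # normal serving/proxying continues below
}

The advantage over the map approach is that the list lives in ordinary Lua data (or can be loaded from a file), and the matching logic can grow beyond what nginx regex maps comfortably express.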
