Nginx
Konstantin [c0nst_float], 2019-05-31 20:46:03

Nginx as reverse proxy: 1% of requests timed out (504). What can be done?

We have:

  1. A project with a load of ~1000-3000 RPS; at peak, up to 5K RPS.
  2. A partner whose API we constantly query for data (one of our main functions). The data is updated very frequently, and one user generates roughly 0.5-2 RPS of load on the partner. This is normal for our use case.
  3. Main server: 16 GB RAM, 16 cores, Ubuntu 16.04.1 LTS, Nginx 1.10.3
  4. 5 "smaller" servers that run our backend
  5. A server with the DB, Redis and RabbitMQ

Simplified, the interaction can be represented as follows:
The main Nginx acts as a reverse proxy: it balances the load across our servers and proxies requests to the partner.
[diagram]

Problem:
Under load, some requests (~1K out of 100K) start to time out.
We performed load testing:
  • If the user requests data from the partner directly, everything is returned quickly and correctly. Here is a link to Yandex.Overload
  • And here is the testing graph when proxying through our main Nginx. On average it is twice as fast thanks to caching, but look at the "100%" quantile ...

Moreover, the problem becomes MUCH more pronounced at peak load.
There is an idea to try another reverse proxy, for example Traefik or even Apache.
But Nginx is a time-tested tool used by hundreds of large companies with loads far higher than ours, so most likely the problem is in our configs and/or incorrectly configured network interaction.
Here are the configuration files (simplified, but with the main settings kept).
nginx.conf (main)
user www-data;
worker_processes 16;
worker_priority -10;
pid /run/nginx.pid;
worker_rlimit_nofile 200000;

events {
  worker_connections 2048;
  use epoll;
}

http {
  sendfile on;
  tcp_nopush off;
  tcp_nodelay on;
  keepalive_timeout 65;
  types_hash_max_size 2048;
  # server_tokens off;

  # server_names_hash_bucket_size 64;
  # server_name_in_redirect off;

  include /etc/nginx/mime.types;
  default_type application/octet-stream;

  ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
  ssl_prefer_server_ciphers on;

  access_log /var/log/nginx/access.log;
  error_log /var/log/nginx/error.log;

  gzip on;
  gzip_disable "msie6";

  # gzip_vary on;
  # gzip_proxied any;
  # gzip_comp_level 6;
  # gzip_buffers 16 8k;
  # gzip_http_version 1.1;
  # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

  upstream backend {
    least_conn;
    ip_hash;
    server localhost:8082;
    server 37.48.123.321:8082;
    # other servers...
  }

  upstream partner_api {
    server WW.XX.YY.ZZ;
  }

  proxy_cache_path /etc/nginx/conf.d/data/cache/line levels=1:2 keys_zone=line_cache:150m max_size=1g inactive=60m;
  proxy_cache_path /etc/nginx/conf.d/data/cache/live levels=1:2 keys_zone=live_cache:150m max_size=1g inactive=60m;

  include /etc/nginx/sites-enabled/*;
}
/etc/nginx/sites-enabled/domain.com.conf
server {
  listen 443 ssl http2;
  server_name domain.com www.domain.com *.domain.com;
  ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;
  ssl_trusted_certificate /etc/letsencrypt/live/domain.com/chain.pem;

  ssl_stapling on;
  ssl_stapling_verify on;

  resolver 127.0.0.1 8.8.8.8;

  add_header Strict-Transport-Security "max-age=31536000";

  include /etc/nginx/conf.d/reverse_porxy.conf;
 
  location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    error_page 500 502 503 504 /50x.html;
  }
}

server
{
    listen 80;
    server_name domain.com www.domain.com;
    location /
    {
        return 301 https://$host$request_uri;
    }
}
/etc/nginx/conf.d/reverse_porxy.conf
location ~ ^/api/remote/(.+) {
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Credentials' 'true';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    add_header 'Access-Control-Allow-Headers' 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
    
    include /etc/nginx/conf.d/reverse_proxy_settings.conf;
    proxy_pass http://partner_api/WebApiG/WebServices/BCService.asmx/$1;
}

location ~ ^/api/bs2/remote/api/line/(.+) {
  include /etc/nginx/conf.d/reverse_proxy_settings.conf;
  proxy_cache_valid any 2m;
  proxy_pass http://partner_api/WebApiG/api/line/$1;
  proxy_cache line_cache;
  proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
}

location ~ ^/api/bs2/remote/api/live/(.+) {
  include /etc/nginx/conf.d/reverse_proxy_settings.conf;
  proxy_pass http://partner_api/WebApiG/api/live/$1;
  proxy_cache live_cache;
  proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
}
# etc...
/etc/nginx/conf.d/reverse_proxy_settings.conf
proxy_buffering on;
proxy_ignore_headers "Cache-Control" "Expires";
expires max;
proxy_buffers 32 4m;
proxy_busy_buffers_size 25m;
proxy_max_temp_file_size 0;
proxy_cache_methods GET;
proxy_cache_valid any 3s;
proxy_cache_revalidate off;
proxy_set_header Host "PARTNER_IP";
proxy_set_header Origin "";
proxy_set_header Referer "";
proxy_set_header Cookie '.TOKEN1=$cookie__TOKEN1; ASP.NET_SessionId=$cookie_ASP_NET_SessionId;';
proxy_hide_header 'Set-Cookie';
proxy_http_version 1.1;

What can you advise? Maybe set up a VPN tunnel to the partner, or actually replace Nginx with something else? (Increasing the timeout, as advised everywhere, is not an option in our case.)

5 answers

Konstantin [c0nst_float], 2019-06-11
@c0nst_float

The solution was to increase the number of nginx workers and connections:

user www-data;
worker_processes 32;
worker_priority -10;
pid /run/nginx.pid;
worker_rlimit_nofile 200000;

events {
        worker_connections 4096;
        use epoll;
}
...

And most importantly: enabling keep-alive for the upstream:

upstream partner_api {
    server WW.XX.YY.ZZ;

    keepalive 512;
}

It also turns out that setting proxy_http_version 1.1; alone does not enable keep-alive connections to the upstream. You also need to set proxy_set_header Connection "";
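Putting it together, the upstream and a proxy location that actually reuses the connections look roughly like this (a sketch; the location path here is only an illustration):

upstream partner_api {
    server WW.XX.YY.ZZ;

    keepalive 512;    # pool of idle keep-alive connections kept open to the partner
}

location /api/remote/ {
    proxy_http_version 1.1;          # keep-alive to the upstream requires HTTP/1.1
    proxy_set_header Connection "";  # clear the "Connection: close" header nginx adds by default
    proxy_pass http://partner_api;
}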
You can play around with all the numbers and reach the desired result.
Previously, 5-6 requests per second were failing for us. Requests still fail, but much less often: about 1 request every 10-15 minutes. The current result is satisfactory for now; we will keep tuning to get it as low as possible.

ky0, 2019-05-31
@ky0

Increase the timeout, and then investigate in a calm environment, without requests failing. Is the network subsystem on your Ubuntu boxes properly configured and tuned?

grinat, 2019-05-31
@grinat

You need to reduce buffering in reverse_porxy.conf or increase the read timeout in domain.com.conf (60 seconds by default). By default nginx receives the whole response from the upstream and only then passes it on. I suspect that under load it does not manage to receive the data in time and the read timeout fires, since not a single byte has arrived. When the client connects directly there is no nginx in between, no read timeout applies, and everything is fine.
https://nginx.org/ru/docs/http/ngx_http_proxy_modu...
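If you want to try that, the relevant directives look roughly like this (a sketch against the existing live location; the values are only examples, and note that turning buffering off also disables proxy_cache for that location):

location ~ ^/api/bs2/remote/api/live/(.+) {
    proxy_buffering off;       # stream the partner's response instead of buffering it first
    proxy_read_timeout 120s;   # default is 60s
    proxy_pass http://partner_api/WebApiG/api/live/$1;
}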

klepiku, 2019-06-03
@klepiku

Re-crimp the network cables and check the links between the machines.

Roman, 2019-06-06
@alone_lion1987

Increase the timeout in the domain config; the default is quite small. After that you can investigate why the request takes so long to process, or do further optimization (if necessary).
Add to /etc/nginx/sites-enabled/domain.com.conf, inside the location / section:

fastcgi_read_timeout 600;  # wait up to 600 s for processing, for example
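Since location / in this config uses proxy_pass rather than FastCGI, the equivalent directive there would presumably be proxy_read_timeout, roughly:

location / {
    proxy_read_timeout 600s;   # wait up to 600 s for the backend response, for example
    proxy_pass http://backend;
}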
