tartarelin, 2017-03-10 14:52:51
Debian

How should I interpret a simple check in Zabbix for hosts with and without an agent?

There is a web server running Debian that hosts several sites and has the Zabbix agent installed; the server itself is in a data center.
On the Zabbix server, which is in the office, I linked the ICMP Ping template to it and watched for a day. The picture is not great: on average 0.72% of packets are lost.
I then added one of the sites hosted on this web server as a separate host and linked the ICMP Ping template to it. Over an hour it showed 0% ICMP loss, whereas the graph for the web server with the agent showed 0.56% for that same hour.
How should I interpret this?


1 answer(s)
Vasily, 2017-03-10
@tartarelin

I wouldn't be surprised at all if the agent is simply buggy. I had exactly the same situation with my servers. In practice, nobody noticed anything and everything worked fine. "OK," I thought, and stopped worrying about it. The average loss was about the same as yours. Besides, it is perfectly normal for a packet to be lost along the way now and then, for example when global routes are being rebuilt and the packet lands on an inactive node. There can be many causes. 99.28% availability is good availability, more than good. Almost no one can deliver the notorious 99.99%, even in theory, because it is very expensive. Very, very expensive.
For example, we take 2-3 data centers, obtain an AS number for BGP, set up a cluster file system between the data centers, and set up BGP with 2-5 providers for each data center. All this costs several million, if not tens of millions, of rubles, yet availability will still be only about 99.98%. Equipment can glitch or get overloaded, and although switching between cluster nodes takes milliseconds, there is still a good chance of losing some packets. Or things can stall again while global routes are being rebuilt, or the client's DNS server can lag and fail to resolve the address in time; the site will not open for the client, and he will blame you. The possibilities are endless.
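To put these percentages in perspective, here is a rough sketch that converts them into seconds of unreachability per day. It assumes, as a simplification, that the share of lost ICMP packets can be read directly as the share of time the host was unreachable; the function names are made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def availability_from_loss(loss_percent: float) -> float:
    """Treat packet loss as unavailability (a simplification)."""
    return 100.0 - loss_percent


def downtime_seconds(availability_percent: float,
                     period_seconds: float = SECONDS_PER_DAY) -> float:
    """Seconds of the period that fall outside the availability figure."""
    return (1.0 - availability_percent / 100.0) * period_seconds


# Figures taken from the thread: 0.72% observed loss, the 99.98% cluster
# estimate, and the "notorious" 99.99% target.
figures = {
    "observed (0.72% loss)": availability_from_loss(0.72),
    "multi-DC cluster": 99.98,
    "four nines": 99.99,
}

for label, availability in figures.items():
    print(f"{label}: {availability:.2f}% -> "
          f"{downtime_seconds(availability):.1f} s per day")
```

Under that assumption, 99.28% corresponds to roughly ten minutes per day spread over many single lost packets, while 99.98% and 99.99% leave only about 17 and 9 seconds per day.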
Your availability is absolutely fine. Don't even pay attention to losses like these. As a last resort, try to catch the moment of a failure and log in to the server right then. If the login fails and then immediately succeeds, the issue is most likely in routing, which you cannot fix yourself, and it is unlikely anyone else will fix it either. If you cannot log in for a longer time (20, 30, 60 seconds), then most likely the problem is with the server, and the hoster should sort it out. They may also have problems with their upstream provider, and so on.
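If you want to automate the "catch the moment of failure" check, something like the following could work. It is only a sketch: it assumes that "logging in" can be approximated by opening a TCP connection to the SSH port (22), and the 20-second threshold is taken from the rule of thumb above. The names `tcp_probe`, `outages`, and `classify` are invented for this example and have nothing to do with Zabbix itself.

```python
import socket
from typing import Iterable, List, Tuple


def tcp_probe(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    Port 22 is an assumption (SSH); use whatever port you normally log in on.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outages(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Group consecutive failed probes into (start, duration) pairs.

    samples are (timestamp_seconds, ok) tuples in time order. An outage runs
    from the first failed probe to the next successful one; an outage still
    in progress at the end of the samples is not reported.
    """
    result = []
    start = None
    for timestamp, ok in samples:
        if not ok and start is None:
            start = timestamp
        elif ok and start is not None:
            result.append((start, timestamp - start))
            start = None
    return result


def classify(duration: float, threshold: float = 20.0) -> str:
    """Apply the rule of thumb: short blips are routing, long ones the server."""
    if duration >= threshold:
        return "sustained: likely the server or the hoster"
    return "transient: likely routing"


# Synthetic probe log (no network access needed): one 5 s blip, one 30 s outage.
log = [(0, True), (5, False), (10, True), (15, False), (20, False), (45, True)]
for start, duration in outages(log):
    print(f"outage at t={start}s lasting {duration}s -> {classify(duration)}")
```

In real use you would call `tcp_probe` every few seconds, append `(time.time(), result)` to the log, and look at the classified outages next to the loss graph in Zabbix.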
