And how do you monitor a large number of servers?
Good night everyone,
For a long time I have been wondering how the specialists on Habr monitor a large number of servers (> 50). Please share your experience.
We use the WhatsUp and PRTG monitoring systems, self-written scripts, SNMP, etc. All of this is certainly convenient, but many problem areas related to hardware are missed, for example a failed power supply or a problem with one of the disks in a hardware RAID. These can of course be tied into monitoring with scripts, but that is too clumsy (which is basically how it works now), since we have different operating systems and different hardware.
How would you centralize all this?
I settled on Zabbix. It has a fairly user-friendly interface and many built-in triggers, lets you create your own, and can be tied to almost any hardware. And it is free, which is impressive given that functionality.
Zabbix. It monitors HTTP/HTTPS, SNMP, and the execution time of database queries perfectly well (or anything else you have enough imagination to write a script for).
Nice records, good alerts and reports.
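The query-timing idea mentioned above could look roughly like this. It is only a sketch: `time_query` is a hypothetical helper, and an in-memory SQLite database stands in for the real one so the script runs anywhere. The script prints a single number, which an agent could then collect, for example through a `UserParameter` line in the Zabbix agent config (the item key you give it is your own choice).

```python
import sqlite3
import time


def time_query(conn, sql):
    """Return the wall-clock seconds taken to run sql and fetch all rows."""
    start = time.perf_counter()
    conn.execute(sql).fetchall()
    return time.perf_counter() - start


def main():
    # Stand-in database (assumption); a real check would connect to the
    # production DB with its own driver instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    # Print one number only, so the monitoring agent can store it as-is.
    print(f"{time_query(conn, 'SELECT COUNT(*) FROM t'):.6f}")


if __name__ == "__main__":
    main()
```

Keeping the output to one bare value means thresholds and graphs can be defined on the monitoring side rather than in the script.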
I advise you to look at the Nagios distribution called CheckMK. It is developed by Germans, who have already rewritten almost everything themselves. The interface is excellent and clear, and it is easy to set up. It has its own passive agent with pre-installed checks for many services. It is optimized for highload (all sorts of NoSQL stores, caches, etc. are already bundled and work).
Zabbix, but you will still have to write scripts for specific things. The bonus is that everything will be in one interface.
We use Nagios. Scripts are easy to write; I write them in Ruby for our needs. We monitor MS SQL, DB2, backups... So far I haven't found a task that could not be solved.
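To illustrate why such scripts are easy to write: a Nagios check only has to print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch in Python rather than the Ruby mentioned above; the disk-usage check, the thresholds and the function names are assumptions chosen for illustration, not the author's scripts.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn, crit):
    """Map a usage percentage to a plugin state using the two thresholds."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    state = evaluate(used_pct, warn, crit)
    # Text after "|" is performance data: label=value;warn;crit
    line = (f"DISK {LABELS[state]} - {path} {used_pct:.1f}% used"
            f" | used_pct={used_pct:.1f}%;{warn};{crit}")
    return state, line


def main():
    code, line = check_disk()
    print(line)
    return code
```

To run it as a plugin, the file would end with `import sys` and `sys.exit(main())`, so that the exit code reaches the scheduler. The same shape works for hardware checks: replace `check_disk` with whatever reads the RAID controller or power supply status on that platform.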
I use Ganglia to monitor quantitative metrics. For event monitoring, Shinken or Icinga, plus centralized log collection in logstash + elasticsearch. It looks complicated, but for systems of more than 50 machines that need monitoring of hardware, network nodes, etc., there is no easy way, in my opinion.