Currently, the Infrastructure team uses Nagios, an open source monitoring application. It handles system, network, and infrastructure monitoring, and collects and plots a variety of metrics. Nagios also sends email alerts to keep the team informed about warnings and errors. On the Zerion servers, Nagios runs multiple System and Application Checks. System Checks, such as load, CPU utilization, memory, and disk space checks, ensure that the servers aren’t running out of memory or disk space. Nagios also has some good database plugins for monitoring our database clusters. These database checks track availability, database and table sizes, cache ratios, and other key metrics. We also use Nagios to monitor our queuing engine and big data databases.
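To make the idea of a System Check concrete, here is a minimal sketch of a disk space check written in the style of a Nagios plugin. It relies on the conventional plugin status codes (0 for OK, 1 for WARNING, 2 for CRITICAL). The function names and the 80% and 90% thresholds are assumptions chosen for illustration; they are not the checks or thresholds the team actually runs.

```python
import shutil

# Conventional Nagios plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_usage(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a status code (thresholds are illustrative)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status, message) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total
    status = classify_usage(used_percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {used_percent:.1f}% used on {path}"
```

A real plugin would print the message and exit with the status code, which is how the monitoring system decides whether to raise an alert.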
But as Zerion passes the 10 million transactions per month mark, the need for more servers and better server utilization becomes increasingly important. Capacity planning will help us stay ahead of the usage curve and give our users a better experience with our products. This is where we see the shortfalls in Nagios. With Nagios we do generate graphs for plotting metrics, but the process is cumbersome, runs on a single server, and could become a single point of failure. Hence, we have taken another step in the direction of monitoring and capacity planning by adopting Sensu and Grafana.
Sensu is an open source monitoring software application and is very similar to Nagios.
One of the main reasons the Infrastructure team is migrating to Sensu is its ability to configure standalone checks, which fixes important problems that persist with Nagios.
Currently, with Nagios, the monitoring server has to constantly send out check requests to the client servers. This puts an extremely high load on the monitoring server, which can cause Nagios to malfunction. Sensu fixes this by adding a queuing engine in the middle: client servers send their data to the queue, and the monitoring server reads all the data from it.
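The flow described above can be sketched in a few lines. This is a simplified model, not Sensu's actual implementation: an in-process `queue.Queue` stands in for the queuing engine, and the check functions, the `run_check` and `drain` helpers, and the result fields are all hypothetical names chosen for illustration.

```python
import json
import queue
import time

# Stand-in for the queuing engine that sits between clients and the server.
results = queue.Queue()


def run_check(name, check_fn):
    """Client side: run a check locally and publish its result to the queue."""
    status, output = check_fn()
    results.put(json.dumps({
        "check": name,
        "status": status,
        "output": output,
        "executed": int(time.time()),
    }))


def drain(handler):
    """Server side: consume every queued result; return how many were handled."""
    handled = 0
    while True:
        try:
            message = results.get_nowait()
        except queue.Empty:
            return handled
        handler(json.loads(message))
        handled += 1


# Hypothetical checks returning (status, output); 0 is OK, 2 is CRITICAL.
def hypothetical_load_check():
    return 0, "load OK"


def hypothetical_disk_check():
    return 2, "disk CRITICAL"


alerts = []


def collect_alerts(result):
    """Record the name of any check that did not report OK."""
    if result["status"] != 0:
        alerts.append(result["check"])


run_check("load", hypothetical_load_check)
run_check("disk", hypothetical_disk_check)
handled = drain(collect_alerts)
```

The point of the design is that the monitoring server never initiates the checks. It only reads whatever results have arrived, so its workload depends on how fast it consumes messages rather than on how many clients it has to poll.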
Another problem with sending check requests to the clients is that some of the checks contain sensitive information and must be protected when sent over the internet. Although highly unlikely, there is a chance that these checks can be intercepted. With standalone checks, the check is defined and scheduled on the client itself, so this security concern will be eliminated as nothing sensitive is transferred over the internet.
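As a rough illustration, the snippet below builds a standalone check definition and renders it as JSON. It assumes the classic Sensu JSON configuration style, in which a check carries `command`, `interval`, and `standalone` attributes; the Sensu version in use is not stated here, so treat the layout as an assumption. The helper name, check name, and script path are hypothetical.

```python
import json


def standalone_check(name, command, interval_seconds):
    """Build a check definition that the client schedules and runs on its own."""
    return {
        "checks": {
            name: {
                "command": command,            # stays on the client machine
                "standalone": True,            # client schedules the check itself
                "interval": interval_seconds,  # seconds between executions
            }
        }
    }


# Hypothetical check script and thresholds, for illustration only.
definition = standalone_check(
    "disk_usage",
    "/opt/checks/hypothetical_check_disk.sh -w 80 -c 90",
    60,
)
rendered = json.dumps(definition, indent=2, sort_keys=True)
```

Because the command lives in a file on the client, any credentials embedded in it never leave that machine; only the check result travels to the queue.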
As Zerion grows, so will the need for more servers, and more requests will be sent over the internet, placing a greater load on the Nagios monitoring server. With Sensu, not only will these issues be solved, but the Zerion team will also be able to use a monitoring system that is highly scalable.