Performance graph problems in Nagios and how to troubleshoot them

Performance graph problems in Nagios

Performance graphs in Nagios may stop displaying data even though your checks are returning performance data.

When the 'performance data' feature is enabled, Nagios builds performance graphs that are updated automatically each time a check runs.

Each check returns performance data, and the results are stored in RRD databases.

The data sources in an RRD database are fixed once the database exists. After a Nagios check is updated, however, the number or the names of the data sources in its results may change. When that happens, the existing RRD database no longer matches the check output and the performance graph stops updating.

How to fix performance graph problems in Nagios?

Here is a systematic approach to troubleshooting performance graph problems in Nagios.
Work through the following steps to fix the issue.

1) Ensure that Performance Data is enabled

First, make sure that Performance Data is enabled.

To do this, navigate to Admin > System Information > Monitoring Engine Status.

Make sure that the Performance Data process is green.

2) Calculate The Number Of Spooled Files

Nagios spools performance data into small files. If processing of those files stops, they start to pile up in the spool directories.

The following commands will count the number of files:

# ls /usr/local/nagios/var/spool/perfdata/ | wc -l

# ls /usr/local/nagios/var/spool/xidpe/ | wc -l

If either count is greater than 20000, the processing has most likely become stuck in a loop, and the spooled files need to be deleted.
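The two counts and the threshold comparison can be combined into one small script. This is a minimal sketch: the function name `count_spool_files` is hypothetical, and the 20000 threshold is simply the figure quoted above.

```shell
#!/bin/sh
# Count regular files directly inside a spool directory.
# find is used instead of ls so that very large directories are handled
# without sorting the whole listing first.
count_spool_files() {
    find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Threshold taken from the text above.
THRESHOLD=20000

for dir in /usr/local/nagios/var/spool/perfdata /usr/local/nagios/var/spool/xidpe; do
    count=$(count_spool_files "$dir")
    if [ "$count" -gt "$THRESHOLD" ]; then
        echo "$dir: $count files (above $THRESHOLD, processing is likely stuck)"
    else
        echo "$dir: $count files"
    fi
done
```

A missing or unreadable directory is reported as 0 files, so the script can be run safely before the paths have been confirmed.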

To remove a large number of files from the directory, run this command:

# find /usr/local/nagios/var/spool/perfdata/ -type f -delete

After deleting the files, wait about thirty minutes to see whether the performance graphs start working again.
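Instead of watching the graphs by eye, one option is to check whether any RRD file has been written to recently. The sketch below assumes the RRD files live under /usr/local/nagios/share/perfdata, a location this article does not state, so adjust it to your installation. The function name `count_recent_rrds` is hypothetical.

```shell
#!/bin/sh
# Count .rrd files under a directory (recursively) that were modified
# within the last N minutes.
count_recent_rrds() {
    find "$1" -type f -name '*.rrd' -mmin "-$2" 2>/dev/null | wc -l | tr -d ' '
}

# Assumed location of the RRD databases; adjust as needed.
RRD_DIR=/usr/local/nagios/share/perfdata

echo "RRD files updated in the last 30 minutes: $(count_recent_rrds "$RRD_DIR" 30)"
```

A count of 0 after the waiting period suggests that performance data is still not being processed.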

3) Increase Performance Data Logging Verbosity

If deleting the spooled files doesn't help, increase the performance data logging verbosity.

Edit the following file from an SSH session and change the LOG_LEVEL value from 0 to 2:

/usr/local/nagios/etc/pnp/process_perfdata.cfg
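The edit can also be scripted rather than made by hand. This sketch assumes the file uses `KEY = value` lines (for example `LOG_LEVEL = 0`). The helper name `set_cfg_value` is hypothetical. A `.bak` copy of the file is kept next to the original.

```shell
#!/bin/sh
# Set KEY to VALUE in a "KEY = value" style config file.
# sed -i.bak edits in place and leaves the previous version as FILE.bak.
set_cfg_value() {
    file=$1 key=$2 value=$3
    sed -i.bak "s/^\([[:space:]]*$key[[:space:]]*=[[:space:]]*\).*/\1$value/" "$file"
}

CFG=/usr/local/nagios/etc/pnp/process_perfdata.cfg
if [ -f "$CFG" ]; then
    set_cfg_value "$CFG" LOG_LEVEL 2
    grep '^LOG_LEVEL' "$CFG"   # show the resulting line
fi
```

The same helper can later restore the original value, for example `set_cfg_value "$CFG" LOG_LEVEL 0`.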

Now the process_perfdata.pl script should log all errors and debug information to the file /usr/local/nagios/var/perfdata.log.

You can watch it by using the following command:

# tail -f /usr/local/nagios/var/perfdata.log

Watch for any errors, wrong exit codes, and/or timeouts.
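When the log is busy, filtering it can be easier than reading the live tail. The patterns below are only guesses, since this article does not show the exact wording of the log messages. The function name `scan_perf_log` is hypothetical.

```shell
#!/bin/sh
# Print log lines that mention an error or a timeout (case-insensitive).
# The search patterns are assumptions; extend them to match your log.
scan_perf_log() {
    grep -iE 'error|timeout' "$1"
}

LOG=/usr/local/nagios/var/perfdata.log
if [ -r "$LOG" ]; then
    # Show only the 20 most recent matching lines.
    scan_perf_log "$LOG" | tail -n 20
fi
```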

When you have finished troubleshooting, remember to return this value to its default setting.

A common error in this log is a timeout. As a temporary workaround, you can give the performance data processor more time by raising the TIMEOUT value in the process_perfdata.cfg file.

4) Increase NPCD Logging Verbosity

NPCD is a mass processing tool that collects and processes the performance data.

Edit the following file in an SSH session and change the log_level value from 0 to -1 to increase its logging verbosity:

/usr/local/nagios/etc/pnp/npcd.cfg

Then restart the NPCD service so that the change takes effect.
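The article does not name the exact restart command. On systemd-based hosts it is typically `systemctl restart npcd`, and on older SysV-style hosts `service npcd restart`. The service name `npcd` is an assumption here, so check your installation. The sketch below only prints the command it would pick, so it can be reviewed before being run as root. The function name `npcd_restart_cmd` is hypothetical.

```shell
#!/bin/sh
# Pick a restart command based on what the host provides.
# The service name "npcd" is an assumption; verify it on your system.
npcd_restart_cmd() {
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl restart npcd"
    else
        echo "service npcd restart"
    fi
}

# Print the command instead of running it.
echo "Run as root: $(npcd_restart_cmd)"
```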

NPCD should now log all errors and debug data to the file /usr/local/nagios/var/npcd.log. When you have finished troubleshooting, remember to return log_level to its default setting.

You can watch the log using the following command:

# tail -f /usr/local/nagios/var/npcd.log

A common error in this log indicates that you are hitting the load threshold.

You can increase this threshold by editing the following file and adjusting the load_threshold value to a higher one:

/usr/local/nagios/etc/pnp/npcd.cfg
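Before raising the value, it can help to compare the configured threshold with the current system load. This is a rough sketch that assumes npcd.cfg uses `key = value` lines and that the host exposes /proc/loadavg (Linux). The helper names `read_cfg_value` and `load_exceeds` are hypothetical, and the sketch does not account for any special meaning a particular threshold value may have.

```shell
#!/bin/sh
# Read the value of KEY from a "key = value" config file (last match wins).
# Commented-out lines are ignored because their key starts with "#".
read_cfg_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[[:space:]]/, "", k) }
        k == key { v = $2; gsub(/[[:space:]]/, "", v); found = v }
        END { print found }
    ' "$1"
}

# Succeed (exit status 0) when LOAD is greater than LIMIT.
# awk does the comparison because sh has no floating-point arithmetic.
load_exceeds() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

CFG=/usr/local/nagios/etc/pnp/npcd.cfg
if [ -r "$CFG" ] && [ -r /proc/loadavg ]; then
    limit=$(read_cfg_value "$CFG" load_threshold)
    load=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average
    if [ -n "$limit" ] && load_exceeds "$load" "$limit"; then
        echo "load $load is above load_threshold $limit"
    fi
fi
```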

5) Check Nagios User Account

In some cases the Nagios user account expires, which can cause problems like this.
Run the following command to check whether the nagios user account has expired:

# chage -l nagios
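Only two lines of that output matter for this check. The sketch below filters them out, assuming the English-language output format of chage from shadow-utils (lines such as `Account expires : never`). The function name `expiry_fields` is hypothetical.

```shell
#!/bin/sh
# Extract the "Password expires" and "Account expires" lines from
# chage -l output supplied on standard input, trimming the padding
# that chage inserts before the colon.
expiry_fields() {
    awk -F': *' '/^(Account|Password) expires/ {
        gsub(/[[:space:]]+$/, "", $1)
        print $1 ": " $2
    }'
}

# Listing another user's ageing information normally requires root.
if command -v chage >/dev/null 2>&1 && id nagios >/dev/null 2>&1; then
    chage -l nagios 2>/dev/null | expiry_fields
fi
```

A date in the past on either line means the account or its password has expired.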

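You can re-enable an expired nagios user account with the command below. It disables password inactivity (-I -1), sets the minimum and maximum password age (-m 0, -M 99999), and removes the account expiry date (-E -1):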

# chage -I -1 -m 0 -M 99999 -E -1 nagios
