13.4. Configuring Nagios to Monitor Localhost

Problem

You've successfully installed Nagios, configured Apache, and set up your configuration files in an orderly manner as outlined in the previous recipe. Reading the local Nagios documentation at http://localhost/nagios is nice, but you really want to get going on setting up Nagios to keep an untiring eye on your network. What's the next step?

Solution

Nagios is best set up in small steps, so we'll start with monitoring five basic functions on the Nagios server: ping, disk usage, local users, total processes, and CPU load. This is a long recipe, but when you're finished, you'll have your basic Nagios framework constructed.

Copy the following five configuration files exactly as shown, except where it says to use your own information, and put them in the directories as outlined in the previous recipe:

/usr/local/nagios/etc/nagios.cfg
/usr/local/nagios/etc/lan_objects/timeperiods.cfg
/usr/local/nagios/etc/lan_objects/contacts.cfg
/usr/local/nagios/etc/lan_objects/hosts.cfg
/usr/local/nagios/etc/lan_objects/services.cfg

Obviously, retyping all this is the path to madness, so please visit http://www.oreilly.com/catalog/9780596102487 to download them.
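Whether you download the files or type them in, a quick check can confirm that all five landed where this recipe expects them. The sketch below assumes the /usr/local/nagios/etc layout used throughout this chapter. missing_cfg_files is a hypothetical helper, not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: print each of the five recipe files that is missing
# under the given etc directory (default: /usr/local/nagios/etc).
missing_cfg_files() {
    etc=${1:-/usr/local/nagios/etc}
    for f in nagios.cfg \
             lan_objects/timeperiods.cfg \
             lan_objects/contacts.cfg \
             lan_objects/hosts.cfg \
             lan_objects/services.cfg; do
        [ -f "$etc/$f" ] || echo "missing: $etc/$f"
    done
}

# Only look at the real tree if it exists on this machine.
if [ -d /usr/local/nagios/etc ]; then
    missing_cfg_files /usr/local/nagios/etc
fi
```

No output means all five files are in place.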

First, create nagios.cfg:

	################
	# nagios.cfg
	# main Nagios configuration file
	################
	log_file=/usr/local/nagios/var/nagios.log
	cfg_dir=/usr/local/nagios/etc/lan_objects
	object_cache_file=/usr/local/nagios/var/objects.cache
	resource_file=/usr/local/nagios/etc/resource.cfg
	status_file=/usr/local/nagios/var/status.dat

	nagios_user=nagios
	nagios_group=nagios

	check_external_commands=1
	command_check_interval=-1
	command_file=/usr/local/nagios/var/rw/nagios.cmd

	comment_file=/usr/local/nagios/var/comments.dat
	downtime_file=/usr/local/nagios/var/downtime.dat
	lock_file=/usr/local/nagios/var/nagios.lock
	temp_file=/usr/local/nagios/var/nagios.tmp
	event_broker_options=-1

	log_rotation_method=d
	log_archive_path=/usr/local/nagios/var/archives
	use_syslog=1
	log_notifications=1
	log_service_retries=1

	log_host_retries=1
	log_event_handlers=1
	log_initial_states=0
	log_external_commands=1
	log_passive_checks=1

	service_inter_check_delay_method=s
	max_service_check_spread=30
	service_interleave_factor=s
	host_inter_check_delay_method=s
	max_host_check_spread=30
	
	max_concurrent_checks=0
	service_reaper_frequency=10
	auto_reschedule_checks=0
	auto_rescheduling_interval=30
	auto_rescheduling_window=180

	sleep_time=0.25
	service_check_timeout=60
	host_check_timeout=30
	event_handler_timeout=30
	notification_timeout=30

	ocsp_timeout=5
	perfdata_timeout=5
	retain_state_information=1
	state_retention_file=/usr/local/nagios/var/retention.dat
	retention_update_interval=60

	use_retained_program_state=1
	use_retained_scheduling_info=0
	interval_length=60
	use_aggressive_host_checking=0
	execute_service_checks=1

	accept_passive_service_checks=1
	execute_host_checks=1
	accept_passive_host_checks=1
	enable_notifications=1
	enable_event_handlers=1

	process_performance_data=0
	obsess_over_services=0
	check_for_orphaned_services=0
	check_service_freshness=1
	service_freshness_check_interval=60

	check_host_freshness=0
	host_freshness_check_interval=60
	aggregate_status_updates=1
	status_update_interval=15
	enable_flap_detection=0

	low_service_flap_threshold=5.0
	high_service_flap_threshold=20.0
	low_host_flap_threshold=5.0
	high_host_flap_threshold=20.0
	date_format=us

	p1_file=/usr/local/nagios/bin/p1.pl
	illegal_object_name_chars=`~!$%^&*|'"<>?,()=
	illegal_macro_output_chars=`~$&|'"<>
	use_regexp_matching=0
	use_true_regexp_matching=0

	admin_email=nagios
	admin_pager=pagenagios
	daemon_dumps_core=0
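Several of these settings point at files that Nagios creates at runtime, such as the command_file in var/rw and the logs rotated into var/archives. The directories that hold them therefore need to exist beforehand. Here is a sketch of a check. It assumes that each directive starts at the beginning of its line in the real file (the listing above is indented only for display). check_cfg_paths is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: for each absolute path set by a *_file, *_path or
# *_dir directive in a nagios.cfg, report the directory if it is missing.
check_cfg_paths() {
    grep -E '^[a-z0-9_]+(_file|_path|_dir)=/' "$1" |
    while IFS='=' read -r key value; do
        case $key in
            # cfg_dir and log_archive_path name directories themselves
            *_dir|*_path) dir=$value ;;
            # everything else names a file, so its parent directory must exist
            *) dir=$(dirname "$value") ;;
        esac
        [ -d "$dir" ] || echo "missing directory for $key: $dir"
    done
}

if [ -f /usr/local/nagios/etc/nagios.cfg ]; then
    check_cfg_paths /usr/local/nagios/etc/nagios.cfg
fi
```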

Now, create timeperiods.cfg:

	# Time periods
	# All times are valid for all
	# checks and notifications

	define timeperiod{
	        timeperiod_name 24x7
	        alias           24 Hours A Day, 7 Days A Week
	        sunday          00:00-24:00
	        monday          00:00-24:00
	        tuesday         00:00-24:00
	        wednesday       00:00-24:00
	        thursday        00:00-24:00
	        friday          00:00-24:00
	        saturday        00:00-24:00
	        }

Next, create contacts.cfg. The contact_name must be a Nagios user with a login in htpasswd.users, and the email address must be a real account. Substitute your own:

	################
	# Contacts- individuals and groups
	################
	define contact{
	        contact_name                    nagios
	        alias                           Nagios Admin
	        service_notification_period     24x7
	        host_notification_period        24x7
	        service_notification_options    w,u,c,r
	        host_notification_options       d,r
	        service_notification_commands   notify-by-email
	        host_notification_commands      host-notify-by-email
	        email                           nagios@alrac.net
	        }

	# contact groups
	# Nagios only talks to contact groups, not individuals
	# members must be contacts defined above; alias and
	# contactgroup_name are whatever you want

	define contactgroup{
	        contactgroup_name      admins
	        alias                  Nagios Administrators
	        members                nagios
	        }
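The requirement that every contact_name have a web login is easy to break when you add contacts later. This hypothetical helper lists every contact_name in contacts.cfg that has no entry in the htpasswd file. The htpasswd.users location below is an assumption, so adjust it to match the previous recipe.

```shell
#!/bin/sh
# Hypothetical helper: print every contact_name in a contacts file that has
# no matching "name:" line in an Apache htpasswd file.
contacts_without_login() {
    awk '$1 == "contact_name" { print $2 }' "$1" |
    while read -r name; do
        grep -q "^$name:" "$2" || echo "$name"
    done
}

# Assumed locations; adjust to your layout.
CONTACTS=/usr/local/nagios/etc/lan_objects/contacts.cfg
HTPASSWD=/usr/local/nagios/etc/htpasswd.users
if [ -f "$CONTACTS" ] && [ -f "$HTPASSWD" ]; then
    contacts_without_login "$CONTACTS" "$HTPASSWD"
fi
```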

Next, create hosts.cfg:

	################
	# Hosts file- individual hosts and host groups
	################
	# Generic host definition template - This is NOT a real host, just a template!

	define host{
	        name                            generic-host
	        notifications_enabled           1
	        event_handler_enabled           1
	        flap_detection_enabled          1
	        failure_prediction_enabled      1
	        process_perf_data               1
	        retain_status_information       1
	        retain_nonstatus_information    1
	; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL HOST, JUST A TEMPLATE!
	        register                        0
	        }
	# local host definition

	define host{
	        use                     generic-host
	        host_name               localhost
	        alias                   Nagios Server
	        address                 127.0.0.1
	        check_command           check-host-alive
	        max_check_attempts      10
	        check_period            24x7
	        notification_interval   120
	        notification_period     24x7
	        notification_options    d,r
	        contact_groups          admins
	        }

	##############
	# Host groups
	##############

	# Every host must belong to a host group

	define hostgroup{
	        hostgroup_name  test
	        alias           Test Servers
	        members         localhost
	        }

Finally, create services.cfg:

	################
	# Services
	################

	# Generic service definition template - This is NOT a real service, just a template!

	define service{
	        name                            generic-service
	        active_checks_enabled           1
	        passive_checks_enabled          1
	        parallelize_check               1
	        obsess_over_service             1
	        check_freshness                 0
	        notifications_enabled           1
	        event_handler_enabled           1
	        flap_detection_enabled          1
	        failure_prediction_enabled      1
	        process_perf_data               1
	        retain_status_information       1
	        retain_nonstatus_information    1
	; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
	        register                        0
	        }

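	# Define a service to "ping" the local machine. Warning if the average
	# round-trip time reaches 100 ms or packet loss reaches 20%; critical
	# at 500 ms or 60%.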

	define service{
	        use                             generic-service
	        host_name                       localhost
	        service_description             PING
	        is_volatile                     0
	        check_period                    24x7
	        max_check_attempts              4
	        normal_check_interval           5
	        retry_check_interval            1
	        contact_groups                  admins
	        notification_options            w,u,c,r
	        notification_interval           960
	        notification_period             24x7
	        check_command                   check_ping!100.0,20%!500.0,60%
	        }

	# Define a service to check the disk space of the root partition
	# on the local machine. Warning if < 20% free, critical if
	# < 10% free space on partition.

	define service{
	        use                            generic-service
	        host_name                      localhost
	        service_description            Root Partition
	        is_volatile                    0
	        check_period                   24x7
	        max_check_attempts             4
	        normal_check_interval          5
	        retry_check_interval           1
	        contact_groups                 admins
	        notification_options           w,u,c,r
	        notification_interval          960
	        notification_period            24x7
	        check_command                 check_local_disk!20%!10%!/
	        }

	# Define a service to check the number of currently logged in
	# users on the local machine. Warning if > 20 users, critical
	# if > 50 users.

	define service{
	        use                            generic-service
	        host_name                      localhost
	        service_description            Current Users
	        is_volatile                    0
	        check_period                   24x7
	        max_check_attempts             4
	        normal_check_interval          5
	        retry_check_interval           1
	        contact_groups                 admins
	        notification_options           w,u,c,r
	        notification_interval          960
	        notification_period            24x7
	        check_command                 check_local_users!20!50
	        }

	# Define a service to check the number of currently running procs
	# on the local machine. Warning if > 250 processes, critical if
	# > 400 processes.

	define service{
	        use                            generic-service
	        host_name                      localhost
	        service_description            Total Processes
	        is_volatile                    0
	        check_period                   24x7
	        max_check_attempts             4
	        normal_check_interval          5
	        retry_check_interval           1
	        contact_groups                 admins
	        notification_options           w,u,c,r
	        notification_interval          960
	        notification_period            24x7
	        check_command                  check_local_procs!250!400
	        }

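	# Define a service to check the load on the local machine. Warning if
	# the 1-, 5-, or 15-minute load average exceeds 5.0, 4.0, or 3.0
	# respectively; critical if it exceeds 10.0, 6.0, or 4.0.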

	define service{
	        use                            generic-service
	        host_name                      localhost
	        service_description            Current Load
	        is_volatile                    0
	        check_period                   24x7
	        max_check_attempts             4
	        normal_check_interval          5
	        retry_check_interval           1
	        contact_groups                 admins
	        notification_options           w,u,c,r
	        notification_interval          960
	        notification_period            24x7
	        check_command                 check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
	        }
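Before relying on the web interface, it can help to run each plugin by hand with the thresholds these services pass. The sketch below makes several assumptions. It assumes the plugins live in /usr/local/nagios/libexec, which is the usual value of $USER1$ in resource.cfg, but check yours. It also assumes that the stock check_local_* commands map onto the plugin options shown. run_plugin is a hypothetical helper that translates exit codes using the standard plugin convention: 0 is OK, 1 is WARNING, 2 is CRITICAL, and 3 is UNKNOWN.

```shell
#!/bin/sh
# Hypothetical helper: run a plugin, print its state and output, and
# return the plugin's own exit code.
run_plugin() {
    output=$("$@" 2>&1)
    rc=$?
    case $rc in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        3) state=UNKNOWN ;;
        *) state="UNEXPECTED($rc)" ;;
    esac
    echo "$state: $*"
    [ -n "$output" ] && echo "    $output"
    return "$rc"
}

# Assumed plugin directory; thresholds copied from services.cfg above.
LIBEXEC=/usr/local/nagios/libexec
if [ -d "$LIBEXEC" ]; then
    run_plugin "$LIBEXEC/check_ping" -H 127.0.0.1 -w 100.0,20% -c 500.0,60%
    run_plugin "$LIBEXEC/check_disk" -w 20% -c 10% -p /
    run_plugin "$LIBEXEC/check_users" -w 20 -c 50
    run_plugin "$LIBEXEC/check_procs" -w 250 -c 400
    run_plugin "$LIBEXEC/check_load" -w 5.0,4.0,3.0 -c 10.0,6.0,4.0
fi
```

A plugin that fails here with a usage error will fail the same way inside Nagios.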

OK, we're almost there! Make all the files in lan_objects/ owned and writable by the nagios user:

	# chown nagios:nagios /usr/local/nagios/etc/lan_objects/*
	# chmod 0644 /usr/local/nagios/etc/lan_objects/*

Adjust these file ownerships and modes as shown:

	# chown nagios:nagios /usr/local/nagios/etc/nagios.cfg
	# chmod 0644 /usr/local/nagios/etc/nagios.cfg
	# chown nagios:nagios /usr/local/nagios/etc/resource.cfg
	# chmod 0600 /usr/local/nagios/etc/resource.cfg
	# chown nagios:nagios /usr/local/nagios/etc/cgi.cfg
	# chmod 0644 /usr/local/nagios/etc/cgi.cfg
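To double-check the result, this sketch compares each file's owner, group, and mode with the values above. It assumes GNU coreutils stat, as found on most Linux systems. The -c format option differs on BSD stat. check_perms is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's owner:group and octal mode (as
# printed by GNU stat, e.g. "644") with the expected values.
check_perms() {
    actual=$(stat -c '%U:%G %a' "$1") || return 1
    if [ "$actual" != "$2 $3" ]; then
        echo "fix $1: is $actual, want $2 $3"
        return 1
    fi
    echo "ok $1"
}

ETC=/usr/local/nagios/etc
if [ -d "$ETC" ]; then
    for f in "$ETC"/lan_objects/* "$ETC/nagios.cfg" "$ETC/cgi.cfg"; do
        check_perms "$f" nagios:nagios 644
    done
    check_perms "$ETC/resource.cfg" nagios:nagios 600
fi
```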

Now, you can run Nagios' syntax checker. You need to do this as root:

	# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

You should see a lot of output ending in these lines:

	Total Warnings: 0
	Total Errors:   0
	Things look okay - No serious problems were detected during the pre-flight check

If there are any errors, the checker reports exactly what you need to fix. When you get a clean run, start the Nagios daemon:

	# /etc/init.d/nagios start
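If you rerun the checker often, a small wrapper can refuse to start Nagios after a failed pre-flight check. This sketch assumes that nagios -v exits non-zero when it reports errors, so verify this on your version. verify_then_start is a hypothetical helper. It takes the start command as an argument, so the same function works with whatever init script you use.

```shell
#!/bin/sh
# Hypothetical helper: verify_then_start NAGIOS_BIN NAGIOS_CFG START_CMD
# Runs the pre-flight check; starts Nagios only if the check exits 0.
verify_then_start() {
    log=$(mktemp) || return 1
    if "$1" -v "$2" >"$log" 2>&1; then
        rm -f "$log"
        $3    # unquoted on purpose: "/etc/init.d/nagios start" splits into command and argument
    else
        echo "pre-flight check failed; not starting Nagios" >&2
        tail -n 20 "$log" >&2    # the summary and errors are near the end
        rm -f "$log"
        return 1
    fi
}
```

For example, as root:

	verify_then_start /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg "/etc/init.d/nagios start"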

Now, log in to the Nagios web interface at http://localhost/nagios, and start clicking on various links in the left navigation bar. The Service Detail page should look like Figure 13-2.

Figure 13-2. Service Detail page on a fresh Nagios installation

This means you have successfully gotten Nagios up and running and monitoring localhost. Congratulations!

Discussion

You may name Nagios configuration files whatever you want, as long as they have the .cfg extension. This is required, because Nagios reads only the files in a cfg_dir directory that end in .cfg.

You won't be able to access all of the Nagios web interface pages yet; you'll get an "It appears as though you do not have permission to view the information you requested…" error on some of them because we haven't set the correct CGI permissions yet. See the next recipe to learn how to do this.

During its initial run, my Nagios system couldn't run the "Total Processes" check. The error message was check_procs: Unknown argument - (null). This means that either the command definition (commands.cfg) or the service definition (services.cfg) passed check_procs a bad option. I used the default files, so chances are you fine readers will encounter the same error. A quick comparison showed a mismatch between the two:

	# commands.cfg
	# 'check_local_procs' command definition
	define command{
	        command_name    check_local_procs
	        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
	        }

	# services.cfg
	define service{
	        use                             generic-service
	        host_name                       localhost
	        service_description             Total Processes
	<...>
	    check_command            check_local_procs!250!400!
	        }

Compare the command_line and check_command lines. The check_local_procs command wants three arguments, but the service definition check_local_procs!250!400! supplies only two. The trailing ! leaves $ARG3$ empty, so check_procs receives -s with no value. Because all I want is to keep track of the total number of running processes, the first two arguments are sufficient. Deleting -s $ARG3$ and restarting Nagios fixed it.
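A mismatch like this is easy to miss by eye. The hypothetical shell functions below are not part of Nagios. They find the highest $ARGn$ macro in a command_line, count the non-empty !-separated arguments in a check_command, and flag a service that passes too few.

```shell
#!/bin/sh
# Hypothetical helpers for spotting command/service argument mismatches.

# Highest N such that $ARGN$ appears in a command_line; 0 if none.
max_arg_macro() {
    n=$(printf '%s\n' "$1" | grep -o '\$ARG[0-9][0-9]*\$' | tr -d '$ARG' | sort -n | tail -n 1)
    echo "${n:-0}"
}

# Count the non-empty !-separated arguments after the command name.
passed_args() {
    printf '%s\n' "$1" | awk -F'!' '{ c = 0; for (i = 2; i <= NF; i++) if ($i != "") c++; print c }'
}

# check_args COMMAND_LINE CHECK_COMMAND: return 1 if too few arguments are passed.
check_args() {
    need=$(max_arg_macro "$1")
    have=$(passed_args "$2")
    if [ "$have" -lt "$need" ]; then
        echo "MISMATCH: command_line uses \$ARG$need\$, check_command passes $have"
        return 1
    fi
    echo "OK: command_line uses $need argument(s), check_command passes $have"
}

# The original pair, then the fix (dropping -s $ARG3$):
check_args '$USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$' 'check_local_procs!250!400!'
# prints: MISMATCH: command_line uses $ARG3$, check_command passes 2
check_args '$USER1$/check_procs -w $ARG1$ -c $ARG2$' 'check_local_procs!250!400'
# prints: OK: command_line uses 2 argument(s), check_command passes 2
```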

When the total number of running processes exceeds 250, Nagios issues a warning; above 400, the service goes critical.

The exclamation points simply separate the command name from its arguments, and the arguments from each other; they don't mean you need to get excited.
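The following sketch shows how the separators map onto $ARGn$ macros in a simplified form of the substitution. The text after the first ! becomes $ARG1$, the next field becomes $ARG2$, and so on. This sketch handles only $USER1$ and $ARGn$, while real Nagios expands many more macros. The check_local_load command_line and the libexec path in the example are assumptions based on the stock sample configuration. expand_command is a hypothetical name.

```shell
#!/bin/sh
# Simplified sketch of Nagios argument substitution; handles only
# $USER1$ and $ARGn$. Usage: expand_command COMMAND_LINE CHECK_COMMAND USER1
expand_command() {
    printf '%s\n' "$2" | awk -F'!' -v line="$1" -v user1="$3" '
        # Replace every literal occurrence of macro in s with val.
        function repl(s, macro, val,    out, p) {
            out = ""
            while ((p = index(s, macro)) > 0) {
                out = out substr(s, 1, p - 1) val
                s = substr(s, p + length(macro))
            }
            return out s
        }
        {
            line = repl(line, "$USER1$", user1)
            # Field 1 is the command name; fields 2..NF become $ARG1$..$ARGn$.
            for (i = 2; i <= NF; i++)
                line = repl(line, "$ARG" (i - 1) "$", $i)
            print line
        }'
}

expand_command '$USER1$/check_load -w $ARG1$ -c $ARG2$' \
    'check_local_load!5.0,4.0,3.0!10.0,6.0,4.0' /usr/local/nagios/libexec
# prints: /usr/local/nagios/libexec/check_load -w 5.0,4.0,3.0 -c 10.0,6.0,4.0

expand_command '$USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$' \
    'check_local_procs!250!400!' /usr/local/nagios/libexec
# prints: /usr/local/nagios/libexec/check_procs -w 250 -c 400 -s    (note the empty -s value)
```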
