61 Monitor Availability






Use Nagios to keep tabs on your network.

Since remote exploits can often crash the service that is being broken into or cause its CPU use to skyrocket, you should monitor the services that are running on your network. Just looking for an open port (such as by using Nmap [Hack #42]) isn't enough. The machine may be able to respond to a TCP connect request, but the service may be unable to respond (or worse, could be replaced by a different program entirely!). One tool that can help you verify your services at a glance is Nagios (http://www.nagios.org).

Nagios is a network-monitoring application that monitors not only the services running on the hosts on your network, but also the resources on each host, such as CPU usage, disk space, memory usage, running processes, log files, and much more. In the event of a problem it can notify you through email, pager, or any other method that you define, and you can check the status of your network at a glance by using the web GUI. Nagios is also easily extensible through its plug-in API.

To install Nagios, download the source distribution from the Nagios web site. Then, unpack the source distribution and go into the directory it creates:

$ tar xfz nagios-1.1.tar.gz

$ cd nagios-1.1

Before running Nagios's configure script, you should create a user and group for Nagios to run as (e.g., nagios). Then run the configure script with a command similar to this:

$ ./configure --with-nagios-user=nagios --with-nagios-grp=nagios

This will install Nagios in /usr/local/nagios. As usual, you can modify this behavior by using the --prefix switch. After the configure script finishes, compile Nagios by running make all. Then become root and run make install to install it. In addition, you can optionally install Nagios's initialization scripts by running make install-init.

If you take a look into the /usr/local/nagios directory right now, you will see that there are four directories. The bin directory contains a single file, nagios, that is the core of the package. This application does the actual monitoring. The sbin directory contains the CGI scripts that will be used in the web-based interface. Inside the share directory, you'll find the HTML files and documentation. Finally, the var directory is where Nagios will store its information once it starts running.

Before you can use Nagios, you will need a couple of configuration files. These files go into the etc directory, which will be created when you run make install-config. This command also creates a sample copy of each required configuration file and puts them into the etc directory.

At this point the Nagios installation is complete. However, it is not very useful in its current state, because it lacks the actual monitoring applications. These applications, which check whether a particular monitored service is functioning properly, are called plug-ins. Nagios comes with a default set of plug-ins, but they must be downloaded and installed separately.

Download the latest Nagios Plugins package and decompress it. You will need to run the provided configure script to prepare the package for compilation on your system. You will find that the plug-ins are installed in a fashion similar to the actual Nagios program.

To compile the plug-ins, run commands similar to these:

$ ./configure --prefix=/usr/local/nagios \

--with-nagios-user=nagios --with-nagios-grp=nagios

$ make all

You might get notifications about missing programs or Perl modules while the script is running. These are mostly fine, unless you specifically need the mentioned applications to monitor a service.

After compilation is finished, become root and run make install to install the plug-ins. The plug-ins will be installed in the libexec directory of your Nagios base directory (e.g., /usr/local/nagios/libexec).

There are a few rules that all Nagios plug-ins should implement, making them suitable for use by Nagios. All plug-ins provide a --help option that displays information about the plug-in and how it works. This feature is very helpful when you're trying to monitor a new service using a plug-in you haven't used before.

For instance, to learn how the check_ssh plug-in works, run the following command:

$ /usr/local/nagios/libexec/check_ssh --help

check_ssh (nagios-plugins 1.4.0alpha1) 1.13

The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute

copies of the plugins under the terms of the GNU General Public License.

For more information about these matters, see the file named COPYING.

Copyright (c) 1999 Remi Paulmier <[email protected]>

Copyright (c) 2000-2003 Nagios Plugin Development Team

        <[email protected]>



Try to connect to SSH server at specified server and port



Usage: check_ssh [-46] [-t <timeout>] [-p <port>] <host>

       check_ssh (-h | --help) for detailed help

       check_ssh (-V | --version) for version information



Options:

 -h, --help

    Print detailed help screen

 -V, --version

    Print version information

 -H, --hostname=ADDRESS

    Host name or IP Address

 -p, --port=INTEGER

    Port number (default: 22)

 -4, --use-ipv4

    Use IPv4 connection

 -6, --use-ipv6

    Use IPv6 connection

 -t, --timeout=INTEGER

    Seconds before connection times out (default: 10)

 -v, --verbose

    Show details for command-line debugging (Nagios may truncate output)



Send email to [email protected] if you have questions

regarding use of this software. To submit patches or suggest improvements,

send email to [email protected]
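Beyond --help, plug-ins share another convention that matters when you write your own: a plug-in prints a line of status text and reports its result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN). The following is only a minimal sketch of that convention, not one of the bundled plug-ins; the function name check_threshold and its three arguments are hypothetical and chosen for illustration:

```shell
# Hypothetical plug-in logic (not part of the Nagios Plugins package).
# Compares an integer VALUE against WARN and CRIT thresholds and
# reports the result using the plug-in exit-code convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
check_threshold() {
    value=$1
    warn=$2
    crit=$3

    # Anything that is missing or not a plain integer is UNKNOWN.
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "UNKNOWN - usage: check_threshold VALUE WARN CRIT (integers)"
                return 3
                ;;
        esac
    done

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi

    echo "OK - value is $value"
    return 0
}
```

Wrapped in a script that passes its own arguments to the function and exits with the function's return code, logic like this could sit in the libexec directory next to the stock plug-ins.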

Now that both Nagios and the plug-ins are installed, we are almost ready to begin monitoring our servers. However, Nagios will not even start before it's configured properly.

The sample configuration files provide a good starting point:

$ cd /usr/local/nagios/etc 

$ ls -1

cgi.cfg-sample 

checkcommands.cfg-sample 

contactgroups.cfg-sample 

contacts.cfg-sample 

dependencies.cfg-sample 

escalations.cfg-sample 

hostgroups.cfg-sample 

hosts.cfg-sample 

misccommands.cfg-sample 

nagios.cfg-sample 

resource.cfg-sample 

services.cfg-sample 

timeperiods.cfg-sample

Since these are sample files, the Nagios authors added a .cfg-sample suffix to each file. First, we need to copy or rename each one to end in .cfg, so that the software can use them properly. (If you don't change the configuration filenames, Nagios will not be able to find them.)

You can either rename each file manually or use the following loop to take care of them all at once:

# for i in *.cfg-sample; do mv "$i" "${i%-sample}"; done

First there is the main configuration file, nagios.cfg. You can leave nearly everything as is, because the Nagios installation process makes sure the file paths used in the configuration file are correct. There's one option, however, that you might want to change: check_external_commands, which is set to 0 by default. If you would like to be able to run commands directly through the web interface, set this to 1. Depending on your network environment, this may or may not be an acceptable security risk, as enabling this option permits the execution of scripts from the web interface. You will also need to set the options in cgi.cfg that control which usernames are allowed to run external commands.

To get Nagios running, you must modify all but a few of the sample configuration files. Configuring Nagios to monitor your servers is not as difficult as it looks. To help, you can have the Nagios binary verify your configuration by running it with the -v switch:

#  /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

This command will go through the configuration files and report any errors. Fix the errors one by one, running the command again after each fix to find the next error. For testing purposes, it is easiest to disable all host and service definitions in the sample configuration files and merely use the files as templates for your own hosts and services. You can keep most of the files as is, but remove the following, which will be created from scratch:

hosts.cfg 

services.cfg

contacts.cfg 

contactgroups.cfg 

hostgroups.cfg

Start by configuring a host to monitor. We first need to add our host definition and configure some options for that host. You can add as many hosts as you like, but we will stick with one for the sake of simplicity.

Here are the contents of hosts.cfg:

# Generic host definition template

define host{

 # The name of this host template - referenced in other host

 # definitions, used for template recursion/resolution

 name                            generic-host    

 # Host notifications are enabled

 notifications_enabled           1     

 # Host event handler is enabled   

 event_handler_enabled           1        

 # Flap detection is enabled  

 flap_detection_enabled          1     

 # Process performance data

 process_perf_data               1

 # Retain status information across program restarts       

 retain_status_information       1   

 # Retain non-status information across program restarts    

 retain_nonstatus_information    1       

 # DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST,

 # JUST A TEMPLATE!

 register                        0        

}



# Host Definition

define host{

 # Name of host template to use

 use                     generic-host             

 host_name               freelinuxcd.org

 alias                   Free Linux CD Project Server

 address                 www.freelinuxcd.org

 check_command           check-host-alive

 max_check_attempts      10

 notification_interval   120

 notification_period     24x7

 notification_options    d,u,r

}

The first host defined is not a real host but a template from which other host definitions are derived. This mechanism can be seen in other configuration files and makes configuration based on a predefined set of defaults a breeze.

With this setup we are monitoring only one host, www.freelinuxcd.org, to see if it is alive. The host_name parameter is important because other configuration files will refer to this server by this name. Now the host needs to be added to a hostgroup, so that the application knows which contact group to send notifications to.

Here's what hostgroups.cfg looks like:

define hostgroup{

 hostgroup_name  flcd-servers

 alias           The Free Linux CD Project Servers

 contact_groups  flcd-admins

 members         freelinuxcd.org

}

This defines a new hostgroup and associates the flcd-admins contact_group with it. Now you'll need to define that contact group in contactgroups.cfg:

define contactgroup{

 contactgroup_name       flcd-admins

 alias                   FreeLinuxCD.org Admins

 members                 oktay, verty

}

Here the flcd-admins contact_group is defined with two members, oktay and verty. This configuration ensures that both users will be notified when something goes wrong with a server that flcd-admins is responsible for. The next step is to set the contact information and notification preferences for these users.

Here are the definitions for those two members in contacts.cfg:

define contact{

 contact_name                    oktay

 alias                           Oktay Altunergil

 service_notification_period     24x7

 host_notification_period        24x7

 service_notification_options    w,u,c,r

 host_notification_options       d,u,r

 service_notification_commands   notify-by-email,notify-by-epager

 host_notification_commands      host-notify-by-email,host-notify-by-epager

 email                           [email protected]

 pager                           [email protected]

 }



define contact{

 contact_name                    verty

 alias                           David 'Verty' Ky

 service_notification_period     24x7

 host_notification_period        24x7

 service_notification_options    w,u,c,r

 host_notification_options       d,u,r

 service_notification_commands   notify-by-email,notify-by-epager

 host_notification_commands      host-notify-by-email

 email                           [email protected]

 }
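The single-letter values in the notification options above are easy to misread. For hosts, d, u, and r stand for down, unreachable, and recovery; for services, w, u, c, and r stand for warning, unknown, critical, and recovery. As a reading aid only, here is a small sketch that expands such a list; the function name describe_options is hypothetical and is not something Nagios provides:

```shell
# Hypothetical helper: expands a comma-separated notification option
# list for a "host" or "service" definition into readable names.
describe_options() {
    kind=$1                     # "host" or "service"

    # Split the second argument on commas.
    old_ifs=$IFS
    IFS=','
    set -- $2
    IFS=$old_ifs

    for opt in "$@"; do
        case "$kind:$opt" in
            host:d)    echo "d = down" ;;
            host:u)    echo "u = unreachable" ;;
            service:w) echo "w = warning" ;;
            service:u) echo "u = unknown" ;;
            service:c) echo "c = critical" ;;
            *:r)       echo "r = recovery" ;;
            *)         echo "$opt = (not covered by this sketch)" ;;
        esac
    done
}
```

For example, describe_options host d,u,r lists down, unreachable, and recovery, which is the set of host events both contacts above will be told about.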

In addition to providing contact details for a particular user, the contact_name in the contacts.cfg file is also used by the CGI scripts (i.e., the web interface) to determine whether a particular user is allowed to access a particular resource. Now that your hosts and contacts are configured, you can start to configure monitoring for individual services on your server.

This is done in services.cfg:

# Generic service definition template

define service{

# The 'name' of this service template, referenced in other service definitions

 name    generic-service  

 # Active service checks are enabled

 active_checks_enabled  1 

 # Passive service checks are enabled/accepted

 passive_checks_enabled  1 

 # Active service checks should be parallelized 

 # (disabling this can lead to major performance problems)

 parallelize_check  1  

 # We should obsess over this service (if necessary)

 obsess_over_service  1  

 # Default is to NOT check service 'freshness'

 check_freshness   0  

 # Service notifications are enabled

 notifications_enabled  1 

 # Service event handler is enabled

 event_handler_enabled  1 

 # Flap detection is enabled

 flap_detection_enabled  1 

 # Process performance data

 process_perf_data  1 

 # Retain status information across program restarts

 retain_status_information 1  

 # Retain non-status information across program restarts

 retain_nonstatus_information 1  

 # DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!

 register   0

 }



# Service definition

define service{

 # Name of service template to use

 use    generic-service   

 host_name   freelinuxcd.org 

 service_description  HTTP

 is_volatile   0

 check_period   24x7

 max_check_attempts  3

 normal_check_interval  5

 retry_check_interval  1

 contact_groups   flcd-admins

 notification_interval  120

 notification_period  24x7

 notification_options  w,u,c,r

 check_command   check_http

 }



# Service definition

define service{

 # Name of service template to use

 use    generic-service   

 host_name   freelinuxcd.org

 service_description  PING

 is_volatile   0

 check_period   24x7

 max_check_attempts  3

 normal_check_interval  5

 retry_check_interval  1

 contact_groups   flcd-admins

 notification_interval  120

 notification_period  24x7

 notification_options  c,r

 check_command   check_ping!100.0,20%!500.0,60%

 }

This setup configures monitoring for two services. The first service definition, which has been called HTTP, will monitor whether the web server is up and will notify you if there's a problem. The second definition monitors the ping statistics from the server and notifies you if the response time or packet loss becomes too high. The commands used are check_http and check_ping, which were installed into the libexec directory during the plug-in installation. Take some time to familiarize yourself with the other available plug-ins, and configure them along the lines of the previous example definitions.
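The check_command value in the PING definition packs several things into one string: the exclamation marks separate the command name from the arguments that Nagios hands to the command definition as $ARG1$, $ARG2$, and so on. The sketch below merely mimics that splitting to make the structure visible; split_check_command is a hypothetical helper, not a Nagios tool:

```shell
# Hypothetical helper: shows how a check_command string breaks down
# into a command name and its !-separated arguments.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f                      # no filename expansion while splitting
    set -- $1
    set +f
    IFS=$old_ifs

    # Nothing to report for an empty string.
    [ "$#" -gt 0 ] || return 1

    echo "command_name=$1"
    shift

    n=1
    for arg in "$@"; do
        echo "ARG$n=$arg"
        n=$((n + 1))
    done
}
```

Running split_check_command 'check_ping!100.0,20%!500.0,60%' prints the command name check_ping followed by ARG1=100.0,20% and ARG2=500.0,60%. Assuming the command definition in checkcommands.cfg passes these to the plug-in's -w and -c options, as the sample file typically does, the service goes to WARNING at a 100 ms average round-trip time or 20% packet loss, and to CRITICAL at 500 ms or 60%.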

Once you're happy with your configuration, run Nagios with the -v switch one last time to make sure everything checks out. Then run it as a daemon by using the -d switch:

#  /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

That's all there is to it. Give Nagios a couple of minutes to generate some data, and then point your browser to the machine and look at the pretty service warning lights.
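If you restart Nagios often while tuning the configuration, the check-then-start sequence described above can be wrapped in a small function so that a broken configuration never reaches the daemon. This is a sketch under two assumptions: that nagios -v exits with a nonzero status when it finds errors, and that the default paths match your installation. The function name start_nagios is hypothetical:

```shell
# Hypothetical wrapper: verify the configuration first, and start the
# daemon only if the verification succeeds.
# Assumes "nagios -v" returns a nonzero exit status on errors.
start_nagios() {
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}

    if "$nagios_bin" -v "$nagios_cfg" >/dev/null; then
        # Configuration checks out: run Nagios as a daemon.
        "$nagios_bin" -d "$nagios_cfg"
    else
        echo "Configuration check failed; Nagios was not started." >&2
        return 1
    fi
}
```

Called with no arguments, the function uses the paths from this hack. When the check fails, rerun the -v command by hand to see the full error report, since the wrapper discards that output.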

