0.1.0
Nagios版权归nagios软件的著作权者所有,本书仅对中文化后内容保留著作权。需要提醒的是:无论你将采用何种方式来引用本书,全部或部分章节,请一定要给出本书的来源站点是http://nagios-cn.sourceforge.net/,并且一定引用sourceforge站点的相关出版物的版权提示与声明。
| 修订历史 | ||
|---|---|---|
| 修订 0.0.3 | 30/01/2008 | enochcytian |
| 将翻译完成的部分初步生成在线帮助文档。 | ||
| 修订 0.0.2 | 20/12/2007 | enochcytian |
| 建立DocBook工程,从源html文件反向生成xml章节文件。 | ||
| 修订 0.0.1 | 12/12/2007 | enochcytian |
| 建立初稿,开始编写初始文件。 | ||
摘要
Nagios是一款非常优秀的网络主机管理软件,它在开源社区的影响力是非同寻常的。但很可惜的是,它的界面及操作使用过程中采用了英语的语言提示与源程序紧密结合使得这款软件的汉化界面迟迟不能推出,影响了它在中文区的使用。为推进Nagios的使用,笔者建立了nagios-cn工程,该工程的主要目标是翻译源程序中运行提示、界面生成和文档说明,通过一些努力,nagios-cn终于可以正常运转了,本书编写的主要目的是为在中文使用区域推广和使用Nagios软件,让这款优秀的软件为国人服务。
相信玩计算机网络的人都或多或少地知道网络管理这一类型软件,但真正在实际中使用并以此为工作基础的人相信并不多,毕竟它不象游戏或字处理类软件那么常见。要不是某些事情所迫,我也不会尽心来了解并使用网管软件,在2004年年底,因为某些任务实在安排不下,“尚有剩余时间”的我接下研究一款网络管理软件的事情。没有最终目标,没有时间截止期限,也不会有太多的人员资金投入,但要把一些很实际的问题解决掉,这就是这些工作的起点。
好在软件并不难以安装和试用,我只花了一天就下载、编译和安装好了,试着把配置文件改了一下,也可以操作着试着用了,但操作界面丑陋、配置更新繁琐、初建系统工作量大等一系列问题使我不得不怀疑是否还需要它?毕竟有一款商业化的软件就放在手边,虽然定制得不太合乎要求,但至少没有这么繁杂的责任背身上,毕竟,我可以不为这些事情负责任的。
考虑在三,"放弃"并不是我想要做的,既然时间没有限制,那就两条腿走路吧,先稳妥地配置好那个商业化软件,让它可以操作与运转,但对后序的改动,只好开启一个记录库,不断地将问题记录下来,而对于Nagios,再清理一下思路,先看看到底我要它做些什么事情,在使用中会有多少问题需要解决,解决到什么程度,再把现有条件对比一下,看看能否走通。
不断地尝试与调整是一个漫长的过程,尤其是到着手编写检测插件的阶段,并不是象想像中的那么顺利,好在时间是挤出来的,写来写去竟然也有了些心得,顺手把Perl和BASH给练习了(只是这些插件与工作内容相关,可惜不能公开),也把几个Nagios安装和运行中常有问题给改掉了,还写了个专门给实施和运行用的BASH脚本方便后来者研究和利用它。
再往下,因为工作情况有变,把掌握的东西交付出来,让它真正有所实用。而后面再搞东西就完全是自己的兴趣了,我先后对nagios-cn项目加入了SVG格式支持、把RRD和Grapher功能整合、写SPEC以定制RPM、增加DocBook转换工程等等,每每做完这些总能让人感到有一种新鲜愉快的感受。
直到最后阶段,我才想到要宣传和推广它,也是因为脱离工作内容的关系,使我做的这些事情不再带有工作内容才有条件在网上公开,这就是后面几个网站或博客上给出的日益增多的项目信息,这本书也是其中的一部分。
Nagios是一款用于系统和网络监控的应用程序。它可以在你设定的条件下对主机和服务进行监控,在状态变差和变好的时候给出告警信息。
Nagios最初被设计为在Linux系统之上运行,然而它同样可以在类Unix的系统之上运行。
Nagios更进一步的特征包括:
Nagios所需要的运行条件是机器必须可以运行Linux(或是Unix变种)并且有C语言编译器。你必须正确地配置TCP/IP协议栈以使大多数的服务检测可以通过网络得以进行。
你需要但并非必须正确地配置Nagios里的CGIs程序,而一旦你要使用CGI程序时,你必须要安装以下这些软件...
Nagios版权遵从于由自由软件基金会所发布的GNU版权协议第二版。有关GNU协议请查阅自由软件基金会网站。该版权协议允许你在某些条件下可以复制、分发并且或者是修改它。可以在Nagios软件发行包里阅读版权文件LICENSE或是在网站上阅读在线版权文件以获取更多信息。
Nagios is provided AS IS with NO WARRANTY OF ANY KIND, INCLUDING THE WARRANTY OF DESIGN, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
一些人对Nagios的发布尽力,不管是报告错误、提供建议、编写插件等等,可以在网站http://www.nagios.org上找到这些人的名字列表。
可以在Nagioshttp://www.nagios.org站点获取最新版本。
Nagios及Nagios商业标识由Ethan Galstad所拥有。其他的商业标识、服务标识、注册商标及注册服务属于各自的所有者。
Important: Make sure you read through the documentation and the FAQs at http://www.nagios.org/ before sending a question to the mailing lists.
Nagios的更新日志可以在这里的在线文件或是在源程序的发行包的根目录里找到。
祝贺你选择了Nagios!Nagios是一个非常强大且柔性化的软件,但可能需要不少心血来学习如何配置使之符合你所需,一旦掌握了它如何工作并怎样来工作时,你会觉得再也离不开它! :-) 对于初次使用Nagios的新手这有几个建议需要遵从:
目录
如果是使用3.x的旧版,肯定是要尽快升级到当前版本。新版本修正了许多错误,下面假定已经根据快速安装指南的操作步骤从源代码包开始安装好Nagios,下面可以安装更新的版本。虽然下面的操作都是用root操作的,但可以不用root权限也可以升级成功。下面是升级过程...
先确认已经备份好现有版本的Nagios软件和配置文件。如果升级过程中有不对的,至少可以回退到旧版本。
切换为Nagios用户。使用Debian/Ubuntu系统的可以用sudo -s nagios来切换。
su -l nagios
下载最新的Nagios安装包(http://www.nagios.org/download/)。
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.x.tar.gz展开源码包。
tar xzf nagios-3.x.tar.gz cd nagios-3.x
运行Nagios源程序的配置脚本,把加入外部命令的组名加上,象这样:
./configure --with-command-group=nagcmd
编译源程序
make all
安装升级后的二进制程序、文档和Web接口程序。在这步时旧配置文件还不会被覆盖。
make install
验证配置并重启动Nagios
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg /sbin/service nagios restart
好了,升级完成!
Nagios从2.x升级到3.x并不难。升级过程如同上面的旧版3.x的升级过程。但是Nagios3.x中有几处配置文件的改动需要注意:
Also make sure to read the "What's New" section of the documentation. It describes all the changes that were made to the Nagios 3 code since the latest stable release of Nagios 2.x. Quite a bit has changed, so make sure you read it over.
如果当前是用RPM包安装的,或是用Debian/Ubuntu的APT软件包来安装Nagios的,需要用源程序包来安装升级,下面是操作步骤:
注意RPM和APT包把Nagios的文件放置的位置有所不同。在升级前要确保那些配置文件备份好以在碰到解决不了的升级问题时可以回退到旧版本。
现在可以提供如下Linux发行版本上的快速安装指南:
你可以在NagiosCommunity.org的维基百科上找到更多的安装上手指南。什么?找不到你所用的操作系统版本的指南?在维基百科上给其他人写一条吧!
如果你在一个上面没列出的操作系统或Linux发行包上安装Nagios,请参照Fedora快速指南来概要地了解一下你需要做的事情。命令名、路径等可能因不同的发行包或操作系统而不同,因而这时你可能需要些努力来搞一下安装文档里的东西。
一旦你正确地安装并使Nagios运行起来后,毫无疑问你不仅要监控你的主机,你需要审视一下更多的文档来做更多的事情...
本指南试图让你通过简单的指令以在20分钟内在Fedora平台上通过对Nagios的源程序的安装来监控本地主机。这里没有讨论更高级的设置项 - 只是一些基本操作,但这足以使95%的用户启动Nagios。
这些指令在基于Fedora Core 6的系统下写成的。
最终结果是什么
如果按照本指南安装,最后将是这样结果:
在做安装之前确认要对该机器拥有root权限。
确认你安装好的Fedora系统上已经安装如下软件包再继续。
可以用yum命令来安装这些软件包,键入命令:
yum install httpd yum install gcc yum install glibc glibc-common yum install gd gd-devel
1)建立一个帐号
切换为root用户
su -l
创建一个名为nagios的帐号并给定登录口令
/usr/sbin/useradd nagios passwd nagios
创建一个用户组名为nagcmd用于从Web接口执行外部命令。将nagios用户和apache用户都加到这个组中。
/usr/sbin/groupadd nagcmd /usr/sbin/usermod -G nagcmd nagios /usr/sbin/usermod -G nagcmd apache
2)下载Nagios和插件程序包
建立一个目录用以存储下载文件
mkdir ~/downloads cd ~/downloads
下载Nagios和Nagios插件的软件包(访问http://www.nagios.org/download/站点以获得最新版本),在写本文档时,最新的Nagios的软件版本是3.0rc1,Nagios插件的版本是1.4.11。
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.0rc1.tar.gz wget http://osdn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
3)编译与安装Nagios
展开Nagios源程序包
cd ~/downloads tar xzf nagios-3.0rc1.tar.gz cd nagios-3.0rc1
运行Nagios配置脚本并使用先前开设的用户及用户组:
./configure --with-command-group=nagcmd
编译Nagios程序包源码
make all
安装二进制运行程序、初始化脚本、配置文件样本并设置运行目录权限
make install make install-init make install-config make install-commandmode
现在还不能启动Nagios-还有一些要做的...
4)客户化配置
样例配置文件默认安装在这个目录下/usr/local/nagios/etc,这些样例文件可以配置Nagios使之正常运行,只需要做一个简单的修改...
用你擅长的编辑器软件来编辑这个/usr/local/nagios/etc/objects/contacts.cfg配置文件,更改email地址nagiosadmin的联系人定义信息中的EMail信息为你的EMail信息以接收报警内容。
vi /usr/local/nagios/etc/objects/contacts.cfg
5)配置WEB接口
安装Nagios的WEB配置文件到Apache的conf.d目录下
make install-webconf
创建一个nagiosadmin的用户用于Nagios的WEB接口登录。记下你所设置的登录口令,一会儿你会用到它。
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
重启Apache服务以使设置生效。
service httpd restart
6)编译并安装Nagios插件
展开Nagios插件的源程序包
cd ~/downloads tar xzf nagios-plugins-1.4.11.tar.gz cd nagios-plugins-1.4.11
编译并安装插件
./configure --with-nagios-user=nagios --with-nagios-group=nagios make make install
7)启动Nagios
把Nagios加入到服务列表中以使之在系统启动时自动启动
chkconfig --add nagios chkconfig nagios on
验证Nagios的样例配置文件
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
如果没有报错,可以启动Nagios服务
service nagios start
8)更改SELinux设置
Fedora与SELinux(安全增强型Linux)同步发行与安装后将默认使用强制模式。这会在你尝试联入Nagios的CGI时导致一个"内部服务错误"消息。
如果是SELinux处于强制安全模式时需要做
getenforce
令SELinux处于容许模式
setenforce 0
如果要永久性更变它,需要更改/etc/selinux/config里的设置并重启系统。
不关闭SELinux或是永久性变更它的方法是让CGI模块在SELinux下指定强制目标模式:
chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/ chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
更多有关Nagios的CGI模块增加目标策略的强制权限方式见NagiosCommunity.org的维基百科http://www.nagioscommunity.org/wiki。
9)登录WEB接口
你现在可以从WEB方式来接入Nagios的WEB接口了,你需要在提示下输入你的用户名(nagiosadmin)和口令,你刚刚设置的,这里用系统默认安装的浏览器,用下面这个超链接
http://localhost/nagios/
点击“服务详情”的引导超链来查看你本机的监视详情。你可能需要给点时间让Nagios来检测你机器上所依赖的服务因为检测需要些时间。
10)其他的变更
确信你机器的防火墙规则配置允许你可以从远程登录到Nagios的WEB服务。
配置EMail的报警项超出了本文档的内容,指向你的系统档案用网页查找或是到这个站点NagiosCommunity.org wiki来查找更进一步的信息,以使你的系统上可以向外部地址发送EMail信息。更多有关通知的信息可以查阅这篇文档。
11)完成了
祝贺你已经成功安装好Nagios,但网络监控工作只是刚开始。毫无疑问你不是只监控本地系统,所以要看以下这些文档...
本指南试图让你通过简单的指令以在20分钟内在你的openSUSE平台上通过对Nagios的源程序的安装来监控本地主机。这里没有讨论更高级的设置项 - 只是一些基本操作,但这足以使95%的用户启动Nagios。
这些指令在基于openSUSE10.2的系统下写成的。
1)建立一个帐号
切换为root用户
su -l
创建新帐户名为nagios并给它一个登录口令
/usr/sbin/useradd nagios
passwd nagios
创建一个用户组名为nagios,并把nagios帐户加入该组
/usr/sbin/groupadd nagios
/usr/sbin/usermod -G nagios nagios
创建一个用户组名为nagcmd来执行外部命令并可以通过WEB接口来执行。将nagios用户和apache用户都加到这个组中。
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -G nagcmd nagios
/usr/sbin/usermod -G nagcmd wwwrun
2)下载Nagios和插件程序包
建立一个目录用以存储下载文件
mkdir ~/downloads
cd ~/downloads
下载Nagios和Nagios插件的软件包(访问http://www.nagios.org/download/站点以获得最新版本),在写本文档时,最新的Nagios的软件版本是3.0rc1,Nagios插件的版本是1.4.11。
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.0rc1.tar.gz
wget http://osdn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
3)编译与安装Nagios
展开Nagios源程序包
cd ~/downloads
tar xzf nagios-3.0rc1.tar.gz
cd nagios-3.0rc1
运行Nagios配置脚本并使用先前开设的用户及用户组:
./configure --with-command-group=nagcmd
编译Nagios程序包源码
make all
安装二进制运行程序、初始化脚本、配置文件样本并设置运行目录权限
make install
make install-init
make install-config
make install-commandmode
现在还不能启动Nagios - 还有一些要做的...
4)客户化配置
样例配置文件默认安装在这个目录下/usr/local/nagios/etc,这些样例文件可以配置Nagios使之正常运行,只需要做一个简单的修改...
用你擅长的编辑器软件来编辑这个/usr/local/nagios/etc/objects/contacts.cfg配置文件,更改email地址nagiosadmin的联系人定义信息中的EMail信息为你的EMail信息以接收报警内容。
vi /usr/local/nagios/etc/objects/contacts.cfg
5)配置WEB接口
安装Nagios的WEB配置文件到Apache的conf.d目录下
make install-webconf
创建一个nagiosadmin的用户用于Nagios的WEB接口登录。记下你所设置的登录口令,一会儿你会用到它。
htpasswd2 -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
重启Apache服务以使设置生效。
service apache2 restart
6)编译并安装Nagios插件
展开Nagios插件的源程序包
cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11
编译并安装插件
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
7)启动Nagios
把Nagios加入到服务列表中以使之在系统启动时自动启动
chkconfig --add nagios
chkconfig nagios on
验证Nagios的样例配置文件
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
如果没有报错,可以启动Nagios服务
service nagios start
8)登录WEB接口
你现在可以从WEB方式来接入Nagios的WEB接口了,你需要在提示下输入你的用户名(nagiosadmin)和口令,你刚刚设置的,这里用系统默认安装的浏览器,用下面这个超链接
konqueror http://localhost/nagios/
点击“服务详情”的引导超链来查看你本机的监视详情。你可能需要给点时间让Nagios来检测你机器上所依赖的服务因为检测需要些时间。
9)其他的变更
确信你机器的防火墙规则配置允许你可以从远程登录到Nagios的WEB服务。
你可以这样做:
配置EMail的报警项超出了本文档的内容,指向你的系统档案用网页查找或是到这个站点NagiosCommunity.org wiki来查找更进一步的信息,以使你的openSUSE系统上可以向外部地址发送EMail信息。
本指南试图让你通过简单的指令以在20分钟内在Ubuntu平台上通过对Nagios的源程序的安装来监控本地主机。没有讨论更高级的设置项-只是一些基本操作,但这足以使95%的用户启动Nagios。
这些指令在基于Ubuntu6.10(桌面版)的系统下写成的。
What You'll End Up With
如果按照本指南安装,最后将是这样结果:
确认你安装好的系统上已经安装如下软件包再继续。
可以用apt-get命令来安装这些软件包,键入命令:
sudo apt-get install apache2 sudo apt-get install build-essential sudo apt-get install libgd2-dev
1)建立一个帐号
切换为root用户
sudo -s
创建一个名为nagios的帐号并给定登录口令
/usr/sbin/useradd nagios passwd nagios
在Ubuntu服务器版(6.01或更高版本),创建一个用户组名为nagios(默认是不创建的)。在Ubuntu桌面版上要跳过这一步。
/usr/sbin/groupadd nagios /usr/sbin/usermod -G nagios nagios
创建一个用户组名为nagcmd用于从Web接口执行外部命令。将nagios用户和apache用户都加到这个组中。
/usr/sbin/groupadd nagcmd /usr/sbin/usermod -G nagcmd nagios /usr/sbin/usermod -G nagcmd www-data
2)下载Nagios和插件程序包
建立一个目录用以存储下载文件
mkdir ~/downloads cd ~/downloads
下载Nagios和Nagios插件的软件包(访问http://www.nagios.org/download/站点以获得最新版本),在写本文档时,最新的Nagios的软件版本是3.0rc1,Nagios插件的版本是1.4.11。
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.0rc1.tar.gz wget http://osdn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
3)编译与安装Nagios
展开Nagios源程序包
cd ~/downloads tar xzf nagios-3.0rc1.tar.gz cd nagios-3.0rc1
运行Nagios配置脚本并使用先前开设的用户及用户组:
./configure --with-command-group=nagcmd
编译Nagios程序包源码
make all
安装二进制运行程序、初始化脚本、配置文件样本并设置运行目录权限
make install make install-init make install-config make install-commandmode
现在还不能启动Nagios-还有一些要做的...
4)客户化配置
样例配置文件默认安装在这个目录下/usr/local/nagios/etc,这些样例文件可以配置Nagios使之正常运行,只需要做一个简单的修改...
用你擅长的编辑器软件来编辑这个/usr/local/nagios/etc/objects/contacts.cfg配置文件,更改email地址nagiosadmin的联系人定义信息中的EMail信息为你的EMail信息以接收报警内容。
vi /usr/local/nagios/etc/objects/contacts.cfg
5)配置WEB接口
安装Nagios的WEB配置文件到Apache的conf.d目录下
make install-webconf
创建一个nagiosadmin的用户用于Nagios的WEB接口登录。记下你所设置的登录口令,一会儿你会用到它。
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
重启Apache服务以使设置生效。
/etc/init.d/apache2 reload
6)编译并安装Nagios插件
展开Nagios插件的源程序包
cd ~/downloads tar xzf nagios-plugins-1.4.11.tar.gz cd nagios-plugins-1.4.11
编译并安装插件
./configure --with-nagios-user=nagios --with-nagios-group=nagios make make install
7)启动Nagios
把Nagios加入到服务列表中以使之在系统启动时自动启动
ln -s /etc/init.d/nagios /etc/rcS.d/S99nagios
验证Nagios的样例配置文件
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
如果没有报错,可以启动Nagios服务
/etc/init.d/nagios start
8)登录WEB接口
你现在可以从WEB方式来接入Nagios的WEB接口了,你需要在提示下输入你的用户名(nagiosadmin)和口令,你刚刚设置的,这里用系统默认安装的浏览器,用下面这个超链接
http://localhost/nagios/
点击“服务详情”的引导超链来查看你本机的监视详情。你可能需要给点时间让Nagios来检测你机器上所依赖的服务因为检测需要些时间。
9)其他的变更
如果要接收Nagios的EMail警报,需要安装(Postfix)包
sudo apt-get install mailx
需要编辑Nagios里的EMail通知送出命令,它位于/usr/local/nagios/etc/commands.cfg文件中,将里面的'/bin/mail'全部替换为'/usr/bin/mail'。一旦设置好需要重启动Nagios以使配置生效。
sudo /etc/init.d/nagios restart
配置EMail的报警项超出了本文档的内容,指向你的系统档案用网页查找或是到这个站点NagiosCommunity.org wiki来查找更进一步的信息,以使Ubuntu系统上可以向外部地址发送EMail信息。
本文用来说明如何监控Windows主机的本地服务和特性,包括:
Publicly available services that are provided by Windows machines (HTTP, FTP, POP3, etc.) can be monitored easily by following the documentation on monitoring publicly available services.

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed if you follow the quickstart.
Monitoring private services or attributes of a Windows machine requires that you install an agent on it. This agent acts as a proxy between the Nagios plugin that does the monitoring and the actual service or attribute of the Windows machine. Without installing an agent on the Windows box, Nagios would be unable to monitor private services or attributes of the Windows box.
For this programlisting, we will be installing the NSClient++ addon on the Windows machine and using the check_nt plugin to communicate with the NSClient++ addon. The check_nt plugin should already be installed on the Nagios server if you followed the quickstart guide.
Other Windows agents (like NC_Net) could be used instead of NSClient++ if you wish - provided you change command and service definitions, etc. a bit. For the sake of simplicity I will only cover using the NSClient++ addon in these instructions.
There are several steps you'll need to follow in order to monitor a new Windows machine. They are:
To make your life a bit easier, a few configuration tasks have already been done for you:
The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in these and other definitions to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your Windows boxes in no time.
The first time you configure Nagios to monitor a Windows machine, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* Windows machine you monitor.
Edit the main Nagios config file.
vi /usr/local/nagios/etc/nagios.cfg
Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg
Save the file and exit.
What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/windows.cfg to find additional object definitions. That's where you'll be adding Windows host and service definitions. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* Windows machine you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.
Before you can begin monitoring private services and attributes of Windows machines, you'll need to install an agent on those machines. I recommend using the NSClient++ addon, which can be found at http://sourceforge.net/projects/nscplus. These instructions will take you through a basic installation of the NSClient++ addon, as well as the configuration of Nagios for monitoring the Windows machine.
1. Download the latest stable version of the NSClient++ addon from http://sourceforge.net/projects/nscplus
2. Unzip the NSClient++ files into a new C:\NSClient++ directory
3. Open a command prompt and change to the C:\NSClient++ directory
4. Register the NSClient++ system service with the following command:
nsclient++ /install
5. Install the NSClient++ systray with the following command ('SysTray' is case-sensitive):
nsclient++ SysTray
6. Open the services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the 'Log On' tab of the services manager). If it isn't already allowed to interact with the desktop, check the box to allow it to.

7. Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:
8. Start the NSClient++ service with the following command:
nsclient++ /start
9. If installed properly, a new icon should appear in your system tray. It will be a yellow circle with a black 'M' inside.
10. Success! The Windows server can now be added to the Nagios monitoring configuration...
Now it's time to define some object definitions in your Nagios configuration files in order to monitor the new Windows machine.
Open the windows.cfg file for editing.
vi /usr/local/nagios/etc/objects/windows.cfg
Add a new host definition for the Windows machine that you're going to monitor. If this is the *first* Windows machine you're monitoring, you can simply modify the sample host definition in windows.cfg. Change the host_name, alias, and address fields to appropriate values for the Windows box.
define host{
use windows-server ; Inherit default values from a Windows server template (make sure you keep this line!)
host_name winserver
alias My Windows Server
address 192.168.1.2
}
Good. Now you can add some service definitions (to the same configuration file) in order to tell Nagios to monitor different aspects of the Windows machine. If this is the *first* Windows machine you're monitoring, you can simply modify the sample service definitions in windows.cfg.

Note: Replace "winserver" in the programlisting definitions below with the name you specified in the host_name directive of the host definition you just added.
Add the following service definition to monitor the version of the NSClient++ addon that is running on the Windows server. This is useful when it comes time to upgrade your Windows servers to a newer version of the addon, as you'll be able to tell which Windows machines still need to be upgraded to the latest version of NSClient++.
define service{
use generic-service
host_name winserver
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}
Add the following service definition to monitor the uptime of the Windows server.
define service{
use generic-service
host_name winserver
service_description Uptime
check_command check_nt!UPTIME
}
Add the following service definition to monitor the CPU utilization on the Windows server and generate a CRITICAL alert if the 5-minute CPU load is 90% or more or a WARNING alert if the 5-minute load is 80% or greater.
define service{
use generic-service
host_name winserver
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
}
Add the following service definition to monitor memory usage on the Windows server and generate a CRITICAL alert if memory usage is 90% or more or a WARNING alert if memory usage is 80% or greater.
define service{
use generic-service
host_name winserver
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
}
Add the following service definition to monitor usage of the C:\ drive on the Windows server and generate a CRITICAL alert if disk usage is 90% or more or a WARNING alert if disk usage is 80% or greater.
define service{
use generic-service
host_name winserver
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
Add the following service definition to monitor the W3SVC service state on the Windows machine and generate a CRITICAL alert if the service is stopped.
define service{
use generic-service
host_name winserver
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}
Add the following service definition to monitor the Explorer.exe process on the Windows machine and generate a CRITICAL alert if the process is not running.
define service{
use generic-service
host_name winserver
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
That's it for now. You've added some basic services that should be monitored on the Windows box. Save the configuration file.
If you specified a password in the NSClient++ configuration file on the Windows machine, you'll need to modify the check_nt command definition to include the password. Open the commands.cfg file for editing.
vi /usr/local/nagios/etc/commands.cfg
Change the definition of the check_nt command to include the "-s <PASSWORD>" argument (where PASSWORD is the password you specified on the Windows machine) like this:
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$
}
Save the file.
You're done with modifying the Nagios configuration, so you'll need to verify your configuration files and restart Nagios.
If the verification process produces any errors messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
本文档描述了如果监控Linux/UNIX的"私有"服务和属性,如:
由Linux系统上的公众服务(HTTP、FTP、SSH、SMTP等)可以按照这篇监控公众服务文档。

[注意:本文档没有结束。推荐阅读文档NRPE外部构件里如何监控远程Linux/Unix服务器中的指令]
有几种不同方式来监控远程Linux/UNIX服务器的服务与属性。一个是应用共享式SSH密钥运行check_by_ssh插件来执行对远程主机的检测。这种方法本文档不讨论,但它会导致安装有Nagios的监控服务器很高的系统负荷,尤其是你要监控成百个主机中的上千个服务时,这是因为要建立/毁构SSH联接的总开销很高。
另一种方法是使用NRPE外部构件监控远程主机。NRPE外部构件可以在远程的Linux/Unix主机上执行插件程序。如果是要象监控本地主机一样对远程主机的磁盘利用率、CPU负荷和内存占用率等情况下,NRPE外部构件非常有用。
本文档将介绍如何来监控路由器和交换机的状态。一些便宜的"无网管"功能的交换机与集线器不能配置IP地址而且对于网络是不可见的组成构件,因而没办法来监控这种东西。稍贵些的交换机和路由器可以配置IP地址可以用PING检测或是通过SNMP来查询状态信息。
下面将描述如果来监控这些有网管功能的交换机、集线器和路由器:
监控交换机与路由器可简可繁-主要是看拥有什么样设备与想监控什么内容。做为极为重要的网络组成构件,毫无疑问至少要监控一些基本状态。
交换机与路由器可以简单地用PING来监控丢包率、RTA等数据。如果交换机支持SNMP,就可以监控端口状态等,用check_snmp插件,也可以监控带宽(如果用了MRTG),用check_mrtgtraf插件。
check_snmp插件只有当系统里安装了net-snmp和net-snmp-utils包后才编译。先确定插件已经在/usr/local/nagios/libexec目录里再继续做,如果没有这个文件,安装net-snmp和net-snmp-utils包并且重编译并重新安装Nagios插件包。
为了让工作轻松点,几个配置任务已经做好了:
以上的监控配置文件可以在/usr/local/nagios/etc/objects/目录里找到。如果愿意可以修改这些定义或是加入其他适合需要的更好的定义。但推荐你最好是等到你熟练地掌握了Nagios配置之后再这么做。开始的时候,只要按上述的配置来监控网络里的路由器和交换机就可以了。
要配置Nagios用于监控网络里的交换机之前,有必要做点额外工作。记住,这是首先要做的工作才能监控。
编辑Nagios的主配置文件
vi /usr/local/nagios/etc/nagios.cfg
移除文件里下面这行的最前面的(#)符号
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg
保存文件并退出。
为何要这么做?这是要让Nagios检查/usr/local/nagios/etc/objects/switch.cfg配置文件来找些额外的对象定义。在文件里可以增加有关路由器和交换机设备的主机与服务定义。配置文件已经包含了几个样本主机、主机组和服务定义。做为监控路由器与交换机的第一步工作是最好在样例的主机与服务对象定义之上修改而不是重建一个。
需要做些对象定义以监控新的交换机与路由器设备。
打开switch.cfg文件进行编辑。
vi /usr/local/nagios/etc/objects/switch.cfg
给要监控的交换机加一个新的主机对象定义。如果这是第一台要监控的交换机设备,可以简单地修改switch.cfg里的样例配置。修改主机对象里的host_name、alias和address域值来适用于监控。
define host{ use generic-switch ; Inherit default values from a template host_name linksys-srw224p ; The name we're giving to this switch alias Linksys SRW224P Switch ; A longer name associated with the switch address 192.168.1.253 ; IP address of the switch hostgroups allhosts,switches ; Host groups this switch is associated with }
现在可以加些针对监控交换机的服务对象定义(在同一个配置文件)。如果是第一台要监控的交换机设备,可以简单地修改switch.cfg里的样例配置。

增加如下的服务定义以监控自Nagios监控主机到交换机的丢包率和平均回包周期RTA,在一般情况下每5分钟检测一次。
define service{ use generic-service ; Inherit values from a template host_name linksys-srw224p ; The name of the host the service is associated with service_description PING ; The service description check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service normal_check_interval 5 ; Check the service every 5 minutes under normal conditions retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined }
这个服务的状态将会处于:
如果交换机与路由器支持SNMP接口,可以用check_snmp插件来监控更丰富的信息。如果不支持SNMP,跳过此节。
加入如下服务定义到你刚才修改的交换机对象定义之中
define service{ use generic-service ; Inherit values from a template host_name linksys-srw224p service_description Uptime check_command check_snmp!-C public -o sysUpTime.0 }
在上述服务定义中的check_command域里,用"-C public"来指定SNMP共同体名称为"public",用"-o sysUpTime.0"指明要检测的OID(译者注-MIB节点值)。
如果要确保交换机上某个指定端口或接口的状态处于运行状态,可以在对象定义里加入一段定义:
define service{ use generic-service ; Inherit values from a template host_name linksys-srw224p service_description Port 1 Link Status check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB }
在上例中,"-o ifOperStatus.1"指出取出交换机的端口编号为1的OID状态。"-r 1"选项是让check_snmp插件检查返回一个正常(OK)状态,如果是在SNMP查询结果中存在"1"(1说明交换机端口处于运行状态)如果没找到1就是紧急(CRITICAL)状态。"-m RFC1213-MIB"是可选的,它告诉check_snmp插件只加载"RFC1213-MIB"库而不是加载每个在系统里的MIB库,这可以加快插件运行速度。
这就是给SNMP库的例子。有成百上千种信息可以通过SNMP来监控,这完全取决于你需要做什么和如果来做监控。祝你好运!

可以监控交换机或路由器的带宽利用率,用MRTG绘图并让Nagios在流量超出指定门限时报警。check_mrtgtraf插件(它已经包含在Nagios插件软件发行包中)可以实现。
需要让check_mrtgtraf插件知道如何来保存MRTG数据并存入文件,以及门限等。在例子中,监控了一个Linksys交换机。MRTG日志保存于/var/lib/mrtg/192.168.1.253_1.log文件中。这就是我用于监控的服务定义,它可以用于监控带宽数据到日志文件之中...
define service{ use generic-service ; Inherit values from a template host_name linksys-srw224p service_description Port 1 Bandwidth Usage check_command check_local_mrtgtraf!/var/lib/mrtg/192.168.1.253_1.log!AVG!1000000,2000000!5000000,5000000!10 }
在上例中,"/var/lib/mrtg/192.168.1.253_1.log"参数传给check_local_mrtgtraf命令意思是插件的MRTG日志文件在这个文件里读写,"AVG"参数的意思是取带宽的统计平均值,"1000000,200000"参数是指流入的告警门限(以字节为单位),"5000000,5000000"是输出流量紧急状态门限(以字节为单位),"10"是指如果MRTG日志如果超过10分钟没有数据返回一个紧急状态(应该每5分钟更新一次)。
保存该配置文件
一旦给switch.cfg文件里加好新的主机与服务对象定义,就可以开始对路由器与交换机进行监控。为了开始监控,需要先验证配置文件再重新启动Nagios。
如果验证过程有有任何错误信息,修改配置文件再继续。一定要保证配置验证过程中没有错误信息再启动Nagios!
本文件描述了如何监控网络打印机。特别是有内置或外置JetDirect卡的HP惠普打印机设备,或是其他(象Troy PocketPro 100S或Netgear PS101)支持JetDirect协议的打印机。
check_hpjd插件(该命令是Nagios插件软件发行包的标准组成部分)可以用SNMP使能的方式来监控JetDirect兼容型打印机。该插件可以检查如下打印机状态:
监控网络打印机的状态很简单。有JetDirect功能的打印机一般提供SNMP功能,可以用check_hpjd插件来检测状态。
check_hpjd插件只是当当前系统中安装有net-snmp和net-snmp-utils软件包时才会被编译和安装。要保证在/usr/local/nagios/libexec目录下有check_hpjd文件再继承,否则,要安装好net-snmp和net-snmp-utils软件包再重新编译安装Nagios插件包。
为使这项工作更轻松,几个配置工作已经做好:
上面的监控配置文件可以在/usr/local/nagios/etc/objects/目录里找到。如果想做,可以修改里面的定义以更好地适用于你的情况。但是在此之前,推荐你要熟悉Nagios的配置之后再做。起初,最好只是按下面的大概修改一下以实现对网络打印机的监控。
在配置Nagios用于监控网络打印机之前,有些额外工作,记住这是要对第一台打印机设备进行监控。
编辑Nagios的主配置文件。
vi /usr/local/nagios/etc/nagios.cfg
移除下面这行最前面的(#)号:
#cfg_file=/usr/local/nagios/etc/objects/printer.cfg
保存文件并退出编辑。
为何要这样?告诉Nagios查找/usr/local/nagios/etc/objects/printer.cfg文件以取得额外对象定义。该文件中将加入网络打印机设备的主机与服务对象定义。这个配置文件里已经包含有一个样本主机、主机组和服务定义。给第一台打印机设备做监控,可以简单地修改这个文件而不需重生成一个。
需要创建几个对象定义以进行网络打印机的监控。
打开printer.cfg文件并编辑它。
vi /usr/local/nagios/etc/objects/printer.cfg
增加一个你要监控的网络打印机设备的主机对象定义。如果这是第一台打印机设备,可以简单地修改printer.cfg文件里的样本主机定义。将合理的值赋在host_name、alias和address域里。
define host{ use generic-printer ; Inherit default values from a template host_name hplj2605dn ; The name we're giving to this printer alias HP LaserJet 2605dn ; A longer name associated with the printer address 192.168.1.30 ; IP address of the printer hostgroups allhosts ; Host groups this printer is associated with }
现在可以给监控的打印机加些服务定义(在同一个配置文件里)。如果是第一台被监控的网络打印机,可以简单地修改printer.cfg里的服务配置。

按如下方式加好对打印机状态检测的服务定义。服务用check_hpjd插件来检测打印机状态,默认情况下每10分钟检测一次。SNMP共同体串是"public"。
define service{ use generic-service ; Inherit values from a template host_name hplj2605dn ; The name of the host the service is associated with service_description Printer Status ; The service description check_command check_hpjd!-C public ; The command used to monitor the service normal_check_interval 10 ; Check the service every 10 minutes under normal conditions retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined }
加入一个默认每10分钟进行一次的PING检测服务。用于检测RTA、丢包率和网络联接状态。
define service{ use generic-service host_name hplj2605dn service_description PING check_command check_ping!3000.0,80%!5000.0,100% normal_check_interval 10 retry_check_interval 1 }
保存配置文件。
本文档描述了如何对Netware服务器的"私有"服务和属性进行监控,象这些:
由Netware服务器提供的公众服务(HTTP、FTP等)的监控可以按文档监控公众服务来做。
TODO...
Novell有一些有关Nagios如何来做Netware监控的文档,在网站的Cool Solutions栏目里,包括:
This document describes how you can monitor publicly available services, applications and protocols. By "public" I mean services that are accessible across the network - either the local network or the greater Internet. Examples of public services include HTTP, POP3, IMAP, FTP, and SSH. There are many more public services that you probably use on a daily basis. These services and applications, as well as their underlying protocols, can usually be monitored by Nagios without any special access requirements.
Private services, in contrast, cannot be monitored with Nagios without an intermediary agent of some kind. Examples of private services associated with hosts are things like CPU load, memory usage, disk usage, current user count, process information, etc. These private services or attributes of hosts are not usually exposed to external clients. This situation requires that an intermediary monitoring agent be installed on any host that you need to monitor such information on. More information on monitoring private services on different types of hosts can be found in the documentation on:

Tip: Occassionally you will find that information on private services and applications can be monitored with SNMP. The SNMP agent allows you to remotely monitor otherwise private (and inaccessible) information about the host. For more information about monitoring services using SNMP, check out the documentation on monitoring switches and routers.

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample commands.cfg and localhost.cfg config files.
When you find yourself needing to monitor a particular application, service, or protocol, chances are good that a plugin exists to monitor it. The official Nagios plugins distribution comes with plugins that can be used to monitor a variety of services and protocols. There are also a large number of contributed plugins that can be found in the contrib/ subdirectory of the plugin distribution. The NagiosExchange.org website hosts a number of additional plugins that have been written by users, so check it out when you have a chance.
If you don't happen to find an appropriate plugin for monitoring what you need, you can always write your own. Plugins are easy to write, so don't let this thought scare you off. Read the documentation on developing plugins for more information.
I'll walk you through monitoring some basic services that you'll probably use sooner or later. Each of these services can be monitored using one of the plugins that gets installed as part of the Nagios plugins distribution. Let's get started...
Before you can monitor a service, you first need to define a host that is associated with the service. You can place host definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive. If you have already created a host definition, you can skip this step.
For this programlisting, lets say you want to monitor a variety of services on a remote host. Let's call that host remotehost. The host definition can be placed in its own file or added to an already exiting object configuration file. Here's what the host definition for remotehost might look like:
define host{
use generic-host ; Inherit default values from a template
host_name remotehost ; The name we're giving to this host
alias Some Remote Host ; A longer name associated with the host
address 192.168.1.50 ; IP address of the host
hostgroups allhosts ; Host groups this host is associated with
}
Now that a definition has been added for the host that will be monitored, we can start defining services that should be monitored. As with host definitions, service definitions can be placed in any object configuration file.
For each service you want to monitor, you need to define a service in Nagios that is associated with the host definition you just created. You can place service definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive.
Some programlisting service definitions for monitoring common public service (HTTP, FTP, etc) are given below.
Chances are you're going to want to monitor web servers at some point - either yours or someone else's. The check_http plugin is designed to do just that. It understands the HTTP protocol and can monitor response time, error codes, strings in the returned HTML, server certificates, and much more.
The commands.cfg file contains a command definition for using the check_http plugin. It looks like this:
define command{
name check_http
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the HTTP service on the remotehost machine might look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description HTTP
check_command check_http
}
This simple service definition will monitor the HTTP service running on remotehost. It will produce alerts if the web server doesn't respond within 10 seconds or if it returns HTTP errors codes (403, 404, etc.). That's all you need for basic monitoring. Pretty simple, huh?

Tip: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin. This --help syntax works with all of the plugins I'll cover in this document.
A more advanced definition for monitoring the HTTP service is shown below. This service definition will check to see if the /download/index.php URI contains the string "latest-version.tar.gz". It will produce an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond.
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description Product Download Link
check_command check_http!-u /download/index.php -t 5 -s "latest-version.tar.gz"
}
When you need to monitor FTP servers, you can use the check_ftp plugin. The commands.cfg file contains a command definition for using the check_ftp plugin, which looks like this:
define command{
command_name check_ftp
command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the FTP server on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description FTP
check_command check_ftp
}
This service definition will monitor the FTP service and generate alerts if the FTP server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the FTP server running on port 1023 on remotehost. It will generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain the string "Pure-FTPd [TLS]".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description Special FTP
check_command check_ftp!-p 1023 -t 5 -e "Pure-FTPd [TLS]"
}
When you need to monitor SSH servers, you can use the check_ssh plugin. The commands.cfg file contains a command definition for using the check_ssh plugin, which looks like this:
define command{
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
A simple service definition for monitoring the SSH server on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SSH
check_command check_ssh
}
This service definition will monitor the SSH service and generate alerts if the SSH server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the SSH server and generate an alert if the server doesn't respond within 5 seconds or if the server version string string doesn't match "OpenSSH_4.2".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SSH Version Check
check_command check_ssh!-t 5 -r "OpenSSH_4.2"
}
The check_smtp plugin can be using for monitoring your email servers. The commands.cfg file contains a command definition for using the check_smtp plugin, which looks like this:
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the SMTP server on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SMTP
check_command check_smtp
}
This service definition will monitor the SMTP service and generate alerts if the SMTP server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the SMTP server and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description SMTP Response Check
check_command check_smtp!-t 5 -e "mygreatmailserver.com"
}
The check_pop plugin can be using for monitoring the POP3 service on your email servers. The commands.cfg file contains a command definition for using the check_pop plugin, which looks like this:
define command{
command_name check_pop
command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the POP3 service on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description POP3
check_command check_pop
}
This service definition will monitor the POP3 service and generate alerts if the POP3 server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the POP3 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description POP3 Response Check
check_command check_pop!-t 5 -e "mygreatmailserver.com"
}
The check_imap plugin can be using for monitoring IMAP4 service on your email servers. The commands.cfg file contains a command definition for using the check_imap plugin, which looks like this:
define command{
command_name check_imap
command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}
A simple service definition for monitoring the IMAP4 service on remotehost would look like this:
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description IMAP
check_command check_imap
}
This service definition will monitor the IMAP4 service and generate alerts if the IMAP server doesn't respond within 10 seconds.
A more advanced service definition is shown below. This service will check the IAMP4 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{
use generic-service ; Inherit default values from a template
host_name remotehost
service_description IMAP4 Response Check
check_command check_imap!-t 5 -e "mygreatmailserver.com"
}
Once you've added the new host and service definitions to your object configuration file(s), you're ready to start monitoring them. To do this, you'll need to verify your configuration and restart Nagios.
If the verification process produces any errors messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
在你开始监控网络与系统之前要有同个不同配置文件需要创建和编辑。耐心点,配置Nagios可能是要花些时间特别是对于那些初次使用者。弄清其机理所有的将它们搞定绝对是值得的。 :-)
主配置文件包括了一系列的设置,它们会影响Nagios守护进程。不仅是Nagios守护进程要使用主配置文件,CGIs程序组模块也需要,因此,主配置文件是你开始学习配置其他文件的基础。
有关主配置文件的文档在这里。
资源文件可以保存用户自定义的宏。资源文件的一个主要用处是用于保存一些敏感的配置信息如系统口令等不能让CGIs程序模块获取到的东西。
你可以在主配置文件中设置resource_file指向一个或是多个资源文件。
当创建或编辑配置文件时,要遵守如下要求:
主配置文件一般(实际是固定的)是nagios.cfg,存放位置在/usr/local/nagios/etc/目录里(--如果是rpm包来安装,应该是在/etc/nagios/)。
下面将对每个主配置文件里的选项进行说明...
这个变量用于设定Nagios在何处创建其日志文件。它应该是你主配置文件里面的第一个变量,当Nagios找到你配置文件并发现配置里有错误时会向该文件中写入错误信息。如果你使能了日志回滚,Nagios将在每小时、每天、每周或每月对日志进行回滚。
表 5.2. 对象配置文件
| 格式: | cfg_file=<file_name> |
| 样例: | cfg_file=/usr/local/nagios/etc/hosts.cfg cfg_file=/usr/local/nagios/etc/services.cfg cfg_file=/usr/local/nagios/etc/commands.cfg |
该变量用于指定一个包含有将用于Nagios监控对象的对象配置文件。对象配置文件中包括有主机、主机组、联系人、联系人组、服务、命令等等对象的定义。配置信息可以切分为多个文件并且用cfg_file=语句来指向每个待处理的配置文件。
表 5.3. 对象配置目录
| 格式: | cfg_dir=<directory_name> |
| 样例: | cfg_dir=/usr/local/nagios/etc/commands cfg_dir=/usr/local/nagios/etc/services cfg_dir=/usr/local/nagios/etc/hosts |
该变量用于指定一个目录,目录里包含有将用于Nagios监控对象的对象配置文件。所有的在这个目录下的且以.cfg为扩展名的文件将被作为配置文件来处理。另外,Nagios将会递归该目录下的子目录并处理其子目录下的全部配置文件。你可以把配置放入不同的目录并且用cfg_dir=语句来指向每个待处理的目录。
表 5.4. 对象缓冲文件
| 格式: | object_cache_file=<file_name> |
| 样例: | object_cache_file=/usr/local/nagios/var/objects.cache |
该变量用于指定一个用于缓冲对象定义复本的文件存放位置。对象缓冲将在每次Nagios的启动和重启时和使用CGI模块时被创建或重建。它试图加快在CGI里的配置缓冲并使得你在编辑对象配置文件时可以让正在运行的Nagios不影响CGI的显示输出。
表 5.5. 预缓冲对象文件
| 格式: | precached_object_file=<file_name> |
| 样例: | precached_object_file=/usr/local/nagios/var/objects.precache |
该变量用于指定一个用于指定一个用于预处理、预缓冲 This directive is used to specify a file in which a pre-processed, pre-cached copy of 对象定义复本的文件存放位置。在大型或复杂Nagios安装模式下这个文件可用于显著地减少Nagios的启动时间。如何加快启动的更多信息可以查看这个内容。
该变量用于指定一个可选的包含有$USERn$宏定义的可选资源文件。$USERn$宏在存放用户名、口令及通用的命令定义内容(如目录路径)时非常有用。CGIs模块将不会试图读取资源文件,所以你可以限定这权文件权限(600或660)来保护敏感信息。你可以在主配置文件里用resource_file语句来加入多个资源文件-Nagios将会处理它们。如何定义$USERn$宏参见样例resource.cfg文件,它放在Nagios发行包的sample-config/子目录下。
该变量用于指定一个临时文件,Nagios将在更新注释数据、状态数据等时周期性地创建它。该文件不再需要时会删除它。
这个变量是一个目录,该目录是块飞地,在监控过程中用于创建临时文件。你应在该目录内运行tmpwatch或类似的工具程序以删除早于24小时的文件(这是个垃圾文件存放地)。
这个变量指向一个文件,文件被Nagios用于保存当前状态、注释和宕机信息。CGI模块也会用这个文件以通过Web接口来显示当前被监控的状态,CGI模块必须要有这个文件的读取权限以使工作正常。在Nagios停机或在重启动时将会删除并重建该文件。
这个变量设置了Nagios更新状态文件的速度(秒为单位),最小更新间隔是1秒。
该变量指定了Nagios进程使用哪个用户运行。当程序启动完成并开始监控对象之前,Nagios将切换自己的权限并使用该用户权限运行。你可以指定用户或是UID名。
该变量用于指定Nagios使用哪个用户组运行。当程序启动完成并开始监控对象之前,Nagios将切换自己的权限并以该用户组权限运行。你可以拽定用户组或GID名。
该选项决定了Nagios在初始化启动或重启动时是否要送出通知。如果这个选项不使能,Nagios将不会向任何主机或服务送出通知。注意,如果你打开了状态保持选项,Nagios在其启动和重启时将忽略此设置并用这个选项的最近的一个设置(已经保存在状态保持文件)的值来工作,除非你取消了use_retained_program_state选项。如果你想在使能状态保存选项(并且是use_retained_program_state使能)的情况下更改这个选项,你必须要通过合适的外部命令或是通过Web接口来修改它。选项的取值可以是:
这个选项指定了Nagios在初始的启动或重启时是否要执行服务检测。如果这个没有使能,Nagios将不会主动地执行任何服务的检测并且保持一系列的"静默"状态(它仍旧可以接收被动检测除非你已经将accept_passive_service_checks选项关闭)。这个选项经常用于备份被监控服务配置,被监控服务的配置备份在文档冗余安装或设置成一个分布式监控环境中有描述。注意:如果你已经使能了状态保持,Nagios在其启动或重启时将会忽略这个选项设置并使用和旧的设置值(旧值保存于状态保持文件),除非你关闭了use_retained_program_state选项。如果你想在状态保持使能(和use_retained_program_state选项使能)的情况下修改这个选项,你只得用适当的外部命令或是通过Web接口来修改它。选项可用的值有:
该选项决定了Nagios在其初始化启动或重启后是否要授受强制服务检测,如果它关闭了,Nagios将不会接受任何强制服务检测结果。注意:如果你已经使能了状态保持,Nagios在其启动或重启时将会忽略这个选项设置并使用和旧的设置值(旧值保存于状态保持文件),除非你关闭了use_retained_program_state选项。如果你想在状态保持使能(和use_retained_program_state选项使能)的情况下修改这个选项,你只得用适当的外部命令或是通过Web接口来修改它。选项可用的值有:
该选项将决定Nagios在初始地启动或重启时是否执行按需地和有规律规划检测。如果该选项不使能,那么Nagios将不会对任何主机进行检测,然而它仍旧可以接收强制主机检测结果除非你已经将accept_passive_host_checks选项关闭。该选项通常用于监控服务器的配置备份,详细信息请查看冗余安装的配置,或是用于设置一个分布式监控环境中。注意:如果你已经使能retain_state_information状态保持选项,Nagios将在启动和重启时使用旧的选项值(保存于state_retention_file状态保持文件中)而忽略此设置,除非你关闭了use_retained_program_state选项。如果你想在保持选项使能(且use_retained_program_state选项使能)的情况下修改这个选项,你只得用适当的外部命令或是通过Web接口来修改它。选项可用的值有:
该选项决定了在Nagios初始启动或重启后是否要接受强制主机检测结果。如果这个选项关闭,Nagios将不再接受任何强制主机检测结果。注意:如果你使能retain_state_information状态保持选项,Nagios将在启动或重启动时使用旧的选项设置(保存于state_retention_file状态保持文件中)而忽略这个设置。除非你已经关闭use_retained_program_state选项。如果你想在保持选项使能(且use_retained_program_state选项使能)的情况下修改这个选项,你只得用适当的外部命令或是通过Web接口来修改它。选项可用的值有:
该选项决定了在Nagios初始启动或重启后是否要运行事件处理,如果该选项关闭,Nagios将不做任何主机或服务的事件处理。注意:如果你使能retain_state_information状态保持选项(保存于state_retention_file状态保持文件中)而忽略这个设置,除非你已经关闭use_retained_program_state选项。如果你想在保持选项使能(且use_retained_program_state选项使能)的情况下修改这个选项,你只得用适当的外部命令或是通过Web接口来修改它。选项可用的值有:
该选项决定了你想让Nagios以何种方法回滚你的日志文件。可用的值有:
该选项将指定一个用于存放回滚日志文件的保存路径。如果没有使用日志回滚功能时会忽略此设置。
该选项决定了Nagios是否要检查存于命令文件里的将要执行的命令。这个选项在你计划通过Web接口来运行CGI命令时必须要打开它。更多的关于外部命令的信息可以查阅这份文档。
如果你指定了一个数字加一个"s"(如30s),那么外部检测命令的间隔是这个数值以秒为单位的时间间隔。如果没有用"s",那么外部检测命令的间隔是以这个数值的“时间单位”的时间间隔,除非你把interval_length的值(下面有说明)从默认60给更改了,这个值的意思是60s,即一分钟。
注意:将这个值设置为-1可令Nagios尽可能频繁地对外命令进行检测。在进行其他任务之前,Nagios每次都将会读入并处理保存于命令文件之中的全部命令以进行命令检查。更多的关于外部命令的信息可以查阅这份文档。
这是一个Nagios用于外部命令检测处理的文件,命令CGI程序模块将命令写入该文件,外部命令文件实现成一个命名管道(先入先出),在Nagios启动时创建它,并在关闭时删除它。如果在Nagios启动时该文件已经存在,那么Nagios会给出一个错误信息后中止。更多的关于外部命令的信息可以查阅这份文档。
注意:这是个高级特性。该选项决定了Nagios将使用多少缓冲队列来缓存外部命令,外部命令是从一个工作线程从外部命令文件将命令读入的,但这些外部命令还没有被Nagios的主守护程序处理。缓冲中的每个位置可以处理一个外部命令,所以这个选项决定了有多少命令可以被缓冲处理。为了对一个有大量被动检测系统(比如分布式系统安装)进行安装时,你可能需要降低这个值。你要考虑使用MRTG工具来绘制外部命令缓冲的利用率图表,如何配置绘制图表可阅读这篇文档。
该选项指定了Nagios在以守护态运行(以-d命令行参数运行)时在哪个位置上创建互锁文件。该文件包含有运行Nagios的进程id值(PID)。
该选项决定了Nagios是否要在程序的两次启动之间保存主机和服务的状态信息。如果你使能了这个选项,你应预先给出了state_retention_file变量的值,当选项使能时,Nagios将会在程序停止(或重启)时保存全部的主机和服务的状态信息并且会在启动时再次预读入保存的状态信息。
表 5.27. 状态保持文件
| 格式: | state_retention_file=<file_name> |
| 样例: | state_retention_file=/usr/local/nagios/var/retention.dat |
该文件用于在Nagios停止之前保存状态、停机时间和注释等信息。当Nagios重启时它会在开始监控工作之前使用保存于这个文件里的信息用于初始化主机与服务的状态。为使Nagios在程序的启动之间利用状态保持信息,你必须使能retain_state_information选项。
该选项决定了Nagios需要以什么频度(分钟为单位)在正常操作时自动地保存状态保持信息。如果你把这个值设置为0,Nagios将不会以规则的间隔保存状态保持数据,但是Nagios仍旧会在停机或重启之前做保存状态保持数据的工作。如果你关闭了状态保持功能(用retain_state_information选项设置),这个选项值将无效。
这个设置将决定了Nagios是否要使用保存于状态保持文件之中的值以更新程序范围内的变量状态。有些程序范围内的变量的状态将在程序重启时被保存于状态保持文件之中,包括enable_notifications、enable_flap_detection、enable_event_handlers、execute_service_checks和accept_passive_service_checks选项。如果你没有使用retain_state_information状态保持选项使能,这个选项将无效。
该选项决定Nagios在重启时是否要使用主机和服务的保持计划表信息(下次检测时间)。如果增加了很多数量(或很大百分比)的主机和服务,建议你在首次重启动Nagios时关闭选项,因为这个选项将会使初始检测误入歧途。其他情况下你可以要使能这个选项。
表 5.31. 保持主机和服务属性掩码
| 格式: | retained_host_attribute_mask=<number> retained_service_attribute_mask=<number> |
| 样例: | retained_host_attribute_mask=0 retained_service_attribute_mask=0 |
警告:这是个高级特性。你需要读一下源程序以看清楚它是如何起效果的。
该选项决定了哪个主机和服务的属性在程序重启时不会被保留。这些选项值是与指定的"MODATTR_"值进行按位与运算出的,MODATTR_在源程序的include/common.h里定义,默认情况下,全部主机和服务的属性都会被保持。
表 5.32. 保持进程属性掩码
| 格式: | retained_process_host_attribute_mask=<number> retained_process_service_attribute_mask=<number> |
| 样例: | retained_process_host_attribute_mask=0 retained_process_service_attribute_mask=0 |
警告:这是个高级特性。你需要读一下源程序以看清楚它是如何起效果的。
该选项决定了哪个进程属性在程序重启时不会被保留。有两个属性掩码因为经常是主机和服务的进程属性可以分别被修改。例如,主机检测在程序层面上被关闭,而服务检测仍旧被打开。这些选项值是与指定的"MODATTR_"值进行按位与运算出的,MODATTR_在源程序的include/common.h里定义,默认情况下,全部主机和服务的属性都会被保持。
表 5.33. 保持联系人属性掩码
| 格式: | retained_contact_host_attribute_mask=<number> retained_contact_service_attribute_mask=<number> |
| 样例: | retained_contact_host_attribute_mask=0 retained_contact_service_attribute_mask=0 |
警告:这是个高级特性。你需要读一下源程序以看清楚它是如何起效果的。
该选项决定了哪个联系人属性在程序重启时不会被保留。有两个属性掩码因为经常是主机和服务的联系人属性可以分别被修改。这些选项值是与指定的"MODATTR_"值进行按位与运算出的,MODATTR_在源程序的include/common.h里定义,默认情况下,全部主机和服务的属性都会被保持。
该选项决定了是否将日志信息记录到本地的Syslog中。可用的值有:
该选项决定了是否将通知信息记录进行记录,如果有很多联系人或是有规律性的服务故障时,记录文件将会增长很快。使用这个选项来保存已发出的通知记录。
该选项决定了是否将服务检测重试进行记录。服务检测重试发生在服务检测结果返回一个异常状态信息之时,而且你已经配置Nagios在对故障出现时进行一次以上的服务检测重试。此时有服务状态被认为是处理“软”故障状态。当调试Nagios或对服务的事件处理进行测试时记录下服务检测的重试是非常有用的。
该选项决定了是否将主机检测重试进行记录。当调试Nagios或对主机的事件处理进行测试时记录下主机检测的重试是非常有用的。
该选项决定了是否将服务和主机的事件处理进行记录。一旦发生服务或主机状态迁移时,可选的事件处理命令会被执行。当调试Nagios或首次尝试事件处理脚本时记录下事件处理是非常有用的。
该选项决定了Nagios是否要强行记录全部的主机和服务的初始状态,即便状态报告是OK也要记录。只是在第一次检测发现主机和服务有异常时才会记录下初始状态。如果想用应用程序扫描一段时间内的主机和服务状态以生成统计报告时,使能这个选项将有很有帮助。
该选项决定了Nagios是否要记录外部命令,外部命令是从command_file外部命令文件中提取的。注意:这个选项并不控制是否要对强制服务检测 (一种外部命令类型)进行记录。为使能或关闭对强制服务检测的记录,使用log_passive_checks强制检测记录选项。
该选项决定了Nagios是否要记录来自于command_file外部命令文件的强制主机和强制服务检测命令。如果要设置一个分布式监控环境或是计划在规整的基础上要对大量的强制检测的结果进行处理时,需要关闭这个选项以防止日志文件过份增长。
表 5.42. 全局主机事件处理选项
| 格式: | global_host_event_handler=<command> |
| 样例: | global_host_event_handler=log-host-event-to-db |
该选项指定了当每个主机状态迁移时需要执行的主机事件处理命令。全局事件处理命令将优于在每个主机定义的事件处理命令而立即执行。命令参数是在对象配置文件里定义的命令的短名称。由event_handler_timeout事件处理超时选项控制的这个命令可运行的最大次数。更多的有关事件处理的信息可以查阅这篇文档。
表 5.43. 全局服务事件处理选项
| 格式: | global_service_event_handler=<command> |
| 样例: | global_service_event_handler=log-service-event-to-db |
该选项指定了当每个服务状态迁移时需要执行的服务事件处理命令。全局事件处理命令将优于在每个服务定义的事件处理命令而立即执行。命令参数是在对象配置文件里定义的命令的短名称。由event_handler_timeout事件处理超时选项控制的这个命令可运行的最大次数。更多的有关事件处理的信息可以查阅这篇文档。
它指定了Nagios在进行计划表的下一次服务或主机检测命令执行之前应该休止多少秒。注意Nagios只是在已经进行了服务故障的排队检测之后才会休止。
表 5.45. 服务检测迟滞间隔计数方法
| 格式: | service_inter_check_delay_method=<n/d/s/x.xx> |
| 样例: | service_inter_check_delay_method=s |
该选项容许你控制服务检测将如何初始展开事件队列。 Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all services out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended, as it will cause all service checks to be scheduled for execution at the same time. This means that you will generally have large CPU spikes when the services are all executed in parallel. More information on how to estimate how the inter-check delay affects service check scheduling can be found here. Values are as follows:
This option determines the maximum number of minutes from when Nagios starts that all services (that are scheduled to be regularly checked) are checked. This option will automatically adjust the service_inter_check_delay_methodservice inter-check delay method (if necessary) to ensure that the initial checks of all services occur within the timeframe you specify. In general, this option will not have an affect on service check scheduling if scheduling information is being retained using the use_retained_scheduling_infouse_retained_scheduling_info option. 默认值是30分钟。
This variable determines how service checks are interleaved. Interleaving allows for a more even distribution of service checks, reduced load on remote hosts, and faster overall detection of host problems. Setting this value to 1 is equivalent to not interleaving the service checks (this is how versions of Nagios previous to 0.0.5 worked). Set this value to s (smart) for automatic calculation of the interleave factor unless you have a specific reason to change it. The best way to understand how interleaving works is to watch the status CGI (detailed view) when Nagios is just starting. You should see that the service check results are spread out as they begin to appear. More information on how interleaving works can be found here.
该选项可指定在任意给定时间里可被同时运行的服务检测命令的最大数量。如果指定这个值为1,则说明不允许任何并行服务检测,如果指定为0(默认值)则是对并行服务检测。你须按照可运行Nagios的机器上的机器资源情况修改这个值,因为它会直接影响系统最大负荷,它施加于系统(处理器利用率、内存使用率等)之上。更多的关于如何评估需要设置多少并行检测值的信息可以查阅这篇文档。
表 5.49. 检测结果的回收频度
| 格式: | check_result_reaper_frequency=<frequency_in_seconds> |
| 样例: | check_result_reaper_frequency=5 |
该选项控制检测结果的回收事件的处理频度(以秒为单位)。从主机和服务的检测过程里“回收”事件处理结果将是对已经执行结束的检测。事件的构成在Nagios里是监控逻辑里的核心内容。
该选项决定主机和服务检测结果回收时对结果回收时间段的控制,这个值是个以秒为单位的最大时间跨度。从主机和服务的检测过程里“回收”事件处理结果将是对已经执行结束的检测。如果有许多结果要处理,回收事件过程将占用很长时间来完成它,这将延迟对新的主机和服务检测的执行。该选项可以限制从检测结果得到与回收处理之间的最大时间间隔以使Nagios可以完成对其他监控逻辑的转换处理。
该选项决定了Nagios将在处理检测结果之前使用哪个目录来保存主机和服务检测结果。这个目录不能保存其他文件,因为Nagios会周期性地清理这个目录下的旧文件(更多信息见max_check_result_file_age选项)。
注意:确保只有一个Nagios的实例在操作检测结果保存路径。如果有多个Nagios的实例来操作相同的目录,将会因为错误的Nagios实例不正确地处理导致有错误结果!
该选项决定用最大多少秒来限定那些在check_result_path设置所指向目录里的检测结果文件是合法的。如果检测结果文件超出了这个门限,Nagios将会把过旧的文件删除而且不会处理内含的检测结果。若设置该选项为0,Nagios将处理全部的检测结果文件-即便这些文件比你的硬件还老旧。
This option allows you to control how host checks that are scheduled to be checked on a regular basis are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all hosts out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended. Using no delay will cause all host checks to be scheduled for execution at the same time. More information on how to estimate how the inter-check delay affects host check scheduling can be found here.Values are as follows:
This option determines the maximum number of minutes from when Nagios starts that all hosts (that are scheduled to be regularly checked) are checked. This option will automatically adjust the host_inter_check_delay_methodhost inter-check delay method (if necessary) to ensure that the initial checks of all hosts occur within the timeframe you specify. In general, this option will not have an affect on host check scheduling if scheduling information is being retained using the use_retained_scheduling_infouse_retained_scheduling_info option. Default value is 30 (minutes).
该选项指定了“单位间隔”是多少秒数,单位间隔用于计数计划队列处理、再次通知等。单位间隔在对象配置文件被用于决定以何频度运行服务检测、以何频度与联系人再通知等。
重要:默认值是60,这说明在对象配置文件里设定的“单位间隔”是60秒(1分钟)。我没测试过其他值,所以如果要用其他值要自担风险!
该选项决定了Nagios是否要试图自动地进行计划的自主检测主机与服务以使在之后的时间里检测更为“平滑”。这可以使得监控主机保持一个均衡的负载,也使得在持续检测之间的保持相对一致,其代价是要更刚性地按计划执行检测工作。
WARNING: THIS IS AN EXPERIMENTAL FEATURE AND MAY BE REMOVED IN FUTURE VERSIONS. ENABLING THIS OPTION CAN DEGRADE PERFORMANCE - RATHER THAN INCREASE IT - IF USED IMPROPERLY!
表 5.57. Auto-Rescheduling Interval
| 格式: | auto_rescheduling_interval=<seconds> |
| 样例: | auto_rescheduling_interval=30 |
This option determines how often (in seconds) Nagios will attempt to automatically reschedule checks. This option only has an effect if the auto_reschedule_checksauto_reschedule_checks option is enabled. Default is 30 seconds.
WARNING: THIS IS AN EXPERIMENTAL FEATURE AND MAY BE REMOVED IN FUTURE VERSIONS. ENABLING THE AUTO-RESCHEDULING OPTION CAN DEGRADE PERFORMANCE - RATHER THAN INCREASE IT - IF USED IMPROPERLY!
表 5.58. Auto-Rescheduling Window
| 格式: | auto_rescheduling_window=<seconds> |
| 样例: | auto_rescheduling_window=180 |
This option determines the "window" of time (in seconds) that Nagios will look at when automatically rescheduling checks. Only host and service checks that occur in the next X seconds (determined by this variable) will be rescheduled. This option only has an effect if the auto_reschedule_checksauto_reschedule_checks option is enabled. Default is 180 seconds (3 minutes).
WARNING: THIS IS AN EXPERIMENTAL FEATURE AND MAY BE REMOVED IN FUTURE VERSIONS. ENABLING THE AUTO-RESCHEDULING OPTION CAN DEGRADE PERFORMANCE - RATHER THAN INCREASE IT - IF USED IMPROPERLY!
Nagios tries to be smart about how and when it checks the status of hosts. In general, disabling this option will allow Nagios to make some smarter decisions and check hosts a bit faster. Enabling this option will increase the amount of time required to check hosts, but may improve reliability a bit. Unless you have problems with Nagios not recognizing that a host recovered, I would suggest not enabling this option.
This option determines whether or not Nagios will DOWN/UNREACHABLE passive host check results to their "correct" state from the viewpoint of the local Nagios instance. This can be very useful in distributed and failover monitoring installations. More information on passive check state translation can be found here.
表 5.61. Passive Host Checks Are SOFT Option
| 格式: | passive_host_checks_are_soft=<0/1> |
| 样例: | passive_host_checks_are_soft=1 |
This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.
表 5.62. Predictive Host Dependency Checks Option
| 格式: | enable_predictive_host_dependency_checks=<0/1> |
| 样例: | enable_predictive_host_dependency_checks=1 |
This option determines whether or not Nagios will execute predictive checks of hosts that are being dependended upon (as defined in host dependencies) for a particular host when it changes state.
Predictive checks help ensure that the dependency logic is as accurate as possible. More information on how predictive checks work can be found here.
表 5.63. Predictive Service Dependency Checks Option
| 格式: | enable_predictive_service_dependency_checks=<0/1> |
| 样例: | enable_predictive_service_dependency_checks=1 |
This option determines whether or not Nagios will execute predictive checks of services that are being dependended upon (as defined in service dependencies) for a particular service when it changes state.
Predictive checks help ensure that the dependency logic is as accurate as possible. More information on how predictive checks work can be found here.
表 5.64. Cached Host Check Horizon
| 格式: | cached_host_check_horizon=<seconds> |
| 样例: | cached_host_check_horizon=15 |
This option determines the maximum amount of time (in seconds) that the state of a previous host check is considered current. Cached host states (from host checks that were performed more recently than the time specified by this value) can improve host check performance immensely. Too high of a value for this option may result in (temporarily) inaccurate host states, while a low value may result in a performance hit for host checks. Use a value of 0 if you want to disable host check caching. More information on cached checks can be found here.
表 5.65. Cached Service Check Horizon
| 格式: | cached_service_check_horizon=<seconds> |
| 样例: | cached_service_check_horizon=15 |
This option determines the maximum amount of time (in seconds) that the state of a previous service check is considered current. Cached service states (from service checks that were performed more recently than the time specified by this value) can improve service check performance when a lot of service dependencies are used. Too high of a value for this option may result in inaccuracies in the service dependency logic. Use a value of 0 if you want to disable service check caching. More information on cached checks can be found here.
表 5.66. Large Installation Tweaks Option
| 格式: | use_large_installation_tweaks=<0/1> |
| 样例: | use_large_installation_tweaks=0 |
This option determines whether or not the Nagios daemon will take several shortcuts to improve performance. These shortcuts result in the loss of a few features, but larger installations will likely see a lot of benefit from doing so. More information on what optimizations are taken when you enable this option can be found here.
This option determines whether or not Nagios will free memory in child processes when they are fork()ed off from the main process. By default, Nagios frees memory. However, if the use_large_installation_tweaks option is enabled, it will not. By defining this option in your configuration file, you are able to override things to get the behavior you want.
This option determines whether or not Nagios will fork() child processes twice when it executes host and service checks. By default, Nagios fork()s twice. However, if the use_large_installation_tweaks option is enabled, it will only fork() once. By defining this option in your configuration file, you are able to override things to get the behavior you want.
This option determines whether or not the Nagios daemon will make all standard macros available as environment variables to your check, notification, event hander, etc. commands. In large Nagios installations this can be problematic because it takes additional memory and (more importantly) CPU to compute the values of all macros and make them available to the environment.
This option determines whether or not Nagios will try and detect hosts and services that are "flapping". Flapping occurs when a host or service changes between states too frequently, resulting in a barrage of notifications being sent out. When Nagios detects that a host or service is flapping, it will temporarily suppress notifications for that host/service until it stops flapping. Flap detection is very experimental at this point, so use this feature with caution! More information on how flap detection and handling works can be found here.注意:如果你使能retain_state_information状态保持选项(保存于state_retention_file状态保持文件中)而忽略这个设置,除非你已经关闭use_retained_program_state选项。如果你想在保持选项使能(且use_retained_program_state选项使能)的情况下修改这个选项,你只得用适当的外部命令或是通过Web接口来修改它。选项可用的值有:
表 5.71. Low Service Flap Threshold
| 格式: | low_service_flap_threshold=<percent> |
| 样例: | low_service_flap_threshold=25.0 |
This option is used to set the low threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things) read this.
表 5.72. High Service Flap Threshold
| 格式: | high_service_flap_threshold=<percent> |
| 样例: | high_service_flap_threshold=50.0 |
This option is used to set the low threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things) read this.
This option is used to set the low threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things) read this.
表 5.74. High Host Flap Threshold
| 格式: | high_host_flap_threshold=<percent> |
| 样例: | high_host_flap_threshold=50.0 |
This option is used to set the low threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things) read this.
This option determines whether or not Nagios will use soft state information when checking host and service dependencies. Normally Nagios will only use the latest hard host or service state when checking dependencies. If you want it to use the latest state (regardless of whether its a soft or hard state type), enable this option.
This is the maximum number of seconds that Nagios will allow service checks to run. If checks exceed this limit, they are killed and a 紧急 state is returned. A timeout error will also be logged.
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each service check normally finishes executing within this time limit. If a service check runs longer than this limit, Nagios will kill it off thinking it is a runaway processes.
This is the maximum number of seconds that Nagios will allow host checks to run. If checks exceed this limit, they are killed and a 紧急 state is returned and the host will be assumed to be DOWN. A timeout error will also be logged.
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each host check normally finishes executing within this time limit. If a host check runs longer than this limit, Nagios will kill it off thinking it is a runaway processes.
This is the maximum number of seconds that Nagios will allow event handlers to be run. If an event handler exceeds this time limit it will be killed and a warning will be logged.
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each event handler command normally finishes executing within this time limit. If an event handler runs longer than this limit, Nagios will kill it off thinking it is a runaway processes.
This is the maximum number of seconds that Nagios will allow notification commands to be run. If a notification command exceeds this time limit it will be killed and a warning will be logged.
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each notification command finishes executing within this time limit. If a notification command runs longer than this limit, Nagios will kill it off thinking it is a runaway processes.
This is the maximum number of seconds that Nagios will allow an ocsp_commandobsessive compulsive service processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.
This is the maximum number of seconds that Nagios will allow an ochp_commandobsessive compulsive host processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.
This is the maximum number of seconds that Nagios will allow a host_perfdata_commandhost performance data processor command or service_perfdata_commandservice performance data processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.
This value determines whether or not Nagios will "obsess" over service checks results and run the ocsp_commandobsessive compulsive service processor command you define. I know - funny name, but it was all I could think of. This option is useful for performing distributed monitoring. If you're not doing distributed monitoring, don't enable this option.
表 5.84. Obsessive Compulsive Service Processor Command
| 格式: | ocsp_command=<command> |
| 样例: | ocsp_command=obsessive_service_handler |
This option allows you to specify a command to be run after every service check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your 对象配置文件. The maximum amount of time that this command can run is controlled by the ocsp_timeoutocsp_timeout option. More information on distributed monitoring can be found here. This command is only executed if the obsess_over_servicesobsess_over_services option is enabled globally and if the obsess_over_service directive in the service definition is enabled.
This value determines whether or not Nagios will "obsess" over host checks results and run the ochp_commandobsessive compulsive host processor command you define. I know - funny name, but it was all I could think of. This option is useful for performing distributed monitoring. If you're not doing distributed monitoring, don't enable this option.
表 5.86. Obsessive Compulsive Host Processor Command
| 格式: | ochp_command=<command> |
| 样例: | ochp_command=obsessive_host_handler |
This option allows you to specify a command to be run after every host check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your 对象配置文件. The maximum amount of time that this command can run is controlled by the ochp_timeoutochp_timeout option. More information on distributed monitoring can be found here. This command is only executed if the obsess_over_hostsobsess_over_hosts option is enabled globally and if the obsess_over_host directive in the host definition is enabled.
该选项决定Nagios是否要处理主机和服务检测性能数据。
This option allows you to specify a command to be run after every host check to process host performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your 对象配置文件. This command is only executed if the process_performance_dataprocess_performance_data option is enabled globally and if the process_perf_data directive in the host definition is enabled.
表 5.89. 服务性能数据处理命令
| 格式: | service_perfdata_command=<command> |
| 样例: | service_perfdata_command=process-service-perfdata |
This option allows you to specify a command to be run after every service check to process service performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your 对象配置文件. This command is only executed if the process_performance_dataprocess_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.
表 5.90. 主机性能数据文件
| 格式: | host_perfdata_file=<file_name> |
| 样例: | host_perfdata_file=/usr/local/nagios/var/host-perfdata.dat |
This option allows you to specify a file to which host performance data will be written after every host check. Data will be written to the performance file as specified by the host_perfdata_file_templatehost_perfdata_file_template option. Performance data is only written to this file if the process_performance_dataprocess_performance_data option is enabled globally and if the process_perf_data directive in the host definition is enabled.
表 5.91. 服务性能数据文件
| 格式: | service_perfdata_file=<file_name> |
| 样例: | service_perfdata_file=/usr/local/nagios/var/service-perfdata.dat |
This option allows you to specify a file to which service performance data will be written after every service check. Data will be written to the performance file as specified by the service_perfdata_file_template option. Performance data is only written to this file if the process_performance_dataprocess_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.
表 5.92. 主机性能数据文件模板
| 格式: | host_perfdata_file_template=<template> |
| 样例: | host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$ \t$HOSTOUTPUT$\t$HOSTPERFDATA$ |
This option determines what (and how) data is written to the host_perfdata_filehost performance data file. The template may contain macros, special characters (\t for tab, \r for carriage return, \n for newline) and plain text. A newline is automatically added after each write to the performance data file.
表 5.93. 服务性能数据文件模板
| 格式: | service_perfdata_file_template=<template> |
| 样例: | service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t $SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$ |
This option determines what (and how) data is written to the service performance data file. The template may contain macros, special characters (\t for tab, \r for carriage return, \n for newline) and plain text. A newline is automatically added after each write to the performance data file.
This option determines how the host_perfdata_filehost performance data file is opened. Unless the file is a named pipe you'll probably want to use the default mode of append.
This option determines how the service performance data file is opened. Unless the file is a named pipe you'll probably want to use the default mode of append.
表 5.96. 主机性能数据文件处理间隔
| 格式: | host_perfdata_file_processing_interval=<seconds> |
| 样例: | host_perfdata_file_processing_interval=0 |
This option allows you to specify the interval (in seconds) at which the host_perfdata_filehost performance data file is processed using the host_perfdata_file_processing_commandhost performance data file processing command. A value of 0 indicates that the performance data file should not be processed at regular intervals.
表 5.97. 服务性能数据文件处理间隔
| 格式: | service_perfdata_file_processing_interval=<seconds> |
| 样例: | service_perfdata_file_processing_interval=0 |
This option allows you to specify the interval (in seconds) at which the service_perfdata_fileservice performance data file is processed using the service_perfdata_file_processing_commandservice performance data file processing command. A value of 0 indicates that the performance data file should not be processed at regular intervals.
表 5.98. 主机性能数据文件处理命令
| 格式: | host_perfdata_file_processing_command=<command> |
| 样例: | host_perfdata_file_processing_command=process-host-perfdata-file |
This option allows you to specify the command that should be executed to process the host_perfdata_filehost performance data file. The command argument is the short name of a command definition that you define in your 对象配置文件. The interval at which this command is executed is determined by the host_perfdata_file_processing_intervalhost_perfdata_file_processing_interval directive.
表 5.99. 服务性能数据文件处理命令
| 格式: | service_perfdata_file_processing_command=<command> |
| 样例: | service_perfdata_file_processing_command=process-service-perfdata-file |
This option allows you to specify the command that should be executed to process the service_perfdata_fileservice performance data file. The command argument is the short name of a command definition that you define in your 对象配置文件. The interval at which this command is executed is determined by the service_perfdata_file_processing_intervalservice_perfdata_file_processing_interval directive.
This option allows you to enable or disable checks for orphaned service checks. Orphaned service checks are checks which have been executed and have been removed from the event queue, but have not had any results reported in a long time. Since no results have come back in for the service, it is not rescheduled in the event queue. This can cause service checks to stop being executed. Normally it is very rare for this to happen - it might happen if an external user or process killed off the process that was being used to execute a service check. If this option is enabled and Nagios finds that results for a particular service check have not come back, it will log an error message and reschedule the service check. If you start seeing service checks that never seem to get rescheduled, enable this option and see if you notice any log messages about orphaned services.
This option allows you to enable or disable checks for orphaned hoste checks. Orphaned host checks are checks which have been executed and have been removed from the event queue, but have not had any results reported in a long time. Since no results have come back in for the host, it is not rescheduled in the event queue. This can cause host checks to stop being executed. Normally it is very rare for this to happen - it might happen if an external user or process killed off the process that was being used to execute a host check. If this option is enabled and Nagios finds that results for a particular host check have not come back, it will log an error message and reschedule the host check. If you start seeing host checks that never seem to get rescheduled, enable this option and see if you notice any log messages about orphaned hosts.
This option determines whether or not Nagios will periodically check the "freshness" of service checks. Enabling this option is useful for helping to ensure that passive service checks are received in a timely manner. More information on freshness checking can be found here.
表 5.103. 服务更新检测间隔
| 格式: | service_freshness_check_interval=<seconds> |
| 样例: | service_freshness_check_interval=60 |
This setting determines how often (in seconds) Nagios will periodically check the "freshness" of service check results. If you have disabled service freshness checking (with the check_service_freshnesscheck_service_freshness option), this option has no effect. More information on freshness checking can be found here.
This option determines whether or not Nagios will periodically check the "freshness" of host checks. Enabling this option is useful for helping to ensure that passive host checks are received in a timely manner. More information on freshness checking can be found here.
This setting determines how often (in seconds) Nagios will periodically check the "freshness" of host check results. If you have disabled host freshness checking (with the check_host_freshnesscheck_host_freshness option), this option has no effect. More information on freshness checking can be found here.
表 5.106. Additional Freshness Threshold Latency Option
| 格式: | additional_freshness_latency=<#> |
| 样例: | additional_freshness_latency=15 |
This option determines the number of seconds Nagios will add to any host or services freshness threshold it automatically calculates (e.g. those not specified explicity by the user). More information on freshness checking can be found here.
This setting determines whether or not the embedded Perl interpreter is enabled on a program-wide basis. Nagios must be compiled with support for embedded Perl for this option to have an effect. More information on the embedded Perl interpreter can be found here.
表 5.108. Embedded Perl Implicit Use Option
| 格式: | use_embedded_perl_implicitly=<0/1> |
| 样例: | use_embedded_perl_implicitly=1 |
This setting determines whether or not the embedded Perl interpreter should be used for Perl plugins/scripts that do not explicitly enable/disable it. Nagios must be compiled with support for embedded Perl for this option to have an effect. More information on the embedded Perl interpreter and the effect of this setting can be found here.
This option allows you to specify what kind of date/time format Nagios should use in the web interface and date/time macros. Possible options (along with example output) include:
表 5.110.
| 选项 | 输出格式 | 输出样例 |
|---|---|---|
| us | MM/DD/YYYY HH:MM:SS | 06/30/2002 03:15:00 |
| euro | DD/MM/YYYY HH:MM:SS | 30/06/2002 03:15:00 |
| iso8601 | YYYY-MM-DD HH:MM:SS | 2002-06-30 03:15:00 |
| strict-iso8601 | YYYY-MM-DDTHH:MM:SS | 2002-06-30T03:15:00 |
This option allows you to override the default timezone that this instance of Nagios runs in. Useful if you have multiple instances of Nagios that need to run from the same server, but have different local times associated with them. If not specified, Nagios will use the system configured timezone.

Note: If you use this option to specify a custom timezone, you will also need to alter the Apache configuration directives for the CGIs to specify the timezone you want. Example:
<Directory "/usr/local/nagios/sbin/">
SetEnv TZ "US/Mountain"
...
</Directory>
表 5.112. 非法对象名字符
| 格式: | illegal_object_name_chars=<chars...> |
| 样例: | illegal_object_name_chars=`~!$%^&*"|'<>?,()= |
This option allows you to specify illegal characters that cannot be used in host names, service descriptions, or names of other object types. Nagios will allow you to use most characters in object definitions, but I recommend not using the characters shown in the example above. Doing may give you problems in the web interface, notification commands, etc.
This option allows you to specify illegal characters that should be stripped from macros before being used in notifications, event handlers, and other commands. This DOES NOT affect macros used in service or host check commands. You can choose to not strip out the characters shown in the example above, but I recommend you do not do this. Some of these characters are interpreted by the shell (i.e. the backtick) and can lead to security problems. The following macros are stripped of the characters you specify:
$HOSTOUTPUT$, $HOSTPERFDATA$, $HOSTACKAUTHOR$, $HOSTACKCOMMENT$, $SERVICEOUTPUT$, $SERVICEPERFDATA$, $SERVICEACKAUTHOR$, and $SERVICEACKCOMMENT$
This option determines whether or not various directives in your 对象定义 will be processed as regular expressions. More information on how this works can be found here.
表 5.115. True Regular Expression Matching Option
| 格式: | use_true_regexp_matching=<0/1> |
| 样例: | use_true_regexp_matching=0 |
If you've enabled regular expression matching of various object directives using the use_regexp_matching option, this option will determine when object directives are treated as regular expressions. If this option is disabled (the default), directives will only be treated as regular expressions if the contain *, ?, +, or \.. If this option is enabled, all appropriate directives will be treated as regular expression - be careful when enabling this! More information on how this works can be found here.
This is the email address for the administrator of the local machine (i.e. the one that Nagios is running on). This value can be used in notification commands by using the $ADMINEMAIL$macro.
表 5.117. 管理员BP机帐号
| 格式: | admin_pager=<pager_number_or_pager_email_gateway> |
| 样例: | admin_pager=pageroot@localhost.localdomain |
This is the pager number (or pager email gateway) for the administrator of the local machine (i.e. the one that Nagios is running on). The pager number/address can be used in notification commands by using the $ADMINPAGER$macro.
This option controls what (if any) data gets sent to the event broker and, in turn, to any loaded event broker modules. This is an advanced option. When in doubt, either broker nothing (if not using event broker modules) or broker everything (if using event broker modules). Possible values are shown below.
表 5.119. Event Broker Modules
| 格式: | broker_module=<modulepath> [moduleargs] |
| 样例: | broker_module=/usr/local/nagios/bin/ndomod.o cfg_file=/usr/local/nagios/etc/ndomod.cfg |
This directive is used to specify an event broker module that should by loaded by Nagios at startup. Use multiple directives if you want to load more than one module. Arguments that should be passed to the module at startup are seperated from the module path by a space.
!!! WARNING !!!
Do NOT overwrite modules while they are being used by Nagios or Nagios will crash in a fiery display of SEGFAULT glory. This is a bug/limitation either in dlopen(), the kernel, and/or the filesystem. And maybe Nagios...
The correct/safe way of updating a module is by using one of these methods:
This option determines where Nagios should write debugging information. What (if any) information is written is determined by the debug_level and debug_verbosity options. You can have Nagios automaticaly rotate the debug file when it reaches a certain size by using the max_debug_file_size option.
该选项决定Nagios将往debug_file文件里写入什么调试信息。下面值是可以逻辑或关系:
This option determines how much debugging information Nagios should write to the debug_filedebug_file.
该选项定义了以字节为单位的debug_file调试文件最大长度。如果文件增至大于该值,将会自动被命名为.old扩展名的文件,如果.old扩展名已经存在,那么旧.old文件将被删除。这可以保证在Nagios调试时磁盘空间不会过多占用而失控。
对象可以在一个用柔性化模板样式来定义,模板可使得对Nagios的配置管理更为容易,有关如果进行对象定义的基本信息可以查阅这篇文件。
一旦熟悉了如何进行对象定义的基础,需要阅读对象继承以在将来应用中配置更为鲁棒(就是尽量使用对象继承关系啦)。经验丰富的使用者可以在对象定义决窍一文中发掘到一些有关对象定义的高级特性.
关于对象的解释
下面在一些主要的对象的解释...
主机是监控逻辑中的核心对象之一。主机的重要属性有:
主机组是一台或多台主机组成的组。主机成组可以如下工作更简单(1)在Nagios的Web接口里查看相关的主机状态(2)使用对象定义决窍来简化配置。

服务监控逻辑中的一个核心对象之一。在主机上的服务用户可以:
服务组是一个或多个服务组成的组。服务组可以对如下工作更简单(1)在Nagios的Web接口里查看相关的服务状态(2)使用对象定义决窍来简化配置。

联系人是那些涉及到通知过程中的人:
联系人组是一个或多个联系人组成的组。联系人组可以简化在主机或服务故障时负责的人员划分。

时间周期用于控制:
时间段时如何工作的信息可以查阅这篇文档。

命令是指出Nagios用哪个程序、脚本等,它必须可执行后完成:

当创建或编辑配置文件时,要遵守如下要求:
默认情况下,Nagios期望的CGI配置文件被命名为cgi.cfg并且该配置文件被放在了主配置文件指定的位置。如果你想改变名称和位置,你可以在Apache里配置一个环境变量叫做NAGIO_CGI_CONFIG的(里面设置好文件名和位置)给CGI程序用。如何来做可以查看Apache文档里的说明。
下面将给出每个主配置文件里的变量与值选项说明...
表 5.124. 主配置文件的位置
| 格式: | main_config_file=<file_name> |
| 举例: | main_config_file=/usr/local/nagios/etc/nagios.cfg |
它用于指向主配置文件所在的位置。CGI模块需要知道在哪里可以得到主配置文件以取得配置信息、当前的主机和服务的状态等。
它用于指明用于服务器或工作站上的HTML文件所在的系统路径。Nagios假定文档和图片文件被分别放在了docs/和images/两个子目录下。
如果通过Web浏览器来操作Nagios,你要通过一个URL如http://www.myhost.com/nagios来操作的话,则需要设置为/nagios。一般是用这个URL来操作Nagios的HTML页面。
该选项控制着CGI模块里,对于用户操作或是取得信息时是否需要打开认证和授权功能。如果你断定你不使用认证,一定要把CGI命令移走以免没有授权的用户发出Nagios命令。如果不使用认证功能,CGI模块不会向Nagios发出命令,但我同时也建议你也把CGI模块同时移到安全位置。更多的有关设置认证与授权的内容可以查看这个文件。
用这个变量可以设置一个默认的用户来操作CGI程序。它可以在一个加密的域里(如在防火墙后建立的WEB)不需要WEB认证就可以操作CGI模块。你可能需要这个功能来避免仅仅在一个非加密的服务器上(通过因特网以明文方式来传递你的口令)来做基本的认证。
Important:除非你是在一个加密的WEB服务器上并且保证每个进入该域的用户都具备CGI操作权,否则的话,你不要定义这个默认用户。如果你决定用它,那么任何一个未经认证的WEB服务器用户都可以继承你设定的全部权限!
表 5.129. 系统和进程的信息操作权
| 格式: | authorized_for_system_information=<user1>,<user2>,<user3>,...<usern> |
| 举例: | authorized_for_system_information=nagiosadmin,theboss |
这是一个以逗号分陋的列表,列举出了在扩展CGI信息里查看系统和进程信息的可认证用户。在列表中列出的用户并不会自动被授权可发出系统和进程的命令。如果你想也同时可以发出系统和进程命令,你必须把这些用户也加到authorized_for_system_commands变量之中。更多的如何给CGI模块设置认证和配置授权的内容可以查阅这个文档。
表 5.130. 系统和进程的命令操作权
| 格式: | authorized_for_system_commands=<user1>,<user2>,<user3>,...<usern> |
| 举例: | authorized_for_system_commands=nagiosadmin |
这是一个以逗号分隔的列表,列出了可以通过CGI命令发出系统和进程命令的被认证用户。在列表中的用户并没有被自动授权查看系统和进程的信息。如果你想让用户也同时可以查看系统和进程信息的话,你必须把这些用户也加到authorized_for_system_information变量里面。更多的如何给CGI模块设置认证和配置授权的内容可以查阅这个文档。
表 5.131. 配置的信息获取权限
| 格式: | authorized_for_configuration_information=<user1>,<user2>,<user3>,...<usern> |
| 举例: | authorized_for_configuration_information=nagiosadmin |
这是一个以逗号分隔的列表,列出了可以通过配置查看CGI里查看配置信息的可认证用户。这些列表中的用户可以查看全部的配置好的主机、主机组、服务、联系人、联系人组等的配置信息。更多的如何给CGI模块设置认证和配置授权的内容可以查阅这个文档。
表 5.132. 全局主机的信息获取权限
| 格式: | authorized_for_all_hosts=<user1>,<user2>,<user3>,...<usern> |
| 举例: | authorized_for_all_hosts=nagiosadmin,theboss |
这是一个以逗号分隔的列表,列出了可以查看全部主机的状态和配置信息的被认证用户。这些列表中的用户同时被授权查看在全部的服务信息。但列表中的用户并没有自动地授权向全部的主机或服务发出命令。如果你想让这些用户同时可以向全部主机和服务发出命令,你必须将用户加入到authorized_for_all_host_commands变量里。更多的如何给CGI模块设置认证和配置授权的内容可以查阅这个文档。
表 5.133. 全局主机的命令操作权
| 格式: | authorized_for_all_host_commands=<user1>,<user2>,<user3>,...<usern> |
| 举例: | authorized_for_all_host_commands=nagiosadmin |
这是一个以逗号分隔的列表,列出了可以通过命令CGI功能模块向全部主机发出命令的被授权用户。列表中的用户同时自动地被授权可以向全部服务发出命令。但列表中的用户并没有自动地授权可以查看全部的主机或服务的状态和配置信息,如果你想让用户同样可以查看状态和配置信息,你需要将用户加入到authorized_for_all_hosts变量之中。更多的如何给CGI模块设置认证和配置授权的内容可以查阅这个文档。
表 5.134. 全局服务的信息获取权
| 格式: | authorized_for_all_services=<user1>,<user2>,<user3>,...<usern> |
| 举例: | authorized_for_all_services=nagiosadmin,theboss |
这是一个以逗号分隔的列表,列出了可以查看全部服务的状态和配置的被授权用户。但列表中的用户并没有自动地授权可以查看全部主机的信息。列表中的用户并没有自动地授权向全部服务发送命令。如果你想让这些用户也同样可以发全部服务发送命令,你必须将这些用户加入到authorized_for_all_service_commands变量之中。更多的如何给CGI模块设置认证和配置授权的内容可以查阅这个文档。
表 5.135. 全局服务的命令操作权
| 格式: | authorized_for_all_service_commands=<user1>,<user2>,<user3>,...<usern> |
| 举例: | authorized_for_all_service_commands=nagiosadmin |
这是一个以逗号分隔的列表,列出了可以通过命令CGI来向全部服务发送命令的被授权用户。但列表中的用户并没有自动地授权向全部主机发送命令。列表中的用户也没有自动地授权查看全部主机的状态和配置信息。如果你想让这些用户同样可以查年全部服务的状态和服务的信息,你必须把这些用户加入到authorized_for_all_services变量中。更多的如何给CGI模块设置认证和配置授权的内容可以查阅这个文档。
该选项将使用WEB接口时在提交注释、做内容确认和制订宕机计划等操作时限制修改已经他们的动作提交者的名字。如果该选项使能,那么用户在做这些进行命令时将不能修改发出操作者的名字。
表 5.137. 网络拓扑图的背景图设置
| 格式: | statusmap_background_image=<image_file> |
| 举例: | statusmap_background_image=smbackground.gd2 |
该选项将让你可以在使用网络拓扑图时可以指定一个图形文件做为背景图,如果你选择了使用用户定义坐标来绘制的二维网络拓扑图的话。该背景图文件将不能为其他绘制方式提供背景。它假定这个文件是放在图像文件的路径里了(如/usr/local/nagios/share/images)。该路径将自动地在physical_html_path域之后加上"/images"生成路径。注意,这个图像文件的格式可以是GIF、JPEG、PNG或GD2格式。而推荐是GD2格式的文件,因为它可以在生成二维图时降低CPU负荷。
这个选项将让你指定出网络拓扑图CGI的默认绘制方式,可用的选项值有:
表 5.139. Statusmap的<layout_number>取值
| Value | Layout Method |
|---|---|
| 0 | 用户定义坐标系 |
| 1 | 深度图 |
| 2 | 树形折叠图 |
| 3 | 平衡权图 |
| 4 | 圆形图 |
| 5 | 圆形图(出标记的) |
| 6 | 圆形图(气泡式) |
这个选项将让你指定一个你的对象实体在哪个三维空间的容纳器里展现。它默认是文件已经存放在指定的路径下了,该路径由physical_html_path域来指定。注意,这个文件必须是合格的虚拟现实建模(VRML)文件(如你可以在它的专用浏览器里可以查看它)。
该选项让你指定在三维空间图里对象的三维空间坐标的生成算法。可用的选项值有:
该选项将让你指定以秒为单位的对于CGI模块刷新的周期,CGI模块有状态列表、二维拓扑图和扩展信息等CGI模块。
表 5.144. 声音报警
| 格式: | host_unreachable_sound=<sound_file> host_down_sound=<sound_file> service_critical_sound=<sound_file> service_warning_sound=<sound_file> service_unknown_sound=<sound_file> |
| 举例: | host_unreachable_sound=hostu.wav host_down_sound=hostd.wav service_critical_sound=critical.wav service_warning_sound=warning.wav service_unknown_sound=unknown.wav |
这个选项将让你指定在查看状态列表时如果有故障发生,你的浏览器里将发出哪个声音文件。如果有故障将按指定的临界故障类型来播放不同的声音文件。这些临界的故障类型是一个或多个主机不可达,至少是一个或多个服务处于未知的状态(见上例中的次序)。声音文件将假定你放在了HTML目录的"media/"子目录里(如/usr/local/nagios/share/media)。
这个选项给出了当从WAP接口(使用statuswml CGI)做PING一个主机操作时的PING的语法。你必须给出包含全路径名的PING的执行文件及全部参数的命令行。命令中使用$HOSTADDRESS$宏来预指定在命令执行前对哪个地址替换并执行PING检测。
这个选项将决定是否在主机和服务(插件)的检测输出中包含使用HTML的扩展选项。如果你使能了它,你的插件将不能使用可点击的超链接标记。
这个选项决定了你的注释URL必须要显示的URL目标。合法的选项内容包括_blank、_self、_top、_parent或是其他合法目标的名字。
这个选项给定了框内对象的动作里显示的动作URL的目标。合法的选项值包括_blank、_self、_top、_parent或是任何其他合法目标名字。
这个选项决定了在WEB接口里与Splunk集成功能是否集成。如果使能它,你页面中将在许多地方呈现出"Splunk It"的链接,CGI模块页面(日志文件、告警历史、主机和服务的详细信息等)里都有。如果你想对特别的故障发生想知道原诿时很有用。更多关于Splunk的信息请访问http://www.splunk.com/。
这个选项设置了指向Splunk网站的URL。在enable_splunk_integration使能时这个URL被CGI模块用于指向Splunk。
Nagios对象格式的一个特点是可以创建上下继承关系的对象定义。一个如何实现对象继承关系的解释可查阅这篇文档。强烈建议你在阅读过下面内容后要再熟悉一下继承关系,因为它将使对象定义创建和维护变得更为容易,同样,还得阅读对象定义决窍一文以使一些冗长定义任务变得简短。
需要着重指出一点,当修改了配置文件时有几个在主机、服务和联系人定义里的域值不会清除。有这种特性的对象域在下面被标记了星号(*)。这个原因是由于Nagios会将一些对象域值会用保存在状态保持文件里的值来覆盖配置文件,前提是配置了对程序内容全面地状态保持选项使能并且域里的值在运行时被外部命令修改过。
绕过这个问题的一个方法是将非状态信息的保持选项关闭掉,在主机、服务和联系人对象定义里用retain_nonstatus_information选项开关。关掉这个选项后会令Nagios在重启动时使用配置文件里给出的域值而不是从状态保持文件中取值。
描述:
主机被定义为存在于网络中的一个物理服务器、工作站或设备等。
定义格式:
标记了(*)的域是必备的而黑色是可选的。
define host{ host_name host_name(*) alias alias(*) display_name display_name address address(*) parents host_names hostgroups hostgroup_names check_command command_name initial_state [o,d,u] max_check_attempts #(*) check_interval # retry_interval # active_checks_enabled [0/1] passive_checks_enabled [0/1] check_period timeperiod_name(*) obsess_over_host [0/1] check_freshness [0/1] freshness_threshold # event_handler command_name event_handler_enabled [0/1] low_flap_threshold # high_flap_threshold # flap_detection_enabled [0/1] flap_detection_options [o,d,u] process_perf_data [0/1] retain_status_information [0/1] retain_nonstatus_information [0/1] contacts contacts(*) contact_groups contact_groups(*) notification_interval #(*) first_notification_delay # notification_period timeperiod_name(*) notification_options [d,u,r,f,s] notifications_enabled [0/1] stalking_options [o,d,u] notes note_string notes_url url action_url url icon_image image_file icon_image_alt alt_string vrml_image image_file statusmap_image image_file 2d_coords x_coord,y_coord 3d_coords x_coord,y_coord,z_coord ... }
定义样例:
define host{ host_name bogus-router alias Bogus Router #1 address 192.168.1.254 parents server-backbone check_command check-host-alive check_interval 5 retry_interval 1 max_check_attempts 5 check_period 24x7 process_perf_data 0 retain_nonstatus_information 0 contact_groups router-admins notification_interval 30 notification_period 24x7 notification_options d,u,r }
域描述:
host_name: This directive is used to define a short name used to identify the host. It is used in host group and service definitions to reference this particular host. Hosts can have multiple services (which are monitored) associated with them. When used properly, the $HOSTNAME$ macro will contain this short name.
alias: This directive is used to define a longer name or description used to identify the host. It is provided in order to allow you to more easily identify a particular host. When used properly, the $HOSTALIAS$ macro will contain this alias/description.
address: This directive is used to define the address of the host. Normally, this is an IP address, although it could really be anything you want (so long as it can be used to check the status of the host). You can use a FQDN to identify the host instead of an IP address, but if DNS services are not availble this could cause problems. When used properly, the $HOSTADDRESS$ macro will contain this address. Note: If you do not specify an address directive in a host definition, the name of the host will be used as its address. A word of caution about doing this, however - if DNS fails, most of your service checks will fail because the plugins will be unable to resolve the host name.
display_name: This directive is used to define an alternate name that should be displayed in the web interface for this host. If not specified, this defaults to the value you specify for the host_name directive. Note: The current CGIs do not use this option, although future versions of the web interface will.
parents: This directive is used to define a comma-delimited list of short names of the "parent" hosts for this particular host. Parent hosts are typically routers, switches, firewalls, etc. that lie between the monitoring host and a remote hosts. A router, switch, etc. which is closest to the remote host is considered to be that host's "parent". Read the "Determining Status and Reachability of Network Hosts" document located here for more information. If this host is on the same network segment as the host doing the monitoring (without any intermediate routers, etc.) the host is considered to be on the local network and will not have a parent host. Leave this value blank if the host does not have a parent host (i.e. it is on the same segment as the Nagios host). The order in which you specify parent hosts has no effect on how things are monitored.
hostgroups: This directive is used to identify the short name(s) of the hostgroup(s) that the host belongs to. Multiple hostgroups should be separated by commas. This directive may be used as an alternative to (or in addition to) using the members directive in hostgroup definitions.
check_command: This directive is used to specify the short name of the command that should be used to check if the host is up or down. Typically, this command would try and ping the host to see if it is "alive". The command must return a status of OK (0) or Nagios will assume the host is down. If you leave this argument blank, the host will not be actively checked. Thus, Nagios will likely always assume the host is up (it may show up as being in a "PENDING" state in the web interface). This is useful if you are monitoring printers or other devices that are frequently turned off. The maximum amount of time that the notification command can run is controlled by the host_check_timeout option.
initial_state: By default Nagios will assume that all hosts are in UP states when in starts. You can override the initial state for a host by using this directive. Valid options are: o = UP, d = DOWN, and u = UNREACHABLE.
max_check_attempts: This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check again. Note: If you do not want to check the status of the host, you must still set this to a minimum value of 1. To bypass the host check, just leave the check_command option blank.
check_interval: This directive is used to define the number of "time units" between regularly scheduled checks of the host. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when the have changed to a non-UP state. Once the host has been retried max_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
active_checks_enabled **: This directive is used to determine whether or not active checks (either regularly scheduled or on-demand) of this host are enabled. Values: 0 = disable active host checks, 1 = enable active host checks.
passive_checks_enabled **: This directive is used to determine whether or not passive checks are enabled for this host. Values: 0 = disable passive host checks, 1 = enable passive host checks.
check_period: This directive is used to specify the short name of the time period during which active checks of this host can be made.
obsess_over_host **: This directive determines whether or not checks for the host will be "obsessed" over using the ochp_command.
check_freshness **: This directive is used to determine whether or not freshness checks are enabled for this host. Values: 0 = disable freshness checks, 1 = enable freshness checks.
freshness_threshold: This directive is used to specify the freshness threshold (in seconds) for this host. If you set this directive to a value of 0, Nagios will determine a freshness threshold to use automatically.
event_handler: This directive is used to specify the short name of the command that should be run whenever a change in the state of the host is detected (i.e. whenever it goes down or recovers). Read the documentation on event handlers for a more detailed explanation of how to write scripts for handling events. The maximum amount of time that the event handler command can run is controlled by the event_handler_timeout option.
event_handler_enabled **: This directive is used to determine whether or not the event handler for this host is enabled. Values: 0 = disable host event handler, 1 = enable host event handler.
low_flap_threshold: This directive is used to specify the low state change threshold used in flap detection for this host. More information on flap detection can be found here. If you set this directive to a value of 0, the program-wide value specified by the low_host_flap_threshold directive will be used.
high_flap_threshold: This directive is used to specify the high state change threshold used in flap detection for this host. More information on flap detection can be found here. If you set this directive to a value of 0, the program-wide value specified by the high_host_flap_threshold directive will be used.
flap_detection_enabled **: This directive is used to determine whether or not flap detection is enabled for this host. More information on flap detection can be found here. Values: 0 = disable host flap detection, 1 = enable host flap detection.
flap_detection_options: This directive is used to determine what host states the flap detection logic will use for this host. Valid options are a combination of one or more of the following: o = UP states, d = DOWN states, u = UNREACHABLE states.
process_perf_data **: This directive is used to determine whether or not the processing of performance data is enabled for this host. Values: 0 = disable performance data processing, 1 = enable performance data processing.
retain_status_information: This directive is used to determine whether or not status-related information about the host is retained across program restarts. This is only useful if you have enabled state retention using the retain_state_information directive. Value: 0 = disable status information retention, 1 = enable status information retention.
retain_nonstatus_information: This directive is used to determine whether or not non-status information about the host is retained across program restarts. This is only useful if you have enabled state retention using the retain_state_information directive. Value: 0 = disable non-status information retention, 1 = enable non-status information retention.
contacts: This is a list of the short names of the contacts that should be notified whenever there are problems (or recoveries) with this host. Multiple contacts should be separated by commas. Useful if you want notifications to go to just a few people and don't want to configure contact groups. You must specify at least one contact or contact group in each host definition.
contact_groups: This is a list of the short names of the contact groups that should be notified whenever there are problems (or recoveries) with this host. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each host definition.
notification_interval: This directive is used to define the number of "time units" to wait before re-notifying a contact that this server is still down or unreachable. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this host - only one problem notification will be sent out.
first_notification_delay: This directive is used to define the number of "time units" to wait before sending out the first problem notification when this host enters a non-UP state. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will start sending out notifications immediately.
notification_period: This directive is used to specify the short name of the time period during which notifications of events for this host can be sent out to contacts. If a host goes down, becomes unreachable, or recoveries during a time which is not covered by the time period, no notifications will be sent out.
notification_options: This directive is used to determine when notifications for the host should be sent out. Valid options are a combination of one or more of the following: d = send notifications on a DOWN state, u = send notifications on an UNREACHABLE state, r = send notifications on recoveries (OK state), f = send notifications when the host starts and stops flapping, and s = send notifications when scheduled downtime starts and ends. If you specify n (none) as an option, no host notifications will be sent out. If you do not specify any notification options, Nagios will assume that you want notifications to be sent out for all possible states. Example: If you specify d,r in this field, notifications will only be sent out when the host goes DOWN and when it recovers from a DOWN state.
notifications_enabled **: This directive is used to determine whether or not notifications for this host are enabled. Values: 0 = disable host notifications, 1 = enable host notifications.
stalking_options: This directive determines which host states "stalking" is enabled for. Valid options are a combination of one or more of the following: o = stalk on UP states, d = stalk on DOWN states, and u = stalk on UNREACHABLE states. More information on state stalking can be found here.
notes: This directive is used to define an optional string of notes pertaining to the host. If you specify a note here, you will see the it in the extended information CGI (when you are viewing information about the specified host).
notes_url: This variable is used to define an optional URL that can be used to provide more information about the host. If you specify an URL, you will see a red folder icon in the CGIs (when you are viewing host information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will the the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). This can be very useful if you want to make detailed information on the host, emergency contact methods, etc. available to other support staff.
action_url: This directive is used to define an optional URL that can be used to provide more actions to be performed on the host. If you specify an URL, you will see a red "splat" icon in the CGIs (when you are viewing host information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will the the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/).
icon_image: This variable is used to define the name of a GIF, PNG, or JPG image that should be associated with this host. This image will be displayed in the various places in the CGIs. The image will look best if it is 40x40 pixels in size. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos).
icon_image_alt: This variable is used to define an optional string that is used in the ALT tag of the image specified by the <icon_image> argument.
vrml_image: This variable is used to define the name of a GIF, PNG, or JPG image that should be associated with this host. This image will be used as the texture map for the specified host in the statuswrl CGI. Unlike the image you use for the <icon_image> variable, this one should probably not have any transparency. If it does, the host object will look a bit wierd. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos).
statusmap_image: This variable is used to define the name of an image that should be associated with this host in the statusmap CGI. You can specify a JPEG, PNG, and GIF image if you want, although I would strongly suggest using a GD2 format image, as other image formats will result in a lot of wasted CPU time when the statusmap image is generated. GD2 images can be created from PNG images by using the pngtogd2 utility supplied with Thomas Boutell's gd library. The GD2 images should be created in uncompressed format in order to minimize CPU load when the statusmap CGI is generating the network map image. The image will look best if it is 40x40 pixels in size. You can leave these option blank if you are not using the statusmap CGI. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos).
2d_coords: This variable is used to define coordinates to use when drawing the host in the statusmap CGI. Coordinates should be given in positive integers, as the correspond to physical pixels in the generated image. The origin for drawing (0,0) is in the upper left hand corner of the image and extends in the positive x direction (to the right) along the top of the image and in the positive y direction (down) along the left hand side of the image. For reference, the size of the icons drawn is usually about 40x40 pixels (text takes a little extra space). The coordinates you specify here are for the upper left hand corner of the host icon that is drawn. Note: Don't worry about what the maximum x and y coordinates that you can use are. The CGI will automatically calculate the maximum dimensions of the image it creates based on the largest x and y coordinates you specify.
3d_coords: This variable is used to define coordinates to use when drawing the host in the statuswrl CGI. Coordinates can be positive or negative real numbers. The origin for drawing is (0.0,0.0,0.0). For reference, the size of the host cubes drawn is 0.5 units on each side (text takes a little more space). The coordinates you specify here are used as the center of the host cube.
描述:
主机组是指一台或多台主机构成的组,可使配置更简单或是为完成特定目的而在CGI里显示使用。
定义格式:
标记了(*)的域是必备的而黑色是可选的。
define hostgroup{ hostgroup_name hostgroup_name(*) alias alias(*) members hosts hostgroup_members hostgroups notes note_string notes_url url action_url url ... }
定义样例:
define hostgroup{ hostgroup_name novell-servers alias Novell Servers members netware1,netware2,netware3,netware4 }
域描述:
hostgroup_name: This directive is used to define a short name used to identify the host group.
alias: This directive is used to define is a longer name or description used to identify the host group. It is provided in order to allow you to more easily identify a particular host group.
members: This is a list of the short names of hosts that should be included in this group. Multiple host names should be separated by commas. This directive may be used as an alternative to (or in addition to) the hostgroups directive in host definitions.
hostgroup_members: This optional directive can be used to include hosts from other "sub" host groups in this host group. Specify a comma-delimited list of short names of other host groups whose members should be included in this group.
notes: This directive is used to define an optional string of notes pertaining to the host. If you specify a note here, you will see the it in the extended information CGI (when you are viewing information about the specified host).
notes_url: This variable is used to define an optional URL that can be used to provide more information about the host group. If you specify an URL, you will see a red folder icon in the CGIs (when you are viewing hostgroup information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will the the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). This can be very useful if you want to make detailed information on the host group, emergency contact methods, etc. available to other support staff.
action_url: This directive is used to define an optional URL that can be used to provide more actions to be performed on the host group. If you specify an URL, you will see a red "splat" icon in the CGIs (when you are viewing hostgroup information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will the the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/).
描述:
服务定义为在主机上运行的某种“应用服务”。这种服务定义得非常宽泛,可以是在主机上实际的服务进程(POP3、SMTP、HTTP等)或是与主机有关的某种计量值(PING响应值、在线用户数、磁盘空闲空间等),其中的差异见下面的说明。
定义格式:
标记了(*)的域是必备的而黑色是可选的。
define service{ host_name host_name(*) hostgroup_name hostgroup_name service_description service_description(*) display_name display_name servicegroups servicegroup_names is_volatile [0/1] check_command command_name(*) initial_state [o,w,u,c] max_check_attempts #(*) check_interval #(*) retry_interval #(*) active_checks_enabled [0/1] passive_checks_enabled [0/1] check_period timeperiod_name(*) obsess_over_service [0/1] check_freshness [0/1] freshness_threshold # event_handler command_name event_handler_enabled [0/1] low_flap_threshold # high_flap_threshold # flap_detection_enabled [0/1] flap_detection_options [o,w,c,u] process_perf_data [0/1] retain_status_information [0/1] retain_nonstatus_information [0/1] notification_interval #(*) first_notification_delay # notification_period timeperiod_name(*) notification_options [w,u,c,r,f,s] notifications_enabled [0/1] contacts contacts(*) contact_groups contact_groups(*) stalking_options [o,w,u,c] notes note_string notes_url url action_url url icon_image image_file icon_image_alt alt_string ... }
定义样例:
define service{ host_name linux-server service_description check-disk-sda1 check_command check-disk!/dev/sda1 max_check_attempts 5 check_interval 5 retry_interval 3 check_period 24x7 notification_interval 30 notification_period 24x7 notification_options w,c,r contact_groups linux-admins }
域描述:
host_name: This directive is used to specify the short name(s) of the host(s) that the service "runs" on or is associated with. Multiple hosts should be separated by commas.
hostgroup_name: This directive is used to specify the short name(s) of the hostgroup(s) that the servi