Setting Up a Nagios Monitoring Environment
1. Contents
2. Environment
jk1 192.168.199.110 CentOS 6.5 x86_64
jk2 192.168.199.184 CentOS 6.5 x86_64
nagios 4.0.8
LNAMP environment
3. Deployment Plan
The Nagios master node needs the following installed:
nagios
nagios-plugin
nrpe
php
apache
Nagios slave (monitored) nodes need the following installed:
nagios-plugin
nrpe
Installation paths
Item                   Value
nagios install path    /usr/local/nagios
php install path       /usr/local/php
apache install path    /usr/local/apache2
4. Obtaining the Source Packages
nagios-4.0.2.tar.gz
nagios-plugins-1.5.tar.gz
nrpe-2.15.tar.gz
httpd-2.2.23.tar.gz
php-5.4.10.tar.gz
5. Prerequisites
5.1 Host environment check (all nodes)
# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
package gd-devel is not installed
package xinetd is not installed
openssl-devel-1.0.0-27.el6.x86_64
If any package is missing, install it first. The packages can be downloaded from these mirror sites:
http://rpm.pbone.net/
http://mirrors.163.com/centos/6.4/os/x86_64/Packages/
http://mirrors.sohu.com/centos/6.4/os/x86_64/Packages/
After installing, run the check again:
# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
gd-devel-2.0.35-11.el6.x86_64
xinetd-2.3.14-38.el6.x86_64
openssl-devel-1.0.0-27.el6.x86_64
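The check above can be wrapped in a small helper so that only the missing packages are printed. This is a rough sketch: missing_packages is a hypothetical helper name, and the query command is passed in as an argument so the function itself does not depend on rpm.

```shell
#!/bin/sh
# missing_packages is a hypothetical helper, not part of Nagios or rpm.
# Usage: missing_packages "<query command>" pkg...
# Prints every package for which the query command exits non-zero.
missing_packages() {
    query=$1
    shift
    for pkg in "$@"; do
        # $query is intentionally unquoted so "rpm -q" splits into two words
        $query "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Only run the real check where rpm exists (for example on CentOS).
if command -v rpm >/dev/null 2>&1; then
    missing_packages "rpm -q" gcc glibc glibc-common gd gd-devel xinetd openssl-devel
fi
```

An empty result means every listed package is installed.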
6. Build and Install
6.1 Create the nagios user (all nodes)
useradd nagios -d /usr/local/nagios
passwd nagios (choose your own password)
chmod 755 /usr/local/nagios
6.2 Install the Nagios core (master node only)
tar -zxf nagios-4.0.2.tar.gz
cd nagios-4.0.2
./configure --prefix=/usr/local/nagios
make all
make install && make install-init && make install-commandmode && make install-config
Register nagios as a service:
chkconfig --add nagios
chkconfig nagios off
chkconfig --level 35 nagios on
chkconfig --list nagios
nagios 0:off 1:off 2:off 3:on 4:off 5:on 6:off
6.3 Install the Nagios plugins (all nodes)
tar -zxf nagios-plugins-1.5.tar.gz
cd nagios-plugins-1.5
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-mysql=/usr/local/mysql
make && make install
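If MySQL-related compile errors appear, MySQL was most likely installed outside its default path; point --with-mysql at the actual location and run make again.
./configure --prefix=/usr/local/nagios --with-mysql=/usr/local/mysql (redundant if the path was already given above)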
make && make install
6.4 Install NRPE (all nodes)
tar -zxf nrpe-2.15.tar.gz
cd nrpe-2.15
./configure --enable-command-args
make all
make install-plugin
The following step is needed only on the monitored nodes:
make install-daemon && make install-daemon-config && make install-xinetd
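6.4.1 Monitored node configuration
On a monitored node, NRPE has to run as a daemon (here it is started through xinetd).
1. Edit /etc/xinetd.d/nrpe to allow connections from the Nagios master server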
vi /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.56.10
2. Append the following to the end of /etc/services:
nrpe 5666/tcp # NRPE
3. Enable support for command arguments
vi /usr/local/nagios/etc/nrpe.cfg
dont_blame_nrpe=1
4. Start xinetd
service xinetd restart
5. Verify that nrpe is listening
netstat -at | grep nrpe
6. Test that nrpe is working
/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.15
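If this command reports an error, that is harmless: replace localhost with 127.0.0.1 (only_from above allows 127.0.0.1, and localhost may resolve to a different address).
6.4.2 Master node configuration
On the monitoring master node, once NRPE is configured on every monitored node, check each of them in turn: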
/usr/local/nagios/libexec/check_nrpe -H 192.168.56.11
NRPE v2.15
/usr/local/nagios/libexec/check_nrpe -H 192.168.56.12
NRPE v2.15
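With more than a couple of monitored nodes, the per-host test can be looped. The sketch below assumes check_nrpe lives at the path used in this guide; nrpe_status is a hypothetical helper name, and the CHECK_NRPE variable exists only so the path can be overridden.

```shell
#!/bin/sh
# Path taken from the install steps above; override CHECK_NRPE to test elsewhere.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# nrpe_status is a hypothetical helper: prints one "host OK|FAIL output" line per host.
nrpe_status() {
    for host in "$@"; do
        if out=$("$CHECK_NRPE" -H "$host" 2>&1); then
            printf '%s OK %s\n' "$host" "$out"
        else
            printf '%s FAIL %s\n' "$host" "$out"
        fi
    done
}

# Example: nrpe_status 192.168.56.11 192.168.56.12
```

A FAIL line usually points at only_from in /etc/xinetd.d/nrpe, a firewall rule, or xinetd not running on that node.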
6.5 Install Apache (master node only)
tar -zxf httpd-2.2.23.tar.gz
cd httpd-2.2.23
./configure --prefix=/usr/local/apache2
make && make install
6.6 Install PHP (master node only)
cd /export/home/tools/soft/php
tar -zxf php-5.4.10.tar.gz
cd php-5.4.10
./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
make && make install
6.7 Serve the PHP web interface through Apache
vi /usr/local/apache2/conf/httpd.conf
....
Listen 80
.... add the following inside this module block
<IfModule dir_module>
DirectoryIndex index.html index.php
AddType application/x-httpd-php .php
</IfModule>
.... the following block does not exist by default; add it as new
#setting for nagios
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Create a user name and password for web access (the user name here is admin; choose your own if you prefer)
/usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd admin
Start Apache
/usr/local/apache2/bin/apachectl start
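Open the page: http://192.168.56.10/nagios/
The welcome page should appear.
7. Configuring Nagios
7.1 Configure the remote monitored nodes
7.1.1 Edit the configuration file
# su - nagios
$ vi /usr/local/nagios/etc/nrpe.cfg
Change the command definitions to the following: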
command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
command[check_procs_args]=/usr/local/nagios/libexec/check_procs $ARG1$
command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$
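What these monitoring commands do:
check_users monitors the number of logged-in users
check_load monitors the CPU load (load average)
check_disk monitors disk usage
check_procs monitors the number of processes; the state filter accepts the flags RSZDT
check_swap monitors swap usage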
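To make the $ARGn$ placeholders in those definitions concrete, here is a small illustration of how the values passed with -a fill them in. It is only a sketch of the substitution idea, not NRPE's own code, and expand_nrpe_command is a hypothetical name.

```shell
#!/bin/sh
# expand_nrpe_command is a hypothetical illustration of macro substitution;
# it is not NRPE's own code.
# Usage: expand_nrpe_command '<command template>' arg1 arg2 ...
expand_nrpe_command() {
    cmd=$1
    shift
    i=1
    for arg in "$@"; do
        macro="\$ARG${i}\$"
        case $cmd in
            *"$macro"*)
                before=${cmd%%"$macro"*}   # text before the first occurrence
                after=${cmd#*"$macro"}     # text after the first occurrence
                cmd=$before$arg$after
                ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$cmd"
}

expand_nrpe_command '/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' '20%' '10%' /
```

The call at the end prints the check_disk command line with 20%, 10% and / in place of the three placeholders, which is the command the monitored node ends up running.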
7.1.2 Restart the xinetd service
After configuring the commands above, restart the xinetd service
As root:
service xinetd restart
7.1.3 Verify the configuration
Check that each monitoring command works:
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load -a 15,10,5 30,25,20
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk -a 20% 10% /
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap -a 20% 10%
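Each of these checks reports its result through the exit code as well as the printed text: Nagios plugins exit with 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A minimal sketch for reading that code follows; plugin_state and run_check are hypothetical helper names.

```shell
#!/bin/sh
# Map a plugin exit code to its Nagios state name.
# Anything other than 0, 1 or 2 is reported as UNKNOWN here.
plugin_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_check is a hypothetical wrapper: runs any check command, discards its
# output, and prints only the resulting state name.
run_check() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    plugin_state "$status"
}

# Example:
# run_check /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
```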
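7.2 Configure the monitoring master node
7.2.1 cgi.cfg (controls CGI access)
(as the nagios user)
vi /usr/local/nagios/etc/cgi.cfg
Change the following to grant permissions to the admin user:
default_user_name=admin (uncomment this line)
authorized_for_system_information=nagiosadmin,admin (add admin)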
authorized_for_configuration_information=nagiosadmin,admin
authorized_for_system_commands=nagiosadmin,admin
authorized_for_all_services=nagiosadmin,admin
authorized_for_all_hosts=nagiosadmin,admin
authorized_for_all_service_commands=nagiosadmin,admin
authorized_for_all_host_commands=nagiosadmin,admin
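7.2.2 nagios.cfg (main Nagios configuration file)
(as the nagios user)
vi /usr/local/nagios/etc/nagios.cfg
#cfg_file=/export/home/nagios/etc/objects/localhost.cfg (comment out)
cfg_dir=/export/home/nagios/etc/servers
If your system has no /export/home directory, use /usr/local in its place (this guide installs Nagios under /usr/local/nagios).
The main configuration file now names the servers directory as the place for monitoring definitions. The directory does not exist by default and has to be created by hand.
Nagios reads every file ending in .cfg under the servers directory as a configuration file.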
cd /usr/local/nagios/etc
mkdir servers
cd servers
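To see which files a cfg_dir entry would pick up, the directory can simply be listed. The sketch below uses find; list_cfg_files is a hypothetical helper name. It searches subdirectories too, since Nagios also processes .cfg files in subdirectories of a cfg_dir.

```shell
#!/bin/sh
# list_cfg_files is a hypothetical helper: prints every *.cfg file under the
# given directory (including subdirectories), sorted by path.
list_cfg_files() {
    find "$1" -type f -name '*.cfg' | sort
}

# Example: list_cfg_files /usr/local/nagios/etc/servers
```

Files with any other extension, such as a backup named duangr-2.cfg.bak, are not listed and are not read by Nagios either.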
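7.2.3 Define the monitored host group
Declare a host group and add all three hosts mentioned in the host environment to it
vi /export/home/nagios/etc/servers/group.cfg
This is a new file with the following content: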
define hostgroup{
hostgroup_name duangr-server
alias duangr Server
members duangr-1,duangr-2,duangr-3
}
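A few notes on the configuration above:
hostgroup_name: name of the host group; any name will do
alias: alias of the host group; any text will do
members: the hosts in the group, separated by commas. Each name must match the host_name of a define host block.
The host definitions are covered further below.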
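When the member list grows, the hostgroup block can be generated rather than typed. The following is a sketch only; make_hostgroup is a hypothetical generator, not a Nagios tool, and the output layout simply mirrors the block shown above.

```shell
#!/bin/sh
# make_hostgroup is a hypothetical generator, not a Nagios tool.
# Usage: make_hostgroup <hostgroup_name> <alias> host...
make_hostgroup() {
    name=$1
    alias_text=$2
    shift 2
    # Join the remaining arguments with commas, as the members field expects.
    members=$(IFS=,; printf '%s' "$*")
    printf 'define hostgroup{\n'
    printf '    hostgroup_name  %s\n' "$name"
    printf '    alias           %s\n' "$alias_text"
    printf '    members         %s\n' "$members"
    printf '}\n'
}

make_hostgroup duangr-server "duangr Server" duangr-1 duangr-2 duangr-3
```

Redirect the output into group.cfg in the servers directory to use it.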
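7.2.4 Define the monitored hosts
Now define the individual hosts
7.2.4.1 Local host monitoring
Start with the local host, duangr-1
vi /export/home/nagios/etc/servers/duangr-1.cfg
This is a new file with the following content: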
define host{
use linux-server
host_name duangr-1
alias duangr-1
address 192.168.56.10
}
define service{
use local-service
host_name duangr-1
service_description Host Alive
check_command check-host-alive
}
define service{
use local-service
host_name duangr-1
service_description Users
check_command check_local_users!20!50
}
define service{
use local-service
host_name duangr-1
service_description CPU
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use local-service
host_name duangr-1
service_description Disk Root
check_command check_local_disk!20%!10%!/
}
define service{
use local-service
host_name duangr-1
service_description Disk Home
check_command check_local_disk!20%!10%!/export/home
}
define service{
use local-service
host_name duangr-1
service_description Zombie Procs
check_command check_local_procs!5!10!Z
}
define service{
use local-service
host_name duangr-1
service_description Total Procs
check_command check_local_procs!250!400!RSZDT
}
define service{
use local-service
host_name duangr-1
service_description Swap Usage
check_command check_local_swap!20!10
}
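Because this host is also the one running the monitoring master, the check_local_* commands can be used to monitor it.
The file already covers the commonly used checks.
All of this is configured on the master host.
7.2.4.2 Remote host monitoring
Next, define the remote hosts duangr-2 and duangr-3
Before defining checks for remote hosts, the check_nrpe command has to be defined
vi /usr/local/nagios/etc/objects/commands.cfg
Append the following to the end of the file: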
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
define command{
command_name check_nrpe_args
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
}
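In a service definition, the check_command value is split on "!": the first piece names the command and the remaining pieces become $ARG1$, $ARG2$ and so on. The sketch below only illustrates that split; split_check_command is a hypothetical helper, not Nagios code.

```shell
#!/bin/sh
# split_check_command is a hypothetical illustration, not Nagios code.
# It splits a check_command value on "!" and labels the pieces.
split_check_command() {
    value=$1
    old_ifs=$IFS
    IFS='!'
    set -f              # no filename expansion while splitting
    set -- $value       # word-split on "!" only
    set +f
    IFS=$old_ifs
    printf 'command_name=%s\n' "$1"
    shift
    i=1
    for piece in "$@"; do
        printf 'ARG%d=%s\n' "$i" "$piece"
        i=$((i + 1))
    done
}

split_check_command 'check_nrpe_args!check_users!5 10'
```

For check_nrpe_args this means $ARG1$ is the NRPE command name (check_users) and $ARG2$ carries the whole string "5 10", which ends up after -a on the check_nrpe command line.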
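Define the monitoring configuration for host duangr-2
$ vi /usr/local/nagios/etc/servers/duangr-2.cfg
Note: the define host block is easy to miss when copying.
This is a new file with the following content: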
define host{
use linux-server
host_name duangr-2
alias duangr-2
address 192.168.56.11
}
define service{
use local-service
host_name duangr-2
service_description Host Alive
check_command check-host-alive
}
define service{
use local-service
host_name duangr-2
service_description Users
check_command check_nrpe_args!check_users!5 10
}
define service{
use local-service
host_name duangr-2
service_description CPU
check_command check_nrpe_args!check_load!15,10,5 30,25,20
}
define service{
use local-service
host_name duangr-2
service_description Disk Root
check_command check_nrpe_args!check_disk!20% 10% /
}
define service{
use local-service
host_name duangr-2
service_description Disk /export/home
check_command check_nrpe_args!check_disk!20% 10% /export/home
}
define service{
use local-service
host_name duangr-2
service_description Procs Zombie
check_command check_nrpe_args!check_procs!5 10 Z
}
define service{
use local-service
host_name duangr-2
service_description Procs Total
check_command check_nrpe_args!check_procs_args!"-w400 -c600"
}
define service{
use local-service
host_name duangr-2
service_description Swap Usage
check_command check_nrpe_args!check_swap!20% 10%
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; The checks below watch common processes, mainly those of the cloud platform
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Monitor the crond process
define service{
use local-service
host_name duangr-2
service_description PS: crond
check_command check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"
}
;; Monitor the zookeeper process
define service{
use local-service
host_name duangr-2
service_description PS: QuorumPeerMain
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.quorum.QuorumPeerMain"
}
;; Monitor the storm slave node (supervisor) process
define service{
use local-service
host_name duangr-2
service_description PS: supervisor
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.supervisor"
}
;; Monitor the storm master node (nimbus) process
define service{
use local-service
host_name duangr-2
service_description PS: nimbus
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.nimbus"
}
;; Monitor the MetaQ process
define service{
use local-service
host_name duangr-2
service_description PS: MetaQ
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -ametamorphosis-server-w"
}
;; Monitor the Redis process
define service{
use local-service
host_name duangr-2
service_description PS: redis-server
check_command check_nrpe_args!check_procs_args!"-c1:1 -Credis-server"
}
;; Monitor the hadoop master node NameNode process
define service{
use local-service
host_name duangr-2
service_description PS: NameNode
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.NameNode"
}
;; Monitor the hadoop master node SecondaryNameNode process
define service{
use local-service
host_name duangr-2
service_description PS: SecondaryNameNode
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.SecondaryNameNode"
}
;; Monitor the hadoop master node ResourceManager process
define service{
use local-service
host_name duangr-2
service_description PS: ResourceManager
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.resourcemanager.ResourceManager"
}
;; Monitor the hadoop slave node DataNode process
define service{
use local-service
host_name duangr-2
service_description PS: DataNode
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.datanode.DataNode"
}
;; Monitor the hadoop slave node NodeManager process
define service{
use local-service
host_name duangr-2
service_description PS: NodeManager
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.nodemanager.NodeManager"
}
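Because duangr-2 is a remote host, it is monitored through the check_nrpe_args command.
The file already covers the commonly used checks and also monitors the hadoop, storm, zookeeper, metaq and redis processes; the approach is simply to check whether each process exists.
Define the monitoring configuration for host duangr-3
vi duangr-3.cfg
The content is the same as duangr-2.cfg; only host_name, alias and address need to change.
(Configured on the master host.)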
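Since duangr-3.cfg differs from duangr-2.cfg only in the host name, alias and address, the copy can be produced with sed. This is a sketch under the assumption that the alias matches the host name, as it does in this guide; clone_host_cfg is a hypothetical helper name.

```shell
#!/bin/sh
# clone_host_cfg is a hypothetical helper. It reads a host config on stdin and
# writes a copy with the host name and address changed.
# Usage: clone_host_cfg <old_name> <new_name> <new_address> < old.cfg > new.cfg
# Assumes the names contain no characters that are special to sed (such as / or .).
clone_host_cfg() {
    sed -e "s/$1/$2/g" \
        -e "s/^\([[:space:]]*address[[:space:]]\{1,\}\).*/\1$3/"
}

# Example:
# clone_host_cfg duangr-2 duangr-3 192.168.56.12 < duangr-2.cfg > duangr-3.cfg
```

Review the result before using it: the first substitution replaces the old name wherever it appears, so process checks that do not apply to the new host still need to be removed by hand.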
7.2.4.3 Email notifications
Set the email address of the contact who receives alerts
vi /usr/local/nagios/etc/objects/contacts.cfg
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
email 125177796@qq.com
; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
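Besides setting the notification recipient, also make sure that:
this host can reach the mail server
Sendmail on this host can send mail through an external SMTP service
7.2.4.4 Validate the configuration
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Output on success: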
[root@jk1 ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 27 services.
Checked 2 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 26 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 2 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
7.2.4.5 Start Nagios
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Nagios has already been registered as a service, so the following also works:
service nagios start/stop/restart/status
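Validation and start-up are worth chaining so that a broken configuration never reaches a restart. A minimal sketch, assuming the binary and config paths used in this guide; validate_then is a hypothetical helper, and the two variables exist only so the paths can be overridden.

```shell
#!/bin/sh
# Paths taken from this guide; override the variables if yours differ.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# validate_then is a hypothetical helper: runs the given command only if the
# pre-flight check (nagios -v) succeeds.
validate_then() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        "$@"
    else
        echo "configuration check failed; not running: $*" >&2
        return 1
    fi
}

# Example: validate_then service nagios restart
```

When the check fails, run the nagios -v command by hand to see the full error report.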
8. Monitoring Page
Comments and corrections are welcome.
References:
http://www.centoscn.com/image-text/config/2013/1216/2236.html
http://www.cnblogs.com/mchina/archive/2013/02/20/2883404.html