Web Cluster High Availability with corosync + pacemaker

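I. High-Availability Clusters

1. What a high-availability cluster is

A high-availability (HA) cluster is a server clustering technique whose goal is to reduce service downtime, for example downtime caused by a server crash. Put simply, a cluster is a group of computers that together present a set of network resources to users as a single system. Each individual computer is a node of the cluster.

HA clusters exist to reduce the losses caused by fallible hardware and software. By keeping the user's business applications continuously available, they minimize the impact of software, hardware, and human failures on the business. If a node fails, its standby node takes over its duties within a few seconds, so from the user's point of view the service appears to stay up. The main job of HA cluster software is to automate failure detection and service failover.

2. Structure of a high-availability cluster

To build and configure an HA cluster you need to understand its structure, which has three layers from the bottom up.

(Figure: structure diagram of a high-availability cluster.)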

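1) Messaging Layer

The messaging layer carries heartbeat information. It is a process that runs on every host.

corosync, the subject of this post, runs at this layer.

2) CRM, Cluster Resource Manager

The cluster resource manager depends on the heartbeat messaging layer beneath it. This layer exists because software that is not ha_aware has no clustering ability of its own and has to rely on the CRM for high availability. An application that can use the messaging layer directly to make its own cluster decisions is called ha_aware.

This layer also contains a sub-layer, the LRM (Local Resource Manager), which actually carries out the CRM's decisions. Think of the CRM as the chairman of a company and the LRM as the general manager: the CRM sets the vision and strategy and hands it to the general manager (LRM) to implement, and the general manager in turn delegates the work to the staff below (the RAs). The figure above shows this relationship as well.

pacemaker belongs to this layer. Its configuration interface is crm (crmsh, from SUSE), so crmsh has to be installed as well.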

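3) RA, Resource Agent

A resource agent is a tool, usually a script, that accepts scheduling from the CRM and manages one particular resource on a node. There are several classes:

(1) heartbeat legacy

The traditional heartbeat type. heartbeat listens on UDP port 694.

(2) LSB, Linux Standard Base

The scripts under /etc/rc.d/init.d/* belong to the LSB class.

(3) OCF, Open Cluster Framework

The Open Cluster Framework. An organization that supplies resource agent scripts is called a provider, and pacemaker is one such provider.

(4) STONITH

"Shoot The Other Node In The Head". This RA class performs node fencing and exists specifically for configuring STONITH devices.

The main purpose of STONITH is to deal with network faults that leave the nodes unable to reach each other. Suppose five nodes split into a left group of three and a right group of two. The three on the left still receive each other's heartbeats, and so do the two on the right, but neither side receives heartbeats from the other. Each side concludes that the other has failed and elects its own DC (Designated Coordinator), so there are now two clusters competing for the same resources. If both sides write to the shared storage at the same time, the file system can easily be corrupted. This situation is called split-brain.

Quorum was introduced to prevent split-brain: a partition has quorum when it holds more than half of the total votes. When cluster communication fails, one side has to give up acting as the cluster so that resources are not fought over. Which side gives up is decided by the vote: only the partition with quorum may continue as the cluster, and the other side must leave. Leaving the cluster does not by itself stop the services on those nodes, so they must be made to release their resources and be powered off. This is where a STONITH device comes in: it makes sure the nodes that left the cluster are truly out of action. A network-controlled power switch works on this principle.

A two-node cluster is a special case. After a split, neither side holds more than half of the votes, so neither has quorum. Without an arbitration device, resources are not moved and the whole service fails.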
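
The quorum rule described above is just a "strictly more than half" comparison. The following is a minimal sketch of that arithmetic, not how pacemaker implements it; the function name has_quorum is made up for this example.

```shell
# has_quorum VOTES TOTAL
# Prints "yes" when VOTES is strictly more than half of TOTAL, otherwise "no".
# (Hypothetical helper for illustration; not part of corosync or pacemaker.)
has_quorum() {
    votes=$1
    total=$2
    if [ $((votes * 2)) -gt "$total" ]; then
        echo yes
    else
        echo no
    fi
}

has_quorum 3 5   # the three-node side of a 3/2 split
has_quorum 2 5   # the two-node side of the same split
has_quorum 1 2   # either side of a split two-node cluster
```

The last call shows why a two-node cluster is special: one vote out of two is not more than half, so neither side can claim quorum after a split.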

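All of this leads to two points that matter for the configuration below:

① pacemaker enables STONITH by default, but the cluster we are building has no STONITH device, so STONITH must be disabled in the cluster's global properties.

② When a cluster has no quorum, resources are not moved normally: if a node fails, its resources do not fail over to the healthy node, and every resource ends up unavailable. So the no-quorum policy should be set to ignore rather than stopping all resources.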

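II. Background

1. Topology diagram

2. Servers

This post is about making a web service highly available with corosync + pacemaker. To keep the configuration simple, the web server and PHP are installed with yum on one host, which acts as the primary node, and a second host acts as the standby.

To make the web service highly available, both web servers mount an NFS file system. NFS and the MySQL database are installed together on a single server.

When the primary node fails, the IP address must move automatically to the standby node. That requires a floating virtual IP; here the VIP is 172.16.7.188.

3. Platform

All systems run CentOS 6.5.

4. NFS

The directory /www is created on the NFS server, and the web servers' document root is mounted from it.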

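III. Prerequisites (identical on both nodes; only node1 is shown)

1. Name resolution between nodes

Host name to IP address resolution must work for all nodes, and each node's host name must match the output of "uname -n".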

 
         [root@node1 ~]# uname -n
         node1.shuishui.com
         [root@node1 ~]# vim /etc/hosts
         127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
         ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
         172.16.7.10 node1.shuishui.com node1
         172.16.7.20 node2.shuishui.com node2
         [root@node1 ~]# ping node2
         PING node2.shuishui.com (172.16.7.20) 56(84) bytes of data.
         64 bytes from node2.shuishui.com (172.16.7.20): icmp_seq=1 ttl=64 time=1.41 ms
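
The requirement that the "uname -n" name resolves can be checked with a few lines of shell. This is a minimal sketch that only looks at a hosts file (it does not query DNS); the function name name_in_hosts is made up for this example.

```shell
# name_in_hosts NAME FILE
# Succeeds if NAME appears as a host name or alias (any field after the
# address) on a non-comment line of FILE.
# (Hypothetical helper for illustration only.)
name_in_hosts() {
    awk -v n="$1" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$2"
}

# Typical use on a node, assuming the standard /etc/hosts location:
if name_in_hosts "$(uname -n)" /etc/hosts; then
    echo "$(uname -n) is listed in /etc/hosts"
else
    echo "$(uname -n) is missing from /etc/hosts"
fi
```

Running it on both nodes before starting corosync is a cheap way to catch a host name that does not match the hosts entries.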

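2. Mutual SSH trust

Set up key-based SSH authentication between the two nodes. The key passphrase must be empty so that commands run from one node against the other do not stop at a prompt.

         [root@node1 ~]# ssh-keygen -t rsa -P ''                                  # generate a key pair with an empty passphrase
         [root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2.shuishui.com   # copy the public key to node2

Verify that node1 can SSH to node2 without a password:

         [root@node1 ~]# ssh 172.16.7.20
         Last login: Sun Apr 20 02:37:55 2014 from 172.16.250.87
         [root@node2 ~]#

3. Time synchronization

         [root@node1 ~]# ntpdate 172.16.0.1

4. Make sure both nodes serve their pages from the NFS mount

         [root@node1 ~]# curl node1.shuishui.com
         web in nfs
         [root@node1 ~]# curl node2.shuishui.com
         web in nfs

5. Stop the service manually and disable it at boot

Once a service has been added to the cluster it is managed by the CRM and needs no manual intervention.

         [root@node1 ~]# chkconfig httpd off
         [root@node1 ~]# service httpd stop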

IV. Web high availability with corosync + pacemaker

1. Install corosync and pacemaker

Because of package dependencies it is best to install with yum; this assumes a working yum repository.

         [root@node1 ~]# yum -y install corosync
         [root@node1 ~]# yum -y install pacemaker

2. Edit the corosync configuration file

1) The corosync package ships a configuration template; renaming it is enough to start using it.

         [root@node1 ~]# cd /etc/corosync/
         [root@node1 corosync]# ls
         corosync.conf.example  corosync.conf.example.udpu  service.d  uidgid.d
         [root@node1 corosync]# mv corosync.conf.example corosync.conf

2) Edit the configuration file and add the service and aisexec sections

         totem {
             version: 2
             secauth: off
             threads: 0
             interface {
                 ringnumber: 0
                 # network address to bind to
                 bindnetaddr: 172.16.0.0
                 # multicast address used for heartbeat messages
                 mcastaddr: 230.100.100.7
                 mcastport: 5405
                 ttl: 1
             }
         }
         logging {
             fileline: off
             to_stderr: no
             to_logfile: yes
             to_syslog: yes
             # corosync log file
             logfile: /var/log/cluster/corosync.log
             debug: off
             timestamp: on
             logger_subsys {
                 subsys: AMF
                 debug: off
             }
         }
         amf {
             mode: disabled
         }
         service {
             ver: 0
             # have corosync start pacemaker automatically when it starts
             name: pacemaker
         }
         # enable the ais function of corosync and set the user it runs as
         aisexec {
             user: root
             group: root
         }
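
The bindnetaddr value is the network address of the interface corosync should use, not the node's own IP. As a rough sketch, it can be derived from an address and prefix length such as the 172.16.7.10/16 that node1 uses in this post; the function name network_addr is made up for this example.

```shell
# network_addr IP PREFIX
# Prints the IPv4 network address of IP/PREFIX, e.g. for use as bindnetaddr.
# (Hypothetical helper for illustration only.)
network_addr() {
    ip=$1
    prefix=$2
    # Split the dotted quad into four octets.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

network_addr 172.16.7.10 16   # prints 172.16.0.0
```

With a /24 network the same node address would give 172.16.7.0 instead, which is the value bindnetaddr would need in that case.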

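3. Generate the authentication key

corosync nodes authenticate their communication with a shared key. corosync-keygen generates it and writes it to /etc/corosync/authkey (the current directory here) with mode 400. Note that the configuration above has secauth: off; the key only takes effect when secauth is on.

         [root@node1 corosync]# corosync-keygen
         Corosync Cluster Engine Authentication key generator.
         Gathering 1024 bits for key from /dev/random.
         Press keys on your keyboard to generate entropy.
         Press keys on your keyboard to generate entropy (bits = 272).

The key is 1024 bits long. While generating it, corosync-keygen reads random data from /dev/random. If the entropy pool runs low, it asks you to type on the keyboard to make up the shortfall until the key is complete. If security requirements are modest, pseudo-random data can be used instead, but pseudo-random numbers follow a pattern that could be discovered and used to break the key, so use that option with care.

After a painful bout of typing, the key is finally generated. Next, copy the key and the corosync.conf just configured to node2.

         [root@node1 corosync]# ls
         authkey  corosync.conf  corosync.conf.example.udpu  service.d  uidgid.d
         [root@node1 corosync]# scp -p corosync.conf authkey  node2:/etc/corosync/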
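
Since the key is expected to have mode 400 on both nodes, it is worth confirming that the copy kept it. A minimal sketch, assuming GNU stat (as shipped with CentOS); the function name check_authkey is made up for this example.

```shell
# check_authkey FILE
# Succeeds only if FILE exists and its permission bits are exactly 400.
# (Hypothetical helper for illustration; relies on GNU stat's -c option.)
check_authkey() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    mode=$(stat -c '%a' "$file")
    if [ "$mode" = "400" ]; then
        echo "ok: $file has mode $mode"
    else
        echo "unexpected mode $mode on $file"
        return 1
    fi
}

check_authkey /etc/corosync/authkey || echo "fix with: chmod 400 /etc/corosync/authkey"
```

The -p flag on scp in the step above preserves the mode, so on node2 this check should pass without any further chmod.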

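4. Install crmsh, the configuration interface for pacemaker

Since RHEL 6.4 the crmsh command-line tool is no longer shipped and pcs is provided instead. To keep using the crm command you have to download the packages and install them yourself. crmsh depends on pssh, so download that too. Other dependencies are pulled in during installation, so install with yum.

         [root@node1 ~]# yum -y install pssh-2.3.1-2.el6.x86_64.rpm crmsh-1.2.6-4.el6.x86_64.rpm

5. Start corosync and check its state

         [root@node1 ~]# service corosync start
         Starting Corosync Cluster Engine (corosync):               [  OK  ]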

(1) Check that the corosync engine started correctly:

         [root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
         Apr 20 12:07:38 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
         Apr 20 12:07:38 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

(2) Check that the initial membership notifications were sent:

         [root@node1 ~]# grep  TOTEM  /var/log/cluster/corosync.log
         Apr 20 12:07:38 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
         Apr 20 12:07:38 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
         Apr 20 12:07:38 corosync [TOTEM ] The network interface [172.16.7.10] is now up.
         Apr 20 12:07:39 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
         Apr 20 12:12:42 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

If only one "A processor joined or left the membership" line appears at the end, the second node has not joined yet; start corosync on node2 as well.
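
Counting those membership lines by eye gets tedious, so the check can be scripted. This is only a rough indicator under the assumption made above (one line per membership change); the function name membership_changes is made up for this example.

```shell
# membership_changes LOGFILE
# Prints how many membership-change lines TOTEM has written to LOGFILE.
# (Hypothetical helper for illustration only.)
membership_changes() {
    grep -c 'A processor joined or left the membership' "$1"
}

log=/var/log/cluster/corosync.log   # log path configured earlier in this post
if [ -r "$log" ]; then
    echo "membership changes so far: $(membership_changes "$log")"
fi
```

A count of 1 right after startup suggests node1 is still alone; the crm_mon output in step (5) is the authoritative view of which nodes are online.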

(3) Check for errors during startup (the ones shown here can be ignored).

         [root@node1 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
         Apr 20 12:07:38 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
         Apr 20 12:07:38 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

(4) Check that pacemaker started correctly

         [root@node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log
         Apr 20 12:07:39 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
         Apr 20 12:07:39 corosync [pcmk  ] Logging: Initialized pcmk_startup
         Apr 20 12:07:39 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
         Apr 20 12:07:39 corosync [pcmk  ] info: pcmk_startup: Service: 9
         Apr 20 12:07:39 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1.shuishui.com

(5) Check the cluster status

         [root@node1 ~]# crm_mon
         Last updated: Sun Apr 20 12:20:24 2014
         Last change: Sun Apr 20 12:12:42 2014 via crmd on node1.shuishui.com
         Stack: classic openais (with plugin)
         Current DC: node1.shuishui.com - partition with quorum
         Version: 1.1.10-14.el6-368c726
         2 Nodes configured, 2 expected votes
         0 Resources configured
         Online: [ node1.shuishui.com node2.shuishui.com ]

The status shows that node1 and node2 are both online and that node1 is the DC and has quorum, but it also shows "0 Resources configured": the cluster has no resources yet. The next task is to configure resources with pacemaker.

6. A quick tour of crmsh (when in doubt, use help)

         [root@node1 ~]# crm              # enter the crm shell
         crm(live)# help                  # get help
         This is crm shell, a Pacemaker command line interface.
         Available commands:
         cib              manage shadow CIBs
         resource         resources management
         configure        CRM cluster configuration
         node             nodes management
         options          user preferences
         history          CRM cluster history
         site             Geo-cluster support
         ra               resource agents information center
         status           show cluster status
         help,?           show help (help topics for list of topics)
         end,cd,up        go back one level
         quit,bye,exit    exit the program
         crm(live)# configure              # enter the configure level
         crm(live)configure#               # press Tab twice to list all commands
         ?                  erase              ms                 rsc_template
         bye                exit               node               rsc_ticket
         cd                 fencing_topology   op_defaults        rsctest
         cib                filter             order              save
         cibstatus          graph              primitive          schema
         clone              group              property           show
         collocation        help               ptest              simulate
         colocation         history            quit               template
         commit             load               ra                 up
         default-timeouts   location           refresh            upgrade
         delete             master             rename             user
         edit               modgroup           role               verify
         end                monitor            rsc_defaults       xml
         crm(live)configure# help           # get help at the configure level
         Commands for resources are:
         - `primitive`
         - `monitor`
         - `group`
         - `clone`
         - `ms`/`master` (master-slave)
         In order to streamline large configurations, it is possible to
         define a template which can later be referenced in primitives:
         - `rsc_template`
         In that case the primitive inherits all attributes defined in the
         template.
         There are three types of constraints:
         crm(live)configure# help primitive            # show the usage of the primitive command
         Usage:
         ...............
         primitive <rsc> {[<class>:[<provider>:]]<type>|@<template>}
         [params attr_list]
         [meta attr_list]
         The primitive command describes a resource. It may be referenced
         only once in group, clone, or master-slave objects. If it's not
         referenced, then it is placed as a single resource in the CIB.

7. Set global properties: the no-quorum behavior and disabling STONITH

The reasons were explained at the end of Part I, so they are not repeated here.

(1) Disable STONITH

         [root@node1 ~]# crm
         crm(live)# configure
         crm(live)configure# property stonith-enabled=false     # disable STONITH
         crm(live)configure# verify            # validate
         crm(live)configure# commit             # commit only if validation reports no errors

(2) Ignore loss of quorum

         crm(live)configure# property no-quorum-policy=ignore    # set the global property
         crm(live)configure# verify        # validate before every commit
         crm(live)configure# commit         # commit
         crm(live)configure# show
         node node1.shuishui.com
         node node2.shuishui.com
         property $id="cib-bootstrap-options" \
             dc-version="1.1.10-14.el6-368c726" \
             cluster-infrastructure="classic openais (with plugin)" \
             expected-quorum-votes="2" \
             stonith-enabled="false" \           # now in effect
             no-quorum-policy="ignore"

8. Resource agent classes

As mentioned above, resources are defined through crm, but the real work is done by the RAs. So before adding resources you need to know which RA classes the cluster supports. pacemaker supports heartbeat, LSB, and OCF resource agents, among others. LSB and OCF are the most commonly used today, and the stonith class exists specifically for configuring STONITH devices.

(1) List the RA classes supported by the current cluster

         crm(live)# ra
         crm(live)ra# classes
         lsb
         ocf / heartbeat pacemaker
         service
         stonith

(2) List the resource agents in a given class

         crm(live)ra# list lsb
         auditd            blk-availability  corosync          corosync-notifyd  crond             haldaemon         halt              htcacheclean
         httpd             ip6tables         iptables          killall           libvirt-guests    lvm2-lvmetad      lvm2-monitor      messagebus
         netconsole        netfs             network           nfs               nfslock           ntpdate           pacemaker         postfix
         quota_nld         rdisc             restorecond       rpcbind           rpcgssd           rpcidmapd         rpcsvcgssd        rsyslog
         sandbox           saslauthd         single            sshd              svnserve          udev-post         winbind
         crm(live)ra# list ocf heartbeat
         CTDB            Dummy           Filesystem      IPaddr          IPaddr2         IPsrcaddr       LVM             MailTo          Route
         SendArp         Squid           VirtualDomain   Xinetd          apache          conntrackd      dhcpd           ethmonitor      exportfs
         mysql           mysql-proxy     named           nfsserver       nginx           pgsql           postfix         rsyncd          rsyslog
         slapd           symlink         tomcat
         crm(live)ra# list ocf pacemaker
         ClusterMon    Dummy         HealthCPU     HealthSMART   Stateful      SysInfo       SystemHealth  controld      ping          pingd
         remote
         crm(live)ra# list stonith
         fence_legacy  fence_pcmk

(3) Show information about an agent

As noted earlier, pacemaker is one OCF provider and heartbeat is another. To look at an OCF resource agent you therefore have to name its provider.

         Syntax: crm ra info [class:[provider:]]resource_agent
         crm(live)ra# info ocf:heartbeat:IPaddr
         Parameters (* denotes required, [] the default):
         ip* (string): IPv4 or IPv6 address
         The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation)
         example IPv4 "192.168.1.1".
         example IPv6 "2001:db8:DC28:0:0:FC57:D4C8:1FFF".
         nic (string): Network interface
         The base network interface on which the IP address will be brought

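9. Add resources to the cluster

A web cluster needs three resources: webip, webstore, and webserver.

(1) Create the IP address resource

webip is the resource that has to move: when one node fails, webip moves to the standby server, which is what makes the web service highly available.

         [root@node1 ~]# crm
         crm(live)# configure
         crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.7.188 op monitor interval=30s timeout=20s on-fail=restart
         crm(live)configure# verify     # validate
         crm(live)configure# commit     # commit
         crm(live)configure# cd ..      # go up one level
         crm(live)# status              # show the cluster status
         Last updated: Sun Apr 20 14:52:53 2014
         Last change: Sun Apr 20 14:52:28 2014 via cibadmin on node1.shuishui.com
         Stack: classic openais (with plugin)
         Current DC: node1.shuishui.com - partition with quorum
         Version: 1.1.10-14.el6-368c726
         2 Nodes configured, 2 expected votes
         1 Resources configured
         Online: [ node1.shuishui.com node2.shuishui.com ]
         webip  (ocf::heartbeat:IPaddr):    Started node1.shuishui.com    # webip is running on node1

The definition above uses a monitor operation, which has not been explained yet:

A monitor operation watches a resource. By default pacemaker does not monitor any resource. Why monitor at all? If a node fails, corosync stops receiving its heartbeat, declares the node failed, and the resources move to the standby node. But what if only a service dies? Suppose httpd stops while the node stays up. Without monitoring, the resources do not move, because nothing is wrong at the node level and the cluster never notices that the service has stopped. The web site would simply stop responding. So a resource should be monitored when it is defined.

To monitor a resource, specify an operation of type monitor when defining it. If the service stops abnormally, the cluster first tries to restart it, and if the restart fails the resource is moved to another node.

In the syntax above, interval is how often the monitor runs and timeout is how long to wait for it to finish.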
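
The "restart first, then move" behaviour described above can be pictured with a toy function. This is purely an illustration of the decision, not pacemaker's code: pacemaker makes this decision itself from the results of the resource agent's monitor operation. The function name handle_monitor and its two arguments are made up for this example.

```shell
# handle_monitor CHECK_CMD RESTART_CMD
# Runs CHECK_CMD; if it fails, runs RESTART_CMD and checks once more.
# Prints "healthy", "restarted" or "failover".
# (Toy model for illustration; pacemaker does not work by calling this.)
handle_monitor() {
    check=$1
    restart=$2
    if $check; then
        echo healthy
    elif $restart && $check; then
        echo restarted
    else
        echo failover
    fi
}

handle_monitor true true     # check passes: prints "healthy"
handle_monitor false true    # restart runs but the check still fails: prints "failover"
# On a real node the arguments might look like:
#   handle_monitor "service httpd status" "service httpd restart"
```

In the real cluster the check runs every interval and is given up on after timeout, which is what the two values in the primitive definition control.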

 
         [root@node1 ~]# ip addr show     # check that webip is active
         1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
             link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
             inet 127.0.0.1/8 scope host lo
             inet6 ::1/128 scope host
                valid_lft forever preferred_lft forever
         2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
             link/ether 00:0c:29:37:b4:75 brd ff:ff:ff:ff:ff:ff
             inet 172.16.7.10/16 brd 172.16.255.255 scope global eth0
             inet 172.16.7.188/16 brd 172.16.255.255 scope global secondary eth0   # webip is here
             inet6 fe80::20c:29ff:fe37:b475/64 scope link
                valid_lft forever preferred_lft forever
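
Instead of reading through the whole listing, the check for the VIP can be reduced to a small filter over the "ip addr show" output. A minimal sketch; the function name has_addr is made up for this example.

```shell
# has_addr ADDRESS
# Reads `ip addr show` output on stdin and succeeds if ADDRESS is assigned
# as an IPv4 address (an "inet ADDRESS/PREFIX" line).
# (Hypothetical helper for illustration only.)
has_addr() {
    awk -v a="$1" '
        $1 == "inet" { split($2, p, "/"); if (p[1] == a) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Use on a node, if the ip command is available:
if command -v ip >/dev/null 2>&1; then
    if ip addr show | has_addr 172.16.7.188; then
        echo "VIP 172.16.7.188 is on this node"
    else
        echo "VIP 172.16.7.188 is not on this node"
    fi
fi
```

Run on both nodes, exactly one of them should report that it holds the VIP.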

Now that webip is active, the site can be reached through the VIP 172.16.7.188.

Next, from node2, stop corosync on node1 and look at the cluster status.

         [root@node2 ~]# ssh node1 "service corosync stop"   # stop corosync on node1
         Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
         Waiting for corosync services to unload:.[  OK  ]    # corosync on node1 stopped successfully
         [root@node2 ~]# crm status     # cluster status as seen from node2
         Last updated: Sun Apr 20 16:56:11 2014
         Last change: Sun Apr 20 16:50:32 2014 via crmd on node2.shuishui.com
         Stack: classic openais (with plugin)
         Current DC: node2.shuishui.com - partition WITHOUT quorum
         Version: 1.1.8-7.el6-394e906
         2 Nodes configured, 2 expected votes
         1 Resources configured.
         Online: [ node2.shuishui.com ]
         OFFLINE: [ node1.shuishui.com ]        # node1 is offline
         webip  (ocf::heartbeat:IPaddr):    Started node2.shuishui.com    # webip has moved to node2

Now check the cluster status on node1.

Note: this is the view from node1. It only means that node1 itself is no longer connected to the cluster, not that the whole cluster has a problem.

         [root@node1 ~]# crm status
         Could not establish cib_ro connection: Connection refused (111)
         ERROR: crm_mon exited with code 107 and said: Connection to cluster failed: Transport endpoint is not connected

172.16.7.188 is still reachable at this point, and during the process the resource moved from node1 to node2.

Because this cluster has only two nodes, the surviving node is left without quorum, as the "partition WITHOUT quorum" line in the status above shows. This was discussed earlier, and no-quorum-policy was already set to ignore in step 7, so the resource still moved to node2 instead of being stopped.

With the check done, bring node1 back online.

         [root@node2 ~]# ssh node1 "service corosync start"    # start corosync on node1 from node2
         Starting Corosync Cluster Engine (corosync): [  OK  ]  # started successfully
         [root@node2 ~]# crm status
         Last updated: Sun Apr 20 17:14:53 2014
         Last change: Sun Apr 20 16:50:32 2014 via crmd on node2.shuishui.com
         Stack: classic openais (with plugin)
         Current DC: node2.shuishui.com - partition with quorum
         Version: 1.1.8-7.el6-394e906
         2 Nodes configured, 2 expected votes
         1 Resources configured.
         Online: [ node1.shuishui.com node2.shuishui.com ]       # node1 is back online
         webip  (ocf::heartbeat:IPaddr):    Started node2.shuishui.com

(2) Create the webstore resource

         [root@node1 ~]# crm
         crm(live)# configure
         crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.7.8:/www" directory="/var/www/html" fstype="nfs" op monitor interval=60s timeout=40s op start timeout=60s op stop timeout=60s
         crm(live)configure# verify
         crm(live)configure# commit

(3) Create the webserver resource

         crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=20s on-fail=restart
         crm(live)configure# verify
         crm(live)configure# commit

When the resource agent class is LSB, no parameters need to follow it.

Check the cluster status to verify that all three resources are running:

         crm(live)# status
         Last updated: Sun Apr 20 17:28:52 2014
         Last change: Sun Apr 20 17:26:19 2014 via cibadmin on node1.shuishui.com
         Stack: classic openais (with plugin)
         Current DC: node2.shuishui.com - partition with quorum
         Version: 1.1.8-7.el6-394e906
         2 Nodes configured, 2 expected votes
         3 Resources configured
         Online: [ node1.shuishui.com node2.shuishui.com ]
         webip  (ocf::heartbeat:IPaddr):    Started node2.shuishui.com
         webstore   (ocf::heartbeat:Filesystem):    Started node1.shuishui.com
         webserver  (lsb:httpd):    Started node2.shuishui.com

10. Define a resource group

As the status above shows, resources are by default spread across the nodes to balance load. There are two ways to keep them together on one node: (1) define a resource group, or (2) define a colocation constraint.

Here we use a resource group.

         crm(live)configure# group webcluster webip webstore webserver   # define the group webcluster with the three resources as members
         crm(live)configure# verify
         crm(live)configure# commit

Check the cluster status again to see how the resources are distributed:

         crm(live)configure# cd
         crm(live)# status
         Last updated: Sun Apr 20 17:45:06 2014
         Last change: Sun Apr 20 17:42:56 2014 via cibadmin on node1.shuishui.com
         Stack: classic openais (with plugin)
         Current DC: node2.shuishui.com - partition with quorum
         Version: 1.1.8-7.el6-394e906
         2 Nodes configured, 2 expected votes
         3 Resources configured
         Online: [ node1.shuishui.com node2.shuishui.com ]
         # the three resources now run as a group on one node
         Resource Group: webcluster
             webip  (ocf::heartbeat:IPaddr):    Started node2.shuishui.com
             webstore   (ocf::heartbeat:Filesystem):    Started node2.shuishui.com
             webserver  (lsb:httpd):    Started node2.shuishui.com

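11. Resource constraints

(1) Location constraint

Defines which node a resource prefers to run on. The higher the score, the stronger the preference.

inf: positive infinity; n and -n: finite scores; -inf: negative infinity, meaning the resource will not run on that node if there is any alternative.

(2) Colocation constraint

Defines the tendency of resources to run on the same node.

inf: the resources must run together; -inf: the resources must never run on the same node.

(3) Order constraint

Defines the order in which resources are started and stopped.

Our three resources should start in the order webip, webstore, webserver, so define an order constraint for them:

         crm(live)configure# order webip_webstore_webserver mandatory: webip webstore webserver
         crm(live)configure# verify
         crm(live)configure# commit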
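
Only the order constraint is shown above, so here is roughly what the other two types look like in crmsh. This is a sketch, not part of the original setup: the constraint names web_prefers_node1 and webserver_with_webip and the score of 100 are made up for illustration. Because the webcluster group from step 10 already keeps its members on one node, the colocation line is an alternative to the group rather than something to add on top of it.

```shell
# Location: give the webcluster group a preference (score 100) for node1.
crm configure location web_prefers_node1 webcluster 100: node1.shuishui.com

# Colocation: only for resources that are NOT in a group.
# webserver must run on the node where webip runs.
crm configure colocation webserver_with_webip inf: webserver webip
```

The same lines can also be typed without the leading "crm configure" at the crm(live)configure# prompt, followed by verify and commit as in the other steps.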

The show command: "The `show` command displays objects. It may display all objects or a set of objects. The user may also choose to see only objects which were changed." For more detail, use show xml.

         crm(live)configure# show
         node node1.shuishui.com
         node node2.shuishui.com
         primitive webip ocf:heartbeat:IPaddr \
             params ip="172.16.7.188" \
             op monitor interval="30s" timeout="20s" on-fail="restart"
         primitive webserver lsb:httpd \
             op monitor interval="30s" timeout="20s" on-fail="restart"
         primitive webstore ocf:heartbeat:Filesystem \
             params device="172.16.7.8:/www" directory="/var/www/html" fstype="nfs" \
             op monitor interval="60s" timeout="40s" \
             op start timeout="60s" interval="0" \
             op stop timeout="60s" interval="0"
         group webcluster webip webstore webserver
         order webip_webstore_webserver inf: webip webstore webserver
         property $id="cib-bootstrap-options" \
             dc-version="1.1.8-7.el6-394e906" \
             cluster-infrastructure="classic openais (with plugin)" \
             expected-quorum-votes="2" \
             stonith-enabled="false" \
             no-quorum-policy="ignore"

That covers the basics. Next, install a forum to see the result.

V. Install the Discuz forum and verify the result

1. Grant privileges on the database server

         MariaDB [(none)]> grant all on *.* to 'web'@'172.16.%.%' identified by 'web';
         Query OK, 0 rows affected (0.04 sec)
         MariaDB [(none)]> flush privileges;
         Query OK, 0 rows affected (0.05 sec)

2. Install the forum

Installing the forum on NFS is covered by a separate post on this blog. Anything else not explained in detail here can also be found in other posts on the blog.

3. Cluster status before simulating a node failure: all resources are running on node2

         [root@node1 ~]# crm status
         Last updated: Sun Apr 20 18:18:01 2014
         Last change: Sun Apr 20 18:08:03 2014 via cibadmin on node1.shuishui.com
         Stack: classic openais (with plugin)
         Current DC: node2.shuishui.com - partition with quorum
         Version: 1.1.8-7.el6-394e906
         2 Nodes configured, 2 expected votes
         3 Resources configured
         Online: [ node1.shuishui.com node2.shuishui.com ]
         Resource Group: webcluster
             webip  (ocf::heartbeat:IPaddr):    Started node2.shuishui.com
             webstore   (ocf::heartbeat:Filesystem):    Started node2.shuishui.com
             webserver  (lsb:httpd):    Started node2.shuishui.com

4. Publish a new post on the forum through the VIP, 172.16.7.188

5. The resources were running on node2, so simulate a failure of node2 and see whether they move to node1 automatically

         [root@node2 ~]# crm
         crm(live)# node
         crm(live)node#
         crm(live)node# standby

6. Check the cluster status from node1

         [root@node1 ~]# crm status
         Node node2.shuishui.com: standby           # node2 is in standby
         Online: [ node1.shuishui.com ]
         Resource Group: webcluster                # all resources are on node1
             webip  (ocf::heartbeat:IPaddr):    Started node1.shuishui.com
             webstore   (ocf::heartbeat:Filesystem):    Started node1.shuishui.com
             webserver  (lsb:httpd):    Started node1.shuishui.com
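
When repeating this failover test it helps to pull out just the node a resource is running on. A minimal sketch that filters the "crm status" text shown in this post; it assumes the "NAME (agent): Started NODE" line format seen above, and the function name resource_node is made up for this example.

```shell
# resource_node NAME
# Reads `crm status` output on stdin and prints the node that resource NAME
# is started on (prints nothing if it is not started anywhere).
# (Hypothetical helper for illustration only.)
resource_node() {
    awk -v r="$1" '$1 == r && $(NF-1) == "Started" { print $NF }'
}

# Example with lines copied from the status output above:
resource_node webip <<'EOF'
 Resource Group: webcluster
 webip  (ocf::heartbeat:IPaddr):    Started node1.shuishui.com
 webstore   (ocf::heartbeat:Filesystem):    Started node1.shuishui.com
EOF
# On a live node: crm status | resource_node webserver
```

Comparing the result before and after "node standby" gives a quick yes-or-no answer to whether the group really moved.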