Debian Lenny HowTo
From Cluster Labs
This page will guide you through installing a two-node Corosync + Pacemaker cluster, which is later extended and worked with. The aim is to provide you with a working example of such a cluster.
Once you get up to speed using this HowTo you can dive into the more advanced configuration and documentation.
Introduction
In this example we will use these names and IP addresses in the example code:
- node1 - ip 10.0.0.11 - first node
- node2 - ip 10.0.0.12 - second node
- virt1 - ip 10.0.0.21 - virtual IP address
Disclaimer: We assume that you can already work with Debian Linux and know the security implications of working as root and so on.
If you get stuck using this HowTo you might try your luck on the #linux-ha IRC channel on freenode.net.
Installation
Install Debian Lenny
First you need to install two servers (node1 and node2) with Debian GNU/Linux version 5, also known as Debian Lenny.
Currently only x86 (for 32-bit systems) and amd64 (for 64-bit systems) are working.
Add repository to the apt system
This has to be done on both node1 and node2.
Create a new file /etc/apt/sources.list.d/pacemaker.list that contains:
deb http://people.debian.org/~madkiss/ha lenny main
Add the Madkiss key to your package system:
apt-key adv --keyserver pgp.mit.edu --recv-key 1CFA3E8CD7145E30
If you omit this step you will get this error:
W: GPG error: http://people.debian.org lenny Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 1CFA3E8CD7145E30
Update the package list
aptitude update
Install the packages
Installing the pacemaker package will install Pacemaker together with Corosync. If you need OpenAIS later on, you can install it as a plugin for Corosync. OpenAIS is needed, for example, for DLM or CLVM, but that is beyond the scope of this HowTo.
aptitude install pacemaker
Initial Configuration
Create authkey
To create an authkey for corosync communication between your two nodes do this on the first node:
node1~: sudo corosync-keygen
This creates a key in /etc/corosync/authkey
You need to copy this file to the second node and put it in the /etc/corosync directory with the right permissions. So on the first node:
node1~: scp /etc/corosync/authkey node2:
And on the second node:
node2~: sudo mv ~/authkey /etc/corosync/authkey
node2~: sudo chown root:root /etc/corosync/authkey
node2~: sudo chmod 400 /etc/corosync/authkey
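A quick way to confirm that the key landed with the right permissions is to check its mode on each node. The sketch below is only an illustration: the function name check_authkey is made up for this page, and it assumes GNU stat as shipped with Debian. It checks the mode only; ownership still has to be root:root as set above.

```shell
# Hypothetical helper: verify that the authkey exists and has mode 400.
# Usage: check_authkey [path]   (defaults to /etc/corosync/authkey)
check_authkey() {
    key=${1:-/etc/corosync/authkey}
    # The key must exist as a regular file.
    [ -f "$key" ] || { echo "missing: $key"; return 1; }
    # GNU stat prints the octal permission bits with the %a format.
    mode=$(stat -c '%a' "$key")
    [ "$mode" = "400" ] || { echo "bad mode $mode on $key"; return 1; }
    echo "ok: $key"
}
```

Running check_authkey without arguments (as root) on node1 and node2 should print an "ok" line on both. Comparing the output of md5sum /etc/corosync/authkey on the two nodes is a simple way to confirm the files are identical.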
Edit configfile
Most of the options in /etc/corosync/corosync.conf are fine to start with. You must, however, make sure that the nodes can communicate, so adjust this section:
interface {
# The following values need to be set based on your environment
ringnumber: 0
bindnetaddr: 192.168.2.0
mcastaddr: 226.94.1.1
mcastport: 5405
}
Change bindnetaddr to the network address of your local subnet. For example, with 10.0.0.11 on the first node and 10.0.0.12 on the second node (and a 255.255.255.0 netmask), set bindnetaddr to 10.0.0.0.
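If you are unsure what the network address of your subnet is, it can be derived by AND-ing each octet of the node's IP address with the netmask. The following is a minimal sketch in plain POSIX shell; the function name network_addr is an invention for this example, and the 255.255.255.0 netmask is an assumption about the example network.

```shell
# Hypothetical helper: print the IPv4 network address for an IP and netmask.
# Usage: network_addr 10.0.0.11 255.255.255.0
network_addr() {
    ip=$1
    mask=$2
    # Split both dotted quads on "." into the positional parameters $1..$8.
    old_ifs=$IFS
    IFS=.
    set -- $ip $mask
    IFS=$old_ifs
    # AND each address octet with the matching netmask octet.
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

network_addr 10.0.0.11 255.255.255.0   # prints 10.0.0.0
```

The printed value is what goes into bindnetaddr. The same value should come out for every node in the cluster; if it does not, the nodes are not on the same subnet.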
Enabling corosync
Corosync is disabled by default, and starting it with the init script will not work until it is enabled. To enable corosync, replace START=no with START=yes in /etc/default/corosync
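The edit can also be made non-interactively. The sketch below assumes GNU sed (for the -i option) and uses a made-up function name, enable_corosync. To stay safe to try out, the demonstration at the end runs against a scratch file rather than the real defaults file.

```shell
# Hypothetical helper: switch START=no to START=yes in a defaults file
# and print the resulting START line.
# Usage: enable_corosync [file]   (defaults to /etc/default/corosync)
enable_corosync() {
    defaults_file=${1:-/etc/default/corosync}
    # Only a line that is exactly START=no is rewritten.
    sed -i 's/^START=no$/START=yes/' "$defaults_file"
    grep '^START=' "$defaults_file"
}

# Demonstration on a scratch file with assumed contents.
demo=$(mktemp)
printf '# start corosync at boot [yes|no]\nSTART=no\n' > "$demo"
enable_corosync "$demo"   # prints START=yes
rm -f "$demo"
```

On a real node you would run enable_corosync as root without an argument, on both node1 and node2.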
Deal with firewall
Make sure you have opened the multicast port for UDP traffic in your firewall. For example, when using Shorewall, add this rule to your /etc/shorewall/rules file on both nodes:
# Multicast for pacemaker
ACCEPT net fw udp 5405
Running corosync
Now that you have configured both nodes you can start the cluster on both sides:
node1~: sudo /etc/init.d/corosync start
Starting corosync daemon: corosync.
node2~: sudo /etc/init.d/corosync start
Starting corosync daemon: corosync.
Check the status
To check the corosync status you can look at /var/log/daemon.log.
If you take a look at the process list using 'ps auxf' you should get something like this:
root 29980 0.0 0.8 44304 3808 ? Ssl 20:55 0:00 /usr/sbin/corosync
root 29986 0.0 2.4 10812 10812 ? SLs 20:55 0:00 \_ /usr/lib/heartbeat/stonithd
102 29987 0.0 0.8 13012 3804 ? S 20:55 0:00 \_ /usr/lib/heartbeat/cib
root 29988 0.0 0.4 5444 1800 ? S 20:55 0:00 \_ /usr/lib/heartbeat/lrmd
102 29989 0.0 0.5 12364 2368 ? S 20:55 0:00 \_ /usr/lib/heartbeat/attrd
102 29990 0.0 0.5 8604 2304 ? S 20:55 0:00 \_ /usr/lib/heartbeat/pengine
102 29991 0.0 0.6 12648 3080 ? S 20:55 0:00 \_ /usr/lib/heartbeat/crmd
You can also run the crm_mon tool to get information about the current status of the cluster. We use -V for extra information.
node1~: sudo crm_mon --one-shot -V
crm_mon[7363]: 2009/07/26_22:05:40 ERROR: unpack_resources: No STONITH resources have been defined
crm_mon[7363]: 2009/07/26_22:05:40 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_mon[7363]: 2009/07/26_22:05:40 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
============
Last updated: Fri Nov 6 21:03:51 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1 node2 ]
As you can see, the setup complains about STONITH, but that is because we have not configured that part of the cluster yet.
Configure an IP resource
We are now going to configure the Cluster Information Base (CIB) using the Cluster Resource Manager (CRM) command-line tool.
First we start the crm command-line tool:
node1~: sudo crm
crm(live)#
Then we create a copy of the current configuration to edit. We will commit this copy when we are done editing:
crm(live)# cib new config20090726
INFO: config20090726 shadow CIB created
crm(config20090726)#
Then we go into configuration mode and we show the current config:
crm(config20090726)# configure
crm(config20090726)configure# show
node host132.procolix.com
node host133.procolix.com
property $id="cib-bootstrap-options" \
dc-version="1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe" \
cluster-infrastructure="openais" \
expected-quorum-votes="2"
We now turn off STONITH since we don't need it in this example configuration:
crm(config20090726)configure# property stonith-enabled=false
Now we add our failover IP to the configuration:
crm(config20090726)configure# primitive failover-ip ocf:heartbeat:IPaddr params ip=10.0.0.21 op monitor interval=10s
And lastly we check if our configuration is valid and then commit it to the cluster and quit the configuration tool:
crm(config20090726)configure# verify
crm(config20090726)configure# end
There are changes pending. Do you want to commit them? y
crm(config20090726)# cib use live
crm(live)# cib commit config20090726
INFO: commited 'config20090726' shadow CIB to the cluster
crm(live)# quit
bye
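The same configuration steps can be collected into a batch instead of being typed at the prompt. The sketch below only prints the crm commands; the function name failover_ip_batch is hypothetical. Note two assumptions: the batch commits straight to the live CIB instead of going through a shadow CIB as shown above, and feeding the text to crm on standard input should be checked against the crm shell version you have installed.

```shell
# Hypothetical helper: print the crm commands used in this section as a batch.
# Usage: failover_ip_batch <resource-name> <ip> <monitor-interval>
failover_ip_batch() {
    cat <<EOF
configure
property stonith-enabled=false
primitive $1 ocf:heartbeat:IPaddr params ip=$2 op monitor interval=$3
verify
commit
EOF
}

# Print the batch for the virtual IP used on this page.
failover_ip_batch failover-ip 10.0.0.21 10s
```

Review the printed commands first. If your crm shell accepts commands on standard input, the output could then be piped into sudo crm on one node.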
When we now do a one-shot crm_mon we get:
node1~: sudo crm_mon --one-shot
============
Last updated: Fri Nov 6 21:5:51 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1 node2 ]
failover-ip (ocf::heartbeat:IPaddr): Started node1
Resource operations
There are quite a few things you can do with a resource. Here are a few examples:
Put a node in standby and back online again
Put node1 in standby
When you want to do maintenance on node1 you can put that node in standby mode. That works like this:
node1~: sudo crm
crm(live)# node
crm(live)node# standby
crm(live)node# quit
bye
You can see with crm_mon that the resource actually failed over to the other node:
node1~: sudo crm_mon --one-shot
============
Last updated: Fri Nov 6 21:04:31 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Node node1: standby
Online: [ node2 ]
failover-ip (ocf::heartbeat:IPaddr): Started node2
Put node1 online again
When maintenance is over you can start node1 again like this:
node1~: sudo crm
crm(live)# node
crm(live)node# online
crm(live)node# bye
bye
Now you can see that the resource has failed back again to node1:
node1~: sudo crm_mon --one-shot
============
Last updated: Fri Nov 6 21:08:22 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1 node2 ]
failover-ip (ocf::heartbeat:IPaddr): Started node1
Migrate the resource to the other node
You might want the resource to run on a node other than the one it is currently running on. This is done with the migrate command.
We are now telling our cluster to run the IP resource on node2 instead of node1:
node1~: sudo crm
crm(live)# resource
crm(live)resource# list
failover-ip (ocf::heartbeat:IPaddr) Started
crm(live)resource# migrate failover-ip node2
crm(live)resource# bye
bye
You can now see that it is running on the other node using crm_mon:
node1~: sudo crm_mon --one-shot
============
Last updated: Fri Nov 6 21:09:45 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1 node2 ]
failover-ip (ocf::heartbeat:IPaddr): Started node2
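If you want to check from a script where a resource is running, the crm_mon output can be filtered. This is a rough sketch, assuming the one-shot output keeps the layout shown on this page; the function name resource_node is made up for the example.

```shell
# Hypothetical helper: read crm_mon --one-shot output on stdin and print
# the node on which the named resource is started (nothing if it is stopped).
# Usage: crm_mon --one-shot | resource_node failover-ip
resource_node() {
    # Match lines whose first field is the resource name and whose
    # second-to-last field is "Started"; the last field is the node.
    awk -v res="$1" '$1 == res && $(NF-1) == "Started" { print $NF }'
}

# Sample input copied from the crm_mon output on this page.
resource_node failover-ip <<'EOF'
Online: [ node1 node2 ]
failover-ip (ocf::heartbeat:IPaddr): Started node2
EOF
```

With the sample input the function prints node2. For a stopped resource no matching line exists, so the output is empty.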
Stop the resource
You might want to stop your resource or, in other words, make your resource unavailable. That can be done like this:
node1~: sudo crm
crm(live)# resource
crm(live)resource# stop failover-ip
crm(live)resource# bye
bye
Using crm_mon that will look like this:
node1~: sudo crm_mon --one-shot
============
Last updated: Fri Nov 6 21:11:56 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1 node2 ]
Note that there is no resource listed here, but you can see that there is one configured resource.
Add another node
Now we have a two node cluster, but you might want to upgrade your setup by adding a node.
We will call this node:
- node3 - ip 10.0.0.13 - third node
First install the extra node as described above under 'Installation'. Then add it to the cluster by copying the authkey and the configuration to it and, if necessary, configuring the firewall.
Then check if it all worked:
node1~: crm_mon --one-shot
============
Last updated: Fri Nov 6 21:18:14 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1 node2 node3 ]
failover-ip (ocf::heartbeat:IPaddr): Started node1
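To check from a script that the new node really joined, you can compare the "Online:" line against the nodes you expect. The sketch below is an illustration only: missing_nodes is a made-up name, and it assumes the "Online: [ ... ]" layout shown in the output on this page.

```shell
# Hypothetical helper: read crm_mon --one-shot output on stdin and print
# every expected node that is not listed on the "Online:" line.
# Usage: crm_mon --one-shot | missing_nodes node1 node2 node3
missing_nodes() {
    # Extract the space-separated node list between the brackets.
    online=$(sed -n 's/^Online: \[ \(.*\) \]$/\1/p')
    for node in "$@"; do
        case " $online " in
            *" $node "*) ;;          # node is online, nothing to report
            *) echo "$node" ;;       # node is missing from the list
        esac
    done
}

# Sample input: the two-node state from before node3 was added.
missing_nodes node1 node2 node3 <<'EOF'
Online: [ node1 node2 ]
EOF
```

With the two-node sample the function prints node3. Once the third node has joined and the "Online:" line lists all three nodes, it prints nothing.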

