Caché ECP Clusters is a high-availability feature that enables failover
from one ECP data server to another, using operating-system-level clustering to detect
a failed server. Caché ECP Clusters technology has been tested and is supported
on Red Hat Linux Advanced Server version 2.1. This document describes how
to configure the cluster and is organized into the sections that follow.
Pre-installation Planning
This section outlines the requirements for configuring the cluster system; subsequent
sections describe the steps to install, define, and configure the various components
of the cluster. To plan the setup of the cluster system, first determine
whether the configuration is a hot-standby configuration or an active-active configuration.
In a hot-standby configuration, only one node runs Caché at a time.
In an active-active configuration, each node runs its own instance of Caché.
The nodes do not have direct access to the same databases; each database is assigned
to one Caché configuration or the other. You can network the Caché instances
with ECP and use namespace definitions to project the same data from both nodes.
An active-active configuration also requires that you:
- Assign each Caché instance a unique cluster IP address.
- Assign each Caché instance a unique default port number.
- Assign each Caché instance one or more disk partitions where it is installed and where its databases reside. Caché instances must not share partitions.
Both types of configuration require the following tasks:
- Calculate and update the settings for the various kernel parameters that must be modified to support the Caché installation, and verify that both nodes can support the requirements. If you are configuring an active-active cluster, add the parameter values together, because one node may at some point be running both instances of Caché (see the sketch following this list).
- Assign a virtual IP address for each active Caché instance.
- Verify that there are enough partitions for each active Caché instance to have its own.
- Choose the Red Hat service names for your Caché configuration names. Though not required, using the same names simplifies the cluster configuration.
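For example, the shared-memory limits are typically raised through /etc/sysctl.conf. The sketch below is an illustration only, not part of the original procedure: the parameter names are standard Linux sysctl keys, but the values are placeholders that you must replace with values calculated for your own Caché instances (sized for both instances combined in an active-active cluster).
# Append hypothetical shared-memory settings; the values shown are placeholders.
cat >> /etc/sysctl.conf <<'EOF'
kernel.shmmax = 268435456    # largest single shared-memory segment, in bytes
kernel.shmall = 65536        # total shared memory allowed, in 4 KB pages
EOF
# Load the new values without rebooting, then confirm them.
sysctl -p
sysctl kernel.shmmax kernel.shmall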
Configuring the Cluster Services for Caché
To prepare the cluster system and configure the first cluster node, perform
the following steps:
- Create the mount points for the partitions on each cluster node.
- Choose one node on which to start work.
- Define the Caché Cluster Services.
- Specify the virtual IP addresses and the storage, but not the startup script, for Caché.
While each configuration is given a preferred node, specify no for the relocate subcommand.
If you specify yes, a service automatically relocates to its preferred node when that node starts.
Though it is acceptable to specify yes for Caché, it is preferable to specify no so that you
can control the relocation process manually. For example, a completed service definition looks like this:
cluadmin> service show config cacheha1
name: cacheha1 -->Name picked for this service
preferred node: lx4
relocate: no
user script: None -->Did not specify this yet
monitor interval: 0 -->Specify 0 here
IP address 0: 192.9.202.197 -->Virtual IP address assigned
netmask 0: 255.255.0.0
broadcast 0: 192.9.255.255
device 0: /dev/sdc4 -->Partition name
mount point, device 0: /storage2 -->Mount point for the partition
mount fstype, device 0: ext3
mount options, device 0: rw
force unmount, device 0: yes
samba share, device 0: None
- Define all necessary services and assign their storage.
- Install Caché on the first node. When the installation process asks for a configuration name, use the same name as you used for the cluster service. When the installation completes, use the Caché Configuration Manager to make the following configuration changes. If you are installing multiple instances of Caché, do this after each installation, not when they are all complete.
- Configure the license managers. For Manager #1, replace the Name/IP Address with the name (or IP address) of the current node. Do not use the virtual IP address; use the real IP address (or DNS name). Add a second license manager (highlight License Managers and click Add) and enter the IP address or DNS name of the other cluster member. Again, use the real name or IP address, not the virtual IP address for the cluster.
- Configure ECP. If you are setting up an active-active cluster and using ECP between the instances of Caché, you can configure it now or wait until later. From the ECP tab, select the Act as an Enterprise Cache Protocol Server check box. Click Add and define the other instance of Caché as a server to this one. For clarity, choose the cluster configuration name as the name of the ECP server; this name is used in the Configuration Manager’s Create a Database wizard to refer to the remote node. For the Host Name, use the virtual IP address that you assigned to that configuration, not the real IP address or the real DNS name.
- Stop this running instance of Caché (for example, from the shell as sketched below) and repeat the process to install and configure all additional instances of Caché on this node.
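A minimal sketch of stopping an instance from the shell; the configuration name cacheha1 is the example name used throughout this document:
# Stop the running configuration cleanly before installing the next instance.
ccontrol stop cacheha1
# Confirm that it is down; the first column of the output shows up/dn status.
ccontrol all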
Configuring the Second Node
Define the Caché instances on this node in one of two ways:
- Run the installation procedure again and give the same configuration name you used on the other node. Install into the same directory. This is the best option if you are configuring a Web server for use with Caché, because the Web server configuration is local to each node. Reinstalling Caché does not affect the changes you made to the configuration (cache.cpf) file.
- If you do not need Caché to configure the Web server on the other node, register the Caché instances on the local node using the ccontrol create command (you can verify the result as sketched after this list):
ccontrol create $cfgname directory=$tgtdir versionid=$ver
where $cfgname, $tgtdir, and $ver are the configuration name, installation directory, and version ID registered on the first node, as displayed there by ccontrol all:
        Configuration    Version ID    Port   Directory
        ---------------  ------------  -----  ----------
dn      CACHEHA1         5.0.1.543     1973   /store1/c50ha
dn      CACHEHA2         5.0.1.543     1972   /store2/c50ha2
For example, to register the two configurations above:
ccontrol create cacheha1 directory="/store1/c50ha" versionid="5.0.1.543"
ccontrol create cacheha2 directory="/store2/c50ha2" versionid="5.0.1.543"
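A quick check you can run on the second node after registering the instances; both commands are the same ccontrol subcommands used elsewhere in this document:
# List every registered configuration; both should now appear.
ccontrol all
# Show the details of one configuration, including its installation directory.
ccontrol list cacheha1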
Adding Caché to the Cluster Services
The scripts necessary to start and stop Caché as part of service failover
do not ship with Caché. An example of the main script is included in the
Caché Initialization File for Linux section.
After creating the main initialization script, perform the following steps to add
Caché to the cluster services:
- Create a script in /etc/rc.d/init.d for each instance of Caché you install, one for each Caché cluster service you define (a sketch of installing and testing such a wrapper follows this list). Model it after one of the two following examples:
#!/bin/ksh
/etc/rc.d/init.d/cache $1 cacheha1 failover
exit $?

#!/bin/ksh
/usr/local/etc/cachesys/cache-init $1 cacheha1 failover
exit $?
- The script supports the status command; however, if Caché becomes unresponsive and the service has a non-zero monitor interval, it fails over automatically to the other cluster member, which may prevent you from collecting the information required to diagnose the problem.
- Relocate the services to the node on which you want them to start. If it is the currently active node (that is, before you added the script, the services were controlling the storage availability on this node), use the service disable and service enable commands. Otherwise, use the service relocate command.
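As mentioned in the first step above, here is a minimal sketch of putting a wrapper script in place and checking it by hand; the source file name cacheha1.sh is hypothetical, and the invocation mirrors the wrapper examples shown earlier:
# Install the wrapper for the cacheha1 service and make it executable.
cp cacheha1.sh /etc/rc.d/init.d/cacheha1     # cacheha1.sh is a hypothetical name
chmod 755 /etc/rc.d/init.d/cacheha1
# Sanity-check it manually before the cluster software uses it;
# "status" exits 0 when the configuration is up and 1 otherwise.
/etc/rc.d/init.d/cacheha1 status
echo "exit status: $?"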
Caché is now part of the failover cluster services. Test the cluster
using the following procedure:
- Unplug the first node.
- Verify that the instance of Caché that was running on the stopped node starts on the second node.
- Turn the failed machine back on; Caché should remain running on the second node (it should not fail back automatically).
- Unplug the second node; both Caché instances should migrate to the first node.
- Turn the second node back on.
- Use the service relocate command to move one of the instances of Caché back to the second node.
- Try to connect the Caché Configuration Manager to the two instances of Caché using the cluster virtual IP addresses. Look at the node name in the title bar to determine which node you have connected to, and at the path name of the cache.cpf file to determine which configuration it is. You can also check from the shell, as sketched below.
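A hedged command-line check to complement the Configuration Manager test, using only the ccontrol output already described in this document:
# Run on each cluster member after a relocation test; the first column of
# the output shows up/dn for every registered configuration, so you can see
# which node is actually serving each instance of Caché.
ccontrol all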
Caché Initialization File for Linux
#!/bin/ksh
# cache
#
# Cache "System V init" script for Linux systems
#
# Copyright (c) 2003 by InterSystems.
# Cambridge, Massachusetts, U.S.A. All rights reserved.
# Confidential, unpublished property of InterSystems.
# ------------------------------------------------------------------
# Maintenance
# 04/01/2003 This script is born.
# ------------------------------------------------------------------
# This script is put in the init.d directory and is used by
# the HA failover package to start a Cache configuration when
# the node that was "serving" it failed.
#
# Three arguments should be specified:
# hacache start <config name> failover
# where <config name> is the name of the configuration that
# is displayed by "/usr/bin/ccontrol all" in the 2nd column.
#
# This script can be used to start Cache if Cache is currently
# down (meaning it is down on both nodes). However, Cache must have
# been shut down cleanly; it cannot have crashed (e.g. there must
# not be a cache.ids file in the cachesys/mgr directory).
# In the future when we are capable of detecting which node created
# the cache.ids file this script will be extended so that it can
# also restart Cache at boot time following a crash. At the moment
# if this script is called without the failover flag and the
# cache.ids file exists, it will display a message and refuse to start
# Cache. Do not use the failover flag to override this behaviour unless
# you know Cache is not running on the other node.
#
# It is very dangerous to call this script and specify the failover
# flag outside of the failover scripts. In an HA environment where
# multiple nodes can see the attached storage simultaneously (e.g. NFS-
# mounted file systems) it is possible to start Cache from the same
# directory on both nodes; Cache does not currently prevent this.
# If this occurs the results will be disastrous and both nodes will
# have to be shut down, database degradation may need to be repaired,
# and so on.
#
if [ "$2" = "" ]
then
type="xxxx" #invalid option, forces usage message
else
config=$2 #cache configuration to play with
state=$3 #failover or "nothing"
#
basdir=`/usr/bin/ccontrol list $config | grep -i directory | awk '{print $2}'`
localnode=`uname -a | awk '{print $2}'`
if [ "${basdir}" = "" ]
then
echo "Configuration $config not found"
exit 1
fi
type=$1
fi
#
#See how we were called.
case "$type" in
(start)
# Start daemons.
if [[ ( -e ${basdir}/mgr/cache.ids ) && ( "${state}" != "failover" ) ]]
then
echo "$basdir/mgr/cache.ids exists and startup is not failover"
echo "Cache configuration $config not started on $localnode"
exit 1
fi
echo "Starting Cache-HA config $config on $localnode"
ccontrol start $config quietly
status=$?
case $status in
(1)
echo "...Failed to start"
exit 1
;;
(0)
echo "...Started"
exit 0
esac
;;
(stop)
# Stop daemons.
echo "Stopping Cache-HA config $config on $localnode"
ccontrol stop $config quietly
status=$?
case $status in
(1)
echo "Cache configuration $config failed to stop"
exit 1
;;
(0)
echo "Cache configuration $config stopped"
exit 0
esac
;;
(status)
FIELDWIDTH=2
state=`/usr/bin/ccontrol all | grep -i $config | awk '{print $1}'`
if [ "$state" = "up" ]
then
exit 0 #cache is up
fi
exit 1 #cache is down or we can't tell
;;
(restart)
$0 stop $2 $3 || :
$0 start $2 $3
;;
(*)
echo "Usage: $0 {start|stop|status|restart} <config> [failover|null]"
exit 1
esac
exit 0
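The comments in the script above describe when it may safely be run by hand; a brief, hedged illustration of that usage (cacheha1 is the example configuration name used throughout this document):
# Start a configuration manually only when Cache is down on BOTH nodes and
# was shut down cleanly (no cachesys/mgr/cache.ids file); omit "failover".
/etc/rc.d/init.d/cache start cacheha1
# Query the status; an exit status of 0 means the configuration is up.
/etc/rc.d/init.d/cache status cacheha1
echo "exit status: $?"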
Maintaining the Caché Registry When Upgrading
You can upgrade Caché in a failover cluster with Caché running on either cluster member.
However, the registry which Caché maintains (displayed with ccontrol all and ccontrol list)
does not display the correct version ID on the node that did not run the upgrade. Update this
manually using the ccontrol update command. The syntax is:
ccontrol update $cfgname versionid=$ver
For example, to set the current version ID to 5.0.1.579 for configuration cacheha1, run:
ccontrol update cacheha1 versionid="5.0.1.579"
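To confirm the change on that node, display the registry entry again; this is a simple check using a command already referenced in this section:
# The version ID shown for cacheha1 should now be 5.0.1.579.
ccontrol list cacheha1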