Caché ECP Clusters is a high availability feature that enables failover from one ECP data server to another, using operating system level clustering to detect a failed server. Caché ECP Clusters technology has been tested and is supported on Red Hat Linux Advanced Server version 2.1. This document contains details of how to configure the cluster and is organized into the following sections:
Pre-installation Planning
Configuring the Cluster Services for Caché
Configuring the Second Node
Adding Caché to the Cluster Services
Caché Initialization File for Linux
Maintaining the Caché Registry When Upgrading
For detailed information on configuring the cluster on the Red Hat Advanced Server, see The Red Hat Cluster Manager Installation and Administration Guide.
Pre-installation Planning
This section outlines the requirements for configuring the cluster system. Subsequent sections describe the steps to install, define, and configure the various components of the cluster. To plan the process for setting up the cluster system, first determine whether the configuration is a hot-standby configuration or an active-active configuration. In a hot-standby configuration only one node is running Caché at a time.
In an active-active configuration each node is running its own instance of Caché. The nodes do not have direct access to the same databases; each database is assigned to one Caché configuration or the other. You can network the Caché instances with ECP and use namespace definitions to project the same data from both nodes.
An active-active configuration requires the following tasks:
Both types of configuration require the following tasks:
Configuring the Cluster Services for Caché
To prepare the cluster system and configure the first cluster node, perform the following steps:
  1. Use the fdisk utility to create the partitions as described in the Partitioning Disks section of The Red Hat Cluster Manager Installation and Administration Guide.
  2. Create the mount points for the partitions on each cluster node.
  3. Choose one node on which to start work.
  4. Install Caché on this node.
Define the Caché Cluster Services
Use the cluadmin utility to define the Caché cluster services. See the Using the cluadmin Utility section of The Red Hat Cluster Manager Installation and Administration Guide for more information.
  1. Specify the virtual IP addresses and the storage, but not the startup script, for Caché.
    While each configuration is given a preferred node, specify no for the relocate subcommand. If you specify yes, a service automatically relocates to its preferred node whenever that node starts and rejoins the cluster. Though it is acceptable to specify yes for Caché, it is preferable to specify no so that you can control the relocation process manually.
    Use the service relocate command to move services between nodes. See the Relocating a Service section of The Red Hat Cluster Manager Installation and Administration Guide for more information.
    The following output from the service show config command shows the configuration for a service named cacheha1 with a virtual IP address of 192.9.202.197 and one disk partition assigned:
    cluadmin>service show config cacheha1
    
    name: cacheha1                       -->Name picked for this service
    preferred node: lx4
    relocate: no
    user script: None                    -->Did not specify this yet
    monitor interval: 0                  -->Specify 0 here
    IP address 0: 192.9.202.197          -->Virtual IP address assigned
      netmask 0: 255.255.0.0
      broadcast 0: 192.9.255.255
    device 0: /dev/sdc4                  -->Partition name
      mount point, device 0: /storage2   -->Mount point for the partition
      mount fstype, device 0: ext3
      mount options, device 0: rw
      force unmount, device 0: yes
      samba share, device 0: None
    
  2. Define all necessary services and assign their storage.
  3. Use the service show state command to list the services and their current states. If any services are disabled, enable them with the service enable command. If any are running on the other node, move them to the current node with the service relocate command. (A brief example cluadmin session is shown below.)
See the Service Configuration and Administration chapter of The Red Hat Cluster Manager Installation and Administration Guide for more information.
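For reference, a minimal cluadmin session for these steps might look like the following, using the example service name cacheha1 from above (command output is omitted and varies with your configuration):

cluadmin> service show state
cluadmin> service enable cacheha1
cluadmin> service relocate cacheha1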
Install Caché
Install Caché on the first node. When the installation process asks for a configuration name, use the same name as you used for the cluster service.
When the installation completes, use the Caché Configuration Manager to make the following configuration changes. If you are installing multiple instances of Caché, do this after each installation, not when they are all complete.
  1. Change port number — From the Advanced tab, expand the General branch. Change the Default Port Number from 1972 to a unique port number for this instance of Caché.
  2. Update license managers — From the Advanced Tab, expand the License and License Managers branches.
    For Manager #1, replace the Name/IP Address with the name (or IP address) of the current node. Do not use the virtual IP address; use the real IP address (or DNS name).
    Add a second license manager (highlight License Managers and click Add) and enter the IP address or DNS name of the other cluster member. Again, use the real name or IP address, not the virtual IP address for the cluster.
  3. Configure ECP — If you are setting up an active-active cluster and using ECP between the instances of Caché, you may configure that now or wait until later. From the ECP tab, select the Act as an Enterprise Cache Protocol Server check box. Click Add and define the other instance of Caché as a server to this one. For clarity, choose the cluster configuration name for the name of the ECP server. This name is used in the Configuration Manager’s Create a Database wizard to refer to the remote node. Use the virtual IP address that you assigned to that configuration for the Host Name, not the real IP address or the real DNS name.
  4. Increase maximum number of ECP servers and clients — If there are other ECP servers or clients in your network, increase the maximum settings as appropriate for your system. From the Advanced tab, expand the Network branch. Under This System as an ECP Server, increase Max # of ECP Clients and under This System as an ECP Client, increase Max # of ECP Servers. You can also enable ECP access control if required.
See the Configuring Distributed Systems chapter of the Caché Distributed Data Management Guide for more information on configuring ECP.
Stop this running instance of Caché and repeat the process to install and configure all additional instances of Caché on this node.
The first cluster node is configured. Before continuing, invoke ccontrol list or ccontrol all at a shell prompt to gather the information necessary for configuring the second node.
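For example (the exact output format varies with the Caché version, but both commands report the configuration names, directories, and version ids):

ccontrol all     # summary of every configuration on this node
ccontrol list    # detailed listing for every configuration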
Configuring the Second Node
Use the service relocate command of cluadmin to move the services (the storage you defined) to the other cluster node.
Define the Caché instances on this node in one of two ways:
Use ccontrol start to start the configurations and verify that they work; then shut them down with ccontrol stop.
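For example, for a configuration named cacheha1 (the example name used throughout this document):

ccontrol start cacheha1            # start the configuration on this node
ccontrol list cacheha1             # check its status and directory
ccontrol stop cacheha1 quietly     # shut it down again (quietly, as used by the failover script below)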
Adding Caché to the Cluster Services
The scripts necessary to start and stop Caché as part of service failover do not ship with Caché. An example of the main script is included in the Caché Initialization File for Linux section. After creating the main initialization script, perform the following steps to add Caché to the cluster services:
  1. Create a script in /etc/rc.d/init.d for each instance of Caché you install—one for each Caché cluster service you define. Model it after one of the two following examples:
    #!/bin/ksh
    /etc/rc.d/init.d/cache $1 cacheha1 failover
    exit $?
    
    #!/bin/ksh
    /usr/local/etc/cachesys/cache-init $1 cacheha1 failover
    exit $?
    
    Replace cacheha1 with your configuration name. Name the script in the form “/etc/rc.d/init.d/cache-<config>” (for this example, the file is: /etc/rc.d/init.d/cache-cacheha1).
  2. Use the service modify command of the cluadmin utility to update the script location for each service. Leave the monitor interval set to 0.
    The script supports the status command; however, if the monitor interval is non-zero and Caché becomes unresponsive, the service fails over automatically to the other cluster member, which may prevent you from collecting the information required to diagnose the problem.
  3. Relocate the services to the node on which you want them to start. If that node is the currently active node (that is, before you added the script, the services were already controlling the storage on this node), use the service disable and service enable commands; otherwise, use the service relocate command. (A brief cluadmin sketch follows this list.)
    The service relocate and service disable commands call the script with the stop parameter, which attempts to shut down Caché gracefully.
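As a sketch, steps 2 and 3 for the example service cacheha1 might look like this at the cluadmin prompt (service modify runs its own interactive dialog, which is not shown here):

cluadmin> service modify cacheha1
cluadmin> service disable cacheha1
cluadmin> service enable cacheha1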
Caché is now part of the failover cluster services. Test the cluster using the following procedure:
  1. Unplug one node.
  2. Verify that the version of Caché that was running on the stopped node starts on the second node.
  3. Turn the failed machine back on; Caché should remain running on the second node (it should not fail back automatically).
  4. Unplug the second node; both Caché instances should migrate to the first node.
  5. Turn the second node back on.
  6. Use the service relocate command to move one of the instances of Caché back to the second node.
  7. Try to connect the Caché Configuration Manager to the two instances of Caché using the cluster virtual IP addresses. Look at the node name in the title bar to determine which node you have connected to, and at the path name of the cache.cpf file to determine which configuration it is.
Caché Initialization File for Linux
This section shows a sample script to start and stop Caché on Red Hat Linux Advanced Server. Save the file as /etc/rc.d/init.d/cache or /usr/local/etc/cachesys/cache-init and set the protection to 755. The sample script follows:
#!/bin/ksh
#   cache
#
#   Cache "System V init" script for Linux systems
#
#   Copyright (c) 2003 by InterSystems.
#   Cambridge, Massachusetts, U.S.A.  All rights reserved.
#   Confidential, unpublished property of InterSystems.
# ------------------------------------------------------------------
# Maintenance
# 04/01/2003  This script is born.
# ------------------------------------------------------------------
#   This script is put in the init.d directory and is used by
#   the HA failover package to start a Cache configuration when
#   the node that was "serving" it failed.
#
#   Three arguments should be specified:
#      cache start <config name> failover
#   where <config name> is the name of the configuration that
#   is displayed by "/usr/bin/ccontrol all" in the 2nd column.
#
#   This script can be used to start Cache if Cache is currently
#   down (meaning it is down on both nodes). However, Cache must have
#   been shut down cleanly; it cannot have crashed (e.g., there must
#   not be a cache.ids file in the cachesys/mgr directory).
#   In the future when we are capable of detecting which node created
#   the cache.ids file this script will be extended so that it can
#   also restart Cache at boot time following a crash. At the moment
#   if this script is called without the failover flag and the
#   cache.ids file exists, it will display a message and refuse to start
#   Cache. Do not use the failover flag to override this behaviour unless
#   you know Cache is not running on the other node.
#
#   It is very dangerous to call this script and specify the failover
#   flag outside of the failover scripts. In an HA environment where
#   multiple nodes can see the attached storage simultaneously (e.g., NFS
#   mounted file systems) it is possible to start Cache from the same
#   directory on both nodes; Cache does not currently prevent this.
#   If this occurs the results will be disastrous and both nodes will
#   have to be shut down, database degradation may need to be repaired,
#   and so on.
#
if [ "$2" = "" ]
then
  type="xxxx"  #invalid option, forces usage message
else
  config=$2  #cache configuration to play with
  state=$3   #failover or "nothing"
#
  basdir=`/usr/bin/ccontrol list $config | grep -i directory | awk '{print $2}'`
  localnode=`uname -a | awk '{print $2}'`
  if [ "${basdir}" = "" ]
  then
    echo "Configuration $config not found"
    exit 1
  fi
  type=$1
fi
#
#See how we were called. 
case "$type" in 
  (start)
        # Start daemons.
   if [[ -e ${basdir}/mgr/cache.ids && "${state}" != "failover" ]]
   then
      echo "$basdir/mgr/cache.ids exists and startup is not failover"
      echo "Cache configuration $config not started on $localnode"
      exit 1
   fi
   echo "Starting Cache-HA config $config on $localnode" 
   ccontrol start $config quietly
   status=$?
   case $status in
   (0)
       echo "...Started"
       exit 0
       ;;
   (*)
       echo "...Failed to start"
       exit 1
   esac
        ;;
  (stop)
        # Stop daemons.
        echo "Stopping Cache-HA config $config on $localnode"
   ccontrol stop $config quietly
   status=$?
   case $status in
   (0)
       echo "Cache configuration $config stopped"
       exit 0
       ;;
   (*)
       echo "Cache configuration $config failed to stop"
       exit 1
   esac
   ;;
  (status)
   state=`/usr/bin/ccontrol all | grep -i $config | awk '{print $1}'`
   if [ "$state" = "up" ]
   then
      exit 0   #cache is up
   fi
   exit 1   #cache is down or we can't tell
        ;;
  (restart)
        $0 stop $2 $3 || :
        $0 start $2 $3
        ;;
  (*)
        echo "Usage: $0 {start|stop|status|restart} <config> [failover|null]"
        exit 1
esac

exit 0
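For example, if you saved the script as /etc/rc.d/init.d/cache and created the wrapper /etc/rc.d/init.d/cache-cacheha1 described earlier, set the protection to 755 on both:

chmod 755 /etc/rc.d/init.d/cache
chmod 755 /etc/rc.d/init.d/cache-cacheha1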
Maintaining the Caché Registry When Upgrading
You can upgrade Caché in a failover cluster with Caché running on either cluster member. However, the registry that Caché maintains (displayed with ccontrol all and ccontrol list) does not show the correct version id on the node that did not run the upgrade. Update it manually with the ccontrol update command. The syntax is:
ccontrol update $cfgname versionid=$ver
For example, to set the current version id to 5.0.1.579 for configuration cacheha1, run:
ccontrol update cacheha1 versionid="5.0.1.579"
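To find the version id to use, you can read the registry on the node that ran the upgrade; for example (assuming, as in the example script above, that ccontrol list reports registry fields such as versionid):

ccontrol list cacheha1 | grep -i versionid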