This chapter discusses the following:
For an overview of the tasks that must be performed to configure a cluster, see “Configuring with the cmgr Command” in Chapter 9.
Tasks must be performed using a certain hierarchy. For example, to modify a partition ID, you must first identify the node name.
You can also use the cluster_status tool to view status in curses mode. See Chapter 16, “Monitoring Status”.
Note: CXFS requires a license to be installed on each node. If you install the software without properly installing the license, you cannot use the cmgr command. For more information about licensing, see Chapter 4, “Obtaining CXFS and XVM FLEXlm Licenses”, Chapter 6, “IRIX CXFS Installation”, and Chapter 7, “Linux 64-bit CXFS Installation”. For information about licensing and nodes running an operating system other than IRIX or Linux 64-bit, see the CXFS MultiOS Client-Only Guide for SGI InfiniteStorage.
To use the cmgr command, you must be logged in as root on a CXFS administration node. Then enter either of the following:
# /usr/cluster/bin/cmgr

or

# /usr/cluster/bin/cluster_mgr
After you have entered this command, you will see the following message and the command prompt (cmgr>):
Welcome to SGI Cluster Manager Command-Line Interface

cmgr>
Do not make configuration changes on two different administration nodes in the pool simultaneously, or use the CXFS GUI, cmgr, and xvm commands simultaneously to make changes. You should run one instance of the cmgr command or the CXFS GUI on a single administration node in the pool when making changes at any given time. However, you can use any node in the pool when requesting status or configuration information.
After the command prompt displays, you can enter subcommands. At any time, you can enter ? or help to bring up the cmgr help display.
The -p option to cmgr displays prompts for the required inputs of administration commands that define and modify CXFS components. You can run in prompt mode in either of the following ways:
Specify a -p option on the command line:
# cmgr -p
Execute a set prompting on command after you have brought up cmgr, as in the following example:
cmgr> set prompting on
This method allows you to toggle in and out of prompt mode as you execute individual subcommands. To get out of prompt mode, enter the following:
cmgr> set prompting off
The following shows an example of the questions that may be asked in prompting mode (the actual questions asked will vary depending upon your answers to previous questions):
cmgr> define node nodename
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ?
Is this a FailSafe node <true|false> ?
Is this a CXFS node <true|false> ?
Operating System <IRIX|Linux32|Linux64|AIX|HPUX|Solaris|MacOSX|Windows> ?
Node Function <server_admin|client_admin|client_only> ?
Node ID ?[optional]
Partition ID ?[optional] (0)
Do you wish to define failure hierarchy[y/n]:
Reset type <powerCycle|reset|nmi> ? (powerCycle)
Do you wish to define system controller info[y/n]:
Sysctrl Type <msc|mmsc|l2|l1>? (msc)
Sysctrl Password[optional] ? ( )
Sysctrl Status <enabled|disabled> ?
Sysctrl Owner ?
Sysctrl Device ?
Sysctrl Owner Type <tty> ? (tty)
Number of Network Interfaces ? (1)
NIC 1 - IP Address ?
For details about this task, see “Define a Node with cmgr”.
When you are creating or modifying a component of a cluster, you can enter either of the following commands:
cancel, which aborts the current mode and discards any changes you have made
done, which executes the current definitions or modifications and returns to the cmgr> prompt
You can execute a series of cmgr commands by using the -f option and specifying an input file:
cmgr -f input_file
Or, you could include the following as the first line of the file and then execute it as a script:
#!/usr/cluster/bin/cmgr -f
Each line of the file must be a valid cmgr command line, comment line (starting with #), or a blank line.
Note: You must include a done command line to finish a multilevel command and end the file with a quit command line.
If any line of the input file fails, cmgr will exit. You can choose to ignore the failure and continue the process by using the -i option with the -f option, as follows:
cmgr -if input_file
Or include it in the first line for a script:
#!/usr/cluster/bin/cmgr -if

Note: If you include -i when using a cmgr command line as the first line of the script, you must use this exact syntax (that is, -if).
For example, suppose the file /tmp/showme contains the following:
cxfs6# more /tmp/showme
show clusters
show nodes in cluster cxfs6-8
quit
You can execute the following command, which will yield the indicated output:
cxfs6# /usr/cluster/bin/cmgr -if /tmp/showme

1 Cluster(s) defined
        cxfs6-8

Cluster cxfs6-8 has following 3 machine(s)
        cxfs6
        cxfs7
        cxfs8
Or you could include the cmgr command line as the first line of the script, give it execute permission, and execute showme itself:
cxfs6# more /tmp/showme
#!/usr/cluster/bin/cmgr -if
#
show clusters
show nodes in cluster cxfs6-8
quit

cxfs6# /tmp/showme

1 Cluster(s) defined
        cxfs6-8

Cluster cxfs6-8 has following 3 machine(s)
        cxfs6
        cxfs7
        cxfs8
For an example of defining a complete cluster, see “Script Example”.
To invoke a shell from within cmgr, enter the following:
cmgr> sh
cxfs6#
To exit the shell and to return to the cmgr> prompt, enter the following:
cxfs6# exit
cmgr>
You can enter some cmgr subcommands directly from the command line using the following format:
cmgr -c "subcommand"
where subcommand can be any of the following with the appropriate operands:
admin, which allows you to perform certain actions such as resetting a node
start, which starts CXFS services and sets the configuration so that CXFS services will be automatically restarted upon reboot
stop, which stops CXFS services and sets the configuration so that CXFS services are not restarted upon reboot
For example, to display information about the cluster, enter the following:
# cmgr -c "show clusters"

1 Cluster(s) defined
        eagan
See the cmgr man page for more information.
The /var/cluster/cmgr-templates directory contains template cmgr scripts that you can modify to configure the different components of your system.
Each template file contains lists of cmgr commands required to create a particular object, as well as comments describing each field. The template also provides default values for optional fields.
The /var/cluster/cmgr-templates directory contains the following templates to create a cluster and nodes: cmgr-create-cluster and cmgr-create-node.
To create a CXFS configuration, you can concatenate multiple templates into one file and execute the resulting script.
Note: If you concatenate information from multiple template scripts to prepare your cluster configuration, you must remove the quit at the end of each template script, except for the final quit. A cmgr script must have only one quit line.
For example, for a three-node configuration, you would concatenate three copies of the cmgr-create-node file and one copy of the cmgr-create-cluster file.
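The concatenation described above can be sketched in shell. The template filenames come from this section, but the template contents below are simplified stand-ins for illustration; on a real system the templates live in /var/cluster/cmgr-templates and are far more detailed:

```shell
# Sketch: build a three-node cluster script from the cmgr templates.
# Mock templates are created here for illustration only.
mkdir -p /tmp/cmgr-demo && cd /tmp/cmgr-demo
printf 'define node NODE\ndone\nquit\n' > cmgr-create-node
printf 'define cluster CLUSTER\ndone\nquit\n' > cmgr-create-cluster

# Concatenate three node templates and one cluster template, dropping
# every "quit" line, then append the single final quit.
{
  for i in 1 2 3; do sed '/^quit$/d' cmgr-create-node; done
  sed '/^quit$/d' cmgr-create-cluster
  echo quit
} > create_cluster.cmgr
```

The resulting file contains exactly one quit line, at the end, as the note above requires.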
You can set a default cluster and node to simplify the configuration process for the current session of cmgr. The default will then be used unless you explicitly specify a name. You can use the following commands to specify default values:
set cluster clustername
set node hostname
clustername and hostname are logical names. Logical names cannot begin with an underscore (_) or include any whitespace characters, and can be at most 255 characters.
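The naming rules above can be checked before you run cmgr. The following shell function is a hypothetical helper (not part of cmgr) that applies exactly the stated rules:

```shell
# Hypothetical helper (not part of cmgr): check the logical-name rules
# stated above: nonempty, no leading underscore, no whitespace,
# at most 255 characters.
valid_logical_name() {
  name=$1
  case $name in
    _*) return 1 ;;              # must not begin with an underscore
    *[[:space:]]*) return 1 ;;   # must not include whitespace
  esac
  [ -n "$name" ] && [ "${#name}" -le 255 ]
}

valid_logical_name cxfs6-8 && echo "cxfs6-8 is acceptable"
```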
To view the current defaults, use the following:
show set defaults
For example:
cmgr> set cluster cxfs6-8
cmgr> set node cxfs6
cmgr> show set defaults
Default cluster set to: cxfs6-8
Default node set to: cxfs6
Default cdb set to: /var/cluster/cdb/cdb.db
Default resource_type is not set
Extra prompting is set off
This section tells you how to define, modify, delete, display, and reset a node using cmgr.
Note: The entire cluster status information is sent to each CXFS administration node each time a change is made to the cluster database; therefore, the more CXFS administration nodes in a configuration, the longer the update will take.
To define a node, use the following commands:
define node logical_hostname
    set hostname to hostname
    set nodeid to nodeID
    set node_function to server_admin|client_admin|client_only
    set partition_id to partitionID
    set reset_type to powerCycle|reset|nmi
    set sysctrl_type to msc|mmsc|l2|l1 (based on node hardware)
    set sysctrl_password to password
    set sysctrl_status to enabled|disabled
    set sysctrl_owner to node_sending_reset_command
    set sysctrl_device to port
    set sysctrl_owner_type to tty_device
    set is_failsafe to true|false
    set is_cxfs to true|false
    set operating_system to irix|linux32|linux64|aix|hpux|solaris|macosx|windows
    set weight to 0|1 (no longer needed)
    add nic IP_address_or_hostname (if DNS)
        set heartbeat to true|false
        set ctrl_msgs to true|false
        set priority to integer
    remove nic IP_address_or_hostname (if DNS)
    set hierarchy to [system][fence][reset][fencereset][shutdown]
Usage notes:
logical_hostname is a simple hostname (such as lilly) or a fully qualified domain name (such as lilly.mycompany.com) or an entirely different name (such as nodeA). Logical names cannot begin with an underscore (_) or include any whitespace characters, and can be at most 255 characters.
hostname is the fully qualified hostname unless the simple hostname is resolved on all nodes. Use the ping command to display the fully qualified hostname. Do not enter an IP address. The default for hostname is the value for logical_hostname; therefore, you must supply a value for this command if you use a value other than the hostname or an abbreviation of it for logical_hostname.
nodeid is an integer in the range 1 through 32767 that is unique among the nodes in the pool. You must not change the node ID number after the node has been defined.
For administration nodes, this value is optional. If you do not specify a number for an administration node, CXFS will calculate an ID for you. The default ID is a 5-digit number based on the machine's serial number and other machine-specific information; it is not sequential.
For client-only nodes, you must specify a unique value.
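The node ID rules above (an integer in the range 1 through 32767 that is unique in the pool) can be checked before defining the node. This is a hypothetical sketch, not part of cmgr:

```shell
# Hypothetical sketch (not part of cmgr): validate a proposed node ID.
# Arguments after the first are node IDs already in use in the pool.
is_valid_nodeid() {
  id=$1; shift
  case $id in
    ''|*[!0-9]*) return 1 ;;                # must be a nonnegative integer
  esac
  [ "$id" -ge 1 ] && [ "$id" -le 32767 ] || return 1
  for used in "$@"; do
    [ "$id" -eq "$used" ] && return 1       # must be unique in the pool
  done
  return 0
}

is_valid_nodeid 7 13203 606 && echo "node ID 7 is available"
```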
node_function specifies the function of the node. Enter one of the following:
server_admin is an IRIX or Linux 64-bit node on which you will execute cluster administration commands and that you also want to be a CXFS metadata server. (You will use the Define a CXFS Filesystem task to define the specific filesystem for which this node can be a metadata server.) Use this node function only if the node will be a metadata server. You must install the cluster_admin product on this node.
client_admin is an IRIX or Linux 64-bit node on which you will execute cluster administration commands but that you do not want to use as a CXFS metadata server. Use this node function only if the node will run FailSafe but you do not want it to be a metadata server. You must install the cluster_admin product on this node.
client_only is a node that shares CXFS filesystems but on which you will not execute cluster administration commands and that will not be a CXFS metadata server. Use this node function for all nodes other than those that will be metadata servers, or those that will run FailSafe without being a metadata server. You must install the cxfs_client product on this node. This node can run IRIX, Linux 32-bit, Linux 64-bit, AIX, HPUX, Solaris, Mac OS X, or Windows. (Nodes other than IRIX and Linux 64-bit are required to be client-only nodes.)
AIX, HPUX, Solaris, Mac OS X, and Windows nodes are automatically specified as client-only. You should specify client-only with linux32.
partition_id uniquely defines a partition in a partitioned Origin 3000 or Altix 3000 system. The set partition_id command is optional; if you do not have a partitioned system, you can skip this command or enter 0.
To unset the partition ID, use a value of 0 or none.
On an Altix 3000, you can find the partition ID by reading the proc file. For example:
[root@linux64 root]# cat /proc/sgi_sn/partition_id
0
The 0 indicates that the system is not partitioned. If the system is partitioned, the partition number (such as 1, 2, and so on) is displayed.
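A script can read this file defensively. The following sketch (using the path from the example above; not part of CXFS) falls back to 0 when the proc file does not exist, for example on a non-Altix system:

```shell
# Sketch: read the partition ID, treating a missing proc file as 0
# (unpartitioned).
partition_file=/proc/sgi_sn/partition_id
if [ -r "$partition_file" ]; then
  partition_id=$(cat "$partition_file")
else
  partition_id=0
fi
echo "partition_id=$partition_id"
```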
reset_type can be one of the following:
powerCycle shuts off power to the node and then restarts it
reset simulates the pressing of the reset button on the front of the machine
nmi (nonmaskable interrupt) performs a core-dump of the operating system kernel, which may be useful when debugging a faulty machine
sysctrl_type is the system controller type based on the node hardware, as shown in Table 11-1.
Table 11-1. System Controller Types

l1           | l2                              | mmsc               | msc
-------------|---------------------------------|--------------------|------------------
Origin 300   | Origin 3400                     | SGI 2400 rackmount | Origin 200
Origin 3200C | Origin 3800                     | SGI 2800 rackmount | Onyx2 deskside
Onyx 300     | Origin 300 with NUMAlink module | Onyx2 rackmount    | SGI 2100 deskside
Onyx 3200C   | Onyx 3000 series                |                    | SGI 2200 deskside
Tezro        | Altix 3000                      |                    |
sysctrl_password is the password for the system controller port, not the node's root password or PROM password. On some nodes, the system administrator may not have set this password. If you wish to set or change the system controller password, consult the hardware manual for your node.
sysctrl_status allows you to provide information about the system controller but temporarily disable reset by setting this value to disabled (meaning that CXFS cannot reset the node). To allow CXFS to reset the node, enter enabled. For nodes without system controllers, set this to disabled; see “Requirements” in Chapter 1.
sysctrl_device is the port used. /dev/ttyd2 is the most commonly used port, except on Origin 300 and Origin 350 systems, where /dev/ttyd4 is commonly used.
sysctrl_owner is the name of the node that sends the reset command. Serial cables must physically connect the node being defined and the owner node through the system controller port. At run time, the node must be defined in the CXFS pool.
sysctrl_owner_type is the name of the terminal port (TTY) on the owner node to which the system controller is connected, such as /dev/ttyd2. The other end of the cable connects to this node's system controller port, so the node can be controlled remotely by the other end.
If you are running just CXFS on this node, set is_cxfs to true and is_failsafe to false. If you are running both CXFS and FailSafe on this node in a coexecution cluster, set both values to true.
operating_system can be set to irix, linux32, linux64, aix, hpux, solaris, macosx, or windows. (Use windows for Windows NT, Windows 2000, or Windows XP.)
Note: For support details, see the CXFS MultiOS Client-Only Guide for SGI InfiniteStorage.
If you specify aix, hpux, solaris, macosx, or windows, the weight is assumed to be 0. If you try to specify incompatible values for operating_system and is_failsafe or weight, the define command will fail.

weight is automatically set internally to either 0 or 1 to specify how many votes a particular CXFS administration node has in CXFS kernel membership decisions. This information is now set by the Node Function field, and this command is no longer needed.
Note: Although it is possible to use the set weight command to set a weight other than 0 or 1, SGI recommends that you do not do so. There is no need for additional weight.
nic is the IP address or hostname of the private network. (The hostname must be resolved in the /etc/hosts file.)
There can be up to 8 network interfaces. If the first priority network fails, the second will be used, and so on. SGI requires that this network be private; see “Private Network” in Chapter 1.
The priorities of the networks must be the same for each node in the cluster. For more information about using the hostname, see “Hostname Resolution and Network Configuration Rules” in Chapter 5.
hierarchy defines the fail action hierarchy, which determines what happens to a failed node. You can specify up to three options. The second option will be completed only if the first option fails; the third option will be completed only if both the first and second options fail. Options must be separated by commas and no whitespace.
The option choices are as follows:
system deletes all hierarchy information about the node from the database, causing the system defaults to be used. The system defaults are the same as entering reset,shutdown. This means that a reset will be performed on a node with a system controller; if the reset fails or if the node does not have a system controller, CXFS services will be stopped. Therefore, you should choose a setting other than the default for nodes without system controllers; see “Requirements” in Chapter 1. You cannot specify other hierarchy options if you specify the system option.
fence disables access to the storage area network (SAN) from the problem node. Fencing provides faster recovery of the CXFS kernel membership than reset. This action is available for all nodes.
On nodes with system controllers, you would want to use I/O fencing for data integrity protection when CXFS is just a part of what the node is doing, and you prefer losing access to CXFS to having the system rebooted; for example, for a big compute server that is also a CXFS client. You would want to use reset for I/O protection on a node with a system controller when CXFS is a primary activity and you want to get it back online fast; for example, a CXFS file server.
On nodes without system controllers, your only choice for data integrity protection is I/O fencing.
Note: A Brocade Fibre Channel switch sold and supported by SGI is mandatory to support I/O fencing.
fencereset disables access to the SAN from the problem node and then, if the node is successfully fenced, also performs an asynchronous reset of the node; recovery begins without waiting for reset acknowledgement. This action is available only for nodes with system controllers; see “Requirements” in Chapter 1.
reset performs a reset via a serial line connected to the system controller. This action is available only for nodes with system controllers.
shutdown stops CXFS kernel-based services on the node in response to a loss of CXFS kernel membership. The surviving cluster delays the beginning of recovery to allow the node time to complete the shutdown. This action is available for all nodes.
To perform a reset only if a fencing action fails, specify the following:
set hierarchy fence,reset

Note: If shutdown is not specified and the other actions fail, the node attempting to deliver the CXFS kernel membership will forcibly shut down CXFS services locally.
To perform a fence and an asynchronous reset, specify the following:
set hierarchy fencereset
To return to system defaults (reset,shutdown), specify the following:
set hierarchy system
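The constraints above (up to three options, separated by commas with no whitespace, and system not combined with any other option) can be sketched as a hypothetical pre-check; this helper is not part of cmgr:

```shell
# Hypothetical pre-check (not part of cmgr) for a fail action hierarchy
# string: one to three options from the set below, separated by commas
# with no whitespace; "system" must appear alone.
valid_hierarchy() {
  h=$1
  case $h in
    *[[:space:]]*) return 1 ;;        # no whitespace allowed
  esac
  [ "$h" = system ] && return 0
  case $h in
    *system*) return 1 ;;             # system cannot be combined
  esac
  n=0
  old_ifs=$IFS; IFS=,
  for opt in $h; do
    case $opt in
      fence|fencereset|reset|shutdown) n=$((n + 1)) ;;
      *) IFS=$old_ifs; return 1 ;;
    esac
  done
  IFS=$old_ifs
  [ "$n" -ge 1 ] && [ "$n" -le 3 ]
}

valid_hierarchy fence,reset && echo "fence,reset is acceptable"
```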
For more information, see “CXFS Kernel Membership, Quorum, and Tiebreaker” in Appendix B, and “Define a Node with the GUI” in Chapter 10.
In prompting mode, press the Enter key to use default information. (The Enter key is not shown in the examples.) For general information, see “Define a Node with the GUI” in Chapter 10. Following is a summary of the prompts.
cmgr> define node logical_hostname
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ? hostname
Is this a FailSafe node <true|false> ? true|false
Is this a CXFS node <true|false> ? true
Operating System <IRIX|Linux32|Linux64|AIX|HPUX|Solaris|MacOSX|Windows> ? OS_type
Node Function <server_admin|client_admin|client_only> ? node_function
Node ID ?[optional] node_ID
Partition ID ?[optional] (0) partition_ID
Do you wish to define failure hierarchy[y/n]: y|n
Reset type <powerCycle|reset|nmi> ? (powerCycle)
Do you wish to define system controller info[y/n]: y|n
Sysctrl Type <msc|mmsc|l2|l1>? (msc) model (based on node hardware)
Sysctrl Password[optional] ? ( ) password
Sysctrl Status <enabled|disabled> ? enabled|disabled
Sysctrl Owner ? node_sending_reset_command
Sysctrl Device ? port
Sysctrl Owner Type <tty> ? (tty) tty_device
Number of Network Interfaces ? (1) number
NIC 1 - IP Address ? IP_address_or_hostname (if DNS)
For example, in normal mode:
# /usr/cluster/bin/cmgr
Welcome to SGI Cluster Manager Command-Line Interface

cmgr> define node foo
Enter commands, you may enter "done" or "cancel" at any time to exit
? set is_failsafe to false
? set is_cxfs to true
? set operating_system to irix
? set node_function to server_admin
? set hierarchy to fencereset,reset
? add nic 111.11.11.111
Enter network interface commands, when finished enter "done" or "cancel"
NIC 1 - set heartbeat to true
NIC 1 - set ctrl_msgs to true
NIC 1 - set priority to 1
NIC 1 - done
? done
For example, in prompting mode:
# /usr/cluster/bin/cmgr -p
Welcome to SGI Cluster Manager Command-Line Interface

cmgr> define node foo
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ?
Is this a FailSafe node <true|false> ? false
Is this a CXFS node <true|false> ? true
Operating System <IRIX|Linux32|Linux64|AIX|HPUX|Solaris|MacOSX|Windows> ? irix
Node Function <server_admin|client_admin|client_only> server_admin
Node ID[optional]?
Partition ID ? [optional] (0)
Do you wish to define failure hierarchy[y|n]:y
Hierarchy option 0 <System|FenceReset|Fence|Reset|Shutdown>[optional] ? fencereset
Hierarchy option 1 <System|FenceReset|Fence|Reset|Shutdown>[optional] ? reset
Hierarchy option 2 <System|FenceReset|Fence|Reset|Shutdown>[optional] ?
Reset type <powerCycle|reset|nmi> ? (powerCycle)
Do you wish to define system controller info[y/n]:n
Number of Network Interfaces ? (1)
NIC 1 - IP Address ? 111.11.11.111
NIC 1 - Heartbeat HB (use network for heartbeats) <true|false> ? true
NIC 1 - (use network for control messages) <true|false> ? true
NIC 1 - Priority <1,2,...> 1
Following is an example of defining a Solaris node in prompting mode (because it is a Solaris node, no default ID is provided, and you are not asked to specify the node function because it must be client_only).
cmgr> define node solaris1
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ?
Is this a FailSafe node <true|false> ? false
Is this a CXFS node <true|false> ? true
Operating System <IRIX|Linux32|Linux64|AIX|HPUX|Solaris|MacOSX|Windows> ? solaris
Node ID ? 7
Do you wish to define failure hierarchy[y/n]:y
Hierarchy option 0 <System|FenceReset|Fence|Reset|Shutdown>[optional] ? fence
Hierarchy option 1 <System|FenceReset|Fence|Reset|Shutdown>[optional] ?
Number of Network Interfaces ? (1)
NIC 1 - IP Address ? 163.154.18.172
To modify an existing node, use the following commands:
modify node logical_hostname
    set hostname to hostname
    set partition_id to partitionID
    set reset_type to powerCycle|reset|nmi
    set sysctrl_type to msc|mmsc|l2|l1 (based on node hardware)
    set sysctrl_password to password
    set sysctrl_status to enabled|disabled
    set sysctrl_owner to node_sending_reset_command
    set sysctrl_device to port
    set sysctrl_owner_type to tty_device
    set is_failsafe to true|false
    set is_cxfs to true|false
    set weight to 0|1
    add nic IP_address_or_hostname (if DNS)
        set heartbeat to true|false
        set ctrl_msgs to true|false
        set priority to integer
    remove nic IP_address_or_hostname (if DNS)
    set hierarchy to [system][fence][reset][fencereset][shutdown]
Note: You cannot modify the operating_system setting for a node; trying to do so will cause an error. If you have mistakenly specified the incorrect operating system, you must delete the node and define it again. You also cannot modify the node function. To change the node function, you must delete the node and redefine it (and reinstall software products, as needed); the node function for a Solaris or Windows node is always client_only.
The commands are the same as those used to define a node. You can change any of the information you specified when defining a node except the node ID. For details about the commands, see “Define a Node with cmgr”.
Caution: Do not change the node ID number after the node has been defined.
The following shows an example of partitioning an Origin 3000 system:
# cmgr
Welcome to SGI Cluster Manager Command-Line Interface

cmgr> modify node n_preston
Enter commands, when finished enter either "done" or "cancel"

n_preston ? set partition_id to 1
n_preston ? done
Successfully modified node n_preston
To perform this function with prompting, enter the following:
# cmgr -p
Welcome to SGI Cluster Manager Command-Line Interface

cmgr> modify node n_preston
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ? (preston.engr.sgi.com)
Is this a FailSafe node <true|false> ? (true)
Is this a CXFS node <true|false> ? (true)
Node ID[optional] ? (606)
Partition ID[optional] ? (0) 1
Do you wish to modify failure hierarchy[y/n]:n
Reset type <powerCycle|reset|nmi> ? (powerCycle)
Do you wish to modify system controller info[y/n]:n
Number of Network Interfaces? (1)
NIC 1 - IP Address ? (preston)
NIC 1 - Heartbeat HB (use network for heartbeats) ? (true)
NIC 1 - (use network for control messages) ? (true)
NIC 1 - Priority <1,2,...> ? (1)
Successfully modified node n_preston

cmgr> show node n_preston
Logical Machine Name: n_preston
Hostname: preston.engr.sgi.com
Operating System: IRIX
Node Is FailSafe: true
Node Is CXFS: true
Node Function: client_admin
Nodeid: 606
Partition id: 1
Reset type: powerCycle
ControlNet Ipaddr: preston
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 1
To unset the partition ID, use a value of 0 or none.
The following shows an example of changing the failure hierarchy for the node perceval from the system defaults to fencereset,reset,shutdown and back to the system defaults.
cmgr> modify node perceval
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ? (perceval.engr.sgi.com)
Is this a FailSafe node <true|false> ? (false)
Is this a CXFS node <true|false> ? (true)
Node ID[optional] ? (803)
Partition ID[optional] ? (0)
Do you wish to modify failure hierarchy[y/n]:y
Hierarchy option 0 <System|FenceReset|Fence|Reset|Shutdown>[optional] ? fencereset
Hierarchy option 1 <System|FenceReset|Fence|Reset|Shutdown>[optional] ? reset
Hierarchy option 2 <System|FenceReset|Fence|Reset|Shutdown>[optional] ? shutdown
Reset type <powerCycle|reset|nmi> ? (powerCycle)
Do you wish to modify system controller info[y/n]:n
Number of Network Interfaces ? (1)
NIC 1 - IP Address ? (163.154.18.173)
Successfully modified node perceval

cmgr> show node perceval
Logical Machine Name: perceval
Hostname: perceval.engr.sgi.com
Operating System: IRIX
Node Is FailSafe: false
Node Is CXFS: true
Node Function: client_admin
Nodeid: 803
Node Failure Hierarchy is: FenceReset Reset Shutdown
Reset type: powerCycle
ControlNet Ipaddr: 163.154.18.173
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 1
To return to system defaults:
cmgr> modify node perceval
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ? (perceval.engr.sgi.com)
Is this a FailSafe node <true|false> ? (false)
Is this a CXFS node <true|false> ? (true)
Node ID[optional] ? (803)
Partition ID[optional] ? (0)
Do you wish to modify failure hierarchy[y/n]:y
Hierarchy option 0 <System|FenceReset|Fence|Reset|Shutdown>[optional] ? (FenceReset) system
Reset type <powerCycle|reset|nmi> ? (powerCycle)
Do you wish to modify system controller info[y/n]:n
Number of Network Interfaces ? (1)
NIC 1 - IP Address ? (163.154.18.173)
NIC 1 - Heartbeat HB (use network for heartbeats) <true|false> ? (true)
NIC 1 - (use network for control messages) <true|false> ? (true)
NIC 1 - Priority <1,2,...> ? (1)

cmgr> show node perceval
Logical Machine Name: perceval
Hostname: perceval.engr.sgi.com
Operating System: IRIX
Node Is FailSafe: false
Node Is CXFS: true
Node Function: client_admin
Nodeid: 803
Reset type: powerCycle
ControlNet Ipaddr: 163.154.18.173
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 1
Note: When the system defaults are in place for failure hierarchy, no status is displayed with the show command.
When CXFS is running, you can reset a node with a system controller by using the following command:
admin reset node hostname
This command uses the CXFS daemons to reset the specified node.
Even when the CXFS daemons are not running, you can reset a node with a system controller by using the standalone option of the admin reset command:
admin reset standalone node hostname
If you have defined the node but have not defined system controller information for it, you could use the following commands to connect to the system controller or reset the node:
admin ping dev_name tty of dev_type tty with sysctrl_type msc|mmsc|l2|l1

admin reset dev_name tty of dev_type tty with sysctrl_type msc|mmsc|l2|l1
For more information about the command elements, see “Define a Node with cmgr”.
These commands do not go through the crsd daemon.
When CXFS is running, you can perform a powercycle on a node with the following command:
admin powerCycle node nodename
This command uses the CXFS daemons to shut off power to the node and then restart it.
You can perform a powercycle on a node in a cluster even when the CXFS daemons are not running by using the standalone option:
admin powerCycle standalone node nodename

The above command does not go through the crsd daemon.
If the node has not been defined in the cluster database, you can use the following command line:
admin powerCycle dev_name nodename of dev_type tty with sysctrl_type msc|mmsc|l2|l1
When CXFS daemons are running, you can perform a nonmaskable interrupt (NMI) on a node with the following command:
admin nmi node nodename
This command uses the CXFS daemons to perform an NMI on the specified node.
You can perform an NMI on a node in a cluster even when the CXFS daemons are not running by using the standalone option:
admin nmi standalone node nodename
This command does not go through the CXFS daemons.
If the node has not been defined in the cluster database, you can use the following command line:
admin nmi dev_name nodename of dev_type tty with sysctrl_type msc|mmsc|l2|l1

To convert an existing CXFS node so that it also applies to FailSafe, use the modify command to change the is_failsafe setting.

Note: You cannot turn off FailSafe or CXFS for a node if the respective high availability (HA) or CXFS services are active. You must first stop the services for the node.
For example, in normal mode:
cmgr> modify node cxfs6
Enter commands, when finished enter either "done" or "cancel"

cxfs6 ? set is_FailSafe to true
cxfs6 ? done
Successfully modified node cxfs6
For example, in prompting mode:
cmgr> modify node cxfs6
Enter commands, you may enter "done" or "cancel" at any time to exit
Hostname[optional] ? (cxfs6.americas.sgi.com)
Is this a FailSafe node <true|false> ? (false) true
Is this a CXFS node <true|false> ? (true)
Node ID[optional] ? (13203)
Partition ID[optional] ? (0)
Do you wish to modify failure hierarchy[y/n]:n
Reset type <powerCycle|reset|nmi> ? (powerCycle)
Do you wish to modify system controller info[y/n]:n
Number of Network Interfaces ? (1)
NIC 1 - IP Address ? (163.154.18.172)
NIC 1 - Heartbeat HB (use network for heartbeats) <true|false> ? (true)
NIC 1 - (use network for control messages) <true|false> ? (true)
NIC 1 - Priority <1,2,...> ? (1)
Successfully modified node cxfs6
To delete a node, use the following command:
delete node hostname
You can delete a node only if the node is not currently part of a cluster. If a cluster currently contains the node, you must first modify that cluster to remove the node from it.
For example, suppose you had a cluster named cxfs6-8 with the following configuration:
cmgr> show cluster cxfs6-8
Cluster Name: cxfs6-8
Cluster Is FailSafe: true
Cluster Is CXFS: true
Cluster ID: 20
Cluster HA mode: normal
Cluster CX mode: normal

Cluster cxfs6-8 has following 3 machine(s)
        cxfs6
        cxfs7
        cxfs8
To delete node cxfs8, you would do the following in prompting mode (assuming that CXFS services have been stopped on the node):
cmgr> modify cluster cxfs6-8 Enter commands, when finished enter either "done" or "cancel" Is this a FailSafe cluster <true|false> ? (false) Is this a CXFS cluster <true|false> ? (true) Cluster Notify Cmd [optional] ? Cluster Notify Address [optional] ? Cluster CXFS mode <normal|experimental>[optional] ? (normal) Cluster ID ? (20) Current nodes in cluster cxfs6-8: Node - 1: cxfs6 Node - 2: cxfs7 Node - 3: cxfs8 Add nodes to or remove nodes/networks from cluster cxfs6-8 Enter "done" when completed or "cancel" to abort cxfs6-8 ? remove node cxfs8 cxfs6-8 ? done Successfully modified cluster cxfs6-8 cmgr> show cluster cxfs6-8 Cluster Name: cxfs6-8 Cluster Is FailSafe: false Cluster Is CXFS: true Cluster ID: 20 Cluster CX mode: normal Cluster cxfs6-8 has following 2 machine(s) cxfs6 cxfs7 |
Note: The networks feature is deferred. |
To delete cxfs8 from the pool, enter the following:
cmgr> delete node cxfs8 Deleted machine (cxfs8). |
After you have defined a node, you can display the node's parameters with the following command:
show node hostname |
For example:
cmgr> show node cxfs6 Logical Machine Name: cxfs6 Hostname: cxfs6.americas.sgi.com Operating System: IRIX Node Is FailSafe: false Node Is CXFS: true Node Function: server_admin Nodeid: 13203 Reset type: powerCycle ControlNet Ipaddr: 163.154.18.172 ControlNet HB: true ControlNet Control: true ControlNet Priority: 1 |
You can see a list of all of the nodes that have been defined with the following command:
show nodes in pool |
For example:
cmgr> show nodes in pool 3 Machine(s) defined cxfs8 cxfs6 cxfs7 |
You can see a list of all of the nodes that have been defined for a specified cluster with the following command:
show nodes [in cluster clustername] |
For example:
cmgr> show nodes in cluster cxfs6-8 Cluster cxfs6-8 has following 3 machine(s) cxfs6 cxfs7 cxfs8 |
If you have specified a default cluster, you do not need to specify a cluster when you use this command. For example:
cmgr> set cluster cxfs6-8 cmgr> show nodes Cluster cxfs6-8 has following 3 machine(s) cxfs6 cxfs7 cxfs8 |
You can use cmgr to test the network connectivity in a cluster. This test checks whether the specified nodes can communicate with each other through each configured interface in the nodes. This test will not run if CXFS is running. This test requires that the /etc/.rhosts file be configured properly; see “IRIX Modifications Required for CXFS Connectivity Diagnostics” in Chapter 6 and “Linux 64-bit Modifications Required for CXFS Connectivity Diagnostics” in Chapter 7.
Use the following command to test the network connectivity for the nodes in a cluster:
test connectivity in cluster clustername [on node nodename1 node nodename2 ...] |
For example:
cmgr> test connectivity in cluster cxfs6-8 on node cxfs7 Status: Testing connectivity... Status: Checking that the control IP_addresses are on the same networks Status: Pinging address cxfs7 interface ef0 from node cxfs7 [cxfs7] Notice: overall exit status:success, tests failed:0, total tests executed:1 |
This test yields an error message when it encounters its first error, indicating the node that did not respond. If you receive an error message after executing this test, verify that the network interface has been configured up, using the ifconfig command. For example (line breaks added here for readability):
# /usr/etc/ifconfig ef0 ef0: flags=405c43 <UP,BROADCAST,RUNNING,FILTMULTI,MULTICAST,CKSUM,DRVRLOCK,IPALIAS> inet 128.162.89.39 netmask 0xffff0000 broadcast 128.162.255.255 |
The UP in the first line of output indicates that the interface is configured up.
If the network interface is configured up, verify that the network cables are connected properly and run the test again.
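The UP check can also be scripted. The following sketch (a hypothetical helper, not a CXFS tool; it assumes IRIX-style ifconfig output like the ef0 example above) parses the flags list between the angle brackets:

```python
# Sketch: check whether ifconfig output reports the UP flag.
# The parsing assumes IRIX-style output, as in the ef0 example above;
# this is illustrative only, not part of CXFS or cmgr.
def interface_is_up(ifconfig_output: str) -> bool:
    """Return True if the flags list in ifconfig output contains UP."""
    start = ifconfig_output.find("<")
    end = ifconfig_output.find(">", start)
    if start == -1 or end == -1:
        return False
    flags = ifconfig_output[start + 1:end].split(",")
    return "UP" in flags

sample = ("ef0: flags=405c43 <UP,BROADCAST,RUNNING,FILTMULTI,MULTICAST,"
          "CKSUM,DRVRLOCK,IPALIAS> inet 128.162.89.39 netmask 0xffff0000")
print(interface_is_up(sample))  # True
```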
This section tells you how to define, modify, delete, and display a cluster using cmgr. It also tells you how to start and stop CXFS services.
When you define a cluster with cmgr, you define a cluster and add nodes to the cluster with the same command. For general information, see “Define a Cluster with the GUI” in Chapter 10.
Use the following commands to define a cluster:
define cluster clustername set is_failsafe to true|false set is_cxfs to true|false set clusterid to clusterID set notify_cmd to notify_command set notify_addr to email_address set ha_mode to normal|experimental set cx_mode to normal|experimental add node node1name add node node2name |
Usage notes:
cluster is the logical name of the cluster. Logical names cannot begin with an underscore (_) or include any whitespace characters, and can be at most 255 characters. Clusters that share a network must have unique names.
If you are running just CXFS, set is_cxfs to true and is_failsafe to false. If you are running a coexecution cluster, set both values to true.
clusterid is a unique number within your network in the range 1 through 128. The cluster ID is used by the operating system kernel to make sure that it does not accept cluster information from any other cluster that may be on the network. The kernel does not use the database for communication, so it requires the cluster ID in order to verify cluster communications. This information in the kernel cannot be changed after it has been initialized; therefore, you must not change a cluster ID after the cluster has been defined. Clusters that share a network must have unique IDs.
notify_cmd is the command to be run whenever the status changes for a node or cluster.
notify_addr is the address to be notified of cluster and node status changes. To specify multiple addresses, separate them with commas. CXFS will send e-mail to the addresses whenever the status changes for a node or cluster. If you do not specify an address, notification will not be sent. If you specify a notification address, you must also specify the e-mail program (such as /usr/sbin/Mail) as the notify_cmd.
The set ha_mode and set cx_mode commands should usually be set to normal. The set cx_mode command applies only to CXFS, and the set ha_mode command applies only to IRIS FailSafe.
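The logical-name rules stated above (no leading underscore, no whitespace, at most 255 characters) also apply to node and filesystem names later in this chapter. A minimal sketch of the check, as a hypothetical helper rather than any cmgr interface:

```python
# Sketch: validate a cmgr logical name per the rules in the usage notes:
# nonempty, at most 255 characters, no leading underscore, no whitespace.
# Hypothetical helper for illustration only.
def valid_logical_name(name: str) -> bool:
    return (0 < len(name) <= 255
            and not name.startswith("_")
            and not any(c.isspace() for c in name))

print(valid_logical_name("cxfs6-8"))    # True
print(valid_logical_name("_hidden"))    # False
print(valid_logical_name("has space"))  # False
```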
The following shows the commands with prompting:
cmgr> define cluster clustername Enter commands, you may enter "done" or "cancel" at any time to exit Is this a FailSafe cluster <true|false> ? true|false Is this a CXFS cluster <true|false> ? true|false Cluster Notify Cmd [optional] ? Cluster Notify Address [optional] ? Cluster CXFS mode <normal|experimental>[optional] use_default_of_normal Cluster ID ? cluster_ID No nodes in cluster clustername No networks in cluster clustername Add nodes to or remove nodes/networks from cluster clustername Enter "done" when completed or "cancel" to abort clustername ? add node node1name clustername ? add node node2name ... clustername ? done Successfully defined cluster clustername Added node <node1name> to cluster <clustername> Added node <node2name> to cluster <clustername> ... |
Note: The networks feature is deferred. |
You should set the cluster to the default normal mode. Setting the mode to experimental turns off heartbeating in the CXFS kernel membership code so that you can debug the cluster without causing node failures. For example, this can be useful if you just want to disconnect the network for a short time (provided that there is no other cluster networking activity, which will also detect a failure even if there is no heartbeating). However, you should never use experimental mode on a production cluster and should only use it if directed to by SGI customer support. SGI does not support the use of experimental by customers.
For example:
cmgr> define cluster cxfs6-8 Enter commands, you may enter "done" or "cancel" at any time to exit Is this a FailSafe cluster <true|false> ? false Is this a CXFS cluster <true|false> ? true Cluster Notify Cmd [optional] ? Cluster Notify Address [optional] ? Cluster CXFS mode <normal|experimental>[optional] Cluster ID ? 20 No nodes in cluster cxfs6-8 No networks in cluster cxfs6-8 Add nodes to or remove nodes/networks from cluster cxfs6-8 Enter "done" when completed or "cancel" to abort cxfs6-8 ? add node cxfs6 cxfs6-8 ? add node cxfs7 cxfs6-8 ? add node cxfs8 cxfs6-8 ? done Successfully defined cluster cxfs6-8 Added node <cxfs6> to cluster <cxfs6-8> Added node <cxfs7> to cluster <cxfs6-8> Added node <cxfs8> to cluster <cxfs6-8> |
To do this without prompting, enter the following:
cmgr> define cluster cxfs6-8 Enter commands, you may enter "done" or "cancel" at any time to exit cluster cxfs6-8? set is_cxfs to true cluster cxfs6-8? set clusterid to 20 cluster cxfs6-8? add node cxfs6 cluster cxfs6-8? add node cxfs7 cluster cxfs6-8? add node cxfs8 cluster cxfs6-8? done Successfully defined cluster cxfs6-8 |
modify cluster clustername set is_failsafe to true set is_cxfs to true set clusterid to clusterID set notify_cmd to command set notify_addr to email_address set ha_mode to normal|experimental set cx_mode to normal|experimental add node node1name add node node2name remove node node1name remove node node2name |
These commands are the same as the define cluster commands. For more information, see “Define a Cluster with cmgr”, and “Define a Cluster with the GUI” in Chapter 10.
Note: If you want to rename a cluster, you must delete it and then define a new cluster. If you have started CXFS services on the node, you must either reboot it or reuse the cluster ID number when renaming the cluster.
However, be aware that if you already have CXFS filesystems defined and then rename the cluster, CXFS will not be able to mount the filesystems. For more information, see “Cannot Mount Filesystems” in Chapter 18. |
To convert a cluster, use the following commands:
modify cluster clustername set is_failsafe to true|false set is_cxfs to true|false set clusterid to clusterID |
cluster is the logical name of the cluster. Logical names cannot begin with an underscore (_) or include any whitespace characters, and can be at most 255 characters.
If you are running just CXFS, set is_cxfs to true and is_failsafe to false. If you are running a coexecution cluster, set both values to true.
clusterid is a unique number within your network in the range 1 through 128. The cluster ID is used by the operating system kernel to make sure that it does not accept cluster information from any other cluster that may be on the network. The kernel does not use the database for communication, so it requires the cluster ID in order to verify cluster communications. This information in the kernel cannot be changed after it has been initialized; therefore, you must not change a cluster ID after the cluster has been defined.
For example, to convert CXFS cluster cxfs6-8 so that it also applies to FailSafe, enter the following:
cmgr> modify cluster cxfs6-8 Enter commands, when finished enter either "done" or "cancel" cxfs6-8 ? set is_failsafe to true |
The cluster must support all of the functionalities (FailSafe and/or CXFS) that are turned on for its nodes; that is, if your cluster is of type CXFS, then you cannot modify a node that is part of the cluster so that it is of type FailSafe or of type CXFS and FailSafe. However, the nodes do not have to support all the functionalities of the cluster; that is, you can have a node of type CXFS in a cluster of type CXFS and FailSafe.
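The compatibility rule above amounts to a subset check: every functionality enabled for a node must also be enabled for its cluster, but not the reverse. The following sketch models this (a hypothetical illustration of the rule, not a cmgr interface):

```python
# Sketch of the compatibility rule: a node's enabled functionalities
# (CXFS and/or FailSafe) must be a subset of its cluster's functionalities.
# Hypothetical model for illustration only.
def node_fits_cluster(node_types: set, cluster_types: set) -> bool:
    return node_types <= cluster_types

# A CXFS-only node fits a CXFS-and-FailSafe cluster...
print(node_fits_cluster({"CXFS"}, {"CXFS", "FailSafe"}))  # True
# ...but a FailSafe node does not fit a CXFS-only cluster.
print(node_fits_cluster({"FailSafe"}, {"CXFS"}))          # False
```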
To delete a cluster, use the following command:
delete cluster clustername |
However, you cannot delete a cluster that contains nodes; you must first stop CXFS services on the nodes and then redefine the cluster so that it no longer contains the nodes.
For example, in normal mode:
cmgr> modify cluster cxfs6-8 Enter commands, when finished enter either "done" or "cancel" cxfs6-8 ? remove node cxfs6 cxfs6-8 ? remove node cxfs7 cxfs6-8 ? remove node cxfs8 cxfs6-8 ? done Successfully modified cluster cxfs6-8 cmgr> delete cluster cxfs6-8 cmgr> show clusters cmgr> |
For example, in prompting mode:
cmgr> modify cluster cxfs6-8 Enter commands, you may enter "done" or "cancel" at any time to exit Cluster Notify Cmd [optional] ? Cluster Notify Address [optional] ? Cluster mode <normal|experimental>[optional] ? (normal) Cluster ID ? (55) Current nodes in cluster cxfs6-8: Node - 1: cxfs6 Node - 2: cxfs7 Node - 3: cxfs8 Add nodes to or remove nodes from cluster cxfs6-8 Enter "done" when completed or "cancel" to abort cxfs6-8 ? remove node cxfs6 cxfs6-8 ? remove node cxfs7 cxfs6-8 ? remove node cxfs8 cxfs6-8 ? done Successfully modified cluster cxfs6-8 cmgr> delete cluster cxfs6-8 cmgr> show clusters cmgr> |
To display the clusters and their contents, use the following commands:
show clusters show cluster clustername |
For example:
cmgr> show cluster cxfs6-8 Cluster Name: cxfs6-8 Cluster Is FailSafe: false Cluster Is CXFS: true Cluster ID: 22 Cluster CX mode: normal Cluster cxfs6-8 has following 3 machine(s) cxfs6 cxfs7 cxfs8 |
The following tasks tell you how to start and stop CXFS services and set log levels.
To start CXFS services and set the configuration to restart them automatically whenever the system is rebooted, use the following command:
start cx_services [on node hostname ] for cluster clustername |
For example, to start CXFS services on all nodes in the cluster:
cmgr> start cx_services for cluster cxfs6-8 |
When CXFS services are stopped on a node, filesystems are automatically unmounted from that node.
To stop CXFS services temporarily (that is, allowing them to restart with a reboot), use the following command line in a shell window outside of cmgr:
# /etc/init.d/CXFS stop |
To stop CXFS services on a specified node or cluster, and prevent CXFS services from being restarted by a reboot, use the following command:
stop cx_services [on node hostname] for cluster clustername [force] |
For example:
cmgr> stop cx_services on node cxfs6 for cluster cxfs6-8 CXFS services have been deactivated on node cxfs6 (cluster cxfs6-8) cmgr> stop cx_services for cluster cxfs6-8 |
After you have stopped CXFS services in a node, the node is no longer an active member of the cluster.
Caution: If you stop CXFS services, the node will be marked as INACTIVE and it will therefore not rejoin the cluster after a reboot. To allow a node to rejoin the cluster, you must restart CXFS services using cmgr or the GUI. |
A CXFS tiebreaker node determines whether a CXFS kernel membership quorum is maintained when exactly half of the server-capable nodes can communicate with each other. There is no default CXFS tiebreaker.
Caution: If the CXFS tiebreaker node in a cluster with two server-capable
nodes fails or if the administrator stops CXFS services, the other node
will do a forced shutdown, which unmounts all CXFS filesystems.
The reset capability or I/O fencing with switches is mandatory to ensure data integrity for all nodes. Clusters should have an odd number of server-capable nodes. (See “CXFS Recovery Issues in a Cluster with Only Two Server-Capable Nodes ” in Appendix B.) |
To set the CXFS tiebreaker node, use the modify command as follows:
modify cx_parameters [on node nodename] in cluster clustername set tie_breaker to hostname |
To unset the CXFS tiebreaker node, use the following command:
set tie_breaker to none |
For example, in normal mode:
cmgr> modify cx_parameters in cluster cxfs6-8 Enter commands, when finished enter either "done" or "cancel" cxfs6-8 ? set tie_breaker to cxfs8 cxfs6-8 ? done Successfully modified cx_parameters |
For example, in prompting mode:
cmgr> modify cx_parameters in cluster cxfs6-8 (Enter "cancel" at any time to abort) Tie Breaker Node ? (cxfs7) cxfs8 Successfully modified cx_parameters cmgr> show cx_parameters in cluster cxfs6-8 _CX_TIE_BREAKER=cxfs8 |
For general information about CXFS logs, see “Set Log Configuration with the GUI” in Chapter 10.
Use the following command to view the log group definitions:
show log_groups |
This command shows all of the log groups currently defined, with the log group name, the logging levels, and the log files.
Use the following command to see messages logged by a specific daemon on a specific node:
show log_group LogGroupName [on node Nodename] |
To exit from the message display, enter Ctrl-C.
You can configure a log group with the following command:
define log_group log_group on node adminhostname [in cluster clustername] set log_level to log_level add log_file log_file remove log_file log_file |
Usage notes:
log_group can be one of the following:
clconfd cli crsd diags |
log_level can have one of the following values:
0 gives no logging
1 logs notifications of critical errors and normal operation (these messages are also logged to the SYSLOG file)
2 logs minimal notifications plus warnings
5 through 7 log increasingly more detailed notifications
10 through 19 log increasingly more debug information, including data structures
log_file is the name of the log file.
Caution: Do not change the names of the log files. If you change the names, errors can occur. |
For example, to define log group cli on node cxfs6 with a log level of 5:
cmgr> define log_group cli on node cxfs6 in cluster cxfs6-8 (Enter "cancel" at any time to abort) Log Level ? (11) 5 CREATE LOG FILE OPTIONS 1) Add Log File. 2) Remove Log File. 3) Show Current Log Files. 4) Cancel. (Aborts command) 5) Done. (Exits and runs command) Enter option:5 Successfully defined log group cli |
Use the following command to modify a log group:
modify log_group log_group_name on node hostname [in cluster clustername] |
You modify a log group using the same commands you use to define a log group.
For example, to change the log level of cli to be 10, enter the following:
cmgr> modify log_group cli on node cxfs6 in cluster cxfs6-8 (Enter "cancel" at any time to abort) Log Level ? (2) 10 MODIFY LOG FILE OPTIONS 1) Add Log File. 2) Remove Log File. 3) Show Current Log Files. 4) Cancel. (Aborts command) 5) Done. (Exits and runs command) Enter option:5 Successfully modified log group cli |
To revoke CXFS kernel membership for the local node, such as before a forced CXFS shutdown, enter the following on the local node:
admin cxfs_stop |
This command is treated as a node failure by the rest of the cluster. The rest of the cluster may then fail due to a loss of CXFS kernel membership quorum, or it may decide to reset the failed node. To avoid the reset, you can modify the node definition to disable the system controller status.
Allowing CXFS kernel membership for the local node permits the node to reapply for CXFS kernel membership. You must actively allow CXFS kernel membership for the local node in the following situations:
After a manual revocation as in “Revoke Membership of the Local Node with cmgr”.
When instructed to by an error message on the console or in /var/adm/SYSLOG.
After a kernel-triggered revocation. This situation is indicated by the following message in /var/adm/SYSLOG:
Membership lost - withdrawing from cluster |
To allow CXFS kernel membership for the local node, use the following command:
cmgr> admin cxfs_start |
This section tells you how to define a filesystem, specify the nodes on which it may or may not be mounted (the enabled or disabled nodes), and perform mounts and unmounts.
A given filesystem can be mounted on a given node when the following things are true:
One of the following is true for the node:
The default local status is enabled and the node is not in the filesystem's list of explicitly disabled nodes
The default local status is disabled and the node is in the filesystem's list of explicitly enabled nodes
The global status of the filesystem is enabled. See “Mount a CXFS Filesystem with cmgr”.
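The two conditions above can be modeled as follows. This is a hypothetical sketch of the eligibility logic, not a CXFS interface: the global status must be enabled, and the node's effective local status (the default overridden by the explicit enabled/disabled lists) must also be enabled.

```python
# Sketch of filesystem mount eligibility on a node, per the rules above:
# - the filesystem's global status must be enabled, AND
# - either dflt_local_status is enabled and the node is not explicitly
#   disabled, or dflt_local_status is disabled and the node is explicitly
#   enabled.
# Hypothetical model for illustration only.
def can_mount(node, global_enabled, dflt_local_enabled,
              enabled_nodes=(), disabled_nodes=()):
    if not global_enabled:
        return False
    if dflt_local_enabled:
        return node not in disabled_nodes
    return node in enabled_nodes

print(can_mount("cxfs7", True, True, disabled_nodes={"cxfs8"}))  # True
print(can_mount("cxfs8", True, True, disabled_nodes={"cxfs8"}))  # False
```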
Use the following commands to define a filesystem and the nodes on which it may be mounted:
define cxfs_filesystem logical_filesystem_name [in cluster clustername] set device_name to devicename set mount_point to mountpoint set mount_options to mount_options set force to true|false set dflt_local_status to enabled|disabled add cxfs_server admin_nodename set rank to 0|1|2|... add enabled_node nodename add disabled_node nodename remove cxfs_server admin_nodename remove enabled_node nodename remove disabled_node nodename |
Usage notes:
Relocation is disabled by default. Recovery and relocation are supported only when using standby nodes. Therefore, you should only define multiple metadata servers for a given filesystem if you are using the standby node model. See “Relocation” in Chapter 1.
The list of potential metadata servers for any given filesystem must all run the same operating system type.
cxfs_filesystem can be any logical name. Logical names cannot begin with an underscore (_) or include any whitespace characters, and can be at most 255 characters.
Note: Within the GUI, the default is to use the last portion of the device name; for example, for a device name of /dev/cxvm/d76lun0s0 , the GUI will automatically supply a logical filesystem name of d76lun0s0. The GUI will accept other logical names defined with cmgr but the GUI will not allow you to modify a logical name; you must use cmgr to modify the logical name. |
device_name is the device name of an XVM volume that will be shared among all nodes in the CXFS cluster. The name must begin with /dev/cxvm/.
mount_point is a directory to which the specified XVM volume will be attached. This directory name must begin with a slash (/). For more information, see the mount man page.
mount_options are options that are passed to the mount command and are used to control access to the specified XVM volume. For a list of the available options, see the fstab man page.
force controls what action CXFS takes if there are processes that have open files or current directories in the filesystem(s) that are to be unmounted. If set to true, the processes will be killed and the unmount will occur. If set to false, the processes will not be killed and the filesystem will not be unmounted. The force option is on (set to true) by default.
dflt_local_status defines whether the filesystem can be mounted on all unspecified nodes or cannot be mounted on any unspecified nodes. You can then use the add enabled_node or add disabled_node commands as necessary to explicitly specify the nodes that differ from the default. There are multiple combinations that can have the same result.
For example, suppose you had a cluster with 10 nodes (cxfs1 through cxfs10). You could use the following methods:
If you want the filesystem to be mounted on all nodes, and want it to be mounted on any nodes that are later added to the cluster, you would specify:
set dflt_local_status to enabled |
If you want the filesystem to be mounted on all nodes except cxfs5, and want it to be mounted on any nodes that are later added to the cluster, you would specify:
set dflt_local_status to enabled add disabled_node cxfs5 |
If you want the filesystem to be mounted on all nodes except cxfs5, and you also do not want it to be mounted on any nodes that are later added to the cluster, you would specify:
set dflt_local_status to disabled add enabled_node cxfs1 add enabled_node cxfs2 add enabled_node cxfs3 add enabled_node cxfs4 add enabled_node cxfs6 add enabled_node cxfs7 add enabled_node cxfs8 add enabled_node cxfs9 add enabled_node cxfs10 |
If you want the filesystem to be mounted on cxfs5 through cxfs10 and on any future nodes, you could specify:
set dflt_local_status to enabled add disabled_node cxfs1 add disabled_node cxfs2 add disabled_node cxfs3 add disabled_node cxfs4 |
To actually mount the filesystem on the enabled nodes, see “Mount a CXFS Filesystem with cmgr”.
cxfs_server adds or removes the specified CXFS administration node name to the list of potential metadata servers.
Note: After a filesystem has been defined in CXFS, running mkfs on it will cause errors to appear in the system log file. To avoid these errors, run mkfs before defining the filesystem in CXFS, or delete the CXFS filesystem before running mkfs. See “Delete a CXFS Filesystem with cmgr”. |
The following example shows two potential metadata servers for the fs1 filesystem; if cxfs6 (the preferred server, with rank 0) is not up when the cluster starts or later fails or is removed from the cluster, then cxfs7 (rank 1) will be used. The filesystem is mounted on all nodes.
Note: Although the list of metadata servers for a given filesystem is ordered, it is impossible to predict which server will become the server during the boot-up cycle because of network latencies and other unpredictable delays. |
For example, in normal mode:
cmgr> define cxfs_filesystem fs1 in cluster cxfs6-8 cxfs_filesystem fs1 ? set device_name to /dev/cxvm/d76lun0s0 cxfs_filesystem fs1 ? set mount_point to /mnts/fs1 cxfs_filesystem fs1 ? set force to false cxfs_filesystem fs1 ? add cxfs_server cxfs6 Enter CXFS server parameters, when finished enter "done" or "cancel" CXFS server - cxfs6 ? set rank to 0 CXFS server - cxfs6 ? done cxfs_filesystem fs1 ? add cxfs_server cxfs7 Enter CXFS server parameters, when finished enter "done" or "cancel" CXFS server - cxfs7 ? set rank to 1 CXFS server - cxfs7 ? done cxfs_filesystem fs1 ? set dflt_local_status to enabled cxfs_filesystem fs1 ? done Successfully defined cxfs_filesystem fs1 cmgr> define cxfs_filesystem fs2 in cluster cxfs6-8 cxfs_filesystem fs2 ? set device_name to /dev/cxvm/d76lun0s1 cxfs_filesystem fs2 ? set mount_point to /mnts/fs2 cxfs_filesystem fs2 ? set force to false cxfs_filesystem fs2 ? add cxfs_server cxfs8 Enter CXFS server parameters, when finished enter "done" or "cancel" CXFS server - cxfs8 ? set rank to 0 CXFS server - cxfs8 ? done cxfs_filesystem fs2 ? set dflt_local_status to enabled cxfs_filesystem fs2 ? done Successfully defined cxfs_filesystem fs2 |
For example, in prompting mode:
cmgr> define cxfs_filesystem fs1 in cluster cxfs6-8 (Enter "cancel" at any time to abort) Device ? /dev/cxvm/d76lun0s0 Mount Point ? /mnts/fs1 Mount Options[optional] ? Use Forced Unmount ? <true|false> ? false Default Local Status <enabled|disabled> ? (enabled) DEFINE CXFS FILESYSTEM OPTIONS 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. (Exits and runs command) Enter option:1 No current servers Server Node ? cxfs6 Server Rank ? 0 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. (Exits and runs command) Enter option:1 Server Node ? cxfs7 Server Rank ? 1 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. (Exits and runs command) Enter option:9 Successfully defined cxfs_filesystem fs1 cmgr> define cxfs_filesystem fs2 in cluster cxfs6-8 (Enter "cancel" at any time to abort) Device ? /dev/cxvm/d77lun0s1 Mount Point ? /mnts/fs2 Mount Options[optional] ? Use Forced Unmount ? <true|false> ? false Default Local Status <enabled|disabled> ? (enabled) DEFINE CXFS FILESYSTEM OPTIONS 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. (Exits and runs command) Enter option:1 Server Node ? cxfs8 Server Rank ? 0 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. 
(Exits and runs command) Enter option:9 Successfully defined cxfs_filesystem fs2 |
To mount a filesystem on the enabled nodes, enter the following:
admin cxfs_mount cxfs_filesystem logical_filesystem_name [on node nodename] [in cluster clustername] |
This command enables the global status for a filesystem; if you specify the nodename, it enables the local status. (The global status is only affected if a node name is not specified.) For a filesystem to mount on a given node, both global and local status must be enabled; see “CXFS Filesystem Tasks with cmgr”.
Nodes must first be enabled by using the define cxfs_filesystem and modify cxfs_filesystem commands; see “Define a CXFS Filesystem with cmgr”, and “Modify a CXFS Filesystem with cmgr”.
For example, to activate the fs1 filesystem by setting the global status to enabled, enter the following:
cmgr> admin cxfs_mount cxfs_filesystem fs1 in cluster cxfs6-8 |
The filesystem will then be mounted on all the nodes that have a local status of enabled for this filesystem.
To change the local status to enabled, enter the following:
cmgr> admin cxfs_mount cxfs_filesystem fs1 on node cxfs7 in cluster cxfs6-8 |
If the filesystem's global status is disabled, nothing changes. If the filesystem's global status is enabled , the node will mount the filesystem as the result of the change of its local status.
Note: If CXFS services are not active, mounting a filesystem will not completely succeed. The filesystem will be marked as ready to be mounted but the filesystem will not actually be mounted until you have started CXFS services. For more information, see “Start CXFS Services with cmgr”. |
To unmount a filesystem, enter the following:
admin cxfs_unmount cxfs_filesystem filesystemname [on node nodename] [in cluster clustername] |
Unlike the modify cxfs_filesystem command, this command can be run on an active filesystem.
For example, to deactivate the fs1 filesystem by setting the global status to disabled, enter the following:
cmgr> admin cxfs_unmount cxfs_filesystem fs1 in cluster cxfs6-8 |
The filesystem will then be unmounted on all the nodes that have a local status of enabled for this filesystem.
To change the local status to disabled, enter the following:
cmgr> admin cxfs_unmount cxfs_filesystem fs1 on node cxfs7 in cluster cxfs6-8 |
If the filesystem's global status is disabled, nothing changes. If the filesystem's global status is enabled , the node will unmount the filesystem as the result of the change of its local status.
Note: You cannot modify a mounted filesystem. |
Use the following commands to modify a filesystem:
modify cxfs_filesystem logical_filesystem_name [in cluster clustername] set device_name to devicename set mount_point to mountpoint set mount_options to options set force to true|false set dflt_local_status to enabled|disabled add cxfs_server servername set rank to 0|1|2|... modify cxfs_server servername set rank to 0|1|2|... add enabled_node nodename add disabled_node nodename remove cxfs_server nodename remove enabled_node nodename remove disabled_node nodename |
These are the same commands used to define a filesystem; for more information, see “Define a CXFS Filesystem with cmgr”.
For example, in normal mode:
cmgr> show cxfs_filesystem fs1 in cluster cxfs6-8 Name: fs1 Device: /dev/cxvm/d76lun0s0 Mount Point: /mnts/fs1 Forced Unmount: false Global Status: disabled Default Local Status: enabled Server Name: cxfs6 Rank: 0 Server Name: cxfs7 Rank: 1 Disabled Client: cxfs8 cmgr> modify cxfs_filesystem fs1 in cluster cxfs6-8 Enter commands, when finished enter either "done" or "cancel" cxfs_filesystem fs1 ? modify cxfs_server cxfs6 Enter CXFS server parameters, when finished enter "done" or "cancel" Current CXFS server cxfs6 parameters: rank : 0 CXFS server - cxfs6 ? set rank to 2 CXFS server - cxfs6 ? done cxfs_filesystem fs1 ? done Successfully modified cxfs_filesystem fs1 cmgr> show cxfs_filesystem fs1 in cluster cxfs6-8 Name: fs1 Device: /dev/cxvm/d76lun0s0 Mount Point: /mnts/fs1 Forced Unmount: false Global Status: disabled Default Local Status: enabled Server Name: cxfs6 Rank: 2 Server Name: cxfs7 Rank: 1 Disabled Client: cxfs8 |
In prompting mode:
cmgr> show cxfs_filesystem fs1 in cluster cxfs6-8 Name: fs1 Device: /dev/cxvm/d76lun0s0 Mount Point: /mnts/fs1 Forced Unmount: false Global Status: disabled Default Local Status: enabled Server Name: cxfs6 Rank: 0 Server Name: cxfs7 Rank: 1 Disabled Client: cxfs8 cmgr> modify cxfs_filesystem fs1 in cluster cxfs6-8 (Enter "cancel" at any time to abort) Device ? (/dev/cxvm/d76lun0s0) Mount Point ? (/mnts/fs1) Mount Options[optional] ? Use Forced Unmount ? <true|false> ? (false) Default Local Status <enabled|disabled> ? (enabled) MODIFY CXFS FILESYSTEM OPTIONS 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. (Exits and runs command) Enter option:0 Current servers: CXFS Server 1 - Rank: 0 Node: cxfs6 CXFS Server 2 - Rank: 1 Node: cxfs7 Server Node ? cxfs6 Server Rank ? (0) 2 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. (Exits and runs command) Enter option:7 Current settings for filesystem (fs1) CXFS servers: Rank 2 Node cxfs6 Rank 1 Node cxfs7 Default local status: enabled No explicitly enabled clients Explicitly disabled clients: Disabled Node: cxfs8 0) Modify Server. 1) Add Server. 2) Remove Server. 3) Add Enabled Node. 4) Remove Enabled Node. 5) Add Disabled Node. 6) Remove Disabled Node. 7) Show Current Information. 8) Cancel. (Aborts command) 9) Done. (Exits and runs command) Enter option:9 Successfully modified cxfs_filesystem fs1 |
If relocation is explicitly enabled in the kernel with the cxfs_relocation_ok systune (see “Relocation” in Chapter 1), you can relocate a metadata server to another node using the following command. The filesystem must be mounted on the system that is running cmgr:
admin cxfs_relocate cxfs_filesystem filesystem_name to node nodename [in cluster clustername] |
Note: This function is only available on a live system. |
To relocate the metadata server from cxfs6 to cxfs7 for fs1 in cluster cxfs6-8, enter the following:
cmgr> admin cxfs_relocate cxfs_filesystem fs1 to node cxfs7 in cluster cxfs6-8 |
CXFS kernel membership is not affected by relocation. However, users may experience a degradation in filesystem performance while the metadata server is relocating.
For more details, see “Modify a CXFS Filesystem with cmgr”.
The following tasks let you configure switches and I/O fencing. For general information, see “I/O Fencing” in Chapter 1.
Note: Solaris and Windows nodes require I/O fencing to protect data integrity. A Brocade switch is mandatory to support I/O fencing; therefore, multiOS CXFS clusters require a Brocade switch. |
To define a new switch to support I/O fencing in a cluster, use the following command:
define switch switch_hostname username username password password [mask mask] |
Usage notes:
switch_hostname is the hostname of the Fibre Channel switch; this is used to determine the IP address of the switch.
username is the user name to use when sending a telnet message to the switch.
password is the password for the specified username.
mask is a hexadecimal string (representing a 64-bit port bitmap) that indicates the list of ports in the switch that will not be fenced. Ports are numbered from zero. If bit i is nonzero, then the port that corresponds to i will always be excluded from any fencing operations. For example, a mask of A4 indicates that the 3rd, 6th, and 8th ports (port numbers 2, 5, and 7) will not be affected by fencing.
CXFS administration nodes automatically discover the available HBAs and, when fencing is triggered, fence off all of the Fibre Channel HBAs when the Fence or FenceReset fail action is selected. However, masked HBAs will not be fenced. Masking allows you to prevent the fencing of devices that are attached to the SAN but are not shared with the cluster, to ensure that they remain available regardless of CXFS status. You would want to mask HBAs used for access to tape storage, or HBAs that are only ever used to access local (nonclustered) devices.
For example:
cmgr> define switch ptg-brocade username admin password password mask A4 |
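The bitmap arithmetic behind the mask argument can be checked with a short sketch. This is not part of cmgr; the function name is illustrative:

```python
# Sketch: interpret a fencing mask as the set of switch ports that are
# excluded from fencing. Port i is excluded when bit i of the (up to
# 64-bit) mask is set. excluded_ports() is made up for this example.
def excluded_ports(mask_hex):
    mask = int(mask_hex, 16)
    return [i for i in range(64) if mask & (1 << i)]

print(excluded_ports("A4"))  # -> [2, 5, 7], matching the A4 example above
```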
To modify the user name, password, or mask for a given Fibre Channel switch, use the following command:
modify switch switch_hostname username username password password [mask mask] |
The arguments are the same as for “Define a Switch with cmgr”.
For example, to change the mask for switch ptg-brocade from A4 to 0 (which means that none of the ports on the switch will be excluded from fencing), enter the following:
cmgr> modify switch ptg-brocade username admin password password mask 0 |
Raising an I/O fence isolates the node from the SAN; CXFS sends a message via the telnet protocol to the Brocade switch and disables the port. After the node is isolated, it cannot corrupt data in the shared CXFS filesystem. Use the following command:
admin fence raise [node nodename] |
nodename is the name of the node to be isolated.
For example, to isolate the default node, enter the following:
cmgr> admin fence raise |
To isolate node Node3, enter the following:
cmgr> admin fence raise node Node3 |
To lower the I/O fence for a given node in order to reenable the port, allowing the node to connect to the SAN and access the shared CXFS filesystem, use the following command:
admin fence lower [node nodename] |
nodename is the name of the node to be reconnected.
For example, to provide access for the default node, enter the following:
cmgr> admin fence lower |
To provide access for node Node3, enter the following:
cmgr> admin fence lower node Node3 |
To update the mappings in the cluster database between the host bus adapters (HBAs) and switch ports, use the following command:
admin fence update |
You should run this command if you reconfigure any switch or add ports.
To delete a switch, use the following command:
delete switch switch_hostname |
switch_hostname is the hostname of the Fibre Channel switch; this is used to determine the IP address of the switch.
For example:
cmgr> delete switch ptg-brocade Successfully updated switch config. |
To display the switches in the system, use the following command:
show switches |
To show the switches for a given node, use the following command:
show switch hostname |
For example:
cmgr> show switch ptg-brocade

Switch[0]
  *Hostname ptg-brocade
  Username admin
  Password password
  Mask 0
  Vendor BROCADE
  Number of ports 8
     0  0000000000000000  Reset
     1  210000e08b0102c6  Reset
     2  210000e08b01fec5  Reset
     3  210000e08b019dc5  Reset
     4  210000e08b0113ce  Reset
     5  210000e08b027795  Reset  thump
     6  210000e08b019ef0  Reset
     7  210000e08b022242  Reset |
To query the status of each port on the switch, use the following command:
admin fence query |
For example:
cmgr> admin fence query
Switch[0] "brocade04" has 16 ports
Port 4 type=FABRIC status=enabled hba=210000e08b0042d8 on host o200c
Port 5 type=FABRIC status=enabled hba=210000e08b00908e on host cxfs30
Port 9 type=FABRIC status=enabled hba=2000000173002d3e on host cxfssun3 |
For a more verbose display (which shows all ports on the switch, rather than only those attached to nodes in the default cluster), use the following command:
admin fence query verbose |
For example:
cmgr> admin fence query verbose
Switch[0] "brocade04" has 16 ports
Port 0 type=FABRIC status=enabled hba=2000000173003b5f on host UNKNOWN
Port 1 type=FABRIC status=enabled hba=2000000173003adf on host UNKNOWN
Port 2 type=FABRIC status=enabled hba=210000e08b023649 on host UNKNOWN
Port 3 type=FABRIC status=enabled hba=210000e08b021249 on host UNKNOWN
Port 4 type=FABRIC status=enabled hba=210000e08b0042d8 on host o200c
Port 5 type=FABRIC status=enabled hba=210000e08b00908e on host cxfs30
Port 6 type=FABRIC status=enabled hba=2000000173002d2a on host UNKNOWN
Port 7 type=FABRIC status=enabled hba=2000000173003376 on host UNKNOWN
Port 8 type=FABRIC status=enabled hba=2000000173002c0b on host UNKNOWN
Port 9 type=FABRIC status=enabled hba=2000000173002d3e on host cxfssun3
Port 10 type=FABRIC status=enabled hba=2000000173003430 on host UNKNOWN
Port 11 type=FABRIC status=enabled hba=200900a0b80c13c9 on host UNKNOWN
Port 12 type=FABRIC status=disabled hba=0000000000000000 on host UNKNOWN
Port 13 type=FABRIC status=enabled hba=200d00a0b80c2476 on host UNKNOWN
Port 14 type=FABRIC status=enabled hba=1000006069201e5b on host UNKNOWN
Port 15 type=FABRIC status=enabled hba=1000006069201e5b on host UNKNOWN |
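When scripting around cmgr, captured output in this form can be post-processed. The following is a sketch only: the regular expression is an assumption based on the sample output above, not a documented CXFS interface, and parse_fence_query() is a made-up name:

```python
import re

# Sketch: extract (port, status, hba, host) tuples from captured
# "admin fence query" output. The line layout is assumed from the
# example output shown above.
PORT_LINE = re.compile(
    r"Port\s+(\d+)\s+type=\S+\s+status=(\S+)\s+hba=(\S+)\s+on host\s+(\S+)")

def parse_fence_query(text):
    return [(int(port), status, hba, host)
            for port, status, hba, host in PORT_LINE.findall(text)]

sample = '''Switch[0] "brocade04" has 16 ports
Port 4 type=FABRIC status=enabled hba=210000e08b0042d8 on host o200c
Port 12 type=FABRIC status=disabled hba=0000000000000000 on host UNKNOWN'''

print(parse_fence_query(sample))
```

Filtering the resulting tuples on the status field gives a quick view of which ports are currently fenced (disabled).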
The following script defines a three-node cluster of type CXFS; each node is a server-capable administration node.
Note: This example only defines one network interface. The hostname is used here for simplicity; however, you may wish to use the IP address instead to avoid confusion. This example does not address the system controller definitions. |
#!/usr/cluster/bin/cmgr -if
#
#Script to define a three-node cluster

define node cxfs6
        set hostname to cxfs6
        set is_cxfs to true
        set operating_system to irix
        set node_function to server_admin
        add nic cxfs6
                set heartbeat to true
                set ctrl_msgs to true
                set priority to 1
        done
done

define node cxfs7
        set hostname to cxfs7
        set is_cxfs to true
        set operating_system to irix
        set node_function to server_admin
        add nic cxfs7
                set heartbeat to true
                set ctrl_msgs to true
                set priority to 1
        done
done

define node cxfs8
        set hostname to cxfs8
        set is_cxfs to true
        set operating_system to irix
        set node_function to server_admin
        add nic cxfs8
                set heartbeat to true
                set ctrl_msgs to true
                set priority to 1
        done
done

define cluster cxfs6-8
        set is_cxfs to true
        set is_failsafe to true
        set clusterid to 20
        add node cxfs6
        add node cxfs7
        add node cxfs8
done

quit |
After running this script, you would see the following output:
Successfully defined node cxfs6
Successfully defined node cxfs7
Successfully defined node cxfs8
Successfully defined cluster cxfs6-8 |
The following script defines two filesystems; fs1 is mounted on all but node cxfs8, and fs2 is mounted on all nodes:
#!/usr/cluster/bin/cmgr -if
# Script to define two filesystems
# Define fs1, do not mount on cxfs8

define cxfs_filesystem fs1 in cluster cxfs6-8
        set device_name to /dev/cxvm/d76lun0s0
        set mount_point to /mnts/fs1
        set force to false
        add cxfs_server cxfs6
                set rank to 0
        done
        add cxfs_server cxfs7
                set rank to 1
        done
        set dflt_local_status to enabled
        add disabled_node cxfs8
done
#
# Define fs2, mount everywhere
define cxfs_filesystem fs2 in cluster cxfs6-8
        set device_name to /dev/cxvm/d76lun0s1
        set mount_point to /mnts/fs2
        set force to false
        add cxfs_server cxfs8
                set rank to 0
        done
        set dflt_local_status to enabled
done |
After you have configured the cluster database, you can use the build_cmgr_script command to automatically create a cmgr script based on the contents of the cluster database. The generated script will contain the following:
Node definitions
Cluster definition
Switch definitions
CXFS filesystem definitions
Parameter settings
Any changes made using either the cmgr command or the GUI
FailSafe information (in a coexecution cluster only)
As needed, you can then use the generated script to recreate the cluster database after performing a cdbreinit.
Note: You must execute the generated script on the first node that is listed in the script. If you want to execute the generated script on a different node, you must modify the script so that the node is the first one listed. |
By default, the generated script is named:
/tmp/cmgr_create_cluster_clustername_processID |
You can specify an alternative pathname by using the -o option:
build_cmgr_script [-o script_pathname] |
For more details, see the build_cmgr_script man page.
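As an illustration of the default naming convention only (this is not a build_cmgr_script interface, and the function name is made up), the pathname combines the cluster name and the process ID:

```python
# Illustrative sketch of the default pathname convention described above:
# /tmp/cmgr_create_cluster_<clustername>_<processID>
def default_script_path(clustername, process_id):
    return "/tmp/cmgr_create_cluster_%s_%s" % (clustername, process_id)

print(default_script_path("clusterA", 1234))
# -> /tmp/cmgr_create_cluster_clusterA_1234
```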
For example:
# /var/cluster/cmgr-scripts/build_cmgr_script -o /tmp/newcdb
Building cmgr script for cluster clusterA ...
build_cmgr_script: Generated cmgr script is /tmp/newcdb |
The example script file contents are as follows; note that because nodeE is the first node defined, you must execute the script on nodeE:
#!/usr/cluster/bin/cmgr -f

# Node nodeE definition
define node nodeE
        set hostname to nodeE.americas.sgi.com
        set operating_system to IRIX
        set is_failsafe to false
        set is_cxfs to true
        set node_function to server_admin
        set nodeid to 5208
        set reset_type to powerCycle
        add nic nodeE
                set heartbeat to true
                set ctrl_msgs to true
                set priority to 1
        done
done

# Node nodeD definition
define node nodeD
        set hostname to nodeD.americas.sgi.com
        set operating_system to IRIX
        set is_failsafe to false
        set is_cxfs to true
        set node_function to server_admin
        set nodeid to 5181
        set reset_type to powerCycle
        add nic nodeD
                set heartbeat to true
                set ctrl_msgs to true
                set priority to 1
        done
done

# Node nodeF definition
define node nodeF
        set hostname to nodeF.americas.sgi.com
        set operating_system to IRIX
        set is_failsafe to false
        set is_cxfs to true
        set node_function to server_admin
        set nodeid to 5401
        set reset_type to powerCycle
        add nic nodeF
                set heartbeat to true
                set ctrl_msgs to true
                set priority to 1
        done
done

# Define cluster and add nodes to the cluster
define cluster clusterA
        set is_failsafe to false
        set is_cxfs to true
        set cx_mode to normal
        set clusterid to 35
done

modify cluster clusterA
        add node nodeD
        add node nodeF
        add node nodeE
done

set cluster clusterA

define cxfs_filesystem fs1
        set device_name to /dev/cxvm/fs1
        set mount_point to /fs1
        set force to false
        set dflt_local_status to enabled
        add cxfs_server nodeE
                set rank to 1
        done
        add cxfs_server nodeD
                set rank to 2
        done
        add cxfs_server nodeF
                set rank to 0
        done
done

define cxfs_filesystem fs2
        set device_name to /dev/cxvm/fs2
        set mount_point to /fs2
        set force to false
        set dflt_local_status to enabled
        add cxfs_server nodeE
                set rank to 1
        done
        add cxfs_server nodeD
                set rank to 2
        done
        add cxfs_server nodeF
                set rank to 0
        done
done

# Setting CXFS parameters
modify cx_parameters
        set tie_breaker to none
done

quit |