au548inst

March 20, 2018 | Author: senthilnathane | Category: Network Topology, Disk Storage, Trademark, Computer Data Storage, Computer Networking



V4.0

Front cover

HACMP System Administration I: Planning and Implementation
(Course code AU54)

Instructor Guide
ERC 8.0

IBM certified course material

Trademarks

IBM® is a registered trademark of International Business Machines Corporation.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both:

AIX®                        AIX 5L™                         Approach®
BladeCenter®                Cross-Site®                     DB2®
DS4000™                     DS6000™                         DS8000™
Enterprise Storage Server®  General Parallel File System™   GPFS™
HACMP™                      NetView®                        Notes®
POWER™                      POWER5™                         pSeries®
Redbooks®                   Requisite®                      SP™
System i5™                  System p™                       System p5™
System Storage™             Tivoli®                         TotalStorage®
WebSphere®

Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.

Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX® is a registered trademark of The Open Group in the United States and other countries.

Linux® is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

June 2008 edition

The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

© Copyright International Business Machines Corporation 1998, 2008. All rights reserved.
This document may not be reproduced in whole or in part without the prior written permission of IBM.

Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

© Copyright IBM Corp. 1998, 2008
Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

Contents

Trademarks
Instructor course overview
Course description
Agenda

Unit 0. Course introduction
    Course objectives
    Course agenda (1 of 5)
    Course agenda (2 of 5)
    Course agenda (3 of 5)
    Course agenda (4 of 5)
    Course agenda (5 of 5)
    Lab exercises
    Student Guide font conventions
    Course overview summary

Unit 1. Introduction to HACMP for AIX
    Unit objectives
  1.1 High Availability concepts
    High availability and HACMP concepts
    So, what is High Availability?
    Eliminating single points of failure
    High availability clusters (HACMP base)
    So, what about site failure
    IBM's HA solution for AIX
    Fundamental HACMP concepts
    A highly available cluster
    HACMP’s topology components (1 of 2)
    HACMP’s topology components (2 of 2)
    HACMP's resource components
    What is HACMP?
    Additional features of HACMP
    Some Assembly Required
    Let’s review
  1.2 What does HACMP do?
    Topic 2 objectives: What does HACMP do?
    Just What Does HACMP Do?
    What happens when something fails?
    What happens when a problem is fixed?
    Standby (active/passive) with fallback
    Standby (active/passive) without fallback
    Mutual takeover: Active/Active
    Concurrent: multiple active nodes
    Points to ponder
    Other considerations for HACMP
    Things HACMP Does Not Do
    When is HACMP not the correct solution?
    What do we plan to achieve this week?
    Overview of the implementation process
    Hints to get started
    Sources of HACMP information
    Checkpoint
    Unit summary

Unit 2. Networking considerations for high availability
    Unit objectives
  2.1 How HACMP uses networks
    How HACMP uses networks
    How does HACMP use networks?
    Providing HA client access to the cluster
    What HACMP detects and diagnoses
    Heartbeat packets
    Failure detection versus failure diagnosis
    Failure diagnosis
    What if all heartbeat packets stop?
    CRITICAL: All clusters require a non-IP network
    The two subnet rule
    Failure recovery and reintegration
    Let’s review topic 1
  2.2 HACMP concepts and configuration rules
    HACMP concepts and configuration rules
    HACMP networking support
    Network types
    HACMP topology components
    Naming nodes
    HACMP network component terms (1 of 2)
    HACMP network component terms (2 of 2)
    IP network configuration rules
    Non-service IP address examples
    Non-IP network configuration rules: Point-to-point
    Non-IP network configuration rules: Multi-node
    Persistent node IP labels
    Let’s review topic 2
  2.3 Implementing IP address takeover (IPAT)
    Implementing IP Address Takeover
    IP Address Takeover
    Two ways to implement IPAT
    IPAT via IP aliasing configuration
    IPAT via IP aliasing at startup of resource group
    IPAT via IP aliasing after an interface fails
    IPAT via IP aliasing after a node fails
    IPAT via IP aliasing: Distribution preference for service IP label aliases
    IPAT via IP aliasing summary
    IPAT via IP replacement overview
    Service IP address examples
    Adopt labeling/naming conventions
    Hostname resolution
    Other configurations - Etherchannel (1 of 2)
    Other configurations - Etherchannel (2 of 2)
    Other configurations: Base virtual Ethernet
    HACMP view of virtual Ethernet
    Other configurations: Single IP adapter nodes
    Talk to your network administrator
    Changes to AIX start sequence
    Changes to /etc/inittab
    Common TCP/IP configuration problems
    Let’s review topic 3
  2.4 The impact of IPAT on clients
    The impact of IPAT on clients
    How are users affected?
    What about the users' computers?
    Local or remote client?
    Gratuitous ARP
    Gratuitous ARP support issues
    What if gratuitous ARP is not supported?
    Option 1: clinfo on the client
    Option 2: clinfo from within the cluster
    clinfo.rc script (extract)
    Option 3: Hardware address takeover
    Checkpoint
    Unit summary (1 of 2)
    Unit summary (2 of 2)

Unit 3. Shared storage considerations for high availability
    Unit objectives
  3.1 Fundamental shared storage concepts
    Fundamental shared storage concepts
    What is shared storage?
    What is private storage?
    Access to shared data must be controlled
    Who owns the storage?
    Reserve/release-based protection
    Reserve/release disk takeover: Manual move
    Reserve/release disk takeover - failure
    Reserve/release ghost disks
    RSCT-based shared storage protection
    Enhanced concurrent volume groups
    ECMVG varyon - active versus passive
    ECMVG state: Active versus passive
    How ECMVGs work
    Determining ECMVG or Group Services status
    RSCT-based fast disk takeover: Manual move
    RSCT-based fast disk takeover: Failure
    Fast disk takeover details
    Let’s review topic 1
  3.2 Shared disk technology
    Shared disk technology
    Shared disk and HACMP strategies
    Virtual storage (VIO) and HACMP
    IBM SAN storage and HACMP
    Non-IBM SAN storage and HACMP
    SCSI technology and HACMP
    Physical volume IDs
    Support for OEM disks
    Let’s review topic 2
  3.3 Shared storage from the AIX perspective
    Topic 3 objectives: Shared storage from the AIX perspective
    Logical Volume Manager review
    LVM relationships
    ODM-LVM relationships
    Creating a shared volume group: Manually
    LVM mirroring
    Steps to create a mirrored file system - manually
    Mirroring? Let’s talk quorum checking
    Elimination of quorum issues
    Allow HACMP to handle it: Forced varyon
    Recommendations for forced varyon
    LVM and HACMP considerations
    Support for OEM volume groups
    Support for OEM file systems
    Checkpoint
    Unit summary

Unit 4. Planning for applications and resource groups
    Unit objectives
    How to define an application to HACMP
    Application considerations
    Writing start and stop scripts
    Where should data go?
    Resource group policies
    Startup policy
    Online on all available nodes
    Fallover policy
    Fallback policy
    Valid combinations of policies
    Dependent applications/resource groups
    Checkpoint
    Unit summary

Unit 5. HACMP installation
    Unit objectives
  5.1 Installing the HACMP 5.4.1 software
    Installing the HACMP software
    Steps for successful implementation
    Where are we in the implementation?
    First steps in planning
    What is on the CD?
    Install the HACMP filesets
    Don’t forget the prerequisites
    Some final things to check
    Install HACMP client machine
    Let’s review
  5.2 What was installed?
    What was installed
    The layered look
    HACMP components and features
    Cluster manager
    Cluster secure communication subsystem
    Cluster communication daemon (clcomd)
    clcomd standard connection authentication
    RSCT
    HACMP from an RSCT perspective
    Heartbeat rings
    HACMP’s SNMP support
    Cluster information daemon (clinfo)
    Highly available NFS server support
    Shared external disk access
    Checkpoint
    Unit summary

Unit 6. Initial cluster configuration
    Unit objectives
    What we are going to achieve
    Where are we in the implementation?
    The topology configuration
    Configuration methods
    Planning and base configuration
    The top-level HACMP smit menu
    The standard configuration method
    Add nodes to an HACMP cluster
    What did we get?
    Now define highly available resources
    Start with service addresses
    Adding service IP labels
    Add xweb service label (1 of 2)
    Add xweb service label (2 of 2)
    Continue with application servers
    Add xwebserver application server (1 of 2)
    Add xweb application server (2 of 2)
    Configure volume groups (optional)
    Discover the volume groups for pick-lists
    Adding the xwebgroup resource group
    Setting name, nodes, and policies
    Adding resources to the xwebgroup RG (1 of 2)
    Adding resources to the xwebgroup RG (2 of 2)
    Synchronize and test the changes
    What do we have at this point?
    Extending the configuration
    Extended topology configuration menu
    Communication interfaces and devices
    Defining a non-IP network (1 of 3)
    Defining a non-IP network (2 of 3)
    Defining a non-IP network (3 of 3)
    Defining persistent node IP labels (1 of 3)
    Defining persistent node IP labels (2 of 3)
    Defining persistent node IP labels (3 of 3)
    Synchronize
    Save configuration: snapshot
    Save configuration: xml file
    Two-node cluster configuration assistant
    What does the two-node assistant give you?
    Where are we in the implementation?
    Starting Cluster Services (1 of 4)
    Starting Cluster Services (2 of 4)
    Starting Cluster Services (3 of 4)
    Starting Cluster Services (4 of 4)
    Removing a cluster
    We're there!
    Checkpoint
    Break time!
    Unit summary

Unit 7. Basic HACMP administration
    Unit objectives
  7.1 Topology and resource group management
    Topology and resource group management
    Yet another resource group
    Adding the third resource group
    Adding a third service IP label (1 of 2)
    Adding a third service IP label (2 of 2)
    Adding a third application server
    Adding resources to the third RG (1 of 2)
    Adding resources to the third RG (2 of 2)
    Synchronize your changes
    Expanding the cluster
    Adding a new cluster node
    Add node: Standard path
    Add node: Standard path (in progress)
    Add node: Extended path
    Define the non-IP rs232 networks (1 of 2)
    Define the non-IP rs232 networks (2 of 2)
    Synchronize your changes
    Start Cluster Services on the new node
    Add the node to a resource group
    Shrinking the cluster
. . . . . . . . . . . . . . 7-47 Removing a cluster node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-49 Removing an application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-51 Removing a resource group (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-53 Removing a resource group (2 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-56 Removing a resource group (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-58 Let’s review: Topic 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-60 7.2 Cluster single point of control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-63 Cluster single point of control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-64 Administering a high availability cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-66 Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-68 Cluster single point of control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-71 The top-level C-SPOC menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-74 Starting cluster services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-76 Verifying that cluster services has started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-79 Checking on what actually happened . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-82 Stopping cluster services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-84 Verifying that cluster services has stopped (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . 
7-87 Verifying that cluster services has stopped (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . 7-90 Managing shared LVM components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-93 Creating a shared volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-96 Discover, add VG to resource group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-98 Creating a shared file system (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-100 Creating a shared file system (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-103 LVM change management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-105 LVM changes: Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-108 LVM changes: Lazy update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-110 LVM changes: C-SPOC synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-113 Enhanced concurrent mode volume groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-116 © Copyright IBM Corp. 1998, 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Contents ix Instructor Guide The best method: C-SPOC LVM changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-118 LVM changes: Select your file system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-120 Update the size of a file system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-122 HACMP resource group operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-124 Priority override location (POL): Old . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-126 Priority override location (POL): New . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . .7-129 Moving a resource group (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-131 Moving a resource group (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-134 Bring a resource group offline (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-136 Bring a resource group offline (2 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-138 Bring a resource group offline (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-140 Bring a resource group back online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-142 Log files generated by HACMP - before HACMP 5.4.1 . . . . . . . . . . . . . . . . . . . .7-144 Log files generated by HAMCP - HACMP 5.4.1 and later . . . . . . . . . . . . . . . . . .7-146 Let’s review topic 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-149 7.3 Dynamic automatic reconfiguration event facility . . . . . . . . . . . . . . . . . . . . . 7-151 Dynamic Automatic Reconfiguration Event facility . . . . . . . . . . . . . . . . . . . . . . . .7-152 Dynamic reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-154 What can DARE do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-156 What limitations does DARE have? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-158 So how does DARE work? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-160 Verifying and synchronizing (standard) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-163 Verifying and synchronizing (extended) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-165 Discarding unwanted changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
.7-168 Rolling back from a DARE operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-171 What if DARE fails? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-174 Dynamic reconfiguration lock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-177 Let’s review: Topic 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-179 7.4 WebSMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-181 Implementing WebSMIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-182 Web-enabled SMIT (WebSMIT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-184 WebSMIT main page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-187 WebSMIT context menu controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-190 WebSMIT associations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-192 WebSMIT online documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-195 WebSMIT configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-197 Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-203 Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-205 Unit 8. Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1 Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2 8.1 HACMP events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
8-5 Topic 1 objectives: HACMP events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-6 What is an HACMP event? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8 HACMP basic event flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-10 Recovery programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-12 Recovery program example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-14 x HACMP Implementation © Copyright IBM Corp. 1998, 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. V4.0 Instructor Guide TOC Event scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-16 process_resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18 First node starts cluster services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-20 Another node joins the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-22 Node leaves the cluster (stopped) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-24 Let’s review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-26 8.2 Cluster customization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-29 Topic 2 objectives: Event customization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-30 Event processing customization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-32 Adding/changing cluster events (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-35 Adding/changing cluster events (2 of 3) . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . 8-37 Adding/changing cluster events (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-40 Recovery commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-42 Adding/changing recovery commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-44 Points to note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-46 RG_Move event and selective fallover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-48 Customizing event flow for other devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-51 Error notification within smit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-53 Configuring automatic error notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-55 Listing automatic error notification (non-virtual HACMP nodes) . . . . . . . . . . . . . . 8-57 Listing automatic error notification (virtual HACMP nodes) . . . . . . . . . . . . . . . . . . 8-60 Adding error notification methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-62 Emulating errors (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-65 Emulating errors (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-68 What will this cause? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-70 Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-72 Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-74 Unit 9. Integrating NFS into HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1 Unit objectives . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2 So, what is NFS? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4 NFS background processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6 Combining NFS with HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8 NFS fallover with HACMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10 Configuring NFS for high availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-12 Cross-mounting NFS filesystems (1 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15 Cross-mounting NFS filesystems (2 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18 Cross-mounting NFS filesystems (3 of 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20 Choosing the network for cross-mounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22 Configuring HACMP for cross-mounting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-24 Syntax for specifying cross-mounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-26 Ensuring the VG major number is unique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-28 NFS with HACMP considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-30 Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-32 Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-34 © Copyright IBM Corp. 1998, 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Contents xi Instructor Guide Unit 10. Problem determination and recovery . . . . . . . . . . . . . 
. . . . . . . . . . . . 10-1 Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-2 Why do good clusters turn bad? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-4 Test your cluster before going live! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-7 Tools to help you diagnose a problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-10 Tools available from smit menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-12 Automatic cluster configuration monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-14 Automatic connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-16 HACMP cluster test tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-20 Checking cluster processes (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-23 Checking cluster processes (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-26 Testing your network connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-28 Dead man's switch timeout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-31 Avoiding dead man’s switch timeouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-33 Setting performance tuning parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-36 Enabling I/O pacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-38 Changing the frequency of syncd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-40 SRC halts a node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-42 Partitioned clusters and node isolation . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-44 Avoiding partitioned clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-47 Automatic failure data capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-49 Check event status message . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-51 Changing the timeouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-54 Recovering from an event script failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-56 Recovering from an event failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-59 A troubleshooting methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-61 Contacting IBM for support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-65 Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-67 Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-69 Appendix A. Checkpoint solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1 Appendix B. Release Notes for HACMP 5.4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1 Appendix C. IPAT via IP replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-1 Appendix D. Configuring target mode SSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1 xii HACMP Implementation © Copyright IBM Corp. 1998, 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 

Trademarks

The reader should recognize that the following terms, which appear in the content of this training document, are official trademarks of IBM or other companies:

IBM® is a registered trademark of International Business Machines Corporation.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both:

AIX®, AIX 5L™, Approach®, BladeCenter®, Cross-Site®, DB2®, DS4000™, DS6000™, DS8000™, Enterprise Storage Server®, General Parallel File System™, GPFS™, HACMP™, NetView®, Notes®, POWER™, POWER5™, pSeries®, Redbooks®, Requisite®, SP™, System i5™, System p™, System p5™, System Storage™, Tivoli®, TotalStorage®, WebSphere®

Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.

Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX® is a registered trademark of The Open Group in the United States and other countries.

Linux® is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Instructor course overview

This course teaches students to design, plan, and configure a highly available cluster of pSeries nodes running AIX 6.1 using the High Availability Cluster Multi-Processing (HACMP) 5.4.1 software. The course introduces the basic concepts, design, and planning considerations, and covers the steps necessary to configure the HACMP 5.4.1 startup, fallover, and fallback behavior policies.

Course description

HACMP System Administration I: Planning and Implementation

Duration: 5 days

Purpose
This course is designed to prepare students to install and configure a highly available cluster using HACMP for AIX.

Audience
This course is for experienced AIX system administrators with TCP/IP networking and AIX LVM experience who are responsible for the planning and installation of an HACMP 5.4.1 cluster on an IBM System p server running AIX 5L V5.3 or later (the lab exercises are conducted on AIX 6.1).

Prerequisites
Students should ideally be qualified as IBM Certified Specialists - p5 and pSeries Administration and Support AIX 5L and, in addition, have TCP/IP, LVM storage, and disk hardware implementation skills. These skills are addressed in the following courses (or can be obtained through equivalent education and experience):
• AU16: AIX 5L System Administration II: Problem Determination
• AU07: AIX V4 Configuring TCP/IP

Objectives
After completing this course, you should be able to:
• Explain what high availability is.
• Outline the capabilities of HACMP for AIX.
• Design and plan a highly available cluster.
• Install and configure HACMP for AIX in the following modes of operation:
  - Single resource group on a primary node with standby node
  - Two resource groups in a mutual takeover configuration
• Configure resource group startup, fallover, and fallback policies.
• Perform basic system administration tasks for HACMP.
• Perform basic customization for HACMP.
• Perform basic problem determination and recovery.

Curriculum relationship
• This course should be taken before AU61
• AU61: HACMP System Administration II: Administration and Problem Determination

Agenda

This agenda assumes a start time of 9:00 a.m., a 10-minute break each hour, a one-hour lunch break at noon, and a stop time of 4:30 p.m.

Day 1
(00:30) Welcome
(02:00) Unit 1 - Introduction to HACMP for AIX 5L
(03:00) Unit 2 - Networking considerations for high availability
(00:30) Exercise 1
(01:00) Exercise 2

Day 2
(02:00) Unit 3 - Shared storage considerations for high availability
(00:45) Unit 4 - Planning for applications and resource groups
(01:30) Unit 5 - HACMP installation
(01:00) Exercise 3
(00:30) Exercise 4
(00:30) Exercise 5

Day 3
(03:00) Unit 6 - Initial cluster configuration
(03:00) Exercise 6

Day 4
(03:00) Unit 7 - Basic HACMP administration
(01:30) Unit 8 - Events
(03:00) Exercise 7
(00:30) Exercise 8

Day 5
(01:00) Unit 9 - Integrating NFS into HACMP
(01:30) Unit 10 - Problem determination and recovery
(01:00) Exercise 9
(00:30) Exercise 10

Text highlighting

The following text highlighting conventions are used throughout this book:

Bold: Identifies file names, file paths, directories, user names, and principals.
Italics: Identifies links to Web sites and publication titles, marks a word or phrase that is meant to stand out from the surrounding text, and identifies parameters whose actual names or values are to be supplied by the user.
Monospace: Identifies attributes, variables, file listings, SMIT menus, code examples of text similar to what you might see displayed, examples of portions of program code similar to what you might write as a programmer, and messages from the system.
Monospace bold: Identifies commands, daemons, menu paths, and what the user would enter in examples of commands and SMIT menus.

Unit 0. Course introduction

Estimated time
00:30

What this unit is about
This unit describes the content of this course.

What you should be able to do
After completing this unit, you should understand the aim of this course.

Course objectives
After completing this unit, you should be able to:
• Define high availability
• Outline the capabilities of HACMP for AIX
• Design and plan a highly available cluster
• Install and configure HACMP in the following modes of operation:
  – Single resource group on a primary node with a standby node
  – Two resource groups in a mutual takeover configuration
• Perform basic system administration tasks for HACMP
• Perform basic problem determination and recovery

Figure 0-1. Course objectives AU548.0
Introduction to HACMP for AIX 5L Unit 2 . 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Course agenda (1 of 5) AU548. 1998.0 Notes: 0-4 HACMP Implementation © Copyright IBM Corp. .Networking Considerations for High Availability Exercise 1 Exercise 2 © Copyright IBM Corporation 2008 Figure 0-2.Instructor Guide Course agenda (1 of 5) Day 1 – – – – – Welcome Unit 1 . 0 Instructor Guide Uempty Instructor notes: Purpose — Details — Additional information — Transition statement — © Copyright IBM Corp. . Course introduction 0-5 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 1998.V4. 2008 Unit 0. Planning for Applications and Resource Groups Unit 5 .Shared Storage Considerations for High Availability Unit 4 . 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. . 1998.HACMP Installation Exercise 3 Exercise 4 Exercise 5 © Copyright IBM Corporation 2008 Figure 0-3. Course agenda (2 of 5) AU548.0 Notes: 0-6 HACMP Implementation © Copyright IBM Corp.Instructor Guide Course agenda (2 of 5) Day 2 – – – – – – Unit 3 . Course introduction 0-7 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 2008 Unit 0.V4. 1998.0 Instructor Guide Uempty Instructor notes: Purpose — Details — Additional information — Transition statement — © Copyright IBM Corp. . 1998.Initial Cluster Configuration – Exercise 6 © Copyright IBM Corporation 2008 Figure 0-4. .0 Notes: 0-8 HACMP Implementation © Copyright IBM Corp. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Course agenda (3 of 5) AU548.Instructor Guide Course agenda (3 of 5) Day 3 – Unit 6 . 0 Instructor Guide Uempty Instructor notes: Purpose — Details — Additional information — Transition statement — © Copyright IBM Corp. . 1998. 2008 Unit 0. 
Course agenda (4 of 5)

Day 4
– Unit 7 - Basic HACMP Administration
– Unit 8 - Events
– Exercise 7
– Exercise 8

Figure 0-5. Course agenda (4 of 5)

Course agenda (5 of 5)

Day 5
– Unit 9 - Integrating NFS into HACMP
– Unit 10 - Problem Determination and Recovery
– Exercise 9
– Exercise 10

Figure 0-6. Course agenda (5 of 5)

Lab exercises

Points to note:
– Work as a team and split the workload.
– Manuals are available online.
– HACMP software has been loaded and might have already been installed.
– TCP/IP and LVM have not been configured.
– Each lab must be completed successfully before continuing on to the next lab, as each lab is a prerequisite for the next one.
– If you have any questions, ask your instructor.

Figure 0-7. Lab exercises

Student Guide font conventions

The following text highlighting conventions are used throughout this book:

Bold: Identifies file names, file paths, directories, user names, principals, menu paths, and menu selections. Also identifies graphical objects, such as buttons, labels, and icons that the user selects.
Italics: Identifies links to Web sites and publication titles, is used where the word or phrase is meant to stand out from the surrounding text, and identifies parameters whose actual names or values are to be supplied by the user.
Monospace: Identifies attributes, variables, file listings, SMIT menus, code examples, command output that you would see displayed on a terminal, and messages from the system.
Monospace bold: Identifies commands, subroutines, daemons, and text that the user would type.

Figure 0-8. Student Guide font conventions

Course overview summary

Key points for the course:
– There is ample time for the lab exercises.
– Thorough design, planning, and teamwork are essential.
– Prior AIX, LVM, Storage Management and TCP/IP experience is assumed and required.

Figure 0-9. Course overview summary

Unit 1. Introduction to HACMP for AIX

Estimated time
02:00

What this unit is about
This unit introduces the concepts of High Availability and HACMP (High Availability Cluster Multi-Processing) for AIX (Advanced Interactive eXecutive).

What you should be able to do
After completing this unit, you should be able to:
• Define High Availability and explain why it is needed
• List the key considerations when designing and implementing a High Availability cluster
• Outline the features and benefits of HACMP for AIX
• Describe the components of an HACMP for AIX cluster
• Explain how HACMP for AIX operates in typical cases

How you will check your progress
Accountability:
• Checkpoint

References
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
http://www-03.ibm.com/systems/p/library/hacmp_docs.html HACMP manuals
Unit objectives

After completing this unit, you should be able to:
– Define High Availability and explain why it is needed
– List the key considerations when designing and implementing a high availability cluster
– Outline the features and benefits of HACMP for AIX
– Describe the components of an HACMP for AIX cluster
– Explain how HACMP for AIX operates in typical cases

Figure 1-1. Unit objectives

Notes:

Objectives
In this unit, we introduce the concept of High Availability, examine why you might want to implement a High Availability solution, and compare High Availability with some alternative availability technologies.

HACMP terminology
This course uses the following terminology:
- HACMP means any version and release of the HACMP product.
- HACMP x means version x and any release of that version.
- HACMP x.y means a specific version and release.

Instructor notes:
Purpose: State the unit objectives.
Details: First, we will talk about High Availability in general in the first topic and then focus on HACMP concepts in the second topic of this unit.
Additional information: This unit is an introduction. Be careful not to teach the whole course.
Transition statement: Let's start by examining what we mean by High Availability.
1.1 High Availability concepts

Instructor topic introduction
What students will do: Study High Availability concepts
How students will do it: Listen to lecture
What students will learn: What High Availability is
How this will help students on their job: Help plan for High Availability

High Availability and HACMP concepts

After completing this topic, you should be able to:
– Define High Availability
– Recognize that eliminating single points of failure (SPOFs) is part of the HACMP implementation process
– Outline the features and benefits for HACMP for AIX
– Describe the HACMP concepts of topology and resources
– Give examples of topology components and resources
– Provide a brief description of the software and hardware components of a typical HACMP cluster

Figure 1-2. High availability and HACMP concepts

Instructor notes:
Purpose: Introduce the topic.
Transition statement: So, what is High Availability?

So, what is High Availability?

High Availability characteristics:
– The masking or elimination of both planned and unplanned downtime
– The elimination of single points of failure (SPOFs)
– Fault resilience and system hardening
– No specialized hardware requirement

[Diagram labels: client, WAN, Production Node/LPAR, Standby Node/LPAR, Workload Fallover]

Figure 1-3. So, what is High Availability?
Notes:

High Availability characteristics
A High Availability solution ensures that the failure of any component of the solution, be it hardware, software, or system management, does not cause the application and its data to be inaccessible to the user community. This is achieved through the elimination or masking of both planned and unplanned downtime. High availability solutions should eliminate single points of failure (SPOF) through appropriate design, planning, selection of hardware, configuration of software, and carefully controlled change management discipline. High Availability does not mean no interruption to the application; thus, we say fault resilient instead of tolerant.

Instructor notes:
Purpose: Explain the concept of High Availability. Introduce the term Single Point of Failure and the acronym SPOF.
Details: This is an agenda visual. Just mention the concepts here. The next visuals cover these points in more detail.
Additional information: Note that there is a distinct difference between High Availability and fault tolerance. Fault tolerant solutions should not fail. High availability solutions do fail, but they also recover very quickly.
Transition statement: What is meant by elimination of single points of failure?

Eliminating single points of failure

Cluster Object      Eliminated as a single point of failure by:
Node                Using multiple nodes
Power source        Using multiple circuits or uninterruptible power supplies
Network adapter     Using redundant network adapters
Network             Using multiple networks to connect nodes
TCP/IP subsystem    Using non-IP networks to connect adjoining nodes and clients
Disk adapter        Using redundant disk adapter or multipath hardware
Disk                Using multiple disks with mirroring or RAID
Application         Adding node for takeover; configuring application monitor
VIO Server          Implementing dual VIO Servers
Site                Adding an additional site

The fundamental goal of (successful) cluster design is the elimination of single points of failure (SPOFs).

Figure 1-4. Eliminating single points of failure

Notes:

Eliminating single points of failure
Each of the items in the left-hand column is a physical or logical component which, if it fails, renders the HA cluster's application unavailable.

Remember that generally some SPOFs are not eliminated. For example, most clusters are not designed to deal with the server room being flooded with water, or with the entire city being without electrical power for two weeks. Site recovery using HACMP/XD would be a possible solution here.

Focus on the art of the possible. In other words, spend your efforts dealing with SPOFs that can be reasonably handled. Document the SPOFs which you have decided to not deal with; then you can review them from time to time to consider whether some of them now need to be dealt with (for example, site failures if the cluster becomes very important).

Instructor notes:
Purpose: List some of the classic single points of failure and how to deal with them.
Details: It is probably worthwhile to give an example of an SPOF (for example, long-term building power outage) that you might decide to not deal with, but might become necessary to deal with as the importance of the cluster evolves. This tends to keep folks focused on what really matters to them while reassuring them that the big SPOFs can still be discussed rationally.
Transition statement: What model can be used with AIX to eliminate single points of failure?

High availability clusters (HACMP base)

System p and AIX RAS features include:
– Application and Partition Mobility
– First Failure Data Capture (FFDC)
– Dynamic CPU Deallocation
– Flexible Service Processor
– Redundant Power and Cooling
– Error Correction Checking Memory
– Hot Swap Adapters
– Dynamic Kernel
– Journaled Filesystem
– Redundant Data Paths
– Dual Disk Adapters (MPIO)
– Data Mirroring and/or Striping
– Hot Swap / Hot Spare Storage
– Redundant Power/Cooling for Storage Arrays

With High Availability Clustering (HACMP):
– Protection against node and OS failure with Redundant nodes
– Protection against NIC failure with Redundant Network Adapters
– Protection against Network failure with Redundant Networks
– Self-healing clusters with Application Monitoring
– Protection against Site Failure (typically limited by SAN infrastructure) or no distance limitations with HACMP/XD

Figure 1-5. High availability clusters (HACMP base)

Notes:

High availability clustering
The High Availability solution addresses the fundamental weakness of both the stand-alone and stand-alone enhanced storage systems, that is, it has two of everything. If any component of the solution should fail, a redundant back-up component is waiting to take over the workload.

The systems that are clustered can be stand-alone systems or Logical Partitions (LPARs). Virtualization is supported as well, providing all of the requirements are met. Those will be pointed out later.

IBM's HACMP product has been ranked (and continues to be ranked) the leading high-availability solution for UNIX servers by D.H. Brown Associates (www.dhbrown.com) for many years. Do feel free to examine the high-availability solutions offered by our competitors. We are confident that by the end of this course, you'll also agree that HACMP 5 is a mature, robust, and feature-rich product that delivers significantly improved availability on the IBM System p platform.

Drawback
The base product HACMP 5 only partially solves the site SPOF in the case where data does not have to be replicated. This can be done with LVM mirroring using SAN technology.

Instructor notes:
Purpose: Examine the availability benefits of the High Availability solution.
Details: Ensure that you cover the benefits of a High Availability solution when compared with a stand-alone and enhanced solution. Explain that High Availability solutions do fail, but they just recover very quickly. If a student asks you how quickly HACMP takes to recover following fallover, answer "It depends," and we will see what it depends upon later in the course. Point out that HACMP 5 has limited support for site failures in the base product, which will be looked at in the next visual.
Additional information: Point out to the students that having one of anything is a single point of failure. Ask them if they are the only person in their company who is trained to support HACMP.
Transition statement: We haven't eliminated sites as a single point of failure, but can we?

What about site failure?

Limited distance (LVM mirroring and SAN): HACMP for AIX
Extended distance: Geographic Clustering Solution (that is, HACMP/XD)
– Distance unlimited
– Application, disk, and network independent
– Automated site failover and reintegration
– A single cluster across two sites
– Get more details in HACMP System Administration III – AU620

[Diagram labels: Toronto, Brussels, Data Replication, Metro Mirror/PPRC, GLVM, GeoRM]

Figure 1-6. So, what about site failure

Notes:

What about Site Failure and data replication?

Limited distance
The base product HACMP 5.2 and later allows you to create sites as long as you can use LVM mirroring for redundancy. Using SAN technology, you can get limited distance support for site failures.

Extended distance
The HACMP/XD (Extended Distance) priced feature provides three distinct software solutions for disaster recovery. These solutions enable an HACMP cluster to operate over extended distances at two sites. For more information, see the HACMP System Administration III: Virtualization and Disaster Recovery course, AU620.

a. HACMP/XD for Metro Mirror/PPRC increases data availability for IBM TotalStorage ESS/DS/SVC volumes that use Peer-to-Peer Remote Copy (PPRC) to copy data to a remote site for disaster recovery purposes. HACMP/XD for Metro Mirror/PPRC takes advantage of the PPRC fallover/fallback functions and HACMP cluster management to reduce downtime and recovery time during disaster recovery. When PPRC is used for data mirroring between sites, the physical distance between sites is limited to the capabilities of the ESS/DS/SVC hardware.

b. HACMP/XD for Geographic Logical Volume Manager (GLVM) increases data availability for IBM volumes that use GLVM to copy data to a remote site for disaster recovery purposes. HACMP/XD for GLVM takes advantage of the following components to reduce downtime and recovery time during disaster recovery:
• AIX GLVM data mirroring and synchronization
• TCP/IP-based unlimited distance network support
• HACMP for AIX cluster management
Multiple data mirroring networks are supported, increasing the availability and performance. Enhanced concurrent mode volume groups (in any type of HACMP resource group) are supported on each site's nodes, but concurrent access is not supported across sites. Additionally, getting started with GLVM as a means of replicating data across sites without making the commitment to HACMP/XD is possible. This is referred to as Standalone GLVM and is supported in AIX 5.3 and later. This is a no cost function. For a whitepaper on the configuration of GLVM, refer to http://www.ibm.com/servers/aix/whitepapers/aix_glvm.html

c. HACMP/XD for HAGEO Technology uses the TCP/IP network to enable unlimited distance for data mirroring between sites. (Note that although the distance is unlimited, practical restrictions exist on the bandwidth and throughput capabilities of the network.) This technology is based on the IBM High Availability Geographic Cluster for AIX (HAGEO) product. HACMP/XD for HAGEO Technology extends an HACMP for AIX cluster to encompass two physically separate data centers. Data entered at one site is sent across a point-to-point IP network and mirrored at a second, geographically distant location. HACMP/XD for GLVM is positioned as the replacement for HACMP/XD for HAGEO.

HACMP/XD is independent of the application, data, disk technology, and distances between the sites. HACMP/XD can work across any network that supports TCP/IP and offers automated fallover of applications and data from one site to another (maximum two sites) in the event of a site disaster. In the past, HACMP/XD was called HAGEO. Try both names if you're searching for information on HACMP/XD.

Instructor notes:
Purpose: Very briefly discuss the site as a single point of failure.
Details: Don't go into many details, but be sure to indicate that a limited distance (less than 20 km) can be implemented with just HACMP and LVM mirroring. For greater distances, you will implement HACMP/XD with a data replication method.
Transition statement: Now that all the goals are understood, let's look at the product.
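Looking back over this topic, the advice in the notes for the "Eliminating single points of failure" visual (handle the SPOFs you reasonably can, document the ones you decide not to handle, and review them from time to time) can be illustrated with a small inventory check. This is only a sketch for classroom discussion, not an HACMP tool: the `Component` class, the `audit` function, and the sample design are hypothetical, and the "fewer than two instances" test is a deliberately crude stand-in for a real design review.

```python
from dataclasses import dataclass


@dataclass
class Component:
    kind: str               # cluster object, for example "Node" or "Disk adapter"
    count: int              # independent instances in the planned design
    accepted: bool = False  # True if this SPOF is documented and deliberately not handled


def audit(components):
    """Return (unhandled SPOFs, documented-and-accepted SPOFs) as two lists of names."""
    singles = [c for c in components if c.count < 2]
    unhandled = [c.kind for c in singles if not c.accepted]
    accepted = [c.kind for c in singles if c.accepted]
    return unhandled, accepted


# Hypothetical two-node design: only one network, and a single site that
# has been documented as an accepted risk.
design = [
    Component("Node", 2),
    Component("Power source", 2),
    Component("Network adapter", 2),
    Component("Network", 1),
    Component("Disk adapter", 2),
    Component("Disk", 2),
    Component("Site", 1, accepted=True),
]

unhandled, accepted = audit(design)
print("Unhandled SPOFs:", unhandled)                  # ['Network']
print("Accepted SPOFs to review later:", accepted)    # ['Site']
```

The split between the two lists mirrors the notes: the first list is work still to do in the design, and the second is the documented set to revisit as the importance of the cluster grows.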
1998.Instructor Guide IBM's HA solution for AIX HACMP for AIX characteristics: – Stands for High Availability Cluster Multi-processing – Is based on cluster technology (RSCT) – Provides two environments (which can co-exist simultaneously): • Serial (High Availability): the process of ensuring that an application is available for use through the use of serially accessible shared data and duplicated resources • Parallel (Cluster Multiprocessing): concurrent access to shared data © Copyright IBM Corporation 2008 Figure 1-7. thus offering excellent horizontal scalability. diagnosis. recovery and reintegration. With an appropriate application. 1-18 HACMP Implementation © Copyright IBM Corp. HACMP can also work in a concurrent access or parallel processing environment. .0 Notes: HACMP characteristics IBM’s HACMP product is a mature and robust technology for building a high-availability solution. IBM's HA solution for AIX AU548. A high-availability solution based upon HACMP provides automated failure detection. V4.0 Instructor Guide Uempty Instructor notes: Purpose — Outline the capabilities of HACMP. 1998. . Introduction to HACMP for AIX 1-19 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Additional information — Transition statement — What makes up the HACMP product? © Copyright IBM Corp. 2008 Unit 1. Details — Explain that HACMP is a clustering technology that provides both fallover protection through redundant components and horizontal scalability through concurrent access (also known as parallel processing). Fundamental HACMP concepts AU548. 1998. log files and SMIT screens. time until warning (config_too_long timeout) © Copyright IBM Corporation 2008 Figure 1-8. at most. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. pre.and post-event scripts. user-defined events. 
which HACMP controls as a single unit – A given resource can appear only in.0 Notes: Terminology A clear understanding of the above concepts and terms is important as they appear over and over again both in the remainder of the course and throughout the HACMP documentation. typically via implementing scripts – Minimum: application start and stop scripts – Optional: • Application monitoring scripts (highly recommended!) • Event customization – Notification. . one resource group Resource group policies: – startup policy: which node the resource group is activated on – fallover policy: determines target when there is a failure – fallback policy: determines fallback behavior Customization – The process of augmenting HACMP. 1-20 HACMP Implementation © Copyright IBM Corp. recovery scripts.Instructor Guide Fundamental HACMP concepts Topology: Physical “networking centric” components Resources: Entities that are being made highly available Resource group: A collection of resources. V4.0 Instructor Guide Uempty Instructor notes: Purpose — List some of the key HACMP terms and concepts. . 2008 Unit 1. Details — Be sure that the students are clear on these terms and concepts before moving on. 1998. we need to be able to use these terms and concepts without having to constantly define them. Additional information — Transition statement — How do these components work together to provide high availability? © Copyright IBM Corp. Introduction to HACMP for AIX 1-21 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. and network adapters. Cluster A cluster is comprised basically of nodes. . 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. resource group. 
These objects are referred to as Topology objects.Instructor Guide A highly available cluster Fundamental Concepts clstrmgr clstrmgr ur c e Reso group Shared Storage Node Fallover Node Cluster is comprised of physical components (topology) and logical components (resource groups and resources). 1-22 HACMP Implementation © Copyright IBM Corp. and cluster manager (clstrmgr). 1998. These objects are referred to as Resource objects. A highly available cluster AU548. network address. and volume group using shared disks. © Copyright IBM Corporation 2008 Figure 1-9.0 Notes: Fundamental concepts HACMP is based on the fundamental concepts of cluster. Resource group A resource group is typically comprised of an application. networks. Introduction to HACMP for AIX 1-23 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Here is a simple diagram of a two-node cluster. The clstrmgr runs on all the nodes of the cluster. using shared disk. © Copyright IBM Corp.V4.0 Instructor Guide Uempty clstrmgr The cluster manager daemons together are the software components that communicate with each other to control on which node a resource group is activated or where the resource group is moved on a fallover based on parameters set up by the administrator. and providing fallover for a single application. 2008 Unit 1. 1998. . the client connection. and the fallover capability for the workload. 1998. Details — Briefly explain the concept of the cluster. 1-24 HACMP Implementation © Copyright IBM Corp. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. let’s have a look at the topology components. point out the nodes. Additional information — We cover the details of topology and resources on the next pages. Transition statement — So. .Instructor Guide Instructor notes: Purpose — Show a high-level example of a cluster running HACMP. Ethernet or token-ring network adapters). 
HACMP’s topology components (1 of 2) Notes: Topology components A cluster’s topology is the cluster. nodes and the technology that connects them together. nodes (pSeries servers). 1998. © Copyright IBM Corporation 2008 Figure 1-10. © Copyright IBM Corp. 2008 IP ork tw Ne -IP k on or N tw e N Communication Interface n io at ic un e m ic m Dev Co r ste Clu No de AU548. . Nodes In the context of HACMP. networks (connections between the nodes). Introduction to HACMP for AIX 1-25 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.0 Unit 1.V4.0 Instructor Guide Uempty HACMP's topology components (1 of 2) The Topology components consist of a cluster. the term node means any IBM pSeries system that is a member of a high-availability cluster running HACMP. the communication interfaces (for example. and the communication devices (/dev/rhdisk for heartbeat on disk or /dev/tty for RS232 for example). 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. . Networks can also be logical or physical.Instructor Guide Networks Networks consist of IP and non-IP networks. Non-IP networks are strongly recommended to be configured in an HACMP. 1-26 HACMP Implementation © Copyright IBM Corp. 1998. Logical networks have been used with the IBM SP environments when different frames were in different subnets but needed to be treated as if they were in the same network for HACMP purposes. The non-IP networks ensure that cluster monitoring can be done if there is a total loss of IP communication. network and communication interface in the context of HACMP. let’s take a look at the topology objects in more detail. 2008 Unit 1. 1998. © Copyright IBM Corp. Transition statement — Now. Details — Explain the concepts of node. 
Introduction to HACMP for AIX 1-27 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.0 Instructor Guide Uempty Instructor notes: Purpose — Outline the topology components of HACMP.V4. As of HACMP 5. a service IP label is no longer a topology component. Additional information — Ensure that you emphasize the fact that all these components will be covered in detail later in the course. . The internal Ethernet adapter fitted to most entry-level pSeries servers cannot be included in the calculations. Target-mode SCSI Fibre Channel RS/6000 Heartbeat on Disk RS232/422 Shared storage – – Physical • SCSI or Fibre Channel DS8000 RS/6000 DS4000 SAN IBM Virtual SCSI © Copyright IBM Corporation 2008 Fibre Figure 1-11. . everything. HACMP 5 works with pSeries servers in a “no-single-point-of-failure” server configuration. internal disk. and I/O slots. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. It should be noted that even with four adapter slots free. Any other adapters (for example. HACMP’s topology components (2 of 2) AU548. well. see the Sales manual at 1-28 HACMP Implementation © Copyright IBM Corp.Instructor Guide HACMP’s topology components (2 of 2) Node – – Any-to-any. RS-232. HACMP for AIX supports the System p models that are designed for server applications and that meet the minimum requirements for internal memory. graphics adapters) occupy additional slots.0 Notes: Supported nodes As you can see. 1998. the range of systems that supports HACMP is. For a current list of systems that are supported with the version of HACMP that you want to use. The only requirement is that the system should have at least four adapter slots spare (two for network adapters and two for disk adapters). there is still be a single point of failure as the cluster is able to accommodate only a single TCP/IP local area network between the nodes. 
including LPARs Minimum number of physical adapters for redundancy must be considered Ethernet / Etherchannel PC Server Server Server Networking – Ethernet • • Physical and virtual Etherchannel Non -IP Server – Non-IP • Heartbeat on disk. HACMP monitors and performs IP address switching for the following TCP/IP-based communications adapters on cluster nodes: Ethernet EtherChannel Token ring FDDI SP Switches ATM ATM LAN Emulation HACMP also supports non-IP networks.2 does not support micro channel systems.1 with HACMP 5. LPAR support There is also support for dynamically adding LPAR resources in AIX V5. The list of HACMP documents will display. Unsupported nodes and adapters With the introduction of AIX V5. the integrated serial ports are supported only for modem and async terminal connections. the integrated serial ports are not enabled when the HMC ports are connected to a Hardware Management Console.V4. the micro channel range of systems is excluded. On most IBM System p5 and IBM System i5 servers. and click Search. Either the HMC ports or the integrated serial ports can be used. Target Mode SSA (TMSSA).ibm.2 (and earlier) on micro channel systems.nsf/WebIndex/FLASH10390 for more details. and Heartbeat on Disk (using Enhanced Concurrent Mode Volume Groups).com/support/techdocs/atsmastr.2 or later LPAR environments to take advantage of Capacity Upgrade of Demand (CUoD). click Advanced Search. choose the HW & SW desc (Sales Manual. 1998. such as RS232/442. HACMP 5. Supported networks HACMP 5 supports client users on a LAN using TCP/IP. because AIX V5. but not both. See http://www-03. go to the bottom of the screen.0 Instructor Guide Uempty www. Consult the Sales Manual if you intend to use an integrated serial port.2 (and later) supports Virtual SCSI (VSCSI) and Virtual LAN (VLAN) on POWER5 (IBM System p5 and IBM System i5). require a separate serial port adapter to be installed in a PCI slot. you can consult the Sales Manual.com/common/ssi. 
It is highly recommended to have both IP and non-IP networks defined to HACMP. © Copyright IBM Corp. RPQ) option from the pull-down menu.2. Moreover. Any other applications using serial ports. Introduction to HACMP for AIX 1-29 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. you can still run AIX V5.ibm. Type hacmp in the Title: field. For a list of specific adapters. However. . Target Mode SCSI (TMSCSI). including HACMP. 2008 Unit 1. . but when using these facilities on an HACMP cluster node. 1-30 HACMP Implementation © Copyright IBM Corp. . ensure that they are configured on the subnets that are completely different from the subnets used by HACMP. such as EMC and Hitachi. the term adapter is used. For non-IP adapters. . although some custom modifications may be required. HACMP might not be able to properly detect failures and manage recovery. 1998. This ml0 interface is not supported by HACMP. can be used. VIPA can be configured and used outside of HACMP. This must be done either through the redundancy features of a storage device or through AIX LVM mirroring. For a complete list of supported devices. Most IBM storage is supported with HACMP. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.ibm. and third-party storage. Supported technologies include Fibre Channel and SCSI.com/systems/p/ha. Shared storage environments HACMP is largely unconcerned about the disk storage that you select. This is an ml0 interface with its own IP address. For IP networks. The failure of the underlying devices that are used to service the pseudo device cannot be coordinated with HACMP recovery processing. see the HACMP 5.Aggregate IP Interface with the SP Switch2 With the SP Switch2 you have css0 and css1. 
It is also important to note that data availability is not ensured by HACMP.IP V6 Adapters versus devices HACMP distinguishes between communication adapters and communication devices for network support. If any VIPA addresses are configured on the same subnet that is used for an HACMP network.Instructor Guide Unsupported networks The following networks are not supported: Serial Optical Channel Converter (SOCC) SLIP Fibre Channel Switch (FCS) 802_ether Virtual IP Address (VIPA) facility of AIX The pseudo IP address provided by VIPA cannot be reliably monitored by RSCT or HACMP. PSSP allows you to configure an Aggregate IP switch. the term communication device is used.1 Announcement Letter at: http://www.4. This will be discussed further in the networking unit of this course. Introduction to HACMP for AIX 1-31 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. The virtual resources are supported given that the prerequisites are followed as outlined in the student notes (and according to the most current published information). Details — Spend time on this slide. 1998. Additional information — Transition statement — Now move on to the Resource components.0 Instructor Guide Uempty Instructor notes: Purpose — To detail the topology components. © Copyright IBM Corp.V4. . Also point out that virtual SCSI is supported for storage access. covering the fact that etherchannel and virtual Ethernet are supported networking types. 2008 Unit 1. Resource groups will be covered in more detail in Unit 4. 1998. such as: 1-32 HACMP Implementation © Copyright IBM Corp. . HACMP's resource components ica pl Ap tio n er rv Se le Fi tem s Sy roup urce G Re s o s Node e Policies im Run-t ces ur Reso © Copyright IBM Corporation 2008 AU548. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 
Figure 1-12. HACMP's resource components AU548.0

Notes:

Resource group
A resource group is a collection of resources treated as a unit, along with the nodes on which they can potentially be activated and the policies that the cluster manager should use to decide which node to choose during startup, fallover, and fallback. A cluster can have more than one resource group (usually one for each application), thus allowing for very flexible configurations.

Resources
Resources are logical components that can be put into a resource group. Because they are logical components, they can be moved without human intervention. The resources shown in the visual are a typical set of resources used in resource groups:
- Service IP Address - Users need to be able to connect to the application. Typically, the users are given an IP address or hostname to connect to. This IP address/hostname becomes a resource in the resource group because it must be associated with the same node that is running the application. The IP address/hostname resource is referred to as the Service IP Label in the resource group. More than one Service IP Label may be configured for a resource group.
- Volume Group - If the application requires shared disk storage, this storage is contained within volume groups.
- Filesystem - An application often requires that certain filesystems be mounted.
- Application Server - The application itself must be part of the resource group (strictly speaking, the application server actually consists of scripts which start and stop the application as required by HACMP). The use of Application Server can be confusing because this term is used popularly by application vendors to describe a layer in their implementation. The use of the term in HACMP describes the start/stop methods (scripts) for the application. It is an object that points to the start/stop methods.

In addition to the resources listed in the figure, you can also associate with a resource group the following:
- NFS mounts - An application might require that an NFS filesystem be mounted by the node running the application.
- NFS exports - A resource group might be configured to provide NFS server services by NFS exporting some of its filesystems.

Finally, attributes, such as “Force vary on of volume groups,” can be assigned. These will be covered later in this course in Unit 6.

Instructor notes:
Purpose — Introduce the concept of resources and resource groups.
Details — Explain that resources are logical components that can be moved from one node to another without manual intervention. Explain that resources are grouped together for administrative purposes into resource groups.
Additional information — The difference between topology and resources is that the topology components are physical, that is, nodes, networks, and network adapters, which would require manual intervention to move from one place to another. In contrast, resources are logical entities, which can be moved from node to node without human intervention.
Transition statement — After understanding the components, what is the internal structure at a high-level?

What is HACMP?
An application which:
– Controls where resource groups run
– Monitors and reacts to events
– Provides tools for cluster-wide configuration and synchronization
– Relies on other AIX subsystems (ODM, LVM, RSCT, SRC, TCP/IP, and so on)
Diagram labels: Cluster Manager Subsystem (clstrmgrES), clcomdES, Topology manager, Resource manager, Event manager, SNMP manager, RSCT (topsvcs, grpsvcs, RMC subsystems), snmpd, clinfoES, clstat
Figure 1-13. What is HACMP? AU548.0

Notes:

HACMP core components
HACMP comprises a number of software components:
- The cluster manager clstrmgr is the core process that monitors cluster membership. The cluster manager includes a topology manager to manage the topology components, a resource manager to manage resource groups, an event manager with event scripts that works through the RMC facility, and RSCT to react to failures.
- In HACMP 5.3 and later, the cluster manager contains the SNMP SMUX Peer function (previously provided by the clsmuxpd) for the cluster manager MIBs, which allows for SNMP-based monitoring to be done manually or by using an SNMP manager, such as Tivoli, BMC, or OpenView.
- The clinfo process provides an API for communicating between cluster manager and your application. Clinfo also provides remote monitoring capabilities and can run a script in response to a status change in the cluster. Clinfo is an optional process that can run on both servers and clients (the source code is provided). The clstat command uses clinfo to display status via ASCII, Xwindow, or Web browser interfaces.
- In HACMP 5, clcomdES allows the cluster managers to communicate in a secure manner without using rsh and the /.rhosts files.

Instructor notes:
Purpose — Examine the software components of HACMP.
Details — Briefly outline each of the components and the role each component performs.
Additional information — ClsmuxpdES was moved into the clstrmgr in HACMP 5.3.
Transition statement — But, HACMP has many additional features.

Additional features of HACMP
Diagram labels: OLPW, smit via web, Configuration Assistant, CSPOC, DARE, clstrmgrES, SNMP, Verification, Auto tests, Tivoli Integration, Application Monitoring
HACMP is shipped with utilities to simplify configuration, monitoring, customization, and cluster administration.
Figure 1-14. Additional features of HACMP AU548.0

Notes:

Additional features
HACMP also has additional software to provide facilities for administration, testing, remote monitoring, and verification:
- Application monitoring should be used to monitor the cluster’s applications and restart them should they fail. Multiple monitors can be defined for an application, including monitoring the startup.
- Configuration changes can be made to the cluster while the cluster is running. This facility is known as Dynamic Automatic Reconfiguration Event (or DARE for short).
- C-SPOC is a series of SMIT menus that allow AIX-related cluster tasks to be propagated across all nodes in the cluster. It includes an RG_Move facility, which allows a resource group to be placed offline or on another node without stopping the cluster manager.
- Administration is made easier by the use of Online Planning Worksheets (OLPW) and a Web-based SMIT interface.
- A two-node configuration assist facility enables you to configure an HACMP cluster with very little input.
- Verification is provided at HACMP startup time, as part of synchronization, as a manual process, and as a daily Automatic Cluster Configuration Monitoring function.
- There is an automatic correction facility.
Some of these facilities will be covered in more detail in the HACMP Administration II: Administration and Problem Determination course.

Instructor notes:
Purpose — Examine some of the other components of the HACMP software.
Details — Outline each component and the role that it performs.
Additional information — Note that Automatic Cluster Configuration Monitoring is usually called automatic cluster verification and that it requires that clcomdES is active and that the /usr/es/sbin/cluster/etc/rhosts file is present and has something in it.
Transition statement — HACMP can be customized to achieve availability goals for your particular environment.

Some assembly required
HACMP can be used out of the box; however, some assembly is required.
– Minimum:
  • Application Start/Stop/Monitor scripts
– Optional:
  • Customized pre/post event scripts
  • Reaction to events
    – Error notification methods
    – User Defined Events (UDEs)
    – Cluster State Change
HACMP's flexibility allows for complex customization in order to meet availability goals.
Figure 1-15. Some Assembly Required AU548.0

Notes:

Not just HACMP
The final high-availability solution is more than just HACMP. A high-availability solution comprises a reliable operating system (AIX), applications that are tested to work in a high-availability cluster, storage devices, appropriate selection of hardware, trained administrators, and thorough design and planning.

Customization required
HACMP is shipped with event scripts (Korn Shell scripts) which handle the failure scenarios. Application Server start/stop scripts are written to control the application(s) based on the status of the cluster nodes. Most often, all the script writing that is required to integrate an application into the cluster is done in the Application Server start/stop scripts. In the rare circumstance where you have a requirement to customize some special fallover behavior, this is done with pre- and post-event scripts.

Smart Assists are provided in HACMP (since HACMP 5.2) to help ease the customization for the applications that they address. In HACMP 5.4 and later, an API is provided that allows third-party application vendors to write Smart Assists.

Instructor notes:
Purpose — Explain that HACMP can be customized.
Details — Point out that HACMP can be customized through the provision of pre- and post-event scripts. Each HACMP event script (shell script) can have a customer defined pre-event (one or more) and a customer defined post-event (one or more).
Additional information — This capability is explained in detail later in the course.
Transition statement — OK, it’s time to review what we have been doing in this topic.

Let’s review
1. Which of the following is a characteristic of high availability?
   a. High availability always requires specially designed hardware components.
   b. High availability solutions always require manual intervention to ensure recovery following fallover.
   c. High availability solutions never require customization.
   d. High availability solutions use redundant standard equipment (no specialized hardware).
2. True or False? A thorough design and detailed planning is required for all high availability solutions.
3. Which of the following items are examples of topology components in HACMP? (Select all that apply.)
   a. Node
   b. Network
   c. Service IP label
   d. Hard disk drive
4. True or False? All nodes in an HACMP cluster must have roughly equivalent performance characteristics.
Figure 1-16. Let’s review AU548.0

Notes:

Instructor notes:
Purpose —
Details —
Additional information —
Transition statement — Now that we have looked at High Availability in general, let’s take a closer look at how HACMP for AIX works.

1.2 What does HACMP do?

What does HACMP do?
After completing this topic, you should be able to:
- Describe the failures that HACMP detects directly
- Provide an overview of the standby and takeover cluster configuration options in HACMP
- Describe some of the considerations and limits of an HACMP cluster
Figure 1-17. Topic 2 objectives: What does HACMP do? AU548.0

Notes:
In this topic, we take a look at what HACMP does.

Instructor notes:
Purpose — Introduce the topic.
Details —
Additional information —
Transition statement — So what is HACMP’s fundamental function?

Just what does HACMP do?
HACMP functions:
– Monitors the states of nodes, networks, network adapters and devices
– Strives to keep resource groups highly available
– Optionally, monitors the state of the applications, and can be customized to react to every possible failure
Figure 1-18. Just What Does HACMP Do? AU548.0

Notes:

HACMP basic functions
HACMP detects three kinds of network related failures:
a. A communications adapter or device failure
b. A node failure (all communication adapters/devices on a given node)
c. A network failure (all communication adapters/devices on a given network)

HACMP also interfaces to the AIX error log to respond to the loss of quorum for a volume group when the loss is detected by the LVM. Most other failures are handled outside of HACMP, either by AIX or LVM, and can be handled in HACMP via customization. Disk failure is handled through LVM mirroring or RAID.

Instructor notes:
Purpose — Explain the components that HACMP monitors.
Details — Point out that HACMP monitors only nodes, networks, and network adapters. Other components, especially disk adapters and disk buses, are monitored by AIX.
Additional information —
Transition statement — What happens when something fails?

What happens when something fails?
How the cluster responds to a failure depends on what has failed, what the resource group's fallover policy is, and if there are any resource group dependencies:
– Typically, another equivalent component takes over duties of failed component (for example, another node takes over from a failed node).
Figure 1-19. What happens when something fails? AU548.0

Notes:

How HACMP responds to a failure
HACMP generally responds to a failure by using a still available component to take over the duties of the failed component. For example, if a node fails, then HACMP initiates a fallover, an action that consists of moving the resource groups that were previously on the failed node to a surviving node. If a Network Interface Card (NIC) fails, HACMP usually moves any IP addresses being used by clients to another available NIC. If there are no remaining available NICs, HACMP initiates a fallover. If only one resource group is affected, then only the one resource group is moved to another node.

Instructor notes:
Purpose — Explain, in general terms, what happens when a failure occurs.
Details — Keep the discussion fairly general as the details will follow.
Additional information —
Transition statement — What happens when a failed component recovers?

What happens when a problem is fixed?
How the cluster responds to the recovery of a failed component depends on what has recovered, what the resource group's fallback policy is, and the resource group dependencies:
– Typically, administrators need to indicate or confirm that the fixed component is approved for use (starting cluster services, and possibly moving the resource group).
– Some components are integrated automatically, for instance, when a communication interface recovers.
Figure 1-20. What happens when a problem is fixed? AU548.0

Notes:

How HACMP responds to a recovery
When a previously failed component recovers, it must be reintegrated back into the cluster (reintegration is the process of HACMP recognizing that the component is available for use again). Some components, such as NICs, are automatically reintegrated when they recover. Other components, such as nodes, cannot be reintegrated until the cluster administrator explicitly requests the reintegration (by starting the HACMP daemons on the recovered node, or bringing it online).

Instructor notes:
Purpose — Explain, in general terms, what happens when a previously failed component recovers.
Details —
Additional information —
Transition statement — Let’s take a look at how a typical two-node cluster with a single application would probably be configured using a standby configuration.
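The failure and recovery behavior described in the two sets of notes above can be pictured with a short sketch. The Python below is illustrative only and is not HACMP code: the `Nic` and `Node` classes, the function names, the interface names (`en0`, `en1`), and the address are all hypothetical, and the sketch ignores the resource group policies and dependencies that the real cluster manager takes into account.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    name: str                      # hypothetical interface name
    up: bool = True
    service_ips: list = field(default_factory=list)


@dataclass
class Node:
    name: str
    nics: list


def handle_nic_failure(node, failed_name):
    """Mark one NIC as failed and return the action taken."""
    failed = next(n for n in node.nics if n.name == failed_name)
    failed.up = False
    spare = next((n for n in node.nics if n.up), None)
    if spare is None:
        # No NIC is left on this node, so the resource groups must move.
        return "fallover"
    # Another NIC is available: move the client-facing addresses to it.
    spare.service_ips.extend(failed.service_ips)
    failed.service_ips.clear()
    return f"moved service IPs to {spare.name}"


def handle_recovery(kind, admin_started_services=False):
    """NICs rejoin on their own; nodes wait for the administrator."""
    if kind == "nic":
        return "reintegrated automatically"
    if kind == "node":
        if admin_started_services:
            return "reintegrated"
        return "waiting for administrator"
    raise ValueError(f"unknown component kind: {kind}")


node = Node("usa", [Nic("en0", service_ips=["10.0.0.10"]), Nic("en1")])
print(handle_nic_failure(node, "en0"))   # moved service IPs to en1
print(handle_nic_failure(node, "en1"))   # fallover
print(handle_recovery("node"))           # waiting for administrator
```

The point of the sketch is the ordering of the decisions: a local swap to another NIC is tried first, and a fallover is the response only when nothing on the node can take over.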
Standby (active/passive) with fallback
Diagram: resource group A on nodes USA and UK through four transitions (Node USA fails, USA returns, Node UK fails with no change, UK returns)
One node is primary
RG can be configured to come online on the primary or any node
Figure 1-21. Standby (active/passive) with fallback AU548.0

Notes:

Standby
Standby configurations are configurations where one (or more) nodes have no workload.

Standby node with one node primary
In a two-node cluster, there is a single application (that is, resource group), which must run as much as possible on a primary or home node, and the node with no workload is the secondary, standby, or back-up node. To accomplish this, there would be a start-up policy to indicate which node is primary (or home), a fallover policy to allow fallover if the primary node fails, and a fallback policy set so that the resource group automatically falls back to the primary node when the primary node recovers.

Drawbacks
- One node is not used (this is ideal for availability but not from a utilization perspective).
- A second outage on the fallback is possible.

Extending this concept to more nodes
This concept can be extended to multiple nodes in two ways:
i. All nodes, except one, have applications, and the one node is a standby node. This could lead to performance problems if more than one application must be moved to the standby node.
ii. Multiple layers of back-up nodes are possible; the fallover policy determines which node is used. The resource group could be configured to have multiple layers of back-up nodes. The resource group would usually be configured to run on the highest priority (most preferred) available node. For example: primary -> secondary -> tertiary -> quaternary -> quinary -> senary -> septenary -> octonary -> nonary -> denary.

A tidbit for the wordsmiths in the audience: The sequence, which starts primary, secondary, and tertiary, continues with quaternary, quinary, senary, septenary, octonary, nonary, and denary. There is no generally accepted word for eleventh order although duodenary means twelfth order. The word for twentieth order is vigenary.

Instructor notes:
Purpose — Describe a typical two-node cluster with a single application.
Details — You might want to set the stage for the next visual by pointing out that the fallback after the primary node recovers results in a second outage from the users’ perspective (who are unlikely to be impressed by the notion that this second outage of the day is actually good news indicating that things are back to normal). Do not go into what all the policies are on these visuals. At this point, the policy is just a summary of the behaviors being described.
Additional information — Point out that one reason for using this configuration could be that one node is much more capable (better performance) than the standby node.
Transition statement — What if we want to avoid the fallback outage?

Standby (active/passive) without fallback
Diagram: resource group A on nodes USA and UK through four transitions (USA fails, USA returns, UK fails, UK returns)
Eliminates another outage
Reduces downtime
Figure 1-22. Standby (active/passive) without fallback AU548.0

Notes:

Minimize downtime
A resource group can be configured to not fall back to the primary node (or any other higher priority node) when it recovers. This avoids the second outage, which results when the fallback occurs.

The cluster administrator can request that HACMP move the resource group back to the higher priority node at an appropriate time or it can simply be left on its current node indefinitely (an approach that calls into question the terms primary and secondary, but which is actually quite a reasonable approach in many situations).

Extending to more nodes
This can result in multiple applications ending up on the node that stays up the longest.

Instructor notes:
Purpose — Explain how the cluster administrator might avoid the fallback outage.
Details —
Additional information —
Transition statement — Now, we look at takeover configurations. Most two-node clusters actually have two applications.

Mutual takeover: Active/Active AU548.0

Notes:

Takeover
Takeover configurations imply that there is workload on all nodes, which might or might not be under the control of HACMP, but that a node can take over the work of another node in the cluster.

Mutual takeover
An extension of the primary node with a secondary node configuration is to have two resource groups, one failing from right to left and the other failing from left to right. This is referred to as mutual takeover.

Mutual takeover configurations are very popular configurations for HACMP because they support two highly available applications at a cost which is not that much more than would be required to run the two applications in separate stand-alone configurations.
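The standby and mutual takeover behaviors described above come down to one placement rule per resource group plus a choice about fallback. The following Python sketch is a deliberately simplified model, not the HACMP policy engine: the `ResourceGroup` class, the `place` function, and the boolean `fallback` flag are assumptions made for illustration, and real resource group policies offer more options than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceGroup:
    name: str
    nodes: list                    # priority order, home node first
    fallback: bool                 # True: return to a higher priority node when it recovers
    current: Optional[str] = None  # node currently hosting the group


def place(rg, up_nodes):
    """Decide where rg should run, given the set of nodes that are up."""
    candidates = [n for n in rg.nodes if n in up_nodes]
    if not candidates:
        rg.current = None                  # no node can host the group
    elif rg.current not in up_nodes:
        rg.current = candidates[0]         # startup or fallover
    elif rg.fallback:
        rg.current = candidates[0]         # fall back to the preferred node
    return rg.current


# Mutual takeover: two groups with opposite home nodes.
a = ResourceGroup("A", ["usa", "uk"], fallback=True)
b = ResourceGroup("B", ["uk", "usa"], fallback=False)

timeline = [
    ("start", {"usa", "uk"}),
    ("usa fails", {"uk"}),
    ("usa returns", {"usa", "uk"}),
    ("uk fails", {"usa"}),
    ("uk returns", {"usa", "uk"}),
]
for label, up in timeline:
    print(f"{label:12} A on {place(a, up)}, B on {place(b, up)}")
```

In this run group A, which has fallback enabled, moves back to `usa` as soon as `usa` returns (the second outage mentioned in the notes), while group B, which has fallback disabled, stays on `usa` after `uk` returns until an administrator chooses to move it.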
Mutual takeover: Active/Active
Diagram: resource groups A and B on nodes USA and UK through four transitions (USA fails, USA returns with fallback, UK fails, UK returns with fallback)
Very common
No one node/LPAR is left idle
Figure 1-23. Mutual takeover: Active/Active

Additional costs
Note that there are at least a few additional costs:
- HACMP for AIX license fees.
- Each cluster node probably needs to be somewhat larger than the stand-alone nodes because they must each be capable of running both applications, possibly in a slightly degraded mode, should one of the nodes fail.
- Additional software licenses might be required for the applications when they run on their respective back-up nodes (this is a potentially significant cost item, which is often forgotten in the early cluster planning stages).
This is not intended to be an all inclusive list of additional costs.

Instructor notes:
Purpose — Explain mutual takeover.
Details — Point out that mutual takeover is simply two resource groups shared between two nodes, each with a different highest priority node.
Additional information —
Transition statement — Some clusters have applications that are active simultaneously on multiple nodes.

Concurrent: Multiple active nodes
Diagram: Application A running on nodes USA, Germany, and UK
USA, Germany, and UK are all running Application A, each using a separate IP address.
If nodes fail, the application remains continuously available as long as there are surviving nodes to run on.
Fixed nodes resume running their copy of the application.
Application must be designed to run simultaneously on multiple nodes.
This has the potential for essentially zero downtime.
Figure 1-24. Concurrent: multiple active nodes AU548.0

Notes:

Concurrent mode
HACMP also supports resource groups in which the application is active on multiple nodes simultaneously. In such a resource group, all nodes run a copy of the application and share simultaneous access to the disk. This style of cluster is often referred to as a concurrent access cluster or concurrent access environment.

Service labels
Since the application is active on multiple nodes, each node has its own service IP label. The client systems must be configured to randomly (or otherwise) select which service IP address to communicate with, and be prepared to switch to another service IP address should the one that they’re dealing with stop functioning (presumably, because the node with the service IP address has failed).

It is also possible to configure an IP multiplexer between the clients and the cluster which redistributes the client sessions to the cluster nodes, although care must be taken to ensure that the IP multiplexer does not itself become a single point of failure.

How to choose
Whether this mode of operation can be used for your application is a function of the application, not of HACMP.

Instructor notes:
Purpose — Explain concurrent access clusters.
Details — Point out that in a concurrent access cluster, all nodes run the same application at the same point in time, accessing the same data on shared disk.
Additional information — Be prepared to discuss some of the ways that client systems can select which server to use and switch to another server when their server fails (see student notes).
Transition statement — As we consider the implementation details, what should we be thinking about?

Points to ponder
Resource groups:
– Must be serviced by at least two nodes
– Can have different policies
– Can be migrated (manually or automatically) to rebalance loads
Clusters:
– Must have at least one IP network and one non-IP network
– Need not have any shared storage
– Can have any combination of supported nodes *
– Can be split across two sites
  • Might or might not require replicating data (HACMP/XD)
Applications:
– Can be restarted via monitoring
– Must be manageable via scripts (start/restart and stop)
* Application performance requirements and other operational issues almost certainly impose practical constraints on the size and complexity of a given cluster.
Figure 1-25. Points to ponder AU548.0

Notes:

Importance of planning
Planning, designing, configuring, testing, and operating a successful HACMP cluster requires considerable attention to detail. In fact, a careful methodical approach to all the phases of the cluster’s life-cycle is probably the most important factor in determining the ultimate success of the cluster.

Methodical approach
A careful methodical approach considers the relevant points above, and many other issues that are discussed this week or that are discussed in the HACMP documentation.
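Two of the points above are hard requirements that can be checked mechanically during planning: every resource group must be serviced by at least two nodes, and the cluster must have at least one IP network and one non-IP network. The Python sketch below shows one way a planning worksheet could be checked against them. It is not an HACMP tool; the `check_cluster` function, its dictionary inputs, and the network names in the example are hypothetical.

```python
def check_cluster(resource_groups, networks):
    """Return a list of planning problems; an empty list means none were found.

    resource_groups: dict mapping a group name to the list of nodes that can host it
    networks: dict mapping a network name to either "ip" or "non-ip"
    """
    problems = []
    for name, nodes in resource_groups.items():
        if len(set(nodes)) < 2:
            problems.append(
                f"resource group {name} is serviced by fewer than two nodes"
            )
    kinds = set(networks.values())
    if "ip" not in kinds:
        problems.append("cluster has no IP network")
    if "non-ip" not in kinds:
        problems.append("cluster has no non-IP network")
    return problems


plan_ok = check_cluster(
    {"A": ["usa", "uk"], "B": ["uk", "usa"]},
    {"net_ether_01": "ip", "net_rs232_01": "non-ip"},   # hypothetical names
)
plan_bad = check_cluster({"A": ["usa"]}, {"net_ether_01": "ip"})
print(plan_ok)    # []
print(plan_bad)
```

A check like this covers only the countable rules. The remaining points (policies, sizing, site layout, script quality) still need the methodical review that the notes describe.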
Instructor notes:
Purpose — List some of the many issues that must be considered when planning an HACMP cluster.
Details —
Additional information —
Transition statement — Along those same lines, what other very important things should be considered?

Other considerations for HACMP

Design, planning, testing
Focus on service and availability
Apply appropriate risk analysis
Disciplined system administration practices
– Documented operational procedures

Diagram labels: Continuous availability, High availability, Continuous operation, Systems Management, People, Data, Networking, Hardware, Environment, Software

Figure 1-26. Other considerations for HACMP

Notes:

Design, planning, testing
Design, planning, and testing are all critical steps that cannot be skipped when implementing a high-availability solution. As you'll learn this week, there should be no shortage of time spent designing, planning, and documenting your proposed cluster solution. Time well spent in these areas of the project reduces the amount of unneeded administration time required to manage your cluster solution. Unfortunately, it's too often the case that there isn't enough time to do it right first time, but always time enough to do it over when things go wrong.

Focus on service and availability
Focus on making the service highly available and view the hardware and software as the tools that you use in accomplishing this goal. Users are not interested in highly available hardware or software. They are interested in availability of services. So hardware and software should be used to make the services highly available. Cluster design decisions should be based on whether they contribute to availability (that is, eliminate a SPOF) or detract from availability (gratuitously complex). Remember the reason why we worry about node failures and disk failures and such is not because we are particularly concerned with their actual failure, but rather we are concerned with the impact that their failure might have.

Apply appropriate risk analysis
Because it is probably not possible to fix all SPOFs, the risk analysis process can be used for deciding if a defensive measure is warranted. The process can be applied to identify those that must be dealt with as well as those that can be tolerated. Risk analysis involves the following steps:
a. Identify relevant policies. What existing risk tolerance policies are available?
b. Study current environment. An example would be that the server room is on a properly sized UPS but there is no disk mirroring today.
c. Perform requirements analysis. How much availability is required and what is the acceptable likelihood of a long outage?
d. Hypothesize vulnerabilities. What could go wrong?
e. Identify and quantify risks. Estimate the cost of a failure versus the probability that it occurs.
f. Evaluate counter measures. What does it take to reduce the risk or consequence to an acceptable level?
g. Make decisions, create a budget, and plan the cluster.

Do not be fooled by the apparent determinism (that is, the formula that always seems to come up with an answer) of risk analysis:
• It simply is not possible to predict all the possible or even likely vulnerabilities.
• Estimating the likelihood of a vulnerability occurring can be extremely difficult.
• Some vulnerabilities do not lend themselves to any sort of quantifiable analysis. For example, if there is a genuine risk that someone could die, then the cost of this sort of failure would be irrelevant in any meaningful sense.

Finally, do not get trapped into a mode of thinking in which all conceivable risk of outages must be eliminated. Such a goal is, in general, simply impossible to attain with any technology.

Disciplined system administration practice
In a cluster environment, it is very easy to do commands that interfere with availability software or to not propagate changes or to have a person take over that does not understand the cluster environment. So, discipline and documentation are required.

Instructor notes:
Purpose — Explain that thorough design, planning, documentation, and testing are all critical to successful implementation.
Details — Emphasize to the students that these considerations are absolutely critical to successful implementation and cannot be skipped. Remember the carpenter's saying "mark twice, cut once."
Additional information — Presenting this perspective in an early-stage cluster planning session tends to get everyone oriented in the same general direction. Many students enter this course with the idea that they are trying to make their server highly available. This notion leads to some rather convoluted language and circular reasoning. (I am adding a second server to make the first server highly available but now how do I make the second server highly available?) Users really are not interested in highly available hardware. Focus on making the service highly available. Unfortunately, administrators tend to spend a great deal of their time trying to keep particular servers running, and can forget that the reason they do that is to keep the service running on the server available.
Transition statement — Let's focus for a minute on things that HACMP does not do.

Things HACMP does not do

Back-up and restoration
Time synchronization
Application specific configuration
System administration tasks unique to each node

Figure 1-27. Things HACMP does not do

Notes:

Things HACMP does not do
HACMP does not automate your back-ups, neither does it keep time in sync between the cluster nodes. These tasks do require further configuration and software, for example, Tivoli Storage Manager for back-up and a time protocol, such as xntp for time synchronization.

Instructor notes:
Purpose — Outline some of the administrative tasks that HACMP does not help with.
Details — Briefly list a few administrative tasks that HACMP does not help with, including back-ups, time synchronization, and configuration of application-specific requirements.
Additional information —
Transition statement — There are certain circumstances under which HACMP is not the right solution.

When is HACMP not the correct solution?

Zero downtime required
– Maybe a fault tolerant system is the correct choice.
– Availability 7x24x365.
– Life-critical environments.
Security issues
– Too little security
  • Many people can change the environment.
– Too much security
  • C2 and B1 environments might not allow HACMP to function as designed.
Unstable environments
– HACMP cannot make an unstable and poorly managed environment stable.
– HACMP tends to reduce the availability of poorly managed systems.

Figure 1-28. When is HACMP not the correct solution?

Notes:

Zero downtime
An example of zero down time is the intensive care room. Also HACMP is not designed to handle many failures at once. HACMP occasionally needs to be shut down for maintenance.

Security issues
One security issue that is now addressed is the need to eliminate .rhosts files. Also there is better encryption possible with inter node communications, but this might not be enough for some security environments.

Unstable environments
The prime cause of problems with HACMP is poor design, planning, implementation, and administration. If you have an unstable environment, with poorly trained administrators, easy access to the root password, and a lack of change control, HACMP is not the solution for you. With HACMP, the only thing more expensive than employing a professional to plan, design, install, configure, customize, and administer the cluster is employing an amateur.

Other characteristics of poorly managed systems are:
• Lack of change control
• Failure to treat cluster as single entity
• Too many cooks
• Lack of documented operational procedures
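Returning to the risk analysis notes under "Other considerations for HACMP": the step "estimate the cost of a failure versus the probability that it occurs" can be illustrated with simple arithmetic. The Python sketch below ranks a few hypothetical vulnerabilities by expected annual loss (probability per year multiplied by cost per failure) and compares that with the yearly cost of a counter measure. Every name and figure is invented for illustration, and, as the notes caution, such numbers look more deterministic than they are.

```python
def expected_annual_loss(probability_per_year, cost_per_failure):
    """Expected loss per year = likelihood of the failure x cost when it happens."""
    return probability_per_year * cost_per_failure


# Hypothetical figures, for illustration only.
# (name, probability per year, cost per failure, counter measure cost per year)
vulnerabilities = [
    ("unmirrored disk fails", 0.05, 200_000, 4_000),
    ("single NIC fails", 0.10, 20_000, 500),
    ("power outage outlasts the UPS", 0.01, 50_000, 30_000),
]

for name, probability, cost, counter_measure in vulnerabilities:
    loss = expected_annual_loss(probability, cost)
    # A counter measure is easier to justify when it costs less than the loss it avoids.
    verdict = "warranted" if loss > counter_measure else "may be tolerated"
    print(f"{name}: expected loss {loss:.0f}/year, "
          f"counter measure {counter_measure}/year -> {verdict}")
```

The comparison is deliberately crude. It ignores vulnerabilities that cannot be priced at all, such as a risk to life, which the notes single out as beyond this kind of analysis.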
Instructor notes:
Purpose — Explain the circumstances when HACMP is not the right solution for you.
Details — Emphasize that training is an exceptionally important requirement for the staff who will be supporting the HACMP environment.
Additional information —
Transition statement — Let's start to look now at what we will do this week.

What do we plan to achieve this week?

Your mission this week is to build a two-node mutual takeover highly available cluster using two previously separate AIX systems, each of which has an application which needs to be made highly available.

Figure 1-29. What do we plan to achieve this week?

Notes:

Goals
During this week you will design, plan, configure, customize, and administer a two-node high-availability cluster running HACMP 5.4.1 on an AIX system. You will learn how to build a standby environment for one application as well as a mutual takeover environment for two applications. In the mutual takeover environment, each system will eventually be running its own highly available application, and providing fallover back-up for the other system.

Some classroom environments will involve creating the cluster on a single pSeries system between two LPARs. Although this is not a recommended configuration for production, it provides the necessary components for a fruitful HACMP configuration experience.

Instructor notes:
Purpose — Introduce the case study scenario for this week.
Details — No need to dwell on this page, just emphasize the fact that each team builds a two-node cluster, with each node providing fallover back-up for the other. This page is just designed to set the scene. The scenario is covered in more detail later on.
Additional information —
Transition statement — OK, let's start to look at an overview of the planning and implementation process.

Overview of the implementation process

Plan and configure AIX
– Elimination of single points of failure
– Storage (adapters, LVM volume group, filesystem)
– Networks (IP interfaces, /etc/hosts, non-IP networks, and devices)
– Application start and stop scripts
Install the HACMP filesets (Note: 5.3 and earlier reboot!)
Configure the HACMP environment
– Topology
  • Cluster, node names, HACMP IP and non-IP networks
– Resources and Resource groups:
  • Identify name, nodes, policies
  • Resources: Application Server, service label, VG, filesystem
– Synchronize, then start HACMP
– Note: If using two nodes and one application "Configure the HACMP environment" can be done in one step.

Figure 1-30. Overview of the implementation process

Notes:

Implementation process
The process should include at least the following:
• Work as a team. It cannot be stressed enough that it will be necessary to work with others when you build your HACMP cluster in your own environment. Practice here will be useful.
• Look at the AIX environment.
  - For storage, plan for adapters and LVM components required for application.
  - For networks, plan for communication interfaces, devices, name resolution via /etc/hosts and service address for the application.
  - For application, build start and stop script and test outside of the control of HACMP.
• Install the HACMP for AIX software and reboot.
• Configure the topology and resource groups (and resources).
• Synchronize, start, and test.

Instructor notes:
Purpose — Go over the implementation process.
Details —
Additional information — Point out that each item in the list will be covered as a unit in this course in the order listed.
Transition statement — So, how do we get started?

Hints to get started

• Draw a diagram.
• Use (online) planning sheets.
• Focus on eliminating SPOFs.
• Always factor in a non-IP network.
• Ensure that you have multipath access to shared storage devices.
• Document test plan.
• Be methodical.

The visual is a sample planning diagram titled "HACMP Cluster for the ABC company". Its annotations include:
– Networks: a public network to the user community, a tmssa network (a_tmssa on /dev/tmssa1, b_tmssa on /dev/tmssa2), and a serial network (a_tty on /dev/tty1)
– Node nodea: resource group dbrg, application database, resources cascading, CWOF = yes; IP labels database (service), nodeaboot (boot), nodeastand (standby)
– Node nodeb: resource group httprg, application http, resources cascading, CWOF = yes; IP labels webserv (service), nodebboot (boot), nodebstand (standby)
– Resource Group httprg contains: Volume Group = httpvg (Raid1 9GB), Major # = 50, JFS Log = httplvlog, Logical Volume = httplv, FS Mount Point = /http
– Resource Group databaserg contains: Volume Group = dbvg (Raid5 100GB), Major # = 51, JFS Log = dblvlog, Logical Volume = dblv1, dblv2, FS Mount Point = /db, /dbdata
– Each node has rootvg on raid1, 9.1GB; the shared disks shown are hdisk2 through hdisk8

Figure 1-31. Hints to get started

Notes:

Hint
• Create a cluster diagram: a picture is worth 10 thousand words (because of inflation, a thousand is not enough!).
• Use the Online Planning Worksheets. They can be used without installing HACMP and can be used to generate AND save HACMP configurations.
• Try to reduce SPOFs.
• Always include a non-IP network.
• Access storage over multiple paths or mirror across power and buses.
• Document a test plan.
• Be methodical.
• Test the cluster carefully. HACMP also provides test scripts called auto test.
• Execute the test plan prior to placing the cluster into production!

Instructor notes:
Purpose — Go over planning activities.
Details —
Additional information — Online planning worksheets can now be used to configure HACMP and to save HACMP configuration beyond initial planning in HACMP 5.3 and later.
Transition statement — What are other sources of information?

Sources of HACMP information

HACMP manuals come with the product
– cluster.doc.en_US.es.html
– cluster.doc.en_US.es.pdf
– /usr/es/sbin/cluster/release_notes
– HACMP documentation also available online: http://www-03.ibm.com/servers/eserver/pseries/library/hacmp_docs.html
Release Notes contain important information about the version release
Sales manual: http://www.ibm.com/common/ssi
IBM courses:
– HACMP Admin. I: Planning and Implementation (AU540/AU54)
– HACMP Admin II: Admin. and Problem Determination (AU610/AU61)
– HACMP Administration III: Virtualization and Disaster Recovery (AU620/AU62)
– HACMP V5 Internals (AU60)
IBM Web site:
– http://www-03.ibm.com/systems/p/ha/
Non-IBM sources (not endorsed by IBM but probably worth a look):
– http://lpar.co.uk
– http://portal.explico.de/
– http://www.matilda.com/hacmp/
– http://groups.yahoo.com/group/hacmp/

Figure 1-32. Sources of HACMP information

Notes:

Manuals on CD
The HACMP 5.4.1 manuals are:
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10 HACMP for AIX, Version 5.4.1: Planning Guide
SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4862-10 HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04 HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09 HACMP for AIX, Version 5.4.1: Master Glossary

Additional Web sites for storage
http://www-1.ibm.com/servers/storage/support/software/sdd.html
ftp://ftp.software.ibm.com/storage/fastt/fastt500/HACMP_config_info.pdf

Instructor notes:
Purpose — Go over sources of information.
Details —
Additional information —
Transition statement — Well now it's time to see how we did.

Checkpoint

1. True or False? Resource Groups can be moved from node to node.
2. True or False? HACMP/XD is a complete solution for building geographically distributed clusters.
3. Which of the following capabilities does HACMP not provide? (Select all that apply.)
   a. Time synchronization
   b. Automatic recovery from node and network adapter failure
   c. System administration tasks unique to each node, back-up and restoration
   d. Fallover of just a single resource group
4. True or False? All nodes in a resource group must have equivalent performance characteristics.

Figure 1-33. Checkpoint

Instructor notes:
Purpose —
Details — Checkpoint solutions
Additional information —
Transition statement — Time for the unit summary.
Unit summary

Having completed this unit, you should be able to:
• Define high availability and explain why it is needed
• Outline the various options for implementing high availability
• List the key considerations when designing and implementing a high availability cluster
• Outline the features and benefits of HACMP for AIX
• Describe the components of an HACMP for AIX cluster
• Explain how HACMP for AIX operates in typical cases

Figure 1-34. Unit summary

Instructor notes:
Purpose — Summarize the unit.
Details —
Additional information —
Transition statement —

Unit 2. Networking considerations for high availability

Estimated time
03:00

What this unit is about
This unit describes the HACMP functions related to networks. You learn which networks are supported in an HACMP cluster and what you have to take into consideration for planning it.

What you should be able to do
After completing this unit, you should be able to:
• Discuss how HACMP uses networks
• Describe the HACMP networking terminology
• Explain and configure IP Address Takeover (IPAT)
• Configure an IP network for HACMP
• Configure a non-IP network
• Explain how client systems are likely to be affected by HACMP
• Minimize the impact of failure recovery on client systems

How you will check your progress
Accountability:
• Checkpoint
• Machine exercises

References
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10 HACMP for AIX, Version 5.4.1: Planning Guide
SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4862-10 HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04 HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09 HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html HACMP manuals

Unit objectives

After completing this unit, you should be able to:
• Discuss how HACMP uses networks
• Describe the HACMP networking terminology
• Explain and set up IP Address Takeover (IPAT)
• Configure an IP network for HACMP
• Configure a non-IP network
• Explain how client systems are likely to be affected by failure recovery
• Minimize the impact of failure recovery on client systems

Figure 2-1. Unit objectives

Notes:

Unit objectives
This unit discusses networking in the context of HACMP.

Instructor notes:
Purpose — To tell the students what we will talk about in this unit.
Details —
Additional information —
Transition statement — Let's have a first look at how HACMP uses networks.

2.1 How HACMP uses networks

Instructor topic introduction
What students will do — The students will learn how HACMP uses networks. They will get an introduction to IP Address Takeover, which will be expanded in later topics in this unit.
How students will do it — The objectives are covered through lecture, Let's Review and Checkpoint questions, and pencil and paper and hands-on lab exercises.
What students will learn — Students will learn:
• How HACMP uses networks to detect and diagnose failures as well as providing clients with a highly available access to applications and HACMP internode communications
• Why a non-IP network is essential
• HACMP network terminology
• The basics of IP Address Takeover
How this will help students on their job — This will help them when planning and implementing an HACMP cluster.

How HACMP uses networks

After completing this topic, you should be able to:
Explain how HACMP uses networks to:
– Provide client access to the cluster
– Detect failures
– Diagnose failures
– Communicate with other nodes in the cluster
Explain why a non-IP network is an essential part of any HACMP cluster

Figure 2-2. How HACMP uses networks

Notes:

Topic 1 objectives
This topic explores how HACMP uses networks. The HACMP concept of IP Address Takeover (IPAT), where application addresses are relocated when failures occur, will be looked at in more detail in a later section.

Instructor notes:
Purpose — To explain what we discuss in this section.
Details —
Additional information —
Transition statement — Let's take a look at the three ways that HACMP uses networks.
How does HACMP use networks?

HACMP uses networks to:
1. Provide clients with highly available access to the cluster's applications
2. Detect and diagnose node, network, and NIC failures
3. Communicate with other HACMP daemons on other nodes in the cluster

Diagram labels: two nodes, each with interfaces en0 and en1, an RSCT instance, and a clcomd daemon; the three uses above are marked 1, 2, and 3.

Figure 2-3. How does HACMP use networks?

Notes:

Network design for availability
To design a network that supports high availability using HACMP, we must understand how HACMP uses networks.

Client access to applications
From the users' perspective, the only reason that the cluster is on a network is that the network provides them with access to the cluster's highly available applications. As we see, satisfying this requirement for client access to the cluster involves a bit more than just plugging in a network cable.

Detection and diagnosis of failures
In contrast, the fact that HACMP uses the networks to detect and diagnose various failures is likely to be of considerably more interest to the cluster designers and administrators. Just being able to detect node, network, and NIC failures imposes several requirements on how the networks are designed. Being able to distinguish between certain failures (for example the failure of a network and the failure of a node), imposes yet more requirements on the network design.

Reliable Scalable Cluster Technology (RSCT) provides facilities for monitoring node membership, network interface and communication interface health, and event notification, synchronization, and coordination via reliable messaging.

HACMP internode communications
The final way in which HACMP uses networks, to communicate with HACMP daemons running on other nodes in the cluster, is rather mundane. Assuming that the requirements imposed by the first two uses are properly satisfied, this last use does not impose any additional requirements on the network design.

All communication between nodes is sent through the Cluster Communications daemon, clcomd, that runs on each node. The clcomd daemon manages the connection authentication between nodes and any message authentication or encryption configured.

Instructor notes:
Purpose — To list the three ways that HACMP uses networks.
Details — This visual is deliberately vague regarding network types or technologies. Try to avoid using terms such as IP networks because HACMP also requires non-IP networks, a requirement that is best left unmentioned at this point in the course.
Additional information —
Transition statement — Let's see what it takes to provide the users with highly available access to the cluster.
Providing HA client access to the cluster

Providing clients with highly available access to the cluster's applications requires:
– Multiple physical NICs per network per node
  • Virtual Ethernet is supported with a single interface
– (Possibly) multiple networks per node
– Careful network design and implementation all the way out to the client's systems

Figure 2-4. Providing HA client access to the cluster

Notes:

Network interface card and single point of failure
When using physical networking and not EtherChannel (more on these topics in a few visuals), the goal is to avoid the NIC being a single point of failure. To achieve that, each cluster node requires at least two NICs per network. The alternative is that the loss of a single NIC would cause a significant outage while the application (that is, the resource group) is moved to another node.

For EtherChannel or virtual Ethernet configurations, the norm is to have only a single interface in the network. The use of a special file that provides additional addresses for diagnosis processing is necessary. That file is called netmon.cf. We will see more on that in a few visuals.

Network as SPOF
The network itself is, of course, a single point of failure because the failure of the network will disrupt the users' ability to communicate with the cluster. The probability of this SPOF being an issue can be reduced by careful network design, an approach that is often considered sufficient.

Eliminating the network as a SPOF
If the network as a SPOF must be eliminated, then the cluster requires at least two networks. Unfortunately, this only eliminates the network directly connected to the cluster as a SPOF. It is not unusual for the users to be located some number of hops away from the cluster. Each of these hops involves routers, switches, and cabling, each of which typically represents yet another SPOF. Truly eliminating the network as a SPOF can become a massive undertaking. Most organizations that are concerned about the network as a SPOF usually compromise by designing the network to ensure that no single failure deprives all key users of their access to the cluster.

Importance of careful network design
In the end, there is simply no replacement for careful network design and implementation all the way out to the users. Failure to perform this design and implementation activity properly could easily become a crippling issue when the cluster is put into production.

Instructor notes:
Purpose — To examine what requirements result from the need to provide the users with highly available access to the cluster.
Details — You might find it useful to sketch out just how many redundant components are required to provide a SPOF-free network all the way out to distant users. There's a visual on this issue of highly available networks towards the end of the unit, so you can also choose to defer any detailed discussion of the issue until then. Remember to point out that each client system requires two NICs. But also be prepared to have a discussion about virtual Ethernet and EtherChannel. You can put most of it off until later in the unit when those topics are covered.
Additional information —
Transition statement — Let's recall exactly which failures HACMP detects and diagnoses.
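The guidance above, that each cluster node needs at least two NICs per network unless the network is a deliberate single-adapter configuration, can be illustrated with a small planning check. This is only a sketch under stated assumptions: the inventory layout, the function name `nic_spofs`, and the network name `net_ether_01` are hypothetical and are not part of HACMP. A single-adapter EtherChannel or virtual Ethernet network would also be flagged here and would need to be exempted by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical planning inventory: node -> network name -> NICs on that network.
Topology = Dict[str, Dict[str, List[str]]]


def nic_spofs(topology: Topology) -> List[Tuple[str, str]]:
    """Return (node, network) pairs where one NIC failure cuts the node off."""
    return sorted(
        (node, network)
        for node, networks in topology.items()
        for network, nics in networks.items()
        if len(nics) < 2  # fewer than two NICs: the NIC is a single point of failure
    )


# Illustrative data only: usa has two NICs on the network, uk has one.
cluster = {
    "usa": {"net_ether_01": ["en0", "en1"]},
    "uk": {"net_ether_01": ["en0"]},
}
print(nic_spofs(cluster))  # [('uk', 'net_ether_01')]
```

A check like this covers only the NICs inside the cluster nodes. The routers, switches, and cabling between the cluster and its users still have to be reviewed separately.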
What HACMP detects and diagnoses

Remember, HACMP only handles the following failures directly:
– NIC failure
– Node failure
– Network failure

Figure 2-5. What HACMP detects and diagnoses (nodes usa and uk, each with interfaces en0 and en1 on an IP network, plus a non-IP network)

Notes:

Failures that HACMP handles directly
HACMP uses RSCT to detect failures. Actually, the only thing that RSCT can detect is the loss of heartbeat packets. RSCT sends heartbeats over IP and non-IP networks. By gathering heartbeat information from multiple NICs and non-IP devices on multiple nodes, HACMP makes a determination of what type of failure this is and takes appropriate action. Using the information from RSCT, HACMP handles only three different types of failures:
- NIC failures
- Node failures
- Network failures

Other failures
HACMP uses AIX features to respond to other failures (for example, the loss of a volume group can trigger a fallover), but HACMP is not directly involved in detecting these other types of failures.

Instructor notes:
Purpose — To reinforce the three failure types that HACMP handles directly.
Details —
Additional information —
Transition statement — To detect the failure of a NIC, node, or network, HACMP must monitor the cluster's NICs, nodes and networks.

Heartbeat packets

HACMP sends heartbeat packets across networks.
– Heartbeat packets are sent and received by every NIC.
– This is sufficient to detect all NIC, node, and network failures.
– Heartbeat packets are not acknowledged.

Figure 2-6. Heartbeat packets (nodes usa and uk, each with interfaces en0 and en1, sharing application data)

Notes:

Heartbeat packets
HACMP's primary monitoring mechanism is to send heartbeat packets. The cluster sends heartbeat packets from every NIC and to every NIC and to and from non-IP devices.

Heartbeating pattern
In a typical two-node cluster with two NICs on the network, the heartbeat packets are sent in the pair-wise fashion shown above. The pattern gets more complicated when the cluster gets larger as HACMP uses a pattern that is intended to satisfy three requirements:
- That each NIC be used to send heartbeat packets (to verify that the NIC is capable of sending packets)
- That heartbeat packets be sent to each NIC (to verify that the NIC is capable of receiving heartbeat packets)
- That no more heartbeat packets are sent than are necessary to achieve the first two requirements (to minimize the load on the network)
The details of how HACMP satisfies the third requirement are discussed in a later unit.

Detecting failures
Heartbeat packets are not acknowledged. Instead, each node knows what the heartbeat pattern is and simply expects to receive appropriate heartbeat packets on appropriate network interfaces. Noticing that the expected heartbeat packets have stopped arriving is sufficient to detect failures.

Instructor notes:
Purpose — To explain how HACMP uses heartbeat packets to monitor the cluster.
Details —
Additional information —
Transition statement — When a failure has been detected, HACMP must diagnose what has failed.

Failure detection versus failure diagnosis

Failure detection is realizing that something is wrong.
– For example, realizing that packets have stopped flowing between usa's en1 and uk's en1
Failure diagnosis is figuring out what is wrong.
– For example, figuring out that usa's en1 NIC has failed
HACMP uses RSCT to do both detection and diagnosis.

Figure 2-7. Failure detection versus failure diagnosis (nodes usa and uk, each with interfaces en0 and en1)

Notes:

Diagnosis
The heartbeat patterns just discussed are sufficient to detect a failure in the sense of realizing that something is wrong. They are not sufficient to diagnose a failure in the sense of figuring out exactly what is broken. For example, if the en1 interface on the usa node fails as in the visual above, usa stops receiving heartbeat packets via its en1 interface, and uk stops receiving heartbeat packets via its en1 interface. Usa and uk both realize that something has failed, but neither of them has enough information to determine what has failed.

Instructor notes:
Purpose — To explain the difference between failure detection and failure diagnosis, and to make it clear that a heartbeat pattern that is sufficient to perform detection is not necessarily sufficient to perform diagnosis.
Details —
Additional information —
Transition statement — Let's have a closer look at failure diagnosis.

Failure diagnosis

When a failure is detected, HACMP (RSCT topology services) uses specially crafted packet transmission patterns to determine (that is, diagnose) the actual failure by ruling out other alternatives.
Example:
1. RSCT on usa notices that heartbeat packets are no longer arriving via en1 and notifies uk (which has also noticed that heartbeat packets are no longer arriving via its en1).
2. RSCT on both nodes sends diagnostic packets between various combinations of NICs (including out via one NIC and back in via another NIC on the same node).
3. The nodes soon realize that all packets involving usa's en1 are vanishing but packets involving uk's en1 are being received.
4. Diagnosis: usa's en1 has failed.

Figure 2-8. Failure diagnosis

Notes:

Diagnostic heartbeat patterns
When one or more cluster nodes detect a failure, they share information and plan a diagnostic packet pattern or series of patterns, which will diagnose the failure. These diagnostic packet patterns can be considerably more network-intensive than the normal heartbeat traffic, although they usually only take a few seconds to complete the diagnosis of the problem.

Instructor notes:
Purpose — To provide a simple example of failure diagnosis.
Additional information —
Transition statement — Let's see what happens if all communication is lost with a node.
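The four-step failure diagnosis example (usa's en1 fails, both nodes exchange diagnostic packets, and en1 on usa is the only NIC for which every packet vanishes) can be mimicked with a few lines of Python. This is a teaching sketch only, assuming a made-up table of diagnostic results. It is not how RSCT topology services is implemented, and the names `Results` and `failed_nics` are hypothetical.

```python
from typing import Dict, List, Tuple

Nic = Tuple[str, str]                   # (node, interface), for example ("usa", "en1")
Results = Dict[Tuple[Nic, Nic], bool]   # (sender, receiver) -> did the packet arrive?


def failed_nics(results: Results) -> List[Nic]:
    """Return the NICs for which every diagnostic packet they took part in was lost."""
    involved: Dict[Nic, List[bool]] = {}
    for (src, dst), arrived in results.items():
        involved.setdefault(src, []).append(arrived)
        involved.setdefault(dst, []).append(arrived)
    # A NIC is ruled out as the culprit as soon as one packet involving it arrives.
    return sorted(nic for nic, outcomes in involved.items() if not any(outcomes))


# Illustrative results for the scenario in the example: usa's en1 has failed.
results = {
    (("usa", "en0"), ("uk", "en0")): True,
    (("usa", "en1"), ("uk", "en1")): False,
    (("usa", "en0"), ("usa", "en1")): False,  # out one NIC, back in another on the same node
    (("uk", "en0"), ("uk", "en1")): True,
    (("uk", "en1"), ("usa", "en0")): True,
}
print(failed_nics(results))  # [('usa', 'en1')]
```

The single lost heartbeat between usa's en1 and uk's en1 implicates both NICs equally. It is the extra packets involving uk's en1 that rule it out, which is why diagnosis needs more traffic than detection.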
Details — Avoid discussing other scenarios in much detail for another visual or two.

What if all heartbeat packets stop?

A node might notice that heartbeat packets are no longer arriving on any NIC. In the following configuration, it is impossible for either node to distinguish between failure of the network and failure of the other node. Each node concludes that the other node is down! The result is a partitioned cluster and likely data divergence.

Figure 2-9. What if all heartbeat packets stop? (nodes usa and uk, each with interfaces en0 and en1, sharing application data)

Notes:

Total loss of heartbeat traffic
If a node in a two-node cluster realizes that it is no longer receiving any heartbeat packets from the other node, then it starts to suspect that the other node has gone down. When it determines that it is totally unable to communicate with the other node, it concludes that the other node has failed.

Both nodes try to take control
In the above configuration, if the network fails, then each node soon concludes that the other node has failed. Each node then proceeds to take over any resource groups configured to be able to run on both nodes, but currently resident on the other node.

Partitioned cluster
Because each node is, in fact, still very alive, the result is that the applications are now running simultaneously on both nodes. If the shared disks are also online to both nodes, then the result could be a massive data corruption problem. This situation is called a partitioned cluster. It is, clearly, a situation that must be avoided.

Note that essentially equivalent situations can occur in larger clusters. For example, a five-node cluster might become split into a group of two nodes and a group of three nodes. Each group concludes that the other group has failed entirely and takes what it believes to be appropriate action. The result is almost certainly very unpleasant.

Instructor notes:
Purpose — Explain what happens when all communication is lost between nodes or groups of nodes.
Details —
Additional information —
Transition statement — Let's see how we avoid partitioned clusters.

CRITICAL: All clusters require a non-IP network

There must be more than one network to distinguish between:
– Failure of the other node
– Failure of a network
There must be a non-IP network to distinguish between:
– Failure of the other node's IP subsystem
– Total failure of the other node
Therefore, ALL CLUSTERS SHOULD HAVE A NON-IP NETWORK!

Figure 2-10. CRITICAL: All clusters require a non-IP network (nodes usa and uk, each with interfaces en0 and en1, joined by a non-IP network and sharing application data)

Notes:

Required?
To be completely accurate, you do not have to configure a non-IP network. But for the reasons outlined as follows, you will want to implement at least one non-IP network and possibly more. So it is not technically accurate that a non-IP network is required, but it is definitely practically accurate that one is required. That is why the title indicates required, while the content of the visual indicates should.

Why we need more than one network
Distinguishing between the failure of the network and the failure of the other node requires that there be a path between the two nodes that does not involve the network in question. Consequently, if a partitioned cluster is to be avoided, then every cluster must be configured with at least two ways for nodes to communicate with each other.

Why we need a non-IP network
Although rather unlikely, it is also possible for the entire IP subsystem to fail on a node without the node crashing. To distinguish between the failure of the IP subsystem on a node and the failure of the node itself, every cluster must be configured with a way to communicate between nodes that does not require IP to be operational.

Both IP and non-IP networks are needed
These pathways that do not require IP are called non-IP networks. Every cluster must be configured with enough non-IP networks to ensure that any node can communicate with every other node (possibly by asking an intermediate node to pass along messages) without requiring any node's IP subsystem to be operational.

Both IP and non-IP networks are used
Many untrained people seem to assume that the non-IP network is for heartbeating, with the implication possibly being that the IP networks are not used for heartbeating or that the non-IP networks are used only for heartbeating. Neither implication is true. HACMP sends heartbeat packets across all configured networks, IP and non-IP. HACMP uses any available network to communicate with other cluster nodes.

Terminology: serial networks versus non-IP networks
Older HACMP documentation generally refers to these non-IP networks as serial networks. This is because the predominant non-IP network was the RS-232 type. With the advent and ease of configuration of heartbeat on disk, the term serial network must only be used to refer to the RS-232 type. Therefore, use the term non-IP network to refer to the concept of a network between nodes that does not involve IP (today that generally means heartbeat on disk or RS-232), and use the specific network type (RS-232 or heartbeat on disk) when referring to the type of network to be implemented.

Instructor notes:
Purpose — Explain why each cluster requires non-IP networks.
Details — Be very careful to not leave any doubt in the students' minds about the necessity to configure non-IP networks. The students must understand that they are NOT optional!
Additional information —
Transition statement — Getting heartbeating to work properly imposes one more requirement on the network configuration.

The two subnet rule

HACMP must ensure that heartbeats are sent out via all NICs and know which NIC is used. If a node has multiple NICs on the same logical subnet, then AIX can rotate which NIC is used to send packets to the network. Therefore, each NIC on each physical IP network on any given node must have an IP address on a different logical subnet.
Note: Doesn't apply for single adapter networks, like EtherChannel or virtual Ethernet.

Figure 2-11. The two subnet rule (nodes usa and uk, each with en0 and en1 on different 192.168.x subnets, plus a non-IP network)

Notes:

Requirements for HACMP to monitor every NIC
If a node has two NICs on the same logical IP subnet and a network packet is sent to an IP address on the same logical subnet, then the AIX kernel is allowed to use either NIC on the sending node to send the packet. Because this is incompatible with HACMP's requirement that HACMP be able to dictate which NIC is to be used to send heartbeat packets, HACMP requires that each NIC on each node be on a different logical IP subnet.

Note: There is an exception to the requirement that each NIC be on a different logical IP subnet. We will discuss that shortly. We will give some examples of valid and invalid configurations later in this unit, after we have covered the other subnetting rules.

Instructor notes:
Purpose — Explain the first layer of HACMP's subnetting rules.
Details —
Additional information — You can get around this requirement by using heartbeat over alias, but this behavior is not usually configured.
Transition statement — Now let's take a look at what happens when a failed component recovers.
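The two subnet rule lends itself to a quick check at planning time. The sketch below uses Python's standard `ipaddress` module to group one node's NICs by logical subnet and report any subnet that holds more than one NIC. The addresses and the /24 netmask are illustrative assumptions (the visual does not state a netmask), and the function name `subnet_conflicts` is hypothetical, not an HACMP utility.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def subnet_conflicts(nics: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each logical subnet shared by more than one NIC to the NICs on it.

    `nics` maps an interface name to its address in CIDR form,
    for example {"en0": "192.168.1.1/24"}.
    """
    by_subnet = defaultdict(list)
    for name, cidr in nics.items():
        network = ipaddress.ip_interface(cidr).network  # e.g. 192.168.1.0/24
        by_subnet[str(network)].append(name)
    return {net: sorted(names) for net, names in by_subnet.items() if len(names) > 1}


# Assumed example addresses for one node; /24 is an assumption.
usa_ok = {"en0": "192.168.1.1/24", "en1": "192.168.2.1/24"}
usa_bad = {"en0": "192.168.1.1/24", "en1": "192.168.1.2/24"}

print(subnet_conflicts(usa_ok))   # {}
print(subnet_conflicts(usa_bad))  # {'192.168.1.0/24': ['en0', 'en1']}
```

An empty result means the node satisfies the rule for the NICs listed. Single-adapter networks such as EtherChannel or virtual Ethernet are outside the scope of this check, as are the additional subnetting rules covered later in the unit.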
Failure recovery and reintegration

HACMP continues to monitor failed components to detect their recovery.
Recovered components are reintegrated back into the cluster.
Reintegration might trigger significant actions.
– For example, recovery of primary node will optionally trigger fallback of resource group to primary node.

Figure 2-12. Failure recovery and reintegration (nodes usa and uk, each with interfaces en0 and en1, shown before and after recovery)

Notes:

NIC and network recovery
NICs and networks are automatically reintegrated into the cluster when they recover.

Node recovery
In contrast, a node is not considered to have recovered until Cluster Services has been started on the node. This allows the node to be rebooted and otherwise exercised as part of the repair process without HACMP declaring failures or performing reintegration or both, while the repair action is occurring.

The reintegration of a component might trigger quite significant actions. For example, if a node that has a high priority within a resource group is reintegrated, then, depending on how the resource group is configured, the resource group might fall back.

Instructor notes:
Purpose — Explain the notion of reintegration.
Details —
Additional information —
Transition statement — Let's see how we did.

Let's review: Topic 1

1. How does HACMP use networks? (Select all that apply.)
   a. Provide client systems with highly available access to the cluster's applications
   b. Detect failures
   c. Diagnose failures
   d. Communicate between cluster nodes
   e. Monitor network performance
2. Using information from RSCT, HACMP directly handles only three types of failures: ______________, ______________, and ______________.
3. True or False? Heartbeat packets must be acknowledged or a failure is assumed to have occurred.
4. True or False? Clusters should include a non-IP network.
5. True or False? Each NIC on each physical IP network on each node is required to have an IP address on a different logical subnet.

Figure 2-13. Let's review topic 1

Notes:

Instructor notes:
Purpose — Check that at least one student has managed to stay awake so far.
Details —
Let's review: Topic 1 solutions
1. How does HACMP use networks? (Select all that apply.) Answer: a, b, c, and d. Monitoring network performance (e) is not one of the uses described in this topic.
2. Using information from RSCT, HACMP directly handles only three types of failures: Network interface card (NIC) failures, Node failures, and Network failures.
3. True or False? Heartbeat packets must be acknowledged or a failure is assumed to have occurred. Answer: False. Heartbeat packets are not acknowledged.
4. True or False? Clusters should include a non-IP network. Answer: True.
5. True or False? Each NIC on each physical IP network on each node is required to have an IP address on a different logical subnet. Answer: True.
Additional information —
Transition statement — It is time that we got a little deeper in the HACMP concepts, terminology, and configuration rules.

2.2 HACMP concepts and configuration rules

Instructor topic introduction
What students will do — The students will learn more about HACMP networking concepts.
How students will do it — The objectives are covered through lecture, Let's Review and Checkpoint questions, and pencil and paper and hands-on lab exercises.
What students will learn — Students will learn:
– What network technologies are supported by HACMP
– The purpose of public and private HACMP networks
– HACMP networking terminology
– HACMP networking configuration rules
How this will help students on their job — This will help them when planning and implementing an HACMP cluster.

HACMP concepts and configuration rules

After completing this topic, you should be able to:
– List networks that HACMP supports
– Describe the different HACMP network types
– Describe the purpose of public and private HACMP networks
– Describe the topology components and their naming rules
– Define key networking-related HACMP terms
– Describe the basic HACMP network configuration rules
– Describe what a persistent node IP label is and its typical uses

Figure 2-14. HACMP concepts and configuration rules

Notes:

Topic 2 objectives
This section will explore HACMP networking concepts, terms and configuration rules in more detail.
Networking considerations for high availability 2-39 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 1998.V4. Details — Additional information — Transition statement — Let’s start with a look at which networking technologies HACMP supports. © Copyright IBM Corp. 2008 Unit 2.0 Instructor Guide Uempty Instructor notes: Purpose — Introduce the next section. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.. Heartbeat on disk and rs-232 are the most prevalent.3 Ethernet frame type is not supported. – – – – FDDI Token-Ring ATM and ATM LAN Emulation SP Switch 1 and SP Switch 2 Supported non-IP network technologies: – Heartbeat over Disks (diskhb) • Requires Enhanced Concurrent Volume Group – Multinode Disk Heartbeat (mndhb) • Online on All Nodes only – RS232/RS422 (rs232) – Target Mode SSA (tmssa) – Target Mode SCSI (tmscsi) © Copyright IBM Corporation 2008 Figure 2-15. It provides a method by which multiple nodes access multiple shared logical volumes to ensure that the loss of access 2-40 HACMP Implementation © Copyright IBM Corp.3 frame type which uses et0. 1998. Multinode disk heartbeat is new with HACMP 5.1. The advantage of heartbeat on disk is that there is no need for additional hardware. et1 .Instructor Guide HACMP networking support Supported IP networking technologies: – Ethernet • All speeds • Includes etherchannel • Not the IEEE 802. Supported non-IP networks HACMP supports multiple non-IP networking technologies.. HACMP networking support AU548. .4.0 Notes: Supported IP networks HACMP supports all of the popular IP networking technologies (and a few that are possibly not quite as popular). assuming that you have shared storage and are willing to create an enhanced concurrent mode volume group. Note that the IEEE 802. No data area is used for the heartbeat on disk processing. © Copyright IBM Corp. 
This in turn will cause the node or nodes to stop accessing the data. 2008 Unit 2.0 Instructor Guide Uempty from a node or nodes to the rest of the cluster nodes via all routes. . IP and non-IP will be treated as a loss of quorum.V4. This is to prevent (or minimize) data corruption in the event of a domain merge (split brain). also known as concurrent resource groups). Networking considerations for high availability 2-41 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. This is implemented only with Resource Groups that have a startup policy of Online on All Nodes (OOAN. 1998. Details — Additional information — Transition statement — Let’s look at the difference of local versus remotely attached nodes. .Instructor Guide Instructor notes: Purpose — List the IP and non-IP networking technologies supported by HACMP 5. 2-42 HACMP Implementation © Copyright IBM Corp.4. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.1. 1998. 0 Notes: IP networks As mentioned before. tmscsi. This attribute is not used by HACMP itself. tmssa. . 2008 Unit 2. diskhb.fddi. Network types AU548.token.Communications between HACMP daemons on different nodes . Networking considerations for high availability 2-43 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. hps (SP Switch or High Performance Switch) public or private Non-IP networks: – Network type: rs232.Client network traffic IP network attribute The default for this attribute is public. IP networks are used by HACMP for: .atm. 1998. © Copyright IBM Corp. See the HACMP for AIX: Planning Guide for more information.V4. 
Oracle uses the private network attribute setting to select networks for Oracle inter-node communications.0 Instructor Guide Uempty Network types HACMP categorizes all networks: IP networks: – – Network type: Network attribute: ether.HACMP heartbeat (failure detection and diagnosis) . mndhb © Copyright IBM Corporation 2008 Figure 2-16. Because of the nature of Virtual Ethernet. the netmon. . The VIOS support is analogous to EtherChannel in this regard. there are some considerations.If the VIO Server has multiple physical interfaces on the same network.Alternative non-IP path for HACMP heartbeat and messaging .Eliminates IP as a single point of failure 2-44 HACMP Implementation © Copyright IBM Corp. . for complete details on using virtual I/O with HACMP. netmon. This does not limit the availability of the entire cluster because VIOS itself routes traffic around the failure. HACMP’s “PCI Hot Plug” facility is not meaningful because the I/O adapters are virtual rather than physical. . 1998.cf file must be used to monitor and detect failure of the network interfaces. Note: We will discuss IPAT and HWAT in detail in the next topic in this unit. IPAT via Aliasing is recommended for all HACMP networks that can support it. other mechanisms to detect the failure of network interfaces are not effective. Non-ip networks HACMP uses non-IP networks for: . In particular. We summarize some of them as follows. .ibm.cf should include a list of clients to ping. Note that when an HACMP node is using Virtual I/O.HACMP’s “PCI Hot Plug” facility cannot be used. then a failure of that physical interface will be detected by HACMP.com/support/techdocs/atsmastr. IPAT via Replacement and Hardware Address Takeover (HWAT) are not supported.IP Address Takeover (IPAT) via Aliasing must be used.All Virtual Ethernet interfaces defined to HACMP should be treated as “single-adapter networks” as described in the Planning Guide. If the VIO Server has only a single physical interface on a network. 
HACMP will not be informed of (and hence will not react to) single physical interface failures.Differentiates between node/network failure . that failure will isolate the node from the network. Other methods (not based the VIO Server) must be used for providing notification of individual adapter failures. however.Instructor Guide HACMP and virtual Ethernet HACMP 5. see: http://www.nsf/WebIndex/FLASH10390 . However. In general. PCI Hot Plug operations are available through the VIO Server. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.3 and later supports virtual Ethernet in POWER5-based systems. or if two or more HACMP nodes are using VIO Servers in the same frame. Details — Go through the matrix in sufficient detail to ensure that the students understand the distinctions being made. 2008 Unit 2.AU620 and the HACMP/XD manuals for more information. If students ask.0 Instructor Guide Uempty Instructor notes: Purpose — Explain the different network types. Transition statement — Let’s take a closer look at a typical cluster’s topology components. They are used with HACMP/XD: XD_data and XD_ip for IP networks and XD_rs232 for non-IP networks. .V4. Additional information — A separate private IP network is almost mandatory if the Oracle cluster lock manager is being used as it can generate an overwhelming amount of network traffic. Networking considerations for high availability 2-45 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 1998. © Copyright IBM Corp. Refer students to HACMP System Administration III . there are three other network types that we do not mention here. IP Ne two rk IP lab el s IP addres vancouver-service 192.node An IBM system p server operating within an HACMP cluster .diskhb non IP . 1998. HACMP topology components Network Interface Card Serial Port usa node name Network Interface Card ne non tw -IP or k Communication Device non IP . 
[Figure: two nodes, usa and uk, each with network interface cards (communication interfaces) attached to the TCP/IP network Internalnet and serial ports (communication devices) attached to non-IP networks of types rs232, diskhb, and mndhb]

Figure 2-17. HACMP topology components   AU548.0

Notes:

Terminology

HACMP has quite a few special terms that are used repeatedly throughout the documentation and the HACMP smit screens. Over the next few visuals we will discuss some of the network related terminology in detail.

HACMP topology components

HACMP uses some unique terminology to describe the type and function of topology (as in, network) components under its control.

- node name: The name of a node from HACMP’s perspective
- IP label: For TCP/IP networks, the name specified in the /etc/hosts file or by the Domain Name Service for a specific IP address
- IP network: A network that uses the TCP/IP family of protocols
- non-IP network or serial network: A point-to-point network, which does not rely on the TCP/IP family of protocols
- communication interface: A network connection onto an IP network (slightly better definition coming shortly)
- communication device: A port or device connecting a node to a non-IP network (slightly better definition coming shortly)

In many configurations, HACMP nodes will have multiple NICs, and thus multiple IP labels, but only one hostname. We will look at the relationship between hostname, node name, and IP labels in the next visual.

In HACMP, IP labels are either service IP labels or non-service IP labels. We will discuss this distinction in the next few visuals.

Instructor notes:
Purpose: Show the topology components and their names.
Details: Many of these concepts are covered in more detail shortly.
Additional information:
Transition statement: Let’s consider the question of naming a node.

Naming nodes

AIX hostname
  # hostname
  gastown
  # uname -n
  gastown

HACMP node name
  # /usr/es/sbin/cluster/utilities/get_local_nodename
  vancouver

IP labels
  # netstat -i
  Name  Mtu    Network    Address          Ipkts  Ierrs  Opkts  Oerrs  Coll
  lo0   16896  link#1                      5338   0      5345   0      0
  lo0   16896  127        localhost        5338   0      5345   0      0
  lo0   16896  ::1                         5338   0      5345   0      0
  tr0   1500   link#2     0.4.ac.49.35.58  76884  0      61951  0      0
  tr0   1500   192.168.1  vancouverboot1   76884  0      61951  0      0
  tr1   1492   link#3     0.4.ac.48.22.f4  476    0      451    13     0
  tr1   1492   192.168.2  vancouverboot2   476    0      451    13     0
  tr2   1492   link#4     0.4.ac.4d.37.4e  5667   0      4500   0      0
  tr2   1492   195.16.20  db-app-svc       5667   0      4500   0      0

Figure 2-18. Naming nodes   AU548.0

Notes:

Naming nodes

A node can have several names, including the AIX hostname, the HACMP node name, and one of the IP labels. These concepts should not be confused.

Hostname

Each node within an HACMP cluster has a hostname associated with it that was assigned when the machine was first installed onto the network. For example, a hypothetical machine might have been given the name gastown.

HACMP node name

Each node within an HACMP cluster also has a node name. The node name for a machine is almost always the same as the hostname because the alternative would result in unnecessary confusion. Remember that node names are not required to be the same as hostnames. For example, our hypothetical machine with a hostname of gastown might have a node name of vancouver.

Note: The Canadian city of Vancouver was once called Gastown.

IP labels

Each IP address used by an HACMP cluster almost certainly has an IP label associated with it. In non-HACMP systems, it is not unusual for the system’s only IP label to be the same as the system’s hostname. This is rarely a good naming convention within an HACMP cluster because there are just so many IP labels to deal with, and having to pick which one gets a name that is the same as a node’s hostname is a pointless exercise.

IP label naming conventions: non-service IP labels

Preferably, assign IP labels to IP addresses that describe, in some sense, the purpose of the IP address. For IP addresses that are not associated with an application (non-service), it is usually useful to include which node the IP address is associated with. In the example in the visual, there are two NICs that have a vancouver prefix on their IP labels because these particular IP labels will never be associated with any other node.

IP label naming conventions: service IP labels

Service IP labels/addresses, which are used in IPAT, can move from node to node. Service IP labels should not contain the name of any node since they are not always associated with any particular node. Experience shows that including a node name or a hostname as any part of an IPAT service IP label is almost always the source of significant confusion (significant in the sense that it leads to a cluster outage or other painful experience). In the example in the visual, there is one service IP label: db-app-svc.

A clear and consistent naming convention exists solely to avoid confusing the poor humans.
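The service IP label guidance above lends itself to a quick mechanical check. The following Python sketch is illustrative only: the function name check_service_labels and the case-insensitive substring test are assumptions made for this example, not part of HACMP, and the sample names are the ones used in the visuals.

```python
def check_service_labels(service_labels, node_names, hostnames):
    """Return (label, name) pairs where a service IP label embeds a node name or hostname.

    Hypothetical helper: HACMP itself does not care what names are used.
    """
    forbidden = sorted({n.lower() for n in list(node_names) + list(hostnames)})
    problems = []
    for label in service_labels:
        for name in forbidden:
            if name in label.lower():
                problems.append((label, name))
    return problems


if __name__ == "__main__":
    nodes = ["vancouver"]
    hosts = ["gastown"]
    # A label that describes the application passes the check.
    print(check_service_labels(["db-app-svc"], nodes, hosts))         # prints []
    # A label that embeds a node name is reported.
    print(check_service_labels(["vancouver-service"], nodes, hosts))  # prints [('vancouver-service', 'vancouver')]
```

A check like this could be run over the planned /etc/hosts entries before the cluster is configured; non-service labels are deliberately left out because including the node name in them is the recommended practice.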
Instructor notes:
Purpose: Discuss node and IP label naming issues.
Details: You should make it clear that HACMP does not care what names are used.
Additional information: If the hostname is not one of the IP labels and needs to be resolvable within a node (such as for CDE), then one solution would be to make it an alias to the loopback address.
Transition statement: Let’s take a closer look at some of the key HACMP network component terms.

HACMP network component terms (1 of 2)

Communication Interface: A communication interface refers to IP-based networks and NICs.

Communication Device: A communication device refers to one end of a point-to-point non-IP network connection, such as /dev/tty1, /dev/hdisk1 or /dev/tmssa1.

Communication Adapter: A communication adapter is an X.25 adapter used to support a Highly Available Communication Link.

An HACMP communication interface is a combination of:
- A network interface, for example: en0
- An IP label / address, for example: db-app-svc, 195.16.20.10

Figure 2-19. HACMP network component terms (1 of 2)   AU548.0

Notes:

HACMP network terminology

When using HACMP SMIT, it is important to understand the difference between communication interfaces, devices and adapters:

- Communication interfaces: Interfaces for IP-based networks
- Communication devices: Devices for non-IP networks
- Communication adapters: X.25 adapters

Note: The term communication interface in HACMP refers to more than just the physical NIC. From HACMP’s point of view, a communication interface is an object defined to HACMP, which includes:
- The logical interface (the name for the physical NIC), such as en0
- The IP label / address

Instructor notes:
Purpose: Introduce and define the terms communication interface, communication device and communication adapter.
Details: Spend the time that it takes to familiarize yourself with these three terms before you try to present this visual. Refer to the student notes for a description of what a communication interface is.
Additional information: Students might be taking the class with previous (pre-HACMP V5) experience. If so, they might ask about the correlation between the old terminology and this terminology. Generally, they should understand that there is no longer a service adapter/standby adapter/boot adapter, but rather interfaces. The interfaces contain non-service IP addresses/labels as base or boot addresses and may contain a service IP address/label if necessary. The service IP address/label is not associated with an HACMP adapter definition, but rather is managed as a resource in a resource group only. See the descriptions in the student notes on the next visual. Encourage them to learn this fresh.
Transition statement: Let’s look at a few more HACMP network component terms.

HACMP network component terms (2 of 2)

Service IP label / address: Address configured by HACMP to support client traffic. It is kept highly available by HACMP.
- Not stored in AIX ODM
- Configured by Cluster Manager on an interface either by replacement or by alias.

Non-service IP label / address: An IP label / address defined to HACMP for communication interfaces and is not used by HACMP for client traffic. Two types:
- Referred to as a boot or base address (stored in AIX ODM)
- Persistent (see the following text).

Service interface: A communications interface configured with a service IP label / address (either by alias or replacement).

Non-service interface: A communications interface not configured with a service IP label / address. Used as a potential location for a service IP label / address.

Persistent IP label / address: An IP label / address, defined as an alias to an interface, which stays on a single node and is kept available on that node by HACMP.

Figure 2-20. HACMP network component terms (2 of 2)   AU548.0

Notes:

More HACMP terminology

Another set of important terms are service, non-service, and persistent:

Service IP label / address
An IP label or address intended to be used by client systems to access services running within the cluster. Used with IP Address Takeover (IPAT).

Non-service IP label / address
An IP address that is configured onto a NIC using AIX’s TCP/IP smit screens and stored in the AIX ODM. In other words, it is the IP address that a NIC has immediately after AIX finishes booting. HACMP might replace a non-service IP address with a service IP address depending on factors that are explained shortly.

Service interface
A communications interface configured with a service IP label / address (either by alias or by replacement).

Non-service interface
A communications interface not configured with a service IP label / address. Used as a backup for a service IP label / address.

Persistent IP label / address
An IP address monitored by HACMP but it stays on the node on which it is configured. It is implemented as an alias and HACMP will attempt to keep this IP label / address highly available on the same node. Persistent IP labels / addresses are discussed later in this unit.

Note: In earlier versions of HACMP, the terms boot IP label and boot IP address were used to refer to what is now being called non-service IP label / address. The older terms still appear in a few places in the HACMP 5.x documentation.

Applications should use the service IP label / address

Non-service IP labels and non-service IP addresses should not be used by client systems to contact the cluster’s applications. This is particularly important if IPAT is configured, because a client system that gets into the habit of connecting to its application using a non-service IP label / address cannot find its application after a fallover to a different node.

Instructor notes:
Purpose: Define a few more HACMP network component terms.
Details:
Additional information:
Transition statement: Let’s take a look at some of the rules for configuring IP networks in HACMP.

IP network configuration rules

General
- Each node must have at least one direct connection with every other node.
- Do not place network equipment that filters packets between nodes.

Non-service IP Address Rules
- Heartbeating over IP interfaces (the default)
  • Each IP address on a node must be on a different logical subnet.
    - With multiple NICs on the same subnet, HACMP cannot reliably monitor each NIC.
  • There must be at least one subnet in common with all nodes.
  • Each logical subnet must use the same subnet mask.
- Heartbeating over IP aliases
  • No subnet restrictions on all service and non-service IP addresses.
  • You specify a base address for the heartbeat paths.
    - This is called an “offset” in the pubs.
  • HACMP configures a set of IP addresses and subnets for heartbeating.
  • Heartbeating addresses are applied to NICs as aliases, allowing all NICs to be monitored.

Figure 2-21. IP network configuration rules   AU548.0

Notes:

Network configuration rules for heartbeating

The visual shows some of the rules for configuring HACMP IP-based networks. These are not quite the complete set of rules as we have not had a close enough look at IPAT yet, and there are a few other issues still to be discussed. In particular, we will discuss the rules for the service IP addresses later in the unit, when we discuss IPAT.

General rules

The primary purpose of these rules is to ensure that cluster heartbeating can reliably monitor NICs, networks and nodes. The two basic approaches are:
- Heartbeating over IP interfaces (the default)
- Heartbeating over IP aliases

In either case:
- HACMP requires that each node in the cluster have at least one direct, non-routed network connection with every other node.
- Between cluster nodes, do not place intelligent switches, routers, or other network equipment that do not transparently pass through UDP broadcasts and other packets to all cluster nodes. Bridges, hubs, and other passive devices that do not modify the packet flow may be safely placed between cluster nodes.

Heartbeating over IP interfaces

In this case, the configured service addresses and non-service addresses are used for heartbeating. Because of this, there are requirements on how the addresses are configured to ensure that heartbeating can occur reliably:
- Each interface on a node must be on a different logical subnet. If there are multiple interfaces on the same logical subnet, AIX can use any one of them for outgoing messages. In this case, HACMP cannot select which interface will be used for heartbeating, so it cannot reliably monitor all interfaces.
- Each logical subnet should use the same subnet mask.
- There must be at least one subnet in common with all nodes.

Heartbeating over IP aliases

With this heartbeating method, the service and non-service addresses are not used for heartbeating. Instead, you specify an IP address offset to be used for heartbeating. HACMP then configures a set of IP addresses and subnets for heartbeating, which are totally separate from those used as service and non-service addresses. The heartbeating addresses are added to the NICs using IP aliases. Because HACMP automatically generates the proper addresses required for heartbeating, all other addresses are free of any constraints.

Of course, you must reserve a unique address and subnet range that is used specifically for heartbeating. These addresses are not to be routed in your network. They are used solely for cluster node-to-cluster node communications. For more details, see the HACMP Planning Guide.

IP configuration rules too restrictive?

If it is difficult to conform to the IP address configuration rules for heartbeating over IP interfaces, there are two choices:
- Heartbeating over IP aliases
- Using netmon

Subnet considerations for heartbeating over IP alias

Heartbeating over IP Aliases provides the greatest flexibility for configuring non-service and service IP addresses. HACMP installations typically require many subnets. If you only have a limited number of subnets available, you may consider using heartbeating over IP alias and putting multiple service IP addresses on the same subnet or putting a service address on the same subnet as non-service addresses. While this is perfectly acceptable in terms of HACMP heartbeating, it needs to be well thought out because of the way AIX handles multiple routes to the same destination.

AIX supports multiple routes to the same destination and, by default, will round robin between the available routes. This could create a problem for your application. Consider the following scenario: The non-service addresses on en1 and en2 on node1 are in the same subnet as an application’s service address. The service address starts on en1. en1 fails and HACMP moves the service address to en2. AIX does not know that en1 has failed; therefore, it will continue to round robin packets between en1 and en2 (because they have the same subnet destination). Packets sent to en1 will be lost because of the failure.

AIX’s active Dead Gateway Detection provides a way for AIX to detect routes that are down and adjust the routing table; however, this does involve some additional network traffic. For more information about AIX’s support for multipath routing and active Dead Gateway Detection, see the man page for the no command and the AIX Version 5.3 System Management Guide: Communications and Networks.

netmon

netmon, the network monitor portion of RSCT Topology Services, enables you to create a configuration file that specifies additional network addresses to which ICMP ECHO requests can be sent as an additional way to monitor interfaces. netmon is outside the scope of this class. See the HACMP for AIX: Planning Guide for information on using netmon.

Unmonitorable NICs

One final point: If no other mechanism has been configured into the cluster, HACMP attempts to monitor an otherwise unmonitorable NIC by checking to see if packets are arriving and being sent via the interface. This approach is not sufficiently robust to be relied upon; use heartbeating via IP aliases or netmon to get the job done right.

Instructor notes:
Purpose: Discuss the IP network configuration rules as they’ve been revealed so far.
Details:
Additional information: Use of heartbeat over alias is an exception to the subnet rule. This will be covered later.
Transition statement: Let’s take a look at some IP network configuration examples.

Non-service IP address examples

[Table: columns “IP Address node1”, “IP Address node2”, and “Valid boot addresses?”. Assume a subnet mask of 255.255.255.0. The example addresses lie in the 192.168.5, 192.168.6, 192.168.7, and 192.168.8 subnets. The ratings shown are:]
- Yes (two of the examples).
- Maybe, but NICs are a single point of failure.
- Maybe, but both node2 interfaces are on the same subnet; they cannot be monitored.
- Maybe, but the third and fourth interfaces on node1 do not have a common subnet with another node; they cannot be monitored.
- No; a direct non-routed network connection does not exist between the two nodes.
[The “Maybe” examples carry the comment: Requires netmon.cf file or Heartbeat over IP Alias.]

Figure 2-22. Non-service IP address examples   AU548.0

Notes:

Examples

The visual shows some non-service IP address examples. We’ll see the service IP address examples later, when we discuss IPAT.
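The rules for heartbeating over IP interfaces (each interface on a node on a different logical subnet, and at least one subnet common to all nodes) can be checked mechanically for a proposed set of boot addresses. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, a single prefix length stands in for the "same subnet mask" rule, and the sample addresses are made up for this example rather than copied from the visual. It does not model heartbeating over IP aliases or netmon, which relax these rules.

```python
import ipaddress


def subnets_for(addresses, prefix_len=24):
    """Map each address to its logical subnet, using one shared mask (default /24)."""
    return [ipaddress.ip_interface(f"{addr}/{prefix_len}").network
            for addr in addresses]


def check_ip_interface_rules(nodes, prefix_len=24):
    """Return a list of rule violations; an empty list means none were found.

    nodes maps a node name to the non-service addresses on that node.
    """
    problems = []
    per_node = {name: subnets_for(addrs, prefix_len) for name, addrs in nodes.items()}
    # Rule: each interface on a node must be on a different logical subnet.
    for name, nets in per_node.items():
        if len(set(nets)) != len(nets):
            problems.append(f"{name}: two or more interfaces share a logical subnet")
    # Rule: there must be at least one subnet in common with all nodes.
    if per_node:
        common = set.intersection(*(set(nets) for nets in per_node.values()))
        if not common:
            problems.append("no subnet is common to all nodes")
    return problems


if __name__ == "__main__":
    ok = {"node1": ["192.168.5.1", "192.168.6.1"],
          "node2": ["192.168.5.2", "192.168.6.2"]}
    same_subnet = {"node1": ["192.168.5.1", "192.168.6.1"],
                   "node2": ["192.168.5.2", "192.168.5.3"]}
    print(check_ip_interface_rules(ok))           # prints []
    print(check_ip_interface_rules(same_subnet))  # prints ['node2: two or more interfaces share a logical subnet']
```

A reported violation does not necessarily mean the configuration is unusable; as the notes explain, heartbeating over IP aliases or a netmon configuration may still make it workable.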
Instructor notes:
Purpose: Provide examples of how the IP networking rules work in practice.
Details: Go through each example and explain why it is either valid or not.
Additional information:
Transition statement: Let’s take a look at the rules for non-IP networks.

Non-IP network configuration rules: Point-to-point

Non-IP networks are strongly recommended to provide an alternate communication path between cluster nodes in the event of an IP network failure or IP subsystem failure.

With more than two nodes, you can configure the non-IP network topology using one of the following layouts:
- Mesh: Each node is connected to all other nodes. This is the most robust, but requires the most hardware.
- Star: One node is connected to all other nodes. This is the least robust; the center node becomes a single point of failure for all the associated networks.
- Ring: Each node has two non-IP connections for heartbeating.

[Figure: four nodes, node1 through node4, connected in a ring by four RS232 networks, net_rs232_01 through net_rs232_04]

Figure 2-23. Non-IP network configuration rules: Point-to-point   AU548.0

Notes:

Non-IP networks

Non-IP networks are point-to-point; that is, each connection between two nodes is considered a network and a separate non-IP network label is created for it in HACMP.

For example, the visual shows four RS232 networks, in a ring configuration, connecting four nodes to provide full cluster non-IP connectivity.

Types of non-IP networks

You can configure heartbeat paths over the following types of networks:
- Serial (RS232)
- Disk heartbeat (over an enhanced concurrent mode disk)
- Multi-node Disk Heartbeat (for use with resource groups with Startup Policy of “Online on All Available Nodes”)
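Because each non-IP network is a point-to-point link between two nodes, the mesh, star, and ring layouts can be compared by treating nodes as vertices and networks as edges. The Python sketch below is illustrative only: the function names are hypothetical, and the pairing of nodes on each link is an assumption made for the example (the visual names the four networks but this sketch simply joins neighbours in order). It checks whether every pair of nodes has a non-IP path, possibly through intermediate nodes, and whether that still holds after any single network is lost.

```python
from collections import deque


def non_ip_connected(nodes, networks):
    """True if every pair of nodes has a non-IP path, possibly via intermediate nodes."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in networks:            # each point-to-point network joins exactly two nodes
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:                     # breadth-first search from an arbitrary node
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(nodes)


def survives_single_network_failure(nodes, networks):
    """True if the nodes stay connected after the loss of any one network."""
    networks = list(networks)
    return non_ip_connected(nodes, networks) and all(
        non_ip_connected(nodes, networks[:i] + networks[i + 1:])
        for i in range(len(networks))
    )


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    ring = [("node1", "node2"), ("node2", "node3"),
            ("node3", "node4"), ("node4", "node1")]
    star = [("node1", "node2"), ("node1", "node3"), ("node1", "node4")]
    print(non_ip_connected(nodes, ring), survives_single_network_failure(nodes, ring))  # prints True True
    print(non_ip_connected(nodes, star), survives_single_network_failure(nodes, star))  # prints True False
```

The sketch only models link failures; the loss of the center node in a star layout is a separate single point of failure, as noted in the visual.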
Heartbeat paths can also be configured over the following types of networks:
- Target Mode SSA
- Target Mode SCSI

Rules

The rules for non-IP networks are considerably simpler than the rules for IP networks although they are just as important.

The basic rule is that you must configure enough non-IP networks to provide a non-IP communication path, possibly via intermediate nodes, between every pair of nodes in the cluster. In other words, every node must have a non-IP network connection to at least one other node. Additional communication paths, such as the ring or mesh topologies discussed in the visual, provide more robustness. (In a ring, or loop, each node is connected to its two adjacent neighbors.)

In addition, there are some considerations based on the type of non-IP network you are using.

Planning disk heartbeat networks

Any shared disk in an enhanced concurrent mode volume group can support a point-to-point heartbeat connection. Each disk can support one connection between two nodes. The connection uses the shared disk hardware as the communication path.

A disk heartbeat network in a cluster contains:
- Two nodes. A node can be a member of any number of disk heartbeat networks. A cluster can include up to 256 communications devices.
- An enhanced concurrent mode disk that participates in only one heartbeat network

Keep in mind the following points when selecting a disk to use for disk heartbeating:
- A disk used for disk heartbeating must be a member of an enhanced concurrent mode volume group. However, the volume groups associated with the disks used for disk heartbeating do not have to be defined as resources within an HACMP resource group. In other words, an enhanced concurrent volume group associated with the disk that enables heartbeating does not have to belong to any resource group in HACMP. You can convert an existing volume group to enhanced concurrent mode. For information about converting a volume group, see Chapter 11: Managing Shared LVM Components in a Concurrent Access Environment in the Administration Guide.
- The disk should have fewer than 60 seeks per second at peak load. (Disk heartbeats rely on being written and read within certain intervals.) Use the AIX filemon command to determine the seek activity, as well as the I/O load for a physical disk. Typically, most disk drives that do not have write caches can perform about 100 seeks per second. Disk heartbeating uses 2–4 seeks. Disks that are RAID arrays, or subsets of RAID arrays, might have lower limits. Check with the disk or disk subsystem manufacturer to determine the number of seeks per second that a disk or disk subsystem can support. However, if you choose to use a disk that has significant I/O load, increase the value for the timeout parameter for the disk heartbeat network.
- When SDD is installed and the enhanced concurrent volume group is associated with an active vpath device, ensure that the disk heartbeating communication device is defined to use the /dev/vpath device (rather than the associated /dev/hdisk device).
- If a shared volume group is mirrored, at least one disk in each mirror should be used for disk heartbeating. This is particularly important if you plan to set the forced varyon option for a resource group.

Planning serial point-to-point networks

When planning a serial (RS232) network, remember the following points:
- If there are no native serial ports available, and your planned HACMP configuration for that node uses an RS232 network, the configuration requires an RS232 serial adapter.
- All RS232 networks defined to HACMP are brought up by RSCT with a default of 38400 bps. The tty ports should be defined to AIX as running at 38400 bps. RSCT supports baud rates of 38400, 19200, 9600.

Any serial port that meets the following requirements can be used for heartbeats:
- The hardware supports use of that serial port for modem attachment.
- The serial port is free for HACMP exclusive use.

Certain System p systems do not support the use of native serial ports. Check the hardware documentation for the system being used as well as the HACMP Release Notes for specifics on which systems allow use of the native serial ports.

Planning target mode networks

Target mode SCSI and target mode SSA are also supported for point-to-point heartbeat communications. Each of these types of networks includes two nodes, a shared disk, and SCSI or SSA communications (as appropriate to the disk type).

Instructor notes:
Purpose: List the rules for non-IP networks.
Details: The details of planning the various non-IP network types are provided for reference and come straight out of the Install Guide. You do not need to cover all these details here, unless your students are interested.
Additional information:
Transition statement: There’s a new disk heartbeat network introduced in HA 5.4.1 called multi-node disk heartbeat. Let’s see the requirements.

Non-IP network configuration rules: Multi-node

Multi-node disk heartbeat interconnects multiple nodes.
- Traditional disk heartbeat uses non-data area of the disk as the communications medium to create a point to point connection between two nodes.
- For Resource Groups with Startup Policy of Online on All Nodes only

A single logical volume per volume group must be allocated for MNDHB, so actual space is required.
- Consider three per volume group.

The logical volume for MNDHB must:
- Be at least 32 MB
- Not use LVM mirroring, MWC or “write verify” (do not want LVM to interfere with DHB)
- Reside on a single physical disk that is accessible from all cluster nodes

[Figure: nodes n1, n2, and n3 sharing the volume group ecmvg (e.g., Oracle RAC voting disks), whose logical volumes lv1, lv2, and lv3 carry MNDHB 1, MNDHB 2, and MNDHB 3]

Figure 2-24. Non-IP network configuration rules: Multi-node   AU548.0

Notes:

Fencing

When a cluster partition occurs HACMP will determine the “losing side” and fence those nodes away from the shared storage. The “losing side” is determined by a simple quorum calculation: a node must have access to at least one more than half of the disks. “Fencing” uses the same function as LVM uses when quorum is lost on a mirrored volume group with quorum on: access to the disks is blocked and any further I/O attempts fail. Note that the VG does not have to be defined with quorum and mirroring.

Quorum checking and disk fencing

The quorum check is performed only on ECM volume groups used in a concurrent (OOAN) resource group. It does not apply to ECM used for fast disk takeover. The quorum check happens automatically if there are ECM volume groups in an OOAN resource group with one or more disk heartbeat disks. There is no way to disable it.

The quorum check is performed at two times:
1. When a node comes on line
2. When a node gets an indication that another node has failed (with which it shared one or more OOAN resource groups)

The quorum check is on “disks used for disk heart beating,” not on the total number of disks in the volume group. The anticipated use is with Oracle RAC, with the “voting files” contained in a three-disk ECM volume group.

Instructor notes:
Purpose: Briefly cover the multi-node disk heartbeat network introduced in HACMP 5.4.1.
Details: This concept is only to be briefly covered in this course. There is also some discussion of the configuration of MNDHBs in the unit on configuring the cluster. The reason for this network is to give clusters with more than two nodes (primarily Oracle RAC) a non-IP network that contains multiple nodes. In the past, if a node or group of nodes was isolated from the others completely, IP and non-IP, data divergence could occur and it was difficult to determine that it was happening since all nodes continued to access the data as they had prior to the split. With this network, the node or group of nodes that loses access and contains less than 50%+1 of the VGDAs will be removed from the cluster. The “removal” method is user specified.
Additional information:
Transition statement: Next let’s take a look at the concept of persistent IP labels.

Persistent node IP labels

An IP label associated with a particular node

Useful for administrative purposes:
- Provides highly available IP address associated with a particular node
- Allows external monitoring tools (for example, Tivoli) and administrative scripts to reach a particular node

Assigned, after node synchronization, via IP aliasing, to a communications interface on the node

HACMP will strive to keep the persistent node IP label available on that node; it is never moved to another node.

Maximum of one persistent node IP label per network per node

Persistent node IP labels must adhere to subnet rules:
- Persistent node IP labels must not be in any non-service interface subnet

Figure 2-25. Persistent node IP labels   AU548.0

Notes:

Rationale

In earlier releases of HACMP, the only way to guarantee that a known IP address would always be available on each node for administrative purposes was to configure a separate network, which was never used for IPAT. Such a configuration limits the usefulness of the administrative network because the loss of that network adapter would result in an inability to reach the node for administrative purposes.

Additionally, there are applications and functions that require a reliable address that is used to reach a specific node, one that does not move from one node to another. Applications such as Tivoli Management Region (TMR) require that a static IP address be assigned to each node they manage. GLVM is one AIX function that requires an address to be bound to a node but kept highly available amongst adapters on that network. This is accomplished through a persistent address.

Persistent IP labels

As an optional network component, users can configure persistent node IP labels. These are IP aliases that are configured on a node and kept available as long as at least one communication interface remains active on the associated network. Persistent IP labels can be used with IPAT. Persistent IP labels do not move as part of IPAT from node to node, but will move to another interface on the same node in the event an adapter failure occurs.

More on persistent IP labels

A persistent node IP label is an IP alias that has been assigned to a specific node in the cluster, and always stays on the same node. The persistent node IP label coexists on an interface with the non-service or service label that is already there. Persistent node IP labels do not require installation of additional physical adapters, and they are not included in any resource groups (the clients of a concurrent access resource group might be configured to use the persistent node IP label).

A persistent node IP label is intended primarily to provide administrative access to the node, but also plays a role in HATivoli and HACMP/XD for GLVM clusters.

Persistent node IP labels are supported on the following types of IP-based networks only:
- Ethernet
- Token Ring
- FDDI
- ATM LANE

If Cluster Services is not up

If Cluster Services is not up on the node, then the persistent node IP label is still aliased to a communication interface (this is done via rc.init in the /etc/inittab), although the failure of the underlying communication interface will, of course, cause the persistent node IP label to become unavailable.

Instructor notes:
Purpose: Explain persistent node IP labels.
Details: These are a very useful administrative feature. Make sure that the students understand why they are useful (they provide a highly available IP address that is always associated with a particular node).
Additional information:
Transition statement: Let’s see how we did.

Let’s review: Topic 2

1. True or False? There are no exceptions to the rule that
2.V4.Might have more than one IP address associated with it c. © Copyright IBM Corporation 2008 Figure 2-26. each NIC on the same LAN must have an IP address in a different subnet.Has an IP address assigned to it using the AIX TCP/IP SMIT screens b. True or False? Persistent node IP labels are not supported for IPAT via IP replacement. 2008 Unit 2. Let’s review topic 2 AU548. on each node.Always used to communicate with clients 3. Which of the following options are true statements about communication interfaces? (Select all that apply. 4.Sometimes but not always used to communicate with clients d. Networking considerations for high availability 2-73 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 1998. True or False? Clusters must always be configured with a private IP network for HACMP communication.) a. 2. True or False? Clusters must always be configured with a private IP network for HACMP communication.Has an IP address assigned to it using the AIX TCP/IP SMIT screens b.Instructor Guide Instructor notes: Purpose — Review. 4.1 heartbeat over IP aliases feature is the exception to this rule.) a. 2-74 HACMP Implementation © Copyright IBM Corp. (The HACMP 5. . each NIC on the same LAN must have an IP address in a different subnet.Sometimes but not always used to communicate with clients d. Details — Let’s review: Topic 2 solutions 1.Might have more than one IP address associated with it c. True or False? There are no exceptions to the rule that. Which of the following options are true statements about communication interfaces? (Select all that apply. on each node.Always used to communicate with clients 3.) © Copyright IBM Corporation 2008 Additional information — Transition statement — Now let’s take a look at how to configure IP address takeover. 1998. True or False? Persistent node IP labels are not supported for IPAT via IP replacement. 
2.3 Implementing IP address takeover (IPAT)

Instructor topic introduction
What students will do: The students will learn how to configure IPAT via IP aliasing and IPAT via IP replacement.
How students will do it: The objectives are covered through lecture, Let's Review and Checkpoint questions, and pencil and paper and hands-on lab exercises.
What students will learn: Students will learn:
• How to configure the two variants of IPAT
• What happens in various failure situations
• How to select which style of IPAT to use in a given cluster
• How boot sequences change when IPAT is involved
• The importance of consistent addressing and naming conventions
How this will help students on their job: This will help them when planning and implementing an HACMP cluster.

Implementing IP Address Takeover

After completing this topic, you should be able to:
Describe IPAT via IP aliasing and IPAT via IP replacement:
– How to configure a network to support them
– What happens when:
  • There are no failed components
  • A communication interface fails
  • A communication interface recovers
  • A node fails
  • A node recovers
Select the style of IPAT that is appropriate in a given context
Describe how the AIX boot sequence changes when IPAT is configured in a cluster
Describe the importance of consistent IP addressing and labeling conventions

Figure 2-27. Implementing IP Address Takeover

Notes:

Topic 3 objectives
This section explains how to configure both variants of IP Address Takeover.

Instructor notes:
Purpose: Show what we'll talk about in this section.
Transition statement: Let's take a look at a very useful feature called IP Address takeover.

IP Address Takeover

Each highly available application is likely to require its own IP address (called a service IP address).
This service IP address is placed in the application's resource group.
– HACMP is responsible for ensuring that the service IP address is available on the node currently responsible for the resource group.

[Diagram: a resource group containing NFS mounts, NFS exports, a volume group, file systems, a service IP label, and an application server]

Figure 2-28. IP Address Takeover

Notes:

Service IP address
Most highly available applications work best, from the user's perspective, if the application's IP address never changes. This capability is provided by HACMP using a feature called IP Address Takeover. An IP address is selected that is associated with the application. This IP address is called a service IP address because it is used to deliver a service to the user. It is placed in the application's resource group. HACMP then ensures that the service IP address is kept available on whichever node the resource group is currently on. The process of moving an IP address to another NIC or to a NIC on another node is called IP address takeover (IPAT).

Applications that do not need IPAT
Although very common, IPAT is an optional behavior that must be configured into the cluster. An example of an application that might not require IPAT is a database server for which the client software can be configured to check multiple IP addresses when it is looking for the server.
Also, IPAT is not supported for resource groups configured with a Startup Policy of Online on All Nodes (concurrent access) because the application in such a resource group is active on all the nodes that are currently up. Consequently, clients of a concurrent access resource group must be capable of finding their server by checking multiple IP addresses.

Instructor notes:
Purpose: Introduce IP Address Takeover.
Transition statement: Let's look at the two ways to implement IPAT.

Two ways to implement IPAT

IPAT via IP aliasing:
– HACMP adds the service IP address to an (AIX) interface IP address using AIX's IP aliasing feature:
    ifconfig en0 alias 192.168.1.2
IPAT via IP replacement:
– HACMP replaces an (AIX) interface IP address with the service IP address:
    ifconfig en0 192.168.1.2

Figure 2-29. Two ways to implement IPAT

Notes:

IPAT via IP aliasing
IPAT via IP aliasing takes advantage of AIX's ability to have multiple IP addresses associated with a single NIC. This ability, called IP aliasing, allows HACMP to move service IP addresses between NICs (or between nodes) without having to either change existing IP addresses on NICs or worry about whether or not there is already a service IP label on the NIC.

IPAT via IP replacement
IPAT via IP replacement involves replacing the IP address currently on a NIC with a service IP address. This approach supports a facility called hardware address takeover, which we will discuss shortly. It has the limitation of supporting only one service IP label per adapter, which restricts the number of resource groups that can use IPAT and, in practical terms, the number of service IP labels in a resource group.

Which is better?
We will examine the advantages and disadvantages of each method in the next few pages. Remember that the question is not which is better but rather which is better suited to a particular context.

Instructor notes:
Purpose: Explain the two types of IPAT.
Details: Try to avoid expressing a preference between the two variants, as the choice between which variant to use should be based on the context that the cluster configurator (that is, the student) is working in and not on the biases of the instructor.
Transition statement: Let's look at IPAT via IP aliasing first.

IPAT via IP aliasing configuration

Define an IP address for each network interface in the AIX ODM.
– Each interface IP address must be in a different logical IP subnet.*
– Define these addresses in the /etc/hosts file and configure them in HACMP topology as communication interfaces.
Define service addresses in /etc/hosts and in HACMP resources.
– They must not be in the same logical IP subnet as any of the interface IP addresses.
– HACMP will configure them to AIX when needed.

[Diagram: before starting the application resource group, the interfaces carry 192.168.10.1 (ODM), 192.168.11.1 (ODM), 192.168.10.2 (ODM), and 192.168.11.2 (ODM)]

* Refer to earlier discussion of heartbeating and failure diagnosis for explanation of why

Figure 2-30. IPAT via IP aliasing configuration

Notes:

Requirements
Before configuring an HACMP network to use IPAT via IP aliasing, ensure that:
- The network is a type that supports IPAT via IP aliasing:
  • Ethernet
  • Token-ring
  • FDDI
  • SP switch
- No service IP labels on the network require hardware address takeover (HWAT)
- The non-service IP addresses on each node are all on separate IP subnets
- The service IP addresses are on separate IP subnets from all non-service IP addresses

IPAT via aliasing subnet rules example
The interfaces on a node must all be on different subnets, and the service IP labels cannot be in any of the non-service subnets. For example, in a cluster with one network using IPAT via aliasing, where each node has two communication interfaces and there are two service IP labels, the network can require up to four subnets: one for each set of non-service IP labels and one for each service label (three would be required if the service addresses were on the same subnet):

Node name        NIC  IP Label  IP Address
node1            en0  n1boot1   192.168.10.1
node1            en1  n1boot2   192.168.11.1
node2            en0  n2boot1   192.168.10.2
node2            en1  n2boot2   192.168.11.2
Service address       appA-svc  9.47.87.22
Service address       appB-svc  9.47.88.22

subnet          IP labels
192.168.10/24   n1boot1, n2boot1
192.168.11/24   n1boot2, n2boot2
9.47.87/24      appA-svc
9.47.88/24      appB-svc

Hardware address takeover
HWAT is not supported on networks that use IPAT via IP aliasing (HWAT is discussed in detail in Appendix C along with IPAT via Replacement). The reason is that the service IP label is configured as an alias on top of the existing interface. Because the underlying interface's IP address is not changed, its hardware address is also expected to remain the same. Use HWAT when you have a networking component that does not use Gratuitous ARP, which is the mechanism that is relied upon for IPAT via aliasing.

Planning considerations
A node on a network that uses IPAT via aliasing can be the primary node for multiple resource groups on the same network, regardless of the number of actual boot interfaces on the node. Still, users should plan their networks carefully to balance the RG load across the cluster.

Additional background information
HACMP tries to keep the number of service IP labels on each NIC roughly equal, although it has no way to predict which service IP labels will be most popular. Consequently, any load balancing is the responsibility of the cluster administrator (and will require customization, which is beyond the scope of this course).

Instructor notes:
Purpose: Explain how to configure a network to support IPAT via IP aliasing.
Transition statement: Let's see what happens with IPAT via IP aliasing when it is in operation.

IPAT via IP aliasing at startup of resource group

When the resource group comes up on a node, HACMP aliases the service IP label onto one of the node's available (that is, currently functional) interfaces (ODM).

[Diagram: after starting the application resource group, 9.47.87.22 (alias) is present alongside the ODM addresses 192.168.10.1, 192.168.11.1, 192.168.10.2, and 192.168.11.2]

Figure 2-31. IPAT via IP aliasing at startup of resource group

Notes:

Operation
HACMP uses AIX's IP aliasing capability to alias service IP labels included in resource groups onto interfaces (NICs) on the node that runs the resource group. With aliasing, the non-service IP address (stored in the ODM) is still present.
Note that one advantage of sorts of IPAT via IP aliasing is that the non-service IP addresses do not need to be routable from the client/user systems.

Applications should use the service address
Strongly discourage users from using anything other than approved service IP addresses when contacting the cluster, because the NICs associated with the non-service IP addresses might fail, or the application might move to a different node while the non-service IP labels remain behind on the original node.

Instructor notes:
Purpose: To examine how IPAT via IP aliasing works in a normal (no failures) situation.
Details: Reiterate the importance of ensuring that users are using service IP labels to contact their application.
Transition statement: Let's see what happens when a NIC fails.

IPAT via IP aliasing after an interface fails

If the communication interface being used for the service IP label fails, HACMP aliases the service IP label onto one of the node's remaining available (currently functional) non-service (ODM) interfaces.
The eventual recovery of the failed boot adapter makes it available again for future use.

[Diagram: 9.47.87.22 (alias) shown on a remaining interface; ODM addresses 192.168.10.1, 192.168.11.1, 192.168.10.2, and 192.168.11.2]

Figure 2-32. IPAT via IP aliasing after an interface fails

Notes:

Interface failure
If a communication interface fails, HACMP moves the service IP addresses to another communication interface, which is still available, on the same network. If no remaining available NICs are on the node for the network, then HACMP initiates a fallover for that resource group.

Interface failure with IPAT
The failure of an interface is generally handled locally on the node that experienced the failure by moving the IP address to a still available interface. The outage in this case is considerably shorter than the one that occurs when a node fails.
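The interface-failure behavior described in these notes can be illustrated with a small model. This is only a sketch under stated assumptions, not HACMP code: the function name, the dictionary layout, and the choice of the survivor with the fewest aliases as the target are all hypothetical and chosen for illustration.

```python
def handle_interface_failure(interfaces, failed):
    """Model what happens to service IP aliases when one interface fails.

    interfaces: dict mapping an interface name (for example "en0") to
                {"up": bool, "aliases": [service IP strings]}.
                This layout is a hypothetical model, not an HACMP structure.
    failed:     name of the interface that just failed.

    Returns ("moved", target_name) when the aliases were moved to another
    interface on the same node, or ("fallover", displaced_aliases) when no
    interface is left and the resource group has to move to another node.
    """
    interfaces[failed]["up"] = False
    displaced = interfaces[failed]["aliases"]
    interfaces[failed]["aliases"] = []

    survivors = [name for name, nic in interfaces.items() if nic["up"]]
    if not survivors:
        # No remaining available NIC on this node for the network.
        return ("fallover", displaced)

    # Assumption: pick the surviving interface that hosts the fewest aliases.
    target = min(survivors, key=lambda name: len(interfaces[name]["aliases"]))
    interfaces[target]["aliases"].extend(displaced)
    return ("moved", target)


nics = {
    "en0": {"up": True, "aliases": ["9.47.87.22"]},
    "en1": {"up": True, "aliases": []},
}
first = handle_interface_failure(nics, "en0")   # alias moves locally
second = handle_interface_failure(nics, "en1")  # nothing left: fallover
```

The first failure is handled on the node itself, which is why the outage is short; only the second failure forces the resource group to another node.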
Users' perspective
Because existing TCP/IP sessions generally recover cleanly from this sort of failure/move-IP-address operation, users might not even notice the outage if they are not interacting with the application at the time of the failure.

Failure of all interfaces on a node
If the last remaining interface on a node fails, then HACMP triggers a fallover for any resource groups with service IP addresses on the failed interface's network.

Instructor notes:
Purpose: Show what happens when an interface fails.
Transition statement: Let's see what happens when a node fails.

IPAT via IP aliasing after a node fails

If the resource group's node fails, HACMP moves the resource group to a new node and aliases the service IP label onto one of the new node's available (currently functional) non-service (ODM) communication interfaces.

[Diagram: 9.47.87.22 (alias) on the surviving node, whose interfaces carry 192.168.10.2 (ODM) and 192.168.11.2 (ODM)]

Figure 2-33. IPAT via IP aliasing after a node fails

Notes:

Node failure with IPAT
When a node that is running an IPAT-enabled resource group fails, HACMP moves the resource group to an alternative node. Because the service IP address is in the resource group, it moves with the rest of the resources to the new node. The service IP address is aliased onto an available (currently functional) communication interface on the takeover node.

Node failure from the users' perspective
The users experience a short outage and then, from their perspective, the same server is back up and running. You probably shouldn't correct the user when they mention that the server was down for a few minutes earlier when you happen to know that it is still down and undergoing repair! Strictly speaking, the service was down for a few minutes and is now up again.

Instructor notes:
Purpose: Show what happens with IPAT when a node fails.
Transition statement: Because several service IP labels can share the same interface, HACMP provides a network attribute to control distribution of service IP labels.

IPAT via IP aliasing: Distribution preference for service IP label aliases

Network level attribute that controls the placement of service IP labels onto communication interfaces
– Useful for
  • Load balancing
  • Isolating traffic for VPN requirements
– If there are insufficient interfaces available to satisfy the preference, HACMP allocates service IP label aliases and persistent IP labels to an existing active network interface card
Four choices:
– Anti-Collocation
– Collocation
– Collocation with Persistent Label
– Anti-Collocation with Persistent Label

Figure 2-34. IPAT via IP aliasing: Distribution preference for service IP label aliases

Notes:

Distribution preference for service IP label aliases
You can configure a distribution preference for the placement of service IP labels that are configured in HACMP. Starting with HACMP 5.1, HACMP lets you specify the distribution preference for the service IP label aliases.
A distribution preference for service IP label aliases is a network-wide attribute used to control the placement of the service IP label aliases on the communication interfaces on the nodes in the cluster. Configuring a distribution preference for service IP label aliases provides:
- Load balancing: Enables you to customize the load balancing for service IP labels in the cluster, taking into account the persistent IP labels previously assigned on the nodes.
- VPN requirements: Enables you to configure the type of the distribution preference suitable for the VPN firewall external connectivity requirements.
HACMP will try to meet preferences, but will always keep service labels active: The distribution preference is exercised as long as there are acceptable network interfaces available. However, HACMP always keeps service IP labels active, even if the preference cannot be satisfied.

Four possible values for this attribute
You can specify in SMIT the following distribution preferences for the placement of service IP label aliases:
- Anti-Collocation: This is the default. HACMP distributes all service IP label aliases across all non-service IP labels using a "least loaded" selection process.
- Collocation: HACMP allocates all service IP label aliases on the same communication interface (adapter).
- Anti-Collocation with Persistent Label: HACMP distributes all service IP label aliases across all active communication interfaces that are not hosting the persistent IP label. HACMP will place the service IP label alias on the interface that is hosting the persistent label only if no other network interface is available.
- Collocation with Persistent Label: All service IP label aliases are allocated on the same communication interface that is hosting the persistent IP label. This option may be useful in VPN firewall configurations where only one interface is granted external connectivity and all IP labels (persistent and service) must be allocated on the same interface card.

How to configure
You use the extended path to configure distribution preferences. Follow this path: smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP Extended Resources Configuration -> Configure Resource Distribution Preferences -> Configure Service IP Labels/Address Distribution Preference -> pick your network -> toggle through the Distribution Preference menu options.

Instructor notes:
Purpose: Discuss how you can control placement of service IP labels in an IPAT via aliasing environment.
Transition statement: Let's summarize IPAT via aliasing.

IPAT via IP aliasing summary

Configure each node's communication interfaces with non-service IP addresses (each on a different subnet).
Assign service IP labels to resource groups as appropriate.
– Must be on separate subnet from non-service IP addresses.
– There is a total limit of 256 IP addresses known to HACMP and 64 resource groups. Within those overall limits:
  • There is no limit on the number of service IP addresses in a resource group.
  • There is no limit on the number of resource groups with service IP labels.
HACMP assigns service IP labels to communication interfaces using IP aliases based on resource group rules and available hardware.
IPAT via IP aliasing requires that hardware address takeover is not configured.
IPAT via IP aliasing requires gratuitous ARP support.

Figure 2-35. IPAT via IP aliasing summary

Notes:

Summary
The visual summarizes IPAT via IP aliasing. Some additional considerations are discussed as follows.

Advantages
Probably the most significant advantage to IPAT via IP aliasing is that it supports multiple service IP labels per network per resource group on the same communication interface and allows a node to easily support quite a few resource groups. In other words, IPAT via IP aliasing enables you to share several service labels on one interface. Thus, it can require fewer adapters and interfaces than IPAT via replacement.

Disadvantages
Probably the most significant disadvantage is that IPAT via IP aliasing does not support hardware address takeover. You will rely on Gratuitous ARP as the means of resetting the ARP entries on IPAT. In addition, IPAT via IP aliasing can require a lot of subnets, because you must have a subnet for each interface and a subnet for each service IP label.
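The four distribution preferences described earlier in this topic can be pictured as a selection function over a node's active interfaces. The following is a rough sketch only: the function name, the preference strings, and the tie-breaking and fallback choices are assumptions made for illustration and do not reflect how HACMP is implemented.

```python
def place_service_label(preference, interfaces, persistent_on=None):
    """Choose the interface that should host the next service IP label alias.

    preference:    one of "anti-collocation", "collocation",
                   "collocation-with-persistent",
                   "anti-collocation-with-persistent" (hypothetical names).
    interfaces:    dict mapping each ACTIVE interface name to the number of
                   service aliases it already hosts.
    persistent_on: name of the interface hosting the persistent IP label.
    """
    names = list(interfaces)
    if not names:
        raise ValueError("no active interface is available")

    def least_loaded(candidates):
        # Assumed reading of the "least loaded" selection process.
        return min(candidates, key=lambda name: interfaces[name])

    if preference == "anti-collocation":
        return least_loaded(names)
    if preference == "collocation":
        hosting = [name for name in names if interfaces[name] > 0]
        return hosting[0] if hosting else names[0]
    if preference == "collocation-with-persistent":
        # Assumed fallback: any active interface, so the label stays active.
        return persistent_on if persistent_on in interfaces else names[0]
    if preference == "anti-collocation-with-persistent":
        others = [name for name in names if name != persistent_on]
        # The persistent label's interface is used only as a last resort.
        return least_loaded(others) if others else persistent_on
    raise ValueError("unknown preference: " + preference)


nics = {"en0": 1, "en1": 0}
a = place_service_label("anti-collocation", nics)
b = place_service_label("collocation", nics)
c = place_service_label("collocation-with-persistent", nics, persistent_on="en1")
d = place_service_label("anti-collocation-with-persistent", nics, persistent_on="en1")
e = place_service_label("anti-collocation-with-persistent", {"en1": 2}, persistent_on="en1")
```

In every branch the function returns some active interface, which mirrors the statement that service IP labels are kept active even when the preference cannot be satisfied.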
Instructor notes:
Purpose — Show a summary of the IPAT via IP aliasing configuration rules and behavior.
Details —
Additional information —
Transition statement — Now let’s take a look at IPAT via IP replacement.

IPAT via IP replacement overview
AIX boots with a non-service (ODM) IP address on each interface.
When Cluster Services are started, the non-service IP labels are replaced with service IP labels for the resource groups that are being brought online.
Only one service IP label can be on an interface at a time.
If the interface hosting a service IP label fails, HACMP will attempt to replace the non-service IP label of another interface with the service IP label, in order to maintain the service IP label.
Configuration rules:
– Each service IP label must be in the same subnet as a non-service label subnet.
– There must be at least as many interfaces on each node as there are service IP labels.
– All service IP labels must be in the same subnet.
Advantages
– Supports hardware address takeover
– Requires fewer subnets
Disadvantages
– Requires more interfaces to support multiple service IP labels
– Is less flexible
Figure 2-36. IPAT via IP replacement overview AU548.0

Notes:
History
In the beginning, IPAT via IP replacement was the only form of IPAT available. IPAT via IP aliasing became available when AIX could support multiple IP addresses associated with a single NIC via IP aliasing. Because IPAT via IP aliasing is more flexible and usually requires fewer network interface cards, IPAT via IP replacement is no longer the recommended method. Many existing cluster implementations still have IPAT via Replacement. When upgrading to versions of HACMP that support IPAT via Aliasing, consider converting IPAT via Replacement configurations to Aliasing only if there is another reason compelling you to do so. Otherwise, leave the IPAT via Replacement configuration as it is. Any new implementations should strongly consider using IPAT via Aliasing.
This visual gives a brief overview of IPAT via IP replacement. A detailed discussion can be found in Appendix C.

Configuration rules
The visual summarizes the configuration rules. Notice that they are almost the opposite to the rules for IPAT via IP aliasing.

Advantages
Probably the most significant advantage of IPAT via IP replacement is that it supports hardware address takeover (HWAT). HWAT may be needed if your local clients or routers do not support gratuitous ARP. This will be discussed in a few pages.
Another advantage is that it requires fewer subnets. If you are limited in the number of subnets available for your cluster, this may be important.
Note: If reducing the number of subnets needed is important, another alternative may be to use heartbeating via aliasing; see “Heartbeating over IP aliases” on page 2-58.

Disadvantages
Probably the most significant disadvantages are that IPAT via IP replacement limits the number of service IP labels per subnet per resource group on one communications interface to one and makes it rather expensive (and complex) to support lots of resource groups in a small cluster. In other words, you need more network adapters to support more applications.

Instructor notes:
Purpose — Show a summary of the IPAT via IP replacement configuration rules and behavior.
Details —
Additional information —
Transition statement — Now let’s take a look at some service IP address examples.

Service IP address examples
[Table with four columns: IP addresses on first node; IP addresses on second node; Valid service IP addresses for IPAT via IP aliasing; Valid service IP addresses for IPAT via IP replacement]
Figure 2-37. Service IP address examples AU548.0

Notes:
Service IP address rules and examples
The rules for service IP addresses are straightforward. It comes down to what subnet the service IP address can be in.
For IPAT via Aliasing, the service IP addresses must be in a subnet that is different than the non-service IP address subnets.
For IPAT via Replacement, the service IP addresses must be in a subnet that is the same as one of the non-service IP address subnets.
The table above provides some examples. Notice that for a given set of IP addresses on the interfaces (AIX ODM), service IP labels which are acceptable for IPAT via IP aliasing are not acceptable for IPAT via replacement and vice-versa. Also notice that the IPAT via Replacement column only contains subnets that are the same as the subnets in the first two columns, while the IPAT via Aliasing column contains only subnets that are different than the subnets in the first two columns.

Instructor notes:
Purpose — Show some example service IP addresses.
Details — Work through the table and explain why each set of service IP addresses is valid for the variant of IPAT shown at the top of their column and for the IP addresses shown to their left.
Additional information —
Transition statement — Now that we’re clear on the IP address rules, let’s look at hostname naming conventions.

Adopt labeling/naming conventions
HACMP clusters tend to have quite a few IP labels and other names associated with them.
Adopt appropriate labeling and naming conventions:
– For example:
• Node-resident labels should include the node's name: usaboot1, usaboot2, usaadmin, ukbase1, ukbase2, node1adm, node1boot1, node1boot2…
• Service IP labels that move between nodes should describe the application rather than the node: web1-svc, infodb-svc, app1, prod1, …
• Persistent IP labels should include the node name (because they will not be moved to another node) and should identify that they are persistent: usa-per, uk-per, …
Why?
– Conventions prevent mistakes
– Preventing mistakes improves availability!
Figure 2-38. Adopt labeling/naming conventions AU548.0

Notes:
Using IP labeling and naming conventions
Again, the purpose of HACMP is to create a highly available environment for your applications. A naming convention can make it easier for humans to understand the configuration. This can reduce mistakes, leading to better availability.
Never underestimate the value of a consistent labeling or naming convention. It can prevent mistakes which can, in turn, prevent outages.

Instructor notes:
Purpose — Indicate the value of a consistent naming/labeling convention.
Details —
Additional information —
Transition statement — Now that we have an idea what hostnames we’ll use with our IP addresses, let’s ensure that they are resolvable by the Cluster Manager.

Hostname resolution
All of the cluster's IP labels must be defined in every cluster node's /etc/hosts file:
    127.0.0.1 loopback localhost
    # cluster explorers
    # netmask 255.255.255.0
    # usa boot addresses
    192.168.15.29 usaboot1
    192.168.16.29 usaboot2
    # uk boot addresses
    192.168.15.31 ukboot1
    192.168.16.31 ukboot2
    # persistent IP labels
    192.168.5.29 usa-per
    192.168.5.31 uk-per
    # Service IP labels
    192.168.5.92 xweb-svc
    192.168.5.70 yweb-svc
    # test client node
    192.168.5.11 test
Figure 2-39. Hostname resolution AU548.0

Notes:
/etc/hosts
Make sure that the /etc/hosts file on each cluster node contains all of the IP labels used by the cluster (you do not want HACMP to be in a position where it must rely on an external DNS server to do IP label to address mappings).

But I’m using DNS / NIS
If NIS or DNS is in operation, IP label lookup defaults to a nameserver system for name and address resolution. However, if the nameserver was accessed through an interface that has failed, the request does not complete, and eventually times out. This might significantly slow down HACMP event processing. To ensure that the cluster event completes successfully and quickly, HACMP disables NIS or DNS hostname resolution by setting the following AIX environment variable during service IP label swapping:
    NSORDER = local
As a result, the /etc/hosts file of each cluster node must contain all HACMP-defined IP labels for all cluster nodes.

Maintaining /etc/hosts
The easiest way to ensure that all of the /etc/hosts files contain all of the required addresses is to get one /etc/hosts file set up correctly and then copy it to all of the other nodes or use the file collections facility of HACMP 5.x.

Instructor notes:
Purpose — Show the two last sets of conventions in /etc/hosts file examples.
Details —
Additional information —
Transition statement — Now let’s talk about other network configuration options, starting with Etherchannel.
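The hostname resolution notes require every cluster IP label to be present in /etc/hosts on every node. The following is a minimal sketch of how that could be checked from the text of a hosts file. The function names are hypothetical, the parser handles only the simple "address name [alias...]" layout with # comments, and the sample content is illustrative.

```python
def parse_hosts(text):
    """Map each name or alias in /etc/hosts-style text to its address."""
    labels = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            labels.setdefault(name, address)  # first definition wins
    return labels


def missing_labels(hosts_text, required_labels):
    """Return the required IP labels that the hosts text does not define."""
    known = parse_hosts(hosts_text)
    return sorted(label for label in required_labels if label not in known)


if __name__ == "__main__":
    sample = """\
127.0.0.1 loopback localhost
# usa boot addresses
192.168.15.29 usaboot1
192.168.5.92 xweb-svc   # service label
"""
    print(missing_labels(sample, ["usaboot1", "xweb-svc", "yweb-svc"]))
```

Running the same check against the hosts file text collected from each node, with the full list of cluster IP labels, gives a quick indication of which node is missing which label.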
Other configurations: Etherchannel (1 of 2)
[Diagram: two nodes with interfaces en0 through en5, switches sw1 and sw2, shared storage with heartbeat on disk, boot labels n1boot1, n1boot2, n2boot1 and n2boot2, and service labels appA and appB]
Figure 2-40. Other configurations - Etherchannel (1 of 2) AU548.0

Notes:
Etherchannel details
Etherchannel is a “trunking” technology that allows grouping several Ethernet links. Traffic is distributed across the links, providing higher performance and redundant parallel paths. When a link fails, traffic is redirected to the remaining links within the channel without user intervention and with minimal packet loss.
EtherChannel was invented by Kalpana in the early 1990s; Kalpana was bought by Cisco in 1994. A standard, IEEE 802.3ad, was finalized in 2000.
Other popular trunking technologies exist: Adaptec's Duralink trunking / Nortel MLT MultiLink Trunking. Interoperability between technologies is a problem.

Instructor notes:
Purpose — Explain what etherchannel is and the configuration option.
Details — Indicate that this is meant to be a high-level view of etherchannel and its implementation with HACMP.
Additional information —
Transition statement — Let’s look at limitations and rules for etherchannel with HACMP.

Other configurations: Etherchannel (2 of 2)
If configured correctly, NIC failures go unnoticed by HACMP; that is, they are handled by the Etherchannel technology.
– Minimum downtime is experienced. A NIC failure should not result in clients needing to “reconnect.”
Gives you the performance improvement of link aggregation while allowing the hardware to deal with NIC failures.
Hardware address takeover is not supported when implemented with IPAT via Replacement.
Figure 2-41. Other configurations - Etherchannel (2 of 2) AU548.0

Notes:
Very useful information can be found in the following documents (although dated, the information is still very relevant).
A Techdoc regarding experiences and configuration:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785
The Flash announcing support for Etherchannel with HACMP:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10284

Instructor notes:
Purpose — Explain rules and limitations of etherchannel with HACMP.
Details —
Additional information —
Transition statement — The next configuration option to consider is virtual ethernet networking.

Other configurations: Base virtual Ethernet
[Diagram: Frame1 and Frame2, each containing an AIX client LPAR (en0 on virtual adapter ent0) and two Virtual I/O Servers (VIOS1 and VIOS2) behind the Hypervisor. Labels on each VIOS: ent0 (phy), ent1 (phy), ent2 (virt), ent3 (LA), ent4 (SEA), ent5 (virt), Control Channel. The frames are joined by two Ethernet switches.]
Figure 2-42. Other configurations: Base virtual Ethernet AU548.0

Notes:
Where to get more information
To get more information on this configuration, consult the following resources:
Redbooks:
Implementing HACMP Cookbook
http://publib-b.boulder.ibm.com/abstracts/sg246769.html?Open
Advanced POWER Virtualization on IBM System p5: Introduction and Configuration
http://www.redbooks.ibm.com/abstracts/sg247940.html?Open
Other education:
AU620, HACMP System Administration III: Virtualization and Disaster Recovery

Instructor notes:
Purpose — Cover one configuration option where Virtual Ethernet networking is used on an LPAR system.
Details — This slide might generate a lot of discussion. Cover this at a high level. Indicate that the virtual networking concepts are beyond the scope of the class but explain as much as you can within a reasonable amount of time.
Additional information —
Transition statement — How does HACMP see this?
HACMP view of virtual Ethernet
[Diagram: HACMP Node 1 (FRAME 1) and HACMP Node 2 (FRAME 2), each with a single interface en0 carrying a base address, a persistent IP and a service IP on network net_ether_0, with Topsvcs heartbeating between the nodes and a serial network serial_net1. Below, FRAME X shows an AIX client LPAR with two Virtual I/O Servers (VIOS1 and VIOS2).]
Figure 2-43. HACMP view of virtual Ethernet AU548.0

Notes:
Additional information
Single adapter Ethernet networks in HACMP require the use of a netmon.cf file.
Note that there does not have to be link aggregation at the VIO Server level. You could configure a single NIC and rely on the other VIO Server for redundancy.

Instructor notes:
Purpose — Examine the HACMP view of the virtual ethernet configuration.
Details — Explain that HACMP sees the single adapter in the HACMP LPARs as a single network adapter (next visual) and that the netmon.cf file should be configured on them to provide topsvcs with assistance in monitoring the network.
Additional information —
Transition statement — So we see that virtual ethernet networking results in the need to configure HACMP with a single network adapter. Let’s look at that.
Other configurations: Single IP adapter nodes
Single IP Adapter nodes might seem attractive because they appear to reduce the cost of the cluster.
The cost reduction is an illusion:
1. A node with only a single adapter on a network is a node with a single point of failure: the single adapter.
2. Clusters with unnecessary single points of failure tend to suffer more outages.
3. Unnecessary outages cost (potentially quite serious) money.
One of the fundamental cluster design goals is to reduce unnecessary outages by avoiding single points of failure.
HACMP requires at least two NICs per IP network for failure diagnosis.
Clusters with fewer than two NICs per IP network, though supported, are not recommended*.
* Virtual Ethernet and certain Cluster 1600 SP Switch-based clusters are supported with only one adapter per network.
Figure 2-44. Other configurations: Single IP adapter nodes AU548.0

Notes:
Single IP adapter nodes
It is not unusual for a customer to try to implement an HACMP cluster in which one or more of the cluster nodes have only a single network adapter (the motivation is usually the cost of the adapter but the additional cost of a backup system with enough PCI slots for the second adapter can also be the issue).
The situation is actually quite simple: with the exception of virtual ethernet implementations and certain Cluster 1600 clusters that use the SP Switch facility, any cluster with only one NIC on a node for a given network has a single point of failure, the solitary NIC, and is not supported.
Nodes with only a single NIC on an IP network are, at best, a false economy. At worst, they are a fiasco waiting to happen as the lack of a second NIC on one or more of the nodes could lead to extended cluster outages and just generally strange behavior (including HACMP failing to detect failures which would have been detected had all nodes had at least two NICs per IP network).

Instructor notes:
Purpose — Explain why single adapter networks are needed and might not be a good idea.
Details —
Additional information —
Transition statement — So how do we get all these IP addresses that we’ve been talking about?

Talk to your network administrator
Explain how HACMP uses networks.
Ask for what you need:
– IPAT via IP Aliasing:
• Service IP labels/addresses in the production network for client connections to the cluster applications
• Additional subnets for non-service interface (ODM) labels
– One per network interface on the node with the most network adapters.
– These do not need to be routable.
– IPAT via IP Replacement:
• Service IP labels/addresses
• Interface IP label for each network adapter (one must be in the same subnet as the service label)
• A different subnet for each interface
– One per adapter on the node with the most adapters.
– Only the subnet containing the service label need be routable.
– Persistent node IP label for each node on at least one network (very useful but optional)
Ask early (getting subnets assigned might take some time).
Figure 2-45. Talk to your network administrator AU548.0

Notes:
Getting IP addresses and subnets
Unless you happen to be the network administrator (in which case, you can feel free to spend time talking to yourself), you need to get the network administrator to provide you with IP addresses for your cluster. The requirements imposed by HACMP on IP addresses are rather unusual and might surprise your network administrator, so be prepared to explain both what you want and why you want it. Also, ask for what you want well in advance of the date that you need it because it might take some time for the network administrator to find addresses and subnets for you that meet your needs.
Do not accept IP addresses that do not meet the HACMP configuration rules. Even if you can get them to appear to work, they almost certainly will not work at a point in time when you can least afford a problem.

Instructor notes:
Purpose — Explain what they need to get from the networking folks.
Details —
Additional information —
Transition statement — Let’s take a look at some changes to the AIX boot sequence that occur when IPAT is configured in a cluster.

Changes to AIX start sequence
The startup sequence of AIX networking is changed when IPAT is enabled.
Standard AIX start sequence:
    /etc/inittab
    /sbin/rc.boot
        cfgmgr
    /etc/rc.net
        cfgif
    /etc/rc
        mount all
    /etc/rc.tcpip
        daemons start
    /etc/rc.nfs
        daemons start
        exportfs
Start sequence with IPAT:
    /etc/inittab
    /sbin/rc.boot
        cfgmgr
    /etc/rc.net (modified for ipat)
        exit 0
    /etc/rc
        mount all
    /usr/sbin/cluster/etc/harc.net
        /etc/rc.net -boot
        cfgif
    < Cluster Services startup >
    clstrmgr
        event node_up
        node_up_local
        get_disk_vg_fs
        acquire_service_addr
        telinit -a
    /etc/rc.tcpip
        daemons start
    /etc/rc.nfs
        daemons start
        exportfs
Figure 2-46. Changes to AIX start sequence AU548.0

Notes:
/etc/inittab changes
A node with a network configured for IPAT must not start inetd until HACMP has had a chance to assign the appropriate IP addresses to the node’s interfaces. Consequently, the AIX start sequence is modified slightly if a node has a resource group that uses either form of IPAT.

Instructor notes:
Purpose — Examine the changes to the AIX boot sequence that result when IPAT is configured.
Details —
Additional information —
Transition statement — Let’s take a closer look at the inittab.

Changes to /etc/inittab
    init:2:initdefault:
    brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
    . . .
    srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
    harc:2:wait:/usr/es/sbin/cluster/etc/harc.net # HACMP for AIX network startup
    rctcpip:a:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
    rcnfs:a:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
    . . .
    qdaemon:a:wait:/usr/bin/startsrc -sqdaemon
    writesrv:a:wait:/usr/bin/startsrc -swritesrv
    . . .
    ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
    ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
    dt:2:wait:/etc/rc.dt
    cons:0123456789:respawn:/usr/sbin/getty /dev/console
    xfs:0123456789:once:/usr/lpp/X11/bin/xfs
    hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1
    clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit # HACMP for AIX These must be the last entries of run level a in inittab!
    pst_clinit:a:wait:/bin/echo Created /usr/es/sbin/cluster/.telinit > /dev/console # HACMP for AIX These must be the last entries of run level a in inittab!
Figure 2-47. Changes to /etc/inittab AU548.0

Notes:
HACMP 5.x changes to /etc/inittab
The visual shows excerpts from /etc/inittab from a system running AIX 6.1 and HACMP 5.4.1.
HACMP 5.1 added the harc entry to the /etc/inittab file, which runs harc.net to configure the network interfaces. Also, starting in HACMP 5.1, some of the other inittab entries have been changed to run in run-level a. These are invoked by HACMP when it is ready for the TCP/IP daemons to run.
The final two lines use the touch command to create a marker file when all of the run-level a items have been run. HACMP waits for this marker file to exist so that it knows when the run-level a items have been completed.
HACMP 5.3 made some additional changes to the inittab file. In HACMP 5.3 and later, the HACMP daemons are running all the time, even before you start the cluster. These daemons are started by the ha_star and hacmp entries in the inittab file.

Instructor notes:
Purpose — Show the changes that occur in the /etc/inittab file when IPAT is implemented.
Details —
Additional information —
Transition statement — Let’s take a look at some common network configuration problems with HACMP.

Common TCP/IP configuration problems
Subnet masks are not consistent for all HA network adapters.
Interface IP labels on one node are placed on the same subnet.
Service and interface IP labels are placed in the same subnet in IPAT via IP aliasing networks.
Service and interface IP labels are placed in different subnets in IPAT via IP replacement networks.
Ethernet frame type is set to 802.3. This includes etherchannel.
Ethernet speed is not set uniformly or is set to autodetect.
The contents of /etc/hosts are different on the cluster nodes.
A different version of perl is installed than the one used by the HACMP verification tools (resulting in what appears to be a network communications problem).
Figure 2-48. Common TCP/IP configuration problems AU548.0

Notes:
Configuration problems
The visual shows some common IP configuration errors to watch out for.

Instructor notes:
Purpose — List some common networking configuration errors.
Details —
Additional information —
Transition statement — Let’s see how we’ve done.
then which of the following options are valid service IP addresses if IPAT via IP replacement is being used? (Select all that apply.3 and 192.3 and 192.20.4 c.168.20.168.20. If the left node has NICs with the IP addresses 192. .168.1 and 192.192.168.192.4 d.4 c.168.0 Notes: © Copyright IBM Corp.0 Instructor Guide Uempty Let’s review: Topic 3 1.21.21. 4.2.168.192.168.4 and 192. 3.168.3 and 192.20. Let’s review topic 3 AU548.2.20.21.168.168.21.168.1 and the right hand node has NICs with the IP addresses 192.168.168. 192.3 and 192.20.21.1 and 192. 192.20. True or False? A single cluster can use both IPAT via IP aliasing and IPAT via IP replacement.168.168.21.24. then which of the following options are valid service IP addresses if IPAT via IP aliasing is being used? (Select all that apply.20.23.2 and 192.20.22.3 5.24. 2. If the left node has NICs with the IP addresses 192.20.21.24.168.4 c. 192.21. 192.20. 2-130 HACMP Implementation © Copyright IBM Corp.192.1 and 192.3.1 and the right hand node has NICs with the IP addresses 192.168. True or False? A single cluster can use both IPAT via IP aliasing and IPAT via IP replacement. 1998. 4.2 and 192.3 and 192.3 and 192.4) or (192.21.21. 2.168.20. Transition statement — Next let’s look at what happens after IPAT from the client’s point of view.4 d.(192.) a.21.168.1 and 192.4 d.168.3 and 192.168.20.(192. 192.168.24.1 and the right hand node has NICs with the IP addresses 192.22.3 and 192.168.23.21. 192. True or False? All networking technologies supported by HACMP support IPAT via IP aliasing.168.3 and 192.Instructor Guide Instructor notes: Purpose — Review.3 and 192.20.168.192.3 and 192.2.3 and 192.192.2 and 192.21.168.20.21.3 © Copyright IBM Corporation 2008 Additional information — 5b is not correct because you would need two “standby adapters” and you only have two adapters.3 and 192.4 and 192.20.168.168.4.168.4) b.22.168.168.21.168.23.3 and 192.168.20.) a. 
then which of the following options are valid service IP addresses if IPAT via IP aliasing is being used? (Select all that apply.20.22.20.168.168.20.168. then which of the following options are valid service IP addresses if IPAT via IP replacement is being used? (Select all that apply.168. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Details — Let’s review: Topic 3 solutions 1.20. If the left node has NICs with the IP addresses 192.4) b.21.168.21.168.168.4 c. 3.168.168.4) or (192. True or False? All networking technologies supported by HACMP support IPAT via IP replacement.22.168.168.168.21. .3 and 192.3 5.168.192.168. . How this will help students on their job — This will help them when planning and implementing an HACMP cluster.V4. 2008 Unit 2. specifically what needs to be done to update the ARP cache on clients.0 Instructor Guide Uempty 2.4 The impact of IPAT on clients Instructor topic introduction What students will do — The students will learn about the impact of IPAT on client systems. Networking considerations for high availability 2-131 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Let’s review and Checkpoint questions. 1998. What students will learn — How AIX’s gratuitous ARP usually takes care of ARP issues and three alternatives if it does not. How students will do it — The objectives are covered through lecture. and pencil and paper and hands-on lab exercises. © Copyright IBM Corp. 1998.Instructor Guide The impact of IPAT on clients After completing this topic. . The impact of IPAT on clients AU548. 
you should be able to:
• Explain how user systems are affected by IPAT related operations
• Describe the ARP cache issue
• Explain how gratuitous ARP usually deals with the ARP cache issue
• Explain three ways to deal with the ARP cache issue if gratuitous ARP does not provide a satisfactory resolution to the ARP cache issue:
  – Configure clinfo on the client systems
  – Configure clinfo within the cluster
  – Configure Hardware Address Takeover within the cluster

Figure 2-50. The impact of IPAT on clients AU548.0

Notes:

Topic 4 objectives
This section looks at the impact of IPAT on client systems.

Instructor notes:
Purpose — Introduce this section.
Details —
Additional information —
Transition statement — What will users experience?

How are users affected?

IP address moves and swaps within a node result in a short outage.
– Long-term connection oriented sessions typically recover seamlessly (TCP layer deals with packet retransmission).
Resource group fallovers to a new node result in a longer outage and sever connection-oriented services (long-term connections must be reestablished, short term connections retried).
In either case:
– Short-lived TCP-based services, such as http and SQL queries, experience short server down outage.
– UDP-based services must deal with lost packets.

Figure 2-51. How are users affected? AU548.0

Notes:

What users see
Users who are actively using the cluster's services at the time of a failure will notice an outage while HACMP detects, diagnoses and recovers from the failure. Unless they are actively using the cluster's applications at the time, they might not even notice the outage.

Each of these issues tends to be visible to the humans using the application in some fashion or other. They might see a short period of total silence followed by a clean recovery, or they might have to reconnect to the application. What they actually experience generally depends far more on how the client side of the application is designed and implemented than on anything within the control of the cluster's administrator.

How long does failure recovery take?
Three components contribute to the duration of the outage:
i. How long it takes HACMP to decide that something has failed
ii. How long it takes HACMP to diagnose the failure (determine what failed)
iii. How long it takes HACMP to recover from the failure
The first two of these generally take between about five and about thirty seconds depending on the exact failure involved. The third component can take another dozen or so seconds when moving an IP address within a node or it can take a few minutes or more in the case of a fallover.

Recovery with fallover
If the problem requires a fallover, then existing TCP/IP sessions eventually fail (usually as soon as the service IP address comes up on the takeover node, and AIX on that node resets sessions that it gets packets for that it does not know about). Users are also more likely to notice the outage because it typically takes a couple of minutes to complete a fallover (much of this time is spent dealing with taking over volume groups, checking file systems and recovering applications).

Recovery without fallover
If the problem can be resolved without a fallover then the users generally notice a short outage and then are able to continue with what they were doing.
Their TCP/IP-based sessions come back to life and everything appears to be fine again.

Instructor notes:
Purpose — Explain what the user is likely to see when something fails.
Details —
Additional information —
Transition statement — Let's take a look at one more issue that might affect some client systems.

What about the user's computers?

Figure 2-52. What about the user's computers? AU548.0 (Diagram: the service address xweb, 192.168.5.1, is an alias that moves between two NICs with MAC addresses 00:04:ac:62:72:49 and 00:04:ac:48:22:f4; the client's ARP cache entry for xweb is shown before and after the move.)

The ARP cache is a table of IP addresses and the network hardware addresses (MAC addresses) of the physical network cards that the IP addresses are assigned to. When an IP address moves to a different physical network card, the client's ARP cache might still have the old MAC address. It could take the client system a few minutes to realize that its ARP cache is out-of-date and ask for an updated MAC address for the server's IP address.

An IPAT operation renders ARP cache entries on client systems obsolete.
Client systems must (somehow) update their ARP caches.

Notes:

ARP cache issues
Client systems that are located on the same physical network as the cluster might find that their ARP cache entries are obsolete after an IP address moves to another NIC (on the same node or on a different node).

Instructor notes:
Purpose — Show how an ARP cache entry might become out-of-date.
Details — Point out that if the client system's ARP entry for 192.168.5.1 after the IP address move is 00:04:ac:62:72:49, then it won't be able to communicate with the server. On the other hand, if it is 00:04:ac:48:22:f4, then everything will be just fine.
Additional information —
Transition statement — Before we get too excited about this ARP cache issue, it is important to understand that not all client systems are likely to be in a position to be affected by the issue.

Local or remote client?

If the client is remotely connected through a router, it is the router's ARP cache that must be corrected.

Figure 2-53. Local or remote client? AU548.0 (Diagram: a client, 192.168.8.3, reaches xweb, 192.168.5.1, through a router. The client's ARP cache holds the router's client-side interface, 192.168.8.1 at 00:04:ac:42:9c:e2; the router's ARP cache holds the entries for xweb and for the client.)

Notes:

ARP cache entries are always local
ARP cache entries are only maintained by a system for the physical network cards that it communicates with directly. If there is a router between the client system and the cluster, then the client system's ARP cache has an entry for the IP address and MAC address of the router's network interface located on the client's side of the router. No amount of IP address moves or node fallovers has any (positive or negative) impact on what needs to be in the client's ARP cache. Rather, it is the ARP cache of the router on the cluster's network that must be up-to-date.

Most clusters have either a small handful or no client systems on the same physical network as the cluster. Consequently, whatever ARP cache issues might exist in a particular configuration, they do not usually affect very many systems. It's the ARP cache entries of the routers that must be considered.

Instructor notes:
Purpose — Explain that only local systems' (and routers') ARP cache entries are relevant.
Details —
Additional information —
Transition statement — Now let's look at an AIX feature that generally causes the ARP cache issue to become a non-issue.
Gratuitous ARP

AIX supports a feature called gratuitous ARP.
– AIX sends out a gratuitous (that is, unrequested) ARP update whenever an IP address is set or changed on a NIC.
Other systems on the local physical network are expected to update their ARP caches when they receive the gratuitous ARP packet.
Remember: Only systems on the cluster's local physical network must respect the gratuitous ARP packet.
So ARP update problems have been minimized.
Gratuitous ARP is required if using IPAT via aliasing.

Figure 2-54. Gratuitous ARP AU548.0

Notes:

Gratuitous ARP
AIX supports a feature called gratuitous ARP. Whenever an IP address associated with a NIC changes, AIX broadcasts out a gratuitous (in other words, unsolicited) ARP update. This gratuitous ARP packet is generally received and used by all systems on the cluster's local physical network to update their ARP cache entries.

The result is that all relevant ARP caches are updated almost immediately after the IP address is assigned to the NIC.

The problem is that not all systems respond to or even necessarily receive these gratuitous ARP cache update packets. If a local system either does not receive or ignores the gratuitous ARP cache packet then its ARP cache remains out-of-date.

Note that unless the network is very overloaded, local systems generally either always or never act upon the gratuitous ARP update packet.

Instructor notes:
Purpose — Explain how gratuitous ARP can make the ARP cache issue go away.
Details —
Additional information —
Transition statement — Let's take a look at gratuitous ARP support issues.

Gratuitous ARP support issues

Gratuitous ARP is supported by AIX on the following network technologies:
– Ethernet (all types and speeds)
– Token-Ring
– FDDI
– SP Switch
Gratuitous ARP is not supported on ATM.
Operating systems are not required to support gratuitous ARP packets.
– Practically every operating system does support gratuitous ARP.
– Some systems (for example, certain routers) can be configured to respect or ignore gratuitous ARP packets.

Figure 2-55. Gratuitous ARP support issues AU548.0

Notes:

Gratuitous ARP issues
Not all network technologies provide the appropriate capabilities to implement gratuitous ARP. In addition, operating systems that implement TCP/IP are not required to respect gratuitous ARP packets (although practically all modern operating systems do).

Finally, support issues aside, an extremely overloaded network or a network that is suffering intermittent failures might result in gratuitous ARP packets being lost. (A network that is sufficiently overloaded to be losing gratuitous ARP packets, or that is suffering intermittent failures that result in gratuitous ARP packets being lost, is likely to be causing the cluster and the cluster administrator far more serious problems than the ARP cache issue involves.)

Instructor notes:
Purpose — Discuss the key gratuitous ARP support issues.
Details —
Additional information —
Transition statement — What if gratuitous ARP is not supported?

What if gratuitous ARP is not supported?

If the local network technology doesn't support gratuitous ARP or there is a client system or router on the local physical network that must communicate with the cluster and that does not support gratuitous ARP packets:
– clinfo can be used on the client to receive updates of changes.
– clinfo can be used on the servers to ping a list of clients, forcing an update to their ARP caches.
– HACMP can be configured to perform Hardware Address Takeover (HWAT).
Suggestion: Do not get involved with using either clinfo or HWAT to deal with ARP cache issues until you have verified that there actually are ARP issues that need to be dealt with.

Figure 2-56. What if gratuitous ARP is not supported? AU548.0

Notes:

If gratuitous ARP is not supported
HACMP supports three alternatives to gratuitous ARP. We will discuss these in the next few pages.

Do not add unnecessary complexity
Cluster configurators should probably not simply assume that gratuitous ARP won't provide a satisfactory solution because each of the alternatives introduces additional, possibly unnecessary complexity into the cluster.

If the cluster administrator or configurator decides that the probability of a gratuitous ARP update packet being lost is high enough to be relevant, then they should proceed as though their context does not support gratuitous ARP.

Instructor notes:
Purpose — Present the list of alternative ways of dealing with the ARP cache issue.
Details — Present these as alternatives to HWAT and gratuitous ARP configurations in the event a student (or a few) find themselves in a situation where they can't do one or the other. This should be presented as reference material but not stressed as something that will be required configuration activity in most clusters.
Additional information —
Transition statement — Let's look at the first option.
Option 1: clinfo on the client

The cluster information daemon (clinfo) provides a facility to automatically flush the ARP cache on a client system.
– In this option, clinfo must execute on the client platform.
  • clinfo executables are supplied for AIX.
  • clinfo source code is provided with HACMP to facilitate porting clinfo to other platforms.
– clinfo uses SNMP for communications with HACMP nodes.
– /usr/es/sbin/cluster/etc/clhosts on the client system must contain a list of persistent node IP labels (one for each cluster node).
– clinfo.rc is invoked to flush the local arp cache.

Figure 2-57. Option 1: clinfo on the client AU548.0 (Diagram: the cluster node holding the xweb alias, 192.168.5.1, with clstrmgr, snmpd, clinfo and clinfo.rc shown along the notification path.)

Notes:

clinfo on the client
The cluster information service may be run on any client system. clinfo can detect failure either by polling or receiving SNMP traps from within the cluster. clinfo can execute a script that flushes the local ARP cache and pings the servers following failure.

The clinfo source code is provided with HACMP so that it can, at least in theory, be ported to non-AIX client operating systems.

Instructor notes:
Purpose — Present the clinfo on the client option for dealing with the ARP cache issue.
Details — Go through the SNMP hops shown on the diagram to illustrate how HACMP integrates with SNMP and how clinfo "gets the news of the cluster event."
Additional information —
Transition statement — Let's look at the second option.

Option 2: clinfo from within the cluster

clinfo can also be used on the cluster's nodes to force an ARP cache update.
– In this option, clinfo runs on every cluster node.
  • If clinfo is only run on one cluster node then that node becomes a single point of failure!
– clinfo flushes local ARP cache (on the cluster node) and then pings a defined list of clients listed in /usr/es/sbin/cluster/etc/clinfo.rc.
– Clients pick up the new IP address to hardware address relationship as a result of the ping request.

Figure 2-58. Option 2: clinfo from within the cluster AU548.0 (Diagram: a cluster node running clstrmgr, snmpd, clinfo and clinfo.rc pings a client after the xweb alias, 192.168.5.1, has moved.)

Notes:

clinfo on the cluster nodes
clinfo is already compiled and ready to run on the cluster's servers. Once again clinfo can execute a script on the servers that flushes the local ARP cache and pings the local clients. These in-bound ping packets contain the new IP address-to-MAC address relationship, and are used by the client operating system to update its ARP cache.

Unfortunately, this is not a mandatory feature of TCP/IP, so it's possible (although rather unusual) that a client operating system might fail to update its ARP cache when the ping packet arrives.

Instructor notes:
Purpose — Present the clinfo in the cluster option for dealing with the ARP cache issue.
Details —
Additional information —
Transition statement — Let's have a look at the key part of the clinfo.rc script.
clinfo.rc script (extract)

    # Example:
    #
    #   PING_CLIENT_LIST="host_a host_b 1.1.1.3"
    #
    PING_CLIENT_LIST=""

    TOTAL_CLIENT_LIST="${PING_CLIENT_LIST}"

    if [[ -s /etc/cluster/ping_client_list ]] ; then
    #
    # The file "/etc/ping_client_list" should contain only a line
    # setting the variable "PING_CLIENT_LIST" in the form given
    # in the example above. This allows the client list to be
    # kept in a file that is not altered when maintenance is
    # applied to clinfo.rc.
    #
        . /etc/cluster/ping_client_list
        TOTAL_CLIENT_LIST="${TOTAL_CLIENT_LIST} ${PING_CLIENT_LIST}"
    fi

    #
    # WARNING!!! For this shell script to work properly, ALL entries in
    # the TOTAL_CLIENT_LIST must resolve properly to IP addresses or hostnames
    # (must be found in /etc/hosts, DNS, or NIS). This is crucial.
    #

This script is located under /usr/es/sbin/cluster/etc and is present on an AIX system if the cluster.client fileset has been installed.
A separate file /etc/cluster/ping_client_list can also contain a list of client machines to ping.

Figure 2-59. clinfo.rc script (extract) AU548.0

Notes:

clinfo.rc
The clinfo.rc script must be edited manually on the cluster nodes that run clinfo. Edit the /usr/es/sbin/cluster/etc/clinfo.rc file on each server node. Add the IP label or IP address of each system that accesses service IP addresses managed by HACMP to the PING_CLIENT_LIST list. Then start the clinfo daemon (clinfo can be started as part of starting Cluster Services on the cluster nodes).

Remember: All the cluster nodes should be running clinfo if clinfo is being used within the cluster to deal with ARP cache issues (because you never know which cluster nodes will survive whatever has gone wrong).

There is no reason why clinfo cannot also be run on the client systems, although these changes are only required on the cluster nodes that are running clinfo.
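To make the way the extract combines its two client lists more concrete, the following sketch reproduces the same merge logic outside of clinfo.rc. It is an illustration under stated assumptions, not the shipped script: the function name build_client_list is hypothetical, a temporary file stands in for /etc/cluster/ping_client_list so that nothing on the system is touched, and the host names and the address 192.168.5.77 are made up for the example.

```shell
#!/bin/sh
# Sketch only: mirrors the merge in the clinfo.rc extract using a temporary
# file in place of /etc/cluster/ping_client_list.

# Hypothetical entry, as if it had been edited into clinfo.rc itself.
PING_CLIENT_LIST="host_a"

# build_client_list FILE
# Prints the combined client list. The body runs in a subshell so that
# sourcing FILE does not change PING_CLIENT_LIST for the caller.
build_client_list() (
    list_file=$1
    total="${PING_CLIENT_LIST}"
    if [ -s "$list_file" ]; then
        # FILE is expected to contain one line that sets PING_CLIENT_LIST.
        . "$list_file"
        total="${total} ${PING_CLIENT_LIST}"
    fi
    echo "$total"
)

demo_file=$(mktemp)
echo 'PING_CLIENT_LIST="host_b 192.168.5.77"' > "$demo_file"
build_client_list "$demo_file"    # prints "host_a host_b 192.168.5.77"
rm -f "$demo_file"
```

Because the file is sourced, it must contain valid shell and nothing else; a stray line in it would be executed by the script that reads it. Every name placed in either list also has to resolve, as the warning in the extract states.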
/etc/cluster/ping_client_list
You can also provide the list of clients to be pinged in the file /etc/cluster/ping_client_list. This is probably the best method as it ensures that the list of clients to ping is not overlaid by future changes to clinfo.rc.

More details
This script is invoked by HACMP as follows:

    clinfo.rc {join,fail,swap} interface_name

The next set of details likely do not make sense until we are further into the course.

When clinfo is notified that the cluster is stable after undergoing a failure recovery of some sort or when clinfo first connects to clsmuxpd (the SNMP part of HACMP), it receives a new map (description of the cluster's state). It checks for changed states of interfaces:
- If a new state is UP, clinfo calls clinfo.rc join interface_name.
- If a new state is DOWN, clinfo calls clinfo.rc fail interface_name.
- If clinfo receives a node_down_complete event, it calls clinfo.rc with the fail parameter for each interface currently UP.
- If clinfo receives a fail_network_complete event, it calls clinfo.rc with the fail parameter for all associated interfaces.
- If clinfo receives a swap_complete event, it calls clinfo.rc swap interface_name.

Instructor notes:
Purpose — Show the clinfo.rc script and explain how to customize it and how it is used.
Details —
Additional information —
Transition statement — There is, of course, a third option.
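The calling convention described under More details (an event name followed by an interface name) can be pictured with a small dispatcher. This is a hedged sketch of what a handler in that style might look like, not the contents of the shipped clinfo.rc: the function name handle_event is hypothetical, the client names are passed as extra arguments purely for the example, and the function only prints the actions it would take. In particular, "arp-flush" is a placeholder label rather than a real command, and the ping options shown are an assumption.

```shell
#!/bin/sh
# Sketch only: a dry-run dispatcher in the style of
#   clinfo.rc {join,fail,swap} interface_name

# handle_event EVENT INTERFACE [CLIENT...]   (hypothetical helper)
# Prints the actions a handler might take instead of executing them.
handle_event() {
    if [ $# -lt 2 ]; then
        echo "usage: handle_event {join,fail,swap} interface_name [client ...]" >&2
        return 1
    fi
    event=$1
    interface=$2
    shift 2
    case "$event" in
        join|fail|swap)
            # Placeholder line: stands for flushing the local ARP cache.
            echo "arp-flush (after $event on $interface)"
            for client in "$@"; do
                echo "ping -c 1 $client"
            done
            ;;
        *)
            echo "unknown event: $event" >&2
            return 1
            ;;
    esac
}

handle_event swap en1 host_a host_b
```

Keeping the handler as a dry run makes it safe to try on any system; a real customization would replace the echo lines with the commands appropriate to the platform.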
Option 3: Hardware Address Takeover

HACMP can be configured to swap a service IP label's hardware address between network adapters.
– HWAT is incompatible with IPAT via IP aliasing because each service IP address must have its own hardware address, and a NIC can support only one hardware address at any given time.
– Cluster implementer designates a Locally Administered Address (LAA), which HACMP assigns to the NIC that has the service IP label.
HWAT is discussed in detail in Appendix C.

Figure 2-60. Option 3: Hardware address takeover AU548.0

Notes:

Hardware address takeover
Hardware Address Takeover is the most robust method of dealing with the ARP cache issue as it ensures that the hardware address associated with the service IP address does not change (which avoids the whole issue of whether the client system's ARP cache is out-of-date).

The essence of HWAT is that the cluster configurator designates a hardware address that is to be associated with a particular service IP address. HACMP then ensures that whichever NIC the service IP address is on also has the designated hardware address.

HWAT considerations
Remember the following points when contemplating HWAT:
- HWAT is not supported by IPAT via IP aliasing because each NIC can have more than one IP address but each NIC can only have one hardware address.
- HWAT is only supported for Ethernet, token ring, and FDDI networks (MCA FDDI network cards do not support HWAT). ATM networks do not support HWAT.
- HWAT increases the takeover time (usually by just a few seconds).
- HWAT is an optional capability that must be configured into the HACMP cluster. (We will see how to do that in detail in a later unit.)
- Cluster nodes using HWAT on token ring networks must be configured to reboot after a system crash as the token ring card will continue to intercept packets for its hardware address until the node starts to reboot.
- The hardware address that is associated with the service IP address must be unique within the physical network that the service IP address is configured for.

Instructor notes:
Purpose — Introduce HWAT.
Details — Details are in Appendix C.
Additional information —
Transition statement — Let's review.

Checkpoint

1. True or False? Clients are required to exit and restart their application after a fallover.
2. True or False? All client systems are potentially directly affected by the ARP cache issue.
3. True or False? clinfo must not be run both on the cluster nodes and on the client systems.
4. If clinfo is run by cluster nodes to address ARP cache issues, you must add the list of clients to ping to either the __________________________ or the __________________________ file.

Figure 2-61. Checkpoint AU548.0

Notes:

Instructor notes:
Purpose — Review.
Details — Checkpoint solutions
1. True or False? Clients are required to exit and restart their application after a fallover.
2. True or False? All client systems are potentially directly affected by the ARP cache issue.
3. True or False? clinfo must not be run both on the cluster nodes and on the client systems.
4. If clinfo is run by cluster nodes to address ARP cache issues, you must add the list of clients to ping to either the /etc/cluster/ping_client_list or the /usr/es/sbin/cluster/etc/clinfo.rc file.
Additional information —
Transition statement — Let's summarize.

Unit summary (1 of 2)

Key points from this unit:
HACMP uses networks to:
– Provide highly available client access to applications in the cluster
– Detect and diagnose NIC, node, and network failures using RSCT heartbeats
– Communicate with HACMP daemons on other nodes
All HACMP clusters require a non-IP network
– Differentiate between node, IP subsystem and network failures
– Prevent cluster partitioning
HACMP networking terminology
– Service IP label/address: HA address used by client to access application
– Non-service IP label/address: Applied to NIC at boot time, stored in AIX ODM
– Persistent IP label/address: Node bound HA address for admin access to a node
– Communication interface: Association between a NIC and an IP label/address
– Communication device: Device used in non-IP network
– Communication adapter: X.25 adapter used in a HA communication link
– IP Address Takeover (IPAT): Moves service IP address to working NIC after a failure
  • IPAT via aliasing: Adds the service address to a NIC using IP aliasing
  • IPAT via replacement: Replaces the non-service address with the service address

Figure 2-62. Unit summary (1 of 2) AU548.0

Notes:

Instructor notes:
Purpose — Summarize the most important topics that have been discussed in this unit.
Details —
Additional information —
Transition statement — More summary.

Unit summary (2 of 2)

Key points from this unit (continued):
HACMP has very specific requirements for subnets.
– IPAT via aliasing
  • NICs on a node must be on different subnets, which must use the same subnet mask.
  • There must be at least one subnet in common with all nodes.
  • Service addresses must be on different subnet than any non-service address.
  • A service address can be on same subnet with another service address.
– IPAT via replacement
  • NICs on a node must be on different subnets, which must use the same subnet mask.
  • Each service address must be in same subnet as one of the non-service addresses on the highest priority node.
  • Multiple service addresses must be in the same subnet.
– Heartbeating over IP alias (any form of IPAT)
  • Service and non-service addresses can coexist on the same subnet, or be on separate subnets.
  • One subnet required for heartbeating, does not need to be routed.
HACMP can update local clients' ARP cache after IPAT.
– Gratuitous ARP (default)
– clinfo on clients
– clinfo on server nodes
– Hardware address takeover (HWAT)

Figure 2-63. Unit summary (2 of 2) AU548.0

Notes:

Instructor notes:
Purpose — More summary
Details —
Additional information —
Transition statement — We are finished with this unit.

Unit 3. Shared storage considerations for high availability

Estimated time
02:00

What this unit is about
This unit discusses the issue of shared storage in a high-availability environment with a particular emphasis, of course, on shared storage in an HACMP context.

What you should be able to do
After completing this unit, you should be able to:
• Discuss the shared storage concepts that apply within an HACMP cluster
• Describe the capabilities of various disk technologies as they relate to HACMP clusters
• Describe the shared storage related facilities of AIX and how to use them in an HACMP cluster

How you will check your progress
• Checkpoint questions
• Pencil and paper planning exercises
• Machine exercises

References
SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10 HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10 HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04 HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09 HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html HACMP manuals
http://www-03.ibm.com/servers/storage
http://www.redbooks.ibm.com

Unit objectives

After completing this unit,
you should be able to: Discuss the shared storage concepts that apply within an HACMP cluster Describe the capabilities of various disk technologies as they related to HACMP clusters Describe the shared storage related facilities of AIX and how to use them in an HACMP cluster © Copyright IBM Corporation 2008 Figure 3-1. 0 Instructor Guide Uempty Instructor notes: Purpose — To tell the students what we will talk about in this unit. © Copyright IBM Corp. Shared storage considerations for high availability 3-3 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 1998. 2008 Unit 3. . Transition statement — Let’s take a look at the fundamental concepts behind shared storage. but it is the intent of this unit to teach the students what the HACMP considerations are. Details — Additional information — The intent of this unit is not to teach how the various storage devices are installed and configured.V4. 1998. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.Instructor Guide 3-4 HACMP Implementation © Copyright IBM Corp. . How this will help students on their job — This will help them when planning and implementing an HACMP cluster. © Copyright IBM Corp. and pencil and paper and hands-on lab exercises. How students will do it — The objectives are covered through lecture.1 Fundamental shared storage concepts Instructor topic introduction What students will do — The students will be introduced to basic storage concepts as related to HACMP. 1998.V4. . Let’s Review and Checkpoint questions. What students will learn — The fundamental shared storage concepts as they apply within an HACMP cluster are described. Shared storage considerations for high availability 3-5 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 2008 Unit 3.0 Instructor Guide Uempty 3. 
Fundamental shared storage concepts

After completing this topic, you should be able to:
• Explain the distinction between shared storage and private storage
• Describe how shared storage is used within an HACMP cluster
• Discuss the importance of controlled access to an HACMP cluster's shared storage
• Describe how access to shared storage is controlled in an HACMP cluster

Figure 3-2. Fundamental shared storage concepts

Instructor notes:
Purpose: Introduce the topic of shared storage fundamentals.
Transition statement: Let's take a look at just what shared storage is.

What is shared storage?

[Diagram: Node 1 and Node 2, each with its own rootvg, connected to shared storage: SCSI disks, SAN storage, and virtual SCSI disks via a VIO Server.]

Figure 3-3. What is shared storage?

Notes:

Application storage requirements
A computer application always requires at least a certain amount of disk storage space. For example, even the most minimal application requires disk space to store the application's binaries. Most applications also require storage space for configuration files and whatever application data the application is responsible for.

When such an application is placed into a high-availability cluster, any of the application's data that changes must be stored in a location that is accessible to whichever node the application is currently running on. This storage that is accessible to multiple nodes is called shared storage.

Also keep in mind that HACMP does not provide data redundancy. Data must be striped or mirrored across multiple physical drives (generally presented to AIX as a LUN) and access to those LUNs from each node should be over multiple paths (generally referred to as multi-pathing). This most likely will result in the use of a shared storage device that provides the striping or mirroring and multi-pathing software. These components must be checked for compatibility with HACMP at the level you intend to implement, both AIX and HACMP levels.

Shared resource example
Note: The graphic in the lower right-hand corner is a shared telephone.

Shared storage physical connection
To associate the storage with whichever node is running the application, the storage technology must support, and the actual configuration must physically connect, the storage to the relevant nodes. This capability is supported by a variety of storage technologies, including SCSI and Fibre Channel as we'll see shortly.

Non-concurrent access
In a non-concurrent access environment, the disks are owned by only one node at a time. If the owning node fails, the cluster node with the next highest priority in the resource group node list acquires ownership of the shared disks as part of fallover processing. This ensures that the data stored on the disks remains accessible to client applications.

Non-concurrent access mode is sometimes called serial access mode, because only one node has access to the shared storage at a time. In a non-concurrent access environment, a highly available application potentially runs on only one node for extended periods of time. Only one disk connection is active at a time and the shared storage is not shared in any real time sense. Rather, it is storage that can be associated automatically (without human intervention) with the node where the application is currently running.

Concurrent access
In concurrent access environments, the shared disks are activated on more than one node simultaneously. Therefore, when a node fails, disk takeover is not required. In this case, access to the shared storage must be controlled by some locking mechanism in the application.

We will focus on non-concurrent shared storage in this unit.

Instructor notes:
Purpose: Explain what shared storage is.
Transition statement: And then there's private storage.

What is private storage?

[Diagram: Node 1 and Node 2, each with its own rootvg, shown with SCSI disks, SAN storage, and virtual SCSI disks via a VIO Server.]

Figure 3-4. What is private storage?

Notes:

Private storage
Private storage is, of course, accessible to only a single cluster node. It might be physically located within each system's box or externally in a rack or even in an external storage subsystem. The key point is that private storage is not physically accessible from more than one cluster node.

Private resource example
Note: The graphic in the lower right-hand corner is a private telephone.
Instructor notes:
Purpose: Explain what private storage is.
Additional information: We will discuss how to decide which data should be in shared storage and which should be in private storage in the Planning for applications and resource groups unit.
Transition statement: We must carefully control how the nodes access the shared storage so that corruption does not occur.

Access to shared data must be controlled

Consider:
• Data is placed in shared storage to facilitate access to the data from whichever node the application is running on.
• The application is typically running on only one node at a time.
• Updating the shared data from another node (that is, not the node that the application is running on) could result in data corruption.
• Viewing the shared data from another node could yield an inconsistent view of the data.
Therefore, only the node actually running the application should be able to access the data.

Figure 3-5. Access to shared data must be controlled

Notes:

Why?
The shared storage is physically connected to each node that the application might run on. In a non-concurrent access environment, the application actually runs on only one node at a time and modification or even access to the data from any other node during this time could be catastrophic (the data could be corrupted in ways which take days or even weeks to notice).

Issues for concurrent access
Some clusters have instances of the application active on more than one node at a time (for example, parallel databases). Such clusters require simultaneous access to the shared disks and must be designed to carefully control or coordinate their access to the shared data. This mechanism must be provided by the application.

Instructor notes:
Purpose: Explain why access to shared storage must be controlled.
Details: Point out that concurrent access applications also require controlled or at least coordinated access to the shared data but that this is the application's responsibility.
Transition statement: Let's look at how access to the shared storage is controlled in the typical situation where only one node at a time needs access to the shared storage for extended periods of time.

Who owns the storage?

[Diagram: Node 1 and Node 2, each with its own ODM, attached to shared disks A, B, C, and D.]

The varyonvg/varyoffvg commands are used to control ownership.
varyonvg/varyoffvg uses either:
• Reserve/release-based shared storage protection
  Used with standard volume groups
• RSCT-based shared storage protection
  Used with enhanced concurrent volume groups

Figure 3-6. Who owns the storage?

Notes:

Introduction
There are two mechanisms to control ownership of shared storage. Although these two mechanisms do not seem to have formal names, in this unit, we refer to them as the:
- Reserve/release-based shared storage protection mechanism
- RSCT-based shared storage protection mechanism
We use the term protection rather than access control both because it is a bit shorter and because it reminds us that the purpose of the mechanism is to protect the shared storage.

Reserve/release-based shared storage protection
Prior to HACMP 5.1, the AIX logical volume manager invoked disk-based reserve/release as the shared storage protection mechanism, which was appropriate for shared storage which was assigned to a single node for extended periods of time.

RSCT-based shared storage protection
AIX V5.1 introduced a new mechanism to be used with enhanced concurrent volume groups. This mechanism uses an AIX component called Reliable Scalable Cluster Technology (RSCT). We will be discussing RSCT in greater detail later in the week. HACMP 5.x uses this mechanism when enhanced concurrent volume groups are in use (more on enhanced concurrent volume groups later in this unit).

Standard, concurrent, and enhanced concurrent volume groups
• History
  Concurrent mode volume groups were created to allow multiple nodes to access the same logical volumes concurrently. The original concurrent mode volume groups were only supported on Serial DASD and SSA disks in conjunction with the 32-bit kernel. Beginning with AIX Version 5.1, the enhanced concurrent mode volume group was introduced to extend the concurrent mode support to all other disk types and to the 64-bit kernel. When concurrent volume groups are created on AIX V5.1 and up, they are created as enhanced concurrent mode volume groups by default.
• Support for the classical concurrent volume groups is being removed
  - AIX V5.1 introduced enhanced concurrent volume groups, but still allowed you to create and use the classical concurrent volume groups.
  - AIX V5.2 does not allow you to create classical concurrent volume groups, but you can still use them in AIX V5.2.
  - AIX V5.3 removes the support for classical concurrent volume groups entirely; only enhanced concurrent volume groups are supported.
• Concurrent access environment
  If you need concurrent access to the data in shared storage, you must use concurrent volume groups. This implies the use of enhanced concurrent mode volume groups.
• Non-concurrent access environment
  There are two volume groups that can be used with non-concurrent access environments. The first is a standard volume group. This volume group type uses reserve/release-based shared storage protection. Enhanced concurrent volume groups can also be used in non-concurrent access environments to provide RSCT-based shared storage protection. However, as we shall see, there are a number of advantages to using the RSCT-based shared storage protection, which requires the use of enhanced concurrent volume groups.

Instructor notes:
Purpose: Point out that two different shared storage access control mechanisms exist.
Transition statement: Let's look at how reserve/release-based shared storage protection works.

Reserve/release-based protection

[Diagram: Node 1 and Node 2, each with its own ODM; each node has a volume group varied on (varyonvg) across shared disks A, B, C, and D.]

Reserve/release-based shared storage protection relies on hardware support for disk reservation (SCSI commands)
– Disks are physically reserved to a node when varied on
– Disks are released when varied off
– LVM is unable to vary on a volume group whose disks are reserved to another node
Not all shared storage systems support disk reservation

Figure 3-7. Reserve/release-based protection

Notes:

Disk reservation
Reserve/release-based shared storage protection relies on the disk technology supporting a mechanism called disk reservation. Disks which support this mechanism can be, in effect, told to refuse to accept almost all commands from any node other than the one which issued the reservation. AIX's LVM automatically issues a reservation request for each disk in a volume group when the volume group is varied online by the varyonvg command. The varyonvg command fails for any disks that are currently reserved by other nodes. If it fails for enough disks (which it almost certainly does, since if one disk is reserved by another node, the others presumably are also), then the varyon of the volume group fails.

LVM change management: Keeping the ODM and VGDA in sync
When multiple nodes are sharing a volume group using reserve/release-based storage protection, the volume group is imported, but not varied on, for the inactive nodes. There must be some mechanism to ensure that any meta-data VGDA changes made to the volume group on the active node will be updated in the ODM on the inactive nodes in the cluster. For example, if you change the size of a logical volume on the active node, the other nodes' ODMs will still list the logical volume at the original size. When an inactive node is made active, and if the volume group were varied on without updating the ODM, the information in the ODM on the node and the VGDA on the disks would disagree. This will cause problems.

It is, of course, possible to update the ODM on inactive nodes when the change to the VGDA meta-data is made. In this way, extra time at fallover is avoided. The ODM can be updated manually or you can use Cluster Single Point of Control (C-SPOC), which can automate this task.

Lazy update
When using reserve/release-based shared storage protection, HACMP provides a last-chance mechanism called lazy update to update the ODM on the takeover node at the time of fallover. This is meant to be a final attempt at synchronizing the VGDA content with a takeover node's ODM at fallover time. For obvious reasons (like the fact that it can't overcome some VGDA/ODM mismatches) relying on lazy update should be avoided.

Lazy update works by using the volume group timestamp in the ODM. When HACMP needs to varyon a volume group, it compares the ODM timestamp to the timestamp in the VGDA. If the timestamps disagree, lazy update does an exportvg/importvg to recreate the ODM on the node. If the timestamps agree, no extra steps are required.

Lazy update and the various options for updating ODM information on inactive nodes are discussed in detail in a later unit in this course.

Instructor notes:
Purpose: Introduce reserve/release-based shared storage protection.
Details: You might be asked "What if enough of the disks are not reserved to be able to get the volume group online?" Try to delay this issue for a few visuals as it gets dealt with shortly.
Transition statement: Let's see how a volume group is handed off as part of moving a resource group from a node that is still operational.
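The lazy update decision described in the notes above reduces to a timestamp comparison. The sketch below models that decision in Python under stated assumptions: the VolumeGroupView class, its field names, and the plan_varyon function are hypothetical, the timestamps are plain integers, and the returned strings are step labels taken from the text rather than complete command lines. It is not how HACMP itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class VolumeGroupView:
    """What a takeover node knows about a shared volume group (illustrative)."""
    name: str
    odm_timestamp: int   # timestamp recorded in this node's ODM
    vgda_timestamp: int  # timestamp read from the VGDA on the shared disks


def plan_varyon(vg):
    """Return the ordered step labels needed to bring the volume group online."""
    steps = []
    if vg.odm_timestamp != vg.vgda_timestamp:
        # Timestamps disagree: the ODM is stale, so it is recreated first.
        steps += ["exportvg", "importvg"]
    # In both cases the volume group is then varied on.
    steps.append("varyonvg")
    return steps
```

With matching timestamps the plan is a single varyon step. A mismatch adds the export and import steps, which is the extra fallover time that updating the inactive nodes ahead of time avoids.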
Reserve/release disk takeover: Manual move

[Diagram, three panels showing Node 1 and Node 2 with shared disks A and B (httpvg) and C and D (dbvg): first, both volume groups are varied on; second, "Node2: varyoffvg httpvg"; third, "Node1: varyonvg httpvg". dbvg stays varied on throughout.]

Figure 3-8. Reserve/release disk takeover: Manual move

Notes:

Manual takeover
With reserve/release-based shared storage protection, HACMP passes volume groups between nodes by issuing a varyoffvg command on one node and a varyonvg command on the other node. The coordination of these commands (ensuring that the varyoffvg is performed before the varyonvg) is the responsibility of HACMP.

Instructor notes:
Purpose: Illustrate how a volume group is moved manually.
Transition statement: Now let's see what happens if the other node has failed.

Reserve/release disk takeover: Failure

[Diagram, two panels showing Node 1 and Node 2 with shared disks A, B, C, and D: before the failure each node has a volume group varied on; after the failure the surviving node has both volume groups varied on.]

Figure 3-9. Reserve/release disk takeover: Failure

Notes:

Disk takeover due to a failure
The right node has failed with the shared disks still reserved to the right node. When HACMP encounters a reserved disk in this context, it uses a special utility program to break the disk reservation. It then varies on the volume group, which causes the disks to be reserved to the takeover node.

Implications
Note that if the right node had not really failed then it would lose its reserves on the shared disks (rather abruptly) when the left node varied them on. This will be seen in the left node's error log and should be acted on immediately, because this indicates you are in a situation where both nodes can access and update the data on the disks (each believing that it is the only node accessing and updating the data).

How do we know the other node has failed?
Disk takeover due to failure will only occur when a node believes that the active node has failed. HACMP uses communication between the nodes to determine if each node is still active. A failure takeover isn't possible unless all paths used by HACMP to communicate between the two nodes have been severed. In other words, ensure that there is sufficient redundancy in these communication paths to ensure that loss of all communication with another node implies that the other node has truly failed.

Instructor notes:
Purpose: Illustrate how a volume group is moved in the event of a failure.
Transition statement: Let's take a look at something called ghost disks.

Reserve/release ghost disks

[Diagram: Node 1 and Node 2 with shared disks A, B, C, and D, and a device list running from hdisk0 through hdisk9.]

• Not seen with IBM disks
• Add time to volume group activation
• No need to manually deal with these, leave that to Cluster Manager

Figure 3-10. Reserve/release ghost disks

Notes:

What is a ghost disk?
During the AIX boot sequence, the configuration manager (cfgmgr) accesses all the shared disks (and all other disks and devices). Each time it accesses a physical volume at a particular hardware address, it tries to determine if the physical volume is the same actual physical volume that was last seen at the particular hardware address. It does this by attempting to read the physical volume's ID (PVID) from the disk. This operation fails if the disk is currently reserved to another node. Consequently, the configuration manager is not sure if the physical volume is the one it expects or is a different physical volume. In order to be safe, it assumes that it is a different physical volume and assigns it a temporary hdisk name. This temporary hdisk name is called a ghost disk.

When the volume group is eventually brought online by Cluster Services, the question of whether each physical volume is the expected physical volume is resolved. If it is, then the ghost disk is deleted. If it isn't, then the ghost disk remains. Whether or not the online of the volume group ultimately succeeds depends on whether or not the LVM can find enough of the volume group's physical volumes (and other factors such as whether quorum checking is enabled on the volume group).

Ghost disk issues
• Time
  Dealing with ghost disks takes time, with the result that a volume group with ghost disks takes longer to varyon than one without. In volume groups that contain a large number of physical volumes (LUNs), this can result in a significant delay during fallovers. For example, in one customer cluster where ghost disks were found, they added about twenty seconds per ghost disk to the time required to varyon the volume group.
• Don't delete ghost disks
  If ghost disks occur, they must be left in the AIX device configuration because their presence is necessary for the correct operation of the LVM when the volume group is ultimately brought online by Cluster Services.

Disk technology differences
Note that not all disk technologies result in ghost disks. Most LUNs presented from IBM disk technology can be uniquely identified regardless of whether the disk is reserved to another node.

Instructor notes:
Purpose: Explain ghost disks.
Transition statement: Let's now look at the other protection mechanism, RSCT-based storage protection.

RSCT-based shared storage protection

[Diagram: Node 1 and Node 2, each with its own ODM, sharing disks A, B, C, and D. Each volume group has an active varyon on one node and a passive varyon on the other.]

• Requires Enhanced Concurrent Volume Group
  – Is only used by HACMP
  – Uses gsclvmd
• Independent of disk type

Figure 3-11. RSCT-based shared storage protection

Notes:

Introduction
HACMP 5.x supports the new style of shared storage protection, which relies on AIX's RSCT component to coordinate the ownership of shared storage when using enhanced concurrent volume groups in non-concurrent mode.

How HACMP controls RSCT-based volume groups
HACMP takes advantage of new parameters on the varyonvg and varyoffvg commands related to a pair of new concepts called active and passive volume group varyon states. A volume group being managed by RSCT-based shared storage protection is varied online in the passive state on all cluster nodes that might need access to the volume group's data. The volume group is varied online in the active state by the particular cluster node which needs access to the volume group's data now (in other words, the node that is running the application has the volume group varied on in the active state).

The LVM on each node prohibits updates to the volume group's data unless the node has the volume group varied on in the active state. It is the responsibility of the RSCT component to ensure that each volume group is varied online in the active state on not more than one node.

Disk reservation not used
Even disks that support a disk reservation mechanism are not reserved when RSCT-based shared storage protection is in effect. Since this mechanism does not rely on any disk reservation mechanism, it is compatible with all disk technologies supported by HACMP.

Fast disk takeover
Taking over a volume group using RSCT-based shared storage protection is considerably faster than using reserve/release-based shared storage protection. Consequently, this style of disk takeover is called fast disk takeover. During fast disk takeover, HACMP skips the extra processing needed to break the disk reserves, or to update and synchronize the LVM information by running lazy update. As a result, the disk takeover mechanism used for enhanced concurrent volume groups is faster than disk takeover used for standard volume groups.

LVM change management: Keeping the ODM and VGDA in sync
Beginning in HACMP 5.2, when using enhanced concurrent volume groups (RSCT-based shared storage protection), the ODMs on the passive nodes are updated immediately with any VG changes and the new timestamp. At fallover time, lazy update does not run, because the timestamp in the ODM on the takeover node agrees with the timestamp in the VGDA. This further improves the speed of fast disk takeover.

Updates to the LVM components for an enhanced concurrent mode volume group should only be done through C-SPOC, which further ensures that the VGDA and ODM are synchronized across all nodes participating in the volume group.

Note: All nodes in the cluster must be available before making any LVM changes. This ensures that all nodes have an accurate view of the state of the volume group. This is an issue if you are using the forced varyon feature, which will be discussed later in this unit.

Instructor notes:
Purpose: Introduce RSCT-based shared storage protection.
Details: Point out that using RSCT-based shared storage protection is much faster than reserve/release-based protection:
• Breaking the disk reserve is not needed.
• In HACMP 5.2 and later, lazy update is not needed.
Stress the need to use C-SPOC when updating an enhanced concurrent mode volume group.
Transition statement: Let's now look at the enhanced concurrent volume group feature, which allows HACMP to make use of RSCT-based storage protection.
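The two rules behind RSCT-based protection (updates are allowed only from the node with the active varyon, and at most one node may hold the active varyon) can be illustrated with a toy state model. This is a sketch for discussion only: the SharedVolumeGroup class and its method names are invented for illustration, and nothing here calls RSCT, the LVM, or any HACMP interface.

```python
class SharedVolumeGroup:
    """Toy model of active/passive varyon states for one volume group."""

    def __init__(self, nodes):
        # Every node that might need the data starts in the passive state.
        self.state = {node: "passive" for node in nodes}

    def active_node(self):
        """Return the node holding the active varyon, or None."""
        for node, state in self.state.items():
            if state == "active":
                return node
        return None

    def activate(self, node):
        # Models the RSCT guarantee: active on not more than one node.
        holder = self.active_node()
        if holder is not None and holder != node:
            raise RuntimeError(f"{holder} already has the volume group active")
        self.state[node] = "active"

    def can_update(self, node):
        # Models the LVM rule: updates are allowed only from the active node.
        return self.state.get(node) == "active"

    def fallover(self, failed_node, takeover_node):
        # The failed node drops out; the takeover node goes passive -> active.
        # No disk reservation has to be broken in this model.
        del self.state[failed_node]
        self.activate(takeover_node)
```

In this model a fallover is only a state change on the takeover node, which mirrors why the text describes this style of takeover as fast: there is no reservation to break and no lazy update to run.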
Enhanced concurrent volume groups

Introduced in AIX 5L V5.1
Supported for all HACMP-supported disk technologies
– Allows for Fast Disk Takeover
Supported JFS and JFS2 filesystems
– File systems may only be mounted by one node at a time
Enhanced concurrent VGs are required to use:
– Heartbeat over disk for a non-IP network (Covered in the network unit)
– Fast disk takeover
– Some virtualized configurations (through a VIO server)
Replaced old style classic concurrent volume groups
– C-SPOC can be used to convert standard VG to enhanced concurrent VG
– C-SPOC can be used to convert classic concurrent VGs to enhanced concurrent VGs
  • C-SPOC (Cluster Single Point of Control) to be discussed in a later unit

Figure 3-12. Enhanced concurrent volume groups

Notes:

Introduction
Defining an enhanced concurrent volume group allows the LVM to use RSCT to manage varyonvg and varyoffvg processing.

Concurrent access
In a concurrent access environment, all the nodes will varyon the volume group.

Fast disk takeover (enhanced concurrent VGs in a non-concurrent resource group environment)
As was described earlier, using enhanced concurrent volume groups can result in significantly shorter fallover and fallback times (depending on the number of physical volumes and volume groups involved). In this case, one node will varyon the volume group in active mode, while all the other nodes will varyon the VG in passive mode.

Heartbeat over disk
Using enhanced concurrent volume groups also provides the capability to do heartbeats over disk to create a non-IP heartbeat network for HACMP (discussed in the next unit).
3-34 HACMP Implementation © Copyright IBM Corp. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Details — Additional information — Transition statement — So.Instructor Guide Instructor notes: Purpose — Provide some details on the enhanced concurrent volume groups that came up earlier in the fast disk takeover discussion. 0 Instructor Guide Uempty ECMVG varyon: Active versus passive Active Varyon (lsvg -o) – Behaves like normal varyon (listed with lsvg -o) – Allows all of the usual operations like: – RSCT responsible for ensuring that only one node has VG actively varied on Passive Varyon (lsvg <vg_name>) – Volume group is available in a very limited read-only mode – Only certain operations are allowed – Most operations are prohibited HACMP uses the appropriate varyonvg commands with enhanced concurrent volume groups Protecting VG integrity when using fast disk takeover – Use multiple IP networks and disk heartbeating (discussed in next unit) – Do not make structural changes to VG unless all nodes are online © Copyright IBM Corporation 2008 Figure 3-13.Any operations on filesystems and logical volumes (for example. 1998. Passive varyon Other nodes will varyon the VG in passive mode. 2008 Unit 3. mounts. create. lslv) Most operations are prohibited. modify. lsvg) . . only very limited operations are allowed on the volume group. and so forth) © Copyright IBM Corp. Shared storage considerations for high availability 3-35 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. open. delete. allowing full access. only one node will varyon the VG in active mode. They are” .V4.0 Notes: Active varyon If using enhanced concurrent volume groups in a non-concurrent access environment. In passive mode. They are: .Reading volume group configuration information (for example.active versus passive AU548. 
ECMVG varyon .Reading logical volume configuration information (for example. synchronizing the volume group's configuration . To avoid this situation: . Protecting volume group integrity using fast disk takeover When fast disk takeover is used.Instructor Guide . This is what makes fast disk takeover faster than traditional disk-reservation based volume group takeover.Make sure that there are multiple heartbeat paths to prevent a loss of network communication from triggering a fallover when the active node is still running. . 3-36 HACMP Implementation © Copyright IBM Corp. This protects against a partitioned cluster. this situation can result in different copies of the same volume group. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. This ensures that all nodes will have a common view of the volume group structure. If the cluster becomes partitioned. .Avoid making structural changes to the VG (such as adding or removing a logical volume. and so forth) unless all nodes are online. 1998. changing the size of a logical volume.Any operation that changes the contents or hardware state of the disks Fast disk takeover Switching a volume group from active to passive state (or the reverse) is a very fast operation as it only updates the LVM’s internal state of the volume group in an AIX kernel data structure and does not require any actual disk access operations.Modifying. nodes in each partition could accidentally varyon the volume group in active state. the SCSI disk reservation function is not used. Because active state varyon of the volume group allows mounting of filesystems and changing physical volumes. V4. Additional information — Transition statement — So.0 Instructor Guide Uempty Instructor notes: Purpose — Explain the difference between the active and passive volume group varyon states. 1998. . 
Reassure them that they can avoid using RSCT-based shared storage protection and that we will discuss how shortly. Details — Students might be uncomfortable with RSCT-based shared storage protection because it does not rely on a tried and true technology such as hardware-based disk reservation. how can we tell if an enhanced concurrent mode volume group is varied on in active or passive mode? © Copyright IBM Corp. 2008 Unit 3. Shared storage considerations for high availability 3-37 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. ... ECMVG state: Active versus passive AU548.. . .Instructor Guide ECMVG state: Active versus passive On active node: halifax # lsvg VOLUME GROUP: e2eaa2d6d VG STATE: VG PERMISSION: Concurrent: ecmvg ecmvg VG IDENTIFIER: 0009314700004c00000000f active PP SIZE: 8 MB read/write TOTAL PPs: 537 (4296 MB) .. 3-38 HACMP Implementation © Copyright IBM Corp. Concurrent: ecmvg ecmvg VG IDENTIFIER: 0009314700004c00000000f active PP SIZE: 8 MB passive-only TOTAL PPs: 537 (4296 MB) .....0 Notes: Introduction The VG PERMISSION field in the output of lsvg shows if a volume group is varied on in active or passive mode. Enhanced-Capable Auto-Concurrent: Disabled © Copyright IBM Corporation 2008 Figure 3-14. 1998... . 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.. Enhanced-Capable Auto-Concurrent: Disabled On passive node: toronto # lsvg VOLUME GROUP: e2eaa2d6d VG STATE: VG PERMISSION: . . 1998.V4. 2008 Unit 3. .0 Instructor Guide Uempty Instructor notes: Purpose — Discuss how to tell if a VG is active or passive. Shared storage considerations for high availability 3-39 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Details — Additional information — Transition statement — How about some more details on how the enhanced concurrent mode volume group works? © Copyright IBM Corp. too. 
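The check described for Figure 3-14 can be scripted. The following is a minimal sketch and not part of the course material: the function name vg_mode is hypothetical, and the parsing assumes that lsvg prints the "VG PERMISSION:" label followed by read/write or passive-only, as in the visual.

```shell
# vg_mode: read `lsvg <vg_name>` output on stdin and report the varyon mode.
# Hypothetical helper; assumes the "VG PERMISSION:" label shown in Figure 3-14.
vg_mode() {
    awk '
        /VG PERMISSION:/ {
            # The value is the field that follows "PERMISSION:".
            for (i = 1; i < NF; i++) {
                if ($i == "PERMISSION:") { perm = $(i + 1) }
            }
        }
        END {
            if (perm == "read/write")        print "active"
            else if (perm == "passive-only") print "passive"
            else                             print "unknown"
        }
    '
}

# Usage on a cluster node (not run here): lsvg ecmvg | vg_mode
```

Because the function only reads text, it can be tried against saved lsvg output before it is used on a live node.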
How ECMVGs work

Enhanced concurrent mode volume groups rely on Group Services.
• gsclvmd is the Group Services component involved.
• Each node belongs to two Group Services groups for each enhanced concurrent mode volume group.
– One for VGDA updates (starts with a "d")
– One for VGSA updates (starts with an "s")
– All active nodes vote/agree on changes and then update ODM.
– Protocols (node-to-node communications) are run to update VGDA/VGSA.
– WARNING: Filesystem changes are not propagated.

Figure 3-15. How ECMVGs work

Notes:

Although the details of the processing of enhanced concurrent mode volume groups are largely beyond the scope of this class, it is very useful to understand the basics. The Group Services component of RSCT is used to control the ownership of the volume group, thus the name we're using, RSCT-based.

Group Services
Group Services is a component that allows nodes to participate in groups to control resources of common interest where each node has a vote in how the resource is controlled. When a node would like to effect a change on a resource, it proposes that change to the group via a protocol (communications with the other Group Services daemons). HACMP belongs to two Group Services groups, too, for the control of the cluster related resources amongst all the nodes in the cluster (but that's another story for another class, AU600 - HACMP Internals). The Group Services daemon for HACMP is grpsvcs.

gsclvmd
The daemon that controls this group membership is gsclvmd. It is important to understand that this daemon depends on Group Services being active and that Group Services is activated when Cluster Services is started. That should reinforce the point that ECMVGs are to be used with HACMP only!

Voting on LVM changes and changing the ODM
All members (loosely) have a vote. If all approve, the change is made, including everything that the code written for that change involves. In the case of ECMVGs, that is a change to the VGDA / VGSA that results in changes made to each participating node's ODM. So, it should be rather obvious that a missing member will miss the changes that occurred during its absence. For this reason, great care must be taken to ensure that either all changes are made with all members present, or any changes made with missing members are propagated to the missing members very soon after their reactivation.

Warning
Filesystem changes are not handled by this process. This is where C-SPOC is necessary.

Instructor notes:
Purpose — Introduce the basic workings of Enhanced concurrent mode volume groups.
Details —
Additional information —
Transition statement — But how can I see what's going on with the processing of ECMVGs?

Determining ECMVG and Group Services status

How do you know Group Services is controlling your VG?
– From the output of ps
  rt1s1vlp2 # ps -ef | grep $(lsvg appAvg | grep IDENTIFIER | cut -d":" -f3)
  root 294954 405668 0 14:03:15 - 0:00 /usr/sbin/gsclvmd -r 30 -i 300 -t 50 -c 00c0288e00004c0000000116b0b5cf7a -v 0
– From the "long" status of gsclvmd
  rt1s1vlp2 # lssrc -ls gsclvmd
  Subsystem   Group     PID      Status
  gsclvmd     gsclvmd   405668   active
  Active VGs # 1
  vgid                               pid
  00c0288e00004c0000000116b0b5cf7a   294954
  (Slide callout: Match VGID)
– And always check "long" status of grpsvcs
  rt1s1vlp2 # lssrc -ls grpsvcs
  Subsystem   Group     PID      Status
  grpsvcs     grpsvcs   491746   active
  3 locally-connected clients. Their PIDs:
  540702(haemd) 639086(clstrmgr) 294954(gsclvmd)
  HA Group Services domain information:
  Domain established by node 3
  Number of groups known locally: 5
                                Number of   Number of local
  Group name                    providers   providers/subscribers
  s00O0K8S0009G0000012QOBBJRQ   2           1   0     (VGSA Group)
  ha_em_peers                   2           1   0
  CLRESMGRD_1196797869          2           1   0
  CLSTRMGR_1196797869           2           1   0
  d00O0K8S0009G0000012QOBBJRQ   2           1   0     (VGDA Group)

Figure 3-16. Determining ECMVG or Group Services status

Notes:

Now that you have an idea how ECMVGs work, you need to know how to see what's going on.

Start simple, look at processes
Looking for the VGID of each volume group that is "suspected" to be an ECMVG in the process table is a good start. Verify that there is a running gsclvmd daemon for each VGID. If not, you made a mistake somewhere.

Look at the gsclvmd daemon
Using lssrc -ls gsclvmd you can also see the VGID and the associated gsclvmd.

The Group Services daemon provides some information as well
As shown in the visual, the lssrc -ls grpsvcs command gives details on this node's groups, including, but certainly not limited to the ECMVG groups. Note the two groups, one for VGSA changes and one for VGDA changes. This would be much more interesting in the case where many ECMVGs were defined.

Instructor notes:
Purpose — Show the methods that can be used to get status on the underlying processes that support ECMVGs.
Details —
Additional information —
Transition statement — Let's take a closer look at how manual changes in ownership are accomplished when using the RSCT-based storage protection mechanism.
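The first check in Figure 3-16 (find a gsclvmd process for the volume group's VGID) can be wrapped in two small functions. This is a sketch under assumptions rather than an HACMP-supplied tool: the function names vgid_of and has_gsclvmd are hypothetical, and the parsing assumes the lsvg and ps -ef layouts shown in the visual. Both functions read captured command output from stdin.

```shell
# vgid_of: print the VG identifier found in `lsvg <vg_name>` output on stdin.
# Assumes the identifier is the last field of the "VG IDENTIFIER:" line.
vgid_of() {
    awk '/VG IDENTIFIER:/ { print $NF; exit }'
}

# has_gsclvmd: succeed if the `ps -ef` output on stdin shows a gsclvmd
# process whose arguments contain the VGID given as $1.
has_gsclvmd() {
    # An empty VGID would match every line, so reject it up front.
    [ -n "$1" ] || return 1
    grep 'gsclvmd' | grep -F -q -- "$1"
}

# Usage on a cluster node (not run here; appAvg is the VG from the visual):
#   vgid=$(lsvg appAvg | vgid_of)
#   ps -ef | has_gsclvmd "$vgid" && echo "gsclvmd is running for $vgid"
```

Filtering on both the daemon name and the VGID keeps the grep processes of the pipeline itself from producing a false match.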
RSCT-based fast disk takeover: Manual move

[Diagram: Node 1 (left) and Node 2 (right), each with its own ODM, share httpvg (disks A, B) and dbvg (disks C, D). dbvg stays in active varyon on Node 1 and passive varyon on Node 2 throughout.]

1. A decision is made to move httpvg from the right node to the left. (httpvg: passive varyon on Node 1, active varyon on Node 2)
2. Right node releases active varyon of httpvg (varyoffvg). (httpvg: passive varyon on both nodes)
3. Left node obtains active varyon of httpvg (varyonvg). (httpvg: active varyon on Node 1, passive varyon on Node 2)

Figure 3-17. RSCT-based fast disk takeover: Manual move

Notes:

Manual movement of RSCT-based volume groups
The fast disk takeover mechanism handles a manual VG takeover by first releasing the active varyon state of the volume group on the node that is giving up the volume group. It then sets the active varyon state on the node that is taking over the volume group. The coordination of these operations is managed by HACMP 5.x and AIX RSCT.

Instructor notes:
Purpose — Illustrate how a voluntary takeover of a volume group is done using Fast Disk Takeover.
Details —
Additional information —
Transition statement — Now let's see what happens when the takeover is involuntary.
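The ordering in Figure 3-17 (release on the giving node first, acquire on the taking node second) can be illustrated with a toy state model. This is only a sketch: it tracks state in shell variables, calls no LVM or HACMP command, and every name in it is hypothetical. It is not how HACMP or RSCT implement the move.

```shell
# Toy model of Figure 3-17. It only tracks state in shell variables and does
# not call any LVM or HACMP command. All names are hypothetical.
node1_httpvg=passive   # left node
node2_httpvg=active    # right node

# count_active: print how many nodes currently hold httpvg in active varyon.
count_active() {
    n=0
    [ "$node1_httpvg" = active ] && n=$((n + 1))
    [ "$node2_httpvg" = active ] && n=$((n + 1))
    echo "$n"
}

# move_httpvg_to_node1: release on the giving node first, then acquire on
# the taking node, so that two nodes are never active at the same time.
move_httpvg_to_node1() {
    node2_httpvg=passive                      # step 2: right node releases
    [ "$(count_active)" -eq 0 ] || return 1   # nobody may be active here
    node1_httpvg=active                       # step 3: left node obtains
}
```

Calling move_httpvg_to_node1 leaves the left node active and the right node passive; at no point in between do both variables read active, which is the property the release-then-acquire order is meant to preserve.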
RSCT-based fast disk takeover: Failure

[Diagram: Node 1 (left) and Node 2 (right), each with its own ODM, share httpvg (disks A, B) and dbvg (disks C, D). Initially httpvg is in passive varyon on Node 1 and active varyon on Node 2; at the end it is in active varyon on Node 1.]

1. Right node fails.
2. Left node realizes that right node has failed.
3. Left node obtains active mode varyon of httpvg.

Active varyon state and passive varyon state are concepts that do not apply to failed nodes.

Figure 3-18. RSCT-based fast disk takeover: Failure

Notes:

Fast disk takeover in a failure scenario
A node has failed. When the remaining node (or nodes) realize that the node has failed, the takeover node sets the volume group's varyon state to be active. There is no need to break disk reservations as no disk reservations are in place. The only action required is that the takeover node ask its local LVM to mark the volume group's varyon state as active.

If Topology Services fail (that is, no communication between the nodes), then Group Services fail and it is not possible to activate the volume group. This makes it very safe to use. It is recommended, however, to use enhanced concurrent volume groups only on systems running HACMP 5.x.

Instructor notes:
Purpose — Illustrate how disk takeover as a result of failure occurs when Fast Disk Takeover is in effect.
Details — Try not to get too bogged down on the question of how to ensure that a node has actually failed if other nodes believe that it has failed, as this topic is revisited in the networking unit and in the problem determination unit.
Additional information —
Transition statement — So how do you use fast disk takeover?

Fast disk takeover details

Fast disk takeover is enabled automatically for a Volume Group if all of the following conditions are true:
– The cluster is running AIX 5L on all nodes.
– HACMP 5.x is installed on all nodes.
– The volume group is an enhanced concurrent mode volume group.
• An existing volume group can be converted to enhanced concurrent via C-SPOC. VG must be taken offline to take effect.
Fast disk takeover is faster than reserve/release-based disk takeover.
Ghost disks do not occur when fast disk takeover is enabled.
Fast disk takeover is independent of the disk type.
– Based on RSCT, disk must only be supported by HACMP.
• This is RSCT-based storage protection.
– The gsclvmd subsystem that uses group services provides the protection.
– The distinction between active varyon and passive varyon is private to each node (that is, it is not recorded anywhere on the shared disks).

Figure 3-19. Fast disk takeover details

Notes:

Considerations
As with any technology, the implications of using fast disk takeover must be properly understood if the full benefits are to be experienced.

Requirements
Fast disk takeover is used only if all of the requirements listed previously have been met. Because RSCT is independent of disk technology, all disks supported by HACMP can be used in an enhanced concurrent mode volume group.

Note: If RSCT is not running, it is possible (although it takes some work) to manually varyon an enhanced concurrent volume group to active mode, while it is varied on in active mode on another node. Although this is possible, it is an unlikely occurrence. This small risk can easily be avoided by never varying on your shared volume groups manually.

Instructor notes:
Purpose — List the requirements for fast disk takeover.
Details — This is the point where you should deal with concerns about whether fast disk takeover is appropriate in a cluster. Remind students that some disk technologies do not support disk reservation, so the only way to protect shared storage on these disk technologies is to use RSCT-based shared storage protection (that is, fast disk takeover). Also point out that Fast Disk Takeover is volume group-specific, meaning that if a Resource Group contains volume groups that are both enhanced concurrent mode and non-enhanced concurrent mode, Fast Disk Takeover will apply only to those that are enhanced concurrent mode.
Additional information —
Transition statement — It's review time.
Let's review: Topic 1

1. Which of the following statements is true (select all that apply)?
– Static application data should always reside on private storage.
– Dynamic application data should always reside on shared storage.
– Shared storage must always be simultaneously accessible in read-write mode to all cluster nodes.
– Application binaries should only be placed on shared storage.
2. True or False?
• Using RSCT-based shared disk protection results in slower fallovers.
3. True or False?
• Ghost disks must be checked for and eliminated immediately after every cluster fallover or fallback.

Figure 3-20. Let's review topic 1

Notes:

Instructor notes:
Purpose — Review topic 1.
Details — Let's review: Topic 1 solutions
Additional information —
Transition statement — In the next topic, we'll discuss some of the disk technologies available for shared storage in an HACMP environment.

3.2 Shared disk technology

Instructor topic introduction
What students will do — The students will learn the considerations of the various shared disk technologies in an HACMP environment.
How students will do it — The objectives are covered through lecture, review questions, and pencil and paper and hands-on lab exercises.
What students will learn — The characteristics of the various shared storage technologies in an HACMP cluster are described. Students also learn about the importance of consistent PVID to disk name mapping across the cluster.
How this will help students on their job — This will help them when planning and implementing an HACMP cluster.

Shared disk technology

After completing this topic, you should be able to:
– Discuss the capabilities of various disk technologies in an HACMP environment
– Discuss the installation considerations of a selected disk technology when combined with HACMP
– Explain the issue of PVID consistency within an HACMP cluster

Figure 3-21. Shared disk technology

Notes:

Instructor notes:
Purpose — Show what we talk about next.
Details —
Additional information —
Transition statement — Let's look at the strategy in considering the relationship between HACMP and storage.

Shared disk and HACMP strategies

Two-pronged approach:
– Compatibility of chosen disk subsystem with HACMP
• Device drivers
• Multi-pathing software
• Adapter and disk subsystem microcode
• OS patches
• Reference the HACMP Installation Guide, IBM Flashes, and the hardware vendor (IBM or non-IBM) for the specifics
– Elimination of storage single point of failure
• Redundancy of data on the disks
– RAID 1 or 10
> In AIX
> In the disk subsystem
– RAID 5
> In the disk subsystem

Figure 3-22. Shared disk and HACMP strategies

Notes:

Compatibility
• Flashes can be found at http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Flashes
• Hints, Tips, and Technotes can be found at http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Technotes
• HACMP Release Notes: Shipped with the product

Redundancy
Your goal is to eliminate single points of failure. When considering this for storage, it involves defining more than one disk drive for every piece of data on the storage subsystem and multiple paths to get to the data from the server. This is referred to as data redundancy. HACMP does not provide data redundancy. HACMP is oblivious to the storage device and redundancy method chosen. You might choose a JBOD (Just a Bunch Of Disks) storage device, in which case you will have to provide the redundancy in AIX. In all likelihood, you will be choosing a storage subsystem to provide the data redundancy. The selected storage subsystem will then determine what you will look for in terms of compatibility with the chosen HACMP version and features. Although not in the scope of this class, the selected storage subsystem will be affected by the factors listed as follows (among others).
- Support for multi-pathing
- Data access performance requirements
- Capacity
- Price
Multiple paths to get to the data from the server is accomplished through multi-pathing software.
That software must be checked for compatibility with HACMP.

Instructor notes:
Purpose — Set the groundwork for the discussions that will follow regarding storage.
Details — Ensure that the students understand that HACMP doesn't provide data redundancy. The selected storage subsystem will probably do that. The selected storage subsystem will dictate the compatibility requirements and who to contact and where to look to determine the compatibility requirements. All roads lead to doing your homework through Flashes, HACMP Release notes, IBM and vendor support.
Additional information —
Transition statement — Now let's look at virtualized storage on the Power5 and Power6.

Virtual storage (VIO) and HACMP

[Diagram: two frames, each with two VIO Servers (VIOS 1, VIOS 2) and one HACMP node (Node1, Node2). Each VIOS reaches the shared storage device (Stg Dev) through two HBAs with MPIO, sees the LUN as hdisk0 with the no_reserve attribute, and exports it through vhost0. Through the Hypervisor, each HACMP node sees the disk over vscsi0 and vscsi1 with MPIO as a single hdisk0, a member of sharedvg.]

Enhanced concurrent mode volume groups required on HACMP nodes
MPIO or other (supported) multi-pathing software on VIO server
MPIO on HACMP nodes

Figure 3-23. Virtual storage (VIO) and HACMP

Notes:

Overview
This type of configuration is becoming prevalent with the adoption of the Virtualization capabilities of the POWER5 and later architecture. A full discussion of the implementation of this configuration is beyond the scope of the class. The intent is to indicate that this is a supported configuration, some of the terms to learn, requirements, and a configuration overview. Consult the IBM Sales Manual and IBM Support (and anyone else you can find who will talk to you about this from an experienced standpoint) for the latest requirements and considerations.

Legend
Stg Dev - Storage Subsystem providing access to disks, like a DS8300, DS4000, EMC, HDS, SSA, and so on.
VIOS - Virtual I/O Server, the special LPAR on a Power5/6 system that provides virtualized storage (and networking) devices for use by client LPARs.
HBA - Host Bus Adapter, also known as Fibre Channel Adapter; this is the connection to the SAN, giving the VIOS access to storage in the SAN (LUNs).
MPIO - Multipath I/O, built into AIX since V5.2, creates path devices for each instance of a disk/LUN that is recognized by AIX, presenting only a single hdisk device from these multiple paths.
vhost0 - Virtual SCSI (server) adapter on the Virtual IO Server that provides the client LPARs with access to virtual SCSI disks.
vscsi0 - Virtual SCSI (client) adapter on the client LPAR that provides the client access to the VIOS's Virtual SCSI (server) adapter and therefore access to the virtual SCSI disks.
Hypervisor - The Power5/6 component that manages access between the vhost and vscsi adapters.

Minimum requirements
As of the writing of this version of the course, the minimum requirements for HACMP with Virtual SCSI (VSCSI) and Virtual LAN (VLAN) on POWER5/6 models were:

HACMP supports the IBM VIO Server V1.4
August 10, 2007
IBM* High Availability Cluster Multiprocessing (HACMP*) for AIX 5L*, Versions 5.2, 5.3, and 5.4 extends support to include IBM Virtual I/O Server (VIO Server) Version 1.4 virtual SCSI and virtual Ethernet devices on all HACMP supported IBM POWER5* and POWER6* servers along with IBM BladeCenter JS21. This includes HACMP nodes running in LPARs on supported IBM System i5* processors. Refer to the following table for support details.
Note: TL = Technology Level

                         IBM VIO Server Version 1.4** on AIX V5.3
HACMP for AIX 5L V5.2    HACMP #IY97326, AIX TL3
HACMP for AIX 5L V5.3    HACMP #IY94307, AIX TL3
HACMP for AIX 5L V5.4    HACMP #IY87247, AIX TL3

HACMP supports the IBM VIO Server Versions 1.2 and 1.3
March 8, 2007
IBM* High Availability Cluster Multiprocessing (HACMP*) for AIX 5L*, V5.2, V5.3, and V5.4 extends support to include IBM Virtual I/O Server (VIO Server) Version 1.2 and 1.3 on all HACMP supported IBM POWER5* servers. This includes HACMP nodes running in LPARs on supported IBM System* i5 processors. Refer to the following table for support details.
Note: TL = Technology Level

                          IBM VIO Server on AIX V5.3
                          Version 1.2**              Version 1.3**
HACMP for AIX 5L, V5.2    HACMP #IY86296, AIX TL3    HACMP #IY86296, AIX TL3
HACMP for AIX 5L, V5.3    HACMP #IY94307, AIX TL3    HACMP #IY94307, AIX TL3
HACMP for AIX 5L, V5.4    HACMP #IY87247, AIX TL3    HACMP #IY87247, AIX TL3

**HACMP and Virtual SCSI (vSCSI)
• The volume group must be defined as "Enhanced Concurrent Mode." In general, Enhanced Concurrent Mode is the recommended mode for sharing volume groups in HACMP clusters because volumes are accessible by multiple HACMP nodes, resulting in faster failover in the event of a node failure.
• If file systems are used on the standby nodes, they are not mounted until the point of failover so accidental use of data on standby nodes is impossible.
• If shared volumes are accessed directly (without file systems) in Enhanced Concurrent Mode, these volumes are accessible from multiple nodes so access must be controlled at a higher layer such as databases.
• If any cluster node accesses shared volumes through vSCSI, all nodes must do so. This means that disks cannot be shared between an LPAR using vSCSI and a node directly accessing those disks.
• From the point of view of the VIO Server, physical disks (hdisks) are shared, not logical volumes or volume groups.
• All volume group construction and maintenance on these shared disks is done from the HACMP nodes, not from the VIO Server.

**HACMP and Virtual Ethernet
• IPAT via Aliasing must be used. IPAT via Replacement and MAC Address Takeover are not supported. In general, IPAT via Aliasing is recommended for all HACMP networks that can support it.
• HACMP's "PCI Hot Plug" facility cannot be used. PCI Hot Plug operations are available through the VIO Server. Note that when an HACMP node is using Virtual I/O, HACMP's "PCI Hot Plug" facility is not meaningful because the I/O adapters are virtual rather than physical.
• All Virtual Ethernet interfaces defined to HACMP should be treated as "single-adapter networks" as described in the HACMP Planning and Installation Guide. In particular, the netmon.cf file, which includes a list of clients to ping, must be used to monitor and detect failure of the network interfaces. Because of the nature of Virtual Ethernet, other mechanisms to detect the failure of network interfaces are not effective.

**Further configuration-dependent attributes of HACMP with Virtual Ethernet
• If the VIO Server has multiple physical interfaces on the same network or if there are two or more HACMP nodes using VIO Servers in the same frame, HACMP will not be informed of (and hence will not react to) single physical interface failures. This does not limit the availability of the entire cluster because VIOS itself routes traffic around the failure. The VIOS support is analogous to EtherChannel in this regard. Other methods (not based on the VIO Server) must be used for providing notification of individual adapter failures.
• If the VIO Server has only a single physical interface on a network, then a failure of that physical interface will be detected by HACMP. However, that failure will isolate the node from the network.

Although some of these might be viewed as configuration restrictions, many are direct consequences of I/O Virtualization.

Service can be obtained from the IBM Electronic Fix Distribution site at:
http://www-03.ibm.com/servers/eserver/support/unixservers/aixfixes.html
All the details on requirements and specifications are in this Flash:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10390

Configuration overview
Configuration is mostly performed on the VIOS and Hardware Management Console. The HACMP consideration, in addition to the correct software levels as outlined previously, is that enhanced concurrent volume groups are used in this configuration. Otherwise, to the cluster manager, this is just another volume group to be managed in a resource group. The use of MPIO at the AIX level is also essential to ensuring data availability if access to a VIOS is lost. Ensure that you reactivate any path in MPIO that was lost after it is recovered to avoid total loss of access to data on a subsequent path failure.

On Storage device
  Map LUNs to the two corresponding VIO servers
On Hardware Management Console
  Define Mappings (vhost & vscsi)
On VIO Server 1
  Set "no_reserve" attribute
    $ chdev -dev <hdisk#> -attr reserve_policy=no_reserve
  Export the LUNs out to each client
    $ mkvdev -vdev hdisk# -vadapter vhost0
On VIO Server 2
  Set "no_reserve" attribute
    $ chdev -dev <hdisk#> -attr reserve_policy=no_reserve
  Export the LUNs out to each client
    $ mkvdev -vdev hdisk# -vadapter vhost0
On Clients
  - Configure the MPIO Default PCM to conduct health checks down all paths and recover when a path is restored.
    # chdev -l <hdisk#> -a hcheck_interval=20 -a hcheck_mode=nonactive -P
  - Varyoffvg on Client 1.
  - Define to HACMP as a shared resource in a resource group.

- AU620, System p LPAR and Virtualization II: Implementing Advanced Configurations
- Redbooks (www.
- SG24-7940-02: Advanced POWER Virtualization on IBM System p5 Servers: Introduction and Configuration
- REDP-4027-00: HACMP 5.3
System p LPAR and Virtualization I: Planning and Configuration .clvm.AU780.Chapter 4 .REDP-4194: IBM System p Advanced POWER Virtualization Best Practices 3-64 HACMP Implementation © Copyright IBM Corp. .ibm. Dynamic LPAR and Virtualization • Provides details later in the document on HACMP and Virtualization along with failure scenarios in the VIO infrastructure and performance considerations .com): .enh required). References Courses that address this configuration: . 1998.redbooks.AU730. This requires a reboot to take affect.Create the shared volume group as Enhanced Concurrent VG on first Client (bos.Import VG onto Client 2. HACMP System Administration III: Virtualization and Disaster Recovery . Refer them to the AU620. Additional information — Transition statement — SAN-based storage can be lumped together in terms of the strategy. of virtualized storage technology and how it relates to HACMP.0 Instructor Guide Uempty Instructor notes: Purpose — Provide an overview. © Copyright IBM Corp. . and AU780 classes for more details on this. Shared storage considerations for high availability 3-65 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.V4. 2008 Unit 3. Let’s take a look at IBM SAN-based storage. unless you have experience with this configuration. 1998. AU730. Details — Avoid any detailed discussions of the HMC/VIOS configuration steps beyond what is given in the student notes. with some requirement details. wss?rs=540&context=ST52G7&uid=ssg1S 4000065&loc=en_US&cs=utf-8&lang=en 3-66 HACMP Implementation © Copyright IBM Corp. SDD 1. MPIO PCM.3. 1998. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. To use C-SPOC with VPATH disks.3. check: http://www-1. © Copyright IBM Corporation 2008 Figure 3-24. – Consult http://www-03.ibm.com/support/techdocs/atsmastr. 
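The configuration overview in the student notes advises reactivating any MPIO path that was lost once it has been recovered. As a hedged illustration only, the sketch below filters lspath-style output (status, device, parent adapter, one path per line) for paths that are not in the Enabled state. The function names and the sample listing are assumptions made for this example and are not part of the course material; on a real client LPAR the input would come from lspath itself.

```shell
#!/bin/sh
# Print "device parent status" for every path whose status is not Enabled.
# Input: lspath-style lines of the form "<status> <device> <parent>".
non_enabled_paths() {
    awk 'NF >= 3 && $1 != "Enabled" { print $2, $3, $1 }'
}

# Illustrative sample only (hypothetical devices). On an AIX client,
# pipe real output instead:  lspath | non_enabled_paths
sample_lspath() {
    cat <<'EOF'
Enabled hdisk2 vscsi0
Failed hdisk2 vscsi1
Enabled hdisk3 vscsi0
Enabled hdisk3 vscsi1
EOF
}

sample_lspath | non_enabled_paths
```

Any path this reports is a candidate for reactivation; on AIX that is commonly done with chpath (for example, chpath -l <hdisk#> -p <vscsi#> -s enable), but verify the exact procedure against the MPIO documentation for your AIX level.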
IBM SAN storage and HACMP

IBM Storage Subsystems currently supported include:
– DS8000 / DS6000 families
– DS4000 family
– SAN Volume Controller (SVC)

IBM Storage Subsystem support with HACMP is announced via Flash.
– Consult http://www-03.ibm.com/support/techdocs/atsmastr.nsf/Web/Flashes

Determine the HACMP compatibility levels for the following items:
– HBA device driver
– AIX patch levels
– Multi-pathing software (SDD, MPIO PCM, RDAC, and so on)
– Device microcode/firmware

Contact IBM support.

Figure 3-24. IBM SAN storage and HACMP AU548.0

Notes:

Overview
Use the pointers already provided to access IBM Flashes to determine if the IBM hardware that you’ve chosen is supported with HACMP and the HACMP requirements. Also read the Release Notes provided with the HACMP product for the latest information on requirements.

SDD
With most IBM SAN Storage devices, the multi-pathing software will be the Subsystem Device Driver (SDD). It is supported with HACMP (with appropriate PTFs). To use C-SPOC with VPATH disks, SDD 1.3.1.3, or later, is required. For levels and maintenance, check:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S4000065&loc=en_US&cs=utf-8&lang=en

Instructor notes:
Purpose — Provide an overview, with some requirement details, of virtualized storage technology and how it relates to HACMP.
Details — Avoid any detailed discussions of the HMC/VIOS configuration steps beyond what is given in the student notes, unless you have experience with this configuration. Refer students to the AU730 and AU780 classes for more details on this.
Additional information —
Transition statement — It is just as likely that you’ll be using non-IBM SAN-based storage. What are the considerations there?

Non-IBM SAN storage and HACMP

As explained in the Student Notes below, IBM does not provide the requirements for HACMP compatibility with non-IBM storage.
– Contact the support organization or online reference materials for the vendor of the non-IBM storage.
– Contact IBM support.

Be sure to look into the multi-pathing software version and maintenance:
– PowerPath
– HDLM
– MPIO PCM

For EMC planning, see their support matrix:
– http://www.emc.com/interoperability/matrices/EMCSupportMatrix.pdf

Figure 3-25. Non-IBM SAN storage and HACMP AU548.0

Notes:

IBM’s statement on non-IBM storage requirements with HACMP
This FAQ states the HACMP position with respect to non-IBM storage devices.

Question: Does HACMP support EMC or Hitachi storage subsystems when connected to pSeries servers?

Answer: The storage subsystems supported by HACMP are those documented in the Sales Manual. HACMP supports only those IBM devices that have passed IBM qualification efforts, and for which IBM development and service are prepared to provide support. Current information can be retrieved from the online Sales Manual. New additions are announced via Flash.

There is a group, associated with development, that tests non-IBM storage subsystems for attachment to AIX systems and HACMP. Also, cooperative service agreements are in place with certain non-IBM storage vendors.

HACMP also provides a supported interface, documented in the HACMP Planning and Installation Guide, which allows any storage subsystem to be described in terms of a standard set of operations. This allows for the invocation of user-provided methods to accommodate device-specific behaviors and operations that might not be automatically supported by HACMP.

If a client has an HACMP cluster containing storage hardware other than that supported by HACMP, and they report a problem, IBM Service will address the problem as follows:
- If the problem is unrelated to that hardware, it will be addressed the same as any other problem.
- If the problem is related to that hardware, and the hardware is covered by a cooperative service agreement with the storage vendor, the problem will be forwarded to the storage vendor.
- If the problem is related to hardware for which no cooperative service agreement is in place, the client will be asked to refer the problem to the hardware manufacturer.

Determining compatibility
When contacting both IBM and non-IBM sources for information, indicate your intent to configure the non-IBM storage device with HACMP and request driver, patch, multi-pathing software and microcode requirements, and experiences with this combination. Also read the Release Notes provided with the HACMP product for the latest information on requirements.

EMC
When using the EMC URL listed above to gather EMC information, here is the path to take to find the HACMP compatibility information.
- Navigate to http://www.emc.com/interoperability/matrices/EMCSupportMatrix.pdf
- Search for HACMP.
- You will get many hits; look in the sections that apply to your storage devices.
- Then look for the HACMP version that you are installing.
- Finally, look for the device driver, PowerPath, and AIX patch information for your configuration.

Instructor notes:
Purpose — Outline strategy to use to determine what is necessary to integrate non-IBM SAN storage devices with HACMP.
Details — Indicate that this is a customer responsibility but that contacting IBM support is useful. Also indicate that the multi-pathing software version might be different when in an HACMP environment than normally used.
Additional information —
Transition statement — For those who aren’t using SAN storage, let’s take a look at SCSI considerations with HACMP.
SCSI technology and HACMP

HACMP-related issues with SCSI disk architecture:
– SCSI buses require termination at each end. In HACMP environments the terminators have to be external to ensure that the bus is still terminated properly after a failed system unit has been removed.
– SCSI buses are ID-based. All devices must have a unique ID number.
– SCSI cables are not hot pluggable (power must be turned off on all devices attached to the SCSI bus before a SCSI cable connection is made or severed).
– Clusters using shared SCSI disks often experience ghost disks.
– Four node limit.

[Figure: two host systems, with SCSI controllers at IDs 5 and 6, share a terminated (T) bus of at most 25 m with disk drive modules at SCSI IDs 1 through 4.]

Figure 3-26. SCSI technology and HACMP AU548.0

Notes:

SCSI termination
In HACMP environments, SCSI terminators must be external so that the bus is still terminated after a failed system unit has been removed.

Avoid using SCSI ID 7
In an HACMP environment, it is a very good practice to avoid using SCSI ID 7 because a node booted in service or diagnostic mode has, by default, its SCSI controllers set to the default ID of 7. If, while troubleshooting, you boot the failed node into service or diagnostic mode and the surviving node is using SCSI ID 7, there will be a SCSI ID conflict. This could result in data corruption.

Devices on a shared bus
Do not connect other SCSI devices, such as CD-ROMs, to a shared SCSI bus.

Power supply redundancy
If you mirror your logical volumes across two or more physical disks, the disks should not be connected to the same power supply; otherwise, loss of a single power supply can prevent access to all copies. As a result, plan on using multiple disk subsystem drawers or desk-side units to avoid dependence on a single power supply.

Cable length and number of drives
Different SCSI bus types have different maximum cable lengths for the buses (maximum is 25 meters for differential SCSI). You can connect up to 16 devices to a SCSI bus. Each SCSI adapter, and each disk, is considered a separate device with its own SCSI ID. The maximum bus length for most SCSI devices provides enough length for most cluster configurations to accommodate the full 16 device connections allowed by the SCSI standard.

Hot swapping SCSI devices
The hot swappability of SCSI devices is generally poorly understood. The rules are actually quite simple:
- In general, the only hot swappable SCSI devices are certain SCSI disk drive modules.
- If the documentation that comes with the SCSI device does not explicitly state that a device is hot swappable, then assume that it is not hot swappable.
- SCSI cable connection points are never hot swappable.

Disconnecting or connecting SCSI cables
Many people have disconnected or connected SCSI cables without causing any problems. This is, at best, proof that the person has been lucky. Disconnecting or connecting a SCSI cable while any device on the bus is powered on is a dangerous activity. The possible consequences of disconnecting or connecting a SCSI cable when any device on the bus is powered on can include:
- Data transfer errors that are not seen by the operating system but that result in data corruption on the disk drive
- I/O errors that are seen by the operating system and potentially reflected back to the application
- I/O errors that result in the operating system crashing or refusing to continue to use the SCSI bus in question (typically, until the operating system is rebooted)
- Total failure of devices and controllers on the bus (this failure is usually temporary and can be fixed by replacing a fuse, but permanent damage is a real possibility)

There are devices that can be inserted into the middle of SCSI buses, which claim to allow the bus to be severed at the point of insertion. Unless you can get IBM to specifically state that they support such a device, you should not use it.

IBM SCSI storage devices
It is most likely you will be using an IBM 2104 Expandable Storage Plus device if you are attaching via SCSI.

Instructor notes:
Purpose — Explain the key SCSI technology issues related to HACMP.
Details — Mention that the most likely IBM storage device that will be connected using SCSI is the 2104 Expandable Storage Plus. The graphic in the visual shows a common connection method for a SCSI device. If students argue that SCSI cables are hot swappable, then ask them whether they are willing to risk their cluster’s data on that basis.
Additional information —
Transition statement — Now let’s look at Physical Volume Identifiers (PVIDs).
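The SCSI ID rules in the student notes (every device needs a unique ID, ID 7 is best avoided, and a bus carries at most 16 devices) lend themselves to a quick planning check. The following is a minimal sketch under those stated rules; the function name check_scsi_ids and its messages are invented for this example and do not correspond to any AIX or HACMP utility.

```shell
#!/bin/sh
# Check a planned set of SCSI IDs for one shared bus.
# Prints one line per problem and returns non-zero if any problem was found.
check_scsi_ids() {
    status=0
    # Adapters and disks each count as a device; a bus carries at most 16.
    if [ "$#" -gt 16 ]; then
        echo "too many devices: $# (maximum 16)"
        status=1
    fi
    seen=" "
    for id in "$@"; do
        # ID 7 is the default for a node booted in service or diagnostic mode.
        if [ "$id" -eq 7 ]; then
            echo "ID 7 in use: conflicts with a node booted in service mode"
            status=1
        fi
        case "$seen" in
            *" $id "*) echo "duplicate ID: $id"; status=1 ;;
        esac
        seen="$seen$id "
    done
    return "$status"
}

# The bus in Figure 3-26: controllers at IDs 5 and 6, disks at IDs 1 to 4.
check_scsi_ids 5 6 1 2 3 4 && echo "ID plan looks consistent"
```

A plan such as `check_scsi_ids 7 6 6` would report both the use of ID 7 and the duplicate ID 6. The check covers only the ID rules; termination, cable length, and power supply layout still have to be verified by hand.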
Physical volume IDs

    # lspv
    hdisk0    000206238a9e74d7    rootvg
    hdisk1    00020624ef3fafcc    None
    hdisk2    00206983880a1580    None
    hdisk3    00206983880a1ed7    None
    hdisk4    00206983880a31a7    None

[Figure: Node 1, with its ODM, attached to shared disks A, B, C, and D.]

Figure 3-27. Physical volume IDs AU548.0

Notes:

PVIDs and their use in AIX
For AIX to use a disk (LUN), it requires that the disk (LUN) be assigned a unique physical volume ID (PVID). This is stored in the ODM and on the disk (LUN), and linked to a logical construct in AIX called an hdisk. hdisks are numbered sequentially as discovered by the configuration manager (cfgmgr).

If a disk (LUN) has a PVID assigned, it will be recognized by AIX when a cfgmgr runs (manually or at system boot) and stored in the ODM. If a disk (LUN) has no PVID, it is assigned when the disk (LUN) is defined to a volume group or manually by a user via the chdev command.

Each AIX system that is sharing a volume group will need to have access to the same disks (LUNs). This is either done through zoning and masking in the SAN or via twin-tail cabling for non-SAN implementations. If the zoning, masking, and cabling is done correctly, each system will see the same disks (LUNs). Again, for systems to share access to a volume group, all the disks (LUNs) that are in the volume group must be defined to each system with common PVIDs.

Using the lspv command on each system to determine which systems see which PVIDs, and the volume group affinity, is the first step to ensuring that all systems that will share a volume group have the necessary disks (LUNs) defined. The example shows that the system sees four disks (LUNs) that have PVIDs assigned, but none of them are in a volume group yet. The next logical step would be to check the other systems for common PVIDs. All PVIDs that are found in common would be the PVIDs (and therefore hdisks) that could be used to create shared volume groups.

C-SPOC uses this method to list the PVIDs that can be used to create a cluster-wide shared volume group. If C-SPOC finds no common PVIDs across the selected systems for a shared volume group, no PVIDs are listed. Knowing the PVID-to-hdisk relationship on all the cluster nodes is therefore very important when creating a shared volume group. This is true whether using C-SPOC or not. There is no substitute for good documentation and double checking at the time you are working with the disks (LUNs).

Disk name inconsistency
There is no requirement in AIX or HACMP that the hdisk name for a shared disk be the same on all nodes. However, if the names are different, this can be a source of confusion for humans and a possible source of errors, which could lead to down time.

Creating hdisk name consistency
Think about what you are trying to accomplish before you decide to make disk names consistent across all sharing systems. Is it really necessary? No; as stated previously, neither HACMP nor AIX cares about the hdisk naming. Do you really want to rely on the names only, without consulting PVIDs, when dealing with the disks (LUNs)? No, because this could lead to confusion and could be a possible source of errors.

If you decide to create hdisk naming consistency, you will want to consider two techniques:
- Defining fake hdisks to occupy hdisk numbers on nodes with fewer disks than other nodes.
- Ensuring that the shared disk subsystem cabling is organized so that the configuration manager on each side discovers new shared disks in the same order.

These two techniques are combined with judicious use of rmdev -d -l and cfgmgr to get the hdisk numbers to be consistent.
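The check for common PVIDs described in the notes can be sketched as a small script. This is an illustration under stated assumptions, not the method C-SPOC itself uses internally: it assumes the lspv output of each node has been saved to a file, and the function name common_pvids, the file names, and the node 2 listing (including its rootvg PVID) are hypothetical. The node 1 listing is the one shown in Figure 3-27.

```shell
#!/bin/sh
# Print "PVID hdisk-on-node1 hdisk-on-node2" for every PVID that appears in
# both lspv listings. Disks without a PVID (shown as "none") are skipped.
common_pvids() {
    awk 'NR == FNR { if ($2 != "none") name1[$2] = $1; next }
         ($2 in name1) { print $2, name1[$2], $1 }' "$1" "$2"
}

# Sample data: node 1 is taken from Figure 3-27; node 2 is hypothetical.
workdir=${TMPDIR:-/tmp}/pvid_demo.$$
mkdir -p "$workdir"
cat > "$workdir/node1.lspv" <<'EOF'
hdisk0 000206238a9e74d7 rootvg
hdisk1 00020624ef3fafcc None
hdisk2 00206983880a1580 None
hdisk3 00206983880a1ed7 None
hdisk4 00206983880a31a7 None
EOF
cat > "$workdir/node2.lspv" <<'EOF'
hdisk0 0002062400aa11bb rootvg
hdisk1 00206983880a1580 None
hdisk2 00206983880a1ed7 None
hdisk3 00206983880a31a7 None
EOF

common_pvids "$workdir/node1.lspv" "$workdir/node2.lspv"
```

With this sample data the three shared disks are reported under different hdisk numbers on each node, which is exactly the situation in which matching by PVID rather than by hdisk name avoids mistakes.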
Instructor notes:
Purpose — Introduce the concept of PVIDs and show how hdisk numbers are assigned by the configuration manager.
Details —
Additional information —
Transition statement — HACMP allows you to use OEM disks.

Support for OEM disks

HACMP enables you to use either IBM disks or OEM disks.

Treat an unknown disk type the same way as a known type:
– /etc/cluster/disktype.lst
– /etc/cluster/lunreset.lst
– /etc/cluster/conraid.dat

Use custom disk processing methods:
– Identifying ghost disks
– Determining whether a disk reserve is being held by another node in the cluster
– Breaking a disk reserve
– Making a disk available for use by another node

Enhanced concurrent VGs

Additional considerations

Figure 3-28. Support for OEM disks AU548.0

Notes:

Introduction
HACMP enables you to use either physical storage disks manufactured by IBM or by an Original Equipment Manufacturer (OEM) as part of a highly available infrastructure. Depending on the type of OEM disk, custom methods enable you (or an OEM disk vendor) to either:
- Tell HACMP that an unknown disk should be treated the same way as a known and supported disk type, or
- Specify the custom methods that provide the low-level disk processing functions supported by HACMP for that particular disk type

Treat an unknown disk the same way as a known type
HACMP provides mechanisms that will allow you, while configuring a cluster, to direct HACMP to treat an unknown disk exactly the same way as another disk it supports. The following three files can be edited to perform this configuration. (There is no SMIT menu to edit these files.)

• /etc/cluster/disktype.lst
This file is referenced by HACMP during disk takeover. You can use this file to tell HACMP that it can process a particular type of disk the same way it processes a disk type that it supports. The file contains a series of lines of the following form:

    <PdDvLn field of the hdisk><tab><supported disk type>

To determine the value of the PdDvLn field for a particular hdisk, enter the following command:

    # lsdev -Cc disk -l <hdisk name> -F PdDvLn

The known and supported disk types are:

    Disk Name in HACMP    Disk Type
    SCSIDISK              SCSI-2 Disk
    SSA                   IBM Serial Storage Architecture
    FCPARRAY              Fibre Attached Disk Array
    ARRAY                 SCSI Disk Array
    FSCSI                 Fibre Attached SCSI Disk

For example, to have a disk whose PdDvLn field was “disk/fcal/HAL9000” be treated the same as IBM fibre SCSI disks, a line would be added that read:

    disk/fcal/HAL9000<tab>FSCSI

A sample disktype.lst file, which contains comments, is provided.

• /etc/cluster/lunreset.lst
This file is referenced by HACMP during disk takeover. HACMP will use either a target ID reset or a LUN reset for parallel SCSI devices based on whether a SCSI inquiry of the device returns a 2 or a 3. Normally, only SCSI-3 devices support LUN reset. However, some SCSI-2 devices will support a LUN reset. So, HACMP will check the Vendor Identification returned by a SCSI inquiry against the lines of this file. If the device is listed in this file, then a LUN reset is used. This file is intended to be customer modifiable.

For example, if the “HAL 9000” disk subsystem returned an ANSI level of '2' to inquiry, but supported LUN reset, and its Vendor ID was “HAL” and its Product ID was “9000”, then this file should be modified to add a line which was either:

    HAL

or

    HAL     9000

depending on whether vendor or vendor plus product match was desired. Note the use of padding of Vendor ID to 8 characters. A sample /etc/cluster/lunreset.lst file, which contains comments, is provided.

• /etc/cluster/conraid.dat
This file is referenced by HACMP during varyon of a concurrent volume group. You can use this file to tell HACMP that a particular disk is a RAID disk that can be used in classical concurrent mode. The file contains a list of disk types, one disk type per line. The value of the Disk Type field for a particular hdisk is returned by the following command:

    # lsdev -Cc disk -l <hdisk name> -F type

Note: This file only applies to classical concurrent volume groups. Thus this file has no effect in AIX V5.3 or greater, which does not support classical concurrent VGs.

HACMP does not include a sample conraid.dat file. The file is referenced by the /usr/sbin/cluster/events/utils/cl_raid_vg script, which does include some comments.

Additional considerations
The previously described files in /etc/cluster are not modified by HACMP after they have been configured and are not removed if the product is uninstalled. This ensures that customized modifications are unaffected by the changes in HACMP. By default, the files initially contain comments explaining their format and usage.

Remember that the entries in these files are classified by disk type, not by the number of disks of the same type. If several disks of the same type are attached to a cluster, there should be only one file entry for that disk type.

Finally, unlike other configuration information, HACMP does not automatically propagate these files across nodes in a cluster. It is your responsibility to ensure that these files contain the appropriate content on all cluster nodes. You can use the HACMP File Collections facility to propagate this information to all cluster nodes.
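Because the entries in disktype.lst and lunreset.lst depend on exact whitespace (a tab in one file, blank padding of the Vendor ID to 8 characters in the other), it can help to generate the lines rather than type them. The sketch below does only that formatting, using the HAL 9000 example from the notes; the function names disktype_entry and lunreset_entry are invented for this illustration, and the result should still be checked against the comments in the sample files shipped with HACMP.

```shell
#!/bin/sh
# Format one disktype.lst line: "<PdDvLn><tab><supported disk type>".
disktype_entry() {
    printf '%s\t%s\n' "$1" "$2"
}

# Format one lunreset.lst line: the Vendor ID alone, or the Vendor ID padded
# with blanks to 8 characters and followed by the Product ID.
lunreset_entry() {
    if [ -n "${2:-}" ]; then
        printf '%-8s%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# The example from the notes: a disk whose PdDvLn is disk/fcal/HAL9000,
# Vendor ID "HAL", Product ID "9000".
disktype_entry disk/fcal/HAL9000 FSCSI
lunreset_entry HAL 9000
```

On a cluster node the output would be appended to the corresponding file under /etc/cluster on every node, since HACMP does not propagate these files automatically.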
Use custom disk processing methods
Some disks might behave sufficiently differently from those supported by HACMP so that it is not possible to achieve proper results by telling HACMP to process these disks exactly the same way as supported disk types. For these cases, HACMP provides finer control. While doing cluster configuration, you can either:
- Select one of the specific methods to be used for the steps in disk processing
- Specify a custom method

HACMP supports the following disk processing steps:
- Identifying ghost disks
- Determining whether a disk reserve is being held by another node in the cluster
- Breaking a disk reserve
- Making a disk available for use by another node

HACMP enables you to specify any of its own methods for each step in disk processing, or to use a customized method, which you define.

Using SMIT, you can perform the following functions for OEM disks:
- Add Custom Disk Methods
- Change/Show Custom Disk Methods
- Remove Custom Disk Methods

Additional considerations for custom methods
The custom disk processing method that you add, change or delete for a particular OEM disk is added only to the local node. This information is not propagated to other nodes; you must copy this custom disk processing method to each node manually or use the HACMP File Collections facility.

OEM disks and enhanced concurrent volume groups
OEM disks can be used in enhanced concurrent volume groups, either for concurrent access mode or, for fast disk takeover, in non-concurrent access mode. In this case, you would need to edit the /etc/cluster/disktype.lst file and associate the OEM disk with a supported disk type.

More information
For detailed information about configuring OEM disks for use with HACMP, see:
SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
Appendix B: OEM Disk, Volume Group, and Filesystems Accommodation

Instructor notes:
Purpose — Introduce OEM disk accommodation.
Details — Provide an overview. The notes provide details, but you do not need to lecture through all of that unless students are interested.
Additional information — Can OEM disks be used in Enhanced Concurrent VGs? I never got a firm answer to this during course development. I believe that they can. Appendix D of the Planning and Installation Guide is a little confusing and never explicitly says that OEM disks can be used with enhanced concurrent VGs. In the Install Guide, it says: “With enhanced concurrent mode: Any disk supported by HACMP for attachment to multiple nodes can be included in an enhanced concurrent mode volume group...” This seems to imply that OEM disks can be used for enhanced concurrent. I assume that you need to edit the disktype.lst file. Or is that not even required? All of the disk processing steps listed in Appendix D seem only to apply for reserve/release shared storage protection or classical concurrent VGs.
Transition statement — Let’s review what we’ve covered in this topic.
Let’s review: Topic 2

1. Which of the following disk technologies are supported by HACMP?
   a. SCSI
   b. SSA
   c. FC
   d. All of the above
2. SSA disk subsystems can support RAID5 (cache-enabled) with HACMP. True or False?
3. Compatibility must be checked when using different SSA adapters in the same loop. True or False?
4. No special considerations are required when using SAN based storage units (DS8000, ESS, EMC HDS, and so forth). True or False?
5. hdisk numbers must map to the same PVIDs across an entire HACMP cluster. True or False?

Figure 3-29. Let’s review topic 2 AU548.0

Notes:

Instructor notes:
Purpose — Review topic 2.
Details — Let’s review: Topic 2 solutions
Additional information —
Transition statement — In the next topic, we’ll look at the shared storage facilities provided by AIX.
What students will learn — How to configure LVM and AIX file systems for maximum availability in an HACMP cluster.

Shared storage from the AIX perspective
After completing this topic, you should be able to:
• Discuss how LVM aids cluster availability
• Describe the quorum issues associated with HACMP
• Set up LVM for maximum availability
• Configure a new shared volume group, filesystem, and jfslog
Figure 3-30. Topic 3 objectives: Shared storage from the AIX perspective AU548.0
Notes:
This topic discusses shared storage from the AIX perspective.

Instructor notes:
Purpose — Introduce the next topic.
Details —
Additional information —
Transition statement — Let’s review the AIX Logical Volume Manager.

Logical Volume Manager review
LVM is one of the major enhancements that AIX brings to traditional UNIX disk management.
LVM's capabilities are exploited by HACMP
Physical disk volumes are:
– Organized into VGs (volume groups)
– Identified by a unique physical volume ID (PVID)
– Divided into physical partitions which are mapped to logical partitions in logical volumes (LVs)
Applications (such as file systems and databases) use logical volumes
[Diagram: physical volumes hdisk0 and hdisk1, each with a PVID, in one volume group; their physical partitions map to the logical partitions of a logical volume]
Figure 3-31. Logical Volume Manager review AU548.0
Notes:
LVM review
The set of operating system commands, library subroutines, and other tools that allow the user to establish and control logical volume storage is called the logical volume manager. LVM controls disk resources by mapping data between a simple and flexible logical view of storage space and the actual physical disks. The logical volume manager does this by using a layer of device driver code that runs above the traditional physical device drivers.
Logical volumes
This logical view of the disk storage, which is called a logical volume (LV), is provided to applications and is independent of the underlying physical disk structure. The LV is made up of logical partitions. A logical partition is mapped to one or more physical partitions.
Physical volumes
Each individual disk drive is called a physical volume (PV). It has a physical volume ID (PVID) associated with it and an AIX name, usually /dev/hdiskx (where x is a unique integer on the system). Every physical volume in use belongs to a volume group (VG) unless it is being used as a raw storage device or a readily available spare (often called a hot spare). Each physical volume is divided into physical partitions (PPs) of a fixed size for that physical volume.
Volume groups
Physical volumes and their associated logical volumes are grouped into volume groups. Operating system files are stored in the rootvg volume group. Application data are usually stored in one or more additional volume groups.

Instructor notes:
Purpose — Illustrate what the Logical Volume Manager does.
Details — Cover the terms mentioned in the visual, namely hdisk, PVID, physical volume, logical volume, logical partitions, and volume group. Emphasize physical versus logical.
Additional information — This is a review. If it starts to turn into a “teach new stuff” foil, then the students are not ready for this course and are going to have a tough week.
Transition statement — Let’s see how this looks from the file system’s perspective.

LVM relationships
LVM manages the components of the disk subsystem.
Applications talk to the disks through LVM.
This example shows an application writing to a filesystem, which has its LVs mirrored in a volume group physically residing on separate hdisks.
[Diagram: an application writes to /filesystem on a mirrored logical volume; LVM maps its logical partitions to physical partitions in the volume group]
Figure 3-32. LVM relationships AU548.0
Notes:
LVM relationships
An application writes to a file system. A file system provides the directory structure and is used to map the application data to logical partitions of a logical volume. Because an LVM exists, the application is isolated from the physical disks. The LVM can be configured to map a logical partition to up to three physical partitions and have each physical partition (copy) reside on a different disk.

Instructor notes:
Purpose — Show how a file system write operation is eventually written to both copies of the mirrored logical volume which underlies the file system.
Details — As in the student notes. You can also discuss having the disks far apart from each other: different adapters and so forth.
Additional information —
Transition statement —
ODM-LVM relationships
LVM information is kept in two places:
– ODM (Object Data Manager)
– VGDA (Volume Group Descriptor Area)
The ODM is in the system
The VGDA is on the disk (LUN)
Thus, creating a volume group involves updating both
– ODM / VGDA in sync on system where VGDA created
Problem: What about the ODM in other sharing systems?
Solution: Create volume group and ensure that it is imported on sharing systems
– Manually
– Using C-SPOC (we’ll see this later)
NOTE: This applies to changes made to the LVM constructs, not the data within
Figure 3-33. ODM-LVM relationships AU548.0
Notes:
Before going too far, it’s important to understand that the LVM data we’ve discussed is kept in both the VGDA of all the disks in the volume group AND in the ODM of the system making changes to the volume group (or creating it). This creates a rather obvious problem. How do you keep the ODM up-to-date in every system other than the system that is making a change to the volume group? Understand that this is only a consideration when changes are made to the LVM constructs themselves, for example, adding/removing a disk, adding a filesystem/logical volume, increasing the size of a logical volume, and so forth.

Instructor notes:
Purpose — Explain the volume group currency challenge when creating shared volume groups.
Details —
Additional information —
Transition statement — Because of this,
volume group creation is slightly different when making a shared VG for HACMP.

Creating a shared volume group: Manually
[Diagram: Node 1 and Node 2, each with its own ODM, share a disk that holds the VGDA. #1 on Node 1: mkvg, chvg, mklv (log), logform, mklv (data), crfs. #2 on Node 1: unmount, varyoffvg. #3 on Node 2: cfgmgr, importvg, chvg. #4 on Node 2: varyoffvg. #5: Start Cluster Services]
Figure 3-34. Creating a shared volume group: Manually AU548.0
Notes:
Introduction
The steps to add a shared volume group are:
1. Ensure consistent PVIDs on all nodes where VG to be defined
2. Create a new VG and its contents
3. Varyoff VG on Node1
4. Import VG on Node2 and set VG characteristics correctly
5. Varyoff VG on Node2
6. Start Cluster Services
Note that the slide presents only a high-level view of the commands required to perform these steps. More details are provided as follows.
0. Ensure common PVIDs across all nodes that will share volume group
As discussed earlier, HACMP has no requirement that hdisk names on all the nodes are consistent, but that all the nodes have access to the same disks and have discovered the PVIDs.
a. Ensure disks are cabled/zoned/masked so that the disks will be seen by both nodes.
b. Add the shared disk(s) to AIX on the primary node (Node1 in the example): cfgmgr
c. Assign a PVID to the disk(s): chdev -a pv=yes -l disk_name where disk_name is hdisk#, hdiskpower# or vpath#.
d. Add the disks to AIX on the secondary node (Node2): cfgmgr
e. Using the PVIDs, verify that the necessary PVIDs are seen on both nodes: lspv. If not, correct them.
1. Create a new VG on Node1
a. Create the shared volume group. Use smit mkvg or C-SPOC. Also, remember to pick a unique Major number for the VG and set Create VG Concurrent Capable to yes for Fast Disk Takeover.
b. Change the auto vary on flag using: chvg -an <vgname> (C-SPOC does this automatically; this step is unnecessary if you are using an enhanced concurrent VG)
c. Create and initialize the jfslog using: mklv or smit mklv, then logform <jfslogname> (C-SPOC handles this automatically)
d. Create the logical volume: use smit mklv or C-SPOC
e. Create the file system using one of the following options: crfs or smit jfs or C-SPOC. Using SMIT, select Add a Journaled File System on a previously defined logical volume
2. Varyoff VG from Node1
a. umount <File_System> any file systems that are part of the VG which was just created.
b. varyoffvg <vgname>, the new volume group created in step 1.
3. Import VG on Node2 and set VG characteristics correctly
a. On the second cluster node perform the following commands:
importvg -V <major#> -y <vgname> <hdisk#>
chvg -an <vgname>
If using C-SPOC, you can skip this step as it will do this automatically for you.
4. Varyoff the VG on Node2
a. varyoffvg <vgname>
If using C-SPOC or if you created an enhanced concurrent mode volume group, you can skip this step since C-SPOC will do this automatically for you and enhanced concurrent mode volume groups don’t varyon at creation.
5. Start Cluster Services
a. Restart Cluster Services, which varies on the VG and mounts the filesystems and you can then resume processing.
C-SPOC
Fortunately, there is an easier way. These steps will be done automatically if the cluster is active and C-SPOC is used. Otherwise, you can use the commands listed here in the notes. Unfortunately, we are not looking at the easier way until we get to the C-SPOC unit.

Instructor notes:
Purpose — Show the manual way to create a volume group and have it configured on two nodes.
Details —
Additional information —
Transition statement — Let’s take a look at how mirroring is used in a high-availability cluster.

LVM mirroring
As mentioned in an earlier topic, HACMP does not provide data redundancy
AIX LVM mirroring is a method that can be used to provide data redundancy
LVM mirroring has some key advantages over other types of mirroring:
– Up to three-way mirroring of all logical volume types, including concurrent logical volumes, sysdumpdev, paging space, and raw logical volumes
– Disk type and disk bus independence
– Optional parameters for maximizing speed or reliability
– Changes to most LVM parameters can be done while the affected components are in use
– The splitlvcopy command can be used to perform online backups
[Diagram: an application writes to /filesystem on a mirrored logical volume; LVM maps its logical partitions to physical partitions in the volume group]
Figure 3-35. LVM mirroring AU548.0
Notes:
Introduction
Reliable storage is essential for a highly available cluster. LVM mirroring is one option to achieve this. Other options are a hardware RAID disk array configured in RAID-5 mode or some other solution which provides sufficient redundancy such as an external storage subsystem like the ESS (DS6000/DS8000), EMC, and so forth.
LVM mirroring
Some of the features of LVM mirroring are:
- Data can be mirrored on three disks rather than having just two copies of data. This provides higher availability in the case of multiple failures, but does require more disks for the three copies.
- The disks used in the physical volumes could be of mixed attachment types.
- Instead of entire disks, individual logical volumes are mirrored. This provides somewhat more flexibility in how the mirrors are organized. It also allows for an odd number of disks to be used and provides protection for disk failures when more than one disk is used.
- The disks can be configured so that mirrored pairs are in separate sites or in different power domains. In this case, after a total power failure on one site, operations can continue using the disks on the other site that still has power. No information is displayed on the physical location of each disk when mirrored logical volumes are being created, unlike when creating RAID 1 or RAID 0+1 arrays, so allocating disks on different sites requires considerable care and attention.
- Mirrored pairs can be on different adapters.
- Read performance is good for short length operations as data can be read from either of two disks, so the one with the shortest queue of commands can be used. Write performance requires a write to two disks.
- Extra mirrored copies can be created and then split off for backup purposes.
- Data can be striped across several mirrored disks, an approach that avoids hot spots caused by excessive activity on a few disks by distributing the I/O operations across all the member disks.
- There are parameters, such as Mirror Write Consistency, Scheduling Policy, and Enable Write Verify, which can help maximize performance and reliability.

Instructor notes:
Purpose — Show how the LVM in general and LVM mirroring in particular contribute to the cluster’s availability.
Details — LVM mirroring is useful on its own as a way of eliminating individual disks as a single point of failure. The ability to manage the cluster’s disk subsystem while the cluster is operational contributes to availability (if done correctly as mistakes are still possible) by eliminating outages, which might otherwise be required.
Additional information —
Transition statement — There are several considerations when creating a mirrored filesystem for an HACMP environment.

Steps to create a mirrored filesystem: Manually
These are the steps to creating a properly mirrored filesystem for HA environments:
Step | Description | Options
1 | create shared volume group | Name the VG something meaningful like shared_vg1
2 | change the autovary on flag | chvg -an shared_vg1
3 | create a jfs2log lv "sharedlvlog" | Type=jfs2log, size=1pp, separate physical volumes=yes, scheduling=sequential, copies=2
4 | initialize the jfslog | logform -V jfs2log /dev/sharedlvlog
5 | create a data lv "sharedlv" | type=jfs2, size=??, separate physical volumes=yes, copies=2, scheduling=sequential, write verify=??
6 | create a filesystem on a previously created lv | pick the lv = sharedlv to create the file system on, automount=no, assign desired mount point
7 | verify the log file is in use | mount filesystem, lsvg -l shared_vg1 should show 1 lv type jfslog, 1 lp, 2pp
Figure 3-36. Steps to create a mirrored file system - manually AU548.0
Notes:
Introduction
This visual describes a procedure for creating a shared volume group and a mirrored file system. There is an easier-to-use method provided by an HACMP facility called C-SPOC, which is discussed later in the course. The C-SPOC method cannot be used until the HACMP cluster’s topology and at least one resource group have been configured. The procedure described in the visual permits the creation of shared file systems before performing any HACMP related configuration (an approach favored by some cluster configurators).
It is also valuable to notice that unique names are being used for all of the LVM components, including JFS Log logical volumes. Pay very close attention to that when creating LVM components manually. If a JFS Log is not specified when creating a filesystem, one will be created (if one doesn’t exist, that is) with a system generated name. This could conflict with one that already exists on a system that will be sharing this volume group.
Detailed procedure
Here are the steps in somewhat more detail:
a. Use the smit mkvg fastpath to create the volume group. Make sure that the volume group is created with the Activate volume group AUTOMATICALLY at system restart parameter set to no (or use smit chvg to set it to no). This gives HACMP control over when the volume group is brought online. It is also necessary to prevent, for example, a backup node from attempting to online the volume group at a point in time when it is already online on a primary node.
b. Use the smit mklv fastpath to create a logical volume for the jfs2log with the parameters indicated in the figure above (make sure that you specify a type of jfs2log or AIX ignores the logical volume and creates a new one, which has a system generated name, when you create the file system below).
c. Use the logform -V jfs2log <lvname> command to initialize a logical volume for use as a JFS2 log device.
d. Use the smit mklv fastpath again to create a logical volume for the file system with the parameters indicated in the figure above.
e. Use the smit crjfs2lvstd fastpath to create a JFS2 file system in the now existing logical volume.
f. Verify by mounting the file system and using the lsvg command. Notice that if copies were set to 2, then the number for PPs should be twice the number for LPs and that if you specified separate physical volumes then the values for PVs should be 2 (the number of copies).

Instructor notes:
Purpose — Show the manual way to create a volume group and mirrored filesystem.
Details —
Additional information —
Transition statement — Another volume group availability consideration when configuring for high availability is quorum. We now take a look at quorum.

Mirroring?
Let’s talk quorum checking
AIX performs quorum checking on volume groups to ensure that the volume group remains consistent
– The quorum rules are intended to ensure that structural changes to the volume group (for example, adding or deleting a logical volume) are consistent across an arbitrary number of varyon-varyoff cycles
When mirroring in AIX, quorum checking is an issue because losing access to 50% of the disks in a volume group takes the volume group offline
How can you lose access to 50% of the disks?
– Logical Volumes are mirrored across two things
• The two things can be two disk enclosures or two sites
– One of the two things goes away
VG status | Quorum checking Enabled for volume group (# of VGDAs required) | Quorum checking Disabled for volume group (# of VGDAs required)
Running (to stay running) | >50% VGDAs | > 1 VGDAs
To bring online (varyonvg) | >50% VGDAs | 100% VGDAs or Forced Varyon set
Figure 3-37. Mirroring? Let’s talk quorum checking AU548.0
Notes:
Introduction
If you plan to mirror your data at the AIX level to provide redundancy, you will need to consider AIX quorum checking on a volume group. If you aren’t mirroring your data at the AIX level, quorum isn’t an issue.
Quorum
Quorum is the check used by the LVM at the volume group level to resolve possible data conflicts and to prevent data corruption. Quorum is a method by which >50% of VGDAs must be available in a volume group before any LVM actions can continue.
Note: For a VG with three or more disks, there is one copy of the VGDA on each disk. For a one disk VG, there are two copies of the VGDA. For a 2-disk VG, the first disk has two copies and the second has one copy of the VGDA. The VGDA is identical for all disks in the VG.
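The VGDA distribution and the >50% rule in the note above can be worked through with a small model. This is only an illustrative sketch in Python, not an AIX interface: the function names vgda_counts and has_quorum are hypothetical, and the model assumes the VGDA layout is exactly as the note describes.

```python
def vgda_counts(num_disks):
    """Return the number of VGDA copies on each disk of a volume group."""
    if num_disks < 1:
        raise ValueError("a volume group needs at least one disk")
    if num_disks == 1:
        return [2]           # one-disk VG: two copies on the single disk
    if num_disks == 2:
        return [2, 1]        # two-disk VG: first disk has two, second has one
    return [1] * num_disks   # three or more disks: one copy per disk


def has_quorum(num_disks, reachable_disks):
    """True when more than 50% of the VGDAs sit on reachable disks.

    reachable_disks holds zero-based indexes of the disks that can be accessed.
    """
    counts = vgda_counts(num_disks)
    reachable = sum(counts[i] for i in set(reachable_disks))
    return 2 * reachable > sum(counts)


if __name__ == "__main__":
    # Four disks mirrored across two enclosures; one enclosure is lost.
    print(has_quorum(4, [0, 1]))
    # Two-disk VG: the result depends on which disk survives.
    print(has_quorum(2, [0]), has_quorum(2, [1]))
```

In this model, a four-disk volume group that loses two disks holds exactly 50% of its VGDAs, which is not more than 50%, so quorum is lost. A two-disk volume group keeps quorum only if the disk holding two VGDAs survives.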
Quorum is especially important in an HA cluster. If LVM can varyon a volume group with half or less of the disks, it might be possible for two nodes to varyon the same VG at the same time, using different subsets of the disks in the VG. This is a very bad situation which we will discuss in the next visual.
Normally, LVM verifies quorum when the VG is varied on and continuously while the VG is varied on.
Fifty percent of the disks go away
This is the reason you worry about quorum. As the visual indicates, the loss of access to 50% of the disks will cause quorum checking to take the volume group offline. This is not good when you consider that you are buying extra hardware to provide greater availability for the end-user. But what does it mean to lose access to 50% of the disks?
If you’re mirroring within a site, this will happen if you’re mirroring across disk enclosures. If one enclosure loses power or the adapter that the AIX system is using to access the enclosure goes offline, you have lost access to 50% of the disks.
If you’re mirroring cross-site, losing access to 50% of the disks means losing access to the other site’s storage subsystem. This could be a problem with just the storage subsystem at the other site, a problem with the communications to the other site, or the other site is entirely down.
In the case where you are dealing within a site, consider disabling quorum. In the case where you are dealing with cross-site LVM mirroring, consider using HACMP to handle the loss of access and ensure you enable the volume group for cross-site mirroring verification (when adding the volume group via C-SPOC), add the disks in the volume group to the list of cross-site mirrored disks (Add Disk/Site Definition for Cross-Site LVM Mirroring, via smitty cl_xslvmm) and set the forced varyon flag in the resource group that contains all cross-site mirrored volume groups.
On recovery, if the stale partition synchronization encounters a problem, you may have to use the manual process of synchronizing the mirrors (C-SPOC menu item Synchronize Shared LVM Mirrors).
AIX errlog entry for quorum loss
If quorum is lost the following is an example of an AIX errlog entry:
Id | Label | Type | CL | Description
91F9700D | LVM_SA_QUORCLOSE | UNKN | H | QUORUM LOST, VOLUME GROUP CLOSING
How HACMP reacts to quorum loss
HACMP 4.5 and up automatically reacts to a “loss of quorum” (LVM_SA_QUORCLOSE) error associated with a volume group going offline on a cluster node. In response to this error, a non-concurrent resource group goes offline on the node where the error occurred. If the AIX Logical Volume Manager takes a volume group in the resource group offline due to a loss of quorum for the volume group on the node, HACMP selectively moves the resource group to another node.
You can change this default behavior by customizing resource recovery to use a notify method instead of fallover. For more information, see Chapter 3: Configuring HACMP Cluster Topology and Resources (Extended) in the Administration Guide.
Note: HACMP launches selective fallover and moves the affected resource group only in the case of the LVM_SA_QUORCLOSE error. This error can occur if you use mirrored volume groups with quorum enabled. However, other types of “volume group failure” errors could occur. HACMP does not react to any other type of volume group errors automatically. In these cases, you still need to configure customized error notification methods, or use AIX Automatic Error Notification methods to react to volume group failures.
Instructor notes:
Purpose — Present the basic quorum rules.
Details — How to deal with quorum issues, such as getting a quorum-checking-disabled volume group online with less than 100% of the VGDAs available, is discussed in more detail shortly.
Additional information —
Transition statement — The first option to look at is eliminating the quorum issues.

Elimination of quorum issues
You can eliminate the loss of quorum issue (loss of volume group when quorum is enabled and 50% of disks are lost).
– Do not mirror in the AIX node
– Mirror with quorum disabled
Do not mirror in the AIX node
– Use external storage subsystem (DS8000/DS6000, EMC, etc) or RAID arrays
Mirror with quorum disabled
– It may be possible for each side of a 2-node cluster to have different parts of the same volume group vary'd online
– Care must be taken in this case to avoid data corruption
Overall considerations
– Distribute hard disks across more than one bus
– Use different power sources
Figure 3-38. Elimination of quorum issues AU548.0
Notes:
Introduction
Eliminating quorum issues is done either by mirroring with quorum disabled, or by not mirroring at the AIX level.
Eliminating quorum problems
To enhance the availability of a volume group, think about the following points:
- Using more than one disk adapter prevents the loss of access to the disks if a single adapter fails. This can be used with an external disk subsystem to provide multiple paths (using multipathing software) to the LUNs, or with mirroring so that different copies of the data are accessed through different adapters.
- For higher availability, use two external power sources.
- If there are only two disks in the volume group then you lose access to the volume group if the disk with two VGDAs is lost.
- If you are mirrored across two disk subsystems, consider a quorum buster disk to prevent loss of quorum if you lose access to one subsystem. This is discussed later in the notes.
Distribute hard disks across more than one bus
Use multipathing software and two Fibre Channel adapters. Use three adapters per node in SCSI. Use two adapters per node, per loop in SSA.
Use different power sources
Connect each power supply in the storage device to a different power source.
Don’t mirror at the AIX level
This is the option most configurations use today. The data redundancy is provided in the external storage subsystem. Quorum is not an issue in this case.
Disabling quorum: Nonquorum volume groups
Quorum checking can be disabled on a per-volume group basis. If quorum checking is disabled, LVM will not varyoff a volume group if quorum is lost while the VG is running. However, in this case, 100% of the VGDAs must be available when the volume group is varied on.
Why disable quorum checking?
Disabling quorum checking may seem like a good idea from an availability point of view. For example, consider a volume group mirrored across two storage subsystems (for example, in two different buildings across campus). If access to one storage subsystem is lost, only half of the VGDAs are available. With quorum checking enabled, quorum is lost and the VG is varied off. This would seem to defeat the purpose of mirroring. However, there are real risks associated with disabling quorum. We will discuss ways to handle the “quorum problem” in the next few visuals.
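The varyon rules for quorum and nonquorum volume groups described above can be compared side by side. The following Python sketch is illustrative only: can_vary_on is a hypothetical name, it models a normal (unforced) varyonvg, and it assumes the only inputs that matter are the total and reachable VGDA counts.

```python
def can_vary_on(total_vgdas, reachable_vgdas, quorum_enabled):
    """Apply the normal (unforced) varyon rule described in the notes."""
    if total_vgdas < 1:
        raise ValueError("a volume group has at least one VGDA")
    if not 0 <= reachable_vgdas <= total_vgdas:
        raise ValueError("reachable_vgdas must be between 0 and total_vgdas")
    if quorum_enabled:
        # Quorum checking enabled: more than 50% of the VGDAs are required.
        return 2 * reachable_vgdas > total_vgdas
    # Quorum checking disabled: 100% of the VGDAs are required.
    return reachable_vgdas == total_vgdas


if __name__ == "__main__":
    # Four VGDAs mirrored across two storage subsystems.
    for reachable in (4, 3, 2):
        print(reachable,
              can_vary_on(4, reachable, quorum_enabled=True),
              can_vary_on(4, reachable, quorum_enabled=False))
```

Under these assumptions, a volume group that has lost one of four VGDAs can still be varied on with quorum enabled but not with quorum disabled. Disabling quorum keeps a running volume group online, but makes the next normal varyon stricter.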
Risks of disabling quorum checking
Disabling quorum checking is an option; however, considerable care must be taken to ensure that a consistent set of VGDAs is used on an ongoing basis. In addition, exceptional care must be taken to ensure that one half of the cluster isn’t running with one half of all the mirrored logical volumes while the other node is running with the other half of all the mirrored logical volumes as this leads to a phenomenon known as data divergence.
Sometimes it might be necessary to disable quorum in a cluster. In this case, take care that you do not end up with data divergence. The primary strategy for avoiding data divergence is to avoid partitioned clusters, although careful design of the cluster’s shared storage is also important.

Instructor notes:
Purpose — Discuss options for eliminating quorum issues.
Details — Reiterate that quorum is not an issue if using an external storage subsystem. Then indicate that disabling quorum is the other option for eliminating quorum issues, with the associated risks.
Additional information —
Transition statement — There is a quorum feature in HACMP 5.x that we should discuss at this point.
Allow HACMP to handle it: Forced varyon
Or you can leave quorum on and allow HACMP to handle it
– Involves downtime when a mirror copy is lost (reducing availability)
HACMP 5.x provides a per resource group forced varyon:
– Each resource group has a flag which can be set to cause HACMP to perform a careful forced varyon of the resource group's VGs
– If normal varyonvg fails and this flag is set:
  • HACMP verifies that at least one complete copy of each logical volume is available
  • If verification succeeds, HACMP forces the volume group online
This is not a complete and perfect solution to quorum issues:
– If the cluster is partitioned then the rest of the volume group might still be online on a node in the other partition
HACMP 4.5 introduced forced varyon for all shared VGs:
– Still available in HACMP 5.x
– If the HACMP_MIRROR_VARYON environment variable is set to TRUE, forced varyon is enabled for all shared VGs in the cluster
– If set, HACMP_MIRROR_VARYON overrides the per resource group forced varyon flag
Figure 3-39. Allow HACMP to handle it: Forced varyon AU548.0

Notes:
Introduction
If you decide to mirror at the AIX level and to leave quorum checking on, you will want to have HACMP handle the loss of access to a volume group if half the disks are lost. Be sure you understand what you’re deciding to do, though. If you allow HACMP to handle the loss of access to the volume group, this means that the loss of half the disks (only one of your two copies of the data) will result in the user’s loss of access to the application until it can be taken by another cluster node. You’ve purchased the additional hardware and set up the mirroring precisely to avoid downtime if you lose access to part of the hardware, but this strategy will result in downtime. You make the call (see disabling quorum in the previous visual).

varyonvg -f
AIX provides the ability to varyon a volume group if a quorum of disks is not available.
This is called forced varyon. The varyonvg -f command allows a volume group to be made active that does not currently have a quorum of available disks. All disks that cannot be brought to an active state will be put in a removed state. At least one disk must be available for use in the volume group.

Per resource group forced varyon
HACMP 5.x provides a flag in each resource group which allows you to enable forced varyon of the VGs in that resource group, as described in the visual.

Forced varyon of all shared volume groups
The HACMP_MIRROR_VARYON environment variable, introduced in HACMP 4.5, when set to TRUE, enables the forced varyon mechanism for all shared volume groups in the cluster. In contrast, the HACMP 5.x forced varyon mechanism applies only to the volume groups of specific resource groups. The HACMP_MIRROR_VARYON variable is still supported by HACMP 5.x and, if set to TRUE, overrides any per-resource group settings for the forced varyon feature.
If the HACMP_MIRROR_VARYON variable is used, it should probably be defined by inserting the following line into /etc/environment on each cluster node:
HACMP_MIRROR_VARYON=TRUE

MISSINGPV_VARYON environment variable
An approach commonly used in the past to deal with quorum-related issues involves the use of the MISSINGPV_VARYON environment variable. This AIX-provided environment variable, if set to TRUE in /etc/environment, enables the forced varyon of any VGs which are missing disks. Clusters that use the MISSINGPV_VARYON variable should be updated to use either the forced varyon feature in the resource group or the HACMP_MIRROR_VARYON variable.
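The decision sequence described for forced varyon can be summarized as a small Python model. It is a sketch of the logic as stated in these notes, not HACMP code: the function names, the returned strings, and the data layout (each logical volume mapped to a list of mirror copies, each copy being the set of disks it resides on) are all assumptions made for illustration.

```python
def forced_varyon_enabled(rg_forced_varyon, environment):
    """HACMP_MIRROR_VARYON=TRUE applies cluster-wide and overrides the per-resource-group flag."""
    if environment.get("HACMP_MIRROR_VARYON") == "TRUE":
        return True
    return rg_forced_varyon


def has_complete_copy(copies, available_disks):
    """A mirror copy is complete only if every disk it resides on is available."""
    return any(copy_disks <= available_disks for copy_disks in copies)


def varyon_volume_group(normal_varyon_ok, rg_forced_varyon, environment,
                        logical_volumes, available_disks):
    """Return 'online', 'forced online', or 'offline' for one volume group."""
    if normal_varyon_ok:
        return "online"
    if not forced_varyon_enabled(rg_forced_varyon, environment):
        return "offline"
    # Careful forced varyon: every logical volume needs one complete copy.
    if all(has_complete_copy(copies, available_disks)
           for copies in logical_volumes.values()):
        return "forced online"
    return "offline"


if __name__ == "__main__":
    lvs = {"httplv": [{"hdisk2"}, {"hdisk4"}],
           "httploglv": [{"hdisk2"}, {"hdisk4"}]}
    # One enclosure (hdisk4) is lost and the normal varyonvg has failed.
    print(varyon_volume_group(False, True, {}, lvs, {"hdisk2"}))
```

The model makes the caveat from the visual visible: nothing in this check knows whether a node in another partition has the other mirror copy online, so the check alone does not prevent data divergence.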
Instructor notes:
Purpose: Explain HACMP 5.x’s forced varyon feature and the related HACMP_MIRROR_VARYON environment variable.
Details:
Additional information:
Transition statement: Correct use of the forced varyon feature or the HACMP_MIRROR_VARYON feature requires that you pay attention to a few things.

Recommendations for forced varyon
Before enabling HACMP's forced varyon feature for a volume group or the HACMP_MIRROR_VARYON variable for the entire cluster, ensure that:
– The affected volume groups are mirrored across disk enclosures
– The affected volume groups are set to super-strict allocation
– There are redundant heartbeat networks between all nodes
– Administrative policies are in effect to prevent volume group structural changes when the cluster is running degraded (that is, failed over or with disks missing)
Figure 3-40. Recommendations for forced varyon AU548.0

Notes:
Be careful when using forced varyon
Failure to follow each and every one of these recommendations could result in either data divergence or inconsistent VGDAs. Either problem can be very difficult if not impossible to resolve in any sort of satisfactory way; so be careful!

More information
Refer to the HACMP for AIX Administration Guide Version 5.4.1 (Chapter 15) and the HACMP for AIX Planning Guide Version 5.4.1 (Chapter 5) for more information about forced varyon and quorum issues.

Instructor notes:
Purpose: Provide recommendations for using forced varyon or the HACMP_MIRROR_VARYON variable.
Details:
Additional information:
Transition statement: In concluding this unit, let’s take a look at some of the issues that you must consider when configuring LVM components in an HACMP environment.

LVM and HACMP considerations
Following these simple guidelines helps keep the configuration easier to administer:
– All LVM constructs must have unique names in the cluster.
  • For example, httplv, httploglv, httpfs and httpvg
– Mirror or otherwise provide redundancy for critical logical volumes.
  • Remember the jfslog
  • If it is not worth mirroring, then consider deleting it now rather than having to wait to lose the data when the wrong disk fails someday
  • Even data that is truly temporary is worth mirroring because it avoids an application crash when the wrong disk fails
  • External disk subsystems (like the DS8000 or EMC Symmetrix) or RAID-5 storage devices are alternative ways to provide redundancy
– The VG major device numbers should be the same
  • Mandatory for clusters exporting NFS filesystems, but it is a good habit for any cluster
– Shared data on internal disks is a bad idea
– Focus on the elimination of single points of failure
Figure 3-41. LVM and HACMP considerations AU548.0

Notes:
Unique names
Because your LVM definitions are used on multiple nodes in the cluster, you must make sure that the names created on one node are not in use on another node. The safest way to do this is to use C-SPOC. If creating the LVM components outside C-SPOC, you must explicitly create and name each entity [do not forget to explicitly create, name and format (using logform) the jfslog logical volumes] with a name known to be unique across the nodes in the cluster.
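The guideline that VG major device numbers should be the same on every node comes down to finding a number that is free everywhere. The Python sketch below shows that intersection step only. It assumes the free major numbers reported on each node (for example, by the AIX lvlstmajor command) have already been transcribed by hand into (start, end) ranges, with None marking an open-ended range; the function name, the input layout, and the upper_limit bound are hypothetical choices for illustration.

```python
def lowest_common_free_major(free_ranges_by_node, upper_limit=512):
    """Return the lowest major number free on every node, or None if there is none.

    free_ranges_by_node maps a node name to a list of (start, end) tuples;
    end is None for an open-ended range. upper_limit is an arbitrary bound
    used only to make open-ended ranges finite in this sketch.
    """
    common = None
    for ranges in free_ranges_by_node.values():
        free = set()
        for start, end in ranges:
            last = upper_limit if end is None else min(end, upper_limit)
            free.update(range(start, last + 1))
        # Keep only the numbers that are also free on all nodes seen so far.
        common = free if common is None else common & free
    if not common:
        return None
    return min(common)


if __name__ == "__main__":
    free = {
        "nodeA": [(39, 99), (101, None)],   # 39 through 99, then 101 and up
        "nodeB": [(43, 43), (45, None)],    # 43, then 45 and up
    }
    print("use major number:", lowest_common_free_major(free))
```

Picking the lowest common value is just a convention in this sketch; any number that is free on all nodes serves the purpose.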
Provide data redundancy via an external storage device or mirroring/RAID
For availability, use an external storage device that provides data redundancy across multiple disks, or mirror (or use hardware RAID for) all your shared logical volumes, including the jfslog logical volume.
- If it is worth keeping, then it is worth making redundant. If it is not worth making redundant, then it is not worth keeping and should be deleted.
The mirrorvg command provides an easy way to mirror all the logical volumes on a given volume group. The same result can be achieved manually by running the mklvcopy command for each individual logical volume in the volume group.

Volume group major numbers
If you are using NFS, be sure to use the same major number on all nodes. Even if not using NFS, this is good practice, and makes it easy to begin using NFS with this volume group in the future. Use the lvlstmajor command on each node to determine a free major number common to all nodes.

Use external disks for shared data
External disks should be used for shared volume groups. If internal disks were configured for shared volume groups and the owning node needed to be powered down for any reason, the shared volume groups would become unavailable, which is clearly a bad idea.

Eliminate single points of failure
The focus of cluster design must always be eliminating single points of failure.

Instructor notes:
Purpose: Describe special HACMP considerations when creating and configuring LVM components.
Details:
Additional information:
Transition statement: HACMP 5.4.1 includes support for OEM volume groups.

Support for OEM volume groups
OEM volume groups can be used with HACMP
HACMP 5.3 and later automatically detects and provides the methods for Veritas volume groups (VxVM)
Configuring custom volume group processing methods using SMIT
– List volume groups of a specified type
– List physical and logical disks in a volume group
– Bring a volume group online and offline
– Determine a volume group status
– Verify volume groups configuration
– Provide a location of log files and other debugging information. View using the AIX 5L snap -e command.
Limitations and more information
Figure 3-42. Support for OEM volume groups AU548.0

Notes:
Introduction
You can configure OEM volume groups in AIX and use HACMP as an IBM High Availability solution to manage such volume groups.
Note: Different OEMs can use different terminology to refer to similar constructs. For example, the Veritas Volume Manager (VxVM) term Disk Group is analogous to the AIX LVM term Volume Group. We will use the term volume groups to refer to OEM and Veritas volume groups.

Veritas volume manager
Among other OEM volume groups and filesystems, HACMP 5.3 and later supports volume groups and filesystems created with VxVM in Veritas Foundation Suite v.4.0. To make it easier for you to accommodate Veritas volume groups in the HACMP cluster, the methods for Veritas volume groups support are predefined in HACMP and are used automatically.
After you add Veritas volume groups to HACMP resource groups, you can select the methods for the volume groups from the pick lists in HACMP SMIT menus for OEM volume groups support.
Note: Veritas Foundation Suite is also referred to as Veritas Storage Foundation (VSF).

Configuring custom volume group processing methods using SMIT
When HACMP identifies OEM volume groups of a particular type, it can be configured to provide the volume group processing functions shown in the visual.
You can add, change, and remove custom volume group processing methods for a specific OEM volume group using SMIT. You can select existing custom volume group methods that are supported by HACMP, or you can use your own custom methods.
Using SMIT, you can perform the following functions for OEM volume groups:
- Add Custom Volume Group Methods
- Change/Show Custom Volume Group Methods
- Remove Custom Volume Group Methods

Additional considerations
The custom volume group processing methods that you specify for a particular OEM volume group are added to the local node only. This information is not propagated to other nodes; you must copy the custom volume group processing methods to each node manually. Alternatively, you can use the HACMP File Collections facility to make the disk, volume, and file system methods available on all nodes.

Limitations and more information
There are some limitations to using OEM volume groups with HACMP. For example, HACMP supports a number of extended functions for LVM volume groups that are not available for OEM volume groups, such as enhanced concurrent mode, active and passive varyon process, heartbeating over disk, selective fallover upon volume group loss and others. In addition, there are several other limitations.
For complete details on using OEM volume groups with HACMP, see Appendix B in the HACMP for AIX Installation Guide.
Instructor notes:
Purpose: Discuss HACMP accommodation for OEM volume groups.
Details:
Additional information:
Transition statement: HACMP also provides some support for OEM file systems.

Support for OEM file systems
OEM file systems can be used with HACMP
HACMP 5.3 and later automatically detects and provides the methods for Veritas file systems (VxFS)
Configuring custom file systems processing methods using SMIT
– List file systems of a specified type
– List volume groups hosting a specified file system type
– Bring a file system online and offline
– Determine a file system’s status
– Verify file system configuration
– Provide a location of log files and other debugging information. View using the AIX 5L snap -e command.
Limitations and more information
Figure 3-43. Support for OEM file systems AU548.0

Notes:
Introduction
You can configure OEM file systems in AIX and use HACMP as an IBM High Availability solution to manage such file systems.

Veritas file systems
Among other OEM volume groups and filesystems, HACMP 5.3 and later supports volume groups and filesystems created with VxVM in Veritas Foundation Suite v.4.0. To make it easier for you to accommodate Veritas filesystems in the HACMP cluster, the methods for Veritas filesystems support are predefined in HACMP. After you add Veritas filesystems to HACMP resource groups, you can select the methods for the filesystems from the pick lists in HACMP SMIT menus for OEM filesystems support.
Note: Veritas Foundation Suite is also referred to as Veritas Storage Foundation (VSF).
Configuring custom file system processing methods using SMIT
When HACMP identifies OEM file systems of a particular type, it can be configured to provide the file system processing functions shown in the visual.
You can add, change, and remove custom file system processing methods for a specific OEM file system using SMIT. You can select existing custom file system methods that are supported by HACMP, or you can use your own custom methods.
Using SMIT, you can perform the following functions for OEM file systems:
- Add Custom Filesystem Methods
- Change/Show Custom Filesystem Methods
- Remove Custom Filesystem Methods

Additional considerations
The custom file system processing methods that you specify for a particular OEM file system are added to the local node only. This information is not propagated to other nodes; you must copy the custom file system processing methods to each node manually. Alternatively, you can use the HACMP File Collections facility to make the disk, volume, and filesystem methods available on all nodes.

Limitations and more information
There are some limitations to using OEM file systems with HACMP.
For complete details on using OEM file systems with HACMP, see Appendix B in the HACMP for AIX Installation Guide.

Instructor notes:
Purpose: Discuss HACMP accommodation for OEM file systems.
Details:
Additional information:
Transition statement: Let’s review this unit.
Checkpoint
1. True or False? Lazy update attempts to keep VGDA constructs in sync between cluster nodes (reserve/release-based shared storage protection).
2. Which of the following commands will bring a volume group online?
   a. getvtg <vgname>
   b. mountvg <vgname>
   c. attachvg <vgname>
   d. varyonvg <vgname>
3. True or False? Quorum should always be disabled on shared volume groups.
4. True or False? Filesystem and logical volume attributes cannot be changed while the cluster is operational.
5. True or False? An enhanced concurrent volume group is required for the heartbeat over disk feature.
Figure 3-44. Checkpoint AU548.0

Instructor notes:
Purpose: Review this unit.
Details: Checkpoint solutions
Additional information:
Transition statement: Let’s summarize the unit.
Unit summary
Key points from this unit:
Access to shared storage must be controlled
– Non-concurrent (serial) access
  • Reserve/release-based protection: Slower and may result in ghost disks
  • RSCT-based protection (fast disk takeover): Faster, no ghost disks, and some risk of partitioned cluster in the event of communication failure
  • Careful planning is needed for both methods of shared storage protection to prevent fallover due to communication failures
– Concurrent access
  • Access must be managed by the parallel application
HACMP supports several disk technologies
– Must be well understood to eliminate single points of failure
Shared storage should be protected with redundancy
– LVM mirroring
  • LVM configuration options must be understood to ensure availability
  • LVM quorum checking and forced varyon must be understood to ensure availability
– Hardware RAID
Figure 3-45. Unit summary AU548.0

Instructor notes:
Purpose: Summarize.
Details:
Additional information:
Transition statement: On to the next unit.

Unit 4. Planning for applications and resource groups

Estimated time
00:45

What this unit is about
This unit describes the considerations for making an application highly available in an HACMP environment.

What you should be able to do
After completing this unit, you should be able to:
• List and explain the requirements for an application to be supported in an HACMP environment
• Describe the HACMP start and stop scripts
• Describe the resource group behavior policies supported by HACMP
• Enter the configuration information into the Planning Worksheets

How you will check your progress
Accountability:
• Checkpoint questions

References
SC23-5209-01 HACMP for AIX, Version 5.4.1 Installation Guide
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10 HACMP for AIX, Version 5.4.1 Planning Guide
http://www-03.ibm.com/systems/p/library/hacmp_docs.html HACMP manuals

Unit objectives
After completing this unit, you should be able to:
List and explain the requirements for an application to be supported in an HACMP environment
Describe the HACMP start and stop scripts
Describe the resource group behavior policies supported by HACMP
Enter the configuration information into the Planning Worksheets
Figure 4-1. Unit objectives AU548.0

Instructor notes:
Purpose: To tell the students what we will talk about in this unit.
Details — Additional information — Transition statement — So let’s get started with how applications are defined to HACMP. . The first node listed is called the home node. 4-4 HACMP Implementation © Copyright IBM Corp. • Application Server: defines start and stop scripts – Step 2.0 Notes: Two steps to define an application to HACMP To have HACMP manage an application. Create an HACMP resource group. you must do two things: 1. The default priority is the order in the list. ii. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. This in turn will require two steps: a.Instructor Guide How to define an application to HACMP The two steps to define an application to HACMP are: – Step 1. Create resource. 1998. Create resource group. Create an HACMP resource called an application server. Defines a list of nodes where the application can run. How to define an application to HACMP AU548. The application server defines a start and a stop script for the application 2. The basic resource group definition: i. Names which policies to use that will control which node the application actually runs on. Resource Group Node 1 Node 2 Node 3 Shared Disk List of Nodes Policies: Where to run Resources Application Server Service Address Volume Group © Copyright IBM Corporation 2008 Figure 4-2. .V4. Service address.0 Instructor Guide Uempty b. 2008 Unit 4. These are the resources that HACMP will move during a fallover. Add resources to the Resource Group. Planning for applications and resource groups 4-5 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Application server name. i. 1998. and Volume group © Copyright IBM Corp. Transition statement — Now.Instructor Guide Instructor notes: Purpose — Show the connection between applications and HACMP resources and resource groups. . Resource groups will be covered more in depth later in the course. 1998. 
we look more closely at considerations for applications that you want to make highly available. Details — Additional information — Point out that this is just an introduction to resource groups to show how applications are added. 4-6 HACMP Implementation © Copyright IBM Corp. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Because the cluster daemons call the start and stop scripts. © Copyright IBM Corp. .0 Instructor Guide Uempty Application considerations Automation – No intervention Dependencies – Using names unique to one node – Other applications Interference – Conflicts with HACMP Robustness – Application can withstand problems Implementation – Other aspects to plan for Monitoring using HACMP – This is critical • Used to be overlooked • Nearly mandatory for – “Unmanaged” resource groups – Non-disruptive Startup/Upgrade © Copyright IBM Corporation 2008 Figure 4-3. upon an HACMP fallover. Additionally. Planning for applications and resource groups 4-7 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Automation One key requirement for an application to function successfully under HACMP is that the application must be able to start and stop without any manual intervention. 2008 Unit 4.0 Notes: Introduction Many applications can be put under the control of HACMP but there are some considerations that should be taken into account. Application considerations AU548.V4. the recovery process calls the start script to bring the application online on a standby node. there is no option for interaction. 1998. This allows for a fully automated recovery Other requirements for start and stop scripts will be covered on the next visual. 4-8 HACMP Implementation © Copyright IBM Corp. If this is the case with your application. It should also be able to survive the loss of the kernel or processor state. 
Two areas to look out for are using IPX/SPX Protocol and Manipulating Network Routes. Interference An application can execute properly on both the primary and standby nodes. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. . Know whether your application uses software that is licensed to a particular CPU ID. We will look at writing scripts and data locations in the following visuals. Application dependencies: Dependencies that in the past you had to worry about but now you may not have to: • One application must be up before another one. Time/date should be synchronized. Cron table is local to a each node. such as /dev/tty0. Software licensing: Software can be licensed to a particular CPU ID. time to restart after failure. Robustness Beyond basic stability. such as time to start. An overview of these is given later in this unit. Using a hostname that is not the same on other nodes. You might be able to avoid this problem by having a copy of the software resident on all cluster nodes. Hard coding. which might not be the same on another node. a conflict with the application or environment might occur that prevents HACMP from functioning successfully. when HACMP is started. Consider characteristics. Implementation There are several aspects of an application to consider as you plan for implementing it under HACMP. However. and time to stop. an application under HACMP should meet other robustness characteristics.Instructor Guide Dependencies Dependencies to be careful of when coding the scripts include: • • • • Referring to a locally attached device. • Using inittab and cron Table: Inittab is processed before HACMP is started. realize that a fallover of the software will not successfully restart. Also consider: • Writing effective scripts. These can now be handled by Runtime Dependency options. • Consider file storage locations. such as successful start after hardware failure and survival of real memory loss. 
• Applications must both run on the same node. 1998. 2008 Unit 4.0 Instructor Guide Uempty Monitoring using HACMP HACMP provides another runtime option called application monitoring. With monitoring. © Copyright IBM Corp. . These topics are covered in detail in the HACMP Administration II Administration and Problem Determination course. 1998. failure of the application can generate a fallover. Planning for applications and resource groups 4-9 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. This capability is also used to ensure that multiple instances of an application aren’t spawned when returning a resource group to the online state from unmanaged on the same node or when using non-disruptive startup or upgrade. An availability analysis tool is also provided. Consider creating an application monitor.V4. Instructor Guide Instructor notes: Purpose — Discuss application considerations. 1998. Planning and Install Guide. . Transition statement — Let’s take a closer look at writing scripts. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Details — Additional information — This is taken from appendix B. 4-10 HACMP Implementation © Copyright IBM Corp. defensive programming can correct any irregular conditions that occur. Planning for applications and resource groups 4-11 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. © Copyright IBM Corp. Writing start and stop scripts AU548. Are the prerequisite conditions satisfied? These might include access to a file system. 1998. adequate paging space. Remember that cluster manager spawns these scripts off a separate job in the background and carries on processing. Items to check . 2008 Unit 4. IP labels and free file system space. . 
The start script should exit and run a command to notify system administrators if the requirements are not met.0 Notes: Introduction: Application start scripts should not assume the state of the environment.0 Instructor Guide Uempty Writing start and stop scripts Check these items: – – – – – Environment is what is expected Multiple instances issue Location of scripts Handle errors from previous termination Correct coding Use assists © Copyright IBM Corporation 2008 Figure 4-4.V4. The application start scripts must be able to handle an unknown previous shutdown state.Environment: Verify the environment. and DNS. • Scripts should not kill an HACMP process. . Be careful when using the grep command that only what is to be stopped is killed. . .Correct Coding: • Scripts should start with declaring a shell (that is. There are also “plug-in” filesets that provide help for integrating print server. and Oracle Real Application Server (RAC). #!/bin/usr/ksh).Multiple instances issue: When starting an application with multiple instances. Using assists IBM provides a priced feature for HACMP that provides all the code and monitoring for three applications: WebSphere. DB2. • The stop script should make sure the application is really stopped.Handle any previous state: Was previous termination successful? Is data recovery needed? Always assume the database is in an unknown state since the conditions that occurred to cause the takeover cannot be assumed. This may not be a desired configuration for all environments. Certain database startup commands read a configuration file and start all known databases at the same time. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. In these cases you would not have to write the scripts yourself. 4-12 HACMP Implementation © Copyright IBM Corp. These filesets are part of the base HACMP product. 
Instructor notes:
Purpose: Focus on script writing considerations.
Additional information: Point out that the assists are not covered in this course. This is a great place to point out again the importance of the application monitor.
Transition statement: Next, we take a closer look at where the data should go.

Where should data go?
Private storage:
– Operating system components
Shared storage:
– Dynamic data
– Web server content
– Application log files
– Files updated by application
It depends:
– Configuration files
– Application binaries
– License files
Figure 4-5. Where should data go? AU548.0

Notes:
Introduction
Where data should go needs to be thought out well. For some data, the answer is clear. For other cases, it depends. Putting data on shared storage allows for only one copy, but the data may not be available when needed. Putting data on private storage risks having different copies on different nodes, but upgrades can be done more easily.

Private storage
Private storage must be used for the operating system components. It can also be used for configuration files, license files, and application binaries, subject to the trade-offs mentioned in the introduction.

Shared storage
Shared storage must be used for dynamic data, Web server content, data that is updated by the application, and application log files (be sure the time is the same on the nodes). Again, configuration files, application binaries, and license files could go here, subject to the trade-offs mentioned in the introduction above.

It depends
License files deserve a special mention. If using node-locked licenses, then you should use private storage. In any case, you must learn the license requirements of the application to make a proper determination.

Instructor notes:
Purpose: Discuss where the application data should go: private or shared storage.
Transition statement: Now, it's time to look at how we control on which node the application will run.
Resource group policies
Three initial policies:
– Startup Policy
– Fallover Policy
– Fallback Policy
Additional run-time options
– Settling time (Startup)
– Delayed Fallback (Fallback)
Figure 4-6. Resource group policies AU548.0

Notes:
Three initial policies
In HACMP, you specify in the resource group definition three policies that control which node a resource group (application) runs on:
1. Startup (of Cluster Services)
When Cluster Services starts up on a node, each resource group definition is read to determine if this node is listed, and if so, whether that resource group has already been started on another node. If the resource group hasn't been started elsewhere, then the startup policy is examined to further determine if Cluster Services should activate the resource group and start the application.
2. Fallover
If there is a node failure, then the Fallover policy determines which other node should take over, activate the resource group, and start the application there.
3. Fallback
If a node earlier in the list of nodes (that is, higher priority) for the resource group is started after a fallover, then the Fallback policy determines if the resource group should be stopped and started back up on the higher priority node.

Additional runtime options
In addition to the policies, there are two runtime options that affect these policies:
1. Settling time affects one of the Startup policies.
2. Delayed fallback timer affects how the Fallback policy works.
Runtime options are covered in more detail in the HACMP Administration II: Administration and Problem Determination course.

Instructor notes:
Purpose: Explain what the resource group policies are in general, including the runtime options as required.
Additional information: This is only a high-level overview. The next visuals cover each policy in more detail. Also point out that we have only covered control of where a resource group will run using the resource group definition. Point out that in the administration (CSPOC) lecture and in the event lecture, we will cover other times when a resource group or application might go to a different node.
Transition statement: Now, let's take a closer look at each of the policies.

Startup policy
Online on home node only
Online on first available node
– Run-time Settling Time may be set
Online using distribution policy
Online on all available nodes
Figure 4-7. Startup policy AU548.0

Notes:
Online on home node only
When starting Cluster Services on the nodes, only the Cluster Services on the home node (the first node listed in the resource group definition) will activate the resource group (and start the application). This policy requires the home node to be available.

Online on first available node
When starting Cluster Services on the nodes, the first Cluster Services up on a node that is in the list of nodes for the resource group will activate the resource group and start the application.

Runtime settling time
A Settling Time value can be set for the "Online on first available node" policy. If you set the settling time, Cluster Services waits up to the duration of the settling time interval to see if the home node joins the cluster. If it does not, at the end of the interval Cluster Services chooses the highest priority node available, rather than simply activating the resource group on the first possible node that reintegrates into the cluster. This keeps the resource group from bouncing between nodes.

Online using node distribution policy
Similar to "Online on first available node", except that only one resource group can be active on a given node. If the first node in the resource group's list of nodes already has another resource group started on it, then the next node in the list of nodes is tried.

Online on all available nodes
Cluster Services on every node will activate the resource group and start the application. This is equivalent to the "concurrent resource group" behavior in previous releases of HACMP. If you select this option for the resource group, ensure that resources in this group can be brought online on multiple nodes simultaneously.
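The startup decision described for each policy can be pictured as a small function. The sketch below is a simplified model only, not HACMP code: the policy labels (home, first, distribution, all) and all function and node names are made up for illustration, and the settling time is deliberately not modeled.

```shell
# in_list NODE "LIST": succeeds if NODE appears in the space-separated LIST.
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# startup_activates POLICY NODE "NODE_LIST" "ONLINE_ON" ["BUSY_NODES"]
#   POLICY      home | first | distribution | all  (labels invented for this sketch)
#   NODE        node on which Cluster Services has just started
#   NODE_LIST   resource group node list, highest priority first
#   ONLINE_ON   nodes where the resource group is already online (may be empty)
#   BUSY_NODES  nodes already hosting another resource group (distribution only)
# Succeeds (returns 0) if NODE should activate the resource group.
startup_activates() {
    policy=$1 node=$2 nodes=$3 online=$4 busy=${5:-}

    # The node must be listed in the resource group definition.
    in_list "$node" "$nodes" || return 1

    # Online on all available nodes: it does not matter where else it runs.
    if [ "$policy" = all ]; then
        return 0
    fi

    # All other policies: only if not already started on another node.
    [ -z "$online" ] || return 1

    case "$policy" in
        home)
            # Only the first node in the list (the home node) may activate.
            set -- $nodes
            [ "$node" = "$1" ]
            ;;
        first)
            return 0
            ;;
        distribution)
            # Skip a node that already hosts another resource group.
            if in_list "$node" "$busy"; then return 1; fi
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
```

For example, `startup_activates home nodeB "nodeA nodeB" ""` fails because nodeB is not the home node, while the same call with `first` succeeds.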
Instructor notes:
Purpose: Explain the startup policies.
Additional information: HACMP 5.3 and later supports only node distribution policy. Network distribution policy is no longer supported.
Transition statement: Let's take a closer look at the "On all nodes" policy before going on to the Fallover policy choices.

Online on all available nodes
Application runs on all available nodes concurrently
– No fallover/fallback – just less/more nodes running the application
Resource group restrictions:
– No JFS or JFS2 filesystems (only raw logical volumes)
– No service IP Labels / Addresses (which means no IPAT)
– Application must provide own lock manager
Potential to provide essentially zero downtime
The only Startup Policy that supports multi-node disk heartbeat networks
Figure 4-8. Online on all available nodes AU548.0

Notes:
Application runs on all available nodes concurrently
If a node belongs to a resource group with this startup policy, then when Cluster Services start on the node, Cluster Services will start the application and make all the resources mentioned available on this node. In this case, it does not matter if the resource group is already active on another node, so the application ends up being started on all nodes where Cluster Services are started. This policy is also referred to as concurrent mode or concurrent access.

Resource group restrictions
There are restrictions when defining a resource group that will use this policy. You cannot include a service address in the resource group definition. The data cannot be part of a JFS type logical volume; it must be defined as a raw logical volume. Finally, it is up to the application to provide a lock manager to ensure that data isn't being updated simultaneously from multiple nodes.

Potential to provide essentially zero downtime
Because the application is running on multiple nodes, the loss of a node does not result in the loss of the application. Oracle Real Application Clusters (RAC) is an application that uses this type of startup policy.

Instructor notes:
Purpose: Explain the all nodes (concurrent) policy for resource groups.
Transition statement: Now, let's take a look at the Fallover policy choices that you have when defining the behavior of a resource group.
Fallover policy
Fallover to next priority node in the list
Fallover using dynamic node priority
Bring offline (on error node)
Figure 4-9. Fallover policy AU548.0

Notes:
Fallover to next priority node in the list
In the case of fallover, a resource group that is online on only one node at a time follows the list in the resource group's definition to find the next highest priority node currently available.

Fallover using dynamic node priority
If you select this option for the resource group, you can choose one of the following three methods to have HACMP choose the fallover node dynamically:
1. highest_mem_free (most available memory)
2. highest_idle_cpu (most available processor time)
3. lowest_disk_busy (least disk activity)
Dynamic node priority is useful in a cluster that has more than two nodes.

Bring offline (on error node only)
Select this option to bring a resource group offline on a node during an error condition. This option represents the behavior of a concurrent resource group and ensures that if a particular node fails, the resource group goes offline on that node only, but remains online on other nodes. Selecting this option as the fallover preference when the startup preference is not Online On All Available Nodes might allow resources to become unavailable during error conditions. If you do so, HACMP issues an error.
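The two ways of choosing a fallover node can be illustrated with a short sketch. It is a toy model, not how HACMP is implemented: the function names, node names, and metric values are hypothetical, and dnp_highest only covers the "highest value wins" methods (a lowest_disk_busy choice would need the comparison reversed).

```shell
# next_priority_node "NODE_LIST" "UP_NODES"
# Prints the first node of NODE_LIST (highest priority first) that is
# currently up; fails if none of the listed nodes is available.
next_priority_node() {
    for n in $1; do
        case " $2 " in
            *" $n "*) echo "$n"; return 0 ;;
        esac
    done
    return 1
}

# dnp_highest "NODE=VALUE ..."
# Prints the node with the largest VALUE (for example, free memory),
# modeling a "highest_*" style dynamic node priority choice.
dnp_highest() {
    best= bestval=
    for pair in $1; do
        node=${pair%%=*} val=${pair#*=}
        if [ -z "$best" ] || [ "$val" -gt "$bestval" ]; then
            best=$node bestval=$val
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Example: nodeA has failed; nodeB and nodeC are still up.
next_priority_node "nodeA nodeB nodeC" "nodeC nodeB"      # prints nodeB
dnp_highest "nodeB=2048 nodeC=8192"                       # prints nodeC
```

With only two nodes there is a single surviving candidate, so both functions necessarily pick the same node, which is why dynamic node priority matters mainly in larger clusters.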
Instructor notes:
Purpose: Explain the Fallover policy and choices for dynamic node priority.
Additional information: Point out that dynamic node priority does not have much use in a 2-node cluster. Dynamic node priority is not a runtime policy prior to HACMP 5.4.
Transition statement: Next, we look at the Fallback policies.

Fallback policy
Fallback to higher priority node in the list
– Can use a run time Delayed Fallback Timer preference
Never fallback
Figure 4-10. Fallback policy AU548.0

Notes:
Fallback to higher priority node
When HACMP Cluster Services start on a node, HACMP looks to see if there is a resource group that has this node in its list and that is currently active on another node. If this node is higher in the list than the node the resource group is currently running on, and this policy has been chosen, the resource group is moved and the application is started on this node.

Runtime delayed fallback timer
A runtime fallback timer policy can be set to a time in the future when the fallback should happen. The following example describes a case when configuring a delayed fallback timer would be beneficial. If a node in a cluster failed and was later repaired, you might want to integrate the node into the cluster during off-peak hours. Rather than writing a script or a cron job to do the work, which are both time-consuming and prone to error, you could set the delayed fallback timer for a specified resource group to the appropriate time. After you start the node, HACMP automatically starts the resource group fallback at the specified time.
Runtime policies will be covered in more detail in the HACMP Administration II: Administration and Problem Determination course.
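The fallback decision compares the position of the joining node and the current node in the resource group's node list. A minimal sketch of that comparison follows; it is an assumption-laden model, with invented policy labels (higher, never) and function names, and it does not model the delayed fallback timer.

```shell
# priority_of NODE "NODE_LIST"
# Prints the position of NODE in NODE_LIST (1 = highest priority);
# fails if NODE is not in the list.
priority_of() {
    i=1
    for n in $2; do
        if [ "$n" = "$1" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# should_fallback POLICY JOINING_NODE CURRENT_NODE "NODE_LIST"
#   POLICY is "higher" (fallback to higher priority node) or "never";
#   both labels are invented for this sketch.
# Succeeds if the resource group should move to JOINING_NODE.
should_fallback() {
    [ "$1" = higher ] || return 1
    pj=$(priority_of "$2" "$4") || return 1
    pc=$(priority_of "$3" "$4") || return 1
    # A smaller position number means a higher priority.
    [ "$pj" -lt "$pc" ]
}
```

For instance, `should_fallback higher nodeA nodeB "nodeA nodeB"` succeeds, because nodeA comes before nodeB in the list, while any call with the policy `never` fails.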
Instructor notes:
Purpose: Cover the Fallback policy.
Additional information: Describe the Fallback policy briefly.
Transition statement: Now, we will summarize the policies by looking at the valid combinations.

Valid combinations of policies
Figure 4-11. Valid combinations of policies AU548.0

Notes:
Valid combinations
HACMP enables you to configure only valid combinations of startup, fallover, and fallback behaviors for resource groups.

Preferences are not the only factor in determining node
In addition to the node preferences described in the previous table, other issues can determine the resource groups that a node acquires. We will look at other issues in the administration and event units later in this course.

Instructor notes:
Purpose: Review the policies.
Transition statement: Now, we look at dependencies that you might have with resource groups.
Dependent applications and resource groups
Parent/Child Dependency
– One resource group can be the parent of another resource group
Location Dependency
– A resource group may be on the same node/site or on a different node/site than another resource group
Implemented as Run-Time Policy
(Diagram: parent, parent/child, and child resource groups placed on Node 1 and Node 2.)
Figure 4-12. Dependent applications/resource groups AU548.0

Notes:
One resource group can be a parent of another resource group
In HACMP 5.2 and higher, you can have cluster-wide resource group online and offline dependencies.
• Parent will be brought online before child.
• Parent will be brought offline after child.
• Parent/child can be on different nodes.
• Three levels of dependencies are supported.
In HACMP 5.3 and higher, you can specify resource location dependencies:
• Online on same node
• Online on different nodes
• Online on same site

Implemented as runtime policy
Runtime policies will be covered in more detail in the HACMP Administration II: Administration and Problem Determination course.
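The parent/child ordering rule (parents online before children, parents offline after children) amounts to a topological ordering of the dependency pairs. The sketch below illustrates this with the standard tsort utility; it is not how HACMP processes dependencies, and the resource group names (db_rg, app_rg, web_rg) are hypothetical.

```shell
# Both functions read "parent child" pairs, one pair per line, on stdin.

# online_order: prints resource groups so that every parent precedes
# its children (the order in which they would be brought online).
online_order() {
    tsort
}

# offline_order: the reverse order, so that every child is brought
# offline before its parent.
offline_order() {
    tsort | awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}

# Example with three levels: db_rg is the parent of app_rg, which is
# the parent of web_rg.
printf 'db_rg app_rg\napp_rg web_rg\n' | online_order
printf 'db_rg app_rg\napp_rg web_rg\n' | offline_order
```

For this three-level chain the online order is db_rg, app_rg, web_rg, and the offline order is the reverse.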
Instructor notes:
Additional information: Point out that this is not covered in detail in this course.
Transition statement: So, now it's time to have a checkup.

Checkpoint
1. True or False: Applications are defined to HACMP in a configuration file that lists what binary to use.
2. What policies would be the best to use for a 2-node "active-active" cluster using IPAT to minimize both applications running on the same node?
a. home, next, never
b. first, next, higher
c. distribution, next, never
d. all, error, never
e. home, next, higher
3. Which type of data should not be placed in private data storage?
a. Application log data
b. License file
c. Configuration files
d. Application binaries
4. Which policy is not a Run-time policy?
a. Settling
b. Delayed Fallback Timer
c. Dynamic Node Priority
Figure 4-13. Checkpoint AU548.0

Instructor notes:
Purpose: Checkpoint review.
Details: Checkpoint solutions
1. False. An application is defined to HACMP through an application server resource (start and stop scripts).
2. c. distribution, next, never
3. a. Application log data
4. c. Dynamic Node Priority
Transition statement: Let's summarize what we discussed in this unit.
Unit summary
Key points from this unit:
To define an application to HACMP, you must:
– Create an application server resource (start and stop scripts)
– Create a resource group (node list, policies, resources)
Considerations for putting an application under HACMP control:
– Automation
– Dependencies
– Interference
– Robustness
– Implementation details
– Monitoring
– Shared storage requirements
Considerations for start and stop scripts:
– Environment
– Multiple instances
– Script location
– Error handling
– Coding issues
Resource group policies control how HACMP manages the application:
– Startup policy (with optional Settling timer)
– Fallover policy
– Fallback policy (with optional Delayed fallback)
Figure 4-14. Unit summary AU548.0

Instructor notes:
Purpose: Summarize.
Transition statement: We're done.

Unit 5. HACMP installation
Estimated time
01:30

What this unit is about
This unit describes the installation process for HACMP 5.4.1 for AIX 5L.

What you should be able to do
After completing this unit, you should be able to:
• State where installation fits in the implementation process
• Describe how to install HACMP 5.4.1
• List the prerequisites for HACMP 5.4.1
• List and explain the purpose of the major HACMP 5.4.1 components

How you will check your progress
Accountability:
• Checkpoint
• Machine exercise

References
SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10 HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10 HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04 HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09 HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html HACMP manuals

Unit objectives
After completing this unit, you should be able to:
• State where installation fits in the implementation process
• Describe how to install HACMP 5.4.1
• List the prerequisites for HACMP 5.4.1
• List and explain the purpose of the major HACMP 5.4.1 components
Figure 5-1. Unit objectives AU548.0

Notes:
What this unit covers
This unit discusses the installation and the code components of HACMP 5.4.1.

Instructor notes:
Purpose: To tell the students what we talk about in this unit.
Details: If you have students who are experienced with HACMP from when it was still Classic versus ES, you can mention the differences.
Transition statement: So let's get started with preparing to install HACMP 5.4.1.

5.1 Installing the HACMP 5.4.1 software
Installing the HACMP software
After completing this topic, you should be able to:
• Explain where the installation fits in the implementation process
• Describe how to install HACMP 5.4.1
• Discuss the prerequisites for HACMP 5.4.1
Figure 5-2. Installing the HACMP software AU548.0

Notes:
This topic covers the installation of the HACMP 5.4.1 filesets.

Instructor notes:
Purpose: List the objectives of this topic.
Transition statement: So, what are all the steps to successfully install HACMP?

Steps for successful implementation
Proper planning is critical to a successful implementation
– Special care should be taken when installing HACMP on a system that is in production

Step  Step Description                      Comments
1     Plan                                  Use planning worksheets and documentation.
2     Assemble hardware                     Install adapters, connect shared disk and network.
3     Install AIX                           Ensure you update to the latest maintenance level.
4     Configure networks                    Requires detailed planning.
5     Configure shared storage              Set up shared volume groups and filesystems.
6     Install HACMP                         Install on all nodes in the cluster (don't forget to install latest fixes).
7     Define/discover the cluster topology  Review what you end up with to make sure that it is what you expected.
8     Configure application servers         You will need to write your start and stop scripts.
9     Configure cluster resources           Refer to your planning worksheets.
10    Synchronize the cluster               Ensure you "actually" do this.
11    Start Cluster Services                Watch the logs for messages.
12    Test the cluster                      Document your tests and results.
Figure 5-3. Steps for successful implementation AU548.0

Notes:
Steps to building a cluster
Here are the steps to building a successful cluster. Okay, so we could have included more steps or combined a few steps, but the principle is that you should plan and follow a methodical process, which includes eventual testing and documentation of the cluster.
It is often best to configure the cluster's resources iteratively. Get basic resource groups working first, and then add the remaining resources gradually, testing as you go, until the cluster does everything that it is supposed to do.

Different opinions
Different people have different ideas about the exact order in which a cluster should be configured. For example, some people prefer to leave the configuration of the shared storage (step 5 above) until after they've synchronized the cluster's topology (step 7), as this allows them to take advantage of HACMP's C-SPOC facility to configure the shared storage.
One other area where different views are common is exactly when to install and configure the application. If the application is installed, configured, and tested reasonably thoroughly prior to installing and configuring HACMP, then most issues which arise during later cluster testing are probably HACMP issues rather than application issues. The other common perspective is that HACMP should be installed and configured prior to installing and configuring the applications, as this allows the applications to be installed into the exact context that they will ultimately run in. There is no correct answer to this issue. When to install and configure the applications is just one more point that will have to be resolved during the cluster planning process.

Where there is agreement
There is general agreement among the experts that the first step in configuring a successful cluster is to plan the cluster carefully. For a more comprehensive discussion of the process of planning and implementing a cluster, refer to:
- SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
- SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
- SC23-4861-10 HACMP for AIX, Version 5.4.1: Planning Guide
- SC23-4862-10 HACMP for AIX, Version 5.4.1: Administration Guide
- SC23-5177-04 HACMP for AIX, Version 5.4.1: Troubleshooting Guide
- SC23-4867-09 HACMP for AIX, Version 5.4.1: Master Glossary
Or get the latest at http://www-03.ibm.com/systems/p/library/hacmp_docs.html

Instructor notes:
Purpose: Outline the steps involved in building a cluster.
Details: Emphasize the importance of a disciplined approach. Ad hoc cluster implementation might be more fun, but it is unlikely to yield a successful cluster.
Transition statement: So, what have we done so far in the course?
filesystem – Synchronize. network. filesystem) – Networks (IP interfaces. policies • Resources: Application Server. and application environments for our cluster. HACMP IP and non-IP networks – Resources: • Application Server • Service labels – Resource group: • Identify name. /etc/hosts. storage. LVM volume group. 3.0 Notes: What we have done so far In the units 2. service label. © Copyright IBM Corp. then start Cluster Services © Copyright IBM Corporation 2008 Figure 5-4. and 4 we planned and built the storage. . 1998. 2008 Unit 5. So we are now ready to install the HACMP filesets. VG.V4. non-IP networks. and devices) – Application start and stop scripts Install the HACMP filesets Configure the HACMP environment – Topology • Cluster. and application steps have been done and that these are the AIX activities that you need to build before configuring a cluster. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.. network. 5-12 HACMP Implementation © Copyright IBM Corp. 1998.Instructor Guide Instructor notes: Purpose — Review what we have done so far Details — Review what the storage. .. Additional information — Transition statement — Before all else fails. V4.1 SC23-5209-01 – Can be installed from the CD Online Planning Worksheets – Can be installed from the CD Release notes: – On the CD as release_notes – Installed as /usr/es/sbin/cluster/release_notes © Copyright IBM Corporation 2008 Figure 5-5. V5.4.0 Instructor Guide Uempty First steps in planning Study the appropriate HACMP manuals: HACMP for AIX Planning Guide.1 SC23-4861-10 – Contains Planning Worksheets in Appendix A – Can be installed from the CD HACMP for AIX Installation Guide. First steps in planning AU548. 1998. Check out the references at the start of this unit for a complete list.0 Notes: There are other references Other HACMP manuals are available which might prove useful. 
Instructor notes:
Purpose: Remind the students about the HACMP manuals.
Details: The manuals can be installed without installing HACMP code.
Additional information:
Transition statement: Let's have a look at how HACMP is packaged on the CD.

What is on the CD?

release_notes
Directories
– AIX52, AIX61
  • RSCT filesets for these AIX versions (AIX 5.3 versions listed as follows)
– pubs
  • in pdf only
– Installp/ppc, usr/sys/inst.images

RSCT install images (.bff files): rsct.basic.hacmp, rsct.basic.rte, rsct.core.errm, rsct.core.gui, rsct.core.hostrm, rsct.core.rmc, rsct.core.sec, rsct.core.utils, rsct.opt.storagerm

HACMP filesets: cluster.adt.es, cluster.doc.en_US.es, cluster.es, cluster.es.cfs, cluster.es.cspoc, cluster.es.nfs, cluster.es.plugins, cluster.es.worksheets, cluster.hativoli, cluster.haview, cluster.license, cluster.man.en_US.es, cluster.msg.<lang>.cspoc, cluster.msg.<lang>.es, cluster.msg.<lang>.hativoli, cluster.msg.<lang>.haview

Figure 5-6. What is on the CD?

Notes:

Files on the CD

This visual shows the files that are on the CD. They will be expanded to show the table of contents when using SMIT to do the install. The AIX 5.2 and 6.1 directories contain the required rsct filesets for implementing HACMP V5.4.1 with AIX V5.2 and 6.1, respectively. The pubs directory contains the PDF and HTML versions of the HACMP documentation at the time the CD was created.

Instructor notes:
Purpose: Describe the contents of the CD.
Details:
Additional information:
Transition statement: So, what are the filesets that are seen from the .toc file via smit?

Install the HACMP filesets

Here are some of the HACMP 5.4.1 filesets (each shown by SMIT at level 5.4.1.0):

cluster.adt.es
  ES Client CLINFO Samples
  ES Client Clstat Samples
  ES Client Include Files
  ES Client LIBCL Samples
  ES Web Based Monitor Demo
cluster.doc.en_US.es
  HAES PDF Documentation - U.S. English
  HAES Web-based HTML Documentation - U.S. English
cluster.es
  ES Base Server Runtime
  ES Client Libraries
  ES Client Runtime
  ES Client Utilities
  ES Cluster Simulator
  ES Cluster Test Tool
  ES Server Diags
  ES Server Events
  ES Server Utilities
  ES Two-Node Configuration Assistant
  Web based Smit
cluster.es.cfs
  ES Cluster File System Support
cluster.es.cspoc
  ES CSPOC Commands
  ES CSPOC Runtime Commands
  ES CSPOC dsh
cluster.es.nfs
  ES NFS Support
cluster.es.plugins
  ES Plugins - Name Server
  ES Plugins - Print Server
  ES Plugins - dhcp
cluster.es.worksheets
  Online Planning Worksheets
cluster.hativoli
  HACMP Tivoli Client
  HACMP Tivoli Server
cluster.license
  HACMP Electronic License
cluster.man.en_US.es
  ES Man Pages - U.S. English
cluster.msg.en_US.cspoc
  HACMP CSPOC Messages - U.S. English

Your requirements will determine what you install

Figure 5-7. Install the HACMP filesets

Notes:

Fileset considerations

Listed are some of the filesets that you see when doing smit install_all in HACMP 5.4.1. Using smit install_latest will not show the msg filesets, so you should use install_all and select the filesets.

Notice that cluster.es contains both client and server components. You can install either or both depending on what the system's HACMP function will be. When you install cluster.es.server you will get cluster.es.cspoc as well.

The same filesets should be installed on all nodes or Verify will give warnings every time it executes.

You should install the documentation filesets on at least one non-cluster node (ensuring that the HACMP PDF-based documentation is available even if none of the cluster nodes will boot could prove really useful someday).

Notice that some of the filesets require other products such as Tivoli or NetView. You should not install these filesets unless you have these products. HAView is never installed on the cluster node; it is installed on the NetView server. You might not need the plug-ins. The cluster.es.cfs fileset can only be used if GPFS is installed.

The Web-based smit is not to be confused with WebSM. Web-based smit is a Web application that allows you to see the HACMP smit configuration screens and to see status.

The cluster.es.clvm fileset, which was formerly required for Enhanced Concurrent Mode volume group support and concurrent mode resource group support, has been removed.

Example of basic install (will be used in the lab)

cluster.adt.es
cluster.doc.en_US.es
cluster.es
cluster.es.cspoc
cluster.license
cluster.man.en_US.es
cluster.msg.en_US.cspoc
cluster.msg.en_US.es
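The advice that the same filesets should be installed on all nodes can be checked before Verify ever complains. The following is a minimal Python sketch, not part of HACMP. It assumes the fileset names and levels for each node have already been gathered (for example from lslpp output) into dictionaries; the node names, the levels, and the function name fileset_differences are hypothetical.

```python
def fileset_differences(installed):
    """Report filesets that are missing or at different levels on some node.

    installed: dict mapping node name -> dict of fileset name -> level string.
    Returns a dict of fileset name -> {node: level or None} for every fileset
    that is not identical on all nodes.
    """
    all_filesets = set()
    for filesets in installed.values():
        all_filesets.update(filesets)

    differences = {}
    for name in sorted(all_filesets):
        # None marks a node where the fileset is not installed at all.
        levels = {node: filesets.get(name) for node, filesets in installed.items()}
        if len(set(levels.values())) > 1:
            differences[name] = levels
    return differences


if __name__ == "__main__":
    # Hypothetical data for a two-node cluster.
    installed = {
        "node1": {"cluster.es": "5.4.1.0", "cluster.es.cspoc": "5.4.1.0"},
        "node2": {"cluster.es": "5.4.1.0"},
    }
    for fileset, levels in fileset_differences(installed).items():
        print(fileset, levels)
```

An empty result means every node reports the same filesets at the same levels.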
The license key for concurrent mode resource groups is no longer required.es 5-18 HACMP Implementation © Copyright IBM Corp.msg. it is installed on the NetView server.msg. The function required for Enhanced Concurrent Mode volume groups and concurrent mode resource groups have been built into the HACMP base code.cfs fileset can only be used if GPFS is installed.man.es.clvm fileset. © Copyright IBM Corp.V4. 1998. 2008 Unit 5.0 Instructor Guide Uempty Instructor notes: Purpose — Outline the various different HACMP software packages. Details — Talk the students through each software package and briefly explain its role in the HACMP software. Additional information — Transition statement — Remember the prerequisites. HACMP installation 5-19 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. . 3.1 (APAR IY84920) • Make sure RSCT filesets are at base level 2.enh (at required TL version) – CSPOC with vpath • SDD 1.3 TL4 – AIX 6.9. The point is that these are the components that must be considered when preparing your environment for HACMP. As time goes by. look at the following for the latest prerequisites: . 5-20 HACMP Implementation © Copyright IBM Corp.3.2 Technology Level (TL) 8 – AIX 5L V5.9.Instructor Guide Remember the prerequisites Minimum levels of AIX: – AIX 5L V5.4.basic.1 Announcement Letter • Go to the HACMP web site http://www-03.The HACMP for AIX 5L.rte. these will almost certainly be superseded by later levels.5.hacmp • rsct.4 or higher – AIX 6.5. . Version 5.clvm. see student notes for details Other prerequisites – Enhanced concurrent mode: • bos.2: RSCT version 2.0. 1998.compat.4.HACMP for AIX 5L Installation Guide.1 Minimum levels of RSCT: – AIX 5L V5.0 • If HACMP node through VIOS 1.0 or later. Version 5.3: RSCT version 2.client.1: RSCT version 2.compat. see student notes for details © Copyright IBM Corporation 2008 Figure 5-8. 
Version 5.1 CDs .0 or higher • rsct.2 (APAR IY84921) • Make sure RSCT filesets are at base level 2.1.1.ibm.5. Before you try to install. Don’t forget the prerequisites AU548.lvm (at required TL version) • bos.1 .3. 2.4.1.3 or later – SDDPCM • V2.4.0 – AIX 5L V5.4.com/systems/p/advantages/ha/ and click the Announcement Letter link under the heading Learn more on the right side. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.Release notes / README on the HACMP for AIX 5L.0 Notes: Installation suggestions Listed above are the minimum prerequisites.4.hacmp Otherwise optional AIX filesets.5.5. the three filesets that are on the CD are required in addition to the base RSCT filesets with the AIX 6. Ensure that when you install these on a system that has been updated to a Technology Level / Service Pack (TL / SP).tcp. 1998. Place the RSCT filesets that are on the CD in the same directory as the prerequisites listed on the slide.server bos. the following is needed for AIX 6.rte.data 5. The URL for checking on the latest patches is: http://www14.3.adt.0.adt.syscalls 5. Indicate that you intend to install the latest HACMP PTF (fix pack or whatever it may be called a the time) and ask if it’s known to be stable. 2008 Unit 5.50 bos.libc bos.2.2.ibm.85 bos.libcur bos.1 only: bos. Depending on the timing of your installation.odm bos. 2007 © Copyright IBM Corp.rte.net.net. it might be advisable to either stay one maintenance level behind on AIX and HACMP or both. or it might be wise to wait for an imminent maintenance level for AIX and HACMP or both.V4.1. HACMP installation 5-21 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.syscalls bos.0 In addition.0 For the RSCT prerequisites on AIX 6.client bos.net.rte. The base levels needed for AIX 5.1. it is generally wisest to start with the latest available AIX and HACMP patches.adt.data bos. 
HACMP and SVC details (as of June 18.adt.0.rte.1 are: bos. .com/webapp/set2/sas/f/hacmp/home.html Finally.rte.3/6.1 June 18.libm 5.1 installation.7.lib bos.nfs. it’s always a good idea to call IBM support and ask if there are any known issues with the versions of AIX and HACMP that you plan to install/upgrade.SRC Those listed in bold are the ones that needed to be added to a base AIX image. Other AIX filesets bos.server 5.0 Instructor Guide Uempty Because you are unlikely to want to upgrade a new cluster anytime soon. 2007) HACMP and SVC 4.tcp.adt.libm bos.software.libpthreads bos.0. that you update these newly installed HACMP prerequisites to the same TL / SP. 2.1. • Resource Groups to be managed by HACMP cannot contain volume groups with both Metro Mirror-protected and non-Metro Mirror-protected disks.3 TL5 IY95080 IY91487 IY95080 IY98751++ AIX 5. refer to Storage Multipath Subsystem Device Driver User's Guide. 1998.1.4 HACMP APARs: IY87247.3 and V5.1. For information on configuring this feature.ibm. . ++ These APARs will not be made generally available for AIX 5.3 TL5 CSP or higher <no APARs required IY98751++ + These APARs are not yet generally available.ibm. V5. IZ00051+ HACMP 5.com/support/docview.com/support/docview.2 TL9 AIX 5.2 TL 9 and AIX 5.6. IZ00050+ Table 2: Multipathing Drivers APARs for AIX Multi-pathing AIX 5. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 5-22 HACMP Implementation © Copyright IBM Corp.4 updates support for the IBM System Storage SAN Volume Controller (SVC) Storage Software V4. the configuration of Fast I/O Failure on Fibre Channel devices is highly recommended.Instructor Guide IBM High Availability Cluster Multiprocessing (HACMP*) for AIX 5L*.1 IY95174 IY98568++ <not supported> IY98751++ AIX 5.wss?uid=isg1IY75323) HACMP requires the buffer size be set to at least 1 MB and the log size to 10 MB.3 HACMP APARs: IY94307. page 143 at: http://www-1. 
Contact IBM Support to obtain fix packages for these APARs. for HACMP's support of Metro Mirror they must match the node names used in the defined HACMP sites.3 TL4 Driver CSP SDDPCM v2. Contact IBM Support to obtain Ifix packages for these APARs. • Although SVC Host Name Aliases are arbitrary. Please refer to the following information for support details. Although it is not required for correct operation with SDDPCM.wss?uid=ssg1S7000303&aid=1 Additional requirement: The AIX OS error daemon parameters should be tuned to avoid lost log entries (See documentation APAR IY75323 at http://www-1.3 TL 5. Note: TL = Technology Level Table 1: HACMP APARs HACMP 5.0 SDD v1. Use the following command: “errdemon -B 1048576 -s 10485760" Restriction notes for Metro Mirror: • An HACMP/XD and SVC Metro Mirror configuration with VIO is not currently supported. V4.3 does not support moving resource groups across sites.4 TL7 RSCT 2. • SDDPCM requires the configuration of Enhanced Concurrent Mode volume groups.0 © Copyright IBM Corp. 2008 Unit 5. HACMP installation 5-23 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. for an HACMP node name Node_A with two WWPNs of WWNN_1 and WWNN_2. .4.5.4. • Ensure that your SVC is properly configured to support SDD/SDDPCM host multipathing. The latest version as of the writing of this course is VIOS 1. or later. General VIOS/p6 details HACMP can be used with versions of the VIO server dating back to VIOS 1.2. The shared disks must be defined as being in an Enhanced Concurrent Mode (ECM) volume group.5.0.5.5. Other notes: • SDD supports both Shared or Enhanced Concurrent Mode volume groups. • HACMP V5. run svctask mkhost -name Node_A -hbawwpn WWNN_1 WWNN_2 General SDDPCM details HACMP V5 supports use of SDDPCM V2. configured to access the shared disks with the “no-reserve” reserve policy. Persistent reserve policy is not supported in an HACMP environment.3 w/ TL7 APAR IZ07791 HACMP 5. 
Check the IBM Techdocs Web site for published Flashes indicating support for the version of VIOS that you intend to implement.4. Following are these requirements.0 Instructor Guide Uempty • HACMP does not support Global Mirror functions of SVC Copy Services. Table 3: VIOS 1.4 AIX 6.0. This involves adding all the WWWNs for a host's WWPN's into a single Host object on the SVC.1 w/ APAR IZ02602 RSCT 2. The same requirements exist for HACMP implementation on the Power 6 p520 and p550 systems as for VIOS 1.5.3 HACMP 5.0.1 SP2 RSCT 2.1. For example. 1998.5 Requirements AIX 5.5. refer to the HACMP/XD for Metro Mirror: Planning and Administration Guide.0 SP2 RSCT 2.1. • For specific HACMP C-SPOC restrictions. 5-24 HACMP Implementation © Copyright IBM Corp. 1998. and the HACMP 5. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.Instructor Guide Instructor notes: Purpose — List the prerequisites for HACMP 5.4.1. Additional information — Transition statement — There are a few final things to check before you start to configure your cluster. Details — Point out that ensuring that the cluster is built using the latest software probably defers the date when the cluster must be upgraded because it is running software that is about to go off maintenance.4.1 announcement letter to get the latest. . Refer the students to the Installation Guide. the release notes/README that comes with the HACMP CDs. The documentation is delivered as either pdf only for HACMP 5.0 Instructor Guide I Uempty Some final things to check Code installation – Correct filesets and prerequisites have been installed – Documentation is installed and accessible Network setup – – – – /etc/hosts file is configured on all nodes correctly Name resolution works IP and non-IP networks are configured Subnets configured correctly • The subnet mask identical. Some final things to check AU548. previous versions provided an html version too. 
  • All interfaces on different subnets
– Routing configured correctly
– Test connectivity
Shared storage configured correctly
You have a written plan describing configuration and testing procedures!

Figure 5-9. Some final things to check

Notes:

Description of checklist

This is a checklist of items that you should verify before starting to configure an HACMP cluster. It is not a complete list because each situation is different. It would probably be wise to develop your own checklist during the cluster planning process, and then verify it just before embarking on the actual HACMP configuration of the cluster.

Code installation

Correct filesets includes making sure that the same HACMP filesets are installed on each node. Documentation can be installed before installing HACMP.

Network setup

The /etc/hosts file should have entries for all IP labels and all nodes. The file should be the same on all nodes. Name resolution should be tested on all labels and nodes. To do this you can use the host command. You should test address to name and name to address and verify that they are the same on all nodes. You should ensure that a route exists to all logical networks from all nodes. Finally, you should test connectivity by pinging all nodes from all nodes on all interfaces.

Shared storage

Check to see that the disks are configured and recognized the same (if possible) and can be accessed from all nodes that will share them.

Instructor notes:
Purpose: Provide a checklist of things to be verified before configuring a cluster.
Details: This foil provides a checklist of items that must be verified before configuring a cluster. Feel free to add your own items to the list based on your experience with HACMP.
Additional information: Make it clear that these checks are not optional as missing one or more of these points could lead to a failed cluster either during configuration or later during production.
Transition statement: What about the HACMP client machine? How do we install that?

Install HACMP client machine

Set up network
– Configure network interface to reach cluster server node
  • Same subnet as service address
– /etc/hosts file updated everywhere
Install prerequisites:
– bos.adt.libm
– bos.adt.syscalls
– bos.data
– rsct.compat.basic.hacmp
– rsct.compat.clients.hacmp
Install HACMP client filesets:
– cluster.adt.es
– cluster.es
  • ES Client Libraries
  • ES Client Runtime
  • ES Client Utilities
– cluster.man.en_US.es
– cluster.msg.en_US.es
Configure /usr/es/sbin/cluster/clhosts
– Can copy /usr/es/sbin/cluster/etc/clhosts.client
Test connectivity

Figure 5-10. Install HACMP client machine

Notes:

Client machine properties

A client machine is a node running AIX and only the client filesets from HACMP. It can be used to monitor the cluster nodes, to test connectivity to an application during fallover, or to access a highly available application.

Installing and setting up the client machine

Make sure the network is set up so that the client machine can access the cluster nodes. If the client machine is on the same LAN, then choose an address that is in the same subnet as the service address of the application that you want to monitor.

Also make sure clinfo is set up (clhosts file) to be able to find the cluster node(s). A clhosts file is generated for clients when you synchronize the cluster. The name of the file is /usr/es/sbin/cluster/etc/clhosts.client.

Instructor notes:
Purpose: Describe what you can do with a client machine and what to install on a client machine.
Details:
Additional information:
Transition statement: Time for a review and lab 5.

Let's review

1. What is the first step in implementing a cluster?
   a. Order the hardware
   b. Plan the cluster
   c. Install AIX and HACMP
   d. Install the applications
   e. Take a long nap
2. True or False? HACMP 5.4.1 is compatible with any version of AIX V5.x.
3. True or False? Each cluster node must be rebooted after the HACMP software is installed.
4. True or False? You should take careful notes while you install and configure HACMP so that you know what to test when you are done.

Figure 5-11. Let's review

Notes:
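Two of the network checks from the checklist in this unit (identical subnet masks with every interface on a different subnet, and an /etc/hosts file that is the same on all nodes) lend themselves to a quick script. The following is a minimal Python sketch under stated assumptions: the addresses and file contents are supplied by you as strings, and the function names are hypothetical. It does not query AIX or HACMP, and it does not replace testing with the host and ping commands.

```python
import ipaddress


def check_node_interfaces(interfaces):
    """Check one node's interfaces, given as 'address/prefix' strings.

    Returns a list of problem descriptions (empty if the plan looks fine):
    the subnet masks must be identical and no two interfaces may share a subnet.
    """
    problems = []
    parsed = [ipaddress.ip_interface(text) for text in interfaces]

    if len({iface.network.netmask for iface in parsed}) > 1:
        problems.append("subnet masks differ")

    seen = {}  # subnet -> first address found in it
    for iface in parsed:
        if iface.network in seen:
            problems.append(
                f"{seen[iface.network]} and {iface.ip} share subnet {iface.network}"
            )
        else:
            seen[iface.network] = iface.ip
    return problems


def parse_hosts(text):
    """Parse /etc/hosts style text into a dict of name -> address."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            entries[name] = address
    return entries


def hosts_mismatches(hosts_by_node):
    """Return the nodes whose hosts entries differ from the first node's."""
    nodes = list(hosts_by_node)
    reference = parse_hosts(hosts_by_node[nodes[0]])
    return [n for n in nodes[1:] if parse_hosts(hosts_by_node[n]) != reference]
```

For example, check_node_interfaces(["192.168.1.1/24", "192.168.1.2/24"]) reports that the two addresses share a subnet, while putting the second interface in 192.168.2.0/24 produces no problems.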
Instructor notes:
Purpose: Review
Details:

Let's review solutions

1. What is the first step in implementing a cluster?
   a. Order the hardware
   b. Plan the cluster
   c. Install AIX and HACMP
   d. Install the applications
   e. Take a long nap
2. True or False? HACMP 5.4.1 is compatible with any version of AIX V5.x.
3. True or False? Each cluster node must be rebooted after the HACMP software is installed.
4. True or False? You should take careful notes while you install and configure HACMP so that you know what to test when you are done.

*There is some dispute about whether the correct answer is b or e, although a disconcerting number of clusters are implemented in the order a, b, c, d, e (how can you possibly order the hardware if you do not yet know what you are going to build?) or even just a, c, d (cluster implementers who skip step b rarely have time for long naps).

Additional information:
Transition statement: Now let's take a look at what was installed.

5.2 What was installed?

What was installed

After completing this topic, you should be able to:
Describe the purpose of the major HACMP 5.4.1 components

Figure 5-12. What was installed

Notes:

Instructor notes:
Purpose: List the objectives of this topic.
Details:
Additional information: Should begin this topic after lab 5 has been completed.
Transition statement: Let's look at how HACMP fits into the AIX environment.

The layered look

Here are the layers of software on an HACMP 5.4.1 cluster node:

Application Layer: Contains the highly available applications that use HACMP services
HACMP Layer: Provides highly available services to applications
RSCT, RMC Layer: Provides monitoring, event management and coordination of subsystems for HACMP clusters
AIX Layer: Provides operating system services (SRC, snmpd)
LVM Layer: Manages disk space at the logical level
TCP/IP Layer: Manages communication at the logical level

Figure 5-13. The layered look

Notes:

The application layer

The topmost layer of the "software stack" is the application layer. Any application or service that the cluster node is making highly available is considered to be running at the application layer (in a sense, this includes rather low-level AIX facilities, such as NFS, when the cluster is acting as a highly available NFS server).

The HACMP layer

The next layer is the HACMP layer. This layer is responsible for providing a number of services to the application layer, including:
- Tracking the state of the cluster in cooperation with the other cluster nodes
- Initiating fallovers and other "recovery" actions as required
- (Optionally) monitoring the applications and initiating "recovery" procedures when they fail
- Doing "whatever else it takes" to make the applications highly available

Note that because most applications are not really aware of how they are started and stopped or if they are being monitored and "recovered" or even if they are being made highly available, the applications running within the application layer, as a rule, are "blissfully unaware" of the existence of the HACMP layer or even the RSCT layer.

To make the applications highly available and to know when to start and stop and, if configured, monitor and "recover" the applications, the HACMP layer must be aware of the overall status of the cluster including the state of the topology (which nodes, networks and network interfaces are in working order) and the resources (which resources are being made available where). The HACMP layer relies upon the RSCT layer to provide a number of key services including topology status information and a reliable messaging service.

The RSCT layer

The RSCT layer includes daemons responsible for monitoring the state of the cluster's topology, recognizing when the state of the cluster changes (for example, a node crashes), coordinating the response to these events, and keeping RSCT-aware clients informed as to the state of the cluster (HACMP is itself an RSCT-aware client). RSCT itself is distributed with AIX.

The AIX and below layers

The AIX layer represents all of the operating system services provided by AIX to programs running on the operating system. These programs include, of course, the programs at the application layer, the programs at the HACMP layer and the programs at the RSCT layer.

The AIX layer takes advantage of all sorts of facilities provided by the AIX kernel. The two that are highlighted in the diagram are the Logical Volume Manager or "storage management facility" and the TCP/IP or "IP networking facility". As should be clear from the rather heavy emphasis given to storage and networking in this course so far, these are "cornerstone" facilities from the perspective of HACMP.

Finally, please keep in mind that the "layers" in the software stack illustrated above are, in many respects, more apparent than real. All of the layers above the AIX layer tend to interact heavily and directly with the AIX layer, regardless of whether there are layers between them and the AIX layer. The same can be said in many respects about the LVM and the TCP/IP components: all of the layers above them tend to interact heavily, although usually not quite as directly, with the LVM and TCP/IP components.

Instructor notes:
Purpose: Illustrate the major "layers" of software on an HACMP cluster node.
Details: Do not get too bogged down in details. The key point here is that RSCT provides services, such as cluster topology monitoring, which are used by HACMP, and that HACMP provides "high availability services" to applications. The fact that RSCT, HACMP and the applications are themselves "above" AIX should be fairly obvious, as should be the fact that the LVM and TCP/IP components are below the RSCT, HACMP and applications layers. Do not get bogged down on the question of whether the LVM and TCP/IP components are themselves a layer or are part of the AIX layer.
Additional information:
Transition statement: Let's now take a more detailed look at the components and features of HACMP.

HACMP components and features

The HACMP software has the following components:
– Cluster Manager
– Cluster Secure Communication Subsystem
– Reliable Scalable Cluster Technology, and Resource Monitoring and Control (RSCT and RMC)
– snmpd monitoring programs
– Cluster Information Program
– Highly Available NFS Server
– Shared External Disk Access

Figure 5-14. HACMP components and features
Notes:

HACMP components

HACMP consists of the following components:
• A cluster manager (recovery driver and resource manager)
• RSCT
• SNMP related facilities
• The Cluster Information Program
• A highly available NFS server
• Shared external disk access
• Cluster Secure Communication Subsystem

Instructor notes:
Purpose: Show the major components of HACMP.
Details:
Additional information:
Transition statement: Let's take a look at the Cluster Manager.

Cluster Manager

Is a subsystem/daemon that runs on each cluster node
Is primarily responsible for responding to unplanned events:
– Recover from software and hardware failures
– Respond to user-initiated events:
  • Request to online/offline a node
  • Request to move/online/offline a resource group
  • And so forth
Is a client to RSCT
Provides snmp retrievable status information
Is implemented by the subsystem clstrmgrES
Started in /etc/inittab and "always" running

Figure 5-15. Cluster manager

Notes:

The cluster manager's role

The cluster manager is, in essence, the heart of the HACMP product. Its primary responsibility is to respond to unplanned events. From this responsibility flow most of the features and facilities of HACMP. For example, to respond to unexpected events, it is necessary to know when they occur. It is the job of the RSCT component to monitor for certain failures.

In HACMP 5.3 and later, the clstrmgrES subsystem is always running.
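Because the cluster manager runs as the SRC subsystem clstrmgrES and is expected to be running at all times, a simple health check is to look at its SRC status. The following is a minimal Python sketch that parses text in the usual four-column lssrc layout (Subsystem, Group, PID, Status). The sample text, including the group name and PID, is illustrative rather than captured from a real system, the function name subsystem_status is hypothetical, and the parser assumes a one-word status such as active or inoperative.

```python
def subsystem_status(lssrc_output, subsystem):
    """Return the Status column for `subsystem`, or None if it is not listed.

    Assumes the status is a single word and is the last field on the line;
    an inoperative subsystem has no PID, so the PID column cannot be relied on.
    """
    for line in lssrc_output.splitlines():
        fields = line.split()
        if fields and fields[0] == subsystem:
            return fields[-1]
    return None


# Illustrative sample in the lssrc column layout (values are made up).
SAMPLE = """\
Subsystem         Group            PID          Status
 clstrmgrES       cluster          204986       active
"""

if __name__ == "__main__":
    print(subsystem_status(SAMPLE, "clstrmgrES"))
```

On a cluster node the text would come from running lssrc for the subsystem; here it is passed in as a string so that the sketch runs anywhere.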
Transition statement — Let’s take a look at the communications component that is new in HACMP 5 systems. 5-42 HACMP Implementation © Copyright IBM Corp. Details — Additional information — Might want to draw/show overview picture that was in unit 1. 1998. .Instructor Guide Instructor notes: Purpose — Introduce the cluster manager. V4.rhosts files by providing the ability to send all cluster communication through a Virtual Private Network (VPN) using persistent labels. In addition. the need for these /.rhosts files was a source of concern for many customers. . It provides connection level security for all HACMP-related communication. Although unlikely to be necessary in most clusters. this capability will allow HACMP to operate securely in “hostile” environments. – VPNs are configured within AIX.2 and later.0 Instructor Guide Uempty Cluster secure communication subsystem It provides communication infrastructure for HACMP HACMP provides two authentication security options: – Connection Authentication • Standard – Uses /usr/es/sbin/cluster/rhosts file and HACMP ODM files • Kerberos (SP only) – Kerberos used with PSSP. • Virtual Private Networks (VPN) using persistent labels.rhosts files or a Kerberos configuration on each cluster node. you can use Message-level authentication and Message Encryption or both in HACMP 5.1 and later systems. You can have HACMP generate and distribute keys. – HACMP is then configured to use VPNs – Message Authentication and/or Message Encryption • HACMP provides methods for key distribution It is implemented using the clcomdES subsystem © Copyright IBM Corporation 2008 Figure 5-16. Although only necessary when the configuration of the cluster was being changed. This facility goes beyond eliminating the need for /. © Copyright IBM Corp. 1998.0 Notes: Introduction to the cluster communication subsystem The cluster secure communication subsystem is part of HACMP 5. eliminating the need for either /. 2008 Unit 5. 
Leaving clcomd running

As you can see, you have options to make clcomd quite secure. You might be tempted to stop the clcomd subsystem (especially at the encouragement of the audit/security group) but remember that the facility makes more than just verification and synchronization possible. With the advent of the File Collections and Automatic Verification functions in HACMP, which rely on clcomd services, it is recommended that you leave it running. It is quite likely that the benefit of these services outweighs the potential security risk.

Only supported for HACMP generated requests

Finally, this subsystem is not supported for use by user commands outside of the cluster manager and CSPOC. For these commands the administrator must configure their own remote command method.

Instructor notes:
Purpose — Introduce the cluster secure communication subsystem.
Details —
Additional information — VPN and Message Authentication are not covered in this course.
Transition statement — Let's take a closer look at the cluster communication daemon.

Cluster communication daemon (clcomd)

Provides secure node-to-node communications without use of /.rhosts
Caches coherent copies of other nodes' ODMs
Establishes long-term socket connections on TCP port 6191
Implements the principle of least privilege:
  – Nodes no longer require root access to each other
Starts out of the /etc/inittab
Is managed by the SRC
  – startsrc, stopsrc, refresh

Figure 5-17. Cluster communication daemon (clcomd)

Notes:

clcomd basics

The most obvious part of the cluster secure communication facility is the cluster communication daemon (clcomd). This daemon replaces a number of ad hoc communication mechanisms with a single facility, thus funneling all cluster communication through one point. This funneling, in turn, makes it feasible to then use a VPN to actually send the traffic between nodes and to be sure that all the traffic is going through the VPN.

Efficient node-to-node communications and data gathering

The clcomd's approach to supporting the verification and synchronization of cluster configuration changes has an important additional benefit. By eliminating numerous rsh calls across the cluster during the verification and synchronization operation and replacing them with a purpose-built facility, the verification and synchronization processes are very efficient. These processes might still take a matter of minutes to complete as comparison processing and resource manipulation may be occurring.

Other aspects of clcomd's implementation which further improve performance include:
- Caching coherent copies of each node's ODMs, which reduces the amount of information which must be transmitted across the cluster during a verification operation
- Maintaining long-term socket connections between nodes, which avoids the necessity to constantly create and destroy the short-term sessions that are a natural result of using rsh and other similar mechanisms
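The standard connection authentication that the next visual walks through can be pictured as a small decision procedure. The sketch below is a simplified model of the rules as this guide states them, not clcomd's actual implementation. The names Outcome, authenticate and hostname_matches are hypothetical, the connect-back check is replaced by a caller-supplied function, and treating the first-time pass as a separate outcome is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    BLOCKED = "rhosts file is missing"
    FIRST_TIME_PASS = "rhosts file is empty and no cluster is defined yet"
    ACCEPTED = "source is known and the connect-back hostname matched"
    REJECTED = "source is unknown or the connect-back check failed"


def authenticate(source_ip, rhosts, odm_addresses, hostname_matches):
    """Classify an inbound session (simplified model, see assumptions above).

    rhosts           -- None if the rhosts file is missing, else a set of addresses
    odm_addresses    -- addresses held in the HACMP ODM (empty before configuration)
    hostname_matches -- callable(ip) -> bool standing in for the connect-back check
    """
    if rhosts is None:
        # The file must exist, or no inbound communication is allowed.
        return Outcome.BLOCKED
    if source_ip in odm_addresses:
        # Known cluster member: the rhosts contents are not consulted.
        known = True
    elif not odm_addresses and not rhosts:
        # Empty file on an unconfigured node: assume a new cluster.
        return Outcome.FIRST_TIME_PASS
    else:
        known = source_ip in rhosts
    if known and hostname_matches(source_ip):
        return Outcome.ACCEPTED
    return Outcome.REJECTED


# Illustrative call with made-up addresses.
print(authenticate("10.1.1.5", set(), {"10.1.1.5", "10.1.1.6"}, lambda ip: True))
```

The model makes the window between installation and configuration visible: with an empty rhosts file and an empty ODM, the first caller is taken on trust, which is why more secure installations populate the file early.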
Instructor notes:
Purpose — Introduce the cluster communication daemon (clcomd).
Details —
Additional information —
Transition statement — To eliminate the need for /.rhosts files, clcomd must provide an alternative authentication mechanism. Let's have a look at it.

clcomd standard connection authentication

Look for source IP address in:
  – Special rhosts file: /usr/es/sbin/cluster/etc/rhosts
  – HACMP adapter ODM
Take the following actions:
  – Block communication if the special rhosts file is missing
  – Assume new cluster if the special rhosts file is empty
  – Else, check HACMP Adapter ODM file for authentication
Authentication is done as follows:
  • Connect back and ask for the hostname
  • Connection is considered authentic if the hostname matches, otherwise connection is rejected
First-time pass at initial configuration time:
  – Ensure that the rhosts file exists
  – rhosts file must exist and be empty
  – More secure installations should populate the rhosts file with only current cluster node IP addresses

Figure 5-18. clcomd standard connection authentication

Notes:

How clcomd authentication works

The action taken to a request depends on the state of the /usr/es/sbin/cluster/etc/rhosts file as shown in the visual.

If the source node of the communication is not in the HACMPadapter and HACMPnode ODM files on the target node, the target clcomd daemon authenticates the in-bound session by checking the session's source IP address against a list of addresses in /usr/es/sbin/cluster/etc/rhosts and the addresses configured into the cluster itself (in other words, in the previously mentioned ODM files).

If the source node is in the HACMPadapter and HACMPnode ODM files on the target node, the target clcomd daemon only uses the information about the source node from these ODM files to conduct the authentication.

To defeat any attempt at IP-spoofing (a very timing-dependent technique which involves faking a session's source IP address), each non-callback session is checked by connecting back to the source IP address and verifying who the sender is.

If a cluster node is being moved to a new cluster or if the entire cluster configuration is being redone from scratch, it might be necessary to empty /usr/es/sbin/cluster/etc/rhosts or manually populate it with the IP addresses of the source node.

First-time pass at initial configuration time

The empty /usr/es/sbin/cluster/etc/rhosts file provides a window of opportunity between installation and when HACMP is configured. If the /usr/es/sbin/cluster/etc/rhosts file is populated while the HACMP ODM files are not, the file is the deciding factor on whether communications with another clcomd daemon will be accepted (using the addresses in the file). Therefore, if you want to close the "hole", you can edit this file just after the installation if it is felt that this window will be a problem.

To further reduce this window, put the addresses of any node that would initiate a clcomd session with any non-configured system that has HACMP server code installed in that non-configured node's /usr/es/sbin/cluster/etc/rhosts file.

Subsequently, the file can be emptied, because all clcomd communications will be authenticated based on the HACMP ODM files. In fact, testing has shown that once the nodes are established in the HACMP ODM, you can put anything in the /usr/es/sbin/cluster/etc/rhosts file and it will be ignored. Again, the key thing is that the file exists and that the HACMP ODM contains the node/adapter information for the source of the clcomd session. The file must exist, or clcomd will fail to allow any inbound communications.

Instructor notes:
Purpose — Explain the clcomd's standard connection authentication mechanism.
Details —
Additional information —
Transition statement — Now, let's take a look at RSCT.

RSCT

Is included with AIX
Provides:
  – Scalability to large clusters
  – Cluster failure notification
  – Coordination of changes
Includes key components:
  – Topology Services
    • Heartbeat services
  – Group Services
    • Coordinates and monitors state changes of an application in the cluster
  – RMC: Resource Monitoring and Control
    • Provides process monitoring, dynamic node priority variables and user-defined events
Works with HACMP's Cluster Manager which is an RSCT (group services) client

Figure 5-19. RSCT

Notes:

What RSCT provides

RSCT's role in an HACMP cluster is to provide:
- Failure detection and diagnosis for topology components (nodes, networks, and network adapters)
- Notification to the cluster manager of events that it has expressed an interest in, primarily events related to the failure and recovery of topology components
- Coordination of the recovery actions involved in dealing with the failure and recovery of topology components (in other words, fallovers, fallbacks and dealing with individual NIC failures by moving or swapping IP addresses)

Instructor notes:
Purpose — Introduce RSCT.
Details —
Additional information —
Transition statement — They say that a picture is worth a thousand words, so let's take a look at a picture. Well, OK, a diagram.

HACMP from an RSCT perspective

[Diagram. Labels shown: AIX Process Monitor; Database Resource Monitor; Switch Resource Monitor; RSCT RMC (ctrmc); RSCT Topology Services; RSCT Group Services; HACMP Cluster Manager (HA Recovery Driver); Recovery Programs; HACMP Event Scripts (Recovery Commands); heartbeats and messages to/from other nodes; Group Membership; Event Subscription; Voting Protocols between nodes.]

Figure 5-20. HACMP from an RSCT perspective

Notes:

The RSCT environment

This diagram includes all of the major RSCT components plus the HACMP cluster manager and event scripts. It also illustrates how they communicate with each other.

Topology services

Topology Services is responsible for building heartbeat rings for the purpose of detecting, diagnosing, and reporting state changes to the RSCT Group Services component, which in turn reports them to the Cluster Manager. Topology Services is also responsible for the transmission of any RSCT-related messages between cluster nodes.

Group services

Associated with RSCT Topology Services is the RSCT Group Services daemon which is responsible for "coordinating and monitoring changes to the state of an application running on multiple nodes". In the HACMP context, the "application running on multiple nodes" is the HACMP cluster manager. Group Services reports failures to the Cluster Manager as it becomes aware of them from Topology Services. The Cluster Manager then drives cluster-wide coordinated responses to the failure through the use of Group Services voting protocols.

Monitors

The monitors in the upper left of the diagram monitor various aspects of the local node's state, including the status of certain processes (for example, the application if application monitoring has been configured), database resources, and the SP Switch (if one is configured on the node). These monitors report state changes related to monitored entities to the RSCT RMC Manager.

RMC manager

The RSCT RMC Manager receives notification of events from the monitors. It analyzes these events and notifies RSCT clients of those events which they have expressed an interest in. The HACMP cluster manager, an RSCT client, registers itself with both the RSCT RMC Manager and the RSCT Group Services components.

Cluster manager

After an event has been reported to the HACMP Cluster Manager, it responds to the event through the use of HACMP's recovery commands and event scripts. The scripts are coordinated via the RSCT group services component.

Instructor notes:
Purpose — Show how the various RSCT components interact with the HACMP cluster manager.
Details —
Additional information — If asked, Event Management Services (emsvcs) is still started at the start of Cluster Services but only used in conjunction with some Oracle RAC releases. It can be stopped manually after Cluster Services has started without any negative ramifications.
Transition statement — Let's take a quick look at how RSCT's Topology Services component does heartbeating on an IP network.

Heartbeat rings

[Diagram: a heartbeat ring through the interfaces 25.8.60.2, 25.8.60.3, 25.8.60.4, 25.8.60.5 and 25.8.60.6.]

• Heartbeat one way in order of high to low IP address.

Figure 5-21. Heartbeat rings

Notes:

RSCT topology services functions

The RSCT Topology Services component is responsible for the detection and diagnosis of topology component failures. As discussed in the networking unit, the mechanism used to detect failures is to send heartbeat packets between interfaces. Rather than send heartbeat packets between all combinations of interfaces, the RSCT Topology Services component sorts the IP addresses of the interfaces on a given logical IP subnet and then arranges to send heartbeats in a round robin fashion from high to low IP addresses in the sorted list.

For non-IP networks (like rs-232 or Heartbeat on Disk), addresses are assigned to the "adapters" that form the endpoints of the network and are used by Topology Services like IP addresses for routing/monitoring the heartbeat packets.

Example

For example, the IP addresses in the visual can be sorted as 25.8.60.6, 25.8.60.5, 25.8.60.4, 25.8.60.3 and 25.8.60.2. This ordering results in the following heartbeat path:

25.8.60.6 --> 25.8.60.5 --> 25.8.60.4 --> 25.8.60.3 --> 25.8.60.2 --> 25.8.60.6

Instructor notes:
Purpose — Show how RSCT Topology Services determines heartbeat path.
Details —
Additional information —
Transition statement — Let's take a look at HACMP's SNMP support.

HACMP's SNMP support

HACMP uses SNMP to provide:
  – Notification of cluster events
  – Cluster configuration/state information
Support in HACMP 5.3 and later is provided by Cluster Manager
  – A client (smux peer) of AIX's snmpdv3
Support consists of:
  – Maintaining a management information base (MIB)
  – Responding to SNMP queries for HACMP information
  – Generating SNMP traps
ClinfoES and HAView use SNMP
Available to any snmp manager and the snmpinfo command

Figure 5-22. HACMP's SNMP support

Notes:

HACMP support of SNMP

In HACMP 5.3 and later, SNMP manager support is provided by the cluster manager component. This SNMP manager allows the cluster to be monitored via SNMP queries and SNMP traps. The clinfo daemon as well as any SNMP manager and the snmpinfo command can interface to this SNMP manager.

In addition, HACMP includes an extension to the Tivoli NetView product called HAView. This extension can be used to make Tivoli NetView HACMP-aware. This is discussed in more detail in the course HACMP Administration II: HACMP Administration and Problem Determination.

Instructor notes:
Purpose — Introduce HACMP's SNMP integration.
Details —
Additional information —
Transition statement — Let's take a closer look at an optional HACMP daemon whose name we've seen before.
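Before moving on, the heartbeat ring ordering from Figure 5-21 (Heartbeat rings) can be sketched in a few lines. This is an illustration of the rule described in that visual's notes, not RSCT code: the function name heartbeat_ring is hypothetical, and it assumes that "sorted" means a numeric comparison of addresses on one logical IP subnet.

```python
import ipaddress


def heartbeat_ring(addresses):
    """Return (sender, receiver) pairs for the interfaces on one logical subnet.

    Addresses are ordered from high to low, and the lowest address sends
    back to the highest to close the ring.
    """
    ordered = sorted((ipaddress.ip_address(a) for a in addresses), reverse=True)
    if len(ordered) < 2:
        return []  # a single interface has nobody to heartbeat with
    return [
        (str(ordered[i]), str(ordered[(i + 1) % len(ordered)]))
        for i in range(len(ordered))
    ]


# The addresses from the heartbeat rings visual, deliberately unsorted.
for sender, receiver in heartbeat_ring(
    ["25.8.60.2", "25.8.60.6", "25.8.60.4", "25.8.60.3", "25.8.60.5"]
):
    print(f"{sender} --> {receiver}")
```

Each interface sends to exactly one neighbor, so the number of heartbeat flows grows linearly with the number of interfaces rather than with the number of interface pairs.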
Cluster information daemon (clinfo)

Is an SNMP-aware client to Cluster Manager
Provides:
  – A cluster information API to the HACMP SNMP manager
    • Focused on providing HACMP cluster information
    • Easier to work with than the SNMP APIs
  – Support for ARP cache issues
Is used by:
  – The clstat command
  – Customer written utility/monitoring tools
Implemented as the clinfoES subsystem

Figure 5-23. Cluster information daemon (clinfo)

Notes:

What the clinfo daemon provides

The clinfo daemon provides an interface (covered in Unit 3) for dealing with ARP cache related issues as well as an Application Program Interface (API) which can be used to write C and C++ programs that meet customer-specific needs related to monitoring the cluster.

Where clinfo runs

The clinfo daemon can run on HACMP cluster server nodes or on any machine which has the clinfo code installed.

Clinfo is required for some status commands

Clinfo must be running on a node or client machine to use any of the clstat related commands (clstat, xclstat, clstat.cgi).

Starting clinfo

• Starting clinfo on an HACMP server node: The clinfo daemon can be started in a number of ways (see the HACMP Administration Guide) but probably the best way is to start it along with the rest of the HACMP daemons by setting the Startup Cluster Information Daemon? field to true when using the smit Start Cluster Services screen (which will be discussed in the next unit). Note that an option exists in HACMP 5.4.1 and later to start clinfo for consistency groups. This support is for HACMP/XD with Metro Mirror replication.
• Starting clinfo on a client: Use the /usr/es/sbin/cluster/etc/rc.cluster script or the startsrc command to start clinfo on a client. You can also use the standard AIX startsrc command:
  startsrc -s clinfoES

Instructor notes:
Purpose — Introduce clinfo.
Details —
Additional information —
Transition statement — Let's take a quick look at HACMP's highly available NFS support.

Highly available NFS server support

Cluster administrator can:
  – Define NFS exports at directory level to all clients
  – Define NFS mounts and network to HACMP nodes
  – Specify export options for HACMP to set
NFS V2/V3
  – HACMP preserves file locks and dupcache across fallovers
  – Limitations
    • Lock support is limited to two node clusters
    • Resource group is only active on one node at a time
NFS V4
  – It requires Stable Storage location accessible from all nodes in the resource group
  – Resource Group can have more than two nodes
  – NFSv4 application server and monitor are automatically added
  – It requires a new fileset be installed
Combination of V2/V3 + V4 supported

Figure 5-24. Highly available NFS server support

Notes:

HACMP NFS V2/V3 support

The HACMP software provides the following availability enhancements to NFS V2/V3 operations:
- Reliable NFS server capability that allows a backup processor to recover current NFS activity should the primary NFS server fail, preserving the locks on NFS filesystems and the duplicate request cache
- Ability to specify a network for NFS mounting
- Ability to define NFS exports and mounts at the directory level
- Ability to specify export options for NFS-exported directories and filesystems

NFS V2/V3 Limitations

- The locking function is available only for 2-node clusters
- The resource group must behave as non-concurrent, that is, active on one node at a time

HACMP NFS V4 support

NFS V4 is included in AIX 5.3/6.1. NFS V4 with HACMP 5.4.1 and later provides a SMIT path to configure NFS V4 exports and cross-mounts, rather than having to edit /etc/exports, and provides for enhanced verification methods to catch known configuration issues. The configuration of the NFS V4 filesystems into Resource Groups is made simple through the use of a Configuration Assistant. The HACMP support builds an application monitor automatically for monitoring the NFS V4 daemons. Configuring both NFS V2/V3 and NFS V4 filesystems in the same Resource Group is supported.

New fileset

To use NFS V4 filesystems in a Resource Group, the fileset cluster.es.nfs.rte must be installed.

Instructor notes:
Purpose — Introduce HACMP's highly available NFS support.
Details —
Additional information —
Transition statement — As discussed in the shared storage unit, HACMP provides support for external disks.

Shared external disk access

It provides two types of shared disk support:
  – Serially reusable shared disks:
    • Varied on by one node at a time under the control of HACMP
    • LVM or RSCT ensures no access by two nodes at once
    • Two types of volume groups:
      – non-concurrent mode
      – Enhanced Concurrent Mode running in non-concurrent mode
  – Concurrent access shared disks:
    • Used by concurrent applications writing to raw logical volumes
    • One type of volume group
      – Enhanced Concurrent Mode running in concurrent mode
bos.clvm.enh fileset required for Concurrent/Enhanced Concurrent

Figure 5-25. Shared external disk access

Notes:

Shared disk support

As you know by now, HACMP supports shared disks. See the shared storage unit for more information on HACMP's shared external disk support.

Recall that enhanced concurrent mode can be used in a non-concurrent mode to provide heartbeat over disk and fast disk takeover for resource group policies where the resource group is active on only one node at a time. Note that the bos.clvm.enh fileset is required for enhanced concurrent support even if using it in non-concurrent mode.

Instructor notes:
Purpose — Review HACMP's shared external disk support.
Details —
Additional information —
Transition statement — Okay, now let's take a checkpoint.

Checkpoint

1. Which component detects an adapter failure?
   a. Cluster Manager
   b. RSCT
   c. clcomd
   d. clinfo
2. Which component provides SNMP information?
   a. Cluster Manager
   b. RSCT
   c. clsmuxpd
   d. clinfo
3. Which component is required for clstat to work?
   a. Cluster Manager
   b. RSCT
   c. clcomd
   d. clinfo
4. Which component removes requirement for the /.rhosts file?
   a. Cluster Manager
   b. RSCT
   c. clcomd
   d. clinfo

Figure 5-26. Checkpoint

Notes:

Instructor notes:
Purpose —
Details — Checkpoint solutions
1. Which component detects an adapter failure? b. RSCT
2. Which component provides SNMP information? a. Cluster Manager
3. Which component is required for clstat to work? d. clinfo
4. Which component removes requirement for the /.rhosts file? c. clcomd
Additional information —
Transition statement —

Unit summary

Having completed this unit, you should be able to:
  Explain where installation fits in the implementation process
  Describe how to install HACMP 5.4.1
  List the prerequisites for HACMP 5.4.1
  Describe the installation process for HACMP 5.4.1
  List and explain the purpose of the major HACMP 5.4.1 components

Figure 5-27. Unit summary

Notes:

Instructor notes:
Purpose — Summarize the unit.
Details —
Additional information —
Transition statement — Time for lab.

Unit 6. Initial cluster configuration

Estimated time

03:00

What this unit is about

In this unit, you will learn how to configure a cluster using the SMIT HACMP interface. You will learn how to perform simple and more advanced cluster configuration. You will also learn how and when to verify and synchronize your cluster.

What you should be able to do

After completing this unit, you should be able to:
• Configure a Mutual Takeover HACMP 5.4.1 cluster
  - Use the standard path
• Configure a standby HACMP 5.4.1 cluster
  - Use the 2 node Configuration Assistant
• Configure Topology to include:
  - IP Address Takeover via alias
  - Non-IP networks (rs232, diskhb)
  - Persistent address
• Verify, synchronize, and test a cluster
• Start and stop cluster services
• Save a cluster configuration

How you will check your progress

• Checkpoint
• Machine exercises

References

SC23-5209-01  HACMP for AIX, Version 5.4.1 Installation Guide
SC23-4861-10  HACMP for AIX, Version 5.4.1 Planning Guide
SC23-4862-10  HACMP for AIX, Version 5.4.1 Administration Guide
SC23-5177-04  HACMP for AIX, Version 5.4.1 Troubleshooting Guide
http://www-03.ibm.com/systems/p/library/hacmp_docs.html  HACMP manuals

Unit objectives

After completing this unit, you should be able to:
Configure a mutual takeover HACMP 5.4.1 cluster
  – Use the Initialization and Standard Configuration path (Standard Path)
  – Use the Two-Node Cluster Configuration Assistant (Two-node Assistant)
Configure HACMP topology to include:
  – IP address takeover via alias
    • This is the default in the Standard Path
  – Non-IP networks (rs232, diskhb)
  – Persistent address
Verify, synchronize, and test a cluster
Start cluster services
Save the cluster configuration

Figure 6-1. Unit objectives

Notes:

Objectives

This unit will show how to configure a 2-node hot-standby or mutual takeover cluster with a heartbeat over disk non-IP network using the standard configuration menus, that is, using the Initialization and Standard Configuration path. You will also see the simplest, most limited method, the Two-Node Configuration Assistant.

You will be walked through the methods of configuring the cluster. Follow the markers at the bottom of the screens to see the steps to extend the basic hot-standby to a mutual takeover. The unit will then demonstrate how to start up and shut down Cluster Services.

It will then show the steps necessary to modify the configuration of the cluster to add a persistent IP label, add a heartbeat on disk non-IP network and synchronize the changes. You will make the above mentioned extensions using the Extended Configuration path. The final step is making a snapshot backup of the cluster configuration.

Instructor notes:
Purpose — Unit objectives.
Details — State the unit objectives to the students.
Additional information — The unit has many visuals. Many of them are included to provide the students with reasonably self-contained descriptions of how to perform certain configuration changes that can be referred to "back at the office." Consequently, many of them are duplicates of visuals which appear earlier in the unit. Don't dwell on points that have already been covered earlier in the unit.
Transition statement — Let's take a look at how we prepare to configure an HACMP cluster.

What we are going to achieve

Either:
A two node "hot" standby configuration (active / passive)
  – Resource group xwebgroup with usa as its home (primary) node and uk as its backup node
Or:
A two node mutual takeover configuration (active / active)
  – Second resource group with uk as its home (primary) node and usa as its backup node

[Diagram: nodes usa and uk with labels usaadm and ukadm, applications X and Y, and two non-IP networks (heartbeat on disk and rs-232).]

Look for the marker that signifies a mutual takeover task (repeat the step) in the slides that follow

Figure 6-2. What we are going to achieve

Notes:

Configuring either a standby or a mutual takeover configuration

During this course, you and your team will configure a two-node cluster. You will be guided through the process of creating a mutual takeover cluster using the standard path. To adapt this to a hot-standby cluster, omit the steps that involve creating the second resource group and its content.

The standard path is ideal for creating a cluster because it gives you the ability to use the pick lists and it automates some steps. It requires that you have a solid understanding of your environment and the way HACMP works to successfully configure the cluster.

The two-node assistant is mentioned in the lecture. It can be used to create a simple hot-standby cluster, with one resource group only. That one resource group will contain all the non-rootvg volume groups present on the node where the configuration is being done.
6-4 HACMP Implementation © Copyright IBM Corp. It can be used to create a simple hot-standby cluster. You will be guided through the process of creating a mutual takeover cluster using the standard path.0 Notes: Configuring either a standby or a mutual takeover configuration During this course. network. 2008 Unit 6. . Initial cluster configuration 6-5 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. and network adapter failure.0 Instructor Guide Uempty The X in the figure represents the application xwebserver and the arrow represents what happens on a fallover. © Copyright IBM Corp.V4. The persistent addresses and both non-IP networks that will be added in this unit are also shown. including NFS export and cross-mount. we will also configure additional features. The cluster will be tested for reaction to node. and later in the week. 1998. 1998. . Additional information — Transition statement — Let’s take a moment to review what we have done so far in the course. 6-6 HACMP Implementation © Copyright IBM Corp.Instructor Guide Instructor notes: Purpose — Outline the cluster configuration that will be implemented during this course. Details — Talk the students through what we are about to build. Explain that it is practically identical to what they will be building in the lab exercise. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 0 Instructor Guide Uempty Where are we in the implementation? Plan for network. Initial cluster configuration 6-7 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.0 Notes: Ready for configuration Now that the HACMP filesets are installed. Where do we go from here? As mentioned on the previous visual. policies • Attributes: Application Server. 
Where are we in the implementation?

Plan for network, storage, and application
  – Eliminate single points of failure
Define and configure the AIX environment
  – Storage (adapters, LVM volume group, filesystem)
  – Networks (IP interfaces, /etc/hosts, non-IP)
  – Application start and stop scripts
Install the HACMP filesets and reboot
Configure the HACMP environment
  – Topology
      • Cluster, node names, HACMP IP and non-IP networks
  – Resources, resource group, attributes:
      • Resources: Application Server, service label
      • Resource group: Identify name, nodes, policies
      • Attributes: Application Server, service label, VG, filesystem …
  – Synchronize
Start Cluster Services
Test configuration
Save configuration

Figure 6-3. Where are we in the implementation? AU548.0

Notes:

Ready for configuration
Now that the HACMP filesets are installed, we can start to configure HACMP.

Where do we go from here?
As mentioned on the previous visual, in this topic, we will configure a mutual takeover configuration with two applications and two resource groups using the Standard Configuration method. Finally, we will use the extended path to deal with some initial configuration choices that cannot be done with the standard path.

Instructor notes:
Purpose: Review where we are and where we are going.
Details:
Additional information:
Transition statement: So what do we need to assume about the network (IP and non-IP) before configuring topology?
The topology configuration

Here's the key portion of the /etc/hosts file used in this unit:

  192.168.15.29  usaboot1   # usa's first interface IP label
  192.168.16.29  usaboot2   # usa's second interface IP label
  192.168.5.29   usaadm     # persistent node IP label on usa
  192.168.15.31  ukboot1    # uk's first interface IP label
  192.168.16.31  ukboot2    # uk's second interface IP label
  192.168.5.31   ukadm      # persistent node IP label on uk
  192.168.5.92   xweb       # the IP label for the application
                            # normally resident on usa

Hostnames: usa, uk
usa's network configuration (defined via smit chinet):
  en0 - 192.168.15.29
  en1 - 192.168.16.29
uk's network configuration:
  en0 - 192.168.15.31
  en1 - 192.168.16.31
These network interfaces are all connected to the same physical network
The subnet mask is 255.255.255.0 on all networks/NICs
An enhanced concurrent mode volume group "xwebvg" has been created to support the xweb application and will be used for a disk non-IP heartbeat network

Figure 6-4. The topology configuration AU548.0

Notes:

A sample network configuration
Every discussion must occur within a particular context. The above network configuration is the context within which the first phase of this unit will occur. Refer back to this page as required over the coming visuals.

Note that the addresses are set to support IPAT via Aliasing. The service address would have been on the same subnet as one of the boot adapters if IPAT via Replacement was to be used.

Also note that an understanding of the physical layout of the adapters in each system is critical to ensure that the cable attachments are going to the correct enX in AIX. This is true whether you're dealing with a standalone system or an LPAR with adapters in drawers. It is obviously not an issue with virtual adapters.

Instructor notes:
Purpose: Document the network configuration within which the first phase of this unit will "operate."
Details: Because the initial phase of this unit will build a cluster with IPAT via IP aliasing, the network adapters and service IP labels have been configured appropriately for IPAT via IP aliasing. Go over this configuration briefly with the students to ensure that they have a reasonable grasp of the state of the cluster immediately prior to configuration.
Additional information:
Transition statement: We've now come to a fork in the road of sorts.
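The addressing rules behind this sample configuration can be checked mechanically before any HACMP menu is touched. The following is a minimal Python sketch, not an HACMP tool: the function names are hypothetical, the addresses are the ones used in this unit's example, and the rule set is an assumption drawn from this course (each node's boot interfaces on different subnets, and the service address on neither boot subnet, for IPAT via aliasing).

```python
import ipaddress

NETMASK = "255.255.255.0"  # subnet mask stated for all networks/NICs in this unit

# Boot (non-service) addresses per node, taken from this unit's example.
BOOT = {
    "usa": {"usaboot1": "192.168.15.29", "usaboot2": "192.168.16.29"},
    "uk": {"ukboot1": "192.168.15.31", "ukboot2": "192.168.16.31"},
}
SERVICE = {"xweb": "192.168.5.92"}


def subnet(addr, mask=NETMASK):
    """Return the IPv4 network that addr belongs to under mask."""
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)


def check_aliasing(boot, service):
    """Return a list of problems; an empty list means the assumed rules hold."""
    problems = []
    boot_subnets = set()
    for node, labels in boot.items():
        nets = [subnet(addr) for addr in labels.values()]
        # Assumed rule: boot interfaces on one node sit on different subnets.
        if len(set(nets)) != len(nets):
            problems.append(f"{node}: boot interfaces share a subnet")
        boot_subnets.update(nets)
    for label, addr in service.items():
        # Assumed rule: a service address is not on any boot subnet.
        if subnet(addr) in boot_subnets:
            problems.append(f"{label}: service address is on a boot subnet")
    return problems


if __name__ == "__main__":
    for problem in check_aliasing(BOOT, SERVICE) or ["no problems found"]:
        print(problem)
```

With IPAT via Replacement the second rule would be reversed, so the check is only meaningful for the aliasing layout discussed here.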
Configuration methods

HACMP provides two menu paths with three methods to configure topology and resources:
  – Initialization and Standard Configuration
      • Two-node cluster configuration assistant
          – Limited configuration
              > Only supports two-node hot standby cluster
          – Builds cluster configuration based on AIX configuration
              > All adapter addresses treated as boot addresses
              > All volume groups assigned to one resource group
          – Creates everything needed for simple cluster (Topology, Resources, Resource Group)
              > No persistent addresses
              > No non-IP network other than Heartbeat on Disk (only if enhanced concurrent mode volume group present)
      • Standard configuration
          – Topology done in one step
              > Based on IP addresses configured
          – You then must configure resource groups and synchronize
          – Desirable method to create more than two-node hot standby cluster
  – Extended Configuration
      • More steps, but provides access to all the options

Figure 6-5. Configuration methods AU548.0

Notes:

Configuration methods - Two-Node Cluster Configuration Assistant
With this method, all the steps of Standard Configuration are done at once, including adding a non-IP disk heartbeat network if you created an enhanced concurrent volume group. Note that this is a simple two-node configuration with one resource group containing all configured volume groups. This can be a starting point for creating a more robust cluster but should not be viewed as a shortcut to creating a cluster without a thorough understanding of how HACMP works.

Configuration methods - Standard Configuration
With this method, you must do the following tasks:
  i. Topology (simplified via "Configure an HACMP Cluster and Nodes")
  ii. Configure Resources and Resource Groups
  iii. Verify and Synchronize

Configuration methods - Extended Configuration
With this method you follow similar steps as the Standard Configuration but Topology has more steps and there are many more options. Some options can only be done using this method, such as adding a non-IP network.

Instructor notes:
Purpose: Point out the methods to configure.
Details: Don't spend a lot of time on this visual because the difference between the methods will be demonstrated shortly. Point out that this course deals primarily with the Standard Configuration options. Note that in this version of the course there will be more emphasis on comparing the two Standard Configuration methods.
Additional information:
Transition statement: Let's begin with a general discussion of planning the cluster configuration.
Planning and base configuration

Configure your boot addresses to the interfaces on all systems
Put all boot, service, and persistent addresses in /etc/hosts on all systems
Create the volume groups to be used by the applications
  – Enhanced Concurrent Mode Volume groups recommended
Plan configuration path for both nodes
  – usaboot1 and ukboot1 in our example
Plan Application Server name = xwebserver
Plan Resource Group name = xwebgroup
Ensure that Application Server start and stop scripts exist and are put on usa
Plan service IP Label = xweb

Figure 6-6. Planning and base configuration AU548.0

Notes:

Base configuration
Prior to using any of the methods to configure the cluster, there are basic AIX configuration steps that must be performed.

As described in the unit on networking considerations, you chose IP addresses and subnets to match your IP Address Takeover method. Now you must ensure that those boot addresses are configured on each of the cluster node's network adapters. Take care to ensure that you have configured these addresses correctly, including the subnet mask. When using either the Two-node assistant or the Standard path, the addresses on the adapters for all the systems being configured into the cluster are used to create the adapter and network objects in the HACMP ODM. A simple mistake here will result in incorrect network configurations in HACMP.

Next, you must ensure that all the addresses (boot, service, and persistent) are in the /etc/hosts files for all the systems in the cluster. Check for resolution, forward and reverse, by address and by name on all the systems in the cluster. Then verify that you can reach all the boot addresses from each system via ping (including the local addresses).

Now switch to your storage configuration. Configure at least one Enhanced Concurrent Mode Volume Group for use in a heartbeat on disk non-IP network. To instruct HACMP to manage your application's volume groups, you must add those volume groups to resource groups. To do that, the volume groups must be configured prior to the resource group configuration (and an HACMP discovery must be done). If you use the Standard path, you choose the volume groups to place in the resource groups. To minimize risk of error in data entry, add the volume groups to the resource groups using a pick list. Choosing them from a pick list is the right approach. If you use the two-node assistant, all volume groups (other than rootvg) will be picked up and used in the resource group that is configured. Take caution here.

You will choose application server and resource group names when you configure them using Initialization and Standard Configuration. As you will see a little later, if you use the Two-node Configuration Assistant, the application server name will be used to generate the HACMP names for the cluster and resource group. You would ensure that the start and stop scripts were placed on all the systems in the cluster and that you specify interface name/address for all the other systems when configuring the cluster. In our example you'd ensure that the scripts were on usa and that you chose an interface name/address for the other node (uk).

Instructor notes:
Purpose: Introduce the configuration paths.
Details:
Additional information:
Transition statement: So, now we are ready to start the configuration process.
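The /etc/hosts requirement above (every boot, service, and persistent label present on every system) lends itself to a small consistency check. The sketch below is illustrative only: it inspects file content embedded as a string so that it runs anywhere, the helper names are hypothetical, and the entries are this unit's example values. It does not exercise the resolver itself, so forward and reverse resolution and the ping test still have to be done on each node.

```python
# Example /etc/hosts content from this unit; on a node you would read the
# real file instead (assumption: plain "address label [alias...]" lines).
HOSTS_TEXT = """\
192.168.15.29  usaboot1   # usa's first interface IP label
192.168.16.29  usaboot2   # usa's second interface IP label
192.168.5.29   usaadm     # persistent node IP label on usa
192.168.15.31  ukboot1    # uk's first interface IP label
192.168.16.31  ukboot2    # uk's second interface IP label
192.168.5.31   ukadm      # persistent node IP label on uk
192.168.5.92   xweb       # service IP label for the application
"""

REQUIRED = ["usaboot1", "usaboot2", "usaadm", "ukboot1", "ukboot2", "ukadm", "xweb"]


def parse_hosts(text):
    """Return (address, label) pairs, ignoring comments and blank lines."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            pairs.append((fields[0], name))
    return pairs


def check_hosts(text, required):
    """Report required labels that are missing or defined with two addresses."""
    seen = {}
    for addr, name in parse_hosts(text):
        seen.setdefault(name, set()).add(addr)
    problems = [f"{name}: missing" for name in required if name not in seen]
    problems += [
        f"{name}: conflicting addresses"
        for name, addrs in sorted(seen.items())
        if len(addrs) > 1
    ]
    return problems


if __name__ == "__main__":
    print(check_hosts(HOSTS_TEXT, REQUIRED) or "all required labels present")
```

Running the same check against the file from each node is one way to catch a label that was added on one system and forgotten on the other.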
The top-level HACMP smit menu

# smit hacmp

                              HACMP for AIX
Move cursor to desired item and press Enter.

  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools
  Cluster Simulator

F1=Help      F2=Refresh    F3=Cancel    F8=Image
F9=Shell     F10=Exit      Enter=Do

Figure 6-7. The top-level HACMP smit menu AU548.0

Notes:

The main HACMP smit menu
This is the top level HACMP smit menu. You'll find it often simplest to get here using the smit fastpath shown above. As implied by the # prompt, there is little point in being here if you don't have root privileges!

Starting at the main smit menu
If you're interested in starting at the top level smit screen, HACMP is under Communications Applications and Services. Look for HACMP for AIX in that menu. More often, this menu would be skipped by entering the command smit hacmp or smitty hacmp but, for the sake of completeness, we will start at the beginning of smit, which everyone familiar with AIX would know.

Instructor notes:
Purpose: Show the top level HACMP smit menu.
Details: Emphasize the smit fastpath.
Additional information: You might decide to set aside these visuals at about this point and telnet into one of the student's cluster nodes (with their permission of course) so that you can give a "live demo" of how to configure a cluster. If you take this route, try to ensure that you cover at least the same ground that the remainder of this unit covers.
Transition statement: Let's build a cluster starting with the standard configuration menu.
The standard configuration method

                 Initialization and Standard Configuration
Move cursor to desired item and press Enter.

  Configuration Assistants
  Configure an HACMP Cluster and Nodes
  Configure Resources to Make Highly Available
  Configure HACMP Resource Groups
  Verify and Synchronize HACMP Configuration
  Display HACMP Configuration
  HACMP Cluster Test Tool

F1=Help      F2=Refresh    F3=Cancel    F8=Image
F9=Shell     F10=Exit      Enter=Do

What you will see, step-by-step
Start with Configure an HACMP Cluster and Nodes

Figure 6-8. The standard configuration method AU548.0

Notes:

The initialization and standard configuration menu
This method is preferred for all initial cluster configurations except for the most simple two-node, hot-standby, one resource group, one volume group configuration. For those simpler configurations, you can use the Two-node Configuration Assistant. More on the Two-node Configuration Assistant later.

Regardless of your cluster complexity, it is good practice to use the Initialization and Standard Configuration path (referred to as the Standard path) for all your cluster configurations because it requires you to be aware of the details of your configuration. The importance of this cannot be overstated.

Configuration changes made using the HACMP Standard path smit screens do not take effect until they are verified and synchronized (see the third from the bottom selection in this menu). Until then, they are held only on the node from which the configuration work is performed. During synchronization, the files are propagated to the other nodes and will cause HACMP to be dynamically reconfigured if there are active cluster nodes. More about dynamic reconfiguration in a later unit. Note however that the Two-node Configuration Assistant does do the synchronization step.

Caution
If changes are made on one node but not synchronized and then more changes are made on a second node and then synchronized, the changes made on the first node are lost.

Recommendation
Pick one of your cluster nodes to be the one node that you use to make changes. If you want to avoid "losing" work, make sure that you don't flip back and forth between nodes while doing configuration work (that is, work on only one node at least until you've synchronized your changes).

Configuration assistants
Besides the Two-node Configuration Assistant, HACMP provides, via an additional feature called Smart Assistants, configuration assistants for WebSphere, Oracle, and DB2.

The method
The menu shows the tasks as they are to be performed. Each follows the other with each having submenus to be traversed.

You will start by configuring the cluster itself. This is done via the Configure an HACMP Cluster and Nodes option. This will build the cluster, nodes, adapters and network objects for IP based networks. Non-IP networks will be added later.

That is followed by the configuration of the resources that will be made highly available. This is done via the Configure Resources to Make Highly Available option. This includes the service addresses, application servers (specifying your application start and stop script names) and the option to use C-SPOC to create your shared LVM structures.

To make the resources available to HACMP for management, you must put them in resource groups. This is done via the Configure HACMP Resource Groups option. You will create resource group(s) objects and then fill them with the resources that were defined above. There will be nodes listed in a specific order for acquiring the resources and the service addresses and volume groups that support the application.
Instructor notes:
Purpose: Illustrate the top of the standard configuration path menu hierarchy.
Details: Spend time describing the process that is used to create a cluster and the associated resources and relationships. This is the launch point for the following slides on the standard path. The issue of synchronization is discussed here because it is important that the students understand the requirement to work on only one node at least until they've synchronized their changes. It doesn't hurt if they also clearly understand that essentially all HACMP-related configuration work can be performed from any cluster node.
Additional information:
Transition statement: Let's make the cluster, nodes, interfaces and networks.

Add nodes to an HACMP cluster

              Configure Nodes to an HACMP Cluster (standard)
Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Cluster Name                                   [ibmcluster]
  New Nodes (via selected communication paths)   [usaboot1 ukboot1]   +
  Currently Configured Node(s)

F1=Help      F2=Refresh    F3=Cancel    F8=Image
F9=Shell     F10=Exit      Enter=Do

Add cluster name and resolvable names to be used to communicate to the nodes.

Figure 6-9. Add nodes to an HACMP cluster AU548.0

Notes:

Input for the standard configuration method
Assuming your network planning and setup was done correctly, you need only decide on a name for the cluster and choose one IP address/hostname for each node that will be in the cluster, including the node where you see this screen. This is not necessarily the HACMP node name that will be assigned to the node; it is only a resolvable/reachable address that can be used to gather information for the creation of the HACMP topology configuration. The hostname of each node that is found is used as the HACMP node name.

Notice that you can select the interfaces from a pick list (from the local /etc/hosts file) and, at this point in time, the Currently Configured Node(s) field is empty.

Instructor notes:
Purpose: Show the input for standard configuration.
Details:
Additional information:
Transition statement: Okay, what did we get?

What did we get?

# /usr/es/sbin/cluster/utilities/cltopinfo
Cluster Name: ibmcluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There are 2 node(s) and 1 network(s) defined
NODE usa:
    Network net_ether_01
        usaboot1  192.168.15.29
        usaboot2  192.168.16.29
NODE uk:
    Network net_ether_01
        ukboot1   192.168.15.31
        ukboot2   192.168.16.31
No resource groups defined

Figure 6-10. What did we get? AU548.0

Notes:

Output from standard configuration
This step has created the cluster, an IP network and non-service IP labels (boot addresses). The network objects are created based on the addresses/subnet masks that are configured on the adapters in the nodes specified in the previous screen. Notice that there is no non-IP network and there are no resources and no resource groups yet when using the standard configuration method.

This exists only on the node where the command was run. Later we will see the synchronization process.
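When comparing what HACMP discovered with what was planned, it can help to turn the cltopinfo listing into a structure that a script can inspect. The parser below is a rough sketch under a stated assumption: it expects the exact layout shown in the visual above (a "NODE name:" line, then "Network name", then "label address" lines). Real output may differ in spacing or contain additional lines, and parse_cltopinfo is a hypothetical helper, not part of HACMP.

```python
# Sample text copied from the visual above; in practice this would be the
# captured output of the cltopinfo command (assumption: same layout).
SAMPLE = """\
Cluster Name: ibmcluster
There are 2 node(s) and 1 network(s) defined
NODE usa:
    Network net_ether_01
        usaboot1 192.168.15.29
        usaboot2 192.168.16.29
NODE uk:
    Network net_ether_01
        ukboot1 192.168.15.31
        ukboot2 192.168.16.31
No resource groups defined
"""


def parse_cltopinfo(text):
    """Return {node: {network: {label: address}}} for the layout shown above."""
    topo = {}
    node = net = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("NODE ") and line.endswith(":"):
            node = line[5:-1]
            topo[node] = {}
            net = None
        elif line.startswith("Network ") and node is not None:
            net = line.split()[1]
            topo[node][net] = {}
        elif node is not None and net is not None:
            parts = line.split()
            # Keep only "label a.b.c.d" lines; skip everything else.
            if len(parts) == 2 and parts[1].count(".") == 3:
                topo[node][net][parts[0]] = parts[1]
    return topo


if __name__ == "__main__":
    for node, nets in parse_cltopinfo(SAMPLE).items():
        for net, labels in nets.items():
            print(node, net, sorted(labels))
```

A quick use is to confirm that every node has the expected number of boot labels on net_ether_01 before moving on to resources.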
Instructor notes:
Purpose: Show the output of the standard configuration method.
Details:
Additional information:
Transition statement: What is left to do?

Now define highly available resources

                 Initialization and Standard Configuration
Move cursor to desired item and press Enter.

  Configuration Assistants
  Configure an HACMP Cluster and Nodes
  Configure Resources to Make Highly Available
  Configure HACMP Resource Groups
  Verify and Synchronize HACMP Configuration
  Display HACMP Configuration
  HACMP Cluster Test Tool

F1=Help      F2=Refresh    F3=Cancel    F8=Image
F9=Shell     F10=Exit      Enter=Do

Figure 6-11. Now define highly available resources AU548.0

Notes:

Not done yet
Because Configure an HACMP Cluster and Nodes does the topology only, there is more to do using the Standard path:
  - Application Server and Service Address resources must be created using the Configure Resources to Make Highly Available option.
  - Resource group with policies and attributes must be created using the Configure HACMP Resource Groups option.
  - The cluster definitions must be propagated to the other nodes using Verify and Synchronize.
  - Extended Configuration method must be used to add non-IP heartbeat networks.

Game plan
These steps will follow, starting with the Configure Resources to Make Highly Available option.
Instructor notes:
Purpose: Show the remaining steps and introduce the panels that will be followed.
Details:
Additional information:
Transition statement: Now that we have the topology configured, let's move on to the service addresses.

Start with service addresses

smit hacmp -> Initialization and Standard Configuration

               Configure Resources to Make Highly Available
Move cursor to desired item and press Enter.

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

F1=Help      F2=Refresh    F3=Cancel    F8=Image
F9=Shell     F10=Exit      Enter=Do

Figure 6-12. Start with service addresses AU548.0

Notes:

The first step in definition of highly available resources
Again, the process will be to follow the menus. The first step is to define the Service IP labels.

Instructor notes:
Purpose: Show that we will start with Service IP Label definitions.
Details:
Additional information:
Transition statement: Now, choose to add a service label.

Adding service IP labels

                  Configure Service IP Labels/Addresses
Move cursor to desired item and press Enter.

  Add a Service IP Label/Address
  Change/Show a Service IP Label/Address
  Remove Service IP Label(s)/Address(es)

Figure 6-13. Adding service IP labels AU548.0

Notes:

The Configure Service IP Labels/Addresses menu
This is the menu for managing service IP labels and addresses within the standard configuration path. To define a new service label, choose Add a Service IP Label/Address.
Instructor notes:
Purpose: Show the standard path menu selection for configuring service IP labels and addresses.
Details:
Additional information:
Transition statement: Next, we actually choose the service label to use.

Add xweb service label (1 of 2)

                Add a Service IP Label/Address (standard)
Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                [Entry Fields]
* IP Label/Address              []                 +
* Network Name                  []                 +

  +----------------------------------------------+
  |              IP Label/Address                |
  |                                              |
  | Move cursor to desired item and press Enter. |
  |                                              |
  |   (none)  ((none))                           |
  |   usaadm  (192.168.5.29)                     |
  |   ukadm   (192.168.5.31)                     |
  |   yweb    (192.168.5.70)                     |
  |   xweb    (192.168.5.92)                     |
  |                                              |
  | F1=Help     F2=Refresh    F3=Cancel          |
  | F8=Image    F10=Exit      Enter=Do           |
  | /=Find      n=Find Next                      |
  +----------------------------------------------+

Figure 6-14. Add xweb service label (1 of 2) AU548.0

Notes:

Selecting the service label
This is the HACMP smit screen for adding a service IP label in the standard configuration path. The popup for the IP Label/Address field gives us a list of the IP labels that were found in /etc/hosts but not associated with NICs. This could be quite a long list depending on how many entries there are in the /etc/hosts file. In practice, though, the list is fairly short as /etc/hosts on cluster nodes tends to only include IP labels which are important to the cluster.

The service IP label that we intend to associate with the xweb resource group's application is xweb.
© Copyright IBM Corp.70) ¦ ¦ xweb (192. 2008 Unit 6.5.0 Notes: Selecting the service label This is the HACMP smit screen for adding a service IP label in the standard configuration path. * IP Label/Address * Network Name [Entry Fields] [] [] + + +--------------------------------------------------------------------------+ ¦ IP Label/Address ¦ ¦ ¦ ¦ Move cursor to desired item and press Enter.5. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. we must choose the name of the network on which the IP Label/Address will be bound. 1998. . 6-34 HACMP Implementation © Copyright IBM Corp.Instructor Guide Instructor notes: Purpose — Show the menu for selecting the service label. Details — Additional information — Transition statement — Next. * IP Label/Address * Network Name [Entry Fields] [xweb] [net_ether_01] + + F1=Help F5=Reset F9=Shell F2=Refresh F6=Command F10=Exit F3=Cancel F7=Edit Enter=Do F4=List F8=Image Repeat the process for every service address to be configured if mutual takeover cluster. . press Enter to define the service IP © Copyright IBM Corp. Notice that the popup list entry names the network and indicates the IP subnets associated with each network. which is not in either of these subnets to satisfy the rules for IPAT via IP aliasing. 1998. Press Enter AFTER making all desired changes. Menu filled in This screen shows the parameters for the xweb resource group’s service IP label.0 Instructor Guide Uempty Add xweb service label (2 of 2) Add a Service IP Label/Address (standard) Type or select values in entry fields. The automatically generated network names are a bother to type so we’ve used the popup list which contains the only IP network defined on this cluster. This is potentially useful information at this point as we must specify a service IP label. 
another menu will display prompting you to choose the network to which this Service IP label belongs.0 Notes: Choosing the network name Although not shown. 2008 Unit 6.V4. Initial cluster configuration 6-35 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Add xweb service label (2 of 2) AU548. When we’re sure that this is what we intend to do. defining yweb © Copyright IBM Corporation 2008 Figure 6-15. Mutual takeover configuration At this point you would repeat the step to define all the service IP labels for all the applications. In our case that means defining the yweb interface for the second resource group. you would specify the service IP label for the application that will run on the other node. The label is then available from a pick list when you add resources to a resource group later. 6-36 HACMP Implementation © Copyright IBM Corp. 1998. . 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.Instructor Guide label. If creating a mutual takeover configuration. 1998. © Copyright IBM Corp. . Initial cluster configuration 6-37 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Details — Additional information — Transition statement — Now.0 Instructor Guide Uempty Instructor notes: Purpose — Show the menu after choices have been made.V4. 2008 Unit 6. we tackle the next resource--the Application Server. Instructor Guide Continue with application servers smit hacmp -> Initialization and Standard Configuration Configure Resources to Make Highly Available Move cursor to desired item and press Enter. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 6-38 HACMP Implementation © Copyright IBM Corp. Continue with application servers AU548.0 Notes: The next step is definition of highly available resources Continuing to follow the menus. 
the next step is to define the application servers. Configure Configure Configure Configure Service IP Labels/Addresses Application Servers Volume Groups. . 1998. Logical Volumes and Filesystems Concurrent Volume Groups and Logical Volumes F1=Help F9=Shell F2=Refresh F10=Exit F3=Cancel Enter=Do F8=Image © Copyright IBM Corporation 2008 Figure 6-16. .0 Instructor Guide Uempty Instructor notes: Purpose — Show that the next step is the application server definitions. © Copyright IBM Corp. 1998.V4. choose to add an application server. Initial cluster configuration 6-39 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 2008 Unit 6. Details — Additional information — Transition statement — Now. 6-40 HACMP Implementation © Copyright IBM Corp. This Configure Application Servers menu displays under the Configure Resources to Make Highly Available menu in the standard configuration path. Add xwebserver application server (1 of 2) AU548. 1998. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. .Instructor Guide Add xwebserver application server (1 of 2) Configure Application Servers Move cursor to desired item and press Enter.0 Notes: Configuring the application server resource We’ve now got to define an Application Server for the xweb resource group. Add an Application Server Change/Show an Application Server Remove an Application Server F1=Help F9=Shell F2=Refresh F10=Exit F3=Cancel Enter=Do F8=Image © Copyright IBM Corporation 2008 Figure 6-17. . Initial cluster configuration 6-41 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. now let’s go to the Add an Application Server menu.0 Instructor Guide Uempty Instructor notes: Purpose — Show the menu for configuring Application Servers. 2008 Unit 6. 1998. Details — Additional information — Transition statement — So.V4. © Copyright IBM Corp. 
Add xwebserver application server (2 of 2)

Add Application Server

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                            [Entry Fields]
* Server Name                               [xwebserver]
* Start Script                              [/usr/local/scripts/startxweb]
* Stop Script                               [/usr/local/scripts/stopxweb]

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Repeat the process for every Application Server to be configured if mutual takeover cluster, creating a ywebserver

Figure 6-18. Add xweb application server (2 of 2) AU548.0

Notes:

Filling out the add application server menu

An application server has a name and consists of a start script and a stop script. Use full path for the script names. The server name is then available from a pick list when adding resources to a resource group later.

Review of start and stop scripts

The start script is invoked by HACMP when it needs to start the application. The start script should first verify that all the required resources are actually available and log a “clear and useful” message if it detects a problem. If the start script doesn’t check for the required resources, then the application might seem to function for quite some time before someone realizes that a critical resource isn’t available.

The start script should then start the application. It is a good idea to then wait until it is sure that the application has completely started. The cluster manager doesn’t verify that the application has started or that the start script exits with a 0 return code. In most cases this is acceptable. Of course if you configure an application monitor, the cluster manager will monitor the startup and/or the continuous running of the application. Application monitors are not covered in this class. They are covered in detail in the HACMP System Administration II class, AU610.

The stop script is invoked when HACMP needs to stop the application (typically during a stop of cluster services or as part of a fallback to a higher priority node). The stop script’s responsibility is to stop the application. It must not exit until the application is totally stopped as HACMP will start to unmount filesystems and release other resources as soon as the stop script terminates. The attempt to release these resources might fail if there are remnants of the application still running.

The start and stop scripts must exist and be executable on all cluster nodes defined in the resource group (that is, they must reside on a local non-shared filesystem) or you will not be able to verify and synchronize the cluster. If you are using the auto-correction facility of verification, the start/stop scripts from the node where they exist will be copied to all other nodes. Be sure this is what you want. HACMP 5.2 and later provides a file collection facility to help keep the start and stop scripts in sync.

Mutual takeover configuration

At this point you would repeat the step to define all the application servers for all the applications. If creating a mutual takeover configuration, you would specify the application server for the application that will run on the other node. In our case we will create the ywebserver.
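The start and stop script requirements reviewed in these notes can be sketched in shell. This is a minimal illustration only, not HACMP-supplied code: the function names (log_msg, check_resources, wait_until_stopped), the default log file location, and the approach of polling a process ID are all assumptions made for this example. On AIX you would normally adapt it to ksh and to the checks your own application needs.

```shell
#!/bin/sh
# Hypothetical helpers for an application server start/stop script.
# Every name and path below is an illustrative assumption.

# Assumed log location; override by setting LOG before calling the helpers.
LOG="${LOG:-/tmp/xweb_scripts.log}"

# Append a timestamped message to the log file.
log_msg() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG"
}

# Start-script side: verify that every required directory (for example,
# the mount point of a shared filesystem) exists before starting the
# application. Returns 1 and logs a message on the first missing one.
check_resources() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            log_msg "ERROR: required directory $dir is not available"
            return 1
        fi
    done
    return 0
}

# Stop-script side: poll until the process with the given PID is gone,
# waiting at most max_wait seconds. Returns 0 once the process has
# exited, 1 if it is still running when the time limit is reached.
wait_until_stopped() {
    pid=$1
    max_wait=$2
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$max_wait" ]; then
            log_msg "ERROR: process $pid still running after ${max_wait}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A start script might call check_resources with the application’s filesystem mount points before launching the application. A stop script might issue the application’s own shutdown command and then call wait_until_stopped, exiting only after it returns 0, so that the script does not terminate while remnants of the application are still running.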
Instructor notes:

Purpose — Finish the application server definition.
Details —
Additional information —
Transition statement — Optionally, you can use the next menu item to create your shared volume groups.

Configure volume groups (optional)

At this point you can proceed to the next item in the menu to make your Volume Groups via C-SPOC
– You may have done this earlier when you configured the basics (Planning and Base Configuration)

If you choose to, follow the menus…

Configure Resources to Make Highly Available

Move cursor to desired item and press Enter.

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Create volume groups for every application, regardless of which system the application will normally run if mutual takeover cluster

Figure 6-19. Configure volume groups (optional) AU548.0

Notes:

Volume group creation

Planning your volume groups is critical as we’ve discussed in a previous unit. Creating the volume groups can be done outside of the cluster configuration process or integrated. It is recommended that you use C-SPOC to create your volume group definitions whether you do it at this point or independent of the cluster configuration process. The process used when following along the Standard path is through C-SPOC. We’ll be discussing C-SPOC a little later. We will learn much more about the process you’d use if you chose to define your volume groups here, later in the course.

Mutual takeover configuration

At this point you would repeat the step to define all the volume groups for all the applications. If creating a mutual takeover configuration, you would create from one node all the volume groups for all the applications, regardless of which node they’ll run on, specifying only the nodes that the application may run on.

Instructor notes:

Purpose — Show the optional menu to define shared volume groups through C-SPOC.
Details —
Additional information — Review start and stop script requirements.
Transition statement — If you created any volume groups, you’ll want to discover them.

Discover the volume groups for pick-lists

Run discovery if you created Volume Groups in the previous step
Our first look at the Extended Configuration Path

Extended Configuration

Move cursor to desired item and press Enter.

  Discover HACMP-related Information from Configured Nodes
  Extended Topology Configuration
  Extended Resource Configuration
  Extended Cluster Service Settings
  Extended Event Configuration
  Extended Performance Tuning Parameters Configuration
  Security and Users Configuration
  Snapshot Configuration
  Export Definition File for Online Planning Worksheets
  Import Cluster Configuration from Online Planning Worksheets File
  Extended Verification and Synchronization
  HACMP Cluster Test Tool

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 6-20. Discover the volume groups for pick-lists AU548.0

Notes:

Now run discovery

If you chose to create volume groups, then you need to re-generate the pick lists. This requires using the Extended Configuration menu shown above. This applies to network objects as well as LVM objects. It’s a good idea to prepare the pick-lists prior to creating the resource groups.

Pick list information is kept in flat files. The volume group information is in /usr/es/sbin/cluster/etc/config/clvg_config. The IP information is kept in /usr/es/sbin/cluster/etc/config/clip_config.
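Because the pick-list information is kept in the two flat files named in these notes, a quick look from the shell can confirm that discovery has populated them. The following is a minimal sketch: report_picklist is a hypothetical helper name invented for this example, and treating "exists and is non-empty" as the sanity check is an assumption, not a documented HACMP procedure.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given pick-list file exists
# and has content. The name report_picklist is an illustrative assumption.
report_picklist() {
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "present: $f"
        else
            echo "missing or empty: $f"
        fi
    done
}

# Example call, using the two paths given in the notes:
# report_picklist /usr/es/sbin/cluster/etc/config/clvg_config \
#                 /usr/es/sbin/cluster/etc/config/clip_config
```

If either file is reported as missing or empty after volume groups or networks have been defined, running the discovery menu item again before building resource groups is the likely next step.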
Instructor notes:

Purpose — Show the “discover” menu in the extended configuration path.
Details — Explain that it is probably a good idea to run this at this point regardless of whether volume groups were created in the previous step.
Additional information —
Transition statement — Now that we have created the resources, we are ready to create the resource group definition.

Adding the xwebgroup resource group

smit hacmp -> Initialization and Standard Configuration -> Configure HACMP Resource Groups

Configure HACMP Resource Groups

Move cursor to desired item and press Enter.

  Add a Resource Group
  Change/Show a Resource Group
  Remove a Resource Group
  Change/Show Resources for a Resource Group (standard)

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 6-21. Adding the xwebgroup resource group AU548.0

Notes:

Menu to add a resource group

Now, we are ready to create the xwebgroup Resource Group definition.

Instructor notes:

Purpose — Show the menu to “Add a Resource Group”.
Details —
Additional information —
Transition statement — Now, let’s select add a resource group.

Setting name, nodes, and policies

Add a Resource Group

* Resource Group Name                            [xwebgroup]
* Participating Nodes (Default Node Priority)    [usa uk]

  Startup Policy                                 Online On Home Node O>   +
  Fallover Policy                                Fallover To NextPrio>    +
  Fallback Policy                                Fallback To Higher Pr>   +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Repeat the process for every Resource Group to be configured if mutual takeover cluster, creating a ywebgroup with uk, usa

Figure 6-22. Setting name, nodes, and policies AU548.0

Notes:

Filling out the Add a Resource Group menu

We’ll call this resource group “xwebgroup.” It will be defined to operate on two nodes: usa and uk, with usa being the home or highest priority node. The order is important. Depending on the type of resource group and how it is configured, the relative priority of nodes within the resource group might be quite important. The policies will be chosen as listed in the visual.

Mutual takeover configuration

Another resource group would be defined, for example, ywebgroup, with the order of the participating nodes reversed.

Instructor notes:

Purpose — Show the “Add Resource Group” screen.
Details —
Additional information —
Transition statement — Time to add the resources to the newly created resource group.

Adding resources to the xwebgroup RG (1 of 2)

Configure HACMP Resource Groups

Move cursor to desired item and press Enter.
  Add a Resource Group
  Change/Show a Resource Group
  Remove a Resource Group
  Change/Show Resources for a Resource Group (standard)

+----------------------------------------------------------------------+
¦                       Select a Resource Group                        ¦
¦                                                                      ¦
¦ Move cursor to desired item and press Enter.                         ¦
¦                                                                      ¦
¦   xwebgroup                                                          ¦
¦   ywebgroup                                                          ¦
¦                                                                      ¦
¦ F1=Help           F2=Refresh        F3=Cancel                        ¦
¦ F8=Image          F10=Exit          Enter=Do                         ¦
¦ /=Find            n=Find Next                                        ¦
+----------------------------------------------------------------------+

Figure 6-23. Adding resources to the xwebgroup RG (1) AU548.0

Notes:

Selecting the resource group

Here’s the Configure HACMP Resource Groups menu in the standard configuration path. This menu is found under the standard configuration path’s top level menu. Select the Change/Show Resources for a Resource Group (standard) to get started. When the “Select a Resource Group” popup appears, select which resource group you want to work with and press Enter.

Instructor notes:

Purpose — Show how to get to the “Change/Show Resources for a Resource Group (standard)” screen.
Details —
Additional information —
Transition statement — We’ve selected the xwebgroup resource group; so let’s press Enter.

Adding resources to the xwebgroup RG (2 of 2)

Change/Show All Resources and Attributes for a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                        [Entry Fields]
  Custom Resource Group Name                            xwebgroup
  Participating Node Names (Default Node Priority)      usa uk

  Startup Behavior                                      Online On First Avail>
  Fallover Behavior                                     Fallover To Next Prio>
  Fallback Behavior                                     Fallback To Higher Pr>

  Service IP Labels/Addresses                           [xweb]          +
  Application Servers                                   [xwebserver]    +
  Volume Groups                                         [xwebvg]        +
  Use forced varyon of volume groups, if necessary      false           +
  Filesystems (empty is ALL for VGs specified)          []              +

F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Repeat previous two steps for every configured Resource Group if mutual takeover cluster, in our case, ywebgroup

Figure 6-24. Adding resources to the xwebgroup RG (2 of 2) AU548.0

Notes:

Filling out the Change/Show All Resources and Attributes for a Resource Group menu

This is the screen for showing/changing resources in a resource group within the standard configuration path. There really aren’t a lot of choices to be made: xweb is the service IP label we created earlier and xwebserver is the application server that we just defined. xwebvg is a shared volume group containing the filesystems needed by the xwebserver application.

We could specify the list of filesystems in the Filesystems field but the default is to mount/unmount all filesystems in the volume group. This is not only what we want, it is also practical because it is easier to maintain over time: you don’t have to keep updating the resource group as you add filesystems to the volume group.

Remember to press Enter to actually add the resources to the resource group.

Using Extended path to configure resources in the Resource Group

Although the Extended Path hasn’t been covered in detail, the configuring of resources in a Resource Group can be done through that path. When using the Extended Path, there are many more options. Make note of this as you may want to check this in the lab or may need to know this when configuring your cluster at home.

Mutual takeover configuration

At this point you would repeat the step to define all the resource groups and associated resources for all the applications. If creating a mutual takeover configuration, you would specify the resource groups and associated resources for the application that will run on the other node.

Instructor notes:

Purpose — Show the “Change/Show Resources in a Resource Group” screen within the standard configuration path.
Details — Explain what’s going on here but don’t get bogged down.
Additional information —
Transition statement — Now, it’s time again to verify and synchronize and test the changes.

Synchronize and test the changes

Initialization and Standard Configuration

Move cursor to desired item and press Enter.

  Configuration Assistants
  Configure an HACMP Cluster and Nodes
  Configure Resources to Make Highly Available
  Configure HACMP Resource Groups
  Verify and Synchronize HACMP Configuration
  HACMP Cluster Test Tool
  Display HACMP Configuration

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 6-25.
Synchronize and test the changes AU548.0

Notes:

Using the standard configuration to synchronize and test

After you’ve defined or changed the cluster’s topology or resources or both, you need to:
- Verify and synchronize your changes
- Test your configuration

Verify and synchronize

These menu choices act immediately in the Standard Configuration. Their actions can be customized in the Extended Configuration menus which we will not cover here.

The verification process collects AIX configuration information from each cluster node and uses this information, the cluster’s current configuration (if there is one) and the proposed configuration to verify that the proposed configuration (and the change it represents if this is not the first synchronization) is valid. It is possible to override verification errors but only if you are using Extended Configuration. Deciding to do so is a decision that must be approached with the greatest of care, because it is very unusual for a verification error to occur that can be safely overridden.

Also, remember the earlier discussion about synchronization--any HACMP configuration changes made on any other cluster node will be lost if you complete a synchronization on this cluster node.

Log files are created to show progress and problems. Check /var/adm/hacmp/clverify for the logs. These log files have been vastly improved over the years with more details on the commands being run during verify to help in determining the problems encountered during verify.

Testing your cluster

You must test your newly configured cluster for proper functioning. You also must test your cluster on a regular basis to ensure that it will continue to function properly. Finally, you must test your cluster after every change to the environment, whether directly related to the cluster or not.

It is highly recommended that you create a comprehensive test plan prior to configuring your cluster to be used during the test phase. The test plan should be made up of a list of test procedures. Each test procedure should include (but not be limited to) the following:
- Description of what the procedure is testing (for example, node crash)
- Description of the expected results of the test (for example, application will fallover to node b)
- Description of the method by which the test will be conducted (for example, node a will be powered off)
- Space for comments on test results

HACMP cluster test tool

This test facility is disruptive to the cluster so you want to run it when not running cluster services. Thus application downtime is required.

The Standard Configuration automated test procedure performs four sets of tests in the following order:
1. General topology tests
2. Resource group tests on non-concurrent resource groups
3. Resource group tests on concurrent resource groups
4. Catastrophic failure test

The Cluster Test Tool discovers information about the cluster configuration, and randomly selects cluster components, such as nodes and networks, to be used in the testing. See the Administration Guide, Chapter 7, for more details.

Instructor notes:

Purpose — Show the cluster verification and synchronization step and provide a point for a longer discussion of synchronization.
Don’t get into the multiple copies of ODM repositories concept yet because that comes later and the students are going to be buried in enough new facts for one unit.
Details —
Additional information —
Transition statement — What have we done so far and what’s next?

What do you have at this point?

You have a cluster configured as follows:
– Two nodes defined
– One network defined containing the boot and service addresses
– Application Server objects defined containing the start and stop scripts for all the applications to be made highly available
– Volume groups defined to contain the data for the applications
– Resource Groups defined that dictate the application ownership priority and contain the service labels, application servers and volume groups for the applications
– All this has been synchronized

But to make this cluster complete, it needs:
– A non-IP network
– Persistent IP addresses

In addition, a snapshot of your work would be prudent

Now extend the configuration…

Figure 6-26. What do we have at this point? AU548.0

Notes:

It’s a start

We have accomplished a large portion of the cluster configuration. The nodes, IP addresses (service and boot), networks, application servers, volume groups, and resource groups have been configured. This configuration has been synchronized across the cluster nodes. We indicate that some level of testing could be performed at this point. You can wait until after we do the rest of the configuration to test everything, or break it up as we have it here.

What’s left?

Recall the strong recommendation to include at least one non-IP network in your cluster? Well, we haven’t done that yet. And what about access to the cluster nodes using a reliable non-service, non-boot IP address? We can accomplish that with a persistent address. Finally, it is always a good idea to create backups after producing this much good work. It is advisable to create both a snapshot of the cluster configuration and a mksysb of the systems.

Instructor notes:

Purpose — Show our progress in configuring the cluster and what is left.
Details —
Additional information —
Transition statement — Some initial configuration steps require the use of Extended Configuration. Let’s now take a look at them.

Extending the configuration

Extended Configuration

Move cursor to desired item and press Enter.

  Discover HACMP-related Information from Configured Nodes
  Extended Topology Configuration
  Extended Resource Configuration
  Extended Cluster Service Settings
  Extended Event Configuration
  Extended Performance Tuning Parameters Configuration
  Security and Users Configuration
  Snapshot Configuration
  Export Definition File for Online Planning Worksheets
  Import Cluster Configuration from Online Planning Worksheets File
  Extended Verification and Synchronization
  HACMP Cluster Test Tool

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 6-27. Extending the configuration AU548.0

Notes:

Reasons to use extended path

Here’s the top-level extended configuration path menu.
We need to pop over to this path in order to perform some steps that cannot be done using the Standard Configuration such as defining a non-IP network, adding a persistent label and saving the configuration data. We will explore these steps in this unit.

Extended Configuration is also required for configuring IPAT via Replacement and Hardware Address Takeover as well as defining an SSA heartbeat network. These are not discussed in this course. Appendix C covers IPAT via Replacement and Hardware Address Takeover, and Appendix D covers SSA heartbeat networks.

Finally, other reasons for using the Extended Path will be covered in the course HACMP Administration II: Administration and Problem Determination (AU61).

Instructor notes:

Purpose — Show the top-level extended configuration path menu.
Details —
Additional information —
Transition statement — We start with non-IP networks, which are elements of the cluster’s topology.

Extended topology configuration menu

Extended Topology Configuration

Move cursor to desired item and press Enter.

  Configure an HACMP Cluster
  Configure HACMP Nodes
  Configure HACMP Sites
  Configure HACMP Networks
  Configure HACMP Communication Interfaces/Devices
  Configure HACMP Persistent Node IP Label/Address
  Configure HACMP Global Networks
  Configure HACMP Network Modules
  Configure Topology Services and Group Services
  Show HACMP Topology

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 6-28. Extended topology configuration menu AU548.0

Notes:

Getting to the non-IP network configuration menus

Non-IP networks are elements of the cluster’s topology; so we’re in the topology section of the extended configuration path’s menu hierarchy.

A non-IP network is defined by specifying the network’s end-points. These end-points are called communication devices; so we have to head down into the Communication Interfaces/Devices part of the extended topology screens.

Instructor notes:

Purpose — Show the extended path’s Topology Configuration menu.
Details —
Additional information —
Transition statement — A non-IP network is defined by specifying the end-points; so we need to head down into the communication interfaces/devices part of the HACMP menus.

Communication interfaces and devices

Configure HACMP Communication Interfaces/Devices

Move cursor to desired item and press Enter.

  Add Communication Interfaces/Devices
  Change/Show Communication Interfaces/Devices
  Remove Communication Interfaces/Devices
  Update HACMP Communication Interface with Operating System Settings

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 6-29. Communication interfaces and devices AU548.0

Notes:

The communication interfaces and devices menu

This is the communication interfaces and devices part of the extended configuration path. We will select the Add Communication Interfaces/Devices option.
Instructor notes:

Purpose — Show the extended path’s communication interfaces and devices menu.
Details —
Additional information — Just a menu--move on.
Transition statement — The first step is to select the Add Communication Interfaces/Devices entry.

Defining a non-IP network (1 of 3)

Configure HACMP Communication Interfaces/Devices

Move cursor to desired item and press Enter.

  Add Communication Interfaces/Devices
  Change/Show Communication Interfaces/Devices
  Remove Communication Interfaces/Devices
  Update HACMP Communication Interface with Operating System Settings

+--------------------------------------------------------------------------+
¦                           Select a category                              ¦
¦                                                                          ¦
¦ Move cursor to desired item and press Enter.                             ¦
¦                                                                          ¦
¦   Add Discovered Communication Interface and Devices                     ¦
¦   Add Predefined Communication Interfaces and Devices                    ¦
¦                                                                          ¦
¦ F1=Help             F2=Refresh          F3=Cancel                        ¦
¦ F8=Image            F10=Exit            Enter=Do                         ¦
¦ /=Find              n=Find Next                                          ¦
+--------------------------------------------------------------------------+

Figure 6-30. Defining a non-IP network (1 of 3) AU548.0

Notes:

Deciding which “Add” to choose

The first question we encounter is whether we want to add discovered or pre-defined communication interfaces and devices. The automatic discovery that was done when we added the cluster nodes earlier would have found the rs232/hdisk devices; so we pick the Discovered option.

Instructor notes:

Purpose — Illustrate the choice that must be made between discovered and pre-defined communication interfaces and devices.
Details —
Additional information —
Transition statement — Select the Discovered choice, and we’re faced with another question.

Defining a non-IP network (2 of 3)

Configure HACMP Communication Interfaces/Devices

Move cursor to desired item and press Enter.

  Add Communication Interfaces/Devices
  Change/Show Communication Interfaces/Devices
  Remove Communication Interfaces/Devices
  Update HACMP Communication Interface with Operating System Settings

+--------------------------------------------------------------------------+
¦                           Select a category                              ¦
¦                                                                          ¦
¦ Move cursor to desired item and press Enter.                             ¦
¦                                                                          ¦
¦ # Discovery last performed: (Feb 12 18:20)                               ¦
¦   Communication Interfaces                                               ¦
¦   Communication Devices                                                  ¦
¦                                                                          ¦
¦ F1=Help             F2=Refresh          F3=Cancel                        ¦
¦ F8=Image            F10=Exit            Enter=Do                         ¦
¦ /=Find              n=Find Next                                          ¦
+--------------------------------------------------------------------------+

Figure 6-31. Defining a non-IP network (2 of 3) AU548.0

Notes:

Is it an interface or a device?

Now we need to indicate whether we are adding a communication interface or a communication device. Non-IP networks use communication devices as end-points (/dev/tty, for example); so select Communication Devices to continue.

Instructor notes:

Purpose — Show the choice that must be made between adding communication interfaces versus communication devices.
Details —
Additional information —
Transition statement — We’re adding a non-IP network; so choose the Communication Devices choice.

© Copyright IBM Corp.
Defining a non-IP network (3 of 3)

Press Enter and HACMP defines a new non-IP network with these communication devices. Choose the ttys for a serial network too.

    Configure HACMP Communication Interfaces/Devices

    Move cursor to desired item and press Enter.

      Add Communication Interfaces/Devices
      Change/Show Communication Interfaces/Devices
      Remove Communication Interfaces/Devices
      Update HACMP Communication Interface with Operating System Settings

      +----------------------------------------------------------------------+
        Select Point-to-Point Pair of Discovered Communication Devices to Add

        Move cursor to desired item and press F7.
        ONE OR MORE items can be selected.
        Press Enter AFTER making all selections.

        # Node     Device         Pvid
        > usa      hdisk5         000b4a7cd10c73d78
          uk       hdisk5         000b4a7cd10c73d78
        > usa      /dev/tty1
          uk       /dev/tty1
          usa      /dev/tmssa1
          uk       /dev/tmssa2

        F1=Help      F2=Refresh    F3=Cancel
        F8=Image     F10=Exit      Enter=Do
        /=Find       n=Find Next
      +----------------------------------------------------------------------+

Figure 6-32. Defining a non-IP network (3 of 3) AU548.0

Notes:

We're now presented with a list of the "discovered" communication devices. You can either choose to add an rs232 network (using the /dev/tty entries) or a diskhb network (using the /dev/hdisk entries). If you're interested, we cover SSA in Appendix D.

rs232 networks

The steps to follow to create and test the rs232 network:

a. For our example, /dev/tty1 on usa is connected to /dev/tty1 on uk using a fully wired rs232 null-modem cable (don't risk a potentially catastrophic partitioned cluster by failing to configure a non-IP network or by using cheap cables).

b. Select these two devices, and press Enter to define the network.

c. Before you use this smit screen to define the non-IP network, make sure that you verify that the link between the two nodes is actually working, if possible. For our example, the non-IP rs232 network connecting usa to uk can be tested as follows:

   i. Issue the command stty < /dev/tty1 on one node. The command should hang.

   ii. Issue the command stty < /dev/tty1 on the other node. The command should immediately report the tty's status, and the command that was hung on the first node should also immediately report its tty's status.

   iii. If you get the behavior described above (especially including the hang in the first step that "recovers" in the second step), then the ports are probably connected together properly (check the HACMP log files when the cluster is up to be sure).

   iv. If you get any other behavior, then you are probably using the wrong cable or the rs232 cable isn't connected the way that you think it is.

   These commands should not be run while HACMP is using the tty.

diskhb networks

The steps to follow to configure and test a Heartbeat on Disk network:

a. Make sure you choose a pair of entries (such as /dev/hdisk5 shown in the figure), one for each of two nodes. Note that it is actually the pvids that must match, since this is the same disk.

b. You can test the connection using the command /usr/sbin/rsct/bin/dhb_read as follows:

   - On Node A, enter dhb_read -p hdisk5 -r
   - On Node B, enter dhb_read -p hdisk5 -t
   - You should then see on both nodes: "Link operating normally."

Heartbeat on disk is not supported across multiple nodes, unless a concurrent resource group (online on all nodes as its startup policy) is used. This requires the implementation of Multi-node Disk Heartbeat, which is beyond the scope of this class. This feature is discussed in more detail in HACMP System Administration II, AU61.

Handling more than two nodes

In a cluster with more than two nodes, the serial network must form a loop, because each serial network can only connect between two nodes. For example, in a three-node cluster, the RS232 cables might run from:

- Serial port 0 on node A to serial port 1 on node B, then from
- Serial port 0 on node B to serial port 1 on node C, then from
- Serial port 0 on node C to serial port 1 on node A

Such a configuration would require the definition of three serial networks.

Instructor notes:

Purpose: Show the final step in defining a non-IP network.

Details: The test described isn't perfect. The only real test is to check the HACMP log files when the cluster is up. If the non-IP network is declared dead, then there's a problem with the network which must be resolved. Point out that this screen also shows what the choices look like for tmssa and diskhb.

Additional information: You should probably emphasize that a functioning non-IP network is mandatory if you want to avoid potentially catastrophic problems, such as partitioned clusters.

Transition statement: While we're on the extended path, let's define some persistent node IP labels.
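The loop rule for serial networks in clusters with more than two nodes can be sketched in a few lines of Python. This is only a planning aid under stated assumptions, not anything HACMP provides: the function name serial_ring is hypothetical, and the port numbering (port 0 out, port 1 in) simply copies the three-node cabling example in the notes.

```python
def serial_ring(nodes):
    """Return the point-to-point serial links needed for a list of node names.

    Each link is ((from_node, from_port), (to_node, to_port)).
    Port numbers follow the three-node example in the notes (port 0 -> port 1);
    they are illustrative, not something HACMP assigns.
    """
    if len(nodes) < 2:
        return []  # a single node has nothing to heartbeat with
    if len(nodes) == 2:
        # Two nodes need only one point-to-point serial network.
        return [((nodes[0], 0), (nodes[1], 1))]
    # More than two nodes: each node links to the next, and the last links
    # back to the first, so the links form a closed loop.
    return [
        ((nodes[i], 0), (nodes[(i + 1) % len(nodes)], 1))
        for i in range(len(nodes))
    ]


if __name__ == "__main__":
    for link in serial_ring(["A", "B", "C"]):
        (src, sport), (dst, dport) = link
        print(f"Serial port {sport} on node {src} -> serial port {dport} on node {dst}")
```

For three nodes the sketch yields three links, which matches the three serial networks that the notes say must be defined.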
Defining persistent node IP labels (1 of 3)

    Configure HACMP Persistent Node IP Label/Addresses

    Move cursor to desired item and press Enter.

      Add a Persistent Node IP Label/Address
      Change / Show a Persistent Node IP Label/Address
      Remove a Persistent Node IP Label/Address

    F1=Help     F2=Refresh   F3=Cancel   F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-33. Defining persistent node IP labels (1 of 3) AU548.0

Notes:

Benefits/risks on using persistent IP labels

Defining a persistent node IP label on each cluster node allows the cluster administrators to contact specific cluster nodes (or write scripts which access specific cluster nodes) without needing to worry about whether the service IP address is currently available or which node it is associated with.

The (slight) risk associated with persistent node IP labels is that users might start using them to access applications within the cluster. You should discourage this practice, as the application might move to another node. Instead, users should be encouraged to use the IP address associated with the application (that is, the service IP label that you configure into the application's resource group).

Also, be careful if you decide to put the persistent address on the same subnet as the service address for an application that might be hosted. This could cause some application traffic to use the persistent address/interface, causing unpredictable behavior.

Instructor notes:

Purpose: Show the menu to select "Add a Persistent Node IP Label/Address".

Transition statement: Okay, now let's go to the Add menu.
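The subnet caution in the notes can be checked before you type anything into smit. The following is a minimal sketch using Python's standard ipaddress module. The function name share_subnet and the addresses and netmask in the example are hypothetical; substitute the persistent and service addresses you actually plan to use.

```python
import ipaddress


def share_subnet(addr_a, addr_b, netmask):
    """Return True if two IPv4 addresses fall in the same subnet for a netmask."""
    # strict=False lets us pass a host address and have the network derived from it.
    net_a = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{netmask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    # Hypothetical planning values, not taken from the course cluster.
    persistent_addr = "192.168.5.29"
    service_addr = "192.168.5.92"
    netmask = "255.255.255.0"

    if share_subnet(persistent_addr, service_addr, netmask):
        print("Caution: persistent and service addresses share a subnet;")
        print("application traffic might use the persistent address/interface.")
    else:
        print("Persistent and service addresses are on different subnets.")
```

Sharing a subnet is not forbidden; the sketch only flags the case that the notes ask you to be careful about.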
Defining persistent node IP labels (2 of 3)

    Configure HACMP Persistent Node IP Label/Addresses

    Move cursor to desired item and press Enter.

      Add a Persistent Node IP Label/Address
      Change / Show a Persistent Node IP Label/Address
      Remove a Persistent Node IP Label/Address

      +----------------------------------------------------------+
        Select a Node

        Move cursor to desired item and press Enter.

          usa
          uk

        F1=Help      F2=Refresh    F3=Cancel
        F8=Image     F10=Exit      Enter=Do
        /=Find       n=Find Next
      +----------------------------------------------------------+

Figure 6-34. Defining persistent node IP labels (2 of 3) AU548.0

Notes:

First, you select a node

Selecting the Add a Persistent Node IP Label/Address choice displays this prompt for which node we'd like to define the address on. Each node can have a Persistent Address or Addresses defined, but it isn't required. One Persistent Address is supported per network.

Instructor notes:

Purpose: Show the menu to select a node.

Transition statement: Okay, let's hit Enter and see the rest of the configuration choices.

Defining persistent node IP labels (3 of 3)

Press Enter and then repeat for the uk persistent IP label.

    Add a Persistent Node IP Label/Address

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                        [Entry Fields]
    * Node Name                          usa
    * Network Name                       [net_ether_01]   +
    * Node IP Label/Address              [usaadm]         +

    F1=Help     F2=Refresh   F3=Cancel   F4=List
    F5=Reset    F6=Command   F7=Edit     F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-35. Defining persistent node IP labels (3 of 3) AU548.0

Notes:

Filling out the Add a Persistent Node IP Label/Address menu

When you're on this screen, select the appropriate IP network and the IP Label/Address that you want to use from the Network Name and Node IP Label/Address pick lists. Press Enter to finish the operation. You can repeat these persistent menus to choose a persistent label for the other nodes.

Instructor notes:

Purpose: Show how to finish configuring the Persistent IP label.

Transition statement: Well, we made a change, so now it's time to synchronize.

Synchronize

smitty hacmp -> Extended Configuration -> Extended Verification and Synchronization

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                              [Entry Fields]
    * Verify, Synchronize or Both                              [Both]       +
    * Automatically correct errors found during verification?  [No]         +
    * Force synchronization if verification fails?             [No]         +
    * Verify changes only?                                     [No]         +
    * Logging                                                  [Standard]   +

    F1=Help     F2=Refresh   F3=Cancel   F4=List
    F5=Reset    F6=Command   F7=Edit     F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-36. Synchronize AU548.0

Notes:

The Extended Verification and Synchronization menu

This time the extended configuration path's HACMP Verification and Synchronization screen was chosen. When the extended path version is chosen, it presents a customization menu (shown above), which the standard path does not do:

- Verify, Synchronize, or Both: This option is useful to verify a change without synchronizing it (you might want to make sure that what you are doing makes sense without committing to actually using the changes yet). Synchronizing without verifying is almost certainly a foolish idea except in the most exotic of circumstances.

- Automatically correct errors found during verification?: This feature can fix certain errors that clverify detects. By default it is turned off. This option only displays if cluster services are not started. This option is discussed in the unit on problem determination.

- Force synchronization if verification fails?: This is almost always a very bad idea. Make sure that you really and truly must set this option to Yes before doing so.

- Verify changes only?: Setting this option to "Yes" will cause the verification to focus on aspects of the configuration that changed since the last synchronization. As a result, the verification will run slightly faster. This might be useful during the early to middle stages of cluster configuration. It seems rather risky once the cluster is in production.

- Logging: You can increase the amount of logging related to this verification and synchronization by setting this option to "Verbose." This can be quite useful if you are having trouble figuring out what is going wrong with a failed verification.

Instructor notes:

Purpose: Show the extended configuration path's "HACMP Verification and Synchronization" visual.

Details: Point out the correction feature of HACMP 5.2 and later. It will be covered in more detail in the problem determination unit of this course.

Transition statement: Time to save our configuration.
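The advice attached to each option on the Extended Verification and Synchronization screen can be collected into a small pre-flight check. The sketch below is purely illustrative: the dictionary keys, the DEFAULTS table, and review_sync_options are names invented here, and nothing in it talks to HACMP. It only restates the cautions from the notes (including the reading that "risky in production" applies to the Verify changes only option).

```python
# Screen defaults as shown in the visual; the key names are hypothetical.
DEFAULTS = {
    "verify_sync": "Both",          # Verify, Synchronize or Both
    "auto_correct": "No",           # Automatically correct errors found during verification?
    "force_sync": "No",             # Force synchronization if verification fails?
    "verify_changes_only": "No",    # Verify changes only?
    "logging": "Standard",          # Logging
}


def review_sync_options(overrides=None):
    """Return a list of cautions for the chosen option values."""
    opts = dict(DEFAULTS)
    opts.update(overrides or {})
    cautions = []
    if opts["verify_sync"] == "Synchronize":
        cautions.append("Synchronizing without verifying is almost certainly a foolish idea.")
    if opts["force_sync"] == "Yes":
        cautions.append("Forcing synchronization after a failed verification is almost always a very bad idea.")
    if opts["verify_changes_only"] == "Yes":
        cautions.append("Verifying changes only seems rather risky once the cluster is in production.")
    return cautions


if __name__ == "__main__":
    for caution in review_sync_options({"force_sync": "Yes"}):
        print("CAUTION:", caution)
```

With the defaults left alone the function returns an empty list, which mirrors the recommendation to verify and synchronize with the standard settings.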
Save configuration: snapshot

    Snapshot Configuration

    Move cursor to desired item and press Enter.

      Create a Snapshot of the Cluster Configuration
      Change/Show a Snapshot of the Cluster Configuration
      Remove a Snapshot of the Cluster Configuration
      Restore the Cluster Configuration From a Snapshot
      Configure a Custom Snapshot Method
      Convert an Existing Snapshot for Online Planning Worksheets

    Create a Snapshot of the Cluster Configuration

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                              [Entry Fields]
    * Cluster Snapshot Name                    []        /
      Custom-Defined Snapshot Methods          []        +
      Save Cluster Log Files in snapshot       No        +
    * Cluster Snapshot Description             []

Figure 6-37. Save configuration: snapshot AU548.0

Notes:

Saving the cluster configuration

You can save the cluster configuration to a snapshot file or to an XML file. The cluster can be restored either from the snapshot file or from the xml file. The xml file can also be used with the online planning worksheets and potentially with other applications. This visual looks at the snapshot method and the next visual looks at the XML method.

Creating a snapshot

smit hacmp -> Extended Configuration -> Snapshot Configuration

A snapshot captures the HACMP ODM files, which allows you to recover the cluster definitions. There is also an info file. The info file is discussed further in the AU61 course HACMP Administration II: Administration and Troubleshooting.

If necessary, there is another option on the Snapshot Configuration menu to restore (apply) a snapshot.

Instructor notes:

Purpose: Explain the methods of saving the configuration and focus in this visual on the snapshot method.

Transition statement: What about the xml method?

Save configuration: xml file

smitty hacmp -> Extended Configuration

    Export Definition File for Online Planning Worksheets

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                [Entry Fields]
    * File Name                  [/var/hacmp/log/cluster.haw]   /
      Cluster Notes              []

    Snapshot Configuration

    Move cursor to desired item and press Enter.

      Create a Snapshot of the Cluster Configuration
      Change/Show a Snapshot of the Cluster Configuration
      Remove a Snapshot of the Cluster Configuration
      Restore the Cluster Configuration From a Snapshot
      Configure a Custom Snapshot Method
      Convert an Existing Snapshot for Online Planning Worksheets

Figure 6-38. Save configuration: xml file AU548.0

Notes:

Creating the xml file

Using Extended Configuration, you can save the cluster configuration directly to an xml file via the menu Export Definition File for Online Planning Worksheets, or from a snapshot via the Snapshot Configuration menu Convert Existing Snapshot For Online Planning Worksheets. When created, you can use the Online Planning worksheets to get an updated view of the configuration or change the configuration or both. The xml file can potentially be used from other applications or manually to make and display configuration information. This will be explored in the lab exercise for this course. For the moment, in case you want to know the command to apply an xml file, it is /usr/es/sbin/cluster/utilities/cl_opsconfig

Instructor notes:

Purpose: Show how to save configuration data to an xml file.

Transition statement: You've seen the Standard path and Extended path options required to create a complete cluster. What about a simple 2-node, hot-standby cluster configuration?

Two-node cluster configuration assistant

Notes:

The two-node cluster configuration assistant smit menu

If you have a simple two-node, hot-standby cluster to configure, the Two-Node Cluster Configuration Assistant might be the answer. If your network is set up correctly and you have configured a shared enhanced concurrent mode volume group, then HACMP will use this menu to build a complete two-node cluster including Topology, Resources, Resource group and a non-IP network using heartbeat over disk. Also, synchronization is done and you are all ready to start cluster services on both nodes. The example in the visual is run from the usa node. Here is the menu.
    Two-Node Cluster Configuration Assistant

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
    * Communication Path to Takeover Node          [ukboot1]             +
    * Application Server Name                      [xwebserver]
    * Application Server Start Script              [/mydir/xweb_start]
    * Application Server Stop Script               [/mydir/xweb_stop]
    * Service IP Label                             [xweb]                +

    F1=Help     F2=Refresh   F3=Cancel   F4=List
    F5=Reset    F6=Command   F7=Edit     F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-39. Two-node cluster configuration assistant AU548.0

Notes:

Running the assistant from the usa node makes usa the home node (highest priority) in the resource group that is created. You will have defined the boot addresses on both usa and uk and created any shared volume groups, on both nodes.

System-generated names will be created, based on the application server name supplied, for the cluster, resource group, and application server.

Instructor notes:

Purpose: Show the smit menu to execute the 2-node Cluster Configuration Assistant.

Transition statement: Well, let's see what the Two node assistant gives us.

What does the two-node assistant give you?

    # /usr/es/sbin/cluster/utilities/cltopinfo
    Cluster Name: xwebserver_cluster
    Cluster Connection Authentication Mode: Standard
    Cluster Message Authentication Mode: None
    Cluster Message Encryption: None
    Use Persistent Labels for Communication: No
    There are 2 node(s) and 2 network(s) defined
    NODE usa:
        Network net_ether_01
            usaboot1        192.168.15.29
            usaboot2        192.168.16.29
        Network net_diskhb_01
            usa_hdisk5_01   /dev/hdisk5
    NODE uk:
        Network net_ether_01
            ukboot1         192.168.15.31
            ukboot2         192.168.16.31
        Network net_diskhb_01
            uk_hdisk5_01    /dev/hdisk5

    Resource Group xwebserver_group
        Startup Policy        Online On Home Node Only
        Fallover Policy       Fallover To Next Priority Node in List
        Fallback Policy       Never Fallback
        Participating Nodes   usa uk
        Service IP Label      xweb

Figure 6-40. What does the two-node assistant give you? AU548.0

Notes:

Seeing what was done

One utility that displays what was done is the cltopinfo command. This command displays the cluster's topology. Notice that each node's IP labels on the ethernet adapters have been defined on the net_ether_01 HACMP network. The non-IP diskhb network was also configured and appears with communication devices (/dev/hdisk5) on each of the two nodes. Notice what policies you get automatically configured when using this approach.

Another utility is cldisp. This command shows what is configured from the application point of view.

Points to observe

- The Two-Node Configuration Assistant did "everything" -- created topology objects including a non-IP heartbeat over disk network when it saw an enhanced concurrent volume group, created resource groups, and verified and synchronized the cluster.
- The Two-Node Configuration Assistant assigns names, so you will have to decide if you like them.
- You need to pre-configure the shared volume group. If it is Enhanced Concurrent Mode, then a non-IP heartbeat over disk network is configured. Otherwise you are on your own to configure a non-IP network.
- The assistant also gives HACMP all network adapters found. You might have to remove ones for interfaces that you don't want HACMP to have.
- Only one application and two nodes are supported.
- The Fallback policy is set to Never Fallback.
- No persistent or rs-232 non-IP network is defined.

Instructor notes:

Purpose: Show how to see the HACMP resource group configuration.

Details: Make sure that the students see that the entire resource group has now been defined.

Transition statement: Before moving on, let's see where we are in the process.

Where are we in the implementation?

- Plan for network, storage, and application
  - Eliminate single points of failure
- Define and configure the AIX environment
  - Storage (adapters, LVM volume group, filesystem)
  - Networks (IP interfaces, /etc/hosts, non-IP)
  - Application start and stop scripts
- Install the HACMP filesets and reboot
- Configure the HACMP environment
  - Topology
    - Cluster, node names, HACMP IP and non-IP networks
  - Resources, resource group, attributes:
    - Resources: Application Server, service label
    - Resource group: Identify name, nodes, policies
    - Attributes: Application Server, service label, VG, filesystem
  - Synchronize
- Start Cluster Services
- Test configuration
- Save configuration

Figure 6-41. Where are we in the implementation? AU548.0

Notes:

Cluster configuration is implemented

Wow! All is done except for starting Cluster Services.

Instructor notes:

Purpose: Show that we are finished with setting up a cluster configuration.

Transition statement: So, let's fire up HACMP.
Starting Cluster Services (1 of 4)

    HACMP for AIX

    Move cursor to desired item and press Enter.

      Initialization and Standard Configuration
      Extended Configuration
      System Management (C-SPOC)
      Problem Determination Tools
      Cluster Simulator

    F1=Help     F2=Refresh   F3=Cancel   F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-42. Starting Cluster Services (1 of 4) AU548.0

Notes:

How to start HACMP Cluster Services

Starting Cluster Services involves a trip to the top-level HACMP menu because we need to go down into the System Management (C-SPOC) part of the tree. C-SPOC will be covered in more detail in the next unit.

It might be worth pointing out that if you use the Web-based smit for HACMP fileset, then there is a navigation menu that allows you to skip from one menu path to another one without having to go "back to the top."

After a few times, you will probably learn to use the command smit clstart or smitty clstart to bypass this menu and the next two menus.

Instructor notes:

Purpose: Show how to get to the smit screen for starting Cluster Services via the smit menu hierarchy.

Transition statement: Starting with the top-level HACMP menu, choose System Management (C-SPOC).

Starting Cluster Services (2 of 4)

    System Management (C-SPOC)

    Move cursor to desired item and press Enter.

      Manage HACMP Services
      HACMP Communication Interface Management
      HACMP Resource Group and Application Management
      HACMP Log Viewing and Management
      HACMP File Collection Management
      HACMP Security and Users Management
      HACMP Logical Volume Management
      HACMP Concurrent Logical Volume Management
      HACMP Physical Volume Management
      Open a SMIT Session on a Node

    F1=Help     F2=Refresh   F3=Cancel   F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-43. Starting Cluster Services (2 of 4) AU548.0

Notes:

The C-SPOC menu

Choose Manage HACMP Services next.

Instructor notes:

Purpose: Show the top level System Management (C-SPOC) menu.

Transition statement: Next, we will select the Manage HACMP Services selection.

Starting Cluster Services (3 of 4)

    Manage HACMP Services

    Move cursor to desired item and press Enter.

      Start Cluster Services
      Stop Cluster Services
      Show Cluster Services

    F1=Help     F2=Refresh   F3=Cancel   F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-44. Starting Cluster Services (3 of 4) AU548.0

Notes:

The Manage HACMP Services menu

We're almost there...

Instructor notes:

Purpose: Show the Manage HACMP Services screen.

Transition statement: Finally, we are there!

Starting Cluster Services (4 of 4)

    # smit clstart

    Start Cluster Services

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                                [Entry Fields]
    * Start now, on system restart or both                       now             +
      Start Cluster Services on these nodes                      [usa,uk]        +
    * Manage Resource Groups                                     Automatically   +
      BROADCAST message at startup?                              true            +
      Startup Cluster Information Daemon?                        true            +
      Ignore verification errors?                                false           +
      Automatically correct errors found during cluster start?   Interactively   +

    F1=Help     F2=Refresh   F3=Cancel   F4=List
    F5=Reset    F6=Command   F7=Edit     F8=Image
    F9=Shell    F10=Exit     Enter=Do

Figure 6-45. Starting Cluster Services (4 of 4) AU548.0

Notes:

Startup choices

There are a few choices to make. For the moment, we will just recommend the defaults, except selecting both nodes and turning on the Cluster Information Daemon. The other options are discussed in the next unit in more detail.

Remember the fast path

Notice the "smit clstart" fastpath. This is often much faster than working your way through the menu tree.

Instructor notes:

Purpose: Show the "Start Cluster Services" screen.

Details: Explain the choices but try to avoid getting too bogged down.

Additional information: More information will come in the next unit.

Transition statement: What if you want to start over completely?
Removing a cluster

Use Extended Topology Configuration:

    Configure an HACMP Cluster

    Move cursor to desired item and press Enter.

      Add/Change/Show an HACMP Cluster
      Remove an HACMP Cluster
      Reset Cluster Tunables

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

    # > /usr/es/sbin/cluster/etc/rhosts

Figure 6-46. Removing a cluster

Notes:

Starting over
If you have to start over, you can:
- Stop cluster services on all nodes.
- Use Extended Configuration, as shown above, to remove the cluster (on all nodes).
- Remove the entries (but not the file) from /usr/es/sbin/cluster/etc/rhosts (on all nodes).

If you really want to start over, then you can:
- installp -u cluster
- rm -r /usr/es/* (be very careful here)

Instructor notes:
Purpose: Show how to start over.
Transition statement: Time to say we are done.
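The "# > /usr/es/sbin/cluster/etc/rhosts" command on the visual empties the file while leaving the file itself in place, which is what "remove the entries (but not the file)" asks for. The following is a minimal Python sketch of the same effect. It runs against a temporary stand-in file with made-up entries, not the real rhosts file, and the helper name clear_entries is hypothetical.

```python
import os
import tempfile


def clear_entries(path):
    """Truncate the file at `path` to zero length without removing it."""
    # Opening in "w" mode truncates an existing file, much like `> file` in the shell.
    with open(path, "w"):
        pass


# Work on a temporary stand-in, never on the real cluster rhosts file.
with tempfile.TemporaryDirectory() as workdir:
    demo_rhosts = os.path.join(workdir, "rhosts")
    with open(demo_rhosts, "w") as handle:
        handle.write("192.168.5.29\n192.168.5.31\n")  # invented entries

    clear_entries(demo_rhosts)

    still_exists = os.path.exists(demo_rhosts)
    size_after = os.path.getsize(demo_rhosts)

print(still_exists, size_after)
```

The file still exists afterwards and has a size of zero, so anything that expects the file to be present keeps working while the old entries are gone.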
We're there!

We've configured a two-node cluster with multiple resource groups:
- Each resource group has a different home (primary) node
- Each resource group falls back to its home node on recovery

This is called a two-node mutual takeover cluster.

(Diagram: nodes usa and uk, with resource groups X and Y shown on each node.)

Each resource group is also configured to use IPAT via IP aliasing.
The cluster even has the mandatory non-IP network!
This particular style of cluster (mutual takeover with IPAT) is, by far, the most common style of HACMP cluster.

Figure 6-47. We're there!

Notes:

Mutual takeover completed
We've finished configuring a two-node HACMP cluster with two resource groups operating in a mutual takeover configuration. The term mutual takeover derives from the fact that each node is the home node for one resource group and provides fallover (that is, takeover) services to the other node.

This is, without a doubt, the most common style of HACMP cluster, as it provides a reasonably economical way to protect two separate applications. It also keeps the folks with budgetary responsibility happier because each of the systems is clearly "doing something useful" all the time (many would argue that a system that is "just" acting as a standby for a critical application is doing something useful, but it is a lot easier to make the case if both systems are actually running an important application at all times).

Instructor notes:
Purpose: Indicate that we've got to where we were going.
Transition statement: Just a few questions to answer now.
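The mutual takeover idea (each node is home to one resource group and stands by for the other) can be illustrated with a small placement model. This is only a sketch of the concept, not how HACMP itself decides placement. The resource group names xwebgroup and ywebgroup and the function name are assumptions made for the example.

```python
def place_resource_groups(home_nodes, active_nodes):
    """Return {resource_group: hosting node or None} for a mutual takeover pair.

    home_nodes:   {resource_group: home node}
    active_nodes: set of nodes currently running cluster services
    """
    all_nodes = sorted(set(home_nodes.values()))
    placement = {}
    for group, home in home_nodes.items():
        if home in active_nodes:
            # Home node is up: the group runs there (it falls back on recovery).
            placement[group] = home
        else:
            # Home node is down: the surviving node takes the group over.
            takeover = [n for n in all_nodes if n != home and n in active_nodes]
            placement[group] = takeover[0] if takeover else None
    return placement


homes = {"xwebgroup": "usa", "ywebgroup": "uk"}  # illustrative group names

normal = place_resource_groups(homes, {"usa", "uk"})
usa_down = place_resource_groups(homes, {"uk"})
print(normal)
print(usa_down)
```

With both nodes up, each group sits on its home node. With usa down, both groups end up on uk, which is the takeover half of "mutual takeover".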
Checkpoint

1. True or False? It is possible to configure a recommended simple two-node cluster environment using just the standard configuration path.
2. In which of the top-level HACMP menu choices is the menu for defining a non-IP heartbeat network?
   a. Initialization and Standard Configuration
   b. Extended Configuration
   c. System Management (C-SPOC)
   d. Problem Determination Tools
3. In which of the top-level HACMP menu choices is the menu for starting and stopping cluster nodes?
   a. Initialization and Standard Configuration
   b. Extended Configuration
   c. System Management (C-SPOC)
   d. Problem Determination Tools
4. True or False? It is possible to configure HACMP faster by having someone help you on the other node.
5. True or False? You must specify exactly which filesystems you want mounted when you put resources into a resource group.

Figure 6-48. Checkpoint

Instructor notes:
Purpose: Checkpoint questions
Details: Checkpoint solutions
1. You can't create the non-IP network from the standard path.
Transition statement: Time for a break and lab.

Break time!

Figure 6-49. Break time!

Notes:

Some notes from the developer :-)
This is a photograph of Lake Louise in the Canadian Rocky Mountains (located about a 90 minute drive west of Calgary). If you are ever there, make sure that you rent one of the canoes in the photograph and go for a paddle out on the lake. There's also a number of quite spectacular and not particularly strenuous hikes that start from near the point that this photograph was taken. The hike that goes up to the "tea house" is definitely worth an afternoon (you can pay money to go up on horseback if you don't feel like walking for free).

Also, can you read this? :-)

Aoccdrnig to a rscheearchr at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.

Instructor notes:
Purpose: Give the students a break.
Transition statement: Please make sure that your clothes aren't wet before fiddling with the computers in lab!

Figure 6-50. Unit summary
Unit 7. Basic HACMP administration

Estimated time
03:00

What this unit is about
This unit describes basic administration tasks for HACMP for AIX.

What you should be able to do
After completing this unit, you should be able to:
• Use the SMIT Standard and Extended menus to make topology and resource group changes
• Describe the benefits and capabilities of C-SPOC
• Perform routine administrative changes using C-SPOC
• Start and stop Cluster Services
• Perform resource group move operations
• Discuss the benefits and capabilities of DARE
• Use the snapshot facility to return to a previous cluster configuration or to roll back changes
• Configure and use WebSMIT

How you will check your progress
Accountability:
• Checkpoint
• Machine exercises

References
SC23-5209-01   HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10   HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10   HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10   HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04   HACMP for AIX, Version 5.4.1: Troubleshooting Guide
http://www-03.ibm.com/systems/p/library/hacmp_docs.html   HACMP manuals

Unit objectives

After completing this unit, you should be able to:
• Use the SMIT Standard and Extended menus to make topology and resource group changes
• Describe the benefits and capabilities of C-SPOC
• Perform routine administrative changes using C-SPOC
• Start and stop Cluster Services
• Perform resource group move operations
• Discuss the benefits and capabilities of DARE
• Use the snapshot facility to return to a previous cluster configuration or to roll back changes
• Configure and use WebSMIT

Figure 7-1. Unit objectives

Instructor notes:
Purpose: Present unit objectives.
Transition statement: In the first topic we'll look at C-SPOC.

7.1 Topology and resource group management

Instructor topic introduction
What students will do: Learn about basic administration of HACMP topology and resource groups.
How students will do it: Lecture and lab.
What students will learn: How to perform basic administration of HACMP topology and resource groups.
How this will help students on their job: They will be able to administer their cluster.

Topology and resource group management

After completing this topic,
you should be able to:
• Add a resource group and resources to an existing cluster
• Remove a resource group from a cluster
• Add a new node to an existing cluster
• Remove a node from an existing cluster
• Configure a non-IP heartbeat network

Figure 7-2. Topology and resource group management

Instructor notes:
Purpose: Discuss objectives of topic 1.
Transition statement: In this topic, we are looking at how to do routine administration of cluster topology and resource groups. We're now going to embark on a series of hypothetical scenarios (some more realistic than others) to illustrate these procedures.

Yet another resource group

• The users have asked that a third application be added to the cluster
• The application uses very little CPU or memory and there's money in the budget for more disk drives in the disk enclosure
• Minimizing downtime is particularly important for this application
• The resource group is called zwebgroup

(Diagram: nodes usa and uk, each shown with resource groups X, Y, and Z.)

Figure 7-3. Yet another resource group

Notes:

Introduction
We're now going to embark on a series of hypothetical scenarios to illustrate a number of routine cluster administration tasks. Some of these scenarios are more realistic than others.

Add a resource group
In this first scenario, we're going to add a resource group to the cluster. This new resource group is called zwebgroup. This resource group's application has been reported to use very little in the way of system resource, and there is a strong desire to avoid unnecessary zwebgroup outages.
Instructor notes:
Purpose: Explain the first of a series of hypothetical scenarios.
Transition statement: We create the resource group first again.

Adding a third resource group

We'll change the startup policy to "Online On First Available Node" so that the resource group comes up when usa is started while uk is down.

    Add a Resource Group

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
    * Resource Group Name                                  [zwebgroup]
    * Participating Node Names (Default Node Priority)     [usa uk]                 +

      Startup Policy                                       Online On First Avail>   +
      Fallover Policy                                      Fallover To Next Prio>   +
      Fallback Policy                                      Never Fallback           +

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

• Avoid startup delay by starting on first available node
• Avoid fallback outage by never falling back
• Does the order in which the node names are specified matter?

Figure 7-4. Adding the third resource group

Notes:

Add a resource group
We use the Extended path. The resource group is configured to start up on whichever node is available first and to never fall back when a node rejoins the cluster. The combination of these two parameters should go a long way towards minimizing this resource group's downtime.

If you're familiar with the older terminology of cascading and rotating resource groups, this resource group's policies make it essentially identical to a cascading without fallback resource group.
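To see why this combination of policies keeps zwebgroup outages low, it can help to trace where the group runs as nodes come and go. The sketch below is a simplified model written for this purpose only. It assumes the group starts on the first node that becomes available, falls over to the next active node in priority order, and never falls back. The function name and event format are hypothetical and are not part of HACMP.

```python
def simulate(node_priority, events):
    """Trace the hosting node of one resource group.

    Assumed policies: start on the first available node, fall over to the
    next active node in priority order, never fall back.

    node_priority: nodes in default priority order
    events:        sequence of ("up", node) or ("down", node) tuples
    Returns the hosting node (or None) after each event.
    """
    active = set()
    host = None
    history = []
    for action, node in events:
        if action == "up":
            active.add(node)
            if host is None:
                # The first node to become available acquires the group.
                host = node
            # Never fall back: a rejoining node does not pull the group away.
        elif action == "down":
            active.discard(node)
            if host == node:
                # Fall over to the highest-priority node that is still active.
                survivors = [n for n in node_priority if n in active]
                host = survivors[0] if survivors else None
        history.append(host)
    return history


trace = simulate(
    ["usa", "uk"],
    [("up", "uk"), ("up", "usa"), ("down", "uk"), ("up", "uk")],
)
print(trace)
```

In this trace the group starts on uk because uk came up first, stays there when usa joins, moves to usa only when uk fails, and stays on usa when uk returns. The only outage is the one forced by the node failure.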
Instructor notes:
Purpose: Show the Add a Resource Group screen filled in for the creation of the zwebgroup resource group.
Transition statement: The zwebgroup application needs its own service IP label.

Adding a third service IP label (1 of 2)

The extended configuration path screen for adding a service IP label provides more options. Choose those that mimic the standard path.

    Configure HACMP Service IP Labels/Addresses

    Move cursor to desired item and press Enter.

      Add a Service IP Label/Address
      Change/Show a Service IP Label/Address
      Remove Service IP Label(s)/Address(es)

    +----------------------------------------------------------+
    |        Select a Service IP Label/Address type            |
    |                                                          |
    | Move cursor to desired item and press Enter.             |
    |                                                          |
    |   Configurable on Multiple Nodes                         |
    |   Bound to a Single Node                                 |
    |                                                          |
    | F1=Help       F2=Refresh     F3=Cancel                   |
    | F8=Image      F10=Exit       Enter=Do                    |
    | /=Find        n=Find Next                                |
    +----------------------------------------------------------+

Figure 7-5. Adding a third service IP label (1 of 2)

Notes:

Introduction
We need to define a service IP label for the zwebgroup resource group.

IPAT via IP aliasing required
Creating a third resource group on a cluster with one network and two nodes requires the use of IPAT via IP aliasing. A cluster that uses only IPAT via IP replacement is for all practical purposes restricted to one resource group with a service IP label per node per IP network. Because our cluster has only one IP network, it would not be able to support three different resource groups with service IP labels if it used IPAT via replacement.

Resource group limits
HACMP 5.2 and above supports a maximum of 64 resource groups and 256 IP addresses known to HACMP (for example, service and interface IP addresses). There are no other limits on the number of resource groups with service labels that can be configured on an IPAT via IP aliasing network (although, eventually, you run out of CPU power or memory or something for all the applications associated with these resource groups).

Service IP label/address type
Bound to a Single Node is used with IBM's General Parallel File System (GPFS).

Network name
The next step is to associate this Service Label with one of the HACMP networks. This is not shown in the visual.

Instructor notes:
Purpose: Show how to define the third service IP label.
Details: Point out to the students that having a third service IP label on a two-node network requires IPAT via IP aliasing.
Additional information: The rules for IPAT via IP replacement are simple:
• For non-distribution policies, each node can have at most 1 RG per network and each node must have as many interfaces as there are resource groups.
• For distribution policies, only one resource group per node or per network. In the case of per node, only one resource group can be used with two nodes. In the case of per network, one would have to have a different interface per RG on each node and implement a different network across the nodes for each RG.
Note: In HACMP 5.3, only per-node distribution policy is supported.
Transition statement: After selecting the Service label type and selecting the network, we can fill in the Service Label information.
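The arithmetic behind "IPAT via IP aliasing required" can be written down as a rough capacity check. The sketch below encodes only the two rules of thumb given in the notes: with IP replacement, about one resource group with a service IP label per node per IP network; with IP aliasing, no limit beyond the cluster-wide maximum of 64 resource groups. It ignores the 256 IP address limit, and the function name is hypothetical.

```python
MAX_RESOURCE_GROUPS = 64  # cluster-wide limit quoted in the notes for HACMP 5.2 and above


def max_service_label_groups(ipat_method, nodes, ip_networks):
    """Rough upper bound on resource groups that carry a service IP label."""
    if ipat_method == "replacement":
        # Rule of thumb from the notes: one such group per node per IP network.
        return min(nodes * ip_networks, MAX_RESOURCE_GROUPS)
    if ipat_method == "aliasing":
        # No further limit beyond the cluster-wide maximum.
        return MAX_RESOURCE_GROUPS
    raise ValueError(f"unknown IPAT method: {ipat_method!r}")


wanted = 3  # xweb, yweb, and now zweb
with_replacement = max_service_label_groups("replacement", nodes=2, ip_networks=1)
with_aliasing = max_service_label_groups("aliasing", nodes=2, ip_networks=1)
print(wanted <= with_replacement, wanted <= with_aliasing)
```

For the two-node, one-network cluster in this unit, the replacement bound is 2, so a third resource group with its own service IP label only fits when the network uses aliasing.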
Adding a third service IP label (2 of 2)

The Alternate Hardware Address ... field is used for hardware address takeover (HWAT).

    Add a Service IP Label/Address configurable on Multiple Nodes (extended)

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
    * IP Label/Address                                     [zweb]          +
    * Network Name                                         net_ether_01
      Alternate HW Address to accompany IP Label/Address   []

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Figure 7-6. Adding a third service IP label (2 of 2)

Notes:

Adding a service IP label
The visual shows the entry fields for this panel. You can find more information on HWAT configuration in Appendix C.

Instructor notes:
Purpose: Show the Extended Path screen to add a service label that shows HWAT.
Transition statement: Next, we need a third application.
Adding a third application server

The Add Application Server screen is identical in both configuration paths.

    Add Application Server

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                          [Entry Fields]
    * Server Name          [zwebserver]
    * Start Script         [/usr/local/scripts/startzweb]
    * Stop Script          [/usr/local/scripts/stopzweb]

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Figure 7-7. Adding a third application server

Notes:

Add an application server
You must give it a name and specify a start and stop script.

Instructor notes:
Purpose: Show the screen to add an application server.
Transition statement: Let's complete the resource group definition with the resource attributes that we need.

Adding resources to the third RG (1 of 2)

The extended path's SMIT screen for updating the contents of a resource group is much more complicated!

    Change/Show All Resources and Attributes for a Resource Group

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

    [TOP]                                                 [Entry Fields]
      Resource Group Name                                  zwebgroup
      Participating Node Names (Default Node Priority)     usa uk

      Startup Behavior                                     Online On First Avail>
      Fallover Behavior                                    Fallover To Next Prio>
      Fallback Behavior                                    Never Fallback
      Fallback Timer Policy (empty is immediate)           []              +

      Service IP Labels/Addresses                          [zweb]          +
      Application Servers                                  [zwebserver]    +

      Volume Groups                                        [zwebvg]        +
      Use forced varyon of volume groups, if necessary     false           +
      Automatically Import Volume Groups                   false           +
      Filesystems (empty is ALL for VGs specified)         []              +
      Filesystems Consistency Check                        fsck            +
    [MORE...17]

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Figure 7-8. Adding resources to the third RG (1 of 2)

Notes:

Adding resources to a resource group (extended path)
This is the first of two screens to show the Extended Path menu for adding attributes. Unlike the Standard path, it contains a listing of all the possible attributes.

Instructor notes:
Purpose: Show the adding of resources to the zwebgroup resource group.
Transition statement: And the second screen.

Adding resources to the third RG (2 of 2)

Even more choices! Fortunately, only a handful tend to be used in any given context.

    Change/Show All Resources and Attributes for a Resource Group

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

    [MORE...17]                                           [Entry Fields]
      Filesystems Consistency Check                        fsck            +
      Filesystems Recovery Method                          sequential      +
      Filesystems mounted before IP configured             false           +
      Filesystems/Directories to Export (NFSv2/3)          []              +
      Filesystems/Directories to Export (NFSv4)            []              +
      Stable Storage Path (NFSv4)                          []              +
      Filesystems/Directories to NFS Mount                 []              +
      Network For NFS Mount                                []              +
      Tape Resources                                       []              +
      Raw Disk PVIDs                                       []              +
      Fast Connect Services                                []              +
      Communication Links                                  []              +
      Primary Workload Manager Class                       []              +
      Secondary Workload Manager Class                     []              +
      Miscellaneous Data                                   []
      WPAR Name                                            []              +
    [BOTTOM]

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Figure 7-9. Adding resources to the third RG (2 of 2)

Notes:

Adding resources to a resource group (extended path)
More choices. New choices for HACMP 5.4.1 include the NFS V4 entries and the WPAR name.

Instructor notes:
Purpose: Show the second screen in the Extended Path menu to add resources to a resource group.
Transition statement: Okay, so now we synchronize.

Synchronize your changes

The extended configuration path provides verification and synchronization options.

    HACMP Verification and Synchronization

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
    * Verify, Synchronize or Both                          [Both]          +
    * Automatically correct errors found during            [No]            +
      verification?
    * Force synchronization if verification fails?         [No]            +
    * Verify changes only?                                 [No]            +
    * Logging                                              [Standard]

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Remember to verify that you actually implemented what was planned by executing your test plan.

Figure 7-10. Synchronize your changes

Notes:

Extended path synchronization
This is the Extended path screen to show the Synchronization menu options that are not shown in the Standard path.

Instructor notes:
Purpose: Show the synchronize changes step.
Details: This one gets about three seconds of air time, although you might want to recap how this new resource group behaves.
Transition statement: Now, we get money for a new node.

Expanding the cluster

• The users "find" money in the budget and decide to "invest" it to improve the availability of the xweb and yweb applications
• Nobody seems to be too worried about the zweb application

(Diagram: nodes usa, uk, and india; resource groups X and Y shown on all three nodes, Z on usa and uk only.)

Figure 7-11. Expanding the cluster

Notes:

Expanding the cluster
In this scenario, we'll look at adding a node to a cluster.

Instructor notes:
Purpose: Set the scene for the next scenario.
Details: Not much to say here that isn't said on the visual.
Transition statement: Let's first review the steps for how we add a node to the HACMP configuration.
.Add the new node to the existing cluster (from one of the existing nodes) 7.Run through your (updated) test plan © Copyright IBM Corporation 2008 Figure 7-12. 1998.Synchronize your changes 9. Basic HACMP administration 7-27 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.0 Notes: Adding a new cluster node Adding a node to an existing cluster isn’t all that difficult from the HACMP perspective (as we see shortly).Synchronize your changes again 12.Add non-IP networks for the new node 8. HACMP and application software on the new node: –Install patches required to bring the new node up to the same level as the existing cluster nodes –Reboot the new node (always reboot after installing or patching HACMP) 6. © Copyright IBM Corp.Copy /etc/hosts from this node to all other nodes 5.0 Instructor Guide Uempty Adding a new cluster node 1. The hard work involves integrating the node into the cluster from an AIX and from an application perspective.Add the new node's IP labels to /etc/hosts on one existing node 4.V4.Configure the shared volume groups on the new node 3.Add the new node to the appropriate resource groups 11. Adding a new cluster node AU548. Instructor Guide Instructor notes: Purpose — Show the process involved in adding a node to an existing cluster. Additional information — Remind the students that we will look at only the HACMP configuration aspects of the task of adding a node to an existing cluster. 7-28 HACMP Implementation © Copyright IBM Corp. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 1998. Details — Give the students some time to digest this foil and be prepared for at least a short discussion (there’s quite a bit of information on this foil). . Transition statement — Let’s see how we add a node to the HACMP configuration. V4. The name that you assign to your cluster is pretty much arbitrary. If more than one node. 2008 Unit 7. 
Use F4 to generate a list. Basic HACMP administration 7-29 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. so use an existing node until the cluster is synchronized. or type one resolvable IP label or IP address for each node. * Cluster Name New Nodes (via selected communication paths) Currently Configured Node(s) [Entry Fields] [ibmcluster] [indiaboot1] + usa uk F1=Help F5=Reset F9=Shell F2=Refresh F6=Command F10=Exit F3=Cancel F7=Edit Enter=Do F4=List F8=Image © Copyright IBM Corporation 2008 Figure 7-13. • Cluster Name SMIT fills this field in based on the previous value. 1998. Add node: Standard path AU548. The india node won’t become an existing cluster node until we synchronize our changes in a few pages. . • New Nodes The new nodes are specified by giving the IP label or IP address of one currently active network interface on each node. Press Enter AFTER making all desired changes.0 Notes: Add node: Standard path This operation and any other SMIT HACMP operations must be performed from an existing cluster node. Leave as is or change. they should be space © Copyright IBM Corp. It appears in log files and the output of commands.0 Instructor Guide Uempty Add node: Standard path Configure Nodes to an HACMP Cluster (standard) Type or select values in entry fields. 1998. Obviously. .Instructor Guide separated. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. The command launched by this SMIT screen contacts the clcomd at each address and asks them to come together in a new cluster. This path will be taken to initiate communication with the node. 7-30 HACMP Implementation © Copyright IBM Corp. HACMP must already be installed on the new nodes. 0 Instructor Guide Uempty Instructor notes: Purpose — Show how to create a cluster. . 
Basic HACMP administration 7-31 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.V4. © Copyright IBM Corp. 1998. Additional information — Transition statement — Let’s see what we get when we press the Enter key. Details — Explain that the nodes are specified by giving IP labels or addresses. 2008 Unit 7. 0 Notes: Add node: Standard path (in progress) When the Enter key is pressed on the previous SMIT screen.. HACMP’s automatic discovery process begins. This could take a few F1=Help F8=Image n=Find Next F2=Refresh F9=Shell F3=Cancel F10=Exit F6=Command /=Find © Copyright IBM Corporation 2008 Figure 7-14. Hostname is india. Adding it to the configuration with Nodename india. Discovering IP Network Connectivity Retrieving data from available cluster nodes. When the nodes have been identified. [TOP] Communication path indiaboot1 discovered a new node.Instructor Guide Add node: Standard path (in progress) Here is the output shortly after pressing Enter: COMMAND STATUS Command: OK stdout: yes stderr: no Before command completion. 1998. the discovery process retrieves the network and disk configuration information from each of the cluster nodes and builds a description of the new cluster. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. The network configuration information is used to create the initial IP network configuration.. additional instructions may appear below. 7-32 HACMP Implementation © Copyright IBM Corp. The remainder of the output from this SMIT operation isn’t particularly interesting (unless something goes wrong). so we’ll just ignore it for now.. Add node: Standard path (in progress) AU548. You will get an opportunity to add a node in the lab exercises. minutes. . Basic HACMP administration 7-33 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 

Instructor notes:
Purpose: Show the initial output from the discovery process.
Details: This output is interesting because it shows how HACMP uses the IP addresses or labels specified to discover the host names of the nodes that are to be cluster nodes.
Additional information: The remaining output is just barely human readable and is probably best left undiscussed at this point in time.
Transition statement: Let's take a brief look at the Extended Path.

Add node: Extended path

               Add a Node to the HACMP Cluster

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                           [Entry Fields]
  * Node Name                              [india]
    Communication Path to Node             [indiaboot1]   +

  F1=Help     F2=Refresh    F3=Cancel    F4=List
  F5=Reset    F6=Command    F7=Edit      F8=Image
  F9=Shell    F10=Exit      Enter=Do

Note: In addition to this, the adapters must be added. See Student notes below for details.

Figure 7-15. Add node: Extended path AU548.0

Notes:

Add node: Extended path
The Extended Path is essentially the same as the Standard Path in this case. Be aware that at this point you've only configured the node definition. You must also configure the adapter definitions (boot adapter definitions). To do this you use the Extended path, Extended Topology, Communications Interfaces/Devices.

Instructor notes:
Purpose: Show the extended configuration path's screen for adding nodes to a cluster.
Details:
Additional information:
Transition statement: Next, we need to add a pair of non-IP networks to create a ring of non-IP networks for the three nodes.

Define the non-IP rs232 networks (1 of 2)

You have added (and tested) a fully wired rs232 null modem cable between india's tty1 and usa's tty2, so now, define that as a non-IP rs232 network.

          Configure HACMP Communication Interfaces/Devices

  +--------------------------------------------------------------------------+
  |  Select Point-to-Point Pair of Discovered Communication Devices to Add   |
  |                                                                          |
  | Move cursor to desired item and press F7.                                |
  |     ONE OR MORE items can be selected.                                   |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   # Node      Device    Device Path    Pvid                              |
  |     usa       tty0      /dev/tty0                                        |
  |     uk        tty0      /dev/tty0                                        |
  |     india     tty0      /dev/tty0                                        |
  |     usa       tty1      /dev/tty1                                        |
  |     uk        tty1      /dev/tty1                                        |
  | >   india     tty1      /dev/tty1                                        |
  | >   usa       tty2      /dev/tty2                                        |
  |     uk        tty2      /dev/tty2                                        |
  |     india     tty2      /dev/tty2                                        |
  |                                                                          |
  | F1=Help       F2=Refresh    F3=Cancel                                    |
  | F7=Select     F8=Image      F10=Exit                                     |
  | Enter=Do      /=Find        n=Find Next                                  |
  +--------------------------------------------------------------------------+

Figure 7-16. Define the non-IP rs232 networks (1 of 2) AU548.0

Notes:

Introduction
This visual, and the next one, show how to add two more non-IP networks to our cluster. In the following notes, we discuss why we need to add two more non-IP rs-232 links.

Note that if you are using heartbeat on disk the same two steps are required. There must be a unique disk shared between india and usa, and india and uk to define the two heartbeat on disk networks (one between india and usa, the other between india and uk). You can't use an hdisk on one node for a heartbeat on disk network with two different nodes.
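As a rough illustration of the ring of non-IP networks that this scenario builds, the following Python sketch checks two things for a set of point-to-point links: whether they form a single ring through every node, and whether the surviving nodes remain connected by non-IP links when any one node fails. This is not an HACMP tool; the function names are made up for this example, and only the node names come from the scenario.

```python
def neighbors(links):
    """Build an adjacency map from a list of point-to-point links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def reachable(start, adj, excluded=frozenset()):
    """Return every node reachable from start without passing through excluded nodes."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for nxt in adj.get(cur, ()):
            if nxt not in seen and nxt not in excluded:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_ring(nodes, links):
    """True if every node has exactly two neighbors and all nodes are connected."""
    adj = neighbors(links)
    return (len(nodes) >= 3
            and all(len(adj.get(n, ())) == 2 for n in nodes)
            and reachable(nodes[0], adj) == set(nodes))


def survives_single_node_loss(nodes, links):
    """True if the remaining nodes stay connected whichever single node fails."""
    adj = neighbors(links)
    for failed in nodes:
        rest = [n for n in nodes if n != failed]
        if reachable(rest[0], adj, excluded={failed}) != set(rest):
            return False
    return True


nodes = ["usa", "uk", "india"]
# Existing usa-uk link plus the two links added in this scenario.
ring_links = [("usa", "uk"), ("india", "usa"), ("uk", "india")]
# Hypothetical incomplete setup: the uk-india link was never added.
partial_links = [("usa", "uk"), ("india", "usa")]

print(is_ring(nodes, ring_links), survives_single_node_loss(nodes, ring_links))
print(is_ring(nodes, partial_links), survives_single_node_loss(nodes, partial_links))
```

With all three links defined, both checks pass. With the uk-india link missing, both fail, because losing usa would leave uk and india with no non-IP path between them.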
Make sure that the topology of the non-IP networks that you describe to HACMP corresponds to the actual topology of the physical rs232 cables.

Minimum non-IP network configuration: ring
At minimum, the non-IP networks in a cluster with more than two nodes should form a ring encompassing all the nodes, that is, each node is connected to its two directly adjacent neighbors. A ring provides redundancy (two non-IP heartbeat paths for every node) and is simple to implement.

Mesh configuration
The most redundant configuration would be a mesh, each node connected to every other node. However, if you have more than three nodes, this means extra complexity and can mean a lot of extra hardware, depending on which type of non-IP network you are using.
Note: For a three node cluster, a ring and a mesh are the same.

Star configuration not recommended
While the HACMP for AIX Planning and Installation Guide discusses using a star, ring or mesh configuration for non-IP networks, a star is not a good choice. A star means that the center node is a SPOF for the non-IP networks; losing the center node means that all the other nodes lose non-IP network connectivity.

Three-node example
In the example in the visual, we already have a non-IP network between usa and uk, so we need to configure one between india and usa (on this page) and another one between uk and india (on the next page). If, for example, we left out the uk and india non-IP network, then the loss of the usa node would leave the uk and india nodes without a non-IP path between them.

Five-node example
In even larger clusters, it is still necessary to configure only a ring of non-IP networks. For example, if the nodes are A, B, C, D, and E, then five non-IP networks would be the minimum requirement: A to B, B to C, C to D, D to E, and E to A being one possibility. Of course, other possibilities exist, such as A to B, B to D, D to C, C to E, and E to A.

Instructor notes:
Purpose: Show the pair of tty ports to use for the first non-IP rs232 network.
Details: Make it clear that they must have a ring of non-IP networks encompassing all nodes.
Additional information: Some students might suggest various star-shaped non-IP network configurations. It really is simplest to configure the nodes in a ring of non-IP networks. If they have extra technology available to them, they can configure the cluster in two rings.
Transition statement: Next, we need to define the uk-india non-IP network.

Define the non-IP rs232 networks (2 of 2)

You have also added (and tested) a fully wired rs232 null-modem cable between uk's tty2 and india's tty2, so now, define that as a non-IP rs232 network.

          Configure HACMP Communication Interfaces/Devices

  +--------------------------------------------------------------------------+
  |  Select Point-to-Point Pair of Discovered Communication Devices to Add   |
  |                                                                          |
  | Move cursor to desired item and press F7.                                |
  |     ONE OR MORE items can be selected.                                   |
  | Press Enter AFTER making all selections.                                 |
  |                                                                          |
  |   # Node      Device    Device Path    Pvid                              |
  |     usa       tty0      /dev/tty0                                        |
  |     uk        tty0      /dev/tty0                                        |
  |     india     tty0      /dev/tty0                                        |
  |     usa       tty1      /dev/tty1                                        |
  |     uk        tty1      /dev/tty1                                        |
  |     india     tty1      /dev/tty1                                        |
  |     usa       tty2      /dev/tty2                                        |
  | >   uk        tty2      /dev/tty2                                        |
  | >   india     tty2      /dev/tty2                                        |
  |                                                                          |
  | F1=Help       F2=Refresh    F3=Cancel                                    |
  | F7=Select     F8=Image      F10=Exit                                     |
  | Enter=Do      /=Find        n=Find Next                                  |
  +--------------------------------------------------------------------------+

Figure 7-17. Define the non-IP rs232 networks (2 of 2) AU548.0

Notes:

Define non-IP networks
Make sure that the topology of the non-IP networks that you describe to HACMP corresponds to the actual topology of the physical rs232 cables.

Instructor notes:
Purpose: Show the screen used to define the last of the non-IP rs232 networks.
Details:
Additional information:
Transition statement: And then we synchronize the changes.

Synchronize your changes

             HACMP Verification and Synchronization

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
  * Verify, Synchronize or Both                           [Both]       +
  * Automatically correct errors found during
    verification?                                         [No]         +
  * Force synchronization if verification fails?          [No]         +
  * Verify changes only?                                  [No]         +
  * Logging                                               [Standard]   +

  F1=Help     F2=Refresh    F3=Cancel    F4=List
  F5=Reset    F6=Command    F7=Edit      F8=Image
  F9=Shell    F10=Exit      Enter=Do

Figure 7-18. Synchronize your changes AU548.0

Notes:

Synchronize
At this point, all this configuration exists only on the node where the data was entered. To populate the other nodes' HACMP ODMs, you must synchronize. When we've synchronized our changes, the india node is an official member of the cluster.

Instructor notes:
Purpose: Show the synchronizing of the first set of changes in this scenario.
Details:
Additional information:
Transition statement: Now, we can start Cluster Services on the new node.

Start Cluster Services on the new node

  # smitty clstart

                       Start Cluster Services

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
  * Start now, on system restart or both                  now             +
    Start Cluster Services on these nodes                 [india]         +
    Manage Resource Groups                                Automatically   +
    BROADCAST message at startup?                         true            +
    Startup Cluster Information Daemon?                   false           +
    Ignore verification errors?                           false           +
    Automatically correct errors found during
    cluster start?                                        Interactively   +

  F1=Help     F2=Refresh    F3=Cancel    F4=List
  F5=Reset    F6=Command    F7=Edit      F8=Image
  F9=Shell    F10=Exit      Enter=Do

Figure 7-19. Start Cluster Services on the new node AU548.0

Notes:

Start Cluster Services on the new node
Now that india is an official member of the cluster, we can start Cluster Services on the node. This and all future SMIT HACMP operations can be performed from any of the three cluster nodes.

Instructor notes:
Purpose: Show the task of starting Cluster Services on the new node.
Details:
Additional information:
Transition statement: Now, we need to add the new india node to the ywebgroup and xwebgroup resource groups.

Add the node to a resource group

                    Change/Show a Resource Group

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
    Resource Group Name                                   ywebgroup
    New Resource Group Name                               []
    Participating Node Names (Default Node Priority)      [uk usa india]            +
    Startup Policy                                        Online On Home Node On>   +
    Fallover Policy                                       Fallover To Next Prior>   +
    Fallback Policy                                       Fallback To Higher Pri>   +

  F1=Help     F2=Refresh    F3=Cancel    F4=List
  F5=Reset    F6=Command    F7=Edit      F8=Image
  F9=Shell    F10=Exit      Enter=Do

Remember to synchronize and verify that the non-IP network is active.

Figure 7-20. Add the node to a resource group AU548.0

Notes:

Add the node to a resource group
Remember that adding the new india node to the HACMP configuration is the easy part. You would not perform any of the SMIT HACMP operations shown so far in this scenario until you were CERTAIN that the india node was actually capable of running the application.

Instructor notes:
Purpose: Show the adding of the new india node to the ywebgroup resource group.
Details: Emphasize that the non-HACMP work of getting india ready to run the xwebserver and ywebserver applications must be completed before this step is performed. Also be sure to mention that this requires a synchronization.
Additional information:
Transition statement: Unfortunately, it seems that there is a problem.

Shrinking the cluster

The Auditors are not impressed with the latest investment and force the removal of the india node from the cluster so that it can be transferred to a new project (some users suspect that political considerations might have been involved).

[Diagram: nodes usa, uk, and india with applications X, Y, and Z]

Figure 7-21. Shrinking the cluster AU548.0

Notes:

Removing a node
In this scenario, we take a look at how to remove a node from an HACMP cluster.

Instructor notes:
Purpose: Set the stage for the next scenario: removing a node from a cluster.
Details:
Additional information:
Transition statement: So, how exactly do you remove a node from a cluster?

Removing a cluster node

1. Using any cluster node, move resource groups to other nodes
2. Remove the departing node from all resource groups and synchronize your changes
   - Ensure that each resource group is left with at least two nodes
3. Stop Cluster Services on the departing node
4. Using one of the cluster nodes that is not being removed:
   - Remove the departing node from the cluster's topology
     • Remove a Node from the HACMP Cluster (Extended Configuration)
   - Synchronize
     • When the synchronization is completed successfully, the departing node is no longer a member of the cluster
5. Remove the departed node's IP addresses from /usr/es/sbin/cluster/etc/rhosts on the remaining nodes
   - Prevents departed node from interfering with HACMP on remaining nodes
6. Physically disconnect the (correct) rs232 cables, if necessary
7. Disconnect the departing node from the shared storage subsystem
   - Strongly recommended because it makes it impossible for the departed node to corrupt the cluster's shared storage
8. Run through your (updated) test plan

Figure 7-22. Removing a cluster node AU548.0

Notes:

Removing a node
Although removing a node from a cluster is another fairly involved process, some of the work has little, if anything, to do with HACMP.

Instructor notes:
Purpose: Show the process involved in removing a node from a cluster.
Details: Point out that this is basically the reverse of the procedure for adding a node to the cluster.
Additional information: Removing the dearly departed node from the /usr/es/sbin/cluster/etc/rhosts file is very important because this is what prevents the departed node from being used to fiddle with the cluster's configuration in the future!
Transition statement: We're not going to show you these screens because there is really not much that is new or particularly interesting in them. On the other hand, the zwebgroup resource group has become an issue. It looks like this imaginary organization could do with a bit more long range planning.

Removing an application

The zwebserver application has been causing problems and a decision has been made to move it out of the cluster.

[Diagram: nodes usa and uk with applications X, Y, and Z]

Figure 7-23. Removing an application AU548.0

Notes:

Removing an application
In this scenario, we remove a resource group.
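Returning briefly to step 2 of the node-removal procedure above ("Ensure that each resource group is left with at least two nodes"), here is a minimal Python sketch of that check, done on paper before touching the cluster. The function name is invented for this example. The ywebgroup node list is the one shown in Figure 7-20, while the node lists for xwebgroup and zwebgroup are assumptions made only for illustration.

```python
def groups_left_short(resource_groups, departing_node, minimum=2):
    """Return the resource groups that would keep fewer than `minimum` nodes."""
    short = {}
    for name, nodes in resource_groups.items():
        remaining = [n for n in nodes if n != departing_node]
        if len(remaining) < minimum:
            short[name] = remaining
    return short


groups = {
    "ywebgroup": ["uk", "usa", "india"],   # as shown in Figure 7-20
    "xwebgroup": ["usa", "uk", "india"],   # assumed node list
    "zwebgroup": ["uk", "usa"],            # assumed node list
}

# An empty result means every group still has at least two nodes.
print(groups_left_short(groups, "india"))
```

If the result is not empty, the listed resource groups need another participating node before the departing node is removed.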

Instructor notes:
Purpose: Set the stage for the next scenario: removing a resource group.
Details:
Additional information:
Transition statement: Let's see what it takes to remove a resource group.

Removing a resource group (1 of 3)

1. Take the resource group offline
2. OPTIONAL: Take a cluster snapshot
3. Using any cluster node and either configuration path:
   - Remove the departing resource group using the Remove a Resource Group SMIT screen
   - Remove any service IP labels previously used by the departing resource group using the Remove Service IP Labels/Addresses SMIT screen
   - Synchronize your changes
4. Clean out anything that is no longer needed by the cluster:
   - Export any shared volume groups previously used by the application
   - Consider deleting service IP labels from the /etc/hosts file
   - Uninstall the application
5. Run through your (updated) test plan

Figure 7-24. Removing a resource group (1 of 3) AU548.0

Notes:

Introduction
The procedure for removing a resource group is actually fairly straightforward.

Cluster snapshot
HACMP supports something called a cluster snapshot. This would be an excellent time to take a cluster snapshot, just in case we decide to go back to the old configuration. We will discuss snapshots later in this unit.

Remove unused resources
Do not underestimate the importance of removing unused resources, such as service IP labels and volume groups. They will only clutter up the cluster's configuration and, in the case of shared volume groups, tie up physical resources that could presumably be better used elsewhere. A cluster should not have any "useless" resources or components because anything that simplifies the cluster tends to improve availability by reducing the likelihood of human error.

Instructor notes:
Purpose: Show the procedure for removing a resource group.
Details: Emphasize the value of keeping the cluster free of clutter.
Additional information:
Transition statement: First, we remove the resource group.

Removing a resource group (2 of 3)

            HACMP Extended Resource Group Configuration

  Move cursor to desired item and press Enter.

    Add a Resource Group
    Change/Show a Resource Group
    Change/Show Resources and Attributes for a Resource Group
    Remove a Resource Group
    Show All Resources by Node or Resource Group

  +--------------------------------------------------------------------------+
  |                         Select a Resource Group                          |
  |                                                                          |
  | Move cursor to desired item and press Enter.                             |
  |                                                                          |
  |   xwebgroup                                                              |
  |   ywebgroup                                                              |
  |   zwebgroup                                                              |
  |                                                                          |
  | F1=Help       F2=Refresh    F3=Cancel                                    |
  | F8=Image      F10=Exit      Enter=Do                                     |
  | /=Find        n=Find Next                                                |
  +--------------------------------------------------------------------------+

Figure 7-25. Removing a resource group (2 of 3) AU548.0

Notes:

Removing a resource group
Make sure that you delete the correct resource group.

Instructor notes:
Purpose: Show the pop-up list of resource groups that displays when you request the removal of a resource group.
Details:
Additional information:
Transition statement: Just be very careful.

Removing a resource group (3 of 3)

            HACMP Extended Resource Group Configuration

  Move cursor to desired item and press Enter.

    Add a Resource Group
    Change/Show a Resource Group
    Change/Show Resources and Attributes for a Resource Group
    Remove a Resource Group
    Show All Resources by Node or Resource Group

  +--------------------------------------------------------------------------+
  |                              ARE YOU SURE?                               |
  |                                                                          |
  | Continuing may delete information you may want                           |
  | to keep. This is your last chance to stop                                |
  | before continuing.                                                       |
  |     Press Enter to continue.                                             |
  |     Press Cancel to return to the application.                           |
  |                                                                          |
  | F1=Help       F2=Refresh    F3=Cancel                                    |
  | F8=Image      F10=Exit      Enter=Do                                     |
  +--------------------------------------------------------------------------+

Press Enter (if you are sure).
Be sure to synchronize and run through validation testing.

Figure 7-26. Removing a resource group (3 of 3) AU548.0

Notes:

Are you sure?
Pause to make sure you know what you are doing. If you aren't sure, it's easy to go back and step through the process again.
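The clean-up step of the resource group removal procedure (remove service IP labels and export shared volume groups that nothing else uses) can be planned with a small inventory comparison. The Python sketch below is only an illustration: the function name is invented, and the service label and volume group names are hypothetical placeholders, not values taken from this cluster.

```python
def orphaned_resources(resource_groups, departing_group):
    """List resources used only by the departing resource group."""
    still_used = set()
    for name, resources in resource_groups.items():
        if name != departing_group:
            still_used.update(resources)
    return sorted(set(resource_groups[departing_group]) - still_used)


# Hypothetical inventory: one service IP label and one shared volume group per group.
inventory = {
    "xwebgroup": ["xweb_svc", "xwebvg"],
    "ywebgroup": ["yweb_svc", "ywebvg"],
    "zwebgroup": ["zweb_svc", "zwebvg"],
}

# Candidates for clean-up once zwebgroup has been removed.
print(orphaned_resources(inventory, "zwebgroup"))
```

Anything the function returns is a candidate for the clean-up step. A resource that another resource group still references is left out of the result and should stay in place.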

Instructor notes:
Purpose: Remind the students that this is a difficult to reverse operation.
Details: Point out that having a cluster snapshot would make this a fairly easy to reverse operation.
Additional information:
Transition statement: Let's review topic 1.

Let's review: Topic 1

1. You have decided to add a third node to your existing two-node HACMP cluster. What very important step follows adding the node definition to the cluster configuration (whether through Standard or Extended Path)?
   a. Take a well deserved break, bragging to co-workers about your success.
   b. Install HACMP software.
   c. Configure a non-IP network.
   d. Start Cluster Services on the new node.
   e. Add a resource group for the new node.
2. True or False? You cannot add a node while HACMP is running.
3. Why would you choose to use the Extended Path to add resources to a resource group versus the Standard Path?
   __________________________________________________

Figure 7-27. Let's review: Topic 1 AU548.0

Notes:

Instructor notes:
Purpose: Review questions.
Details:

Let's review: Topic 1 solutions

1. You have decided to add a third node to your existing two-node HACMP cluster. What very important step follows adding the node definition to the cluster configuration (whether through Standard or Extended Path)?
   a. Take a well deserved break, bragging to co-workers about your success.
   b. Install HACMP software.
   c. Configure a non-IP network.
   d. Start Cluster Services on the new node.
   e. Add a resource group for the new node.
2. True or False? You cannot add a node while HACMP is running.
3. Why would you choose to use the Extended Path to add resources to a resource group versus the Standard Path?
   If you need access to the fields that are not shown in the Standard Path (like for NFS or to set "Filesystems mounted before IP configured").

Additional information:
Transition statement: In the next topic, we'll look at change management in a cluster and how C-SPOC can be used to help with that.

7.2 Cluster single point of control

Instructor topic introduction
What students will do: Learn about the importance of change management in an HACMP cluster and how C-SPOC can be used for this.
How students will do it: Lecture and lab.
What students will learn: How to use C-SPOC.
How this will help students on their job: Disciplined change management is critical to a successful cluster. C-SPOC provides a powerful tool to coordinate changes across the cluster.

Cluster single point of control

After completing this topic, you should be able to:
• Discuss the need for change management when using HACMP
• Describe the benefits and capabilities of C-SPOC
• Perform routine administrative changes using C-SPOC
• Start and stop cluster services
• Perform resource group move operations

Figure 7-28. Cluster single point of control AU548.0

Notes:

Instructor notes:
Purpose: Present topic objectives.
Details: The flow of this topic is to position C-SPOC into a change management discipline, then show how change management is important to availability. And finally, spend the rest of the topic going through the SMIT C-SPOC commands.
Additional information:
Transition statement: We'll start with a discussion of change management and why having strong change management policies is so important in a cluster.

Administering a high availability cluster

Administering an HA cluster is different from administering a stand-alone server:
- Changes made to one node must be reflected on the other node
- Poorly considered changes can have far-reaching implications
  • Beware the law of unintended consequences
- Aspects of the cluster's configuration could be quite subtle and yet critical
- Scheduling downtime to install and test changes can be challenging
- Saying "oops" while sitting at a cluster console could get you fired!

Figure 7-29. Administering a high availability cluster AU548.0

Notes:

Introduction
You must develop good change management procedures for managing an HACMP cluster. As you will see, C-SPOC utilities can be used to help, but do not do the job by themselves. Having well documented and tested procedures to follow, as well as restricting who can make changes (for example, you should not have more than two or three persons with root privileges), minimizes loss of availability when making changes. The snapshot utility should be used before any change is made.

Instructor notes:
Purpose: Talk about change management discipline.
Details: Talk about how C-SPOC can minimize errors by giving you a SMIT interface to the change commands. Point out that the details of how to use C-SPOC are covered later in this topic.
Additional information:
Transition statement: Now, let's see how change management fits in to overall availability.

Recommendations

• Implement and adhere to a change control/management process
• Wherever possible, use HACMP's C-SPOC facility to make changes to the cluster (details to follow)
• Document routine operational procedures in a step-by-step list fashion (for example, shutdown, startup, increase size of a filesystem)
• Restrict access to the root password to trained High Availability cluster administrators
• Always take a snapshot (explained later) of your existing configuration before making a change

Figure 7-30. Recommendations AU548.0

Notes:

Some beginning recommendations
These recommendations are considered to be the minimum acceptable level of cluster administration. There are additional measures and issues that should probably be carefully considered (for example, problem escalation procedures should be documented, and both hardware and software support contracts should either be kept current or a procedure developed for authorizing the purchase of time and materials support during off hours should an emergency arise).

Importance of change management
A real change control or management process requires a serious commitment on the part of the entire organization:
- Every change must be carefully considered
- The onus should be on the requester of the change to demonstrate that it is necessary, not on the cluster administrators to demonstrate that it is unwise
- Management must support the process
  • Defend cluster administrators against unreasonable request or pressure
  • Do not allow politics to affect a change's priority or schedule
- Every change, even the minor ones, must follow the process
  • No system, cluster, or database administrator can be allowed to sneak changes past the process
  • The notion that a change might be permitted without following the process must be considered to be absurd

Other recommendations
Ensure that you request sufficient time during the maintenance window for testing the cluster. If this isn't possible, advise all parties of the risks of running without testing.
• As the cluster administrator you should make yourself part of every change meeting that occurs on your HACMP systems.
• Think about the implications of the change on the cluster configuration and function, keeping in mind the networking concepts we've discussed as well as any changes to the application's data organization or start/stop procedures, if anything changes.
Update any documentation as soon as possible after the change is made to reflect the new configuration or function of the cluster, if anything changes.

Instructor notes:
Purpose: List some basic recommendations for cluster administration practices.
Details: Make it clear that these are not necessarily sufficient in all situations.
Additional information:
Transition statement: Now that we have discussed change management, let’s look at C-SPOC itself and how it works.

Cluster single point of control

C-SPOC provides facilities for performing common cluster wide administration tasks from any node within the cluster.
- Relies on the clcomdES socket based subsystem for secure node-to-node communications
- C-SPOC operations might fail if any target node is down at the time of execution or selected resource is not available
- Any change to a shared VGDA is synchronized automatically if C-SPOC is used to change a shared LVM component
- C-SPOC uses a script parser called the command execution language

(Diagram: one initiating node connected to four target nodes.)

Figure 7-31. Cluster single point of control

Notes:

C-SPOC command execution
C-SPOC commands first execute on the initiating node. Then the HACMP command cl_rsh is used to propagate the command (or a similar command) to the target nodes.

More details
All the nodes in the resource group must be available or the C-SPOC command will be performed partially across the cluster, only on the active nodes. This can lead to problems later when nodes are brought up and are out of sync with the other nodes in the cluster. As you saw in the LVM unit, LVM changes, if made through C-SPOC, can be synchronized automatically (for enhanced concurrent mode volume groups, and then only the LV information, not the filesystem information).

Secure distributed communications between the nodes
The clcomdES subsystem provides secure communications between nodes. The clcomd daemon is started automatically at boot time by the init process. This daemon provides secure communication between cluster nodes for all cluster utilities, such as verification and synchronization and system management (C-SPOC).

C-SPOC command line
C-SPOC commands can be executed from the command line (or through SMIT, of course). Error messages and warnings returned by the commands are based on the underlying AIX-related commands. Appendix C: HACMP for AIX 5L Commands in the HACMP for AIX Administration Guide provides a list of all C-SPOC commands provided with the HACMP for AIX 5L software.

Command execution language
C-SPOC commands are written as execution plans in command execution language (CEL). Each plan contains constructs to handle one or more underlying AIX 5L tasks (a command, executable, or script) with a minimum of user input. An execution plan becomes a C-SPOC command when the /usr/es/sbin/cluster/utilities/celpp utility converts it into a cluster aware ksh script, meaning the script uses the C-SPOC distributed mechanism (the C-SPOC Execution Engine) to execute the underlying AIX 5L commands on cluster nodes to complete the defined tasks. CEL is a programming language that enables you to integrate dsh’s distributed functionality into each C-SPOC script the CEL preprocessor (celpp) generates. When you invoke a C-SPOC script from a single cluster node to perform an administrative task, the script is automatically executed on all nodes in the cluster. The language is described further in Appendix B of the HACMP for AIX Troubleshooting Guide.

Instructor notes:
Purpose: Give an overview of the C-SPOC facility.
Details: The idea is to describe in general what C-SPOC is and the requirements to use it.
Additional information:
Transition statement: Now that we have an idea of what C-SPOC is, let’s see how we invoke it.

The top-level C-SPOC menu

    System Management (C-SPOC)

    Move cursor to desired item and press Enter.

      Manage HACMP Services
      HACMP Communication Interface Management
      HACMP Resource Group and Application Management
      HACMP Log Viewing and Management
      HACMP File Collection Management
      HACMP Security and Users Management
      HACMP Logical Volume Management
      HACMP Concurrent Logical Volume Management
      HACMP Physical Volume Management
      Open a SMIT Session on a Node

    F1=Help     F2=Refresh    F3=Cancel    F8=Image
    F9=Shell    F10=Exit      Enter=Do

Figure 7-32. The top-level C-SPOC menu

Notes:

Top-level C-SPOC menu
The top-level C-SPOC menu is one of the four top-level HACMP menus. The fast path is smitty cl_admin. C-SPOC scripts are used for Users, LVM, CLVM, and Physical Volume Management. RGmove is used for Resource Group management. The other functions are included here as a logical place to put these system management facilities. We will look at Managing Cluster Services and the Logical Volume Management tasks.

Instructor notes:
Purpose: Show the SMIT menu to invoke the C-SPOC utilities.
Details: Just use this menu as the starting place to go through the utilities on the subsequent visuals.
Additional information:
Transition statement: The first item in the menu is managing cluster services, let’s start there.

Starting cluster services

    # smit clstart

    Start Cluster Services

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
                                                      [Entry Fields]
    * Start now, on system restart or both             now             +
      Start Cluster Services on these nodes            [usa,uk]        +
    * Manage Resource Groups                           Automatically   +
      BROADCAST message at startup?                    true            +
      Startup Cluster Information Daemon?              true            +
      Ignore verification errors?                      false           +
      Automatically correct errors found during        Interactively   +
      cluster start?

    F1=Help     F2=Refresh    F3=Cancel    F4=List
    F5=Reset    F6=Command    F7=Edit      F8=Image
    F9=Shell    F10=Exit      Enter=Do

Figure 7-33. Starting cluster services

Notes:

Briefly, how did we get here?
The first choice in the C-SPOC menu is Manage HACMP Services. This option brings up another menu containing three choices: Start Cluster Services, Stop Cluster Services, and Show Cluster Services. This menu displays when we select Start Cluster Services. Better yet, just use the fast path, smitty clstart.

Starting cluster services
We saw this in the previous unit. Now for the details.

You have the option to start Cluster Services at system boot time (adds entry to /etc/inittab), only when running through this menu (just invokes cl_rc.cluster), or both. Think carefully about starting Cluster Services at system boot time because this might result in Resource Group movement, depending on your Fallback Policies.

You have a choice of any or all nodes in the cluster to start services. Use F4 to get a pick list. If the field is left blank, services will be started on all nodes.

When Cluster Services is started, it “wants” to acquire resources in Resource Groups, and make applications available, if so configured. Beginning with HACMP V5.4, the function of managing resource groups can be deferred. The option to choose in that case is Manually. To allow Cluster Services to acquire resources and make applications available if so configured (pre-HACMP V5.4 behavior), choose the default, Automatically.

You can broadcast a message that cluster services are being started.

You have the option to start the Client Information Daemon, clinfo, along with the start of Cluster Services. This is usually a good idea as it allows you to use the clstat cluster monitor utility.

Finally, there are options regarding verification. Before Cluster Services is started, a verification is run to ensure that you are not starting a node with an inconsistent configuration. You can choose to ignore verification errors and start anyway. This is not something that you would do unless you are very aware of the reason for the verification error, you understand the ramifications of starting with the error and you must activate Cluster Services. An alternative that is safer would be to choose to Interactively correct errors found during verification. Not all errors can be corrected, but you have a better chance of getting cluster services activated in a clean configuration with this option.

The options that you choose here are retained in the HACMP ODM and repopulated on reentry.

Instructor notes:
Purpose: Show the SMIT menu to start cluster services.
Details: Focus on the option to manage resource groups.
Additional information:
Transition statement: How do I know that Cluster Services has started?

Verifying that cluster services has started

You have a few options

    usa # clstat -a
    clstat - HACMP Cluster Status Monitor
    -------------------------------------
    Cluster: ibmcluster (1156578448)      Wed Aug 30 11:16:19 2006
    State: UP      Nodes: 2      SubState: STABLE
    Node: usa      State: UP
      Interface: usaboot1 (2)        Address: 192.168.15.29    State: UP
      Interface: usaboot2 (2)        Address: 192.168.16.29    State: UP
      Interface: usa_hdisk5_01 (0)   Address: 0.0.0.0          State: UP
      Interface: xweb (2)            Address: 192.168.5.92     State: UP
      Resource Group: xwebgroup      State: On line

    usa # clcheck_server grpsvcs; print $?
    1
    Note: An rc=1 means cluster services is active

    usa # lssrc -ls clstrmgrES
    Current state: ST_STABLE

First three rules
1. patience
2. patience
3. patience

Also consider using the cldump command. This relies solely on SNMP to get the current cluster status.

Figure 7-34. Verifying that cluster services has started

Notes:

The “Three rules”
Patience is key with HACMP tasks. There are many things going on under the covers when you “ask” the Cluster Manager to do something. Getting the “OK” in SMIT does not mean that the task has been completely performed. It’s just the beginning in many cases. Did I mention patience?

The Cluster Manager daemon queues events. So keep in mind that if you launch a task with the Cluster Manager and don’t verify its status closely and then attempt to give the process a boost by launching another task (such as following an rgmove with an offline), you have just queued the second task. It doesn’t forget (usually anyway). When the Cluster Manager completes the first task, it will perform the second task, provided that it’s in a state where it can continue processing. This might not be what you wanted.

What to look at, what to look for
Know what state to expect. Although you might find the output to be unreliable at times, the clstat utility is a good mechanism to use. If you’re not a fan of clstat, consider using cldump, which relies on SNMP directly.

Documentation for HACMP V5.3 indicated that the clcheck_server utility was to be used given that the Cluster Manager daemon was a long running process. This method still works. Run it with grpsvcs as the only parameter and then look at the return code. A return code of 1 indicates that the Cluster Manager is a member of a group services group, which implies Cluster Services are active.

Another option is to use lssrc. A state of ST_STABLE is a tricky indication. It might mean that Cluster Services are active or it might mean that Cluster Services was forced down on this node.

Finally, although not shown (due to lack of space on the visual), another option is to use WebSMIT. This is the solution for those of you who want to see a graphical representation of cluster status. You will see more of that later in this unit.
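The lssrc check and the advice to retry until the state stops changing can be scripted. The following is a minimal sketch, not an HACMP-supplied tool: `cm_state` and `wait_for_state` are hypothetical helper names, `LSSRC_CMD` is an illustrative override variable, and the sketch assumes only that the output of `lssrc -ls clstrmgrES` contains a line of the form `Current state: ST_STABLE`, as shown on the visual.

```shell
# Hypothetical helpers (not part of HACMP); a sketch under the
# assumptions stated above.

# cm_state: read `lssrc -ls clstrmgrES` output on stdin and print the
# value that follows "Current state:" (for example ST_STABLE).
cm_state() {
    sed -n 's/^[[:space:]]*Current state:[[:space:]]*\([A-Za-z_]*\).*/\1/p' | head -n 1
}

# wait_for_state EXPECTED [TRIES] [DELAY_SECONDS]
# Poll until the Cluster Manager reports EXPECTED on two consecutive
# checks, so that a state that is still changing is not mistaken for
# the final state. Returns 0 on success, 1 when TRIES checks are used up.
# LSSRC_CMD (assumption) defaults to the command shown on the visual.
wait_for_state() {
    expected=$1
    tries=${2:-30}
    delay=${3:-10}
    seen=0
    while [ "$tries" -gt 0 ]; do
        state=$(${LSSRC_CMD:-lssrc -ls clstrmgrES} 2>/dev/null | cm_state)
        if [ "$state" = "$expected" ]; then
            seen=$((seen + 1))
            [ "$seen" -ge 2 ] && return 0
        else
            seen=0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

After starting cluster services, something like `wait_for_state ST_STABLE` could be used before launching the next task. Keep in mind that ST_STABLE by itself does not distinguish an active node from a forced down node, so this kind of check complements the other methods rather than replacing them.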
Pay close attention to the “Forced down nodes list:” portion of the output of lssrc -ls clstrmgrES. This is to be used with caution. You must understand what state is expected and then be patient, retrying the command to ensure that the state changes are no longer occurring.

It’s easy to encourage patience when writing a course. The author is extremely impatient and rarely follows his own advice. That doesn’t make it right! I have learned the value of patience the hard way, by not being patient and paying the price.

Instructor notes:
Purpose: Explain what is to be checked to verify that cluster services has started when no Unmanaged resource groups are involved.
Details: There are quite a few ways. These are my favorites. This is trickier now that you can have Unmanaged resource groups. Clstat will show the node up (along with the resources) but will show the resource group as unmanaged.
Additional information: Pick the way you like best, add your own favorite if it isn’t here. You might defer some or all of the questions about Unmanaged resources until the visual that’s coming up on stopping Cluster Services with Unmanaged resource groups.
Transition statement: Beyond knowing that services have started, how do you know what happened during or as a result of starting services?

Checking on what actually happened

Logs containing what was done as Cluster Services start
- cluster.log
  • Log of events that have been processed, good starting point
- hacmp.out (popular place to consult, usually a good idea)
  • Events write here (essentially the result of set -x in event scripts)
  • Very detailed, but contains everything event related
- hacmp.out formatting helps with navigation/understanding
  • tagging of log lines includes resource group, resource, invoked function, script name, and in some cases elapsed time
  • Example:

    +rg1:cl_activate_fs[278] ALLFS=All_filesystems
    +rg1:cl_activate_fs[279] [[ '' == EMUL ]]
    +rg1:cl_activate_fs[284] cl_RMupdate resource_acquiring All_filesystems cl_activate_fs
    Reference string: Mon.Sep.17.14:45:54.CDT.2007.cl_activate_fs.All_filesystems.rg1.ref
    +rg1:cl_activate_fs(2.980)[287] typeset PS4_TIMER
    +rg1:cl_activate_fs(2.980)[287] PS4_TIMER=true
    +rg1:cl_activate_fs(2.980):/fs01[290] PS4_LOOP=/fs01
    +rg1:cl_activate_fs(2.980):/fs01[291] [[ sequential == parallel ]]
    +rg1:cl_activate_fs(2.980):/fs01[305] [[ '' == EMUL ]]
    +rg1:cl_activate_fs(2.980):/fs01[310] fs_mount /fs01 fsck rg1_activate_fs.tmp27018
    +rg1:cl_activate_fs(2.980):/fs01[fs_mount+5] FS=/fs01
    +rg1:cl_activate_fs(2.980):/fs01[fs_mount+5] typeset FS

Figure 7-35. Checking on what actually happened

Notes:

Base Cluster Logs
The cluster.log file is a good starting point to see what events have been run. It can be said that looking at the hacmp.out file is as much art as it is science. The more you become comfortable with what you expect to see, the easier it will be to navigate. As you see, the format of the entries helps you to understand what is being done, on what resource and how long it’s been running. You can also see errors and timestamps to help in navigating the hacmp.out log file.

More detailed log
You might also want to consult the clstrmgr.debug log. This is the Cluster Manager daemon log. It can be difficult to understand as it’s very detailed internal processing, but error messages found here might be useful, as well as an understanding of whether the Cluster Manager is busy doing something even when no event processing is occurring.

Instructor notes:
Purpose: Introduce logs that can be checked to see what happened at Cluster Services start.
Details: Don’t go too deep but mention that these are the basic logs to be consulted for HACMP processing.
Additional information:
Transition statement: What about stopping Cluster Services?

Stopping cluster services

    # smit clstop

    Stop Cluster Services

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
                                                   [Entry Fields]
    * Stop now, on system restart or both           now                      +
      Stop Cluster Services on these nodes          [usa]                    +
      BROADCAST cluster shutdown?                   true                     +
    * Select an Action on Resource Groups           Bring Resource Groups>   +

    +------------------------------------------------+
    | Shutdown mode                                  |
    |                                                |
    | Move cursor to desired item and press Enter.   |
    |                                                |
    |   Bring Resource Groups Offline                |
    |   Move Resource Groups                         |
    |   Unmanage Resource Groups                     |
    |                                                |
    | F1=Help     F2=Refresh    F3=Cancel            |
    | F8=Image    F10=Exit      Enter=Do             |
    | /=Find      n=Find Next                        |
    +------------------------------------------------+

Figure 7-36. Stopping cluster services

Notes:

Briefly, how did we get here?
From the Manage HACMP Services C-SPOC menu, this menu displays when we choose Stop Cluster Services. You can use the fast path, smitty clstop.

Stopping cluster services
Remember that this is not stopping the Cluster Manager daemon. It runs all the time. Actually, when you stop Cluster Services, the Cluster Manager daemon dies gracefully and is respawned by the System Resource Controller.

You have the option to stop cluster services when you run through this menu or remove the option to start cluster services at system start (removes entry from /etc/inittab), or both. Note that the system start option is a reversal of the setting made for system start when starting cluster services.

You have a choice of any or all nodes in the cluster to stop services. Use F4 to get a pick list. If the field is left blank, services will be stopped on all nodes.

You can broadcast a message that cluster services are being stopped.

Finally, there are the options regarding Resource Group management. Prior to HACMP V5.4, the options were graceful, takeover and, in some scenarios, forced. As you can see, these options map directly to the current options and their functions are self-explanatory. Graceful meant to Bring Resource Groups Offline prior to stopping cluster services. Takeover meant to Move Resource Groups to other available nodes, if applicable, according to the current locations and Fallover policies of the Resource Groups.

But what about forced down you say?
Prior to HACMP V5.4, forcing down Cluster Services was supported sometimes, and resulted in an environment that was potentially unstable (that is, potentially unavailable). Forcing cluster services down when using Enhanced Concurrent Mode Volume Groups was not supported because Group Services and gsclvmd were brought down as part of the forced down operation. Group Services and gsclvmd are the components that maintain the volume group’s VGDA/VGSA integrity across all nodes.

With HACMP V5.4 and later, forcing down cluster services is supported by moving the resource groups to an Unmanaged state, thus the option in the menu shown, Unmanage Resource Groups. While in this state, the cluster manager remains in the ST_STABLE state. It doesn’t die gracefully and respawn as stated earlier and doesn’t return to the ST_INIT state. This allows the Cluster Manager to participate in cluster activities and keep track of changes that occur in the cluster. In addition, the cluster manager and the RSCT infrastructure remain active, permitting this action with Enhanced Concurrent Mode Volume Groups.

As with starting cluster services, the options that you choose here are retained in the HACMP ODM and repopulated on reentry.

Instructor notes:
Purpose: Show the SMIT menu to stop cluster services.
Details: Cover the resource group management options thoroughly, making sure that students understand the correlation to past options and the functions that are performed. Take time to ensure that the Unmanaged state is understood. Having a reliable forced down option will probably be welcome by many students who are veterans of HACMP, so be prepared to answer questions and take some time.
Additional information:
Transition statement: But how do I know that cluster services has stopped?
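The correspondence between the pre-V5.4 shutdown modes and the current menu options, and the Cluster Manager state to expect after each one, can be written down as a small lookup. This is only an illustrative sketch: `current_stop_option` and `expected_cm_state` are hypothetical helper names, not HACMP commands, and the mapping is simply the one described in the notes for stopping cluster services.

```shell
# Hypothetical helpers; names are illustrative, not HACMP commands.

# current_stop_option LEGACY_MODE
# Map a pre-V5.4 shutdown mode to the V5.4 menu option it corresponds to.
current_stop_option() {
    case $1 in
        graceful) echo "Bring Resource Groups Offline" ;;
        takeover) echo "Move Resource Groups" ;;
        forced)   echo "Unmanage Resource Groups" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# expected_cm_state OPTION
# Print the Cluster Manager state to expect on the stopped node once
# the stop has completed: the daemon is respawned (ST_INIT) unless the
# resource groups were left Unmanaged (ST_STABLE).
expected_cm_state() {
    case $1 in
        "Bring Resource Groups Offline"|"Move Resource Groups") echo ST_INIT ;;
        "Unmanage Resource Groups") echo ST_STABLE ;;
        *) echo "unknown option: $1" >&2; return 1 ;;
    esac
}
```

For example, `expected_cm_state "$(current_stop_option forced)"` prints the state to look for when verifying a stop that leaves the resource groups Unmanaged.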
2008 Unit 7.rc : Normal termination of clstrmgrES.HACMP Cluster Status Monitor ------------------------------------Cluster: ibmcluster (1156578448) Wed Aug 30 10:44:20 2006 State: UP Nodes: 2 SubState: STABLE usa # lssrc -ls clstrmgrES Current state: ST_INIT Same three rules State: DOWN Address: State: DOWN Node: usa Interface: usaboot1 (2) 192. 0513-059 The clstrmgrES Subsystem has been started.0 Notes: Stopping cluster services without going to unmanaged This means you’ve chosen to stop cluster services either with the Bring Resource Groups Offline or Move Resource Groups option. 1998. . In other words. Subsystem PID is 483466. patience 2. patience 3.168.exhale our dying breath and count on the good graces of SRC to reincarnate us! © Copyright IBM Corporation 2008 Figure 7-37.1 clexit. uk # clstat -a clstat .16.0 Instructor Guide Uempty Verifying that cluster services has stopped (1 of 2) You have a few options – stopping without Unmanaged RGs usa # tail -2 hacmp. Basic HACMP administration 7-87 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Did I mention patience? © Copyright IBM Corp. Getting the “OK” in SMIT does not mean that the task has been completely performed.debug. remember that patience is essential. it’s not a forced down. Many tasks are performed behind the scenes when you “ask” the Cluster Manager to do something. Restart now. It’s just the beginning in many cases.29 Interface: usaboot2 (2) 192. stopping Cluster Services with Unmanaged Resource Groups leaves the Cluster Manager daemon in ST_STABLE. stopping Cluster Services results in the Cluster Manager daemon being respawned by the System Resource Controller. Otherwise. As you will see in the next visual. This is the resulting state from a respawn of the Cluster Manager daemon. 0513-059 The clstrmgrES Subsystem has been started. 1998. that is. Subsystem PID is nnnnnn. it might be necessary to view the cycled log. 
What to look at, what to look for

First, a comment on the log file locations. If this is a new HACMP 5.4.1 (the one that ends in “.1”) or later install, the logs will be in /var/hacmp/log. Otherwise, they will be in /tmp.

The surest way to verify that Cluster Services has stopped completely is the following message in hacmp.out, indicating that Cluster Services has stopped and the Cluster Manager daemon has been respawned:

   clexit.rc: Normal termination of clstrmgrES. Restart now.

Another option is to use lssrc. A state of ST_INIT is the indication that Cluster Services has stopped on this node. This is to be used with caution. Know what state to expect. You must understand what state is expected and then be patient, retrying the command to ensure that the state changes are no longer occurring.

In addition, as stated previously, the clstat utility is a good mechanism to use, although you might find the output to be unreliable at times. Note that it was run on another system, not the one where cluster services was stopped. If you’re not a fan of clstat, consider using cldump, which relies on SNMP directly.

Finally, although not shown (due to lack of space on the visual), another option is to use WebSMIT. This is the solution for those of you who want to see a graphical representation of cluster status. You will see more of that later in this unit.

Instructor notes:
Purpose: Explain what is to be checked to verify that cluster services has stopped when no Unmanaged resource groups are involved.
Details: There are quite a few ways. These are my favorites; add your own favorite if it isn’t here. One of the best is the entry in the hacmp.out file. The message does provide some comic relief. The log file locations should be mentioned here, because it could be /tmp or /var/hacmp/log. It depends on how the HACMP 5.4.1 (or later) installation was done. Mention the clstrmgr.debug option lightly. Mention too that WebSMIT, which will be discussed later, can be used.
Additional information: Pick the way you like best.
Transition statement: But you’ve stopped cluster services with Unmanaged resource groups.

Verifying that cluster services has stopped (2 of 2)

You have a few options: stopping with Unmanaged RGs

   usa # clRGinfo
   Group Name     Group State     Node
   xwebgroup      UNMANAGED       usa
                  UNMANAGED       uk

   usa # lssrc -ls clstrmgrES
   Current state: ST_STABLE
   …
   Forced down node list: usa

   uk # clstat -a
   clstat - HACMP Cluster Status Monitor
   Cluster: ibmcluster (1156578448)    Wed Aug 30 11:16:19 2006
   State: UP          Nodes: 2
   SubState: STABLE
   Node: usa          State: UP
      Interface: usaboot1 (2)      Address: 192.168.15.29    State: UP
      …
      Interface: xweb (2)          Address: 192.168.5.92     State: UP
      Resource Group: xwebgroup    State: Unmanaged

   Ditto on the rules

Figure 7-38. Verifying that cluster services has stopped (2 of 2) AU548.0

Notes:

Stopping cluster services with unmanaged resource groups
This means you’ve chosen to force down cluster services. One more time, remember that patience is essential. Did I mention that getting the “OK” in SMIT does not mean that the task has been completely performed? It’s just the beginning in many cases. Did I mention patience?

What to look at, what to look for
In the case of Unmanaged resource groups, stopping Cluster Services does not result in the Cluster Manager daemon dying gracefully and being respawned by the System Resource Controller. The Cluster Manager daemon stays up and should remain in the ST_STABLE state. But using lssrc -ls clstrmgrES can be useful in determining which nodes have been forced down, as it provides a list as shown on the visual.

The quickest way to see that there are unmanaged resources is to use clRGinfo. Note that it shows the state of the resource group as Unmanaged on both nodes; it’s not online anywhere (according to the cluster manager). In fact, it will show Unmanaged on any node where that resource group can be acquired if this is not an “online on all nodes” startup policy resource group. It will show Unmanaged only on the node where Cluster Services was stopped if the resource group startup policy is “online on all nodes.” The truest way to see Unmanaged is via clRGinfo -p.

As in the previous slides on verifying the state, the clstat utility can be a good mechanism to use. Note that it was run on another system, not the one where cluster services was stopped. Notice that the resource group shows online, but shows that it’s offline to the cluster manager. It also shows the state as Unmanaged.

Again, another option is to use WebSMIT. This is the solution for those of you who want to see a graphical representation of cluster status. You will see more of that later in this unit.

How do I get a resource group out of the unmanaged state?
You might be tempted to change the resource group to the offline state and move it to another node. This doesn’t work and is very dangerous because it leaves the application running on the original node. You only stopped Cluster Services, not the resources. You can’t move the resource group to another node. Activating it on another node would be very bad as both nodes would attempt to access the storage and would have the IP address defined.

The best option (and really only option) is to restart cluster services on the forced node, specifying Automatically for the Manage Resource Groups option. Understand that this will cause the application server start script to be run again, unless an Application Monitor is configured for the application that indicates the application is currently running. In the case where the Application Monitor detects the running application, the application server start script is not invoked.

A similar option is to start Cluster Services on the forced node, but specify Manually for the Manage Resource Groups option. This is valid. Then use C-SPOC to bring the resource group online at your discretion. The same warning applies about a respawn of the application server start script in this scenario.

Instructor notes:
Purpose: Explain what is to be checked to verify that cluster services has stopped when Unmanaged resource groups are involved.
Details: Again take time because of the way this behaves. Pay close attention to the fact that clstat shows the node and resources up with the state of the resource group set to Unmanaged and that the state of the Cluster Manager daemon is still ST_STABLE. For those familiar with HACMP prior to V5.4, this was an indication that Cluster Services were active. That is no longer true.
Additional information:
Transition statement: Let’s look at the most important of the C-SPOC menus, those for LVM management.
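The advice to know the expected state and retry patiently can be sketched as a small polling helper. This is only an illustration, assuming the "Current state:" and "Forced down node list:" lines look like the ones on the visual; the function names and the STATE_CMD hook are hypothetical and not part of HACMP.

```shell
#!/bin/sh
# Print the Cluster Manager state (for example ST_INIT or ST_STABLE)
# from "lssrc -ls clstrmgrES" output supplied on standard input.
cluster_state() {
    awk -F': *' '/^Current state:/ { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Print the forced-down node list, if the output contains one.
forced_down_nodes() {
    awk -F': *' '/^Forced down node list:/ { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# wait_for_state WANT [TRIES] [PAUSE_SECONDS]
# Poll until the state equals WANT; return 1 if it never does.
# STATE_CMD is a hypothetical hook so the loop can be tried off-cluster.
# Note: want, tries, pause and state are plain (global) shell variables.
wait_for_state() {
    want=$1; tries=${2:-10}; pause=${3:-5}
    while [ "$tries" -gt 0 ]; do
        state=$(${STATE_CMD:-lssrc -ls clstrmgrES} | cluster_state)
        [ "$state" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep "$pause"
    done
    return 1
}
```

On a node where cluster services was stopped without Unmanaged resource groups, `wait_for_state ST_INIT` would be the natural check; after a forced stop, `lssrc -ls clstrmgrES | forced_down_nodes` lists the forced-down nodes while the state is expected to stay ST_STABLE.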
Managing shared LVM components

   HACMP Logical Volume Management
   Move cursor to desired item and press Enter.
      Shared Volume Groups
      Shared Logical Volumes
      Shared File Systems
      Synchronize Shared LVM Mirrors
      Synchronize a Shared Volume Group Definition

   • Make non-Enhanced Concurrent Mode Volume Groups
   • Manage volume groups in “home node” or “first available” Resource Groups

   HACMP Concurrent Logical Volume Management
   Move cursor to desired item and press Enter.
      Concurrent Volume Groups
      Concurrent Logical Volumes
      Synchronize Concurrent LVM Mirrors

   • Make Enhanced Concurrent Mode Volume Groups
   • Manage “online on all nodes” volume groups

Figure 7-39. Managing shared LVM components AU548.0

Notes:

Introduction
This is the menu for using C-SPOC to perform LVM change management and synchronization. As was mentioned in the LVM unit, shared disk configuration and maintenance is considerably easier and less prone to errors if you use the C-SPOC for this work. Generally, you can make changes in AIX directly and then synchronize or, if you can, make the changes using C-SPOC utilities, where the synchronization is automatic.

C-SPOC simplifies the process
When you’ve configured the cluster’s topology and added a resource group, you can configure your shared disks using this part of the C-SPOC hierarchy (available directly from the top level C-SPOC SMIT menu).

How it works
When you create a shared volume group, you must rerun the discovery mechanism (refer to top-level menu in the enhanced configuration path) to get HACMP to know about the volume group. You must then add the volume group to a resource group before you can use C-SPOC to add shared logical volumes or filesystems.

Synchronization
Note that you only need to add the volume group to a resource group using SMIT from one of the cluster nodes, and then you can start working with C-SPOC from the same node. You do not need to synchronize the cluster between adding the volume group to a resource group and working with it using C-SPOC unless you want to use C-SPOC from some other node. Remember that the volume group is not really a part of the resource group until you synchronize the addition of the volume group to the resource group.

Concurrent versus non-concurrent
The C-SPOC menus shown are the two menus on the main C-SPOC menu for Logical Volume Management. What’s the difference?

The Concurrent Logical Volume Management menus are used for two things. First, and most importantly, to create enhanced concurrent mode volume groups, and second, for managing volume groups that are in Resource Groups that are configured “Online on all nodes” for their Startup Policy. These are sometimes referred to as Concurrent Mode Resource Groups or, if you’ve been around HACMP a long time, Mode 3 resource groups. They are expected to be used in true concurrent mode across all the nodes in the resource group. You don’t see any options for adding filesystems to these volume groups.

The HACMP Logical Volume Management menus are for managing volume groups in the other resource group types (Startup Policy is “Online on home node” or “Online on first available”). It is supported and generally recommended to use enhanced concurrent mode volume groups for these types of resource groups as well as for concurrent resource groups.
Instructor notes:
Purpose: Introduce using C-SPOC for LVM management.
Details:
Additional information:
Transition statement: Let’s look at using C-SPOC to create a Volume Group.

Creating a shared volume group

   Create a Concurrent Volume Group
   Type or select values in entry fields.
   Press Enter AFTER making all desired changes.
                                                      [Entry Fields]
     Node Names                                        usa,uk
     PVID                                              00055207bbf6edab 0000>
     VOLUME GROUP name                                 [xwebvg]
     Physical partition SIZE in megabytes              64
     Volume group MAJOR NUMBER                         [207]
     Enhanced Concurrent Mode                          true
     Enable Cross-Site LVM Mirroring Verification      false

     Warning: Changing the volume group major number may result in the command being unable to execute successfully on a node that does not have the major number currently available. Please check for a commonly available major number on all nodes before changing this setting.

Figure 7-40. Creating a shared volume group AU548.0

Notes:

Creating a shared volume group
You can use C-SPOC to create a volume group but be aware that you must then add the volume group name to a resource group and synchronize. This is one case of using C-SPOC where synchronization is not automatic. This menu was reached through the “Concurrent Logical Volume Management” option on the main C-SPOC menu.

Before creating a shared volume group for the cluster using C-SPOC check that:
- All disk devices are properly attached to the cluster nodes
- All disk devices are properly configured on all cluster nodes and the device is listed as available on all nodes
- Disks have a PVID (C-SPOC lists the disks by their PVIDs. This ensures that we are using the same disk on all nodes, even if the hdisk names are not consistent across the nodes).

Instructor notes:
Purpose: Show how to use C-SPOC to create a new volume group.
Details: As in figure and student notes.
Additional information: Point out that creating a shared volume group through the Concurrent Volume Group Management menu option is creating an enhanced concurrent mode volume group.
Transition statement: After creating a VG, you must discover it so that the new VG will be available in pick lists for future actions, such as adding it to a resource group, and so forth.
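The PVID pre-check for creating a shared volume group can be partly scripted. The following is a minimal sketch, assuming each node's lspv output has already been saved to a file; the helper name common_pvids and the file layout are hypothetical and not part of HACMP or C-SPOC.

```shell
#!/bin/sh
# common_pvids FILE...   (hypothetical helper)
# Each FILE holds one node's "lspv" output: hdisk name, PVID, volume group.
# Prints the PVIDs that appear in every file, one per line, sorted.
# Disks that have no PVID yet (lspv shows "none") are ignored.
common_pvids() {
    awk -v n="$#" '
        NF >= 2 && $2 != "none" && !seen[FILENAME, $2]++ { count[$2]++ }
        END { for (p in count) if (count[p] == n) print p }
    ' "$@" | sort
}
```

A disk whose PVID is missing from the result is either not visible on some node or has not been assigned a PVID there, which matches the situations the checklist asks you to rule out before using the C-SPOC menu.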
Discover, add VG to a resource group

   Extended Configuration
   Move cursor to desired item and press Enter.
      Discover HACMP-related Information from Configured Nodes
      Extended Topology Configuration
      Extended Resource Configuration        <- Add the VG to an RG so it can be used in the next steps
      Extended Event Configuration
      Extended Cluster Service Settings
      Extended Performance Tuning Parameters Configuration
      Security and Users Configuration
      Snapshot Configuration
      Export Definition File for Online Planning Worksheets
      Import Cluster Configuration from Online Planning Worksheets File
      Extended Verification and Synchronization   <- then verify and sync to put it on all cluster nodes
      HACMP Cluster Test Tool

Figure 7-41. Discover, add VG to resource group AU548.0

Notes:

Discover and add VG to resource group
After creating a volume group, you must discover it so that the new volume group will be available in pick lists for future actions, such as adding it to a resource group, and so forth. You must use the Extended Configuration menu for both of these actions.

Instructor notes:
Purpose: Briefly point out the Discover and Resource Configuration menus in the Extended Configuration menu.
Details: See student notes.
Additional information:
Transition statement: You can also use C-SPOC to create a shared file system.

Creating a shared file system (1 of 2)

First, create logical volumes for the filesystem and jfs2log. Remember to logform the jfs2log logical volume.

   Add a Shared Logical Volume
   Type or select values in entry fields.
   Press Enter AFTER making all desired changes.
   [TOP]                                               [Entry Fields]
     Resource Group Name                               xwebgroup
     VOLUME GROUP name                                 xwebvg
     Reference node                                    usa
   * Number of LOGICAL PARTITIONS                      [200]
     PHYSICAL VOLUME names
     Logical volume NAME                               [xweblv]
     Logical volume TYPE                               [jfs2]
     POSITION on physical volume                       middle
     RANGE of physical volumes                         minimum
     MAXIMUM NUMBER of PHYSICAL VOLUMES
       to use for allocation                           []
     Number of COPIES of each logical partition        1
   [MORE...11]

   The volume group must be in a resource group that is online; otherwise, it does not display in the pop-up list.
   For a mirrored LV, change “Number of copies…” to 2.

Figure 7-42. Creating a shared file system (1 of 2) AU548.0

Notes:

Creating a shared file system using C-SPOC
It is generally preferable to control the names of all of your logical volumes. Consequently, it is generally best to explicitly create a logical volume for the file system. If the volume group does not already have a JFS2 log (unless you plan to use inline logs, then the jfs2log won’t be needed), then you must also explicitly create a logical volume for the JFS log and format it with logform. The same can be said if you are creating a JFS filesystem.

The volume group to which you want to add the filesystem must be online; either varyonvg the volume group manually, or via starting cluster services. Your choice.

C-SPOC enables you to add a journaled file system to either:

- A shared volume group (no previously defined cluster logical volume)
  SMIT checks the list of nodes that can own the resource group that contains the volume group, creates the logical volume (on an existing log logical volume if present; otherwise, it creates a new log logical volume) and adds the file system to the node where the volume group is varied on (whether it was varied on by the C-SPOC utility or it was already online). All other nodes in the resource group run an importvg -L for non-enhanced concurrent mode volume groups, or an imfs for enhanced concurrent mode volume groups.

- A previously defined cluster logical volume (in a shared volume group)
  SMIT checks the list of nodes that can own the resource group that contains the volume group where the logical volume is located. It adds the file system to the node where the volume group is varied on (whether it was varied on by the C-SPOC utility or it was already online). All other nodes in the resource group run an importvg -L for non-enhanced concurrent mode volume groups, or an imfs for enhanced concurrent mode volume groups.

Instructor notes:
Purpose: Show how to create a shared file system using C-SPOC.
Details: As in student notes.
Additional information: Note that the HA manuals say that the highly available NFS server capability of HACMP does not support the exporting of JFS2 file systems that have an inline log. We used to recommend that: “To allow for the possibility that you might choose to export a JFS2 filesystem in the future, it is generally best to always use external JFS2 log logical volumes with JFS2 file systems.” However, we have heard that the manuals are wrong about this and that HACMP now does support inline logs. Don’t know when this changed.
Transition statement: Now that we’ve got the file system’s logical volume and a JFS log logical volume, let’s create the file system.

Creating a shared file system (2 of 2)

Then create the filesystem on the now "previously defined logical volume"

   Add an Enhanced Journaled File System on a Previously Defined Logical Volume
   Type or select values in entry fields.
   Press Enter AFTER making all desired changes.
                                                      [Entry Fields]
     Node Names                                        usa,uk
     LOGICAL VOLUME name                               xweblv
   * MOUNT POINT                                       [/xwebfs]
     PERMISSIONS                                       read/write
     Mount OPTIONS                                     []
     Block Size (bytes)                                4096
     Inline Log?                                       no
     Inline Log size (MBytes)                          []

Figure 7-43. Creating a shared file system (2 of 2) AU548.0

Notes:

Creating a shared file system, step 2
When you’ve created the logical volume, then create a file system on it.

Instructor notes:
Purpose: Show the C-SPOC screen for creating a file system on an existing logical volume.
Details:
Additional information:
Transition statement: Let’s take a look at the very important issue of change management.

LVM change management

Historically, lack of LVM change management has been a major cause of cluster failure during fallover. There are several methods available to ensure LVM changes are correctly synced across the cluster.
   - Manual updates to each node to synchronize the ODM records
   - Lazy update
   - C-SPOC synchronization of ODM records
   - RSCT for Enhanced Concurrent Volume Groups
   - C-SPOC LVM operations: cluster enabled equivalents of the standard SMIT LVM functions

   VGDA = ODM

Figure 7-44. LVM change management AU548.0

Notes:

The importance of LVM change management
LVM change management is critical for successful takeover in the event of a node failure. Information regarding LVM constructs is held in a number of different locations:
- Physical disks: VGDA, LVCB
- AIX files: primarily the ODM, but also /usr/sbin/cluster/etc/vg, files in the /dev directory and /etc/filesystems
- Physical RAM: kernel memory space

This information must be kept in sync on all nodes that might access the shared volume group or groups in order for takeover to work.
How to keep LVM synchronized across the cluster
There are several ways to ensure this information is kept in sync:
• Manual update
• Lazy Update
• C-SPOC VG synchronization utility
• C-SPOC LVM operations
• RSCT (for enhanced concurrent mode volume groups)

Instructor notes:
Purpose: List methods for doing LVM change management.
Details: As in student notes. The idea here is to just list the possible methods before discussing each one in detail.
Additional information:
Transition statement: Let’s first look at the manual method.

LVM changes: Manual

To perform manual changes, the Volume Group must be active on one of the nodes.
   1. Make necessary changes to the volume group or filesystem
   2. Unmount filesystems and varyoff the vg (or stop cluster services)

      # mklv -y'db10lv' -t'jfs2' sharedvg 10
      # crfs -v jfs2 -d'db10lv' -m'/db10'
      # unmount /sharedfs
      # varyoffvg sharedvg

On all the other nodes that share the volume group:
   1. Export the volume group from the ODM
   2. Import the information from the VGDA
   3. Change the auto vary on flag (if necessary)
   4. Correct the permissions and ownership on the logical volumes as required
   5. Repeat to all other nodes
   6. Restart Cluster Services to restart the application

      # exportvg sharedvg
      # importvg -V123 -y sharedvg hdisk3
      # chvg -an sharedvg
      # varyoffvg sharedvg

Figure 7-45. LVM changes: Manual AU548.0

Notes:
After making a change to an LVM component, such as creating a new logical volume and file system as shown in the figure, you must propagate the change to the other nodes in the cluster that are sharing the volume group using the steps described.

Make sure that the auto activate is turned off (chvg -an sharedvg) after the importvg command is executed because the cluster manager will control the use of the varyonvg command on the node where the volume group should be varied on.

Other than the sheer complexity of this procedure, the real problem with it is that it requires that the resource group be down while the procedure is being carried out. Fortunately, there are better ways.

Instructor notes:
Purpose: Describe the manual method of making an LVM change and propagating the change to the other nodes sharing the volume group.
Details:
Additional information:
Transition statement: The next method that can be used to make LVM changes is called Lazy Update. Let’s see how that works.
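The per-node part of the manual procedure can be collected into one function. This is a sketch only, using the commands from the visual; the run wrapper, the DRY_RUN switch, and the resync_vg name are assumptions made for illustration. With DRY_RUN=1 (the default here) the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: re-import a changed shared volume group on one of the OTHER nodes.
# DRY_RUN=1 (default) only prints each command with a leading "+ ".
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# resync_vg VGNAME MAJOR_NUMBER HDISK   (hypothetical helper)
resync_vg() {
    vg=$1; major=$2; disk=$3
    run exportvg "$vg"                         # remove the stale definition from the ODM
    run importvg -V "$major" -y "$vg" "$disk"  # rebuild it from the VGDA on disk
    run chvg -an "$vg"                         # no auto varyon; the cluster manager activates it
    run varyoffvg "$vg"                        # leave the VG offline on this node
}
```

For the example in the figure, `resync_vg sharedvg 123 hdisk3` prints the same four commands shown for the other nodes. Permissions and ownership on the logical volumes would still need to be corrected afterwards, as the step list says.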
LVM changes: Lazy update

At fallover time, lazy update compares the time stamp value in the VGDA with one stored in the ODM. If the time stamps are the same, then the varyonvg proceeds. If the values are different, then HACMP does the export/import cycle similar to a manual update.
   - HACMP does change the VG auto vary on flag
   - It preserves permissions and ownership of the logical volumes when a Big/Scalable VG
   - Will fail if:
      • Necessary PVIDs not known on all nodes participating in VG
      • VG not known on all nodes in RG

Figure 7-46. LVM changes: Lazy update AU548.0

Notes:
HACMP has a facility called Lazy Update that it uses to attempt to synchronize LVM changes during a fallover.

HACMP uses a copy of the timestamp kept in the ODM and a timestamp from the volume group’s VGDA. AIX updates the ODM timestamp whenever the LVM component is modified on that system. When a cluster node attempts to vary on the volume group, HACMP for AIX compares the timestamp from the ODM with the timestamp in the VGDA on the disk (use /usr/es/sbin/cluster/utilities/clvgdata hdiskn to find the VGDA timestamp for a volume group). If the timestamps are the same, HACMP for AIX activates the volume group without exporting and re-importing. If the timestamps do not agree, the HACMP for AIX software exports and re-imports the volume group before activating it.

This method requires no downtime, although, as indicated, it does increase the fallover time minimally for the first fallover after the LVM change was made. The time needed for takeover expands by a few minutes if a Lazy Update occurs.

Realize though that this mechanism will not fix every situation where nodes are out-of-sync. To preserve permissions/ownership over an import, the volume group must be a Big or Scalable VG and the logical volumes must be modified using chlv with the -U (for user id), -G (for group id), and -P (for permissions) flags. The importvg must be done with a -R. Further, having the takeover process fix problems with the LVM meta-data at takeover time is not the preferred method of handling the synchronization.

Instructor notes:
Purpose: Explain the Lazy Update Facility in HACMP.
Details:
Additional information:
Transition statement: The third method to make changes is to use C-SPOC synchronization.

LVM changes: C-SPOC synchronization

Manually make your change to the LVM on one node.
Use C-SPOC to propagate the changes to all nodes in the resource group.
   - Filesystem updates (imfs) are not performed using this function if the Volume Group is an enhanced concurrent mode volume group

   (Diagram labels: update vg constructs; use C-SPOC syncvg; C-SPOC updates ODM and the time stamp file)

Figure 7-47. LVM changes: C-SPOC synchronization AU548.0

Notes:

Using C-SPOC to synchronize manual LVM changes
In this method, you manually make your change to the LVM on one node and then invoke C-SPOC to propagate the change. This task allows you to use C-SPOC to “clean-up” after-the-fact. Most likely the reason you are using this C-SPOC task is because someone who is unfamiliar with cluster node management made a change to a shared LVM component without using C-SPOC, creating an out-of-sync condition between a node in the cluster and the rest of the nodes.

Note: If using an enhanced concurrent mode volume group and a filesystem has been added to an existing logical volume without using C-SPOC, the imfs is not done, meaning this is an ineffective function. For this reason (among many others), you are strongly encouraged to use C-SPOC to perform the LVM add/remove/update and not use this mechanism to synchronize after-the-fact.

This facility is accessed by using the following SMIT path in HACMP:
smitty hacmp --> System Management (C-SPOC) --> HACMP Logical Volume Management --> Synchronize a Shared Volume Group Definition.

Instructor notes:
Purpose: Explain the method of manual change + C-SPOC to distribute the change.
Details:
Additional information:
Transition statement: When using ECMVGs, the synchronization is automatic, but beware.
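The timestamp comparison that Lazy Update performs at fallover time can be written as a tiny decision function. This is a sketch of the rule as the notes describe it, not HACMP's own code; the function name is hypothetical, and how the two timestamp strings are obtained is left to the caller (the notes mention clvgdata for the VGDA side).

```shell
#!/bin/sh
# lazy_update_action ODM_TIMESTAMP VGDA_TIMESTAMP   (hypothetical helper)
# Prints the sequence of steps Lazy Update is described as taking:
#   same timestamps      -> vary on directly
#   different timestamps -> export, re-import, then vary on
lazy_update_action() {
    if [ "$1" = "$2" ]; then
        echo "varyonvg"
    else
        echo "exportvg importvg varyonvg"
    fi
}
```

The longer second path is the reason the first fallover after an unsynchronized LVM change takes more time than usual.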
Enhanced concurrent mode volume groups

Another synchronization method is the use of ECMVGs (Enhanced Concurrent Mode Volume Groups).
RSCT updates LVM information automatically for ECMVGs
   - Happens immediately on all nodes running cluster services
   - Nodes that are not running cluster services will be updated when cluster services are started
Benefits
   - Fast Disk Takeover
   - Can convert existing VGs to ECMVGs via C-SPOC
Limitations
   - Incomplete
      • /etc/filesystems not updated
   - Incompatible
      • Must be careful using ECMVGs if any product that is running on the system places SCSI reserves on the disks as part of its function

Figure 7-48. Enhanced concurrent mode volume groups AU548.0

Notes:

RSCT as LVM change management
With enhanced concurrent mode (ECM) volume groups, RSCT will automatically update the ODM on all the nodes that share the volume group when an LVM change occurs on one node. However, because it is limited to only ECM volume groups and because /etc/filesystems is not updated, it’s better to explicitly use C-SPOC to make LVM changes.

Instructor notes:
Purpose: Discuss RSCT’s update of the ODM for ECM VGs.
Details:
Additional information:
Transition statement: And now we look at the last method to make and distribute changes and that is to use C-SPOC for both making the change and distributing the change.

The best method: C-SPOC LVM changes

   Enhanced Journaled File Systems
   Move cursor to desired item and press Enter.
      Add an Enhanced Journaled File System
      Add an Enhanced Journaled File System on a Previously Defined Logical Volume
      List All Shared File Systems
      Change / Show Characteristics of a Shared Enhanced Journaled File System
      Remove a Shared File System

Figure 7-49. The best method: C-SPOC LVM changes AU548.0

Notes:
You can use C-SPOC to both make the change and to distribute the change. This approach has two major advantages: no downtime is required and you can be confident that the nodes are in sync. It might take a little longer to run than the normal chfs application, but it is well worth the wait.

Other C-SPOC screens exist for pretty much any operation that you are likely to want to do with a shared volume group.

Instructor notes:
Purpose: Explain how to use C-SPOC to make and distribute a change.
Details:
Additional information:
Transition statement: Let’s select Change / Show Characteristics of a Shared File System and use C-SPOC to change the size of a shared file system.

LVM changes: Select your filesystem

   Enhanced Journaled File Systems
   Move cursor to desired item and press Enter.
      Add an Enhanced Journaled File System
      Add an Enhanced Journaled File System on a Previously Defined Logical Volume
      List All Shared File Systems
      Change / Show Characteristics of a Shared Enhanced Journaled File System
      Remove a Shared File System

      File System Name
      Move cursor to desired item and press Enter.
        # Resource Group     File System
          xwebgroup          /xwebfs

Figure 7-50. LVM changes: Select your file system AU548.0

Notes:

Changing a shared file system using C-SPOC
We have to provide the name of the file system that we want to change. The file system must be in a volume group that is currently online somewhere in the cluster and is already configured into a resource group.

Instructor notes:
Purpose: This is the next step in changing the size of a file system using C-SPOC.
Details:
Additional information:
Transition statement: Select the file system and press Enter.

Update the size of a filesystem

   Change/Show Characteristics of a Shared File System in the Cluster
   Type or select values in entry fields.
   Press Enter AFTER making all desired changes.
                                                      [Entry Fields]
     Resource Group Name                               xwebgroup
     File system name                                  /xwebfs
     NEW mount point                                   [/xwebfs]
     SIZE of file system (in 512-byte blocks)          [4000000]
     Mount GROUP                                       []
     PERMISSIONS                                       read/write
     Mount OPTIONS                                     []
     Start Disk Accounting?                            no
     Block Size (bytes)                                4096
     Inline Log?                                       no
     Inline Log size (MBytes)                          0

Figure 7-51. Update the size of a file system AU548.0

Notes:

Changing file system size
Specify a new file system size, in 512 byte blocks, and press Enter. The file system is re-sized and the relevant LVM information is updated on all cluster nodes configured to use the file system’s volume group.
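Because the SIZE field is expressed in 512-byte blocks, a quick conversion helps when you think in megabytes. A minimal sketch using shell integer arithmetic; the helper names are hypothetical, and 1 MB is taken as 1048576 bytes, that is, 2048 blocks.

```shell
#!/bin/sh
# Convert between megabytes and 512-byte blocks (1 MB = 2048 blocks).
mb_to_blocks() {
    echo $(( $1 * 2048 ))
}

# Integer result, rounded down to a whole megabyte.
blocks_to_mb() {
    echo $(( $1 / 2048 ))
}
```

For instance, `blocks_to_mb 4000000` gives 1953, so the value 4000000 on the visual is a little under 2 GB, and `mb_to_blocks 1024` gives the 2097152 blocks you would enter for a 1 GB file system.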
                                                  [Entry Fields]
  Resource Group Name                             xwebgroup
  File system name                                /xwebfs
  NEW mount point                                 [/xwebfs]
  SIZE of file system (in 512-byte blocks)        [4000000]
  Mount GROUP                                     []
  PERMISSIONS                                     read/write     +
  Mount OPTIONS                                   []             +
  Start Disk Accounting?                          no             +
  Block Size (bytes)                              4096
  Inline Log?                                     no
  Inline Log size (MBytes)                        0

  F1=Help      F2=Refresh    F3=Cancel    F4=List
  F5=Reset     F6=Command    F7=Edit      F8=Image
  F9=Shell     F10=Exit      Enter=Do

Figure 7-51. Update the size of a file system

Instructor notes:
Purpose — Show the final step in changing the size of a file system using C-SPOC.
Details —
Additional information —
Transition statement — Next, let’s look at using C-SPOC to manage resource groups.

HACMP resource group operations

          HACMP Resource Group and Application Management

Move cursor to desired item and press Enter.

  Show the Current State of Applications and Resource Groups
  Bring a Resource Group Online
  Bring a Resource Group Offline
  Move a Resource Group to Another Node / Site
  Suspend/Resume Application Monitoring
  Application Availability Analysis

  F1=Help        F2=Refresh     F3=Cancel     Esc+8=Image
  Esc+9=Shell    Esc+0=Exit     Enter=Do

Figure 7-52. HACMP resource group operations

Notes:
HACMP resource group and application management
This visual shows the selections for managing resource groups.

Details —
Additional information —
Transition statement — When you are managing resource groups,
it’s important to understand the concept of priority override location.

Instructor notes:
Purpose — Show the resource group management screen.

Priority override location (POL): Old

Old, “problem” behavior
– Assigned during a resource group move operation
  • The destination node for a resource group online, offline or move request becomes the resource group's POL
  • Represents the location a Resource Group “goes to” regardless of cluster events
– Meant to honor the administrator’s desire to have the Resource Group on a specific node
  • Truly an override of Resource Group policy setting
– RestoreNodePriority caused resource group movement, regardless of Fallback policy
– POL is viewed with the command:
  • /usr/es/sbin/cluster/utilities/clRGinfo -p
– Information maintained in a file
  • Manual manipulation possible by changing the file
– Obvious problem is that the behavior of the Resource Group might be unexpected in that it might contradict the policy in the Resource Group

Figure 7-53. Priority override location (POL): Old

Notes:
Priority override location (old) “problem” behavior
“Problem” behavior is in the following levels:
- Before HACMP V5.3 PTF IY84883 – May 2006
- Before HACMP V5.2 PTF IY82989 – April 2006
- Before HACMP V5.1 PTF IY84646 – May 2006

HACMP 5.x introduced the notion of a priority override location. A resource group does not normally have a priority override location (POL). The destination node that you specify for a resource group move, online or offline request (see next couple of visuals) becomes the priority override location for the resource group. A priority override location overrides all other fallover and fallback policies and possible locations for the resource group. The resource group remains on that node in an online state (if you moved or on-lined it there) or offline state (if you off-lined it there) until the priority override location is cancelled.

Persistent and non-persistent POL
Priority override locations can be persistent or non-persistent.
- A persistent priority override location remains in effect until explicitly cancelled.
- A non-persistent priority override location is cancelled either explicitly or implicitly when the HACMP daemons are shut down on all the nodes in the cluster simultaneously.

Concurrent access resource groups
The behavior of priority override location varies depending on whether the resource group is a concurrent access resource group. The discussion here refers to the behavior of non-concurrent access resource groups. Refer to Chapter 15 of the HACMP for AIX Administration Guide for information on how priority override locations work for concurrent access resource groups.

Instructor notes:
Purpose — Explain the old priority override location concept.
Details — We do not discuss the rules for the concurrent resource groups (resource groups whose startup policy is Online on All Available Nodes) because they add yet another layer of complexity to an already complex issue and very few students ever have anything to do with concurrent access resource groups.
Additional information — Be prepared to discuss the rules for concurrent access resource groups if a student needs or wants to know what they are (refer to Chapter 15 of the HACMP for AIX Administration Guide for more information).
Transition statement — HACMP 5.4 introduced a further simplification of the POL.

Priority override location (POL): New

Pre-HACMP V5.4
– RestoreNodePriority resets POL, then moves RG back to highest priority node only if Fallback Policy is “fallback to highest priority node”

HACMP V5.4 and later
– Function is strictly internal
– The resource group is moved only for resource group move operations
– No RestoreNodePriority SMIT choice
– Original highest priority node is “remembered” and flagged in SMIT on later moves
– Persist across cluster reboot is no longer supported
  Destination node is now the new “home” node

Changes to /usr/es/sbin/cluster/utilities/clRGinfo -p
– Now shows location of “temporary” highest priority and timestamp of move

Figure 7-54. Priority override location (POL): New

Notes:
Priority override location: “Problems” solved
“New” behavior is in the following levels and later:
- HACMP V5.3 PTF IY84883 – May 2006
- HACMP V5.2 PTF IY82989 – April 2006
- HACMP V5.1 PTF IY84646 – May 2006

Prior to HACMP 5.4 but with the above mentioned PTFs or later, the problem where the resource group moved on RestoreNodePriority regardless of Fallback Policy settings was fixed. Now the RestoreNodePriority only resets the POL setting, unless the Fallback Policy is “fallback to highest priority node.” In that case, the behavior is the same as the old way.

For HACMP 5.4 and later, the function is strictly internal and the Resource Group Move operation is treated as temporary. The original highest priority node is flagged in SMIT when subsequent resource group moves are initiated. If more permanent changes are desired, make the changes in the Resource Group.

Instructor notes:
Purpose — Explain the new priority override location concept.
Details — We do not discuss the rules for the concurrent resource groups (resource groups whose startup policy is Online on All Available Nodes) because they add yet another layer of complexity to an already complex issue and very few students ever have anything to do with concurrent access resource groups.
Additional information — Be prepared to discuss the rules for concurrent access resource groups if a student needs or wants to know what they are (refer to Chapter 15 of the HACMP for AIX Administration Guide for more information).
Transition statement — Now that we’ve got an idea about the priority override location, let’s take a look at moving a resource group.

Moving a resource group (1 of 2)

          HACMP Resource Group and Application Management

Move cursor to desired item and press Enter.

  Move Resource Groups to Another Node
  Move Resource Groups to Another Site

  +----------------------------------------------------------------+
  |                   Select a Destination Node                    |
  |                                                                |
  | Move cursor to desired item and press Enter.                   |
  |                                                                |
  |   # *Denotes Originally Configured Highest Priority Node       |
  |   *usa                                                         |
  |   uk                                                           |
  |   india                                                        |
  |                                                                |
  | F1=Help        F2=Refresh      F3=Cancel                       |
  | F8=Image       F10=Exit        Enter=Do                        |
  | /=Find         n=Find Next                                     |
  +----------------------------------------------------------------+

Figure 7-55. Moving a resource group (1 of 2)

Notes:
Moving a resource group
Prior to the SMIT panel shown, the resource group must be chosen from a list of online resource groups. You can request that a resource group be moved to any node that is in the resource group’s list of nodes (where cluster services are active). The clRGmove utility program is used, which can also be invoked from the command line. See the man page for details.

Working with the POL
The destination node that you specify becomes the resource group’s priority override location. For HACMP 5.3 and earlier, a resource group’s priority override location can be cancelled by selecting a destination node of Restore_Node_Priority_Order.

For HACMP 5.3 and earlier, if Persist Across Cluster Reboot is set to true, then the priority override location will be persistent. Otherwise, it will be non-persistent.

Instructor notes:
Purpose — Explain the Move Resource C-SPOC utility.
Additional information —
Transition statement — Now that we’ve selected a node, you get the following screen.
Details — A lot of complex information is on this foil but the concepts are not really all that complicated.
Try to take the time to make sure people understand them.

Moving a resource group (2 of 2)

                      Move a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
  Resource Group to be Moved                      xwebgroup
  Destination Node                                uk

  F1=Help      F2=Refresh    F3=Cancel    F4=List
  F5=Reset     F6=Command    F7=Edit      F8=Image
  F9=Shell     F10=Exit      Enter=Do

An option to “persist across cluster reboot” is available prior to HACMP V5.4
Monitor for the cluster to stabilize and verify that the resources are available on the target

Figure 7-56. Moving a resource group (2 of 2)

Notes:
This screen follows. Press Enter to move the xwebgroup to the uk node.

Instructor notes:
Purpose — Show the Select a Resource Group screen for bringing a resource group offline.
Details —
Additional information —
Transition statement — Now let’s look at taking a resource group offline.

Bring a resource group offline (1 of 3)

          HACMP Resource Group and Application Management

Move cursor to desired item and press Enter.

  Show the Current State of Applications and Resource Groups
  Bring a Resource Group Online
  Bring a Resource Group Offline
  Move a Resource Group to Another Node / Site
  Suspend/Resume Application Monitoring

  +----------------------------------------------------------------+
  |                    Select a Resource Group                     |
  |                                                                |
  | Move cursor to desired item and press Enter.                   |
  |                                                                |
  |   #                                                            |
  |   # Resource Group       State       Node(s) / Site            |
  |   #                                                            |
  |     xwebgroup            ONLINE      uk /                      |
  |                                                                |
  | F1=Help        F2=Refresh      F3=Cancel                       |
  | F8=Image       F10=Exit        Enter=Do                        |
  | /=Find         n=Find Next                                     |
  +----------------------------------------------------------------+

Figure 7-57. Bring a resource group offline (1 of 3)

Notes:
Bring a resource group offline: Select a resource group
To start, you must select the resource group you wish to take offline. Then you’ll select an online node where you want the resource group brought offline. This is pretty obvious for a resource group that will only be active on one node at a time (OHNO or OFAN). For resource groups that can be online on more than one node at once (Online on All Available), you can choose All or just one of the active nodes.

Instructor notes:
Purpose — Show the Select a Resource Group screen for bringing a resource group offline.
Details —
Additional information —
Transition statement — When you select the resource group, you will have to choose an online node, via the following menu.

Bring a resource group offline (2 of 3)

          HACMP Resource Group and Application Management

Move cursor to desired item and press Enter.

  Show the Current State of Applications and Resource Groups
  Bring a Resource Group Online
  Bring a Resource Group Offline
  Move a Resource Group to Another Node / Site
  Suspend/Resume Application Monitoring
  Application Availability Analysis

  +----------------------------------------------------------------+
  |                     Select an Online Node                      |
  |                                                                |
  | Move cursor to desired item and press Enter.                   |
  |                                                                |
  |   uk                                                           |
  |                                                                |
  | F1=Help        F2=Refresh      F3=Cancel                       |
  | F8=Image       F10=Exit        Enter=Do                        |
  | /=Find         n=Find Next                                     |
  +----------------------------------------------------------------+

Figure 7-58. Bring a resource group offline (2 of 3)

Notes:
Now choose the node where the resource group will be taken offline.

Instructor notes:
Purpose — Explain how to take a Resource Group offline.
Details —
Additional information —
Transition statement — Next, we look at how to take a Resource Group online.

Bring a resource group offline (3 of 3)

                  Bring a Resource Group Offline

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
  Resource Group to Bring Offline                 xwebgroup
  Node On Which to Bring Resource Group Offline   uk

  F1=Help      F2=Refresh    F3=Cancel    F4=List
  F5=Reset     F6=Command    F7=Edit      F8=Image
  F9=Shell     F10=Exit      Enter=Do

The option to “persist across cluster reboot” is available prior to HACMP V5.4

Figure 7-59. Bring a resource group offline (3 of 3)

Notes:
Bring a resource group offline
When a resource group is brought offline on a node, all resources will be deactivated on that node.

Instructor notes:
Purpose — Complete the process of taking a Resource Group offline.
Details —
Additional information —
Transition statement — What about bringing a resource group back online?

Notes:
Bring a resource group online
First you’ll choose an offline Resource Group.
Then the option above will display with the potential nodes on which to bring it online. Bringing a resource group online will activate the resources in it on the target node. Again, watch for the cluster to go stable and verify that the resources are available on the intended target node.

Bring a resource group back online

          HACMP Resource Group and Application Management

Move cursor to desired item and press Enter.

  Show the Current State of Applications and Resource Groups
  Bring a Resource Group Online
  Bring a Resource Group Offline
  Move a Resource Group to Another Node / Site
  Suspend/Resume Application Monitoring
  Application Availability Analysis

  +----------------------------------------------------------------+
  |                   Select a Destination Node                    |
  |                                                                |
  | Move cursor to desired item and press Enter.                   |
  |                                                                |
  |   # *Denotes Originally Configured Highest Priority Node       |
  |   usa                                                          |
  |   uk                                                           |
  |                                                                |
  | F1=Help        F2=Refresh      F3=Cancel                       |
  | F8=Image       F10=Exit        Enter=Do                        |
  | /=Find         n=Find Next                                     |
  +----------------------------------------------------------------+

Figure 7-60. Bring a resource group back online

Instructor notes:
Purpose — Explain how to take a Resource Group online.
Details — Point out how it is determined on which node the Resource Group becomes active.
Additional information —
Transition statement — Let’s examine the logging facilities provided by HACMP for AIX.

Log files generated by HACMP

  /var/hacmp/adm/cluster.log                 "High level view" of cluster activity.
  /var/hacmp/log/hacmp.out*                  Output of today's HACMP event scripts.
  /var/hacmp/log/hacmp.out.<1-7>*
  AIX error log                              All sorts of stuff!
  /var/ha/log/topsvcs                        Tracks execution of topology services daemon.
  /var/ha/log/grpsvcs                        Tracks execution of group services daemon.
  /var/hacmp/log/clstrmgr.debug*             Tracks internal execution of the cluster manager.
  /var/hacmp/clcomd/clcomd.log               Tracks activity of clcomd.
  /var/hacmp/clcomd/clcomddiag.log           Tracks more detailed activity of clcomd when tracing is turned on.
  /var/hacmp/adm/history/cluster.mmddyyyy    Cluster history files generated daily.
  /var/hacmp/log/cspoc.log*                  Generated by C-SPOC commands.
  /var/hacmp/log/emuhacmp.out*               Output of emulated events.
  /var/hacmp/log/clavan.log                  Output of application availability analysis tool.
  /var/hacmp/clverify/clverify.log           Contains verbose messages from clverify (cluster verification utility).
  /var/hacmp/log/
    clconfigassist.log                       Two-Node Cluster Configuration Assistant
    clutils.log                              Generated by utilities and file propagation
    cl_testtool.log                          Generated by test tool

  * denotes logs that were in /tmp prior to HACMP 5.4.1

Figure 7-61. Log files generated by HACMP - before HACMP 5.4.1

Notes:
Log files
The visual summarizes the HACMP log files.

Instructor notes:
Purpose — Detail the log files generated by HACMP for AIX showing their locations prior to HACMP 5.4.1.
Details — Quickly run through these log files, pointing out the information that HACMP keeps in each file and when it is updated.
Additional information —
Transition statement — Now let’s look at the standardization that HACMP 5.4.1 brings to the log file locations.

Log files generated by HACMP: HACMP 5.4.1 and later

On HACMP 5.4.1 and later cluster configurations, all log files default to /var/hacmp
– HACMP Log Viewing and Management facility
– Existing configurations preserve any log file redirections
New log files:
  /var/hacmp/log/clstrmgr.debug.long
  /var/hacmp/log/migration.log
  /var/hacmp/log/cspoc.log.remote
Improvements to event script logging in hacmp.out
– More in Unit 10
Logging improvements for clcomd, clver
snap command can collect AIX snapshot data from cluster nodes as well as HACMP data

Figure 7-62. Log files generated by HACMP - HACMP 5.4.1 and later

Notes:
When installed from scratch, HACMP 5.4.1 will use /var/hacmp/log as the default for all log files. Of course, if you install on top of an existing configuration, or apply a snapshot, your settings will be preserved; however, if you want to redirect all log files there is a new SMIT path that enables you to redirect them all at once. You can view the current settings through SMIT using the HACMP Log Viewing and Management path.

HACMP uses korn shell scripts to perform recovery operations. An effort was made in HACMP 5.4.1 to clean up these scripts and consolidate the use of things like “VERBOSE LOGGING”, “set -x” and the PS4 settings. This produces more consistent results in hacmp.out and makes it easier to read and follow. Similarly for key components such as clcomd and clver, the logging was made more consistent.

The clsnap command was also updated to collect everything needed at the same time rather than multiple commands and multiple options.

Instructor notes:
Purpose — Address the log file changes that occurred in HACMP 5.4.1 and will be present in future releases.
Details —
Additional information —
Transition statement — Okay, any questions for me? If not, let’s review.

Let’s review: Topic 2

1. True or False? Using C-SPOC reduces the likelihood of an outage by reducing the likelihood that you will make a mistake.
2. True or False? C-SPOC reduces the need for a change management process.
3. C-SPOC cannot do which of the following administration tasks?
   a. Add a user to the cluster
   b. Change the size of a filesystem
   c. Add a physical disk to the cluster
   d. Add a shared volume group to the cluster
   e. Synchronize existing passwords
   f. None of the above
4. True or False? It does not matter which node in the cluster is used to initiate a C-SPOC operation.
5. Which log file provides detailed output on HACMP event script execution?
   a. /tmp/clstrmgr.debug
   b. /tmp/hacmp.out
   c. /var/adm/cluster.log

Figure 7-63. Let’s review topic 2

Notes:

Instructor notes:
Purpose — Topic review.
Details — Let’s look at the solutions.

Let’s review: Topic 2 solutions (same questions as Figure 7-63)

Additional information —
Transition statement — HACMP has the ability to make many types of changes to a cluster while the cluster remains online. In the next topic, we take a look at how HACMP can do that.

7.3 Dynamic automatic reconfiguration event facility

Instructor topic introduction
What students will do — Learn how the Dynamic Automatic Reconfiguration Event (DARE) facility works.
How students will do it — Lecture and lab.
What students will learn — How DARE works.
How this will help students on their job — Understanding how DARE works will help students understand how to make dynamic reconfigurations and what to do if a reconfiguration does not work as expected.

Dynamic Automatic Reconfiguration Event facility

After completing this topic, you should be able to:
  Discuss the benefits and capabilities of DARE
  Make changes to cluster topology and resources in an active cluster
  Use the snapshot facility to return to a previous cluster configuration or to roll back changes

Figure 7-64. Dynamic Automatic Reconfiguration Event facility

Notes:
Dynamic Automatic Reconfiguration Event
In this topic, we examine HACMP’s capability to perform changes to the cluster configuration while the cluster is running. This capability is known as Dynamic Automatic Reconfiguration Event, or DARE for short.

Instructor notes:
Purpose — Topic objectives.
Details — State the unit objectives to the students.
Additional information —
Transition statement — What is DARE?

Dynamic reconfiguration

HACMP provides a facility that allows changes to cluster topology and resources to be made while the cluster is active. This facility is known as DARE. DARE requires three copies of the HACMP ODM.

  DCD   Default Configuration Directory which is updated by SMIT/command line:
        /etc/objrepos
  SCD   Staging Configuration Directory which is used during reconfiguration:
        /usr/es/sbin/cluster/etc/objrepos/staging
  ACD   Active Configuration Directory from which clstrmgr reads the cluster configuration:
        /usr/es/sbin/cluster/etc/objrepos/active

  [Diagram labels: rootvg, DCD, SCD, ACD]

Figure 7-65. Dynamic reconfiguration

Notes:
How it works
Dynamic Reconfiguration is made possible by the fact that HACMP holds three copies of the ODM, known as the Default, Staging, and Active configuration directory. By holding three copies of the ODM, HACMP can make changes on one node and propagate them to other nodes in the cluster while an active configuration is currently being used.

Instructor notes:
Purpose — Introduce DARE.
Details — Explain to the students that DARE is a behind the scenes series of event scripts that manipulate the ODM in response to changes in HACMP configuration while HACMP is running. In simple terms, this means that if you change the cluster’s configuration using an HACMP-related SMIT screen while HACMP is running, the change takes effect as soon as you synchronize (depending on the change in question).
Additional information —
Transition statement — So, what changes does this allow us to make while Cluster Services is running on at least one node?

What can DARE do?

DARE allows changes to be made to most cluster topology and nearly all resource group components without the need to stop Cluster Services, take the application offline or reboot a node.

Here are some examples of the tasks that DARE can complete for Topology and Resources without having to bring Cluster Services down.

Topology Changes
  Adding or removing cluster nodes
  Adding or removing networks
  Adding or removing communication interfaces or devices
  Swapping a communication interface's IP address
Resource Changes
  All resources can be changed

All changes must be synchronized in order to take effect.
Some changes still require a stop and restart of the Cluster Services.

Figure 7-66. What can DARE do?

Notes:
What can DARE do?
The visual shows some of the changes that can be made dynamically using DARE.
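The relationship between the configuration copies can be pictured with a small model. The sketch below is a toy illustration only, assuming that each ODM copy can be treated as a simple key/value dictionary; the class, the method names and the `rg:xwebgroup` key are hypothetical and do not correspond to any HACMP utility. Only the three directory paths come from the Dynamic reconfiguration visual.

```python
from dataclasses import dataclass, field

# Directory paths as listed on the "Dynamic reconfiguration" visual.
ODM_PATHS = {
    "DCD": "/etc/objrepos",
    "SCD": "/usr/es/sbin/cluster/etc/objrepos/staging",
    "ACD": "/usr/es/sbin/cluster/etc/objrepos/active",
}


@dataclass
class NodeConfig:
    """Toy stand-in for one node's configuration copies (hypothetical model)."""

    dcd: dict = field(default_factory=dict)  # updated by SMIT / command line
    acd: dict = field(default_factory=dict)  # what the cluster manager reads

    def edit(self, key: str, value: str) -> None:
        """A configuration change touches only the DCD."""
        self.dcd[key] = value

    def pending(self) -> list:
        """Keys whose DCD value differs from the active configuration."""
        keys = set(self.dcd) | set(self.acd)
        return sorted(k for k in keys if self.dcd.get(k) != self.acd.get(k))


if __name__ == "__main__":
    node = NodeConfig(dcd={"rg:xwebgroup": "usa,uk"}, acd={"rg:xwebgroup": "usa,uk"})
    node.edit("rg:xwebgroup", "usa,uk,india")
    print(node.pending())            # the edited key is now pending
    print(node.acd["rg:xwebgroup"])  # active copy is unchanged until a synchronize
```

The point of the model is that an edit made through SMIT does not disturb the running cluster: the active copy changes only when the configuration is synchronized.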
Instructor notes:
Purpose — Give examples of changes that can be made while Cluster Services is running on at least one node.
Details — Mention to the students that we covered how to add an additional node to a running cluster in the first topic of this unit.
Additional information —
Transition statement — Not every change can be made with Cluster Services running.

What limitations does DARE have?

DARE cannot change all cluster topology and resource group components without the need to stop Cluster Services, take the application offline, or reboot a node.

Here are some examples that require a stop and restart of Cluster Services for the change to be made:

Topology changes
- Change the name of the cluster
- Change the name of a cluster node
- Change a communication interface attribute
- Change whether a network uses IPAT via IP aliasing or via IP replacement
- Change the name of a network module
- Add a network interface module
- Remove a network interface module

Resource changes
- Change the name of a resource group
- Change the name of an application server
- Change the node relationship

DARE cannot run if two nodes are not at the same HACMP level.

Figure 7-67. What limitations does DARE have?  AU548.0

Notes:

Limitations
Some changes require a restart of Cluster Services. Also, DARE requires that all nodes are at the same HACMP level.

Instructor notes:
Purpose — Give examples of changes that cannot be made while Cluster Services is running on at least one node.
Details —
Additional information —
Transition statement — Let’s see how DARE works.

So how does DARE work?

DARE uses the three separate copies of the ODM to allow changes to be propagated to all nodes while the cluster is active.

1. Change topology or resources in SMIT
2. Synchronize topology or resources in SMIT
3. Snapshot taken of the current ACD
4. Cluster manager reads ACD and refreshes
5. SCD is deleted

[Diagram: HACMP SMIT panels on the initiating node, with the configuration copies on the cluster nodes labeled DCD, SCD, and ACD]

Figure 7-68. So how does DARE work?  AU548.0

Notes:

How it works
DARE uses three copies of the HACMP ODM to propagate live updates to the cluster topology or resource configuration across the cluster. This is done in five steps detailed above.

Note that many changes are incompatible with the cluster’s current AIX configuration. Such changes are, therefore, not possible to synchronize using DARE. Instead, the cluster has to be taken down while the appropriate AIX configuration changes are applied. (It is sometimes possible to remove some resources from a resource group, synchronize, change the AIX configuration of the resources, add them back into the resource group, and synchronize again, although there is likely to be little point in running the resource group without the resources.)

Although it is possible to make a nearly arbitrarily large set of changes to the configuration and then synchronize them all in one operation, it is usually better to make a modest change, synchronize it, verify that it works, and then move on to more changes.

HACMP 5.x synchronizes both topology changes and resource changes whenever it is run. This is a change from previous releases of HACMP.

Instructor notes:
Purpose — Explain the steps in DARE.
Details — Talk the students through the sequence of events for a DARE operation. Point out that the ACD is copied to a snapshot before the change takes effect and that the SCD is only cleared once the change has been committed on all nodes in the cluster.
Additional information — HACMP 5.x makes no distinction between synchronizing the cluster’s topology and synchronizing the cluster’s resources.
Transition statement — So, when the SMIT panel has been edited on one of our cluster nodes, we have to synchronize the new configuration.

In the standard configuration path, invoking the Verify and Synchronize HACMP Configuration menu entry initiates an immediate verification and synchronization of the HACMP configuration from the local node’s DCD (there is no opportunity provided to modify the process in any way).
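The five-step DARE sequence can be pictured with a small toy model. This is only a sketch under stated assumptions: plain Python dictionaries stand in for the three ODM copies (DCD, SCD, ACD), the `Node` class and `dare_synchronize` function are hypothetical names invented for illustration, and none of this reflects how the cluster manager is actually implemented.

```python
import copy


class Node:
    """One cluster node holding the three configuration copies (toy model)."""

    def __init__(self, name, config):
        self.name = name
        self.dcd = copy.deepcopy(config)  # copy that is edited through SMIT
        self.scd = None                   # staging copy; present only during a sync
        self.acd = copy.deepcopy(config)  # copy the cluster manager is running with
        self.snapshots = []               # earlier ACD contents, most recent first


def dare_synchronize(initiator, nodes):
    """Propagate the initiator's DCD to every node, following steps 2 to 5."""
    # Step 2: synchronize. The changed configuration is staged as an SCD on
    # every node. (Assumption: the other nodes' DCDs are updated as well.)
    for node in nodes:
        node.dcd = copy.deepcopy(initiator.dcd)
        node.scd = copy.deepcopy(initiator.dcd)
    # Step 3: a snapshot of the current ACD is taken before anything changes.
    for node in nodes:
        node.snapshots.insert(0, copy.deepcopy(node.acd))
    # Step 4: the staged configuration becomes the ACD and is re-read.
    for node in nodes:
        node.acd = copy.deepcopy(node.scd)
    # Step 5: the SCD is deleted only after the change is in place everywhere.
    for node in nodes:
        node.scd = None


if __name__ == "__main__":
    node_a = Node("node_a", {"rg1": ["app1"]})
    node_b = Node("node_b", {"rg1": ["app1"]})
    node_a.dcd["rg1"].append("app2")   # Step 1: change made in SMIT on one node
    dare_synchronize(node_a, [node_a, node_b])
    print(node_b.acd, node_b.snapshots[0], node_b.scd)
```

The point of the model is the ordering: the old ACD is preserved as a snapshot before the new configuration becomes active, and the SCD disappears only at the very end.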
Verifying and synchronizing (standard)

    Initialization and Standard Configuration

    Move cursor to desired item and press Enter.

      Configuration Assistants
      Configure an HACMP Cluster and Nodes
      Configure Resources to Make Highly Available
      Configure HACMP Resource Groups
      Verify and Synchronize HACMP Configuration
      Display HACMP Configuration
      HACMP Cluster Test Tool

    F1=Help     F2=Refresh    F3=Cancel    F8=Image
    F9=Shell    F10=Exit      Enter=Do

Figure 7-69. Verifying and synchronizing (standard)  AU548.0

Notes:

Verifying and synchronizing (standard)
This visual highlights the Verify and Synchronize HACMP Configuration menu entry in the top-level Standard Configuration path’s SMIT menu.

Instructor notes:
Purpose — Show the standard configuration path’s menu entry for verifying and synchronizing the HACMP configuration.
Details — As in the student notes.
Additional information —
Transition statement — The extended configuration path’s verification and synchronization mechanism is more flexible.

Verifying and synchronizing (extended)

    (When NODE DOWN)

    HACMP Verification and Synchronization

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                            [Entry Fields]
    * Verify, Synchronize or Both                           [Both]       +
    * Automatically correct errors found during
      verification?                                         [No]         +
    * Force synchronization if verification fails?          [No]         +
    * Verify changes only?                                  [No]         +
    * Logging                                               [Standard]   +

    (When NODE UP)

    HACMP Verification and Synchronization (Active Cluster Nodes Exist)

    * Emulate or Actual                                     [Actual]     +
    * Verify changes only?                                  [No]         +
    * Logging                                               [Standard]   +

    F1=Help     F2=Refresh    F3=Cancel    F4=List
    F5=Reset    F6=Command    F7=Edit      F8=Image
    F9=Shell    F10=Exit      Enter=Do

Figure 7-70. Verifying and synchronizing (extended)  AU548.0

Notes:

Verifying and synchronizing (extended)
When the Extended Verification and Synchronization option in the extended configuration path’s top-level menu is selected, the SMIT screen above displays. It allows the cluster administrator to modify the default verification and synchronization procedure somewhat.

Emulate or actual
The default of Actual causes the changes being verified and synchronized to “take effect” (become the actual cluster configuration) if the verification succeeds. Setting this field to Emulate causes HACMP to verify and then go through the motions of a synchronize without actually causing the changes to take effect. This is useful to get a sense of what side effects the synchronization is likely to result in. For example, if the proposed change would trigger a fallover or a fallback (because node priorities have changed), then this would be apparent by looking at /<log_dir>/emuhacmp.out or /var/hacmp/adm/cluster.log.

Note: Because, in this case, no fallover or fallback actually occurs, it is not possible to determine if the hypothetical fallback works if actually performed (it might fail for any number of subtle reasons that simply cannot be discovered by an emulated synchronization).

Force synchronization if verification fails?
Setting this to Yes requests that HACMP accept configurations that it does not consider to be entirely valid. This is potentially a very dangerous request and should not be made without considerable planning and analysis to ensure that the impact is acceptable.

Verify changes only?
Setting this to Yes causes the proposed change to be verified but not synchronized. This can be used to see if a change is valid without actually putting it into effect.

Logging
This field can be set to Standard to request the default level of logging or to Verbose to request a more detailed level of logging. If you are having problems getting a change to verify and do not understand why it will not verify, then setting the logging level to Verbose might provide additional information.

Instructor notes:
Purpose — Show the extended configuration path’s verification and synchronization screen.
Details —
Additional information —
Transition statement — Let’s see how we can discard unwanted changes.

Discarding unwanted changes

    Problem Determination Tools

    Move cursor to desired item and press Enter.

      HACMP Verification
      View Current State
      HACMP Log Viewing and Management
      Recover From HACMP Script Failure
      Restore HACMP Configuration Database from Active Configuration
      Release Locks Set By Dynamic Reconfiguration
      Clear SSA Disk Fence Registers
      HACMP Cluster Test Tool
      HACMP Trace Facility
      HACMP Event Emulation
      HACMP Error Notification
      Manage RSCT Services
      Open a SMIT Session on a Node

    F1=Help     F2=Refresh    F3=Cancel    F8=Image
    F9=Shell    F10=Exit      Enter=Do

Figure 7-71. Discarding unwanted changes  AU548.0

Notes:

Rolling back an unwanted change that has not yet been synchronized
If you have made changes that you have decided to not synchronize, they can be discarded using the Restore HACMP Configuration Database from Active Configuration menu entry shown above. It is located under the Problem Determination Tools menu (accessible from the top-level HACMP SMIT menu).

Prior to rolling back the DCD on all nodes, the current contents of the DCD on the node used to initiate the roll back are saved as a snapshot (in case they should prove useful in the future). The snapshot will have a rather long name similar to:

    Restored_From_ACD.Sep.18.19.33.58

This name can be interpreted to indicate that the snapshot was taken at 19:33:58 on September 18th (the year is not preserved in the name).

Because the change being discarded is sometimes a change that has been emulated, this operation is sometimes called rolling back an emulated change. This is a misnomer, as the operation rolls back any change that has not yet been verified and synchronized by restoring all nodes’ DCDs to the contents of the currently active cluster configuration.

Instructor notes:
Purpose — Explain how you can roll back from an unwanted change that has not yet been synchronized.
Details —
Additional information —
Transition statement — But, what if you actually did the synchronize and then found that you wanted to roll back?
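As an aside on the snapshot naming described in the notes for this visual, a name such as Restored_From_ACD.Sep.18.19.33.58 can be decoded mechanically. The sketch below assumes the field order month.day.hour.minute.second, which is inferred only from the single example and its stated interpretation (19:33:58 on September 18th); the function name is hypothetical.

```python
PREFIX = "Restored_From_ACD."
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_restored_snapshot_name(name):
    """Decode a Restored_From_ACD snapshot name into month, day, and time.

    Assumed layout: Restored_From_ACD.<Mon>.<day>.<hour>.<minute>.<second>
    The year is not part of the name, so none is returned.
    """
    if not name.startswith(PREFIX):
        raise ValueError("not a Restored_From_ACD snapshot name: " + name)
    fields = name[len(PREFIX):].split(".")
    if len(fields) != 5 or fields[0] not in MONTHS:
        raise ValueError("unexpected name layout: " + name)
    month_abbr, day, hour, minute, second = fields
    return {
        "month": MONTHS[month_abbr],
        "day": int(day),
        "time": "{}:{}:{}".format(hour, minute, second),
    }


if __name__ == "__main__":
    print(parse_restored_snapshot_name("Restored_From_ACD.Sep.18.19.33.58"))
```

Because the year is missing from the name, sorting such snapshots by name alone is unreliable across a year boundary.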
Rolling back from a DARE operation

    Restore the Cluster Snapshot

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                              [Entry Fields]
      Cluster Snapshot Name                   jami
      Cluster Snapshot Description            Cuz -- he did the lab>
      Un/Configure Cluster Resources?         [Yes]    +
      Force apply if verify fails?            [No]     +

    F1=Help     F2=Refresh    F3=Cancel    F4=List
    F5=Reset    F6=Command    F7=Edit      F8=Image
    F9=Shell    F10=Exit      Enter=Do

Figure 7-72. Rolling back from a DARE operation  AU548.0

Notes:

Rolling back an unwanted change that has been synchronized
If you find that a DARE change does not give the desired result, then you might want to roll it back. DARE cuts a snapshot of the active configuration immediately prior to committing the new HACMP configuration. This snapshot is named active.x.odm (where x is 0 to 9, 0 being the most recent). It can be used to restore the cluster to an earlier state.

Manual snapshots are useful
If many changes have been made in reasonably rapid succession, then you might lose track of which active.x snapshot is the one that you want. To defend yourself against this possibility, it is best to manually take a snapshot before embarking on a series of changes. This allows you to roll back to a known point rather than having to guess which active.x snapshot is the right one.

Snapshots are stored in the directory /usr/es/sbin/cluster/snapshots by default (the default can be overridden by setting the SNAPSHOTPATH environment variable).
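The snapshot location and the active.x.odm naming described above lend themselves to a short helper. This is a minimal sketch, not an HACMP utility: the function names are hypothetical, and the only facts taken from the notes are the default directory, the SNAPSHOTPATH override, and the rule that active.0.odm is the most recent automatic snapshot.

```python
import os
import re

DEFAULT_SNAPSHOT_DIR = "/usr/es/sbin/cluster/snapshots"
ACTIVE_SNAPSHOT = re.compile(r"^active\.(\d)\.odm$")  # x is a single digit, 0 to 9


def snapshot_directory(environ=None):
    """Return SNAPSHOTPATH if it is set and non-empty, else the default directory."""
    environ = os.environ if environ is None else environ
    return environ.get("SNAPSHOTPATH") or DEFAULT_SNAPSHOT_DIR


def automatic_snapshots(filenames):
    """Pick out DARE's automatic snapshots, most recent (active.0.odm) first."""
    numbered = []
    for filename in filenames:
        match = ACTIVE_SNAPSHOT.match(filename)
        if match:
            numbered.append((int(match.group(1)), filename))
    return [filename for _, filename in sorted(numbered)]


if __name__ == "__main__":
    directory = snapshot_directory()
    names = os.listdir(directory) if os.path.isdir(directory) else []
    print(directory, automatic_snapshots(names))
```

Manually taken snapshots (for example, one named before a series of changes) are deliberately left out of the ordered list, since their names carry no sequence information.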
Instructor notes:
Purpose — Explain how we can roll back from a completed DARE operation.
Details — Point out that DARE cuts a snapshot before it commits the new changes and that this snapshot might be restored in order to roll back to the earlier configuration. DARE records snapshots of the previous 10 configurations.
Additional information — Snapshots are located in /usr/es/sbin/cluster/snapshots or the directory pointed to by the SNAPSHOTPATH variable.
Transition statement — But what happens if a DARE operation is in progress and a node crashes?

What if DARE fails?

If a dynamic reconfiguration fails because of an unexpected cluster event, then the staging configuration directory might still exist. This prevents further changes being made to the cluster.

1. Change topology or resources in SMIT
2. Synchronize topology or resources in SMIT
3. Snapshot taken of the current ACD
4. Cluster manager reads ACD and refreshes
5. SCD is deleted

[Diagram: HACMP SMIT panels on the initiating node, with the configuration copies on the cluster nodes labeled DCD, SCD, and ACD; a failure marked “Bang!” occurs during the sequence]

Figure 7-73. What if DARE fails?  AU548.0

Notes:

What if DARE fails?
If a node failure should occur while a synchronization is taking place, then the Staging Configuration Directory (SCD) might not have been cleared on all nodes. The presence of the SCD prevents further configuration changes from being performed. If the SCD is not cleared at the end of a synchronize, then this indicates that the DARE operation did not complete or was not successful, and hence the SCD acts as a lock against further changes being made.

Note that the SCD copies are made before the change is copied by each node’s cluster manager into each node’s ACD. If there is an SCD when Cluster Services starts up on a node, it copies it to the ACD, deletes the SCD, and uses the new ACD as its configuration. Because a node failure at any point after any of the SCDs exists could result in only some of the nodes having the updated SCD, the SCDs must be removed before a restart of Cluster Services on any node (or you risk different cluster nodes running with different configurations, a situation that results in one or more cluster nodes crashing).

Instructor notes:
Purpose — Introduce the concept of the SCD acting as a dynamic reconfiguration lock.
Details — More details on this can be found in the HACMP Concepts and Facilities Guide.
Additional information —
Transition statement — So, how do we clean out the SCD?
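The failure case in the notes above can also be sketched as a toy model. Everything here is an illustrative assumption: dictionaries stand in for the ACD and SCD, and the function names (`stage`, `can_synchronize`, `start_cluster_services`, `remove_scds`) are hypothetical stand-ins for behavior the notes describe, not HACMP commands.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    acd: dict                   # active configuration
    scd: Optional[dict] = None  # leftover staging copy, if any


def stage(config, nodes):
    """Create an SCD on the given nodes (a crash may leave other nodes without one)."""
    for node in nodes:
        node.scd = dict(config)


def can_synchronize(nodes):
    """An SCD on any node acts as a lock against further changes."""
    return all(node.scd is None for node in nodes)


def start_cluster_services(node):
    """On startup, an existing SCD is copied to the ACD and then deleted."""
    if node.scd is not None:
        node.acd = node.scd
        node.scd = None


def remove_scds(nodes):
    """Clear every leftover SCD so that no node picks up a half-propagated change."""
    for node in nodes:
        node.scd = None


if __name__ == "__main__":
    old, new = {"version": 1}, {"version": 2}
    a, b = Node("a", dict(old)), Node("b", dict(old))
    stage(new, [a])                  # failure happens before node b is staged
    print(can_synchronize([a, b]))   # the leftover SCD blocks further changes
    remove_scds([a, b])              # clear the SCDs before any restart
    start_cluster_services(a)
    print(a.acd == b.acd)            # both nodes still agree
```

Skipping `remove_scds` in this model and restarting Cluster Services on node a alone leaves the two nodes with different active configurations, which is exactly the risk the notes warn about.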
Dynamic reconfiguration lock

    Problem Determination Tools

    Move cursor to desired item and press Enter.

      HACMP Verification
      View Current State
      HACMP Log Viewing and Management
      Recover From HACMP Script Failure
      Restore HACMP Configuration Database from Active Configuration
      Release Locks Set By Dynamic Reconfiguration
      Clear SSA Disk Fence Registers
      HACMP Cluster Test Tool
      HACMP Trace Facility
      HACMP Event Emulation
      HACMP Error Notification
      Manage RSCT Services
      Open a SMIT Session on a Node

    F1=Help     F2=Refresh    F3=Cancel    F8=Image
    F9=Shell    F10=Exit      Enter=Do

Figure 7-74. Dynamic reconfiguration lock  AU548.0

Notes:

Clearing dynamic reconfiguration locks
The SMIT menu option Release Locks Set By Dynamic Reconfiguration clears out the SCD and allows further synchronizations to be made to the cluster configuration. If an SCD exists on any cluster node, then no further synchronizations are permitted until it is deleted using the above SMIT menu option.

Instructor notes:
Purpose — Explain how we can clear out the SCD.
Details — Run this SMIT menu only in situations where DARE changes were not successfully synchronized because of a node crash at a bad moment.
Additional information —
Transition statement — Okay, any questions for me? If not, let’s review.
Let’s review: Topic 3

1. True or False? DARE operations can be performed while the cluster is running.
2. Which operations can DARE not perform (select all that apply)?
   a. Changing the name of the cluster
   b. Removing a node from the cluster
   c. Changing a resource in a resource group
   d. Change whether a network uses IPAT via IP aliasing or via IP replacement
3. True or False? Cluster snapshots can be applied while the cluster is running.
4. What is the purpose of the dynamic reconfiguration lock?
   a. To prevent unauthorized access to DARE functions
   b. To prevent further changes being made until a DARE operation has completed
   c. To keep a copy of the previous configuration for easy rollback
5. True or False? Running a DARE operation requires three separate copies of the HACMP ODM.
6. True or False? It is possible to roll back from a successful DARE operation using an automatically generated snapshot.

Figure 7-75. Let’s review: Topic 3  AU548.0

Notes:

Instructor notes:
Purpose — Let’s review questions.
Details — Let’s review: Topic 3 solutions (questions 1 through 6 as shown in the visual).
Additional information —
Transition statement — The next topic discusses WebSMIT, a convenient Web-based interface to SMIT.

7.4 WebSMIT

Instructor topic introduction
What students will do — Learn about WebSMIT.
How students will do it — Lecture and lab.
What students will learn — Capabilities of WebSMIT and how to configure it.
How this will help students on their job — Knowing how to configure WebSMIT allows students to use this powerful tool, if they choose to use WebSMIT. Knowing how to configure security for WebSMIT should help students keep their cluster secure.

Implementing WebSMIT

After completing this topic, you should be able to:
- Configure and use WebSMIT

Figure 7-76. Implementing WebSMIT  AU548.0

Notes:

Instructor notes:
Purpose — Discuss objectives for this topic.
Details —
Additional information —
Transition statement — HACMP 5.2 and up includes a convenient Web-based interface to SMIT. Let’s take a look at it.

Web-enabled SMIT (WebSMIT)

HACMP 5.2 and up includes a web-enabled user interface that provides easy access to:
- HACMP configuration and management functions
- Interactive cluster status display and manipulation
- HACMP online documentation

The Web-enabled SMIT (WebSMIT) interface is similar to the ASCII SMIT interface. You do not need to learn a new user interface or terminology and can easily switch between ASCII SMIT and WebSMIT.

To use the WebSMIT interface, you must configure and run a Web server process on the cluster nodes to be administered.
- The configuration has been made simpler with HACMP 5.4 and later
  • Use websmit_config utility

Figure 7-77. Web-enabled SMIT (WebSMIT)  AU548.0

Notes:

Introduction
WebSMIT combines the advantages of SMIT with the ease of access from any system that runs a browser. For those looking for a graphical interface for managing and monitoring HACMP, WebSMIT provides those capabilities via a Web browser. It provides real-time graphical status of the cluster components, similar to the clstat.cgi. It also provides context menu access to those components, to control them by launching a WebSMIT menu containing the action or actions to take. There are multiple views: Node-by-node, Resource Group, Associations, component Details, and so on. Check it out in lab.

Configuration
This utility uses snmp, so it is imperative that you have your snmp interface to the cluster manager functioning. To test that, attempt a cldump command on the system where you will be running the WebSMIT utility.

A configuration utility is provided (websmit_config), requiring only that a supported HTTP server be installed to configure the system for use as a WebSMIT server. A robust control tool is provided as well to control the HTTP server functioning. The tool is called websmitctl.

Features
- Off-line/Unavailable status is displayed as “grayed out.”
- Most WebSMIT items can be assigned a custom color set.
- Resource-type awareness in the display.
- Language support is more sophisticated.
- Auto-configuration improvements.
- An “instant help” system.

Instructor notes:
Purpose — Introduce WebSMIT.
Details —
Additional information —
Transition statement — The next visual shows the WebSMIT main page.

WebSMIT main page

[Screen capture: WebSMIT main page, with the callout “HACMP SMIT access”]

Figure 7-78. WebSMIT main page  AU548.0

Notes:

Introduction
To connect to WebSMIT, point your browser to the cluster node that you have configured for WebSMIT. WebSMIT uses port 42267 by default. After authentication, this will be the first screen that you see. Note the Navigation Frame (left side) and the Activity Frame (right side). Each pane has tabs that provide access to different status, functions, or controls.

Navigation Frame tabs:
- SMIT - access to HACMP SMIT
- N&N - a Node-by-node relationship and status view of the cluster (if snmp can get cluster information)
- RGs - a Resource Group relationship and status view of the cluster

Use the Expand All or Collapse All links to get the full view or clean up the view.

Activity Frame tabs:
- Configuration - permanent access to HACMP SMIT from the Activity Frame. Also, note that we’re looking at configuration options only.
- Details - comes to top when a component is selected in the Navigation Frame, and displays configuration information about the component
- Associations - shows the relationship to other HACMP components for the component that is selected in the Navigation Frame
- Doc - if the HACMP pubs were installed (html or pdf version), this tab will display links to access them

Don’t attempt to navigate using the browser’s Back or Forward buttons.

Note the FastPath box at the bottom of the Configuration tab. This allows you to go directly to any (that is, any) SMIT panel if you know the fastpath. What’s the fastpath to the SMIT top menu?

Instructor notes:
Purpose — View the main WebSMIT page.
Details — This interface is pretty straightforward. Go through the panes and the tabs and introduce the idea of context menus to control cluster components.
Additional information —
Transition statement — Check out the layout of the WebSMIT main screen and the use of the context menus.

WebSMIT context menu controls

[Screen capture with callouts: “Right mouse click on app_server”, “Choose an item from the context menu”, “Activity Frame changes”]

Figure 7-79. WebSMIT context menu controls  AU548.0

Notes:

Using the context menus
Right-click the object in the Navigation Frame. Choose the item you want to control from the context menu and watch the Activity Frame change to the task you’re trying to perform. Remember this is still SMIT, so you’ll get HACMP SMIT menus as a result of the context menu selections.

Status
Notice that the icons (on the screen anyway) indicate online (not grayed out) or offline (grayed out). This is real-time status. More to come on the next visual, regarding the associations.
Basic HACMP administration 7-191 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. called associations. 2008 Unit 7. Details — Additional information — Transition statement — An extension of status is to see how the components relate to one another. . 1998.V4. © Copyright IBM Corp. in a tree format.0 Instructor Guide Uempty Instructor notes: Purpose — Using the context menus. Prior releases used a red square for off-line/unavailable status . 1998.4. industry standard approach. .1 Some of the changes made for HACMP 5.More common.1 are: . WebSMIT associations AU548. . .Instructor Guide WebSMIT associations © Copyright IBM Corporation 2008 Figure 7-80.Off-line/unavailable status is now indicated by “graying out” the affected item or items. Enhancements with HACMP 5. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.Industry convention is that the color red is used to indicate a problem situation. 7-192 HACMP Implementation © Copyright IBM Corp. you’ll see the Details tab come to the top of the Activity Frame with the configuration details of the Resource Group.4.0 Notes: Associations If you don’t click fast enough (or just pause long enough) between selecting the Resource Group and clicking the Associations tab. © Copyright IBM Corp.0 Instructor Guide Uempty . . . 2008 Unit 7. so if the customers change browsers. . 1998.V4. they must recreate their customizations in their new browsers. Basic HACMP administration 7-193 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.css” file on the WebSMIT server.Most WebSMIT items can now be assigned custom colors. • Local.Color customizations can be assigned both globally and at the browser level. which can be accessed via the Configuration tab under Extended Options. Note: Local customizations are stored as a cookie in the browser. 
per-browser customizations can be made through the new Customize WebSMIT panel.
Improves accessibility for customers with visual deficits.
• Global customizations must be made manually in the “wsm_custom.

Instructor notes:
Purpose: Discuss the ability to see the relationships between the cluster components via Associations.
Transition statement: Online Documentation allows you to view the HACMP manuals.

WebSMIT online documentation
Figure 7-81. WebSMIT online documentation

Notes:
Online documentation
This screen enables you to view the HACMP manuals in either HTML or PDF format. You must install the HACMP documentation file sets.

Instructor notes:
Purpose: Discuss the Online Documentation screen.
Transition statement: Finally, let's look at the simple configuration process.

WebSMIT configuration
Base directory is /usr/es/sbin/cluster/wsm
• Consult the documentation
  – Readme located at /usr/es/sbin/cluster/wsm/README
  – Manuals installed from cluster.doc.en_US.es.html and cluster.doc.en_US.es.pdf
• Configure and run a Web server on cluster nodes
  – ./websmit_config takes it from there
  – Setuid program ./cgi-bin/wsm_cmd_exec permissions must be set correctly
• Consult log files for progress status
  – ./logs/wsm_smit.log
  – ./logs/wsm_smit.script
• Optionally, implement stricter security
  – Customize ./wsm_smit.conf
• Optionally, control the SMIT panels that can be accessed
  – ./wsm_smit.allow
  – ./wsm_smit.deny
  – ./wsm_smit.redirect
Figure 7-82. WebSMIT configuration

Notes:
Documentation
The primary source for information on configuring WebSMIT is the WebSMIT README file, located in /usr/es/sbin/cluster/wsm, as shown in the visual. The HACMP Planning and Installation Guide provides some additional information on installation and the HACMP Administration Guide provides information on using WebSMIT.

Web server
To use WebSMIT, you must configure one (or more) of your cluster nodes as a Web server. You must use either IBM HTTP Server (IBMIHS) V6.0 (or later) or Apache 1.3 (or later). Refer to the specific documentation for the Web server you choose. This configuration is done using the websmit_config utility. See the README file for details.

WebSMIT security
Because WebSMIT gives you root access to all the nodes in your cluster, you must carefully consider the security implications and the security characteristics of your system before putting WebSMIT to use.

WebSMIT uses a configuration file, wsm_smit.conf, that contains settings for WebSMIT's security related features. This file is installed as /usr/es/sbin/cluster/wsm/wsm_smit.conf, and it may not be moved to another location.

The default settings used provide the highest level of security in the default AIX/Apache environment. You might be able to use different combinations of security settings for AIX, Apache, and WebSMIT to improve the security of the application in your environment.

WebSMIT uses the following configurable mechanisms to implement a secure environment:
  – Non-standard port
  – Secure http (https)
  – User authentication
  – Session time-out
  – wsm_cmd_exec setuid program

• Use non-standard port
WebSMIT can be configured to allow access only over a specified port using the wsm_smit.conf AUTHORIZED_PORT setting. If you do not specify an AUTHORIZED_PORT, or specify a port of 0, then any connections via any port will be accepted. It is strongly recommended that you explicitly specify the AUTHORIZED_PORT, and that you use a non-standard port. The default setting for this configuration variable is 42267.

• Allow only secure http
If your HTTP server supports secure HTTP, it is strongly recommended that you require all WebSMIT connections to be established via HTTPS. This will ensure that you are not transmitting sensitive information about your cluster over the Internet in plain text. WebSMIT can be configured to require secure http access using the wsm_smit.conf REDIRECT_TO_HTTPS setting. If the value for this setting is 1, then users connecting to WebSMIT via an insecure connection will be redirected to a secure http connection. The default value for REDIRECT_TO_HTTPS is 1.

Note: Regarding the REDIRECT_TO_HTTPS variable, the README file states: “This variable will only function correctly if the AUTHORIZED_PORT feature is disabled.” This did not appear to be true in our testing.

• Require user authentication
If Apache's built-in authentication is not being used, WebSMIT can be configured to use AIX authentication using the wsm_smit.conf file REQUIRE_AUTHENTICATION setting. If the value for this setting is 1 and there is no .htaccess file controlling access to WebSMIT, the user will be required to provide AIX authentication information before gaining access. (Refer to the documentation included with Apache for more details about Apache's built-in authentication.) The default value for REQUIRE_AUTHENTICATION is 1.

If REQUIRE_AUTHENTICATION is set, then the HACMP administrator must specify one or more users who are allowed to access the system. This can be done using the wsm_smit.conf ACCEPTED_USERS setting. Only users whose names are specified will be allowed access to WebSMIT, and all ACCEPTED_USERS will be provided with root access to the system. By default, only the root user is allowed access via the ACCEPTED_USERS setting.

Warning
Because AIX authentication mechanisms are in use, login failures can cause an account to be locked. It is recommended that a separate user be created for the sole purpose of accessing WebSMIT. If the root user has a login failure limit, failed WebSMIT login attempts could quickly lock the root account.

• Session time-out
Continued access to WebSMIT is controlled through the use of a non-persistent session cookie. Cookies must be enabled in the client browser in order to use AIX authentication for access control. If the session is used continuously, then the cookie will not expire. However, the cookie is designed to time out after an extended period of inactivity. WebSMIT allows the user to adjust the time-out period using the wsm_smit.conf SESSION_TIMEOUT setting. This configuration setting must have a value expressed in minutes. The default value for SESSION_TIMEOUT is 20 (minutes).

• Controlling access to wsm_cmd_exec (setuid)
A setuid program is supplied with WebSMIT that allows non-root users to execute commands with root permissions (wsm_cmd_exec). The setuid bit for this program must be turned on in order for the WebSMIT system to function. Care must be taken to limit access to this executable. Do not allow a non-root user to copy the executable to another location or to “decompile” the program. For security reasons, wsm_cmd_exec must not have read permission for non-root users. Thus the utility wsm_cmd_exec (located in /usr/es/sbin/cluster/wsm/cgi-bin/) must be set with 4511 permissions.

WebSMIT allows the user to dictate the list of users who are allowed to use the wsm_cmd_exec program using the wsm_smit.conf REQUIRED_WEBSERVER_UID setting. The real user ID of the process must match the UID of one of the users listed in wsm_smit.conf for the program to carry out any of its functionality. The default value for REQUIRED_WEBSERVER_UID is nobody.

By default, a Web server CGI process runs as user nobody, and by default, non-root users cannot execute programs as user nobody. If your HTTP server configuration executes CGI programs as a different user, it is important to ensure that the REQUIRED_WEBSERVER_UID value matches the configuration of your Web server. It is strongly recommended that the HTTP server be configured to run CGI programs as a user who is not authorized to open a login shell (as with user nobody). See the README for details.
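The 4511 permission requirement for wsm_cmd_exec can be checked from the command line. The following is a minimal sketch, not part of WebSMIT: check_4511 is a hypothetical helper name, and the only facts it relies on are the path quoted in the notes and the way ls renders mode 4511 (setuid, read and execute for the owner, execute only for group and others).

```shell
#!/bin/sh
# Sketch: report whether a file carries exactly mode 4511 (-r-s--x--x).
# WSM_CMD is the location given in the notes; check_4511 is a hypothetical helper.
WSM_CMD=/usr/es/sbin/cluster/wsm/cgi-bin/wsm_cmd_exec

check_4511() {
    # The first field of "ls -ld" is the symbolic mode; keep its first 10 characters.
    mode=$(ls -ld "$1" 2>/dev/null | awk '{print substr($1, 1, 10)}')
    if [ "$mode" = "-r-s--x--x" ]; then
        echo "ok"
    else
        echo "unexpected mode: ${mode:-missing}"
    fi
}

# Only run the check where WebSMIT is actually installed.
# To correct the mode (as root): chmod 4511 "$WSM_CMD"
if [ -e "$WSM_CMD" ]; then
    check_4511 "$WSM_CMD"
fi
```

Anything other than "ok" means the mode differs from the one the notes call for and should be reviewed before WebSMIT is put to use.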
Log files
All operations of the WebSMIT interface are logged to the wsm_smit.log file and are equivalent to the logging done with smitty -v. Script commands are also captured in the wsm_smit.script log file. WebSMIT log files are created by the CGI scripts using a relative path of ../logs. If you copy the CGI scripts to the default location for the IBM HTTP Server, the final path to the logs is /usr/HTTPServer/logs.

The WebSMIT logs are not subject to manipulation by the HACMP Log Viewing and Management SMIT panel. Also, just like smit.log and smit.script, the files grow indefinitely.

The snap -e utility captures the WebSMIT log files if you leave them in the default location (/usr/es/sbin/cluster/wsm/logs), but if you install WebSMIT somewhere else, snap -e will not find them.

Controlling which SMIT screens can be used
As mentioned earlier, WebSMIT will process just about any valid SMIT panel. You can limit the set of panels that WebSMIT will process by configuring one or more of these files.

wsm_smit.allow
If this file exists on the server, it will be checked before any SMIT panel is processed. If the SMIT panel ID (fast path) is not contained in the file, the http request will be rejected. Use this file to limit WebSMIT to a specific set of SMIT panels. A sample file is provided, which contains all the SMIT panel IDs for HACMP. Simply rename this file to wsm_smit.allow if you want to limit access to just the HACMP SMIT panels.

wsm_smit.deny
Entering a SMIT panel ID in this file will cause WebSMIT to deny access to that panel. If the same SMIT panel ID is stored in both the .allow and .deny files, .deny processing takes precedence.

wsm_smit.redirect
Instead of simply rejecting access to a specific page, you can redirect the user to a different page. The default .redirect file has entries to redirect the user from specific HACMP SMIT panels that are not supported by WebSMIT.

Customizing the WebSMIT status panel
wsm_clstat.cgi displays cluster information in the WebSMIT status panel. You can customize wsm_clstat.cgi by changing the /usr/es/sbin/cluster/wsm/cgi-bin/wsm_smit.conf file. This file allows you to configure logging and the menus for the WebSMIT status panel. See the README file for details.

Using the online documentation feature
To use the online documentation feature, you must install the file sets shown in the visual.
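The precedence between wsm_smit.allow and wsm_smit.deny described in these notes can be illustrated with a small shell sketch. It is an illustration of the stated rules only, not WebSMIT's own code: panel_allowed is a hypothetical helper, and the sketch assumes each file holds one SMIT panel ID (fast path) per line, which the notes do not confirm.

```shell
#!/bin/sh
# Sketch of the allow/deny rules from the notes (not WebSMIT's implementation).
# Assumption: one SMIT panel ID per line in each file.
panel_allowed() {
    panel=$1
    allow_file=$2
    deny_file=$3

    # A deny entry takes precedence, even if the panel is also in the allow file.
    if [ -f "$deny_file" ] && grep -Fxq -e "$panel" "$deny_file"; then
        echo "denied"
        return 1
    fi

    # If an allow file exists, only panels listed in it are processed.
    if [ -f "$allow_file" ] && ! grep -Fxq -e "$panel" "$allow_file"; then
        echo "rejected"
        return 1
    fi

    echo "allowed"
}
```

With neither file present, every panel is reported as allowed, which matches the statement that WebSMIT will process just about any valid SMIT panel by default.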
Instructor notes:
Purpose: Discuss WebSMIT configuration.
Details: There are a lot of details here. You probably do not need to go through all of this in detail. You should probably discuss the mechanisms for making WebSMIT secure in detail, and just provide a quick summary of the other information.
Transition statement: Okay, we're done with this unit (finally). Let's do a Checkpoint to review what we covered.

Checkpoint
1. True or False? A star configuration is a good choice for your non-IP networks.
2. True or False? Using DARE, you can change from IPAT via aliasing to IPAT via replacement without stopping the cluster.
3. True or False? RSCT will automatically update /etc/filesystems when using enhanced concurrent mode volume groups.
4. True or False? With HACMP V5.4, a resource group's priority override location can be cancelled by selecting a destination node of Restore_Node_Priority_Order.
5. You want to create an Enhanced Concurrent Mode Volume Group that will be used in a Resource Group that will have an “Online on Home Node” Startup policy. Which C-SPOC menu should you use?
   a. HACMP Logical Volume Management
   b. HACMP Concurrent Logical Volume Management
6. You want to add a logical volume to the volume group you created in the question above. Which C-SPOC menu should you use?
   a. HACMP Logical Volume Management
   b. HACMP Concurrent Logical Volume Management
Figure 7-83. Checkpoint

Instructor notes:
Purpose: Checkpoint
Details: Checkpoint solutions
Transition statement: Let's summarize this unit.
Unit summary
Key points from this unit:
• Implementing procedures for change management is a critical part of administering an HACMP cluster
• C-SPOC provides facilities for performing common cluster-wide administration tasks from any node within the cluster:
  – Perform routine administrative changes
  – Start and stop cluster services
  – Perform resource group move operations
• The SMIT Standard and Extended menus are used to make topology and resource group changes
• The Dynamic Automatic Reconfiguration Event facility (DARE) provides the mechanism to make changes to cluster topology and resources without stopping the cluster
• The Cluster Snapshot facility allows the user to save and restore a cluster configuration
• WebSMIT provides access to HACMP SMIT menus from any system with a Web browser
Figure 7-84. Unit summary

Instructor notes:
Purpose: Summarize the unit.
Transition statement: We're done with this unit.

Unit 8. Events

Estimated time
01:30

What this unit is about
This unit describes the event process in HACMP.

What you should be able to do
After completing this unit, you should be able to:
• Describe what is meant by the term “event”
• Describe the sequence of events when:
  - The first node starts in a cluster
  - A new node joins an existing cluster
  - A node leaves a cluster voluntarily
• Explain what happens when HACMP processes an event
• Describe how to customize the event flow
• State how to monitor other devices

How you will check your progress
Accountability:
• Checkpoint
• Machine exercises

References
SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10 HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10 HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04 HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09 HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html HACMP manuals

Unit objectives
After completing this unit, you should be able to:
• Describe what an HACMP event is
• Describe the sequence of events when:
  – The first node starts in a cluster
  – A new node joins an existing cluster
  – A node leaves a cluster voluntarily
• Explain what happens when HACMP processes an event
• Describe how to customize the event flow
• State how to monitor other devices
Figure 8-1. Unit objectives

Instructor notes:
Purpose: List the objectives of this unit.
Details: This unit will be divided into two topics. The first will describe what an event is and introduce the basic event flows for node start and stop. The second topic will cover customizing the event flow and dealing with other devices.
Transition statement: Let's go to the first topic.
8.1 HACMP events

Instructor topic introduction
What students will do: Learn about events.
How students will do it: Lecture and lab.
What students will learn: Event definition and start/stop flow.
How this will help students on their job: Be better able to implement HACMP into their environment.

Topic 1 objectives: HACMP events
After completing this topic, you should be able to:
• Describe what an HACMP event is
• Explain what happens when HACMP processes an event
• Describe the sequence of events when:
  – The first node starts in a cluster
  – A new node joins an existing cluster
  – A node leaves a cluster voluntarily
Figure 8-2. Topic 1 objectives: HACMP events

Instructor notes:
Purpose: List objectives for this topic.
Transition statement: So, what is an event anyway in HACMP?

What is an HACMP event?
• An HACMP event is an incident of interest to HACMP:
  – A node joins the cluster
  – A node crashes
  – A NIC fails
  – A NIC recovers
  – Cluster administrator requests a resource group move
  – Cluster administrator requests a configuration change (synchronization)
• An HACMP event script is a script invoked by a recovery program to perform the recovery function required.
  – node_up
  – node_down
  – fail_interface
  – join_interface
  – rg_move
  – reconfig_topology_start
Figure 8-3. What is an HACMP event?

Notes:
What the term “HACMP event” means
The term HACMP event has two contexts:
- An incident that is of interest to the cluster, such as the failure of a node or the recovery of a NIC
- A script that is used by HACMP to actually deal with one of these incidents

Unfortunately, it is not all that uncommon for the script word to be left off in a discussion of event scripts. Fortunately, which meaning is appropriate is almost certainly obvious from the context of the discussion.

Instructor notes:
Purpose: Provide a more detailed explanation of what an HACMP event is than has been provided so far.
Transition statement: Let's take a look at how events are recognized by HACMP.

HACMP basic event flow
[Visual: Topology Services/ES, Group Services/ES, HACMP Cluster Manager, HACMP Rules ODM, Recovery Programs, Recovery Commands, Event Script. The visual shows this excerpt from the rules file:]
    ##
    ## Beginning of Event Definition Node Up
    ##
    #
    TE_JOIN_NODE
    0
    /usr/sbin/cluster/events/node_up.rp
    2
    0
    # 6) Resource variable only used for event manager events
    # 7) Instance vector, only used for event manager events
Figure 8-4. HACMP basic event flow

Notes:
How an event script is triggered
Most HACMP events result from the detection and diagnostic capabilities of RSCT's Topology Services component. They arrive at the Cluster Manager, which then uses recovery programs to determine which event scripts to call to actually deal with the event. The rules for how these recovery programs should be coordinated and sequenced are described in the HACMP Rules ODM file. The coordination and sequencing of the recovery programs is actually handled by the Cluster Manager working with RSCT group services.

The RMC subsystem is used for implementing User-defined Events, Application Monitoring, Dynamic Node Priority, and DLPAR. Dynamic Node Priority is one of the fallover policies and DLPAR refers to the Dynamic LPAR capability of HACMP.

Instructor notes:
Purpose: Provide an overview of how HACMP events are recognized and then dealt with.
Transition statement: Let's take a look now at what the events are and how they are organized.
Recovery programs
    cluster_notify.rp
    external_resource_state_change.rp
    external_resource_state_change_complete.rp
    fail_interface.rp
    fail_standby.rp
    join_interface.rp
    join_standby.rp
    migrate.rp
    network_down.rp
    network_up.rp
    node_down.rp
    node_down_dependency.rp
    node_down_dependency_complete.rp
    node_up.rp
    node_up_dependency.rp
    node_up_dependency_complete.rp
    reconfig_configuration.rp
    reconfig_configuration_dependency_acquire.rp
    reconfig_configuration_dependency_complete.rp
    reconfig_configuration_dependency_release.rp
    reconfig_resource.rp
    reconfig_topology.rp
    resource_state_change.rp
    resource_state_change_complete.rp
    rg_move.rp
    rg_offline.rp
    rg_online.rp
    server_down.rp
    server_restart.rp
    site_down.rp
    site_isolation.rp
    site_merge.rp
    site_up.rp
    swap_adapter.rp
Figure 8-5. Recovery programs

Notes:
Recovery programs
This visual lists the recovery programs that are used by the resource manager component of the Cluster Manager Services to determine what event scripts to invoke. These form the first step in processing an event.

Instructor notes:
Purpose: Show the list of recovery programs.
Details: This is the first step in processing an event.
Transition statement: What does a recovery program look like?

Recovery program example
site_up.rp
    # This file contains the HACMP/ES recovery program for
    # site_up events
    #
    # format:
    # relationship    command to run    expected status    NULL
    #
    other "site_up" 0 NULL
    #
    barrier
    #
    event "site_up" 0 NULL
    #
    barrier
    #
    all "site_up_complete" 0 NULL
Figure 8-6. Recovery program example

Notes:
Format of a recovery program
The first type of line contains where the event script should run and what the name of the script is.

The second type of line is the word barrier. This is a wait, which is handled by group services so that other nodes can complete their processing before the next step of this recovery program.

Instructor notes:
Purpose: Show what a recovery program looks like.
Transition statement: So what are the event scripts?
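The two line types in the site_up.rp example can be made visible with a short shell sketch that prints the steps of a recovery program in order. This is only an illustration under stated assumptions: list_rp_steps is a hypothetical helper, and the field layout (relationship, quoted command, expected status, NULL) is taken solely from the format comment in the example rather than from any fuller specification.

```shell
#!/bin/sh
# Sketch: list the steps of a recovery program file in execution order.
# list_rp_steps is a hypothetical helper; the field layout is assumed from
# the "format:" comment in the site_up.rp example.
list_rp_steps() {
    awk '
        /^[[:space:]]*#/ || NF == 0 { next }       # skip comments and blank lines
        $1 == "barrier" { print "barrier"; next }  # wait point between steps
        {
            cmd = $2
            gsub(/"/, "", cmd)                     # drop the quotes around the script name
            printf "%s runs %s (expected status %s)\n", $1, cmd, $3
        }
    ' "$1"
}
```

Run against a file containing the example shown in the visual, the output alternates between script steps and barrier lines, which makes the wait points between steps easy to see.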
Event scripts
[Visual: two columns of event script names, headed “(called by cluster manager)” and “(called by other events)”. Names shown include node_up, node_up_complete, network_up, network_up_complete, swap_adapter, swap_adapter_complete, swap_address, swap_address_complete, fail_standby, join_standby, fail_interface, join_interface, rg_move, rg_move_complete, rg_online, rg_offline, event_error, config_too_long, reconfig_topology_start, reconfig_resource_release, reconfig_configuration_dependency_acquire, reconfig_configuration_dependency_complete, reconfig_configuration_dependency_release, node_up_dependency, node_down_dependency, migrate, migrate_complete, external_resource_state_change, server_down, server_restart, site_up, site_up_complete, site_merge, site_merge_complete, node_up_local, node_up_local_complete, node_up_remote_complete, node_down_local, node_down_local_complete, node_down_remote_complete, acquire_aconn_service, acquire_service_addr, acquire_takeover_addr, get_disk_vg_fs, get_aconn_rs, release_service_addr, release_vg_fs, start_server, stop_server, swap_aconn_protocols, suspend_appmon, resume_appmon, rg_up, rg_temp_error_state, rg_acquiring_secondary, rg_up_secondary, rg_error_secondary.]
Figure 8-7. Event scripts

Notes:
Event scripts
This is the list of HACMP events that are managed by HACMP. Each of these events can have an optional notify command, one or more pre-event scripts, one or more post-event scripts and an optional recovery command associated with it.

The events on the left are directly called by the cluster manager or process_resources in response to unexpected happenings. The events on the right are invoked by primary or other secondary events on an as-needed basis.

Instructor notes:
Purpose: List HACMP events.
Transition statement: With parallel processing of resource groups the default, let's look at how that works.

process_resources
[Visual: Cluster Manager, Resource Manager, RGPA, process_resources, clrgpa, cl_RMupdate; labels “Cluster Status?”, “next task”, “Update RM”, “Exit”.]
Figure 8-8. process_resources

Notes:
Script process_resources
The script process_resources handles the calls from event scripts to the Resource Group Policy Administrator (RGPA):
- There is one JOB_TYPE for each resource type. Some can be run once each event, useful for parallel processing of resources
- Loops through each returned task (JOB_TYPE):
  • Calls cl_RMupdate as required to update the Cluster Manager with the status change
  • Processes the next JOB_TYPE that the RGPA passes (via clrgpa) until all tasks in the list are completed.

This is meant to show you that the process_resources script is responsible for interacting with the event scripts. You will see the JOB_TYPE in the /tmp/hacmp.out log file.

Instructor notes:
Purpose: Show at a high level the way process_resources works.
Additional information: Don't spend a lot of time on this; just indicate that parallel processing of resource groups is the default and handled through this script. If a student is familiar with the serial processing of resource groups, the event scripts are called directly.
Transition statement: Let's take a look at some sample event flows that occur when the HACMP cluster starts up.

First node starts cluster services
[Visual: Start cluster services; the clstrmgrES Event Manager calls each event and receives its return code (RC).]
1) node_up
     process_resources (NONE)
     for each RG:
       process_resources (ACQUIRE)
       process_resources (SERVICE_LABELS)
         acquire_service_addr
         acquire_aconn_service en0 net_ether_01
       process_resources (DISKS)
       process_resources (VGS)
       process_resources (LOGREDO)
       process_resources (FILESYSTEMS)
       process_resources (SYNC_VGS)
       process_resources (TELINIT)
       process_resources (NONE)
     < Event Summary >
2) node_up_complete
     for each RG:
       process_resources (APPLICATIONS)
         start_server app01
       process_resources (ONLINE)
       process_resources (NONE)
     < Event Summary >
Figure 8-9. First node starts cluster services

Notes:
Startup processing
Implicit in this example is the assumption that there is actually a resource group to start on the node. If there are no resource groups to start on the node, then node_up_local and node_up_local_complete do very little processing at all.

Instructor notes:
Purpose: Show the sequence of events when the first node starts in a cluster.
Transition statement: Now, let's see what happens when a subsequent node joins the cluster.

Another node joins the cluster
[Visual: a running node and a node starting cluster services; the clstrmgrES Event Managers exchange messages, call each event and receive its return code (RC).]
1) node_up
     process_resources (NONE)
     or
     process_resources (release)
2) node_up
     Same sequence as node 1 up (previous visual)
3) node_up_complete
     for each RG:
       process_resources (SYNC_VGS)
       process_resources (NONE)
     < Event Summary >
4) node_up_complete
     for each RG:
       process_resources (APPLICATIONS)
         start_server app02
       process_resources (ONLINE)
       process_resources (NONE)
     < Event Summary >
Figure 8-10. Another node joins the cluster

Notes:
Another node joins the cluster
When another node starts up, it must first join the cluster. After that, the determination is made to move an already active resource group to the new node (this is the assumption in this visual). If that is the case, node_up processing on the old node “1)” must inactivate the resource group before node_up processing on the new node “2)” can acquire and activate the resource group.
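The event sequences in these visuals can be followed on a live cluster by extracting the event markers from the hacmp.out log. The sketch below is a hedged illustration: event_sequence is a hypothetical helper, the log location is taken from the notes (/tmp/hacmp.out) but may differ on your system, and the "EVENT START:" and "EVENT COMPLETED:" marker strings are an assumption about the log format that should be checked against your own log before relying on it.

```shell
#!/bin/sh
# Sketch: print the order in which events started and completed.
# Assumptions: the log marks events with "EVENT START:" and "EVENT COMPLETED:"
# lines, and HACMP_OUT points at the hacmp.out file on this node.
HACMP_OUT=${HACMP_OUT:-/tmp/hacmp.out}

event_sequence() {
    # Keep only the marker lines and strip whatever precedes the marker.
    grep -E 'EVENT (START|COMPLETED):' "$1" | sed 's/^.*EVENT /EVENT /'
}

if [ -r "$HACMP_OUT" ]; then
    event_sequence "$HACMP_OUT"
fi
```

Comparing the output from the existing node and the joining node side by side shows how node_up and node_up_complete interleave across the two nodes.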
Instructor notes:
Purpose: Show the event flow when a node joins an existing cluster.
Transition statement: Now, let's see what happens when a node leaves a cluster voluntarily.

Node leaves the cluster (stopped)
[Visual: a running node and a node on which cluster services are stopped with takeover; the clstrmgrES Event Managers exchange messages, call each event and receive its return code (RC).]
1) node_down takeover
     for each RG:
       process_resources (RELEASE)
       process_resources (APPLICATIONS)
         stop_server app02
       process_resources (FILESYSTEMS)
       process_resources (VGS)
       process_resources (SERVICE_LABELS)
         release_service_addr
     < Event Summary >
2) node_down_complete
     process_resources (OFFLINE)
     process_resources (SYNC_VGS)
     < Event Summary >
3) node_down takeover
     Same sequence as node up
4) node_down_complete
     for each RG:
       process_resources (APPLICATIONS)
         start_server app02
       process_resources (ONLINE)
     < Event Summary >
Figure 8-11. Node leaves the cluster (stopped)

Notes:
Node down processing normal with takeover
Implicit in this example is the assumption that there is actually a resource group on the departing node which must be moved to one of the remaining nodes.

Node failure
The situation is only slightly different if the node on the right had failed suddenly. Because it is not in a position to run any events, the calls to process_resources listed under the right hand node do not get run.

Instructor notes:
Purpose: Show the sequence of events when a node leaves a cluster that will still have a surviving node after the departure.
Transition statement: So, what did we learn so far?

Let's review
1. Which of the following are examples of primary HACMP events (select all that apply)?
   a. node_up
   b. node_up_local
   c. node_up_complete
   d. start_server
   e. rg_up
2. When a node joins an existing cluster, what is the correct sequence for these events?
   a. node_up on new node, node_up on existing node, node_up_complete on new node, node_up_complete on existing node
   b. node_up on existing node, node_up on new node, node_up_complete on new node, node_up_complete on existing node
   c. node_up on new node, node_up on existing node, node_up_complete on existing node, node_up_complete on new node
   d. node_up on existing node, node_up on new node, node_up_complete on existing node, node_up_complete on new node
Figure 8-12. Let's review
Instructor notes:
Details: Let's review solutions. The solutions slide repeats the two review questions.

8.2 Cluster customization

Instructor topic introduction
What students will do: Learn how to customize events.
How students will do it: Lecture and lab.
What students will learn: How to customize events.
How this will help students on their job: Help maintain high availability.

Topic 2 objectives: Event customization

After completing this topic, you should be able to:
  - Describe how to customize the event flow
  - State how to handle devices outside the control of HACMP

Figure 8-13. Topic 2 objectives: Event customization

Notes:
In this topic, we examine how to customize events in HACMP.

Instructor notes:
Purpose: Topic objectives.
Details: State the unit objectives to the students.
Transition statement: More often than not, it is necessary to perform some degree of event customization to your cluster.

Event processing customization

[Figure: flowchart. The Event Manager uses clcallev (with the ODM HACMP classes) to run, in order: Notify Command, Pre-Event Script (1) ... Pre-Event Script (n), HACMP Event. If RC=0: Post-Event Script (1) ... Post-Event Script (n), then Notify Command. If RC is not 0 and Counter > 0: Recovery Command, then the HACMP Event again. If RC is not 0 and the counter is exhausted: "Event Error".]

Figure 8-14. Event processing customization

Notes:

Event processing without customization
When a decision is made to run a particular HACMP event script on a particular node, the above event processing logic takes control. If no event-related cluster customization has been done on the cluster, then the HACMP Event itself (in other words, the HACMP Event Script) is run and whether it works is noted. (If it worked, then everyone is happy; if not, then you had better go look at the Problem Determination unit, which is coming up later in the week.) Events are logged in the /var/hacmp/adm/cluster.log file and the /<log_dir>/hacmp.out file.

Event processing with customization
The rather simple procedure described in the last paragraph can be modified by the cluster configurator or administrator to deal with cluster requirements, environmental issues, or both, beyond the normal scope of HACMP. These customization opportunities are as follows:
  - Each HACMP event can have a single optional Notify Command associated with it. This command is run once at the very start of processing the event and once again as the last step in processing the event. This is the oldest form of HACMP event-related customization. It is not used all that often anymore because better mechanisms now exist. It is still supported in order to avoid breaking long-existing clusters that rely upon it.
  - Each HACMP event can have zero or more pre-event scripts associated with it. Each of these pre-event scripts is run after the optional notify command (if it has been configured). When all of the pre-event scripts have been executed, the HACMP event script itself is executed.
  - Each HACMP event can have zero or more post-event scripts associated with it. Each of these is run after the HACMP event script itself completes and before the optional notify command.
  - A recovery command can be specified for each HACMP event. This recovery command is run if the HACMP event script fails. When the recovery command completes, the HACMP event script is run again. Associated with each recovery command is a count of the maximum number of times that the HACMP event script might fail in a single overall attempt to run the event before HACMP should declare the failure as "not fixable by the recovery command".

What does "Error" mean in the visual?
The cluster manager expects the event scripts to complete successfully. If not, it retries the command. If the number of retries expires, the cluster manager waits. It won't go on until told to go on. A timer is set prior to the start of event processing. If the timer pops, event processing didn't finish successfully and the error message is placed in the /tmp/hacmp.out log file.

But how do you know that "Error" has occurred? We'll talk more about that in later units and much more in the HACMP System Administration II class. The simplest way you'll know that "Error" has occurred is an error message in the /tmp/hacmp.out log file that indicates that the cluster has been in "...reconfig too long...". The message is given that way because it can vary depending on the error. When you see that, you must begin troubleshooting, resulting in fixing the problem and instructing the cluster manager to continue (Recover from Script Failure option in the Problem Determination SMIT panel).

Location of event processing scripts
The HACMP event scripts are stored in /usr/es/sbin/cluster/events. Note there are ".rp" scripts (recovery programs) that call the event scripts. The event scripts then might call other event scripts. But this is a little ahead of the course. You'll see more in the following units.

Instructor notes:
Purpose: Review HACMP's event customization processing capabilities.
Transition statement: Let's see how we add a customized event to HACMP starting with pre- and post-events.

Adding/changing cluster events (1 of 3)

                      Extended Event Configuration

  Move cursor to desired item and press Enter.

    Configure Pre/Post-Event Commands
    Change/Show Pre-Defined HACMP Events
    Configure User-Defined Events
    Configure Pager Notification Methods
    Change/Show Time Until Warning

Figure 8-15. Adding/changing cluster events (1 of 3)
Notes:

Path to smit menu
smitty hacmp -> Extended Configuration -> Extended Event Configuration

Pre- and post-event scripts
To customize the event processing for pre- and post-event scripts, you must first create a custom event object that points to your script. We start here with the SMIT menu that manages custom cluster events.

Instructor notes:
Purpose: Examine the custom cluster event menu.
Transition statement: Next, we have to go to the menu to create our object. Let's see how to do that.

Adding/changing cluster events (2 of 3)

                      Add a Custom Cluster Event

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                        [Entry Fields]
  * Cluster Event Name                  [stop_printq]
  * Cluster Event Description           [stop the print queues]
  * Cluster Event Script Filename       [/usr/local/cluster/events/stop_printq]

Figure 8-16. Adding/changing cluster events (2 of 3)

Notes:

Path to smit menu
smitty hacmp -> Extended Configuration -> Extended Event Configuration -> Configure Pre/Post-Event Commands -> Add a Custom Cluster Event

Example of creating pre- and post-custom cluster event object
In this example, we add a new custom cluster event called stop_printq. This event runs a script of our own creation, which in this case resides in /usr/local/cluster/events (a directory created for this purpose by the HACMP administrator). Custom events are given a name, rather than referenced directly by the script path. This makes it easy to reuse the same custom event script for multiple HACMP events. The custom event has a description that allows us to identify what the script does when, six months down the line, we have forgotten why we wrote the script or the system administrator for the cluster has changed.

Script considerations
HACMP does not develop the script content for you, neither does it synchronize the script content between cluster nodes (indeed the content can be different on each node). The only requirements that HACMP imposes are that the script must exist on each node in a local (non-shared) location, be executable, and have the same path and name on every node. Of course, an additional requirement is that the script perform as required under all circumstances! In HACMP 5.2 and later there is a file collections feature if you wish to have your changes kept in sync.

Instructor notes:
Purpose: Show how a custom cluster event is defined.
Transition statement: Now, let's associate this custom cluster event object with an HACMP event.

Adding/changing cluster events (3 of 3)

                      Change/Show Cluster Events

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                        [Entry Fields]
    Event Name                          node_down
    Description                         Script run after the >
  * Event Command                       [/usr/es/sbin/cluster/>
    Notify Command                      []
    Pre-event Command                   []                        +
    Post-event Command                  [stop_printq]             +
    Recovery Command                    []
  * Recovery Counter                    [0]                       #

Figure 8-17. Adding/changing cluster events (3 of 3)

Notes:

The path to the menu
smitty hacmp -> Extended Configuration -> Extended Event Configuration -> Change/Show Pre-Defined HACMP Events -> node_down

Associating a custom cluster event with the node_down event
Notice that in the menu path you choose "Pre-Defined" to see the list of standard HACMP events. On this visual, we see our new custom event object, stop_printq, being added as a post event to the HACMP event script node_down. Because we are simply referencing the script by its name, we can run more than one pre- and post-event script by stringing their names together in the pre- or post-event script field. Note that for the commands (other than pre and post) on this menu you need not create a custom object first; you would come directly to this menu.

Instructor notes:
Purpose: Show how custom events can be added as a pre- or post-event to an HACMP event script.
Transition statement: What about if an HACMP event script fails?

Recovery commands

If an event script fails to exit 0, Recovery Commands can be executed.

[Figure: flowchart. HACMP Event, then RC=0? If No: Counter > 0? If Yes: run the Recovery Command and rerun the HACMP Event. If No: the cluster manager waits, logging an error in /<log_dir>/hacmp.out.]

Figure 8-18. Recovery commands

Notes:

Recovery command event customization
Recovery commands are another customization that can be made to recover from the failure of an HACMP event script.

Instructor notes:
Purpose: Introduce recovery commands.
Details: Recovery commands run when an event script exits non-zero. The recovery command will run a maximum of n times, where n is the value of the recovery counter. If an event exits with a return code of 0, then the recovery command is not run (or not run again).
Transition statement: So, how do we add a recovery command?

Adding/changing recovery commands

                      Change/Show Cluster Events

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                        [Entry Fields]
    Event Name                          start_server
    Description                         Script run to start a>
  * Event Command                       [/usr/es/sbin/cluster/>
    Notify Command                      []
    Pre-event Command                   []                        +
    Post-event Command                  []                        +
    Recovery Command                    [/usr/local/bin/recover]
  * Recovery Counter                    [3]                       #

Figure 8-19. Adding/changing recovery commands

Notes:

Recovery command menu
Here we see an example of a recovery command being added to the start_server event script. This can handle an incorrect application start up. Recovery commands do not execute unless the recovery counter is > 0.
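The retry behavior described for recovery commands can be sketched as a small shell simulation. This is only an illustration of the flowchart logic, not how HACMP implements it: the function names run_event, recover, and run_with_recovery are hypothetical, and the stand-in event simply fails twice before succeeding.

```shell
#!/bin/sh
# Illustrative simulation only. The cluster manager implements this logic
# internally; every function name below is hypothetical.

ATTEMPTS=0

# Stand-in for an HACMP event script: fails on the first two attempts.
run_event() {
    ATTEMPTS=$((ATTEMPTS + 1))
    [ "$ATTEMPTS" -ge 3 ]
}

# Stand-in for the configured recovery command.
recover() {
    echo "recovery command run (after attempt $ATTEMPTS)"
}

# Run the event. On failure, run the recovery command and retry,
# at most $1 times ($1 plays the role of the recovery counter).
run_with_recovery() {
    counter=$1
    while ! run_event; do
        if [ "$counter" -le 0 ]; then
            echo "event error: cluster manager waits for intervention"
            return 1
        fi
        recover
        counter=$((counter - 1))
    done
    echo "event completed"
    return 0
}

run_with_recovery 3
```

With a counter of 0 the recovery command is never run and the first failure is reported as an event error, which matches the note that recovery commands do not execute unless the recovery counter is greater than 0.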
Instructor notes:
Purpose: Show an example of a recovery command being added to the cluster.
Details: Point out that the recovery counter must be greater than 0 for the recovery command to run if the associated HACMP event script exits non-zero.
Transition statement: Let's take a look at a few of the issues to consider when implementing pre-, post-, and recovery scripts and commands.

Points to note

  - The execute bit must be set on all pre-, post-, notify, and recovery scripts.
  - Synchronization does not copy pre- and post-event script content from one node to another.
  - You need to copy all your pre- and post-event scripts to all nodes.
  - Your pre- and post-event scripts must handle non-zero exit codes.
  - All scripts must declare the shell they will run in, such as: #!/bin/ksh
  - Test your changes very carefully because a mistake is likely to cause a fallover to abort.

Figure 8-20. Points to note

Notes:

Test your changes
Without a doubt, the most important point to note is the last one: test your changes very carefully. An error in a pre-, post-, or recovery script/command generally becomes apparent during a fallover, in other words, at a point in time when you can least afford it to happen!

Use the CSPOC file collection facility
In HACMP 5.2 and later, you can implement the file collections feature to synchronize your scripts across the cluster. This facility is covered in more depth in the HACMP course HACMP System Administration II: Administration and Problem Determination (AU61).
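The points listed above can be sketched as a skeleton for a post-event script such as stop_printq. This is a minimal sketch under stated assumptions: the stop_queue helper and the queue names lp0 and lp1 are hypothetical placeholders, and the real command that stops a print queue on your system must replace the echo. It shows only the shell declaration and the explicit handling of non-zero exit codes.

```shell
#!/bin/sh
# Hypothetical skeleton for a post-event script such as stop_printq.
# On AIX you might declare #!/bin/ksh instead, as the slide suggests.

# Placeholder: replace the echo with the real command that stops one queue.
stop_queue() {
    queue=$1
    [ -n "$queue" ] || return 2
    echo "stopping print queue $queue"
}

# Stop every queue named in the arguments. A failure is remembered,
# but the loop continues so one bad queue does not block the others.
stop_all_queues() {
    rc=0
    for q in "$@"; do
        if ! stop_queue "$q"; then
            echo "warning: could not stop queue '$q'" >&2
            rc=1
        fi
    done
    return $rc
}

# Example invocation with made-up queue names.
stop_all_queues lp0 lp1
```

Whether such a script should pass a non-zero status back to HACMP or only log the problem is a design decision, and one of the things to verify when you test your changes.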
Instructor notes:
Purpose: List some of the issues to consider when implementing pre-, post-, and recovery commands.
Transition statement: What if you have to edit the HACMP event scripts?

RG_move event and selective fallover

  - Selective Fallover allows fallover for a resource group
  - Cluster Manager uses rg_move event for selective fallover
  - CSPOC can also be used to cause an rg_move event
  - Selective fallover can happen for the following failures:
      - NIC failures
      - Applications
      - Communication Links
      - Volume groups
  - Selective Fallover can be customized by resource group

Figure 8-21. RG_Move event and selective fallover

Notes:

Selective fallover logic
In general, the following scenarios and utilities can lead HACMP to selectively move an affected resource group, using the Selective Fallover logic:
  - In cases of service IP label failures, Topology Services, which monitors the health of the service IP labels, starts a network_down event. This causes the selective fallover of the affected resource group.
  - In cases of application failures, the application monitor informs the Cluster Manager about the failure of the application, which causes the selective fallover of the affected resource group.
  - In cases of WAN Connections failures, the Cluster Manager monitors the status of the SNA links and captures some of the types of SNA link failures. If an SNA link failure is detected, the selective fallover utility moves the affected resource group.
  - In cases of volume group failures, the occurrence of the AIX error label LVM_SA_QUORCLOSE indicates that a volume group went off-line on a node in the cluster. This causes the selective fallover of the affected resource group.

You can recognize that HACMP uses Selective Fallover when you identify that an rg_move event is run in the cluster. Remember that in each case when HACMP uses Selective Fallover, an rg_move event is launched as a response to a resource failure.

Instructor notes:
Purpose: Discuss selective fallover and the rg_move event.
Transition statement: We looked at customizing the event flow, but how can we customize HACMP to make an event when a device fails?

Customizing event flow for other devices

HACMP provides smit screens for managing the AIX error logging facility's error notification mechanism.

[Figure: devices that can be covered: disk adapters, disks, CPU, disk subsystems, other shared devices.]

Figure 8-22. Customizing event flow for other devices

Notes:

Dealing with other failures detected by AIX
Remember that HACMP natively only monitors nodes, networks, and network adapters by default. If you wish to monitor other devices, you can use error notification methods. Error notification is a facility of AIX that allows the administrator to map an entry in the AIX error log to a command to execute. HACMP provides a smit menu to simplify the process.
Instructor notes:
Purpose: Introduce error notification.
Details: With error notification, you can monitor any error that can be logged in the AIX error log.
Transition statement: Let's see how we use the HACMP menu to create an error notification method.

Error notification within smit

                      HACMP Error Notification

  Move cursor to desired item and press Enter.

    Configure Automatic Error Notification
    Add a Notify Method
    Change/Show a Notify Method
    Remove a Notify Method
    Emulate Error Log Entry

Figure 8-23. Error notification within smit

Notes:

Menu path
smitty hacmp -> Problem Determination Tools -> HACMP Error Notification

What HACMP provides
This is the smit menu that HACMP provides for managing error notification methods. We will look at these options in this and the subsequent visuals.
  - HACMP provides error notification methods that you can add by selecting the option Configure Automatic Error Notification above. However, in HACMP 5.3 and later, these Automatic Error Notification methods are automatically added during verification and synchronization.
  - HACMP provides "Add a Notify Method" to handle any AIX error label that might not be detected by HACMP.
  - Finally, HACMP provides a tool to Emulate an Error Log Entry.

Instructor notes:
Purpose: Outline the SMIT menu for error notification.
Transition statement: What if you don't want the automatic error notification? Let's look at the automatic error notification method menu.

Configuring automatic error notification

                      Configure Automatic Error Notification

  Move cursor to desired item and press Enter.

    List Error Notify Methods for Cluster Resources
    Add Error Notify Methods for Cluster Resources
    Remove Error Notify Methods for Cluster Resources

Figure 8-24. Configuring automatic error notification

Notes:

Removing automatic error notify methods
In HACMP 5.3 and later, the Automatic Error Notify Methods are automatically added; therefore, you can come here to remove them, but it is not recommended. If you do, then after synchronization you would have to come back here to remove them again.

For HACMP nodes with only virtualized I/O resources
The output you will receive when running Automatic Error Notification is as follows:

  rt1s1vlp5: Disk type vscsi is unknown to HACMP.
  rt1s1vlp5: Disk type vscsi is unknown to HACMP.
  rt1s1vlp6: Disk type vscsi is unknown to HACMP.
  rt1s1vlp6: Disk type vscsi is unknown to HACMP.

This output highlights the fact that the virtual SCSI adapters are not recognized.

Instructor notes:
Purpose: Show how to enable automatic error notification methods.
Transition statement: Now that we've turned it on, let's see how to determine which events it applies to.
Listing automatic error notification (non-virtual HACMP nodes)

                      COMMAND STATUS

  Command: OK        stdout: yes        stderr: no

  Before command completion, additional instructions may appear below.

  [TOP]
  bondar:
  bondar: HACMP Resource      Error Notify Method
  bondar:
  bondar: hdisk0              /usr/es/sbin/cluster/diag/cl_failover
  bondar: scsi0               /usr/es/sbin/cluster/diag/cl_failover
  bondar: hdisk11             /usr/es/sbin/cluster/diag/cl_logerror
  bondar: hdisk5              /usr/es/sbin/cluster/diag/cl_logerror
  bondar: hdisk9              /usr/es/sbin/cluster/diag/cl_logerror
  bondar: hdisk7              /usr/es/sbin/cluster/diag/cl_logerror
  bondar: ssa0                /usr/es/sbin/cluster/diag/cl_logerror
  hudson:
  hudson: HACMP Resource      Error Notify Method
  [MORE...9]

Figure 8-25. Listing automatic error notification (non-virtual HACMP nodes)

Notes:

Listing the automatic event notification methods
Here's the full output from this screen for a sample cluster:

  bondar:
  bondar: HACMP Resource      Error Notify Method
  bondar:
  bondar: hdisk0              /usr/es/sbin/cluster/diag/cl_failover
  bondar: scsi0               /usr/es/sbin/cluster/diag/cl_failover
  bondar: hdisk11             /usr/es/sbin/cluster/diag/cl_logerror
  bondar: hdisk5              /usr/es/sbin/cluster/diag/cl_logerror
  bondar: hdisk9              /usr/es/sbin/cluster/diag/cl_logerror
  bondar: hdisk7              /usr/es/sbin/cluster/diag/cl_logerror
  bondar: ssa0                /usr/es/sbin/cluster/diag/cl_logerror
  hudson:
  hudson: HACMP Resource      Error Notify Method
  hudson:
  hudson: hdisk0              /usr/es/sbin/cluster/diag/cl_failover
  hudson: scsi0               /usr/es/sbin/cluster/diag/cl_failover
  hudson: hdisk10             /usr/es/sbin/cluster/diag/cl_logerror
  hudson: hdisk4              /usr/es/sbin/cluster/diag/cl_logerror
  hudson: hdisk8              /usr/es/sbin/cluster/diag/cl_logerror
  hudson: hdisk6              /usr/es/sbin/cluster/diag/cl_logerror
  hudson: ssa0                /usr/es/sbin/cluster/diag/cl_logerror

Instructor notes:
Purpose: Show how to find out what automatic error notification does on HACMP nodes where physical I/O resources exist.
Details: Point out that adapter and disk resources are protected.
Transition statement: Now, let's look at the screen for adding your own custom notification methods.

Listing automatic error notification (virtual HACMP nodes)

                      COMMAND STATUS

  Command: OK        stdout: yes        stderr: no

  Before command completion, additional instructions may appear below.

  rt1s1vlp5:
  rt1s1vlp5: HACMP Resource   Error Notify Method
  rt1s1vlp5:
  rt1s1vlp5: hdisk0           /usr/es/sbin/cluster/diag/cl_failover
  rt1s1vlp5: hdisk1           /usr/es/sbin/cluster/diag/cl_logerror
  rt1s1vlp6:
  rt1s1vlp6: HACMP Resource   Error Notify Method
  rt1s1vlp6:
  rt1s1vlp6: hdisk0           /usr/es/sbin/cluster/diag/cl_failover
  rt1s1vlp6: hdisk1           /usr/es/sbin/cluster/diag/cl_logerror

Figure 8-26. Listing automatic error notification (virtual HACMP nodes)
Notes:

We already saw that there were errors when running the automatic error notification setup on HACMP nodes that have only virtual I/O resources. Here we see that it will cover the disks, but the adapters are not protected. Should you cover them? Probably not, because they're virtual.

Instructor notes:
Purpose: Show the results of Automatic Error Notification on HACMP nodes with only virtual resources.
Transition statement: Now, let's look at the screen for adding your own custom notification methods.

Adding error notification methods

                      Add a Notify Method

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  * Notification Object Name              []
  * Persist across system restart?        No                      +
    Process ID for use by Notify Method   []                      +#
    Select Error Class                    None                    +
    Select Error Type                     None                    +
    Match Alertable errors?               None                    +
    Select Error Label                    []                      +
    Resource Name                         [All]                   +
    Resource Class                        [All]                   +
    Resource Type                         [All]                   +
  * Notify Method                         []

Figure 8-27. Adding error notification methods
Adding error notification methods

Notes:

Menu path

smitty hacmp -> Problem Determination Tools -> HACMP Error Notification -> Add a Notify Method

The error notify stanza

This is an example of a stanza from /etc/objrepos/errnotify. Notice that the screen above is designed to create a stanza like this. The last line is the command to execute.

    errnotify:
        en_pid = 0
        en_name = ""
        en_persistenceflg = 1
        en_label = ""
        en_crcid = 849857919
        en_class = ""
        en_type = ""
        en_alertflg = ""
        en_resource = ""
        en_rtype = ""
        en_rclass = ""
        en_symptom = ""
        en_err64 = ""
        en_dup = ""
        en_method = "/usr/lib/ras/notifymeth -l $1 -t CHECKSTOP"

Parameters passed to the error notify method

One or more error notification methods can be added for every error that can be in the AIX error log. The $ parameters that can be used with the en_method are:

    $1  Sequence Number
    $2  Error ID
    $3  Error Class
    $4  Error Type
    $5  Alert Flag
    $6  Resource Name
    $7  Resource Type
    $8  Resource Class
    $9  Error Label

Instructor notes:

Purpose: Show the smit screen for adding an error notification method.
Details:
Additional information:
Transition statement: Let’s see how we might do some initial testing of an error notification method.

Emulating errors (1 of 2)

    HACMP Error Notification

                          Error Label to Emulate

    Move cursor to desired item and press Enter.

    [TOP]
      LVM_SA_QUORCLOSE     rootvg
      LVM_SA_QUORCLOSE     xwebvg
      FIRMWARE_EVENT       diagela_FIRM
      PLAT_DUMP_ERR        diagela_PDE
      SERVICE_EVENT        diagela_SE
      INTRPPC_ERR          diagela_SPUR
      FCP_ARRAY_ERR6       fcparray_err
      FCS_ERR10            fcs_err10
      DISK_ARRAY_ERR2      ha_hdisk0_0
      DISK_ARRAY_ERR3      ha_hdisk0_1
      DISK_ARRAY_ERR5      ha_hdisk0_2
    [MORE...12]

    F1=Help      F2=Refresh     F3=Cancel
    F8=Image     F10=Exit       Enter=Do
    /=Find       n=Find Next

• Note that LVM_SA_QUORCLOSE entries exist only for mirrored volume groups

Figure 8-28. Emulating errors (1 of 2)

Notes:

Menu path

smitty hacmp -> Problem Determination Tools -> HACMP Error Notification -> Emulate Error Log Entry

Emulating an error log entry

HACMP provides a menu that lets you emulate an error log entry. This screen shows part of the list of error labels that is provided when Emulate Error Log Entry is selected in the HACMP Error Notification menu (this menu appears a few foils back).

We are going to generate an emulated loss of quorum on the xwebvg volume group. This will generate an example of the error LVM_SA_QUORCLOSE in the AIX error log and run the script associated with the error notification method quorum_lost.

This mechanism for emulating errors allows you to do basic testing of an error notification method. If it is at all possible to do so without actually damaging the equipment, it would be best to cause the actual hardware error that is of concern, to verify that the error notification method has been associated with the correct AIX error label.

Note that the emulated error does not have the same resource name as an actual record, but otherwise passes the same arguments to the method as the actual one.
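The nine positional parameters listed for en_method can be given names inside a notify method, which makes the method easier to check with the emulation menu. The following is a minimal sketch in Python, not part of HACMP or AIX. It assumes a hypothetical notify method whose en_method entry passes $1 through $9 in order; the names ErrNotifyArgs, parse_notify_args and is_emulated are invented for illustration. The check for the resource name EMULATE relies on the statement in this unit that an emulated record carries that value in its Resource Name field.

```python
import sys
from typing import List, NamedTuple


class ErrNotifyArgs(NamedTuple):
    """The nine values substituted for $1..$9 in en_method (hypothetical wrapper)."""
    sequence_number: str  # $1
    error_id: str         # $2
    error_class: str      # $3
    error_type: str       # $4
    alert_flag: str       # $5
    resource_name: str    # $6
    resource_type: str    # $7
    resource_class: str   # $8
    error_label: str      # $9


def parse_notify_args(argv: List[str]) -> ErrNotifyArgs:
    """Map the positional arguments onto named fields, rejecting short lists."""
    if len(argv) != 9:
        raise ValueError(f"expected 9 arguments, got {len(argv)}")
    return ErrNotifyArgs(*argv)


def is_emulated(args: ErrNotifyArgs) -> bool:
    """Assumption: emulated records carry the resource name EMULATE."""
    return args.resource_name == "EMULATE"


# Only runs when the script is started with exactly nine arguments.
if __name__ == "__main__" and len(sys.argv) == 10:
    parsed = parse_notify_args(sys.argv[1:])
    origin = "emulated" if is_emulated(parsed) else "real"
    print(f"{origin} {parsed.error_label} (sequence {parsed.sequence_number})")
```

A method written this way can branch on is_emulated while it is being tested, so that logic depending on the real resource name is skipped for emulated records.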
Instructor notes:

Purpose: Explain how you can test an error notification method.
Details: Error notification methods can be tested, thus avoiding the need to simulate a component failure by removing a disk or an adapter, disconnecting a cable, or similar.
Additional information:
Transition statement: After choosing the AIX Error label, there is one more screen to navigate. Let’s take a look at that screen.

Emulating errors (2 of 2)

                              Emulate Error Log Entry

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                        [Entry Fields]
      Error Label Name                  LVM_SA_QUORCLOSE
      Notification Object Name          xwebvg
      Notify Method                     /usr/es/sbin/cluster/>

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Figure 8-29. Emulating errors (2 of 2)

Notes:

Kicking off the emulation

Use this screen to start the emulation process.

Instructor notes:

Purpose: Show the Emulate Error Log Entry smit screen immediately prior to pressing Enter.
Details:
Additional information:
Transition statement: So, how can we see what this does for us?

What will this cause?
    # errpt -a
    ---------------------------------------------------------------------------
    LABEL:            LVM_SA_QUORCLOSE
    IDENTIFIER:       CAD234BE

    Date/Time:        Fri Sep 19 13:58:05 MDT
    Sequence Number:  469
    Machine Id:       000841564C00
    Node Id:          bondar
    Class:            H
    Type:             UNKN
    Resource Name:    LVDD
    Resource Class:   NONE
    Resource Type:    NONE
    Location:

    Description
    QUORUM LOST, VOLUME GROUP CLOSING

    Probable Causes
    PHYSICAL VOLUME UNAVAILABLE

    Detail Data
    MAJOR/MINOR DEVICE NUMBER
    00C9 0000
    QUORUM COUNT
    0
    ACTIVE COUNT
    0
    SENSE DATA
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    ---------------------------------------------------------------------------

... and a fallover of the xwebgroup resource group to uk.

Figure 8-30. What will this cause?

Notes:

Example emulated error record

Here is an example of the output produced by running such an emulated event. The top of the screen is the truncated output of the error template associated with the LVM_SA_QUORCLOSE error, which gives a brief indication of the nature of the error.

The output of an emulation will have the value Resource Name: EMULATE. If you are depending on this field, you have a problem testing. You might have to change your command to execute while testing via emulation.

Instructor notes:

Purpose: Show the output of an example error notification method.
Details: This is the output of an actual AIX error. Point out that the emulated record does not agree exactly with this in the Resource name field.
Additional information:
Transition statement: Well, now it’s time for a checkup.

Checkpoint

1. Which of the following runs if an HACMP event script fails? (select all that apply)
   a. Pre-event scripts
   b. Post-event scripts
   c. Error notification methods
   d. Recovery commands
   e. Notify methods
2. How does an event script get started?
   a. Manually by an administrator
   b. Called by the SNMP SMUX (clsmuxpd)
   c. Called by the cluster manager using a recovery program
   d. Called by the topology services daemon
3. True or False? Pre-event scripts are automatically synchronized.
4. True or False? Writing error notification methods is a normal part of configuring a cluster.

Figure 8-31. Checkpoint

Instructor notes:

Purpose: Let’s take a checkpoint.
Details:

Checkpoint solutions

1. Which of the following runs if an HACMP event script fails? (select all that apply)
   a. Pre-event scripts
   b. Post-event scripts
   c. Error notification methods
   d. Recovery commands
   e. Notify methods
2. How does an event script get started?
   a. Manually by an administrator
   b. Called by the SNMP SMUX (clsmuxpd)
   c. Called by the cluster manager using a recovery program
   d. Called by the topology services daemon
3. True or False? Pre-event scripts are automatically synchronized.
4. True or False? Writing error notification methods is a normal part of configuring a cluster.

Additional information:
Transition statement: Okay, let’s summarize what we have seen in this unit.
Unit summary

Having completed this unit, you should be able to:
• Describe what an HACMP event is
• Describe the sequence of events when:
  – The first node starts in a cluster
  – A new node joins an existing cluster
  – A node leaves a cluster voluntarily
• Explain what happens when HACMP processes an event
• Describe how to customize the event flow
• State how to monitor other devices

Figure 8-32. Unit summary

Instructor notes:

Purpose:
Details:
Additional information:
Transition statement:

Unit 9. Integrating NFS into HACMP

Estimated time

01:00

What this unit is about

This unit covers the concepts of using Sun’s Network File System (NFS) in a highly available cluster. You learn how to configure NFS in an HACMP environment for maximum availability.
What you should be able to do

After completing this unit, you should be able to:
• Explain the concepts of NFS
• Configure HACMP to support NFS
• Discuss why Volume Group major numbers must be unique when using NFS with HACMP
• Outline the NFS configuration parameters for HACMP

How you will check your progress

Accountability:
• Checkpoint
• Machine exercises

References

SC23-5209-01  HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10  HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10  HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10  HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04  HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09  HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html  HACMP manuals

Unit objectives

After completing this unit, you should be able to:
• Explain the concepts of NFS
• Configure HACMP to support NFS
• Discuss why Volume Group major numbers must be unique when using NFS with HACMP
• Outline the NFS configuration parameters for HACMP

Figure 9-1. Unit objectives

Notes:

Objectives

In this unit, we examine how NFS can be integrated into HACMP to provide a Highly Available Network File System.

Instructor notes:

Purpose: State the unit objectives.
Details:
Additional information:
Transition statement: So, what is NFS?
So, what is NFS?

The Network File System is a client/server application that lets a computer user view and optionally store and update files on a remote computer as though they were on the user's own computer.

[Diagram labels: NFS Client; NFS Server; NFS Client and Server; shared_vg; JFS mount; NFS mount (read-write); NFS mount (read-only)]

Figure 9-2. So, what is NFS?

Notes:

NFS

NFS is a suite of protocols that allow file sharing across an IP network. An NFS server is a provider of file service (that is, a file, a directory or a file system). An NFS client is a recipient of a remote file service. A system can be both an NFS client and server at the same time.

Instructor notes:

Purpose: Explain the concept of NFS.
Details: Point out that NFS is not part of HACMP; rather, it is a service that runs on top of TCP/IP and comes with AIX. Emphasize that HACMP can make an NFS server highly available.
Additional information: Explain briefly the concept of client and server in an NFS relationship.
Transition statement: Before we go any further, let’s look at how NFS works.

NFS background processes

• NFS uses TCP/IP and a number of background processes to allow clients to access disk resource on a remote server
• Configuration files are used on the client and server to specify export and mount options

[Diagram labels: NFS Client (n x biod, /etc/filesystems); NFS Server (n x nfsd and mountd, /etc/exports); NFS Client and Server (n x biod, n x nfsd and mountd)]

Figure 9-3. NFS background processes

Notes:

NFS processes

The NFS server uses a process called mountd to allow remote clients to mount a local disk or CD resource across the network. One or more nfsd processes handle I/O on the server side of the relationship. The server maintains details of data resources offered to clients in the /etc/exports file.

The NFS client uses the mount command to establish a mount to a remote storage resource which is offered for export by the NFS server. One or more block I/O daemons, biod, run on the client to handle I/O on the client side. Clients can automatically mount network file systems using the /etc/filesystems file.

Instructor notes:

Purpose: Explain the background processes involved in NFS.
Details: Point out that a server can also be a client in an NFS relationship.
Additional information:
Transition statement: Now, let’s see how we can combine NFS with HACMP to achieve a highly available NFS environment.

Combining NFS with HACMP

NFS exports can be made highly available by using the HACMP resource group to specify NFS exports and mounts.

The A resource group specifies:
• aservice as a service IP label resource
• /fsa as a filesystem resource (by default as part of a volume group)
• /fsa as an NFS filesystem to export

[Diagram labels: client system running "# mount aservice:/fsa /a" (client system sees /fsa as /a); cluster nodes usa and uk; resource group A with service IP label aservice, "# mount /fsa" and "export /fsa"]

Figure 9-4. Combining NFS with HACMP

Notes:

Combining NFS with HACMP

We can combine NFS with HACMP to achieve a Highly Available Network File System. One node in the cluster mounts the disk resource locally and offers that disk resource for export across the IP network. Clients optionally mount the disk resource. A second node is configured to take over the NFS export in the event of node failure.

There is one unusual aspect to the above configuration, which should be discussed. The HACMP cluster is exporting the /fsa file system via the aservice service IP label. The client is mounting the aservice:/fsa file system on the local mount point /a. This is somewhat unusual in the sense that client systems usually use a local mount point which is the same as the NFS file system’s name on the server.

In the configuration shown above, there is no particularly good reason why the client is using a different mount point than /fsa and, in fact, the client is free to use whatever mount point it wishes to use including, of course, /fsa. Why this example is using a local mount point of /a will become clear shortly.

Instructor notes:

Purpose: Examine the components of a Highly Available NFS environment.
Details: Point out that the NFS client is mounting aservice:/fsa on the local mount point /a but do not explain why other than to emphasize that the client is free to use whatever local mount point it wishes to use, including /a or /fsa or anything else.
Additional information: The use of /a as the client’s local mount point is intended to get the students used to the idea that the NFS client can use a local mount point which is different than the file system name exported by the NFS server. This is, of course, a precursor to the discussion of HACMP’s use of local mount points when configuring cross-mounts but that fact should not be mentioned yet.
Transition statement: So, what happens if the node offering the NFS export fails?

NFS fallover with HACMP

In this scenario, the resource group moves to the surviving node in the cluster, which exports /fsa. Clients see NFS server not responding during fallover.

The A resource group specifies:
• aservice as a service IP label resource
• /fsa as a filesystem resource (by default as part of a volume group)
• /fsa as a NFS filesystem to export

[Diagram labels: client system running "# mount aservice:/fsa /a" (client system "sees" /fsa as /a); cluster nodes usa and uk; resource group A with service IP label aservice, "# mount /fsa" and "export /fsa"]

Figure 9-5. NFS fallover with HACMP

Notes:

Fallover

If the node offering the NFS export should fail, a standby node takes over the shared disk resource, locally mounts the file system, and exports the file system or directory for remote mount.

Note that the aservice service IP label is in the resource group, which is exporting /fsa. The HACMP NFS server support requires that resource groups that export NFS filesystems be configured to use IPAT because the client system is not capable of dealing with two different IP addresses for its NFS server, depending on which node the NFS server service happens to be running on.

If the client was not accessing the disk resource during the period of the fallover, then it is not aware of the change in which node is serving the NFS export.
Instructor notes:

Purpose: Explain what happens when the node serving the NFS file system or directory fails.
Details: Explain that the standby node acquires the shared disk resource, mounts the file system locally and then exports the file system or directory to the clients. Point out that clients might or might not be affected by this change in ownership of the NFS export.
Additional information:
Transition statement: Let’s see how we configure this in HACMP.

Configuring NFS for high availability

          Change/Show All Resources and Attributes for a Resource Group

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

    [MORE...10]                                          [Entry Fields]
      Volume Groups                                      [aaavg]      +
      Use forced varyon of volume groups, if necessary   false        +
      Automatically Import Volume Groups                 false        +
      Filesystems (empty is ALL for VGs specified)       []           +
      Filesystems Consistency Check                      fsck         +
      Filesystems Recovery Method                        sequential   +
      Filesystems mounted before IP configured           true         +
      Filesystems/Directories to Export (NFSv2/3)        [/fsa]       +
      Filesystems/Directories to Export (NFSv4)          []           +
      Stable Storage Path (NFSv4)                        []           +
      Filesystems/Directories to NFS Mount               []
    [MORE...13]

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Figure 9-6. Configuring NFS for high availability

Notes:

Configuring NFS for high availability

The visual shows the resource group attributes that are important for configuring an NFS file system.

- Filesystems/Directories to Export
  Specifies the filesystems to be NFS exported.
- Filesystems mounted before IP configured
  When implementing NFS support in HACMP, you should also set this option. This prevents access from a client before the filesystems are ready.
- Filesystem (empty is ALL for VGs specified)
  This particular example also explicitly lists the /fsa filesystem as a resource to be included in the resource group (see the Filesystem (empty is ALL for VGs specified) field). This is not necessary because this field could have been left blank to indicate that all the filesystems in the aaavg volume group should be treated as resources within the resource group.

Only non-concurrent access resource groups

The resource group policy cannot be concurrent (On Line On All Available Nodes).

Instructor notes:

Purpose: Explain the SMIT configuration for NFS within HACMP.
Details: Explain to the students each of the parameters relevant to this example. There is no need to explain cross-mounting at this stage because that is the very next foil’s topic.
Additional information:
Transition statement: Within the cluster, we can achieve a kind of concurrent access journaled filesystem between two or more nodes in the cluster. Let’s see how.

Cross-mounting NFS filesystems (1 of 3)

A filesystem configured in a resource group can be made available to all the nodes in the resource group:
– One node has the resource group and acts as an NFS server
  • Mounts the filesystem (/fsa)
  • Exports the filesystem (/fsa)
– All nodes act as NFS clients
  • Mount the NFS filesystem (aservice:/fsa) onto a local mount point (/a)

[Diagram labels: aservice; /fsa; /a on each node; one node acts as an NFS server (exports /fsa); nodes act as an NFS client with "# mount aservice:/fsa /a"]

Figure 9-7. Cross-mounting NFS filesystems (1 of 3)

Notes:

Cross-mounting

We can use HACMP to mount an NFS exported filesystem locally on all the nodes within the cluster. This allows two or more nodes to have access to the same disk resource in parallel. An example of such a configuration might be a shared repository for the product manuals (read only) or a shared /home filesystem (read-write).

One node mounts the filesystem locally, then exports the filesystem. All nodes within the resource group then NFS mount the filesystem, including the node that holds the resource group. By having all nodes in the resource group act as an NFS client, it is not necessary for the takeover node to unmount the filesystem before becoming the NFS server.

Concurrent access limitations

Although the NFS file system can be mounted read-write by multiple nodes, all of the NFS caching issues that exist with a regular NFS configuration (one not involving HACMP in any way) still exist. Parallel or concurrent writes are not supported. For example, applications running on the two cluster nodes should not attempt to update the same NFS served file because only one of them is likely to succeed with the other getting either stale NFS file handle problems or mysterious loss of changes made to the file. This is a fundamental issue with NFS.

True concurrent access

Clusters wanting to have true concurrent access to the same filesystem for reading and writing purposes should use the IBM GPFS (General Parallel File System) product instead of NFS to share the filesystem across the cluster nodes.

Instructor notes:

Purpose: Explain the concept of NFS cross mounts.
Details: The NFS caching issues described in the student notes tend to make the use of NFS cross-mounts a less than satisfactory solution to the problem of making a filesystem simultaneously available across a cluster. NFS cross-mounts should only be used as a convenience feature for administrative purposes and such. It should probably not form part of the actual “plan of how to make the application highly available.”
Additional information:
Transition statement: Let’s take a look at what happens after a fallover.

Cross-mounting NFS filesystems (2 of 3)

• When a fallover occurs, the role of NFS server moves with the resource group
• All (surviving) nodes continue to be NFS clients

[Diagram labels: aservice; /fsa; /a; surviving node acts as an NFS server (exports /fsa) and acts as an NFS client (retries until JFS returns on other node) with "# mount aservice:/fsa /a"]

Figure 9-8. Cross-mounting NFS filesystems (2 of 3)

Notes:

Fallover with a cross-mounted file system

If the left-hand node fails, then HACMP on the right-hand node initiates a fallover of the resource group. This primarily consists of:

- Assigning or aliasing (depending on which flavor of IPAT is being used) the aservice service IP label to a NIC
- Varying on the shared volume group and mounting the /fsa journaled filesystem
- NFS exporting the /fsa filesystem

Note that the right-hand node already has the aservice:/fsa filesystem NFS mounted on /a.

Instructor notes:

Purpose: Explain what happens during a fallover of a resource group that is providing highly available NFS services.
Details:
Additional information:
Transition statement: Let’s take a more concrete look at cross-mounting in action.

Cross-mounting NFS filesystems (3 of 3)

Here is a more detailed look at what is happening.

The A resource group specifies:
• aservice as a service IP label resource
• /fsa as a filesystem resource
• /fsa as a NFS filesystem to export
• /fsa as a NFS filesystem to mount on /a

[Diagram labels: client system running "# mount aservice:/fsa /a" (client system "sees" /fsa as /a); cluster nodes usa and uk, each running "# mount aservice:/fsa /a"; resource group A with service IP label aservice, "# mount /fsa" and "export /fsa"]

Figure 9-9. Cross-mounting NFS filesystems (3 of 3)

Notes:

Cross-mounting details

The key change, compared to the configuration that did not use cross-mounting, is that this configuration’s resource group lists /fsa as an NFS filesystem and specifies that it is to be mounted on /a. This causes every node in the resource group to act as an NFS client with aservice:/fsa mounted at /a. Only the node that actually has the resource group is acting as an NFS server for the /fsa filesystem.
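The division of roles in a cross-mounted resource group can be written down as a small toy model. The sketch below is plain Python and does not call or imitate any HACMP interface; the function name cross_mount_roles is invented for illustration, and the node names usa and uk are taken from the diagrams. It only restates the rule from the notes: every node in the resource group is an NFS client, and the node holding the resource group is additionally the NFS server.

```python
from typing import Dict, List


def cross_mount_roles(nodes: List[str], holder: str) -> Dict[str, List[str]]:
    """Return the NFS roles of each active node in a cross-mounted resource group.

    nodes  -- the nodes of the resource group that are currently up
    holder -- the node that currently holds the resource group
    """
    if holder not in nodes:
        raise ValueError(f"{holder} is not an active node of the resource group")
    roles: Dict[str, List[str]] = {}
    for node in nodes:
        # Every node mounts aservice:/fsa on /a, so every node is a client.
        node_roles = ["NFS client"]
        # Only the holder mounts /fsa locally and exports it.
        if node == holder:
            node_roles.insert(0, "NFS server")
        roles[node] = node_roles
    return roles


if __name__ == "__main__":
    print(cross_mount_roles(["usa", "uk"], holder="usa"))  # normal operation
    print(cross_mount_roles(["uk"], holder="uk"))          # after usa has failed
```

Calling the function again with the surviving nodes and the new holder models a fallover: the server role moves, and the client role of the surviving node is unchanged.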
Instructor notes:

Purpose: Explain how NFS cross-mounts work in more detail.
Details: Point out the change to the resource group’s configuration which causes HACMP to use cross-mounts (refer to student notes).
Additional information:
Transition statement: Clusters with multiple IP networks might need to be configured to force NFS cross-mounting traffic to flow over a particular IP network.

Choosing the network for cross-mounts

• In a cluster with multiple IP networks, it might be useful to specify which network should be used by HACMP for cross-mounts
• This is usually done as a performance enhancement

The A resource group specifies:
• aservice as a service IP label resource
• /fsa as a filesystem resource
• /fsa as a NFS filesystem to export
• /fsa as a NFS filesystem to mount on /a
• net_ether_01 is the network for NFS mounts

[Diagram labels: networks net_ether_01 and net_ether_02; service IP labels aGservice and aservice; cluster nodes usa and uk, each running "# mount aservice:/fsa /a"; resource group A with "# mount /fsa" and "export /fsa"]

Figure 9-10. Choosing the network for cross-mounts

Notes:

Network for NFS mount

HACMP allows you to specify which network should be used for NFS exports from this resource group. In this scenario, we have an NFS cross-mount within a cluster that has two IP networks. For some reason, probably that the net_ether_01 network is either a faster networking technology or under a lighter load, the cluster administrator has decided to force the cross-mount traffic to flow over the net_ether_01 network.

This field is relevant only if you have filled in the Filesystems/Directories to NFS Mount field. The Service IP Labels/IP Addresses field should contain a service label which is on the network you select.

If the network you have specified is unavailable when the node is attempting to NFS mount, it will seek other defined, available IP networks in the cluster on which to establish the NFS mount.

Instructor notes:

Purpose: Explain the concept of choosing a network for your NFS traffic.
Details: When a cluster has two or more IP-based networks defined to HACMP and is using cross-mounting, you can select a preferred network for NFS traffic to flow across. You might do this for performance reasons.
Additional information:
Transition statement: Let’s see how we would configure cross-mounting using the appropriate HACMP SMIT screen.

Configuring HACMP for cross-mounting

          Change/Show All Resources and Attributes for a Resource Group

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

    [MORE...11]                                          [Entry Fields]
      Volume Groups                                      [aaavg]          +
      Use forced varyon of volume groups, if necessary   false            +
      Automatically Import Volume Groups                 false            +
      Filesystems (empty is ALL for VGs specified)       []               +
      Filesystems Consistency Check                      fsck             +
      Filesystems Recovery Method                        sequential       +
      Filesystems mounted before IP configured           true             +
      Filesystems/Directories to Export (NFSv2/3)        [/fsa]           +
      Filesystems/Directories to Export (NFSv4)          []               +
      Stable Storage Path (NFSv4)                        []               +
      Filesystems/Directories to NFS Mount               [/a;/fsa]        +
      Network For NFS Mount                              [net_ether_01]   +
    [MORE...12]

    F1=Help      F2=Refresh    F3=Cancel    F4=List
    F5=Reset     F6=Command    F7=Edit      F8=Image
    F9=Shell     F10=Exit      Enter=Do

Figure 9-11. Configuring HACMP for cross-mounting

Notes:

Configuring HACMP for cross-mounting

The directory or directories to be cross-mounted are specified in the Filesystems/Directories to NFS Mount field. The network to be used for NFS cross-mounts is optionally specified in the Network for NFS Mount field. Note that the resource group must include a service IP label, which is on the net_ether_01 network (aservice in the previous foil).

Cross-mount syntax

Note the rather strange /a;/fsa syntax for specifying the directory to be cross-mounted. This rather unusual syntax is explained in the next foil.

Instructor notes:

Purpose: Show how cross-mounting is configured into an HACMP cluster.
Details: Point out the Filesystems/Directories to NFS Mount field and the Network for NFS Mount field.
Additional information:
Transition statement: Let’s take a closer look at the syntax for specifying cross-mounts.

Syntax for specifying cross-mounts

    /a;/fsa
      /a    Where the filesystem should be mounted over
      /fsa  What the filesystem is exported as

    # mount aservice:/fsa /a
      What HACMP does (on each node in the resource group)

Figure 9-12.

The local mount point to be used by all the nodes in the resource group when they act as NFS clients is specified before the semi-colon. Because the configuration specified in the last HACMP smit screen uses net_ether_01 for cross-mounts and the service IP label on the net_ether_01 network is aservice (see the diagram a couple of foils back showing the two IP networks), each node in the resource group will mount aservice:/fsa on their local /a mount point directory.
Syntax for specifying cross-mounts AU548. The NFS filesystem which they are to NFS mount is specified after the semi-colon. 9-26 HACMP Implementation © Copyright IBM Corp. 1998.0 Notes: Syntax for specifying cross-mounts The inclusion of a semi-colon in the Filesystems/Directories to NFS Mount field indicates that the newer (and easier to work with) approach to NFS cross-mounting described in this unit is in effect. . 2008 Unit 9.2. /a. . © Copyright IBM Corp. 1998. Transition statement — When using NFS with HACMP.V4. Integrating NFS into HACMP 9-27 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Details — Additional information — As of HACMP 5. we must also ensure that a shared volume group is known by the same volume group major number across all machines in the cluster.0 Instructor Guide Uempty Instructor notes: Purpose — Explain the syntax for specifying NFS cross-mount local mount points and filesystems./fsa is the only valid mount syntax. use: # ls -l /dev/*webvg crw-rw---1 root crw-rw---1 root crw-rw---1 root system system system 201. ... 1998.200.206.202. 205. 203. for example: # importvg -V100 -y shared_vg_a hdisk2 – C-SPOC will "suggest" a VG major number which is unique across the nodes when it is used to create a shared volume group © Copyright IBM Corporation 2008 Figure 9-13.Instructor Guide Ensuring the VG major number is unique Any Volume Group that contains a filesystem that is offered for NFS export to clients or other cluster nodes must use the same VG major number on every node in the cluster – To display the current VG major numbers. 0 Sep 04 23:23 /dev/xwebvg 0 Sep 05 18:27 /dev/ywebvg 0 Sep 05 23:31 /dev/zwebvg – The command lvlstmajor will list the available major numbers for each node in the cluster For example: # lvlstmajor 43. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Ensuring the VG major number is unique AU548. 
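The major number comparison described on this visual can be scripted. The following is a minimal sketch rather than an IBM-supplied tool: the function names vg_majors and major_mismatches are hypothetical, and the sample lines are made-up values, not output from a real cluster. The first function pulls the major number out of ls -l output for a volume group device; the second flags any device whose major number differs between nodes.

```shell
#!/bin/sh
# Hypothetical helpers; names and sample data are illustrative only.

# Read `ls -l /dev/<vgname>` lines on stdin and print "<device> <major>".
# For a character device, ls shows "major, minor" where a regular file
# would show its size, so the major number is the fifth field.
vg_majors() {
    awk '$1 ~ /^c/ { major = $5; sub(/,.*$/, "", major); print $NF, major }'
}

# Read "<node> <device> <major>" lines on stdin and print each device
# whose major number is not the same on every node listed.
major_mismatches() {
    awk '{
        if (($2 in seen) && seen[$2] != $3) bad[$2] = 1
        seen[$2] = $3
    } END { for (dev in bad) print dev }'
}

# Example run with made-up values collected from two nodes.
major_mismatches <<'EOF'
usa /dev/shared_vg_a 100
uk /dev/shared_vg_a 100
usa /dev/shared_vg_b 101
uk /dev/shared_vg_b 57
EOF
```

On a real node, vg_majors would be fed from ls -l on the volume group device, and the per-node results would be gathered into the three-column form that major_mismatches expects.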
Notes:

VG major numbers
Volume group major numbers must be the same for any given volume group across all nodes in the cluster. This is a requirement for any volume group that has filesystems which are NFS exported to clients (either inside or outside the cluster). The VG major number may be set at the time of creating the VG using SMIT mkvg or by using the -V flag on the importvg command.

Instructor notes:
Purpose: Explain that VG major numbers must be consistent across nodes.
Details: Point out the lvlstmajor command. Explain the output of this command. Explain why VG major numbers must be consistent across all nodes that share a volume group that contains filesystems which are exported.
Transition statement: There are some limitations of using NFS with HACMP; let’s examine them.

NFS with HACMP considerations

Some points to note:
1. Resource groups which export NFS filesystems must implement IPAT.
2. The Filesystems mounted before IP configured resource group attribute must be set to true.
3. HACMP does not use /etc/exports and the default is to export filesystems rw to the world. Specify NFS export options in /usr/es/sbin/cluster/etc/exports if you want better control.
4. HACMP only preserves NFS locks if the NFS exporting resource group has no more than two nodes.

Figure 9-14. NFS with HACMP considerations

Notes:

HACMP exports file
As mentioned in the visual, if you need to specify NFS options, you must use the HACMP exports file, not the standard AIX exports file. You can use AIX smit mknfsexp to build the HACMP exports file:

    Add a Directory to Exports List

    * Pathname of directory to export                   []        /
      Anonymous UID                                     [-2]
      Public filesystem?                                no        +
    * Export directory now, system restart or both      both      +
      Pathname of alternate exports file                [/usr/es/sbin/cluster/etc/exports]
      ...

Instructor notes:
Purpose: Outline the limitations of using NFS with HACMP.
Details: Explain that HANFS is no longer developed or available.
Transition statement: OK, any questions for me? If not, we will move on to the checkpoint questions.

Checkpoint

1. True or False? HACMP supports all NFS export configuration options.
2. Which of the following is a special consideration when using HACMP to NFS export filesystems? (select all that apply)
   a. NFS exports must be read-write.
   b. Secure RPC must be used at all times.
   c. A cluster may not use NFS cross-mounts if there are client systems accessing the NFS exported filesystems.
   d. A volume group that contains filesystems that are NFS exported must have the same major device number on all cluster nodes in the resource group.
3. What does [/abc;/xyz] mean when specifying a directory to cross-mount?
   a. /abc is the name of the filesystem that is exported and /xyz is where it should be mounted
   b. /abc is where the filesystem should be mounted, and /xyz is the name of the filesystem that is exported
4. True or False? HACMP's NFS exporting feature supports only clusters of two nodes.
5. True or False? IPAT is required in resource groups that export NFS filesystems.

Figure 9-15. Checkpoint
Instructor notes:
Purpose: Checkpoint questions.
Details:

Checkpoint solutions

1. True or False? * HACMP supports all NFS export configuration options.
2. Which of the following is a special consideration when using HACMP to NFS export filesystems? (select all that apply)
   a. NFS exports must be read-write.
   b. Secure RPC must be used at all times.
   c. A cluster may not use NFS cross-mounts if there are client systems accessing the NFS exported filesystems.
   d. A volume group that contains filesystems that are NFS exported must have the same major device number on all cluster nodes in the resource group.
3. What does [/abc;/xyz] mean when specifying a directory to cross-mount?
   a. /abc is the name of the filesystem that is exported and /xyz is where it should be mounted
   b. /abc is where the filesystem should be mounted, and /xyz is the name of the filesystem that is exported
4. True or False? ** HACMP's NFS exporting feature supports only clusters of two nodes.
5. True or False? IPAT is required in resource groups that export NFS filesystems.

* /usr/es/sbin/cluster/etc/exports must be used to specify NFS export options if the default of "read write to the world" is not acceptable.
** Resource groups larger than two nodes that export NFS filesystems do not provide full NFS functionality (for example, NFS file locks are not preserved across a fallover).

Additional information: Read the questions to the students. Ask the students for their answer. Point out the correct answer and explain why it is the correct answer.
Transition statement: Let’s summarize what we’ve covered.
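The cross-mount syntax asked about in question 3 can be illustrated with a short sketch. The function name parse_cross_mount is hypothetical and this is not how HACMP itself is implemented; the sketch only splits an entry at the semi-colon and prints the mount command that the unit says each node would run, assuming the service IP label is passed in as the second argument.

```shell
#!/bin/sh
# Hypothetical helper: split a "Filesystems/Directories to NFS Mount"
# entry of the form <local mount point>;<exported filesystem> and print
# the equivalent mount command. Nothing is actually mounted.
parse_cross_mount() {
    entry=$1          # for example "/a;/fsa"
    service_label=$2  # for example "aservice"
    case "$entry" in
        *";"*) ;;
        *) echo "not a cross-mount entry: $entry" >&2; return 1 ;;
    esac
    mount_point=${entry%%;*}   # text before the semi-colon
    exported_fs=${entry#*;}    # text after the semi-colon
    echo "mount ${service_label}:${exported_fs} ${mount_point}"
}

parse_cross_mount '/a;/fsa' aservice
```

For the entry /a;/fsa and the service label aservice, the sketch prints the same command shown on the syntax visual: mount aservice:/fsa /a.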
Unit summary

Key points from this unit:
• HACMP provides a means to make Network File System (NFS) highly available
  – Configure Filesystem/Directory to Export and Filesystems mounted before IP configured in resource group
  – VG major number must be the same on all nodes
  – Clients NFS mount using service address
  – In case of node failure, takeover node acquires the service address, acquires the disk resource, mounts the file system and NFS exports the file system
  – Clients see NFS server not responding during the fallover
• NFS file systems can be cross-mounted across all nodes
  – Faster takeover: Takeover node does not have to unmount the file system
  – A preferred network can be selected
  – Really only for read only file systems: NFS cross-mounted file systems can be mounted read-write, but concurrent write attempts will produce inconsistent results
  – Use GPFS for true concurrent access
• Non-default export options can be specified in /usr/es/sbin/cluster/etc/exports

Figure 9-16. Unit summary

Instructor notes:
Purpose: Summarize the unit.
Transition statement: We’re done with this unit.
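As an illustration of the last key point, an entry in /usr/es/sbin/cluster/etc/exports might look like the line below. This is a sketch that assumes the file follows the standard AIX /etc/exports layout (directory, then a dash-prefixed, comma-separated option list); the client host names clienta and clientb are hypothetical, and usa and uk are the node names used in this unit's diagrams.

```
/fsa -root=usa:uk,access=clienta:clientb
```

Under that assumption, the entry would restrict the export of /fsa to the listed hosts and grant root access to the two cluster nodes, instead of the default of read-write to the world.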
Unit 10. Problem determination and recovery

Estimated time
01:30

What this unit is about
This unit describes the problem determination and recovery tools and techniques for diagnosing problems that might occur in your cluster.

What you should be able to do
After completing this unit, you should be able to:
• List reasons why HACMP can fail
• Identify configuration and administration errors
• Explain why the Dead Man's Switch invokes
• Explain when the System Resource Controller kills a node
• Isolate and recover from failed event scripts
• Correctly escalate a problem to IBM support

How you will check your progress
Accountability:
• Checkpoint
• Machine exercises

References
SC23-5209-01   HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10   HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10   HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10   HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04   HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09   HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html   HACMP manuals

Unit objectives

After completing this unit, you should be able to:
• List reasons why HACMP can fail
• Identify configuration and administration errors
• List the problem determination tools available in smit
• Explain why the Dead Man's Switch invokes
• Explain when the System Resource Controller kills a node
• Isolate and recover from failed event scripts
• Correctly escalate a problem to IBM support

Figure 10-1. Unit objectives

Notes:
In this unit we examine some of the reasons why HACMP might fail, and how to perform basic problem determination to recover from failure.

Instructor notes:
Purpose: Unit objectives.
Details: State the unit objectives to the students.
Transition statement: Let’s start by looking at why clusters turn bad.

Why do good clusters turn bad?

Common reasons why HACMP fails:
– A poor cluster design and lack of thorough planning
– Basic TCP/IP and LVM configuration problems
– HACMP cluster topology and resource configuration problems
– Absence of change management discipline in a running cluster
– Lack of training for staff administering the cluster
– Performance or capacity problems

[Diagram: cluster nodes usa and uk with resource groups A and B, each marked with an X.]

Figure 10-2. Why do good clusters turn bad?

Notes:

Root causes
Often the root cause of problems with HACMP is the absence of design and planning at the outset, or poor design and planning. As you will have now figured out, a couple of hours spent in planning HACMP reaps rewards later on in terms of how easy it is to configure, administer, and diagnose problems with the cluster.

Typically, HACMP clusters are very stable. HACMP verifies all topology and resource configuration parameters and most IP configuration parameters before synchronization takes place. This means that provided the cluster synchronizes and starts successfully, the cluster should remain stable.

The prime reason for cluster failure when the environment is in production is administrative mistakes and an absence of change control.

During the writing of this course, a customer complained to IBM that his HACMP cluster had failed on him because a node had failed and his workload did not get taken over by the standby node. Upon investigation it was proven that in fact an earlier (undetected) failure had resulted in the standby node taking over the workload and a subsequent component failure resulted in a second point of failure. How many points of failure does HACMP handle?

Instructor notes:
Purpose: Outline the key reasons why HACMP might fail.
Details: Emphasize that HACMP is a very mature, robust, and stable product, which is not prone to failure. Indeed, HACMP is often purchased to handle a worst case scenario. So, what does that mean if a failure occurs and HACMP also fails?
Transition statement: The best form of cure is prevention, so you should always test your cluster before going live and after every change.

Test your cluster before going live!

Careful testing of your production cluster before going live reduces the risk of problems later.

An example test plan might include:

    Test Item                              How to test      Checked
    Node Fallover
    Network Adapter Swap
    IP Network Failure
    Storage Adapter Failure
    Disk Failure
    clstrmgr daemon Killed
    Serial Network Failure
    Disk Adapter for rootvg Failure
    Application Failure
    Node re-integration
    Partitioned Cluster

Figure 10-3. Test your cluster before going live!

Notes:

Importance of testing
Every cluster should be thoroughly tested before going live. It is important that you develop and document a cluster test plan for your environment. Start by taking your cluster diagram and highlighting all the things that could go wrong, then write down what you expect the cluster to do in response to that failure. HACMP 5.2 and later provides a test tool, which will be discussed later in this unit.

Periodically, test your cluster to ensure that fallover works correctly, and correct your test plan if your assumptions about what will happen differ from what HACMP actually does (for example, shutdown -F does not cause fallover).

Testing cluster services by using Move Resource Groups is recommended, and it is especially important if HACMP is to be used to reduce planned downtime (for upgrades or maintenance), because this is the cluster function that will be used. This method of testing, however, should not replace the testing of a node failure due to crash (for example, halt -q or just stop the LPAR at the HMC).

All efforts should be made to verify application functions (user level testing) as the cluster function tests are being performed. Verifying that the cluster functions “correctly” without verifying that the application functions correctly as part of the cluster function test is not recommended. Getting the end-user commitment is sometimes the hardest part of this process.

Use of emulation
You can emulate some common cluster status change events. Remember that whenever you make a change to cluster configuration, test the change before putting the cluster back into production if at all possible. You should always emulate a DARE change before actually doing it. If a DARE change does not succeed during emulation, then it will definitely not succeed when you actually do it.

Instructor notes:
Purpose: Explain that a cluster test plan is a vital part of the final solution.
Details: Point out to the students that a test plan must be developed locally for their cluster. A comprehensive test plan cannot be developed for all customer scenarios, because we cannot predict what actions their application or pre- or post-event scripts might take. Remind the students that they can emulate DARE changes and common cluster status changes (events).
Transition statement: Well, now that we know we should test, what are some of the tools we can use if something goes wrong?

Tools to help you diagnose a problem

Most problems are related to IP, LVM, and cluster configuration errors

Tools:
– Automatic Cluster Configuration Monitoring
– Automatic Error Correction during verify
– HACMP Cluster Test Tool
– Emulation Tools
– HACMP Troubleshooting manual

Log files: hacmp.out, cluster.log, clverify.log, clstrmgr.debug

Simple AIX and HACMP commands:

    df -k       mount     lsfs             netstat -i    no -a
    lsdev       lslv      lsvg [<ecmvg>]   lsvg -o       lspv
    ifconfig    clstat    clRGinfo         cltopinfo     clcheck_server

Figure 10-4. Tools to help you diagnose a problem

Notes:

Some key tools
Some of the key tools to aid you in diagnosing a problem in the cluster are detailed above. Most problems are simple configuration issues, and hence the commands used to diagnose them are also straightforward. Also especially useful are the /<log_dir>/hacmp.out and /var/hacmp/adm/cluster.log files, which document all of the output that the HACMP event scripts generate.

Remember the documentation
Useful help on errors generated by HACMP and diagnosing problems with the cluster can be found in the HACMP for AIX Administration Guide and the HACMP for AIX Troubleshooting Guide.

Instructor notes:
Purpose: Outline the tools that can be used for diagnosing problems with a high-availability cluster.
Details: Don’t get trapped into the idea that HACMP configuration problems are complex and difficult to diagnose. HACMP simply automates common AIX commands using event scripts, so the problems you run into using HACMP tend to be common configuration problems and are relatively easy to diagnose. Point out to the students some examples of these commands in operation.
Transition statement: Let’s take a look at what is available from smit hacmp.

Tools available from smit menu

    Problem Determination Tools

    Move cursor to desired item and press Enter.

      HACMP Verification
      View Current State
      HACMP Log Viewing and Management
      Recover From HACMP Script Failure
      Restore HACMP Configuration Database from Active Configuration
      Release Locks Set By Dynamic Reconfiguration
      Clear SSA Disk Fence Registers
      HACMP Cluster Test Tool
      HACMP Trace Facility
      HACMP Event Emulation
      HACMP Error Notification
      Manage RSCT Services
      Open a SMIT Session on a Node

    F1=Help       F2=Refresh    F3=Cancel    Esc+8=Image
    Esc+9=Shell   Esc+0=Exit    Enter=Do

Figure 10-5. Tools available from smit menu

Notes:

Tools available from the problem determination tools smit menu
We will be looking at some of these tools on the following pages. Not covered are:
- View Current State. This tool executes the /usr/es/sbin/cluster/utilities/cldump command, which gives the state of the cluster as long as at least one node has cluster manager services running.
- HACMP Log Viewing and Management. This tool allows you to “watch” as well as “scan” the HACMP log files, and to set options on the /<log_dir>/hacmp.out file to see event summaries or to see the file in searchable HTML format. “watch” is basically a tail -f operation, while “scan” is to view the entire file.
- Restore HACMP Configuration Database from Active Configuration. This was covered in Unit 7.
- Release Locks Set By Dynamic Reconfiguration.
- Clear SSA Disk Fence Registers.
- HACMP Trace Facility.
- HACMP Error Notification. This was covered in Unit 8.

Instructor notes:
Purpose: List the smit menu for problem determination.
Additional information: Use as agenda and introduction for the subsequent visuals.
Transition statement: Let’s now take a look at monitoring verification.
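The simple AIX and HACMP commands listed on the diagnosis visual lend themselves to a small collection script. The sketch below is an assumption about how one might gather their output in one place, not part of HACMP: the function name collect_diag and the output file naming are hypothetical, and the command list is only a subset of the visual. Commands that are not found in PATH are recorded as skipped rather than treated as errors.

```shell
#!/bin/sh
# Hypothetical helper: save the output of several diagnostic commands
# into one directory so the results can be compared across nodes.
collect_diag() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    for cmd in "df -k" "mount" "netstat -i" "lsvg -o" "lspv" "clRGinfo"; do
        prog=${cmd%% *}                           # first word is the program
        name=$(echo "$cmd" | tr ' /' '__')        # "df -k" becomes "df_-k"
        if command -v "$prog" >/dev/null 2>&1; then
            $cmd > "$outdir/$name.out" 2>&1       # unquoted so options split
        else
            echo "skipped: $prog not found in PATH" > "$outdir/$name.out"
        fi
    done
}
```

A call such as collect_diag /tmp/diag.usa on each node would leave one file per command. On a real cluster the HACMP utilities directory would probably need to be added to PATH first so that commands such as clRGinfo are found.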
Automatic cluster configuration monitoring

    HACMP Verification

    Move cursor to desired item and press Enter.

      Verify HACMP Configuration
      Configure Custom Verification Method
      Automatic Cluster Configuration Monitoring

    Automatic Cluster Configuration Monitoring

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
    * Automatic cluster configuration verification      Enabled      +
      Node name                                         Default      +
    * HOUR (00 - 23)                                    [00]         +#

Figure 10-6. Automatic cluster configuration monitoring

Notes:

How it works
The clverify utility runs on one user-selectable HACMP cluster node once every 24 hours. By default, the first node in alphabetical order runs the verification at midnight.

When automatic cluster configuration monitoring detects errors in cluster configuration, clverify triggers a general_notification event. The output of this event is logged in hacmp.out throughout the cluster on each node that is running cluster services.

clverify maintains the log file /var/hacmp/log/clverify/clverify.log.

Instructor notes:
Purpose: Make students aware of a problem detection feature first introduced in HACMP 5.2.
Additional information: It is planned to add more about this to the AU57 HACMP Administration II: Administration and Problem Determination class.
Transition statement: clverify is also run during Extended Verification and Synchronization as well as when cluster services are started. Let’s look at the options that are available at that time.

Automatic correction

    HACMP Verification and Synchronization

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
    * Verify, Synchronize or Both                       [Both]       +
    * Automatically correct errors found during         [No]         +
      verification?
    * Force synchronization if verification fails?      [No]         +
    * Verify changes only?                              [No]         +
    * Logging                                           [Standard]   +

    F1=Help       F2=Refresh      F3=Cancel     F4=List
    Esc+5=Reset   Esc+6=Command   Esc+7=Edit

• Also automatic synchronization during cluster start (HACMP 5.3+)

Figure 10-7. Automatic correction

Notes:

Autocorrection of some verification errors during verify
You can run automatic corrective actions during cluster verification on an inactive cluster. Automatic correction of clverify errors is not enabled by default. You can choose to run this useful utility in one of two modes. If you select Interactively, when clverify detects a correctable condition related to importing a volume group or to exporting and re-importing mount points and filesystems, you are prompted to authorize a corrective action before clverify continues error checking. If you select Yes, when clverify detects that any of the conditions listed as follows exists, it takes the corrective action automatically without a prompt.

The following errors are detected and fixed:
- HACMP shared volume group time stamps are not up to date on a node.
- The /etc/hosts file on a node does not contain all HACMP-managed IP addresses.
- SSA concurrent volume groups need unique SSA node numbers.
- A filesystem is not created on a node, although disks are available.
- Disks are available, but the volume group has not been imported to a node.
- Required /etc/services entries are missing on a node.
- Required HACMP snmpd entries are missing on a node.

Note that the autocorrection selection will not appear if cluster services are running. Instead the top line of the menu will look like:

    HACMP Verification and Synchronization (Active Cluster Nodes Exist)

Additional autocorrection in HACMP 5.3
The enhancements made to autocorrection in HACMP 5.3 are:
• RSCT instance number synchronized properly across all nodes.
• Ensure boot-time IP-Addresses are configured on the network interfaces that RSCT expects.
• Ensure active shared volume groups are not set to auto-varyon.
• Ensure filesystems are not set to auto-mount.

Additional verification in HACMP 5.3
In HACMP 5.3, the following are added to verification:
• Incompatibilities between network and network adapter types.
• Certain Network Options (“no” command settings) are different in cluster nodes or will be modified by RSCT during cluster startup.
• MTU sizes are different on cluster nodes.
• RSCT software levels are different for the same AIX levels.
• HACMP WAN support configured and WAN software is missing.
• There are resource groups with site policies defined, but no XD software is installed.
• There are resource groups with site policies defined, but no sites configured.
• Resource group contains a volume group set up for cross site mirroring, and forced varyon is not set.
• Certain volume group settings are different.
• Shared volume groups defined as auto-varyon.
• Disks are not accessible before the cluster startup.
• Issue an error instead of the warning when a volume group that is set up for cross site mirroring does not have copies of the logical volumes at both sites.

Automatic verification during cluster start
There is an additional automatic verification and correction done during cluster start:

If a user attempts to start cluster services on a node on which the HACMP topology has not yet been synchronized, which leaves the HACMP cluster node handle field blank, a synchronization will be done. The assumption is there is a valid cluster configuration on the local node the user is attempting to start, but the user has not synchronized.

If cluster services are not running on any node in the cluster (known to the local node), then the local cluster configuration will be synchronized to all nodes attempting to start cluster services after successfully verifying the local DCD configuration.

If, however, cluster services are running on a node in the cluster, then the local DCD will be compared against an ACD of a running cluster node where the local node participates in the ACD's configuration. If the DCD and ACD match, then verification is run. If the DCD and ACD do not match, then a snapshot is made of the DCD, and the active node's ACD will be copied to the DCD on the local node and verification will be run prior to starting cluster services.

This feature can be disabled such that verification and synchronization does not occur during cluster startup. The smit path to disable is:

    smitty hacmp -> Extended Configuration -> Extended Cluster Service Settings

Instructor notes:
Purpose: Make students aware of a problem resolution tool available in HACMP 5.2 and later.
Transition statement: So, you are supposed to test the cluster but not sure how. Let’s take a look at a feature that was introduced in HACMP 5.2 and later to do just that.
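The decision made during automatic verification at cluster start can be written out as a small function. This is only a sketch of the logic as the student notes describe it, not HACMP code: the function name startup_action and its yes/no arguments are hypothetical, and the function prints a description of the action instead of performing it.

```shell
#!/bin/sh
# Hypothetical model of the start-up decision described in the notes.
#   $1: are cluster services already running on another node? (yes/no)
#   $2: does the local DCD match the running node's ACD?      (yes/no)
startup_action() {
    active_elsewhere=$1
    dcd_matches_acd=$2
    if [ "$active_elsewhere" = "no" ]; then
        # No active node: the local configuration is the reference copy.
        echo "verify local DCD, then synchronize to all starting nodes"
    elif [ "$dcd_matches_acd" = "yes" ]; then
        echo "run verification"
    else
        # An active node exists and the local copy differs from it.
        echo "snapshot DCD, copy active ACD to local DCD, run verification"
    fi
}

startup_action yes no
```

The sample call models a node joining an active cluster with a stale local configuration, which is the case where the notes say the local DCD is saved in a snapshot and replaced before verification runs.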
HACMP cluster test tool

  HACMP Cluster Test Tool
  Move cursor to desired item and press Enter.
    Execute Automated Test Procedure
    Execute Custom Test Procedure
  F1=Help      F2=Refresh   F3=Cancel   Esc+8=Image
  Esc+9=Shell  Esc+0=Exit   Enter=Do

Warning: These tests are disruptive. They should not be done in production mode.

Figure 10-8. HACMP cluster test tool  AU548.0

Notes:
Test tool description
The Cluster Test Tool utility lets you test an HACMP cluster configuration to evaluate how a cluster operates under a set of specified circumstances, such as when cluster services on a node fail or when a node loses connectivity to a cluster network. You can start a test, let it run unattended, and return later to evaluate the results of your testing. You should run the tool under both low load and high load conditions to observe how system load affects your HACMP cluster.

How to run the test tool
You run the Cluster Test Tool from SMIT on one node in an HACMP cluster. For testing purposes, this node is referred to as the control node. From the control node, the tool runs a series of specified tests (some on other cluster nodes), gathers information about the success or failure of the tests processed, and stores this information in the Cluster Test Tool log file for evaluation or future reference.

The Cluster Test Tool discovers information about the cluster configuration, and randomly selects cluster components, such as nodes and networks, to be used in the testing. The following sets of tests are run:
• General topology tests
• Resource group tests on non-concurrent resource groups
• Resource group tests on concurrent resource groups
• Catastrophic failure test
These tests are disruptive.

Instructor notes:
Purpose: Make students aware of test tools.
Details:
Additional information:
Transition statement: Let’s see what commands can be used to diagnose problems that might occur from time to time in a cluster.

Checking cluster subsystems (1 of 2)

  # /usr/es/sbin/cluster/utilities/clcheck_server \
    clstrmgrES ; echo $?
  # lssrc -ls clstrmgrES | grep state
  # lssrc -g cluster        (Daemon only - NOT Services Check)
  Subsystem     Group     PID     Status
   clstrmgrES   cluster   21032   active
   clinfoES     cluster   21676   active

  Cluster components: clstrmgrES (mandatory), clinfoES (optional)

Figure 10-9. Checking cluster processes (1 of 2)  AU548.0

Notes:
clstart subsystems
Listed here are the processes that are listed in the startup smit menu for HACMP. It’s interesting to note that these cluster processes are not displayed by the command when they are inactive. This was a display option (or probably better a non-display option) that HACMP chose to use when the subsystems were defined during the install process. This option can be changed (one subsystem at a time) using the chssys -s subsystem_name -a “-D” command.

Checking for cluster services up
Starting in HACMP 5.3, the clstrmgrES subsystem is always running, even if cluster services is not running. So to check if cluster services is running, you must make a distinction between the clstrmgrES subsystem and cluster services. The supported command is /usr/es/sbin/cluster/utilities/clcheck_server grpsvcs. This command returns 0 (for down) or 1 (for up), so you will need to check the return code.

An alternative command that works in HACMP 5.3 but is not guaranteed for the future is easier. It is lssrc -ls clstrmgrES | grep state. Look for ST_STABLE for a prolonged period of time as an indication that cluster services has started successfully. Another command that will give you state information in HACMP 5.3 is /usr/es/sbin/cluster/utilities/cldump. Finally, you can use the smit path: Problem Determination Tools -> View Current State.

Instructor notes:
Purpose: List the status of the processes that are named in the startup menu for HACMP.
Details: This does not tell us a lot, other than the fact that the processes are running.
Additional information:
Transition statement: Now, let’s look at other subsystems that are needed by HACMP.

Checking cluster subsystems (2 of 2)
Check rsct, clcomd, ctrmc subsystems

  # lssrc -a | grep svc
   topsvcs    topsvcs    258248   active
   grpsvcs    grpsvcs    434360   active
   emsvcs     emsvcs     335994   active
   emaixos    emsvcs     307322   active
  # lssrc -s clcomdES
  Subsystem   Group      PID      Status
   clcomdES   clcomdES   13420    active
  # lssrc -s ctrmc
  Subsystem   Group      PID      Status
   ctrmc      rsct       2954     active

Figure 10-10. Checking cluster processes (2 of 2)  AU548.0
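When the lssrc listings above are captured by a monitoring script, the status column is usually the only part of interest. The following Python sketch parses such a listing into a dictionary. It assumes the four-column layout shown on the slides; the function name and the sample text are illustrative, and the sketch does not run lssrc itself. Keep in mind the point made in the notes: an active clstrmgrES daemon does not by itself mean that cluster services are up.

```python
def parse_lssrc(output):
    """Map each subsystem name in an lssrc listing to its status.

    Assumes the Subsystem / Group / PID / Status layout shown on the
    slides. The status is taken from the last field, so a row without
    a PID (an inoperative subsystem) is still handled.
    """
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Subsystem":
            continue  # skip blank lines and the header row
        states[fields[0]] = fields[-1]
    return states


# Sample text copied from the slide, used here in place of real output.
SAMPLE = """\
Subsystem         Group            PID          Status
 clstrmgrES       cluster          21032        active
 clinfoES         cluster          21676        active
"""

states = parse_lssrc(SAMPLE)
```

A subsystem that is missing from the result was simply not listed, which, as the notes point out, is the default for inactive HACMP subsystems.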
Notes:
Supporting subsystems
Listed here are the additional processes we would expect to find running on an HACMP cluster node.

Instructor notes:
Purpose: List the processes that are running in a typical HACMP 5.2 and later cluster.
Details: This does not tell us a lot, other than the fact that the processes are running. Point out the additional processes that are part of the RSCT (Reliable Scalable Cluster Technology). Briefly remind the students of the purpose of each process.
Additional information: clcomd was introduced in HACMP 5.1 and ctrmc was introduced in HACMP 5.2.
Transition statement: Next, we see some tests that you can use to prove communication across the networks defined to your cluster.

Testing your network connections
To test your IP network:
• ping (interfaces)
• netstat -rn (routing)
• host (name resolution)
• netstat -i and ifconfig (addresses, subnet mask)
To test your non-IP networks:
– Heartbeat over disk:
  • /usr/sbin/rsct/bin/dhb_read -p hdiskx -r
  • /usr/sbin/rsct/bin/dhb_read -p hdiskx -t (receive is done first)
– RS232:
  • stty < /dev/tty# (on 2 connected nodes)
– Target mode SSA network:
  • cat < /dev/tmssa#.tm
  • echo test > /dev/tmssa#.im
– Do not perform these tests while HACMP is running

Figure 10-11. Testing your network connections  AU548.0

Notes:
Testing your IP network
Ping between all pairs of interfaces on the same subnet. Check the entries in the routing table on each node (netstat -rn). Check names are resolvable (host), for example, host node1boot1. Check addresses and subnet mask (netstat -i, ifconfig).

Testing your non-IP networks
- For Heartbeat over Disk
  • On one node, execute the command /usr/sbin/rsct/bin/dhb_read -p hdiskx -r. This causes the message “waiting for response” to display.
  • On the other connected node, execute the command /usr/sbin/rsct/bin/dhb_read -p hdiskx -t.
  • This causes the message “Link operating normally” to display on both nodes.
- For RS232
  • On one node, execute the command stty < /dev/tty#. This will hang at the command line.
  • On the other connected node, execute the command stty < /dev/tty#.
  • This causes the tty settings to be displayed on both nodes.
- For Target Mode SSA
  • On one node, execute the command cat < /dev/tmssa#.tm where the value of # is the node id of the target ssa router.
  • On the other connected node, execute the command echo test > /dev/tmssa#.im where # is the node id of the source ssa router.
  • This causes the word test to display on the first node.
These tests can be used to validate that network communications are functioning between cluster nodes over the defined cluster networks.

Instructor notes:
Purpose: Introduce some of the tools used to test network communications.
Details:
Additional information:
Transition statement: Let’s look now at what can happen if the cluster manager does not get enough CPU.

Dead man’s switch timeout
888 LED code -> possible DMS timeout.
Why?
– Clstrmgr starved of CPU
  • Excessive I/O traffic
  • Excessive TCP/IP traffic over an interface
Was it DMS?
– Copy the system dump to a file
– kdb on the dump file
– stat subcommand
– Look for 'HACMP dms timeout halting...'

Figure 10-12. Dead man's switch timeout  AU548.0

Notes:
Dead man’s switch
The dead man’s switch (DMS) is the AIX kernel extension that halts a node when it enters a hung state that extends beyond a certain time limit. This enables another node in the cluster to acquire the hung node’s resources in an orderly fashion, avoiding possible contention problems. If the dead man switch is not reset in time, it can cause a system panic and dump under certain cluster conditions.

The dead man’s switch should not invoke if your cluster is not overloaded with I/O traffic. There are steps that can be taken to mitigate the chances of the DMS invoking, but often this is a result of the machine being fundamentally overloaded.

Instructor notes:
Purpose: Review the behavior of the Dead Man’s Switch.
Details: Emphasize that the purpose of the dead man’s switch is to ensure that if other cluster nodes suspect that this cluster node is dead then it is actually dead.
Additional information:
Transition statement: How might we avoid DMS time-outs?

Avoiding dead man’s switch timeouts
Steps to avoid DMS timeout problems:
1. Isolate the cause of excessive I/O or TCP/IP traffic and fix it, and if that does not work...
2. Increase the frequency of the syncd, and if that does not work...
3. Reduce the failure detection rate for the slowest network, and if that does not work...
4. Tune I/O pacing, and if that does not work...
5. Buy a bigger machine

Figure 10-13. Avoiding dead man’s switch timeouts  AU548.0

Notes:
Causes of DMS timeouts
Most dead man’s switch problems are the result of either an extremely overloaded cluster node or a sequence of truly bizarre cluster configuration misadventures (for example, DMS timeouts have been known to occur when the disk subsystem is sufficiently screwed up that AIX encounters difficulties accessing any disks at all).

Large amounts of TCP traffic over an HACMP-controlled service interface might cause AIX to experience problems when queuing and later releasing this traffic. When traffic is released, it generates a large CPU load on the system and prevents timing-critical threads from running, thus causing the Cluster Manager to issue a DMS timeout.

HACMP via Topology Services produces an AIX error if the time gets close. The error label is TS_DMS_WARNING_ST and you can set an error notify method to notify you when this occurs. The command /usr/sbin/rsct/bin/hatsdmsinfo can be used to see how often the DMS timer is being reset.

Although we don't recommend changing the DMS time-out value, we are sometimes asked about how to increase the time-out period on the dead man’s switch to make it less likely that the DMS will pop and crash the node. There is no strict time-out setting; it is monitored by RSCT and is calculated as twice the value of the longest failure detection rate of all configured HA networks in the cluster. If, for example, you have two networks, an Ethernet and a disk heartbeat network, the Ethernet has the longer failure detection rate, 10 seconds versus 8 for the diskhb network, so the DMS time-out is set to 2*10, or 20 seconds.

If the failure detection rate is being modified to extend the DMS time-out, it is best to ensure that all networks have the same failure detection period. To set the DMS timeout value to about 30 seconds, the custom NIM settings would be:

  Network    Failure Cycle   Interval between Heartbeats (seconds)
  Ethernet   16              1
  diskhb     8               2

This would increase the DMS timeout from 20 seconds to 32, while making the failure detection the same for both networks. It would also increase the amount of time necessary to detect a network failure by the same amount. Note that because the DMS time-out period is directly tied to failure detection rates, increasing the DMS time-out period will necessarily increase the delay before the secondary node starts to acquire resources in the event of a node failure, node hang or the loss of all network connectivity.

Instructor notes:
Purpose: Show how to avoid Dead Man’s Switch timeouts.
Details: Emphasize to the students that they should not go through the steps to help prevent DMS time-out problems unless they have the DMS problem. They will need to prove that the DMS is the cause of a node halting before enabling any of the steps that are aimed at mitigating DMS problems.
Additional information:
Transition statement: Let’s see how we can implement the performance tuning commands within HACMP.
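The DMS time-out arithmetic in the notes on avoiding DMS timeouts can be checked with a few lines of Python. This is a sketch of the calculation only: the function names are hypothetical, and it assumes, as the numbers in the notes imply, that a network's failure detection period is its failure cycle multiplied by its heartbeat interval.

```python
def detection_period(failure_cycle, heartbeat_interval):
    """Failure detection period in seconds.

    Assumption: period = failure cycle x heartbeat interval, which is
    what the custom NIM values quoted in the notes imply.
    """
    return failure_cycle * heartbeat_interval


def dms_timeout(periods):
    """DMS time-out: twice the longest failure detection period."""
    return 2 * max(periods)


# Periods quoted in the notes: Ethernet 10 seconds, diskhb 8 seconds.
before = dms_timeout([10, 8])

# Custom NIM settings quoted in the notes.
ethernet = detection_period(failure_cycle=16, heartbeat_interval=1)
diskhb = detection_period(failure_cycle=8, heartbeat_interval=2)
after = dms_timeout([ethernet, diskhb])
```

With these inputs both networks end up with the same 16-second detection period, which is why the time-out moves from 20 to 32 seconds rather than to exactly 30.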
Setting performance tuning parameters

  Extended Performance Tuning Parameters Configuration
  Move cursor to desired item and press Enter.
    Change/Show I/O pacing
    Change/Show syncd frequency
  F1=Help    F2=Refresh   F3=Cancel   F8=Image
  F9=Shell   F10=Exit     Enter=Do

Figure 10-14. Setting performance tuning parameters  AU548.0

Notes:
Extended performance tuning parameter configuration
This is the menu for changing the I/O pacing and syncd frequency.

Instructor notes:
Purpose: Show menu to choose the performance commands.
Details:
Additional information:
Transition statement: Let’s look at changing the frequency of syncd.

Changing the frequency of syncd
• The documentation recommends a value of 10. Start with 15.

  Change/Show syncd frequency
  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.
                                      [Entry Fields]
  syncd frequency (in seconds)        [15]             #
  F1=Help    F2=Refresh   F3=Cancel   F4=List
  F5=Reset   F6=Command   F7=Edit     F8=Image
  F9=Shell   F10=Exit     Enter=Do

Figure 10-15. Changing the frequency of syncd  AU548.0

Notes:
Setting the syncd frequency
The syncd setting determines the frequency with which the I/O disk-write buffers are flushed. Frequent flushing of these buffers reduces the chance of dead man switch time-outs. The AIX default value for syncd as set in /sbin/rc.boot is 60. It is recommended to change this value to 15.

Instructor notes:
Purpose: Show how to change the frequency of syncd.
Details:
Additional information:
Transition statement: So what about setting the values for I/O pacing?

Enabling I/O pacing
The HACMP documentation recommends a high water mark of 33 and a low water mark of 24, but consider…

  Change/Show I/O pacing
  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.
                                                     [Entry Fields]
  HIGH water mark for pending write I/Os per file    [33]            +#
  LOW water mark for pending write I/Os per file     [24]            +#
  F1=Help    F2=Refresh   F3=Cancel   F4=List
  F5=Reset   F6=Command   F7=Edit     F8=Image
  F9=Shell   F10=Exit     Enter=Do

For <= AIX 5.3
– Leave as 0, set as last resort
For >= AIX 6.1
– Leave at defaults (hi=8193, lo=4096)
– See the AIX 6.1 Differences Guide for more details

Figure 10-16. Enabling I/O pacing  AU548.0

Notes:
Setting the I/O pacing values
Remember, I/O pacing and other tuning parameters should only be set to values other than the defaults after a system performance analysis indicates that doing so will lead to both the desired and acceptable side effects. Consider changing the sensitivity of the network components in HACMP before making this system-wide change. This should be the option of last resort.

Although the most efficient high- and low-water marks vary from system to system, an initial high-water mark of 33 and a low-water mark of 24 provides a good starting point. These settings only slightly reduce write times and consistently generate correct fallover behavior from the HACMP software. See the AIX 5L Performance Monitoring & Tuning Guide for more information on I/O pacing.

Instructor notes:
Purpose: Show how to set I/O pacing values.
Details:
Additional information:
Transition statement: Now, what happens if clstrmgrES halts unnaturally?

SRC halts a node
Under what circumstances does the SRC halt a node?
– The cluster manager was killed or has crashed
Proving that SRC halted a node:
– Check the AIX error log
  • Look for abnormal termination of clstrmgr daemon
To avoid SRC halts in the first place:
• Do not give untrained staff access to the root password
• Consider modifying /etc/cluster/hacmp.term

Figure 10-17. SRC halts a node  AU548.0

Notes:
How SRC halt works
The SRC looks for an entry in the /etc/objrepos/SRCnotify odm file if a subsystem is killed or crashed. HACMP provides an entry for the clstrmgr. This entry causes clexit.rc to run, which does a halt -q by default.

Avoiding SRC halts
The most likely cause is an untrained administrator with root privilege. Another possibility is to modify the /etc/cluster/hacmp.term file. The script clexit.rc will call this script, which allows you to do something different than halt -q.

Instructor notes:
Purpose: Review what happens when the SRC detects that clstrmgr is killed.
Details: If the SRC halts a node, then this is most likely a result of someone having access to the root password who should not. Explain to the students what action the SRC will take in response to a kill -9 of the clstrmgrES process ID.
Additional information:
Transition statement: Another reason why nodes can halt is due to a node discovering that a partitioned cluster exists. Let’s take a look at this situation.

Partitioned clusters and node isolation
When:
– Heartbeats are received from a node that was marked as failed
– HACMP ODM configuration is not the same on a joining node as nodes already active in the cluster
– Two clusters with the same ID appear in the same logical network
What happens:
– Group Services and clstrmgr exit on some node(s)
– The rogue recovering or joining node is halted
Proving that Node Isolation caused the problem:
– /tmp/clstrmgr.debug file
– AIX error log entry GS_DOM_MERGE_ER

Figure 10-18. Partitioned clusters and node isolation  AU548.0

Notes:
Node isolation
When you have a partitioned cluster, the node or nodes on each side of the partition detect this and run a node_down for the node or nodes on the opposite side of the partition. If, while running this or after communication is restored, the two sides of the partition do not agree on which nodes are still members of the cluster, a decision is made as to which partition should remain up, and the other partition is shut down by a Group Services (GS) merge from nodes in the other partition or by a node sending a GS merge to itself.

In clusters consisting of more than two nodes, the decision is based on which partition has the most nodes left in it, and that partition stays up. With an equal number of nodes in each partition (as is always the case in a two-node cluster), the node or nodes that remain up are determined by the node number (lowest node number in cluster remains), which is also generally the first in alphabetical order.

Role of group services
Group Services domain merge messages indicate that a node isolation problem was handled to keep the resources as highly available as possible, giving you time to later investigate the problem and its cause. When a domain merge occurs, Group Services and the Cluster Manager exit. The clstrmgr.debug file will contain the following error:

  "announcementCb: GRPSVCS announcement code=n, exiting"
  "CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs)"

There is also an entry in the AIX error log “GS_DOM_MERGE_ER”.

Instructor notes:
Purpose: Review the circumstances under which a partitioned cluster can occur.
Details: Point out that a partitioned cluster can be detected only when it stops occurring, that is, heartbeat begins to flow again. A partitioned cluster is a poorly designed cluster, one that does not have a non IP network. Bad idea!
Additional information: This used to be referred to as DGSP.
Transition statement: Well, how can you avoid a partitioned cluster?
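The rule in the node isolation notes for deciding which partition stays up (most nodes wins, and a tie goes to the side holding the lowest node number) can be written as a short Python function. This is a sketch of the rule as stated in the notes, not of how Group Services implements it; the function name and the list-of-node-numbers input format are assumptions made for illustration.

```python
def surviving_partition(partitions):
    """Pick the partition that stays up after a domain merge.

    `partitions` is a list of lists of node numbers (a hypothetical
    input format). The partition with the most nodes wins. If the
    sizes are equal, the partition holding the lowest node number wins.
    """
    # Sort key: larger size first, then lower minimum node number.
    return max(partitions, key=lambda nodes: (len(nodes), -min(nodes)))
```

In a two-node cluster the sizes are always equal, so `surviving_partition([[2], [1]])` keeps node 1 and node 2 is the one that is shut down.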
Avoiding partitioned clusters
• Have a non IP (serial) network
• Have a second non-IP network
• Check your non-IP networks before going live
• Watch for non-IP network failures in HACMP log files
• Do not segment your cluster's IP networks
  – Avoid multiple switches
    • Except in carefully designed highly available network configurations
  – Avoid bridges

Figure 10-19. Avoiding partitioned clusters  AU548.0

Notes:
What can go wrong?
A partitioned cluster can result in data divergence (two cluster nodes each gain access to half of the disk mirrors and proceed to perform updates on their halves). This is a scenario that can be extremely difficult to completely recover from because the changes made by the two nodes might be fundamentally incompatible and impossible to reconcile.

Avoiding the problem
The best way to avoid a partitioned cluster is to install and configure one or more non-IP networks. Test disabling each non-IP network and make sure this is detected by HACMP, then enable each non-IP network and ensure this is also detected.

Instructor notes:
Purpose: Explain how to avoid partitioned clusters.
Details:
Additional information:
Transition statement: If something does go wrong, getting the needed support information quickly will be necessary; fortunately, HACMP does this for you.

Automatic failure data capture
With HACMP 5.4.1 and later, IBM Support data is gathered
– FFDC (First Failure Data Capture) feature automatically captures diagnostic data
– snap data collected after recovery from software or node failure
– Can be disabled via an environment variable
FFDC data saved in /tmp/ibmsupt/hacmp/ffdc.<DateTimeStamp> directory
– Message logged in hacmp.out file
– Max of five incidents retained
– An FFDC message is displayed on screen at next Cluster Services start
Also implemented for Event Failures and CONFIG_TOO_LONG error
– hacmp.out files from all nodes collected and saved in /tmp/ibmsupt/hacmp

Figure 10-20. Automatic failure data capture  AU548.0

Notes:
Uses the clsnap command under the covers (local collection only). The clsnap utility runs with the report option first to verify there is enough space. The user can disable these specific FFDC actions by setting the environment variable FFDC_COLLECTION to “disable” before starting cluster services.

Instructor notes:
Purpose: Provide information on the First Failure Data Capture (FFDC) feature.
Details: If a failure occurs, IBM Support will ask for data to be collected. With HACMP 5.4.1 and later, this will be done automatically. Point out the directory location and the fact that it’s done after recovery of the node.
Additional information:
Transition statement: Now, let’s consider the case of an HACMP event script that does not finish in a reasonable time.

Check event status message
Config too long message:

  Cluster <clustername> has been running event <eventname> for # seconds. Please check event status.

It means that an event script has failed, is hung, or is taking too long. HACMP stops processing events until you resolve this issue.

Figure 10-21. Check event status message  AU548.0

Notes:
The config_too_long event
For each cluster event that does not complete within the specified event duration time, config_too_long messages are logged in the hacmp.out file and sent to the console according to the following pattern:
- First five config_too_long messages appear in the hacmp.out file at 30-second intervals.
- Next set of five messages appears at an interval that is double the previous interval until the interval reaches one hour.
- These messages are logged every hour until the event is complete or is terminated on that node.
This error can occur if an event script fails or does not complete within a customizable time period, which by default is 360 seconds.

Why does it happen?
There are two major reasons this might happen.
1. An event just takes a lot more time, such as varying on a lot of disks or processing dependent resource groups, in which case this error message eventually stops being generated when the HACMP event script that was running finally completes.
2. The event script fails to complete, in which case the message is sent forever.
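The logging pattern described for config_too_long messages (sets of five, with the interval doubling from 30 seconds until it reaches one hour) can be sketched as a Python generator. This models only the spacing between messages as worded in the notes. The function name and parameters are hypothetical, and capping the doubled interval at exactly 3600 seconds is an assumption about what "until the interval reaches one hour" means.

```python
from itertools import islice


def message_intervals(first=30, per_set=5, cap=3600):
    """Yield the gap in seconds that precedes each successive message.

    Sets of `per_set` messages share one interval, and the interval
    doubles from one set to the next. Once doubling would reach or pass
    `cap`, every later message is `cap` seconds apart (assumed cap).
    """
    interval = first
    while interval < cap:
        for _ in range(per_set):
            yield interval
        interval *= 2
    while True:
        yield cap  # hourly until the event completes or is terminated


# Gaps before the first twelve messages.
first_twelve = list(islice(message_intervals(), 12))
```

Under these assumptions the doubling phase covers 35 messages (seven sets), after which messages arrive hourly for as long as the event remains incomplete.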
Details — Talk the students through the sequence of events that lead to this error message being generated and the circumstances under which it will cease issuing.0 Instructor Guide Uempty Instructor notes: Purpose — Explain why the config_too_long event runs. Additional information — Transition statement — Let’s first see how we can change the definition of too long for the case that events are just taking too long to complete. 1998. .V4. 2008 Unit 10. Problem determination and recovery 10-53 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Instructor Guide Changing the timeouts Change/Show Time Until Warning Type or select values in entry fields. 1998. . 10-54 HACMP Implementation © Copyright IBM Corp. Event-only Duration (in seconds) [180] Max. Resource Group Processing Time (in seconds) [180] Total time to process a Resource Group event before a warning is displayed # # 6 minutes and 0 secon> NOTE: Changes made to this panel must be propagated to the other nodes by Verifying and Synchronizing the cluster F1=Help F5=Reset F9=Shell F2=Refresh F6=Command F10=Exit F3=Cancel F7=Edit Enter=Do F4=List F8=Image © Copyright IBM Corporation 2008 Figure 10-22.This is the amount of time that a fast event is allowed to take.0 Notes: smit menu smit hacmp -> Extended Configuration -> Extended Event Configuration -> Change/Show Time Until Warning How to set the values Note that the timeouts are specified as two values one for “fast” events that do not involve resource group movements and a second value for “slow” events: Max. Event-only Duration and the Max. Event-only Duration (in seconds) . [Entry Fields] Max. Resource Group Processing Time (in seconds) . Resource Group Processing Time. Therefore.This is the additional amount of time to be allowed for slow events. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Press Enter AFTER making all desired changes. 
Instructor notes:
Purpose — Show how to change the event time-outs.
Details —
Additional information —
Transition statement — Let’s see how to recover from an event failure.

Recovering from an event script failure

1. In the /<log_dir>/hacmp.out file:
   – go to time of first “too long” message
   – Use /var/hacmp/adm/cluster.log to find time of first message
2. Go backwards to find the AIX error messages
3. Manually correct the problem and complete failed event
4. Perform "Recover from Script Failure"
5. Verify that “config too long” message stops
6. Verify that the cluster is now working properly

Figure 10-23. Recovering from an event script failure

Notes:

Why recovery from script failure is necessary
If an event script fails or takes too long, the “Please check event status” message starts to display as described on the previous visual. HACMP stops processing cluster events until the situation is resolved. If the problem is that an event took too long, then the problem might soon solve itself. If an HACMP event script has actually failed, then manual intervention is required.

The procedure
The procedure is outlined in the visual above. You can also use the cluster.log in combination with hacmp.out. Using the /var/hacmp/adm/cluster.log file with the command grep EVENT /var/hacmp/adm/cluster.log | more makes it easier to find when the config too long event first occurred. Be sure to find the earliest AIX error message, not just the first one you come across. You must manually complete what the event would have done before doing recover from script failure, which is described on the next visual.

Instructor notes:
Purpose — Explain how to deal with an HACMP event script failure.
Details — Make it clear that the failure of an HACMP event script is a major incident requiring immediate attention as the entire HACMP cluster effectively stops protecting any of the highly available applications until the issue is resolved.
Additional information —
Transition statement — Let’s look at the smit screen which causes HACMP to try to resume after an event script failure.

Recovering from an event failure

   Problem Determination Tools

   Move cursor to desired item and press Enter.

     HACMP Verification
     View Current State
     HACMP Log Viewing and Management
     Recover From HACMP Script Failure
     Restore HACMP Configuration Database from Active Configuration
     Release Locks Set By Dynamic Reconfiguration
     Clear SSA Disk Fence Registers
     HACMP Trace Facility

     +--------------------------------------------------+
     ¦                  Select a Node                   ¦
     ¦                                                  ¦
     ¦ Move cursor to desired item and press Enter.     ¦
     ¦                                                  ¦
     ¦   usa                                            ¦
     ¦   uk                                             ¦
     ¦                                                  ¦
     ¦ F1=Help        F2=Refresh       F3=Cancel        ¦
     ¦ F8=Image       F10=Exit         Enter=Do         ¦
   F1¦ /=Find         n=Find Next                       ¦
   F9+--------------------------------------------------+

Figure 10-24. Recovering from an event failure

Notes:

What this procedure does
This SMIT menu entry can be used to recover from a script failure. This does not mean that HACMP fixes problems in event scripts, but this menu is used to allow the cluster manager to continue to the next event following an event script failure that you have identified and manually corrected. Select the node experiencing the problem and press Enter.

Instructor notes:
Purpose — Review how HACMP can be used to recover from a script failure.
Details — Point out that this menu does not fix problems in event scripts; rather, it returns an unstable cluster to a stable state after manual intervention.
Additional information —
Transition statement — Let’s take a look at a basic troubleshooting methodology.

A troubleshooting methodology

- Save the log files from all available nodes as soon as possible
- Attempt to duplicate the problem
- Approach the problem methodically
- Distinguish between what you know and what you assume
- Keep an open mind
- Isolate the problem
- Go from the simple to the complex
- Make one change at a time
- Stick to a few simple troubleshooting tools
- Do not neglect the obvious
- Watch for what the cluster is not doing
- Keep a record of the tests you have completed

Figure 10-25. A troubleshooting methodology

Notes:

Troubleshooting suggestions

Save the log files from every available cluster node while they are still available
Things might get much worse than they already are. Having access to all relevant cluster log files and application log files could prove very important. These log files might be overwritten while you are investigating the problem or they might be lost entirely if more hardware failures occur. Save copies of them very early in the troubleshooting exercise to ensure that they are not lost.

Attempt to duplicate the problem
While keeping in mind the importance of not making a bad situation worse by causing even more problems, it is often useful to try to duplicate the circumstances that are believed to have been in effect when the problem occurred. This can lead to a greater understanding of exactly what went wrong.

Approach the problem methodically
Jumping around from idea to idea and just trying whatever comes to mind might be an entertaining use of your time but it is unlikely to yield a fast solution to the problem at hand.

Distinguish between what you know and what you assume
It is far too easy to spend quite a while chasing down a path of inquiry that is based on a faulty assumption. It is frequently necessary to proceed on the basis of an assumption but be sure that you understand when you are working based on an assumption. When you have spent twenty minutes to half an hour working on the basis of an assumption with no apparent progress, it is probably time to start to wonder about the validity of the assumption. If you spend more than about three quarters of an hour based on an assumption with still little or no apparent progress, then it is probably time to figure out a way to determine if the assumption is true or not (devise a test that will indicate if the assumption is valid and then perform the test).

Keep an open mind
Although related to the issue of knowing if you are working on the basis of an assumption or a fact, keeping an open mind is much more than that. It means being careful not to make assumptions which are based on flimsy or non-existent evidence, and it means being on the lookout for clues that are not compatible with your current assumptions so that you are able to drop faulty assumptions more rapidly.

Isolate the problem
Consider temporarily simplifying the cluster in order to remove elements which may be confusing the issue at hand. Keep in mind that your simplifications may change the situation enough that the problem vanishes. This does not necessarily mean that the elements which you removed were part of the problem’s cause as their removal may simply have changed the relative timing of key events such that the bad sequence of events no longer occurs.

Go from the simple to the complex
Most problems are actually simple problems. Do not start to develop elaborate theories of what went wrong until you have demonstrated that the simpler possibilities did not cause the problem to occur.

Make one change at a time
When you believe that you understand the problem, make small changes to the cluster, which are each intended to eliminate some aspect of the problem, and then verify that they had the intended effect. If the small changes are not having the intended effect, then your diagnosis of what is at fault might be wrong. Also, it is far easier to back out a few simple changes than to back out a long series of changes if it should turn out that your diagnosis is wrong.

Stick to a few simple troubleshooting tools
Although sophisticated tools are often useful and sometimes even essential, trying to use tools which you are not extremely comfortable with is likely to increase the time that it takes to resolve the problem. Stick to the tools that you are comfortable with but be prepared to learn new tools if it should become necessary to do so (just make sure that it is truly necessary and not just a chance to try out a new toy).

Do not neglect the obvious
Pay attention to the most obvious indications that you have a problem and, at least initially, focus on what they seem to suggest as obvious places to start. For example, an error message about a disk I/O problem or the inability to access a data file is unlikely to have anything to do with a networking problem. On the other hand, it is possible that disk I/O problems have caused your non-IP target mode SSA network to fail (in other words, the problem is usually obvious but not necessarily obvious).

Watch for what the cluster is not doing
Also known as watching out for the dog that didn’t bark (a reference to Arthur Conan Doyle’s Sherlock Holmes story Silver Blaze, in which a key clue involves a dog that did not bark during the commission of the crime but would normally have been expected to do so in the situation at hand). Watch for messages that should appear given your current assumptions. If they do not appear (in other words, if the dog does not bark), then your assumptions may be faulty.

Keep a record of the tests you have completed
If the problem is truly simple, then you might be able to find it within a few minutes. If the search takes longer than about fifteen minutes, then it is probably time to start taking notes of what you are doing (also include a list of your assumptions so that you can review them later to see which ones are starting to look doubtful). If finding and fixing the problem should happen to turn into a major adventure then the ability to look back on what you did (as opposed to what you vaguely remember doing) could prove extremely useful.

Important
Finally, remember that many cluster problems are the result of poor cluster design, untrained cluster administrators or the lack of a proper change control methodology. Without a doubt, the easiest and fastest way to deal with a problem is to ensure that it cannot happen in the first place.

Instructor notes:
Purpose — Introduce and explain a troubleshooting methodology that has proven to be useful in the past.
Details — Talk through each of the points until you are sure that the students understand them. Refer students to the HACMP Administration II course which gives them more experience with problem determination.
Additional information — Refer to student notes for details.
Transition statement — Okay, so what if you have no choice but to escalate the problem to IBM?

Contacting IBM for support

Before contacting IBM about a support issue, collect the following information:

Item                                                                  Checked
- EXACT error messages that appear in HACMP logs such as hacmp.out or on the console
- Your cluster diagram or Planning Worksheets (updated)
- A snapshot of your current cluster configuration (not a photo)
- Details of any customization performed to HACMP events
- Details of current AIX, HACMP and application software levels
- Details of any PTFs applied to HACMP or AIX on the cluster
- The adapter microcode levels (especially for storage adapters)
- Cluster planning worksheets, with all components clearly labeled
- A network topology diagram for the network as far as the users
- Copies of all HACMP log files (snap -e command)

Figure 10-26. Contacting IBM for support

Notes:

What to do when contacting IBM
The visual above summarizes the steps. It is a very good idea to collate as much of this information as possible in advance of having a problem, especially snapshots and the cluster diagram. If you have not already got this information assembled at your office for your existing clusters, you are strongly recommended to do so as soon as you get back.

Updating your planning worksheets
To update your planning worksheets, if you are using the Online Planning Worksheets, you can now export the HACMP ODM (or a snapshot with HACMP 5.3) to the planning worksheets using the smit path Extended Configuration -> Export Definition File for Online Planning Worksheets (or the path Extended Configuration -> Snapshot Configuration -> Convert Existing Snapshot For Online Planning Worksheets). The file should have a name of the form name.haw. The default location is /var/hacmp/log

Instructor notes:
Purpose — Outline the information that should be compiled for contacting IBM support.
Details — Make it clear to students that IBM technical support is a chargeable service and that HACMP support is separate from AIX support. This information should be prepared in advance of contacting IBM and much of it can be gathered before problems occur.
Additional information —
Transition statement — Okay, any questions for me? If not, let’s do the checkpoint questions.

Checkpoint

1. What is the most common cause of cluster failure? (Select all that apply.)
   a. Bugs in AIX or HACMP
   b. Cluster administrator error
   c. Marauding space aliens from another galaxy
   d. Cosmic rays
   e. Poor/inadequate cluster design
2. True or False? Event emulation can emulate all cluster events.
3. If the cluster manager process dies, what will happen to the cluster node?
   a. It continues running but without HACMP to monitor and protect it.
   b. It continues running AIX but any resource groups will fallover.
   c. Nobody knows because this has never happened before.
   d. The System Resource Controller sends an e-mail to root and issues a halt -q.
   e. The System Resource Controller sends an e-mail to root and issues a shutdown -F.
4. True or False? A non-IP network is strongly recommended. Failure to include a non-IP network can cause the cluster to fail or malfunction in rather ugly ways.

Figure 10-27. Checkpoint

Notes:

Instructor notes:
Purpose — Read the questions to the students. Get the students to state what they believe to be the correct answer. Identify the correct answer to the students and explain why it is the correct answer.
Details —

Checkpoint solutions
1. What is the most common cause of cluster failure? (Select all that apply.)
   a. Bugs in AIX or HACMP
   b. Cluster administrator error
   c. Marauding space aliens from another galaxy
   d. Cosmic rays
   e. Poor/inadequate cluster design
2. True or False? Event emulation can emulate all cluster events.
3. If the cluster manager process dies, what will happen to the cluster node?
   a. It continues running but without HACMP to monitor and protect it.
   b. It continues running AIX but any resource groups will fallover.
   c.
Nobody knows because this has never happened before.
   d. The System Resource Controller sends an e-mail to root and issues a halt -q.
   e. The System Resource Controller sends an e-mail to root and issues a shutdown -F.
4. True or False? A non-IP network is strongly recommended. Failure to include a non-IP network can cause the cluster to fail or malfunction in rather ugly ways.

*The correct answer is almost certainly "cluster administrator error" although "poor/inadequate cluster design" would be a very close second.

Additional information —
Transition statement — And now the summary.

Unit summary

Having completed this unit, you should be able to:
- List reasons why HACMP can fail
- Identify configuration and administration errors
- Explain why the Dead Man's Switch invokes
- Explain when the System Resource Controller will kill a node
- Isolate and recover from failed event scripts
- Correctly escalate a problem to IBM support

Figure 10-28. Unit summary

Notes:

Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —

Appendix A. Checkpoint solutions

Unit 1 - Introduction to HACMP for AIX
Let’s review solutions
1. Which of the following items are examples of topology components in HACMP? (Select all that apply.)
   a. Node
   b. Network
   c. Service IP label
   d. Hard disk drive
2. True or False?
All nodes in an HACMP cluster must have roughly equivalent performance characteristics.
3. Which of the following is a characteristic of high availability?
   a. High availability always requires specially designed hardware components.
   b. High availability solutions always require manual intervention to ensure recovery following fallover.
   c. High availability solutions never require customization.
   d. High availability solutions use redundant standard equipment (no specialized hardware).
4. True or False? A thorough design and detailed planning is required for all high availability solutions.

Unit 1 - Introduction to HACMP for AIX
Checkpoint solutions
1. True or False? Resource Groups can be moved from node to node.
2. True or False? HACMP/XD is a complete solution for building geographically distributed clusters.
3. Which of the following capabilities does HACMP not provide? (Select all that apply.)
   a. Time synchronization
   b. Automatic recovery from node and network adapter failure
   c. System Administration tasks unique to each node; back-up and restoration
   d. Fallover of just a single resource group
4. True or False? All nodes in a resource group must have equivalent performance characteristics.

Unit 2 - Network considerations for high availability
Let’s review: Topic 1 solutions
1. How does HACMP use networks? (Select all that apply.)
   a. Provide client systems with highly available access to the cluster's applications
   b. Detect failures
   c. Diagnose failures
   d. Communicate between cluster nodes
   e. Monitor network performance
2.
Using information from RSCT, HACMP directly handles only three types of failures: Network interface card (NIC) failures, Node failures, and Network failures.
3. True or False? Heartbeat packets must be acknowledged or a failure is assumed to have occurred.
4. True or False? Clusters should include a non-IP network.
5. True or False? Each NIC on each physical IP network on each node is required to have an IP address on a different logical subnet.

Unit 2 - Network considerations for high availability
Let’s review: Topic 2 solutions
1. True or False? Clusters must always be configured with a private IP network for HACMP communication.
2. Which of the following options are true statements about communication interfaces? (Select all that apply.)
   a. Has an IP address assigned to it using the AIX TCP/IP SMIT screens
   b. Might have more than one IP address associated with it
   c. Sometimes but not always used to communicate with clients
   d. Always used to communicate with clients
3. True or False? Persistent node IP labels are not supported for IPAT via IP replacement.
4. True or False? There are no exceptions to the rule that, on each node, each NIC on the same LAN must have an IP address in a different subnet. (The HACMP 5.1 heartbeat over IP aliases feature is the exception to this rule.)

Unit 2 - Network considerations for high availability
Let’s review: Topic 3 solutions
1. True or False? A single cluster can use both IPAT via IP aliasing and IPAT via IP replacement.
2. True or False?
All networking technologies supported by HACMP support IPAT via IP aliasing.
3. True or False? All networking technologies supported by HACMP support IPAT via IP replacement.
4. If the left node has NICs with the IP addresses 192.168.20.1 and 192.168.21.1 and the right hand node has NICs with the IP addresses 192.168.20.2 and 192.168.21.2, then which of the following options are valid service IP addresses if IPAT via IP aliasing is being used? (Select all that apply.)
   a. (192.168.20.3 and 192.168.20.4) or (192.168.21.3 and 192.168.21.4)
   b. 192.168.20.3 and 192.168.20.4 and 192.168.21.3 and 192.168.21.4
   c. 192.168.22.3 and 192.168.22.4
   d. 192.168.23.3 and 192.168.24.3
5. If the left node has NICs with the IP addresses 192.168.20.1 and 192.168.21.1 and the right hand node has NICs with the IP addresses 192.168.20.2 and 192.168.21.2, then which of the following options are valid service IP addresses if IPAT via IP replacement is being used? (Select all that apply.)
   a. (192.168.20.3 and 192.168.20.4) or (192.168.21.3 and 192.168.21.4)
   b. 192.168.20.3, 192.168.20.4, 192.168.21.3 and 192.168.21.4
   c. 192.168.22.3 and 192.168.22.4
   d. 192.168.23.3 and 192.168.24.3

Unit 2 - Network considerations for high availability
Checkpoint solutions
1. True or False? Clients are required to exit and restart their application after a fallover.
2. True or False? All client systems are potentially directly affected by the ARP cache issue.
3. True or False? clinfo must not be run both on the cluster nodes and on the client systems.
4. If clinfo is run by cluster nodes to address ARP cache issues, you must add the list of clients to ping to either the /etc/cluster/ping_client_list or the /usr/es/sbin/cluster/etc/clinfo.rc file.
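For the Topic 3 questions above, it can help to see mechanically which subnet each address falls in. The following Python sketch is an illustration only: the helper names are hypothetical, and the /24 netmask is an assumption because the questions do not state one. It checks the rule from the Unit 2 questions that each NIC on a node sits on a different subnet, and reports whether a candidate service address shares a subnet with any NIC. It does not decide which checkpoint answer is correct; that depends on the IPAT method being discussed.

```python
import ipaddress


def subnet_of(addr, prefix=24):
    """Return the network containing addr (prefix length is an assumption)."""
    return ipaddress.ip_interface(f"{addr}/{prefix}").network


def nics_on_distinct_subnets(addrs, prefix=24):
    """True if every NIC address on a node falls in a different subnet."""
    nets = [subnet_of(a, prefix) for a in addrs]
    return len(set(nets)) == len(nets)


# NIC addresses taken from the checkpoint questions.
left_node = ["192.168.20.1", "192.168.21.1"]
right_node = ["192.168.20.2", "192.168.21.2"]
nic_subnets = {subnet_of(a) for a in left_node + right_node}


def shares_nic_subnet(candidate, prefix=24):
    """True if the candidate service address lies in one of the NIC subnets."""
    return subnet_of(candidate, prefix) in nic_subnets


print(nics_on_distinct_subnets(left_node))
for candidate in ["192.168.20.3", "192.168.22.3", "192.168.23.3"]:
    print(candidate, shares_nic_subnet(candidate))
```

Working through each answer option this way separates the arithmetic (which subnet an address is in) from the HACMP rule being tested, which is the part the students need to reason about.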

Unit 3 - Shared storage considerations for high availability
Let’s review: Topic 1 solutions
1. Which of the following statements is true (select all that apply)?
   a. Static application data should always reside on private storage.
   b. Dynamic application data should always reside on shared storage.
   c. Shared storage must always be simultaneously accessible in read-write mode to all cluster nodes.
   d. Application binaries should only be placed on shared storage.
2. True or False? Using RSCT-based shared disk protection results in slower fallovers.
3. True or False? Ghost disks must be checked for and eliminated immediately after every cluster fallover or fallback.

Unit 3 - Shared storage considerations for high availability
Let’s review: Topic 2 solutions
1. Which of the following disk technologies are supported by HACMP?
   a. SCSI
   b. SSA
   c. FC
   d. All of the above
2. True or False? SSA disk subsystems can support RAID5 (cache-enabled) with HACMP.
3. True or False? Compatibility must be checked when using different SSA adapters in the same loop.
4. True or False? No special considerations are required when using SAN based storage units (DS8000, ESS, EMC, HDS, and so forth).
5. True or False? hdisk numbers must map to the same PVIDs across an entire HACMP cluster.

Unit 3 - Shared storage considerations for high availability
Checkpoint solutions
1. True or False? Lazy update attempts to keep VGDA constructs in sync between cluster nodes (reserve/release-based shared storage protection).
2. Which of the following commands will bring a volume group online?
   a. getvtg <vgname>
   b. mountvg <vgname>
   c. attachvg <vgname>
   d. varyonvg <vgname>
3. True or False? Quorum should always be disabled on shared volume groups.
4. True or False? Filesystem and logical volume attributes cannot be changed while the cluster is operational.
5. True or False? An enhanced concurrent volume group is required for the heartbeat over disk feature.

Unit 4 - Planning for applications and resource groups
Checkpoint solutions
1. True or False? Applications are defined to HACMP in a configuration file that lists what binary to use.
2. What policies would be the best to use for a 2-node “active-active” cluster using IPAT to minimize both applications running on the same node?
   a. home, next, never
   b. first, next, higher
   c. distribution, next, never
   d. all, error, never
   e. home, next, higher
3. Which type of data should not be placed in private data storage?
   a. Application log data
   b. License file
   c. Configuration files
   d. Application binaries
4. Which policy is not a Run-time policy?
   a. Settling
   b. Delayed Fallback Timer
   c. Dynamic Node Priority

Unit 5 - HACMP installation
Let’s review solutions
1. What is the first step in implementing a cluster?
   a. Order the hardware
   b. Plan the cluster
   c. Install AIX and HACMP
   d.
Install the applications
   e. Take a long nap
2. True or False? HACMP 5.4.1 is compatible with any version of AIX V5.x.
3. True or False? Each cluster node must be rebooted after the HACMP software is installed.
4. True or False? You should take careful notes while you install and configure HACMP so that you know what to test when you are done.

*There is some dispute about whether the correct answer is b or e although a disconcerting number of clusters are implemented in the order a, b, c, d, e (how can you possibly order the hardware if you do not yet know what you are going to build?) or even just a, c, d (cluster implementers who skip step b rarely have time for long naps).

Unit 5 - HACMP installation
Checkpoint solutions
1. Which component detects an adapter failure?
   a. Cluster Manager
   b. RSCT
   c. clcomd
   d. clinfo
2. Which component provides SNMP information?
   a. Cluster Manager
   b. RSCT
   c. clsmuxpd
   d. clinfo
3. Which component is required for clstat to work?
   a. Cluster Manager
   b. RSCT
   c. clcomd
   d. clinfo
4. Which component removes requirement for the /.rhosts file?
   a. Cluster Manager
   b. RSCT
   c. clcomd
   d. clinfo

Unit 6 - Initial cluster configuration
Checkpoint solutions
1. True or False? It is possible to configure a recommended simple two-node cluster environment using just the standard configuration path.
   You can’t create the non-IP network from the standard path.
2. In which of the top-level HACMP menu choices is the menu for starting and stopping cluster nodes?
   a. Initialization and Standard Configuration
   b. Extended Configuration
   c. System Management (C-SPOC)
   d. Problem Determination Tools
3. In which of the top-level HACMP menu choices is the menu for defining a non-IP heartbeat network?
   a. Initialization and Standard Configuration
   b. Extended Configuration
   c. System Management (C-SPOC)
   d. Problem Determination Tools
4. True or False? It is possible to configure HACMP faster by having someone help you on the other node.
5. True or False? You must specify exactly which filesystems you want mounted when you put resources into a resource group.

Unit 7 - Basic HACMP administration
Let’s review: Topic 1 solutions
1. True or False? You cannot add a node while HACMP is running.
2. You have decided to add a third node to your existing two-node HACMP cluster. What very important step follows adding the node definition to the cluster configuration (whether through Standard or Extended Path)?
   a. Take a well deserved break, bragging to co-workers about your success.
   b. Install HACMP software.
   c. Configure a non-IP network.
   d. Start Cluster Services on the new node.
   e. Add a resource group for the new node.
3. Why would you choose to use the Extended Path to add resources to a resource group versus the Standard Path?
   If you need access to the fields that are not shown in the Standard Path (like for NFS or to set “Filesystems mounted before IP configured”).

Unit 7 - Basic HACMP administration
Let’s review: Topic 2 solutions
1. True or False?
Using C-SPOC reduces the likelihood of an outage by reducing the likelihood that you will make a mistake. True or False? C-SPOC reduces the need for a change management process. C-SPOC cannot do which of the following administration tasks? a. b. c. d. e. f. Add a user to the cluster Change the size of a filesystem Add a physical disks to the cluster Add a shared volume groups to the cluster Synchronize existing passwords None of the above It does not matter which node in the cluster is used to initiate a C-SPOC operation. 4. 5. True or False? Which log file provides detailed output on HACMP event script execution? a. /tmp/clstrmgr.debug b. /tmp/hacmp.out c. /var/adm/cluster.log © Copyright IBM Corporation 2008 © Copyright IBM Corp. 1998, 2008 Appendix A. Checkpoint solutions A-15 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Instructor Guide Unit 7 - Basic HACMP administration Let’s review: Topic 3 solutions 1. True or False? DARE operations can be performed while the cluster is running. 2. Which operations can DARE not perform (select all that apply)? a. b. c. d. Changing the name of the cluster Removing a node from the cluster Changing a resource in a resource group Change whether a network uses IPAT via IP aliasing or via IP replacement 3. True or False? It is possible to roll back from a successful DARE operation using an automatically generated snapshot. 4. True or False? Running a DARE operation requires three separate copies of the HACMP ODM. 5. True or False? Cluster snapshots can be applied while the cluster is running. 6. What is the purpose of the dynamic reconfiguration lock? a. To prevent unauthorized access to DARE functions b. To prevent further changes being made until a DARE operation has completed c. To keep a copy of the previous configuration for easy rollback © Copyright IBM Corporation 2008 A-16 HACMP Implementation © Copyright IBM Corp. 
1998, 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. V4.0 Instructor Guide AP Unit 7 - Basic HACMP administration Checkpoint solutions 1. 2. 3. 4. True or False? A star configuration is a good choice for your non-IP networks. True or False? Using DARE, you can change from IPAT via aliasing to IPAT via replacement without stopping the cluster. True or False? RSCT will automatically update /etc/filesystems when using enhanced concurrent mode volume groups True or False? With HACMP V5.4, a resource group’s priority override location can be cancelled by selecting a destination node of Restore_Node_Priority_Order. You want to create an Enhanced Concurrent Mode Volume Group that will be used in a Resource Group that will have an “Online on Home Node” Startup policy. Which C-SPOC menu should you use? a. HACMP Logical Volume Management b. HACMP Concurrent Logical Volume Management 5. 6. You want to add a logical volume to the volume group you created in the question above. Which C-SPOC menu should you use? a. HACMP Logical Volume Management b. HACMP Concurrent Logical Volume Management © Copyright IBM Corporation 2008 © Copyright IBM Corp. 1998, 2008 Appendix A. Checkpoint solutions A-17 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Instructor Guide Unit 8 - Events Let’s review solutions 1. Which of the following are examples of primary HACMP events (select all that apply)? a. b. c. d. e. node_up node_up_local node_up_complete start_server Rg_up 2. When a node joins an existing cluster, what is the correct sequence for these events? a. b. c. d. 
node_up on new node, node_up on existing node, node_up_complete on new node, node_up_complete on existing node node_up on existing node, node_up on new node, node_up_complete on new node, node_up_complete on existing node node_up on new node, node_up on existing node, node_up_complete on existing node, node_up_complete on new node node_up on existing node, node_up on new node, node_up_complete on existing node, node_up_complete on new node © Copyright IBM Corporation 2008 A-18 HACMP Implementation © Copyright IBM Corp. 1998, 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. V4.0 Instructor Guide AP Unit 8 - Events Checkpoint solutions 1. Which of the following runs if an HACMP event script fails? (select all that apply) a.Pre-event scripts b.Post-event scripts c.Error notification methods d.Recovery commands e.Notify methods 2. How does an event script get started? a.Manually by an administrator b.Called by the SNMP SMUX (clsmuxpd) c.Called by the cluster manager using a recovery program d.Called by the topology services daemon 3. True or False? Pre-event scripts are automatically synchronized. 4. True or False? Writing error notification methods is a normal part of configuring a cluster. © Copyright IBM Corporation 2008 © Copyright IBM Corp. 1998, 2008 Appendix A. Checkpoint solutions A-19 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Instructor Guide Unit 9 - Integrating NFS into HACMP Checkpoint solutions 1. 2. True or False? * HACMP supports all NFS export configuration options. Which of the following is a special consideration when using HACMP to NFS export filesystems? (select all that apply) a. b. c. d. NFS exports must be read-write. Secure RPC must be used at all times. A cluster may not use NFS cross-mounts if there are client systems accessing the NFS exported filesystems. 
d. A volume group that contains filesystems that are NFS exported must have the same major device number on all cluster nodes in the resource group.
3. What does [/abc;/xyz] mean when specifying a directory to cross-mount?
a. /abc is the name of the filesystem that is exported and /xyz is where it should be mounted
b. /abc is where the filesystem should be mounted, and /xyz is the name of the filesystem that is exported
4. True or False? ** HACMP's NFS exporting feature supports only clusters of two nodes.
5. True or False? IPAT is required in resource groups that export NFS filesystems.
* /usr/es/sbin/cluster/exports must be used to specify NFS export options if the default of "read write to the world" is not acceptable.
** Resource groups larger than two nodes that export NFS filesystems do not provide full NFS functionality (for example, NFS file locks are not preserved across a fallover).

Unit 10 - Problem determination and recovery
Checkpoint solutions
1. What is the most common cause of cluster failure? (Select all that apply.)
a. Bugs in AIX or HACMP
b. Cluster administrator error
c. Marauding space aliens from another galaxy
d. Cosmic rays
e. Poor/inadequate cluster design
2. True or False? Event emulation can emulate all cluster events.
3. If the cluster manager process dies, what will happen to the cluster node?
a. It continues running but without HACMP to monitor and protect it.
b. It continues running AIX but any resource groups will fallover.
c. Nobody knows because this has never happened before.
d. The System Resource Controller sends an e-mail to root and issues a halt -q.
e. The System Resource Controller sends an e-mail to root and issues a shutdown -F.
4. True or False? A non-IP network is strongly recommended. Failure to include a non-IP network can cause the cluster to fail or malfunction in rather ugly ways.
*The correct answer is almost certainly "cluster administrator error" although "poor/inadequate cluster design" would be a very close second.

Appendix C - IPAT via IP replacement
Checkpoint solutions
1. For IPAT via replacement (select all that apply)
a. Each service IP address must be in the same subnet as one of the non-service addresses
b. Each service IP address must be in the same subnet
c. Each service IP address cannot be in any non-service address subnet
2. True or False? If the takeover node is not the home node for the resource group and the resource group does not have a Startup policy of Online Using Distribution Policy, the service IP address replaces the IP address of a NIC with an IP address in the same subnet as the subnet of the service IP address.
3. True or False? In order to use HWAT, you must enable and complete the ALTERNATE ETHERNET address field in the SMIT devices menu.
4. True or False? You must stop the cluster in order to change from IPAT via aliasing to IPAT via replacement.

Appendix B. Release Notes for HACMP 5.4.1

====================================================================
Release Notes for IBM High Availability Cluster Multi-Processing (HACMP) for AIX 5L, Release 5.4.1, November, 2007
Last updated, 09/20/2007
====================================================================
These Release Notes contain the latest information about the HACMP software. The following topics are discussed:
• Enhancements of the HACMP software
• Installation and Migration Notes
• HACMP Configuration Restrictions
• Notes on Functionality
• Required Release of AIX 5L for HACMP 5.4.1
• HACMP 5.4.1 Documentation
• Product Directories Loaded
• Product Man Pages
• Accessing IBM on the Web
• Feedback

==========================================
Enhancements of the HACMP Software
==========================================
------------------------------------
5.4.1 Enhancements
------------------------------------
• Integrated support for utilizing AIX Workload Partition (WPAR) to maintain high availability for your applications by configuring them as a resource group and assigning the resource group to an AIX WPAR. By using HACMP in combination with AIX WPAR, you can leverage the advantages of application environment isolation and resource control provided by AIX WPAR along with the high availability feature of HACMP V5.4.1.
• HACMP/XD support of PPRC Consistency Groups to maintain data consistency for application-dependent writes on the same logical subsystem (LSS) pair or across multiple LSS pairs. HACMP/XD responds to PPRC consistency group failures by automatically freezing the pairs and managing the data mirroring.
• A new Geographical Logical Volume Manager (GLVM) Status Monitor that provides the ability to monitor GLVM status and state. These monitors enable you to keep better track of the status of your application data when using the HACMP/XD GLVM option for data replication.
• Improved support for NFS V4, which includes additional configuration options, as well as improved recovery time. HACMP can support both NFS V4 and V2/V3 within the same high availability environment.
• Usability improvements for the WebSMIT Graphical User Interface, which include the ability to customize the color and appearance of the display.
• New options for detecting and responding to a partitioned cluster. Certain failures or combinations of failures can lead to a partitioned cluster, which, in the worst case, can lead to data divergence (out of sync data between the primary and backup nodes in a cluster). HACMP V5.4.1 introduces new features for detecting a partitioned cluster and avoiding data divergence through earlier detection and reporting.
• Serviceability Improvements for HACMP. Improvements to First Failure Data Capture and additional standardized logging increase the reliability and serviceability of HACMP 5.4.1. New log files have been added. The default locations of all managed log files have been moved to a subdirectory of /var/hacmp.

------------------------------------
5.4.0 Enhancements
------------------------------------
• HACMP for Linux 5.4. For more information, see the HACMP for Linux 5.4 Installation and Administration Guide or the HACMP for Linux 5.4 release notes.
• HACMP Smart Assist Programs now support Automatic Discovery. For more information, see the Smart Assist guides or the Release Notes for HACMP for AIX 5L version 5.4 Smart Assists.
• Improved management of Stopping and Starting HACMP Cluster Services:
- Start cluster services without stopping applications. This is also referred to as nondisruptive startup.
- Start and restart cluster services automatically according to how you define the resources.
- Stop cluster services and also bring the resources and applications offline, move them to other nodes, or keep them running on the same nodes (but stop managing them for high availability).
- Terminology that describes stopping cluster services has changed:
  • Instead of stopping cluster services gracefully, this option is known as stopping cluster services and bringing resource groups offline.
  • Instead of stopping cluster services gracefully with takeover, this option is known as stopping cluster services and moving the resource groups to other nodes.
  • Instead of a forced down, this option is known as stopping cluster services immediately and placing resource groups in an unmanaged state. This option leaves resource groups on the local node active. The cluster services are stopped.
• Resource Group Management (clRGmove) improvements
- Improved SMIT interface. Easier to move the resource groups for cluster management.
- Improved handling of non-concurrent resource groups with No Fallback resource group and site policies.
- When you move a resource group, you can move it without setting the Priority Override Location (POL) for the node to which it was moved. POL is a setting you had to specify for manually moved resource groups in releases prior to HACMP 5.4.
- Clear method to maintain the previously configured behavior for a resource group.
- Improved status and troubleshooting with WebSMIT and clRGinfo.
• Verification enhancements
- Messages have been reformatted for consistency and to remove repetitious entries.
- Volume group verification checks have been restructured for faster processing.
- The final verification report lists any nodes, networks and/or network interfaces that are in the 'failed' state at the time that cluster verification is run. The final verification report also lists other 'failed' components, if accessible from the Cluster Manager, such as applications, resource groups, sites, and application monitors that are in the suspended state.
- New Verification checks:
  • Can each node reach each other node in the cluster through non-IP connections?
  • Are netmasks and broadcast addresses valid?
  • Are all Volume Groups and PVIDs on the vpath devices?
  • Is the distribution preference "collocation or anti-collocation with persistent label" used when persistent labels have not been defined?
• WebSMIT Application
- New WebSMIT framework for the user interface
- Assisted WebSMIT set up
- Ability to navigate the running cluster
- Ability to view the cluster configuration and the cluster status simultaneously
- Graphical representation of cluster site, network, and node information
- Graphical representation of resource groups and their dependencies
- Ability to specify a group of users who can view but not modify the configuration.
- Full support for Mozilla-based browsers Mozilla 1.7.3 for AIX and FireFox 1.0.6
• Cluster Test Tool Enhancements
- Supports more cluster events.
- Has specific test plans for running site tests, non-IP network tests, IP network tests, and volume group tests, and extends the logic in the automated test tool to run these test plans as appropriate based on the cluster configuration.
• Fast Failure Detection Method Enhancements
- Improves the speed and reliability of the detection of a node failure.
- Fast failure detection can be turned on in SMIT. It is disabled by default. See Chapter 13 in the Administration Guide for information on how to enable it.
• Geographic Distance Capability Enhancements for clusters with HACMP/XD for GLVM (See separate XD Release Notes for full description)
- Enhanced Concurrent Volume Groups within a site
- The ability to have up to four XD_data data mirroring networks improves reliability and mirroring performance in an HACMP cluster
- IP Address Takeover (IPAT) via IP Aliases (default) on XD networks. IPAT via IP Aliases is not allowed for HAGEO.
- You can configure multiple XD_data networks for the mirroring function.
- You can configure multiple XD_rs232 networks for cluster heartbeating.
• High Availability Cluster Multi-Processing Extended Distance (HACMP/XD) for Metro Mirror
- Increased data availability for IBM TotalStorage Enterprise Storage Server (DS and ESS) volumes that use Peer-to-Peer Remote Copy (PPRC) to copy data to a remote site for disaster recovery purposes. You can now use an intermix of supported DS units.
• GEO_primary and GEO_secondary networks (HAGEO)
- automatically changed to XD Networks when you upgrade to HACMP 5.4.

====================================
Installation and Migration Notes
====================================
• For HACMP version 5.4, planning and installation information is split into two separate guides: the Planning Guide and the Installation Guide.
• Methods of installation and migration supported in previous releases of HACMP are still supported.
• HACMP for Linux: Installation and Administration Guide v5.4 is the first edition of a new manual.
• The Online Planning Worksheets (OLPW) application is now available for download from the installable image, worksheets.jar, that is located at this URL:
http://www.ibm.com/systems/p/ha/ha_olpw.html
Once you accept the license agreement, locate the worksheets.jar file and click on it. Or, run the following command from the AIX 5L command line:
java -jar worksheets.jar
• You can apply a PTF to HACMP 5.4.1 on an individual node using rolling migration, while your critical applications and resources continue running on that node although they will not be highly available during the upgrade.

------------------------------------
5.4.1 migration restriction
------------------------------------
HACMP 5.4.1 is a modification release. There are both base-level filesets and update (ptf) images. Users should use a consistent method for upgrading their HACMP cluster nodes. Do not mix base-level filesets on some nodes and update (ptf) images on others in the same cluster.

-------------------------------------------------------------------------
Enable grace periods and restart nfsd after installing cluster.es.nfs.rte
-------------------------------------------------------------------------
This note applies only to 64-bit systems. You may ignore this note if all of your cluster nodes are 32 bit.
The NFS daemon nfsd must be restarted on each cluster node with grace periods enabled after installing cluster.es.nfs.rte before configuring NFSv4 exports. This step is required; otherwise, NFSv4 exports will fail to export with the misleading error message
exportfs: <export_path>: No such file or directory
The following commands enable grace periods and restart the NFS daemon.
chnfs -I -g on -x
stopsrc -s nfsd
startsrc -s nfsd
Please note that this will impact the availability of all exported filesystems on the machine, therefore the best time to perform this step is when all resource groups with NFS exports are offline or failed over to another node in the resource group.

-----------------------------------------------------
Clstat cluster node status for 'forced down' nodes
-----------------------------------------------------
The behavior of stopping a cluster node with the option to unmanage resource groups (previously known as the force option) was significantly modified with HACMP 5.4.0. Prior to the HACMP 5.4.0 release, this operation did bring the cluster manager daemon to a "stopped" state and this was reflected by clstat showing the cluster node's status as DOWN. The modifications to this feature in HACMP 5.4.0 necessitated leaving the cluster manager daemon in an online state (there were multiple motivations for this change in behavior; one was that it was required to allow Enhanced Concurrent Mode Volume Groups to remain online). Consequently, clstat run on an HACMP 5.4.0 or later cluster will display such cluster node's status as UP instead of DOWN as they were displayed before HACMP 5.4.0.
One other thing to keep in mind is that when migrating a cluster from before HACMP 5.4.0 to HACMP 5.4.0 or later, where some nodes are pre-5.4.0 and others are 5.4.0 or later, clstat run on cluster node A will display cluster node B's status following the conventions of cluster node A. For example, if clstat is run on a 5.4.1 cluster node, it will display all forced down nodes as UP whereas running the same clstat command on an HACMP 5.3.0 cluster node will show those same nodes as DOWN.

------------------------------------
IMPORTANT NOTE ON UPGRADING
------------------------------------
Install the HACMP 5.3 APAR IY85489 to avoid having to start a 5.3 node from a 5.4 node. Unless you have this APAR, when you have upgraded any node to HACMP 5.4, if you need to start a 5.3 node while any 5.4 nodes are active, you must start the 5.3 node from a 5.4 node. If you are upgrading and have nodes that are 5.2 or earlier and must start the 5.2 or earlier node, start it from a "downlevel" node of 5.2 or lower.

==============================================
Required Release of AIX 5L for HACMP 5.4.1
==============================================
AIX 5L 5.2 ML8 with RSCT version 2.3.9 (APAR IY84921) or higher
AIX 5L 5.3 ML4 with RSCT version 2.4.5 (APAR IY84920) or higher
AIX 5L 6.1 with RSCT version 2.5.0 or higher

==============================================
HACMP Configuration Restrictions
==============================================
HACMP configuration restrictions remain the same as in previous releases and are as follows:
• Maximum nodes per cluster: 32
• Maximum number of sites: 2
• Minimum number of nodes per site: 1

======================
Notes on Functionality
======================
• Fast failure detection
- The fast failure detection function is restricted to use with DS8000, DS6000, ES 800, SVC, and DS4000 disk types. This function is not supported on SSA disks.
• SNMP
- HACMP updates SNMP during installation. Therefore, HACMP stops and starts the SNMP daemon during installation and deinstallation. This will require that you restart any SNMP client applications that communicate with the node on which you are installing HACMP.

===================================
HACMP 5.4.1 Documentation
===================================
--------------------------------
Order Numbers and Document Names
--------------------------------
Order numbers for 5.4.1 documentation are as follows:
Concepts and Facilities Guide                              SC23-4864-10
Planning Guide                                             SC23-4861-10
Installation Guide                                         SC23-5209-01
Administration Guide                                       SC23-4862-10
Troubleshooting Guide                                      SC23-5177-04
Master Glossary                                            SC23-4867-09
Programming Client Applications                            SC23-4865-10
HACMP/XD: Metro Mirror Planning and Administration Guide   SC23-4863-11
HACMP/XD GLVM Planning and Administration Guide            SA23-1338-06
Smart Assist for WebSphere User's Guide                    SC23-4877-08
Smart Assist for Oracle                                    SC23-5178-04
Smart Assist for DB2                                       SC23-5179-04
Smart Assist Developer's Guide                             SC23-5210-01
HACMP for LINUX Installation and Administration Guide      SC23-5211-01

------------------------------------------------------
Important Note: Online Documentation for HACMP 5.4.1
------------------------------------------------------
The HACMP 5.4.1 documentation found on the product media and at
http://www-03.ibm.com/systems/p/library/hacmp_docs.html
is only delivered in PDF format this release.
Documentation for HACMP 5.4.1 is supplied in PDF format.

----------------------------------------------
Viewing and installing the documentation files
----------------------------------------------
You can view the PDF documentation before installing the product. You may want to install the documentation before doing the full install of the product, to read the chapters on installation procedures or the description of migration.
Take the following steps to install the documentation:
1. At the command line, enter:
smit install_selectable_all
SMIT asks for the input device/directory for software.
2. Select the CD-ROM drive from the picklist and press Enter.
3. On the next SMIT screen with the cursor on "Software to Install", press the F4 key.
4. SMIT lists the image cluster.doc.en_US fileset with its subdirectories:

Image cluster.doc.en_US.es
--------------------------
cluster.doc.en_US.es.html                 HAES Web-based HTML Documentation - U.S. English
cluster.doc.en_US.es.pdf                  HAES PDF Documentation - U.S. English

Image cluster.doc.en_US.glvm
----------------------------
cluster.doc.en_US.glvm.html               HACMP GLVM HTML Documentation - U.S. English
cluster.doc.en_US.glvm.pdf                HACMP GLVM PDF Documentation - U.S. English

Image cluster.doc.en_US.pprc
----------------------------
cluster.doc.en_US.pprc.html               PPRC Web-based HTML Documentation - U.S. English
cluster.doc.en_US.pprc.pdf                PPRC PDF Documentation - U.S. English

Image cluster.doc.en_US.assist
------------------------------
cluster.doc.en_US.assist.db2.html         HACMP Smart Assist for DB2 HTML Documentation - U.S. English
cluster.doc.en_US.assist.db2.pdf          HACMP Smart Assist for DB2 PDF Documentation - U.S. English
cluster.doc.en_US.assist.oracle.html      HACMP Smart Assist for Oracle HTML Documentation - U.S. English
cluster.doc.en_US.assist.oracle.pdf       HACMP Smart Assist for Oracle PDF Documentation - U.S. English
cluster.doc.en_US.assist.websphere.html   HACMP Smart Assist for WebSphere HTML Documentation - U.S. English
cluster.doc.en_US.assist.websphere.pdf    HACMP Smart Assist for WebSphere PDF Documentation - U.S. English

The individual lines under the image name (for example, cluster.doc.en_US.es.pdf) are the filesets that can be installed.
5. Select all filesets that you wish to install and execute the command.
Note: Installing all of the documentation requires about 46 MB of space in the /usr filesystem. (PDF files = 26 MB, HTML files = 20 MB.)
The documentation is installed in the following directory:
/usr/share/man/info/en_US/cluster/HAES
The titles of the HACMP for AIX 5L products, Version 5.4.1, documentation set are:
HACMP Version 5.4.1: Concepts and Facilities Guide (filename = ha_concepts)
HACMP Version 5.4.1: Planning Guide (filename = ha_plan)
HACMP Version 5.4.1: Installation Guide (filename = ha_install)
HACMP Version 5.4.1: Administration Guide (filename = ha_admin)
HACMP Version 5.4.1: Troubleshooting Guide (filename = ha_troubleshoot)
HACMP Version 5.4.1: Programming Client Applications (filename = ha_clients)
HACMP Version 5.4.1: Glossary (filename = ha_glossary)
HACMP/XD for MetroMirror: Planning and Administration Guide (filename = ha_xd_pprc)
HACMP/XD for GLVM: Planning and Administration Guide (filename = ha_xd_glvm)
HACMP for LINUX Installation and Administration Guide (filename = ha_linux)

---------------------------------------------
Accessing Documentation
---------------------------------------------
You can access the documentation in PDF format. After you install the documentation, store it on a server that is accessible through the Internet. You can view the documentation in the Mozilla Firefox browser.
NOTE: The Smart Assist Guides and the Smart Assist Developer's Guide are installed with the base fileset. They are described in a separate Smart Assists release notes.

Use the following command to determine the exact files loaded into product directories when installing the HACMP for AIX 5L, version 5.4.1:
lslpp -f "cluster*"

==================
PRODUCT MAN PAGES
==================
Man pages for HACMP commands and utilities are installed in the following directory:
/usr/share/man/cat1
Execute man [command-name] to read the information.

========================
Accessing IBM on the Web
========================
• Access IBM's home page at: http://www.ibm.com
========
Feedback
========
IBM welcomes your comments. You can send any comments via e-mail to: [email protected]

Appendix C. IPAT via IP replacement

What this unit is about
This unit describes the HACMP IP Address Takeover via IP replacement function.

What you should be able to do
After completing this unit, you should be able to:
• Explain and configure IP Address Takeover (IPAT) via IP replacement

How you will check your progress
Accountability:
• Checkpoint
• Machine exercises

References
SC23-5209-01 HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10 HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10 HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10 HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04 HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09 HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html HACMP manuals

Unit objectives
After completing this unit, you should be able to:
• Explain and set up IP Address Takeover (IPAT) via IP replacement
Figure C-1. Unit objectives AU548.0
Notes:
Instructor notes:
Purpose — To tell the students what we will talk about in this unit.
Details —
Additional information —
Transition statement — Let’s look at what you need to do to configure IPAT via IP replacement.

IPAT via IP replacement configuration

Define each network’s boot IP addresses in the AIX ODM
– Each interface IP address on a given node must be in a different logical IP subnet* and there must be a common subnet among the nodes
– Define these addresses in the /etc/hosts file and configure them in HACMP topology
Define service IP addresses in /etc/hosts and HACMP resources
– The address must be in the SAME subnet as a common interface subnet
– HACMP configures them to AIX as required

[Diagram: before starting the application resource group. Interface addresses shown: 9.47.10.1 (ODM), 9.47.11.1 (ODM), 9.47.10.2 (ODM), 9.47.11.2 (ODM)]

* See earlier discussion of heartbeating and failure diagnosis for explanation of why

Figure C-2. IPAT via IP replacement configuration

Notes:

Requirements

Keep the following items in mind when you configure a network for IPAT via IP replacement:
- There must be at least one logical IP subnet that has a communication interface (NIC) on each node. (In HACMP 4.5 terminology, these were called boot adapters.)
- Each service IP address must be in the same logical IP subnet as one of the non-service addresses. Contrast with IPAT via IP aliasing, where service addresses are required to not be in a boot subnet.
- If you have more than one service IP address, they must all be in the same subnet. The reason for this will become clear when we discuss what happens during a takeover; see “IPAT via IP replacement after a node fails” on page C-12.
- None of the other non-service addresses may be in the same subnet as the service IP address (this is true regardless of whether IPAT via IP replacement is being used because the NICs on each node are required to be on different IP subnets in order to support heartbeating).
- All network interfaces must have the same subnet mask.

IPAT via IP replacement subnet rules example

Each service IP address must be in one, and only one, of the non-service subnets. All service IP addresses must be in the same subnet. Each non-service IP address on each node must be in a separate subnet.

For example, in a cluster with one network using IPAT via replacement, where each node has two communication interfaces and two service IP labels, the network will require two subnets:

Node name         NIC   IP Label   IP Address
node1             en0   n1-if1     192.168.10.1
node1             en1   n1-if2     192.168.11.1
node2             en0   n2-if1     192.168.10.2
node2             en1   n2-if2     192.168.11.2
Service address         appA-svc   192.168.10.22
Service address         appB-svc   192.168.10.25

subnet           IP labels
192.168.10/24    n1-if1, n2-if1, appA-svc, appB-svc
192.168.11/24    n1-if2, n2-if2

Instructor notes:
Purpose — Show how to configure a network for IPAT via IP replacement.
Details —
Additional information —
Transition statement — Let’s have a look at IPAT via IP replacement in operation.

IPAT via IP replacement in operation

When the resource group comes up on a node, HACMP replaces a boot (ODM) IP label with the service IP label
– It replaces the boot IP label on the same subnet if the resource group is on its startup node or if the distribution startup policy is used.
– It replaces a boot IP label on a different subnet otherwise

[Diagram: after starting the application resource group. Interface addresses shown: 9.47.10.22 (service), 9.47.11.1 (ODM), 9.47.10.2 (ODM), 9.47.11.2 (ODM)]

Figure C-3. IPAT via IP replacement in operation

Notes:

Operation

When the resource group comes up on its home node, the resource group’s service IP address replaces the interface IP address of the NIC (AIX ODM) which is in the same subnet as the service IP label (that is, the boot adapter in HACMP 4.x terminology).

Note that this approach implies that there cannot be two resource groups in the cluster that both use IPAT via IP replacement and use the same node as their home node unless their respective service IP addresses are in different subnets (in other words, associated with different physical networks).

Also, since the service IP address replaces the existing IP address on the NIC, it is not possible to have two or more service IP addresses in the same resource group which are in the same IP subnet (as there will not be an adapter to assign the second service IP address to).

When the resource group comes up on any node other than its home node, the resource group’s service IP address replaces the interface IP address of one of the NICs which is not in the same subnet as the service IP address (this is primarily to allow some other resource group to use the node as its home node).
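The subnet requirements for IPAT via IP replacement lend themselves to a mechanical check before any HACMP configuration is attempted. The following is a minimal sketch in Python, not an HACMP tool: the function name `check_replacement_rules` and the dictionary layout are assumptions made for illustration, and a single prefix length stands in for the rule that all interfaces share one subnet mask. The sample addresses are the ones used in the two-node subnet rules example (node1, node2, appA-svc, appB-svc).

```python
import ipaddress


def check_replacement_rules(boot_addrs, service_addrs, prefix_len):
    """Return a list of rule violations; an empty list means none were found.

    boot_addrs    -- hypothetical layout: {node name: [boot/interface addresses]}
    service_addrs -- list of service IP addresses
    prefix_len    -- one prefix length for the whole network (same subnet mask)
    """
    if not boot_addrs:
        return ["no nodes defined"]

    def subnet(addr):
        return ipaddress.ip_interface(f"{addr}/{prefix_len}").network

    problems = []
    node_subnets = {}
    for node, addrs in boot_addrs.items():
        nets = [subnet(a) for a in addrs]
        # Each interface address on a given node must be in a different subnet.
        if len(set(nets)) != len(nets):
            problems.append(f"{node}: two interfaces share a subnet")
        node_subnets[node] = set(nets)

    # There must be at least one subnet with an interface on every node.
    common = set.intersection(*node_subnets.values())
    if not common:
        problems.append("no subnet has an interface on every node")

    # All service addresses must be in the same subnet ...
    if len({subnet(a) for a in service_addrs}) > 1:
        problems.append("service addresses are spread over several subnets")
    # ... and that subnet must be one that every node has an interface on.
    for addr in service_addrs:
        if subnet(addr) not in common:
            problems.append(f"{addr}: not in a subnet common to all nodes")
    return problems


if __name__ == "__main__":
    boot = {
        "node1": ["192.168.10.1", "192.168.11.1"],
        "node2": ["192.168.10.2", "192.168.11.2"],
    }
    print(check_replacement_rules(boot, ["192.168.10.22", "192.168.10.25"], 24))
    # A service address on a subnet with no boot interfaces is flagged.
    print(check_replacement_rules(boot, ["192.168.12.22"], 24))
```

The second call uses a made-up address (192.168.12.22) to show the kind of service label that would suit IPAT via IP aliasing but breaks the replacement rules.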
Instructor notes:
Purpose — Show what happens when an IPAT via IP replacement resource group comes up on a node.
Details —
Additional information —
Transition statement — Let’s see what happens if a NIC fails.

IPAT via IP replacement after an I/F fails

If the communication interface being used for the service IP label fails, HACMP swaps the service IP label with a boot (ODM) IP label on one of the node's remaining available (that is, currently functional) communication interfaces

The IP labels remain swapped when the failed interface recovers

[Diagram: NIC A and NIC B. Interface addresses shown: 9.47.10.22 (service), 9.47.11.1 (ODM), 9.47.10.2 (ODM), 9.47.11.2 (ODM)]

Figure C-4. IPAT via IP replacement after an I/F fails

Notes:

Interface failure

If a communications interface (NIC A), which is currently assigned an IPAT via IP replacement service IP address, fails, then HACMP moves the service IP address to one of the other communication interfaces (NIC B) on the same node (to one of the standby adapters using HACMP 4.x terminology). If there are no available (that is, functional) NICs left on the relevant network, then HACMP initiates a fallover.

Interface swap

The failed communications interface (NIC A) is then reconfigured with the address of the communication interface (NIC B), as this allows the heartbeat mechanism to watch for when the failed communication interface (NIC A) recovers.

Instructor notes:
Purpose — Show what happens when a NIC fails with IPAT via IP replacement.
Details —
Additional information —
Transition statement — Let’s look at what happens if a node fails.

IPAT via IP replacement after a node fails

If the resource group's node fails, HACMP moves the resource group to a new node and replaces an interface IP label with the service IP label:
– If the resource group is on its startup node or if the Startup policy is distribution, it replaces the interface (ODM) IP label in the same subnet
– Else it replaces an interface (ODM) IP label in a different subnet
– Or fails if there isn't an available interface

[Diagram: interface addresses shown after the takeover: 9.47.10.2 (ODM), 9.47.10.22 (service)]

Figure C-5. IPAT via IP replacement after a node fails

Notes:

Node failure

If the node currently responsible for a resource group that uses IPAT via IP replacement fails, then HACMP initiates a fallover. When the resource group comes up on the takeover node, the service IP addresses are assigned to NICs on the fallover node:

- Home node or Startup policy of Online Using Distribution Policy (rotate in HACMP 4.x terminology)
  If the takeover node is the home node for the resource group or the resource group has a Startup policy of Online Using Distribution Policy (rotate in HACMP 4.x terminology), the service IP addresses replace the IP addresses of a communications interface (NIC) with an IP address in the same subnet as the service IP address.

- Not the home node and not Online Using Distribution Policy
  If the takeover node is not the home node for the resource group and the resource group does not have a Startup policy of Online Using Distribution Policy, the service IP addresses replace the IP addresses of a communications interface (NIC) with an IP address in a different subnet than the subnet of the service IP address (a standby adapter in HACMP 4.x terminology). This is primarily to allow some other resource group to use the node as its home node.

Note: This explains why all service IP addresses must be in the same subnet when using IPAT via replacement.

Home node and startup policy

The home node (or the highest priority node for this resource group) is the first node that is listed in the participating nodelist for a non-concurrent resource group. The home node is a node that normally owns the resource group. Note that the takeover node might actually be the home node since a resource group can be configured to not always run on the highest priority available node.

Resource groups have three policies that HACMP uses to determine which nodes will start which resource groups. A Startup policy of Online Using Distribution Policy (also called a distributed policy) specifies that only one resource group can be active on a given node. If the first node in the resource group’s list of nodes already has another resource group started on it, then the next node in the list of nodes is tried.

These concepts will be discussed in detail in the unit on resource groups.

Instructor notes:
Purpose — Show what happens when a node fails with IPAT via IP replacement in effect.
Details —
Additional information —
Transition statement — Let’s summarize IPAT via IP replacement.
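The rule for which interface address gets replaced when a resource group comes up can be written down as a small decision function. This is a simplified sketch in Python, not HACMP's actual logic: the function names, the tuple layout, and the NIC names en0/en1 are assumptions made for illustration. It models only the stated rule: same subnet as the service address on the home node or with the distribution startup policy, a different subnet otherwise, and failure when no suitable interface is available.

```python
import ipaddress


def pick_nic_to_replace(service_addr, nics, prefix_len, same_subnet):
    """Return the name of the first available NIC that matches, or None.

    nics -- hypothetical layout: list of (name, boot_addr, is_available)
    same_subnet -- True to look for a NIC in the service address's subnet,
                   False to look for a NIC in any other subnet
    """
    def subnet(addr):
        return ipaddress.ip_interface(f"{addr}/{prefix_len}").network

    target = subnet(service_addr)
    for name, boot_addr, is_available in nics:
        if is_available and (subnet(boot_addr) == target) == same_subnet:
            return name
    return None  # "fails if there isn't an available interface"


def acquire_service_address(service_addr, nics, prefix_len,
                            on_home_node, distribution_startup):
    """Apply the home-node / distribution-policy rule described in the notes."""
    want_same_subnet = on_home_node or distribution_startup
    return pick_nic_to_replace(service_addr, nics, prefix_len, want_same_subnet)


if __name__ == "__main__":
    home_nics = [("en0", "9.47.10.1", True), ("en1", "9.47.11.1", True)]
    takeover_nics = [("en0", "9.47.10.2", True), ("en1", "9.47.11.2", True)]

    # On the home node the boot address in the service subnet is replaced.
    print(acquire_service_address("9.47.10.22", home_nics, 24, True, False))
    # On another node a boot address in a different subnet is replaced.
    print(acquire_service_address("9.47.10.22", takeover_nics, 24, False, False))
```

With the addresses from the figures, the takeover case leaves 9.47.10.2 in place and replaces 9.47.11.2, which keeps the same-subnet interface free for a resource group that has that node as its home node.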
IPAT via IP replacement summary

Configure each node with up to eight communication interfaces (each on a different subnet)
Assign service IP labels to resource groups as appropriate
– Each node can be the most preferred node for at most one resource group
– No limit on number of service IP labels per resource group but each service IP label must be on a different physical network
HACMP replaces non-service IP labels with service IP labels on the same subnet as the service IP label when the resource group is running on its most preferred node or if the Startup Policy is distributed
HACMP replaces non-service IP labels with service IP labels on a different subnet from the service IP label when the resource group is moved to any other node
IPAT via IP replacement supports hardware address takeover

Figure C-6. IPAT via IP replacement summary

Notes:

Advantages

Probably the most significant advantage of IPAT via IP replacement is that it supports hardware address takeover (HWAT), which will be discussed in a few pages.

Another advantage is that it requires fewer subnets. If you are limited in the number of subnets available for your cluster, this may be important.

Note: Another alternative, if you are limited on the number of subnets you have available, is to use heartbeating via IP aliases. See Heartbeating Over IP Aliases in the HACMP for AIX, Version 5.4.1 Planning Guide.

Disadvantages

Probably the most significant disadvantages are that IPAT via IP replacement limits the number of service IP labels per subnet per resource group on one communications interface to one and makes it rather expensive (and complex) to support lots of resource groups in a small cluster. In other words, you need more network adapters to support more applications.

Also, IPAT via replacement usually takes more time than IPAT via aliasing.

Note that HACMP tries to keep the service IP labels available by swapping IP addresses with other communication interfaces (standby adapters in HACMP 4.x terminology) even if there are no resource groups currently on the node that uses IPAT via IP replacement.

Instructor notes:
Purpose — Show a summary of the IPAT via IP replacement configuration rules and behavior.
Details —
Additional information —
Transition statement — One reason for using IPAT via replacement is if you need to use hardware address takeover (HWAT). Let’s review why you might need to use HWAT.

Gratuitous ARP support issues

Gratuitous ARP is supported by AIX on the following network technologies:
– Ethernet (all types and speeds)
– Token-Ring
– FDDI
– SP Switch 1 and SP Switch 2
Gratuitous ARP is not supported on ATM
Operating systems are not required to support gratuitous ARP packets
– Practically every operating system does support gratuitous ARP
– Some systems (for example, certain routers) can be configured to respect or ignore gratuitous ARP packets

Figure C-7. Gratuitous ARP support issues

Notes:

Review

When using IPAT via aliasing, you can use AIX’s gratuitous ARP features to update client and router ARP caches after a takeover. However, there may be issues.

Gratuitous ARP issues

Not all network technologies provide the appropriate capabilities to implement gratuitous ARP. In addition, operating systems which implement TCP/IP are not required to respect gratuitous ARP packets (although practically all modern operating systems do).

Finally, support issues aside, an extremely overloaded network or a network that is suffering intermittent failures might result in gratuitous ARP packets being lost. (A network that is sufficiently overloaded to be losing gratuitous ARP packets, or that is suffering intermittent failures which result in gratuitous ARP packets being lost, is likely to be causing the cluster and the cluster administrator far more serious problems than the ARP cache issue involves.)

Instructor notes:
Purpose — Discuss the key gratuitous ARP support issues.
Details —
Additional information —
Transition statement — What if gratuitous ARP is not supported?

What if gratuitous ARP is not supported?

If the local network technology doesn't support gratuitous ARP or there is a client system or router on the local physical network which must communicate with the cluster and which does not support gratuitous ARP packets:
– clinfo can be used on the client to receive updates of changes.
– clinfo can be used on the servers to ping a list of clients, forcing an update to their ARP caches.
– HACMP can be configured to perform Hardware Address Takeover (HWAT).
Suggestion: Do not get involved with using either clinfo or HWAT to deal with ARP cache issues until you've verified that there actually are ARP issues which need to be dealt with.

Figure C-8. What if gratuitous ARP is not supported?

Notes:

If gratuitous ARP is not supported

HACMP supports three alternatives to gratuitous ARP. The first two are discussed in Unit 3. We will discuss the third option here.

Don’t add unnecessary complexity

Cluster configurators should probably not simply assume that gratuitous ARP won’t provide a satisfactory solution, as each of the alternatives introduces additional, possibly unnecessary complexity into the cluster.

If the cluster administrator or configurator decides that the probability of a gratuitous ARP update packet being lost is high enough to be relevant, then they should proceed as though their context does not support gratuitous ARP.

Instructor notes:
Purpose — Present the list of alternative ways of dealing with the ARP cache issue.
Details —
Additional information —
Transition statement — Let’s look at the third option, HWAT.

Option 3: Hardware address takeover

HACMP can be configured to swap a service IP label's hardware address between network adapters.
HWAT is incompatible with IPAT via IP aliasing because each service IP address must have its own hardware address and a NIC can support only one hardware address at any given time.
Cluster implementer designates a Locally Administered Address (LAA) which HACMP assigns to the NIC which has the service IP label

Figure C-9. Option 3: Hardware address takeover

Notes:

Hardware address takeover

Hardware Address Takeover (HWAT) is the most robust method of dealing with the ARP cache issue as it ensures that the hardware address associated with the service IP address does not change (which avoids the whole issue of whether the client system’s ARP cache is out-of-date).

The essence of HWAT is that the cluster configurator designates a hardware address that is to be associated with a particular service IP address. HACMP then ensures that whichever NIC the service IP address is on also has the designated hardware address.

HWAT is an optional capability which must be configured into the HACMP cluster (we will see how to do that in a few minutes).

HWAT considerations

There are a few points which must be kept in mind when contemplating HWAT:
- The hardware address that is associated with the service IP address must be unique within the physical network that the service IP address is configured for.
- HWAT is not supported by IPAT via IP aliasing because each NIC can have more than one IP address, but each NIC can only have one hardware address.
- HWAT is only supported for Ethernet, token ring, and FDDI networks (MCA FDDI network cards do not support HWAT). ATM networks do not support HWAT.
- HWAT increases the takeover time (usually by just a few seconds).
- Cluster nodes using HWAT on token ring networks must be configured to reboot after a system crash because the token ring card will continue to intercept packets for its hardware address until the node starts to reboot.
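The hardware address takeover figures in this appendix pair a burned-in address of 00:04:ac:62:72:49 with a Locally Administered Address of 40:04:ac:62:72:49, which suggests an LAA formed by changing only the first octet. The sketch below, in Python, mirrors that pattern and adds the uniqueness check listed under the HWAT considerations. It is an illustration only: the function names are hypothetical, the default first octet of 0x40 is an assumption taken from the figure, and the selection procedure the course actually prescribes is the one in its networking unit, which is not reproduced here.

```python
def make_laa(burned_in, first_octet=0x40):
    """Return a candidate LAA by replacing the first octet of a hardware address.

    burned_in   -- address as six colon-separated hex octets
    first_octet -- assumed value, following the 00 -> 40 pattern in the figures
    """
    octets = [int(part, 16) for part in burned_in.split(":")]
    if len(octets) != 6 or not all(0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 6-octet hardware address: {burned_in!r}")
    octets[0] = first_octet
    return ":".join(f"{o:02x}" for o in octets)


def is_unique_on_network(candidate, addresses_in_use):
    """True if no adapter on the physical network already uses the candidate."""
    in_use = {addr.lower() for addr in addresses_in_use}
    return candidate.lower() not in in_use


if __name__ == "__main__":
    # Hardware addresses that appear in the figures for the four adapters.
    factory = [
        "00:04:ac:62:72:49",
        "00:04:ac:48:22:f4",
        "00:04:ac:62:72:61",
        "00:04:ac:48:22:f6",
    ]
    laa = make_laa(factory[0])
    print(laa)  # 40:04:ac:62:72:49
    print(is_unique_on_network(laa, factory))
```

A real check would need the hardware addresses of every device on the physical network, not only the cluster adapters, since the uniqueness requirement applies to the whole network.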
Instructor notes:
Purpose — Introduce HWAT.
Details —
Additional information —
Transition statement — Let’s take a look at what happens with HWAT.

Hardware address takeover (1 of 3)

[Diagram: nodes Bondar and Hudson, each with interfaces tr0 and tr1, shown before and after the resource group is started. Interface labels: bondar-if1, bondar-if2, hudson-if1, hudson-if2, all with netmask 255.255.255.0. Hardware addresses shown before the resource group is started: 00:04:ac:62:72:49, 00:04:ac:48:22:f4, 00:04:ac:62:72:61, 00:04:ac:48:22:f6. After the resource group is started, the service label xweb appears with hardware address 40:04:ac:62:72:49.]

Figure C-10. Hardware address takeover (1 of 3)

Notes:

Hardware address takeover: boot time

At boot time, the interfaces are assigned their normal hardware addresses.

HWAT: resource group started

When HACMP starts the resource group, the service IP address replaces the non-service IP address of the interface and the alternate hardware address replaces the normal hardware address for that NIC. The alternate hardware address is usually referred to as a Locally Administered Address or LAA.

Instructor notes:
Purpose — Show what happens with HWAT at startup.
Details —
Additional information —
Transition statement — Let’s see what happens when a node or an interface fails.

Hardware address takeover (2 of 3)

• LAA is moved along with the service IP label

[Diagram: two panels, "Interface failure" and "Node failure", for nodes Bondar and Hudson. In both panels the service label xweb keeps hardware address 40:04:ac:62:72:49.]

Figure C-11. Hardware address takeover (2 of 3)

Notes:

HWAT: interface or node failure

If a NIC (with a service IP address that has an LAA) fails, HACMP moves the IP address to a NIC on the takeover node. It also moves the LAA (alternative hardware address) to the same NIC.

If a node fails, the service IP address, and its associated LAA, are moved to another node.

The result, in both of these cases, is that the local client’s ARP caches are still up to date because the HW address associated with the IP address has not changed.

Instructor notes:
Purpose — Discuss what happens when a NIC or a node fails.
Details —
Additional information —
Transition statement — Let’s see what happens when a node recovers.

Hardware address takeover (3 of 3)

When a failed node comes back to life, the burned-in ROM address is used on the service network adapter.
After HACMP is started, the node reintegrates according to its resource group parameters.

[Diagram: nodes Bondar and Hudson with interfaces tr0 and tr1, shown when the failed node returns and after HACMP is started. The service label xweb has hardware address 40:04:ac:62:72:49.]

Figure C-12. Hardware address takeover (3 of 3)

Notes:

HWAT: node recovery

When the failed node reboots, AIX must be configured to leave the network card’s factory-defined hardware address in place. If AIX is configured to set the network card’s HW address to the alternate hardware address at boot time, then two NICs on the same network have the same hardware address (weird things happen when you do this).

HWAT: resource moved back to home node

If HACMP ultimately moves the resource group back to the now recovered node, then the hardware address of the NIC on the backup node is restored to its factory setting, and the LAA associated with the service IP address lands on the same NIC on the recovered node as the service IP address lands on.

Instructor notes:
Purpose — Discuss what happens when a node recovers.
Details —
Additional information —
Transition statement — Let’s look at how to implement IPAT using replacement and HWAT. We’ll start with a scenario.

Implementing hardware address takeover

Someone just got a great deal on a dozen used FOOL-97x computers for the summer students to use
They run some strange proprietary operating system which refuses to update its ARP cache in response to either ping or gratuitous ARP packets

[Diagram: nodes bondar and hudson]

Figure C-13. Implementing hardware address takeover

Notes:

Hardware address takeover

In this scenario, we will implement HWAT to support the new computers discussed in the visual. Just imagine how much money they have saved once they realize that these new computers don’t do what the summer students need done! In the meantime, it looks like we need to implement hardware address takeover to support these FOOL-97X’s.

Reality check

A side note is probably in order: although most TCP/IP-capable systems respect gratuitous ARP, there are strange devices out there that do not. This scenario is phony but it presents a real if rather unlikely problem. For example, the ATM network does not support gratuitous ARP and so could be a candidate for the use of HWAT.

Instructor notes:
Purpose — Introduce the next scenario: hardware address takeover.
Details —
Additional information —
Transition statement — Let’s have a look at how we go about implementing hardware address takeover.

Our plan for implementing HWAT

1. Stop cluster services on both cluster nodes
   – Use the graceful shutdown option to bring down the resource groups and their applications
2. Remove the alias service labels from the Resources
   – They are in the wrong subnet for replacement
   – They are automatically removed from the RG
3. Convert the net_ether_01 Ethernet network to use IPAT via IP replacement:
   – Disable IPAT via IP aliasing on the Ethernet network.
   – Update /etc/hosts on both cluster nodes to describe service IP labels and addresses on the 192.168.15.0 subnet
   – Use the procedure described in the networking unit to select the Locally Administered Addresses (LAA)
   – Configure new service IP labels with these LAA addresses in the HACMP SMIT screens
4. Define resource groups to use the new service IP labels.
5. Synchronize the changes
6. Restart cluster services on the two nodes.

Figure C-14. The plan for implementing HWAT

Notes:

Implementing HWAT

To use HWAT, we must use IPAT via replacement.

Stop cluster services

Changing from IPAT via aliasing to IPAT via replacement cannot be done dynamically. We must stop the cluster.

Remove existing service IP labels

The service IP labels used for IPAT via aliasing cannot be used for IPAT via replacement. They are on the wrong subnet. We will either need to change our service addresses or change our non-service addresses. In this scenario, we choose to change the service addresses.

Convert the network to use IPAT via replacement

In addition to the obvious step of disabling IPAT via aliasing, we also need to update our name resolution for the new service IP labels and we need to create an alternate hardware address or Locally Administered Address (LAA) for each service IP label.

Name resolution changes

One slight problem with the above procedure is that it requires the users (or the DNS administrator) to change the service IP address that they are using. It would arguably be better if we preserved the service IP address. However, this would require more network reconfiguration work and it isn’t totally clear that the difference is significant in the grand scheme of things.

Note that either approach requires the cooperation of the network administrators as we will require IP addresses and probably DNS changes.

Instructor notes:
Purpose — Describe the procedure for implementing HWAT on a network that is currently using IPAT via IP aliasing.
Details —
Additional information —
Transition statement — First, we stop HACMP.

Stopping HACMP

# smit clstop

                          Stop Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
* Stop now, on system restart or both              now               +
  Stop Cluster Services on these nodes             [bondar,hudson]   +
  BROADCAST cluster shutdown?                      true              +
* Shutdown mode                                    graceful          +

F1=Help      F2=Refresh    F3=Cancel    F4=List
F5=Reset     F6=Command    F7=Edit      F8=Image
F9=Shell     F10=Exit      Enter=Do

Content of this menu will vary depending on the HACMP version.

Figure C-15. Stopping HACMP

Notes:

Stop HACMP

Make sure that HACMP is shut down gracefully, as we can’t have the application running while we are changing service IP addresses. Choose the stop option that takes the resources offline.

Instructor notes:
Purpose — Show the shutting down of the cluster.
Details —
Additional information —
Transition statement — Next, we remove the existing service IP labels.
Removing a service IP label

Press Enter here and you will be prompted to confirm the removal.

                 Configure HACMP Service IP Labels/Addresses

    Move cursor to desired item and press Enter.

      Add a Service IP Label/Address
      Change/Show a Service IP Label/Address
      Remove Service IP Label(s)/Address(es)

      +------------------------------------------------------------+
      |   Select Service IP Label(s)/Address(es) to Remove         |
      |                                                            |
      | Move cursor to desired item and press F7.                  |
      |     ONE OR MORE items can be selected.                     |
      | Press Enter AFTER making all selections.                   |
      |                                                            |
      |   xweb                                                     |
      |   yweb                                                     |
      |   zweb                                                     |
      |                                                            |
      | F1=Help     F2=Refresh    F3=Cancel                        |
      | F7=Select   F8=Image      F10=Exit                         |
      | Enter=Do    /=Find        n=Find Next                      |
      +------------------------------------------------------------+

Repeat for both service IP labels.

Figure C-16. Removing a service IP label

Notes:
Remove any service labels configured for IPAT via aliasing
An attempt to convert the network to IPAT via IP replacement fails if there are any service IP labels that don't conform to the IPAT via IP replacement rules.

Instructor notes:
Purpose — Show the removal of the old service IP labels.
Details —
Additional information —
Transition statement — Next, we need to convert the IP network to IPAT via IP replacement.

Disable IPAT via aliases

Set the "Enable IP Address Takeover via IP Aliases" setting to "No" and press Enter.

            Change/Show an IP-Based Network in the HACMP Cluster

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
    * Network Name                                         net_ether_01
      New Network Name                                     []
    * Network Type                                         [ether]           +
    * Netmask                                              [255.255.255.0]   +
    * Enable IP Address Takeover via IP Aliases            [No]              +
      IP Address Offset for Heartbeating over IP Aliases   []

    F1=Help   F2=Refresh   F3=Cancel   F4=List   F5=Reset
    F6=Command   F7=Edit   F8=Image   F9=Shell   F10=Exit   Enter=Do

Figure C-17. Disable IPAT via aliases

Notes:
Introduction
Here we change the net_ether_01 network to disable IPAT via aliasing.

Instructor notes:
Purpose — Show the screen used to change the parameters of an IP network.
Details —
Additional information —
Transition statement — Now, we need to assign new IP addresses to the service IP labels.

The updated /etc/hosts

Here's the key portion of the /etc/hosts file with the service IP labels moved to the 192.168.15.0 subnet:

    192.168.5.29    bondar       # persistent node IP label on bondar
    192.168.15.29   bondar-if1   # bondar's first boot IP label
    192.168.16.29   bondar-if2   # bondar's second boot IP label
    192.168.5.31    hudson       # persistent node IP label on hudson
    192.168.15.31   hudson-if1   # hudson's first boot IP label
    192.168.16.31   hudson-if2   # hudson's second boot IP label
    192.168.15.92   xweb         # the IP label for the application normally resident on bondar
    192.168.15.70   yweb         # the IP label for the application normally resident on hudson

Note that neither bondar's nor hudson's network configuration (as defined with the AIX TCP/IP smit screens) needs to be changed.

Note that we are not renaming the interface IP labels to something like bondar_boot and bondar_standby, as changing IP labels in an HACMP cluster can be quite a bit of work (it is often easier to delete the cluster definition and start over).

Figure C-18. The updated /etc/hosts

Notes:
IPAT via replacement rules
Remember the rules for IP addresses for IPAT via IP replacement networks (slightly reworded):
a. The service IP labels must all be on the same subnet.
b. There must be one NIC on each host that has an IP address on the same subnet as the service IP labels (in HACMP 4.x terminology, these NICs are boot adapters).
c. The other NICs on each node must each be in a different subnet than the service IP labels (in HACMP 4.x terminology, these NICs are standby adapters).

In a cluster with only two NICs per node, NIC IP addresses that conform to the IPAT via IP aliasing rules also conform to the IPAT via replacement rules, so only the service IP labels need to be changed.
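The three rules above can be checked mechanically before the service IP labels are redefined. The following Python sketch is illustrative only and is not part of HACMP: the function names, the node names, and the 10.1.x.x addresses are hypothetical, and reading rule b as "exactly one NIC per node on the service subnet" (which then also covers rule c) is an assumption made for the example.

```python
import ipaddress


def subnet_of(ip, netmask):
    """Return the network an address belongs to for the given netmask."""
    return ipaddress.ip_interface(f"{ip}/{netmask}").network


def check_replacement_rules(service_ips, node_nics, netmask="255.255.255.0"):
    """Return a list of rule violations; an empty list means the layout conforms."""
    problems = []

    # Rule a: every service IP label must be on the same subnet.
    service_nets = {subnet_of(ip, netmask) for ip in service_ips}
    if len(service_nets) != 1:
        problems.append("rule a: service IP labels are not all on one subnet")
        return problems
    service_net = service_nets.pop()

    # Rules b and c (assumed reading): each node has exactly one NIC on the
    # service subnet, so every other NIC on that node is on a different subnet.
    for node, nic_ips in sorted(node_nics.items()):
        matches = [ip for ip in nic_ips if subnet_of(ip, netmask) == service_net]
        if len(matches) != 1:
            problems.append(
                f"rules b/c: {node} has {len(matches)} NIC(s) on {service_net}, expected 1"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical two-node layout, not the addresses used in this course.
    service = ["10.1.15.92", "10.1.15.70"]
    nics = {
        "nodeA": ["10.1.15.29", "10.1.16.29"],
        "nodeB": ["10.1.15.31", "10.1.16.31"],
    }
    for problem in check_replacement_rules(service, nics):
        print(problem)
```

A check like this only looks at addressing. It does not replace the HACMP verification that runs when the cluster is synchronized.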
Instructor notes:
Purpose — Show the updated /etc/hosts file for IPAT via IP replacement.
Details — You might need to spend some time giving examples of how other cluster networks would be configured for IPAT via replacement.
Additional information —
Transition statement — Now, we need to create some alternate hardware addresses.

Creating a locally administered address (LAA)

• Each service IP label using HWAT will need an LAA
• The LAA must be unique on the cluster's physical network
• The MAC address based technologies (Ethernet, Token ring and FDDI) use six byte hardware addresses of the form:
      xx.xx.xx.xx.xx.xx
• The factory-set MAC address of the NIC will start with 0, 1, 2 or 3
  – A MAC address that starts with 0, 1, 2 or 3 is called a Globally Administered Address (GAA) because it is assigned to the NIC's vendor by a central authority
• Incrementing this first digit by 4 transforms the GAA into a Locally Administered Address (LAA) which will be unique worldwide (unless someone has already used the same GAA to create an LAA, which isn't likely since GAAs are unique worldwide)

Figure C-19. Creating a locally administered address (LAA)

Notes:
Hardware addresses
Hardware addresses must be unique, at a minimum, on the local network to which they are connected. The factory set hardware address for each network interface card (NIC) is administered by a central authority and should be unique in the world. These addresses are called Globally Administered Addresses (GAAs).

Locally administered addresses
The locally administered hardware addresses (called Locally Administered Addresses or LAAs) will be used if HACMP needs to move the service IP label to a different interface or a different node. The issue here is that they must be unique on your network. Let's discuss one method for creating LAAs.

Incrementing the first nibble of the GAA by 4 transforms it into an LAA, as noted in the visual. Using this method to create an alternate address should provide you with an address that is also globally unique.

Note: According to the IEEE 802 standard for LAN MAC addresses, the second bit transmitted on the LAN medium (the "4" bit) is the local/global bit. If this bit is zero, the address is a GAA. Setting this bit to one indicates this address is locally administered.

Instructor notes:
Purpose — Discuss how to create an LAA.
Details —
Additional information —
Transition statement — Let's create two LAAs for our cluster.

Creating two LAAs for our cluster

Here are two Globally Administered Addresses (GAAs) taken from Ethernet adapters in the cluster:
  – 0.4.ac.17.19.64
  – 0.6.29.ac.46.8
First we make sure that each number is two digits long by adding leading zeros as necessary:
  – 00.04.ac.17.19.64
  – 00.06.29.ac.46.08
Verify that the first digit is 0, 1, 2 or 3:
  – Yep!
Add 4 to the first digit of each GAA:
  – 40.04.ac.17.19.64
  – 40.06.29.ac.46.08
Done! These two addresses are now LAAs.

Figure C-20. Creating two LAAs for our cluster
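The conversion shown on this visual (pad each byte to two digits, verify that the first digit is 0 to 3, add 4 to the first digit) is simple enough to script. The following Python sketch is one possible way to do it, not an HACMP utility: the function names `gaa_to_laa` and `smit_format` are hypothetical. The second helper strips the punctuation, since the SMIT HACMP field takes the address as plain hexadecimal digits.

```python
def gaa_to_laa(gaa):
    """Convert a dotted or colon-separated GAA into a dotted LAA.

    Follows the method described in this unit: pad each byte to two hex
    digits, check that the first digit is 0-3, then add 4 to that digit.
    """
    octets = gaa.replace(":", ".").split(".")
    if len(octets) != 6:
        raise ValueError("a hardware address must have six bytes")

    padded = []
    for octet in octets:
        value = int(octet, 16)  # raises ValueError for non-hex input
        if not 0 <= value <= 0xFF:
            raise ValueError(f"byte out of range: {octet}")
        padded.append(f"{value:02x}")

    first_digit = int(padded[0][0], 16)
    if first_digit > 3:
        raise ValueError("first digit is not 0-3, so this is not a GAA")

    padded[0] = f"{first_digit + 4:x}{padded[0][1]}"
    return ".".join(padded)


def smit_format(laa):
    """Strip periods and colons so the LAA can be typed into the SMIT field."""
    return laa.replace(".", "").replace(":", "")


if __name__ == "__main__":
    laa = gaa_to_laa("0.4.ac.17.19.64")
    print(laa)               # 40.04.ac.17.19.64
    print(smit_format(laa))  # 4004ac171964
```

The script only automates the arithmetic. Confirming that the resulting LAA is unique on the cluster's physical network remains a manual check.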
Instructor notes:
Purpose — Show how to convert two Globally Administered Addresses (GAAs) into Locally Administered Addresses (LAAs).
Details — Be prepared to convert some of the GAAs on network cards in the classroom into LAAs (that is, you need to be familiar with the procedure before you teach this foil).
Additional information —
Transition statement — Before we define the new service IP labels to HACMP, let's take a look at a couple of issues.

Hardware address takeover issues

• Do not enable the ALTERNATE hardware address field in the SMIT devices menu
  – Causes the adapter to boot on your chosen LAA rather than the burned in ROM address.
  – Causes serious communications problems and puts the cluster into an unstable state.
  – Correct method is to enter your chosen LAA into the SMIT HACMP menus (remove the periods or colons before entering it into the field).
• The Token-Ring documentation states that the LAA must start with 42
• The FDDI documentation states that the first nibble (digit) of the first byte of the LAA must be 4, 5, 6 or 7 (which is compatible with the method for creating LAAs described earlier)
• Token-Ring adapters do not release the LAA if AIX crashes.
  – AIX must be set to reboot automatically after a system crash (see smitty chgsys)

Figure C-21. Hardware address takeover issues

Notes:
Issues
The main thing to remember is that you do NOT configure the ALTERNATE hardware address field in the SMIT devices panel. You must leave that blank and configure this using the SMIT HACMP menus.

Instructor notes:
Purpose — Discuss the importance of using HACMP to apply the LAAs, not the SMIT devices panel.
Details —
Additional information —
Transition statement — Now, let's define the new service IP labels to HACMP.

Redefining the service IP labels for HWAT

Redefine the two service IP labels. Note that the periods are stripped out before the LAA is entered into the HW Address field.

      Add a Service IP Label/Address configurable on Multiple Nodes (extended)

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                           [Entry Fields]
    * IP Label/Address                                      [xweb]            +
    * Network Name                                          net_ether_01
      Alternate HW Address to accompany IP Label/Address    [4004ac171964]

    F1=Help   F2=Refresh   F3=Cancel   F4=List   F5=Reset
    F6=Command   F7=Edit   F8=Image   F9=Shell   F10=Exit   Enter=Do

You probably shouldn't use the particular LAAs shown on these visuals in your cluster. Select your own LAAs using the procedure described earlier.

Don't forget to specify the second LAA for the second service IP label.

Figure C-22. Redefining the service IP labels for HWAT

Notes:
Redefining the service IP labels
Define each of the service IP labels, making sure to specify a different LAA address for each one. The Alternate HW Address to accompany IP Label/Address is specified as a series of hexadecimal digits without intervening periods or any other punctuation.

If IPAT via IP replacement is specified for the network, which it is in this case, you get an error or a warning from this screen if you try to define service IP labels which do not conform to the rules for service IP labels on IPAT via IP replacement networks.

Instructor notes:
Purpose — Show the screen used to define a service IP label and to specify its LAA.
Details —
Additional information —
Transition statement — Now, synchronize the changes.

Synchronize your changes

Synchronize the changes and run through the test plan.

                  HACMP Verification and Synchronization

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
    * Verify, Synchronize or Both                      [Both]       +
      Force synchronization if verification fails?     [No]         +
    * Verify changes only?                             [No]         +
    * Logging                                          [Standard]   +

    F1=Help   F2=Refresh   F3=Cancel   F4=List   F5=Reset
    F6=Command   F7=Edit   F8=Image   F9=Shell   F10=Exit   Enter=Do

Figure C-23. Synchronize your changes

Notes:
Synchronize
Don't forget to synchronize.

Instructor notes:
Purpose — Show the synchronization step.
Details — The students might wonder why the synchronization step is always shown explicitly. The synchronization foil marks the end of each scenario (except when an extra synchronization is done during a scenario). It also could be a useful reminder if this unit is used as a reference document once the students return to their offices.
Additional information —
Transition statement — Let's review.
Checkpoint

1. For IPAT via replacement (select all that apply)
   a. Each service IP address must be in the same subnet as one of the non-service addresses
   b. Each service IP address must be in the same subnet
   c. Each service IP address cannot be in any non-service address subnet
2. True or False? If the takeover node is not the home node for the resource group and the resource group does not have a Startup policy of Online Using Distribution Policy, the service IP address replaces the IP address of a NIC with an IP address in the same subnet as the subnet of the service IP address.
3. True or False? In order to use HWAT, you must enable and complete the ALTERNATE ETHERNET address field in the SMIT devices menu.
4. True or False? You must stop the cluster in order to change from IPAT via aliasing to IPAT via replacement.

Figure C-24. Checkpoint

Instructor notes:
Purpose — Review.
Details —
Checkpoint solutions
1. For IPAT via replacement (select all that apply): a and b. Each service IP address must be in the same subnet as one of the non-service addresses, and all service IP addresses must be in the same subnet.
2. False. When the resource group is moved to a node other than its home node, the service IP label replaces a non-service IP label on a different subnet from the service IP label.
3. False. The alternate hardware address is configured in the SMIT HACMP menus; the ALTERNATE hardware address field in the SMIT devices menu must be left blank.
4. True. Changing from IPAT via aliasing to IPAT via replacement cannot be done dynamically.
Additional information —
Transition statement — Let's summarize.

Unit summary

Key points from this unit:
• IPAT via IP replacement:
  – May require fewer subnets than IPAT via aliasing
  – May require more NICs than IPAT via aliasing
  – Supports hardware address takeover
• HACMP replaces non-service IP labels with service IP labels on the same subnet as the service IP label when the resource group is started on its home node or if the Startup Policy is distributed
• HACMP replaces non-service IP labels with service IP labels on a different subnet from the service IP label when the resource group is moved to any other node
• IPAT via IP replacement configuration issues
  – Service IP address must be in the same subnet as one of the non-service subnets
  – All service IP addresses must be in the same subnet
  – You must have at least as many NICs on each node as service IP addresses
• Hardware Address Takeover (HWAT) issues
  – Alternate hardware address (Locally Administered Address or LAA) must be configured in HACMP. Do NOT use standard SMIT field.
  – Alternate hardware address must be unique.

Figure C-25. Unit summary

Instructor notes:
Purpose — Summary.
Details —
Additional information —
Transition statement — What's next?

Appendix D. Configuring target mode SSA

Estimated time
00:15

What this unit is about
This appendix describes steps required to configure Target mode SSA (TMSSA).

What you should be able to do
After completing this unit, you should be able to:
• Configure Target Mode SSA

How you will check your progress
Accountability:
• Self-guided implementation

References
SC23-5209-01   HACMP for AIX, Version 5.4.1: Installation Guide
SC23-4864-10   HACMP for AIX, Version 5.4.1: Concepts and Facilities Guide
SC23-4861-10   HACMP for AIX, Version 5.4.1: Planning Guide
SC23-4862-10   HACMP for AIX, Version 5.4.1: Administration Guide
SC23-5177-04   HACMP for AIX, Version 5.4.1: Troubleshooting Guide
SC23-4867-09   HACMP for AIX, Version 5.4.1: Master Glossary
http://www-03.ibm.com/systems/p/library/hacmp_docs.html   HACMP manuals

Unit objectives

After completing this unit, you should be able to:
• Perform the steps necessary to configure Target Mode SSA

Figure D-1. Unit objectives

Instructor notes:
Purpose — Present unit objectives.
Details —
Additional information —
Transition statement — Open with what Target mode SSA is.

Implementing target mode SSA

• The serial cable being used to implement the rs232 non-IP network has been borrowed by someone and nobody noticed
• A decision has been made to implement a target mode SSA (tmssa) non-IP network as it won't fail unless complete access to the shared SSA disks is lost by one of the nodes (and someone is likely to notice that)

[Diagram: cluster nodes bondar and hudson]

Figure D-2. Implementing target mode SSA

Notes:
Target mode SSA or heartbeat over disk networks
Sadly, the premise behind this scenario is all too real. The problem with rs232 non-IP networks is that if they become disconnected or otherwise disabled, then it is entirely possible that nobody notices, even though HACMP logs the failure of the connection when it happens and reports it in the logs if it is down at HACMP startup time.

In contrast, a target mode SSA network or heartbeat on disk network won't fail until all paths between the two nodes fail. Since such a failure will cause one or both nodes to lose access to some or all of the shared disks, such a failure is MUCH less likely to go unnoticed.

We focus on SSA in this scenario as we have discussed heartbeat over disk earlier in the course.

Instructor notes:
Purpose — Set the scene for the next scenario.
Details —
Additional information —
Transition statement — The first step in setting up target mode SSA is to assign a unique node number to each node.

Setting the SSA node number

The first step is to give each node a unique SSA node number. We'll set bondar's ssa node number to 1 and hudson's to 2.

Use the smitty ssaa fastpath to get to AIX's SSA Adapters menu.

            Change/Show the SSA Node Number For This System

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                  [Entry Fields]
      SSA Node Number              [1]            +#

    F1=Help   F2=Refresh   F3=Cancel   F4=List   F5=Reset
    F6=Command   F7=Edit   F8=Image   F9=Shell   F10=Exit   Enter=Do

Figure D-3. Setting the SSA node number

Notes:
Required software
Target mode SSA support requires that the devices.ssa.tm.rte file set be installed on all cluster nodes.

SSA node number and HACMP node ID
The first step in configuring a target mode SSA network is to assign a unique SSA node number to each node. Earlier versions of HACMP required that the SSA node number be the same as the node's HACMP node id. HACMP 5.1 (and above) does not have this requirement (and does not expose the HACMP node id to the administrator). We assign 1 as the SSA node number for bondar and 2 as the SSA node number for hudson.

Requirements for SSA node numbers
The minimum requirements for HACMP 5.x are that the SSA node numbers be non-zero and unique for each node within the cluster. Strictly speaking, the SSA node numbers must also be unique across all systems with shared access to the SSA subsystem. This is usually not a concern, as allowing non-cluster nodes to have any form of access to a cluster's shared disks is an unnecessary risk that few cluster administrators would ever accept.

Instructor notes:
Purpose — Demonstrate how to set the SSA node numbers and explain why they need to be set.
Details —
Additional information —
Transition statement — The next step is to configure the target mode SSA devices from the AIX device configuration perspective.

Configuring the tmssa devices

This is a three-step process for a two-node cluster, as each node needs tmssa devices which refer to the other node:
1. run cfgmgr on one of the nodes (bondar)
   • bondar is now ready to respond to tmssa queries
2. run cfgmgr on the other node (hudson)
   • hudson is now ready to respond to tmssa queries
   • hudson also knows that bondar supports tmssa and has created the tmssa devices (/dev/tmssa1.im and /dev/tmssa1.tm) which refer to bondar
3. run cfgmgr again on the first node (bondar)
   • bondar now also knows that hudson supports tmssa and has created the tmssa devices (/dev/tmssa2.im and /dev/tmssa2.tm) which refer to hudson

Figure D-4. Configuring the tmssa devices

Notes:
Introduction
Once each node has a unique SSA node number, the AIX configuration manager needs to be used to define the tmssa devices. Each node must have tmssa devices which refer to each of the other nodes that they can see via the SSA loops. When cfgmgr is run on a node, it sets up the node to accept tmssa packets, and it then defines tmssa devices referring to any other nodes which respond to tmssa packets. In order for this to all work, the other nodes must all be set up to accept and respond to tmssa packets.

Procedure
The end result is that the following procedure gets all the required tmssa devices defined:
1. Run cfgmgr on each cluster node in turn. This sets up each node to handle tmssa packets, and defines the tmssa devices on each node to refer to nodes which have already been set up for tmssa.
2. Run cfgmgr on each node in turn again (depending upon exactly what order you do this in, it is actually possible to skip running cfgmgr on one of the nodes, but it is probably not worth the trouble of being sure that the last cfgmgr run wasn't required).
3. Verify the tmssar devices exist: Run
      # lsdev -C | grep tmssa
   on each node. There should be a tmssar device (which is actually a target mode SSA router acting as a pseudo device) configured on each node.
4. Verify the tmssa devices exist: Run
      # ls /dev/tmssa*
   on each node. Note that each node has target mode SSA devices called /dev/tmssa#.im and /dev/tmssa#.tm where # refers to the other node's node number.
5. Test the target mode connection: Enter the following command on the node with id 1 (make sure you specify the tm suffix and not the im suffix):
      # cat < /dev/tmssa2.tm
   (This command should hang)
   On the node with ID 2, enter the following command (make sure that you specify the im suffix and not the tm suffix):
      # cat /etc/hosts > /dev/tmssa1.im
   (The /etc/hosts file should be displayed on the first node)
   This validates that the target mode serial network is functional.

Please note that any text file may be substituted for /etc/hosts, and you have to specify different tmssa device names if you configured different SSA node numbers for each node. This is simply an example.

Instructor notes:
Purpose — Explain how to configure and test the target mode SSA devices.
Details —
Additional information —
Transition statement — Now, we need to rerun the HACMP discovery process to "discover" the newly created tmssa devices.
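When checking the output of ls /dev/tmssa* in step 4 of the procedure, it helps to know in advance which device names each node should show. The following Python sketch derives them from the SSA node numbers using the naming rule stated in that step. It is an illustration only: the function name `expected_tmssa_devices` is hypothetical, nothing here queries AIX, and the sketch only enforces the "non-zero and unique" requirement quoted earlier in this unit.

```python
def expected_tmssa_devices(ssa_node_numbers):
    """Map each node name to the tmssa device paths it should have.

    ssa_node_numbers maps node name -> SSA node number. Each node should
    see /dev/tmssa#.im and /dev/tmssa#.tm for every OTHER node, where #
    is that other node's SSA node number.
    """
    numbers = list(ssa_node_numbers.values())
    if any(number == 0 for number in numbers):
        raise ValueError("SSA node numbers must be non-zero")
    if len(set(numbers)) != len(numbers):
        raise ValueError("SSA node numbers must be unique within the cluster")

    expected = {}
    for node in ssa_node_numbers:
        paths = []
        # Walk the other nodes in SSA node number order for stable output.
        for other, number in sorted(ssa_node_numbers.items(), key=lambda kv: kv[1]):
            if other != node:
                paths.append(f"/dev/tmssa{number}.im")
                paths.append(f"/dev/tmssa{number}.tm")
        expected[node] = paths
    return expected


if __name__ == "__main__":
    # SSA node numbers used in this unit: bondar is 1, hudson is 2.
    for node, paths in expected_tmssa_devices({"bondar": 1, "hudson": 2}).items():
        print(node, " ".join(paths))
```

For the two-node cluster in this unit, the result agrees with the visual: bondar should list the tmssa2 devices and hudson the tmssa1 devices.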
Rediscover the HACMP information

Next, we need to get HACMP to know about the new communication devices, so we run the auto-discovery procedure again on one of the nodes.

                          Extended Configuration

    Move cursor to desired item and press Enter.

      Discover HACMP-related Information from Configured Nodes
      Extended Topology Configuration
      Extended Resource Configuration
      Extended Event Configuration
      Extended Performance Tuning Parameters Configuration
      Security and Users Configuration
      Snapshot Configuration

      Extended Verification and Synchronization

    F1=Help   F2=Refresh   F3=Cancel   F8=Image
    F9=Shell   F10=Exit   Enter=Do

Figure D-5. Rediscover the HACMP information

Notes:
HACMP discover
By discovering the new devices, they will appear in SMIT pick lists when we configure the tmssa non-IP network. Strictly speaking, it is not necessary to rerun the HACMP discovery, as it is possible to configure tmssa networks by entering the tmssa device names explicitly. As this is a rather error-prone process, it is probably best to use the HACMP discovery mechanism to discover the devices for us.

Instructor notes:
Purpose — Show how to re-run the HACMP discovery process.
Details — Point out to the students that they may find it necessary to rerun the HACMP discovery process themselves from time to time (usually after manually creating or changing a network or shared storage component).
Additional information —
Transition statement — Now, we need to define the tmssa network.

Defining a non-IP tmssa network (1 of 3)

This should look very familiar, as it is the same procedure that was used to define the non-IP rs232 network earlier.

              Configure HACMP Communication Interfaces/Devices

    Move cursor to desired item and press Enter.

      Add Communication Interfaces/Devices
      Change/Show Communication Interfaces/Devices
      Remove Communication Interfaces/Devices
      Update HACMP Communication Interface with Operating System Settings

      +------------------------------------------------------------+
      |                    Select a category                       |
      |                                                            |
      | Move cursor to desired item and press Enter.               |
      |                                                            |
      |   Add Discovered Communication Interface and Devices       |
      |   Add Predefined Communication Interfaces and Devices      |
      |                                                            |
      | F1=Help     F2=Refresh    F3=Cancel                        |
      | F8=Image    F10=Exit      Enter=Do                         |
      | /=Find      n=Find Next                                    |
      +------------------------------------------------------------+

Figure D-6. Defining a non-IP tmssa network (1 of 3)

Notes:
Defining a non-IP tmssa network
The procedure for defining a non-IP tmssa network is pretty much identical to the procedure used earlier to define the non-IP rs232 network.

Instructor notes:
Purpose — Show the first step of defining a tmssa network using pre-discovered communication devices.
Configuring target mode SSA D-15 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.V4. 1998. . Details — Additional information — Transition statement — Select the Add Discovered Communication Interface and Devices choice. © Copyright IBM Corp. 1998. ¦ ¦ ¦ ¦ # Discovery last performed: (Feb 12 18:20) ¦ ¦ Communication Interfaces ¦ ¦ Communication Devices ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦ ¦ F8=Image F10=Exit Enter=Do ¦ F1¦ /=Find n=Find Next ¦ F9+--------------------------------------------------------------------------+ © Copyright IBM Corporation 2008 Figure D-7. .0 Notes: D-16 HACMP Implementation © Copyright IBM Corp.Instructor Guide Defining a non-IP tmssa network (2 of 3) Configure HACMP Communication Interfaces/Devices Move cursor to desired item and press Enter. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. Add Communication Interfaces/Devices Change/Show Communication Interfaces/Devices Remove Communication Interfaces/Devices Update HACMP Communication Interface with Operating System Settings +--------------------------------------------------------------------------+ ¦ Select a category ¦ ¦ ¦ ¦ Move cursor to desired item and press Enter. Defining a non-IP tmssa network (2 of 3) AU548. 1998. 2008 Appendix D. Configuring target mode SSA D-17 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.0 Instructor Guide Uempty Instructor notes: Purpose — Show the second step of defining a tmssa network using pre-discovered communication devices. Details — Additional information — Transition statement — Select the Communication Interface and Devices choice. .V4. © Copyright IBM Corp. we need to define the tmssa network using a process Configure HACMP Communication Interfaces/Devices Move cursor to desired item and press Enter. 
¦ ¦ Press Enter AFTER making all selections.3 Planning and Installation Guide for information on configuring all supported types of non-IP networks.Instructor Guide Defining a non-IP tmssa network (3 of 3) Now. D-18 HACMP Implementation © Copyright IBM Corp. 1998. ¦ ¦ ¦ ¦ # Node Device Device Path Pvid ¦ ¦ > hudson tmssa1 /dev/tmssa1 ¦ ¦ > bondar tmssa2 /dev/tmssa2 ¦ ¦ bondar tty0 /dev/tty0 ¦ ¦ hudson tty0 /dev/tty0 ¦ ¦ bondar tty1 /dev/tty1 ¦ ¦ hudson tty1 /dev/tty1 ¦ ¦ ¦ ¦ F1=Help F2=Refresh F3=Cancel ¦ ¦ F7=Select F8=Image F10=Exit ¦ F1¦ Enter=Do /=Find n=Find Next ¦ F9+--------------------------------------------------------------------------+ © Copyright IBM Corporation 2008 Figure D-8. Defining a non-IP tmssa network (3 of 3) AU548. . ¦ ¦ ONE OR MORE items can be selected. Use arrow keys to scroll. Refer to Chapter 13 of the HACMP v5.0 Notes: Final step Select the tmssa devices on each node and press Enter to define the network. Add Communication Interfaces/Devices +--------------------------------------------------------------------------+ ¦ Select Point-to-Point Pair of Discovered Communication Devices to Add ¦ ¦ ¦ ¦ Move cursor to desired item and press F7. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 2008 Appendix D.V4. 1998. Details — Additional information — Transition statement — Synchronize the changes and we’re done. . Configuring target mode SSA D-19 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.0 Instructor Guide Uempty Instructor notes: Purpose — Show the final step in defining the tmssa network. © Copyright IBM Corp. 0 Notes: D-20 HACMP Implementation © Copyright IBM Corp.Instructor Guide Synchronize your changes Synchronize the changes and run through the test plan HACMP Verification and Synchronization Type or select values in entry fields. Press Enter AFTER making all desired changes. * Verify. 1998. 
Synchronize or Both Force synchronization if verification fails? * Verify changes only? * Logging [Entry Fields] [Both] [No] [No] [Standard] + + + + F1=Help F5=Reset F9=Shell F2=Refresh F6=Command F10=Exit F3=Cancel F7=Edit Enter=Do F4=List F8=Image © Copyright IBM Corporation 2008 Figure D-9. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM. . Synchronize your changes AU548. © Copyright IBM Corp. 2008 Appendix D. Details — Additional information — Transition statement — Let’s review topic one. Configuring target mode SSA D-21 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.V4.0 Instructor Guide Uempty Instructor notes: Purpose — Show the synchronize screen. . 1998. . 1998. Unit summary AU548.Instructor Guide Unit summary Key points from this unit: This unit showed the steps necessary to configure Target Mode SSA © Copyright IBM Corporation 2008 Figure D-10. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.0 Notes: D-22 HACMP Implementation © Copyright IBM Corp. Details — Additional information — Transition statement — We’re done with this unit. © Copyright IBM Corp. .0 Instructor Guide Uempty Instructor notes: Purpose — Unit summary. 2008 Appendix D. 1998. Configuring target mode SSA D-23 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.V4. 2008 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.Instructor Guide D-24 HACMP Implementation © Copyright IBM Corp. 1998. . 0 backpg Back page .V4. ® .
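Additional illustration (not part of the original course material): the selection made in Figure D-8 can be sanity-checked with a short sketch. The rows below are copied from the Figure D-8 listing. The Device class, the check_tmssa_pair function, and the three rules it applies (exactly two devices, both tmssa, one on each node) are assumptions made for illustration, drawn from the "Point-to-Point Pair" wording and the instruction to select the tmssa devices on each node. This is not an HACMP interface and it does not call any HACMP command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """One row of the discovered-devices listing (hypothetical helper type)."""
    node: str
    name: str
    path: str


# Rows copied from the Figure D-8 listing.
DISCOVERED = [
    Device("hudson", "tmssa1", "/dev/tmssa1"),
    Device("bondar", "tmssa2", "/dev/tmssa2"),
    Device("bondar", "tty0", "/dev/tty0"),
    Device("hudson", "tty0", "/dev/tty0"),
    Device("bondar", "tty1", "/dev/tty1"),
    Device("hudson", "tty1", "/dev/tty1"),
]


def check_tmssa_pair(selected):
    """Return a list of problems with a selection; an empty list means it looks sane.

    The rules are assumptions for illustration, not HACMP's own validation.
    """
    problems = []
    if len(selected) != 2:
        problems.append(f"expected 2 devices, got {len(selected)}")
    if any(not d.name.startswith("tmssa") for d in selected):
        problems.append("every selected device should be a tmssa device")
    if len({d.node for d in selected}) != len(selected):
        problems.append("each selected device should be on a different node")
    return problems


# The selection marked with ">" in Figure D-8: one tmssa device per node.
good = [d for d in DISCOVERED if d.name.startswith("tmssa")]
# A deliberately wrong selection: two devices on hudson, one of them a tty.
bad = [DISCOVERED[0], DISCOVERED[3]]

print(check_tmssa_pair(good))
print(check_tmssa_pair(bad))
```

The first call reports no problems for the pair marked in the figure. The second reports both the device-type problem and the same-node problem, which are the kinds of mistakes students might make when pressing F7 on the wrong rows.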