IBM PowerHA Configuration and Management
IBM Power Systems
PowerHA Configuration and Management
Power course

Course contents
  • Content overview
  • HACMP overview
  • New HACMP features
  • Common HA architectures
  • HA planning / installation / management
  • Reference list

Agenda
  • HACMP overview
  • Common HA architectures
  • HA planning / installation / management
  • Reference list

Note: PowerHA and HACMP
  • PowerHA was originally named HACMP (High Availability Cluster Multi-Processing).
  • On AIX V6.1 the product is called PowerHA; on AIX V5.x it is called HACMP. The two names are used interchangeably.
  • It is high-reliability cluster software for IBM Power servers. Through redundant configuration it eliminates single points of failure and keeps the whole system continuously available, secure, and reliable.

HACMP history
  [Timeline figure: HACMP releases from 1992 to 2008. Versions shown: 1, 1.2, 2.1, 3.1, 3.11, 4.1, 4.11, 4.2, 4.21, 4.22, 4.3, 4.4, 4.3.1, 4.41, 4.5, 5.1, 5.2, 5.3, 5.4.]

Market research: although hardware reliability keeps improving, system failures caused by hardware faults still occur in business environments
  • Several studies place the proportion between 20% and 45%.
  • Human error, software error, and planned maintenance cause the majority of service outages.

For a business, downtime and poor performance mean not only financial loss but also lost customers
  • "Overall downtime costs average 3.6% of annual revenue." (Infonetics)
  • Many studies estimate the average cost of downtime at over $5,000/hour.
  • Popular Web sites estimate the cost of downtime at millions of dollars.
  • A 22-hour crash in June 2003 cost eBay an estimated $5M.
  • Losses go beyond immediate sales revenue:
      • To clients, availability equates to reliability and trustworthiness.
      • Internal application failures prevent employees from working.

HACMP: a mature technology in commercial environments
  • Mature product, now in its 17th major release.
  • Averaging 40,000 licenses sold worldwide annually.
  • Built on a decade of IBM cluster leadership.
  • HACMP allows you to create highly available environments with minimal hardware.
  • HACMP is scalable up to 32 nodes, allowing your cluster to adapt to the growing demands of your business.
  • The optional XD feature allows your clusters to span unlimited geographic distances.

HACMP is not a cure-all. It is not suitable for the following environments:
  • Your environment is not secure
      • Network security is not in place
  • Change management procedures are not respected
  • You do not have a trained administrator
  • The environment is prone to user fiddle-faddle
  • The application requires manual intervention
  HACMP will never be an out-of-the-box solution to availability. A certain degree of skill will always be required.

And:
  • You cannot suffer any downtime
      • Failovers will cause at least some downtime
  • Your environment is not stable
      • HACMP depends on stable software levels and a stable configuration
      • HACMP is susceptible to the "fiddle factor"
  • Your application needs manual intervention to recover from a failure
      • Manual reset of a device, etc.

HACMP reduces both planned and unplanned downtime
  • Unplanned outage
      • System failure: hardware, operating system crash, power loss, user error
      • Component failure: NIC, SCSI/SAN adapter, network hub/switch, SAN switch, disk failure (both O/S and application data)
  • Planned outage
      • Maintenance: system hardware change/upgrade, OS and application upgrades and fixes
      • Testing: applied fixes, failure scenarios for HA and DR

Considerations when using HACMP
  • The application must be able to recover from a stop/restart operation
      • Must release all resources when stopped, either normally or abnormally
      • Must tolerate a loss of memory contents
      • Must tolerate a loss of processor state
      • Must perform a restart from a checkpoint
      • Must recover from partial data writes
      • Must operate in a "transactional" protocol
  • There must not be a single point of failure in the HA cluster
      • Shared power supply, non-protected disk, etc.
      • HACMP is a software solution

HACMP avoids service interruption by detecting problems and quickly switching to backup hardware
  • Two nodes (A and B)
  • Two networks
      • Private (internal) network
      • Public (shared) network
  • Shared disk
      • All data in shared storage is available to both nodes
  • Critical applications
      • Database server
      • Web server (dependent on the DB)
  [Diagram labels: Shared Disk, Private Network, Company Shared Network, Web Srv, Database]

HACMP monitors four types of failure
  • Node failures
      • Processor hardware or operating system failures
      • One or more surviving nodes can acquire resources
  • Network adapter failures
      • Move the IP address to a standby network adapter in the same node
  • Network failure
      • A message is displayed on the console and the event is logged
      • As every site's network configuration is unique, no other default action is taken
      • The action taken in response to network failures is customizable
  • Application failure
      • WebSphere / DB2 / Oracle AS and DB

Other types of failure
  • Disk drive failures
      • LVM mirroring
      • RAID disk devices
  • Other hardware failure
  • Application failure (customization needed, SRC)
  • HACMP failure
      • Promoted to node failure
  • Power failure
      • Avoid common power supplies across replicated devices
      • Use a UPS

Example - Failure #1: Node failure
  • Node A fails completely.
  • Node B detects the loss of Node A.
  • Node B starts up its own instance of the database.
  • The database is temporarily taken over by Node B until Node A is brought back online.
  [Diagram labels: Shared Disk, Private Network, Company Shared Network, Web Srv, Database]

Example - Failure #2: Loss of network connection
  • Node A loses a NIC.
  • Because of NIC redundancy, the service IP swaps locally.
  • Operations continue normally while the problem is resolved.
  • If total public network connectivity were lost, a fallover could occur.
  [Diagram labels: Shared Disk, Private Network, Company Shared Network, Web Srv, Database]

Possible fallover configurations
  • One to one
  • One to any
  • Any to any
  • Any to one

Customizing resource groups
  • Startup preferences
      • Online On Home Node Only (cascading): OHNO
      • Online on First Available Node (rotating, or cascading with inactive takeover): OFAN
      • Online On All Available Nodes (concurrent): OAAN
      • Startup distribution
  • Fallover preferences
      • Fallover To Next Priority Node In The List: FNPN
      • Fallover Using Dynamic Node Priority: FDNP
      • Bring Offline (On Error Node Only): BOEN
  • Fallback preferences
      • Fallback To Higher Priority Node: FHPN
      • Never Fallback: NFB

Common resources that need high availability
  • Service IP address(es)
      • The IP addresses that users and client applications use for production
      • This can be one or multiple addresses
      • Not limited to the number of interfaces when aliasing is used
  • Application (server)
      • Application(s) to be controlled and protected by HACMP
      • In many cases these are user-provided start/stop scripts
      • May take advantage of pre-packaged application Smart Assists
  • Shared storage
      • Volume groups
      • Logical volumes
      • JFS
      • NFS

Additional Granular Options
  • Resource group dependencies
      • Parent/child relationships: great for multi-tier environments
  • Location dependencies
      • Online on Same Node: all resource groups must be online on the same node
      • Online on Different Nodes: all resource groups must be online on different nodes
      • Online on Same Site: all resource groups must be online on the same site
  • Define resource group priorities (Different Node dependency)
      • Low
      • Intermediate
      • High

Application monitoring
  • HACMP can monitor applications in one of two ways:
      • Process Monitor: determines the death of a process
      • Custom Monitor: monitors the health of the application using a monitor method you provide
  • Decisions upon failure
      • Restart: you can set a number of local restarts. After the specified restart count, if the application continues to fail, you can escalate to a fallover.
      • Notify: send an email notification
      • Fallover: move the application and its associated resource group to the next candidate node
  • Application monitoring can be suspended and resumed at any time.

HACMP V5.x: a cluster can be configured by answering five questions
  • What is the address of the backup node?
  • What is the name of the application?
  • What script should HACMP use to start it?
  • What script should HACMP use to stop it?
  • What is the service IP label that clients will use to access the application?

WebSMIT

Two-node HACMP topology diagram
  [Diagram labels: Network Clients, Serial Heartbeat, pSeries Cluster Node (two), IP Network, Service & Standby Network Adapters, Shared Disk, IP Heartbeats]

Cluster nodes
  • Since the cluster is treated as a single entity, we refer to the individual computers as nodes.
  • Each node is an independent system.
  • Internode communication is defined when the cluster is initialized.

Service IP alias
  • The service address, or service label, is the connection to the computer.
  • AIX allows many addresses on a single adapter.
  • Aliasing does not affect the original configuration.
  • It allows separation of services.
  • An alias is faster to move if necessary.

IP address takeover (IPAT), method 1 (replacement)
  [Table: addresses on the Boot/Service and Standby adapters of Node A and Node B at system boot, with HACMP running, after adapter failure, and after host failure. Addresses appearing in the table: 192.168.0.1, 192.168.0.2, 192.168.0.6, 1.1.1.1, 1.1.1.2.]
  • Two logical IP networks (netmask 255.255.255.0)
  • One physical network
  • Clients always access 192.168.0.6
  • MAC address takeover or an ARP cache update is also needed

IP address takeover (IPAT), method 2 (aliasing)
  [Table: addresses on Node A and Node B at system boot, with HACMP running, after adapter failure, and after host failure. Addresses appearing in the table: 172.16.18.10, 172.16.18.11, 192.168.1.111, 192.168.1.121, 192.168.1.122, 192.168.0.25, 192.168.0.1.]
  • Initially configured addresses (boot IP)
  • Persistent IP addresses: useful for applications like Tivoli
  • Service IP addresses: used by clients to access the cluster; multiple are allowed

IP address takeover (IPAT), method 2 (aliasing, mutual takeover)
  [Table: addresses on ha_node1 and HA_node2 at system boot, with HACMP running, after adapter failure, and after host failure. Addresses appearing in the table: 172.16.18.10, 172.16.18.11, 192.168.1.111, 192.168.1.112, 192.168.1.121, 192.168.1.122, 192.168.0.25, 192.168.0.1.]
  • Initially configured addresses (boot IP)
  • Persistent IP addresses: useful for applications like Tivoli
  • Service IP addresses: used by clients to access the cluster; multiple are allowed
  • One node is primary for db and standby for webapp; the other is primary for webapp and standby for db

Persistent Node IP label
  • A persistent node IP label is an IP alias that can be assigned to one specific node in the cluster.
  • It always stays on the same node.
  • It can reside on a network adapter that already holds a service or non-service IP label.
  • It requires no additional physical network adapter on the node.
  • It does not belong to any resource group.
  • It can be used to administer a specific node.
  • Only one can be configured per node.
  • It is available as soon as the node boots and remains available after HACMP services are stopped.
  • If a network adapter fails, it moves only to another adapter on the same network in the same node.
  • If the node fails, the IP label does not move to another node in the cluster.

Heartbeat / heartbeat via disk
  • A new feature in HACMP 5.x.
  • Any of the following shared disk array types can be used: Fibre Channel, SCSI, or SSA.
  • The disk used is part of an enhanced concurrent volume group. The only requirement is that this VG be defined on both nodes.

Resource groups
  • Logical constructs that group related attributes together
  • The container HACMP uses to move resources
  • Participating node list
      • Default node priorities
      • Home node
  • Have policies on:
      • Startup
      • Fallover
      • Fallback
  • Distribution policy
  • Dependent resource groups

Resource group policies: startup
  • Resource group startup occurs:
      • During initial cluster startup
      • On initial acquisition of the resource group
  • May be modified by a settling timer
  • Online on Home Node Only (OHNO): starts only on the highest-priority node
  • Online on First Available Node (OFAN): starts on any one node
  • Online on All Available Nodes (OAAN): the resource group starts on all nodes
  • Online Using Distribution Policy (OUDP): one resource group per network or node, depending on the distribution policy

Resource group policies: fallover
  • Resource group fallover occurs when the current node can no longer support the resource group and it is moved to another node:
      • A failure has occurred
      • Graceful shutdown with takeover of the current node
  • Fallover to Next Priority Node (FNPN): the resource group is moved to the next node in the resource group's node list
  • Fallover using Dynamic Node Priority (FDNP): the resource group is moved to the next node in the resource group's node list, as recalculated based on the dynamic node criteria policy
  • Bring Offline on Error Node (BOEN): the resource group is set to an offline state on this node only

Resource group policies: fallback
  • Resource group fallback occurs when:
      • The resource group is not on its home node
      • A higher-priority node becomes available
  • Can be modified by a fallback timer
  • Fallback to a Higher Priority Node (FHPN): when the higher-priority node is available and/or the optional timer expires, the resource group moves
  • Never Fallback (NFB): regardless of whether a higher-priority node becomes available, the resource group will not move

HACMP resource group (Online on Home Node Only)
  • Startup: Online on Home Node Only
  • Fallover: Fallover to Next Priority Node
  • Fallback: Fallback to a Higher Priority Node

HACMP resource group (Online on First Available Node)
  • Startup: Online on First Available Node
  • Fallover: Fallover to Next Priority Node
  • Fallback: Never Fallback

HACMP resource group (Online on All Available Nodes)
  • Startup: Online on All Available Nodes
  • Fallover: Bring Offline on Error Node
  • Fallback: Never Fallback

Resources: How Volume Groups are Handled
  • Two types:
      • Shared
      • Non-shared
  • Shared volume groups can migrate.
  • Non-shared volume groups are node bound.
  • Application data must be on a shared volume group to be moved.
  • Application code may be on either type of disk.

Resources: Application Server Scripts
  • Application server: a name given to a series of scripts that
      • Start the application
      • Stop the application
      • Monitor the application (optional)
      • Restart the application (optional)
  • Applications must be able to be started from a previously unknown state by a script.
  • Applications must be able to be stopped by a script.

HACMP software planning - system software
  • Where to check operating system version and fix requirements:

HACMP software planning - application software
  • In general, the application software involved in a cluster should be at the same version on every node, which makes it easier to manage.
  • Because HACMP places no strict restrictions on application software, users can choose which applications to add to the cluster according to their needs and manage them through their own scripts.

HACMP planning
  • Cluster nodes
  • Topology
  • Resource groups
  • Resources
  • Optimization

HACMP software installation
  • Components to install
      • Operating system fixes
      • HACMP software
      • HACMP software fixes
  • Installation methods
      • NIM
      • CD installation
      • Installation from local hard disk

HACMP software configuration process
  • Preparation before configuring HACMP
      • Configure IP addresses
      • Edit the /etc/hosts file
      • Write the application start/stop scripts
      • Create the VGs and file systems
      • Prepare the serial devices and disk heartbeat devices
  • HACMP Standard configuration process
      • Add the cluster and nodes
      • Configure cluster resources
      • Create cluster resource groups
      • Synchronize the HACMP configuration
  • HACMP Extended configuration process
      • Add heartbeats
      • Customize cluster resources
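
The resource group startup, fallover, and fallback policies described in these slides can be illustrated with a small simulation. This is a minimal sketch, not HACMP code: the `ResourceGroup` class and the functions `start`, `node_failed`, and `node_joined` are hypothetical names introduced here, the node and group names are made up, and the model assumes that "next priority node" means the highest-priority participating node that is still up. Settling timers, fallback timers, dynamic node priority, and the concurrent (OAAN) and distribution (OUDP) policies are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ResourceGroup:
    """Hypothetical model of a resource group; not an HACMP data structure."""
    name: str
    nodes: List[str]              # participating nodes, highest priority first
    startup: str = "OHNO"         # "OHNO" or "OFAN"
    fallover: str = "FNPN"        # "FNPN" or "BOEN"
    fallback: str = "FHPN"        # "FHPN" or "NFB"
    owner: Optional[str] = None   # node currently hosting the group, if any

    @property
    def home_node(self) -> str:
        return self.nodes[0]


def highest_priority_up(rg: ResourceGroup, up: Set[str]) -> Optional[str]:
    """Return the highest-priority participating node that is up, if any."""
    return next((n for n in rg.nodes if n in up), None)


def start(rg: ResourceGroup, up: Set[str]) -> Optional[str]:
    """Apply the startup policy at cluster start."""
    if rg.startup == "OHNO":
        # Online on Home Node Only: stay offline if the home node is down.
        rg.owner = rg.home_node if rg.home_node in up else None
    elif rg.startup == "OFAN":
        # Online on First Available Node (assumed: best-ranked node that is up).
        rg.owner = highest_priority_up(rg, up)
    else:
        raise ValueError(f"unsupported startup policy: {rg.startup}")
    return rg.owner


def node_failed(rg: ResourceGroup, failed: str, up: Set[str]) -> Optional[str]:
    """Apply the fallover policy; `up` is the set of nodes still running."""
    if rg.owner != failed:
        return rg.owner  # the group was not hosted on the failed node
    if rg.fallover == "FNPN":
        rg.owner = highest_priority_up(rg, up - {failed})
    elif rg.fallover == "BOEN":
        rg.owner = None  # Bring Offline on Error Node
    else:
        raise ValueError(f"unsupported fallover policy: {rg.fallover}")
    return rg.owner


def node_joined(rg: ResourceGroup, joined: str) -> Optional[str]:
    """Apply the fallback policy when a node rejoins the cluster."""
    if rg.owner is None or joined not in rg.nodes:
        return rg.owner
    outranks = rg.nodes.index(joined) < rg.nodes.index(rg.owner)
    if rg.fallback == "FHPN" and outranks:
        rg.owner = joined  # Fallback to a Higher Priority Node
    return rg.owner        # with NFB the group stays where it is


if __name__ == "__main__":
    rg = ResourceGroup("db_rg", ["nodeA", "nodeB"])
    print(start(rg, {"nodeA", "nodeB"}))        # nodeA
    print(node_failed(rg, "nodeA", {"nodeB"}))  # nodeB
    print(node_joined(rg, "nodeA"))             # nodeA
```

With `fallback="NFB"` the last call would leave the group on `nodeB`, which matches the "Online on First Available Node / Never Fallback" combination shown in the slides.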
