Oracle RAC Restart Analysis

I. Symptoms

At first the two Oracle RAC nodes restarted alternately about once every half month. Now a restart happens every 2 days, and there is no notable high load at the time of the restarts.

II. Environment

Two servers running 64-bit Oracle Linux with 64-bit Oracle software; the storage is a DELL PS6000.

oracle@dbasc1 $ oifcfg getif
eth0  global  public
eth1  global  cluster_interconnect
eth2  global  cluster_interconnect
eth4  global  cluster_interconnect

SQL> select * from GV$CONFIGURED_INTERCONNECTS;

INST_ID  NAME  IP_ADDRESS  IS_PUBLIC  SOURCE
-------  ----  ----------  ---------  -------------------------
      2  eth1              NO         Oracle Cluster Repository
      2  eth2  02          NO         Oracle Cluster Repository
      2  eth4  03          NO         Oracle Cluster Repository
      2  eth0  0           YES        Oracle Cluster Repository
      1  eth1              NO         Oracle Cluster Repository
      1  eth2  00          NO         Oracle Cluster Repository
      1  eth4  01          NO         Oracle Cluster Repository
      1  eth0  0           YES        Oracle Cluster Repository

8 rows selected

SQL> select INST_ID,PUB_KSXPIA,PICKED_KSXPIA,NAME_KSXPIA,IP_KSXPIA
  2  from x$ksxpia;

INST_ID  PUB_KSXPIA  PICKED_KSXPIA  NAME_KSXPIA  IP_KSXPIA
-------  ----------  -------------  -----------  ---------
      1  N           OCR            eth1
      1  N           OCR            eth2         00
      1  N           OCR            eth4         01
      1  Y           OCR            eth0         0

SQL> oradebug setmypid
SQL> oradebug ipc

The trace is as follows:

/u01/app/oracle/admin/asc/udump/asc1_ora_5396.trc
Oracle Database 10g Enterprise Edition Release .0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      dbasc1
Release:        2.6.18-194.el5xen
Version:        #1 SMP Mon Mar 29 22:22:00 EDT 2010
Machine:        x86_64
Instance name: asc1
Redo thread mounted by this instance: 1
Oracle process number: 46
Unix process pid: 5396, image: oracle@dbasc1 (TNS V1-V3)

*** 2011-12-02 11:10:49.327
*** ACTION NAME:() 2011-12-02 11:10:49.327
*** MODULE NAME:(sqlplus@dbasc1 (TNS V1-V3)) 2011-12-02 11:10:49.327
*** SERVICE NAME:(SYS$USERS) 2011-12-02 11:10:49.327
*** SESSION ID:(3244.1653) 2011-12-02 11:10:49.327
Dump of unix-generic skgm context
areaflags            000000e7
realmflags           0000000f
mapsize              00000800
protectsize          00001000
lcmsize              00001000
seglen               00008000
largestsize  0000040000000000
smallestsize 0000000001000000
stacklimit     0x7fff6bcca860
stackdir                   -1
mode                      640
magic                acc01ade
Handle:             0xcfac160 /u01/app/oracle/product/10.2.0/db_1asc1
Dump of unix-generic realm handle /u01/app/oracle/product/10.2.0/db_1asc1, flags = 00000000
 Area #0 Fixed Size containing Subareas 0-0
  Total size 0000000000207290 Minimum Subarea size 00000000
   Area  Subarea    Shmid      Stable Addr      Actual Addr
      0        0  2555912 0x00000060000000 0x00000060000000
                              Subarea size     Segment size
                          0000000000208000 0000000400010000
 Area #1 Variable Size containing Subareas 2-2
  Total size 00000003ff000000 Minimum Subarea size 01000000
   Area  Subarea    Shmid      Stable Addr      Actual Addr
      1        2  2555912 0x00000061000000 0x00000061000000
                              Subarea size     Segment size
                          00000003ff000000 0000000400010000
 Area #2 Redo Buffers containing Subareas 1-1
  Total size 0000000000df8000 Minimum Subarea size 00000000
   Area  Subarea    Shmid      Stable Addr      Actual Addr
      2        1  2555912 0x00000060208000 0x00000060208000
                              Subarea size     Segment size
                          0000000000df8000 0000000400010000
 Area #3 skgm overhead containing Subareas 3-3
  Total size 0000000000009000 Minimum Subarea size 00000000
   Area  Subarea    Shmid      Stable Addr      Actual Addr
      3        3  2555912 0x00000460000000 0x00000460000000
                              Subarea size     Segment size
                          0000000000009000 0000000400010000
Dump of Solaris-specific skgm context
sharedmmu 00000000
shareddec        0
used region        0: start 0000000040000000 length 000000047fff40000000
Maximum processes:               = 3000
Number of semaphores per set:    = 187
Semaphores key overhead per set: = 4
User Semaphores per set:         = 183
Number of semaphore sets:        = 17
Semaphore identifiers:           = 17
Semaphore List=
262146
-------------- system semaphore information -------------

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2392066    root       644        80         2
0x00000000 2424836    root       644        16384      2
0x00000000 2457605    root       644        280        2
0x00fa5a34 2523143    oracle     640        130056192  16
0x81c8eca0 2555912    oracle     660153

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x57aff660 131073     oracle     640        44
0x53b1ce2c 262146     oracle     660        187
0x53b1ce2d 294915     oracle     660        187
0x53b1ce2e 327684     oracle     660        187
0x53b1ce2f 360453     oracle     660        187
0x53b1ce30 393222     oracle     660        187
0x53b1ce31 425991     oracle     660        187
0x53b1ce32 458760     oracle     660        187
0x53b1ce33 491529     oracle     660        187
0x53b1ce34 524298     oracle     660        187
0x53b1ce35 557067     oracle     660        187
0x53b1ce36 589836     oracle     660        187
0x53b1ce37 622605     oracle     660        187
0x53b1ce38 655374     oracle     660        187
0x53b1ce39 688143     oracle     660        187
0x53b1ce3a 720912     oracle     660        187
0x53b1ce3b 753681     oracle     660        187
0x53b1ce3c 786450     oracle     660        187

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

ksxpdmp: facility 0 (?) (0x0, (nil) counts 0, 0
ksxpdmp: Dumping the osd context (brief)
SKGXPCTX: 0x0xcfd43c0 ctx
WAIT HISTORY
Wait Time  Time since     Fast reaps  Wait Type  Return Code
(ms)       prev wait(ms)  before
---------  -------------  ----------  ---------  -------------------
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
0          0              0           NORMAL     invalid status code
wait delta 19 sec (19844 msec) ctx ts 0x0 last ts 0x0
user cpu time since last wait 0 sec 0 ticks
system cpu time since last wait 0 sec 0 ticks
locked 1
blocked 0
timed wait receives 0
fast reaps since last wait 0
admno 0x771fe85a admport:
SSKGXPT 0xcfd67b0 flags SSKGXPT_READPENDING socket no 7 IP UDP 51784
last bytes received: 32824
context timestamp 0
done Queue
  no completed requests
port Queue
  no ports
connection Queue
  no pending connect disconnect operations
sends waiting to be transmitted
  no sends waiting to be transmitted
pending ack Queue
  no send requests pending ack
Mapped regions
Region0 Id 1321025581 Base Address 0x61000000 Size -16777216 key 246783003
rgnport 0xcfd5aa4 lbuf 0 nrgns 1 flags 1
SSKGXPT 0xcfd5aa4 flags socket no 12 IP 00 UDP 33644
ksxpdmp: Dumping the ksxp contexts
ksxpdmp: Dumping ksxp context 0x2b925ae7b6d8 client 2
Dump of memory from 0x00002B925AE7B6D8 to 0x00002B925AE7B968
2B925AE7B6D0                   0CFD8778 00000000
2B925AE7B6E0 0CFD2398 00000000 02000000 00000000
2B925AE7B6F0 00000000 00000000 5AE7B6F8 00002B92
2B925AE7B700 5AE7B6F8 00002B92 5AE7B708 00002B92
2B925AE7B710 5AE7B708 00002B92 5AE7B718 00002B92
2B925AE7B720 5AE7B718 00002B92 00000000 00000000
2B925AE7B730 00000000 00000000 5AE7B738 00002B92
2B925AE7B740 5AE7B738 00002B92 00000002 00000000
2B925AE7B750 00000000 00000000 00000000 00000000
        Repeat 8 times
2B925AE7B7E0 00000002 00000000 00000000 00000000
2B925AE7B7F0 00000000 00000000 00000000 00000000
        Repeat 7 times
2B925AE7B870 00000000 00000000 00000002 00000000
2B925AE7B880 00000000 00000000 00000000 00000000
        Repeat 8 times
2B925AE7B910 00000002 00000000 00000000 00000000
2B925AE7B920 00000000 00000000 5AE7B928 00002B92
2B925AE7B930 5AE7B928 00002B92 00D238F0 00000000
2B925AE7B940 5AE7B5A0 00002B92 05655B0C 00000000
2B925AE7B950 05654378 00000000 8878EE0F 00000000
2B925AE7B960 00000000 00000000
ksxpdmp: Dumping ksxp context 0xcfd8778 client 1
Dump of memory from 0x000000000CFD8778 to 0x000000000CFD8A08
00CFD8770                   0CFD2398 00000000
00CFD8780 5AE7B6D8 00002B92 00000000 00000000
00CFD8790 00000000 00000000 0CFD8798 00000000
00CFD87A0 0CFD8798 00000000 0CFD87A8 00000000
00CFD87B0 0CFD87A8 00000000 0CFD87B8 00000000
00CFD87C0 0CFD87B8 00000000 00000000 00000000
00CFD87D0 00000000 00000000 0CFD87D8 00000000
00CFD87E0 0CFD87D8 00000000 00000001 00000000
00CFD87F0 00000000 00000000 00000000 00000000
        Repeat 8 times
00CFD8880 00000002 00000000 00000000 00000000
00CFD8890 00000000 00000000 00000000 00000000
        Repeat 7 times
00CFD8910 00000000 00000000 00000002 00000000
00CFD8920 00000000 00000000 00000000 00000000
        Repeat 8 times
00CFD89B0 00000002 00000000 00000000 00000000
00CFD89C0 00000000 00000000 0CFF1928 00000000
00CFD89D0 0CFF1928 00000000 0103A460 00000000
00CFD89E0 00000000 00000000 056A690C 00000000
00CFD89F0 00000000 00000000 8878EE0F 00000000
00CFD8A00 00000000 00000000
ksxpdmp: Done dumping the ksxp contexts
ksxpdmp: Dumping pending request queue
ksxpdmp: Done dumping the pending request queue

III. Preliminary analysis

The exact timeline of the failures is omitted here because the record was just lost.

1. The reconfiguration was found to have reason 1. The explanation is as follows:

Here, you can see the reason for the reconfiguration event. The most common reasons would be 1, 2, or 3. Reason 1 means that the NM initiated the reconfiguration event, as typically seen when a node joins or leaves a cluster.

A reconfiguration event is initiated with reason 2 when an instance death is detected. How is an instance death detected? Every instance updates the control file with a heartbeat through its Checkpoint (CKPT) process. If heartbeat information is not present for x amount of time, the instance is considered to be dead and the Instance Membership Recovery (IMR) process initiates reconfiguration. This type of reconfiguration is commonly seen when significant time changes occur across nodes, the node is starved for CPU or I/O times, or some problems occur with the shared storage.

A reason 3 reconfiguration event is due to a communication failure. Communication channels are established between the Oracle processes across the nodes. This communication occurs over the interconnect. Every message sender expects an acknowledgment message from the receiver. If a message is not received for a timeout period, then a “communication failure” is assumed.
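
The three reason codes described above can be kept in a small lookup when scanning instance logs. The following is a minimal sketch in Python, not a tool used in this analysis. It assumes that reconfiguration messages contain the text "Reconfiguration started" followed by "reason N"; that line shape, the sample line in the comment, and the function name `classify_reconfigurations` are assumptions for illustration and do not come from the traces in this document.

```python
import re

# Meanings summarized from the explanation quoted above.
REASONS = {
    1: "NM initiated: a node joined or left the cluster",
    2: "instance death detected (CKPT heartbeat missing from the control file)",
    3: "communication failure on the interconnect",
}

# Assumed line shape, for example:
#   "kjxgmrcfg: Reconfiguration started, reason 1"   (hypothetical sample)
PATTERN = re.compile(r"Reconfiguration started.*?reason\s+(\d+)", re.IGNORECASE)


def classify_reconfigurations(lines):
    """Return a list of (reason_code, description) for each matching line."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            code = int(match.group(1))
            results.append((code, REASONS.get(code, "other reason")))
    return results
```

Calling `classify_reconfigurations(open(path))` on a log file would list every reconfiguration with its reason, which makes it easier to see whether the restarts are dominated by reason 1, 2, or 3.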
This timeout-based detection is more relevant for UDP, as Reliable Shared Memory (RSM), Reliable DataGram protocol (RDG), and Hyper Messaging Protocol (HMP) do not need it, since the acknowledgment mechanisms are built into the cluster communication and protocol itself.

When the block is sent from one instance to another using wire, especially when unreliable protocols such as UDP are used, it is best to get an acknowledgment message from the receiver. The acknowledgment is a simple side channel message that is normally required for most of the UNIX systems where UDP is used as the default IPC protocol. When user-mode IPC protocols such as RDG (on HP Tru64 UNIX TruCluster) or HP HMP are used, the additional messaging can be disabled by setting _reliable_block_sends=TRUE. For Windows-based systems, it is always recommended to leave the default value as is.

2. Check the DELL PS6000 log

INFO 11-12-4 14:52:32 ps6000-1 iSCSI login to target 4:3260, .equallogic:0-8a0906-c2827630a-58b0000000c4e3fa-flash from initiator 02:39526, .redhat:452355c1463 successful using standard-sized frames. NOTE: More than one initiator is now logged in to the target.
INFO 11-12-4 14:52:32 ps6000-1 iSCSI login to target 2:3260, .equallogic:0-8a0906-c2a27630a-ef3000000154e3fa-test2 from initiator 02:49500, .redhat:452355c1463 successful using standard-sized frames. NOTE: More than one initiator is now logged in to the target.
INFO 11-12-4 14:52:32 ps6000-1 iSCSI login to target 3:3260, .equallogic:0-8a0906-0c427630a-48b000000184e3fb-ocr from initiator 02:42678, .redhat:452355c1463 successful using standard-sized frames. NOTE: More than one initiator is now logged in to the target.
INFO 11-12-4 14:52:32 ps6000-1 iSCSI login to target 2:3260, .equallogic:0-8a0906-0c427630a-05e0000001b4e3fb-vote from initiator 02:49498, .redhat:452355c1463 successful using standard-sized frames. NOTE: More than one initiator is now logged in to the target.
INFO 11-12-4 14:52:25 ps6000-1 iSCSI login to target 4:3260, .equallogic:0-8a0906-c2827630a-add0000000f4e3fa-backup from initiator 02:39518, .redhat:452355c1463 successful using standard-sized frames. NOTE: More than one initiator is now logged in to the target.
INFO 11-12-4 14:52:25 ps6000-1 iSCSI login to target 1:3260, .equallogic:0-8a0906-ad727630a-ed0000000094e3fa-data from initiator 02:43887, .redhat:452355c1463 successful using standard-sized frames. NOTE: More than one initiator is now logged in to the target.
INFO 11-12-4 14:52:25 ps6000-1 iSCSI login to ta
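
Grouping the storage log entries by timestamp and volume can make it easier to line up iSCSI reconnects with node restart times. The sketch below, in Python, parses login lines of the shape shown above. The regular expression is derived only from those sample lines, and the helper names `parse_login` and `logins_per_time` are hypothetical. Taking the text after the last hyphen of the target name as the volume name (for example `flash`, `ocr`, `vote`) is an assumption based on how the targets above are named.

```python
import re
from collections import Counter

# Pattern derived from the sample log lines above; other firmware
# versions may format these entries differently.
LOGIN = re.compile(
    r"^(?P<level>\w+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<member>\S+)\s+"
    r"iSCSI login to target (?P<portal>\S+), (?P<target>\S+) "
    r"from initiator (?P<init_addr>\S+), (?P<initiator>\S+) successful"
)


def parse_login(line):
    """Parse one login entry into a dict, or return None if it does not match."""
    match = LOGIN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Assumption: the volume name is the text after the last hyphen.
    entry["volume"] = entry["target"].rsplit("-", 1)[-1]
    return entry


def logins_per_time(lines):
    """Count successful logins per (date, time) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_login(line)
        if entry is not None:
            counts[(entry["date"], entry["time"])] += 1
    return counts
```

Incomplete entries, such as a line cut off in the middle, simply return `None` from `parse_login` and are skipped by `logins_per_time`.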
