SUN Business Unit Technical Support and Training Center: Technical Notes V1.0

Notice: these Dascom (得实) SUN technical notes are restricted to FE staff of the SUN business unit of Dascom (得实) Group. They must not be distributed outside it.
SUN Technical Support and Training Department: Zhong Jian

1 Replacing memory DIMMs
Symptoms:
/var/adm/messages*:
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77005658 Corrected Memory Error on Board 5 J3201 is Persistent
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77005658 ECC Data Bit 42 was in error and corrected
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 Corrected Memory Error on CPU15, errID 0x00000052.77006f57
Dec 15 10:55:01 ibs-shandong2 unix: AFSR 0x00000000.00100000 AFAR 0x00000000.f99b1610
Dec 15 10:55:01 ibs-shandong2 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x100731d0
Dec 15 10:55:01 ibs-shandong2 unix: UDBH Syndrome 0x43 Memory Module Board 5 J3200
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77006f57 Corrected Memory Error on Board 5 J3200 is Persistent
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77006f57 ECC Data Bit 42 was in error and corrected
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 Corrected Memory Error on CPU15, errID 0x00000052.77008766
Dec 15 10:55:01 ibs-shandong2 unix: AFSR 0x00000000.00100000 AFAR 0x00000000.f99b1690
Dec 15 10:55:01 ibs-shandong2 unix: AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x100731d0
Dec 15 10:55:01 ibs-shandong2 unix: UDBH Syndrome 0x43 Memory Module Board 5 J3201
All LEDs are normal; prtdiag output is normal; format output is normal.
show-post-results: boards 1, 3 and 16 are OK; board 5 reports a failure.
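The memory diagnosis in case 1 comes down to reading the board number and DIMM socket (J-number) out of the corrected-error messages. When /var/adm/messages* holds hundreds of such lines, a short script can tally them per DIMM. A minimal Python sketch, assuming the exact message wording shown in the excerpt (other Solaris releases or platforms may word these lines differently); `MESSAGES` is a sample taken from that excerpt, and `DIMM_RE` and `dimm_errors` are hypothetical names.

```python
import re
from collections import Counter

# Sample lines taken from the /var/adm/messages excerpt in case 1.
MESSAGES = """\
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77005658 Corrected Memory Error on Board 5 J3201 is Persistent
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77005658 ECC Data Bit 42 was in error and corrected
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 Corrected Memory Error on CPU15, errID 0x00000052.77006f57
Dec 15 10:55:01 ibs-shandong2 unix: UDBH Syndrome 0x43 Memory Module Board 5 J3200
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77006f57 Corrected Memory Error on Board 5 J3200 is Persistent
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 errID 0x00000052.77006f57 ECC Data Bit 42 was in error and corrected
Dec 15 10:55:01 ibs-shandong2 unix: AFT0 Corrected Memory Error on CPU15, errID 0x00000052.77008766
Dec 15 10:55:01 ibs-shandong2 unix: UDBH Syndrome 0x43 Memory Module Board 5 J3201
"""

# Assumption: both message forms name the board and the DIMM socket as "Board N Jxxxx".
DIMM_RE = re.compile(r"(?:Corrected Memory Error on|Memory Module) Board (\d+) (J\d+)")


def dimm_errors(text):
    """Count mentions per (board, DIMM) and collect those flagged 'is Persistent'."""
    counts, persistent = Counter(), set()
    for line in text.splitlines():
        m = DIMM_RE.search(line)
        if m:
            key = (int(m.group(1)), m.group(2))
            counts[key] += 1
            if line.rstrip().endswith("is Persistent"):
                persistent.add(key)
    return counts, persistent


if __name__ == "__main__":
    counts, persistent = dimm_errors(MESSAGES)
    for (board, dimm), n in counts.most_common():
        flag = " (persistent)" if (board, dimm) in persistent else ""
        print(f"Board {board} {dimm}: {n} mention(s){flag}")
```

Lines that name only the CPU ("Corrected Memory Error on CPU15") are deliberately skipped, since they do not identify a DIMM.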
On board 5, show-post-results reports: bank0=2, bank0=*, bank1=2, bank1=*
Conclusion: DIMMs J3201 and J3200 on board 5 are faulty.
Steps:
Agree with the customer on a time when the system can be taken down.
init 0
Pull out board 5.
Replace DIMMs J3201 and J3200.
Boot the system.
Check /var/adm/messages* to confirm that the problem has been resolved.

2 Replacing a tape drive
Symptoms:
probe-scsi-all: the tape drive is detected.
boot -r, insert a new tape, then mt -f /dev/rmt/0 returns: /dev/rmt/0: no tape loaded or drive offline
Cleaned with two different cleaning tapes: after cleaning, the cleaning LED lights every time a tape is inserted.
Tried new tapes: the tape drive still cannot be used.
Conclusion: the tape drive is faulty and must be replaced.
Steps:
Agree on the downtime with the customer.
Check the system state. If it is a cluster, first switch this machine's services to the standby node (haswitch), then stop this node (scadmin stopnode).
init 0 to shut down.
Replace the tape drive (check the jumper ID setting).
Boot the system.
Insert a good tape.
mt status
As a test, back up a file system, e.g. # ufsdump 0cuf /dev/rmt/0 /

3 Replacing a disk in a D1000 (disks managed by Volume Manager)
Symptoms:
format shows that the disk type of c1t12d0 is unknown.
There are thousands of disk errors related to /sbus@3,0/QLGC,isp@0,10000/sd@c,0 (c1t12d0).
The volumes are mirrored, and every plex that uses c1t12d0 shows a "no device" error.
Conclusion: replace disk c1t12d0.
Steps:
Use vxprint to check the mirror relationships and confirm that the other half of each mirror is healthy.
Locate disk c1t12d0 physically.
vxdiskadm, option 4: remove c1t12d0 in software.
Physically pull c1t12d0 out of the D1000.
Insert a new disk in the same slot.
vxdiskadm, option 5: add c1t12d0 back in software.
Check with vxprint; when every state shows ACTIVE, resynchronization is complete.

4 Replacing a disk in an E250
Symptoms:
format: c0t8d0
Disk LEDs: one disk LED is off; the other is on.
Conclusion: replace disk c0t8d0.
Steps:
Check the system and confirm that c0t8d0 needs to be replaced.
Ask the customer for background, e.g. what the disk is used for and what may have caused the failure.
The customer said the disk is used only to back up the system and for nothing else; this was confirmed on another of the customer's machines.
Checking the system shows that the system disk and the backup disk have different capacities. Note this carefully: it determines the procedure the engineer must follow.
format c0t8d0 (take the partition layout from the disk in the same slot on the other system).
Create a file system on each slice with newfs.
Mount the corresponding slice on /mnt.
cd /mnt
Copy the data with ufsdump and ufsrestore, e.g.: ufsdump 0cuf - / | ufsrestore vxf -
Repeat the last three steps until all data that needs backing up has been copied.
# cd /usr/platform/sun4u/lib/fs/ufs
# installboot ./bootblk /dev/rdsk/c0t8d0s0
Check the system: OK.

5 Replacing a disk in a MultiPack
Symptoms:
From vxprint -ht:
v  lv_recchunk4    gen             ENABLED  ACTIVE   2048000 SELECT  -
pl lv_recchunk4-02 lv_recchunk4    ENABLED  ACTIVE   2050461 CONCAT  -      RW
sd c2t2d0-09       lv_recchunk4-02 c2t2d0   14672826 2050461 0       c2t2d0 ENA
pl lv_recchunk4-01 lv_recchunk4    DETACHED STALE    2050461 CONCAT  -      WO
sd c1t2d0-09       lv_recchunk4-01 c1t2d0   14672826 2050461 0       c1t2d0 ENA
Volume lv_recchunk4 has only one ENABLED/ACTIVE submirror: plex lv_recchunk4-02, subdisk c2t2d0-09. The other submirror is DETACHED/STALE.
From /var/adm/messages:
Mar 12 23:10:10 smcp02 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Mar 12 23:10:10 smcp02 unix: Error for Command: read(10) Error Level: Retryable
Mar 12 23:10:10 smcp02 unix: Requested Block: 15985361 Error Block: 15985403
Mar 12 23:10:10 smcp02 unix: Vendor: SEAGATE Serial Number: 9942534616
Mar 12 23:10:10 smcp02 unix: Sense Key: Media Error
From iostat -E:
sd47 Soft Errors: 0 Hard Errors: 15 Transport Errors: 0
Vendor: SEAGATE Product: ST39103LCSUN9.0G Revision: 034A Serial No: 9942534616
RPM: 7200 Heads: 27 Size: 9.06GB
Media Error: 12 Device Not Ready: 0 No Device: 3 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
According to /var/adm/messages and iostat -E, /pci@1f,4000/scsi@5,1/sd@2,0 (sd47) has a bad block.
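The MultiPack diagnosis hinges on the vxprint -ht output: volume lv_recchunk4 is nominally mirrored, but only one plex is ENABLED/ACTIVE, so the disk under that plex is a single point of failure. A minimal Python sketch that flags such volumes, assuming the column order shown in the sample (`pl NAME VOLUME KSTATE STATE ...` and `sd NAME PLEX DISK ...`); `VXPRINT_HT` is the output above with spacing normalized, and `plex_health` and `single_healthy_plex` are hypothetical names.

```python
from collections import defaultdict

# The vxprint -ht output from case 5 (column spacing normalized).
VXPRINT_HT = """\
v  lv_recchunk4    gen             ENABLED  ACTIVE   2048000 SELECT  -
pl lv_recchunk4-02 lv_recchunk4    ENABLED  ACTIVE   2050461 CONCAT  -      RW
sd c2t2d0-09       lv_recchunk4-02 c2t2d0   14672826 2050461 0       c2t2d0 ENA
pl lv_recchunk4-01 lv_recchunk4    DETACHED STALE    2050461 CONCAT  -      WO
sd c1t2d0-09       lv_recchunk4-01 c1t2d0   14672826 2050461 0       c1t2d0 ENA
"""


def plex_health(text):
    """Return {volume: {plex: (kstate, state, [disks])}} parsed from vxprint -ht text.

    Assumed column order (as in the sample above):
      pl NAME VOLUME KSTATE STATE ...
      sd NAME PLEX   DISK   ...
    """
    plexes = {}                  # plex -> (volume, kstate, state)
    disks = defaultdict(list)    # plex -> disks backing its subdisks
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 5 and f[0] == "pl":
            plexes[f[1]] = (f[2], f[3], f[4])
        elif len(f) >= 4 and f[0] == "sd":
            disks[f[2]].append(f[3])
    volumes = defaultdict(dict)
    for plex, (vol, kstate, state) in plexes.items():
        volumes[vol][plex] = (kstate, state, disks[plex])
    return dict(volumes)


def single_healthy_plex(volumes):
    """Map each volume that has exactly one ENABLED/ACTIVE plex to (plex, disks)."""
    risky = {}
    for vol, plexes in volumes.items():
        healthy = [p for p, (k, s, _) in plexes.items() if (k, s) == ("ENABLED", "ACTIVE")]
        if len(healthy) == 1:
            risky[vol] = (healthy[0], plexes[healthy[0]][2])
    return risky


if __name__ == "__main__":
    for vol, (plex, on) in single_healthy_plex(plex_health(VXPRINT_HT)).items():
        print(f"{vol}: only healthy plex is {plex} on {', '.join(on)}")
```

For the sample this reports lv_recchunk4-02 on c2t2d0, the same disk that /var/adm/messages and iostat -E show with a bad block. Counting only ENABLED/ACTIVE plexes, rather than all plexes, is what exposes the risk.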
Device sd47 is c2t2d0. Unfortunately, the only ENABLED/ACTIVE submirror is on c2t2d0, so attaching the other (DETACHED/STALE) submirror back to the volume fails.
Conclusion: disk c2t2d0 has bad blocks and must be replaced.
Steps:
Understand the current state: the detached disk c1t2d0 is healthy, while c2t2d0, which the system is currently using, must be replaced. Because c2t2d0 has a bad block, c1t2d0 cannot be attached, and the data on c1t2d0 is incomplete.
Switch the services to the other machine (haswitch).
In format, use repair to fix block 15985403 on c2t2d0 until it reports OK.
vxplex -g querydg att lv_recchunk4 lv_recchunk4-01
Run vxprint -g querydg | more; continue only when every state is ACTIVE.
On the machine that owns querydg, run vxdiskadm, option 4, to remove c2t2d0 in software.
Shut the system down (the MultiPack does not support hot-plugging), physically remove c2t2d0 and insert the new disk in the same slot.
Boot the system. On the machine that owns querydg, run vxdiskadm, option 5, to add c2t2d0 back in software.
Run vxprint -g querydg | more; when every state is ACTIVE, the replacement is complete.

6 Replacing T3 power/cooling unit u1pcu2
Symptoms:
fru list:
ID      TYPE                VENDOR       MODEL         REVISION    SERIAL
------  ------------------  -----------  ------------  ----------  --------
u1ctr   controller card     SLR-MI       375-0084-02-  0210        032535
u1d1    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X6L8
u1d2    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X7XT
u1d3    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X7GA
u1d4    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X900
u1d5    disk drive          SEAGATE      ST173404FSUN  A727        3CE0XBLE
u1d6    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X5FH
u1d7    disk drive          SEAGATE      ST173404FSUN  A727        3CE0WV8S
u1d8    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X22P
u1d9    disk drive          SEAGATE      ST173404FSUN  A727        3CE0X8AN
u1l1    loop card           SLR-MI       375-0085-01-  5.02 Flash  054986
u1l2    loop card           SLR-MI       375-0085-01-  5.02 Flash  053718
u1pcu1  power/cooling unit  TECTROL-CAN  300-1454-01(  0000        021453
u1pcu2  power/cooling unit  TECTROL-CAN  300-1454-01(  0000        004116
u1mpn   mid plane           SLR-MI       370-3990-01-  0000        031354

fru stat:
CTLR    STATUS   STATE    ROLE        PARTNER  TEMP
------  -------  -------  ----------  -------  ----
u1ctr   ready    enabled  master      -        34.5

DISK    STATUS   STATE    ROLE        PORT1    PORT2    TEMP  VOLUME
------  -------  -------  ----------  -------  -------  ----  ------
u1d1    ready    enabled  data disk   ready    ready    30    v0
u1d2    ready    enabled  data disk   ready    ready    34    v0
u1d3    ready    enabled  data disk   ready    ready    33    v0
u1d4    ready    enabled  data disk   ready    ready    34    v0
u1d5    ready    enabled  data disk   ready    ready    31    v0
u1d6    ready    enabled  data disk   ready    ready    30    v0
u1d7    ready    enabled  data disk   ready    ready    33    v0
u1d8    ready    enabled  data disk   ready    ready    41    v0
u1d9    ready    enabled  standby     ready    ready    37    v0

LOOP    STATUS   STATE    MODE        CABLE1   CABLE2   TEMP
------  -------  -------  ----------  -------  -------  ----
u1l1    ready    enabled  master      -        -        29.0
u1l2    ready    enabled  slave       -        -        34.0

POWER   STATUS   STATE    SOURCE  OUTPUT  BATTERY  TEMP    FAN1    FAN2
------  -------  -------  ------  ------  -------  ------  ------  ------
u1pcu1  ready    enabled  line    normal  normal   normal  normal  normal
u1pcu2  ready    enabled  line    normal  fault    normal  normal  normal

T3 syslog:
Jan 26 16:11:05 LPCT[1]: N: u1pcu2: Refreshing battery
Jan 26 16:16:17 BATD[1]: N: u1pcu2: hold time was 314 seconds.
Jan 26 16:16:18 BATD[1]: W: u1pcu2: Replace battery, hold time low. serial no = 004116
Jan 26 18:00:04 SCHD[1]: N: u1ctr: u1l1 temperature 29.0 Celsius
Jan 26 18:00:04 SCHD[1]: N: u1ctr: u1l2 temperature 34.5 Celsius
Jan 27 00:00:02 SCHD[1]: N: u1ctr: u1l1 temperature 29.5 Celsius
Jan 27 00:00:02 SCHD[1]: N: u1ctr: u1l2 temperature 34.5 Celsius
Jan 27 06:00:00 SCHD[1]: N: u1ctr: u1l1 temperature 29.0 Celsius
Jan 27 06:00:00 SCHD[1]: N: u1ctr: u1l2 temperature 34.0 Celsius
Jan 27 07:16:16 BATD[1]: N: cur_time = 1012115776, StartRechargeTime = 1012061776
Jan 27 07:16:16 BATD[1]: N: u1pcu2: Battery recharge timeout.
Jan 27 07:16:16 BATD[1]: N: u1pcu2 Battery recharge timeout.
Jan 27 07:16:16 BATD[1]: N: Battery Refreshing cycle ends at this point
The LED on one power supply is red.
refresh -s:
No battery refreshing Task is currently running.
      PCU1    PCU2
      ------------
U1    Normal  BAT Low
Current Time   Tue Jan 29 20:42:47 2002
Last Refresh   Sat Jan 26 04:45:07 2002
Next Refresh   Sat Feb 23 04:45:07 2002
Conclusion: u1pcu2 is faulty and must be replaced.
Steps:
Agree on the downtime with the customer and check the system state.
Stop the applications that use the T3 and shut down the T3.
Disconnect the Fibre Channel cable from the T3.
Replace the T3 power/cooling unit u1pcu2.
Power on the T3.
Once the T3 is up, ping it from the terminal (if the ping fails, switch to the TPE of the other T3).
Check the T3 firmware version to see whether it needs updating.
If the T3 works normally, reconnect the Fibre Channel cable.
Restart the applications that use the T3.
Test: telnet to the server and run # vxdisk list and # format. If their outputs are inconsistent, reboot the server with # reboot -- -r.

7 Replacing an Ultra 5 system board
Symptoms:
During boot, the host cannot reach the boot server through its network port and reports a link-down error.
watch-net-all: resetting transceiver failed
Conclusion: replace the system board.
Steps:
Shut the system down.
Replace the system board.
Boot: OK.
Connect the network port to a hub or a notebook.
If watch-net-all succeeds, the repair is complete.

8 Replacing the disk of an Ultra 5 cluster client
Symptoms:
The system cannot start; boot fails with an ARP/RARP timeout.
probe-ide: Fast Data Access MMU Miss
Booting from CD-ROM and running format: no hard disk is found.
Conclusion: replace the Ultra 5's hard disk.
Steps:
Shut the system down.
Install Solaris.
Install the system patches.
Install the cluster client software.
Edit the cluster client configuration files /etc/serialports, /etc/clusters and /etc/hosts.
File formats:
/etc/serialports: node-name TC port
/etc/clusters: cluster-name node1-name node2-name
/etc/hosts: IP hostname

Repairing a file system
Symptoms:
The system cannot enter multi-user mode; init has no effect; the system prompts for the root password to enter maintenance mode.
Conclusion: the file system is damaged and must be repaired.
Steps:
boot -s or boot cdrom -s
fsck /dev/rdsk/c0t0d0s0 (it may ask y/n questions; normally answer y).
If fsck -m /dev/rdsk/c0t0d0s0 reports okay, the repair succeeded.
If fsck -m /dev/rdsk/c0t0d0s0 reports that it needs checking, the repair has not yet succeeded:
Run fsck /dev/rdsk/c0t0d0s0 again, answering N at the prompts.
Check again with fsck -m /dev/rdsk/c0t0d0s0.
Repair the other slices on the disk the same way.

Replacing a GBIC on an E5500
Symptoms:
Node 1: ncwgzxsa
format.out shows that node A has a dual-path connection to the A5200.
Apr 27 16:30:38 ncwgzxsa unix: ID[SUNWssa.socal.link.5010] socal1: port 0: Fibre Channel is OFFLINE
Apr 27 16:30:38 ncwgzxsa unix: ID[SUNWssa.socal.link.6010] socal1: port 0: Fibre Channel Loop is ONLINE
Apr 27 16:30:38 ncwgzxsa unix: ID[SUNWssa.socal.link.5010] socal1: port 0: Fibre Channel is OFFLINE
Apr 27 16:30:
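In the GBIC case, the socal messages show port 0 on socal1 dropping OFFLINE, coming back ONLINE and dropping again within the same second. On a long log, tallying these events per controller and port helps show which loop is unstable. A minimal Python sketch, assuming the message wording in the excerpt; `SOCAL_LOG` holds the complete socal lines from that excerpt, and `LINK_RE`, `link_events` and `flap_count` are hypothetical names.

```python
import re
from collections import defaultdict

# The complete socal lines from the GBIC case.
SOCAL_LOG = """\
Apr 27 16:30:38 ncwgzxsa unix: ID[SUNWssa.socal.link.5010] socal1: port 0: Fibre Channel is OFFLINE
Apr 27 16:30:38 ncwgzxsa unix: ID[SUNWssa.socal.link.6010] socal1: port 0: Fibre Channel Loop is ONLINE
Apr 27 16:30:38 ncwgzxsa unix: ID[SUNWssa.socal.link.5010] socal1: port 0: Fibre Channel is OFFLINE
"""

# Assumption: link messages read "socalN: port P: Fibre Channel [Loop ]is ONLINE|OFFLINE".
LINK_RE = re.compile(r"(socal\d+): port (\d+): Fibre Channel(?: Loop)? is (ONLINE|OFFLINE)")


def link_events(text):
    """Return {(controller, port): [state, ...]} in log order."""
    events = defaultdict(list)
    for line in text.splitlines():
        m = LINK_RE.search(line)
        if m:
            events[(m.group(1), int(m.group(2)))].append(m.group(3))
    return dict(events)


def flap_count(states):
    """Number of ONLINE/OFFLINE changes in a sequence of states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


if __name__ == "__main__":
    for (ctlr, port), states in link_events(SOCAL_LOG).items():
        print(f"{ctlr} port {port}: {states.count('OFFLINE')} OFFLINE, "
              f"{flap_count(states)} transitions")
```

A port that keeps toggling while the other path stays quiet points at the components on that one loop (GBIC, cable or host adapter port) rather than at the array as a whole.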
