
An Analysis of Data Corruption in the Storage Stack

Lakshmi N. Bairavasundaram, Garth R. Goodson, Bianca Schroeder, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
University of Wisconsin-Madison; Network Appliance, Inc.; University of Toronto
laksh, dusseau, , garth.goodson,

Abstract

An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this paper, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur the most. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.

1 Introduction

One of the biggest challenges in designing storage systems is providing the reliability and availability that users expect. Once their data is stored, users expect it to be persistent forever, and perpetually available. Unfortunately, in practice there are a number of problems that, if not dealt with, can cause data loss in storage systems.

One primary cause of data loss is disk drive unreliability [16]. It is well known that hard drives are mechanical, moving devices that can suffer from mechanical problems leading to drive failure and data loss. For example, media imperfections, and loose particles causing scratches, contribute to media errors, referred to as latent sector errors, within disk drives [18]. Latent sector errors are detected by a drive's internal error-correcting codes (ECC) and are reported to the storage system.

Less well known, however, is that current hard drives and controllers consist of hundreds-of-thousands of lines of low-level firmware code. This firmware code, along with higher-level system software, has the potential for harboring bugs that can cause a more insidious type of disk error: silent data corruption, where the data is silently corrupted with no indication from the drive that an error has occurred.

Silent data corruptions could lead to data loss more often than latent sector errors, since, unlike latent sector errors, they cannot be detected or repaired by the disk drive itself. Detecting and recovering from data corruption requires protection techniques beyond those provided by the disk drive. In fact, basic protection schemes such as RAID [13] may also be unable to detect these problems.

The most common technique used in storage systems to detect data corruption is for the storage system to add its own higher-level checksum for each disk block, which is validated on each disk block read. There is a long history of enterprise-class storage systems, including ours, in using checksums in a variety of manners to detect data corruption [3, 6, 8, 22]. However, as we discuss later, checksums do not protect against all forms of corruption. Therefore, in addition to checksums, our storage system also uses file system-level disk block identity information to detect previously undetectable corruptions.

In order to further improve on techniques to handle corruption, we need to develop a thorough understanding of data corruption characteristics. While recent studies provide information on whole disk failures [11, 14, 16] and latent sector errors [2] that can aid system designers in handling these error conditions, very little is known about data corruption, its prevalence and its characteristics. This paper presents a large-scale study of silent data corruption based on field data from 1.53 million disk drives covering a time period of 41 months. We use the same data set as the one used in recent studies of latent sector errors [2] and disk failures [11]. We identify the fraction of disks that develop corruption, examine factors that might affect the prevalence of corruption, such as disk class and age, and study characteristics of corruption, such as spatial and temporal locality. To the best of our knowledge, this is the first study of silent data corruption in production and development systems.

We classify data corruption into three categories based on how it is discovered: checksum mismatches, identity discrepancies, and parity inconsistencies (described in detail in Section 2.3). We focus on checksum mismatches since they are found to occur the most. Our important observations include the following:

(i) During the 41-month time period, we observe more than 400,000 instances of checksum mismatches, 8% of which were discovered during RAID reconstruction, creating the possibility of real data loss. Even though the rate of corruption is small, the discovery of checksum mismatches during reconstruction illustrates that data corruption is a real problem that needs to be taken into account by storage system designers.

(ii) We find that nearline (SATA) disks and their adapters develop checksum mismatches an order of magnitude more often than enterprise class (FC) disks. Surprisingly, enterprise class disks with checksum mismatches develop more of them than nearline disks with mismatches.

(iii) Checksum mismatches are not independent occurrences, both within a disk and within different disks in the same storage system.

(iv) Checksum mismatches have tremendous spatial locality; on disks with multiple mismatches, it is often consecutive blocks that are affected.

(v) Identity discrepancies and parity inconsistencies do occur, but affect 3 to 10 times fewer disks than checksum mismatches affect.

The rest of the paper is structured as follows. Section 2 presents the overall architecture of the storage systems used for the study and Section 3 discusses the methodology used. Section 4 presents the results of our analysis of checksum mismatches, and Section 5 presents the results for identity discrepancies and parity inconsistencies. Section 6 provides an anecdotal discussion of corruption, developing insights for corruption-proof storage system design. Section 7 presents related work and Section 8 provides a summary of the paper.

2 Storage System Architecture

The data we analyze is from tens-of-thousands of production and development Network Appliance™ storage systems (henceforth called the system) installed at hundreds of customer sites. This section describes the architecture of the system, its corruption detection mechanisms, and the classes of corruptions in our study.

2.1 Storage Stack

Physically, the system is composed of a storage-controller that contains the CPU, memory, network interfaces, and storage adapters. The storage-controller is connected to a set of disk shelves via Fibre Channel loops. The disk shelves house individual disk drives. The disks may either be enterprise class FC disk drives or nearline serial ATA (SATA) disks. Nearline drives use hardware adapters to convert the SATA interface to the Fibre Channel protocol. Thus, the storage-controller views all drives as being Fibre Channel (however, for the purposes of the study, we can still identify whether a drive is SATA or FC using its model type).

The software stack on the storage-controller is composed of the WAFL® file system, RAID, and storage layers. The file system processes client requests by issuing read and write operations to the RAID layer, which transforms the file system requests into logical disk block requests and issues them to the storage layer. The RAID layer also generates parity for writes and reconstructs data after failures. The storage layer is a set of customized device drivers that communicate with physical disks using the SCSI command set [23].

2.2 Corruption Detection Mechanisms

The system, like other commercial storage systems, is designed to handle a wide range of disk-related errors. The data integrity checks in place are designed to detect and recover from corruption errors so that they are not propagated to the user. The system does not knowingly propagate corrupt data to the user under any circumstance.

We focus on techniques used to detect silent data corruption, that is, corruptions not detected by the disk drive or any other hardware component. Therefore, we do not describe techniques used for other errors, such as transport corruptions reported as SCSI transport errors or latent sector errors. Latent sector errors are caused by physical problems within the disk drive, such as media scratches, "high-fly" writes, etc. [2, 18], and detected by the disk drive itself by its inability to read or write sectors, or through its error-correction codes (ECC).

In order to detect silent data corruptions, the system stores extra information to disk blocks. It also periodically reads all disk blocks to perform data integrity checks. We now describe these techniques in detail.

Corruption Class       Possible Causes                      Detection Mechanism                Detection Operation
Checksum mismatch      Bit-level corruption; torn write;    RAID block checksum                Any disk read
                       misdirected write
Identity discrepancy   Lost or misdirected write            File system-level block identity   File system read
Parity inconsistency   Memory corruption; lost write;       RAID parity mismatch               Data scrub
                       bad parity calculation

Table 1: Corruption classes summary.

[Figure 1 panels: (a) Format for enterprise class disks: a 4 KB file system data block and a 64-byte data integrity segment stored in 520-byte sectors. (b) Format for nearline disks: a 4 KB file system data block stored in 512-byte sectors, followed by a 512-byte sector holding the 64-byte data integrity segment and 448 bytes unused. (c) Structure of the data integrity segment (DIS): checksum of data block, identity of data block, ..., checksum of DIS.]

Figure 1: Data Integrity Segment. The figure shows the different on-disk formats used to store the data integrity segment of a disk block on (a) enterprise class drives with 520B sectors, and on (b) nearline drives with 512B sectors. The figure also shows (c) the structure of the data integrity segment. In particular, in addition to the checksum and identity information, this structure also contains a checksum of itself.

2.2.1 Data Integrity Segment

In order to detect disk block corruptions, the system writes a 64-byte data integrity segment along with each disk block. Figure 1 shows two techniques for storing this extra information, and also describes its structure. For enterprise class disks, the system uses 520-byte sectors. Thus, a 4-KB file system block is stored along with 64 bytes of data integrity segment in eight 520-byte sectors. For nearline disks, the system uses the default 512-byte sectors and stores the data integrity segment for each set of eight sectors in the following sector. We find that the protection offered by the data integrity segment is well worth the extra space needed to store it.

One component of the data integrity segment is a checksum of the entire 4 KB file system block. The checksum is validated by the RAID layer whenever the data is read. Once a corruption has been detected, the original block can usually be restored through RAID reconstruction. We refer to corruptions detected by RAID-level checksum validation as checksum mismatches.

A second component of the data integrity segment is block identity information. In this case, the fact that the file system is part of the storage system is utilized. The identity is the disk block's identity within the file system (e.g., this block belongs to inode 5 at offset 100). This identity is cross-checked at file read time to ensure that the block being read belongs to the file being accessed. If, on file read, the identity does not match, the data is reconstructed from parity. We refer to corruptions that are not detected by checksums, but detected through file system identity validation, as identity discrepancies.

2.2.2 Data Scrubbing

In order to proactively detect errors, the RAID layer periodically scrubs all disks. A data scrub issues read operations for each physical disk block, computes a checksum over its data, and compares the computed checksum to the checksum located in its data integrity segment. If the checksum comparison fails (i.e., a checksum mismatch), the data is reconstructed from other disks in the RAID group, after those checksums are also verified. If no reconstruction is necessary, the parity of the data blocks is generated and compared with the parity stored in the parity block. If the parity does not match the verified data, the scrub process fixes the parity by regenerating it from the data blocks. In a system protected by double parity, it is possible to definitively tell which of the parity or data block is corrupt.

We refer to these cases of mismatch between data and parity as parity inconsistencies. Note that data scrubs are unable to validate the extra file system identity information stored in the data integrity segment, since, by its
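
The data integrity segment checks of Section 2.2.1 can be illustrated with a minimal sketch. The text specifies only that the 64-byte segment holds a checksum of the 4 KB block, the block's identity, and a checksum of the segment itself; the field widths, the use of CRC-32 (Python's zlib.crc32), and the names build_dis and verify_block are assumptions made for this example, not the system's actual on-disk format. The sketch also folds the RAID-level checksum check and the file system-level identity check into one function for brevity.

```python
import struct
import zlib

BLOCK_SIZE = 4096   # 4 KB file system block
DIS_SIZE = 64       # data integrity segment size, per the text

# Sector arithmetic from the text.
# Enterprise class: eight 520-byte sectors hold the block plus the segment.
assert 8 * 520 == BLOCK_SIZE + DIS_SIZE
# Nearline: eight 512-byte data sectors plus one more sector, 448 bytes unused.
assert 9 * 512 == BLOCK_SIZE + DIS_SIZE + 448

# Hypothetical layout: block checksum, inode, offset, padding, segment checksum.
_BODY = struct.Struct("<IQQ40x")   # 4 + 8 + 8 + 40 = 60 bytes
_TAIL = struct.Struct("<I")        # 4 bytes, checksum of the 60-byte body


def build_dis(data: bytes, inode: int, offset: int) -> bytes:
    """Build a 64-byte segment for one block (illustrative layout)."""
    body = _BODY.pack(zlib.crc32(data), inode, offset)
    return body + _TAIL.pack(zlib.crc32(body))


def verify_block(data: bytes, dis: bytes, inode: int, offset: int) -> str:
    """Classify a read as 'ok', 'checksum mismatch' or 'identity discrepancy'."""
    body, tail = dis[:_BODY.size], dis[_BODY.size:]
    (dis_crc,) = _TAIL.unpack(tail)
    block_crc, stored_inode, stored_offset = _BODY.unpack(body)
    # Assumption: a damaged segment is reported like a damaged block.
    if zlib.crc32(body) != dis_crc or zlib.crc32(data) != block_crc:
        return "checksum mismatch"
    # The checksum is valid, so only the identity can reveal a block that
    # was written intact but to the wrong place (or never overwritten).
    if (stored_inode, stored_offset) != (inode, offset):
        return "identity discrepancy"
    return "ok"


block = bytes(BLOCK_SIZE)
dis = build_dis(block, inode=5, offset=100)
print(verify_block(block, dis, 5, 100))                  # ok
print(verify_block(b"\x01" + block[1:], dis, 5, 100))    # checksum mismatch
print(verify_block(block, dis, 7, 100))                  # identity discrepancy
```

The last call shows why the identity field matters: the block and its checksum agree with each other, so only the comparison against the identity the file system expects exposes the problem.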
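
The scrub procedure of Section 2.2.2 can be sketched for a single stripe. The sketch assumes single XOR parity and CRC-32 checksums, neither of which the text specifies, and it does not model a checksum on the parity block itself; the names xor_blocks and scrub_stripe are hypothetical.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(data_blocks, checksums, parity):
    """Scrub one stripe; return (events, parity).

    data_blocks: list of bytes, one per data disk (repaired in place)
    checksums:   stored checksum of each data block
    parity:      stored parity block (assumed XOR of the data blocks)
    """
    events = []
    # Step 1: recompute every block checksum and compare with the stored one.
    bad = [i for i, blk in enumerate(data_blocks)
           if zlib.crc32(blk) != checksums[i]]
    if len(bad) > 1:
        raise ValueError("single parity cannot rebuild more than one block")
    if bad:
        # Step 2: rebuild the failing block from the verified blocks and parity.
        i = bad[0]
        others = [blk for j, blk in enumerate(data_blocks) if j != i]
        data_blocks[i] = xor_blocks(others + [parity])
        events.append(("checksum mismatch", i))
    elif xor_blocks(data_blocks) != parity:
        # Step 3: all data verified but parity disagrees, so regenerate parity.
        parity = xor_blocks(data_blocks)
        events.append(("parity inconsistency", None))
    return events, parity
```

With single parity the sketch must trust the verified data blocks when parity disagrees, which is why it regenerates parity rather than data; the text notes that double parity makes it possible to tell which of the two is actually corrupt.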
