Parallel and Distributed Systems
Chapter 14: Replication and Fault Tolerance
Instructor: Zhang Weizhe (张伟哲)
Computer Network and Information Security Technique Research Center, School of Computer Science and Technology, Harbin Institute of Technology

Outline
- Fault-tolerant services
- Replication services
- Highly available services
- Summary

Fault Tolerance Basic Concepts
- Being fault tolerant is strongly related to what are called dependable systems.
- Dependability implies the following:
  - Availability
  - Reliability
  - Safety
  - Maintainability

Failure Models
- Different types of failures.

Failure Masking by Redundancy
- Figure 8-2. Triple modular redundancy.

Flat Groups versus Hierarchical Groups
- (a) Communication in a flat group.
- (b) Communication in a simple hierarchical group.

Agreement in Faulty Systems (1)
- The Byzantine agreement problem for three nonfaulty processes and one faulty process.
- (a) Each process sends its value to the others.

Agreement in Faulty Systems (2)
- The Byzantine agreement problem for three nonfaulty processes and one faulty process.
- (b) The vectors that each process assembles based on (a).
- (c) The vectors that each process receives in step 3.

Agreement in Faulty Systems (3)
- Now with two correct processes and one faulty process.
- With m faulty processes, agreement is possible only if at least 2m+1 correct processes are present.

RPC Semantics in the Presence of Failures
- Five different classes of failures can occur in RPC systems:
  - The client is unable to locate the server.
  - The request message from the client to the server is lost.
  - The server crashes after receiving a request.
  - The reply message from the server to the client is lost.
  - The client crashes after sending a request.

Basic Reliable-Multicasting Schemes
- A simple solution to reliable multicasting when all receivers are known and are assumed not to fail.
- (a) Message transmission.
- (b) Reporting feedback.

Nonhierarchical Feedback Control
- Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of the others.

Hierarchical Feedback Control
- The essence of hierarchical reliable multicasting: each local coordinator forwards the message to its children and later handles retransmission requests.

Replication Basic Concepts
- Replication is a key technology for enhancing services.
- Replication of data: maintenance of copies of data at multiple computers.
- Goals:
  - Enhanced performance
  - Increased availability
  - Fault tolerance
- Some potential requirements:
  - Replication transparency
  - Consistency: if a copy is modified, how and when the others are updated determines "the price of replication".

Replication Basic Concepts (continued)
- Simple math: with two independent servers, each with a 5% chance of failing, the availability is
  1 - P(all failed) = 1 - 0.05 × 0.05 = 1 - 0.25% = 99.75%
- Difference between replication and caching? A cache does not necessarily hold all objects of interest.

Replication system model
- A basic architectural model.
- Replica manager (RM):
  - One replica manager per replica.
  - Receives front-end requests and applies operations to its replicas atomically.
- Front end (FE):
  - One front end per client.
  - Receives client requests and communicates with the replica managers by message passing.

An operation executed on a replicated object
- Request: the front end issues the request to one or more replica managers.
- Coordination: the replica managers coordinate in preparation for executing the request consistently.
  - Different orderings
- Execution: the replica managers execute the request (perhaps tentatively).
- Agreement: the replica managers reach consensus on the effect of the request.
- Response: one or more replica managers respond to the front end.

Passive (primary-backup) replication
- The architecture:
  - One primary replica manager and one or more secondary replica managers (backups).
  - When the primary replica manager fails, one of the backups is promoted to act as the primary.

The sequence of events when a client issues a request
- Request: the front end issues the request, containing a unique identifier, to the primary replica manager.
- Coordination: the primary takes each request atomically, in the order in which it receives it.
- Execution: the primary executes the request and stores the response.

The sequence of events when a client issues a request (2)
- Agreement: if the request is an update, the primary sends the updated state, the response, and the unique identifier to all the backups. The backups send an acknowledgement.
- Response: the primary responds to the front end, which hands the response back to the client.

Active replication
- The architecture: the front end multicasts each request to the replica managers.

Active replication scheme
- Request: the front end attaches a unique identifier to the request and multicasts it to the group of replica managers, using a totally ordered, reliable multicast primitive.
- Coordination: the group communication system delivers the request to every correct replica manager in the same order.
- Execution: every replica manager executes the request.
- Agreement: none needed.
- Response: each replica manager sends its response to the front end.

Active replication performance
- Achieves sequential consistency.
  - Reliable multicast: all correct replica managers process the same set of requests.
  - Total order: all correct replica managers process requests in the same order.
  - FIFO order: maintained by each front end.
- No linearizability: the total order is not necessarily the same as the real-time order.

High availability vs. fault tolerance
- Fault tolerance:
  - "Eager" consistency.
  - All replicas reach agreement before control is passed back to the client.
- High availability:
  - "Lazy" consistency.
  - Reaching consistency is deferred until the next access.
  - Agreement is reached after control is passed back to the client.
  - Examples: Gossip, Bayou, Coda.

The gossip architecture
- The architecture:
  - The front end connects to any replica manager.
  - Operations are queries and updates.
  - Replica managers exchange "gossip" messages periodically to maintain consistency.
- Two guarantees:
  - Each client obtains a consistent service over time.
  - Relaxed consistency between replicas: all replica managers eventually receive all updates, and they apply updates with ordering guarantees.

Queries and updates in a gossip service
- Request: the front end sends the request to a replica manager.
  - Query: the client may be blocked.
  - Update: the client is not blocked.
- Update response: the replica manager replies immediately.
- Coordination:
  - The request is suspended until it can be applied.
  - The replica manager may receive gossip messages sent from other replica managers.

Queries and updates in a gossip service (continued)
- Execution: the replica manager executes the request.
- Query response: the reply is sent at this point.
- Agreement:
  - Replica managers exchange gossip messages, which contain the most recent updates applied on the replica.
  - Exchanges happen occasionally.
  - A replica manager that finds it has missed an update asks a particular replica manager to send it.

Gossip messages
- Exchanging gossip messages:
  - A replica manager estimates which messages another replica manager has missed from its timestamp table.
  - Gossip messages are exchanged periodically or when some other replica manager asks.
- The format of a gossip message m:
  - m.log: one or more updates in the source replica manager's log.
  - m.ts: the replica timestamp of the source replica manager.

Update propagation
- How often to exchange gossip messages?
  - Minutes, hours, or days.
  - Depends on the requirements of the application.
- How to choose partners to exchange with?
  - Random.
  - Deterministic: use a simple function of the replica manager's state to choose the partner.
  - Topological: mesh, circle, tree.

The Coda file system
- Limits of AFS: read-only replicas.
- The objective of Coda: constant data availability.
- Coda extends AFS with:
  - Read-write replicas.
  - An optimistic strategy to resolve conflicts.
  - Disconnected operation.

The Coda architecture
- Venus/Vice:
  - Vice: replica manager.
  - Venus: hybrid of front end and replica manager.
- Volume storage group (VSG): the set of servers holding replicas of a file volume.
- Available volume storage group (AVSG): Vice knows the AVSG of each file.
- Accessing a file: the file is serviced by any server in the AVSG.

Summary
- Replication for distributed systems: high performance, high availability, fault tolerance.
- Replication for fault tolerance: Primary-backup rep
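
Worked example: triple modular redundancy. The "Failure Masking by Redundancy" slide only names the figure, so the following is a minimal sketch of the voter idea behind it, assuming three redundant units that each produce a hashable output. The function name `tmr_vote` is hypothetical and does not come from the slides.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the output produced by at least two of the three units.

    Assumption: outputs are hashable, and a single faulty unit may
    return any value at all.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        # All three units disagree, so the fault cannot be masked.
        raise ValueError("no majority: all three units disagree")
    return value
```

Calling `tmr_vote(1, 1, 0)` masks the single wrong output and returns `1`; three different outputs raise an error, since one voter can mask at most one faulty unit.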
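
Worked example: Byzantine agreement. The "Agreement in Faulty Systems" slides describe processes exchanging values, assembling vectors, exchanging those vectors, and then deciding. The sketch below simulates those steps under some loud assumptions: the faulty process sends a different made-up string to every receiver, each correct process takes a per-element majority over the vectors it received, and `None` stands for "unknown". The names `byzantine_agreement` and `majority` are hypothetical.

```python
from collections import Counter

UNKNOWN = None


def majority(values):
    """Return the value held by more than half of the entries, else UNKNOWN."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else UNKNOWN


def byzantine_agreement(values, faulty):
    """Simulate the exchange for processes 0..n-1; `faulty` is a set of indices.

    Returns the decision vector of every correct process.
    """
    n = len(values)
    correct = [p for p in range(n) if p not in faulty]

    # Step (a): every process sends its value to the others.
    # Assumption: a faulty process tells each receiver a different lie.
    def sent_value(sender, receiver):
        if sender in faulty:
            return f"lie-{sender}-to-{receiver}"
        return values[sender]

    # Step (b): each correct process assembles a vector from what it heard.
    vectors = {
        p: [values[p] if q == p else sent_value(q, p) for q in range(n)]
        for p in correct
    }

    # Step (c): every process sends its vector to the others.
    def sent_vector(sender, receiver):
        if sender in faulty:
            return [f"junk-{sender}-to-{receiver}-{k}" for k in range(n)]
        return vectors[sender]

    # Final step: per-element majority over the received vectors.
    decisions = {}
    for p in correct:
        received = [sent_vector(q, p) for q in range(n) if q != p]
        decisions[p] = [majority([vec[k] for vec in received]) for k in range(n)]
    return decisions
```

With four processes and one faulty (`byzantine_agreement([1, 2, 3, 4], {2})`), every correct process decides `[1, 2, None, 4]`. With only two correct processes and one faulty (`byzantine_agreement([1, 2, 3], {2})`), no element reaches a majority, which illustrates why m faulty processes require at least 2m+1 correct ones.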
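
Worked example: availability arithmetic. The "simple math" on the Replication Basic Concepts slide generalizes directly to n replicas, assuming the servers fail independently and the service is available as long as at least one replica is up. The function name `availability` is hypothetical.

```python
def availability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent servers is up.

    Assumption: failures are independent and each server fails with
    probability `p_fail`.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability in [0, 1]")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # 1 - P(all replicas failed)
    return 1.0 - p_fail ** replicas
```

For the slide's case of two servers with a 5% failure chance each, `availability(0.05, 2)` evaluates to approximately 0.9975, that is 99.75%.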
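
Worked example: passive (primary-backup) replication. The request, coordination, execution, agreement, and response phases listed on the passive replication slides can be sketched in a single process as follows. This is an illustration under stated assumptions, not the course's implementation: replicas are in-memory objects rather than networked servers, the state is a key-value dictionary, the unique identifier is used to detect re-sent requests, and the names `Primary`, `Backup`, and `promote` are hypothetical.

```python
class Backup:
    """A secondary replica manager that mirrors the primary's state."""

    def __init__(self):
        self.state = {}
        self.responses = {}

    def apply_update(self, request_id, state, response):
        # Store the updated state, the response, and the identifier.
        self.state = dict(state)
        self.responses[request_id] = response
        return True  # acknowledgement


class Primary:
    """The primary replica manager; handles one request at a time."""

    def __init__(self, backups):
        self.state = {}
        self.responses = {}
        self.backups = backups

    def handle(self, request_id, op, key, value=None):
        # Coordination: a request seen before just gets its stored response.
        if request_id in self.responses:
            return self.responses[request_id]
        # Execution: run the request and store the response.
        if op == "write":
            self.state[key] = value
            response = "ok"
        elif op == "read":
            response = self.state.get(key)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.responses[request_id] = response
        # Agreement: only updates are pushed to the backups.
        if op == "write":
            acks = [b.apply_update(request_id, self.state, response)
                    for b in self.backups]
            if not all(acks):
                raise RuntimeError("a backup did not acknowledge the update")
        # Response: returned to the front end.
        return response


def promote(backup, remaining_backups):
    """Build a new primary from a backup after the old primary fails."""
    new_primary = Primary(remaining_backups)
    new_primary.state = dict(backup.state)
    new_primary.responses = dict(backup.responses)
    return new_primary
```

Because backups keep the stored responses as well as the state, a front end that retries a request against a promoted backup gets the original response instead of executing the update twice.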
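
Worked example: gossip messages. The gossip slides describe a message m with two fields, m.log and m.ts. The sketch below shows one way such an exchange could look. It is heavily simplified and rests on assumptions the slides do not state: each replica manager sends its whole log, updates are identified by (origin, sequence number), timestamps are merged by element-wise maximum, and the causal-ordering and stability checks of a real gossip service are left out. The names `ReplicaManager` and `GossipMessage` are hypothetical.

```python
class GossipMessage:
    """m.log: updates from the sender's log; m.ts: the sender's replica timestamp."""

    def __init__(self, log, ts):
        self.log = log
        self.ts = ts


class ReplicaManager:
    def __init__(self, index, n):
        self.index = index      # position of this manager in the timestamp vector
        self.ts = [0] * n       # replica timestamp: updates seen per origin
        self.log = {}           # (origin, sequence number) -> update payload

    def update(self, payload):
        # Accept an update from a front end and record it in the log.
        self.ts[self.index] += 1
        self.log[(self.index, self.ts[self.index])] = payload

    def make_gossip(self):
        # Simplification: send the whole log rather than only missed updates.
        return GossipMessage(log=list(self.log.items()), ts=list(self.ts))

    def receive_gossip(self, m):
        # Merge log records this manager has not seen yet.
        for key, payload in m.log:
            self.log.setdefault(key, payload)
        # Merge timestamps element-wise.
        self.ts = [max(mine, theirs) for mine, theirs in zip(self.ts, m.ts)]

    def is_behind(self, other_ts):
        # True if the other manager has seen updates this one has missed.
        return any(mine < theirs for mine, theirs in zip(self.ts, other_ts))
```

After two managers each accept one update and gossip in both directions, their logs and timestamps are identical, which corresponds to the slides' guarantee that all replica managers eventually receive all updates.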