Detecting, Managing, and Diagnosing Failures with FUSE
John Dunagan, Juhan Lee (MSN), Alec Wolman
WIP

Goals & Target Environment
- Improve the ability of large internet portals to gain insight into failures
- Non-goals:
  - masking failures
  - using machine learning to infer abnormal behavior

MSN Background
- Messenger, Hotmail, Search, many other “properties”
- Large (100 million users)
- Sources of complexity:
  - multiple data centers
  - large number of machines
  - complex internal network topology
  - diversity of applications and software infrastructure

The Plan
- Detecting, managing, and diagnosing failures
- Review MSN's current approaches
- Describe our solution at a high level

Detecting Failures
- Monitor system availability with heartbeats
- Monitor application availability and quality of service using synthetic requests
- Customer complaints
  - Telephone, email
- Problems:
  - These approaches provide limited coverage: it is harder to catch failures that don't affect every request
  - Data on detected failures often lacks the detail needed to suggest a remedy:
    - which front end is flaky?
    - which app component caused the end-user failure?

Managing Failures
- Definition:
  - Ability to prioritize failures
  - Detecting component service degradation
  - Characterizing app stability
  - Capacity planning
- When server “x” fails, what is the impact of this failure?
- Better use of ops and engineering resources
- Current approach: no systematic attempt to provide this functionality

Our solution (in 2 steps)
- Detecting and managing failures
- Step 1: Instrument applications to track user requests across the “service chain”
  - Each request is tagged with a unique id
  - The service chain is composed on the fly with the help of app instrumentation
- For each request:
  - Collect per-hop performance information
  - Collect per-request failure status
- Centralized data collection

What kinds of failures?
- We can handle:
  - Machine failures
  - Network connectivity problems
- Most:
  - Misconfiguration
  - Application bugs
- But not all:
  - Application errors where the app itself doesn't detect that there is a problem

Diagnosing Failures
- Assigning responsibility to a specific hardware or software component
  - Insight into the internals of a component
  - Cross-component interactions
- Current approach: instrument applications
  - App-specific log messages
- Problems:
  - High request rates lead to log rollover
  - Perceived overhead means detailed logging is enabled during testing but disabled in production

FUSE Background
- FUSE (OSDI 2004): lightweight agreement on only one thing: whether or not a failure has occurred
- Lack of a positive ack is treated as a failure

Step 2: Conditional Logging
- Step 2: Implement “conditional logging” to significantly reduce the overhead of collecting detailed logs across different machines in the service chain
- Step 1 provides the ability to identify a request across all participants in the service chain; FUSE provides agreement on failure status across that chain
- While the fate of a request is undecided:
  - Detailed log messages are stored in main memory
  - The common-case overhead of logging is vastly reduced
- Once the fate of the service chain is decided, we discard app logs for successful requests and save logs for failures
- The quantity of data generated is manageable when most requests are successful

Example
- Benefits:
  - FUSE allows monitoring of real transactions
    - All transactions, or a sampled subset to control overhead
  - When a request fails, FUSE provides an audit trail
    - How far did it get?
    - How long did each step take?
    - Any additional application-specific context
  - FUSE can be deployed incrementally

Issues
- Overload policy: need to handle bursts of failures without inducing more failures
- How much effort does it take to make apps FUSE-enabled?
- Are the right components FUSE-enabled?
- Identifying and filtering false positives
- Tracking request flow is non-trivial with network load balancers

Status
- We've implemented FUSE for MSN, integrated with the ASP.NET rendering engine
- Testing in progress
- Roll-out at end of summer

Backups

FUSE is Easy to Integrate
- Example current code on Front End:
    ReceiveRequestFromClient() SendRequestToBackEnd();
- Example code on F
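
The conditional logging idea from Step 2 can be illustrated with a minimal single-process sketch. It is not the FUSE implementation: the class `ConditionalLogger` and its methods are hypothetical names chosen for illustration, the request id is generated locally, and the failure decision is passed in directly, whereas in the talk that decision comes from FUSE agreement across the machines in the service chain.

```python
import uuid


class ConditionalLogger:
    """Hypothetical sketch: buffer detailed logs in memory per request,
    then keep them only if the request is later declared failed."""

    def __init__(self):
        self._pending = {}  # request_id -> list of (hop, message), fate undecided
        self.saved = {}     # request_id -> list of (hop, message), failed requests

    def new_request(self):
        # Step 1: tag each request with a unique id.
        request_id = uuid.uuid4().hex
        self._pending[request_id] = []
        return request_id

    def log(self, request_id, hop, message):
        # While the fate is undecided, messages stay in main memory only.
        self._pending.setdefault(request_id, []).append((hop, message))

    def resolve(self, request_id, succeeded):
        # Once the fate is decided: discard logs on success, save them on failure.
        messages = self._pending.pop(request_id, [])
        if not succeeded:
            self.saved[request_id] = messages

    def pending_count(self):
        return len(self._pending)


logger = ConditionalLogger()

ok = logger.new_request()
logger.log(ok, "front-end", "received request")
logger.log(ok, "back-end", "query completed")
logger.resolve(ok, succeeded=True)    # logs are dropped

bad = logger.new_request()
logger.log(bad, "front-end", "received request")
logger.log(bad, "back-end", "timeout waiting for storage")
logger.resolve(bad, succeeded=False)  # logs are kept as an audit trail
```

In this sketch only the failed request leaves anything behind, which is why the volume of retained data stays small when most requests succeed. A real deployment would also need the overload policy mentioned under Issues, for example a cap on buffered messages per request.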