Detecting, Managing, and Diagnosing Failures with FUSE
John Dunagan, Juhan Lee (MSN), Alec Wolman
WIP

Goals & Target Environment
- Improve the ability of large Internet portals to gain insight into failures
- Non-goals:
  - masking failures
  - using machine learning to infer abnormal behavior

MSN Background
- Messenger, Hotmail, Search, and many other "properties"
- Large (100 million users)
- Sources of complexity:
  - multiple data centers
  - large number of machines
  - complex internal network topology
  - diversity of applications and software infrastructure

The Plan
- Detecting, managing, and diagnosing failures:
  - Review MSN's current approaches
  - Describe our solution at a high level

Detecting Failures
- Monitor system availability with heartbeats
- Monitor application availability and quality of service using synthetic requests
- Customer complaints
  - telephone, email
- Problems:
  - These approaches provide limited coverage
    - it is harder to catch failures that don't affect every request
  - Data on detected failures often lacks the detail needed to suggest a remedy:
    - Which front end is flaky?
    - Which app component caused the end-user failure?

Managing Failures
- Definition: the ability to prioritize failures
  - Detect component service degradation
  - Characterize application stability
  - Capacity planning
- When server "x" fails, what is the impact of this failure?
- Better use of ops and engineering resources
- Current approach: no systematic attempt to provide this functionality

Our solution (in 2 steps)
- Detecting and managing failures
- Step 1: Instrument applications to track user requests across the "service chain"
  - Each request is tagged with a unique ID
  - The service chain is composed on the fly with the help of app instrumentation
  - For each request:
    - Collect per-hop performance information
    - Collect per-request failure status
  - Centralized data collection

What kinds of failures?
- We can handle:
  - Machine failures
  - Network connectivity problems
- Most:
  - Misconfiguration
  - Application bugs
- But not all:
  - Application errors where the app itself doesn't detect that there is a problem

Diagnosing Failures
- Assigning responsibility to a specific hardware or software component
  - Insight into the internals of a component
  - Cross-component interactions
- Current approach: instrument applications
  - App-specific log messages
- Problems:
  - High request rates lead to log rollover
  - Perceived overhead means detailed logging is enabled during testing but disabled in production

FUSE Background
- FUSE (OSDI 2004): lightweight agreement on only one thing: whether or not a failure has occurred
- Lack of a positive ack = failure

Step 2: Conditional Logging
- Step 2: Implement "conditional logging" to significantly reduce the overhead of collecting detailed logs across the different machines in the service chain
- Step 1 provides the ability to identify a request across all participants in the service chain; FUSE provides agreement on failure status across that chain
- While the request's fate is undecided:
  - Detailed log messages are stored in main memory
  - Common-case logging overhead is vastly reduced
- Once the fate of the service chain is decided, we discard app logs for successful requests and save logs for failures
  - The quantity of data generated is manageable when most requests are successful

Example
- Benefits:
  - FUSE allows monitoring of real transactions
    - all transactions, or a sampled subset to control overhead
  - When a request fails, FUSE provides an audit trail:
    - How far did it get?
    - How long did each step take?
    - Any additional application-specific context
  - FUSE can be deployed incrementally

Issues
- Overload policy: need to handle bursts of failures without inducing more failures
- How much effort does it take to make apps FUSE-enabled?
- Are the right components FUSE-enabled?
- Identifying and filtering false positives
- Tracking request flow is non-trivial with network load balancers

Status
- We've implemented FUSE for MSN, integrated with the ASP.NET rendering engine
- Testing in progress
- Roll-out at end of summer

Backups

FUSE is Easy to Integrate
- Example current code on Front End:
    ReceiveRequestFromClient()
    SendRequestToBackEnd();
- Example code on F
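The Step 1 instrumentation (tag each request with a unique ID, then record per-hop timing and per-request failure status) can be illustrated with a minimal, single-process Python sketch. The names `RequestTrace`, `Hop`, `enter`, `leave`, and `audit_trail` are hypothetical and do not come from the MSN implementation. A real deployment would also need to carry the request ID between machines and ship each record to the centralized collector. This sketch assumes that propagation happens elsewhere and does not show it.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hop:
    """One participant in the service chain (hypothetical record layout)."""
    component: str
    start: float
    end: Optional[float] = None  # stays None if the hop never completed
    ok: Optional[bool] = None    # per-hop success/failure status


@dataclass
class RequestTrace:
    """Per-request record, tagged with a unique ID as in Step 1."""
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    hops: list[Hop] = field(default_factory=list)

    def enter(self, component: str, now: Optional[float] = None) -> Hop:
        # The chain is built on the fly: each component appends itself when reached.
        hop = Hop(component, time.monotonic() if now is None else now)
        self.hops.append(hop)
        return hop

    def leave(self, hop: Hop, ok: bool, now: Optional[float] = None) -> None:
        # Record when the component finished and whether it succeeded.
        hop.end = time.monotonic() if now is None else now
        hop.ok = ok

    def audit_trail(self) -> list[tuple[str, Optional[float], Optional[bool]]]:
        # (component, duration or None, status) for every hop that was reached.
        return [
            (h.component, None if h.end is None else h.end - h.start, h.ok)
            for h in self.hops
        ]


# Usage: explicit timestamps (in ms) keep the example deterministic.
trace = RequestTrace()
front_end = trace.enter("front-end", now=0.0)
back_end = trace.enter("back-end", now=10.0)
trace.leave(back_end, ok=False, now=50.0)
# front-end has no duration yet; back-end took 40.0 ms and failed.
print(trace.request_id, trace.audit_trail())
```

A hop whose duration is `None` never completed. Reading the trail from the front shows how far the request got and how long each reached step took.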
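The rule on the "FUSE Background" slide, that the lack of a positive ack counts as failure, can be modeled for a single request as follows. This is an illustrative Python model only, not the OSDI 2004 FUSE protocol or its API. The class `AckGroup`, the fixed `deadline`, and the choice to ignore late acks are all assumptions. A participant that crashes, becomes unreachable, or detects its own error simply never acks, and the request is judged failed once the deadline passes.

```python
from enum import Enum


class Fate(Enum):
    UNDECIDED = "undecided"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class AckGroup:
    """Hypothetical model of FUSE-style outcome agreement for one request."""

    def __init__(self, participants, deadline):
        self._pending = set(participants)  # participants that still owe a positive ack
        self._deadline = deadline          # absolute time by which every ack must arrive

    def ack(self, participant, now):
        # Late acks are ignored, so a request judged failed never flips to success.
        if now < self._deadline:
            self._pending.discard(participant)

    def fate(self, now):
        if not self._pending:
            return Fate.SUCCEEDED
        if now >= self._deadline:
            # Silence is not success: any missing ack means the request failed.
            return Fate.FAILED
        return Fate.UNDECIDED


# Usage: the back end never acks (crash, network partition, or detected error).
group = AckGroup({"front-end", "back-end"}, deadline=100)
group.ack("front-end", now=10)
print(group.fate(now=50))   # Fate.UNDECIDED
print(group.fate(now=100))  # Fate.FAILED
```

Treating silence as failure means no participant has to diagnose *why* another one went quiet. That is what keeps the agreement lightweight.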
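The conditional logging scheme on the "Step 2: Conditional Logging" slide can be sketched as an in-memory buffer keyed by request ID. In this minimal Python illustration, `ConditionalLogger`, `log`, and `resolve` are hypothetical names. The `resolve` method stands in for the outcome notification that FUSE would deliver to each machine, and the sink can be any writable text stream. The sketch deliberately omits the overload policy listed under "Issues", so memory grows with the number of undecided requests.

```python
import io
from collections import defaultdict


class ConditionalLogger:
    """Hypothetical sketch: buffer logs per request, persist them only on failure."""

    def __init__(self, sink):
        self._sink = sink                  # durable destination, e.g. an open log file
        self._buffers = defaultdict(list)  # request ID -> log lines held in main memory

    def log(self, request_id, message):
        # While the request's fate is undecided, nothing is written to the sink.
        self._buffers[request_id].append(message)

    def resolve(self, request_id, failed):
        # Called once the outcome is known; returns how many lines were persisted.
        lines = self._buffers.pop(request_id, [])
        if not failed:
            return 0  # success: the detailed log is discarded
        for line in lines:
            self._sink.write(f"{request_id} {line}\n")
        return len(lines)


# Usage: two requests log the same steps, but only the failed one is kept.
sink = io.StringIO()
logger = ConditionalLogger(sink)
logger.log("r1", "fetched profile")
logger.log("r2", "fetched profile")
logger.log("r2", "back end timed out")
logger.resolve("r1", failed=False)
logger.resolve("r2", failed=True)
print(sink.getvalue(), end="")
# r2 fetched profile
# r2 back end timed out
```

A successful request costs only buffer memory, which is freed when its outcome arrives. When most requests succeed, the persisted volume therefore tracks the failure rate rather than the request rate.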