Detecting, Managing, and Diagnosing Failures with FUSE
John Dunagan, Juhan Lee (MSN), Alec Wolman
WIP

Goals & Target Environment
- Improve the ability of large internet portals to gain insight into failures
- Non-goals:
  - Masking failures
  - Using machine learning to infer abnormal behavior

MSN Background
- Messenger, Hotmail, Search, and many other “properties”
- Large (100 million users)
- Sources of complexity:
  - Multiple data centers
  - Large number of machines
  - Complex internal network topology
  - Diversity of applications and software infrastructure

The Plan
- Detecting, managing, and diagnosing failures
  - Review MSN's current approaches
  - Describe our solution at a high level

Detecting Failures
- Monitor system availability with heartbeats
- Monitor application availability and quality of service using synthetic requests
- Customer complaints
  - Telephone, email
- Problems:
  - These approaches provide limited coverage: it is harder to catch failures that don't affect every request
  - Data on detected failures often lacks the detail needed to suggest a remedy:
    - Which front end is flaky?
    - Which app component caused the end-user failure?

Managing Failures
- Definition:
  - Ability to prioritize failures
  - Detecting component service degradation
  - Characterizing app stability
  - Capacity planning
- When server “x” fails, what is the impact of this failure?
- Better use of ops and engineering resources
- Current approach: no systematic attempt to provide this functionality

Our solution (in 2 steps)
- Detecting and managing failures
- Step 1: Instrument applications to track user requests across the “service chain”
  - Each request is tagged with a unique ID
  - The service chain is composed on the fly with the help of app instrumentation
  - For each request:
    - Collect per-hop performance information
    - Collect per-request failure status
  - Centralized data collection

What kinds of failures?
- We can handle:
  - Machine failures
  - Network connectivity problems
- Most:
  - Misconfiguration
  - Application bugs
- But not all:
  - Application errors where the app itself doesn't detect that there is a problem

Diagnosing Failures
- Assigning responsibility to a specific hardware or software component
  - Insight into the internals of a component
  - Cross-component interactions
- Current approach: instrument applications
  - App-specific log messages
- Problems:
  - High request rates cause log rollover
  - Because of perceived overhead, detailed logging is enabled during testing but disabled in production

FUSE Background
- FUSE (OSDI 2004): lightweight agreement on only one thing: whether or not a failure has occurred
- Lack of a positive ack = failure

Step 2: Conditional Logging
- Step 2: Implement “conditional logging” to significantly reduce the overhead of collecting detailed logs across different machines in the service chain
- Step 1 provides the ability to identify a request across all participants in the service chain; FUSE provides agreement on failure status across that chain
- While the fate is undecided:
  - Detailed log messages are stored in main memory
  - The common-case overhead of logging is vastly reduced
- Once the fate of the service chain is decided, we discard app logs for successful requests and save logs for failures
- The quantity of data generated is manageable when most requests are successful

Example
- Benefits:
  - FUSE allows monitoring of real transactions
    - All transactions, or a sampled subset to control overhead
  - When a request fails, FUSE provides an audit trail
    - How far did it get?
    - How long did each step take?
    - Any additional application-specific context
  - FUSE can be deployed incrementally

Issues
- Overload policy: need to handle bursts of failures without inducing more failures
- How much effort does it take to make apps FUSE-enabled?
- Are the right components FUSE-enabled?
- Identifying and filtering false positives
- Tracking request flow is non-trivial with network load balancers

Status
- We've implemented FUSE for MSN, integrated with the ASP.NET rendering engine
- Testing in progress
- Roll-out at end of summer

Backups

FUSE is Easy to Integrate
- Example current code on Front End: ReceiveRequestFromClient() SendRequestToBackEnd();
- Example code on F
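
The conditional logging idea from Step 2 can be illustrated with a minimal Python sketch. It is not the FUSE implementation or the MSN integration described in the slides. The class `ConditionalLogger` and its methods are hypothetical names introduced here, and the sketch assumes the caller already knows the request's final fate (in the slides, FUSE supplies that agreement across the service chain). It shows only the buffering policy: detailed records stay in memory while the fate is undecided, are saved for failed requests, and are discarded for successful ones.

```python
import time
import uuid
from collections import defaultdict


class ConditionalLogger:
    """Hypothetical sketch: buffer detailed logs per request until its fate is known."""

    def __init__(self):
        # request_id -> records held in main memory while the fate is undecided
        self._pending = defaultdict(list)
        # request_id -> records kept because the request failed
        self.saved = {}

    def new_request(self):
        """Tag a request with a unique ID so every hop can refer to it."""
        return uuid.uuid4().hex

    def log(self, request_id, hop, message):
        """Record a detailed message in memory only; nothing is persisted yet."""
        self._pending[request_id].append((time.time(), hop, message))

    def decide(self, request_id, failed):
        """Apply the fate: keep logs for a failure, drop them for a success.

        Returns the number of buffered records that were released.
        """
        records = self._pending.pop(request_id, [])
        if failed:
            self.saved[request_id] = records
        return len(records)

    def pending_count(self):
        """Number of requests whose fate is still undecided."""
        return len(self._pending)


# Usage: two requests pass through a front end and a back end.
logger = ConditionalLogger()
ok_id = logger.new_request()
bad_id = logger.new_request()

logger.log(ok_id, "front-end", "received request")
logger.log(ok_id, "back-end", "query served")
logger.log(bad_id, "front-end", "received request")
logger.log(bad_id, "back-end", "timeout talking to storage")

logger.decide(ok_id, failed=False)   # success: detailed logs are discarded
logger.decide(bad_id, failed=True)   # failure: detailed logs form the audit trail

for _, hop, message in logger.saved[bad_id]:
    print(hop, message)
```

In this sketch the saved records for a failed request answer the audit-trail questions from the slides (how far the request got, and what each hop reported), while successful requests leave nothing behind. A real deployment would also need the overload policy listed under Issues, for example a cap on buffered records per request.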