




文档简介
Behind the Scenes at MyS Dan Farino Chief Systems Architect dan 1Friday November 21 2008 Topics Architecture overview and history The stuff I get to work on in the Windows world Monitoring AdministraGon 2 2Friday November 21 2008 Topics Windows It s a good server now leave me alone However the selecGon of tools for large scale management is a bit sparse 3 3Friday November 21 2008 Where we started 4Friday November 21 2008 Where we started The ideal growth scenario Plan Implement Test Go live Monitor and collect ops data Repeat 5 5Friday November 21 2008 Where we started Our growth scenario Implement Go live And while those are happening over and over Reboot servers Throw hardware at performance issues Shotgun debugging 6 6Friday November 21 2008 Where we started Shotgun debugging Shotgun debugging is a process of making relaGvely undirected changes to soVware in the hope that a bug will be perturbed out of existence 7 7Friday November 21 2008 Where we started Why would anyone shotgun debug Don t really know how to analyze and debug a problem Need to resolve the problem now and collecGng data for analysis would take too long 8 8Friday November 21 2008 Where we started Web servers Windows 2000 Server IIS 5 0 ColdFusion 5 Database servers Windows 2000 Server SQL Server 2000 9 9Friday November 21 2008 Where we were OperaGonally Batch fi les and robocopy for code deployment psexec for remote admin script execuGon Windows Performance Monitor for monitoring 10 10Friday November 21 2008 Where we were Any sort of formal automated QA process No 11 11Friday November 21 2008 Current architecture 12Friday November 21 2008 Current architecture 4 500 web servers Windows 2003 IIS 6 0 ASP NET 1 200 cache servers 64 bit Windows 2003 500 database servers 64 bit Windows 2003 SQL Server 2005 13 13Friday November 21 2008 QA today Unit tests automated tesGng We sGll don t fuzz the site nearly as thoroughly as our users do though There are sGll problems that happen only in producGon 14 14Friday November 21 2008 QA today We need beher operaGonal data collecGon so that we know what cases we re not tesGng 15 15Friday November 21 2008 OperaGonal Data CollecGon 16Friday November 21 2008 Ops Data CollecGon Two general types of systems StaGc Collect store and alert based on pre confi gured rules Dynamic Write an ad hoc script or applicaGon to collect data for an immediate or one off need 17 17Friday November 21 2008 Ops Data CollecGon Our current staGc Windows Performance counter monitor 18Friday November 21 2008 Ops Data CollecGon Cons of staGc system RelaGvely central confi guraGon managed by a small number of administrators Bad for one off requests change the confi g apply wait for data Developer s quesGons usually go unanswered 19 19Friday November 21 2008 Ops Data CollecGon Developers looking at producGon Developers like to see their creaGons come to life I know I do The more a developer can see once their code goes live the more they re going to know for V2 20 20Friday November 21 2008 Ops Data CollecGon Cons of the dynamic system It s not really a system at all it s an administrator running a script Is a privileged operaGon scripts are powerful and can potenGally make changes to the system Even run as a limited user bad scripts can sGll DoS the system 21 21Friday November 21 2008 Ops Data CollecGon Cons of the dynamic system One shot data collecGon is possible but learning about deltas takes a lot more code and polling yuck Diff erent custom data collecGon tools that request the same data point cause duplicated network traffi c 22 22Friday November 21 2008 Ops Data CollecGon A recent example of an ad hoc task using our current dynamic system get adservers run agent ps e Version gcm F file dll FileVersionInfo FileVersion select Host Message 23 23Friday November 21 2008 Ops Data CollecGon Ideally all operaGonal data available in the enGre server farm should be able to queried Safely Instantly With change noGfi caGon 24 24Friday November 21 2008 Ops Data CollecGon I d like to be able to do something like this SELECT CpuTime ExceptionsPerSecond WHERE WebService Status UP AND serving OR serving 25 25Friday November 21 2008 Ops Data CollecGon I d also to be able to leave that query hanging and be noGfi ed of changes like A selected fi eld has changed for a known data point A new server has come online and meets the criteria or vice versa 26 26Friday November 21 2008 Our new operaGonal data collecGon plalorm 27Friday November 21 2008 Ops Data CollecGon Our new operaGonal data subscripGon plalorm On demand Supports both one shot and persistent modes Can be used by non privileged users 28 28Friday November 21 2008 Ops Data CollecGon Our new operaGonal data subscripGon plalorm Eliminates the need for the consumer to poll for changes If a data source requires polling that operaGon is pushed as close to the source as possible 29 29Friday November 21 2008 Ops Data CollecGon A Client makes one TCP connecGon to a Collector server Can receive data related to thousands of servers via this one connecGon As long as the connecGon is up the client is kept up to date 30 30Friday November 21 2008 Ops Data CollecGon A lihle bit like Having all of the servers in a chat room and being able to talk to a selected subset of them at any Gme over one connecGon IniGal idea came from looking at using XMPP ejabberd for command and control 31 31Friday November 21 2008 Ops Data CollecGon Collector Server AgentAgentAgent One lazily established TCP connection per Agent ClientClient Preferably one TCP connection per Client although more than one is allowed but frowned upon 32 32Friday November 21 2008 Ops Data CollecGon Provides Windows Performance Counters WMI objects Event logs Hardware data Custom WMI objects published from out of process Log fi le contents 33 33Friday November 21 2008 Ops Data CollecGon Provides On Linux plans are to hook into something like D Bus so that processes can provide operaGonal data to the Agent in a loosely connected manner 34 34Friday November 21 2008 Ops Data CollecGon The Collector service A Windows Service in C Completely async I O never blocks a thread Uses MicrosoV s Concurrency and CoordinaGon RunGme An Agent running on each host 35 35Friday November 21 2008 Ops Data CollecGon Wire protocol is Google s Protocol Buff ers Clients and Agents can be easily wrihen in any of the languages for which there is a PB implementaGon 36 36Friday November 21 2008 Ops Data CollecGon Why not use XMPP ejabberd Wanted to use Protocol Buff ers instead of XML Wanted lazily established TCP connecGons to the Agents Wanted to see if C CCR could handle the load yes it can 37 37Friday November 21 2008 Why develop a whole new plalorm 38Friday November 21 2008 Ops Data CollecGon Why develop something new There doesn t seem to be anything out there right now that fi ts the need And my requirements also include free and open source 39 39Friday November 21 2008 Ops Data CollecGon To do it properly you really need to be using 100 async I O Libraries that make this easy are relaGvely new CCR Twisted GTask Erlang 40 40Friday November 21 2008 Ops Data CollecGon Most established products were wrihen before the mulG core async craze 41 41Friday November 21 2008 Ops Data CollecGon What does it enable The individual that is actually interested in the data can gather it himself No central confi g no need to involve an administrator This includes developers 42 42Friday November 21 2008 Ops Data CollecGon What does it enable There is a very low barrier to entry It s almost like exploring a database with some ad hoc SQL queries I wonder quesGons are easily answered without a lot of work 43 43Friday November 21 2008 Ops Data CollecGon What does it enable CharGng alerGng data archiving systems no longer concern themselves with the data collecGon intricacies We can spend Gme wriGng the valuable code instead of rewriGng the same plumbing every Gme 44 44Friday November 21 2008 Ops Data CollecGon What does it provide Abstracts physical server farm from the user If you know machine names great But you can also say all servers serving profi or all cache servers in Los Angeles 45 45Friday November 21 2008 Ops Data CollecGon What does it provide Guaranteed to keep you up to date Get your iniGal set of data and then just wait for the deltas Pushes polling as close to the source as possible 46 46Friday November 21 2008 Ops Data CollecGon What does it provide Eliminates duplicate requests Hundreds of clients can be monitoring the Processor Time for a server and it will only be sent from that server once when it changes 47 47Friday November 21 2008 Ops Data CollecGon What does it provide Only collects data that someone is currently asking for This is how we avoid having explicit confi guraGon on the server 48 48Friday November 21 2008 Ops Data CollecGon Is this really a good way to do things Having too much data pushed at you is a bad thing Being able to pull from a large selecGon of data points is a good thing 49 49Friday November 21 2008 Ops Data CollecGon For developers knowing that they will have access to instrumentaGon data even in produc4on encourages more detailed instrumentaGon 50 50Friday November 21 2008 Ops Data CollecGon Beher instrumentaGon more data available more detailed feedback to QA and developers 51 51Friday
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025辽宁鞍山市立山区教育局面向应届毕业生校园招聘2人模拟试卷及1套完整答案详解
- 2025江苏苏州凌晔进出口有限公司招聘7人考前自测高频考点模拟试题附答案详解(典型题)
- 2025赤峰龙韵城市建设有限公司所属子公司员工招聘21人模拟试卷及答案详解(名校卷)
- 2025年湖南常德津市市人民医院公开招聘专业技术人员16人模拟试卷及答案详解(名校卷)
- 2025年5月广西悦桂田园文化旅游投资有限责任公司招聘13人笔试题库历年考点版附带答案详解
- 2025湖北巴东县溪丘湾乡人民政府招聘公益性岗位工作人员11人模拟试卷及1套参考答案详解
- 2025广东省能源集团西北(甘肃)有限公司招聘18人模拟试卷及答案详解(夺冠系列)
- 2025江苏句容市教育局所属学校招聘紧缺教育人才5人考前自测高频考点模拟试题及完整答案详解一套
- 2025北京中国热带农业科学院香料饮料研究所第一批工作人员招聘(第2号)模拟试卷及一套完整答案详解
- 2025年三亚市直属学校赴高校面向2025年应届毕业生招聘81人模拟试卷附答案详解
- 中医糖尿病个案护理
- 幼儿社会领域教育
- 肺功能检查质控要点
- CJ/T 164-2014节水型生活用水器具
- 医学院研究生招生考试回避制度
- DB37-T5321-2025 居住建筑装配式内装修技术标准
- 汽车代工协议书模板
- 人脸门禁设计方案和施工计划1
- 2025年监理工程师职业能力测试卷:监理工程师专业基础知识自测题
- 知识图谱在护理学领域的新应用与发展
- 智能化农业装备与设备
评论
0/150
提交评论