The Seven Swordsmen of System Design

Outline: Synchronization, Network, Database, Distributed Systems, Performance, Estimation, Object-Oriented Design
Case studies: social network news feed, log statistics, web crawler, e-commerce product page

Introduction
- System design: 1-3 rounds in interviews
- For new grads: not required, or kept simple
- For experienced candidates: important to show your knowledge and thoughtful ideas
- Knowledge, Design, Communication!

Concurrency
- Thread vs. Process
- Consumer and Producer
- Blocking queue
- Tracking: synchronous vs. asynchronous

Network
- Visit URL: What happens after you type a URL in your browser and press the Return key?

Database
- Relational DB vs. KV store
- Sharding vs. Clustering
- TinyURL: store the mapping document from shortlink code to full URL. The record:
  - code: varchar(8)
  - url: varchar(1000)
  - created_at: timestamp
- We also need to store the reverse map, from URL back to code.

Distributed Systems
How do we scale the TinyURL service?
- Stateless frontend servers
- A load balancer
- Sharded/replicated database (sharded on shortlink code)
- Memcached to scale traffic
- Write load: locally buffered event tracking + async flush to a high-throughput message queue
- A distributed unique ID generator (64-bit)

Performance
- Cache is KEY!

Estimation
- How many piano tuners are there in the entire world?
- TinyURL: how much total storage do we need?
  - URL length: 10-1000 chars
  - Total accumulated URLs: 100M
  - New URL registrations: on the order of 100,000/day (~1/sec)
  - Redirect requests: on the order of 100M/day (~1,000/sec)

Design Patterns
- 23 patterns: MVC, Singleton, Factory, Iterator, Decorator, Facade

Case Studies
- News Feeds
- Stats Server
- Web Crawler
- Amazon Product Page

News Feed
- Define the feed
- Organize: aggregate, dedup, sort

Level 1.0
Database schema: User, Friendship, News
Get newsfeed: merge news (newsfeed vs. news)

News
| Id | AuthorId | Content |
|----|----------|---------|
| 1  | 2        | "Hehe"  |
| 2  | 1        | "Lala"  |

Friendship
| Id | SourceId | TargetId |
|----|----------|----------|
| 1  | 1        | 2        |
| 2  | 2        | 1        |

User
| Id | Name    | Age |
|----|---------|-----|
| 1  | Jason   | 25  |
| 2  | Michael | 26  |

Why is this bad? With 100+ friends:
- 1 query to get the friends list
- 1 query: SELECT news WHERE timestamp > ? AND sourceid IN (friend list) LIMIT 1000
- IN is slow: it needs either a sequential scan or 100+ index queries.

Level 2.0: Pull vs. Push
- Pull: get news from each friend and merge them together (the newsfeed is generated when the user requests it).
- Push: the newsfeed is generated when the news is generated (another table stores the newsfeed, which may duplicate news).
  - Cost: 1 query to select the latest 1000 newsfeed entries on read, and 100+ insert queries (async) on write.
  - Disadvantage: news delay.

Level 3.0
- A popular star (Justin Bieber) has 13M+ followers. An async push may take over 30 minutes (13M+ insertions), which is too long a delay.
- Push + pull:
  - For popular stars, don't push news to followers.
  - For every newsfeed request, merge the non-popular users' newsfeed (push) with the popular users' news (pull).

Level 4.0
- Push disadvantages:
  - Real-time: news is delayed.
  - Storage: duplicates.
  - Edits: every pushed copy must be updated.
- Go back to PULL:
  - Cache each user's latest (14 days) news.
  - Broadcast requests to multiple servers (sharded by userId), then merge and sort the newsfeed.
  - Cache the newsfeed for this user with a timestamp.

Click Stats Server
How are click stats stored?
- A poor candidate will suggest writing to a data store on every click.
- A good candidate will suggest some form of aggregation tier that accepts clickstream data, aggregates it, and periodically writes back to a persistent data store.
- A great candidate will suggest a low-latency messaging system to buffer the click data and transfer it to the aggregation tier.
- If stats are needed daily, storing the data in HDFS and running MapReduce jobs to compute them is a reasonable approach.
- If stats are needed in near real time, the aggregation logic should compute them.

Cache Requirement
a) When a request comes in, look it up in the cache. On a hit, return the response from the cache and do not pass the request to the system.
b) If the request is not found in the cache, pass it on to the system.
c) Since the cache can only store the last n requests, insert the (n+1)th request into the cache and delete one of the older requests from the cache.
d) Design one cache such that all operations (lookup, delete, insert) can be done in O(1).

Web Crawler

Amazon Product Page
The product page includes information such as:
a) product information
b) user information
c) recommended products (what other customers buy after viewing this item, recommendations for you like this product, etc.)
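The Concurrency section lists consumer and producer together with a blocking queue. Below is a minimal Python sketch of that pattern. It uses the standard library's `queue.Queue` as the blocking queue. The function name `run_pipeline`, the sentinel object, and the doubling step are illustrative assumptions, not part of the outline:

```python
import queue
import threading


def run_pipeline(items, capacity=2):
    """Move items from one producer thread to one consumer thread via a bounded blocking queue."""
    q = queue.Queue(maxsize=capacity)  # put() blocks while full, get() blocks while empty
    done = object()  # sentinel that tells the consumer the stream has ended
    results = []

    def producer():
        for item in items:
            q.put(item)  # waits here whenever the consumer falls behind
        q.put(done)

    def consumer():
        while True:
            item = q.get()  # waits here until the producer has supplied something
            if item is done:
                break
            results.append(item * 2)  # stand-in for real per-item work

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results


print(run_pipeline([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

Because the queue is bounded, the producer cannot run arbitrarily far ahead. Once `capacity` items are waiting, `put()` blocks until the consumer catches up.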
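The TinyURL record maps an 8-character code to a URL and also keeps a reverse map. One common way to derive the code is to base-62 encode a unique integer ID, and the sketch below assumes that approach. `encode_base62`, `TinyUrlStore`, and the in-memory counter standing in for the distributed ID generator are hypothetical names. The two dicts stand in for the database tables:

```python
import string

# 62 symbols: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Turn a non-negative integer ID into a base-62 short code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


class TinyUrlStore:
    """In-memory stand-in for the code -> URL table and the reverse URL -> code map."""

    def __init__(self):
        # Hypothetical local counter; the outline suggests a distributed 64-bit ID generator instead.
        self.next_id = 1
        self.code_to_url = {}
        self.url_to_code = {}

    def shorten(self, url: str) -> str:
        if url in self.url_to_code:  # the reverse map returns the existing code for a known URL
            return self.url_to_code[url]
        code = encode_base62(self.next_id)
        self.next_id += 1
        self.code_to_url[code] = url
        self.url_to_code[url] = code
        return code

    def expand(self, code: str):
        """Return the full URL for a code, or None if the code is unknown."""
        return self.code_to_url.get(code)


store = TinyUrlStore()
code = store.shorten("https://example.com/some/long/path")
print(code, store.expand(code))  # 1 https://example.com/some/long/path
```

Eight base-62 characters cover 62^8 = 218,340,105,584,896 (about 2.2 × 10^14) codes, far more than the 100M URLs in the estimate. A full 64-bit ID can need up to 11 base-62 characters, though, so IDs must stay below 62^8 to fit in `varchar(8)`.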
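The TinyURL estimate can be worked through directly. This sketch converts the daily volumes into per-second rates. It then bounds the size of the code-to-URL table at the 10- and 1000-character extremes. It assumes 1 byte per character and an 8-byte timestamp, and it ignores indexes, replication, and the reverse map:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily volume into an average rate per second."""
    return per_day / SECONDS_PER_DAY


def table_bytes(num_urls: int, url_len: int, code_len: int = 8, ts_len: int = 8) -> int:
    """Raw bytes for the code -> URL table: one row of code + url + created_at per URL.

    Assumes 1 byte per character and an 8-byte timestamp; ignores indexes, replicas,
    and the reverse URL -> code map.
    """
    return num_urls * (code_len + url_len + ts_len)


urls = 100_000_000  # total accumulated URLs, from the outline
print(f"writes: {per_second(100_000):.2f}/s")                 # writes: 1.16/s
print(f"redirects: {per_second(100_000_000):.0f}/s")          # redirects: 1157/s
print(f"worst case: {table_bytes(urls, 1000) / 1e9:.1f} GB")  # worst case: 101.6 GB
print(f"at 10 chars: {table_bytes(urls, 10) / 1e9:.1f} GB")   # at 10 chars: 2.6 GB
```

The read path is roughly 1,000 times busier than the write path. The raw table falls somewhere between about 2.6 GB and 102 GB before replication.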
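The Level 3.0 push + pull hybrid can be sketched as follows:

- Posts from ordinary authors are pushed into each follower's inbox at write time.
- Posts from popular authors are pulled at read time.
- Both sources are merged, deduplicated, and sorted newest-first.

Everything here is an in-memory stand-in. `FeedService`, its dict-based tables, and the tiny `POPULAR_THRESHOLD` are assumptions for illustration. Posts are assumed to arrive in increasing timestamp order:

```python
import heapq
from collections import defaultdict

# Hypothetical cutoff so the demo stays small; a real system would use a far larger number.
POPULAR_THRESHOLD = 3


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow them
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(timestamp, post_id)], oldest first
        self.inbox = defaultdict(list)     # user -> pushed [(timestamp, post_id)], oldest first

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_popular(self, author):
        return len(self.followers[author]) >= POPULAR_THRESHOLD

    def post(self, author, timestamp, post_id):
        """Assumes posts arrive in increasing timestamp order."""
        self.posts[author].append((timestamp, post_id))
        if not self.is_popular(author):
            # Push: fan out to every follower's inbox (asynchronous in a real system).
            for user in self.followers[author]:
                self.inbox[user].append((timestamp, post_id))

    def feed(self, user, limit=10):
        # Pull: read popular authors' posts at request time instead of pushing them.
        sources = [self.inbox[user]]
        sources += [self.posts[a] for a in self.following[user] if self.is_popular(a)]
        # Every source is oldest-first, so reversing gives newest-first runs to merge.
        merged = heapq.merge(*(reversed(s) for s in sources), reverse=True)
        result, seen = [], set()
        for _, post_id in merged:
            if len(result) >= limit:
                break
            if post_id not in seen:  # dedup, e.g. after an author crosses the threshold
                seen.add(post_id)
                result.append(post_id)
        return result


svc = FeedService()
for fan in ("alice", "bob", "carol"):
    svc.follow(fan, "bieber")  # bieber now counts as popular
svc.follow("alice", "dave")
svc.post("dave", 1, "d1")    # pushed into alice's inbox
svc.post("bieber", 2, "b1")  # not pushed; pulled at read time
svc.post("dave", 3, "d2")
print(svc.feed("alice"))  # ['d2', 'b1', 'd1']
```

The read path is a k-way merge of already-sorted lists. The same step underlies the Level 4.0 pull approach, where each shard returns its recent news and the caller merges and sorts.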
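For the click stats server, the "good candidate" answer is an aggregation tier that batches writes. Below is a minimal sketch, assuming a count-based flush and a plain dict as the persistent store. `ClickAggregator` and its parameters are hypothetical. The low-latency message queue that would feed it is left out:

```python
from collections import Counter


class ClickAggregator:
    """Aggregation tier: count clicks in memory and write totals to the store in batches."""

    def __init__(self, store, flush_every=1000):
        self.store = store              # stand-in for the persistent store: dict of key -> count
        self.flush_every = flush_every  # flush after this many clicks (a timer is another option)
        self.pending = Counter()
        self.buffered = 0
        self.batches_written = 0

    def record(self, key):
        self.pending[key] += 1
        self.buffered += 1
        if self.buffered >= self.flush_every:
            self.flush()

    def flush(self):
        """Merge pending counts into the store as one batch."""
        if not self.pending:
            return
        for key, count in self.pending.items():
            self.store[key] = self.store.get(key, 0) + count
        self.batches_written += 1
        self.pending.clear()
        self.buffered = 0


store = {}
agg = ClickAggregator(store, flush_every=1000)
for i in range(10_000):
    agg.record(f"/page/{i % 3}")
agg.flush()
print(store, agg.batches_written)  # {'/page/0': 3334, '/page/1': 3333, '/page/2': 3333} 10
```

Ten thousand clicks become ten batched writes instead of ten thousand per-click writes. That is the gap between the poor and good answers in the outline.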
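The cache requirement asks for O(1) lookup, insert, and delete while keeping only the last n requests. Requirement (c) only says to delete "one of the older requests". The sketch below assumes least-recently-used eviction, a common choice. It is built on Python's `collections.OrderedDict`, which combines a hash map with a doubly linked list. `LRUCache`, `handle`, and `slow_system` are hypothetical names:

```python
from collections import OrderedDict


class LRUCache:
    """Holds the last `capacity` requests; evicts the least recently used one on overflow."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # OrderedDict = hash map + doubly linked list, so lookup, reorder, and delete are O(1).
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


def handle(cache, request, system):
    """Requirements a-c: answer hits from the cache, send misses to the system, then cache them."""
    response = cache.get(request)
    if response is None:  # miss (this sketch assumes the system never returns None)
        response = system(request)
        cache.put(request, response)
    return response


calls = []


def slow_system(request):  # hypothetical backend that records each call it receives
    calls.append(request)
    return request.upper()


cache = LRUCache(2)
for r in ["a", "b", "a", "c", "b"]:
    handle(cache, r, slow_system)
print(calls)  # ['a', 'b', 'c', 'b']
```

The hash map gives O(1) lookup. The linked list lets `move_to_end` and `popitem(last=False)` reorder or evict an entry in O(1) without scanning.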