
Secrets of Performance
How to Improve Program Performance
Bin Lin, Development Manager, Microsoft Research, China

Ways to Improve Performance
• Faster hardware - $ = speed
• Use the right language and compiler optimizations
• Design a scalable application
• Architectural design: cache - RAM - disk
• Choose the right data structures and algorithms
• Tune code
• Avoid slow OS APIs
• Tune, measure, tune, measure, tune, measure

Design: A Case Study
• Design a scalable SMTP server
• Scalability is the key: 2-CPU, 4-CPU, 8-CPU machines
• Handle as many requests as possible, with relatively fast response time

Design: A Case Study
A simple SMTP server:

    // Read SMTP commands/data from sockets
    if (ReadFile(...))
        // various housekeeping removed

    // Parse SMTP recipients and other headers
    if (!ParseSMTPHeaders(...))
        // handle errors

    // Parse bodies
    if (!ParseSMTPBodies(...))
        // handle errors

Design: A Case Study (cont.)

    // Local delivery or routing
    if (LocalDelivery(...))
        Deliver(...);
    else
        Route(...);

    // Send SMTP response through socket
    if (WriteFile(...))
        // various housekeeping skipped

Traditional Thread Architecture
[Diagram: SMTP Request Receiver (Socket) feeding Worker Thread, Worker Thread, (other workers)]
• 1 thread to receive and dispatch SMTP requests
• 64 worker threads, each doing:
    - Parse SMTP headers
    - Parse SMTP bodies
    - Local delivery
    - Routing
• All in the same thread, sequentially

The Evolution of Hardware
[Chart: Relative Performance (Latency) of CPU, RAM, and Disk; x-axis Time (1992-2000), y-axis Performance (0-800)]

Bridge the Gap - Caches
• CPU L1 cache
    - 8K instruction cache, plus 8K data cache
    - Closely coupled
    - 0.333 clock/instruction; practical 1 CPI
• CPU L2 cache
    - 512K static RAM
    - Coupled with full clock-speed, 64-bit cache bus
    - Latency: 4-1-1-1 (7 clocks/instruction)
• I/O caches (RAM-based file caches)

The Price of Failure
• Let's look at the costs: assume it takes 1 second to zero a register
• L1 cache hit - 1 second (1x)
• L2 cache hit - 4 seconds (plus 3 seconds extra work - 7x)
• RAM hit - 25-150 seconds (24x-150x)
• Disk or net hit - 3 weeks (2,000,000x)

SMTP Server Architecture
• 1 thread to receive and dispatch SMTP requests
• 1 worker thread per CPU
• Event loops (4 stages):
    - Parse SMTP headers/bodies
    - Local delivery
    - Routing
    - Socket send and file I/O
• One queue for each event loop
• Drain the queue to empty before a context switch
[Diagram: SMTP Requests Dispatcher, SMTP Header/Body Parser, Local Delivery, Routing, I/O Manager]

Performance Improvement
[Chart: Overall Performance; Operations per Second (0-30,000) vs. Number of CPUs (2, 4, 6, 8); series: SMTP Server Threads, Traditional Threads]

Putting It All Together
• Cache effects are a significant factor in overall performance and scalability
• Batching offers significant benefits
• Spatial and temporal locality matters
    - Cache lines (or pages) contain related data
    - Burst or smooth usage pattern (not random)
    - Significant repeated use of the cached data
• Thread-per-user architectures are expensive and don't scale well

Now Let's Look at Memory
• The footprint of Windows is large
• 16MB Win98 systems and 32MB Windows 2000 systems cannot host most applications
• MS applications need more memory than Windows can provide
• To increase performance, we need to reduce our footprints

What is the Footprint?
• The footprint is the unconstrained working set of an application
    - All image code and data recently used by the application
    - All dynamic data recently used by the app
    - System resources used to support the app
• The working set consists of resident pages that are using real physical memory

Virtual Memory Usage
[Diagram sequence: Virtual Memory]
• Your address space starts out empty
• Code and static data from your .exe (the EXE disk image) are part of the address space
• Code and static data from the DLLs that your app uses are part of the address space
• Dynamic data from heap and VirtualAlloc allocations is part of the address space
• OS support structures, such as page table pages, are part of the address space

Working Set
[Diagram: Virtual Memory, Hard Disk, Physical Memory, Working Set]
• Paging reduces the working set
• The pages remaining are the current working set
• A page fault brings in a page, adding it to the working set

Application Thrashing
[Chart: Page Fault Rate vs. Memory; Page Faults/Sec. against Memory in Megabytes (0, 8, 16, 32, 64, 128, 256)]
• Machine is way too small for the app
• Machine is a little small but OK
• Machine handles the app just
  fine

How to Determine Working Set?
• NT Task Manager: the Processes tab shows per-process data
    - Mem Usage shows the active working set
    - Mem Delta shows working set changes
    - Page Faults shows the total number of faults
• NT pmon utility: Mem Usage column
• NT pstat utility: Ws column

Working Set
[Diagram: Physical Memory, Virtual Memory, Hard Disk; Working Set Size scale 0K, 500K, 1 Meg, 1.5 Meg with markers wsMin, wsMax, wsCur]
• The OS manages the working set size of all the applications
• The OS uses two settings in its paging algorithm: wsMax and wsMin
• The relation of wsCur to these settings, along with memory constraints, is used in paging
• When memory is ample and a page fault occurs, wsCur is allowed to grow unconstrained

Working Set (Cont.)
[Diagram: Physical Memory, Hard Disk; Working Set Size scale 0K, 500K, 1 Meg, 1.5 Meg with markers wsMin, wsMax, wsCur]
• When memory is ample and a page fault occurs, wsCur is allowed to grow above wsMax
• At some point memory gets tight
• If memory is tight and wsCur is larger
  than wsMax, the OS takes an aggressive position
• It causes the app to page out one of its pages to free up room
• Then it pages in the faulted page; the overall working set size remains the same

Working Set Growth
• When a page fault occurs, the new page is added to the process's working set
• If memory is tight, a page is removed from your working set; otherwise, your working set grows
• In ample memory conditions, page faults cause WsCur to grow even above WsMax
• In tight memory conditions, page faults cause you to page against yourself

Working Set Trim
• In tight memory conditions your working set will not grow
• The WsMgr thread steals memory from working sets to make more memory available
• The more your working set is over WsMax, the more pages get stolen
• Small machines (16MB NT, 8MB Win95) almost always operate in this mode!

How to be a Good Citizen! (Be a good citizen of the operating system)
• VirtualUnlock() will kick a range of pages out of your working set
• SetProcessWorkingSetSize() with both the minimum and maximum sizes set to -1 empties the entire working set
    - When a window is minimized, this happens to the owning process
• link /WS:AGGRESSIVE marks a process for aggressive trimming
    - Marking helps free memory quickly
    - Marked processes can have their working set trimmed to WsMin (80k) if they are idle
    - NT marks Explorer, Spooler, services

Demo
• Watch the IE 5.0 working set go from 23MB to 8MB: browse, minimize, restore, refresh
• Watch PowerPoint go from 6-7MB to 2-3MB: edit, minimize, restore, change view

Spot the Defect

    #define PAGE_COUNT 16
    #define PAGE_SIZE 4096
    DWORD rgdwArray[PAGE_COUNT][PAGE_SIZE];
    // code to fill the array skipped
    // access the array
    for (int y = 0; y < PAGE_SIZE; y++)
        for (int x = 0; x < PAGE_COUNT; x++)
        {
            DWORD dw = rgdwArray[x][y];
        }

• With 4KB pages, this may cause 16*4k page faults
• 25-150 sec vs. 3 weeks! (10 TickCounts)

A Better Version

    #define PAGE_COUNT
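The effect of loop order on page locality can be illustrated with a small sketch. This is not the slide's own "better version"; it is a minimal illustration under stated assumptions: pages are 4 KB, the array starts on a page boundary, and `uint32_t` stands in for `DWORD`. The names `PAGE_BYTES`, `g_sink`, `page_of`, `page_switches_column_order`, and `page_switches_row_order` are hypothetical helpers introduced here. The code counts how often consecutive accesses land on a different page for each traversal order.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_COUNT 16
#define PAGE_SIZE  4096      /* elements per row, as on the slide */
#define PAGE_BYTES 4096u     /* ASSUMPTION: hardware page size in bytes */

/* uint32_t stands in for the Win32 DWORD type (assumption). */
static uint32_t rgdwArray[PAGE_COUNT][PAGE_SIZE];
static volatile uint32_t g_sink;   /* keeps the reads from being optimized away */

/* Page number, relative to the start of the array, that holds element [x][y].
   Assumes the array begins on a page boundary. */
size_t page_of(int x, int y)
{
    size_t byte_offset = ((size_t)x * PAGE_SIZE + (size_t)y) * sizeof(uint32_t);
    return byte_offset / PAGE_BYTES;
}

/* The slide's order: y outer, x inner. Every access jumps to another row. */
long page_switches_column_order(void)
{
    long switches = 0;
    size_t last_page = (size_t)-1;
    for (int y = 0; y < PAGE_SIZE; y++) {
        for (int x = 0; x < PAGE_COUNT; x++) {
            g_sink = rgdwArray[x][y];
            size_t page = page_of(x, y);
            if (page != last_page) { switches++; last_page = page; }
        }
    }
    return switches;
}

/* Swapped order: x outer, y inner. Memory is walked sequentially. */
long page_switches_row_order(void)
{
    long switches = 0;
    size_t last_page = (size_t)-1;
    for (int x = 0; x < PAGE_COUNT; x++) {
        for (int y = 0; y < PAGE_SIZE; y++) {
            g_sink = rgdwArray[x][y];
            size_t page = page_of(x, y);
            if (page != last_page) { switches++; last_page = page; }
        }
    }
    return switches;
}
```

Under these assumptions, the slide's order changes page on every one of the 16 * 4096 = 65,536 accesses, while the swapped order changes page only 64 times (each 16 KB row spans four 4 KB pages). Whether a page change actually turns into a page fault depends on how tight memory is, so the counts are an upper bound on the faults each order can trigger, not a measurement.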
