Computer Organization Principles Courseware, Chapter 3

Main Content
- Characteristics of Memory Systems
- Types of Memory
- The Memory Hierarchy
- Cache Memory
- Virtual Memory

Characteristics of Memory Systems
- Location: CPU, internal, external
- Capacity: word, byte
- Unit of Transfer:
  - Internal: usually governed by the data bus width
  - External: usually a block, which is much larger than a word

Characteristics of Memory Systems: Access Methods (1)
- Sequential
  - Start at the beginning and read through in order
  - Access time depends on the location of the data and on the previous location
  - e.g. tape
- Direct
  - Individual blocks have a unique address
  - Access is by jumping to the vicinity plus a sequential search
  - Access time depends on location and previous location
  - e.g. disk

Characteristics of Memory Systems: Access Methods (2)
- Random
  - Individual addresses identify locations exactly
  - Access time is independent of location or previous access
  - e.g. RAM
- Associative
  - Data is located by a comparison with the contents of a portion of the store
  - Access time is independent of location or previous access
  - e.g. cache

Characteristics of Memory Systems: Performance
- Access time: the time between presenting the address and getting the valid data
- Memory cycle time: time may be required for the memory to "recover" before the next access; cycle time = access time + recovery time
- Transfer rate: the rate at which data can be moved

Characteristics of Memory Systems: Physical Types
- Semiconductor: RAM
- Magnetic: disk and tape
- Optical: CD and DVD
- Others: hologram
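As a quick illustration of the performance terms above, the short sketch below computes a memory cycle time and a word-at-a-time transfer rate from hypothetical access and recovery times; the numbers are made up for the example and are not taken from the slides.

# Illustration only: hypothetical timing values, not figures from the slides.
access_time_ns = 60     # time from presenting the address to valid data
recovery_time_ns = 20   # time the memory needs to "recover" before the next access

# Cycle time = access time + recovery time
cycle_time_ns = access_time_ns + recovery_time_ns

# Transfer rate for word-at-a-time random access:
# one word (4 bytes here) moved per cycle time.
word_bytes = 4
transfer_rate_mb_s = word_bytes / (cycle_time_ns * 1e-9) / 1e6

print(f"cycle time    = {cycle_time_ns} ns")        # 80 ns
print(f"transfer rate = {transfer_rate_mb_s:.0f} MB/s")  # 50 MB/s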

The Bottom Line
- How much? (capacity): greater capacity gives a smaller cost per bit, but a slower access time
- How fast? (access time): a faster access time means a greater cost per bit
- How expensive? (cost)

Memory Hierarchy
- Registers: inside the CPU
- Internal or main memory: may include one or more levels of cache; "RAM"
- External memory: backing store

Hierarchy List
- Registers
- L1 cache
- L2 cache
- Main memory
- Disk
- Optical
- Tape

Memory Hierarchy - Diagram
(figure)

Cache
- A small amount of fast memory
- Sits between normal main memory and the CPU
- May be located on the CPU chip or module

Content Addressable Memory
- What makes cache "special"? Cache is not accessed by address; it is accessed by content. For this reason, cache is sometimes called content addressable memory, or CAM.

Terminology
- Hit: the requested data resides in a given level of memory.
- Miss: the requested data is not found in the given level of memory.
- Hit rate: the percentage of memory accesses found in a given level of memory.
- Miss rate: the percentage of memory accesses not found in a given level of memory. Miss rate = 1 - hit rate.
- Hit time: the time required to access the requested information in a given level of memory.
- Miss penalty: the time required to process a miss, which includes replacing a block in an upper level of memory plus the additional time to deliver the requested data to the processor.
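The hit/miss terms above combine into an average access time. The sketch below uses the standard relation average access time = hit time + miss rate x miss penalty; the relation follows from the definitions on this slide, but the specific values are only illustrative and not from the slides.

# Illustrative values only (not from the slides).
hit_time_ns = 2        # time to access data already in the cache
miss_penalty_ns = 100  # time to fetch the block from main memory on a miss
hit_rate = 0.95
miss_rate = 1 - hit_rate          # Miss rate = 1 - hit rate, as defined above

# Every access pays the hit time; a fraction miss_rate of accesses
# additionally pays the miss penalty.
average_access_ns = hit_time_ns + miss_rate * miss_penalty_ns
print(f"average access time = {average_access_ns:.1f} ns")   # 7.0 ns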

Cache/Main Memory Structure
(figure)

Cache Operation Overview
- The CPU requests the contents of a memory location
- Check the cache for this data
- If present, get it from the cache (fast)
- If not present, read the required block from main memory into the cache
- Then deliver it from the cache to the CPU
- The cache includes tags to identify which block of main memory is in each cache slot

Cache Read Operation - Flowchart
(figure)

Typical Cache Organization
(figure)
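The sketch below mirrors the read sequence in the overview above for a very small cache: find the line, compare the tag, and on a miss fetch the block from a simulated main memory before returning the data. It uses direct-mapped placement (detailed later in this chapter), and all names (CacheLine, DirectMappedCache, read) are invented for this illustration; they are not from the slides.

# Minimal sketch of the cache read sequence above (direct-mapped, 4-byte blocks).
BLOCK_SIZE = 4      # bytes per block (w = 2 bits)
NUM_LINES = 16      # lines in this toy cache (r = 4 bits)

class CacheLine:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.data = None    # the whole block

class DirectMappedCache:
    def __init__(self, memory):
        self.lines = [CacheLine() for _ in range(NUM_LINES)]
        self.memory = memory            # simulated main memory: block number -> bytes

    def read(self, address):
        block_number = address // BLOCK_SIZE
        offset = address % BLOCK_SIZE
        line_index = block_number % NUM_LINES      # which line this block must use
        tag = block_number // NUM_LINES
        line = self.lines[line_index]
        if line.valid and line.tag == tag:         # hit: deliver from the cache
            return line.data[offset]
        # miss: read the required block from main memory into the cache, then deliver
        line.data = self.memory[block_number]
        line.tag = tag
        line.valid = True
        return line.data[offset]

# Usage: a tiny fake main memory where block j holds bytes [j, j, j, j]
memory = {j: bytes([j] * BLOCK_SIZE) for j in range(256)}
cache = DirectMappedCache(memory)
print(cache.read(0x41))   # miss: block fetched from memory, byte 0x10 returned
print(cache.read(0x42))   # hit on the same block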

Cache Design
- Size
- Mapping function
- Replacement algorithm
- Write policy
- Block size
- Number of caches

Cache Size
- Small enough so that the overall cost per bit is close to that of main memory
- Large enough so that the overall average access time is close to that of the cache alone
- Access time = main memory access time plus cache access time
- Large caches tend to be slightly slower than small caches
- It is impossible to arrive at a single "optimum" cache size

Comparison of Cache Sizes
Processor | Type | Year of Introduction | L1 cache (a) | L2 cache | L3 cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | - | -
PDP-11/70 | Minicomputer | 1975 | 1 KB | - | -
VAX 11/780 | Minicomputer | 1978 | 16 KB | - | -
IBM 3033 | Mainframe | 1978 | 64 KB | - | -
IBM 3090 | Mainframe | 1985 | 128 to 256 KB | - | -
Intel 80486 | PC | 1989 | 8 KB | - | -
Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | -
PowerPC 601 | PC | 1993 | 32 KB | - | -
PowerPC 620 | PC | 1996 | 32 KB/32 KB | - | -
PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB
IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | -
Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | -
IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | -
CRAY MTA (b) | Supercomputer | 2000 | 8 KB | 2 MB | -
Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB
SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | -
Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | -
Notes: (a) two values separated by a slash refer to instruction and data caches; (b) both caches are instruction only, no data caches.

Mapping Function
The elements used in the following examples:
- Cache of 64 KBytes
- Cache block of 4 bytes, i.e. the cache has 16k (2^14) lines of 4 bytes
- 16 MBytes of main memory
- 24-bit address (2^24 = 16M)
Three mapping techniques:
- Direct mapping
- Associative mapping
- Set associative mapping

Direct Mapping
- Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place
- i = j mod m, where
  - i = cache line number
  - j = main memory block number
  - m = number of lines in the cache
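To make the i = j mod m rule concrete, the short sketch below computes the cache line for a few block numbers using the example parameters above (m = 2^14 lines); the helper name is invented for illustration.

# i = j mod m with the chapter's example parameters: m = 2**14 lines.
M_LINES = 2 ** 14

def cache_line_for_block(j):
    """Return the only cache line that main-memory block j can occupy."""
    return j % M_LINES

# Blocks that differ by a multiple of m land on the same line:
for j in (0, 1, M_LINES, M_LINES + 1, 3 * M_LINES):
    print(f"block {j:6d} -> line {cache_line_for_block(j)}")
# Blocks 0, 16384 and 49152 all map to line 0; blocks 1 and 16385 map to line 1.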

Direct Mapping: Address Structure
- The address is in two parts
  - The least significant w bits identify a unique word
  - The most significant s bits specify one memory block
- The MSBs are split into a cache line field of r bits and a tag of s - r bits (the most significant bits)
- Number of cache lines m = 2^r

Direct Mapping Cache Line Table
Cache line | Main memory blocks held
0 | 0, m, 2m, 3m, ..., 2^s - m
1 | 1, m+1, 2m+1, ..., 2^s - m + 1
... | ...
m-1 | m-1, 2m-1, 3m-1, ..., 2^s - 1

Direct Mapping Address Structure (for the example)
Tag (s - r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits
- 24-bit address
- 2-bit word identifier (4-byte block)
- 22-bit block identifier
  - 8-bit tag (= 22 - 14)
  - 14-bit slot or line
- No two blocks that map to the same line have the same tag field
- Check the contents of the cache by finding the line and checking the tag

Direct Mapping Cache Organization
(figure)
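The 8/14/2 split above can be applied directly with shifts and masks. The sketch below decodes an arbitrary 24-bit address into its tag, line and word fields for the example cache; the function name is invented for illustration.

# Decode a 24-bit address into the Tag(8) / Line(14) / Word(2) fields above.
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split_address(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# Example with an arbitrary 24-bit address
tag, line, word = split_address(0x16339C)
print(f"tag=0x{tag:02X} line=0x{line:04X} word={word}")
# tag=0x16, line=0x0CE7, word=0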

Direct Mapping Example
(figure)

Direct Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = m = 2^r
- Size of tag = (s - r) bits

Direct Mapping: Pros and Cons
- Simple
- Inexpensive
- Fixed location for a given block: if a program repeatedly accesses two blocks that map to the same line, cache misses are very high

Associative Mapping
- A main memory block can load into any line of the cache
- The memory address is interpreted as a tag and a word
- The tag uniquely identifies a block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive

Associative Mapping
(figure: cache lines with tags; main memory blocks)

Associative Cache Organization
(figure)

Associative Mapping Example
(figure)

Associative Mapping: Address Structure
Tag: 22 bits | Word: 2 bits
- A 22-bit tag is stored with each 32-bit block of data
- Compare the tag field with the tag entries in the cache to check for a hit
- The least significant 2 bits of the address identify which 8-bit word (byte) is required from the 32-bit data block
- e.g.
  Address | Tag | Data | Cache line
  FFFFFC | FFFFFC | 24682468 | 3FFF
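A fully associative lookup compares the tag against every line, as described above. The sketch below shows that search over a small list of (tag, block) pairs, taking the 22-bit tag as the upper 22 bits of the 24-bit address; all names are invented for this illustration.

# Fully associative lookup: the tag (upper 22 bits of a 24-bit address)
# is compared against every line in the cache.
WORD_BITS = 2

def associative_lookup(cache_lines, address):
    """cache_lines is a list of (tag, block) pairs; returns the block on a hit, else None."""
    tag = address >> WORD_BITS          # 22-bit tag
    for line_tag, block in cache_lines:
        if line_tag == tag:             # every line's tag is examined for a match
            return block
    return None                         # miss

# Usage with one resident block whose tag corresponds to address 0xFFFFFC
lines = [(0xFFFFFC >> WORD_BITS, 0x24682468)]
print(hex(associative_lookup(lines, 0xFFFFFD) or 0))   # same block -> hit, 0x24682468
print(associative_lookup(lines, 0x000004))             # different block -> None (miss)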

Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = undetermined
- Size of tag = s bits

Set Associative Mapping
- The cache is divided into a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set, e.g. block B can be in any line of set i
- e.g. 2 lines per set: 2-way associative mapping; a given block can be in one of 2 lines in only one set

Set Associative Mapping
(figure: cache organized as sets 0-3; main memory blocks)

K-Way Set Associative Cache Organization
(figure)

Set Associative Mapping: Address Structure
Tag: 9 bits | Set: 13 bits | Word: 2 bits
- Use the set field to determine which cache set to look in
- Compare the tag field to see if we have a hit
- e.g.
  Address | Tag | Data | Set number
  1FF 7FFC | 1FF | 12345678 | 1FFF
  001 7FFC | 001 | 11223344 | 1FFF

Set Associative Mapping Example
- 13-bit set number
- Addresses 000000, 008000, 018000, ..., FF8000 map to the same set

Two-Way Set Associative Mapping Example
(figure)
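The 9/13/2 split above works the same way as the direct-mapped decoding, only with a set index instead of a line index. The sketch below decodes a few addresses, including the two example addresses above written out as full 24-bit values (an interpretation of the "1FF 7FFC" / "001 7FFC" notation), and confirms which sets they select; the function name is invented for illustration.

# Decode a 24-bit address into the Tag(9) / Set(13) / Word(2) fields above.
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2

def split_set_assoc(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word

for addr in (0x000000, 0x008000, 0xFF8000, 0xFFFFFC, 0x00FFFC):
    tag, set_index, word = split_set_assoc(addr)
    print(f"addr {addr:06X}: tag={tag:03X} set={set_index:04X} word={word}")
# 000000, 008000 and FF8000 all land in set 0000 with different tags;
# FFFFFC (tag 1FF) and 00FFFC (tag 001) both land in set 1FFF,
# matching the two example rows above.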

Set Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines per set = k
- Number of sets = v = 2^d
- Number of lines in cache = kv = k * 2^d
- Size of tag = (s - d) bits

Set Associative Mapping Summary (special cases)
- When v = m and k = 1, set associative mapping reduces to direct mapping
- When v = 1 and k = m, set associative mapping reduces to associative mapping
- Two lines per set (v = m/2, k = 2) is the common set associative organization

Replacement Algorithms (1): Direct Mapping
- No choice: each block maps to only one line
- Replace that line

Replacement Algorithms (2): Associative and Set Associative
- Hardware-implemented algorithm (for speed)
- Least recently used (LRU): e.g. in a 2-way set associative cache, which of the 2 blocks is the LRU one?
- First in, first out (FIFO): replace the block that has been in the cache longest
- Least frequently used (LFU): replace the block that has had the fewest hits
- Random
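For a 2-way set associative cache, LRU only has to decide between the two lines of one set. The sketch below keeps each set as a small list ordered from most to least recently used and evicts the tail on a miss; it is a software illustration of the idea, not the hardware scheme the slides refer to, and all names are invented.

# LRU replacement for one set of a k-way set associative cache (k = 2 here).
# The set is kept as a list of tags ordered from most to least recently used.
K = 2

def access(set_tags, tag):
    """Simulate one access to a set; returns True on hit, False on miss."""
    if tag in set_tags:                 # hit: move the tag to the MRU position
        set_tags.remove(tag)
        set_tags.insert(0, tag)
        return True
    if len(set_tags) == K:              # miss with a full set: evict the LRU tag
        set_tags.pop()
    set_tags.insert(0, tag)             # place the new block as most recently used
    return False

# Usage: A and B both stay resident while they alternate; C then evicts the LRU one (B).
s = []
for t in ["A", "B", "A", "C", "B"]:
    print(t, "hit" if access(s, t) else "miss", s)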

Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Multiple CPUs may have individual caches
- I/O may address main memory directly

Write Through
- All writes go to main memory as well as to the cache
- Multiple CPUs can monitor main memory traffic to keep their local (per-CPU) caches up to date
- Lots of traffic; slows down writes

Write Back
- Updates are initially made in the cache only
- The update bit for the cache slot is set when an update occurs
- If a block is to be replaced, it is written to main memory only if the update bit is set
- Other caches can get out of sync
- I/O must access main memory through the cache
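The difference between the two policies shows up only on writes and on eviction. The sketch below contrasts them for a single cached block; the dirty flag plays the role of the "update bit" above, and the class and method names are invented for illustration.

# Contrast of the two write policies for a single cached block.
class WriteThroughBlock:
    def __init__(self, memory, block_no):
        self.memory, self.block_no = memory, block_no
        self.data = list(memory[block_no])

    def write(self, offset, value):
        self.data[offset] = value
        self.memory[self.block_no][offset] = value   # every write also goes to main memory

class WriteBackBlock:
    def __init__(self, memory, block_no):
        self.memory, self.block_no = memory, block_no
        self.data = list(memory[block_no])
        self.dirty = False                           # the "update bit"

    def write(self, offset, value):
        self.data[offset] = value                    # update initially made in the cache only
        self.dirty = True

    def evict(self):
        if self.dirty:                               # write back only if the update bit is set
            self.memory[self.block_no] = list(self.data)

# Usage: main memory as a dict of block number -> list of byte values
memory_wt = {0: [0, 0, 0, 0]}
wt = WriteThroughBlock(memory_wt, 0)
wt.write(1, 0xAB)
print(memory_wt[0])   # [0, 171, 0, 0] -- main memory updated immediately

memory_wb = {0: [0, 0, 0, 0]}
wb = WriteBackBlock(memory_wb, 0)
wb.write(1, 0xAB)
print(memory_wb[0])   # [0, 0, 0, 0]   -- main memory stays stale until eviction
wb.evict()
print(memory_wb[0])   # [0, 171, 0, 0]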

Line Size (Block Size)
- As the block size increases from very small to larger sizes, the hit rate at first increases, because data in the vicinity of a referenced word are likely to be referenced in the near future.
- As the block becomes even bigger, the hit rate begins to decrease.
- Cache line sizes of 64 to 128 bytes are the most frequently used.

Number of Caches
- Multilevel caches (L1 / L2): on-chip cache, off-chip cache
- Split caches: data cache, instruction cache

Pentium 4 Cache
- 80386: no on-chip cache
- 80486: 8 KB, using 16-byte lines and a four-way set associative organization
- Pentium (all versions): two on-chip L1 caches, for data and instructions
- Pentium III: L3 cache added off chip
- Pentium 4:
  - L1 caches: 8 KB, 64-byte lines, four-way set associative
  - L2 cache: 256 KB, 128-byte lines, 8-way set associative
  - L3 cache: on chip

Intel Cache Evolution
Problem | Solution | Processor on which the feature first appears
External memory slower than the system bus. | Add external cache using faster memory technology. | 386
Increased processor speed results in the external bus becoming a bottleneck for cache access. | Move the external cache on-chip, operating at the same speed as the processor. | 486
Internal cache is rather small, due to limited space on the chip. | Add external L2 cache using faster technology than main memory. | 486
Contention occurs when both the instruction prefetcher and the execution unit simultaneously require access to the cache; in that case, the prefetcher is stalled while the execution unit's data access takes place. | Create separate data and instruction caches. | Pentium
Increased processor speed results in the external bus becoming a bottleneck for L2 cache access. | Create a separate back-side bus that runs at a higher speed than the main (front-side) external bus; the BSB is dedicated to the L2 cache. | Pentium Pro
(same problem) | Move the L2 cache onto the processor chip. | Pentium II
Some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small. | Add external L3 cache. | Pentium III
(same problem) | Move the L3 cache on-chip. | Pentium 4

PowerPC Cache Organization
- 601: a single 32 KB, 8-way set associative cache
