William Stallings, Computer Organization and Architecture, 6th Edition
Chapter 4  Cache Memory (key chapter)

4.1 Computer Memory System Overview (Page 96)

Characteristics of memory systems:
- Location
- Capacity
- Unit of transfer
- Access method
- Performance
- Physical type
- Physical characteristics
- Organization

1. Characteristics of Memory Systems

(1) Location
- CPU (registers, cache)
- Internal (main memory, cache)
- External (disk, tape)

(2) Capacity
- Word size / word length: the natural unit of organisation, e.g. 8, 16 or 32 bits.
- Number of words (or bytes).
- Note: the word is not a universally agreed unit; its size differs from machine to machine. The byte is the standard unit: 1 Byte = 8 bits.
- Capacity = number of words x word length, where N denotes the number of words.

(3) Unit of Transfer
- Internal: the "word", usually governed by the data bus width and typically equal to the word length.
- External: the "block", usually much larger than a word (e.g. 4, 8 or 16 words); 1 block = 2^n words.

(4) Access Methods (1)
- Sequential access: start at the beginning and read through in order; access time depends on the location of the data and on the previous location, e.g. tape.
- Direct access: individual blocks have a unique address; access time depends on the location of the data and on the previous location, e.g. disk.

Access Methods (2)
- Random access: individual addresses identify locations exactly; access time is independent of location and of previous accesses, e.g. RAM (main memory).
- Associative access: data is located by a comparison with the contents of a portion of the store; access time is independent of location and of previous accesses, e.g. cache.

(5) Performance (Page 98)
- Access time: the time between presenting the address and getting the valid data.
- Memory cycle time: the memory may need time to "recover" before the next access; cycle time = access time + recovery time.
- Transfer rate: the rate at which data can be moved.
- Note: "access" covers both directions, i.e. fetch and store.

(6) Physical Types
- Semiconductor: RAM (DRAM and SRAM)
- Magnetic: disk and tape
- Optical: CD and DVD
- Magneto-optical: MO (rarely used)

(7) Physical Characteristics
- Volatile / non-volatile
- Erasable (read-write) / non-erasable (read-only)

(8) Organization
- The physical arrangement of bits into words.

2. Memory Hierarchy (Page 99)

- Registers: in the CPU.
- Internal (main) memory: may include one or more levels of cache; "RAM".
- External memory: the backing store.

Memory Hierarchy, Diagram (Page 100)
- Inboard memory: registers, cache, main memory.
- Outboard storage: magnetic disk, CD, DVD.
- Off-line storage: magnetic tape, magneto-optical disk, write-once (recordable) optical disk.
- Going down the hierarchy: (1) decreasing cost per bit, (2) increasing capacity, (3) increasing access time (slower).

Hierarchy list: registers (in CPU), L1 cache, L2 cache, main memory, disk cache, disk, optical, tape.

Example: two levels of memory (Page 101)
- Level 1 (L1): fast but small, access time T1 = 0.01 us.
- Level 2 (L2): slow but big, access time T2 = 0.1 us.
- Assumption: the data the CPU needs is always in Level 2 and may also be in Level 1.

Access method (Page 102):
- The CPU first accesses L1. If the data is found there (hit), it is delivered directly and quickly.
- If it is not found (miss), L2 is accessed, the required data is read into L1, and then delivered from L1 to the CPU.

Let H be the hit ratio and T the average access time (T1 = access time of L1, T2 = access time of L2):

    T = H*T1 + (1 - H)*(T1 + T2) = T1 + (1 - H)*T2

- If H is near 100%, T approaches T1.
- If H is near 0%, T approaches T1 + T2.
- Suppose H = 0.95; then T = T1 + (1 - 0.95)*T2 = 0.015 us.
- Result: the speed is close to the L1 access time, while the capacity is that of L2.

Performance of a simple two-level memory (Page 101): T = T1 + (1 - H)*T2.
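A short calculation makes the hit-ratio formula concrete. The sketch below (Python; the function name and the printed cases are illustrative choices, not from the slides) reproduces the numbers of the example.

```python
def avg_access_time(hit_ratio, t1, t2):
    """Average access time of a two-level memory:
    T = H*T1 + (1 - H)*(T1 + T2) = T1 + (1 - H)*T2."""
    return t1 + (1.0 - hit_ratio) * t2

# Slide's example: T1 = 0.01 us, T2 = 0.1 us, H = 95%
print(avg_access_time(0.95, 0.01, 0.1))   # 0.015 us, close to the L1 time
print(avg_access_time(1.00, 0.01, 0.1))   # 0.01 us,  T approaches T1
print(avg_access_time(0.00, 0.01, 0.1))   # 0.11 us,  T approaches T1 + T2
```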

So you want fast... (Page 103)
- It is possible to build a computer that uses only static RAM (see later).
- It would be very fast and would need no cache, so the structure is simplified (how can you cache a cache?).
- But it would cost a very large amount.

4.2 Cache (Page 103)

- A small amount of fast memory.
- Sits between normal main memory and the CPU (L1: fast and small; L2: slow and large; roughly "retail" versus "wholesale").
- May be located on the CPU chip or on a module.

Cache operation, overview (Page 104)
- The CPU requests the contents of a memory location.
- The cache is checked for this data.
- If present (hit), the data is delivered from the cache (fast).
- If not present (miss), the required block is read from main memory into the cache and then delivered from the cache to the CPU.
- The cache includes tags to identify which block of main memory is currently in each cache slot (line).
- A sketch of this flow is given after the structure description below.

Cache read operation (Page 105): RA = real address.

Typical cache organization (Page 106).

Cache / main memory structure (Page 104)
- Main memory holds M blocks in total; the cache holds C lines in total, with C far smaller than M.
- Each block and each line holds K words (w0, w1, ..., w(K-1)); block size = line size.
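The hit/miss flow can be expressed as a small sketch. This is illustrative only and assumes a direct-mapped placement purely for concreteness; the names and the way the mapping is abstracted are my own, not the slides' notation.

```python
# Minimal sketch of the cache read flow (hit / miss).
BLOCK_SIZE = 4          # words per block (= words per line)
NUM_LINES  = 8          # cache lines

main_memory = list(range(1024))                 # fake main memory, one int per word
cache_lines = [None] * NUM_LINES                # each entry: (tag, [block of words])

def read(address):
    block_no = address // BLOCK_SIZE            # which main-memory block
    word_no  = address %  BLOCK_SIZE            # which word inside the block
    line_no  = block_no %  NUM_LINES            # candidate cache line
    tag      = block_no // NUM_LINES            # identifies the block within that line
    entry = cache_lines[line_no]
    if entry is not None and entry[0] == tag:   # hit: deliver from the cache
        return entry[1][word_no]
    # miss: read the whole block from main memory into the cache, then deliver
    start = block_no * BLOCK_SIZE
    block = main_memory[start:start + BLOCK_SIZE]
    cache_lines[line_no] = (tag, block)
    return block[word_no]

print(read(37))   # miss: loads block 9 into line 1
print(read(38))   # hit: same block
```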

4.3 Elements of Cache Design (Page 107)

- Cache size (capacity)
- Mapping function
- Replacement algorithm
- Write policy
- Line size
- Number of caches

1. Size does matter
- Cost: more cache is more expensive.
- Speed: more cache is faster (up to a point), but checking the cache for data takes time.

2. Mapping Function (1)
- Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks onto cache lines.
- It also determines which main memory block currently occupies a given cache line.
- Three techniques: direct mapping, associative mapping, set-associative mapping.

(1) Direct Mapping (Pages 107-112)

- Each block of main memory maps to only one (fixed) cache line: if a block is in the cache, it must be in one specific place.
- The address is divided into three fields:
  - the rightmost bits identify a unique word within the block (word number, word#);
  - the middle bits specify the cache line (line number, line#);
  - the leftmost bits serve as the tag.
- Terminology: distinguish word number (word#) from the number of words, line number (line#) from the number of lines, and block number (block#) from the number of blocks; the address code length is the length of the real address (RA).

A simple example (not from the textbook):
- Main memory size = 16 words.
- Line size = block size = 2 words.
- Cache size = 8 words.

Elements used for all three mapping examples (textbook example, Page 107):
- Cache size = 64 KBytes = 2^16 bytes (the same analysis works in words).
- Cache line size = 4 bytes = 2^2 bytes, so the cache has 2^16 / 2^2 = 2^14 = 16K lines of 4 bytes.
- Block size = line size = 4 bytes = 2^2 bytes.
- Main memory size = 16 MBytes = 2^24 bytes, so there are 2^24 / 2^2 = 2^22 blocks.
- Length of the real address RA = 24 bits (2^24 = 16M). In general M = 2^N, where N is the number of address bits and M the addressable memory capacity.

Direct mapping address structure (Page 110):

    | tag: 8 bits | line#: 14 bits | word#: 2 bits |      (24-bit RA)

- 2-bit word# (4-byte block), 14-bit line#, and an 8-bit tag (24 - 2 - 14); tag and line# together form the block#.
- No two blocks that map to the same line have the same tag.
- The cache is checked by finding the line and comparing the tag.

Direct mapping cache organization (Page 109): the line# field selects a line, the tag is compared (hit or miss), the word# selects the word; on a miss the block is brought from main memory into that cache line.

Direct mapping example (Page 111): address 16339CH = 0001 0110 | 00 1100 1110 0111 | 00 B
- tag = 16H, line# = 0CE7H, word# = 0H.
- Check by division: 339CH / 4 = 0CE7H (3/4 = 0 rem 3, 33H/4 = CH rem 3, 39H/4 = EH rem 1, 1CH/4 = 7 rem 0), which confirms the line number.
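The field split can be checked mechanically. The sketch below (Python; the function and constant names are mine) decomposes a 24-bit address using the 8/14/2 layout above and reproduces the 16339CH example.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2   # direct-mapping layout: 8 + 14 + 2 = 24

def split_direct(addr):
    """Split a 24-bit byte address into (tag, line#, word#)."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag  = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

tag, line, word = split_direct(0x16339C)
print(hex(tag), hex(line), hex(word))   # 0x16 0xce7 0x0, as on the slide
```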

Direct mapping summary (Page 110):
- Address length (RA) = n bits, so the number of addressable units = 2^n words or bytes.
- Block size = line size = 2^w words or bytes, so the word# field is w bits.
- Number of blocks in main memory = 2^n / 2^w = 2^(n-w).
- Number of lines in the cache = 2^L, so the line# field is L bits.
- Size of the tag = (n - w - L) bits; the address is laid out as | tag | L bits | w bits |.

Direct mapping pros and cons (Page 111):
- Simple and inexpensive to implement; each block has a fixed location.
- But if a program repeatedly accesses two blocks that map to the same line, cache misses are very frequent (conflict) and the hit ratio drops.

(2) Associative Mapping (fully associative mapping, Page 112)

- A main memory block can be loaded into any line of the cache.
- The memory address is interpreted as just a tag and a word#.
- The tag uniquely identifies the block of memory.
- Every line's tag is examined for a match, which makes cache searching expensive.

Associative mapping address structure:

    | tag: 22 bits | word#: 2 bits |

- A 22-bit tag is stored with each 32-bit block of data.
- The tag field of the address is compared with the tag entry of every cache line to check for a hit.

Fully associative cache organization (Page 113): the tag is compared against all lines in parallel (1:m); the word# selects the word; on a miss the block is brought from main memory into any line.

Associative mapping example (Page 114): address 16339CH = 00 0101 1000 1100 1110 0111 | 00 B
- tag = 058CE7H, word# = 0H.
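The same address under the associative layout, as a quick check (again an illustrative Python sketch with made-up names):

```python
WORD_BITS = 2                      # 4-byte blocks; the remaining 22 bits are the tag

def split_associative(addr):
    """Split a 24-bit address into (tag, word#) for fully associative mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    tag  = addr >> WORD_BITS
    return tag, word

print([hex(x) for x in split_associative(0x16339C)])   # ['0x58ce7', '0x0']
```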

Associative mapping summary (Page 112):
- Address length = n bits (tag bits + word# bits); number of addressable units = 2^n words or bytes.
- Block size = line size = 2^w words or bytes.
- Number of blocks in main memory = 2^n / 2^w = 2^(n-w).
- Number of lines in the cache: not determined by the address format; a block can map to any cache line.
- Size of the tag = (n - w) bits.

Associative mapping pros and cons (Page 112):
- Flexible.
- Requires complex circuitry to examine the tags of all cache lines in parallel; the tag comparison is the main hardware difficulty.

Review:
- Direct mapping: each block of main memory maps to only one (fixed) cache line; simple and inexpensive, but conflicts make the miss rate high.
- Associative mapping: a main memory block can be loaded into any line of the cache; flexible, but the tag comparison is difficult.

(3) Set-Associative Mapping (Page 112)

- The cache is divided into a number of sets, and each set contains a number of lines.
- A given block maps to any line within one given (fixed) set, e.g. block B can be in any line of set i.
- With 2 lines per set this is 2-way set-associative mapping: a given block can be in either of 2 lines, but only in one (fixed) set. In general this is K-way set-associative mapping.
- Address format: | tag | set# | word# |.

Two-way set-associative cache organization (Page 116): the set# field selects a set, the tag is compared against the k lines of that set (1:k, with k far smaller than m), and the word# selects the word; on a miss the block is brought from main memory into a line of that set.

Set-associative mapping address structure (for the running example):
- Number of sets in the cache = 2^14 lines / 2 lines per set = 2^13 sets.
- tag = RA - set# - word# = 24 - 13 - 2 = 9 bits.

    | tag: 9 bits | set#: 13 bits | word#: 2 bits |

- Use the set field to determine which cache set to look in, then compare the tag field to see if there is a hit.

Two-way set-associative mapping example: address 16339CH = 0 0010 1100 | 0 1100 1110 0111 | 00 B
- tag = 02CH, set# = 0CE7H, word# = 0H.
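And the set-associative layout for the same address (illustrative Python, names assumed):

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2    # 2-way layout from the running example

def split_set_assoc(addr):
    """Split a 24-bit address into (tag, set#, word#) for 2-way set-associative mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_ = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag  = addr >> (WORD_BITS + SET_BITS)
    return tag, set_, word

print([hex(x) for x in split_set_assoc(0x16339C)])   # ['0x2c', '0xce7', '0x0']
```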

Set-associative mapping summary (Page 115):
- Address length = n bits; number of addressable units = 2^n words or bytes.
- Block size = line size = 2^w words or bytes; number of blocks in main memory = 2^(n-w).
- Number of lines per set = k ("k-way"); number of sets = v = 2^s.
- Number of lines in the cache = k*v = k * 2^s.
- Size of the tag = (n - s - w) bits.

Mapping exercise:
Given a cache of 64K words, a cache block of 4 words, a main memory of 16M words, and a k-way set organization (in practice k = 2, 4, 8 or 16):
- The memory size gives RA = ? bits.
- The block size (= line size) gives word# = ? bits.
- Cache size / line size / (k lines per set) gives set# = ? bits.
- The number of lines in the cache gives line# = ? bits.
A worked sketch for one choice of k follows.
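One way to work the exercise, assuming k = 4 purely as an example (the exercise leaves k open); the Python below just evaluates the powers of two:

```python
import math

MEM_WORDS   = 16 * 2**20      # 16M words  -> RA bits
CACHE_WORDS = 64 * 2**10      # 64K words
BLOCK_WORDS = 4               # words per block (= per line)
K           = 4               # assumed associativity for this sketch

ra_bits   = int(math.log2(MEM_WORDS))                     # 24
word_bits = int(math.log2(BLOCK_WORDS))                   # 2
lines     = CACHE_WORDS // BLOCK_WORDS                    # 16K lines
line_bits = int(math.log2(lines))                         # 14
sets      = lines // K                                    # 4K sets
set_bits  = int(math.log2(sets))                          # 12
tag_bits  = ra_bits - set_bits - word_bits                # 10

print(ra_bits, word_bits, line_bits, set_bits, tag_bits)  # 24 2 14 12 10
```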

3. Replacement Algorithms

(1) Direct mapping (Page 115)
- When a new block is brought into the cache, one of the existing blocks must be replaced. Which one?
- With direct mapping there is no choice: each block maps to exactly one fixed line, so that line is replaced.

(2) Associative and set-associative mapping
- The algorithm must be implemented in hardware (for speed). Candidates:
  - Least recently used (LRU)
  - First in, first out (FIFO)
  - Least frequently used (LFU)
  - Random

(2-1) Details
- Random: replace a block at random.
- Least frequently used (LFU): replace the block that has had the fewest hits, using a counter to record the frequency.
- First in, first out (FIFO): replace the block that has been in the cache longest, using a counter to record the history.

(2-2) Least recently used (LRU)
- Each line includes a USE bit. When a line is referenced (accessed), its USE bit is set to 1 and the USE bit of the other line in the same set is set to 0.
- A USE bit of 0 therefore marks the least recently used line, and replacement takes place in that line.

(2-3) LRU example: 2-way set associative
- Which of the 2 blocks in a set is the LRU one? Each of the two ways in a set has a flag bit.
- USE = 1 on the left way means the left way was just accessed, so replacement picks the right one (USE = 0). A small sketch of this bookkeeping follows.
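A minimal sketch of the USE-bit bookkeeping for one 2-way set, assuming the simple two-line scheme described above (the class and method names are mine):

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache; LRU via a single USE bit per way."""
    def __init__(self):
        self.tags = [None, None]   # tag stored in way 0 / way 1
        self.use  = [0, 0]         # USE bit per way; 1 = most recently referenced

    def access(self, tag):
        for way in (0, 1):
            if self.tags[way] == tag:            # hit: mark this way as recently used
                self.use[way] = 1
                self.use[1 - way] = 0
                return "hit"
        victim = 0 if self.use[0] == 0 else 1    # miss: replace the way whose USE bit is 0
        self.tags[victim] = tag
        self.use[victim] = 1
        self.use[1 - victim] = 0
        return "miss"

s = TwoWaySet()
print(s.access(0x16), s.access(0x2C), s.access(0x16), s.access(0x99))
# miss miss hit miss : the last access evicts tag 0x2C, the least recently used way
```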

4. Write Policy (Page 118)

- If the CPU modifies the contents of the cache, the cache becomes different from main memory. When should main memory be updated? Two policies: write through and write back.

(1) Write through
- All writes go to main memory as well as to the cache: whenever the data in a cache line is changed, the change is made in the corresponding memory block at the same time.
- Drawbacks: lots of bus traffic, and writes are slowed down.
- Remember "bogus" write-through caches ("fake": such a design was never actually built).

(2) Write back
- Updates are initially made in the cache only; an UPDATE (dirty) bit for the cache line is set when the update occurs.
- When a block is to be replaced, it is written back to main memory only if its UPDATE bit is set.
- Writes account for roughly 15% of memory operations. A sketch of the dirty-bit logic follows.
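A minimal sketch of the write-back bookkeeping, assuming one dirty (UPDATE) bit per line; the names are illustrative, not from the slides:

```python
class WriteBackLine:
    """One cache line under a write-back policy: a dirty bit defers the memory update."""
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False

    def write(self, tag, data):
        # Update the cache only; just remember that memory is now stale.
        self.tag, self.data, self.dirty = tag, data, True

    def replace(self, new_tag, new_data, memory):
        # Write the old contents back to memory only if the dirty bit is set.
        if self.dirty and self.tag is not None:
            memory[self.tag] = self.data
        self.tag, self.data, self.dirty = new_tag, new_data, False

memory = {}
line = WriteBackLine()
line.write(0x16, "new value")        # cache updated, memory untouched
line.replace(0x2C, "other", memory)  # eviction triggers the write-back
print(memory)                        # {22: 'new value'}
```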

5. Line Size (Page 119)

- Line size = block size.
- A larger block (line) size may raise the hit ratio, but it increases the traffic on the bus when blocks are transferred.

6. Number of Caches (Page 119)

- Multilevel caches: on-chip cache and/or board-level cache; CPU, L1 cache, L2 cache, memory (relative speed ratios noted on the slide as 1:1, 10:2, 100:50).
- Unified versus split caches (for the on-chip cache holding data and instructions):
  - A unified cache dynamically adjusts how much space data and instructions occupy according to the hit ratio.
  - A split cache: the data cache writes back to memory, but the instruction cache does not (instructions are read-only).

4.4 Pentium 4 Cache and PowerPC Cache (Page 121)

- 80386: no on-chip cache.
- 80486: a single on-chip cache of 8 KBytes, using 16-byte lines and a four-way set-associative organization.
- Pentium (all versions): two on-chip L1 caches, one for data and one for instructions.
- Pentium 4 L1 caches (data cache and instruction cache): the data cache is 8 KBytes, 64-byte lines, four-way set associative.
- Pentium 4 L2 cache: feeds both L1 caches; 256 KBytes, 128-byte lines, eight-way set associative.

Pentium 4 block diagram, simplified (Page 122): L1 instruction cache, L1 data cache, L2 cache.

PowerPC cache organization:
- 601: a single 32 KB cache, eight-way set associative.
- 603: 16 KB (2 x 8 KB), two-way set associative.
- 604: 32 KB.
- 620: 6...
