计算机系统基础:cache memory_第1页
计算机系统基础:cache memory_第2页
计算机系统基础:cache memory_第3页
计算机系统基础:cache memory_第4页
计算机系统基础:cache memory_第5页
已阅读5页,还剩122页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1计算机体系结构

cachememory

2次课2LocalityPrincipleofLocality:

ProgramstendtousedataandinstructionswithaddressesnearorequaltothosetheyhaveusedrecentlyTemporallocalityRecentlyreferenceditemsarelikely

tobereferencedagaininthenearfutureSpatiallocalityItemswithnearbyaddressestend

tobereferencedclosetogetherintime3LocalityAlllevelsofmoderncomputersystemsaredesignedtoexploitlocalityHardwareCachememory(tospeedupmainmemoryaccesses)OperatingsystemsUsemainmemorytospeedupvirtualaddressspaceaccessesUsemainmemorytospeedupdiskfileaccessesApplicationprogramsWebbrowsersexploittemporallocalitybycachingrecentlyreferenceddocumentsonalocaldisk4Localityintsumvec(intv[N]){ inti,sum=0;

for(i=0;i<N;i++) sum+=v[i]; returnsum;}Address0481216202428Contentsv0v1v2v3v4v5v6v7Accessorder123456785LocalityLocalityintheexamplesum:temporallocalityv:spatiallocalityStride-1referencepatternStride-kreferencepatternVisitingeveryk-thelementofacontiguousvectorAsthestrideincreases,thespatiallocalitydecreases6Stride-1referencepatternintsumarrayrows(inta[M][N]){ inti,j,sum=0;

for(i=0;i<M;i++) for(j=0;j<N;j++) sum+=a[i][j]; returnsum;}Address048121620Contentsa00a01a02a10a11a12Accessorder1234567Stride-Nreferencepatternintsumarraycols(inta[M][N])(M=2,N=3){ inti,j,sum=0;

for(j=0;j<N;j++) for(i=0;i<M;i++) sum+=a[i][j]; returnsum;}Address048121620Contentsa00a01a02a10a11a12Accessorder1352468LocalityLocalityoftheinstructionfetchSpatiallocalityInmostcases,programsareexecutedinsequentialorderTemporallocalityInstructionsinloopsmaybeexecutedmanytimes9LocalityDatareferencesReferencearrayelementsinsuccession SpatiallocalityReferencevariablesumeachiteration TemporallocalityInstructionreferencesReferenceinstructionsinsequence. SpatiallocalityCyclethroughlooprepeatedly Temporallocalitysum=0;for(i=0;i<n;i++) sum+=a[i];returnsum;10MemoryHierarchyFundamentalpropertiesofstoragetechnologyandcomputersoftwareDifferentstoragetechnologieshavewidelydifferentaccesstimesFastertechnologiescostmoreperbytethansloweronesandhavelesscapacityThegapbetweenCPUandmainmemoryspeediswideningWell-writtenprogramstendtoexhibitgoodlocality11Mainmemoryholdsdiskblocksretrievedfromlocaldisks.registerson-chipL1cache(SRAM)mainmemory(DRAM)localsecondarystorage(localdisks)Larger,slower,andcheaper(perbyte)storagedevicesremotesecondarystorage(distributedfilesystems,Webservers)Localdisksholdfilesretrievedfromdisksonremotenetworkservers.off-chipL2cache(SRAM)L1cacheholdscachelinesretrievedfromtheL2cache.CPUregistersholdwordsretrievedfromcachememory.L2cacheholdscachelinesretrievedfrommemory.L0:L1:L2:L3:L4:L5:Smaller,faster,andcostlier(perbyte)storagedevicesAnexamplememoryhierarchyCachesFundamentalideaofamemoryhierarchy:ForeachK,thefaster,smallerdeviceatlevelKservesasacacheforthelarger,slowerdeviceatlevelK+1.Whydomemoryhierarchieswork?Becauseoflocality,programstendtoaccessthedataat

levelkmoreoftenthantheyaccessthedataatlevelk+1.Thus,thestorageatlevelk+1canbeslower,andthuslargerandcheaperperbit.BigIdea:Thememoryhierarchycreatesalargepoolofstoragethatcostsasmuchasthecheapstoragenearthebottom,butthatservesdatatoprogramsattherateofthefaststoragenearthetop.GeneralCacheConcepts012345678910111213141589143CacheLarger,slower,cheapermemoryviewedaspartitionedinto“blocks”Dataiscopiedinblock-sizedtransferunitsSmaller,faster,moreexpensivememorycachesasubsetoftheblocks44410101014MemoryHierarchyBlocksAtlevelk+1ThestorageispartitionedintocontiguouschunksofdataobjectsEachblockhasauniqueaddressornameBlockscanbefixed-sizeorvariable-sizedAtlevelkThestorageispartitionedintoasmallersetofblocksTheblocksarethesamesizeastheblocksatlevelk+1Thestoragecontainscopiesofasubsetoftheblocksatlevelk+115MemoryHierarchyTransferunitsUsedtocopydatabackandforthbetweenlevelkandlevelk+1GeneralCacheConcepts:Hit012345678910111213141589143CacheDatainblockbisneededRequest:1414Blockbisincache:Hit!GeneralCacheConcepts:Miss012345678910111213141589143CacheDatainblockbisneededRequest:12Blockbisnotincache:Miss!BlockbisfetchedfrommemoryRequest:12121212BlockbisstoredincachePlacementpolicy:

determineswherebgoesReplacementpolicy:

determineswhichblock

getsevicted(victim)TypesofCacheMissesCold(compulsory)missColdmissesoccurbecausethecacheisempty.CapacitymissOccurswhenthesetofactivecacheblocks(workingset)islargerthanthecache.TypesofCacheMissesConflictmissMostcacheslimitblocksatlevelk+1toasmallsubset(sometimesasingleton)oftheblockpositionsatlevelk.e.g.Blockiatlevelk+1mustbeplacedinblock(imod4)atlevelk.Conflictmissesoccurwhenthelevelkcacheislargeenough,butmultipledataobjectsallmaptothesamelevelkblock.e.g.Referencingblocks0,8,0,8,0,8,...wouldmisseverytime.ExamplesofCachingintheHierarchyHardware0On-ChipTLBAddresstranslationsTLBWebbrowser10,000,000LocaldiskWebpagesBrowsercacheWebcacheNetworkbuffercacheBuffercacheVirtualMemoryL2cacheL1cacheRegistersCacheTypeWebpagesPartsoffilesPartsoffiles4-KBpage64-bytesblock64-bytesblock4-8byteswordsWhatisCached?Webproxyserver1,000,000,000RemoteserverdisksOS100MainmemoryHardware1On-ChipL1Hardware10On/Off-ChipL2AFS/NFSclient10,000,000LocaldiskHardware+OS100MainmemoryCompiler0CPUcoreManagedByLatency(cycles)WhereisitCached?Diskcache DisksectorsDiskcontroller100,000Diskfirmware21CacheMemorymainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememoryLiesbetweentheCPUandthemainmemoryCPUlooksfirstfordataincache,theninmainmemoryHoldfrequentlyaccessedblocksofmainmemoryincaches22CacheMemoryHistoryAtverybeginning,3levelsRegisters,mainmemory,storage10yearslater,4levelsRegister,cachememory,mainmemory,storageModernprocessor,5~6levelsRegisters,SRAML1,L2(,L3)cache,mainDRAMmemory,diskstorageCachememoriessmall,fastSRAM-basedmemoriesmanagedbyhardwareautomaticallycanbeon-chip,off-chip23InsertinganL1cachebetweentheCPUandmainmemoryabcdblock10pqrsblock21......wxyzblock30...Thebigslowmainmemoryhasroomformany8-wordblocks.ThesmallfastL1cachehasroomfortwo8-wordblocks.Thetiny,veryfastCPUregisterfilehasroomforfour4-bytewords.Thetransferunitbetweenthecacheandmainmemoryisa8-wordblock(32bytes).ThetransferunitbetweentheCPUregisterfileandthecacheisa4-byteblock.line0line124GenericCacheMemoryOrganization•

•B–110•

•B–110validvalidtagtagset0:B=2bbytespercacheblockElinespersetS=2ssetsttagbitsperline1validbitperline•

••

•B–110•

•B–110validvalidtagtagset1:•

••

•B–110•

•B–110validvalidtagtagsetS-1:•

••

•Cacheisanarrayofsets.Eachsetcontainsoneormorelines.Eachlineholdsablockofdata.25CacheMemoryFundamentalparametersParametersDescriptionsS=2sEB=2bm=log2(M)NumberofsetsNumberoflinespersetBlocksize(bytes)Numberofphysical(mainmemory)addressbits26CacheMemoryDerivedquantitiesParametersDescriptionsM=2ms=log2(S)b=log2(B)t=m-(s+b)C=BESMaximumnumberofuniquememoryaddressNumberofsetindexbitsNumberofblockoffsetbitsNumberoftagbitsCachesize(bytes)notincludingoverheadsuchasthevalidandtagbits27MemoryAccessingForamemoryaccessinginstructionmovlA%eaxAccesscachebyAdirectlyIfcachehitgetthevaluefromthecacheOtherwise,cachemisshandlinggetthevalue28Addressingcachestbitssbitsbbits0m-1<tag><setindex><blockoffset>PhysicalAddressA:0m-1Splitinto3parts:29Direct-mappedcacheSimplestkindofcacheCharacterizedbyexactlyonelineperset.validvalidvalidtagtagtag•

•set0:set1:setS-1:E=1linespersetcacheblockcacheblockcacheblockp63330AccessingDirect-MappedCachesThreestepsSetselectionLinematchingWordextraction31SetselectionUsethesetindexbitstodeterminethesetofinterestvalidvalidvalidtagtagtag•

•set0:set1:setS-1:tbitssbits000010m-1bbitstagsetindexblockoffsetselectedsetcacheblockcacheblockcacheblock32Linematching1tbitssbitsi01100m-1bbitstagsetindexblockoffsetselectedset(i):=1?=?(1)Thevalidbitmustbeset(2)Thetagbitsinthecachelinemustmatchthetagbitsintheaddress011030127456Findavalidlineintheselectedsetwithamatchingtag33WordExtraction1tbitssbits100i01100m-1bbitstagsetindexblockoffsetselectedset(i):blockoffsetselectsstartingbyte0110w3w0w1w23012745634SimpleMemorySystemCacheCache16lines4-bytelinesizeDirectmapped11109876543210OffsetIndexTag35SimpleMemorySystemCacheIdxTagValidB0B1B2B30191991123111150––––21B1000204083360––––4321436D8F0950D13672F01D6310––––716111C2DF0336SimpleMemorySystemCacheIdxTagValidB0B1B2B382413A00518992D0––––A2D19315DA3BB0B0––––C120––––D16104963415E13183771BD3F140––––37AddressTranslationExampleAddress:0x35411109876543210OffsetIndexTag001010101100Offset:0x0 Index:0x05Tag:0x0D

Hit?YesByte:0x3638LineReplacementonMissesCheckthecachelineofthesetindicatedbythesetindexbitsIfthecachelinevalid,itmustbeevictedRetrievetherequestedblockfromthenextlevelHowtogettheblock?Currentlineisreplacedbythenewlyfetchedline39Checkthecacheline1selectedset(i):=1?Ifvalidbitisset,evictthelineTag3012745640GettheAddressoftheStartingByteConsidermemoryaddresslookslikethefollowingClearthelastbitsandgettheaddressA0m-1<tag><setindex><blockoffset>0m-1<tag><setindex><blockoffset>xxx………xxx000………00041ReadtheBlockfromtheMemoryPutAontheBus(AisA’000)A0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory42ReadtheBlockfromtheMemoryMainmemoryreadsA’fromthememorybus,retrieves8bytesx,andplacesitonthebusX0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory43ReadtheBlockfromtheMemoryCPUreadsxfromthebusandcopiesitintothecacheline0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory44ReadtheBlockfromtheMemoryIncreaseA’by1,andcopyYinA’+1intothecacheline.0A’+1YmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory45ReadtheBlockfromtheMemoryRepeatseveraltimes(4or8)0A’+3WmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory46Cacheline,setandblockBlockAfixed-sizedpacketofinformationMovesbackandforthbetweenacacheandmainmemory(oralower-levelcache)LineAcontainerinacachethatstoresAblock,thevalidbit,thetagbitsOtherinformationSetAcollectionofoneormorelines47Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]xt=1s=2b=1xxx0vtagblock0000123set48Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]

missxt=1s=2b=1xxx10m[0]m[1]vtagblock000

0[0000](miss)(1)0123set49Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitxt=1s=2b=1xxx10m[0]m[1]vtagblock000

1[0001](hit)(2)0123set50Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitmissxt=1s=2b=1xxx10m[0]m[1]vtagblock011m[12]m[13]0

13[1101](miss)(3)0123set51Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitmissmissxt=1s=2b=1xxx11m[8]m[9]vtagblock011m[12]m[13]0

8[1000](miss)(4)0123set52Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitmissmiss

missxt=1s=2b=1xxx10m[0]m[1]vtagdata011m[12]m[13]0

0[0000](miss)(5)0123set53Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitmissmiss

missxt=1s=2b=1xxx10m[0]m[1]vtagdata011m[12]m[13]0

0[0000](miss)(5)0123setThrashing!54ConflictMissesinDirect-MappedCaches1 floatdotprod(floatx[8],floaty[8])2 {3 floatsum=0.0;4 inti;56 for(i=0;i<8;i++)7 sum+=x[i]*y[i];8 returnsum;9 }55ConflictMissesinDirect-MappedCachesAssumptionforxandyxisloadedintothe32bytesofcontiguousmemorystartingataddress0ystartsimmediatelyafterxataddress32AssumptionforthecacheAblockis16bytesbigenoughtoholdfourfloatsThecacheconsistsoftwosetsAtotalcachesizeof32bytesThelast6bitsoftheaddressofx[0]is000000Thelast6bitsoftheaddressofx[4]is010000Thelast6bitsoftheaddressofy[0]is100000Thelast6bitsoftheaddressofy[4]is11000056ConflictMissesinDirect-MappedCachesTrashingReadx[0]willloadx[0]~x[3]intothecacheReady[0]willoverloadthecachelinebyy[0]~y[3]57ConflictMissesinDirect-MappedCachesPaddingcanavoidthrashingClaimx[12]insteadofx[8]Thelast6bitsofY[0]is11000058Whyusemiddlebitsasindex?High-OrderBitIndexingAdjacentmemorylineswouldmaptosamecachesetPooruseofspatiallocalityMiddle-OrderBitIndexingConsecutivememorylinesmaptodifferentcachelinesCanholdC-byteregionofaddressspaceincacheatonetime59Direct-mappedcachesimulationAddressbitsAddress(decimal)Tagbits(t=1)Indexbits(s=2)Offsetbits(b=1)Setnumber(decimal)01234567891011121314150000000011111111000001011010111100000101101011110101010101010101001122330011223360Direct-mappedcachesimulationAddressbitsAddress(decimal)Indexbits(s=2)Tagbits(t=1)Offsetbits(b=1)Setnumber(decimal)01234567891011121314150000000001010101101010101111111100110011001100110101010101010101000011112222333361Whyusemiddlebitsasindex?4-lineCacheHigh-OrderBitIndexingMiddle-OrderBitIndexing000110110000000100100011010001010110011110001001101010111100110111101111000000010010001101000101011001111000100110101011110011011110111162SetassociativecachesCharacterizedbymorethanonelinepersetvalidtagset0:E=2linespersetset1:setS-1:•

•cacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblock63AccessingsetassociativecachesSetselectionidenticaltodirect-mappedcachevalidvalidtagtagset0:validvalidtagtagset1:validvalidtagtagsetS-1:•

•tbitssbits000010m-1bbitstagsetindexblockoffsetSelectedsetcacheblockcacheblockcacheblockcacheblockcacheblockcacheblock64AccessingsetassociativecachesLinematchingandwordselectionmustcomparethetagineachvalidlineintheselectedset.(3)If(1)and(2),thencachehit,andblockoffsetselectsstartingbyte.10110w3w0w1w211001tbitssbits100i01100m-1bbitstagsetindexblockoffsetselectedset(i):=1?=?(2)Thetagbitsinoneofthecachelinesmustmatchthetagbitsintheaddress(1)Thevalidbitmustbeset.3012745665AssociativeCacheCache16lines4-bytelinesize2-waysetassociative11109876543210OffsetIndexTag66SimpleMemorySystemCacheIdxTagValidB0B1B2B30191991123110150––––11B1000204081360––––2321436D8F0920D13672F01D3310––––316111C2DF0367SimpleMemorySystemCacheIdxTagValidB0B1B2B342413A00518942D0––––52D19315DA3B50B0––––6120––––616104963415713183771BD37140––––68AddressTranslationExampleAddress:0x35411109876543210OffsetIndexTag001010101100Offset:0x0 Index:0x05Tag:0x1A

Hit?No

69LineReplacementonMissesIfallthecachelinesofthesetarevalidWhichlineisselectedtobeevicted?LFU(least-frequently-used)ReplacethelinethathasbeenreferencedthefewesttimesoversomepasttimewindowLRU(least-recently-used)ReplacethelinethatwaslastaccessedthefurthestinthepastAllofthesepoliciesrequireadditionaltimeandhardware70SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]xxt=2s=1b=1xx0vtagblock0000011set71SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]

miss100m[0]m[1]vtagblock000

0[0000](miss)(1)xxt=2s=1b=1xx72SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hit100m[0]m[1]vtagblock000

1[0001](hit)(2)xxt=2s=1b=1xx0011set73SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitmiss100m[0]m[1]vtagblock1111m[12]m[13]0

13[1101](miss)(3)xxt=2s=1b=1xx0011set74SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitmissmiss110m[8]m[9]vtagblock111m[12]m[13]10

8[1000](miss)LRU(4)xxt=2s=1b=1xx0011set75SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):

0[0000]1[0001]13[1101]8[1000]0[0000]

miss

hitmissmiss

miss110m[8]m[9]vtagblock100m[0]m[1]10

0[0000](miss)LRU(5)xxt=2s=1b=1xx0011set76FullyassociativecachesCharacterizedbyallofthelinesintheonlyonesetNosetindexbitsintheaddressset0:validvalidtagtagcacheblockcacheblockvalidtagcacheblock…E=C/Blinesintheoneandonlysettbitsbbitstagblockoffset77AccessingfullyassociativecachesLinematchandwordselectionmustcomparethetagineachvalidline00110w3w0w1w211001tbits10001100m-1bbitstagblockoffset=1?=?(3)If(1)and(2),thencachehit,andblockoffsetselectsstartingbyte.(2)Thetagbitsinoneofthecachelinesmustmatchthetagbitsintheaddress(1)Thevalidbitmustbeset.30127456100110111078IssueswithWritesWritehitsWritethroughCacheupdatesitscopyImmediatelywritesthecorrespondingcacheblocktomemoryWritebackDefersthememoryupdateaslongaspossibleWritingtheupdatedblocktomemoryonlywhenitisevictedfromthecacheMaintainsadirtybitforeachcacheline79IssueswithWritesWritemissesWrite-allocateLoadsthecorrespondingmemoryblockintothecacheThenupdatesthecacheblockNo-write-allocateBypassesthecacheWritestheworddirectlytomemoryCombinationWritethrough,no-write-allocateWriteback,write-allocate(modernimplementation)80Multi-levelcachesL1d-cache,i-cache32k8-wayAccess:4cyclesL2unified-cache256k8-wayAccess:11cyclesL3unified-cache8M16-wayAccess:30~40cyclesBlocksize64bytesforallcache81CacheperformancemetricsMissRatefractionofmemoryreferencesnotfoundincache(misses/references)Typicalnumbers:3-10%forL1Canbequitesmall(<1%)forL2,dependingonsizeHitRatefractionofmemoryreferencesfoundincache(1-missrate)82CacheperformancemetricsHitTimetimetodeliveralineinthecachetotheprocessor(includestimetodeterminewhetherthelineisinthecache)Typicalnumbers:1-2clockcyclesforL1(4cyclesincorei7)5-10clockcyclesforL2(11cyclesincorei7)MissPenaltyadditionaltimerequiredbecauseofamissTypically50-200cyclesformainmemory(Trend:increasing!)83WhatdoesHitRateMean?ConsiderHitTime:2cyclesMissPenalty:200cyclesAverageaccesstime:Hitrate99%:2*0.99+200*0.01=4cyclesHitrate97%:2*0.97+200*0.03=8cyclesThisiswhy“missrate”isusedinsteadof“hitrate”84CacheperformancemetricsCachesizeHitratevs.hittimeBlocksizeSpatiallocalityvs.temporallocalityAssociativityThrashingCostSpeedMisspenaltyWritestrategySimple,readmisses,fewertransfer85WritingCache-FriendlyCodePrinciplesProgramswithbetterlocalitywilltendtohavelowermissratesProgramswithlowermissrateswilltendtorunfasterthanprogramswithhighermissrates86WritingCache-FriendlyCodeBasicapproachMakethecommoncasegofastProgramsoftenspendmostoftheirtimeinafewcorefunctions.ThesefunctionsoftenspendmostoftheirtimeinafewloopsMinimizethenumberofcachemissesineachinnerloop87WritingCache-FriendlyCode8[h]7[h]6[h]5[m]4[h]3[h]2[h]1[m]Accessorder,[h]itor[m]issi=7i=6i=5i=4i=3i=2i=1i=0v[i]Temporallocality,Thesevariablesareusuallyputinregisters(pp.650)intsumvec(intv[N]){ inti,sum=0;

for(i=0;i<N;i++) sum+=v[i]; returnsum;}Assumption:WordsarebytesCacheblocksare4wordsVisblockaligned88Writingcache-friendlycodeTemporallocalityRepeatedreferencestolocalvariablesaregoodbecausethecompilercancachethemintheregisterfile89Writingcache-friendlycodeSpatiallocalityStride-1referencespatternsaregoodbecausecachesatalllevelsofthememoryhierarchystoredataascontiguousblocksSpatiallocalityisespeciallyimportantinprogramsthatoperateonmultidimensionalarrays90Writingcache-friendlycodeExample(pp.651,M=4,N=8)intsumarrayrows(inta[M][N]){ inti,j,sum=0;

for(i=0;i<M;i++) for(j=0;j<N;j++) sum+=a[i][j]; returnsum;}Assumption:Sameasthepreviousone91Writingcache-friendlycodea[i][j]j=0j=1j=2j=3j=4j=5j=6j=7i=0i=1i=2i=31[m]9[m]17[m]25[m]2[h]10[h]18[h]26[h]3[h]11[h]19[h]27[h]4[h]12[h]20[h]28[h]5[m]13[m]21[m]29[m]6[h]14[h]22[h]30[h]7[h]15[h]23[h]31[h]8[h]16[h]24[h]32[h]92Writingcache-friendlycodeExample(pp.651,M=4,N=8)

intsumarraycols(inta[M][N]){ inti,j,sum=0;

for(j=0;j<N;j++) for(i=0;i<M;i++) sum+=a[i][j]; returnsum;}Assumption:SameasthepreviousoneArraysizeislargerthanthecachesize93Writingcache-friendlycodea[i][j]j=0j=1j=2j=3j=4j=5j=6j=7i=0i=1i=2i=31[m]2[m]3[m]4[m]5[m]6[m]7[m]8[m]9[m]10[m]11[m]12[m]13[m]14[m]15[m]16[m]17[m]18[m]19[m]20[m]21[m]22[m]23[m]24[m]25[m]26[m]27[m]28[m]29[m]30[m]31[m]32[m]94TheMemoryMountain95TheMemoryMountainReadthroughput(readbandwidth)TheratethataprogramreadsdatafromthememorysystemMemorymountainReaddatafromanarrayAtwo-dimensionalfunctionofreadbandwidthismeasuredCharacterizesthecapabilitiesofthememorysystemforeachcomputer96WorkingSetDimension97AccessStrideDimension98TheMemoryMountainDataSizeMAXBYTES(64M)bytesorMAXELEMS(8M)doublesPartiallyaccessedWorkingset:from64MBto2KBStride:from1to6499Memorymountainmainroutine/*mountain.c-Generatethememorymountain.*/#defineMINBYTES(1<<11)/*Workingsetsizerangesfrom2KB*/#defineMAXBYTES(1<<26)/*...upto64MB*/#defineMAXSTRIDE64/*Stridesrangefrom1to64*/#defineMAXELEMSMAXBYTES/sizeof(data)doubledata[MAXELEMS];/*Thearraywe'llbetraversing*/100Memorymountainmainroutineintmain(){intsize; /*Workingsetsize(inbytes)*/intstride; /*Stride(inarrayelements)*/doubleMhz; /*Clockfrequency*/init_data(data,MAXELEMS);/*Initializeeachelementindatato1*/Mhz=mhz(0); /*Estimatetheclockfrequency*/101Memorymountainmainroutinefor(size=MAXBYTES;size>=MINBYTES;size>>=1){ for(stride=1;stride<=MAXSTRIDE;stride++) printf("%.1f

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论