
Organization and Performance of a Two-Level Virtual-Real Cache Hierarchy

Wen-Hann Wang, Jean-Loup Baer and Henry M. Levy
Department of Computer Science, FR-35
University of Washington
Seattle, WA 98195

Abstract

We propose and analyze a two-level cache organization that provides high memory bandwidth. The first-level cache is accessed directly by virtual addresses. It is small, fast, and, without the burden of address translation, can easily be optimized to match the processor speed. The virtually-addressed cache is backed up by a large physically-addressed cache; this second-level cache provides a high hit ratio and greatly reduces memory traffic. We show how the second-level cache can be easily extended to solve the synonym problem resulting from the use of a virtually-addressed cache at the first level. Moreover, the second-level cache can be used to shield the virtually-addressed first-level cache from irrelevant cache coherence interference. Finally, simulation results show that this organization has a performance advantage over a hierarchy of physically-addressed caches in a multiprocessor environment.

Keywords: Caches, Virtual Memory, Multiprocessors, Memory Hierarchy, Cache Coherence.

1 Introduction

Virtually-addressed caches are becoming commonplace in high-performance multiprocessors due to the need for rapid cache access [11, 3, 17]. A virtually-addressed cache can be accessed more quickly than a physically-addressed cache because it does not require a preceding virtual-to-physical address translation. However, virtually-addressed caches have several problems as well. For example:

1. They must be capable of handling synonyms, that is, multiple virtual addresses that map to the same physical address.
2. While address translation is not required before a virtual cache lookup, address translation is still needed following a miss.
3. In a multiprocessor system, the use of a virtually-addressed cache may complicate cache coherence because bus addresses are physical; therefore a reverse translation may be required.
4. I/O devices use physical addresses as well, also requiring reverse translation.
5. A virtual cache may need to be invalidated on a context switch because virtual addresses are unique to a single process.

None of these problems is insolvable by itself, and several schemes have been proposed for managing virtual caches. For example, dual tag sets, one virtual and one physical, can be used for each cache entry [7, 6]. As another example, the SPUR system restricts the use of address space, prohibits caching of I/O buffers, and requires bus transmission of both virtual and physical addresses [11]. However, these schemes tend to have performance shortcomings or unpleasant implications for system software. Virtually-addressed caches are fundamentally complicated, and this time or space complexity reduces the ability of the cache to match the ever-increasing needs of modern processors.

To attack this problem, we propose a two-level cache organization involving a virtually-addressed first-level cache and a physically-addressed second-level cache (recent studies of two-level uniprocessor and multiprocessor caches can be found in [4, 5, 12, 13]). The small first-level cache can be fast to meet the requirements of high-speed processors; it is virtually addressed to avoid the need for address translation. The large second-level cache will reduce miss ratios and memory traffic; it is physically addressed to simplify the I/O and multiprocessor coherence problems. Furthermore, we show how the second-level cache can be utilized to solve the synonym problem and to shield the first-level cache from irrelevant cache coherence traffic. Overall, we believe that this two-level virtual-real organization simplifies the design of the first level, where performance is crucial, while solving some of the difficult problems at the second level, where time and space are more easily available.

Our organization involves the use of pointers in the two caches to keep track of the mappings between virtual cache and physical cache entries [7]. We also provide a translation buffer at the second level which operates in parallel with first-level cache lookups in case a miss requires reverse translation. Trace-driven simulations are used to demonstrate the advantages of a two-level V-R (virtual-real) cache over a hierarchy of real-addressed caches in a multiprocessor environment.

The rest of this paper is organized as follows. Section 2 describes the approaches taken in solving various problems related to virtual address caches and presents some design choices for high-performance multiprocessor caches. Section 3 gives the specific organization of a V-R two-level cache hierarchy and its detailed operational description. Section 4 presents performance results from simulations, and conclusions are drawn in Section 5.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0884-7495/89/0000/0140 $01.50

2 Design issues of two-level V-R caches for high performance multiprocessors

This section addresses some important issues in the design of two-level V-R caches and motivates our design choices. A more detailed operational description of our approach is given in the following section. The proposed architecture for this evaluation is a shared-bus multiprocessor where each processor has a private, two-level, V-cache/R-cache hierarchy as shown in Figure 1.

Figure 1: Shared-bus organization

Write policies

For a two-level cache, the write policy can be selected independently at each level. In the literature, write-through has been proposed as the most reasonable write policy for the first-level cache in a two-level hierarchy, while write-back is advocated for the second level [10, 8, 13]. A major motivation for the choice of write-through at the first level is that cache coherence control is simplified. In this case, the first- and second-level caches will always contain identical values.

There are several problems, however, with using a first-level write-through cache. First, assuming no write-allocate, write-through caches will have smaller hit ratios than write-back caches. Second, a write takes longer under write-through because the second-level cache must be updated as well (primary memory may also need to be updated depending on the write policy for the second level). This extra write latency can be largely hidden by the use of write buffers between the first and second levels, but several write buffers may be needed. Table 1, for example, shows that in the execution of the VAX program pops (cf. Section 4), 30% of writes are due to procedure calls, each of which typically generates six or more successive writes. Table 2 shows the inter-write interval distribution for a snapshot (411,237 references) of the same trace using a 16K direct-mapped cache with a 16-byte block size. As can be seen, the high percentage of short inter-write intervals confirms the need for several buffers.

Unfortunately, while write buffers can reduce the write latency of the first-level cache, they reintroduce a complexity that write-through was intended to avoid, namely cache coherence. Write buffers can hold modified data for which other processors might encounter a miss. Thus, cache coherency control must be provided for the write buffers on every cache coherence transaction. These difficulties lead us to favor the write-back policy for our virtually-addressed cache at the first level.

Table 1: Number of writes due to procedure calls (columns: number of writes per call, count, total writes)

Table 2: Inter-write intervals (snapshot of 411,237 references)

The synonym problem

As previously noted, a two-level V-R organization can be used to solve the synonym problem. The solution requires the use of a reverse translation table [15] for detecting synonyms, and a natural place to put that table is at the second level. Our two-level organization permits and detects synonyms, but guarantees that at most one copy of a data element exists in the V-cache at any time. Each second-level cache block will have a pointer to its first-level child block, if one exists. If we guarantee an inclusion property, where the R-cache contains a superset of the tags in the V-cache, the reverse translation information can be stored in log2(V-cache size / page size) superset bits in each R-cache block. For each entry in the R-cache with a child in the V-cache, these extra bits, together with the page offset, provide the V-cache location of its child.

When a miss occurs in the V-cache, the virtual address is translated using a second-level translation buffer and the R-cache is accessed. If an R-cache hit occurs, the R-cache checks whether the data is also in the V-cache under another virtual address (a synonym). If so, it simply invalidates that V-cache copy and moves the data to the new virtual address in the V-cache. Thus, while a data element can have synonyms, it is always stored in the V-cache using the last virtual address with which it was accessed.

Note that our approach in dealing with the synonym problem has some similarities to Goodman's approach [7]. One can view our approach as moving Goodman's real directory from being just for snooping to being associated with the level-two cache. This move provides two benefits. First, it hides the cost of Goodman's extra real directory by making it the level-two cache directory. Second, it reduces the misses caused by real address collisions by making the real directory much bigger.

Context switching

In a multiprogramming environment, addresses are unique to each process and therefore the V-cache must be flushed whenever a context switch occurs. This might be costly for a large virtually-addressed cache. For small caches we believe the penalty on hit ratios will be negligible, and this is confirmed by our simulation results (cf. Section 4). However, if a write-back policy is used for the V-cache, a substantial number of write-backs may occur at each context switch, which greatly increases context switch latency.

Another solution to avoid the address mapping conflict is to attach a process identifier to each tag entry of the V-cache. This approach does not improve the hit ratio for a small V-cache [1], but can avoid the large number of write-backs at context switch time. Unfortunately, this approach increases the complexity of a two-level hierarchy because the V-cache needs to be purged or selectively flushed when a TLB entry of an inactive process is replaced by an entry of the active process, or a process id is reassigned.

We wish to have the benefits of reduced context switch latency without needing to flush the V-cache when a TLB entry changes. Our approach meets these goals by invalidating all V-cache blocks on a context switch but not writing them back at that time. Instead, each block is written back only when it is replaced, that is, when a new block is read into that cache slot. The writes are thus distributed in time, where the latency can be hidden using write-back buffers.

To implement this scheme, we add two new fields to each V-cache block. First, we add a swapped-valid bit, which is set for each V-cache block on a context switch. Upon a replacement, if the V-cache finds a block with swapped-valid set, it checks whether that block is also marked both dirty and valid; if so, that block must be written back. Second, we add an r-pointer, which is the low-order bits of the page number, to each V-cache block. The r-pointer, together with the page offset, is sufficient to link a V-cache entry to its corresponding location in the R-cache. This linkage makes a write-back or a state check efficient, since there is no need for an address translation. This approach uses space comparable to that of the process identifier scheme, but without its disadvantages.

Table 3 shows the effect of the swapped-valid bit; here we see the inter-write interval from the same benchmark as Table 2 when the swapped-valid bit is used. Because swapped write-backs are typically far apart from other swapped write-backs, a single write-back buffer is sufficient to overlap swapped write-backs with processor execution. Our simulations show that with a single buffer the amount of stalling on a swapped write-back is indeed negligible. On the other hand, if the incremental write-back is not used we need to write back over a hundred blocks at context switching time for this specific benchmark. Notice that the number of write-backs needed due to context switching is a function of cache size, cache organization, the duration of the running state of a process, and the workload.

Table 3: Write interval with write-back and swapped write-back (snapshot of 411,237 references)

Cache coherence

While two-level caches are attractive, cache coherence control is complicated by a two-level scheme. Without special attention to the coherence problem, the first-level cache will be disturbed by every coherency request on the bus. A solution to this problem is to use the second-level cache as a filter to shield the first-level cache from irrelevant interference. In order to achieve this, we need to impose an inclusion property where the tags of the second-level cache are a superset of the tags of its child cache. We say that a multilevel cache hierarchy has the inclusion property if this superset relation holds. Imposing inclusion is also essential for solving the synonym problem as stated above.

In a multiprocessor environment, the inclusion property cannot be held even with a global LRU replacement [4]. In [5] the following replacement algorithm was proposed as one of the conditions to impose the inclusion.

• First level: Any replacement algorithm will do (e.g., LRU). Notify the second-level cache of the block being replaced.
• Second level: Replace a block which does not exist in the first level; this is done by checking an inclusion bit (there is one inclusion bit per block to indicate whether the block is present in the first level).

The general problem with inclusion is its implications for a large set size in the second level (i.e., high associativity). By following the same approach as in [5], and letting S_i be the number of sets, B_i be the block size, and size_i be the cache size of a level-i cache, we can show that in order to impose inclusion under the above replacement algorithm, the set associativity of the second-level cache, A_2, must satisfy [...] under the usual practical situations where S_2 ≥ S_1, B_2 ≥ B_1, size_2 ≥ size_1 and B_1 S_1 ≥ page size.

In practical cases, this constraint can be too strict to be feasible. For example, if the V-cache is 16K bytes, the page size is 4K bytes, and B_2 is 4 times as large as B_1, even with a direct-mapped V-cache we need a 16-way R-cache to achieve the inclusion.

To relax the strict constraint on the set associativity of the R-cache, we change the replacement rule of the R-cache to operate as follows: replace a block with the inclusion bit clear if there is one; otherwise replace a block according to some predefined replacement algorithm and invalidate the corresponding V-cache block. Note that the latter won't happen very often since the R-cache is much larger than the V-cache. For example, the analysis of the multiprocessor trace pops (over 3 million memory references) shows that only 21 inclusion invalidations are needed if the V-cache is 16K bytes, 2-way set associative with a 16-byte block size and the R-cache is 256K bytes with the same set size and block size.

[...]

4 Performance

In this section, we compare the relative performance of virtual-real (V-R) and real-real (R-R) two-level caches. We also examine the merits of splitting the first-level virtually-addressed cache into I and D caches. Finally, we measure the effect of the R-cache in shielding the V-cache from irrelevant cache coherence interference.

To gather the performance figures, we use trace-driven simulations and three parallel program traces: pops, thor and abaqus [2, 14]. In pops and thor, context switches occur rarely, while they are frequent in abaqus. Table 5 gives a summary of some characteristics of these traces.

trace    num. of cpus  total refs  instr count  data read  data write  context switch count
thor     4             3283k       1517k        1390k      376k        21
pops     4             3286k       1718k        1285k      283k        7
abaqus   2             1196k       514k         600k       82k         292

Table 5: Characteristics of traces

Relative performance of V-R and R-R two-level caches

To compare the performance of V-R and R-R two-level caches, we gather the hit ratios at different levels; the hit ratios are then used in generic memory access time equations to predict relative performances. We assume that the inclusion property defined previously also holds for the R-R two-level cache. For simplicity, we consider only direct-mapped caches at both levels. The generic access time equation of a two-level cache hierarchy is as follows:

T_acc = Prob[hit at level 1] × (access time at level 1)
      + Prob[hit at level 2 and miss at level 1] × (access time at level 2)
      + Prob[miss at levels 1 and 2] × (memory access time)

that is,

\[ T_{acc} = h_1 t_1 + (1 - h_1)\, h_2\, t_2 + (1 - h_1)(1 - h_2)\, t_m \]

where h_1, h_2 are hit ratios at levels 1 and 2, t_1 and t_2 are access times at the two levels, and t_m is the memory access time including the bus overhead.

Because the second-level caches are the same for both V-R and R-R organizations, and because inclusion holds, the number of misses and the traffic from the second-level cache are the same in both organizations. Therefore the third term in the above equation is the same for both V-R and R-R organizations. Assuming that handling a synonym has a cost equivalent to handling a miss in the first-level cache that hits in the second-level cache, the relative performance where there is a hit in the hierarchy can be estimated solely on the first two terms of the above equation.

Table 6 shows the hit ratios at both levels of V-R and R-R organizations for the three traces under three different pairs of first- and second-level cache sizes. Figures 4, 5 and 6 depict the relative performance of the two organizations under different degrees of assumed R-cache degradation due to address translation overhead. These figures plot the relative performance of the two hierarchies (with t_2 = 4 t_1) vs. the percentage of slowdown due to address translation for various first-level/second-level cache sizes. The points on the y-axis correspond to no slowdown at all.

Table 6: Hit ratios

Table 7: Hit ratios for small first-level caches

Figures 4, 5 and 6: Average access time vs. slowdown of R-cache, for the thor, pops and abaqus traces respectively (x-axis: first-level R-cache slowdown percentage)

From these figures we can draw the following conclusions. Let us assume that there is no time penalty involved in performing a virtual-real address translation in conjunction with the access to the first-level cache. When context switches occur rarely, as is the case for the first two traces (Figures 4 and 5), the performances of the V-R and R-R hierarchies are almost indistinguishable (the points on the y-axis are the same). When context switches are frequent, as in the third trace (Figure 6), the V-R hierarchy is slower by 2% to 6% depending on the size of the V-cache (a larger V-cache seems to imply a larger relative degradation).

Now, let us assume a time penalty for the translation. There are two possible reasons for this penalty. The first is that TLB access and cache access cannot be completely overlapped as soon as the cache size is larger than the page size multiplied by the set associativity. Second, even if there were total overlap, there would still be an extra comparison necessary to check the validity of a cache hit. From the observations of the previous paragraph, it is clear that the V-R hierarchy will perform better in the case of rare context switches. The relative improvement is approximately equal to the overhead of address translation. What is interesting is to see the cross-over point for the case of frequent context switches. From Figure 6, we see that the V-R hierarchy will have a better performance when the address translation slows down the first-level R-cache access by 6% or more. Since 6% is a conservative figure for the penalty due to the insertion of a TLB at the first level, it appears that the V-R hierarchy is a better solution. Its performance is as good as that of an R-R hierarchy and its cost is less since the TLB does not have to be implemented in fast logic. Another advantage is that problems such as TLB coherence can also be handled at the second level.

The results presented above assumed 4K to 16K first-level caches, which may be impractical for some advanced technologies, such as GaAs. However, we believe that the V-R organization is even more attractive for hierarchies with smaller first-level caches. Our results in Table 7 show that for smaller first-level caches (e.g., .5K to 2K), the first-level hit ratios of V-R and R-R organizations are nearly identical. Therefore, performance of a V-R hierarchy will be superior given any penalty for a TLB lookup. In addition, for technologies in which space is at a premium, we can trade the first-level TLB of an R-R hierarchy for a larger first-level cache in a V-R hierarchy. This in turn provides larger hit ratios and hence smaller average access time.

Splitting the first-level virtually-addressed cache

There are a number of reasons why it is advantageous to split the first-level cache into separate I and D caches. First, the bandwidth can almost be doubled for pipelined processors where an instruction fetch can occur at the same time as a data fetch of a previous instruction (e.g., the IBM 801 and Motorola 88000). Second, each I and D cache is smaller and has the potential to be optimized for its speed. Third, and this pertains mostly to V-caches, the I-cache is simpler than the D-cache since it does not need to handle the synonym and the cache coherence problems (provided that self-modifying programs are not permitted). A disadvantage, however, is that we need more wires or pins for the processor and cache module.

It is important to assess, however, whether splitting the cache into I and D components will improve performance. Our results in Tables 8, 9 and 10 show that the hit ratios of split I and D caches are very close to those of a unified I-D cache and are not necessarily worse. In these tables, the separate I and D caches are of equal sizes (i.e., in the 4K example the I-cache and the D-cache are each 2K). Similar results have been found in [9, 13]. Thus, we would advocate such a split for a V-R hierarchy.

thor                    4K/64K   8K/128K  16K/256K
data read    split      0.924    0.937    0.945
             unified    0.913    0.938    0.950
data write   split      0.952    0.962    0.969
             unified    0.946    0.966    0.972
instruction  split      0.957    0.963    0.989
             unified    0.930    0.973    0.984
overall      split      0.942    0.952    0.968
             unified    0.925    0.957    0.968

Table 8: Hit ratios of level-1 caches for the thor trace

pops                    4K/64K   8K/128K  16K/256K
data read    split      0.902    0.912    0.923
             unified    0.900    0.915    0.926
data write   split      0.936    0.946    0.955
             unified    0.937    0.948    0.958
instruction  split      0.947    0.966    0.978
             unified    0.948    0.963    0.974
overall      split      0.928    0.944    0.955
             unified    0.928    0.943    0.954

Table 9: Hit ratios of level-1 caches for the pops trace

abaqus                  4K/64K   8K/128K  16K/256K
data read    split      0.795    0.818    0.837
             unified    0.806    0.829    0.845
data write   split      0.841    0.861    0.875
             unified    0.847    0.857    0.895
instruction  split      0.920    0.947    0.949
             unified    0.907    0.926    0.938
overall      split      0.852    0.876    0.888
             unified    0.852    0.873    0.888

Table 10: Hit ratios of level-1 caches for the abaqus trace

Shielding cache coherence interference

An important advantage of the two-level approach is that the R-cache can shield the V-cache from irrelevant cache coherence interference. For example, on a read-miss bus request, the R-cache needs to send a flush request to its V-cache only when the V-cache contains a modified copy of the data; otherwise the V-cache will not be disrupted. Note that this shielding effect is achieved because the inclusion property holds in our V-R two-level cache.

Imposing inclusion might not seem to be essential for an R-R two-level hierarchy because the synonym problem is not present. However, the results in Tables 11, 12 and 13, which give the number of coherence messages being percolated to each first-level cache, show that a V-R two-level cache has much less coherence interference at the first level than an R-R two-level cache without inclusion. The results also show that inclusion is important in an R-R two-level cache since it results in approximately the same savings in coherence messages to the first-level cache. [Footnote 4]

Footnote 4: We notice that R-R with inclusion has over 10% fewer coherence messages than V-R for the abaqus trace. This discrepancy is due to a large number of inclusion invalidations incurred in this specific trace due to a large number of context switches.

We believe that the shielding effect on cache coherence will be more prominent as the number of processors increases. This is due to the fact that more bus coherence requests will be generated from a larger number of processors, and without the shielding, a first-level cache will be disrupted more often. Our results in Tables 11, 12 (4 cpus) and 13 (2 cpus) reflect this effect. For example, on average, the first-level cache of a V-R hierarchy encounters about half as many coherence messages as that of the R-R hierarchy without inclusion for the two-processor trace (cf. Table 13), whereas for the four-processor traces the first-level cache of the V-R hierarchy encounters from three to six times fewer coherence messages. We plan to further confirm this observation when we are in possession of larger-scale traces.

Table 11: Number of coherence messages to the first-level cache

Table 12: Number of coherence messages to the first-level cache

abaqus   4K/64K                          8K/128K                         16K/256K
CPU      V-R    R-R incl  R-R no incl    V-R    R-R incl  R-R no incl    V-R    R-R incl  R-R no incl
0        10961  8436      18855          11677  9379      21295          11067  9853      22603
1        10527  8029      20726          10547  9528      24202          10599  10028     26845

Table 13: Number of coherence messages to the first-level cache

5 Conclusions

One of the most challenging issues in computer design is the support of high memory bandwidth. In this paper, we have proposed a two-level cache hierarchy to address this issue. We have argued that the first-level cache is best accessed directly by virtual addresses. We back up the small virtually-addressed cache by a large second-level cache. A virtually-addressed first-level cache does not require address translation and can be optimized to match the processor speed. Through the use of a swapped-valid bit, we avoid the clustering of write-backs at context switching time; these write-backs are spread more evenly over time. The large second-level cache provides a high hit ratio and removes a large amount of memory traffic. We have shown how the second-level cache can be easily extended to solve the synonym problem resulting from the use of a virtually-addressed cache at the first level. Furthermore, the second-level cache can be used effectively to shield the virtually-addressed first-level cache from irrelevant cache coherence interference.

Our simulation results show that when context switches are rare, the virtually-addressed cache option has comparable performance to its physically-addressed counterpart, even assuming no address translation overhead. When context switches occur frequently, the virtually-addressed cache option has a performance edge when a small address translation penalty is taken into account, and the smaller the virtually-addressed cache the larger the relative performance edge.

We also advocate splitting the virtually-addressed cache into separate instruction and data caches. This approach has the potential of doubling the memory bandwidth, since our results show that the hit ratios of split instruction and data caches are very close to those of a single I-D cache.

As a final remark, we note that cache performance is workload dependent. In this study we have confined ourselves to a limited VAX multiprocessor workload. We plan to enlarge our workload sample as soon as we are in possession of other multiprocessor traces.

Acknowledgment

This work was supported in part by National Science Foundation Grants No. CCR-8702915 and CCR-8619663, Boeing Computer Services, Digital Equipment Corporation (the System Research Center and the External Research Program) and a GTE fellowship. The experimental part of this study would not have been possible without Dick Sites, who made the traces available to us, and Anant Agarwal, who allowed us to share his postprocessing programs and who patiently answered our many questions. We also thank the members of the Computer Architecture lunch, especially Tom Anderson, Jon Bertoni, Sang Lyul Min and John Zahorjan, for their excellent comments and suggestions.

References

[1] Agarwal, A., R. L. Sites and M. Horowitz. ATUM: A new technique for capturing address traces using microcode. In Proc. 13th Symposium on Computer Architecture, pages 119-127, 1986.
[2] Agarwal, A., R. Simoni, J. Hennessy and M. Horowitz. An evaluation of directory schemes for cache coherence. In Proc. 15th Symposium on Computer Architecture, pages 280-289, 1988.
[3] Atkinson, R. R. and E. M. McCreight. The Dragon processor. In Proc. Architectural Support for Programming Languages and Operating Systems (ASPLOS-II), pages 65-69, 1987.
[4] Baer, J.-L. and W.-H. Wang. Architectural choices for multi-level cache hierarchies. In Proc. 16th International Conference on Parallel Processing, pages 258-261, 1987.
[5] Baer, J.-L. and W.-H. Wang. On the inclusion property for multi-level cache hierarchies. In Proc. 15th Symposium on Computer Architecture, pages 73-80, 1988.
[6] Cheriton, D. R., G. Slavenburg and P. Boyle. Software-controlled caches in the VMP multiprocessor. In Proc. 13th Symposium on Computer Architecture, pages 367-374, 1986.
[7] Goodman, J. Coherency for multiprocessor virtual address caches. In Proc. Architectural Support for Programming Languages and Operating Systems (ASPLOS-II), pages 72-81, 1987.
[8] Goodman, J. and P. J. Woest. The Wisconsin Multicube: A new large-scale cache-coherent multiprocessor. In Proc. 15th Symposium on Computer Architecture, pages 422-431, 1988.
[9] Haikala, I. J. and P. H. Kutvonen. Split cache organizations. In Proc. Performance '84, pages 459-472, 1984.
[10] Hattori, A., M. Koshino and S. Kamimoto. Three-level hierarchical storage system for FACOM M-380/382. In Proc. Information Processing IFIP, pages 693-697, 1983.
[11] Hill, M. et al. Design decisions in SPUR. Computer, 19(11):8-22, November 1986.
[12] Przybylski, Steven A. Performance-Directed Memory Hierarchy Design. Ph.D. Dissertation, Stanford University, 1988.
[13] Short, R. T. and H. M. Levy. A simulation study of two-level caches. In Proc. 15th Symposium on Computer Architecture, pages 81-88, 1988.
[14] Sites, R. L. and A. Agarwal. Multiprocessor cache analysis using ATUM. In Proc. 15th Symposium on Computer Architecture, pages 186-195, 1988.
[15] Smith, A. J. Cache memories. Computing Surveys, 14(3):473-530, September 1982.
[16] Sweazey, P. and A. J. Smith. A class of compatible cache consistency protocols and their support by the IEEE Futurebus. In Proc. 13th Symposium on Computer Architecture, pages 414-423, 1986.
[17] Cheng, Ray. Virtual address cache in UNIX. In Proc. USENIX Conference, pages 217-224, June 1987.
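
The generic access time equation of Section 4 can be sketched numerically as follows. This is a minimal illustration, not the simulator used in the paper: it assumes h2 is the hit ratio of the second level among first-level misses, applies the translation slowdown to the first level of the R-R hierarchy only, and fixes t2 = 4 t1 as in Figures 4 to 6. The function names and the hit ratios in the example call are hypothetical and are not taken from Table 6.

```python
def access_time(h1, h2, t1, t2, tm):
    """Average access time of a two-level hierarchy.

    h1: level-1 hit ratio; h2: level-2 hit ratio among level-1 misses
    (an assumption of this sketch); t1, t2, tm: level-1, level-2 and
    memory access times.
    """
    return h1 * t1 + (1 - h1) * h2 * t2 + (1 - h1) * (1 - h2) * tm


def hit_path_time(h1, h2, t1, t2):
    """First two terms only: the part that differs between V-R and R-R."""
    return h1 * t1 + (1 - h1) * h2 * t2


def vr_over_rr(h1_vr, h2_vr, h1_rr, h2_rr, slowdown, t1=1.0):
    """Ratio of V-R to R-R hit-path time, with t2 = 4 * t1.

    slowdown is the fractional penalty for address translation at the
    first level of the R-R hierarchy, e.g. 0.06 for 6%.
    """
    t2 = 4.0 * t1
    vr = hit_path_time(h1_vr, h2_vr, t1, t2)
    rr = hit_path_time(h1_rr, h2_rr, t1 * (1.0 + slowdown), t2)
    return vr / rr


# Hypothetical hit ratios, chosen only to exercise the formula.
ratio = vr_over_rr(h1_vr=0.90, h2_vr=0.80, h1_rr=0.92, h2_rr=0.75, slowdown=0.06)
print(f"V-R / R-R hit-path time: {ratio:.3f}")
```

A ratio below 1 means the V-R hierarchy is faster on the hit path for the assumed slowdown; sweeping `slowdown` from 0 upward locates the cross-over point discussed for Figure 6.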
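
The superset bits described under "The synonym problem" can be illustrated with a short calculation. The sketch below assumes a direct-mapped V-cache and power-of-two sizes; both function names are hypothetical, and the way the superset bits are combined with the page offset is one plausible reading of the text rather than the paper's exact hardware layout.

```python
def superset_bits(vcache_size, page_size):
    """Extra bits per R-cache block: log2(V-cache size / page size)."""
    if vcache_size <= page_size:
        return 0  # the page offset alone already indexes the whole V-cache
    ratio = vcache_size // page_size
    return ratio.bit_length() - 1  # exact log2 for power-of-two ratios


def vcache_block_index(superset, page_offset, page_size, block_size):
    """V-cache block slot named by the superset bits plus the page offset.

    Assumes a direct-mapped V-cache, so a byte index maps to one slot.
    """
    byte_index = superset * page_size + page_offset
    return byte_index // block_size


bits = superset_bits(16 * 1024, 4 * 1024)
slot = vcache_block_index(superset=3, page_offset=0x010,
                          page_size=4 * 1024, block_size=16)
print(bits, slot)
```

For the 16K-byte V-cache and 4K-byte pages used in the paper's example, each R-cache block needs two superset bits in addition to its inclusion bit.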
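
The relaxed R-cache replacement rule from the "Cache coherence" discussion can be sketched as a victim-selection function. The `RBlock` record and `choose_victim` name are hypothetical. The paper leaves the fallback as "some predefined replacement algorithm"; this sketch assumes LRU, and also assumes LRU order among the blocks whose inclusion bit is clear.

```python
from dataclasses import dataclass


@dataclass
class RBlock:
    tag: int
    inclusion: bool = False  # True if a child copy lives in the V-cache
    last_used: int = 0       # timestamp for the assumed LRU policy


def choose_victim(rset):
    """Pick a victim in one R-cache set.

    Returns (index, must_invalidate_child). Blocks with the inclusion
    bit clear are preferred; only if none exists is a block with a
    V-cache child evicted, in which case that child must be invalidated.
    """
    clear = [i for i, block in enumerate(rset) if not block.inclusion]
    candidates = clear if clear else list(range(len(rset)))
    victim = min(candidates, key=lambda i: rset[i].last_used)
    return victim, rset[victim].inclusion


mixed = [RBlock(1, True, 1), RBlock(2, False, 5), RBlock(3, False, 3)]
full = [RBlock(4, True, 4), RBlock(5, True, 2)]
print(choose_victim(mixed), choose_victim(full))
```

In the first set the oldest block is skipped because it has a V-cache child; in the second set every block has a child, so the fallback policy applies and an inclusion invalidation is required.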
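
The swapped-valid scheme from the "Context switching" discussion can be sketched as below. This is a simplified model under stated assumptions: the `VBlock` fields, the `hits` helper and the `write_back` callback are hypothetical names, a swapped block is treated as a miss for the new process, and any valid dirty block is written back when its slot is replaced (which covers the swapped-valid case the text describes). What happens to swapped blocks across several consecutive context switches is not modeled.

```python
from dataclasses import dataclass


@dataclass
class VBlock:
    valid: bool = False
    dirty: bool = False
    swapped_valid: bool = False  # set for every block on a context switch
    r_pointer: int = 0           # low-order page-number bits linking to the R-cache
    data: int = 0


def context_switch(vcache):
    """Mark every block as swapped; nothing is written back here."""
    for block in vcache:
        block.swapped_valid = True


def hits(block):
    """A swapped block never hits for the newly running process."""
    return block.valid and not block.swapped_valid


def replace(vcache, slot, new_block, write_back):
    """Install new_block; write the old block back only if it is modified."""
    old = vcache[slot]
    if old.valid and old.dirty:
        write_back(old.r_pointer, old.data)  # no address translation needed
    vcache[slot] = new_block


log = []
cache = [VBlock(valid=True, dirty=True, r_pointer=2, data=42), VBlock(valid=True)]
context_switch(cache)
replace(cache, 0, VBlock(valid=True, r_pointer=1, data=7),
        lambda rp, d: log.append((rp, d)))
```

The context switch itself touches only one bit per block; the single write-back happens later, when slot 0 is reused, which is the spreading effect that Table 3 measures.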
