Organization and Performance of a Two-Level Virtual-Real Cache Hierarchy

Wen-Hann Wang, Jean-Loup Baer and Henry M. Levy
Department of Computer Science, FR-35
University of Washington
Seattle, WA 98195

Abstract

We propose and analyze a two-level cache organization that provides high memory bandwidth. The first-level cache is accessed directly by virtual addresses. It is small, fast, and, without the burden of address translation, can easily be optimized to match the processor speed. The virtually addressed cache is backed up by a large physically addressed cache; this second-level cache provides a high hit ratio and greatly reduces memory traffic. We show how the second-level cache can be easily extended to solve the synonym problem resulting from the use of a virtually addressed cache at the first level. Moreover, the second-level cache can be used to shield the virtually addressed first-level cache from irrelevant cache coherence interference. Finally, simulation results show that this organization has a performance advantage over a hierarchy of physically addressed caches in a multiprocessor environment.

Keywords: Caches, Virtual Memory, Multiprocessors, Memory Hierarchy, Cache Coherence.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

1 Introduction

Virtually addressed caches are becoming commonplace in high-performance multiprocessors due to the need for rapid cache access [11, 3, 17]. A virtually addressed cache can be accessed more quickly than a physically addressed cache because it does not require a preceding virtual-to-physical address translation. However, virtually addressed caches have several problems as well. For example:

1. They must be capable of handling synonyms, that is, multiple virtual addresses that map to the same physical address.
2. While address translation is not required before a virtual cache lookup, address translation is still needed following a miss.
3. In a multiprocessor system, the use of a virtually addressed cache may complicate cache coherence because bus addresses are physical; therefore a reverse translation may be required.
4. I/O devices use physical addresses as well, also requiring reverse translation.
5. A virtual cache may need to be invalidated on a context switch because virtual addresses are unique to a single process.

None of these problems is insolvable by itself, and several schemes have been proposed for managing virtual caches. For example, dual tag sets, one virtual and one physical, can be used for each cache entry [7, 6]. As another example, the SPUR system restricts the use of address space, prohibits caching of I/O buffers, and requires bus transmission of both virtual and physical addresses [11]. However, these schemes tend to have performance shortcomings or unpleasant implications for system software. Virtually addressed caches are fundamentally complicated, and this time or space complexity reduces the ability of the cache to match the ever-increasing needs of modern processors.

To attack this problem, we propose a two-level cache organization involving a virtually addressed first-level cache and a physically addressed second-level cache (recent studies of two-level uniprocessor and multiprocessor caches can be found in [4, 5, 12, 13]). The small first-level cache can be fast to meet the requirements of high-speed processors; it is virtually addressed to avoid the need for address translation. The large second-level cache will reduce miss ratios and memory traffic; it is physically addressed to simplify the I/O and multiprocessor coherence problems. Furthermore, we show how the second-level cache can be utilized to solve the synonym problem and to shield the first-level cache from irrelevant cache coherence traffic. Overall, we believe that this two-level virtual-real organization simplifies the design of the first level, where performance is crucial, while solving some of the difficult problems at the second level, where time and space are more easily available.

Our organization involves the use of pointers in the two caches to keep track of the mappings between virtual cache and physical cache entries [7]. We also provide a translation buffer at the second level which operates in parallel with first-level cache lookups in case a miss requires reverse translation. Trace-driven simulations are used to demonstrate the advantages of a two-level VR (virtual-real)
cache over a hierarchy of real-addressed caches in a multiprocessor environment.

The rest of this paper is organized as follows. Section 2 describes the approaches taken in solving various problems related to virtual address caches and presents some design choices for high-performance multiprocessor caches. Section 3 gives the specific organization of a VR two-level cache hierarchy and its detailed operational description. Section 4 presents performance results from simulations, and conclusions are drawn in Section 5.

© 1989 ACM 0884-7495/89/0000/0140 $01.50

2 Design issues of two-level VR caches for high-performance multiprocessors

This section addresses some important issues in the design of two-level VR caches and motivates our design choices. A more detailed operational description of our approach is given in the following section. The proposed architecture for this evaluation is a shared-bus multiprocessor where each processor has a private, two-level V-cache/R-cache hierarchy, as shown in Figure 1.

[Figure 1: Shared-bus organization (processors P, each with a private V-cache and R-cache)]

Write policies

For a two-level cache, the write policy can be selected independently at each level. In the literature, write-through has been proposed as the most reasonable write policy for the first-level cache in a two-level hierarchy, while write-back is advocated for the second level [10, 8, 13]. A major motivation for the choice of write-through at the first level is that cache coherence control is simplified. In this case, the first- and second-level caches will always contain identical values.

There are several problems, however, with using a first-level write-through cache. First, assuming no write-allocate, write-through caches will have smaller hit ratios than write-back caches. Second, a write takes longer under write-through because the second-level cache must be updated as well; primary memory may also need to be updated, depending on the write policy for the second level. The longer write latency of write-through can be largely hidden by the use of write buffers between the first and second levels, but several write buffers may be needed. Table 1, for example, shows that in the execution of the VAX program pops (cf. Section 4), 30% of writes are due to procedure calls, each of which typically generates six or more successive writes. Table 2 shows the inter-write interval distribution for a snapshot (411,237 references) of the same trace using a 16K direct-mapped cache with a [?]-byte block size. As can be seen, the high percentage of short inter-write intervals confirms the need for several buffers.

Unfortunately, while write buffers can reduce the write latency of the first-level cache, they reintroduce a complexity that write-through was intended to avoid, namely cache coherence. Write buffers can hold modified data for which other processors might encounter a miss. Thus, cache coherence control must be provided for the write buffers on every cache coherence transaction. These difficulties lead us to favor the write-back policy for our virtually addressed cache at the first level.

[Table 1: Number of writes due to procedure calls (columns: no. of writes per call, count, total writes)]

[Table 2: Inter-write intervals (snapshot of 411,237 references)]

The synonym problem

As previously noted, a two-level VR organization can be used to solve the synonym problem. The solution requires the use of a reverse translation table [15] for detecting synonyms, and a natural place to put that table is at the second level. Our two-level organization permits and detects synonyms, but guarantees that at most one copy of a data element exists in the V-cache at any time.

Each second-level cache block will have a pointer to its first-level child block, if one exists. If we guarantee an inclusion property, where the R-cache contains a superset of the tags in the V-cache, the reverse translation information can be stored in $\log_2(\text{V-cache size}/\text{page size})$ superset bits in each R-cache block. For each entry in the R-cache with a child in the V-cache, these extra bits, together with the page offset, provide the V-cache location of its child.

When a miss occurs in the V-cache, the virtual address is translated using a second-level translation buffer and the R-cache is accessed. If an R-cache hit occurs, the R-cache checks whether the data is also in the V-cache under another virtual address (a synonym). If so, it simply invalidates that V-cache copy and moves the data to the new virtual address in the V-cache. Thus, while a data element can have synonyms, it is always stored in the V-cache under the last virtual address with which it was accessed.

Note that our approach to the synonym problem has some similarities to Goodman's approach [7]. One can view our approach as moving Goodman's real directory from being used just for snooping to being associated with the level-two cache. This move provides two benefits. First, it hides the cost of Goodman's extra real directory by making it the level-two cache directory. Second, it reduces the misses caused by real-address collisions by making the real directory much bigger.

Context switching

In a multiprogramming environment, addresses are unique to each process, and therefore the V-cache must be flushed whenever a context switch occurs. This might be costly for a large virtually addressed cache. For small caches we believe the penalty on hit ratios will be negligible, and this is confirmed by our simulation results (cf. Section 4). However, if a write-back policy is used for the V-cache, a substantial number of write-backs may occur at each context switch, which greatly increases context-switch latency.

Another solution to avoid the address mapping conflict is to attach a process identifier to each tag entry of the V-cache. This approach does not improve the hit ratio for a small V-cache [1], but it can avoid the large number of write-backs at context-switch time. Unfortunately, this approach increases the complexity of a two-level hierarchy because the V-cache needs to be purged or selectively flushed when a TLB entry of an inactive process is replaced by an entry of the active process, or when a process id is reassigned.

We wish to have the benefits of reduced context-switch latency without needing to flush the V-cache when a TLB entry changes. Our approach meets these goals by invalidating all V-cache blocks on a context switch but not writing them back at that time. Instead, each block is written back only when it is replaced, that is, when a new block is read into that cache slot. The writes are thus distributed in time, and their latency can be hidden using write-back buffers.

To implement this scheme, we add two new fields to each V-cache block. First, we add a swapped-valid bit, which is set for each V-cache block on a context switch. Upon a replacement, if the V-cache finds a block with swapped-valid set, it checks whether that block is also marked both dirty and valid; if so, that block must be written back. Second, we add an r-pointer, which consists of the low-order bits of the page number, to each V-cache block. The r-pointer, together with the page offset, is sufficient to link a V-cache entry to its corresponding location in the R-cache. This linkage makes a write-back or a state check efficient, since there is no need for an address translation. This approach uses space comparable to that of the process identifier scheme, but without its disadvantages.

Table 3 shows the effect of the swapped-valid bit: it gives the inter-write intervals for the same benchmark as Table 2 when the swapped-valid bit is used. Because swapped write-backs are typically far apart from other swapped write-backs, a single write-back buffer is sufficient to overlap swapped write-backs with processor execution. Our simulations show that with a single buffer the amount of stalling on a swapped write-back is indeed negligible. On the other hand, if the incremental write-back is not used, we need to write back over a hundred blocks at context-switching time for this specific benchmark. Notice that the number of write-backs needed due to context switching is a function of cache size, cache organization, the duration of the running state of a process, and the workload.

[Table 3: Write intervals with write-back and swapped write-back (snapshot of 411,237 references)]

Cache coherence

While two-level caches are attractive, cache coherence control is complicated by a two-level scheme. Without special attention to the coherence problem, the first-level cache will be disturbed by every coherence request on the bus. A solution to this problem is to use the second-level cache as a filter that shields the first-level cache from irrelevant interference. In order to achieve this, we need to impose an inclusion property where the tags of the second-level cache are a superset of the tags of its child cache. We say that a multilevel cache hierarchy has the inclusion property if this superset relation holds. Imposing inclusion is also essential for solving the synonym problem, as stated above.

In a multiprocessor environment, the inclusion property cannot be guaranteed even with a global LRU replacement policy [4]. In [5], the following replacement algorithm was proposed as one of the conditions to impose inclusion:

• First level: Any replacement algorithm will do (e.g., LRU). Notify the second-level cache of the block being replaced.
• Second level: Replace a block which does not exist in the first level. This is done by checking an inclusion bit; there is one inclusion bit per block to indicate whether the block is present in the first level.

The general problem with inclusion is its implication of a large set size in the second level (i.e., high associativity). By following the same approach as in [5], and letting $S_i$ be the number of sets, $B_i$ the block size, and $\mathit{size}_i$ the cache size of a level-$i$ cache, we can show that in order to impose inclusion under the above replacement algorithm, the set associativity of the second-level cache, $A_2$, must satisfy a lower bound [...] under the usual practical situations where $S_2 \ge S_1$, $B_2 \ge B_1$, $\mathit{size}_2 \ge \mathit{size}_1$ and $B_1 S_1 \ge \mathit{pagesize}$. In practical cases, this constraint can be too strict to be feasible. For example, if the V-cache is 16 Kbytes, the page size is 4 Kbytes, and $B_2$ is 4 times as large as $B_1$, then even with a direct-mapped V-cache we need a 16-way R-cache to achieve inclusion.

To relax the strict constraint on the set associativity of the R-cache, we change the replacement rule of the R-cache to operate as follows: replace a block with the inclusion bit clear if there is one; otherwise, replace a block according to some predefined replacement algorithm and invalidate the corresponding V-cache block. Note that the latter will not happen very often, since the R-cache is much larger than the V-cache. For example, the analysis of the multiprocessor trace pops (over 3 million memory references) shows that only 21 inclusion invalidations are needed if the V-cache is 16 Kbytes, [?]-way set-associative with a 16-byte block size, and the R-cache is 256 Kbytes with the same set size and block size.

4 Performance

In this section, we compare the relative performance of virtual-real (VR) and real-real (RR) two-level caches. We also examine the merits of splitting the first-level virtually addressed cache into I- and D-caches. Finally, we measure the effect of the R-cache in shielding the V-cache from irrelevant cache coherence interference.

To gather the performance figures, we use trace-driven simulations and three parallel program traces: pops, thor and abaqus [2, 14]. In pops and thor, context switches occur rarely, while they are frequent in abaqus. Table 5 gives a summary of some characteristics of these traces.

Relative performance of VR and RR two-level caches

To compare the performance of VR and RR two-level caches, we gather the hit ratios at different levels; the hit ratios are then used in generic memory access time equations to predict relative performance. We assume that the inclusion property defined previously also holds for the RR
two-level cache. For simplicity, we consider only direct-mapped caches at both levels. The generic access time equation of a two-level cache hierarchy is as follows:

$$T_{acc} = P(\text{hit at level 1})\, t_1 + P(\text{miss at level 1, hit at level 2})\, t_2 + P(\text{miss at levels 1 and 2})\, t_m,$$

that is,

$$T_{acc} = h_1 t_1 + (1 - h_1)\, h_2\, t_2 + (1 - h_1)(1 - h_2)\, t_m,$$

where $h_1$ and $h_2$ are the hit ratios at levels 1 and 2 (with $h_2$ measured on the references that miss at level 1), $t_1$ and $t_2$ are the access times at the two levels, and $t_m$ is the memory access time including the bus overhead.

Because the second-level caches are the same for both VR and RR organizations, and because inclusion holds, the number of misses and the traffic from the second-level cache are the same in both organizations. Therefore the third term in the above equation is the same for both VR and RR organizations. Assuming that handling a synonym costs the same as handling a miss in the first-level cache that hits in the second-level cache, the relative performance when there is a hit in the hierarchy can be estimated solely from the first two terms of the above equation.

Table 6 shows the hit ratios at both levels of VR and RR organizations for the three traces under three different pairs of first- and second-level cache sizes. Figures 4, 5 and 6 depict the relative performance of the two organizations under different degrees of assumed R-cache degradation due to address translation overhead. These figures plot the relative performance of the two hierarchies with $t_2 = 4 t_1$ vs. the percentage of slowdown due to address translation for various first-level/second-level cache sizes. The points on the y-axis correspond to no slowdown at all.

From these figures we can draw the following conclusions. Let us first assume that there is no time penalty involved in performing a virtual-real address translation in conjunction with the access to the first-level cache. When context switches occur rarely, as is the case for the first two traces (Figures 4 and 5), the performance of the VR and RR hierarchies is almost indistinguishable (the points on the y-axis are the same). When context switches are frequent, as in the third trace (Figure 6), the VR hierarchy is slower by 2% to 6% depending on the size of the V-cache (a larger V-cache seems to imply a larger relative degradation).

Now, let us assume a time penalty for the translation. There are two possible reasons for this penalty. The first is that TLB access and cache access cannot be completely overlapped as soon as the cache size is larger than the page size multiplied by the set associativity. Second, even if there were total overlap, there would still be an extra comparison necessary to check the validity of a cache hit. From the observations of the previous paragraph, it is clear that the VR hierarchy will perform better in the case of rare context switches; the relative improvement is approximately equal to the overhead of address translation. What is interesting is to see the crossover point for the case of frequent context switches. From Figure 6, we see that the VR hierarchy will have better performance when the address translation slows down the first-level R-cache access by 6% or more. Since 6% is a conservative figure for the penalty due to the insertion of a TLB at the first level, it appears that the VR hierarchy is a better solution. Its performance is as good as that of an RR hierarchy, and its cost is less since the TLB does not have to be implemented in fast logic. Another advantage is that problems such as TLB coherence can also be handled at the second level.

Trace    CPUs   Total refs   Instr. count   Data read   Data write   Context switches
thor     4      3283k        1517k          1390k       376k         21
pops     4      3286k        1718k          1285k       283k         7
abaqus   2      1196k        514k           600k        82k          292
Table 5: Characteristics of traces

[Table 6: Hit ratios]

[Table 7: Hit ratios for small first-level caches]

[Figure 4: Average access time vs. slowdown of R-cache (thor); x-axis: first-level R-cache slowdown percentage, 0 to 18]

[Figure 5: Average access time vs. slowdown of R-cache (pops); x-axis: first-level R-cache slowdown percentage, 0 to 18]

[Figure 6: Average access time vs. slowdown of R-cache (abaqus); x-axis: first-level R-cache slowdown percentage, 0 to 18]

The results presented above assumed 4K to 16K first-level caches, which may be impractical for some advanced technologies, such as GaAs. However, we believe that the VR organization is even more attractive for hierarchies with smaller first-level caches. Our results in Table 7 show that for smaller first-level caches (e.g., 0.5K to 2K), the first-level hit ratios of VR and RR organizations are nearly identical. Therefore, the performance of a VR hierarchy will be superior given any penalty for a TLB lookup. In addition, for technologies in which space is at a premium, we can trade the first-level TLB of an RR hierarchy for a larger first-level cache in a VR hierarchy. This in turn provides higher hit ratios and hence a smaller average access time.

Splitting the first-level virtually addressed cache

There are a number of reasons why it is advantageous to split the first-level cache into separate I- and D-caches. First, the bandwidth can almost be doubled for pipelined processors where an instruction fetch can occur at the same time as a data fetch of a previous instruction (e.g., the IBM 801 and Motorola 88000). Second, each of the I- and D-caches is smaller and has the potential to be optimized for speed. Third, and this pertains mostly to V-caches, the I-cache is simpler than the D-cache since it does not need to handle the synonym and cache coherence problems, provided that self-modifying programs are not permitted. A disadvantage, however, is that we need more wires or pins for the processor and cache module.

It is important to assess, however, whether splitting the cache into I/D components will improve performance. Our results in Tables 8, 9 and 10 show that the hit ratios of split I/D caches are very close to those of a unified I/D cache and are not necessarily worse. In these tables, the separate I- and D-caches are of equal size (i.e., in the 4K example the I-cache and the D-cache are each 2K). Similar results have been found in [9, 13]. Thus, we would advocate such a split for a VR hierarchy.

thor                      4K/64K   8K/128K   16K/256K
data read     split       0.924    0.937     0.945
              unified     0.913    0.938     0.950
data write    split       0.952    0.962     0.969
              unified     0.946    0.966     0.972
instruction   split       0.957    0.963     0.989
              unified     0.930    0.973     0.984
overall       split       0.942    0.952     0.968
              unified     0.925    0.957     0.968
Table 8: Hit ratios of level-1 caches for the thor trace

pops                      4K/64K   8K/128K   16K/256K
data read     split       0.902    0.912     0.923
              unified     0.900    0.915     0.926
data write    split       0.936    0.946     0.955
              unified     0.937    0.948     0.958
instruction   split       0.947    0.966     0.978
              unified     0.948    0.963     0.974
overall       split       0.928    0.944     0.955
              unified     0.928    0.943     0.954
Table 9: Hit ratios of level-1 caches for the pops trace

abaqus                    4K/64K   8K/128K   16K/256K
data read     split       0.795    0.818     0.837
              unified     0.806    0.829     0.845
data write    split       0.841    0.861     0.875
              unified     0.847    0.857     0.895
instruction   split       0.920    0.947     0.949
              unified     0.907    0.926     0.938
overall       split       0.852    0.876     0.888
              unified     0.852    0.873     0.888
Table 10: Hit ratios of level-1 caches for the abaqus trace

Shielding cache coherence interference

An important advantage of the two-level approach is that the R-cache can shield the V-cache from irrelevant cache coherence interference. For example, on a read-miss bus request, the R-cache needs to send a flush request to its V-cache only when the V-cache contains a modified copy of the data; otherwise the V-cache will not be disrupted. Note that this shielding effect is achieved because the inclusion property holds in our VR two-level cache.

Imposing inclusion might not seem to be essential for an RR two-level hierarchy because the synonym problem is not present. However, the results in Tables 11, 12 and 13, which give the number of coherence messages percolated to each first-level cache, show that a VR two-level cache has much less coherence interference at the first level than an RR two-level cache without inclusion. The results also show that inclusion is important in an RR two-level cache, since it yields approximately the same savings in coherence messages to the first-level cache.⁴

⁴ We notice that RR with inclusion has over 10% fewer coherence messages than VR for the abaqus trace. This discrepancy is due to the large number of inclusion invalidations incurred in this specific trace, which in turn stems from its large number of context switches.

We believe that the shielding effect on cache coherence will be more prominent as the number of processors increases. This is because more bus coherence requests will be generated by a larger number of processors, and without the shielding, a first-level cache will be disrupted more often. Our results in Tables 11, 12 (4 CPUs) and 13 (2 CPUs) reflect this effect. For example, on average, the first-level cache of a VR hierarchy encounters about half as many coherence messages as that of the RR hierarchy without inclusion for the two-processor trace (cf. Table 13), whereas for the four-processor traces the first-level cache of the VR hierarchy encounters from three to six times fewer coherence messages. We plan to further confirm this observation when we are in possession of larger-scale traces.

[Table 11: Number of coherence messages to the first-level cache]

[Table 12: Number of coherence messages to the first-level cache]

abaqus   4K/64K                         8K/128K                        16K/256K
CPU      VR      RR incl.  RR no incl.  VR      RR incl.  RR no incl.  VR      RR incl.  RR no incl.
0        10961   8436      18855        11677   9379      21295        11067   9853      22603
1        10527   8029      20726        10547   9528      24202        10599   10028     26845
Table 13: Number of coherence messages to the first-level cache

5 Conclusions

One of the most challenging issues in computer design is the support of high memory bandwidth. In this paper, we have proposed a two-level cache hierarchy to address this issue. We have argued that the first-level cache is best accessed directly by virtual addresses. We back up the small virtually addressed cache with a large second-level cache. A virtually addressed first-level cache does not require address translation and can be optimized to match the processor speed. Through the use of a swapped-valid bit, we avoid the clustering of write-backs at context-switching time; these write-backs are instead spread more evenly over time. The large second-level cache provides a high hit ratio and removes a large amount of memory traffic. We have shown how the second-level cache can be easily extended to solve the synonym problem resulting from the use of a virtually addressed cache at the first level. Furthermore, the second-level cache can be used effectively to shield the virtually addressed first-level cache from irrelevant cache coherence interference.

Our simulation results show that when context switches are rare, the virtually addressed cache option has performance comparable to its physically addressed counterpart, even assuming no address translation overhead. When context switches occur frequently, the virtually addressed cache option has a performance edge once a small address translation penalty is taken into account, and the smaller the virtually addressed cache, the larger the relative performance edge.

We also advocate splitting the virtually addressed cache into separate instruction and data caches. This approach has the potential of doubling the memory bandwidth, since our results show that the hit ratios of split instruction and data caches are very close to those of a single I/D cache.

As a final remark, we note that cache performance is workload dependent. In this study we have confined ourselves to a limited VAX multiprocessor workload. We plan to enlarge our workload sample as soon as we are in possession of other multiprocessor traces.

Acknowledgment

This work was supported in part by National Science Foundation Grants No. CC8702915 and CCR8619663, Boeing Computer Services, Digital Equipment Corporation (the Systems Research Center and the External Research Program), and a GTE fellowship. The experimental part of this study would not have been possible without Dick Sites, who made the traces available to us, and Anant Agarwal, who allowed us to share his post-processing programs and who patiently answered our many questions. We also thank the members of the Computer Architecture lunch, especially Tom Anderson, Jon Bertoni, Sang Lyul Min and John Zahorjan, for their excellent comments and suggestions.

References

[1] Agarwal, A., R. L. Sites and M. Horowitz. ATUM: A new technique for capturing address traces using microcode. In Proc. 13th Symposium on Computer Architecture, pages 119-127, 1986.
[2] Agarwal, A., R. Simoni, J. Hennessy and M. Horowitz. An evaluation of directory schemes for cache coherence. In Proc. 15th Symposium on Computer Architecture, pages 280-289, 1988.
[3] Atkinson, R. R. and E. M. McCreight. The Dragon processor. In Proc. Architectural Support for Programming Languages and Operating Systems (ASPLOS II), pages 65-69, 1987.
[4] Baer, J.-L. and W.-H. Wang. Architectural choices for multi-level cache hierarchies. In Proc. 16th International Conference on Parallel Processing, pages 258-261, 1987.
[5] Baer, J.-L. and W.-H. Wang. On the inclusion property for multi-level cache hierarchies. In Proc. 15th Symposium on Computer Architecture, pages 73-80, 1988.
[6] Cheriton, D. R., G. Slavenburg and P. Boyle. Software-controlled caches in the VMP multiprocessor. In Proc. 13th Symposium on Computer Architecture, pages 367-374, 1986.
[7] Goodman, J. Coherency for multiprocessor virtual address caches. In Proc. Architectural Support for Programming Languages and Operating Systems (ASPLOS II), pages 72-81, 1987.
[8] Goodman, J. and P. J. Woest. The Wisconsin Multicube: A new large-scale cache-coherent multiprocessor. In Proc. 15th Symposium on Computer Architecture, pages 422-431, 1988.
[9] Haikala, I. J. and P. H. Kutvonen. Split cache organizations. In Proc. Performance '84, pages 459-472, 1984.
[10] Hattori, A., M. Koshino and S. Kamimoto. Three-level hierarchical storage system for FACOM M-380/382. In Proc. Information Processing (IFIP), pages 693-697, 1983.
[11] Hill, M. et al. Design decisions in SPUR. Computer, 19(11):8-22, November 1986.
[12] Przybylski, Steven A. Performance-Directed Memory Hierarchy Design. Ph.D. Dissertation, Stanford University, 1988.
[13] Short, R. T. and H. M. Levy. A simulation study of two-level caches. In Proc. 15th Symposium on Computer Architecture, pages 81-88, 1988.
[14] Sites, R. L. and A. Agarwal. Multiprocessor cache analysis using ATUM. In Proc. 15th Symposium on Computer Architecture, pages 186-195, 1988.
[15] Smith, A. J. Cache memories. Computing Surveys, 14(3):473-530, September 1982.
[16] Sweazey, P. and A. J. Smith. A class of compatible cache consistency protocols and their support by the IEEE Futurebus. In Proc. 13th Symposium on Computer Architecture, pages 414-423, 1986.
[17] Cheng, Ray. Virtual address cache in UNIX. In Proc. USENIX Conference, pages 217-224, June 1987.
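The synonym check described in Section 2 ("The synonym problem") can be sketched as a small Python model. This is a sketch under stated assumptions, not the hardware design:

- the V-cache is direct-mapped and uses the example sizes from Section 2 (16-Kbyte V-cache, 4-Kbyte pages);
- the 16-byte block size is taken to be the same at both levels, purely for simplicity;
- the R-cache child pointers are modeled as a dictionary whose keys are physical block numbers, and a missing key stands for a clear inclusion bit.

The names `superset_of`, `vcache_index` and `handle_vcache_miss` are hypothetical.

```python
import math
from typing import Dict, Optional

VCACHE_SIZE = 16 * 1024   # direct-mapped V-cache in bytes (example size from Section 2)
PAGE_SIZE = 4 * 1024      # bytes
BLOCK_SIZE = 16           # assumed block size, identical at both levels here

# log2(V-cache size / page size) superset bits kept in each R-cache block
SUPERSET_BITS = int(math.log2(VCACHE_SIZE // PAGE_SIZE))


def superset_of(vaddr: int) -> int:
    """V-cache index bits that lie above the page offset."""
    return (vaddr % VCACHE_SIZE) // PAGE_SIZE


def vcache_index(superset: int, paddr: int) -> int:
    """Locate the V-cache child from the superset bits plus the page offset,
    which is identical in the virtual and the physical address."""
    return (superset * PAGE_SIZE + paddr % PAGE_SIZE) // BLOCK_SIZE


def handle_vcache_miss(children: Dict[int, int], paddr: int,
                       vaddr: int) -> Optional[int]:
    """Called after a V-cache miss that hits in the R-cache.

    children maps a physical block number to the superset bits of its
    V-cache child (hypothetical representation). Returns the V-cache set
    that holds a synonym and must be invalidated, or None.
    """
    pblock = paddr // BLOCK_SIZE
    old = children.get(pblock)
    new = superset_of(vaddr)
    synonym = None
    if old is not None and old != new:
        synonym = vcache_index(old, paddr)   # invalidate the old copy here
    # If old == new, the refill overwrites the same direct-mapped slot anyway.
    children[pblock] = new                   # the data now lives under vaddr
    return synonym
```

For example, suppose physical address 0x9234 is first accessed through virtual address 0x1234 and then through 0x3234. The second call returns V-cache set 291, which is the set holding the copy cached under 0x1234. At most one V-cache copy therefore survives.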
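The deferred write-back scheme of Section 2 ("Context switching") can be modeled as follows. This is a minimal sketch with the following assumptions:

- the V-cache is direct-mapped;
- a block whose swapped-valid bit is set is treated as unable to hit;
- write-backs are logged as (set index, r-pointer) pairs rather than queued in a real write-back buffer.

`VBlock`, `VCache` and their fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VBlock:
    tag: int = -1          # virtual tag (-1: never filled)
    valid: bool = False
    dirty: bool = False
    swapped: bool = False  # swapped-valid bit, set for every block on a context switch
    rptr: int = 0          # r-pointer: low-order bits of the physical page number


class VCache:
    """Direct-mapped V-cache whose write-backs are deferred past context switches."""

    def __init__(self, num_sets: int) -> None:
        self.blocks = [VBlock() for _ in range(num_sets)]
        # Write-backs sent to the R-cache, as (set index, r-pointer) pairs.
        self.writebacks: List[Tuple[int, int]] = []

    def context_switch(self) -> None:
        # Mark every block as belonging to the old context; write nothing back now.
        for b in self.blocks:
            b.swapped = True

    def access(self, index: int, tag: int, rptr: int, write: bool = False) -> bool:
        """Return True on a hit. A swapped block never hits (assumption)."""
        b = self.blocks[index]
        hit = b.valid and not b.swapped and b.tag == tag
        if not hit:
            # Replacement: a dirty, valid block (swapped or not) is written back
            # only now. The r-pointer plus the page offset locates it in the
            # R-cache, so no address translation is needed.
            if b.valid and b.dirty:
                self.writebacks.append((index, b.rptr))
            b.tag, b.valid, b.dirty, b.swapped, b.rptr = tag, True, False, False, rptr
        if write:
            b.dirty = True
        return hit
```

Consider a block written before a context switch. It causes no write-back at switch time; its write-back is issued only when its slot is next refilled. This is what spreads the write-backs out so that a single buffer can overlap them with execution.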
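The relaxed R-cache replacement rule of Section 2 ("Cache coherence") can be sketched for a single R-cache set. The paper leaves two choices unspecified: which block to pick when several have a clear inclusion bit, and the fallback algorithm. This sketch assumes LRU for both. `RBlock` and `choose_victim` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RBlock:
    tag: int
    inclusion: bool = False   # set while the block has a child in the V-cache
    last_use: int = 0         # larger value = more recently used (assumed LRU stamp)


def choose_victim(rset: List[RBlock]) -> Tuple[RBlock, bool]:
    """Pick a victim in one R-cache set.

    Returns (victim, needs_invalidation). needs_invalidation is True only
    when every block in the set has its inclusion bit set; the victim's
    V-cache child must then be invalidated to preserve inclusion.
    """
    clear = [b for b in rset if not b.inclusion]
    if clear:
        return min(clear, key=lambda b: b.last_use), False
    return min(rset, key=lambda b: b.last_use), True
```

Because the R-cache is much larger than the V-cache, the second branch should be rare. The pops measurement in Section 2 (only 21 inclusion invalidations) reflects this.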
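The VR/RR comparison in Section 4 rests on the first two terms of the access-time equation. The sketch below evaluates that equation and solves for the first-level slowdown at which a VR hierarchy breaks even with an RR hierarchy. It assumes:

- $t_2 = 4 t_1$, as in Figures 4 to 6;
- the translation slowdown scales only the first-level hit term of the RR hierarchy.

The function names are hypothetical.

```python
def access_time(h1: float, h2: float, t1: float, t2: float, tm: float) -> float:
    """T_acc = h1*t1 + (1-h1)*h2*t2 + (1-h1)*(1-h2)*tm (Section 4).

    h2 is the second-level hit ratio on references that miss at level 1.
    """
    return h1 * t1 + (1 - h1) * h2 * t2 + (1 - h1) * (1 - h2) * tm


def hit_terms(h1: float, h2: float, t1: float, t2: float) -> float:
    """First two terms only; the third term is equal for VR and RR when both
    share the same second-level cache and inclusion holds."""
    return h1 * t1 + (1 - h1) * h2 * t2


def breakeven_slowdown(h1_vr: float, h2_vr: float,
                       h1_rr: float, h2_rr: float, t1: float = 1.0) -> float:
    """Fractional slowdown s of the RR first-level hit time (t1 -> t1*(1+s))
    at which the VR hierarchy is exactly as fast. Assumes t2 = 4*t1."""
    t2 = 4 * t1
    vr = hit_terms(h1_vr, h2_vr, t1, t2)
    rr = hit_terms(h1_rr, h2_rr, t1, t2)
    # Solve h1_rr*t1*(1+s) + (1-h1_rr)*h2_rr*t2 == vr for s.
    return (vr - rr) / (h1_rr * t1)
```

Take hypothetical hit ratios (not values from Table 6): $h_1 = 0.90$, $h_2 = 0.80$ for VR and $h_1 = 0.92$, $h_2 = 0.75$ for RR. Both give the same global second-level miss ratio of 0.02.

With these values, `breakeven_slowdown(0.90, 0.80, 0.92, 0.75)` is about 0.065. Under this model, VR would be faster once the RR first-level access is slowed by more than roughly 6.5%. A negative result would mean VR is faster even with no translation penalty.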
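The filtering described in Section 4 ("Shielding cache coherence interference") can be sketched for the read-miss case only. The sketch rests on two assumptions:

- the R-cache records, per block, both the inclusion bit and whether the V-cache child is modified;
- without inclusion, every bus request reaches the first level, because the second level cannot rule out a first-level copy.

`RLine`, `forward_read_miss` and `first_level_probes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RLine:
    present: bool       # the snooped block is in the R-cache
    has_child: bool     # inclusion bit: the block is also in the V-cache
    child_dirty: bool   # assumed: the R-cache knows the child copy is modified


def forward_read_miss(line: RLine) -> bool:
    """True if a snooped bus read-miss must be passed up to the V-cache
    (as a flush of its modified copy); False if the V-cache is not disturbed."""
    return line.present and line.has_child and line.child_dirty


def first_level_probes(snoops: Iterable[RLine], inclusion: bool = True) -> int:
    """Count snooped read-misses that reach the first-level cache."""
    snoops = list(snoops)
    if not inclusion:
        # The second level cannot rule out a first-level copy, so every
        # request has to be checked at the first level.
        return len(snoops)
    return sum(forward_read_miss(s) for s in snoops)
```

Of four snooped read-misses, only one may concern a block whose V-cache copy is modified. In that case only one reaches the first level when inclusion holds, against all four without inclusion.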