31-Lockup-Free Instruction FetchPrefetch Cache Organization.pdf

LOCKUP-FREE INSTRUCTION FETCH/PREFETCH CACHE ORGANIZATION

DAVID KROFT
Control Data Canada, Ltd.
Canadian Development Division
Mississauga, Ontario, Canada

0149-7111/81/0000/0081 $00.75 © 1981 IEEE

ABSTRACT

In the past decade, there has been much literature describing various cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is now resident in the cache). Little, if any, has been presented to exploit (1) the inherent dual input nature of the cache and (2) the many-datum reference type central processor instructions.

No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed requested datum is received from central memory and, possibly, for cache update. For the two cases above, the cache references following a miss do not require the information of the datum not resident in the cache, and are therefore needlessly penalized in this fashion.

In this paper, a cache organization is presented that essentially eliminates this penalty. This cache organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing simple instruction set machine has verified the advantage of this feature; future, more extensive and sophisticated instruction set machines may obviously take more advantage. Prior to prototyping, simulations verified the advantage.

INTRODUCTION

A cache buffer [1, 2] is a small, fast memory holding most recently accessed data and its surrounding neighbors. Because the access time of this buffer is usually an order of magnitude shorter than that of main or central memory, and the standard software practice is to localize data, the effective memory access time is considerably reduced when a cache buffer is included. The cost increment for this, when compared with the cost of central memory, along with the above access time advantage, implies cost effectiveness.

Now, accepting the usefulness of a cache buffer, one looks into ways of increasing its effectiveness; that is, further decreasing the effective memory access time. Considerable research has been done to fine tune a cache design for various requirements. This fine tuning consisted of selecting optimal total cache buffer size, block size (the number of bytes to be requested on a cache miss), space allocation, and replacement algorithms to maximize hit rate. Another method presented to increase the hit rate was selective prefetching. All these methods assume the cache can handle only one request at a time; on a miss, the cache stays busy servicing the request until the data is received from memory and, possibly, for cache buffer update.

In this paper, a cache organization is presented that increases the effectiveness of a normal cache inclusion by using the inherent dual input nature of an overall cache and the many data reference instructions. In other words, it would be extremely useful to pipeline the requests into the cache at the cache hit throughput rate regardless of any misses. If this could be accomplished, then all fetch and/or prefetch of instructions could be totally transparent to the execution unit. Also, for instructions that require a number of data references, the requests could be almost entirely overlapped. Obviously, requests could not be streamed into the cache at the hit throughput rate indefinitely. There is a limit. This organization's limit is imposed by the number of misses that have not been completely processed that the cache will keep track of simultaneously without lockup.

ORGANIZATION

In addition to the standard blocks, this cache organization requires the following:

1. One unresolved miss information/status holding register (MSHR) for each miss that will be handled concurrently.
2. One n-way comparator, in which n is the number of MSHR registers, for registering hits on data in transit from memory.
3. An input stack to hold the total number of received data words possibly outstanding. The size of this stack, consequently, is equal to the block size in words times the number of MSHR registers.
4. MSHR status update and collecting networks.
5. The appropriate control unit enhancement to accommodate 1 through 4.

Figure 1 is a simplified block diagram of the cache organization. A set-associative operation is assumed. Included are the required blocks for a set-associative cache (tag arrays and control, cache buffer), the central memory interface blocks (memory requestor, memory receiver), and the cache enhancement blocks (miss info holding registers, miss comparator and status collection, input stack). The miss info holding registers hold all necessary information to (1) handle the central memory received data properly and (2) inform the main cache control, through the miss comparator and status collector, of all hit and other status of data in transit from memory. The input stack is necessary to leave the main cache buffer available for overlapped reads and writes. Note that this organization allows for data just received from memory or in the input stack to be sent immediately to the requesting CPU units.

Of course, the number of MSHR registers is important. As with set size (blocks per set), the incremental value decreases rapidly with the number of registers. This is good, because the cost increases significantly with the number of registers. Figure 2 presents a qualitative curve. The average delay time is caused by lockout on outstanding misses. This delay time, of course, is also dependent on cache input request and hit rates. In the degenerate case, 1 MSHR register of reduced size is required; 2 MSHR registers allow for overlap while one miss is outstanding, but still would lock up the cache input on multiple misses outstanding. Owing to cost considerations and the incremental effectiveness gained on increasing the number of MSHR registers, 4 registers appear to be optimal.

The necessary information contained within one of these MSHR registers includes the following. First, the cache buffer address, along with the input request address, is required. The cache buffer address is kept to know where to place the returning memory data; the input request address is saved to determine if, on subsequent requests, the data requested is on its way from central memory. Second, input request identification tags, along with the send-to-CPU status, are stored. This information permits the cache to return to CPU requesting units only the data requested and to return it with its identification tag. Third, in-input-stack indicators are used to allow for reading data directly from the input stack. Fourth, a code (for example, one bit per byte for partial write) is held for each word to indicate what bytes of the word have been written to the cache buffer. This code controls the cache buffer write update and allows dispensing of data for buffer areas that have been totally written after being requested. The cache, thus, has the capability of processing partial write input requests on the fly without purging. Of course, this partial write code may not be incorporated if the cache block is purged on a partial write request to a word in a block in transit from memory. Last, some control information is needed: whether the register contains valid information only for returning requested data but not for cache buffer update, and the number of words of the block that have been received and written, if required, into the cache buffer.

Therefore, each MSHR register contains:

1. Cache buffer address
2. Input request address
3. Input identification tags (one per word)
4. Send-to-CPU indicators (one per word)
5. In-input-stack indicators (one per word)
6. Partial write codes (one per word)
7. Number of words of block processed
8. Valid information indicator
9. Obsolete indicator (information not valid for cache update or MSHR hit on data in transit)

OPERATION

The operation can be split into two basic parts: memory receiver/input stack operations and tag array control operations. For memory receiver/input stack operations, the fields of MSHR interrogated are the following:

1. Send-to-CPU indicator
2. Input identification tags
3. Cache buffer address
4. Partial write codes
5. Obsolete indicator
6. Valid indicator

When a word is received from memory, it is sent to the CPU requesting unit if the send-to-CPU indicator is set; the appropriate identification tag accompanies the data. This word is also written into the input stack if the word's space has not been previously totally written in the cache buffer and the MSHR is not obsolete (invalid for cache update). The words of data are removed from this input stack on a first-in, first-out basis and are written into the cache buffer using fields 3 and 4. Of course, the MSHR must hold valid information when interrogated, or an error signal will be generated.

A slight diversion is necessary at this point to explain cache data tagging. On a miss, the cache requests a block of words. Along with each word, a cache tag is sent. This tag points to the particular assigned MSHR and indicates the word of the block. Note that the cache saves in the MSHR the requesting unit's identification tag. This tagging closes the remaining open link for the handling of data returned from memory and removes all restrictions on memory on the order of responses.

If a particular processor/memory interface allows for a data width of a block of words for cache to central memory requests, the cache data tagging may be simplified by merely pointing to the particular assigned MSHR. If, however, all other data paths are still one word wide, the main operations would be essentially unchanged. Consequently, this extended interface would not significantly reduce the control complexity or the average lockout time delay per request.

The fields of the MSHR updated during memory receiver/input stack operations are the following:

1. In-input-stack indicators
2. Partial write codes
3. Number of words of block processed
4. Valid information indicator (being-used indicator)

The in-input-stack indicators are set when the data word is written into the input stack and cleared when data is removed from the input stack and written into the cache buffer. The partial write code is set to indicate totally written when the data word from central memory is written into the cache buffer. In addition, whenever a data word is disposed of because of being totally written or having an obsolete MSHR, or is written into the cache buffer, the number-of-words-processed counter is incremented. On overflow of this counter (all words for a block have been received), the valid or used MSHR indicator is cleared.

For tag array control operations, the following fields of MSHRs are interrogated:

1. Input request addresses
2. Send-to-CPU indicators
3. In-input-stack indicators
4. Partial write codes
5. Valid indicator
6. Obsolete indicator

Fields 1, 5, and 6 are used along with the current input request address and the n-way MSHR comparator to determine if there is a hit on previously missed data still being handled (a previous miss hit). Fields 2, 3, and 4 produce one of the following states for the previous miss hit:

Partially written: Partial write code has at least one bit set.
Totally written: Partial write code is all 1's.
In input stack
Already asked for: Send-to-CPU indicator is already set.

Figure 3 indicates the actions followed by the tag array control under all the above combinations for a previous miss hit. On a miss, a MSHR is assigned, and the following is performed:

1. Valid indicator set
2. Obsolete indicator cleared
3. Cache buffer address saved in assigned MSHR
4. Input request address saved in assigned MSHR
5. Appropriate send-to-CPU indicator set and others cleared
6. Input identification tag saved in appropriate position
7. All partial write codes associated with assigned MSHR cleared
8. All MSHRs pointing to same cache buffer address purged (set partial write code to all 1's)

Note that actions 5 and 6 will vary if the cache function was a prefetch: all send-to-CPU indicators are cleared, and no tag is saved. Action 8 prevents data from a previous allocation of a cache buffer block from overwriting the present allocation's data. On a miss and previous miss hit (the cache buffer block was reallocated for the same input address before all data was received), the MSHR is set obsolete to prevent possible subsequent multiple hits in the MSHR comparator.

SIMULTANEITY

A previous miss hit on a data word just being received is definitely possible. Depending on the control operation, this word may have its corresponding send-to-CPU indicator's output forced to the send condition or may be read out of the input stack on the next minor cycle.

DIAGNOSABILITY

To diagnose this cache enhancement more readily, cache input functions should be added to clear and set the valid indicators of the MSHR registers. This would allow the following error conditions to be forced:

Cache tag points to nonvalid MSHR register
Multiple hit with MSHR comparator
Previous miss hit status: totally written and not partially written

All other fields of the MSHR registers may be verified by using these special cache input functions in combination with the standard input functions with all combinations of addresses, identification tags, and data.

CONCLUSIONS

This cache organization has been designed, prototyped, and verified. The design allows for the disabling of the MSHR registers. Using this capability, the direct effect of the number of MSHR registers on the execution times of a number of applications was noted. The reduced execution times of these applications directly demonstrated the effectiveness of this enhancement. It is beyond the scope of this paper to analyze quantitatively the average lockout delay per request with respect to the number of enabled MSHR registers for different cache input rates and hit rates (cache buffer sizes). This analysis will be reported in future work. The cost of the 4 MSHR additions to the design was about 10% of the total cache cost.

ACKNOWLEDGMENT

The author thanks Control Data Canada for the opportunity to develop the new cache organization presented in this paper.

REFERENCES

1. C. J. Conti. Concepts of buffer storage. IEEE Computer Group News, 2 (March 1969).
2. R. M. Meade. How a cache memory enhances a computer's performance. Electronics (Jan. 1972).
3. K. R. Kaplan and R. O. Winder. Cache-based computer systems. IEEE Computer (March 1973).
4. J. Bell, D. Casasent, and C. G. Bell. An investigation of alternative cache organizations. IEEE Transactions on Computers, C-23 (April 1974).
5. J. H. Kroeger and R. M. Meade (of Cogar Corporation, Wappingers Falls, NY). Cache buffer memory specification.
6. A. V. Pohm, O. P. Agrawal, and R. N. Monroe. The cost and performance tradeoffs of buffered memories. Proceedings of the IEEE, 63 (Aug. 1973).
7. A. J. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer (Dec. 1978).
8. G. H. Toole. Instruction lookahead and execution traffic considerations for the cache design (Development division internal paper, Control Data Canada, 1975).

[Figure 1. Cache Organization. Block diagram showing the CPU instruction and execution units, tag arrays and control, cache buffer, miss info holding registers, miss comparator and status collection, input stack, memory requestor, memory receiver, and central memory.]

[Figure 2. Qualitative Curve for Lockout Delay. Horizontal axis: number of miss info holding registers (2, 4, 8).]

INPUT      PARTIALLY  TOTALLY  IN INPUT  ALREADY    ACTION
FUNCTION   WRITTEN    WRITTEN  STACK     ASKED FOR
READ       NO         NO       NO        NO         SET SEND-TO-CPU BIT, SAVE IDENT
READ       NO         NO       NO        YES        READ FROM CENTRAL MEMORY (BYPASS)
READ       NO         NO       YES       X          READ FROM STACK
READ       YES        NO       X         X          READ FROM CENTRAL MEMORY (BYPASS)
READ       YES        YES      X         X          READ FROM CACHE BUFFER
PREFETCH   X          X        X         X          NO ACTION
WRITE      X          X        X         X          WRITE BYTES TO CACHE BUFFER, SET APPROPRIATE PARTIAL WRITE BIT

WHERE X = DON'T CARE

Figure 3. Previous Miss Hit Operations
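As an illustration only, the nine MSHR fields, the n-way comparator check, and the Figure 3 decision table can be sketched in software. The paper describes hardware, so everything below is an assumption made for readability: the names MSHR, mshr_lookup, and previous_miss_hit_action are hypothetical, and the block size (4 words) and word width (4 bytes) are arbitrary values the paper does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WORDS_PER_BLOCK = 4   # assumed block size in words (not given in the paper)
BYTES_PER_WORD = 4    # assumed word width in bytes (not given in the paper)
FULL_MASK = (1 << BYTES_PER_WORD) - 1   # partial write code of all 1's


def _per_word(value):
    """Build a default_factory producing one entry per word of a block."""
    return lambda: [value] * WORDS_PER_BLOCK


@dataclass
class MSHR:
    """One miss information/status holding register (the nine listed fields)."""
    cache_buffer_address: int = 0
    input_request_address: int = 0
    ident_tags: List[Optional[int]] = field(default_factory=_per_word(None))
    send_to_cpu: List[bool] = field(default_factory=_per_word(False))
    in_input_stack: List[bool] = field(default_factory=_per_word(False))
    partial_write: List[int] = field(default_factory=_per_word(0))
    words_processed: int = 0
    valid: bool = False
    obsolete: bool = False


def mshr_lookup(mshrs, block_address):
    """Software stand-in for the n-way comparator: uses the input request
    address, valid indicator, and obsolete indicator of every MSHR."""
    hits = [m for m in mshrs
            if m.valid and not m.obsolete
            and m.input_request_address == block_address]
    if len(hits) > 1:
        raise RuntimeError("multiple hit with MSHR comparator")
    return hits[0] if hits else None


def previous_miss_hit_action(mshr, word, function, ident=None, byte_mask=0):
    """Apply the Figure 3 table to one word of a block still in transit."""
    if function == "prefetch":
        return "no action"
    if function == "write":
        # Record which bytes of the word the CPU has now written.
        mshr.partial_write[word] |= byte_mask & FULL_MASK
        return "write bytes to cache buffer"
    if function != "read":
        raise ValueError(f"unknown input function: {function}")
    code = mshr.partial_write[word]
    if code == FULL_MASK:              # totally written
        return "read from cache buffer"
    if code != 0:                      # partially written only
        return "read from central memory (bypass)"
    if mshr.in_input_stack[word]:
        return "read from stack"
    if mshr.send_to_cpu[word]:         # already asked for
        return "read from central memory (bypass)"
    mshr.send_to_cpu[word] = True
    mshr.ident_tags[word] = ident
    return "set send-to-CPU bit, save ident"
```

For example, a first read of a word in transit sets its send-to-CPU bit and saves the requester's tag, while a second read of the same word before it arrives falls into the "already asked for" row and is served by a central memory bypass read.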
