LOCKUP-FREE INSTRUCTION FETCH/PREFETCH CACHE ORGANIZATION

DAVID KROFT
Control Data Canada, Ltd.
Canadian Development Division
Mississauga, Ontario, Canada

0149-7111/81/0000/0081$00.75 © 1981 IEEE

ABSTRACT

In the past decade, there has been much literature describing various cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is now resident in the cache). Little, if any, has been presented to exploit (1) the inherent dual input nature of the cache and (2) the many-datum reference type central processor instructions.

No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed requested datum is received from central memory and, possibly, for cache update. For the two cases above, the cache references following a miss do not require the information of the datum not resident in the cache, and yet are penalized in this fashion.

In this paper, a cache organization is presented that essentially eliminates this penalty. This cache organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing simple instruction set machine has verified the advantage of this feature; future, more extensive and sophisticated instruction set machines may obviously take more advantage. Prior to prototyping, simulations verified the advantage.

INTRODUCTION

A cache buffer [1, 2] is a small, fast memory holding most recently accessed data and its surrounding neighbors. Because the access time of this buffer is usually an order of magnitude shorter than that of main or central memory, and the standard software practice is to localize data, the effective memory access time is considerably reduced when a cache buffer is included. The cost increment for this, when compared with the cost of central memory, along with the above access time advantage implies cost effectiveness.

Now, accepting the usefulness of a cache buffer, one looks into ways of increasing its effectiveness; that is, further decreasing the effective memory access time. Considerable research has been done to fine tune a cache design for various requirements. This fine tuning consisted of selecting optimal total cache buffer size, block size (the number of bytes to be requested on a cache miss), space allocation, and replacement algorithms to maximize hit rate. Another method presented to increase the hit rate was selective prefetching. All these methods assume the cache can handle only one request at a time; on a miss, the cache stays busy servicing the request until the data is received from memory and, possibly, for cache buffer update.

In this paper, a cache organization is presented that increases the effectiveness of a normal cache inclusion by using the inherent dual input nature of an overall cache and the many data reference instructions. In other words, it would be extremely useful to pipeline the requests into the cache at the cache hit throughput rate regardless of any misses. If this could be accomplished, then all fetch and/or prefetch of instructions could be totally transparent to the execution unit. Also, for instructions that require a number of data references, the requests could be almost entirely overlapped. Obviously, requests could not be streamed into the cache at the hit throughput rate indefinitely. There is a limit. This organization's limit is imposed by the number of misses that have not been completely processed that the cache will keep track of simultaneously without lockup.

ORGANIZATION

In addition to the standard blocks, this cache organization requires the following:

1. One unresolved miss information/status holding register (MSHR) for each miss that will be handled concurrently.
2. One n-way comparator, in which n is the number of MSHR registers, for registering hits on data in transit from memory.
3. An input stack to hold the total number of received data words possibly outstanding. The size of this stack, consequently, is equal to the block size in words times the number of MSHR registers.
4. MSHR status update and collecting networks.
5. The appropriate control unit enhancement to accommodate 1 through 4.

Figure 1 is a simplified block diagram of the cache organization. A set-associative operation is assumed. Included are the required blocks for a set-associative cache (tag arrays and control, cache buffer), the central memory interface blocks (memory requestor, memory receiver), and the cache enhancement blocks (miss info holding registers, miss comparator and status collection, input stack). The miss info holding registers hold all necessary information to (1) handle the central memory received data properly and (2) inform the main cache control, through the miss comparator and status collector, of all hit and other status of data in transit from memory. The input stack is necessary to leave the main cache buffer available for overlapped reads and writes. Note that this organization allows for data just received from memory or in the input stack to be sent immediately to the requesting CPU units.

Of course, the number of MSHR registers is important. As with set size (blocks per set), the incremental value decreases rapidly with the number of registers. This is good, because the cost increases significantly with the number of registers. Figure 2 presents a qualitative curve. The average delay time is caused by lockout on outstanding misses. This delay time, of course, is also dependent on cache input request and hit rates. In the degenerate case, 1 MSHR register of reduced size is required; 2 MSHR registers allow for overlap while one miss is outstanding, but still would lock up the cache input on multiple misses outstanding. Owing to cost considerations and incremental effectiveness gained on increasing the number of MSHR registers, 4 registers appear to be optimal.

The necessary information contained within one of these MSHR registers includes the following. First, the cache buffer address, along with the input request address, is required. The cache buffer address is kept to know where to place the returning memory data; the input request address is saved to determine if, on subsequent requests, the data requested is on its way from central memory. Second, input request identification tags, along with the send-to-CPU status, are stored. This information permits the cache to return to CPU requesting units only the data requested and return it with its identification tag. Third, in-input-stack indicators are used to allow for reading data directly from the input stack. Fourth, a code (for example, one bit per byte for partial write) is held for each word to indicate what bytes of the word have been written to the cache buffer. This code controls the cache buffer write update and allows dispensing of data for buffer areas that have been totally written after being requested. The cache, thus, has the capability of processing partial write input requests on the fly without purging. Of course, this partial write code may be omitted if the cache block is purged on a partial write request to a word in a block in transit from memory. Last, some control information is needed: whether the register contains valid information only for returning requested data, but not for cache buffer update, and the number of words of the block that have been received and written, if required, into the cache buffer.

Therefore, each MSHR register contains:

1. Cache buffer address
2. Input request address
3. Input identification tags (one per word)
4. Send-to-CPU indicators (one per word)
5. In-input-stack indicators (one per word)
6. Partial write codes (one per word)
7. Number of words of block processed
8. Valid information indicator
9. Obsolete indicator (information not valid for cache update or MSHR hit on data in transit)

OPERATION

The operation can be split into two basic parts: memory receiver/input stack operations and tag array control operations. For memory receiver/input stack operations, the fields of MSHR interrogated are the following:

1. Send-to-CPU indicator
2. Input identification tags
3. Cache buffer address
4. Partial write codes
5. Obsolete indicator
6. Valid indicator

When a word is received from memory, it is sent to the CPU requesting unit if the send-to-CPU indicator is set; the appropriate identification tag accompanies the data. This word is also written into the input stack if the word's space has not been previously totally written in the cache buffer and the MSHR is not obsolete (invalid for cache update). The words of data are removed from this input stack on a first-in, first-out basis and are written into the cache buffer using fields 3 and 4. Of course, MSHR must hold valid information when interrogated, or an error signal will be generated.

A slight diversion is necessary at this point to explain cache data tagging. On a miss, the cache requests a block of words. Along with each word, a cache tag is sent. This tag points to the particular assigned MSHR and indicates the word of the block. Note that the cache saves in MSHR the requesting unit's identification tag. This tagging closes the remaining open link for the handling of data returned from memory and removes all restrictions on memory on the order of responses.

If a particular processor/memory interface allows for a data width of a block of words for cache to central memory requests, the cache data tagging may be simplified by merely pointing to the particular assigned MSHR. If, however, all other data paths are still one word wide, the main operations would be essentially unchanged. Consequently, this extended interface would not significantly reduce the control complexity or the average lockout time delay per request.

The fields of the MSHR updated during memory receiver/input stack operations are the following:

1. In-input-stack indicators
2. Partial write codes
3. Number of words of block processed
4. Valid information indicator (being used indicator)

The in-input-stack indicators are set when the data word is written into the input stack and cleared when data is removed from the input stack and written into the cache buffer. The partial write code is set to indicate totally written when the data word from central memory is written into the cache buffer. In addition, whenever a data word is disposed of because of being totally written or having an obsolete MSHR, or is written into the cache buffer, the number of words processed counter is incremented. On number of words processed counter overflow (all words for a block have been received), the valid or used MSHR indicator is cleared.

For tag array control operations, the following fields of MSHRs are interrogated:

1. Input request addresses
2. Send-to-CPU indicators
3. In-input-stack indicators
4. Partial write codes
5. Valid indicator
6. Obsolete indicator

Fields 1, 5, and 6 are used along with the current input request address and the n-way MSHR comparator to determine if there is a hit on previously missed data still being handled (previous miss hit). Fields 2, 3, and 4 produce one of the following states for the previous miss hit:

   Partially written: partial write code has at least one bit set.
   Totally written: partial write code is all 1's.
   In input stack.
   Already asked for: send-to-CPU indicator is already set.

Figure 3 indicates the actions followed by the tag array control under all the above combinations for a previous miss hit. On a miss, a MSHR is assigned, and the following is performed:

1. Valid indicator set
2. Obsolete indicator cleared
3. Cache buffer address saved in assigned MSHR
4. Input request address saved in assigned MSHR
5. Appropriate send-to-CPU indicator set and others cleared
6. Input identification tag saved in appropriate position
7. All partial write codes associated with assigned MSHR cleared
8. All MSHRs pointing to same cache buffer address purged (set partial write code to all 1's)

Note that actions 5 and 6 will vary if the cache function was a prefetch (all send-to-CPU indicators are cleared, and no tag is saved). Action 8 prevents data from a previous allocation of a cache buffer block from overwriting the present allocation's data. On a miss and previous miss hit (the cache buffer block was reallocated for the same input address before all data was received), MSHR is set obsolete to prevent possible subsequent multiple hits in the MSHR comparator.

SIMULTANEITY

A previous miss hit on a data word just being received is definitely possible. Depending on the control operation, this word may have its corresponding send-to-CPU indicator's output forced to the send condition or may be read out of the input stack on the next minor cycle.

DIAGNOSABILITY

To diagnose this cache enhancement more readily, cache input functions should be added to clear and set the valid indicators of the MSHR registers. This would allow the following error conditions to be forced:

   Cache tag points to nonvalid MSHR register
   Multiple hit with MSHR comparator
   Previous miss hit status: totally written and not partially written

All other fields of the MSHR registers may be verified by using these special cache input functions in combination with the standard input functions with all combinations of addresses, identification tags, and data.

CONCLUSIONS

This cache organization has been designed, prototyped, and verified. The design allows for the disabling of the MSHR registers. Using this capability, the direct effect of the number of MSHR registers on the execution times of a number of applications was noted. The reduced execution times of these applications directly demonstrated the effectiveness of this enhancement. It is beyond the scope of this paper to analyze quantitatively the average lockout delay per request with respect to the number of enabled MSHR registers for different cache input rates and hit rates (cache buffer sizes). This analysis will be reported in future work. The cost of the 4 MSHR additions to the design was about 10% of the total cache cost.

ACKNOWLEDGMENT

The author thanks Control Data Canada for the opportunity to develop the new cache organization presented in this paper.

REFERENCES

1. C. J. Conti. Concepts of buffer storage, IEEE Computer Group News, 2 (March 1969).
2. R. M. Meade. How a cache memory enhances a computer's performance, Electronics (Jan. 1972).
3. K. R. Kaplan and R. O. Winder. Cache-based computer systems, IEEE Computer (March 1973).
4. J. Bell, D. Casasent, and C. G. Bell. An investigation of alternative cache organizations, IEEE Transactions on Computers, C-23 (April 1974).
5. J. H. Kroeger and R. M. Meade (of Cogar Corporation, Wappingers Falls, NY). Cache buffer memory specification.
6. A. V. Pohm, O. P. Agrawal, and R. N. Monroe. The cost and performance tradeoffs of buffered memories, Proceedings of the IEEE, 63 (Aug. 1973).
7. A. J. Smith. Sequential program prefetching in memory hierarchies, IEEE Computer (Dec. 1978).
8. G. H. Toole. Instruction lookahead and execution traffic considerations for the cache design (Development division internal paper, Control Data Canada, 1975).

[Figure 1. Cache Organization. Block diagram showing the CPU units (CPU instruction unit, CPU execution unit), tag arrays and control, cache buffer, miss info holding registers, miss comparator and status collection, input stack, memory requestor, memory receiver, and central memory.]

[Figure 2. Qualitative Curve for Lockout Delay. Horizontal axis: number of miss info holding registers, marked at 2, 4, and 8.]

Figure 3. Previous Miss Hit Operations

   INPUT      PARTIALLY  TOTALLY  IN INPUT  ALREADY    ACTION
   FUNCTION   WRITTEN    WRITTEN  STACK     ASKED FOR
   READ       NO         NO       NO        NO         SET SEND-TO-CPU BIT, SAVE IDENT
   READ       NO         NO       NO        YES        READ FROM CENTRAL MEMORY (BYPASS)
   READ       NO         NO       YES       X          READ FROM STACK
   READ       YES        NO       X         X          READ FROM CENTRAL MEMORY (BYPASS)
   READ       YES        YES      X         X          READ FROM CACHE BUFFER
   PREFETCH   X          X        X         X          NO ACTION
   WRITE      X          X        X         X          WRITE BYTES TO CACHE BUFFER, SET APPROPRIATE PARTIAL WRITE BITS

   where X = don't care
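
The MSHR fields, the eight actions performed on a miss, the n-way comparator lookup, and the previous miss hit decisions of Figure 3 can be illustrated with a short Python sketch. This is only a software model under stated assumptions, not the hardware design described in the paper: the block size (4 words), the word size (4 bytes), all class, function, and parameter names, and the wording of the returned action strings are hypothetical choices made for illustration. The reset of the word counter on assignment is likewise an assumption, since the paper does not list it among the eight actions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WORDS_PER_BLOCK = 4  # assumed block size in words (not given in the paper)
BYTES_PER_WORD = 4   # assumed word size in bytes (not given in the paper)
ALL_WRITTEN = (1 << BYTES_PER_WORD) - 1  # partial write code that is all 1's


def per_word(value):
    """Dataclass field holding one entry per word of the block."""
    return field(default_factory=lambda: [value] * WORDS_PER_BLOCK)


@dataclass
class MSHR:
    """Software model of one MSHR, following the nine listed fields."""
    cache_buffer_address: int = 0
    input_request_address: int = 0  # assumed to be a block-aligned address
    ident_tags: List[Optional[int]] = per_word(None)
    send_to_cpu: List[bool] = per_word(False)
    in_input_stack: List[bool] = per_word(False)
    partial_write: List[int] = per_word(0)  # one bit per byte written
    words_processed: int = 0
    valid: bool = False
    obsolete: bool = False


def assign_on_miss(mshrs, free, buffer_addr, block_addr, word, ident,
                   prefetch=False):
    """Apply actions 1-8 when a miss is assigned to the MSHR `free`."""
    for m in mshrs:
        if m is free or not m.valid:
            continue
        if m.cache_buffer_address == buffer_addr:
            # Action 8: purge by marking every word totally written.
            m.partial_write = [ALL_WRITTEN] * WORDS_PER_BLOCK
        if m.input_request_address == block_addr:
            # Miss together with a previous miss hit: older entry is obsolete.
            m.obsolete = True
    free.valid = True                                  # action 1
    free.obsolete = False                              # action 2
    free.cache_buffer_address = buffer_addr            # action 3
    free.input_request_address = block_addr            # action 4
    # Actions 5 and 6; a prefetch sets no indicator and saves no tag.
    free.send_to_cpu = [(not prefetch) and i == word
                        for i in range(WORDS_PER_BLOCK)]
    free.ident_tags = [None] * WORDS_PER_BLOCK
    if not prefetch:
        free.ident_tags[word] = ident
    free.partial_write = [0] * WORDS_PER_BLOCK         # action 7
    free.words_processed = 0  # assumption: counter restarts on assignment


def find_previous_miss_hit(mshrs, block_addr):
    """Model of the n-way comparator using fields 1, 5, and 6."""
    hits = [m for m in mshrs
            if m.valid and not m.obsolete
            and m.input_request_address == block_addr]
    if len(hits) > 1:
        raise RuntimeError("multiple hit with MSHR comparator")
    return hits[0] if hits else None


def previous_miss_hit_action(function, m, word):
    """Return the Figure 3 action for a previous miss hit on `word`."""
    if function == "prefetch":
        return "no action"
    if function == "write":
        return "write bytes to cache buffer, set partial write bits"
    code = m.partial_write[word]
    if code == ALL_WRITTEN:
        return "read from cache buffer"
    if code != 0:
        return "read from central memory (bypass)"
    if m.in_input_stack[word]:
        return "read from input stack"
    if m.send_to_cpu[word]:
        return "read from central memory (bypass)"
    return "set send-to-CPU bit, save ident tag"
```

A caller would typically create four `MSHR()` instances (the number the paper reports as appearing optimal), call `assign_on_miss` on each miss, and consult `find_previous_miss_hit` followed by `previous_miss_hit_action` on later requests to the same block. The read branches are tested in the order totally written, partially written, in input stack, already asked for, which reproduces the don't-care entries of Figure 3.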