LOCKUP-FREE INSTRUCTION FETCH/PREFETCH CACHE ORGANIZATION

DAVID KROFT
Control Data Canada, Ltd.
Canadian Development Division
Mississauga, Ontario, Canada

0149-7111/81/0000/0081$00.75 © 1981 IEEE

ABSTRACT

In the past decade, there has been much literature describing various cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is now resident in the cache). Little, if any, has been presented to exploit: (1) the inherent dual input nature of the cache and (2) the many-datum reference type central processor instructions.

No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed requested datum is received from central memory and, possibly, for cache update. For the two cases above, the cache references following a miss do not require the information of the datum not resident in the cache, yet they are nonetheless penalized in this fashion.

In this paper, a cache organization is presented that essentially eliminates this penalty. This cache organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing simple instruction set machine has verified the advantage of this feature; future, more extensive and sophisticated instruction set machines may take greater advantage of it. Prior to prototyping, simulations verified the advantage.

INTRODUCTION

A cache buffer [1, 2] is a small, fast memory holding most recently accessed data and its surrounding neighbors. Because the access time of this buffer is usually an order of magnitude shorter than that of main or central memory, and the standard software practice is to localize data, the effective memory access time is considerably reduced when a cache buffer is included. The cost increment for this, when compared with the cost of central memory, along with the above access time advantage, implies cost effectiveness.

Now, accepting the usefulness of a cache buffer, one looks into ways of increasing its effectiveness; that is, further decreasing the effective memory access time. Considerable research has been done to fine tune a cache design for various requirements. This fine tuning consisted of selecting optimal total cache buffer size, block size (the number of bytes to be requested on a cache miss), space allocation, and replacement algorithms to maximize hit rate. Another method presented to increase the hit rate was selective prefetching. All these methods assume the cache can handle only one request at a time; on a miss, the cache stays busy servicing the request until the data is received from memory and, possibly, for cache buffer update.

In this paper, a cache organization is presented that increases the effectiveness of a normal cache inclusion by using the inherent dual input nature of an overall cache and the many data reference instructions. In other words, it would be extremely useful to pipeline the requests into the cache at the cache hit throughput rate regardless of any misses. If this could be accomplished, then all fetch and/or prefetch of instructions could be totally transparent to the execution unit. Also, for instructions that require a number of data references, the requests could be almost entirely overlapped. Obviously, requests could not be streamed into the cache at the hit throughput rate indefinitely. There is a limit. This organization's limit is imposed by the number of misses that have not been completely processed that the cache will keep track of simultaneously without lockup.

ORGANIZATION

In addition to the standard blocks, this cache organization requires the following:

1. One unresolved miss information/status holding register (MSHR) for each miss that will be handled concurrently.
2. One n-way comparator, in which n is the number of MSHR registers, for registering hits on data in transit from memory.
3. An input stack to hold the total number of received data words possibly outstanding. The size of this stack, consequently, is equal to the block size in words times the number of MSHR registers.
4. MSHR status update and collecting networks.
5. The appropriate control unit enhancement to accommodate 1 through 4.

Figure 1 is a simplified block diagram of the cache organization. (A set-associative operation is assumed.) Included are the required blocks for a set-associative cache (tag arrays and control, cache buffer), the central memory interface blocks (memory requestor, memory receiver), and the cache enhancement blocks (miss info holding registers, miss comparator and status collection, input stack). The miss info holding registers hold all necessary information to (1) handle the central memory received data properly and (2) inform the main cache control, through the miss comparator and status collector, of all hit and other status of data in transit from memory. The input stack is necessary to leave the main cache buffer available for overlapped reads and writes. Note that this organization allows for data just received from memory or in the input stack to be sent immediately to the requesting CPU units.

Of course, the number of MSHR registers is important. As with set size (blocks per set), the incremental value decreases rapidly with the number of registers. This is good, because the cost increases significantly with the number of registers. Figure 2 presents a qualitative curve. The average delay time is caused by lockout on outstanding misses. This delay time, of course, is also dependent on cache input request and hit rates. In the degenerate case, 1 MSHR register of reduced size is required; 2 MSHR registers allow for overlap while one miss is outstanding, but still would lock up the cache input on multiple misses outstanding. Owing to cost considerations and the incremental effectiveness gained on increasing the number of MSHR registers, 4 registers appear to be optimal.

The necessary information contained within one of these MSHR registers includes the following:

First, the cache buffer address, along with the input request address, is required. The cache buffer address is kept to know where to place the returning memory data; the input request address is saved to determine if, on subsequent requests, the data requested is on its way from central memory.

Second, input request identification tags, along with the send-to-CPU status, are stored. This information permits the cache to return to CPU requesting units only the data requested and return it with its identification tag.

Third, in-input-stack indicators are used to allow for reading data directly from the input stack.

Fourth, a code (for example, one bit per byte for partial write) is held for each word to indicate what bytes of the word have been written to the cache buffer. This code controls the cache buffer write update and allows returning memory data to be discarded for buffer areas that have been totally written since they were requested. The cache, thus, has the capability of processing partial write input requests on the fly without purging. (Of course, this partial write code may not be incorporated if the cache block is purged on a partial write request to a word in a block in transit from memory.)

Last, some control information is needed: whether the register contains valid information only for returning requested data but not for cache buffer update, and the number of words of the block that have been received and written, if required, into the cache buffer.

Therefore, each MSHR register contains:

1. Cache buffer address
2. Input request address
3. Input identification tags (one per word)
4. Send-to-CPU indicators (one per word)
5. In-input-stack indicators (one per word)
6. Partial write codes (one per word)
7. Number of words of block processed
8. Valid information indicator
9. Obsolete indicator (information not valid for cache update or MSHR hit on data in transit)

OPERATION

The operation can be split into two basic parts: memory receiver/input stack operations and tag array control operations.

For memory receiver/input stack operations, the fields of MSHR interrogated are the following:

1. Send-to-CPU indicator
2. Input identification tags
3. Cache buffer address
4. Partial write codes
5. Obsolete indicator
6. Valid indicator

When a word is received from memory, it is sent to the CPU requesting unit if the send-to-CPU indicator is set; the appropriate identification tag accompanies the data. This word is also written into the input stack if the word's space has not been previously totally written in the cache buffer and the MSHR is not obsolete (invalid for cache update). The words of data are removed from this input stack on a first-in, first-out basis and are written into the cache buffer using fields 3 and 4. Of course, the MSHR must hold valid information when interrogated, or an error signal will be generated.

A slight diversion is necessary at this point to explain cache data tagging. On a miss, the cache requests a block of words. Along with each word, a cache tag is sent. This tag points to the particular assigned MSHR and indicates the word of the block. Note that the cache saves in the MSHR the requesting unit's identification tag. This tagging closes the remaining open link for the handling of data returned from memory and removes all restrictions on memory on the order of responses.

If a particular processor/memory interface allows for a data width of a block of words for cache to central memory requests, the cache data tagging may be simplified by merely pointing to the particular assigned MSHR. If, however, all other data paths are still one word wide, the main operations would be essentially unchanged. Consequently, this extended interface would not significantly reduce the control complexity or the average lockout time delay per request.

The fields of the MSHR updated during memory receiver/input stack operations are the following:

1. In-input-stack indicators
2. Partial write codes
3. Number of words of block processed
4. Valid information indicator (being used indicator)

The in-input-stack indicators are set when the data word is written into the input stack and cleared when data is removed from the input stack and written into the cache buffer. The partial write code is set to indicate totally written when the data word from central memory is written into the cache buffer. In addition, whenever a data word is disposed of because of being totally written or having an obsolete MSHR, or is written into the cache buffer, the number-of-words-processed counter is incremented. On number-of-words-processed counter overflow (all words for a block have been received), the valid or used MSHR indicator is cleared.

For tag array control operations, the following fields of MSHRs are interrogated:

1. Input request addresses
2. Send-to-CPU indicators
3. In-input-stack indicators
4. Partial write codes
5. Valid indicator
6. Obsolete indicator

Fields 1, 5, and 6 are used along with the current input request address and the n-way MSHR comparator to determine if there is a hit on previously missed data still being handled (previous miss hit). Fields 2, 3, and 4 produce one of the following states for the previous miss hit:

   Partially written (Partial write code has at least one bit set.)
   Totally written (Partial write code is all 1's.)
   In-input-stack
   Already-asked-for (Send-to-CPU indicator is already set.)

Figure 3 indicates the actions followed by the tag array control under all the above combinations for a previous miss hit. On a miss, an MSHR is assigned, and the following is performed:

1. Valid indicator set
2. Obsolete indicator cleared
3. Cache buffer address saved in assigned MSHR
4. Input request address saved in assigned MSHR
5. Appropriate send-to-CPU indicator set and others cleared
6. Input identification tag saved in appropriate position
7. All partial write codes associated with assigned MSHR cleared
8. All MSHRs pointing to same cache buffer address purged (Set partial write code to all 1's)

Note that actions 5 and 6 will vary if the cache function was a prefetch (all send-to-CPU indicators are cleared, and no tag is saved). Action 8 prevents data from a previous allocation of a cache buffer block from overwriting the present allocation's data. On a miss and previous miss hit (the cache buffer block was reallocated for the same input address before all data was received), the MSHR is set obsolete to prevent possible subsequent multiple hits in the MSHR comparator.

SIMULTANEITY

A previous miss hit on a data word just being received is definitely possible. Depending on the control operation, this word may have its corresponding send-to-CPU indicator's output forced to the send condition or may be read out of the input stack on the next minor cycle.

DIAGNOSABILITY

To diagnose this cache enhancement more readily, cache input functions should be added to clear and set the valid indicators of the MSHR registers. This would allow the following error conditions to be forced:

   Cache tag points to nonvalid MSHR register
   Multiple hit with MSHR comparator
   Previous miss hit status - totally written and not partially written

All other fields of the MSHR registers may be verified by using these special cache input functions in combination with the standard input functions with all combinations of addresses, identification tags, and data.

CONCLUSIONS

This cache organization has been designed, prototyped, and verified. The design allows for the disabling of the MSHR registers. Using this capability, the direct effect of the number of MSHR registers on the execution times of a number of applications was noted. The reduced execution times of these applications directly demonstrated the effectiveness of this enhancement. (It is beyond the scope of this paper to analyze quantitatively the average lockout delay per request with respect to the number of enabled MSHR registers for different cache input rates, hit rates, and cache buffer sizes. This analysis will be reported in future work.) The cost of the 4 MSHR additions to the design was about 10% of the total cache cost.

ACKNOWLEDGMENT

The author thanks Control Data Canada for the opportunity to develop the new cache organization presented in this paper.

REFERENCES

1. C. J. Conti. Concepts of buffer storage, IEEE Computer Group News, 2 (March 1969).
2. R. M. Meade. How a cache memory enhances a computer's performance, Electronics (Jan. 1972).
3. K. R. Kaplan and R. O. Winder. Cache-based computer systems, IEEE Computer (March
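
The MSHR fields and the miss and memory-receive steps listed in the ORGANIZATION and OPERATION sections can be sketched in software. The following Python model is only an illustration under stated assumptions, not the paper's hardware design: the names (`MSHR`, `allocate_on_miss`, `previous_miss_hit`, `receive_word`, `drain_one`), the 4-word block size, and the dictionary used as the cache buffer are all hypothetical. The per-byte partial write code is simplified to one "totally written" flag per word, and the Figure 3 actions and the setting of the obsolete indicator are left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional

WORDS_PER_BLOCK = 4  # assumed block size in words (not given in the paper)
NUM_MSHRS = 4        # the paper reports that 4 registers appear optimal


def _flags() -> List[bool]:
    return [False] * WORDS_PER_BLOCK


@dataclass
class MSHR:
    valid: bool = False      # register is in use
    obsolete: bool = False   # not valid for cache update or MSHR hits
    buf_addr: int = 0        # cache buffer address of the block
    req_addr: int = 0        # input request (block) address
    id_tags: List[Optional[int]] = field(
        default_factory=lambda: [None] * WORDS_PER_BLOCK)
    send_to_cpu: List[bool] = field(default_factory=_flags)
    in_stack: List[bool] = field(default_factory=_flags)
    written: List[bool] = field(default_factory=_flags)  # simplified write code
    processed: int = 0       # words of the block processed so far


def allocate_on_miss(mshrs, req_addr, buf_addr, word, id_tag, prefetch=False):
    """Assign a free MSHR on a miss; return its index, or None on lockup."""
    free = next((i for i, m in enumerate(mshrs) if not m.valid), None)
    if free is None:
        return None  # every MSHR is busy, so the cache input must lock up
    for m in mshrs:  # action 8: purge older MSHRs aimed at the same block
        if m.valid and m.buf_addr == buf_addr:
            m.written = [True] * WORDS_PER_BLOCK
    new = MSHR(valid=True, buf_addr=buf_addr, req_addr=req_addr)  # 1-4, 7
    if not prefetch:  # actions 5 and 6 do not apply to a prefetch
        new.send_to_cpu[word] = True
        new.id_tags[word] = id_tag
    mshrs[free] = new
    return free


def previous_miss_hit(mshrs, req_addr):
    """n-way compare of the request address against all live MSHRs."""
    for i, m in enumerate(mshrs):
        if m.valid and not m.obsolete and m.req_addr == req_addr:
            return i
    return None


def _count_word(m):
    m.processed += 1
    if m.processed == WORDS_PER_BLOCK:
        m.valid = False  # whole block handled, release the register


def receive_word(mshrs, stack, idx, word, data):
    """Handle one word from memory; return (id_tag, data) if sent to the CPU."""
    m = mshrs[idx]
    if not m.valid:
        raise RuntimeError("cache tag points to a non-valid MSHR")
    to_cpu = (m.id_tags[word], data) if m.send_to_cpu[word] else None
    if m.written[word] or m.obsolete:
        _count_word(m)  # word is discarded but still counted
    else:
        stack.append((idx, word, data))
        m.in_stack[word] = True
    return to_cpu


def drain_one(mshrs, stack, cache_buffer):
    """Move the oldest input-stack word into the cache buffer (FIFO)."""
    idx, word, data = stack.popleft()
    m = mshrs[idx]
    m.in_stack[word] = False
    if not m.written[word]:  # skip words overwritten or purged meanwhile
        cache_buffer[(m.buf_addr, word)] = data
        m.written[word] = True
    _count_word(m)
```

Because each returning word carries the MSHR index and its position in the block, `receive_word` accepts words in any order, which mirrors the paper's point that tagging removes ordering restrictions on memory responses. A `None` result from `allocate_on_miss` corresponds to the lockout condition once all registers are occupied.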
