LOCKUP-FREE INSTRUCTION FETCH/PREFETCH CACHE ORGANIZATION

DAVID KROFT
Control Data Canada, Ltd.
Canadian Development Division
Mississauga, Ontario, Canada

0149-7111/81/0000/0081$00.75 1981 IEEE

ABSTRACT

In the past decade, there has been much literature describing various cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is now resident in the cache). Little, if any, has been presented to exploit: (1) the inherent dual input nature of the cache and (2) the many-datum reference type central processor instructions.

No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed requested datum is received from central memory and, possibly, for cache update. For the two cases above, the cache references following a miss do not require the information of the datum not resident in the cache, and are therefore needlessly penalized in this fashion.

In this paper, a cache organization is presented that essentially eliminates this penalty. This cache organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing simple instruction set machine has verified the advantage of this feature; future, more extensive and sophisticated instruction set machines may obviously take more advantage. Prior to prototyping, simulations verified the advantage.

INTRODUCTION

A cache buffer [1,2] is a small, fast memory holding most recently accessed data and its surrounding neighbors. Because the access time of this buffer is usually an order of magnitude shorter than that of main or central memory, and the standard software practice is to localize data, the effective memory access time is considerably reduced when a cache buffer is included. The cost increment for this, when compared with the cost of central memory, along with the above access time advantage implies cost effectiveness.

Now, accepting the usefulness of a cache buffer, one looks into ways of increasing its effectiveness; that is, further decreasing the effective memory access time. Considerable research has been done to fine tune a cache design for various requirements.[?] This fine tuning consisted of selecting optimal total cache buffer size, block size (the number of bytes to be requested on a cache miss), space allocation, and replacement algorithms to maximize hit rate. Another method presented to increase the hit rate was selective prefetching.[?] All these methods assume the cache can handle only one request at a time; on a miss, the cache stays busy servicing the request until the data is received from memory and, possibly, for cache buffer update.

In this paper, a cache organization is presented that increases the effectiveness of a normal cache inclusion by using the inherent dual input nature of an overall cache and the many data reference instructions. In other words, it would be extremely useful to pipeline the requests into the cache at the cache hit throughput rate regardless of any misses. If this could be accomplished, then all fetch and/or prefetch of instructions could be totally transparent to the execution unit. Also, for instructions that require a number of data references, the requests could be almost entirely overlapped. Obviously, requests could not be streamed into the cache at the hit throughput rate indefinitely. There is a limit. This organization's limit is imposed by the number of misses that have not been completely processed that the cache will keep track of simultaneously without lockup.

ORGANIZATION

In addition to the standard blocks, this cache organization requires the following:

1. One unresolved miss information/status holding register (MSHR) for each miss that will be handled concurrently.
2. One n-way comparator, in which n is the number of MSHR registers, for registering hits on data in transit from memory.
3. An input stack to hold the total number of received data words possibly outstanding. The size of this stack, consequently, is equal to the block size in words times the number of MSHR registers.
4. MSHR status update and collecting networks.
5. The appropriate control unit enhancement to accommodate 1 through 4.

Figure 1 is a simplified block diagram of the cache organization. (A set-associative operation is assumed.) Included are the required blocks for a set-associative cache (tag arrays and control, cache buffer), the central memory interface blocks (memory requestor, memory receiver), and the cache enhancement blocks (miss info holding registers, miss comparator and status collection, input stack). The miss info holding registers hold all necessary information to (1) handle the central memory received data properly and (2) inform the main cache control, through the miss comparator and status collector, of all hit and other status of data in transit from memory. The input stack is necessary to leave the main cache buffer available for overlapped reads and writes. Note that this organization allows for data just received from memory or in the input stack to be sent immediately to the requesting CPU units.

Of course, the number of MSHR registers is important. As with set size (blocks per set), the incremental value decreases rapidly with the number of registers. This is good, because the cost increases significantly with the number of registers. Figure 2 presents a qualitative curve. The average delay time is caused by lockout on outstanding misses. This delay time, of course, is also dependent on cache input request and hit rates. In the degenerate case, 1 MSHR register of reduced size is required; 2 MSHR registers allow for overlap while one miss is outstanding, but still would lock up the cache input on multiple misses outstanding. Owing to cost considerations and the incremental effectiveness gained on increasing the number of MSHR registers, 4 registers appear to be optimal.

The necessary information contained within one of these MSHR registers includes the following:

First, the cache buffer address, along with the input request address, is required. The cache buffer address is kept to know where to place the returning memory data; the input request address is saved to determine if, on subsequent requests, the data requested is on its way from central memory.

Second, input request identification tags, along with the send-to-CPU status, are stored. This information permits the cache to return to CPU requesting units only the data requested and return it with its identification tag.

Third, in-input-stack indicators are used to allow for reading data directly from the input stack.

Fourth, a code (for example, one bit per byte for partial write) is held for each word to indicate what bytes of the word have been written to the cache buffer. This code controls the cache buffer write update and allows disposing of data for buffer areas that have been totally written after being requested. The cache, thus, has the capability of processing partial write input requests on the fly without purging. (Of course, this partial write code may not be incorporated if the cache block is purged on a partial write request to a word in a block in transit from memory.)

Last, some control information is needed: an indication that the register contains valid information only for returning requested data, but not for cache buffer update, and the number of words of the block that have been received and written, if required, into the cache buffer.

Therefore, each MSHR register contains:

1. Cache buffer address
2. Input request address
3. Input identification tags (one per word)
4. Send-to-CPU indicators (one per word)
5. In-input-stack indicators (one per word)
6. Partial write codes (one per word)
7. Number of words of block processed
8. Valid information indicator
9. Obsolete indicator (information not valid for cache update or MSHR hit on data in transit)

OPERATION

The operation can be split into two basic parts: memory receiver/input stack operations and tag array control operations. For memory receiver/input stack operations, the fields of MSHR interrogated are the following:

1. Send-to-CPU indicator
2. Input identification tags
3. Cache buffer address
4. Partial write codes
5. Obsolete indicator
6. Valid indicator

When a word is received from memory, it is sent to the CPU requesting unit if the send-to-CPU indicator is set; the appropriate identification tag accompanies the data. This word is also written into the input stack if the word's space has not been previously totally written in the cache buffer and the MSHR is not obsolete (invalid for cache update). The words of data are removed from this input stack on a first-in, first-out basis and are written into the cache buffer using fields 3 and 4. Of course, the MSHR must hold valid information when interrogated, or an error signal will be generated.

A slight diversion is necessary at this point to explain cache data tagging. On a miss, the cache requests a block of words. Along with each word, a cache tag is sent. This tag points to the particular assigned MSHR and indicates the word of the block. Note that the cache saves in the MSHR the requesting unit's identification tag. This tagging closes the remaining open link for the handling of data returned from memory and removes all restrictions on memory on the order of responses.

If a particular processor/memory interface allows for a data width of a block of words for cache to central memory requests, the cache data tagging may be simplified by merely pointing to the particular assigned MSHR. If, however, all other data paths are still one word wide, the main operations would be essentially unchanged. Consequently, this extended interface would not significantly reduce the control complexity or the average lockout time delay per request.

The fields of the MSHR updated during memory receiver/input stack operations are the following:

1. In-input-stack indicators
2. Partial write codes
3. Number of words of block processed
4. Valid information indicator (being used indicator)

The in-input-stack indicators are set when the data word is written into the input stack and cleared when data is removed from the input stack and written into the cache buffer. The partial write code is set to indicate totally written when the data word from central memory is written into the cache buffer. In addition, whenever a data word is disposed of because of being totally written or having an obsolete MSHR, or is written into the cache buffer, the number-of-words-processed counter is incremented. On number-of-words-processed counter overflow (all words for a block have been received), the valid or used MSHR indicator is cleared.

For tag array control operations, the following fields of MSHRs are interrogated:

1. Input request addresses
2. Send-to-CPU indicators
3. In-input-stack indicators
4. Partial write codes
5. Valid indicator
6. Obsolete indicator

Fields 1, 5, and 6 are used along with the current input request address and the n-way MSHR comparator to determine if there is a hit on previously missed data still being handled (previous miss hit). Fields 2, 3, and 4 produce one of the following states for the previous miss hit:

   Partially written (Partial write code has at least one bit set.)
   Totally written (Partial write code is all 1's.)
   In-input-stack
   Already-asked-for (Send-to-CPU indicator is already set.)

Figure 3 indicates the actions followed by the tag array control under all the above combinations for a previous miss hit. On a miss, an MSHR is assigned, and the following is performed:

1. Valid indicator set
2. Obsolete indicator cleared
3. Cache buffer address saved in assigned MSHR
4. Input request address saved in assigned MSHR
5. Appropriate send-to-CPU indicator set and others cleared
6. Input identification tag saved in appropriate position
7. All partial write codes associated with assigned MSHR cleared
8. All MSHRs pointing to same cache buffer address purged (Set partial write code to all 1's)

Note that actions 5 and 6 will vary if the cache function was a prefetch (all send-to-CPU indicators are cleared, and no tag is saved). Action 8 prevents data from a previous allocation of a cache buffer block from overwriting the present allocation's data. On a miss and previous miss hit (the cache buffer block was reallocated for the same input address before all data was received), the MSHR is set obsolete to prevent possible subsequent multiple hits in the MSHR comparator.

SIMULTANEITY

A previous miss hit on a data word just being received is definitely possible. Depending on the control operation, this word may have its corresponding send-to-CPU indicator's output forced to the send condition or may be read out of the input stack on the next minor cycle.

DIAGNOSABILITY

To diagnose this cache enhancement more readily, cache input functions should be added to clear and set the valid indicators of the MSHR registers. This would allow the following error conditions to be forced:

   Cache tag points to nonvalid MSHR register
   Multiple hit with MSHR comparator
   Previous miss hit status - totally written and not partially written

All other fields of the MSHR registers may be verified by using these special cache input functions in combination with the standard input functions with all combinations of addresses, identification tags, and data.

CONCLUSIONS

This cache organization has been designed, prototyped, and verified. The design allows for the disabling of the MSHR registers. Using this capability, the direct effect of the number of MSHR registers on the execution times of a number of applications was noted. The reduced execution times of these applications directly demonstrated the effectiveness of this enhancement. (It is beyond the scope of this paper to analyze quantitatively the average lockout delay/request with respect to the number of enabled MSHR registers for different cache input rates, hit rates, and cache buffer sizes. This analysis will be reported in future work.) The cost of the 4 MSHR additions to the design was about 10% of the total cache cost.

ACKNOWLEDGMENT

The author thanks Control Data Canada for the opportunity to develop the new cache organization presented in this paper.

REFERENCES

1. C. J. Conti. Concepts of buffer storage, IEEE Computer Group News, 2 (March 1969).
2. R. M. Meade. How a cache memory enhances a computer's performance, Electronics (Jan. 1972).
3. K. R. Kaplan and R. O. Winder. Cache-based computer systems, IEEE Computer (March
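
The MSHR fields listed under ORGANIZATION and the miss, memory-receiver, and input-stack steps described under OPERATION can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the paper's hardware design: the names `MSHR`, `MSHRFile`, `assign_on_miss`, `receive_word`, and `drain_one`, the 4-word block, the 4-byte word, and the dictionary standing in for the cache buffer are all hypothetical. The previous-miss-hit actions of Figure 3 and the obsolete-setting case are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional

WORDS_PER_BLOCK = 4                       # assumed block size in words
BYTES_PER_WORD = 4                        # assumed word width in bytes
ALL_WRITTEN = (1 << BYTES_PER_WORD) - 1   # partial write code "all 1's"


def per_word(value):
    """Dataclass field holding one entry per word of the block."""
    return field(default_factory=lambda: [value] * WORDS_PER_BLOCK)


@dataclass
class MSHR:
    cache_buffer_address: int = 0                   # field 1
    input_request_address: int = 0                  # field 2
    id_tags: List[Optional[int]] = per_word(None)   # field 3
    send_to_cpu: List[bool] = per_word(False)       # field 4
    in_input_stack: List[bool] = per_word(False)    # field 5
    partial_write: List[int] = per_word(0)          # field 6, one bit per byte
    words_processed: int = 0                        # field 7
    valid: bool = False                             # field 8
    obsolete: bool = False                          # field 9


class MSHRFile:
    def __init__(self, n=4):
        self.regs = [MSHR() for _ in range(n)]

    def previous_miss_hit(self, block_address):
        """n-way comparator: index of a valid, non-obsolete matching MSHR."""
        for i, r in enumerate(self.regs):
            if r.valid and not r.obsolete and r.input_request_address == block_address:
                return i
        return None

    def assign_on_miss(self, block_address, buffer_address, word, id_tag, prefetch=False):
        """Miss actions 1-8; returns the MSHR index, or None if all are busy (lockup)."""
        free = next((i for i, r in enumerate(self.regs) if not r.valid), None)
        if free is None:
            return None
        for r in self.regs:                          # action 8: purge older allocations
            if r.valid and r.cache_buffer_address == buffer_address:
                r.partial_write = [ALL_WRITTEN] * WORDS_PER_BLOCK
        m = MSHR(cache_buffer_address=buffer_address,
                 input_request_address=block_address, valid=True)
        if not prefetch:                             # actions 5 and 6
            m.send_to_cpu[word] = True
            m.id_tags[word] = id_tag
        self.regs[free] = m
        return free

    def _processed(self, m):
        m.words_processed += 1
        if m.words_processed == WORDS_PER_BLOCK:     # whole block handled
            m.valid = False

    def receive_word(self, index, word, data, input_stack):
        """Memory receiver step; returns (id_tag, data) to forward to the CPU, or None."""
        m = self.regs[index]
        if not m.valid:
            raise RuntimeError("cache tag points to nonvalid MSHR")
        forward = (m.id_tags[word], data) if m.send_to_cpu[word] else None
        if m.obsolete or m.partial_write[word] == ALL_WRITTEN:
            self._processed(m)                       # word is disposed of
        else:
            input_stack.append((index, word, data))
            m.in_input_stack[word] = True
        return forward

    def drain_one(self, input_stack, cache_buffer):
        """Move the oldest input-stack word into the cache buffer (FIFO)."""
        index, word, data = input_stack.popleft()
        m = self.regs[index]
        line = cache_buffer.setdefault((m.cache_buffer_address, word),
                                       bytearray(BYTES_PER_WORD))
        for b in range(BYTES_PER_WORD):
            if not (m.partial_write[word] >> b) & 1:  # keep bytes already written
                line[b] = data[b]
        m.in_input_stack[word] = False
        m.partial_write[word] = ALL_WRITTEN
        self._processed(m)
```

In this sketch, `assign_on_miss` returning `None` corresponds to the lockup condition: every MSHR is tracking an outstanding miss, so further misses must wait. An `MSHRFile(n=1)` would approximate the degenerate case in which only one miss can be outstanding at a time.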