




LOCKUP-FREE INSTRUCTION FETCH/PREFETCH CACHE ORGANIZATION

DAVID KROFT
Control Data Canada, Ltd.
Canadian Development Division
Mississauga, Ontario, Canada

ABSTRACT

In the past decade, much literature has described cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is resident in the cache). Little, if any, work has been presented that exploits (1) the inherent dual input nature of the cache and (2) central processor instructions that reference many data items. No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed datum is received from central memory and, possibly, until the cache is updated. In the two cases above, the cache references following a miss do not need the datum that is not resident in the cache, and are therefore penalized unnecessarily. In this paper, a cache organization is presented that essentially eliminates this penalty. This organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing machine with a simple instruction set has verified the advantage of this feature; future machines with more extensive and sophisticated instruction sets should be able to take greater advantage of it. Prior to prototyping, simulations verified the advantage.

0149-7111/81/0000/0081$00.75 © 1981 IEEE

INTRODUCTION

A cache buffer [1, 2] is a small, fast memory holding the most recently accessed data and their surrounding neighbors. Because the access time of this buffer is usually an order of magnitude shorter than that of main or central memory, and standard software practice is to localize data, the effective memory access time is considerably reduced when a cache buffer is included. The incremental cost of the cache, compared with the cost of central memory, together with this access time advantage implies cost effectiveness.

Now, accepting the usefulness of a cache buffer, one looks for ways of increasing its effectiveness, that is, of further decreasing the effective memory access time. Considerable research has been done to fine tune a cache design for various requirements [...]. This fine tuning consisted of selecting the optimal total cache buffer size, block size (the number of bytes to be requested on a cache miss), space allocation, and replacement algorithms to maximize hit rate. Another method presented to increase the hit rate was selective prefetching [...]. All these methods assume the cache can handle only one request at a time; on a miss, the cache stays busy servicing the request until the data is received from memory and, possibly, until the cache buffer is updated.

In this paper, a cache organization is presented that increases the benefit of including a normal cache by exploiting the inherent dual input nature of an overall cache and the instructions that make many data references. The goal is to pipeline requests into the cache at the cache hit throughput rate regardless of any misses. If this could be accomplished, all instruction fetch and/or prefetch could be totally transparent to the execution unit. Also, for instructions that require a number of data references, the requests could be almost entirely overlapped. Obviously, requests cannot be streamed into the cache at the hit throughput rate indefinitely; there is a limit. This organization's limit is the number of misses, not yet completely processed, that the cache can keep track of simultaneously without locking up.

ORGANIZATION

In addition to the standard blocks, this cache organization requires the following:
1. One unresolved miss information/status holding register (MSHR) for each miss that will be handled concurrently.
2. One n-way comparator, in which n is the number of MSHR registers, for registering hits on data in transit from memory.
3. An input stack to hold the total number of received data words possibly outstanding. The size of this stack, consequently, is equal to the block size in words times the number of MSHR registers.
4. MSHR status update and collecting networks.
5. The appropriate control unit enhancement to accommodate 1 through 4.

Figure 1 is a simplified block diagram of the cache organization. (A set-associative operation is assumed.) Included are the required blocks for a set-associative cache (tag arrays and control, cache buffer), the central memory interface blocks (memory requestor, memory receiver), and the cache enhancement blocks (miss info holding registers, miss comparator and status collection, input stack). The miss info holding registers hold all the information necessary to (1) handle the data received from central memory properly and (2) inform the main cache control, through the miss comparator and status collector, of all hits on, and other status of, data in transit from memory. The input stack is necessary to leave the main cache buffer available for overlapped reads and writes. Note that this organization allows data just received from memory, or held in the input stack, to be sent immediately to the requesting CPU units.

Of course, the number of MSHR registers is important. As with set size (blocks per set), the incremental value decreases rapidly with the number of registers. This is fortunate, because the cost increases significantly with the number of registers. Figure 2 presents a qualitative curve. The average delay time is caused by lockout on outstanding misses; this delay time, of course, also depends on the cache input request and hit rates. In the degenerate case, 1 MSHR register of reduced size is required; 2 MSHR registers allow for overlap while one miss is outstanding, but would still lock up the cache input when multiple misses are outstanding. Owing to cost considerations and the incremental effectiveness gained by increasing the number of MSHR registers, 4 registers appear to be optimal.

The necessary information contained within one of these MSHR registers includes the following. First, the cache buffer address, along with the input request address, is required. The cache buffer address is kept to know where to place the returning memory data; the input request address is saved to determine whether, on subsequent requests, the data requested is on its way from central memory. Second, input request identification tags, along with the send-to-CPU status, are stored. This information permits the cache to return to the CPU requesting units only the data requested, and to return it with its identification tag. Third, in-input-stack indicators are used to allow data to be read directly from the input stack. Fourth, a code (for example, one bit per byte for partial write) is held for each word to indicate which bytes of the word have been written to the cache buffer. This code controls the cache buffer write update and allows data to be discarded for buffer areas that have been totally written after being requested. The cache thus has the capability of processing partial write input requests on the fly without purging. (Of course, this partial write code may be omitted if the cache block is purged on a partial write request to a word in a block in transit from memory.) Last, some control information is needed: whether the register holds valid information, whether that information is valid only for returning requested data but not for cache buffer update, and the number of words of the block that have been received and written, if required, into the cache buffer.

Therefore, each MSHR register contains:
1. Cache buffer address
2. Input request address
3. Input identification tags (one per word)
4. Send-to-CPU indicators (one per word)
5. In-input-stack indicators (one per word)
6. Partial write codes (one per word)
7. Number of words of block processed
8. Valid information indicator
9. Obsolete indicator (information not valid for cache update or MSHR hit on data in transit)

OPERATION

The operation can be split into two basic parts: memory receiver/input stack operations and tag array control operations.

For memory receiver/input stack operations, the following fields of the MSHR are interrogated:
1. Send-to-CPU indicator
2. Input identification tags
3. Cache buffer address
4. Partial write codes
5. Obsolete indicator
6. Valid indicator

When a word is received from memory, it is sent to the CPU requesting unit if the send-to-CPU indicator is set; the appropriate identification tag accompanies the data. The word is also written into the input stack if its space has not previously been totally written in the cache buffer and the MSHR is not obsolete (invalid for cache update). Words are removed from the input stack on a first-in, first-out basis and written into the cache buffer using fields 3 and 4. Of course, the MSHR must hold valid information when interrogated, or an error signal is generated.

A slight diversion is necessary at this point to explain cache data tagging. On a miss, the cache requests a block of words. Along with each word, a cache tag is sent. This tag points to the particular assigned MSHR and indicates the word of the block. Note that the cache saves the requesting unit's identification tag in the MSHR. This tagging closes the remaining open link in the handling of data returned from memory and removes all restrictions on the order of memory responses. If a particular processor/memory interface allows for a data width
of a block of words for cache-to-central-memory requests, the cache data tagging may be simplified by merely pointing to the particular assigned MSHR. If, however, all other data paths are still one word wide, the main operations would be essentially unchanged. Consequently, this extended interface would not significantly reduce the control complexity or the average lockout delay per request.

The fields of the MSHR updated during memory receiver/input stack operations are the following:
1. In-input-stack indicators
2. Partial write codes
3. Number of words of block processed
4. Valid information indicator (being-used indicator)

The in-input-stack indicators are set when the data word is written into the input stack and cleared when the data is removed from the input stack and written into the cache buffer. The partial write code is set to indicate totally written when the data word from central memory is written into the cache buffer. In addition, whenever a data word is disposed of because its space was totally written or its MSHR is obsolete, or is written into the cache buffer, the number-of-words-processed counter is incremented. On number-of-words-processed counter overflow (all words of the block have been received), the valid (being-used) MSHR indicator is cleared.

For tag array control operations, the following fields of the MSHRs are interrogated:
1. Input request addresses
2. Send-to-CPU indicators
3. In-input-stack indicators
4. Partial write codes
5. Valid indicator
6. Obsolete indicator

Fields 1, 5, and 6 are used, along with the current input request address and the n-way MSHR comparator, to determine whether there is a hit on previously missed data still being handled (a previous miss hit). Fields 2, 3, and 4 produce the following status conditions for the previous miss hit:
Partially written (partial write code has at least one bit set)
Totally written (partial write code is all 1s)
In-input-stack
Already-asked-for (send-to-CPU indicator is already set)

Figure 3 indicates the actions followed by the tag array control under all combinations of the above for a previous miss hit.

On a miss, an MSHR is assigned, and the following is performed:
1. Valid indicator set
2. Obsolete indicator cleared
3. Cache buffer address saved in assigned MSHR
4. Input request address saved in assigned MSHR
5. Appropriate send-to-CPU indicator set and others cleared
6. Input identification tag saved in appropriate position
7. All partial write codes associated with assigned MSHR cleared
8. All MSHRs pointing to the same cache buffer address purged (partial write code set to all 1s)

Note that actions 5 and 6 vary if the cache function was a prefetch (all send-to-CPU indicators are cleared, and no tag is saved). Action 8 prevents data from a previous allocation of a cache buffer block from overwriting the present allocation's data. On a miss and a previous miss hit (the cache buffer block was reallocated for the same input address before all data was received), the MSHR is set obsolete to prevent possible subsequent multiple hits in the MSHR comparator.

SIMULTANEITY

A previous miss hit on a data word that is just being received is definitely possible. Depending on the control operation, this word may have its corresponding send-to-CPU indicator output forced to the send condition, or may be read out of the input stack on the next minor cycle.

DIAGNOSABILITY

To diagnose this cache enhancement more readily, cache input functions should be added to clear and set the valid indicators of the MSHR registers. This would allow the following error conditions to be forced:
Cache tag points to nonvalid MSHR register
Multiple hit with MSHR comparator
Previous miss hit status: totally written and not partially written

All other fields of the MSHR registers may be verified by using these special cache input functions in combination with the standard input functions, with all combinations of addresses, identification tags, and data.

CONCLUSIONS

This cache organization has been designed, prototyped, and verified. The design allows the MSHR registers to be disabled. Using this capability, the direct effect of the number of MSHR registers on the execution times of a number of applications was measured. The reduced execution times of these applications directly demonstrated the effectiveness of this enhancement. (It is beyond the scope of this paper to analyze quantitatively the average lockout delay per request with respect to the number of enabled MSHR registers for different cache input rates, hit rates, and cache buffer sizes. This analysis will be reported in future work.) The cost of adding the 4 MSHRs to the design was about 10% of the total cache cost.

ACKNOWLEDGMENT

The author thanks Control Data Canada for the opportunity to develop the new cache organization presented in this paper.

REFERENCES

1. C. J. Conti. Concepts of buffer storage, IEEE Computer Group News, 2 (March 1969).
2. R. M. Meade. How a cache memory enhances a computer's performance, Electronics (Jan. 1972).
3. K. R. Kaplan and R. O. Winder. Cache-based computer systems, IEEE Computer (March
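The cache data tagging described under OPERATION can be illustrated with a short sketch. It assumes the tag sent with each word request is simply the assigned MSHR number concatenated with the word-in-block index; the bit layout, the sizes `N_MSHR` and `BLOCK_WORDS`, and the function names are hypothetical and not taken from the prototype.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sizing: 4 MSHRs (the count the paper found optimal) and 8-word blocks.
N_MSHR = 4
BLOCK_WORDS = 8                             # assumed to be a power of two
WORD_BITS = (BLOCK_WORDS - 1).bit_length()  # bits for the word-in-block index (3 here)


def make_cache_tag(mshr_index: int, word_index: int) -> int:
    """Pack the assigned MSHR number and the word-in-block into one cache tag."""
    if not (0 <= mshr_index < N_MSHR and 0 <= word_index < BLOCK_WORDS):
        raise ValueError("MSHR or word index out of range")
    return (mshr_index << WORD_BITS) | word_index


def split_cache_tag(tag: int) -> Tuple[int, int]:
    """Recover (MSHR number, word index) from the tag returned with a data word."""
    return tag >> WORD_BITS, tag & (BLOCK_WORDS - 1)


def route_returns(returns: List[Tuple[int, int]]) -> Dict[int, List[Optional[int]]]:
    """Place (tag, data) pairs into per-MSHR block images, whatever the arrival order."""
    blocks: Dict[int, List[Optional[int]]] = {}
    for tag, data in returns:
        mshr, word = split_cache_tag(tag)
        blocks.setdefault(mshr, [None] * BLOCK_WORDS)[word] = data
    return blocks


# Memory answers two outstanding misses interleaved and out of order.
arrivals = [(make_cache_tag(1, 3), 0x13), (make_cache_tag(0, 0), 0x00),
            (make_cache_tag(1, 0), 0x10)]
print(route_returns(arrivals))
```

Because the tag alone identifies both the register and the word, the receiver needs no assumption about response order. With a block-wide memory interface, the word field would drop out and the tag would reduce to the MSHR number.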
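The MSHR contents and the two groups of operations (memory receiver/input stack and tag array control) can be summarized in a small behavioral model. This is a sketch under stated assumptions, not the prototype's logic. It assumes 4-byte words with a one-bit-per-byte partial write code and integer block addresses, uses a dictionary in place of the cache buffer, and omits the tag arrays and replacement policy. The class and method names (`LockupFreeCacheModel`, `miss`, `receive`, `drain_one`, `cpu_write`, `status`) are hypothetical. Figure 3 is not reproduced in this text, so `status` only reports the previous-miss-hit conditions and does not act on them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BYTES_PER_WORD = 4                       # assumption: 4-byte words
ALL_WRITTEN = (1 << BYTES_PER_WORD) - 1  # partial write code "all 1s" (one bit per byte)

def byte_bits(code: int) -> int:
    """Expand a one-bit-per-byte partial write code into a bit mask over the word."""
    return sum(0xFF << (8 * b) for b in range(BYTES_PER_WORD) if code >> b & 1)

@dataclass
class MSHR:  # one miss information/status holding register (the paper's fields 1-9)
    cache_addr: int = -1                                        # 1. cache buffer address
    req_addr: int = -1                                          # 2. input request (block) address
    id_tags: List[Optional[int]] = field(default_factory=list)  # 3. identification tags
    send_to_cpu: List[bool] = field(default_factory=list)       # 4. send-to-CPU indicators
    in_stack: List[bool] = field(default_factory=list)          # 5. in-input-stack indicators
    pw_code: List[int] = field(default_factory=list)            # 6. partial write codes
    words_done: int = 0                                         # 7. words of block processed
    valid: bool = False                                         # 8. valid information indicator
    obsolete: bool = False                                      # 9. obsolete indicator

class LockupFreeCacheModel:
    def __init__(self, n_mshr: int = 4, block_words: int = 4):  # hypothetical model, no tag arrays
        self.block_words = block_words
        self.mshrs = [MSHR() for _ in range(n_mshr)]
        self.stack_capacity = block_words * n_mshr          # block size x number of MSHRs
        self.input_stack: List[Tuple[int, int, int]] = []    # FIFO of (mshr, word, data)
        self.to_cpu: List[Tuple[Optional[int], int]] = []    # (identification tag, data)
        self.buffer: Dict[int, List[int]] = {}               # cache buffer address -> words

    def compare(self, block_addr: int) -> Optional[int]:
        """n-way comparator over valid, non-obsolete MSHRs (a previous miss hit)."""
        hits = [i for i, m in enumerate(self.mshrs)
                if m.valid and not m.obsolete and m.req_addr == block_addr]
        if len(hits) > 1:
            raise RuntimeError("multiple hit with MSHR comparator")
        return hits[0] if hits else None

    def status(self, idx: int, word: int) -> Dict[str, bool]:
        m = self.mshrs[idx]  # status conditions only; Figure 3's actions are not modeled
        return {"partially_written": m.pw_code[word] != 0,
                "totally_written": m.pw_code[word] == ALL_WRITTEN,
                "in_input_stack": m.in_stack[word], "already_asked_for": m.send_to_cpu[word]}

    def miss(self, cache_addr: int, block_addr: int, word: int,
             id_tag: Optional[int], prefetch: bool = False) -> Optional[int]:
        """Assign an MSHR on a miss; None means every MSHR is busy (cache input locks up)."""
        free = next((i for i, m in enumerate(self.mshrs) if not m.valid), None)
        if free is None:
            return None
        old = self.compare(block_addr)
        if old is not None:                       # miss plus previous miss hit
            self.mshrs[old].obsolete = True
        for m in self.mshrs:                      # action 8: purge MSHRs on the same block
            if m.valid and m.cache_addr == cache_addr:
                m.pw_code = [ALL_WRITTEN] * self.block_words
        n, m = self.block_words, self.mshrs[free]
        m.valid, m.obsolete = True, False                    # actions 1, 2
        m.cache_addr, m.req_addr = cache_addr, block_addr    # actions 3, 4
        m.send_to_cpu, m.id_tags = [False] * n, [None] * n
        if not prefetch:                                     # actions 5, 6 (none on prefetch)
            m.send_to_cpu[word], m.id_tags[word] = True, id_tag
        m.pw_code, m.in_stack, m.words_done = [0] * n, [False] * n, 0  # action 7
        return free

    def cpu_write(self, cache_addr: int, word: int, data: int, code: int) -> None:
        block = self.buffer.setdefault(cache_addr, [0] * self.block_words)
        bits = byte_bits(code)
        block[word] = (block[word] & ~bits) | (data & bits)
        for m in self.mshrs:                      # partial write on the fly: record bytes written
            if m.valid and m.cache_addr == cache_addr:
                m.pw_code[word] |= code

    def receive(self, idx: int, word: int, data: int) -> None:
        m = self.mshrs[idx]                       # memory receiver; arrival order is irrelevant
        if not m.valid:
            raise RuntimeError("cache tag points to nonvalid MSHR")
        if m.send_to_cpu[word]:
            self.to_cpu.append((m.id_tags[word], data))
        if m.pw_code[word] == ALL_WRITTEN or m.obsolete:
            self._processed(m)                    # word disposed of
        else:
            m.in_stack[word] = True
            self.input_stack.append((idx, word, data))

    def drain_one(self) -> None:
        idx, word, data = self.input_stack.pop(0)  # first in, first out
        m = self.mshrs[idx]
        block = self.buffer.setdefault(m.cache_addr, [0] * self.block_words)
        keep = byte_bits(m.pw_code[word])         # bytes already written by the CPU
        block[word] = (block[word] & keep) | (data & ~keep)
        m.in_stack[word], m.pw_code[word] = False, ALL_WRITTEN
        self._processed(m)

    def _processed(self, m: MSHR) -> None:
        m.words_done += 1                         # valid cleared once the whole block is done
        m.valid = m.words_done < self.block_words
```

Returning `None` from `miss` stands in for cache input lockup when every MSHR is busy. Since an MSHR cannot be reassigned until all of its words are processed, the input stack never holds more than `stack_capacity` (block size times number of MSHRs) words, matching the sizing rule given under ORGANIZATION.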