版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
Spring20026.823ComputerSystemArchitectureDatapathforDLXSpring2002ProblemSet#2Studentsareallowedtocollaborateingroupsofupto3people.Agrouphandsinonlyonecopyofthesolutiontoaproblemset.Homeworkassignmentsaredueatthebeginningofclassonthesuedate.Tofacilitategrading,eachproblemmustbestapledseparately.Homeworkwillnotbeacceptedoncesolutionsarehandedout.Problem1:MicroprogrammingandBus-BasedArchitecturesProblem1.AHowmanycyclesdoesittaketoexecutethefollowinginstructionsinthemicrocodedDLXmachine?UsethestatesandcontrolpointsfromDLX-Controller-2andassumeMemorywillnotassertitsbusysingal.lilStTLlCtlOUCyclesADDR3,R2,R1ADDIR2,R1,#4LW Rl,0(R2)BEGSRlflabel#(R1==0)BEG3Rl,label#(R1!=0)J labelJR R1JALlabelJALRR1Whichinstructiontakesthemostcyclestoexecute?Whichinstructiontakesthefewestcyclestoexecute?Problem1.BBenBitdiddleneedstocomputefactorialsforsmallnumbers.RealizingthereisnomultiplyinstructioninthemicrocodedDLXmachine,heusesthefollowingcodetocalculatethefactorialofanunsignednumbern.result.=1;for(iresult.=1;for(i=0;i<n; {二wrr一匸for(jresult;=0;jresult<i;j-+)ten'::';ThevariablesI,j,n,temp,andresultareunsigned32-bitvalues.WritetheDLXassemblythatimplementsBen'factorialcode.UseonlytheDLZinstructionsthatcanbeexecutedonthemicrocodedDLXmachine(ALU,ALUi,LW,SW,J,JAL,JR,JALR,BENQ,andBENS).ThemicrocodedDLXmachinedoesnothavetobepreserved.HowmanyDLXinstructionsareexecutedtocalculateafactorial?Howmanycyclesdoesittaketocalculateafactorial?Again,usethestatesandcontrolpointsfromDLX-Controller-2andassumeMemorywillnotassertitsbusysignal.Fh匚tenalCycles013NProblem1.CAlyssaP.HackertellsBenthathisfactorialcodewillrunmuchfasterifheimplementsanunsignedmultiplyinstructioninthemicrocodedDLXmachine.ThenBencanreplacetheinnerloopinstructionswiththenewmultiplyinstruction.ThedetailsofAlyssa'snewproposedunsignedmultiplyinstructionare:▼Rsl(Rs▼Rsl(Rs2times)ThevalueofRs1isaddedRs2timesandtheresultstoredintoRd.Rs1andRs2aretreatedasunsigned32-bitvalues.IfRs2orRs1is0,thentheresultofRdwillalsobe0.TheformatoftheMULUinstructionisR-type.InordertobeabletowritemicrocodeforMULU,Alyssaaddsanadditionalregister,TO(33),totheregisterfile.Thisregister,likethePCregister,isnotvisibletotheprogrammer.Shealsoadds33asaninputtotheregisterfilemultiplexer.UsingWorksheet1,writemicrocodetoimplementAlyssa 'snewunsignedmultiplyinstruction.InWorksheet1,therepresentationofthenextstateisdifferentfromwhatwaspresentedinlecture.Thelasttwocolumnsrepresentsa2-bitfieldwithfourpossiblevalues:N,J,Z,andD.IUBrisN(next),thenthenextstateissimply(currentstate=1).IfitisJ(jump),thenthenextstateisunconditionallythestatespecifiedintheNextStatecolumn(i.e.,it uhcontiitionalmicrobranch).IfitisZ(branch-if-zero),thenthenextstatedependsonthevalueoftheAL'zerooutputsignal(i.e.,it'aconditionalmicrobranch).Ifzeroisasserted(==1),thenthenextstateisthatspecifiedintheNextStatecolumn,otherwise,itis(currentstate+1).UBisD(dispatch),thentheFSMlooksattheopcodeandfunctionfieldsintheIRandgoesintothecorrespondingstate.Forthisproblemset,weassumethatthedispatchgoestothelabeled(DLX-instruction-name+”.Forexample,iftheinstructionintheIRisSW,thenthedispatchwillgotostateSW0.TheALUperformsoperationsspecifiedbytheALUOp,whichisdeterminedbytheALUcontrollogicblock.AssumetheALUcanperformthefollowingoperations:
ALUOpALUResultOutputCOPYAACOPYBBrINCA1A+lDECA1A-l1INCA4A+4DECA4A-4ADDA+BSUBA-BHowmanycyclesdoesitfortheMULUinstructionfordifferentvaluesofRs2?Rs2Cycles013NProblem1.DWithAlyssa'snewunsignedmultiplyinstruction,Beneliminatestheinnerloopofhisoriginalfactorialcodeandsimplifiesittothefollowing.result=1;for(i=1;1<=n;i++}{result-result*1;}HelpBenwriteDLXassemblycodetoimplementfactorialusingthenewMULUinstruction.Again,useR1fornandR2forresult.Attheendofyourcode,R2mustcontainthecorrectvalue.Youdonothavetopreservethevaluesofanyotherregisters.HowmanyDLXinstructionsareexecutedtocalculateafactorial?Howmanycyclesdoesittaketocalculateafactorial?Again,assumeMemorywillnotassertitsbusysignal.FactorialCycles0123NProblem1.ECombiningamicrocontrollerandtheDLXbus-baseddatapath(L4-5)givesusacompleteworkingcomputerthatcanrunasubsetoftheDLXISA.Besidesrequiringmuchmorememory,AlyssatellsBenanotherreasonwhyusingtheoriginalDLXMicrocontroller(L4-9)wasavadidea.AmachinewiththeoriginalcontrollerwouldhaveamuchIongercycletimethanamachineusingthesecondmicrocontroller(L4-15).Benca'understandwhythisistrue.BelowarethedelaysofthehardwarepartsusedtoimplementtheDLXbus-baseddatapachandthefirstversionoftheDLXMicrocontroller(L4-9).Lehi?-紀mpmiwofregistersIR.A:B.N1A,pPCte-q-clock-to-qtmieofregistersIR,AB,MA.pPCtAiu一timeALUtakestogenerateresultmidzerot„ceud-delayofasignextendertgcc—delayofamultiplexertt-,—delayfromwhendataisrittentobustowhenitbecomesstablet亦為花—delayrofarn-UatebufferCrf-delayoftheregisterfileMm-delayofmemory-trom-delayoftheROMtALV_c0EtioL-delayforALUControlAssumethattALU,tROM,tMEN,havecomparablevaluesandthatthesedelaysarebiggerthanthedelaysforothercomponents.Usingthefirstversionofthemicrocontroller,whichmicrocodeinstructioninvokesthecriticalpathofthemachine?Describethecriticalpath.WhatistheminimumclockperiodthatthecompleteDLXbus-basedmachinecanrunat?AssumeMemorywillnotassertitsbusysignal.Problem2:PipelineHackingInspiredbyhissuccesswiththeMACCinstructioninthelastproblemset,BenBitdiddlecomesupwiththefollowingnewinstructionformatcalledBIF(forBen'InstructionFormat)thathewantstoaddtotheDLXISA:opcodelrf1rf2opcode2rf4rf565 5 65 5Thesemanticsofthenewinstructionwouldbethis:rf5<-(=floplWhereop1andop2areALUoperations.Forexample,ADDR1,R2:ADDR3,R4wouldbecomputedasfollows:RT<-(R1+R2)+R3Toimplementthenewinstructionformat,BendecidestoaddanALUtothememoryphaseofthepipelined,fully-bypassedimplementationoftheDLXdatapathdiscussedinlecture.Old-styleDLXinstructionswouldstillusetheALUintheexecutephasewhileBIFinstructionswouldusebothALUs.ThefirstALUinthememorypahse.Thenewpipelinewouldlooklikethis:IFIDEXMAWBFetchphaseDecodeandregisterfetchphaseExecutephase withonfilialALUMemoryphase withnewALUWrite-backphaseInaddition,theregisterfileintheolddatapathisreplacedwitharegisterfilewiththreereadportsandonewriteportsothatallthreeoperandscanbereadatthesametime.Problem2.AGiveacodeexamplethatshowshowtoycangetbetterperformaneeusingBIFinstructions.Provideboththeoriginalols-styleDLX,codeandthecodethatusestheBIFinstructions.Theoriginalcodeshouldcontainatleastsixinstructions.WhatisthemaximumpossibleimprovementinperformaneeusingBIFinstructions?Problem2.BShowalldatahazardsthatcancausestallsandprovideacodeexampleforeachcase.Youshouldconsiderbothold-styleDLXinstructionsandinstructionsusingBensnewformat.Youmayassumethatthedatapathisfully-bypassed.Donotconsiderjumpsorbranchesfornow.Stillignoringjumpsandbranches,howperformaneechangeifnon-BIFinstructionsusedthenewALUintheMAphaseinstasdoftheoriginalALUintheexecutephase?Problem2.CWritetheequationsforws,we,re1,re2forthenewdatapathwithnon-BIFinstructionsusingtheoriginalALUintheexecutephase.Writethestallsignalusingws,we,re1,andre2.Youmayneedothersignals.Thesignalsfortheoriginaldatapathareprovidedhereforyourconvenience.Again,donotconsiderjumpsorbranchesfornow.ws二caseopcodeALU => rf3ALUi’LW刁rf2JAL.JALRnR31we二caseopcodeALUtALUi,LW (wsRO)JALrJALR ron… noffrel二caseopcodeALU.ALUi„LW.SW.BEQZ,J尺JALRnonJ.JAL 0offre2=caseopcodeALU.SW =>on( (offcstallstall=(rfID=wsE)((opcodeE=LWE)((wsE(R0)(relD+(rf2D=w5E)((opcodeE=LWE)((w^E(R0)(re2DProblem2.DNowconsiderjumpsandbranches.Whatadditionalhazardscanoccur?Giveanexampleforeachcase.Withthenewinstructionformat,Benthinksthatwecanspeedupconditionalbranchesifweallowaninstructionthatcombinesthecomparewiththebranch.Forexample,5LTR1.K21 labelwouldmean:il(R1<R2)thenbranchtoLabelThefirstinstruction(inthisexample,theSLTinstruction)wouldbeperformedontheSLUintheEXpahse.Theresult(a0or1),insteadofbeingwrittentotheregisterfile,wouldbepassedtotheMAphasewherethezerotestwouldbeperformedonthenewALU.Howmanydelayslotswillthistypeofinstructionrequiretoavoidanystalls?精品文档Howcouldyoureducethenumberofdelayslotsthatareneeded,withoutintroducinganynewstallconditionsorkillinginstructionsinthepipeline?Foreachcasethatyouconsider,arguewhateffectitwillhaveontheclockperiod.Eachcaseshouldcorrespondtoanimplementationwithadifferentnumberofdelayslots.Giventheoptionsyouinvestigatedabove,argueinafewsentenceswhichoftheseoptionsisthebest.Considerdelayslots,stalls,circuitsize,andclockperiod.Problem2.EBenfindsthattheadditionalreadportintheregisterfileisincreasingthelengthofthecriticalpathintheprocessor,andthattheycannotclockthenewdatapathatashighaspeedastheoriginal.Totryandsolvetheproblem,heisgoingtotryandusetheoriginalregisterfile.Sincetheoriginalregisterfileonlyhastworeadports,onlytwooftheoperandscanbereadintheIDphase.ThethirdoperandisgoingtobereadintheEXphase,inparallelwiththefirstALUoperation.Whatotherchangesareneededtomakethisschemework?Howdothesechangesaffecttheperformaneeoftheprocessor?AlyssathinksthatBencansolvehisproblembyaddingasecondregisterfile,identicaltothefirst.Howwouldthisschemework?HowdoestheperformanceofAly'aBchemecompatrtothetwothatBentried?Problem2.FWiththenewBIFinstructions,howwillcodesizechange?Willthishaveaneffectonperformanee?Problem3:CacheAccess-Time&PerformanceBenistryingtodeterminethebestcacheconfigurationforanewprocessor.Heknowshowtobuildthreekindsofcaches:direct-mappedcaches,2-wayser-associativecaches,andsmallfullyassociativecaches.Thegoalistofindthebestcacheconfigurationwiththegivenbuildingblocks.Problem3.ASinceheonlyknowshowtobuildverysmallfullyassociativecaches,Bendecidedtouseeitherdirect-mappedor2-wayset-associativeasthebasiccacheconfiguration.Hewantstoknowhowthesetwodifferentconfigurationsaffecttheclockspeedandthecachemiss-rate,andchoosetheonethatprovidesbetterperformaneeintermsofaveragelatencyforaload.Problem3.AAccesstimeiDMThefollowingdiagramshowshowadirect-mappedcacheisorganized.Toreadawordfromthecache,theinputaddressissetbytheprocessor.Thentheindexportionoftheaddressisdecodedtoaccesstheproperrowinthetagmemoryarrayandinthedatamemoryarray.Theselectedtagiscomparedtothetagportionoftheinputaddresstodetermineiftheaccessisahitornot.Atthesametime,thecorrespondingcacheblockisreadandtheproperlineidselectedthroughaMUX.Inthememoryarray,eachrowcorrespondstoarowinthecache.Forexample,arowinthetagmemoryarraycontainsonetagandtwostatusbits(validanddirty)forthecacheline.Fordirect-mappedcaches,arowinthedataarrayholdsonecacheline.Nowwewanttocomputetheaccesstimeofthecache.Assumea32-KBcachewith8-word(32-byte)cachelines.Theaddressis32bits,andtwoLSBoftheaddressisignoredsinceacacheaccessisword-aligned.Thedataoutputisalso32bits,andtheMUXselectsonewordoutoftheeightwordsinacacheline.UsingthhedelayequationsgiveninTable3-1,fillinthecolumnforthedirect=mapped(DM)cacheinTable3-1.Intheequationforthedataoutputdriver,associativity”referstotheassociativityofthecache(1ordirect-mappedcaches,AforA-wayset-associativecaches).ComponentDelayequation(ps)DM(ps)SA(ps)Decoder200k(#ofindexbits)—1000TagDataMeniorsrarray200xlogj ofrows)-200xlog:侍ofbirsmarow)+1000TagDaraComparator200x(#oftagbits)+1000[N-to-1MUX500xlog2N+1000BufFerdriver2000D自祐outputdriver500<(as5ociari\iiv)+1000Validoutput1000Table3-1:DthyofeachCacheComponentWhatidthecriticalpathofthisdirect-mappedcacheforacacheread?Whatistheaccesstimeofthecache(thedelayofthecriticalpath)?Tocomputetheaccesstime,assumethatagate(AND,OR)delayis500(ps).IftheCPUclockis150MHz,howmanyCPUcyclesdoesacacheaccesstake?Problem3.B Accesstime:SATheimplementationofa2-wayset-associativecacheisshowninthefollowingdiagram.Theindexpartoftheinputaddressisagainusedtofindtheproperrowinthedatamemoryarrayandthetagmemoryarray.Inthiscase,however,eachrowcorrespondstotwocachelines(onecacheset).Arowinthedatamemoryholdstwocachelines(for32-bytescachelines,64bytes),andarowinthetagmemoryarraycontainstwotagsandstatusbitsforthosetags(2bitspercacheline).Thetagmemoryandthedatamemoryareaccessedinparallel,buttheoutputdatadriverisenabledonlyifthereisacachehit.Assumethetotalcachesizeis32-KB(eachwayis16-KB)andallotherparameters(suchastheinputaddress,cacheline,etc.)arethesameaspart3.A.Computethedelayofeachcomponentandfillinthecolumnfora2-wayset=associativecacheinTable3-1.Whatisthecriticalpathofthe2-wayset-associativecache?Whatistheaccesstimeofthecache(thedelayofthecriticalpath)?Whatisthemainreasonthatthe2-wayset-associativecacheisslowerthanthedirect-mappedcache?IftheCPUclockis150MHz,howmanyCPUxyxlesdoesacacheaccesstake?
InputAddressProblem3.CMiss-rateanalysisInputAddressProblem3.CNowBenisstudyingtheeffectofset-associativityonthecacheperformanee.Sincehenowknowstheaccesstimeofeachconfiguration,hewantstoknowthemiss=rateofeachone.Forthemiss-rateanalysis,Benisconsideringtwosmallcaches:adirect=mappedcachewith4lineswith16bytes/line,anda2-waqyset-associativecache,usingaleastrecentlyusedreplacementpolicy,with4lineswith16bytes/line.Benteststhecachebyaccessingthefollowingsequeneeifhexadecimalbyteaddress,startingwithemptycaches.Completethefollowingtablesforboththedirect-mappedcacheandthe2-wayset-associativecacheshowingtheprogressionofcachecontainsasaccessesoccur(inthetable,my”=
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2026届新高考英语冲刺复习 精准立意下的续写情节构建
- 2024年古人礼仪小故事
- 相关相关项目建设管理管控管控制度
- 华晟中安安全培训价目课件
- 云南专业婚介培训课件
- 2026-2032年中国凉味剂行业市场竞争现状及发展战略研判报告
- 2025-2031年中国焦亚硫酸钾行业市场全景评估及产业前景研判报告
- 2025 小学一年级数学下册单元小结(第七单元)课件
- 2025 小学一年级数学下册儿歌教学(数字歌)课件
- G120 变频器技术及应用课件:电位器调速的电动机运行控制
- 智能水杯行业状况分析报告
- 电力部门春节安全生产培训
- 公司财务部门工作职责
- 原辅材料领料申请单
- 人教版九年级数学上册22 3 3拱桥问题和运动中的抛物线 一课一练 (含答案)
- 2023年个税工资表
- 网球运动基本知识及规则课件
- 2023新青年新机遇新职业发展趋势白皮书-人民数据研究院
- 管理学原理教材-大学适用
- 变电站一次侧设备温度在线监测系统设计
- GB/T 6579-2007实验室玻璃仪器热冲击和热冲击强度试验方法
评论
0/150
提交评论