版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
BranchInstructions
Sima,FountainandKacsuk
Chapter8CSE33041BranchInstructions
Sima,FouMajorChapterGoalsTounderstandhowtominimisetheperformancedegradationofbranchesBasicapproachtobranchhandlingDelayedbranchingBranchprocessingMulti-waybranchingGuardedExecution2MajorChapterGoalsTounderstaTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bneta3TypesofbranchinstructionsUnTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bnetaBRtata:BSRtata:RETta:SUBLR1,1,R1BLTR1,taBNER1,tata:4TypesofbranchinstructionsUnWhyworryaboutdifferenttypesofbranches?BranchescausestallsofpipelinesDifferenttechniquesforminimizingbrancheffectsTakeadvantageofdifferenttypeofbranche.g.UnconditionalbranchALWAYSoccurs.Therefore,hardwarecanplanforthebranchinadvance.e.g.differencebetweenloopclosingandnormalconditional5WhyworryaboutdifferenttypeWaysofcheckingconditionConditionalbranchesneedtoevaluateapredicateTwomainapproachesResultstateIBM360,370,PDP-11,VAX-11,x86,Pentium,MC68000,Sparc,PowerPC.DirectCheckPDP-10,Cyber/70,PDP-8,CRAY,MIPS,HPPA,DecAlpha6WaysofcheckingconditionCondResultStateResultstateisdeclaredtoholdstatusinformationrelatedtoresultofoperationTypicalimplementationisconditioncodesorflagregisterswhichareupdatedaftereveryarithmeticresultisproducedConditionalbranchinstructionsinterrogatetheflagsinsubsequentinstructions7ResultStateResultstateisdeResultstatedisadvantagesThegenerationofresultstateisnotstraightforward;irregularstructureoccupiesadditionalchipareaMakespipelinelongerALUTest>0<0=01ClockCycle8ResultstatedisadvantagesTheResultstatedisadvantages...Sequentialinconcept.HowdowepackinstructionsinSuperscalarandVLIWmachine?addr1,r2,r3subr5,r6,r7bltta1bgtta2addr1,r2,r3subr5,r6,r7bltta1bgtta29Resultstatedisadvantages...DirectCheckNoresultstateisdeclaredSpecifiedconditionsaredirectlycheckedbyexplicitinstructionsConditionalbranchingcanberequestedifthespecifiedconditionsaremet.MayinvolveoneortwoinstructionsFitsbetterintosuperscalararchitecturesShorterclockcyclebecauseonlycheckwhennecessary10DirectCheckNoresultstateisComparisonofconditionalbranchesTwoinstructionImplementationadd r1,r2,r3cmpeq r7,r1,0bt r7,labeldiv r5,r4,r1OneinstructionImplementationadd r1,r2,r3beq r1,labeldiv r5,r4,r111ComparisonofconditionalbranTheeffectofbranchesNeedtounderstandwhetherbranchesaretakenornotThentailorthearchitecturetoperformfastestonmostcommoncase.Itturnsoutthatmostbranchesaretaken…..FetchDec/RdALUWriteFetchDec/RdALUWriteFetch12TheeffectofbranchesNeedtoBranchstatistics...UnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranch~1/3~1/3~1/3Takenforfirstn-1iterationsTakenNotTaken~1/6~1/6Taken~5/613Branchstatistics...UnconditiBranchHandlingUtilizingbranchdelayslotsHandlingofunresolvedconditionalbranchesAvoidingcond.branchesDelayedBranchingPerformanceBlockingbranchproc.Speculativebranchproc.Multiwaybranchproc.BranchprocessingGuardedExecution14BranchHandlingUtilizingbrancBranchHandlingDelayedbranchingUtiliseotherwisewastedcyclesfollowingbranches.AchievedbyinsertinganinstructionbehindthebranchandexecutingitbeforethebranchUtilizedonEarlyandSubsequentRISCarchitectures15BranchHandlingDelayedbranchiDelayedbranchingFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDeadDeadaddbsubadd r1,r2,r3b anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r616DelayedbranchingFetchDec/RdALDelayedbranching...FetchDec/RdALUWriteFetchDec/RdALUWriteaddbdivadd r1,r2,r3bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6FetchDec/RdALUWriteFetchDec/RdALUWriteaddDelayedBranchFetchDec/RdALUWritesub17Delayedbranching...FetchDec/DelayedbranchingoptionsCanreducetosingledelayslotiftargetaddressavailableatendofdecodephaseCompilerplacesNOPsinslotsandmigratesinstructionsintoslotsusingcodemigrationCompilermustperformdataflowanalysis18DelayedbranchingoptionsCanrCodemigrationtofillslotsadd r8,r9,r10bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6NOPNOPadd r3,r2,r6bd anywherediv r5,r4,r1anywhere: sub ………..add r8,r9,r1019CodemigrationtofillslotsadDoesthisworkwithconditionalbranches/?Yes,butcodemigratedintodelayslotmustbeperformedunconditionallyadd r8,r9,r10beq r6,anywherediv r5,r4,r1add r3,r2,r6NOPNOPadd r3,r2,r6div r5,r4,r1add r8,r9,r10beq r6,anywhere20DoesthisworkwithconditionaOptimisations-AnnulmentAnnuldelayslotifbranchnottakenUsefulforbackwardconditionalbranchesMakesitpossibletomovealoopbodyinstructionintothedelayslotAnnuldelayslotifbranchistakenUsefulforforwardconditionalbranchesMakesitpossibletorelocateaninstructionfromsequentialpathintothedelayslot21Optimisations-AnnulmentAnnulAnnuldelayslotifbranchnottaken1234BrCDelayLoopbodyConditionalBranchwithnoAnnulmentLooprequires5instructions22AnnuldelayslotifbranchnotAnnuldelayslotifbranchnottaken...1234BrCDelayLoopbodyConditionalBranchwithAnnulment1Looprequires4instructions23AnnuldelayslotifbranchnotAnnuldelayslotifbranchistaken1ConditionalBranchwithnoAnnulment23BrCDelay4524AnnuldelayslotifbranchisAnnuldelayslotifbranchistaken1ConditionalBranchwithAnnulment23BrC4525AnnuldelayslotifbranchisFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteBranchdetectionIngeneral,theearlierabranchisdetected,thebetterthehandlingReducesnumberofdelayslotsrequiredEarlybranchdetectionInparallelLookaheadIntegratedwithfetchFetchDec/RdALUWrite26FetchDec/RdALUWriteFetchDec/RdEarlybranchdetectionInParallelBranchesaredetectedinparallelwithdecodeofotherinstructionsusingadedicatedbranchdecoderLookaheadBranchesaredetectedfromtheinstructionbufferbutaheadofgeneralinstructiondecodingIntegratedBranchesaredetectedduringinstructionfetch27EarlybranchdetectionInParalConditionalbranchesTheproblemwithconditionalbranchesisthatwemaynotknowuntillateintheinstructionwhetherwewishtotakeorrejectthecontroltransferTakebranch?FetchDec/RdALUWritebz r1,tagetbzreadr1cmp0WritePC28ConditionalbranchesTheprobleHandlingunresolvedconditionalbranchesBlockingbranchprocessingSpeculativebranchprocessingBranchpredictionExtentofspeculationRecoveryfrommis-predictionMulti-waybranching29HandlingunresolvedconditionaBlockingbranchprocessingTrivialapproachExecutionofconditionalissimplystalleduntilconditionisknownResultstateCanpossiblyorderinstructionstoreducedelayinwaitingforconditioncodestobesetDirectcheckingNeedtowaitforALUresultinthisinstructionSetCCBrCFetchDec/RdALUWrite30BlockingbranchprocessingTrivSpeculativebranchprocessingPipelinestallscanbeavoidedAfterdetectionofunresolvedconditionalbranchaguessismadeoftheoutcomeIfguessiscorrectSpeculationconfirmedandcontinuedexecutionIfguessisincorrectInstructionsdiscardedFetchrestarted31SpeculativebranchprocessingPBranchpredictionschemesFixedpredictionTruepredictionA“true”guessismadeStaticPredictionBasedonobjectcodeDynamicPredictionBasedonexecutionhistory32BranchpredictionschemesFixedFixedPrediction
(GuessNotTaken)DetectunresolvedconditionalbranchandguessasnottakenContinuewiththeexecutionofthesequentialpath,butstarttheexecutionoftakenpath(e.g.calculateBTA)Whenconditionalbecomesavailable,checktheguessIfcorrect,continuewithexecutiondeletetakenpathpre-processingIfincorrect,deletespeculativeexecutionandcontinuewithtakenpath33FixedPrediction
(GuessNotFixedPrediction
(GuessNotTaken)ConditionKnownEvaluateBTACORRECT34FixedPrediction
(GuessNotFixedPrediction
(GuessNotTaken)ConditionKnownEvaluateBTAINCORRECT35FixedPrediction
(GuessNotFixedPrediction
(GuessTaken)ConditionKnownEvaluateBTAINCORRECT36FixedPrediction
(GuessTakeStaticPredictionLikefixedprediction,butsomepropertiesoftheobjectcodeareusedtodecidewhethertousealwaystakenornottakenHaspossibilityofperformingbettersincesomeinformationisused.37StaticPredictionLikefixedprStaticPrediction...TypicaloptionsareOpcodebasedpredictionForcertainopcodesthebranchisassumedtobetaken,forothersnottaken.DisplacementbasedpredictionIfdisplacement<0taken,elsenottakenCompilerdirectedPredictbitininstructionsetbycompileranalysisortracedataOpBTASignbit38StaticPrediction...TypicaloDynamicpredictionPredictionismadeonbranchhistoryPhilosophyisthathistoryisgoodguidetofuturebehaviourGoodforloopsbecausetend
toiteratemultipletimesGoodforsomeconditionalbranchesdependingonbehaviourofdataGoodforexceptionconditioncheckingWhatHappenedLastTime?39DynamicpredictionPredictioniDynamicPredictionTechniquesExplicittechniqueBranchhistoryexplicitlystatedhistorybits1,2or3bitschemesNumberofbitsrepresentlikelihoodofbranchbeingtakeninfutureImplicittechniqueBranchbeing“recorded”isanindicationthatbranchwastakenRoughlyequivalentto1bitscheme40DynamicPredictionTechniquesEBranchHistoryBitsForeachbranch,recordastatusfield.OnebitstatusDidlastoccurrenceofbranchoccur?TwobitstatusStatetabledecides
whichstateThreebitschemeHistoryoflast3branchesMajoritydecisiononoutcome41BranchHistoryBitsForeachbrStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTPredictionTakenPredictionNotTaken42Statetransitionsin2bitschStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTStronglyNOTTakenANTWeaklyNOTTakenATWeaklyTakenATANTWeaklyNOTTakenATWeaklyTaken43Statetransitionsin2bitschImplicitDynamicSchemeTwomainschemesBranchTargetAccessCache(BTAC)BranchTargetInstructionCache(BTIC)BothschemesintroduceanextracacheEntriesareonlystoredfortakenbranchesNottakenbranchesarenotstoresIfanentryisincache,thennextbranchistakenBehaveslike1bit44ImplicitDynamicSchemeTwomaiBTACandBTICimplementationStoreBranchAddressPLUSThetargetaddressitself(BTAC)Theinstructionatthetarget(BTIC)10002000LoadR1,….100020001000LoadR1,….45BTACandBTICimplementationStBTACimplementation...LoadR1,….1000200010002000Don’ttakebranchSpeculativelyWhoops!AddthisbranchaddresstothecacheNexttimeTakebranchSpeculatively46BTACimplementation...LoadR1PerformancedifferencesinBTACandBTICBTACdeliversaddressofbranchintimefornextfetchBTICdeliversactualinstruction.CanbeslowerbecauseitdoesnotrequirefetchLookupBTACFetchDec/RdExeWriteFetchDec/RdExeWriteLookupBTIC47PerformancedifferencesinBTAImplementationofHistorybitsPlacementofhistorybitsI-cacheAlpha,UltraSparcBranchHistoryTable(BHT)PowerPC604,620,R10000BranchTargetAddressCache(BTAC)MC68060,Pentium,R8000I-cacheandBHTeffectivelythesame48ImplementationofHistorybitsI-CacheandBHT(PowerPC604)I-cache16KFourwaysetassociativeInstructionfetchaddressBHT128x4entriesPredictionLogic4instr./cycle2HistoryBitsTaken/NottakenBTAforatakenguessDecodeQueue4x1Instr.IssueQueue49I-CacheandBHT(PowerPC604)PredictionAccuracyP=fc*Pc+fm*Pmwherefc: Probabilityofcorrectlypredictingbranchesfm: Probabilityofmis-predictingbranchesPc: PenaltyofcorrectlypredictedbranchesPm: Penaltyofmis-predictedbranches50PredictionAccuracyP=fc*PcPredictionAccuracy….51PredictionAccuracy….51RecoveryfromMispredictionIfbranchpredictionhardwaremakeswrongguess,needtoreverttoalternativepathofexecution.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWrite?52RecoveryfromMispredictionIfRecoveryfromMisprediction...Forpipelinedmachines,maybepossibletoabortregisterwritesStoreinstructionsarehardertorecoverConditioncodesmightalsoneedtoberestored.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteRegisterFile53RecoveryfromMisprediction..Schemestoshortenmis-predictionrecoveryBasicpriormeasuresforrecoveryIna“taken”guess,savesequentialaddressIna“nottaken”guess,pre-calculateandsavebranchtargetaddressRequirestwoaddressregistersperspeculatedconditionalbranchSavethisaddressSavethis
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- GB/T 46758-2025纸浆硫酸盐法蒸煮液总碱、活性碱和有效碱的测定(电位滴定法)
- 2025年大学农学(作物研究)试题及答案
- 2025年大学安全教育(人身安全防护)试题及答案
- 2025年中职(物联网技术应用)传感器应用试题及解析
- 2025年大学本科一年级(临床医学)人体解剖基础测试题及答案
- 2025年高职(园林管理)园林景区运营管理综合测试题及答案
- 2025年大学大一(康复治疗学)康复心理学基础阶段测试题及答案
- 2025年大学工业工程(工业4.0研发)试题及答案
- 2025年大学森林消防(森林灭火技术)试题及答案
- 2025年中职(学前教育)幼儿教育学阶段测试题及答案
- ISO27001信息安全管理体系培训资料
- 四年级语文国测模拟试题 (1)附有答案
- 2024-2030年墨西哥数码打印机墨水市场前景分析
- 固定式、车载式、便携式反无人机实施方案
- 餐饮投资项目计划书
- 广州小学英语单词分类识记表-注音版
- 男朋友打游戏申请表
- 危险化学品经营许可证变更申请书(附件2)
- 职业培训师的8堂私房课:修订升级版
- 18621客运服务礼仪题库(114道)
- 多园区管理模式下的机制建设
评论
0/150
提交评论