版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
BranchInstructions
Sima,FountainandKacsuk
Chapter8CSE33041BranchInstructions
Sima,FouMajorChapterGoalsTounderstandhowtominimisetheperformancedegradationofbranchesBasicapproachtobranchhandlingDelayedbranchingBranchprocessingMulti-waybranchingGuardedExecution2MajorChapterGoalsTounderstaTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bneta3TypesofbranchinstructionsUnTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bnetaBRtata:BSRtata:RETta:SUBLR1,1,R1BLTR1,taBNER1,tata:4TypesofbranchinstructionsUnWhyworryaboutdifferenttypesofbranches?BranchescausestallsofpipelinesDifferenttechniquesforminimizingbrancheffectsTakeadvantageofdifferenttypeofbranche.g.UnconditionalbranchALWAYSoccurs.Therefore,hardwarecanplanforthebranchinadvance.e.g.differencebetweenloopclosingandnormalconditional5WhyworryaboutdifferenttypeWaysofcheckingconditionConditionalbranchesneedtoevaluateapredicateTwomainapproachesResultstateIBM360,370,PDP-11,VAX-11,x86,Pentium,MC68000,Sparc,PowerPC.DirectCheckPDP-10,Cyber/70,PDP-8,CRAY,MIPS,HPPA,DecAlpha6WaysofcheckingconditionCondResultStateResultstateisdeclaredtoholdstatusinformationrelatedtoresultofoperationTypicalimplementationisconditioncodesorflagregisterswhichareupdatedaftereveryarithmeticresultisproducedConditionalbranchinstructionsinterrogatetheflagsinsubsequentinstructions7ResultStateResultstateisdeResultstatedisadvantagesThegenerationofresultstateisnotstraightforward;irregularstructureoccupiesadditionalchipareaMakespipelinelongerALUTest>0<0=01ClockCycle8ResultstatedisadvantagesTheResultstatedisadvantages...Sequentialinconcept.HowdowepackinstructionsinSuperscalarandVLIWmachine?addr1,r2,r3subr5,r6,r7bltta1bgtta2addr1,r2,r3subr5,r6,r7bltta1bgtta29Resultstatedisadvantages...DirectCheckNoresultstateisdeclaredSpecifiedconditionsaredirectlycheckedbyexplicitinstructionsConditionalbranchingcanberequestedifthespecifiedconditionsaremet.MayinvolveoneortwoinstructionsFitsbetterintosuperscalararchitecturesShorterclockcyclebecauseonlycheckwhennecessary10DirectCheckNoresultstateisComparisonofconditionalbranchesTwoinstructionImplementationadd r1,r2,r3cmpeq r7,r1,0bt r7,labeldiv r5,r4,r1OneinstructionImplementationadd r1,r2,r3beq r1,labeldiv r5,r4,r111ComparisonofconditionalbranTheeffectofbranchesNeedtounderstandwhetherbranchesaretakenornotThentailorthearchitecturetoperformfastestonmostcommoncase.Itturnsoutthatmostbranchesaretaken…..FetchDec/RdALUWriteFetchDec/RdALUWriteFetch12TheeffectofbranchesNeedtoBranchstatistics...UnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranch~1/3~1/3~1/3Takenforfirstn-1iterationsTakenNotTaken~1/6~1/6Taken~5/613Branchstatistics...UnconditiBranchHandlingUtilizingbranchdelayslotsHandlingofunresolvedconditionalbranchesAvoidingcond.branchesDelayedBranchingPerformanceBlockingbranchproc.Speculativebranchproc.Multiwaybranchproc.BranchprocessingGuardedExecution14BranchHandlingUtilizingbrancBranchHandlingDelayedbranchingUtiliseotherwisewastedcyclesfollowingbranches.AchievedbyinsertinganinstructionbehindthebranchandexecutingitbeforethebranchUtilizedonEarlyandSubsequentRISCarchitectures15BranchHandlingDelayedbranchiDelayedbranchingFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDeadDeadaddbsubadd r1,r2,r3b anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r616DelayedbranchingFetchDec/RdALDelayedbranching...FetchDec/RdALUWriteFetchDec/RdALUWriteaddbdivadd r1,r2,r3bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6FetchDec/RdALUWriteFetchDec/RdALUWriteaddDelayedBranchFetchDec/RdALUWritesub17Delayedbranching...FetchDec/DelayedbranchingoptionsCanreducetosingledelayslotiftargetaddressavailableatendofdecodephaseCompilerplacesNOPsinslotsandmigratesinstructionsintoslotsusingcodemigrationCompilermustperformdataflowanalysis18DelayedbranchingoptionsCanrCodemigrationtofillslotsadd r8,r9,r10bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6NOPNOPadd r3,r2,r6bd anywherediv r5,r4,r1anywhere: sub ………..add r8,r9,r1019CodemigrationtofillslotsadDoesthisworkwithconditionalbranches/?Yes,butcodemigratedintodelayslotmustbeperformedunconditionallyadd r8,r9,r10beq r6,anywherediv r5,r4,r1add r3,r2,r6NOPNOPadd r3,r2,r6div r5,r4,r1add r8,r9,r10beq r6,anywhere20DoesthisworkwithconditionaOptimisations-AnnulmentAnnuldelayslotifbranchnottakenUsefulforbackwardconditionalbranchesMakesitpossibletomovealoopbodyinstructionintothedelayslotAnnuldelayslotifbranchistakenUsefulforforwardconditionalbranchesMakesitpossibletorelocateaninstructionfromsequentialpathintothedelayslot21Optimisations-AnnulmentAnnulAnnuldelayslotifbranchnottaken1234BrCDelayLoopbodyConditionalBranchwithnoAnnulmentLooprequires5instructions22AnnuldelayslotifbranchnotAnnuldelayslotifbranchnottaken...1234BrCDelayLoopbodyConditionalBranchwithAnnulment1Looprequires4instructions23AnnuldelayslotifbranchnotAnnuldelayslotifbranchistaken1ConditionalBranchwithnoAnnulment23BrCDelay4524AnnuldelayslotifbranchisAnnuldelayslotifbranchistaken1ConditionalBranchwithAnnulment23BrC4525AnnuldelayslotifbranchisFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteBranchdetectionIngeneral,theearlierabranchisdetected,thebetterthehandlingReducesnumberofdelayslotsrequiredEarlybranchdetectionInparallelLookaheadIntegratedwithfetchFetchDec/RdALUWrite26FetchDec/RdALUWriteFetchDec/RdEarlybranchdetectionInParallelBranchesaredetectedinparallelwithdecodeofotherinstructionsusingadedicatedbranchdecoderLookaheadBranchesaredetectedfromtheinstructionbufferbutaheadofgeneralinstructiondecodingIntegratedBranchesaredetectedduringinstructionfetch27EarlybranchdetectionInParalConditionalbranchesTheproblemwithconditionalbranchesisthatwemaynotknowuntillateintheinstructionwhetherwewishtotakeorrejectthecontroltransferTakebranch?FetchDec/RdALUWritebz r1,tagetbzreadr1cmp0WritePC28ConditionalbranchesTheprobleHandlingunresolvedconditionalbranchesBlockingbranchprocessingSpeculativebranchprocessingBranchpredictionExtentofspeculationRecoveryfrommis-predictionMulti-waybranching29HandlingunresolvedconditionaBlockingbranchprocessingTrivialapproachExecutionofconditionalissimplystalleduntilconditionisknownResultstateCanpossiblyorderinstructionstoreducedelayinwaitingforconditioncodestobesetDirectcheckingNeedtowaitforALUresultinthisinstructionSetCCBrCFetchDec/RdALUWrite30BlockingbranchprocessingTrivSpeculativebranchprocessingPipelinestallscanbeavoidedAfterdetectionofunresolvedconditionalbranchaguessismadeoftheoutcomeIfguessiscorrectSpeculationconfirmedandcontinuedexecutionIfguessisincorrectInstructionsdiscardedFetchrestarted31SpeculativebranchprocessingPBranchpredictionschemesFixedpredictionTruepredictionA“true”guessismadeStaticPredictionBasedonobjectcodeDynamicPredictionBasedonexecutionhistory32BranchpredictionschemesFixedFixedPrediction
(GuessNotTaken)DetectunresolvedconditionalbranchandguessasnottakenContinuewiththeexecutionofthesequentialpath,butstarttheexecutionoftakenpath(e.g.calculateBTA)Whenconditionalbecomesavailable,checktheguessIfcorrect,continuewithexecutiondeletetakenpathpre-processingIfincorrect,deletespeculativeexecutionandcontinuewithtakenpath33FixedPrediction
(GuessNotFixedPrediction
(GuessNotTaken)ConditionKnownEvaluateBTACORRECT34FixedPrediction
(GuessNotFixedPrediction
(GuessNotTaken)ConditionKnownEvaluateBTAINCORRECT35FixedPrediction
(GuessNotFixedPrediction
(GuessTaken)ConditionKnownEvaluateBTAINCORRECT36FixedPrediction
(GuessTakeStaticPredictionLikefixedprediction,butsomepropertiesoftheobjectcodeareusedtodecidewhethertousealwaystakenornottakenHaspossibilityofperformingbettersincesomeinformationisused.37StaticPredictionLikefixedprStaticPrediction...TypicaloptionsareOpcodebasedpredictionForcertainopcodesthebranchisassumedtobetaken,forothersnottaken.DisplacementbasedpredictionIfdisplacement<0taken,elsenottakenCompilerdirectedPredictbitininstructionsetbycompileranalysisortracedataOpBTASignbit38StaticPrediction...TypicaloDynamicpredictionPredictionismadeonbranchhistoryPhilosophyisthathistoryisgoodguidetofuturebehaviourGoodforloopsbecausetend
toiteratemultipletimesGoodforsomeconditionalbranchesdependingonbehaviourofdataGoodforexceptionconditioncheckingWhatHappenedLastTime?39DynamicpredictionPredictioniDynamicPredictionTechniquesExplicittechniqueBranchhistoryexplicitlystatedhistorybits1,2or3bitschemesNumberofbitsrepresentlikelihoodofbranchbeingtakeninfutureImplicittechniqueBranchbeing“recorded”isanindicationthatbranchwastakenRoughlyequivalentto1bitscheme40DynamicPredictionTechniquesEBranchHistoryBitsForeachbranch,recordastatusfield.OnebitstatusDidlastoccurrenceofbranchoccur?TwobitstatusStatetabledecides
whichstateThreebitschemeHistoryoflast3branchesMajoritydecisiononoutcome41BranchHistoryBitsForeachbrStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTPredictionTakenPredictionNotTaken42Statetransitionsin2bitschStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTStronglyNOTTakenANTWeaklyNOTTakenATWeaklyTakenATANTWeaklyNOTTakenATWeaklyTaken43Statetransitionsin2bitschImplicitDynamicSchemeTwomainschemesBranchTargetAccessCache(BTAC)BranchTargetInstructionCache(BTIC)BothschemesintroduceanextracacheEntriesareonlystoredfortakenbranchesNottakenbranchesarenotstoresIfanentryisincache,thennextbranchistakenBehaveslike1bit44ImplicitDynamicSchemeTwomaiBTACandBTICimplementationStoreBranchAddressPLUSThetargetaddressitself(BTAC)Theinstructionatthetarget(BTIC)10002000LoadR1,….100020001000LoadR1,….45BTACandBTICimplementationStBTACimplementation...LoadR1,….1000200010002000Don’ttakebranchSpeculativelyWhoops!AddthisbranchaddresstothecacheNexttimeTakebranchSpeculatively46BTACimplementation...LoadR1PerformancedifferencesinBTACandBTICBTACdeliversaddressofbranchintimefornextfetchBTICdeliversactualinstruction.CanbeslowerbecauseitdoesnotrequirefetchLookupBTACFetchDec/RdExeWriteFetchDec/RdExeWriteLookupBTIC47PerformancedifferencesinBTAImplementationofHistorybitsPlacementofhistorybitsI-cacheAlpha,UltraSparcBranchHistoryTable(BHT)PowerPC604,620,R10000BranchTargetAddressCache(BTAC)MC68060,Pentium,R8000I-cacheandBHTeffectivelythesame48ImplementationofHistorybitsI-CacheandBHT(PowerPC604)I-cache16KFourwaysetassociativeInstructionfetchaddressBHT128x4entriesPredictionLogic4instr./cycle2HistoryBitsTaken/NottakenBTAforatakenguessDecodeQueue4x1Instr.IssueQueue49I-CacheandBHT(PowerPC604)PredictionAccuracyP=fc*Pc+fm*Pmwherefc: Probabilityofcorrectlypredictingbranchesfm: Probabilityofmis-predictingbranchesPc: PenaltyofcorrectlypredictedbranchesPm: Penaltyofmis-predictedbranches50PredictionAccuracyP=fc*PcPredictionAccuracy….51PredictionAccuracy….51RecoveryfromMispredictionIfbranchpredictionhardwaremakeswrongguess,needtoreverttoalternativepathofexecution.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWrite?52RecoveryfromMispredictionIfRecoveryfromMisprediction...Forpipelinedmachines,maybepossibletoabortregisterwritesStoreinstructionsarehardertorecoverConditioncodesmightalsoneedtoberestored.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteRegisterFile53RecoveryfromMisprediction..Schemestoshortenmis-predictionrecoveryBasicpriormeasuresforrecoveryIna“taken”guess,savesequentialaddressIna“nottaken”guess,pre-calculateandsavebranchtargetaddressRequirestwoaddressregistersperspeculatedconditionalbranchSavethisaddressSavethis
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025年智能质检与服务机器人结合:客户现场质量检测方案
- 园林古建筑修复技术方案
- 2025年七年级上册生物全册知识提纲(人教版)
- 教育行业专注承诺书7篇
- 纺织服装行业智能物流配送系统建设方案
- 高新技术领域创新成果保证承诺书范文9篇
- 护理岗位人才需求趋势
- 感恩教育:感谢那些人那些事小学主题班会课件
- 工业园区建设与开发全流程手册
- 医疗机构院感防控-医疗安全继续培训
- 2026年入党积极分子培训考试试题及答案
- 2026-2026年中考英语易错题汇编
- 2026新教材语文 16.1《阿房宫赋》教学课件统编版高中语文必修下册
- 2026年上海市宝山区中考数学二模试卷(含解析)
- 2026春青岛版(五四制)三年级科学下册(全册)各单元知识点复习要点梳理
- 断肢再植术后血液循环观察指标及护理要点
- 2026年国企面试心理测试题及答案
- 2025旅游景区质量等级评分细则
- 学生饮水卫生安全课件
- 2026年潍坊三模数学测试题及答案
- 220kV主变中性点隔直装置使用及维护
评论
0/150
提交评论