Branch Instructions
Sima, Fountain and Kacsuk
Chapter 8
CSE3304

Major Chapter Goals
To understand how to minimise the performance degradation of branches
- Basic approach to branch handling
- Delayed branching
- Branch processing
- Multi-way branching
- Guarded Execution

Types of branch instructions
- Unconditional Branches
  - Simple Unconditional Branch
  - Branch to Subroutine
  - Return from Subroutine
- Conditional Branches
  - Loop closing conditional branch
  - Other conditional branch

    Branch type                        Alpha         PowerPC
    Simple unconditional branch        BR ta         b ta
    Branch to subroutine               BSR ta        bl ta
    Return from subroutine             RET ta        bclr
    Loop closing conditional branch    BLT R1,ta     bdnz ta
    Other conditional branch           BNE R1,ta     bne ta

Types of branch instructions (code fragments added on the following slide)
    Simple unconditional branch:       BR ta  ...  ta:
    Branch to subroutine:              BSR ta ...  ta:
    Return from subroutine:            RET
    Loop closing conditional branch:   ta: SUBL R1,1,R1
                                           BLT R1,ta
    Other conditional branch:          BNE R1,ta ...  ta:

Why worry about different types of branches?
- Branches cause stalls of pipelines
- Different techniques for minimizing branch effects
- Take advantage of the different types of branch
  - e.g. an unconditional branch ALWAYS occurs. Therefore, hardware can plan for the branch in advance.
  - e.g. the difference between loop closing and normal conditional branches

Ways of checking condition
- Conditional branches need to evaluate a predicate
- Two main approaches
  - Result state: IBM 360, 370, PDP-11, VAX-11, x86, Pentium, MC68000, SPARC, PowerPC
  - Direct check: PDP-10, Cyber/70, PDP-8, CRAY, MIPS, HP PA, DEC Alpha

Result State
- Result state is declared to hold status information related to the result of an operation
- Typical implementation is condition codes or flag registers, which are updated after every arithmetic result is produced
- Conditional branch instructions interrogate the flags in subsequent instructions

Result state disadvantages
- The generation of result state is not straightforward; the irregular structure occupies additional chip area
- Makes the pipeline longer
[Figure: ALU followed by a Test stage (>0, <0, =0) within 1 clock cycle]

Result state disadvantages...
- Sequential in concept. How do we pack instructions in a superscalar or VLIW machine?
    add r1,r2,r3
    sub r5,r6,r7
    blt ta1
    bgt ta2
[Figure: the same four instructions shown a second time, packed side by side]

Direct Check
- No result state is declared
- Specified conditions are directly checked by explicit instructions
- Conditional branching can be requested if the specified conditions are met
- May involve one or two instructions
- Fits better into superscalar architectures
- Shorter clock cycle because the check is made only when necessary

Comparison of conditional branches
- Two instruction implementation
    add   r1,r2,r3
    cmpeq r7,r1,0
    bt    r7,label
    div   r5,r4,r1
- One instruction implementation
    add   r1,r2,r3
    beq   r1,label
    div   r5,r4,r1

The effect of branches
- Need to understand whether branches are taken or not
- Then tailor the architecture to perform fastest on the most common case
- It turns out that most branches are taken...
[Figure: pipeline diagram with stages Fetch, Dec/Rd, ALU, Write]

Branch statistics...
- Unconditional branches (simple, branch to subroutine, return from subroutine): ~1/3 of all branches, always taken
- Loop closing conditional branches: ~1/3, taken for the first n-1 iterations
- Other conditional branches: ~1/3, of which ~1/6 of all branches are taken and ~1/6 not taken
- Overall: taken ~5/6

Branch Handling
- Delayed branching: utilizing branch delay slots
- Branch processing: handling of unresolved conditional branches
  - Blocking branch proc.
  - Speculative branch proc.
  - Multiway branch proc.
- Guarded Execution: avoiding cond. branches
[Figure: classification tree; it also carries the label "Performance"]

Branch Handling: Delayed branching
- Utilise otherwise wasted cycles following branches
- Achieved by inserting an instruction behind the branch and executing it before the branch takes effect
- Utilized on early and subsequent RISC architectures

Delayed branching
              add  r1,r2,r3
              b    anywhere
              div  r5,r4,r1
              ...
    anywhere: sub  ...
              add  r3,r2,r6
[Figure: pipeline diagram (Fetch, Dec/Rd, ALU, Write) for add, b, sub, with two slots marked Dead after the branch]

Delayed branching...
              add  r1,r2,r3
              bd   anywhere
              div  r5,r4,r1
              ...
    anywhere: sub  ...
              add  r3,r2,r6
[Figure: pipeline diagram (Fetch, Dec/Rd, ALU, Write) for add, delayed branch, div, sub]
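
The difference between the plain branch `b` and the delayed branch `bd` in the two listings above can be illustrated with a small model. This is only a sketch under stated assumptions: one delay slot, forward unconditional branches only, and the helper names `execution_order` and `make_program` are made up for the illustration. The operands of `sub` are elided in the slides and are left elided here.

```python
def execution_order(program, delay_slots=1):
    """Return the instruction texts in the order they execute.

    program: list of (label, text) pairs. "b <label>" is a plain branch,
    "bd <label>" is a delayed branch whose delay-slot instructions still
    execute before control transfers. Assumes forward branches only.
    """
    labels = {label: index for index, (label, _) in enumerate(program) if label}
    order = []
    pc = 0
    while pc < len(program):
        _, text = program[pc]
        order.append(text)
        op, _, target = text.partition(" ")
        if op == "b":
            # Plain branch: the following instruction is skipped.
            pc = labels[target]
        elif op == "bd":
            # Delayed branch: the delay-slot instructions run first.
            slot_end = pc + 1 + delay_slots
            order.extend(slot_text for _, slot_text in program[pc + 1:slot_end])
            pc = labels[target]
        else:
            pc += 1
    return order


def make_program(branch_op):
    """Build the slide's example with either 'b' or 'bd' as the branch."""
    return [
        (None, "add r1,r2,r3"),
        (None, f"{branch_op} anywhere"),
        (None, "div r5,r4,r1"),
        ("anywhere", "sub ..."),  # operands elided in the source
        (None, "add r3,r2,r6"),
    ]


plain_order = execution_order(make_program("b"))
delayed_order = execution_order(make_program("bd"))
print(plain_order)
print(delayed_order)
```

With the plain branch the `div` is never executed; with the delayed branch it runs in the slot that would otherwise be dead.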

Delayed branching options
- Can reduce to a single delay slot if the target address is available at the end of the decode phase
- Compiler places NOPs in slots and migrates instructions into slots using code migration
- Compiler must perform data flow analysis

Code migration to fill slots
- Listing 1
              add  r8,r9,r10
              bd   anywhere
              div  r5,r4,r1
    anywhere: sub  ...
              add  r3,r2,r6
- Listing 2
              add  r3,r2,r6
              bd   anywhere
              div  r5,r4,r1
    anywhere: sub  ...
              add  r8,r9,r10
[Figure: the slide also marks two NOP slots; their position relative to the listings is not recoverable from the extracted text]

Does this work with conditional branches?
- Yes, but code migrated into the delay slot must be performed unconditionally
- Listing 1
    add  r8,r9,r10
    beq  r6,anywhere
    div  r5,r4,r1
    add  r3,r2,r6
- Listing 2
    add  r3,r2,r6
    div  r5,r4,r1
    add  r8,r9,r10
    beq  r6,anywhere
[Figure: the slide also marks two NOP slots; their position relative to the listings is not recoverable from the extracted text]

Optimisations - Annulment
- Annul delay slot if branch not taken
  - Useful for backward conditional branches
  - Makes it possible to move a loop body instruction into the delay slot
- Annul delay slot if branch is taken
  - Useful for forward conditional branches
  - Makes it possible to relocate an instruction from the sequential path into the delay slot

Annul delay slot if branch not taken
- Conditional branch with no annulment
- Loop requires 5 instructions
[Figure: loop body 1, 2, 3, 4, then BrC and its delay slot]

Annul delay slot if branch not taken...
- Conditional branch with annulment
- Loop requires 4 instructions
[Figure: loop body 1, 2, 3, 4, then BrC with instruction 1 in the delay slot]

Annul delay slot if branch is taken
- Conditional branch with no annulment
[Figure: instructions 1, 2, 3, BrC, delay slot, 4, 5]

Annul delay slot if branch is taken
- Conditional branch with annulment
[Figure: instructions 1, 2, 3, BrC, 4, 5]

Branch detection
- In general, the earlier a branch is detected, the better the handling
- Reduces the number of delay slots required
- Early branch detection
  - In parallel
  - Look ahead
  - Integrated with fetch
[Figure: pipeline diagram with stages Fetch, Dec/Rd, ALU, Write]

Early branch detection
- In parallel: branches are detected in parallel with the decode of other instructions, using a dedicated branch decoder
- Look ahead: branches are detected from the instruction buffer, but ahead of general instruction decoding
- Integrated: branches are detected during instruction fetch
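
The code migration step mentioned under the delayed branching options (the compiler fills a delay slot after data flow analysis) might be sketched as follows. This is a simplified illustration, not a real compiler pass: it assumes a single delay slot, considers only register dependences, and only moves an instruction from before the branch, which is safe for conditional branches too because such an instruction executes on both paths. The `Instr` type, the register names and the function names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


NOP = Instr("nop", None, ())


def can_move_past(candidate, later):
    """True if `candidate` may be moved after every instruction in `later`."""
    for other in later:
        if candidate.dest is not None and candidate.dest in other.srcs:
            return False  # a later instruction reads the candidate's result
        if candidate.dest is not None and candidate.dest == other.dest:
            return False  # both write the same register
        if other.dest is not None and other.dest in candidate.srcs:
            return False  # a later instruction overwrites an input
    return True


def fill_delay_slot(block):
    """block ends with [..., branch, NOP]; return a block with the slot filled.

    If no earlier instruction can be moved safely, the NOP is kept.
    """
    if len(block) < 2 or block[-1] != NOP:
        raise ValueError("expected a branch followed by a NOP delay slot")
    branch_index = len(block) - 2
    for i in range(branch_index - 1, -1, -1):
        between = block[i + 1:branch_index + 1]  # includes the branch itself
        if can_move_past(block[i], between):
            return block[:i] + between + [block[i]]
    return list(block)


block = [
    Instr("add", "r8", ("r9", "r10")),  # independent of the branch condition
    Instr("sub", "r6", ("r6", "r7")),   # produces the branch condition
    Instr("beq", None, ("r6",)),
    NOP,
]
filled = fill_delay_slot(block)
print([instr.op for instr in filled])
```

Here the `sub` cannot move because the branch reads `r6`, so the independent `add` is migrated into the slot instead.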

Conditional branches
- The problem with conditional branches is that we may not know until late in the instruction whether we wish to take or reject the control transfer
- Take branch?
    bz r1,target
[Figure: pipeline diagram (Fetch, Dec/Rd, ALU, Write) for bz: read r1, cmp 0, write PC]

Handling unresolved conditional branches
- Blocking branch processing
- Speculative branch processing
  - Branch prediction
  - Extent of speculation
  - Recovery from mis-prediction
- Multi-way branching

Blocking branch processing
- Trivial approach
- Execution of the conditional branch is simply stalled until the condition is known
- Result state
  - Can possibly order instructions to reduce the delay in waiting for condition codes to be set
- Direct checking
  - Need to wait for the ALU result in this instruction
[Figure: SetCC followed by BrC; pipeline diagram with stages Fetch, Dec/Rd, ALU, Write]

Speculative branch processing
- Pipeline stalls can be avoided
- After detection of an unresolved conditional branch, a guess is made of the outcome
- If the guess is correct
  - Speculation confirmed and execution continues
- If the guess is incorrect
  - Instructions discarded
  - Fetch restarted

Branch prediction schemes
- Fixed prediction
- True prediction: a "true" guess is made
  - Static prediction: based on object code
  - Dynamic prediction: based on execution history

Fixed Prediction (Guess Not Taken)
- Detect unresolved conditional branch and guess as not taken
- Continue with the execution of the sequential path, but start the execution of the taken path (e.g. calculate BTA)
- When the condition becomes available, check the guess
- If correct, continue with execution and delete the taken path pre-processing
- If incorrect, delete the speculative execution and continue with the taken path

Fixed Prediction (Guess Not Taken)
[Figure: timeline with "Evaluate BTA" and "Condition Known"; guess turns out CORRECT]

Fixed Prediction (Guess Not Taken)
[Figure: timeline with "Evaluate BTA" and "Condition Known"; guess turns out INCORRECT]

Fixed Prediction (Guess Taken)
[Figure: timeline with "Evaluate BTA" and "Condition Known"; guess turns out INCORRECT]
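
How often each fixed guess is right can be estimated from the branch statistics quoted earlier. The following is a rough sketch rather than a measurement: it assumes the approximate shares from the statistics slide and treats loop closing branches as always taken, which ignores the single not-taken exit of each loop. The names used are illustrative.

```python
from fractions import Fraction

# (share of all branches, fraction of that class that is taken)
# Approximate figures from the branch statistics slide.
# Assumption: loop closing branches are counted as always taken.
BRANCH_CLASSES = {
    "unconditional": (Fraction(1, 3), Fraction(1)),
    "loop_closing": (Fraction(1, 3), Fraction(1)),
    "other_conditional": (Fraction(1, 3), Fraction(1, 2)),
}


def taken_fraction(classes):
    """Fraction of the branches in `classes` that are taken."""
    total = sum(share for share, _ in classes.values())
    taken = sum(share * p_taken for share, p_taken in classes.values())
    return taken / total


# Over all branches.
all_taken = taken_fraction(BRANCH_CLASSES)

# Fixed prediction only concerns conditional branches.
conditional = {
    name: value
    for name, value in BRANCH_CLASSES.items()
    if name != "unconditional"
}
guess_taken_accuracy = taken_fraction(conditional)
guess_not_taken_accuracy = 1 - guess_taken_accuracy

print(all_taken, guess_taken_accuracy, guess_not_taken_accuracy)
```

Under these assumptions a fixed "taken" guess is right for about three quarters of conditional branches and a fixed "not taken" guess for about one quarter.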

Static Prediction
- Like fixed prediction, but some properties of the object code are used to decide whether to guess taken or not taken
- Has the possibility of performing better, since some information is used

Static Prediction...
- Typical options are
  - Opcode based prediction: for certain opcodes the branch is assumed to be taken, for others not taken
  - Displacement based prediction: if displacement < 0 predict taken, else not taken
  - Compiler directed: a predict bit in the instruction is set by the compiler from analysis or trace data
[Figure: instruction format with fields Op and BTA; the sign bit of the BTA field is highlighted]

Dynamic prediction
- Prediction is made on branch history
- Philosophy is that history is a good guide to future behaviour
- Good for loops, because they tend to iterate multiple times
- Good for some conditional branches, depending on the behaviour of the data
- Good for exception condition checking
- What happened last time?

Dynamic Prediction Techniques
- Explicit technique
  - Branch history is explicitly stated in history bits
  - 1, 2 or 3 bit schemes
  - The bits represent the likelihood of the branch being taken in future
- Implicit technique
  - A branch being "recorded" is an indication that the branch was taken
  - Roughly equivalent to a 1 bit scheme

Branch History Bits
- For each branch, record a status field
- One bit status
  - Was the branch taken at its last occurrence?
- Two bit status
  - A state table decides which state
- Three bit scheme
  - History of the last 3 branches
  - Majority decision on outcome
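
A two bit scheme is often implemented as a saturating counter, and the sketch below assumes that particular state table: each taken outcome moves one state towards "strongly taken", each not-taken outcome moves one state towards "strongly not taken". The slides do not spell out the transitions in recoverable form, so this table is an assumption, and the class and function names are illustrative.

```python
STATES = (
    "strongly not taken",
    "weakly not taken",
    "weakly taken",
    "strongly taken",
)


class TwoBitPredictor:
    """Two bit history for one branch, modelled as a saturating counter."""

    def __init__(self, state=2):
        self.state = state  # index into STATES, 0..3

    def predict(self):
        """Predict taken in the two 'taken' states."""
        return self.state >= 2

    def update(self, taken):
        """Move one step towards the actual outcome, saturating at the ends."""
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(predictor, outcomes):
    """Run the predictor over a sequence of actual outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop closing branch: taken three times, then not taken, executed twice.
loop_outcomes = [True, True, True, False] * 2
misses = count_mispredictions(TwoBitPredictor(state=2), loop_outcomes)
print(misses)
```

Starting from "weakly taken", only the loop exits are mispredicted: a single not-taken outcome does not flip the prediction for the next execution of the loop.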

State transitions in 2 bit scheme
- For each branch
- AT = actually taken, ANT = actually not taken
- States: Strongly Taken, Weakly Taken, Weakly NOT Taken, Strongly NOT Taken
- The two Taken states give the prediction "taken"; the two NOT Taken states give the prediction "not taken"
[Figure: state diagram with AT and ANT transitions between the four states]

State transitions in 2 bit scheme
[Figure: the same state diagram with an example sequence; recoverable labels: Strongly NOT Taken, ANT, Weakly NOT Taken, AT, Weakly Taken, AT, ANT, Weakly NOT Taken, AT, Weakly Taken]

Implicit Dynamic Scheme
- Two main schemes
  - Branch Target Address Cache (BTAC)
  - Branch Target Instruction Cache (BTIC)
- Both schemes introduce an extra cache
- Entries are only stored for taken branches
- Not taken branches are not stored
- If an entry is in the cache, the branch is predicted taken next time
- Behaves like a 1 bit scheme

BTAC and BTIC implementation
- Store the branch address PLUS
  - the target address itself (BTAC)
  - the instruction at the target (BTIC)
[Figure: branch at address 1000 with target 2000, where the target instruction is "Load R1,..."; the BTAC entry holds 1000 and 2000, the BTIC entry holds 1000 and "Load R1,..."]

BTAC implementation...
- Branch not in cache: don't take the branch speculatively
- Whoops! Add this branch address to the cache
- Next time: take the branch speculatively
[Figure: branch at address 1000, target 2000 ("Load R1,..."), BTAC entry 1000 / 2000]

Performance differences in BTAC and BTIC
- BTAC delivers the branch target address in time for the next fetch
- BTIC delivers the actual instruction. Can be faster because it does not require a fetch
[Figure: pipeline diagrams; Lookup BTAC is followed by Fetch, Dec/Rd, Exe, Write, while Lookup BTIC replaces the fetch of the target instruction]

Implementation of History bits
- Placement of history bits
  - I-cache: Alpha, UltraSparc
  - Branch History Table (BHT): PowerPC 604, 620, R10000
  - Branch Target Address Cache (BTAC): MC68060, Pentium, R8000
- I-cache and BHT are effectively the same

I-Cache and BHT (PowerPC 604)
[Figure: block diagram; recoverable labels: instruction fetch address; I-cache, 16K, four way set associative; BHT, 128x4 entries, 2 history bits; prediction logic (taken / not taken); BTA for a taken guess; 4 instr./cycle; decode queue; 4x1 instr. issue queue]

Prediction Accuracy
    P = fc * Pc + fm * Pm
where
    fc: probability of correctly predicting branches
    fm: probability of mis-predicting branches
    Pc: penalty of correctly predicted branches
    Pm: penalty of mis-predicted branches
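
The penalty formula above is easy to evaluate for trial values. The sketch below assumes that every branch is either correctly or incorrectly predicted, so fm = 1 - fc; the numbers in the example (90% accuracy, no penalty when correct, 4 cycles when wrong) are made up for illustration and do not come from the slides.

```python
from fractions import Fraction


def expected_branch_penalty(f_correct, penalty_correct, penalty_mispredicted):
    """P = fc * Pc + fm * Pm, with fm = 1 - fc (assumption)."""
    if not 0 <= f_correct <= 1:
        raise ValueError("f_correct must be a probability between 0 and 1")
    f_mispredicted = 1 - f_correct
    return f_correct * penalty_correct + f_mispredicted * penalty_mispredicted


# Hypothetical values, in cycles per branch.
penalty = expected_branch_penalty(Fraction(9, 10), 0, 4)
print(penalty)
```

Using exact fractions keeps the arithmetic free of rounding; with these example values the average cost is two fifths of a cycle per branch.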

Prediction Accuracy...
[Figure: not recoverable from the extracted text]

Recovery from Misprediction
- If the branch prediction hardware makes a wrong guess, need to revert to the alternative path of execution
[Figure: pipeline diagram with stages Fetch, Dec/Rd, ALU, Write; the next instruction is marked "?"]

Recovery from Misprediction...
- For pipelined machines, it may be possible to abort register writes
- Store instructions are harder to recover
- Condition codes might also need to be restored
[Figure: pipeline diagram with stages Fetch, Dec/Rd, ALU, Write, and the register file]

Schemes to shorten mis-prediction recovery
- Basic prior measures for recovery
  - In a "taken" guess, save the sequential address
  - In a "not taken" guess, pre-calculate and save the branch target address
- Requires two address registers per speculated conditional branch
[Figure: two code locations, each marked "Save this address"]

Schemes to shorten mis-prediction recovery...
- Enhanced prior measures to shorten recovery
  - In a "taken" guess
    - save the sequential address
    - save the prefetched sequential instructions
  - In a "not taken" guess
    - pre-calculate and save the branch target address
    - pre-fetch the branch target instructions
  - Requires multiple instruction buffers

Schemes to shorten mis-prediction recovery...
- SuperSparc (1992): dual I-buffers
[Figure: instructions from the I-cache feed a sequential I-buffer and a target I-buffer, both leading to Decode; the target I-buffer is loaded when a branch has been detected]

Multiway Branching
- Follow multiple paths concurrently
- Requires additional processing hardware
  - for storing extra state
  - for multiple registers
  - for additional arithmetic units

Multiway branching...
[Figure: two paths, each with its own registers, followed until "Condition Known"]

Guarded Execution
- Many branches can be eliminated by using special guarded instructions
- Guarded instructions behave differently depending on a predicate
- Consider the following example
    beq    ra,label     if (ra) = 0 branch to label
    or     rb,rb,rc     else move (rb) into rc
- Replaced by
    cmovne ra,rb,rc

General form of Guarded Execution
- General form is (guard) instruction
- Instruction is only executed if the guard predicate is true
- Possible implementation by doing the operation and conditionally writing the register file
[Figure: pipeline diagram with stages Fetch, Dec/Rd, ALU, Write]

Effectiveness of Guarded Execution
[Figure: not recoverable from the extracted text]
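
The guarded execution example (a `beq` over an `or` replaced by a single `cmovne`) can be checked with a small register-level model. This is an illustrative sketch only: registers are modelled as a Python dictionary, the function names are made up, and the guarded version follows the implementation idea from the slides of doing the operation unconditionally and making only the register write conditional.

```python
def branch_version(regs):
    """Model of: beq ra,label ; or rb,rb,rc ; label:"""
    regs = dict(regs)
    if regs["ra"] == 0:
        return regs                       # beq taken: the move is skipped
    regs["rc"] = regs["rb"] | regs["rb"]  # or rb,rb,rc copies rb into rc
    return regs


def guarded_version(regs):
    """Model of: cmovne ra,rb,rc"""
    regs = dict(regs)
    result = regs["rb"]                   # operation is always performed
    write_enable = regs["ra"] != 0        # guard predicate
    if write_enable:
        regs["rc"] = result               # only the register write is guarded
    return regs


for ra in (0, 5):
    start = {"ra": ra, "rb": 7, "rc": 1}
    same = branch_version(start) == guarded_version(start)
    print(ra, branch_version(start)["rc"], same)
```

Both versions leave `rc` untouched when `ra` is zero and copy `rb` into `rc` otherwise, but the guarded form contains no control transfer.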