28-Exploiting Choice - Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor.pdf

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

Dean M. Tullsen*, Susan J. Eggers*, Joel S. Emer†, Henry M. Levy*, Jack L. Lo*, and Rebecca L. Stamm†

*Dept of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350
†Digital Equipment Corporation, HLO2-3/J3, 77 Reed Road, Hudson, MA 01749

Proceedings of the 23rd Annual International Symposium on Computer Architecture, Philadelphia, PA, May, 1996

Abstract

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the “best” instructions to the processor.

1 Introduction

Simultaneous multithreading (SMT) is a technique that permits multiple independent threads to issue multiple instructions each cycle to a superscalar processor’s functional units. SMT combines the multiple-instruction-issue features of modern superscalars with the latency-hiding ability of multithreaded architectures. Unlike conventional multithreaded architectures [1, 2, 15, 23], which depend on fast context switching to share processor execution resources, all hardware contexts in an SMT processor are active simultaneously, competing each cycle for all available resources. This dynamic sharing of the functional units allows simultaneous multithreading to substantially increase throughput, attacking the two major impediments to processor utilization: long latencies and limited per-thread parallelism. Tullsen, et al., [27] showed the potential of an SMT processor to achieve significantly higher throughput than either a wide superscalar or a multithreaded processor. That paper also demonstrated the advantages of simultaneous multithreading over multiple processors on a single chip, due to SMT’s ability to dynamically assign execution resources where needed each cycle.

Those results showed SMT’s potential based on a somewhat idealized model. This paper extends that work in four significant ways. First, we demonstrate that the throughput gains of simultaneous multithreading are possible without extensive changes to a conventional, wide-issue superscalar processor. We propose an architecture that is more comprehensive, realistic, and heavily leveraged off existing superscalar technology. Our simulations show that a minimal implementation of simultaneous multithreading achieves throughput 1.8 times that of the unmodified superscalar; small tuning of this architecture increases that gain to 2.5 (reaching throughput as high as 5.4 instructions per cycle). Second, we show that SMT need not compromise single-thread performance. Third, we use our more detailed architectural model to analyze and relieve bottlenecks that did not exist in the more idealized model. Fourth, we show how simultaneous multithreading creates an advantage previously unexploitable in other architectures: namely, the ability to choose the “best” instructions, from all threads, for both fetch and issue each cycle. By favoring the threads most efficiently using the processor, we can boost the throughput of our limited resources. We present several simple heuristics for this selection process, and demonstrate how such heuristics, when applied to the fetch mechanism, can increase throughput by as much as 37%.

This paper is organized as follows. Section 2 presents our baseline simultaneous multithreading architecture, comparing it with existing superscalar technology. Section 3 describes our simulator and our workload, and Section 4 shows the performance of the baseline architecture. In Section 5, we examine the instruction fetch process, present several heuristics for improving it based on intelligent instruction selection, and give performance results to differentiate those heuristics. Section 6 examines the instruction issue process in a similar way. We then use the best designs chosen from our fetch and issue studies in Section 7 as a basis to discover bottlenecks for further performance improvement. We discuss related work in Section 8 and summarize our results in Section 9.

This research was supported by ONR grants N00014-92-J-1395 and N00014-94-1-1136, NSF grants CCR-9200832 and CDA-9123308, NSF PYI Award MIP-9058439, the Washington Technology Center, Digital Equipment Corporation, and fellowships from Intel and the Computer Measurement Group.

[Figure 1: Our base simultaneous multithreading hardware architecture. Blocks shown: fetch unit with PCs, instruction cache (8 instructions wide), decode, register renaming, integer instruction queue, floating point instruction queue, integer registers, fp registers, int/ld-store units, fp units, data cache.]

2 A Simultaneous Multithreading Processor Architecture

In this section we present the architecture of our simultaneous multithreading processor. We show that the throughput gains provided by simultaneous multithreading are possible without adding undue complexity to a conventional superscalar processor design.

Our SMT architecture is derived from a high-performance, out-of-order, superscalar architecture (Figure 1, without the extra program counters) which represents a projection of current superscalar design trends 3-5 years into the future. This superscalar processor fetches up to eight instructions per cycle; fetching is controlled by a conventional system of branch target buffer, branch prediction, and subroutine return stacks. Fetched instructions are then decoded and passed to the register renaming logic, which maps logical registers onto a pool of physical registers, removing false dependences. Instructions are then placed in one of two instruction queues. Those instruction queues are similar to the ones used by the MIPS R10000 [20] and the HP PA-8000 [21], in this case holding instructions until they are issued. Instructions are issued to the functional units out-of-order when their operands are available. After completing execution, instructions are retired in-order, freeing physical registers that are no longer needed.

Our SMT architecture is a straightforward extension to this conventional superscalar design. We made changes only when necessary to enable simultaneous multithreading, and in general, structures were not replicated or resized to support SMT or a multithreaded workload. Thus, nearly all hardware resources remain completely available even when there is only a single thread in the system. The changes necessary to support simultaneous multithreading on that architecture are: multiple program counters and some mechanism by which the fetch unit selects one each cycle; a separate return stack for each thread for predicting subroutine return destinations; per-thread instruction retirement, instruction queue flush, and trap mechanisms; a thread id with each branch target buffer entry to avoid predicting phantom branches; and a larger register file, to support logical registers for all threads plus additional registers for register renaming. The size of the register file affects the pipeline (we add two extra stages) and the scheduling of load-dependent instructions, which we discuss later in this section.

Noticeably absent from this list is a mechanism to enable simultaneous multithreaded scheduling of instructions onto the functional units. Because any apparent dependences between instructions from different threads are removed by the register renaming phase, a conventional instruction queue (IQ) designed for dynamic scheduling contains all of the functionality necessary for simultaneous multithreading. The instruction queue is shared by all threads and an instruction from any thread in the queue can issue when its operands are available.

We fetch from one program counter (PC) each cycle. The PC is chosen, in round-robin order, from among those threads not already experiencing an I cache miss. This scheme provides simultaneous multithreading at the point of issue, but only fine-grain multithreading of the fetch unit. We will look in Section 5 at ways to extend simultaneous multithreading to the fetch unit. We also investigate alternative thread priority mechanisms for fetching.

A primary impact of multithreading on our architecture is on the size of the register file. We have a single register file, as thread-specific logical registers are mapped onto a completely shared physical register file by the register renaming. To support eight threads, we need a minimum of 8*32 = 256 physical integer registers (for a 32-register instruction set architecture), plus more to enable register renaming. Access to such a large register file will be slow, almost certainly affecting the cycle time of the machine.

To account for the size of the register file, we take two cycles to read registers instead of one. In the first cycle values are read into a buffer closer to the functional units. The instruction is sent to a similar buffer at the same time. The next cycle the data is sent to a functional unit for execution. Writes to the register file are treated similarly, requiring an extra register write stage. Figure 2 shows the pipeline modified for two-phase register access, compared to the pipeline of the original superscalar.

[Figure 2: The pipeline of (a) a conventional superscalar processor and (b) that pipeline modified for an SMT processor, along with some implications of those pipelines. (a) Stages: Fetch, Decode, Rename, Queue, Reg Read, Exec, Commit; misfetch penalty 2 cycles; register usage 4 cycle minimum; mispredict penalty 6 cycles. (b) Stages: Fetch, Decode, Rename, Queue, Reg Read, Reg Read, Exec, Reg Write, Commit; misfetch penalty 2 cycles; register usage 6 cycle minimum; mispredict penalty 7 cycles; misqueue penalty 4 cycles.]

The two-stage register access has several ramifications on our architecture. First, it increases the pipeline distance between fetch and exec, increasing the branch misprediction penalty by 1 cycle. Second, it takes an extra cycle to write back results, requiring an extra level of bypass logic. Third, increasing the distance between queue and exec increases the period during which wrong-path instructions remain in the pipeline after a misprediction is discovered (the misqueue penalty in Figure 2). Wrong-path instructions are those instructions brought into the processor as a result of a branch misprediction. Those instructions consume instruction queue slots, renaming registers and possibly issue slots, all of which, on an SMT processor, could be used by other threads.

This pipeline does not increase the inter-instruction latency between most instructions. Dependent (single-cycle latency) instructions can still be issued on consecutive cycles, for example, as long as inter-instruction latencies are predetermined. That is the case for all instructions but loads. Since we are scheduling instructions a cycle earlier (relative to the exec cycle), load-hit latency increases by one cycle (to two cycles).
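
The baseline fetch policy in Section 2 (one PC per cycle, chosen round-robin from among the threads not already experiencing an I cache miss) can be sketched in a few lines. This is only an illustrative model, not the paper's simulator: the function name `select_fetch_thread` and the boolean list `icache_miss` are assumptions introduced here.

```python
from typing import Optional, Sequence


def select_fetch_thread(last: int, icache_miss: Sequence[bool]) -> Optional[int]:
    """Pick the thread to fetch from this cycle.

    `last` is the thread chosen on the previous cycle; `icache_miss[t]` is True
    while thread t is waiting on an instruction cache miss (hypothetical inputs).
    Returns None when every thread is stalled, i.e. nothing is fetched.
    """
    n = len(icache_miss)
    # Walk the threads in round-robin order, starting just after `last`.
    for step in range(1, n + 1):
        candidate = (last + step) % n
        if not icache_miss[candidate]:
            return candidate
    return None


if __name__ == "__main__":
    # Thread 1 is stalled on an I cache miss, so the turn passes to thread 2.
    print(select_fetch_thread(0, [False, True, False, False]))
```

Because only one thread is picked per cycle, this models fine-grain multithreading of the fetch unit, which matches the text's remark that simultaneous multithreading in the baseline happens only at the point of issue.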
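
The shared register file described in Section 2 can be illustrated with a toy renamer: each thread keeps its own map table, all tables point into one physical register file, and so identical logical register names in different threads never collide. The class `SharedRenamer`, its method names, and the value of `extra` are assumptions made for this sketch; the text above only fixes the minimum of 8*32 = 256 physical integer registers plus an unspecified number for renaming.

```python
from typing import List, Tuple


class SharedRenamer:
    """Per-thread map tables over one shared physical register file."""

    def __init__(self, threads: int, logical: int = 32, extra: int = 4):
        # `extra` (renaming registers beyond threads * logical) is a
        # hypothetical value chosen only for this illustration.
        self.size = threads * logical + extra
        # Initially thread t's logical register r lives in physical t*logical + r.
        self.map = [[t * logical + r for r in range(logical)]
                    for t in range(threads)]
        self.free = list(range(threads * logical, self.size))

    def rename(self, thread: int, dest: int,
               srcs: List[int]) -> Tuple[int, List[int], int]:
        """Rename one instruction; return (new_dest, phys_srcs, old_dest)."""
        # Sources are looked up before the destination is remapped.
        phys_srcs = [self.map[thread][s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; rename must stall")
        new_dest = self.free.pop(0)
        old_dest = self.map[thread][dest]
        self.map[thread][dest] = new_dest
        return new_dest, phys_srcs, old_dest

    def retire(self, old_dest: int) -> None:
        """In-order retirement frees the destination's previous mapping."""
        self.free.append(old_dest)


if __name__ == "__main__":
    r = SharedRenamer(threads=2)
    # The same logical instruction (r1 <- r1 op r2) from two different threads
    # ends up with disjoint physical sources and destinations.
    print(r.rename(0, 1, [1, 2]))
    print(r.rename(1, 1, [1, 2]))
```

Once instructions carry only physical register numbers, a conventional instruction queue can issue them without knowing which thread they came from, which is the point the text makes about not needing a separate multithreaded scheduling mechanism.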
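
The cycle counts quoted for Figure 2 can be reproduced by counting pipeline stages. The sketch below assumes a counting convention that the text does not state explicitly: each penalty is the number of stages between two named stages, inclusive (fetch to decode for misfetch, fetch to exec for mispredict, queue to exec for misqueue, queue to commit for minimum register usage). The convention was chosen because it matches the figure's numbers, so treat it as an interpretation rather than the authors' definition.

```python
from typing import Dict, List

# Stage lists as given in Figure 2 (a) and (b).
CONVENTIONAL = ["fetch", "decode", "rename", "queue", "reg_read", "exec", "commit"]
SMT = ["fetch", "decode", "rename", "queue", "reg_read", "reg_read",
       "exec", "reg_write", "commit"]


def span(stages: List[str], first: str, last: str) -> int:
    """Stages from the first occurrence of `first` through `last`, inclusive."""
    return stages.index(last) - stages.index(first) + 1


def penalties(stages: List[str]) -> Dict[str, int]:
    """Cycle counts under the assumed inclusive-span convention."""
    return {
        "misfetch": span(stages, "fetch", "decode"),
        "mispredict": span(stages, "fetch", "exec"),
        "misqueue": span(stages, "queue", "exec"),
        "register_usage": span(stages, "queue", "commit"),
    }


if __name__ == "__main__":
    print("conventional:", penalties(CONVENTIONAL))
    print("smt:         ", penalties(SMT))
```

Under this convention the extra register read stage raises the mispredict penalty from 6 to 7 cycles, and the extra read and write stages together raise minimum register usage from 4 to 6 cycles, in line with the figure. The misqueue value for the conventional pipeline is a by-product of the model and is not a number given in the figure.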