HPS, A New Microarchitecture: Rationale and Introduction

Yale N. Patt, Wen-mei Hwu, and Michael Shebanow
Computer Science Division
University of California, Berkeley
Berkeley, CA 94720

Abstract

HPS (High Performance Substrate) is a new microarchitecture targeted for implementing very high performance computing engines. Our model of execution is a restriction on fine granularity data flow. This paper introduces the model, provides the rationale for its selection, and describes the data path and flow of instructions through the microengine.

1. Introduction

A computer system is a multilevel structure, algorithms at the top, gates and wires at the bottom. To achieve high performance, one must optimize at all levels of this structure. At most levels, the conventional wisdom suggests exploiting concurrency. Several proposals have been put forward as to how to do this. We also argue for exploiting concurrency, focusing in particular on the microarchitecture level.

1.1. Restricted Data Flow

We are calling our engine HPS, which stands for High Performance Substrate, to reflect the notion that what we are proposing should be useful for implementing very dissimilar ISP architectures. Our model of the microengine (i.e., a restriction on classical fine granularity data flow) is not unlike that of Dennis [3], Arvind [2], and others, but with some very important differences. These differences will be discussed in detail in section 3. For the moment, it is important to understand that unlike classical data flow machines, only a small subset of the entire program is in the HPS microengine at any one time. We define the "active window" as the set of ISP instructions whose corresponding data flow nodes are currently part of the data flow graph which is resident in the microengine. As the active window moves through the dynamic instruction stream, HPS executes the entire program.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1985 ACM 0-89791-172-5/85/0012/0103 $00.75

1.2. Potential Limitations of Other Approaches

We believe that an essential ingredient of high performance computing is the effective utilization of a lot of concurrency. Thus we see a potential limitation in microengines that are limited to one operation per cycle. Similarly, we see a potential limitation in a microengine that underutilizes its bandwidth to either instruction memory or data memory. Finally, although we appreciate the advantages of static scheduling, we see a potential limitation in a microengine that purports to execute a substantial number of operations each cycle, but must rely on a non-run-time scheduler for determining what to do next.

1.3. Outline of this Paper

This paper is organized in four sections. Section 2 delineates the fundamental reasons which led us to this new microarchitecture. Section 3 describes the basic operation of HPS. Section 4 offers some concluding remarks, and describes where our research in HPS is heading.

2. Rationale

2.1. The Three Tier Model

We believe that irregular parallelism in a program exists both locally and globally. Our mechanism exploits the local parallelism, but disregards global parallelism. Our belief is that the execution of an algorithm should be handled in three tiers. At the top, where global parallelism can be best identified, the execution model should utilize large granularity data flow, much like the proposal of the CEDAR project [4]. In the middle, where forty years of collected experience in computer processing can be exploited probably without harm, classical sequential control flow should be the model. At the bottom, where we want to exploit local parallelism, fine granularity data flow is recommended. Our three tier model reflects our conception that the top level should be algorithm oriented, the middle level sequential control flow ISP architecture oriented, and the bottom level microengine oriented.

2.2. Local Parallelism

We feel obliged to re-emphasize the importance of local parallelism to our choice of execution model. Indeed, we chose this restricted form of data flow specifically because our studies have shown that the parallelism available from the middle control flow tier (i.e., the sequential control flow architecture) is highly localized. We argue that, by restricting the active instruction window, we can exploit almost all of the inherent parallelism in the program while incurring very little of the synchronization costs which would be needed to keep the entire program around as a total data flow graph.

2.3. Stalls, Bandwidth, and Concurrency

We believe that a high performance computing engine should exhibit a number of characteristics. First, all its components must be kept busy. There must be few stalls, both in the flow of information (i.e., the path to memory, loading of registers, etc.) and in the processing of information (i.e., the functional units). Second, there must be a high degree of concurrency available, such as multiple paths to memory, multiple processing elements, and some form of pipelining, for example.

In our view, the restricted data flow model, with its out-of-order execution capability, best enables the above two requirements, as follows. The center of our model is the set of node tables, where operations await their operands. Instruction memory feeds the microengine at a constant rate with few stalls. Data memory and I/O supply and extract data at constant rates with few stalls. Functional units are kept busy by nodes that can fire. Somewhere in this system, there has to be "slack." The slack is in the nodes waiting in the node tables. Since nodes can execute out of order, there is no blocking due to unavailable data. Decoded instructions add nodes to the node tables and executed nodes remove them. The node tables tend to grow in the presence of data dependencies, and shrink as these dependencies become fewer. Meanwhile, our preliminary measurements support, the multiple components of the microengine are kept busy.

3. The HPS Model of Execution

3.1. Overview

An abstract view of HPS is shown in figure 1. Instructions are fetched and decoded from a dynamic instruction stream, shown at the top of the figure. The figure implies that the instruction stream is taken from a sequential control flow ISP architecture. We need to emphasize that this is not a necessary part of the HPS specification. Indeed, we are investigating having HPS directly process multinode words (i.e., the nodes of a directed graph) which would be produced as the target code of a (for example) C compiler. What is necessary is that, for each instruction, the output of the decoder which is presented to the merger for handling by HPS is a data flow graph.

A very important part of the specification of HPS is the notion of the active instruction window. Unlike classical data flow machines, it is not the case that the data flow graph for the entire program is in the machine at one time. We define the active window as the set of ISP instructions whose corresponding data flow nodes are currently being worked on in the data flow microengine. As the instruction window moves through the dynamic instruction stream, HPS executes the entire instruction stream. Parallelism which exists within the window is fully exploited by the microengine. This parallelism is limited in scope; ergo, the term "restricted data flow."

The merger takes the data flow graph corresponding to each ISP instruction and, using a generalized Tomasulo algorithm to resolve any existing data dependencies, merges it into the entire data flow graph for the active window. Each node of the data flow graph is shipped to one of the node tables where it remains until it is ready to fire.

When all operands for a data flow node are ready, the data flow node fires by transmitting the node to the appropriate functional unit. The functional unit (an ALU, memory, or I/O device) executes the node and distributes the result, if any, to those locations where it is needed for subsequent processing: the node tables, the merger (for resolving subsequent dependencies), and the fetch control unit (for bringing new instructions into the active window). When all the data flow nodes for a particular instruction have been executed, the instruction is said to have executed. An instruction is retired from the active window when it has executed and all the instructions before it have retired. All side effects to memory are taken care of when an instruction retires from the active window. This is essential for the correct handling of precise interrupts [1].

The instruction fetching and decoding units maintain the degree of parallelism in the node tables by bringing new instructions into the active window, which results in new data flow nodes being merged into the data flow node tables.

3.2. Instruction Flow

Figure 2 shows the global data path of HPS. Instructions enter the data path as input to the merger. This input is in the form of a data flow graph, one per instruction. The data flow graph can be the result of decoding an instruction in a classical sequential instruction stream, or it can be the output of a non-conventional compiler. In either case, the merger sees a set of data flow nodes (and data dependencies), one for each operation that must be performed in the execution of that instruction. Operations are, for example, reads, writes, address computations, and ALU functions. In the example of figure 3, the data flow graph corresponding to the VAX instruction ADDW #1000,A,B consists of three nodes: a memory read, a memory write, and an ALU operation. Figure 3 also shows the structure of the three nodes and the five value buffer entries required for the instruction.

The merger, using the register alias table to resolve data dependencies not explicit in the individual instruction, forms the set of data flow nodes which are necessary to execute the instruction. Nodes are then transmitted to the appropriate node tables. Node tables, as we shall see, are content addressable memories, and thus should be kept small. The size of each node table is a function of the size of the active window and the decoding rate of the von Neumann instruction stream. In our experiments with the VAX architecture, for example, an active window of 16 instructions, coupled with a decoding rate of eight nodes per cycle, required at most a 35-entry node table.

For each node, a slot is reserved in the global multi-port value buffer for storing the result of the operation of that node. The index of each slot is designated as a tag for the corresponding node, and is carried along with the node until it completes its execution. Value buffer slots are assigned in a circular queue, the size of the buffer being large enough to guarantee retirement of an instruction before its value buffer slot is again needed. In the case of our simulated implementation of the VAX architecture, an active window of 16 instructions, having approximately four nodes per instruction, means that a value buffer of 136 entries is more than adequate.

A node remains in its node table until all of its operands are available, at which point it is ready to fire (i.e., it is executable). A node is fired by transmitting its operator, tag, and set of operands to one of the functional units associated with that node table. When execution completes, the result and its tag are distributed to each port of the value buffer. In the case of a result destined for a general purpose register, the corresponding tag is also transmitted to the register alias table to update information stored there. The corresponding tag is also transmitted to the node tables for the purpose of setting the ready bits in those nodes awaiting this result. The number of results that can be distribute...
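
The firing rule of section 3.2 (a node waits in its node table until every operand it needs has been produced, then fires and publishes its result under its value buffer tag) can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the HPS hardware design: the names Node, execute, and run, the operator names "read", "add", and "write", and the memory contents are all hypothetical. It also assumes that the figure 3 instruction adds the constant 1000 to A and stores the sum in B, and it performs the memory write at execution time rather than at retirement.

```python
from dataclasses import dataclass


@dataclass
class Node:
    tag: int        # value buffer slot reserved for this node's result
    op: str         # "read", "add", or "write" (hypothetical operator names)
    operands: list  # each operand is ("imm", value) or ("tag", producer_tag)
    addr: str = ""  # memory location used by read and write nodes


def execute(node, values, memory):
    """Stand-in for a functional unit: compute the result of one node."""
    args = [v if kind == "imm" else values[v] for kind, v in node.operands]
    if node.op == "read":
        return memory[node.addr]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "write":
        # Simplification: HPS defers memory side effects until retirement.
        memory[node.addr] = args[0]
        return args[0]
    raise ValueError(f"unknown operator {node.op!r}")


def run(node_table, memory):
    """Fire nodes as their operands become ready; return tags in firing order."""
    values = {}   # value buffer, indexed by tag
    fired = []
    pending = list(node_table)
    while pending:
        # A node is ready when every tagged operand is already in the value buffer.
        ready = [n for n in pending
                 if all(kind == "imm" or v in values for kind, v in n.operands)]
        if not ready:
            raise RuntimeError("no node can fire: unresolved dependency")
        for node in ready:
            values[node.tag] = execute(node, values, memory)  # distribute by tag
            fired.append(node.tag)
            pending.remove(node)
    return fired


# Three nodes for the assumed instruction, entered out of dependency order.
memory = {"A": 5, "B": 0}
table = [
    Node(tag=2, op="write", operands=[("tag", 1)], addr="B"),
    Node(tag=1, op="add", operands=[("imm", 1000), ("tag", 0)]),
    Node(tag=0, op="read", operands=[], addr="A"),
]
order = run(table, memory)
```

Although the write node is entered first, it cannot fire until the tag of the ALU node appears in the value buffer, so the nodes fire in dependency order regardless of their position in the table.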