20-HPS, A New Microarchitecture- Introduction and Rationale.pdf


HPS, A New Microarchitecture: Rationale and Introduction

Yale N. Patt, Wen-mei Hwu, and Michael Shebanow
Computer Science Division
University of California, Berkeley
Berkeley, CA 94720

Abstract

HPS (High Performance Substrate) is a new microarchitecture targeted for implementing very high performance computing engines. Our model of execution is a restriction on fine granularity data flow. This paper introduces the model, provides the rationale for its selection, and describes the data path and flow of instructions through the microengine.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1985 ACM 0-89791-172-5/85/0012/0103 $00.75

1 Introduction

A computer system is a multilevel structure, algorithms at the top, gates and wires at the bottom. To achieve high performance, one must optimize at all levels of this structure. At most levels, the conventional wisdom suggests exploiting concurrency. Several proposals have been put forward as to how to do this. We also argue for exploiting concurrency, focusing in particular on the microarchitecture level.

1.1 Restricted Data Flow

We are calling our engine HPS, which stands for High Performance Substrate, to reflect the notion that what we are proposing should be useful for implementing very dissimilar ISP architectures.

Our model of the microengine (i.e., a restriction on classical fine granularity data flow) is not unlike that of Dennis [3], Arvind [2], and others, but with some very important differences. These differences will be discussed in detail in section 3. For the moment, it is important to understand that unlike classical data flow machines, only a small subset of the entire program is in the HPS microengine at any one time. We define the "active window" as the set of ISP instructions whose corresponding data flow nodes are currently part of the data flow graph which is resident in the microengine. As the active window moves through the dynamic instruction stream, HPS executes the entire program.

1.2 Potential Limitations of Other Approaches

We believe that an essential ingredient of high performance computing is the effective utilization of a lot of concurrency. Thus we see a potential limitation in microengines that are limited to one operation per cycle. Similarly, we see a potential limitation in a microengine that underutilizes its bandwidth to either instruction memory or data memory. Finally, although we appreciate the advantages of static scheduling, we see a potential limitation in a microengine that purports to execute a substantial number of operations each cycle, but must rely on a non-run-time scheduler for determining what to do next.

1.3 Outline of this Paper

This paper is organized in four sections. Section 2 delineates the fundamental reasons which led us to this new microarchitecture. Section 3 describes the basic operation of HPS. Section 4 offers some concluding remarks, and describes where our research in HPS is heading.

2 Rationale

2.1 The Three Tier Model

We believe that irregular parallelism in a program exists both locally and globally. Our mechanism exploits the local parallelism, but disregards global parallelism. Our belief is that the execution of an algorithm should be handled in three tiers. At the top, where global parallelism can be best identified, the execution model should utilize large granularity data flow, much like the proposal of the CEDAR project [4]. In the middle, where forty years of collected experience in computer processing can be exploited probably without harm, classical sequential control flow should be the model. At the bottom, where we want to exploit local parallelism, fine granularity data flow is recommended. Our three tier model reflects our conception that the top level should be algorithm oriented, the middle level sequential (control flow ISP architecture) oriented, and the bottom level microengine oriented.

2.2 Local Parallelism

We feel obliged to re-emphasize the importance of local parallelism to our choice of execution model. Indeed, we chose this restricted form of data flow specifically because our studies have shown that the parallelism available from the middle control flow tier (i.e., the sequential control flow architecture) is highly localized. We argue that, by restricting the active instruction window, we can exploit almost all of the inherent parallelism in the program while incurring very little of the synchronization costs which would be needed to keep the entire program around as a total data flow graph.

2.3 Stalls, Bandwidth, and Concurrency

We believe that a high performance computing engine should exhibit a number of characteristics. First, all its components must be kept busy. There must be few stalls, both in the flow of information (i.e., the path to memory, loading of registers, etc.) and in the processing of information (i.e., the functional units). Second, there must be a high degree of concurrency available, such as multiple paths to memory, multiple processing elements, and some form of pipelining, for example.

In our view, the restricted data flow model, with its out-of-order execution capability, best enables the above two requirements, as follows: The center of our model is the set of node tables, where operations await their operands. Instruction memory feeds the microengine at a constant rate with few stalls. Data memory and I/O supply and extract data at constant rates with few stalls. Functional units are kept busy by nodes that can fire. Somewhere in this system, there has to be "slack." The slack is in the nodes waiting in the node tables. Since nodes can execute out of order, there is no blocking due to unavailable data. Decoded instructions add nodes to the node tables and executed nodes remove them. The node tables tend to grow in the presence of data dependencies, and shrink as these dependencies become fewer. Meanwhile, our preliminary measurements support, the multiple components of the microengine are kept busy.

3 The HPS Model of Execution

3.1 Overview

An abstract view of HPS is shown in figure 1. Instructions are fetched and decoded from a dynamic instruction stream, shown at the top of the figure. The figure implies that the instruction stream is taken from a sequential control flow ISP architecture. We need to emphasize that this is not a necessary part of the HPS specification. Indeed, we are investigating having HPS directly process multinode words (i.e., the nodes of a directed graph) which would be produced as the target code of a (for example) C compiler. What is necessary is that, for each instruction, the output of the decoder which is presented to the merger for handling by HPS is a data flow graph.

A very important part of the specification of HPS is the notion of the active instruction window. Unlike classical data flow machines, it is not the case that the data flow graph for the entire program is in the machine at one time. We define the active window as the set of ISP instructions whose corresponding data flow nodes are currently being worked on in the data flow microengine.

As the instruction window moves through the dynamic instruction stream, HPS executes the entire instruction stream. Parallelism which exists within the window is fully exploited by the microengine. This parallelism is limited in scope; ergo, the term "restricted data flow."

The merger takes the data flow graph corresponding to each ISP instruction and, using a generalized Tomasulo algorithm to resolve any existing data dependencies, merges it into the entire data flow graph for the active window. Each node of the data flow graph is shipped to one of the node tables where it remains until it is ready to fire.

When all operands for a data flow node are ready, the data flow node fires by transmitting the node to the appropriate functional unit. The functional unit (an ALU, memory, or I/O device) executes the node and distributes the result, if any, to those locations where it is needed for subsequent processing: the node tables, the merger (for resolving subsequent dependencies), and the fetch control unit (for bringing new instructions into the active window). When all the data flow nodes for a particular instruction have been executed, the instruction is said to have executed. An instruction is retired from the active window when it has executed and all the instructions before it have retired. All side effects to memory are taken care of when an instruction retires from the active window. This is essential for the correct handling of precise interrupts [1].

The instruction fetching and decoding units maintain the degree of parallelism in the node tables by bringing new instructions into the active window, which results in new data flow nodes being merged into the data flow node tables.

3.2 Instruction Flow

Figure 2 shows the global data path of HPS. Instructions enter the data path as input to the merger. This input is in the form of a data flow graph, one per instruction. The data flow graph can be the result of decoding an instruction in a classical sequential instruction stream, or it can be the output of a nonconventional compiler. In either case, the merger sees a set of data flow nodes and data dependencies, one for each operation that must be performed in the execution of that instruction. Operations are, for example, reads, writes, address computations and ALU functions. In the example of figure 3, the data flow graph corresponding to the VAX instruction ADDWLOOO,A,B consists of three nodes: a memory read, memory write, and an ALU operation. Figure 3 also shows the structure of the three nodes and the five value buffer entries required for the instruction.

The merger, using the register alias table to resolve data dependencies not explicit in the individual instruction, forms the set of data flow nodes which are necessary to execute the instruction. Nodes are then transmitted to the appropriate node tables. Node tables, as we shall see, are content addressable memories, and thus should be kept small. The size of each node table is a function of the size of the active window and the decoding rate of the von Neumann instruction stream. In our experiments with the VAX architecture, for example, an active window of 16 instructions, coupled with a decoding rate of eight nodes per cycle, required at most a 35 entry node table.

For each node, a slot is reserved in the global multi-port value buffer for storing the result of the operation of that node. The index of each slot is designated as a tag for the corresponding node, and is carried along with the node until it completes its execution. Value buffer slots are assigned in a circular queue, the size of the buffer being large enough to guarantee retirement of an instruction before its value buffer slot is again needed. In the case of our simulated implementation of the VAX architecture, an active window of 16 instructions, having approximately four nodes per instruction, means that a value buffer of 136 entries is more than adequate.

A node remains in its node table until all of its operands are available, at which point it is ready to fire (i.e., it is executable). A node is fired by transmitting its operator, tag, and set of operands to one of the functional units associated with that node table. When execution completes, the result and its tag are distributed to each port of the value buffer. In the case of a result destined for a general purpose register, the corresponding tag is also transmitted to the register alias table to update information stored there. The corresponding tag is also transmitted to the node tables for the purpose of setting the ready bits in those nodes awaiting this result. The number of results that can be distribute
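
The flow described in sections 3.1 and 3.2 (merge with a register alias table, wait in a node table, fire when operands are ready, distribute the tagged result, retire in order) can be illustrated with a small Python sketch. This is only a toy model under strong assumptions and not the HPS design itself: every instruction is a single ALU node, there is one node table, any number of nodes may fire per cycle, and the names `Engine`, `Node`, `OPS`, `merge`, and `step` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
import operator

OPS = {"add": operator.add, "mul": operator.mul}  # hypothetical operator set

@dataclass
class Node:
    tag: int     # index of the value buffer slot reserved for this node's result
    op: str
    srcs: list   # each entry is ("val", number) or ("tag", producer tag)

class Engine:
    def __init__(self, regs, vb_size=8):
        self.regs = dict(regs)       # committed (retired) register state
        self.rat = {}                # register alias table: register -> pending tag
        self.vb = [None] * vb_size   # value buffer, slots handed out circularly
        self.next_tag = 0
        self.node_table = []         # nodes waiting for their operands
        self.window = deque()        # active window in program order: (dest, tag)
        self.fired = []              # tags in firing order, kept for inspection

    def merge(self, dest, op, srcs):
        """Merge a one-node instruction `dest = op(srcs)` into the active window."""
        assert len(self.window) < len(self.vb), "value buffer too small for window"
        resolved = []
        for s in srcs:
            if isinstance(s, str) and s in self.rat:
                t = self.rat[s]      # producer is still in the active window
                done = self.vb[t] is not None
                resolved.append(("val", self.vb[t]) if done else ("tag", t))
            elif isinstance(s, str):
                resolved.append(("val", self.regs[s]))   # committed register
            else:
                resolved.append(("val", s))              # immediate operand
        tag = self.next_tag
        self.next_tag = (tag + 1) % len(self.vb)
        self.vb[tag] = None          # slot is now reserved and empty
        self.rat[dest] = tag
        self.node_table.append(Node(tag, op, resolved))
        self.window.append((dest, tag))

    def step(self):
        """One cycle: fire every ready node, distribute results, retire in order."""
        ready = [n for n in self.node_table if all(k == "val" for k, _ in n.srcs)]
        for n in ready:
            self.node_table.remove(n)
            result = OPS[n.op](*(v for _, v in n.srcs))
            self.vb[n.tag] = result
            self.fired.append(n.tag)
            for w in self.node_table:    # broadcast (tag, result) to waiting nodes
                w.srcs = [("val", result) if s == ("tag", n.tag) else s
                          for s in w.srcs]
        # Retire from the head of the window only, so state is updated in order.
        while self.window and self.vb[self.window[0][1]] is not None:
            dest, tag = self.window.popleft()
            self.regs[dest] = self.vb[tag]
            if self.rat.get(dest) == tag:
                del self.rat[dest]

eng = Engine({"r1": 2, "r2": 3, "r4": 10})
eng.merge("r3", "mul", ["r1", "r2"])   # tag 0
eng.merge("r5", "add", ["r3", 1])      # tag 1, waits for tag 0
eng.merge("r6", "add", ["r4", 5])      # tag 2, independent of the others
eng.step()
after_first = (list(eng.fired), dict(eng.regs))
eng.step()
# Firing order is [0, 2, 1]: tag 2 fires ahead of the older, dependent tag 1.
```

In this run the independent node (tag 2) executes in the first cycle, but its register `r6` is not committed until the older instruction (tag 1) has retired in the second cycle, which mirrors the in-order retirement rule quoted above.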
