Parallel Computer Architecture
Lecture 7: The Introduction of Multicore Processor
April 13, 2009
隋秀峰 (sxf)
2019/12/19, The Introduction of Multicore Processor

Outline
- Driving forces behind multicore processors
- Key problems multicore processors must solve
- Current state of multicore processors
- Emerging technologies in multicore processors

Today's processor
- Voltage level: a flashlight (1 volt)
- Current level: an oven (250 amps)
- Power level: a light bulb (100 watts)
- Area: a postage stamp (1 square inch)
- Performance: GFLOPS

What is the future need?
- The need for performance is never-ending
- Complaints from end users today
- Tomorrow's killer applications
- Next step: how can we get to 1 TFLOPS?

Tomorrow's killer application (RMS)

Driving forces behind multicore: wire delay
- Consider the 1 TFLOP/s sequential machine: data must travel some distance, r, to get from memory to the CPU.
- To get 1 data element per cycle, data must make this trip 10^12 times per second at the speed of light, c = 3 x 10^8 m/s.
- Thus r < c / 10^12 = 0.3 mm.

[...]

- Example: Intel Hyper-Threading
- Chip multiprocessing (CMP): multi-core processor

Micro-architecture trends
Adapted from Johan De Gelas, "Quest for More Processing Power", AnandTech, Feb. 8, 2005.

Understanding SMT and CMP

Make clear: concurrency vs. parallelism
- Concurrency: two or more threads are in progress at the same time
- Parallelism: two or more threads are executing at the same time; multiple cores are needed

Simultaneous multithreading (SMT)
- Minimal resource replication
- Provides instructions to overlap memory latency
- Separate threads exploit idle resources
- Diagram labels: context 1, context 2, functional units, L1 cache, L2 cache, main memory

SMT: simultaneous multithreading
- Diagram labels: superscalar, multithreaded, SMT; issue slots

Go to the era of multicore
- Concurrency in the form of hardware multithreading has been around for a while.
- It is useful for hiding memory latencies.
- It gives only about 30% performance improvement, and only for particular applications.
- How can we continue to use the ever-higher transistor densities predicted by Moore's law?
- Current view: performance can keep improving by packing multiple processing cores onto a single chip, i.e., multicore.
- Multi-core = chip multiprocessing = tera-scale computing

Chip multiprocessing
- Much larger degree of resource replication
- Two complete processing cores on each chip
- Outer levels of cache and the external interface are shared
- Greatly reduced resource contention compared to SMT
- Diagram labels: L2 cache, main memory, context 1, context 2, functional units (x2), L1 cache (x2)

What do we benefit from multi-core?
- New target for micro-architecture: high performance/power

Multi-core processors
- Improved cost/performance ratio
- Minimal increases in architectural complexity provide significant increases in performance
- Minimizes performance stalls, with a dramatic increase in overall effective system performance
- Greater EEP (energy efficient performance) and scalability
- Cores enable thread-level parallelism
- Multi-core architecture enables a divide-and-conquer strategy to perform more work in a given clock cycle.

Multi-core processors (cont.)
- What's special about many-cores?
  - Explicit multithreading is required to speed up single-application performance
  - Core-to-core communication: latency decreases, bandwidth increases
  - Cache size per core will also decrease

Latency test on Intel Clovertown

What is the problem? Where is the innovation?
- How about the core?
  - Equal to the original one or not?
  - A simple core may be a good choice
- How about on-chip power control?
  - Fine-granularity power control
- How about the interconnection between cores and other units?
  - X cores means X times the memory references
  - Requires higher throughput between cores and caches, within the cache hierarchy, and between the last-level cache and memory
  - Requires lower latencies in those places
  - Four basic kinds of interconnects: buses, crossbars, tiny networks, and rings
  - Each has its own tradeoffs in throughput, latency, resource occupation, and ease of implementation
  - Each may be suitable at different levels
- How about the cache? (NUCA: non-uniform cache architecture)
  - "A NUCA substrate for flexible CMP cache sharing", Proc. the 19th Annual International Conference on Supercomputing, June 2005, pp. 31-40

Problems of multicore processors
- A multicore processor is in effect an on-chip parallel system
  - Hierarchical
  - Distributed
- Speeding up a single application requires explicit multithreading
- For multicore systems, the core software problem is developing parallel programs, including programming and debugging them

Software challenges of multicore processors

What is the problem? Where is the innovation?
- Where are the threads? This may be the largest challenge.
  - Make programmers write threaded programs: the world may be confused.
  - Automatic parallelization: mission impossible, but it can improve in some respects.
  - Provide threaded modules for reuse
- How do we control the high-level behavior of our programs?
  - Try to ease the burden on the programmer: looks good, but how?

How to meet the software challenges on multicore
- Have programmers write parallel programs
  - Inherit and optimize OpenMP, MPI, and similar models
  - New programming languages such as X10
  - Transactional memory
- Automatic parallelization
  - Difficult; after 20 years of development it is still not general enough
  - Speculative multithreading
- Implement parallel libraries
  - Intel MKL, ScaLAPACK
- How do we control the high-level behavior of programs?
- Other valuable work
  - Functional languages, dataflow, domain-specific languages

All of the above are still open issues.

Break!

Multicore products nowadays
- Lots of dual-core products now:
  - Intel: Pentium D and Pentium Extreme Edition, Core Duo (2), Woodcrest, Montecito
  - IBM PowerPC
  - AMD Opteron / Athlon 64
  - Sun UltraSPARC IV
- Systems with more than two cores are here, with more coming:
  - IBM Cell (asymmetric): dual-core PowerPC plus eight "synergistic processing elements"
  - Sun Niagara: eight cores, four hyper-threaded threads per core
- General-purpose computation on graphics processors (GPGPU)
- Intel expects to produce 16- or even 32-core chips within a decade.

Architecture of dual-core chips
- Intel Core Duo
  - Two physical cores in a package
  - Each with its own execution resources
  - Each with its own L1 cache: 32K instruction and 32K data
  - Both cores share the L2 cache: 2 MB, 8-way set associative; 64-byte line size; 10 clock cycles latency; write-back update policy
- AMD Opteron
  - Separate 1 MByte L2 caches
  - Improvement for memory affinity and thread affinity

Intel multi-core plan

Cell from IBM and Sony

Niagara from Sun

The technologies underway
- Rethink concurrency and parallelism for multi-core
- New programming models and programming languages
- Hardware support (and software) for multithreading
  - Control-driven speculation: speculative multithreading
  - Data-driven speculation: program demultiplexing
- Architectural thread enhancement
  - Support for hardware threads
  - Lightweight synchronization (monitor/mwait)

Rethink the C and P for multi-core
- What we have seen for multi-core
  - More parallelism needs to be exploited: scaling may be more important
  - More heterogeneity needs to be exploited: task mapping may need to be revisited
  - Low latency and high bandwidth between cores on chip: fine-granularity parallelism may need to be rethought
- Make full use of multi-core resources
  - More parallelism
  - Hide memory access stalls: the well-known memory wall

Index computation test on Clovertown
- Index computation is an application that is both compute-intensive and I/O-intensive
- 32 GB of web page data; the generated index is 4.5 GB

Index computation test on Clovertown (cont.)
- Some indexing stages are dominated by computation, others by I/O
- Consider dividing the indexing process into multiple pipeline stages and implementing a pipelined indexing algorithm that makes full use of the system's compute resources
- Principles for dividing the pipeline stages
  - Resource independence: each stage uses its own resources
  - Similar duration: the stages take roughly the same time
- Fine-grained pipelining algorithm: overlap the execution of pipeline stages to achieve parallelism

Intel Clovertown test environment

Index computation test on Clovertown (cont.)
- Performance improvement on a single core: the pipeline hides part of the document-read I/O time
- Performance improvement on multiple cores: the computation is parallelized
- Figure labels: performance improved by 8.2%; performance improved by 53.4%
- The tests used 1.5 GB of memory, with the test data and the index on the same disk

Rethink the C and P for multi-core
- Processor affinity benefits task mapping
  - Parallel FFT computation in NPB gets a 14% performance increase with MPICH
- Exploit dynamic and adaptive out-of-order execution patterns on multi-core and heterogeneous systems

Programming model and PLs
- Bridge the application software to the system software and hardware
- Express parallelism better for such heterogeneous systems
  - Transactional memory
  - IBM X10
  - Sun Fortress
- Other worthwhile explorations
  - Functional languages
  - Dataflow
  - Domain-specific languages

Transactional memory: a way to ease thread programming
- Thread programming is a boring thing
- A transaction is a sequence of memory loads and stores that either commits or aborts
- If a transaction commits, all the loads and stores appear to have executed atomically
- If a transaction aborts, none of its stores take effect
- Transaction operations aren't visible until they commit or abort
- A simplified version of traditional ACID database transactions (no durability, for example)

Transactional memory example

Problems in transactional memory

Solutions for transactional memory

X10
- Unified support for multicore systems and cluster systems
- High productivity
  - The language design emphasizes portability and safety
- Performance
  - Extends the Java virtual machine
  - Provides means for manual performance tuning
- Built on the Java language
  - Inherits Java's core values: high productivity, portability, maturity, safety
  - Targets mainstream Java/C/C++ programmers

X10 vision: portable productive parallel programming
- Diagram labels: X10 places, physical PEs
- The X10 language defines the mapping from X10 objects

[...]

async A1; async A2; try finish Activity A0 (Part 2); async A3; async A4; catch () Activity A0 (Part 3);
- Diagram label: Activity A3

Speculative multithreading

Problems in speculative multithreading
- Locate the section of the program that can efficiently be executed in parallel
  - The pre-computation slice has low computational overhead
  - Workload balance
  - Low overhead for the pre-computation slice
- Buffering and multi-versioning in the memory hierarchy
  - Buffering keeps the speculative state until the thread is verified and can be committed
  - Multi-versioning allows each variable to have a different value for each of the threads running in parallel
- Check data dependence mis-speculations quickly

Summary for current trends
- To many-core
- Hardware support for multithreading
- Transactional memory
  - It is hard to write fast threaded programs
  - Locks create fundamental problems
  - Transactional memory shields programmers
  - Hardware speeds up transactional memory
- Energy-efficient design

Add more axes to the micro-archit
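
The commit/abort semantics listed on the transactional memory slides can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the hardware mechanism the lecture refers to: the names `TVar`, `Transaction`, `Abort`, `atomically`, and `transfer` are hypothetical, and a single global lock held only during reads and commits stands in for real conflict detection. Stores are buffered inside the transaction, so nothing is visible to other threads until commit, and an aborted transaction leaves memory untouched and is retried.

```python
import threading

# Assumption: one global lock, held only for single reads and for commit.
_commit_lock = threading.Lock()


class Abort(Exception):
    """Raised when a transaction observes a conflicting update."""


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first load
        self.writes = {}  # TVar -> buffered value, invisible until commit

    def load(self, tvar):
        if tvar in self.writes:          # read our own buffered store
            return self.writes[tvar]
        with _commit_lock:               # read value and version together
            value, version = tvar.value, tvar.version
        if self.reads.setdefault(tvar, version) != version:
            raise Abort()                # variable changed since first load
        return value

    def store(self, tvar, value):
        self.writes[tvar] = value        # buffered, no effect on memory yet

    def commit(self):
        with _commit_lock:
            # Validate: every variable we read must be unchanged.
            for tvar, version in self.reads.items():
                if tvar.version != version:
                    raise Abort()
            # Publish all buffered stores in one step.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) until it commits; an abort discards all its stores."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Abort:
            continue                     # retry with a fresh transaction


def transfer(tx, src, dst, amount):
    """Example transaction body: move amount from src to dst."""
    tx.store(src, tx.load(src) - amount)
    tx.store(dst, tx.load(dst) + amount)
```

A caller would write something like `atomically(lambda tx: transfer(tx, a, b, 30))` without taking any lock explicitly. The sketch validates only at commit time, so a transaction body may briefly see values from different points in time before it is aborted and retried; practical designs handle this more carefully.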