




Taming GPU Compute with C++ AMP
C++ Accelerated Massive Parallelism
Steve Teixeira, Director of Program Management, Microsoft Parallel Computing Platform

Agenda
- Why GPU compute?
- Overview of C++ AMP
- Full Visual Studio Integration and Support

Rapidly Changing Processor Architectures
- 1 billion transistors
- 45 nm
- Multi-tasking, I/O, virtualization
- Supports general code
- Low memory bandwidth
- Medium level of parallelism
- Deep execution pipelines
- Higher power consumption

- 2 billion transistors
- 40 nm
- Programmable and fixed function
- Graphics and data-parallel code
- High memory bandwidth
- High level of parallelism
- Shallow execution pipelines
- Lower power consumption
Source: AMD

Performance Gap Widens Further
Source: NVIDIA

Current GPU Performance
GPU vs. CPU Performance

n-Bodies in C++ AMP
Demo

Data Parallel and GPU Acceleration
- Expose extreme performance of GPUs for graphics and computation
- Provide developers the ability to get significant performance gains
- Demand for a single application image that runs on GPU hardware from multiple vendors
[Images: Ray tracing; Medical tomography]

Leveraging a C++ Programming Model
Goals
- Performance: Unleash the power of the hardware
- Productivity: Use C++ skills for commodity hardware
- Portability: Write once, run on multiple platforms
Themes
- Protect coding investment in the long run
- Increase expressiveness as hardware evolves
- Leverage existing developer skills on language and tools

C++ AMP
- Will be integrated into Visual Studio and the VC++ Compiler
- Building on the DirectX runtime stack
  - Ubiquitous and reliable
- Enables broad range of developers to leverage modern data parallel architectures
  - Higher-level abstractions
  - Low-level hardware access
- Code authoring, debugging, and profiling support provided within Visual Studio
- We intend to support a library ecosystem

Data Parallel Programming
C = A * B
[Diagram: matrices A, B and C, with indices 0, 1, 2, 3, n]

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        for (int x = 0; x < W; ++x) {
            for (int y = 0; y < N; ++y) {
                float sum = 0;
                for (int i = 0; i < M; i++)
                    sum += A[x*M + i] * B[i*W + y];
                C[x*W + y] = sum;
            }
        }
    }

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        array_view<const float, 2> a(M, W, A), b(W, N, B);
        array_view<writeonly<float>, 2> c(M, N, C);
        parallel_for_each(c.grid, [=](index<2> idx) restrict(direct3d) {
            float sum = 0;
            for (int i = 0; i < a.x; i++)
                sum += a(idx.y, i) * b(i, idx.x);
            c[idx] = sum;
        });
    }

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
        parallel_for(0, W,
    [remainder of this listing is missing from the source]

grid, index, and array_view

    using namespace Concurrency;
    void MatrixMultiply(vector
    [remainder of this listing is missing from the source]

- grid: Determines the shape of data and the scope of computation
- index: An N-dimensional vector used to index a point in a multidimensional array
- array_view: Multidimensional array of rank N with element type T
  - Container for data used on accelerator

    grid<3> e3(6, 3, 3);
    index<3> i3(2, 0, 1);

Restriction Qualifiers
- Function or lambda modifier with a restriction qualifier
- Restriction qualifiers inform compiler of "intent" or "behavior"
  - Allows optimizations or special code-gen behavior
- Defined qualifiers: direct3d and potentially others
- Syntax examples:

    void Compute(double
    [remainder of this listing is missing from the source]

direct3d Restriction Qualifier
- Functions and lambdas intended to execute on a Direct3D graphics device
- Capable of executing on any DX11 device
- Restrictions follow limitations of DX11 device model
  - Shader Model 5
  - No 1-byte or 2-byte data types
  - No function pointers or virtual methods
  - Can only call other direct3d functions
  - Many other restrictions
- Other C++ language features work as expected

restrict(direct3d) Compilation Chain
[Diagram labels: C++ source file, regular C++ code, C++ AMP code, HLSL codegen, fxc shader compiler, VC++ compiler, C++ linker, Executable]

Example: Target Restrictions

    // Overload on target
    float cos(float v) restrict(direct3d, fpga)
    {
        Baz* pBaz = new Baz(v);   // error
        return _TaylorSeries_cos(v);
    }

    float cos(float v) restrict(cpu)
    {
        return _x64_FastCos(v);
    }

    // Target-polymorphic call site
    float foo(float v)
    {
        return cos(v);
    }

auto restriction specifier
- Lets the compiler decide effective restrictions for inline functions (not for forward declarations)
- Directly using existing template types and algorithms (and new ones) without annotation
  - std::complex class
  - std::for_each algorithm

    template <typename Func>
    inline void my_generic_algorithm(Func f) restrict(auto)
    {
        f();
    }

Inferring auto

    // Compile with /Zauto
    namespace std {
        template <typename II, typename Func>
        Func for_each(II first, II last, Func f)   // implies restrict(cpu, auto)
        {
            for (; first != last; ++first)
                f(*first);
            return f;
        }
    }

parallel_for_each
- Initiates parallel execution on an accelerator
- Semantics:
  - Invoke the functor for each point in the grid
  - Arbitrary order for the activities
  - parallel_for_each is as-if synchronous in terms of visible side effects

    using namespace Concurrency;
    void MatrixMultiply(vector
    [remainder of this listing is missing from the source]

Data Parallel Accelerators
- Abstraction for one or more data parallel accelerators
  - One or more Direct3D 11 GPUs
  - Multiple GPUs
  - CPU vector processor
  - Many-core CPUs
- Host and accelerator are separate in the model
  - Data transfer to/from accelerator
  - Could be optimized away for integrated memory architecture
- accelerator_view
  - Handle to an accelerator
[Diagram labels: Host (CPUs, System memory), PCIe, Accelerator (GPU, GPU, GPU, GPU)]

Performance-oriented topics
- Use thread groups (tiles) and shared memory for performance
- tile_static storage modifier
  - Define shared memory for threads in the tile
- tile_barrier
  - Barrier for all threads in the tile
- Tiled form of parallel_for_each
  - tiled_grid to define dimensions
  - tiled_index: Specialized index with tile coordinates

Matrix multiply with tiles

    void MatrixMultiplyTiled(vector
    [remainder of this listing is missing from the source]

Example of tiled forall

Integrated Developer Tools
- Use C++ AMP features and classes with full support from the Visual Studio IDE
  - IntelliSense
  - colorization
- Debug compute intensive code segments that execute on GPUs
  - GPU emulator
  - GPU hardware
- Concurrency Visualizer provides integrated view of activity on CPU and GPU

Debugging Experience
- Well known Visual Studio debugging features
  - Launch, Attach, Stepping, Breakpoints
  - Processes, Debug Output, Modules, Disassembly
  - Call Stack, Memory, Registers, Locals, Watch, QuickWatch, DataTips
- New features (for both CPU and GPU)
  - Parallel Stacks window, Parallel Watch window, Barrier
- New GPU-specific
  - GPU Threads window

GPU Debugging
- Bring CPU debugging experience to GPU

Concurrency Visualizer Support for GPGPU
- Profiler combines GPU support with the CPU th
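
The MatrixMult slides express C = A * B as "invoke a functor for each point in the grid". The sketch below restates that shape in standard C++ so it can be compiled without the C++ AMP toolchain. It is an illustration only: `View2D` and `for_each_point` are hypothetical helpers invented here as rough analogues of array_view and parallel_for_each, not part of any Microsoft API, and the loop runs sequentially on the CPU. The sketch assumes row-major storage with A as M x W, B as W x N, and C as M x N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of C++ AMP): a non-owning, row-major 2D view
// over a flat buffer, loosely playing the role the slides give to array_view.
template <typename T>
struct View2D {
    T* data;
    int rows;
    int cols;
    T& operator()(int r, int c) const {
        return data[static_cast<std::size_t>(r) * cols + c];
    }
};

// Hypothetical stand-in for parallel_for_each: call f once for every
// (row, col) point of a rows x cols grid. This version is sequential; the
// slides only promise an arbitrary order, so f must not rely on ordering.
template <typename Func>
void for_each_point(int rows, int cols, Func f) {
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            f(r, c);
}

// C (M x N) = A (M x W) * B (W x N), all stored row-major.
void matrix_mult(float* C, const float* A, const float* B, int M, int N, int W) {
    View2D<const float> a{A, M, W};
    View2D<const float> b{B, W, N};
    View2D<float> c{C, M, N};
    // Each grid point computes exactly one output element and writes only
    // that element, so the invocations are independent of one another.
    for_each_point(M, N, [=](int row, int col) {
        float sum = 0.0f;
        for (int i = 0; i < W; ++i)
            sum += a(row, i) * b(i, col);
        c(row, col) = sum;
    });
}

// Small worked case: a 2 x 3 matrix times a 3 x 2 matrix.
std::vector<float> example_product() {
    const std::vector<float> A = {1, 2, 3,
                                  4, 5, 6};
    const std::vector<float> B = {7,  8,
                                  9,  10,
                                  11, 12};
    std::vector<float> C(4, 0.0f);
    matrix_mult(C.data(), A.data(), B.data(), /*M=*/2, /*N=*/2, /*W=*/3);
    return C;
}
```

Because every invocation touches a distinct output element, the body of the lambda is the kind of code that could be handed to a data-parallel runtime unchanged; only `for_each_point` would need a parallel implementation.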
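
The tiled matrix multiply listing did not survive in the source, so the following is not a reconstruction of it. It is a minimal CPU-side sketch, in standard C++, of the general idea the "Performance-oriented topics" slide names: work is split into tiles, each tile first stages a block of A and B in tile-local storage (the role tile_static plays on the GPU), and only then accumulates from the staged data. The names `kTile`, `matrix_mult_tiled`, `matrix_mult_reference`, and `tiled_matches_reference` are hypothetical, and the sketch assumes M, N, and W are multiples of the tile size.

```cpp
#include <cassert>
#include <vector>

constexpr int kTile = 2;  // tile edge length; an arbitrary illustrative choice

// Plain triple loop, used only to check the tiled version.
void matrix_mult_reference(float* C, const float* A, const float* B,
                           int M, int N, int W) {
    for (int r = 0; r < M; ++r)
        for (int c = 0; c < N; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < W; ++k)
                sum += A[r * W + k] * B[k * N + c];
            C[r * N + c] = sum;
        }
}

// C (M x N) = A (M x W) * B (W x N), row-major, computed one tile at a time.
// Assumes M, N and W are multiples of kTile to keep the sketch short.
void matrix_mult_tiled(float* C, const float* A, const float* B,
                       int M, int N, int W) {
    for (int tr = 0; tr < M; tr += kTile) {
        for (int tc = 0; tc < N; tc += kTile) {
            float acc[kTile][kTile] = {};  // one accumulator per element of the tile
            for (int k0 = 0; k0 < W; k0 += kTile) {
                // Phase 1: stage one block of A and one block of B in
                // tile-local storage (the tile_static analogue).
                float localA[kTile][kTile];
                float localB[kTile][kTile];
                for (int r = 0; r < kTile; ++r)
                    for (int c = 0; c < kTile; ++c) {
                        localA[r][c] = A[(tr + r) * W + (k0 + c)];
                        localB[r][c] = B[(k0 + r) * N + (tc + c)];
                    }
                // On a GPU, a tile barrier would be needed here so that all
                // loads finish before any thread reads the staged blocks.
                // Sequential loops provide that ordering implicitly.
                // Phase 2: accumulate partial dot products from staged data.
                for (int r = 0; r < kTile; ++r)
                    for (int c = 0; c < kTile; ++c)
                        for (int k = 0; k < kTile; ++k)
                            acc[r][c] += localA[r][k] * localB[k][c];
            }
            for (int r = 0; r < kTile; ++r)
                for (int c = 0; c < kTile; ++c)
                    C[(tr + r) * N + (tc + c)] = acc[r][c];
        }
    }
}

// Compares both versions on small integer-valued inputs, where float
// arithmetic is exact and the results must match element for element.
bool tiled_matches_reference() {
    const int M = 4, N = 6, W = 2;
    std::vector<float> A(M * W), B(W * N);
    for (int i = 0; i < M * W; ++i) A[i] = static_cast<float>(i % 5);
    for (int i = 0; i < W * N; ++i) B[i] = static_cast<float>(i % 3 + 1);
    std::vector<float> expected(M * N, 0.0f), actual(M * N, 0.0f);
    matrix_mult_reference(expected.data(), A.data(), B.data(), M, N, W);
    matrix_mult_tiled(actual.data(), A.data(), B.data(), M, N, W);
    return expected == actual;
}
```

The benefit on real hardware comes from each staged element being reused kTile times from fast tile-local memory instead of being re-read from global memory; on a CPU this sketch only demonstrates the access pattern, not a speedup.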