Taming GPU Compute with C++ AMP
C++ Accelerated Massive Parallelism
Steve Teixeira, Director of Program Management, Microsoft Parallel Computing Platform

Agenda
- Why GPU compute?
- Overview of C++ AMP
- Full Visual Studio Integration and Support

Rapidly Changing Processor Architectures
CPU:
- 1 billion transistors
- 45 nm
- Multi-tasking, I/O, virtualization
- Supports general code
- Low memory bandwidth
- Medium level of parallelism
- Deep execution pipelines
- Higher power consumption
GPU:
- 2 billion transistors
- 40 nm
- Programmable and fixed function
- Graphics and data-parallel code
- High memory bandwidth
- High level of parallelism
- Shallow execution pipelines
- Lower power consumption
Source: AMD

Performance Gap Widens Further
Source: NVIDIA

Current GPU Performance

GPU vs. CPU Performance

n-Bodies in C++ AMP
Demo

Data Parallel and GPU Acceleration
- Expose extreme performance of GPUs for graphics and computation
- Provide developers the ability to get significant performance gains
- Demand for a single application image that runs on GPU hardware from multiple vendors
[Images: Ray tracing; Medical tomography]

Leveraging a C++ Programming Model
Goals
- Performance: Unleash the power of the hardware
- Productivity: Use C++ skills for commodity hardware
- Portability: Write once, run on multiple platforms
Themes
- Protect coding investment in the long run
- Increase expressiveness as hardware evolves
- Leverage existing developer skills on language and tools

C++ AMP
- Will be integrated into Visual Studio and the VC++ Compiler
- Building on the DirectX runtime stack
  - Ubiquitous and reliable
- Enables broad range of developers to leverage modern data parallel architectures
  - Higher-level abstractions
  - Low-level hardware access
- Code authoring, debugging, and profiling support provided within Visual Studio
- We intend to support a library ecosystem

Data Parallel Programming: C = A * B
[Figure: matrices A and B, with elements indexed 0, 1, 2, 3, ..., n, multiplied to give C]

Serial version:

    // A is M x W, B is W x N, C is M x N (row-major),
    // matching the array_view shapes used in the C++ AMP version.
    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        for (int y = 0; y < M; ++y) {
            for (int x = 0; x < N; ++x) {
                float sum = 0;
                for (int i = 0; i < W; i++)
                    sum += A[y * W + i] * B[i * N + x];
                C[y * N + x] = sum;
            }
        }
    }

C++ AMP version:

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        array_view<const float, 2> a(M, W, A), b(W, N, B);
        array_view<float, 2> c(M, N, C);
        parallel_for_each(c.grid, [=](index<2> idx) restrict(direct3d) {
            float sum = 0;
            for (int i = 0; i < a.x; i++)
                sum += a(idx.y, i) * b(i, idx.x);
            c[idx] = sum;
        });
    }

parallel_for version (listing truncated in the source):

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        parallel_for(0, W,

grid, index, and array_view

    using namespace Concurrency;
    void MatrixMultiply(vector,

- grid: Determines the shape of data and the scope of computation
- index: An N-dimensional vector used to index a point in a multidimensional array
- array_view: Multidimensional array of rank N with element type T. Container for data used on accelerator

    grid<3> e3(6, 3, 3);
    index<3> i3(2, 0, 1);

Restriction Qualifiers
- Function or lambda modifier with a restriction qualifier
- Restriction qualifiers inform compiler of "intent" or "behavior"
  - Allows optimizations or special code-gen behavior
- Defined qualifiers: direct3d and potentially others
- Syntax examples:

    void Compute(double,

direct3d Restriction Qualifier
- Functions and lambdas intended to execute on a Direct3D graphics device
  - Capable of executing on any DX11 device
- Restrictions follow limitations of DX11 device model
  - Shader Model 5
  - No 1-byte or 2-byte data types
  - No function pointers or virtual methods
  - Can only call other direct3d functions
  - Many other restrictions
- Other C++ language features work as expected

restrict(direct3d) Compilation Chain
[Diagram labels: C++ source file; regular C++ code; C++ AMP code; HLSL codegen; fxc shader compiler; VC++ compiler; C++ linker; Executable]

Example: Target Restrictions

    // Overload on target
    float cos(float v) restrict(direct3d, fpga)
    {
        Baz* pBaz = new Baz(v);   // error
        return _TaylorSeries_cos(v);
    }

    float cos(float v) restrict(cpu)
    {
        return _x64_FastCos(v);
    }

    // Target-polymorphic call site
    float foo(float v)
    {
        return cos(v);
    }

auto restriction specifier
- Lets the compiler decide effective restrictions for inline functions (not for forward declarations)
- Directly using existing template types and algorithms (and new ones) without annotation
  - std::complex class
  - std::for_each algorithm

    template <class Func>
    inline void my_generic_algorithm(Func f) restrict(auto)
    {
        f();
    }

Inferring auto

    // Compile with /Zauto
    namespace std {
        template <class II, class Func>
        Func for_each(II first, II last, Func f)   // implies restrict(cpu, auto)
        {
            for (; first != last; ++first)
                f(*first);
            return f;
        }
    }

parallel_for_each
- Initiates parallel execution on an accelerator
- Semantics: Invoke the functor for each point in the grid
  - Arbitrary order for the activities
  - parallel_for_each is as-if synchronous in terms of visible side effects

    using namespace Concurrency;
    void MatrixMultiply(vector,

Data Parallel Accelerators
- Abstraction for one or more data parallel accelerators
  - One or more Direct3D 11 GPUs
  - Multiple GPUs
  - CPU vector processor
  - Many-core CPUs
- Host and accelerator are separate in the model
  - Data transfer to/from accelerator
  - Could be optimized away for integrated memory architecture
- accelerator_view: Handle to an accelerator
[Diagram labels: CPUs; System memory; GPU; PCIe; GPU; GPU; GPU; Host; Accelerator]

Performance-oriented topics
- Use thread groups (tiles) and shared memory for performance
- tile_static storage modifier
  - Define shared memory for threads in the tile
- tile_barrier
  - Barrier for all threads in the tile
- Tiled form of parallel_for_each
  - tiled_grid to define dimensions
  - tiled_index: Specialized index with tile coordinates

Matrix multiply with tiles

    void MatrixMultiplyTiled(vector,

Example of tiled forall

Integrated Developer Tools
- Use C++ AMP features and classes with full support from the Visual Studio IDE
  - IntelliSense, colorization
- Debug compute-intensive code segments that execute on GPUs
  - GPU emulator
  - GPU hardware
- Concurrency Visualizer provides integrated view of activity on CPU and GPU

Debugging Experience
- Well-known Visual Studio debugging features
  - Launch, Attach, Stepping, Breakpoints
  - Processes, Debug Output, Modules, Disassembly
  - Call Stack, Memory, Registers, Locals, Watch, QuickWatch, DataTips
- New features (for both CPU and GPU)
  - Parallel Stacks window, Parallel Watch window, Barrier
- New GPU-specific
  - GPU Threads window

GPU Debugging
- Bring CPU debugging experience to GPU

Concurrency Visualizer Support for GPGPU
- Profiler combines GPU support with the CPU th
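
The matrix multiply example and the tiling idea from the slides can be tried without a GPU. The sketch below is plain standard C++, not C++ AMP: it assumes row-major storage with A as M x W, B as W x N, and C as M x N (the shapes passed to array_view in the deck). The helper `for_each_point`, the constant `TS`, and all three function names are hypothetical, introduced only for illustration. `for_each_point` stands in for parallel_for_each by calling a kernel once per grid point, and the local arrays in the tiled version play the role that tile_static shared memory would play on an accelerator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: C (M x N) = A (M x W) * B (W x N), all row-major.
void matrix_mult_serial(float* C, const float* A, const float* B,
                        int M, int N, int W) {
    for (int y = 0; y < M; ++y)
        for (int x = 0; x < N; ++x) {
            float sum = 0.0f;
            for (int i = 0; i < W; ++i)
                sum += A[y * W + i] * B[i * N + x];
            C[y * N + x] = sum;
        }
}

// Hypothetical stand-in for parallel_for_each over a 2-D grid: invokes the
// kernel once per (y, x) point. It runs sequentially here; an accelerator
// may run the invocations in any order, so the kernel must not rely on order.
template <typename Kernel>
void for_each_point(int rows, int cols, Kernel kernel) {
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            kernel(y, x);
}

// Data-parallel formulation: one independent kernel call per output element.
void matrix_mult_per_point(float* C, const float* A, const float* B,
                           int M, int N, int W) {
    for_each_point(M, N, [=](int y, int x) {
        float sum = 0.0f;
        for (int i = 0; i < W; ++i)
            sum += A[y * W + i] * B[i * N + x];
        C[y * N + x] = sum;
    });
}

constexpr int TS = 4;  // hypothetical tile size

// Tiled formulation: each TS x TS block of A and B is copied into small local
// arrays before use, mimicking how a tile of threads would stage data in
// shared memory. Edge tiles are clipped to the matrix bounds.
void matrix_mult_tiled(float* C, const float* A, const float* B,
                       int M, int N, int W) {
    std::fill(C, C + static_cast<std::size_t>(M) * N, 0.0f);
    for (int ty = 0; ty < M; ty += TS)
        for (int tx = 0; tx < N; tx += TS)
            for (int tk = 0; tk < W; tk += TS) {
                const int h = std::min(TS, M - ty);
                const int w = std::min(TS, N - tx);
                const int d = std::min(TS, W - tk);
                float a_tile[TS][TS] = {};
                float b_tile[TS][TS] = {};
                for (int r = 0; r < h; ++r)
                    for (int k = 0; k < d; ++k)
                        a_tile[r][k] = A[(ty + r) * W + (tk + k)];
                for (int k = 0; k < d; ++k)
                    for (int col = 0; col < w; ++col)
                        b_tile[k][col] = B[(tk + k) * N + (tx + col)];
                // On an accelerator, a barrier would go here so that every
                // thread in the tile sees the fully loaded tiles.
                for (int r = 0; r < h; ++r)
                    for (int col = 0; col < w; ++col) {
                        float sum = 0.0f;
                        for (int k = 0; k < d; ++k)
                            sum += a_tile[r][k] * b_tile[k][col];
                        C[(ty + r) * N + (tx + col)] += sum;
                    }
            }
}
```

All three functions take the same arguments, so they can be swapped for one another. The tiled version adds partial sums in a different order from the other two, so with general floating-point inputs its results may differ in the last bits; on CPU the tiling mainly improves cache reuse, while on a GPU it reduces traffic to global memory.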