




Taming GPU Compute with C++ AMP
C++ Accelerated Massive Parallelism
Steve Teixeira
Director of Program Management, Microsoft Parallel Computing Platform

Agenda
- Why GPU compute?
- Overview of C++ AMP
- Full Visual Studio integration and support

Rapidly Changing Processor Architectures
- CPU: 1 billion transistors, 45 nm
  - Multi-tasking, I/O, virtualization
  - Supports general code
  - Low memory bandwidth
  - Medium level of parallelism
  - Deep execution pipelines
  - Higher power consumption
- GPU: 2 billion transistors, 40 nm
  - Programmable and fixed function
  - Graphics and data-parallel code
  - High memory bandwidth
  - High level of parallelism
  - Shallow execution pipelines
  - Lower power consumption
Source: AMD

Performance Gap Widens Further (chart; source: NVIDIA)

Current GPU Performance

GPU vs. CPU Performance

n-Bodies in C++ AMP (demo)

Data Parallel and GPU Acceleration
- Expose the extreme performance of GPUs for graphics and computation
- Give developers the ability to achieve significant performance gains
- Meet the demand for a single application image that runs on GPU hardware from multiple vendors
Example workloads: ray tracing, medical tomography

Leveraging a C++ Programming Model
Goals
- Performance: unleash the power of the hardware
- Productivity: use C++ skills for commodity hardware
- Portability: write once, run on multiple platforms
Themes
- Protect coding investment in the long run
- Increase expressiveness as hardware evolves
- Leverage existing developer skills on language and tools

C++ AMP
- Will be integrated into Visual Studio and the VC++ compiler
- Builds on the DirectX runtime stack, which is ubiquitous and reliable
- Enables a broad range of developers to leverage modern data-parallel architectures
  - Higher-level abstractions
  - Low-level hardware access
- Code authoring, debugging, and profiling support provided within Visual Studio
- We intend to support a library ecosystem

Data Parallel Programming: C = A * B
[Diagram: indexed elements 0, 1, 2, 3, ..., n of A and B, and the result C]

Serial version (A is M x W, B is W x N, C is M x N, all row-major):

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        for (int y = 0; y < M; ++y)          // row of C
            for (int x = 0; x < N; ++x) {    // column of C
                float sum = 0;
                for (int i = 0; i < W; ++i)
                    sum += A[y * W + i] * B[i * N + x];
                C[y * N + x] = sum;
            }
    }

C++ AMP version:

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        array_view<const float, 2> a(M, W, A), b(W, N, B);
        array_view<float, 2> c(M, N, C);
        parallel_for_each(c.grid, [=](index<2> idx) restrict(direct3d) {
            float sum = 0;
            for (int i = 0; i < a.x; i++)
                sum += a(idx.y, i) * b(i, idx.x);
            c[idx] = sum;
        });
    }

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        parallel_for(0, W, …

grid, index, and array_view

    using namespace Concurrency;
    void MatrixMultiply(vector …

- grid: determines the shape of the data and the scope of the computation
- index: an N-dimensional vector used to index a point in a multidimensional array
- array_view: a multidimensional array of rank N with element type T; a container for data used on the accelerator

    grid<3> e3(6, 3, 3);
    index<3> i3(2, 0, 1);

Restriction Qualifiers
- A function or lambda modifier with a restriction qualifier
- Restriction qualifiers inform the compiler of "intent" or "behavior"
- Allows optimizations or special code-generation behavior
- Defined qualifiers: direct3d, and potentially others
- Syntax examples:

    void Compute(double …

direct3d Restriction Qualifier
- Functions and lambdas intended to execute on a Direct3D graphics device
- Capable of executing on any DX11 device
- Restrictions follow the limitations of the DX11 device model (Shader Model 5)
  - No 1-byte or 2-byte data types
  - No function pointers or virtual methods
  - Can only call other direct3d functions
  - Many other restrictions
- Other C++ language features work as expected

restrict(direct3d) Compilation Chain

    C++ source file -> VC++ compiler
        regular C++ code
        C++ AMP code -> HLSL codegen -> fxc shader compiler
    -> C++ linker -> Executable

Example: Target Restrictions

    // Overload on target
    float cos(float v) restrict(direct3d, fpga)
    {
        Baz* pBaz = new Baz(v);  // error
        return _TaylorSeries_cos(v);
    }
    float cos(float v) restrict(cpu)
    {
        return _x64_FastCos(v);
    }
    // Target-polymorphic call site
    float foo(float v)
    {
        return cos(v);
    }

auto Restriction Specifier
- Lets the compiler decide the effective restrictions for inline functions (not for forward declarations)
- Allows existing template types and algorithms (and new ones) to be used directly without annotation, e.g. the std::complex class and the std::for_each algorithm

    template <typename Func>
    inline void my_generic_algorithm(Func f) restrict(auto)
    {
        f();
    }

Inferring auto

    // Compile with /Zauto
    namespace std {
        template <class II, class Func>
        Func for_each(II first, II last, Func f)  // implies restrict(cpu, auto)
        {
            for (; first != last; ++first)
                f(*first);
            return f;
        }
    }

parallel_for_each
- Initiates parallel execution on an accelerator
- Semantics:
  - Invokes the functor for each point in the grid
  - The activities run in arbitrary order
  - parallel_for_each is as-if synchronous in terms of visible side effects

    using namespace Concurrency;
    void MatrixMultiply(vector …

Data Parallel Accelerators
- Abstraction for one or more data-parallel accelerators
  - One or more Direct3D 11 GPUs
  - Multiple GPUs
  - CPU vector processor
  - Many-core CPUs
- Host and accelerator are separate in the model
  - Data transfer to/from the accelerator
  - Could be optimized away for integrated memory architectures
- accelerator_view: handle to an accelerator
[Diagram: host (CPUs, system memory) connected over PCIe to accelerator GPUs]

Performance-Oriented Topics
- Use thread groups (tiles) and shared memory for performance
  - tile_static storage modifier: defines shared memory for the threads in a tile
  - tile_barrier: barrier for all threads in a tile
- Tiled form of parallel_for_each
  - tiled_grid to define dimensions
  - tiled_index: specialized index with tile coordinates

Matrix Multiply with Tiles

    void MatrixMultiplyTiled(vector …

Example of Tiled forall

Integrated Developer Tools
- Use C++ AMP features and classes with full support from the Visual Studio IDE
  - IntelliSense, colorization
- Debug compute-intensive code segments that execute on GPUs
  - GPU emulator
  - GPU hardware
- Concurrency Visualizer provides an integrated view of activity on CPU and GPU

Debugging Experience
- Well-known Visual Studio debugging features
  - Launch, Attach, Stepping, Breakpoints
  - Processes, Debug Output, Modules, Disassembly
  - Call Stack, Memory, Registers, Locals, Watch, QuickWatch, DataTips
- New features (for both CPU and GPU)
  - Parallel Stacks window, Parallel Watch window, Barrier
- New GPU-specific features
  - GPU Threads window

GPU Debugging
- Bring the CPU debugging experience to the GPU

Concurrency Visualizer Support for GPGPU
- The profiler combines GPU support with the CPU th…
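To make the grid/index terminology from the slides concrete, here is a small sketch in standard C++ (it does not use C++ AMP) of how a point such as `i3(2,0,1)` inside the 6 x 3 x 3 domain `e3(6,3,3)` could be mapped to a flat memory offset. It assumes row-major storage with the last dimension varying fastest; the names `Extent3`, `Index3`, `element_count`, and `linearize` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the slide's grid<3> and index<3>; NOT the C++ AMP types.
using Extent3 = std::array<std::size_t, 3>;
using Index3 = std::array<std::size_t, 3>;

// Number of points in the domain described by an extent.
std::size_t element_count(const Extent3& e) {
    return e[0] * e[1] * e[2];
}

// Row-major offset of idx within e (assumption: last dimension varies fastest).
std::size_t linearize(const Extent3& e, const Index3& idx) {
    assert(idx[0] < e[0] && idx[1] < e[1] && idx[2] < e[2]);  // idx must lie inside the domain
    return (idx[0] * e[1] + idx[1]) * e[2] + idx[2];
}
```

Under that assumption the domain holds 6 * 3 * 3 = 54 points, the index `(2, 0, 1)` lands at offset (2 * 3 + 0) * 3 + 1 = 19, and the last point `(5, 2, 2)` lands at offset 53.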
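The MatrixMult slides compute C = A * B with A of shape M x W, B of shape W x N, and C of shape M x N, matching the extents passed to the array_view constructors. The CPU-only sketch below, assuming row-major storage, gives a serial reference and then restates it in the data-parallel style the parallel_for_each slide describes: one independent kernel invocation per point of C's grid. `for_each_point` is a hypothetical stand-in, not a C++ AMP function; it visits the points in reverse order to emphasize that the kernel must not depend on any particular order, and it returns only after every point is done, mirroring the "as-if synchronous" semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: C (M x N) = A (M x W) * B (W x N), all row-major.
void MatrixMultSerial(std::vector<float>& C, const std::vector<float>& A,
                      const std::vector<float>& B, int M, int N, int W) {
    assert(A.size() == static_cast<std::size_t>(M) * W);
    assert(B.size() == static_cast<std::size_t>(W) * N);
    C.assign(static_cast<std::size_t>(M) * N, 0.0f);
    for (int y = 0; y < M; ++y)            // row of C
        for (int x = 0; x < N; ++x) {      // column of C
            float sum = 0.0f;
            for (int i = 0; i < W; ++i)
                sum += A[y * W + i] * B[i * N + x];
            C[y * N + x] = sum;
        }
}

// Hypothetical CPU stand-in for a 2-D parallel_for_each: visits every (row, col)
// point exactly once, deliberately in reverse order.
template <typename Kernel>
void for_each_point(int rows, int cols, Kernel kernel) {
    for (int y = rows - 1; y >= 0; --y)
        for (int x = cols - 1; x >= 0; --x)
            kernel(y, x);
}

// Data-parallel formulation: each invocation computes one element of C and
// writes only that element, so the visiting order does not affect the result.
void MatrixMultPerPoint(std::vector<float>& C, const std::vector<float>& A,
                        const std::vector<float>& B, int M, int N, int W) {
    C.assign(static_cast<std::size_t>(M) * N, 0.0f);
    for_each_point(M, N, [&](int y, int x) {
        float sum = 0.0f;
        for (int i = 0; i < W; ++i)
            sum += A[y * W + i] * B[i * N + x];
        C[y * N + x] = sum;
    });
}
```

For example, multiplying the 2 x 3 matrix {1, 2, 3, 4, 5, 6} by the 3 x 2 matrix {7, 8, 9, 10, 11, 12} gives {58, 64, 139, 154} with either function.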
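The tiling ideas on the performance-oriented slide (tiles of threads, tile_static shared memory, tile_barrier) can be illustrated without a GPU. This sketch emulates a tiled matrix multiply on the CPU: small local arrays stand in for tile_static storage, and completing each load loop before the compute loop stands in for a barrier. It is not C++ AMP code; `TILE`, `MatrixMultiplyTiledCpu`, and the requirement that M, N, and W be multiples of TILE are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE = 2;  // assumed tile edge; GPU tiles are usually larger

// CPU emulation of a tiled multiply, C (M x N) = A (M x W) * B (W x N), row-major.
// Assumes M, N and W are multiples of TILE.
void MatrixMultiplyTiledCpu(std::vector<float>& C, const std::vector<float>& A,
                            const std::vector<float>& B, int M, int N, int W) {
    assert(M % TILE == 0 && N % TILE == 0 && W % TILE == 0);
    C.assign(static_cast<std::size_t>(M) * N, 0.0f);
    for (int ty = 0; ty < M; ty += TILE)             // tile row origin
        for (int tx = 0; tx < N; tx += TILE) {       // tile column origin
            float sum[TILE][TILE] = {};              // one accumulator per "thread" in the tile
            for (int k = 0; k < W; k += TILE) {
                float locA[TILE][TILE], locB[TILE][TILE];  // stand-in for tile_static storage
                // Phase 1: each "thread" (r, c) loads one element of A and one of B.
                for (int r = 0; r < TILE; ++r)
                    for (int c = 0; c < TILE; ++c) {
                        locA[r][c] = A[(ty + r) * W + (k + c)];
                        locB[r][c] = B[(k + r) * N + (tx + c)];
                    }
                // Stand-in for a tile barrier: every load above has finished
                // before any "thread" reads the shared tiles below.
                // Phase 2: each "thread" accumulates its partial dot product.
                for (int r = 0; r < TILE; ++r)
                    for (int c = 0; c < TILE; ++c)
                        for (int i = 0; i < TILE; ++i)
                            sum[r][c] += locA[r][i] * locB[i][c];
                // A real GPU kernel needs a second barrier here before the
                // next iteration overwrites the shared tiles.
            }
            for (int r = 0; r < TILE; ++r)
                for (int c = 0; c < TILE; ++c)
                    C[(ty + r) * N + (tx + c)] = sum[r][c];
        }
}
```

Each element loaded into `locA` or `locB` is reused TILE times by the other "threads" of the tile. That reuse is what makes tile_static memory worthwhile on a GPU, where the shared tile is much faster to read than global memory.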