




Taming GPU Compute with C++ AMP
C++ Accelerated Massive Parallelism
Steve Teixeira, Director of Program Management, Microsoft Parallel Computing Platform

Agenda
- Why GPU compute?
- Overview of C++ AMP
- Full Visual Studio Integration and Support

Rapidly Changing Processor Architectures
- CPU
  - 1 billion transistors, 45 nm
  - Multi-tasking, I/O, virtualization
  - Supports general code
  - Low memory bandwidth
  - Medium level of parallelism
  - Deep execution pipelines
  - Higher power consumption
- GPU
  - 2 billion transistors, 40 nm
  - Programmable and fixed function
  - Graphics and data-parallel code
  - High memory bandwidth
  - High level of parallelism
  - Shallow execution pipelines
  - Lower power consumption
- Source: AMD

Performance Gap Widens Further
- [Charts: Current GPU Performance; GPU vs. CPU Performance]
- Source: NVIDIA

n-Bodies in C++ AMP
- Demo

Data Parallel and GPU Acceleration
- Expose extreme performance of GPUs for graphics and computation
- Provide developers the ability to get significant performance gains
- Demand for a single application image that runs on GPU hardware from multiple vendors
- [Images: Ray tracing; Medical tomography]

Leveraging a C++ Programming Model
- Goals
  - Performance: Unleash the power of the hardware
  - Productivity: Use C++ skills for commodity hardware
  - Portability: Write once, run on multiple platforms
- Themes
  - Protect coding investment in the long run
  - Increase expressiveness as hardware evolves
  - Leverage existing developer skills on language and tools

C++ AMP
- Will be integrated into Visual Studio and the VC++ compiler
- Building on the DirectX runtime stack
  - Ubiquitous and reliable
- Enables a broad range of developers to leverage modern data parallel architectures
  - Higher-level abstractions
  - Low-level hardware access
- Code authoring, debugging, and profiling support provided within Visual Studio
- We intend to support a library ecosystem

Data Parallel Programming
- C = A * B
- [Diagram: matrices A, B, and C, with rows and columns indexed 0, 1, 2, 3, ..., n]
- Serial version:

    // Row-major storage: A is M x W, B is W x N, C is M x N.
    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        for (int y = 0; y < M; ++y) {
            for (int x = 0; x < N; ++x) {
                float sum = 0;
                for (int i = 0; i < W; i++)
                    sum += A[y * W + i] * B[i * N + x];
                C[y * N + x] = sum;
            }
        }
    }

- C++ AMP version:

    // The array_view template arguments were lost in this copy; element type
    // and rank are reconstructed from the array_view<T, N> description below.
    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        array_view<const float, 2> a(M, W, A), b(W, N, B);
        array_view<float, 2> c(M, N, C);
        parallel_for_each(c.grid, [=](index<2> idx) restrict(direct3d) {
            float sum = 0;
            for (int i = 0; i < a.x; i++)
                sum += a(idx.y, i) * b(i, idx.x);
            c[idx] = sum;
        });
    }

- A third variant is cut off in this copy:

    void MatrixMult(float* C, const float* A, const float* B, int M, int N, int W)
    {
        parallel_for(0, W, ...

grid, index, and array_view
- Code fragment (cut off in this copy):

    using namespace Concurrency;
    void MatrixMultiply(vector ...

- grid
  - Determines the shape of data and the scope of computation
- index
  - An N-dimensional vector used to index a point in a multidimensional array
- array_view<T, N>
  - Multidimensional array of rank N with element type T
  - Container for data used on the accelerator
- Examples:

    grid<3> e3(6, 3, 3);
    index<3> i3(2, 0, 1);

Restriction Qualifiers
- Function or lambda modifier with a restriction qualifier
- Restriction qualifiers inform the compiler of “intent” or “behavior”
  - Allows optimizations or special code-gen behavior
- Defined qualifiers
  - direct3d and potentially others
- Syntax examples (cut off in this copy):

    void Compute(double ...

direct3d Restriction Qualifier
- Functions and lambdas intended to execute on a Direct3D graphics device
  - Capable of executing on any DX11 device
- Restrictions follow limitations of the DX11 device model
  - Shader Model 5
  - No 1-byte or 2-byte data types
  - No function pointers or virtual methods
  - Can only call other direct3d functions
  - Many other restrictions
- Other C++ language features work as expected

restrict(direct3d) Compilation Chain
- [Diagram labels: C++ source file; regular C++ code; C++ AMP code; HLSL codegen; fxc shader compiler; VC++ compiler; C++ linker; Executable]

Example: Target Restrictions

    // Overload on target
    float cos(float v) restrict(direct3d, fpga)
    {
        Baz* pBaz = new Baz(v);        // error
        return _TaylorSeries_cos(v);
    }

    float cos(float v) restrict(cpu)
    {
        return _x64_FastCos(v);
    }

    // Target-polymorphic call site
    float foo(float v)
    {
        return cos(v);
    }

auto restriction specifier
- Lets the compiler decide effective restrictions for inline functions (not for forward declarations)
- Directly using existing template types and algorithms (and new ones) without annotation
  - std::complex class
  - std::for_each algorithm

    template <class Func>
    inline void my_generic_algorithm(Func f) restrict(auto)
    {
        f();
    }

Inferring auto

    // Compile with /Zauto
    namespace std {
        template <class II, class Func>
        Func for_each(II first, II last, Func f)   // implies restrict(cpu, auto)
        {
            for (; first != last; ++first)
                f(*first);
            return f;
        }
    }

parallel_for_each
- Initiates parallel execution on an accelerator
- Semantics:
  - Invoke the functor for each point in the grid
  - Arbitrary order for the activities
  - parallel_for_each is as-if synchronous in terms of visible side effects
- Code fragment (cut off in this copy):

    using namespace Concurrency;
    void MatrixMultiply(vector ...

Data Parallel Accelerators
- Abstraction for one or more data parallel accelerators
  - One or more Direct3D 11 GPUs
  - Multiple GPUs
  - CPU vector processor
  - Many-core CPUs
- Host and accelerator are separate in the model
  - Data transfer to/from the accelerator
  - Could be optimized away for an integrated memory architecture
- accelerator_view
  - Handle to an accelerator
- [Diagram: Host (CPUs, system memory) connected over PCIe to Accelerator (GPUs)]

Performance-oriented topics
- Use thread groups (tiles) and shared memory for performance
- tile_static storage modifier
  - Define shared memory for threads in the tile
- tile_barrier
  - Barrier for all threads in the tile
- Tiled form of parallel_for_each
  - tiled_grid to define dimensions
- tiled_index
  - Specialized index with tile coordinates

Matrix multiply with tiles
- Example of tiled forall (code cut off in this copy):

    void MatrixMultiplyTiled(vector ...

Integrated Developer Tools
- Use C++ AMP features and classes with full support from the Visual Studio IDE
  - IntelliSense
  - Colorization
- Debug compute-intensive code segments that execute on GPUs
  - GPU emulator
  - GPU hardware
- Concurrency Visualizer provides an integrated view of activity on CPU and GPU

Debugging Experience
- Well-known Visual Studio debugging features
  - Launch, Attach, Stepping, Breakpoints
  - Processes, Debug Output, Modules, Disassembly
  - Call Stack, Memory, Registers, Locals, Watch, QuickWatch, DataTips
- New features (for both CPU and GPU)
  - Parallel Stacks window, Parallel Watch window, Barrier
- New GPU-specific
  - GPU Threads window

GPU Debugging
- Bring the CPU debugging experience to the GPU

Concurrency Visualizer Support for GPGPU
- Profiler combines GPU support with the CPU th...
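
The slides above name tiling ("Matrix multiply with tiles") as the main performance technique, but the tiled code itself did not survive in this copy. As a rough illustration only, the sketch below shows the same blocking idea in standard C++ on the CPU. It is not C++ AMP and does not model `tile_static` or `tile_barrier`. The function names `MatrixMultReference` and `MatrixMultTiledCpu` and the tile edge `kTile` are hypothetical, chosen for this example. The layout assumption (row-major, A is M x W, B is W x N, C is M x N) follows the `array_view` declarations on the "Data Parallel Programming" slide.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumption: row-major storage, A is M x W, B is W x N, C is M x N.
// Plain triple loop, used as the reference result.
void MatrixMultReference(std::vector<float>& C, const std::vector<float>& A,
                         const std::vector<float>& B, int M, int N, int W)
{
    for (int y = 0; y < M; ++y) {
        for (int x = 0; x < N; ++x) {
            float sum = 0.0f;
            for (int i = 0; i < W; ++i)
                sum += A[y * W + i] * B[i * N + x];
            C[y * N + x] = sum;
        }
    }
}

// Hypothetical tile edge length; a real accelerator would pick this to
// match its thread-group size and shared-memory capacity.
constexpr int kTile = 4;

// Blocked (tiled) multiply: the output is processed one kTile x kTile block
// at a time, and each block accumulates partial sums over kTile-wide slices
// of the inner dimension. On a GPU, each output block would map to one tile
// of threads and each slice would be staged in shared memory.
void MatrixMultTiledCpu(std::vector<float>& C, const std::vector<float>& A,
                        const std::vector<float>& B, int M, int N, int W)
{
    std::fill(C.begin(), C.end(), 0.0f);
    for (int ty = 0; ty < M; ty += kTile) {
        for (int tx = 0; tx < N; tx += kTile) {
            for (int ti = 0; ti < W; ti += kTile) {
                // Clamp block bounds so sizes need not be multiples of kTile.
                const int yEnd = std::min(ty + kTile, M);
                const int xEnd = std::min(tx + kTile, N);
                const int iEnd = std::min(ti + kTile, W);
                for (int y = ty; y < yEnd; ++y) {
                    for (int x = tx; x < xEnd; ++x) {
                        float sum = C[y * N + x];  // partial sum so far
                        for (int i = ti; i < iEnd; ++i)
                            sum += A[y * W + i] * B[i * N + x];
                        C[y * N + x] = sum;
                    }
                }
            }
        }
    }
}
```

Both functions take the output vector first, sized M * N by the caller, and should produce the same matrix. The tiled form only changes the order in which memory is visited, which is where the benefit of tiles and shared memory on an accelerator comes from.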