Harbin Institute of Technology (HIT) Parallel Programming Course: Lab Report 1
Parallel Programming Course Lab Report

Experiment 1: Intel Multi-core Compiler and Intel Parallel Studio XE

Name: Ge Shuheng (葛书衡)
School: School of Software
Student ID: 1153730109
Course instructor: Zhang Weizhe (张伟哲)
Lab supervisor:
Location: computer lab, 3rd floor, School of Software
Date: 2017.4.12
Lab performance (attendance and conduct) score:
Lab report score:
Operation result score:
Total score:

I. Objectives

Requirement: analyze the basic objectives of this experiment and summarize how you achieved them.

Basic objectives:
1. Master the six-step optimization process.
2. Learn to optimize code with compiler options.
3. Learn to tune performance with auto-vectorization for different CPUs.
4. Learn the three steps for adding parallelism.
5. Add parallelism with Cilk Plus.
6. Add parallelism with OpenMP.

How they were achieved: by following the lab manual and the material covered in lectures, and carrying out the experiments on the lab machines.

II. Experiment Content

(This section records the work completed during the experiment.)

Experiment 1.1: Generating high-quality code with the Intel compiler
Step 1: Build the application without optimization
Step 2: Use general optimization
Step 3: Use processor-specific optimization
Step 4: Add interprocedural optimization
Step 5: Profile-guided optimization
Step 6: Tune auto-vectorization

Experiment 1.2: Parallel Studio XE quick start
1. Use Cilk Plus
2. Three steps to add parallelism:
   Step 1: Analyze the serial program
   Step 2: Implement parallelism with Cilk Plus
   Step 3: Debug and check for errors

III. Results

Each executable printed six lines of the form

    TimeElapsed <t> Secs Total=6798.680541 Check Sum = 160160000

Only the elapsed time <t> differs between runs and builds, so the six times (in seconds) are listed per build below. A ? marks a digit or value that cannot be read in the screenshot.

Building the application without optimization, then with general optimization

    /Od    C:\test>intel.noopt.exe
           5.279257  5.277541  5.252626  5.247493  5.24702?  5.247576
    /O1    C:\test>intel.O1.exe
           0.966629  0.978060  0.972101  0.972663  0.990346  0.?72578
    /O2    C:\test>intel.O2.exe
           0.370606  0.367711  0.369070  0.?70407  0.368370  0.369067
    /O3    C:\test>intel.O3.exe
           0.354772  0.355271  0.?54022  0.353975  0.354375  0.353231
    /Ox    C:\test>intel.Ox.exe
           0.367272  ?         0.366461  0.365486  0.36510?  0.366016

Using processor-specific optimization

(1) Build and run the application without a processor-specific option, adding /Qvec- to CFLAGS to turn off auto-vectorization.

    C:\test>intel.[name illegible].exe
           0.363797  (remaining runs illegible)

Builds with processor-specific options:

    SSE    C:\test>intel.SSE.exe
           0.392221  0.381096  0.384797  0.383872  0.386879  0.386884
    SSE2   C:\test>intel.SSE2.exe
           0.393706  0.378600  0.378578  0.377979  0.381179  0.377378
    SSE3   C:\test>intel.SSE3.exe
           0.382814  0.378148  0.376223  0.377690  0.374411  0.380942
    SSE4.1 C:\test>intel.SSE4.1.exe
           0.384171  0.375404  0.377257  0.378104  0.382731  0.374886
    SSE4.2 C:\test>intel.SSE4.2.exe
           0.382833  0.373490  0.373207  0.373814  0.374750  0.372234

Rebuild the application with the /QaxAVX option:

    AVX    C:\test>intel.axAVX.exe
           0.379616  0.377457  0.381858  0.380582  0.379117  0.383025

Adding interprocedural optimization

Build and run the application with the /Qipo option, add the highest level of auto-vectorization supported by the platform in use (SSE2, SSE3, SSSE3), and record the run times.

    /Qipo          C:\test>intel.Qipo.exe
                   0.265214  0.254297  0.251030  0.254059  0.255383  0.254185
    /Qipo + SSE2   C:\test>intel.QipoSEE2.exe
                   0.282410  0.253400  0.254489  0.255311  0.258279  0.256907
    /Qipo + SSE3   C:\test>intel.QipoSEE3.exe
                   0.255557  0.252098  0.251505  0.251814  0.250546  0.252690
    /Qipo + SSE4.2 C:\test>intel.QipoSEE4.2.exe
                   0.317439  0.257017  0.251601  0.250948  0.251120  0.254430

Profile-guided optimization

(1) Enable PGO, run intel.pgo.gen.exe, and record the results.

    C:\test>intel.pgo.gen.exe
           2.475525  2.483680  2.495488  2.486739  2.469373  2.450192

Running the instrumented executable (intel.pgo.gen.exe, 150 KB, built 2018/4/12 20:10) left a .dyn profile file (3 KB, 2018/4/12 20:11) in the build directory, next to the sources addy.c, chapter4.c, chapter4.h, series.c, work.c and wtime.c, the Makefile, and the .optrpt reports generated at 19:37 the same day.

(2) Rebuild the application, telling the compiler to use the dynamic information just generated, and run intel.pgo.exe.

    C:\test>intel.pgo.exe
           1.050985  1.034428  1.031015  1.027866  1.020331  1.019246
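One way to compare these builds on a single number is to take the median of the six reported run times for each build and divide the unoptimized median by it. The following is a minimal sketch under that assumption; `median`, `speedup`, and the `k...` arrays are illustrative names, not part of the lab code, and the arrays hold the times reported above for intel.noopt.exe, intel.Qipo.exe, and intel.pgo.exe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of run times in seconds. The vector is taken by value so the
// caller's data is not reordered by the sort.
double median(std::vector<double> runs) {
    assert(!runs.empty());
    std::sort(runs.begin(), runs.end());
    const std::size_t n = runs.size();
    return (n % 2 == 1) ? runs[n / 2]
                        : 0.5 * (runs[n / 2 - 1] + runs[n / 2]);
}

// Speedup of an optimized build over a baseline build, based on median times.
double speedup(const std::vector<double>& baseline,
               const std::vector<double>& optimized) {
    return median(baseline) / median(optimized);
}

// Times as reported in the results. The last digit of the fifth /Od run is not
// legible; it is the smallest value either way, so the median is unaffected.
const std::vector<double> kNoOpt = {5.279257, 5.277541, 5.252626,
                                    5.247493, 5.24702,  5.247576};
const std::vector<double> kQipo  = {0.265214, 0.254297, 0.251030,
                                    0.254059, 0.255383, 0.254185};
const std::vector<double> kPgo   = {1.050985, 1.034428, 1.031015,
                                    1.027866, 1.020331, 1.019246};
```

Calling `speedup(kNoOpt, kQipo)` or `speedup(kNoOpt, kPgo)` then gives the factor by which each build outperforms the /Od baseline. The median is used rather than the mean because the first run of several builds is noticeably slower than the rest.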

Tuning auto-vectorization

(1) Compile test.cpp in code1.2 and have the auto-vectorizer produce a report.

test.optrpt:

    Begin optimization report for: [signature illegible]

        Report from: Vector optimizations [vec]

    LOOP BEGIN at C:\test\code1.2\test.cpp(3,2)
       remark #15344: loop was not vectorized: vector dependence prevents vectorization. First [...]
       remark #15346: vector dependence: assumed FLOW dependence between line 5 and line 5
    LOOP END

    LOOP BEGIN at C:\test\code1.2\test.cpp(3,2)
       [illegible]
    LOOP END

(2) Use GAP.

    C:\test\code1.2>icl /c test.cpp /Qguide /c
    Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version [...]08 Build 20140726
    Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

    test.cpp
    GAP REPORT LOG OPENED ON Thu Apr 12 20:18:07 2018

    remark #30761: Add -Qparallel option if you want the compiler to generate recommendations for improving auto-parallelization.
    C:\test\code1.2\test.cpp(3): remark #30536: Add -Qno-alias-args option for better type-based disambiguation analysis by the compiler, if appropriate. This will improve optimizations such as vectorization of the loop at line 3. [VERIFY] Make sure that the semantics of this option is obeyed for the entire compilation. [ALTERNATIVE] Another way to get the same effect is to add the "restrict" keyword to each pointer-typed formal parameter of the routine. This allows optimizations such as vectorization to be applied to the loop at line 3. [VERIFY] Make sure that semantics of the "restrict" pointer qualifier is satisfied: in the routine, all data accessed through the pointer must not be accessed through any other pointer.
    Number of advice-messages emitted for this compilation session: 1.
    END OF GAP REPORT LOG

(3) Following GAP's advice, use the /Qno-alias-args command-line option to help the compiler vectorize the loop; compile the code and request a report.

(4) Compile and run all the code under code1.2, and compare vectorized and non-vectorized performance.

[Screenshot: test.optrpt in Notepad, largely illegible; the remark "loop was not vectorized: vector dependence prevents vectorization" is still recognizable.]

Step (3), GAP run with /Qno-alias-args:

    C:\test\code1.2>icl /c test.cpp /Qguide /c /Qno-alias-args
    Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version [...]08 Build 20140726
    Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

    test.cpp
    GAP REPORT LOG OPENED ON Thu Apr 12 20:20:01 2018

    remark #30761: Add -Qparallel option if you want the compiler to generate recommendations for improving auto-parallelization.
    Number of advice-messages emitted for this compilation session: 0.
    END OF GAP REPORT LOG
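The alternative suggested in the GAP log, qualifying each pointer parameter with restrict, can be sketched as follows. The source of test.cpp is not shown in this report, so the function name `scale_add`, its parameters, and the loop body are assumptions chosen only to illustrate the idea. `__restrict` is the compiler-specific C++ spelling of C99's `restrict`, accepted by the Intel, Microsoft, GCC, and Clang compilers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the loop in test.cpp (the real source is not shown).
// Without the qualifiers the compiler has to assume that `out` may overlap `a`
// or `b`, which is the kind of assumed dependence that blocks vectorization.
// __restrict promises that, inside this function, each array is accessed only
// through its own pointer, so the iterations are independent.
void scale_add(float* __restrict out, const float* __restrict a,
               const float* __restrict b, float factor, std::size_t n) {
    // Cheap sanity check only; keeping the no-overlap promise is the caller's job.
    assert(out != a && out != b);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * factor + b[i];
    }
}
```

Unlike /Qno-alias-args, which applies to every function in the compilation, the qualifier limits the promise to one routine. Passing overlapping arrays to such a function is undefined behaviour, which is the condition the [VERIFY] note in the log asks the programmer to check.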

[Screenshot: test.optrpt in Notepad, largely illegible. The recognizable parts are the report header, a "loop was not vectorized: vector dependence prevents vectorization" remark, an "assumed FLOW dependence" remark, and a second LOOP BEGIN at C:\test\code1.2\test.cpp(3,2).]

Step (4), building and running all the code under code1.2:

    icl /o fff.exe main.cpp test.cpp /Qvec-report:2
    Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version [...]08 Build 20140726
    Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.
    icl: command line remark #10010: option '/Qvec-report:2' is deprecated and will be removed in a future release. See '/help deprecated'
    icl: remark #10397: optimization reports are generated in *.optrpt files in the output location
    main.cpp
    test.cpp
    Microsoft (R) Incremental Linker Version 10.00.30319.01
    Copyright (C) Microsoft Corporation.  All rights reserved.

    -out:fff.exe
    main.obj
    test.obj

    C:\test\code1.2>fff.exe
    Time Elapsed 22.096068 Secs when N = 5000000, LOOP WAS VECTORIZED

5. Three steps to add parallelism

Step 1: Analyze the serial program

[Screenshots: Intel VTune Amplifier XE 2015]

Step 2: Implement parallelism with Cilk Plus

Step 3: Debug and check for errors

    #include <...>    // header name lost in extraction
    #include <...>    // header name lost in extraction
    #include <cilk/reducer_opadd.h>

    const long int VERYBIG = 1000;    // value as it appears in the scan
    // ***********************************************************************
    int main(void)
    {
        int i;
        long int k, sum;    // first identifier partly illegible in the scan
        double sumx, sumy, total;
        DWORD starttime, elapsedtime;

        // Output a start message
        printf("Cilk Plus Parallel Timings for %d iterations \n\n", VERYBIG);

        // repeat experiment several times
        for (i = 0; i < /* bound lost in extraction */; i++)
        {
            // get starting time
            starttime = timeGetTime();
            // reset check sum & running total
            // sum = 0;
            // total = 0.0;
            cilk::reducer_opadd<long int> sum(0);
            cilk::reducer_opadd<double> total(0.0);
            // Work Loop, do some work by looping VERYBIG times
            cilk_for (int j = 0; j < VERYBIG; j++)
            {
                // increment check sum
                sum += 1;
                // Calculate first
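The objectives also list OpenMP. The reduction that the cilk::reducer_opadd objects perform in the listing above can be expressed with an OpenMP reduction clause instead. The following is a minimal sketch under stated assumptions: the body of the original work loop is cut off in this copy, so `series_sum`, `Result`, `run_work_loop`, and the accumulation into `total` are hypothetical stand-ins rather than the lab's code. If OpenMP is not enabled at compile time, the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>

// Hypothetical per-iteration work: the arithmetic series 0 + 1 + ... + (j - 1).
double series_sum(long j) {
    double s = 0.0;
    for (long k = 0; k < j; ++k) {
        s += static_cast<double>(k);
    }
    return s;
}

struct Result {
    long check_sum;  // number of iterations executed
    double total;    // sum of all per-iteration series
};

Result run_work_loop(long n) {
    assert(n >= 0);
    long sum = 0;
    double total = 0.0;
    // Each thread accumulates into private copies of sum and total; OpenMP adds
    // the copies together when the loop ends. This removes the data race that
    // updating shared variables from several threads would otherwise cause.
    #pragma omp parallel for reduction(+ : sum, total)
    for (long j = 0; j < n; ++j) {
        sum += 1;
        total += series_sum(j);
    }
    return {sum, total};
}
```

With the Intel compiler on Windows, OpenMP is enabled with /Qopenmp (GCC and Clang use -fopenmp). The check sum plays the same role as in the lab programs: if it differs between the serial and parallel builds, the parallel loop has a race.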
