Overview of Intel Core 2 Architecture and Software Development Tools
May 2008

Overview of Architecture

Two Timings
Timing 1: area starts at 11.667. Thread A adds 3.765 and stores 15.432. Thread B then reads 15.432, adds 3.563, and stores 18.995.
Timing 2: area starts at 11.667. Thread A adds 3.765, but Thread B reads the old value 11.667 before Thread A stores 15.432. Thread B adds 3.563 to its stale copy and stores 15.230, so Thread A's update is lost.
The order of thread execution causes nondeterministic behavior in a data race.

The Private Clause
Reproduces the variable for each thread.
Variables are uninitialized; a C++ object is default constructed.
Any value external to the parallel region is undefined.
Can you spot the race condition? Make x and y private by adding private(x,y) to the pragma:
    int i;
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        x = a[i];
        y = b[i];
        c[i] = x + y;
    }

Scheduling Clause
The schedule clause affects how loop iterations are mapped onto threads.
schedule(static [,chunk]): blocks of iterations of size "chunk" are assigned to threads in round-robin distribution.
schedule(dynamic [,chunk]): threads grab "chunk" iterations; when done with its iterations, a thread requests the next set.
schedule(guided [,chunk]): dynamic schedule starting with a large block; the size of the blocks shrinks, but never below "chunk".
    #pragma omp parallel for private(gP) schedule(static, 8)
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i)) gP++;
    }

Lab 4 Mandelbrot Scheduling
Objective: create a parallel version of mandelbrot. Analyze with VTune to look for load imbalance. Modify the code to add OpenMP clauses to diminish the load imbalance and improve performance.
Follow the next Mandelbrot activity, called Mandelbrot Scheduling, in the student lab doc.

Work Queuing
Intel implementation; will be part of OpenMP 3.0 (slightly differently).
Independent tasks can execute concurrently.
Creates a queue of tasks.
Works on recursive functions, linked lists, etc.
Serial:
    while (p != NULL) {
        do_work(p->data);
        p = p->next;
    }
Parallel:
    #pragma intel omp parallel taskq
    {
        while (p != NULL) {
            #pragma intel omp task
            do_work(p->data);
            p = p->next;
        }
    }

Optional Lab 5 Linked List Task Queue
Objective: use VTune to identify where to parallelize a pointer-chasing code, then modify the code to implement a task queue to parallelize the application.
Follow the Linked List Task Queue activity called LinkedListTaskQ in the student lab doc.
Note: there is also a companion lab, LinkedListWorkSharing, that uses worksharing to solve the same problem. There are also taskq labs on recursive functions, for example quicksort.

Unit Stride Memory Access (C/C++)
Non-unit-stride skipping in memory can cause cache thrashing, particularly for array sizes of 2^n.
Original loop order:
    for (i = 0; i < NUM; i++)
        for (j = 0; j < NUM; j++)
            for (k = 0; k < NUM; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
Interchanged loop order, with j as the fast loop index:
    for (i = 0; i < NUM; i++)
        for (k = 0; k < NUM; k++)
            for (j = 0; j < NUM; j++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];

Poor Cache Utilization - with Eggs
Pan ready to fry eggs.
The carton represents a cache line, the refrigerator represents main memory, and the table represents the cache.
When the table is filled up, old cartons are evicted and most eggs are wasted.
A request for an egg not already on the table brings a new carton of eggs from the refrigerator, but the user fries only one egg from each carton. When the table fills up, an old carton is evicted.
User requests one specific egg. User requests a 2nd specific egg. User requests a 3rd egg; a carton is evicted.

Good Cache Utilization - with Eggs
The previous user had used all eggs on the table.
Carton eviction doesn't hurt us because we've already fried all the eggs in the cartons on the table, just like the previous user.
User requests eggs 1-8. User requests eggs 9-16. User eventually asks for all the eggs.
A request for one egg brings a new carton of eggs from the refrigerator. The user specifically requests eggs from the carton already on the table, and fries all eggs in a carton before requesting an egg from the next carton.

Lab 7 Matrix Multiply Cache Effects
Objective: explore the impact of poor cache utilization on performance with VTune Analyzer, and explore how to manipulate loops to achieve significantly better cache utilization and performance.
Follow the Matrix Multiply Cache Effects lab in the student lab doc. Set VTune Analyzer to collect samples on the counters MEM_LOAD_RETIRED.L2_MISS and RESOURCE_STALLS.

Optional Lab 8 False Sharing
Objective: explore false sharing with VTune Analyzer to learn which counters can be used to identify this issue. Manipulate the baseline code to remove the false sharing issue.
Follow the False Sharing activity in the student lab doc. Set VTune Analyzer up to collect samples on a ratio called "Modified Data Sharing Ratio".

BACKUP

Lab 6 Essentials of Vectorization
Objective: explore how auto vectorization can dramatically ...
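The race on the Private Clause slide can be sketched as a complete function. This is a minimal illustration, not the course's lab code: the function name add_arrays and the int element type are assumptions. x and y are declared outside the loop, so they would be shared by default; private(x, y) gives each thread its own copy.

```c
/* Hypothetical helper (name and element type are assumptions):
   computes c[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const int *a, const int *b, int *c, int n)
{
    int i, x, y;

    /* Without private(x, y), every thread would write the same x and y,
       and one thread could read a value another thread just stored. */
    #pragma omp parallel for private(x, y)
    for (i = 0; i < n; i++) {
        x = a[i];
        y = b[i];
        c[i] = x + y;
    }
}
```

Built without OpenMP support the pragma is ignored and the loop simply runs serially, which makes it easy to compare results.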
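The prime-counting loop from the Scheduling Clause slide can be made self-contained as follows. Two things here are assumptions rather than the slide's code: TestForPrime is a simple trial-division stand-in, and the counter uses reduction(+:gP) instead of private(gP), because a private copy starts uninitialized and is discarded when the parallel region ends, so the count would not survive the loop.

```c
/* Hypothetical stand-in for the slide's TestForPrime(): trial division. */
static int TestForPrime(int n)
{
    if (n < 2) return 0;
    if (n % 2 == 0) return n == 2;
    for (int d = 3; d <= n / d; d += 2)
        if (n % d == 0) return 0;
    return 1;
}

/* Counts the primes among start, start + 2, start + 4, ... up to end. */
int count_primes_step2(int start, int end)
{
    int gP = 0;

    /* static, 8: blocks of 8 iterations are dealt to threads round-robin.
       reduction(+:gP): each thread counts privately, then the partial
       counts are summed into gP at the end of the loop. */
    #pragma omp parallel for schedule(static, 8) reduction(+:gP)
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i))
            gP++;
    }
    return gP;
}
```

Because larger candidates take longer to test, swapping in schedule(dynamic, 8) or schedule(guided, 8) is a reasonable experiment when the static distribution leaves some threads idle.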
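The Work Queuing slide notes that the Intel taskq extension would appear in OpenMP 3.0 in a slightly different form. A rough equivalent of the linked-list loop using the OpenMP 3.0 task construct might look like the sketch below. The node layout and the body of do_work are assumptions made for illustration, and do_work takes the node pointer rather than p->data so that its effect is observable.

```c
#include <stddef.h>

/* Hypothetical node type; the slides only show p->data and p->next. */
struct node {
    int data;
    struct node *next;
};

/* Hypothetical work function: squares the payload in place. */
static void do_work(struct node *p)
{
    p->data = p->data * p->data;
}

void process_list(struct node *head)
{
    #pragma omp parallel
    {
        /* One thread walks the list and creates the tasks;
           the other threads in the team pick them up and run them. */
        #pragma omp single
        {
            for (struct node *p = head; p != NULL; p = p->next) {
                /* firstprivate(p): each task keeps its own copy of the
                   pointer as it was when the task was created. */
                #pragma omp task firstprivate(p)
                do_work(p);
            }
        }
        /* The implicit barrier at the end of single guarantees that
           all tasks have finished before the region ends. */
    }
}
```

The single construct plays the role of the taskq block: list traversal stays serial, and only the per-node work is distributed.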
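The two matrix-multiply loop orders from the unit-stride slide can be written as complete functions to compare them. This is a sketch under assumptions: the function names, the double element type, and NUM = 64 are chosen for illustration only. Both functions accumulate the products for each c[i][j] in the same k order, so they produce identical results; only the memory access pattern differs.

```c
#define NUM 64  /* assumed size; the slides leave NUM unspecified */

/* i-j-k order: the innermost loop reads b[k][j] down a column,
   jumping NUM elements in memory on every iteration. */
void matmul_ijk(double a[NUM][NUM], double b[NUM][NUM], double c[NUM][NUM])
{
    for (int i = 0; i < NUM; i++)
        for (int j = 0; j < NUM; j++)
            for (int k = 0; k < NUM; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}

/* i-k-j order: the innermost loop walks c[i][j] and b[k][j] along a row,
   so consecutive iterations touch adjacent elements (unit stride). */
void matmul_ikj(double a[NUM][NUM], double b[NUM][NUM], double c[NUM][NUM])
{
    for (int i = 0; i < NUM; i++)
        for (int k = 0; k < NUM; k++)
            for (int j = 0; j < NUM; j++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
}
```

In the egg analogy, the second version fries every egg in a carton before asking for the next carton. Both functions add into c, so the caller is expected to zero it first.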