生物信息学课件英文原版课件 (5)_第1页
生物信息学课件英文原版课件 (5)_第2页
生物信息学课件英文原版课件 (5)_第3页
生物信息学课件英文原版课件 (5)_第4页
生物信息学课件英文原版课件 (5)_第5页
已阅读5页,还剩97页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

EECS 583 Lecture 23Group 1 Control flow analysis + optiGroup 2 Dataflow analysis + opti,University of MichiganApril 8, 2002,Today,Control flow analysis and optimizationIdentifying branch correlations with path profilesExtending trimaran to perform path profiling, optimization using path profile informationNael, Pariwat, NuweeCompiler switch spacewalkingIdentifying the best settings for compiler switchesIbrahim, PeteDataflow analysis and optimizationBDD-based predicate analysis (record number of slides)More intelligent predicate relation analysis using binary decision diagrahmsBeth, Laura, Bill,Next time,Scheduling groupsTI C6x Dave, JeffWhile loop software pipelining Arnar, Tomas, MishaPower-sensitive scheduling Hai, AmitLast class Wednes, April 17Start 4:00 pmTo be held on central campusWe will head over to Ashleys afterwards (attendance optional) First round is on meDont have to drink if you do not want to,EECS 583 Group 1 Advanced control flow analysis and optimization“Path Profiling”,University of MichiganApril 8, 2002,Profiling?,Profiles counts occurrences of event during programs executionPoint/Basic Block ProfilingEdge ProfilingPath Profiling,120,250,160,270,270,Point Profiling,120,100,150,160,110,20,250,160,270,Edge Profiling,270,Path = execution tracePath profiles = how often does a control-flow path executes?How to profile paths?Approximate with block or edge profilesInaccurate!Trace programHigh cost!Use?Compiler optimizationPerformance TuningProgram Testing,Path Profiling What is it? & How to?,Profiling? Edge Profiling Not Enough!,120,100,150,160,110,20,250,160,270,Example: how edge profiles misidentify the most frequently executed paths.,Set k = history depthExecute program, - new block - add to FIFO queue of length kIf pattern is found in hash table - counter+Else add new pattern in hash table,Dumb way to collect path profiling,.A B C D E F A C D E,k = 4,String of path,Hash table Path / CounterA B C D10B C D E5C D E F1E F A C1F A C D1A C D E2.,FIFO,F A C D,FIFO,A C D E,Most recent path,By Thomas Ball & James R. Larus 1996From Bell Lab. And U. of Wisconsin - Madison,Efficient path profiling,Convert CFG to DAGAssign integer values to edges such that no two paths compute the same path sumSelect edges to instrument and compute appropriate increment for each edgeRegenerate path,Efficient path profiling Algorithm Overview,Path start at procedure entry or loop head - Yes! It is intraprocedural path profilingAcyclic pathRemove backedgesAdd source vertex (ENTRY)Add sink vertex (EXIT),Convert CFG to DAG,Assign Edge Values,Assign each edge value such thatSums along DAG path is unique, non-negative integerSums lie in range 0 NumPaths-1NumPaths = number of paths to EXIT,Val(e),NumPaths(v),Assign Edge Values - Example,Vertex(v) NumPaths(v)ABCDEF,Val(v-wk) = i = 1 to k-1NumPaths(wi),DAG with values computed and Path Encoding,Assign Edge Values - Example,Many ways to compute sum,Given edges values - find minimal operation to find sumEfficient event counting BallWeighs edge frequencyMax spanning tree on least traveled edges,Minimal Increments,Inc (B-D) = Val(A-B) + Val(B-D) + Val(D-F) = 2 + 2 + 0 = 4,Determine Minimal Increments,BasicInitialize r = 0 at ENTRYIncrement r+ = Inc(c)along chordRecord countr +at EXITOptimizeInitialize & Increment r= Inc(c)Increment & Record countr + Inc(c) +,Instrumentation,Given sum P , which path produced it?We know that we have path encoding 0 NumPaths-1For (I = 0 ; I NumPaths-1 ; I+)Go from ENTRY node and traverse the graphDo until reach EXITAt each block, take edge with largest Val(e) = NumPathsDecrement NumPaths,Path Regeneration,Low average run-time overhead 31% (edge-profiling 16%),Performance Measurement,Longer path than edge-profiling,Performance Measurement,Interprocedural Path Profiling by David Melski & Thomas RepsPaths may cross procedure boundariesUse numbering scheme as in Ball & LarusComplicated by cyclic paths from entering different call sitesWhole Program Paths by James R. LarusA complete, compact record of program entires control flowSupport interprocedural pathsPractically used in finding hot sub-pathsRequire more storage and post-processing,Related works and Extension,EECS 583 Project: Static Correlated Branch Prediction,Nuwee Wiwatwattana ,Pariwat Luangsuwimol , Nael Botros.April 8, 2002,Introduction:,Importance of path profiles.Exposes patterns followed by program.Allows optimizations to make use of the path frequencies rather than just point frequency.“In some situations, while it is possible to optimize a statement with respect to some paths along which it lies, the same optimization opportunity does not exist along other paths through the statement. We refer to such optimizations as path sensitive optimizations. “Rajiv Gupta, University of Arizona,Path profile driven optimizations:,Examples of path profile driven optimizations:Static correlated branch prediction.Path profile guided partial dead code elimination.Partial redundancy elimination.Load redundancy removal.Elimination of array bound check.,Increased importance of branch prediction accuracy,Increased importance of branch prediction:Deeper pipelines.Delayed branch resolution.Dynamic versus Static branch prediction:Use of actual pattern followed by branch.Costs of Hardware dynamic branch predictionCycle time cost. (Tcpu= Ninst * Cycle/Inst * Second/Cycle)Hardware cost.,Static Correlated Branch Prediction(SCBP),Static (compiler) simulation of a hardware per branch history branch prediction and branch target predictor :,SCBP,Limitation on Static Branch prediction:Limited communication with processor (in the form of a bit stating if the branch is likely taken or Not Taken passed from compiler to processor)Branch prediction static with time unlike Hardware dynamic branch predictors.How SCPB gets around those limitations:Code Duplication,SCBP,Trade-off between code expansion and branch predictability.Improves performance of multiple-issue, deeply pipelined microprocessors.,SCBP utilizes general path profiles, not limiting the technique to forward path profiles.“We appear to have been the first researchers to have collected Path profile information” Young and Smith,SCBP,What does SCBP do:Algorithm to use general path profiles to improve static branch prediction accuracy.Trade off code expansion for improved accuracy.Provide an algorithm to automatically and effectively tune SCBP space-time trade-off.,SCBP:,Steps for Static Correlation Branch prediction:Profiling.Local minimization.Global reconciliation.Layout.Example:,SCBP:,First we generate the path profile for the given code:,SCBP:,Local minimization:Here the concept of a history tree is introduced.History tree:Nodes of history tree are edges of CFG.For each block that ends with a conditional branch there is a history tree.Root node is called predicted branch, block containing it is called predicted block.Any path whose last edge starts at predicted block is called a predictive path.For each predictive path, last edge is called counted edge, and the rest of the path is called observed path.Different nodes in history tree may map to same CFG block( different paths, or history covering multiple iterations of a loop),SCBP:,SCBP:,Now we need to find minimum amount of history necessary to exploit correlation is it exists. The less history we need to preserve, the less code expansion will result in final program.Pruning history trees:,SCBP:,Example of Pruning a branch with no correlation to ancestors:Tree collapses to its root node.,SCBP:,Global Reconciliation:Determining the minimum number of copies needed for each basic block in order to preserve correlation history.,SCBP:,Global Reconciliation steps:First step: find all potential splitters. ( Each non leaf node of a minimized (pruned) tree is a splitter of the source block of the edge to which it maps.Second step: determine the number of pieces in each partition of paths leading to each splitter.Third step: get the intersection of all pieces of all partitions of a certain basic block.,SCBP:,Final CFG of our example:,SCBP:,Layout issues:The CFG produced by reconciliation typically is not an executable program, as new join points are created.If SCBP is performed in an intermediate representation the new joints are no problem.If it is performed in intermediate representation that resembles machine code, we may need to add some new branch instructions in order to ensure correct program semantics.,SCBP:,Trading Off Space and TimeSo far SCBP attempted to capture maximum improvement in branch prediction accuracy with no attempt to limit size expansion.Net effect will depend on both improvement in branch prediction and the penalties due to worse cache miss rate.Rather than making maximum number of copies to achieve the maximum improvement in branch prediction, we will choose only profitable branches and blocks for duplication.Solution:Overpruning: sacrificing some of the branch accuracy in the favor of code size.,SCBP:,Experimental Results:Benchmarks used,SCBP:,Experimental Results:Training sets used for benchmarks,SCBP:,Experimental Results:Additional information about benchmarks,SCBP:,Experimental Results:Number of paths profiled as a function of history depth,SCBP:,Experimental Results:Times for profiling,SCBP:,Experimental Results:Sizes of original and transformed code,SCBP:,Experimental Results:Mis-predict rate and cache miss rate,Reference Papers:,Static Correlated Branch PredictionCliff YoungBell LaboratoriesandMichael D. SmithHarvard University,How to Implement optimization Based on Path Profiling in Trimaran?,General Idea,Impact,Elcor,simu,Text file,How to Create a Path Profile?,RebelCmppAdd .Sub ,Codegen,C CodeIf()bb1AddSubBb1+,PATH_FILE41 2 3 4 32 4 5 6 8,Insert codeinside the Cprogram,a.out,Go Back To Elcor,2 stepsCreate small function to read input from text file and generate appropriate data structure to store it.Create an optimization code.,Compiler Switch Optimization,Peter SchwartzIbrahim BashirApril 8, 2002,Importance of Switches,The paperA case study on the importance of compiler and other optimizations for improving super-scalar processor performanceby Duvall, Andersen, Leggoe, Graham, Cooke, and Antonio1999What they didStarted with a FORTRAN programOptimized by changing compiler switchesTested results on 2 machines,Program and Optimizations,ProgramWritten in FORTRAN Modeled spherical particle transport phenomena2500 lines of code25 subroutinesMany nested loopsOptimizations,Experiment and Results,Test machinesIBM SP160MHz POWER2 CPU4 nodesDEC Alpha667MHz 21164 CPU1 nodeResults,Conclusions,Other switch settings gave similar resultsAuthors only reported good resultsGood switch settings gave speedup of 10Used experts to set switchesWe want to automate the process,Genetic Algorithm Parallelisation System (GAPS),The paperGAPS: Iterative Feedback Directed Parallelisation Using Genetic Algorithmsby Andy Nisbet1998What he didStarted with FORTRAN programUsed genetic algorithms to find a good sequence of compiler optimizationsTested with different numbers of processors,The GAPS Approach,Population and Evaluation,Population initializationUses domain informationPopulation size = 20Individuals represent sequences of transformationsFitness evaluationTransformations encoded in individual applied to original codeTransformed code compiled and run on benchmarkFaster execution = higher fitness scoreIllegal code gets lowest fitness,Selection and Reproduction,SelectionProbability = linear normalization of fitnessIllegal sequences can still reproduceElitism - only members of weaker half selected for deletionReproductionAll transformations changed loop structuresCrossover and mutation were specific to this domainDetails are irrelevant,Experiment,FORTRAN codeMachineSGI Origin 20001, 2, 4, 8, and 16 processorsCompilersGAPSPFA (Native SGI)Petit,Results,GAPS performed best44% faster than PFA37% faster than PetitTook about 24 hours to produce 20,000 individualsOnly 5.5% of GAPS individuals were legal,Conclusions,Genetic algorithms can be applied to compiler optimizationsCan require many reproductionsCan take a long timeOur search space should be much smaller,Adaptive Program Optimization,Motivation,Trying to find best combination of optimizations to apply to a given programMight not be possible to determine the applicability of certain transformations at compile-timeOptimization decisions are limited by lack of information about the input data setDrawbacks of profilingProfiling makes use of information collected during previous program runsData collected through profiling is based on program input that may not be representative of current inputProfiles may also become inaccurate if the machine configuration changesAdaptive optimization techniques attempt to gain more accurate info by making decisions at run-time based on the current input and machine configuration,Adaptive Optimization,Rather than applying a particular transformation at compile-time, generate adaptive code which can behave like transformed code (if needed) at run-timeAdaptive programs can be thought of as having multiple execution pathsSelection of a particular execution path is based on run-time information/valuesMore practical than multiversioningCost of run-time analysis is small compared to benefits if transformation can be applied,Dynamic Optimization,Technique similar to adaptive optimization; perform optimizations dynamically as information becomes available to apply and evaluate themMultiversioningCompiler generates multiple versions of a code section, and the most appropriate variant is selected at run-time based on current input data and/or machine environmentMajor limitation is that variants are generated at compile-time and therefore no run-time info can be exploited during code generationCreating enough code variants to cover all possible scenarios can lead to significant code growth, so typically only a few versions are created for each code section,Dynamic Optimization (2),Dynamic FeedbackTechnique that selects from compile-time code variants, but uses run-time sampling to chooseSame problems as multiversioning; no run-time info used in code generation and code explosionSampling phase: measure execution time for each optimization generated at compile-timeUser-defined duration; doesnt monitor changes in environmentProduction phase: use the variant with the smallest execution timeMore problems:No guarantee that input data/environment are constant during sampling phase, so it may be unreasonable to compare performance of variantsBehavior during sampling phase may not model behavior during production phase,Dynamic Optimization (3),Dynamic CompilationGenerates new code variants during program executionMakes use of run-time informationOverhead exceeds multiversioning and dynamic feedbackProgram execution is paused as new code variants are generatedGenerating code at run-time is expensiveDue to high overhead, only applied to code sections that will benefitDifficult to automate selection of such sections,Why use adaptive optimization?,ApplicabilityConservative compile-time analysis may exclude some optimizationsRun-time test can determine whether or not a transformation appliesUsefulnessWhether or not an optimization will be useful may depend on program characteristics not known at compile-timeSelectionWhen there is more than one applicable optimization, selecting the most suitable one may require run-time infoAdaptive code can behave either as untransformed code or code that has been transformed using one or more optimizations,Adaptive optimization vs. Multiversioning,Drawback of multiversion programs is the resulting code growthExperience has shown that in many situations multiple transformations must be applied to gain the desired optimization benefitNumber of versions grows exponentiallySame problem as compiler switch selectionAdaptive code avoids exponential code growth by requiring execution of some additional instructions at run-timeDespite run-time overhead due to additional instructions, adaptive programs can realize a large fraction of the speedup achievable by corresponding multiversion programsResults: adaptive version had 40-80% of the speedup achieved by multiversion program,How adaptive optimization works,Achieves effects of transformations by:Adapting flow of controlModifying bounds of loop variablesAdapting usage of loop variables in array subscript expressionsChoosing between serial and parallel execution of loopsExample: loop fusionExecution of loop iterations is interleavedAdaptive code contains back to back loops whose execution can either be interleaved or carried out in sequenceSet Boolean flags based on run-time info, and use value of flags to decide between original/transformed executionAdaptive transformation reduces code growth associated with multiversioning by introducing additional predicates and branchesSome problems:No mention of how run-time info is used to select a transformationOnly applied to a few loop transformations,Our project,Start out with a large number of Elcor/Impact optimization switchesPhase 1: narrow down switches to some reasonable numberWhats a reasonable number?Might be decided based on performance analysisHow exactly do we narrow down the switches?Couple of different techniques; use empirical analysis to decide on onePhase 2: feed reasonable number of switches into a genetic algorithmLet genetic algorithm run for a really long timeRefine this manageable number of switches into an optimal combination (near-optimal?),Our project (2),Phase 3: analy

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论