版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Timing Analysis- timing guarantees for hard real-time systems-Reinhard WilhelmSaarland UniversitySaarbrckenTexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAStructure of the LectureIntroductionStatic timing analysisthe problemour approachthe successtool architect
2、ureCache analysisPipeline analysisValue analysisWorst-case path determinationTiming Predictabilitycachesnon-cache-like devicesfuture architecturesConclusionIndustrial NeedsHard real-time systems, often in safety-critical applications aboundAeronautics, automotive, train industries, manufacturing con
3、trol Wing vibration of airplane, sensing every 5 mSecSideairbag in car,Reaction in 10 mSeccrankshaft-synchronous taskshave very tight deadlines, 45uSHard Real-Time SystemsEmbedded controllers are expected to finish their tasks reliably within time bounds.Task scheduling must be performedEssential: u
4、pper bound on the execution times of all tasks statically known Commonly called the Worst-Case Execution Time (WCET)Analogously, Best-Case Execution Time (BCET)Timing Analysis Embedded controllers are expected to finish their tasks reliably within time bounds. The problem:Givena software to produce
5、some reaction, a hardware platform, on which to execute the software,required reaction time.Derive: a guarantee for timeliness.Timing Analysisprovides parameters for schedulability analysis:Execution time, Ci, of tasks, and if that is impossible,upper bounds and maybe also lower bounds on execution
6、times of tasks, often called Worst-Case Execution Times (WCET) and Best-Case Execution Times (BCET).Architecture(constant executiontimes)Timing Analysis the Search Spaceall control-flow paths (through the binary executable) depending on the possible inputs.Feasible as search for a longest path if it
7、eration and recursion are bounded,execution time of instructions are (positive) constants.Elegant method: Timing Schemata (Shaw 89) inductive calculation of upper bounds.SoftwareInputub (if b then S1 else S2) := ub (b) + max (ub (S1), ub (S2)High-Performance Microprosessorsincrease (average-case) pe
8、rformance by using: Caches, Pipelines, Branch Prediction, SpeculationThese features make timing analysis difficult:Execution times of instructions vary widelyBest case - everything goes smoothly: no cache miss, operands ready, resources free, branch correctly predictedWorst case - everything goes wr
9、ong: all loads miss the cache, resources are occupied, operands not readySpan may be several hundred cyclesVariability of Execution TimesLOAD r2, _aLOAD r1, _bADD r3,r2,r1PPC 755x = a + b;In most cases, executionwill be fast.So, assuming the worst caseis safe, but very pessimistic!AbsInts WCET Analy
10、zer aiT IST Project DAEDALUS final review report: The AbsInt tool is probably thebest of its kind in the world and it is justified to consider this result as a breakthrough.”Several time-critical subsystems of the Airbus A380 have been certified using aiT;aiT is the only validated tool for these app
11、lications.Tremendous Progressduring the 10 years from 1998 to 2008199520022005over-estimation20-30%15%30-50%42560200cache-miss penaltyLim et al.Thesing et al.Souyris et al.The explosion of penalties has been compensated by the improvement of the analyses!10%25%State-dependent Execution TimesExecutio
12、n time depend on the execution state.Execution state results from the execution history.semantics state:values of variablesexecution state:occupancy of resourcesstateArchitectureTiming Analysis the Search Spacewith State-dependent Execution Timesall control-flow paths depending on the possible input
13、sall paths through the architecture for potential initial statesSoftwareInputinitialstatemul rD, rA, rB execution states for paths reaching this program pointinstructionin I-cacheinstructionnot in I-cache1bus occupiedbus not occupiedsmall operandslarge operands14 40ArchitectureTiming Analysis the Se
14、arch Spacewith out-of-order executionall control-flow paths depending on the possible inputsall paths through the architecture for potential initial statesincluding different schedules for instruction sequencesSoftwareInputinitialstateArchitectureTiming Analysis the Search Spacewith multi-threadinga
15、ll control-flow paths depending on the possible inputsall paths through the architecture for potential initial statesincluding different schedules for instruction sequencesincluding different interleavings of accesses to shared resourcesSoftwareInputinitialstateWhy Exhaustive Exploration?Naive attem
16、pt: follow local worst-case transitions onlyUnsound in the presence of Timing Anomalies:A path starting with a local worst case may have a lower overall execution time,Ex.: a cache miss preventing a branch mis-predictionCaused by the interference between processor components: Ex.: cache hit/miss inf
17、luences branch prediction; branch prediction causes prefetching; prefetching pollutes the I-cache. State Space Explosion in Timing Analysisconstantexecutiontimesstate-dependentexecution timesout-of-orderexecutionpreemptiveschedulingconcurrency +shared resourcesyears +methods199520002010Timing schema
18、taStatic analysis?Caches, pipelines,speculation:combined cache andpipeline analysisSuperscalar processors:interleavingsof all schedulesMulti-core withshared resources:interleavingsof several threadsNotions in Timing AnalysisHard or impossible to determineDetermine upper bounds instead High-Level Req
19、uirements for Timing AnalysisUpper bounds must be safe, i.e. not underestimatedUpper bounds should be tight, i.e. not far away from real execution timesAnalogous for lower boundsAnalysis effort must be tolerableNote: all analyzed programs are terminating, loop bounds need to be known no decidability
20、 problem, but a complexity problem!Timing Accidents and PenaltiesTiming Accident cause for an increase of the execution time of an instructionTiming Penalty the associated increaseTypes of timing accidentsCache missesPipeline stallsBranch mispredictionsBus collisionsMemory refresh of DRAMTLB missExe
21、cution Time is History-SensitiveContribution of the execution of an instruction to a programs execution time depends on the execution state, e.g. the time for a memory access depends on the cache statethe execution state depends on the execution historyneeded: an invariant about the set of execution
22、 states produced by all executions reaching a program point.We use abstract interpretation to compute these invariants.Deriving Run-Time GuaranteesOur method and tool, aiT, derives Safety Properties from these invariants : Certain timing accidents will never happen.Example: At program point p, instr
23、uction fetch will never cause a cache miss.The more accidents excluded, the lower the upper bound.MurphysinvariantFastestVariance of execution timesSlowestAbstract Interpretation in Timing AnalysisAbstract interpretation statically analyzes a program for a given property without executing it.Derived
24、 properties therefore hold for all executions.It is based on the semantics of the analyzed language.A semantics of a programming language that talks about time needs to incorporate the execution platform!Static timing analysis is thus based on such a semantics.The Architectural Abstraction inside th
25、e Timing AnalyzerTiming analyzerArchitectural abstractionsCacheAbstractionPipeline AbstractionValue Analysis, Control-FlowAnalysis,Loop-BoundAnalysisabstractions ofthe processorsarithmeticAbstract Interpretation in Timing AnalysisDeterminesinvariants about the values of variables (in registers, on t
26、he stack)to compute loop boundsto eliminate infeasible pathsto determine effective memory addressesinvariants on architectural execution stateCache contents predict hits & missesPipeline states predict or exclude pipeline stallsTool ArchitectureAbstract InterpretationsAbstract InterpretationInteger
27、LinearProgrammingValue Analysis Determines enclosingintervals for the set of values in registers and local variables, used fordetermining addresses.Loop boundanalysisDetermines loop boundsControl FlowAnalysisDetermines infeasible pathsThe Story in DetailTool ArchitectureValue AnalysisMotivation: Pro
28、vide access information to data-cache/pipeline analysisDetect infeasible pathsDerive loop boundsMethod: calculate intervals at all program points, i.e. lower and upper bounds for the set of possible values occurring in the machine program (addresses, register contents, local and global variables) (C
29、ousot/Halbwachs78)Value Analysis II Intervals are computed along the CFG edges At joins, intervals are unioned“D1: -2,+2D1: -4,0D1: -4,+2move.l #4,D0add.l D1,D0move.l (A0,D0),D1D1: -4,4, A0 x1000,0 x1000D04,4, D1: -4,4,A0 x1000,0 x1000D00,8, D1: -4,4,A0 x1000,0 x1000access 0 x1000,0 x1008Which addre
30、ss is accessed here?Value Analysis (Airbus Benchmark)1Ghz Athlon, Memory usage Cache sets are independent: Everything explained in terms of one setLRU-Replacement Strategy: Replace the block that has been Least Recently UsedModeled by AgesExample: 4-way set associative cacheage0123m0m1Access m4 (mis
31、s)m4m2m1Access m1 (hit)m0m4m2m1m5Access m5 (miss)m4m0m0 m1 m2 m3Cache AnalysisHow to statically precompute cache contents:Must Analysis:For each program point (and context), find out which blocks are in the cache prediction of cache hitsMay Analysis: For each program point (and context), find out wh
32、ich blocks may be in the cacheComplement says what is not in the cache prediction of cache missesIn the following, we consider must analysis until otherwise stated.(Must) Cache AnalysisConsider one instruction in the program.There may be many paths leading to this instruction.How can we compute whet
33、her a will always be in cache independently of which path execution takes?load a Question: Is the access to a always a cache hit?Determine Cache-Information(abstract cache states) at each Program Pointa, bxyoungest age - 0oldest age - 3Interpretation of this cache information:describes the set of al
34、l concrete cache states in which x, a, and b occur x with an age not older than 1 a and b with an age not older than 2,Cache information contains only memory blocks guaranteed to be in cache.they are associated with their maximal age.Cache Analysis how does it work?How to compute for each program po
35、int an abstract cache state representing a set of memory blocks guaranteed to be in cache each time execution reaches this program point?Can we expect to compute the largest set?Trade-off between precision and efficiency quite typical for abstract interpretation(Must) Cache analysis of a memory acce
36、ssa, bxaccess to ab, xaAfter the access to a, a is the youngest memory block in cache, and we must assume that x has aged.What about b?baaccess to ab axyyxconcretetransferfunction(cache)abstracttransferfunction(analysis)Combining Cache InformationConsider two control-flow paths to a program point:fo
37、r one, prediction says, set of memory blocks S1 in cache,for the other, the set of memory blocks S2.Cache analysis should not predict more than S1 S2 after the merge of paths.the elements in the intersection should have their maximal age from S1 and S2.Suggests the following method: Compute cache in
38、formation along all paths to a program point and calculate their intersection but too many paths!More efficient method: combine cache information on the way,iterate until least fixpoint is reached.There is a risk of losing precision, not in case of distributive transfer functions.What happens when c
39、ontrol-paths merge? a c, f d c e a d a, c d “intersection + maximal age”We canguaranteethis contenton this path.We canguaranteethis contenton this path.Which contentcan weguaranteeon this path?combine cache information at each control-flow merge pointMust-Cache and May-Cache- InformationThe presente
40、d cache analysis is a Must Analysis. It determines safe information about cache hits.Each predicted cache hit reduces the upper bound.We can also perform a May Analysis. It determines safe information about cache misses Each predicted cache miss increases the lower bound.(May) Cache analysis of a me
41、mory accessa, bxaccess to axaWhy? After the access to a a is the youngest memory block in cache, and we must assume that x, y and b have aged.b, zyzyCache Analysis: Join (may) a c, f d c e a d a,c e f d “union + minimal age”Join (may)Abstract Domain: Must Cachezsxaxsztzsxtszxtztxs AbstractionReprese
42、nting sets of concrete caches by their descriptionconcrete caches z,xsabstract cacheAbstract Domain: Must Cache z,xs Concretizationsz, x Sets of concrete caches described by an abstract cacheremaining line filled upwith any other blockconcrete cachesabstract cacheover-approximation!Abstract Domain:
43、May Cachezsxaxsztzsxtszxtztxsz ,s, x t a Abstractionabstract cacheconcrete cachesAbstract Domain: May Cache Concretizationz,s,x t a abstract may-caches saywhat definitely is not in cacheand what the minimal age of those is that may be in cache.z,s,xz,s,x,tz,s,x,tz,s,x,t,aconcrete cachesabstract cach
44、eLessons LearnedCache analysis, an important ingredient of static timing analysis, provides for abstract domains, which proved to be sufficiently precise,have compact representation,have efficient transfer functions,which are quite natural.Problem Solved?We have shown a solution for LRU caches.LRU-c
45、ache analysis works smoothlyFavorable structure“ of domainEssential information can be summarized compactlyLRU is the best strategy under several aspectsperformance, predictability, sensitivity and yet: LRU is not the only strategyPseudo-LRU (PowerPC 755 Airbus)FIFOworse under almost all aspects, bu
46、t average-case performance!Contribution to WCETwhile . . . do max n . . . ref to s . . .odtimetmissthitloop timen tmissn thittmiss (n 1) thitthit (n 1) tmissContextsCache contents depends on the Context, i.e. calls and loopswhile cond do join (must)First Iteration loads the cache =Intersection loses
47、 most of the information!Distinguish basic blocks by contextsTransform loops into tail recursive proceduresTreat loops and procedures in the same wayUse interprocedural analysis techniques, VIVU virtual inlining of proceduresvirtual unrolling of loopsDistinguish as many contexts as useful1 unrolling
48、 for caches1 unrolling for branch prediction (pipeline)Structure of the LecturesIntroductionStatic timing analysisthe problemour approachthe successtool architectureCache analysisPipeline analysisValue analysisWorst-case path analysisTiming Predictabilitycachesnon-cache-like devicesfuture architectu
49、resConclusionTool ArchitectureAbstract InterpretationsAbstract InterpretationInteger LinearProgrammingPipelinesHardware Features: PipelinesIdeal Case: 1 Instruction per CycleFetchDecodeExecuteWBFetchDecodeExecuteWBInst 1Inst 2Inst 3Inst 4FetchDecodeExecuteWBFetchDecodeExecuteWBFetchDecodeExecuteWBPi
50、pelinesInstruction execution is split into several stagesSeveral instructions can be executed in parallelSome pipelines can begin more than one instruction per cycle: VLIW, SuperscalarSome CPUs can execute instructions out-of-orderPractical Problems: Hazards and cache missesPipeline HazardsPipeline
51、Hazards:Data Hazards: Operands not yet available (Data Dependences)Resource Hazards: Consecutive instructions use same resourceControl Hazards: Conditional branchInstruction-Cache Hazards: Instruction fetch causes cache missCache analysis: prediction of cache hits on instruction or operand fetch or
52、storeStatic exclusion of hazardslwz r4, 20(r1)HitDependence analysis: elimination of data hazardsResource reservation tables: elimination of resource hazardsadd r4, r5,r6lwz r7, 10(r1)add r8, r4, r4OperandreadyIFEXMFCPU as a (Concrete) State MachineProcessor (pipeline, cache, memory, inputs) viewed
53、as a big state machine, performing transitions every clock cycleStarting in an initial state for an instruction transitions are performed, until a final state is reached:End state: instruction has left the pipeline# transitions: execution time of instructionA Concrete Pipeline Executing a Basic Bloc
54、kfunction exec (b : basic block, s : concrete pipeline state) t: traceinterprets instruction stream of b starting in state s producing trace t.Successor basic block is interpreted starting in initial state last(t)length(t) gives number of cyclesAn Abstract Pipeline Executing a Basic Blockfunction ex
55、ec (b : basic block, s : abstract pipeline state) t: traceinterprets instruction stream of b (annotated with cache information) starting in state s producing trace tlength(t) gives number of cyclesWhat is different?Abstract states may lack information, e.g. about cache contents.Traces may be longer
56、(but never shorter).Starting state for successor basic block? In particular, if there are several predecessor blocks.s2s1s?Alternatives: sets of states combine by least upper bound (join),hard to find one that preserves information and has a compact representation.Non-Locality of Local Contributions
57、Interference between processor components produces Timing Anomalies: Assuming local best case leads to higher overall execution time.Assuming local worst case leads to shorter overall execution timeEx.: Cache miss in the context of branch predictionTreating components in isolation may be unsafeImpli
58、cit assumptions are not always correct:Cache miss is not always the worst case!The empty cache is not always the worst-case start!An Abstract Pipeline Executing a Basic Block- processor with timing anomalies -function analyze (b : basic block, S : analysis state) T: set of traceAnalysis states = 2PS
59、 x CS PS = set of abstract pipeline statesCS = set of abstract cache statesinterprets instruction stream of b (annotated with cache information) starting in state S producing set of traces Tmax(length(T) - upper bound for execution timelast(T) - set of initial states for successor blockUnion for blo
60、cks with several predecessors. S2S1S3 =S1 S2Integrated Analysis: Overall PictureBasic Blocks1s10s2s3s11s12s1s13Fixed point iteration over Basic Blocks (in context) s1, s2, s3 abstract statemove.1 (A0,D0),D1Cyclewise evolution of processor modelfor instructions1 s2 s3Tool ArchitectureAbstract Interpr
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 出租屋消防安全责任制度
- 办公室安全管理责任制度
- 区域卫生与安全责任制度
- 医院废物管理责任制度
- 医院窗口收费责任制度
- 单位环境卫生责任制度
- 卫生管理岗位责任制度
- 叉车安全管理责任制度
- 司机配送岗位责任制度
- 员工责任制度手册模板
- 服装手工艺钩针教学课件
- 新课标初中物理词典
- 医疗质量与安全管理委员会会议专家讲座
- 外研版中考英语复习课件
- GB/T 41498-2022纤维增强塑料复合材料用剪切框测定面内剪切应力/剪切应变响应和剪切模量的试验方法
- GB/T 28733-2012固体生物质燃料全水分测定方法
- FZ/T 08001-2021羊毛絮片服装
- 博弈策略的生活解读 课件
- PSP问题分析与解决能力训练课件
- 综合实践六年级下册和灯做朋友-完整版课件
- 数字化仿真概述课件
评论
0/150
提交评论