深入理解计算机系统_复习_清华.ppt_第1页
深入理解计算机系统_复习_清华.ppt_第2页
深入理解计算机系统_复习_清华.ppt_第3页
深入理解计算机系统_复习_清华.ppt_第4页
深入理解计算机系统_复习_清华.ppt_第5页
已阅读5页,还剩65页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、计算机组成原理,Zhang, Youhui (张悠慧) ,2010 秋季,课程回顾,Topics 计算机系统结构等相关概念与范畴 数的表示 汇编语言与C语言 代码优化,计算机系统结构等相关概念与范畴,概念计算机系统结构,编写出能够在机器上正确运行的系统程序所 必须了解到的计算机系统的属性 研究计算机系统软件与硬件的功能分配,确 定计算机系统软件与硬件的分界面 研究计算机系统的外部特性,即程序员所看 到的计算机系统属性,程序员看到的计算机系统属性 数据表示:硬件直接认别和处理的数据类型 寻址技术:编址方式、寻址方式和定位方式 寄存器定义:寄存器定义、数量和使用规则 指令系统:指令的操作类型、格式

2、、排序等 存储系统:要求速度高、容量大、价格便宜 中断系统:中断类型、中断级别和响应方式 输入输出系统:数据交换方式、交换过程控制 机器工作状态:定义和切换方式,如内核态、执行态、管理态和用户态等,概念计算机组成,计算机系统的逻辑实现 设计功能部件:处理器,主存储器等 数据通路的宽度 各种操作对功能部件的共享程度 确定功能部件的并行度 设计缓冲和排队策略 设计控制机构 采用何种可靠性技术,概念汇编语言,用符号表示的机器语言,可包括宏构造,概念冯诺依曼计算机,特点: 存储程序、运算器为中心、集中控制 存储器是字长固定的、顺序线性编址的一维结构,每个地址是唯一定义的 由指令形式的低级机器语言驱动

3、指令顺序执行,一般按照指令在存储器中存放的顺序执行,程序分支由转移指令实现 运算器为中心,输入输出设备与存储器之间的数据传送都途经运算器 集中控制,运算器、存储器、输入输出设备的操作以及它们之间的联系都由控制器控制,现代处理器运算速度计算公式: P Fz X IPC X TPC 其中: Fz为处理机的工作主频 IPC (Instruction Per Cycle) 指令级并行度 TPC (Threading Per Cycle) 线程级并行度 例如:主频3GHz,4核Pentium4处理器的最高运算速度为: P 3GHz X 4IPC X 4TPC = 48GIPS 即:每秒钟480亿次,概念

4、处理器运算速度,提高处理器性能的主要途径 (1) 提高主频Fz: 增加流水线级数,依靠计算机系统结构 缩短门电路延迟时间,依靠电子技术 (2) 提高指令级并行度IPC 依靠并行算法和计算机系统结构 (3) 提高线程级并行度TPC 依靠并行算法、程序设计和计算机系统结构,近期出现的新问题: 线延迟大于门延迟 漏电流很大 功耗惊人 近期提高计算机性能的途径 只能依靠并行算法、程序设计和计算机系统结构,不能指望电子技术 不仅对计算机系统结构,而且对并行算法、 软件技术和计算机应用技术都将产生深远的 影响,概念指令执行速度,平均速度,概念Amdahl定律,数的表示,Bits, Bytes, and I

5、ntegers,Sizes of C Objects (in Bytes) C Data TypeTypical 32-bitIntel IA32x86-64 char111 short222 int444 long448 long long888 float444 double888 long double810/1210/16 char *448 Or any other pointer,Bit-Level Operations in C Operations int y = bar(); unsigned ux = x; unsigned uy = y;,Initialization,F

6、loating Point,Representation Bits to right of “binary point” represent fractional powers of 2 Represents rational number:,1,2,4,2i1,2i, , ,1/2,1/4,1/8,2j,Numerical Form 1s M 2E Sign bit s determines whether number is negative or positive Significand M normally a fractional value in range 1.0,2.0). E

7、xponent E weights value by power of two Encoding MSB is sign bit exp field encodes E frac field encodes M Sizes Single precision: 8 exp bits, 23 frac bits Double precision: 11 exp bits, 52 frac bits Extended precision: 15 exp bits, 63 frac bits,Floating Point Representation,s,exp,frac,“Normalized” N

8、umeric Values,Condition exp 0000 and exp 1111 Exponent coded as biased value E = Exp Bias Exp : unsigned value denoted by exp Bias : Bias value Single precision: 127 (Exp: 1254, E: -126127) Double precision: 1023 (Exp: 12046, E: -10221023) in general: Bias = 2e-1 - 1, where e is number of exponent b

9、its Significand coded with implied leading 1 M = 1.xxxx2 xxxx: bits of frac Minimum when 0000 (M = 1.0) Maximum when 1111 (M = 2.0 ) Get extra leading bit for “free”,Denormalized Values,Condition exp = 0000 Value Exponent value E = Bias + 1 Significand value M = 0.xxxx2 xxxx: bits of frac Cases exp

10、= 0000, frac = 0000 Note that have distinct values +0 and 0 exp = 0000, frac 0000,Condition exp = 1111 Cases exp = 1111, frac = 0000 Represents value(infinity) Operation that overflows Both positive and negative exp = 1111, frac 0000 Not-a-Number (NaN),s exp fracEValue 0 0000 000-60 0 0000 001-61/8*

11、1/64 = 1/512 0 0000 010-62/8*1/64 = 2/512 0 0000 110-66/8*1/64 = 6/512 0 0000 111-67/8*1/64 = 7/512 0 0001000-68/8*1/64 = 8/512 0 0001 001 -69/8*1/64 = 9/512 0 0110 110-114/8*1/2 = 14/16 0 0110 111-115/8*1/2 = 15/16 0 0111 00008/8*1 = 1 0 0111 00109/8*1 = 9/8 0 0111 010010/8*1 = 10/8 0 1110110714/8*

12、128 = 224 0 1110 111715/8*128 = 240 0 1111 000n/ainf,closest to zero,largest denorm,smallest norm,closest to 1 below,closest to 1 above,largest norm,Denormalized numbers,Normalized numbers,Round-To-Even,Binary Fractional Numbers “Even” when least significant bit is 0 Half way when bits to right of r

13、ounding position = 1002 Examples Round to nearest 1/4 (2 bits right of binary point) ValueBinaryRoundedActionRounded Value 2 3/3210.00011210.002(1/2up)2 1/4 2 7/810.11100211.002(1/2up)3 2 5/810.10100210.102(1/2down)2 1/2,Floating Point in C,C Guarantees Two Levels floatsingle precision doubledouble

14、precision Conversions Casting between int, float, and double changes numeric values Double or float to int Truncates fractional part Like rounding toward zero Not defined when out of range or NaN Generally sets to Tmin or Tmax int to double Exact conversion, as long as int has 53 bit word size int t

15、o float Will round according to rounding mode,Floating Point Puzzles,For each of the following C expressions, either: Argue that it is true for all argument values Explain why not true,x = (int)(float) x x = (int)(double) x f = (float)(double) f d = (float) d f = -(-f); 2/3 = 2/3.0 d f-f -d d * d =

16、0.0 (d+f)-d = f,int x = ; float f = ; double d = ;,Assume neither d nor f is NaN,汇编与C语言,movl Operand Combinations,Cannot do memory-memory transfer with a single instruction,movl,Imm,Reg,Mem,Reg,Mem,Reg,Mem,Reg,Source,Dest,C Analog,Src,Dest,Indexed Addressing Modes,Most General Form D(Rb,Ri,S)MemRegR

17、b+S*RegRi+ D D: Constant “displacement” Rb: Base register: Any of 8 integer registers Ri:Index register: Any, except for %esp Unlikely youd use %ebp, either S: Scale: 1, 2, 4, or 8 Special Cases (Rb,Ri)MemRegRb+RegRi D(Rb,Ri)MemRegRb+RegRi+D (Rb,Ri,S)MemRegRb+S*RegRi,Address Computation Instruction,

18、leal Src,Dest Src is address mode expression Set Dest to address denoted by expression Uses Computing addresses without a memory reference E.g., translation of p = Computing arithmetic expressions of the form x + k*y k = 1, 2, 4, or 8.,%rax,%rdx,%rcx,%rbx,%rsi,%rdi,%rsp,%rbp,x86-64 General Purpose R

19、egisters,Extend existing registers. Add 8 new ones. Make %ebp/%rbp general purpose,%eax,%edx,%ecx,%ebx,%esi,%edi,%esp,%ebp,%r8,%r9,%r10,%r11,%r12,%r13,%r14,%r15,%r8d,%r9d,%r10d,%r11d,%r12d,%r13d,%r14d,%r15d,Swap in 64-bit Mode,Operands passed in registers First (xp) in %rdi, second (yp) in %rsi 64-b

20、it pointers No stack operations required 32-bit data Data held in registers %eax and %edx movl operation,void swap(int *xp, int *yp) int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; ,swap: movl(%rdi), %edx movl(%rsi), %eax movl%eax, (%rdi) movl%edx, (%rsi) retq,Reading Condition Codes,SetX Instructio

21、ns Set single byte based on combinations of condition codes,Conditional Branch Example,int absdiff( int x, int y) int result; if (x y) result = x-y; else result = y-x; return result; ,absdiff: pushl %ebp movl %esp, %ebp movl 8(%ebp), %edx movl 12(%ebp), %eax cmpl %eax, %edx jle .L7 subl %eax, %edx m

22、ovl %edx, %eax .L8: leave ret .L7: subl %edx, %eax jmp .L8,Body1,Set Up,Finish,Body2,pushl %ebp movl%esp, %ebp pushl%ebx movl8(%ebp), %ecx movl12(%ebp), %edx movl%ecx, %ebx subl%edx, %ebx movl%edx, %eax subl%ecx, %eax cmpl%edx, %ecx cmovg %ebx, %eax popl%ebx popl%ebp ret,int absdiff( int x, int y) i

23、nt result; if (x y) result = x-y; else result = y-x; return result; ,Gcc 4.3.4,New Conditional Branch Example,Implementing Loops,IA32 All loops translated into form based on “do-while” x86-64 Also make use of “jump to middle” Why the Difference IA32 compiler developed for machine where all operation

24、s costly x86-64 compiler developed for machine where unconditional branches incur (almost) no overhead,“For” “While” “Do-While”,for (Init; Test; Update ) Body,Init; while (Test ) Body Update ; ,Goto Version,Init; if (!Test) goto done; loop: Body Update ; if (Test) goto loop; done:,While Version,For

25、Version,Do-While Version,Init; if (!Test) goto done; do Body Update ; while (Test) done:,“For” “While” (Jump-to-Middle),for (Init; Test; Update ) Body,Init; while (Test ) Body Update ; ,Init; goto middle; loop: Body Update ; middle: if (Test) goto loop; done:,While Version,For Version,Goto Version,S

26、witch Statements,Implementation Options Series of conditionals Organize in tree structure Logarithmic performance Jump Table Lookup branch target Constant time Possible when cases are small integer constants GCC Picks one based on case structure,yoo,who,proc,Stack “Top”,IA32-Stack Frames,Contents Lo

27、cal variables Return information Temporary space Management Space allocated when enter procedure “Set-up” code Deallocated when return “Finish” code Pointers Stack pointer %esp indicates stack top Frame pointer %ebp indicates start of current frame,amI,IA32/Linux Stack Frame,Current Stack Frame (“To

28、p” to Bottom) Parameters for function about to call “Argument build” Local variables If cant keep in registers Saved register context Old frame pointer Caller Stack Frame Return address Pushed by call instruction Arguments for this call,Stack Pointer (%esp),Frame Pointer (%ebp),Return Addr,Saved Reg

29、isters + Local Variables,Argument Build,Old %ebp,Arguments,Caller Frame,IA32/Linux Register Usage,Integer Registers Two have special uses %ebp, %esp Three managed as callee-save %ebx, %esi, %edi Old values saved on stack prior to using Three managed as caller-save %eax, %edx, %ecx Do what you please

30、, but expect any callee to do so, as well Register %eax also stores returned value,%eax,%edx,%ecx,%ebx,%esi,%edi,%esp,%ebp,Caller-Save Temporaries,Callee-Save Temporaries,Special,%rax,%rbx,%rcx,%rdx,%rsi,%rdi,%rsp,%rbp,x86-64 Register Conventions,%r8,%r9,%r10,%r11,%r12,%r13,%r14,%r15,Return Value,Ca

31、llee Saved,Argument #4,Argument #3,Argument #2,Argument #1,Stack Pointer,Callee Saved,Argument #5,Argument #6,Callee Saved,Used for linking,C: Callee Saved,Callee Saved,Callee Saved,Callee Saved,x86-64 Registers,Arguments passed to functions via registers If more than 6 integral parameters, then pas

32、s rest on stack These registers can be used as caller-saved as well All References to Stack Frame via Stack Pointer Eliminates need to update %ebp Other Registers 6+1 callee saved 2 or 3 have special uses,x86-64 Locals in the Red Zone,Avoiding Stack Pointer Change Can hold all information within sma

33、ll window beyond stack pointer,/* Swap, using local array */ void swap_a(long *xp, long *yp) volatile long loc2; loc0 = *xp; loc1 = *yp; *xp = loc1; *yp = loc0; ,swap_a: movq (%rdi), %rax movq %rax, -24(%rsp) movq (%rsi), %rax movq %rax, -16(%rsp) movq -16(%rsp), %rax movq %rax, (%rdi) movq -24(%rsp

34、), %rax movq %rax, (%rsi) ret,rtn Ptr,unused,%rsp,8,loc1,loc0,16,24,x86-64 NonLeaf without Stack Frame,No values held while swap being invoked No callee save registers needed,long scount = 0; /* Swap ai ,swap_ele_se: movslq %esi,%rsi # Sign extend i leaq (%rdi,%rsi,8), %rdi # ret,x86-64 Call using J

35、ump,When swap executes ret, it will return from swap_ele Possible since swap is a “tail call”,long scount = 0; /* Swap ai ,swap_ele: movslq %esi,%rsi # Sign extend i leaq (%rdi,%rsi,8), %rdi # Array of data type T and length L Contiguously allocated region of L * sizeof(T) bytes Identifier A can be

36、used as a pointer to array element 0 Type T*,Viewing as Multidimensional Array,Declaration T ARC; 2D array of data type T R rows, C columns Type T element requires K bytes Array Size R * C * K bytes Arrangement Row-Major Ordering,int ARC;,4*R*C Bytes, ,Nested Array Row Access,Row Vectors Ai is array

37、 of C elements Each element of type T requires K bytes Starting address A + i * (C * K), ,A,int ARC;,A+i*C*4,A+(R-1)*C*4,Nested Array Element Access,Array Elements Aij is element of type T Address A + i * (C * K) + j * K = A + (i * C + j)* K, ,A i j,A i j,Ai, ,A,int ARC;,A+i*C*4,A+(R-1)*C*4,A+(i*C+j

38、)*4,Nested Array Element Access Code,Array Elements pghindexdig is int Address: pgh + 20*index + 4*dig IA32 Code Computes address pgh + 4*dig + 4*(index+4*index) movl performs memory reference,int get_pgh_digit (int index, int dig) return pghindexdig; ,# %ecx = dig # %eax = index leal 0(,%ecx,4),%ed

39、x# 4*dig leal (%eax,%eax,4),%eax# 5*index movl pgh(%edx,%eax,4),%eax# *(pgh + 4*dig + 20*index),Array Element Accesses,Similar C references Nested Array Element at Mempgh+20*index+4*dig,Different address computation Multi-Level Array Element at MemMemuniv+4*index+4*dig,int get_pgh_digit (int index,

40、int dig) return pghindexdig; ,int get_univ_digit (int index, int dig) return univindexdig; ,Using Nested Arrays,Strengths C compiler handles doubly subscripted arrays Generates very efficient code Avoids multiply in index computation Limitation Only works if have fixed array size,#define N 16 typede

41、f int fix_matrixNN;,/* Compute element i,k of fixed matrix product */ int fix_prod_ele (fix_matrix a, fix_matrix b, int i, int k) int j; int result = 0; for (j = 0; j N; j+) result += aij*bjk; return result; ,Dynamic Nested Arrays,Strength Can create matrix of arbitrary size Programming Must do index computation explicitly Performance Accessing single element costly Must do multiplication,int * new_var_matrix(int n) return (int *) calloc(sizeof(int), n*n); ,int var_ele (int *a, int i, int j, int n) return ai*n+j;

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论