高级体系结构课程试卷eng(WORD)附群主YY的答案.docx

上传人：清*** IP属地：河南上传时间：2020-01-09 格式：DOCX 页数：10 大小：290.43KB 积分：12 举报 版权申诉

已阅读5页，还剩5页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

Zhejiang UniversityAdvanced Computer Architecture, Fall-Winter 2013Final Exam Student ID_Name_ Grade_Section One: You are required to choose the best answer from the four given choices marked A,B,C and D. Then mark your answer.(2 points * 10)1.Michael Flynn 1966 looked at the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor, and placed all computers into one of four categories.Array processors and vector processors is the representatives of () while multi-processorssystem is the representative of (B).A. SISD,SIMD B.SIMD,MIMD C.MISD,MIMD D.MIMD,SIMD2.Forwarding can solve some data hazards,but not all potential data hazards can be handled by forwarding. Look at the sequence of instructions bellow, which stall cant be eliminated by forwarding.( B )A. DADD R1,R2,R3B. LD R1,0(R2) DSUB R4,R1,R5 DSUB R4,R1,R5C. DADD R1,R2,R3D. LD R4,0(R1) LD R4,0(R1) DADD R1,R2,R33. The relevance of the next followed instructions contains _ 、_ , and hazard which may occur in MIPS are _、_.CDADDIU R1,R3,# -8BNER1,R2,LOOPSUB R4,R4,#8A.RAW,WAR; B.Real Correlation , Reverse Correlation ;RAW,WAR RAW , transmissionC.Real Hazard , Control Hazard;D.RAW,WAW;RAW , transmission RAW,WAW4. Choose the right answer including all the right options.(B)Poor parallism among instructionsLimited contents of score board Limited Function Units and TypesRemained RAW & WAW HazardNot considering forwardingRemained RAW HazardA.B.C.D. 5. Tomasulos scheduling algorithm mainly holds two advantages, the first one is_ the logic of hazard checking , the second is eliminating the _ hazard.(B)A. uniformed , WAW and WARB.distributed , WAW and WARC. uniformed , WAW and RAWD. distributed , WAW and RAR6.2-bit Branch-Prediction Buffer can reduce the cost of branch. Suppose the initial value of 2-bit Branch-Prediction Buffer is zero. There is a loop program with 10 times. If the former 9 branches of this loop program is taken and the last one is not taken. The hitting rate of predicting of the 2-bit Branch-Prediction is ( ). If the first branch is taken and the transfer behavior is changed every other branch, the hitting rate of predicting of the 2-bit Branch-Prediction is ( ). (C)A. 90% ,50%B. 70% ,20%C. 70% ,50%D. 90% ,20%7.The purpose of Multiple Issue Processor is that issue more than one instructions per clock. The two basic multiple-issue technologies is ( ) and ( ). Among them ( ) issue a fixed number of instructions formatted as one large instruction. ( ) mainly use hardware to detect hazard.(B)A. multiprocessor, superscalar; superscalar,multiprocessorB. VLIW, superscalar; VLIW, superscalarC. superscalar, VLIW; superscalar, VLIWD. superscalar, VLIW; WLIW, superscalar 8.Choose the correct statement: C In symmetric (shared-memory) multiprocessors (SMPs) architectures, all processors have different latency from memory, even if the memory is organized into multiple banks. In DSM architectures, a memory reference can be made by any processor to any memory location. In Message-passing multiprocessors architectures, the same physical address on two different processors refers to two different locations in two different memories. NUMAs (nonuniform memory access) memory access time depends on the location of a data word in memory.A.B. C.D.9.There are two different programs T1 and T2 sharing the same data.T1 use cache-1 and T2 use cache-2. Suppose cache use write-back way to write back data. The initial data of memory and cache and the program is listed in the table bellow. The data in the memory is ( A ) when the program is executed.Program T1Program T2Cache-1Cache-2initial data of memoryprogramST X,1ST Y,10LD Y, R1ST Y,R1LD X,R2ST X,R2X=Y=X=X=Y=Y=X=0Y=5X=Y=T1 is executed.Cache-1 writes back XT2 is executedCache-2 writes back Xand YCache-1 writes back YA.X=1,Y=10,X=1,Y=5B. X=0,Y=10,X=1,Y=5C. X=1,Y=5,X=1,Y=10D. X=0,Y=5,X=1,Y=510.In Directory-Based Cache Coherence Protocols, each cache block may be in one of the following three states: shared, uncached, exclusive. When the block is in the shared state, which requests can occur ? (B )Read missWrite missData write backA.B. C.D.Section Two: Give a short answer to the following questions.(4 points * 5）1. Give a brief introduction of three types of pipeline hazards. Then point out the possible data hazards and approaches to solve the hazard.Structural hazards These are conflicts over hardware resources. Data hazards Instruction depends on result of prior computation which is not ready (computed or stored) yet Control hazards lbranch condition and the branch PC are not available in time to fetch an instruction on the next clock Approaches to solve the hazard.The simplest way to fix hazards is to stall the pipeline Structural hazardsOvercome by replicating hardware resources Multiple accesses to the register file; Multiple accesses to memory; some functional unit is not pipelined.;Not fully pipelined functional units Data hazardsDouble Bump”; PIPELINE STALL; ForwardingStructural hazardsFreeze or flush the pipeline: Penalty is fixed; Can not be reduced by software. Predict-not-taken (Predict-untaken): Treat every branch as not taken Predict-taken: Treat every branch as taken Delayed branch 2.Describe the theory of branch delay slot briefly and adjust the following codes according to it.ADD R1,R2,R3Delay slotIf R2 = 0 thenBranch delay slot: branch instruction; sequential successor1; sequential successor2; sequential successor n; branch target if takenThe instruction cycle after the branch is used for address calculation , 1 cycle delay necessary SOwe regard this as a free instruction cycle, and we just DO IT You (or your compiler) will need to adjust your code to put some useful work in that “slot”, since just putting in a NOP is wasteful 3.Describe the theory of Hardware-Based Speculation and function of ROB(Reorder Buffer) briefly.Then tell the functionality difference between ROB (Reorder Buffer) and reservation stations.Hardware-Based SpeculationSpeculating on the outcome of branches and executing the program as if the predictions were correct. Need to handle the situation where the speculation is incorrect.Key ideas: Dynamic branch prediction to choose which instruction to execute Speculation to allow the execution of instructions before the control dependences are resolved. Dynamic scheduling to deal with the scheduling of different combinations of basic blocks.Data flow execution: Operations execute as soon as their operands are availablefunction of ROBEliminate store buffer (Mem write is ordered in Reorder buffer)Register renaming is done via reorder buffer instead of reservation stationReservation only for storing opcodes and operands during the instruction flying between issue and execution.ROB hold the instruction results and supplyoperands in the interval between completion ofinstruction execution and instruction commit.RS(保留站)：换名功能以及暂存临时数据。一旦指令将临时结果放在保留站，则其它后续的指令都能立即在寄存器组中看见这个结果。ROB(REORDER BUFFER)：对移到顶部的指令进行判断ROB负责寄存器重命名、暂存执行后和交付前的结果。某指令将临时结果放在ROB后，其它指令并不能从寄存器组中看到这个结果,（也就是说，寄存器组的值并没有更新），而是等待该指令提交（commit）后，寄存器组才更新4.Tell the difference between Tomasulo algorithm and Scoreboard algorithm.5.Have a brife introduction of Paralism大神说以指令，数据，线程，请求四个方面分别叙述Section Three: Calculating the following questions.(10 points * 2）1) Have a brief introduction of Amdahls Law2) Assume the frequency of the FP( Float Point ) instructions is 25% , and its average CPI is 5.0, and the frequency of non-FP instructions is 2.33. FPSQR( Float Point square) instructions come up to 5%, and its average CPI is 20,we have two kinds of Optimization Method to optimize this situation below,1. Decrease the CPI of FPSQR to 22.Decrease the CPI of FP to 2.5try to calculate the Average CPI of these two different methods respectively, the results are supposed to keep two significant digits.3) Here we have 100 processor . In order to get the speed-up ratio at 50 , try to calculate the degree of parallelism of this community , the results are supposed to keep four significant digits.1) The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used. 2)CPIoriginal=i=1nCPIiICiInstruction count=525%+(2.3375%)=3.00CPIwith new FPSQR=3.00-5%20-2=2.10CPInew FP=2.3375%+2.52.5=2.37Speedup=3/2.1=1.433)50=1Fparallel100+1-FparallelFparallel=0.98992.The correlative parameters between commands are provided in the following table. Suppose We use a standard 5-level integer pipeline and these functional units are fully pipelined or copied. Try to analyze and calculate the following questions:(1) Without any scheduling, what the loop will execute? How many cycles it will take ?(2) Optimize the following codes using software pipelining compile method, to make it has the least hazards.(3) Compute How many cycles it will take after optimization.Former operationSuccessor operationClock DelayFP ALUFP ALU3FP ALUStore (double word)2Load (double word)FP ALU1Load (double word)Store (double word)0若题目是如下：LOOP：L.DF0，0(R1)ADD.DF4，F0，F2S.DF4，0(R1)DAAUIR1，R1，#-8BNER1，R2，LOOP（1）LOOP：L.DF0，0(R1) 1Stall2ADD.DF4，F0，F23该循环执行一次迭代需10个时钟Stall、stall 45S.DF4，0(R1) 6DAAUIR1，R1，#-87Stall8BNER1，R2，LOOP 9Stall10（2）L.DF0，0(R1) 1DAAUIR1，R1，#-82ADD.DF4，F0，F23L.DF0，0(R1) 4DAAUIR1，R1，#-85LOOP：S.DF4，16(R1)1ADD.DF4，F0，F22L.DF0，0(R1)3BNER1，R2，LOOP4DAAUIR1，R1，#-85ADD.DF4，F0，F21S.DF4，16(R1)2Stall3S.DF4，8(R1)4（3）优化后该循环执行一次迭代需 59/n 个时钟Section Four: Analyzing the following questions.1.（15 points）Branch predictors that use the behavior of other branches to make a prediction are called correlating predictors or two-level predictors. In the general case, an (m,n) predictor uses the behavior of the last m branches to choose from 2m branch predictors, each of which is an n-bit predictor for a single branch. Now there is an (2,2) correlating predictor with 8K bits.(1) How many entries are in this predictor?(2) Draw the framwork of hardware of this predictor. (3) Suppose the initial value of global transfer cache and every predictor is zero. The initial value of a is 1. Whats the hitting rate after run the program below 5 times by using this (2,2) correlating predictor?RegR1=a;BNEZR1,L1;DADDR1,R0,#1;L1:DADD R3,R1,#-1;BNEZR3,L2;DADD R1,R0,#2L2：if(a=0)a=1;if(a=1)a=2;(1) entries=8K/(4*2)=1K(2) 类似的画画吧(3) 如果看成循环30%，如果不看成循环70%；详情或疑义找群主2.（10 points）Assume the latencies for the floating-point functional units as follows: add is 2 clock cycles, multiply is 6 clock cycles, and divide is 12 clock cycles.Using the code segment below, show what the status tables look like when the DIV.D is ready to go to commit.L.D F6, 32(R2)L.DF2, 44(R3)MUL.DF0, F2, F4SUB.DF8, F2, F6DIV.DF10, F0, F6ADD.D F6, F8, F2Reorder bufferEntryBusyInstructionStateDestinationValue1NOL.D F6, 32(R2)

人人文库> 全部分类> 教育资料 > 课设设计

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

高级体系结构课程试卷eng(WORD)附群主YY的答案.docx

文档简介

温馨提示

最新文档

评论

高级体系结构课程试卷eng(WORD)附群主YY的答案.docx

文档简介

温馨提示

最新文档

评论

相关文档