新编文档-Multicycleconclusion-coursescswashingtonedu结论coursescswashingtonedu多旋回-精品文档_第1页
新编文档-Multicycleconclusion-coursescswashingtonedu结论coursescswashingtonedu多旋回-精品文档_第2页
新编文档-Multicycleconclusion-coursescswashingtonedu结论coursescswashingtonedu多旋回-精品文档_第3页
免费预览已结束,剩余20页可下载查看

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、Multicycle conclusion My office hours, move to Mon or Wed? Plan: Pipelining this and next week, maybe performance analysis fbday:一 Microprogramming一 Extending the cycle datapath一 Multi-cycle performancePCWritePClorD0MemReadAddressMemoryWrite Mem data DataMemWriteThe multicycle datapathALUSrcAA- 0Reg

2、DstRegWriteAIRWriteEr?B4MemToReg31-2625-2120-1615-11 15-0In struction registerMemory data registersignextendS ll teft 2Read register 1Read register 2Write registerRead data 1Read data 2Ze RegistersM-012ALU OutZeroResult- 1Y PCSource3YALUSrcBFinite-state machine for the control unitRegister fetch and

3、Op = BEQInstruction fetch and PC incrementBranchcompletionALUSICA = 1ALUSrcB = 00orD = 0Mem Read = 1IR'Vnte = 1ALUSrcA = 0ALUSrcB = 0PCSource = 0PC'/nte = 1R-type 1ALUSrcA = 11 RegV/rite = 1ALUSrcB = 00*RegDst = 1ALUOp = func / MemToReg = 0 /ALUOp = 010Implementing the FSM This can be transl

4、ated into a state table; here are the first two states.Current StateInput (Op)Next StateOutput (Control signaPC WritelorDMem ReadMem WriteIR WriteReg DstMemToRegReg WriteALU SrcAALU SrcBALU opPC SourceInstr FetchXReg Fetch10101XX00010100Reg FetchBEQBranch com pl0X000XX0011010XReg FetchR-typeR-type e

5、xecute0X000XX0011010XReg FetchLW/S VCompute eff addr0X000XX0011010X You can implement this the hard way.一 Represent the current state using flip-flops or a register.一 Find equations for the next state and (control signal) outputs in terms of the current state and input (instruction word). Or you can

6、 use the easy way.一 Stick the whole state table into a memory, like a ROM一 This would be much easier, since you don't have to derive equations.Pitfalls of state machines As we just saw, we could translate this state diagram into a state table, and then make a logic circuit or stick it into a ROM

7、. This works pretty well for our small example, but designing a fin让e-state machine for a larger instruction set is much harder.一 There could be many states in the machine For example, some MIPS instructions need 20 stages to execute in some implementations-each of which would be represented by a se

8、parate state.一 There could be many paths in the machine For example, the DEC VAX from 1978 had nearly 300 opcodes that's a lot of branching!一 There could be many outputs For instance, the Pentium Pro's integer datapath has 120 control signals, and the floatingpoint datapath has 285 control s

9、ignals一 Implementing and maintaining the control unit for processors like these would be a nightmare Ybu'd have to work w让h large Boolean equations or a huge state tableMotivation for microprogramming Think of the control unit's state diagram as a little program一 Each state represents a “com

10、mand," or a set of control signals that tells the datapath what to do.一 Several commands are executed sequentially.一 “Branches" may be taken depending on the instruction opcode一 The state machine “loops" by returning to the initial state. Why don't we invent a special language for

11、 making the control unit?一 We could devise a more readable, higher-level notation rather than dealing directly with binary control signals and state transitions一 We would design control units by writing “programs” in this language.一 We will depend on a hardware or software translator to convert our

12、programs into a circuit for the control unit.A good notation is very useful Instead of specifying the exact binary values for each control signal, we will define a symbolic notation that's easier to work with. As a simple example, we might replace ALUSrcB = 01 with ALUSrcB = 4 We can also create

13、 symbols that combine several control signals together. Instead oflorD = 0Mem Read = 1IRWrite = 1it would be nicer to just say something likeRead PCMicroi nstructi onsLabelALU controlSrc1Src2Register con trolMemoryPCWrite con trolNext For the MIPS multicycle we could define microinstructions with ei

14、ght fields.一 These fields will be filled in symbolically, instead of in binary.一 They determine all the control signals for the datapath There are only 8 fields because some of them specify more than one of the 12 actual control signals一 A microinstruction corresponds to one execution stage, or one

15、cycle You can see that in each microinstruction, we can do something with the ALU, register file, memory, and program counter unitsThe first stage, the microprogramming way Below are two lines of microcode to implement the first two multicycle execution stages, instruction fetch and register fetch T

16、he first line, labelled Fetch, involves several actions一 Read from memory address PC.一 Use the ALU to compute PC + 4, and store it back in the PC.一 Continue on to the next sequential microinstructionLabelALU controlSrc1Src2Register controlMemoryPCWrite con trolNextFetchAddPC4Read PCALUSeqAddPCExtshi

17、ftReadDispatch 1LabelALU con trolSrc1Src2Register controlMemoryPCWrite con trolNextBEQ1SubABALU-ZeroFetch Control would transfer to this microinstruction 讦 the opcode was “beq"一 Compute A-B, to set the ALU's Zero bit if A=B.一 Update PC with ALUOut (which contains the branch target from the

18、previous cycle) if Zero is set一 The beq is completed, so fetch the next instruction The 1 in the label BEQ1 reminds us that we came here via the first branch point (“dispatch table 1n), from the second execution stageLabelALU con trolSrc1Src2Register controlMemoryPCWrite con trolNextRtypelfuncABSeqW

19、rite ALUFetch What if the opcode indicates an R-type instruction?一 The first cycle here performs an operation on registers A a nd B, based on the MIPS instruction's func field一 The next stage writes the ALU output to register “rd” from the MIPS instruction word. We can then go back to the Fetch

20、microinstruction, to fetch and execute the next MIPS instruct!on.LabelALU con trolSrc1Src2Register con trolMemoryPCWrite con trolNextMem1AddAExtendDispatch 2SW2Write ALUFetchLV/2Read ALUSeqWrite MDRFetch For both sw and Iw instructions, we should first compute the effective memory address, A + sign-

21、extend(IR150) Another dispatch or branch distinguishes between stores and loads一 For sw, we store data (from B) to the effective memory address 一 For Iw we copy data from the effective memory address to register rt. In either case, we continue on to Fetch after we're done.Microprogramming vs. pr

22、ogramming Microinstructions correspond to control signals一 They describe what is done in a single clock cycle一 These are the most basic operations available in a processor. Microprograms implement higher-level MIPS instructions一 MIPS assembly language instructions are comparatively complex, each pos

23、sibly requiring multiple clock cycles to execute一 But each complex MIPS instruct!on can be implemented with several simpler microinstructionsSimilarities with assembly language Microcode is intended to make control unit design easier.一 We defined symbols like Read PC to replace binary control signal

24、s.一 A translator can convert microinstructions into a real control unit一 The translation is straightforward, because each microinstruction corresponds to one set of control values This sounds similar to MIPS assembly language!一 We use mnemonics like Iw instead of binary opcodes like 100011 一 MIPS pr

25、ograms must be assembled to produce real machine code一 Each MIPS instruct!on corresponds to a 32bit instruction wordManaging complexity It looks like all we've done is devise a new notation that makes it easier to specify control signals That's exactly right! It's all about man aging com

26、plexity.一 Control units are probably the most challenging part of CPU design一 Large instruction sets require large state machines with many states, branches and outputs一 Control units for multicycle processors are difficult to create and maintain. Applying programming ideas to hardware design is a u

27、seful technique、Situations when microprogramming is bad One disadvantage of microprograms is that looking up control signals in a ROM can be slower than generating them from simplified circuits Sometimes complex instructions implemented in hardware are slower than equivalent assembly programs writte

28、n using simpler instructons一 Complex instructions are usually very general, so they can be used more often But this also means they can't be optimized for specific opera nds or situations 一 Some microprograms just aren't written very efficiently. But since they're built into the CPU, peo

29、ple are stuck with them (at least until the next processor upgrade)How microcode is used today Modern CISC processors (like x86) use a combination of hardwired logic and microcode to balanee design effort with performance一 Control for many simple instruct!ons can be implemented in hardwired which ca

30、n be faster than reading a microcode ROM.一 Less-used or very complex instructions are microprogrammed to make the design easier and more flexible. In this way, designers observe the “first law of performance,一 Make the comm on case fast!The single-cycle datapath; what is the cycle time?Performs nee

31、of a multicycle impleme ntati on Let's assume the following delays for the major functional units.PCAddressMemoryWrite Mem data DataAEr?B431-2625-2120-1615-11 15-0Read register 1Read data 1Read register 2Write registerrite RegistersRead data 2In structionregisterMemorydataregistersignextendShife

32、ft 2-0M0123ALU OutZeroResultComparing cycle times The clock period has to be long enough to allow all of the required work to complete within the cycle In the single-cycle datapath, the urequired work" was just the complete execution of any instruction.一 The longest instruction, Iw, requires s

33、(3 + 2 + 3 + 3 + 2).一 So the clock cycle time has to be 13ns, for a 77MHz clock rate For the multicycle datapath, the “required work" is only a single stage一 The longest delay is 3ns, for both the ALU and the memory.一 So our cycle time has to be 3ns, or a clock rate of 333MHz一 The register file

34、 needs only 2ns, but it must wait an extra 1ns to stay synchronized with the other functional units. The single-cycle cycle 廿me is limited by the slowest instruction, whereas the multicycle cycle time is limited by the slowest functional unit.Comparing instruction execution times In the single-cycle

35、 datapath, each instruction needs an entire clock cycle, or 13ns, to execute With the multicycle CPU, different instructions need different numbers of clock cycles, and hence different amounts of time一 A branch needs 3 cycles, or 3 x 3ns = 9ns一 Arithmetic and sw instructions each require 4 cycles, o

36、r 12ns一 Finally, a Iw takes 5 stages, or 15ns We can make some observations about performance already.一 Loads take longer with this multicycle implementation, while all other instructions are faster than before一 So if our program doesn't have too many loads, then we should see an in crease in pe

37、rformance The gcc example Let's assume the gcc instruct!on mix.Instructio nFrequencyArithmetic48%Loads22%Stores11%Branches19% In a single-cycle datapath, all instructions take 13ns to execute The average execution time for an instruction on the multicycle processor works out to 12.09ns(48% x 12n

38、s) + (22% x 15ns) + (11% x 12ns) + (19% x 9ns) = 12.09ns The multicycle implementation is faster in this case, but not by much The speedup here is only 7.5%13ns / 12.09ns = 1.075This CPU is too simple Our example instruction set is too simple to see large gains一 All of our instructions need about the same number of cycles (35)一 The benefits would be much g

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论