VHDL PROJECT 2
Discrete Cosine Transform
Submitted by Ninad Thakoor

Introduction:
The DCT is used in a large number of image and signal processing applications, including
1. JPEG standard
2. MPEG standards
3. H.261 and H.263 video conferencing standards
4. DVD, VCD, SVCD, HDTV etc.
The goal of this project is to implement a one-dimensional 8-point Discrete Cosine Transform, which can be used as a building block for various signal and image processing systems.

Theory of operation:
The one-dimensional discrete cosine transform is defined as,
$$Y_k = \sqrt{\frac{2}{N}}\, c(k) \sum_{n=0}^{N-1} X_n \cos\frac{(2n+1)k\pi}{2N}, \qquad k = 0, 1, \dots, N-1$$
The one-dimensional inverse discrete cosine transform is defined as,
$$X_n = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} c(k)\, Y_k \cos\frac{(2n+1)k\pi}{2N}, \qquad n = 0, 1, \dots, N-1$$
Where in both the cases,
$$c(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k \neq 0 \end{cases}$$
As can be seen from the above equations, the calculation of each DCT point needs N multiplications and N-1 additions. For 8 points the total becomes 64 multiplications and 56 additions. The time and resources required for this calculation are high.
The above transform can be written in matrix form as
Y = AX
The symmetry of the matrix A can be used to simplify the matrix equation as follows,
$$\begin{bmatrix} Y_0 \\ Y_2 \\ Y_4 \\ Y_6 \end{bmatrix} = \begin{bmatrix} d & d & d & d \\ b & f & -f & -b \\ d & -d & -d & d \\ f & -b & b & -f \end{bmatrix} \begin{bmatrix} X_0 + X_7 \\ X_1 + X_6 \\ X_2 + X_5 \\ X_3 + X_4 \end{bmatrix}$$
$$\begin{bmatrix} Y_1 \\ Y_3 \\ Y_5 \\ Y_7 \end{bmatrix} = \begin{bmatrix} a & c & e & g \\ c & -g & -a & -e \\ e & -a & g & c \\ g & -e & c & -a \end{bmatrix} \begin{bmatrix} X_0 - X_7 \\ X_1 - X_6 \\ X_2 - X_5 \\ X_3 - X_4 \end{bmatrix}$$
After some simple mathematical manipulations the number of multiplications reduces to 22 and the number of additions reduces to 28.
The values of the coefficients a to g in the above matrices are defined to be,

    Coefficient   Value
    a             0.4904
    b             0.4619
    c             0.4157
    d             0.3536
    e             0.2778
    f             0.1913
    g             0.0975

Hardware design considerations:

1. Floating point arithmetic implementation:
Floating-point arithmetic in its traditional format (like IEEE 754) is known to be slow and complex to implement. As the precision required for our application is not very high, we choose 2's complement fractional (fixed-point) numbers instead. Choosing this format allows us to multiply two fractional numbers in the same way we multiply integers. Also, if two numbers have the same precision, they can be added like integers.

2. Maintaining precision and word length:
While implementing any fixed-point algorithm it is desirable to have a fixed word length and precision. But due to the repeated multiplications and additions along the execution path of the algorithm, the precision and word length of the numbers keep changing. Also, the number of multiplications and additions differs along different execution paths. Interestingly, though, for the chosen algorithm the final precision and word length of all the outputs are the same. Still, it is important to track the binary point along the execution path.

3. Synchronous vs. combinational logic for the computation:
As mentioned in the point above, the sequence of operations (additions and multiplications) is not the same in each path, so the time required by the various paths in the same stage of the computation differs. Thus fixing a clock frequency for a synchronous implementation is a tough and suboptimal task. Also, tracking the various word lengths and precisions can be complicated. The DCT calculator is therefore built from combinational logic. A typical flow graph for a synchronous implementation looks as,
[Figure: flow graph of a synchronous implementation]

4. Design of input and output interface:
Although the algorithm takes all 8 data points in parallel and operates on them to give the output in parallel, bringing a 64-bit input and a 64-bit output out of an FPGA chip is not possible. An input interface, which stores the data byte by byte as it arrives for processing, needs to be built. A similar output interface is also required.

5. Multi-cycle implementation:
Calculation of the DCT is more time consuming than the input and output operations. Allotting the same time period to all the operations would slow down the implementation significantly. Hence the DCT operation is implemented as a multi-cycle operation.

Hardware design:
Design of the hardware has two separate parts,
1. Design of DCT calculator
2. Design of Input and output controller

Design of DCT calculator:
Inputs X0, X1, ..., X7 are selected to be 8 bits wide (precision 8.0), as in most raw image data, and coefficients a, b, ..., g are selected to be 12 bits wide (precision 0.12).
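The 0.12 coefficient format can be illustrated with a short script. The Python sketch below is not part of the original project. It assumes that a to g equal cos(k*pi/16)/2 for k = 1 to 7, which agrees with the four-digit values tabulated above, and it uses a hypothetical helper `quantize` to scale each value by 2^12 and round to the nearest integer.

```python
import math

FRAC_BITS = 12  # 0.12 format: 12-bit word, all bits fractional

def quantize(value, frac_bits=FRAC_BITS):
    """Round a real value to the nearest multiple of 2**-frac_bits (hypothetical helper)."""
    return round(value * (1 << frac_bits))

# Assumption: a..g are cos(k*pi/16)/2 for k = 1..7.
names = "abcdefg"
coeffs = {n: math.cos(k * math.pi / 16) / 2 for k, n in enumerate(names, start=1)}
codes = {n: quantize(v) for n, v in coeffs.items()}

for n in names:
    # Print the real value next to its 12-bit code in hexadecimal.
    print(f"{n}: {coeffs[n]:.4f} -> 0x{codes[n]:03X}")
```

Under this assumption the codes agree with the constants coeff_a to coeff_g in the VHDL listing of the DCT calculator. Every code stays below 2^11, so the sign bit of the 12-bit 2's complement word is 0 for all coefficients.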
Pseudocode of the simplified DCT calculation algorithm, along with the word lengths and precisions, is as follows,

    S0  = X0 + X7       9 bits  (9.0)
    S1  = X1 + X6       9 bits  (9.0)
    S2  = X2 + X5       9 bits  (9.0)
    S3  = X3 + X4       9 bits  (9.0)
    S4  = X0 - X7       9 bits  (9.0)
    S5  = X1 - X6       9 bits  (9.0)
    S6  = X2 - X5       9 bits  (9.0)
    S7  = X3 - X4       9 bits  (9.0)

    S8  = S0 + S3      10 bits  (10.0)
    S9  = S1 + S2      10 bits  (10.0)
    S10 = S8 + S9      11 bits  (11.0)
    Y0  = d * S10      23 bits  (11.12)

    S11 = S0 - S3      10 bits  (10.0)
    S12 = S1 - S2      10 bits  (10.0)
    M1  = b * S11      22 bits  (10.12)
    M2  = f * S12      22 bits  (10.12)
    Y2  = M1 + M2      23 bits  (11.12)

    S14 = S8 - S9      11 bits  (11.0)
    Y4  = d * S14      23 bits  (11.12)

    M4  = f * S11      22 bits  (10.12)
    M5  = b * S12      22 bits  (10.12)
    Y6  = M4 - M5      23 bits  (11.12)

    M6  = a * S4       21 bits  (9.12)
    M7  = c * S5       21 bits  (9.12)
    M8  = e * S6       21 bits  (9.12)
    M9  = g * S7       21 bits  (9.12)
    S16 = M6 + M7      22 bits  (10.12)
    S17 = M8 + M9      22 bits  (10.12)
    Y1  = S16 + S17    23 bits  (11.12)

    M10 = c * S4       21 bits  (9.12)
    M11 = g * S5       21 bits  (9.12)
    M12 = a * S6       21 bits  (9.12)
    M13 = e * S7       21 bits  (9.12)
    S19 = M10 - M11    22 bits  (10.12)
    S20 = M12 + M13    22 bits  (10.12)
    Y3  = S19 - S20    23 bits  (11.12)

    M14 = e * S4       21 bits  (9.12)
    M15 = a * S5       21 bits  (9.12)
    M16 = g * S6       21 bits  (9.12)
    M17 = c * S7       21 bits  (9.12)
    S22 = M14 - M15    22 bits  (10.12)
    S23 = M16 + M17    22 bits  (10.12)
    Y5  = S22 + S23    23 bits  (11.12)

    M18 = g * S4       21 bits  (9.12)
    M19 = e * S5       21 bits  (9.12)
    M20 = c * S6       21 bits  (9.12)
    M21 = a * S7       21 bits  (9.12)
    S25 = M18 + M20    22 bits  (10.12)
    S26 = M19 + M21    22 bits  (10.12)
    Y7  = S25 - S26    23 bits  (11.12)

Design of Input and output controller:
The input and output controller is a Finite State Machine. It has the following four states
1. Idle: State after reset and while waiting for new data for processing.
2. DataIn: Data input state, where data is read from the input and stored for processing.
3. Processing: Actual DCT calculation takes place.
4. DataOut: Once the DCT is complete, the processed data is written to the output in this state.

[Figure: state diagram of the controller. Transitions, written as condition / action:]
    Reset                   -> IDLE
    IDLE       -> DATAIN      NewData=1          / Ready=0
    DATAIN     -> PROCESSING  Data-in-count=8    / ReadyForData=0
    PROCESSING -> DATAOUT     Processing-count=4 / Ready=1
    DATAOUT    -> IDLE        Data-out-count=8   / ReadyForData=1

Data-in-count, Processing-count and Data-out-count keep track of the number of clock cycles for which the FSM has been in that particular state.
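The state sequence described above can be traced with a small software model. The Python sketch below is illustrative only. It assumes one action per rising clock edge and follows the priority order of the if/elsif chain in the controller's VHDL listing. The function name `run_controller` and the way ND is supplied (a function of the edge index) are hypothetical.

```python
from collections import Counter

def run_controller(nd, cycles):
    """Model the controller FSM one rising clock edge at a time.

    nd      -- function mapping the edge index to the ND input (True/False)
    returns -- list of (state, RFD, RDY) tuples, the values after each edge
    """
    state, rfd, rdy = "idle", 1, 0
    data_in_count = processing_count = data_out_count = 0
    trace = []
    for edge in range(cycles):
        if state == "datain" and data_in_count < 8:
            data_in_count += 1            # one input byte stored
        elif state == "datain":
            data_in_count, rfd, state = 0, 0, "processing"
        elif state == "processing" and processing_count < 3:
            processing_count += 1         # wait for the combinational DCT
        elif state == "processing":
            processing_count, rdy, state = 0, 1, "dataout"
        elif state == "dataout" and data_out_count < 8:
            data_out_count += 1           # one result word driven on DOUT
        elif state == "dataout":
            data_out_count, rdy, rfd, state = 0, 0, 1, "idle"
        elif state == "idle" and nd(edge):
            state = "datain"
        trace.append((state, rfd, rdy))
    return trace

# ND is asserted on the first edge only; the controller then runs unattended.
trace = run_controller(lambda edge: edge == 0, cycles=24)
occupancy = Counter(state for state, _, _ in trace)
print(occupancy)
```

In this model the FSM spends one more clock in DataIn and DataOut than the eight transfers themselves, because the state change happens on the edge after the count reaches 8.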
Functional description:
[Figure: block diagram of the module. Inputs: DIN (8 bits), ND, RST, CLK. Outputs: DOUT (23 bits), RDY, RFD.]

DIN (Data input):
This is an 8-bit wide data bus, which provides the data stream. The module will read data if RFD is high and ND is also high.

ND (New data):
When this input signal is high, it indicates that valid data is available at the input DIN. If RFD is high then the module reads this data.

RFD (Ready for data):
This signal indicates whether the module is ready to receive data. The module reads the input data only if this signal is high and ND is also high.

DOUT (Data output):
This output port provides the results of the 1-D DCT. When the control signal RDY is high, DOUT is valid.

RDY (Ready):
This signal indicates that the data at the DOUT port is valid.

CLK (Clock):
This clock signal is used to synchronize the module and the data input and output operations.

RST (Reset):
Reset allows the user to restart the 1-D DCT process.

VHDL CODE:

1. DCT calculator:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_arith.all;
    use ieee.std_logic_signed.all;

    entity proj_dct is
    port(x0,x1,x2,x3,x4,x5,x6,x7: in std_logic_vector(7 downto 0);
         y0,y1,y2,y3,y4,y5,y6,y7: out std_logic_vector(22 downto 0));
    end;

    architecture RTL of proj_dct is
    signal s0,s1,s2,s3,s4,s5,s6,s7: std_logic_vector(8 downto 0);
    signal s8,s9,s11,s12: std_logic_vector(9 downto 0);
    signal s10,s14: std_logic_vector(10 downto 0);
    signal m6,m7,m8,m9,m10,m11,m12,m13,m14,m15,m16,m17,m18,m19,m20,m21: std_logic_vector(20 downto 0);
    signal m1,m2,m4,m5,s16,s17,s19,s20,s22,s23,s25,s26: std_logic_vector(21 downto 0);
    -- Coefficients a to g in 0.12 format
    constant coeff_a: std_logic_vector(11 downto 0):=x"7D9";
    constant coeff_b: std_logic_vector(11 downto 0):=x"764";
    constant coeff_c: std_logic_vector(11 downto 0):=x"6A7";
    constant coeff_d: std_logic_vector(11 downto 0):=x"5A8";
    constant coeff_e: std_logic_vector(11 downto 0):=x"472";
    constant coeff_f: std_logic_vector(11 downto 0):=x"310";
    constant coeff_g: std_logic_vector(11 downto 0):=x"190";
    begin
    --Calculates sums and differences
    s0<=sxt(x0,9)+sxt(x7,9);
    s1<=sxt(x1,9)+sxt(x6,9);
    s2<=sxt(x2,9)+sxt(x5,9);
    s3<=sxt(x3,9)+sxt(x4,9);
    s4<=sxt(x0,9)-sxt(x7,9);
    s5<=sxt(x1,9)-sxt(x6,9);
    s6<=sxt(x2,9)-sxt(x5,9);
    s7<=sxt(x3,9)-sxt(x4,9);
    --Calculates dct point 0
    s8<=sxt(s0,10)+sxt(s3,10);
    s9<=sxt(s1,10)+sxt(s2,10);
    s10<=sxt(s8,11)+sxt(s9,11);
    y0<=s10*coeff_d;
    --Calculates dct point 2
    s11<=sxt(s0,10)-sxt(s3,10);
    s12<=sxt(s1,10)-sxt(s2,10);
    m1<=s11*coeff_b;
    m2<=s12*coeff_f;
    y2<=sxt(m1,23)+sxt(m2,23);
    --Calculates dct point 4
    s14<=sxt(s8,11)-sxt(s9,11);
    y4<=s14*coeff_d;
    --Calculates dct point 6
    m4<=s11*coeff_f;
    m5<=s12*coeff_b;
    y6<=sxt(m4,23)-sxt(m5,23);
    --Calculates dct point 1
    m6<=coeff_a*s4;
    m7<=coeff_c*s5;
    m8<=coeff_e*s6;
    m9<=coeff_g*s7;
    s16<=sxt(m6,22)+sxt(m7,22);
    s17<=sxt(m8,22)+sxt(m9,22);
    y1<=sxt(s16,23)+sxt(s17,23);
    --Calculates dct point 3
    m10<=coeff_c*s4;
    m11<=coeff_g*s5;
    m12<=coeff_a*s6;
    m13<=coeff_e*s7;
    s19<=sxt(m10,22)-sxt(m11,22);
    s20<=sxt(m12,22)+sxt(m13,22);
    y3<=sxt(s19,23)-sxt(s20,23);
    --Calculates dct point 5
    m14<=coeff_e*s4;
    m15<=coeff_a*s5;
    m16<=coeff_g*s6;
    m17<=coeff_c*s7;
    s22<=sxt(m14,22)-sxt(m15,22);
    s23<=sxt(m16,22)+sxt(m17,22);
    y5<=sxt(s22,23)+sxt(s23,23);
    --Calculates dct point 7
    m18<=coeff_g*s4;
    m19<=coeff_e*s5;
    m20<=coeff_c*s6;
    m21<=coeff_a*s7;
    s25<=sxt(m18,22)+sxt(m20,22);
    s26<=sxt(m19,22)+sxt(m21,22);
    y7<=sxt(s25,23)-sxt(s26,23);
    end;
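A software reference model is one way to cross-check the data flow of the DCT calculator. The Python sketch below is not part of the original project. It repeats the sums, differences and products of the listing with plain integers, treats the results as 11.12 fixed-point values, and compares them with a direct floating-point evaluation of the DCT definition. The function names are hypothetical, and the input bytes are the values driven on DIN in the test bench, interpreted as 2's complement numbers. Controller timing is not modelled here.

```python
import math

FRAC_BITS = 12
# Coefficient codes a..g as given by the VHDL constants (0.12 format).
A, B, C, D, E, F, G = 0x7D9, 0x764, 0x6A7, 0x5A8, 0x472, 0x310, 0x190

def dct8_fixed(x):
    """Integer model of the 22-multiply data flow; returns Y0..Y7 as 11.12 integers."""
    s0, s1, s2, s3 = x[0] + x[7], x[1] + x[6], x[2] + x[5], x[3] + x[4]
    s4, s5, s6, s7 = x[0] - x[7], x[1] - x[6], x[2] - x[5], x[3] - x[4]
    s8, s9 = s0 + s3, s1 + s2
    s11, s12 = s0 - s3, s1 - s2
    return [
        D * (s8 + s9),
        A * s4 + C * s5 + E * s6 + G * s7,
        B * s11 + F * s12,
        C * s4 - G * s5 - A * s6 - E * s7,
        D * (s8 - s9),
        E * s4 - A * s5 + G * s6 + C * s7,
        F * s11 - B * s12,
        G * s4 - E * s5 + C * s6 - A * s7,
    ]

def dct8_reference(x):
    """Direct floating-point evaluation of the 8-point DCT definition."""
    out = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * ck * acc)
    return out

def to_signed8(byte):
    """Interpret an 8-bit pattern as a 2's complement value."""
    return byte - 256 if byte >= 128 else byte

# Example input: the byte values used as DIN stimulus in the test bench.
x = [to_signed8(b) for b in (0xE9, 0x0A, 0x64, 0xEF, 0xC7, 0x2B, 0x5B, 0x07)]
fixed = [y / (1 << FRAC_BITS) for y in dct8_fixed(x)]
reference = dct8_reference(x)
for k, (yf, yr) in enumerate(zip(fixed, reference)):
    print(f"Y{k}: fixed={yf:10.4f}  reference={yr:10.4f}")
```

The two columns should differ only by the rounding of the coefficients to 12 fractional bits. Dividing the integer result by 2^12 plays the same role as the bit weights 1024.0 down to 0.000244140625 used in the test bench.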
2. Input Output Controller:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_signed.all;

    entity control_dct is
    port ( DIN: in std_logic_vector(7 downto 0);
           ND,RST,CLK: in std_logic;
           DOUT: out std_logic_vector(22 downto 0);
           RDY,RFD: out std_logic);
    end;

    architecture RTL of control_dct is
    type statetype is (idle,datain,processing,dataout);
    signal state: statetype;
    signal x0,x1,x2,x3,x4,x5,x6,x7: std_logic_vector(7 downto 0);
    signal y0,y1,y2,y3,y4,y5,y6,y7: std_logic_vector(22 downto 0);
    signal ix0,ix1,ix2,ix3,ix4,ix5,ix6,ix7: std_logic_vector(7 downto 0);
    signal iy0,iy1,iy2,iy3,iy4,iy5,iy6,iy7: std_logic_vector(22 downto 0);
    signal IRDY,IRFD: std_logic;
    signal data_in_count: integer range 0 to 8;
    signal data_out_count: integer range 0 to 8;
    signal processing_count: integer range 0 to 3;
    begin
      -- Combinational DCT calculator
      chip: entity work.proj_dct port map(ix0,ix1,ix2,ix3,ix4,ix5,ix6,ix7,iy0,iy1,iy2,iy3,iy4,iy5,iy6,iy7);
      process(RST,CLK)
      begin
        if RST='1' then
          state<=idle;
          data_in_count<=0;
          data_out_count<=0;
          processing_count<=0;
          IRFD<='1';
          IRDY<='0';
          x0<=(others=>'0'); x1<=(others=>'0'); x2<=(others=>'0'); x3<=(others=>'0');
          x4<=(others=>'0'); x5<=(others=>'0'); x6<=(others=>'0'); x7<=(others=>'0');
          ix0<=(others=>'0'); ix1<=(others=>'0'); ix2<=(others=>'0'); ix3<=(others=>'0');
          ix4<=(others=>'0'); ix5<=(others=>'0'); ix6<=(others=>'0'); ix7<=(others=>'0');
          y0<=(others=>'0'); y1<=(others=>'0'); y2<=(others=>'0'); y3<=(others=>'0');
          y4<=(others=>'0'); y5<=(others=>'0'); y6<=(others=>'0'); y7<=(others=>'0');
          DOUT<=(others=>'0');
        elsif rising_edge(CLK) then
          if(state=datain and data_in_count<8)then
            -- Store the incoming bytes one per clock
            case data_in_count is
              when 0 => x0<=DIN;
              when 1 => x1<=DIN;
              when 2 => x2<=DIN;
              when 3 => x3<=DIN;
              when 4 => x4<=DIN;
              when 5 => x5<=DIN;
              when 6 => x6<=DIN;
              when others => x7<=DIN;
            end case;
            data_in_count<=data_in_count+1;
          elsif(state=datain and data_in_count=8)then
            ix0<=x0;ix1<=x1;ix2<=x2;ix3<=x3;ix4<=x4;ix5<=x5;ix6<=x6;ix7<=x7;
            data_in_count<=0;
            IRFD<='0';
            state<=processing;
          elsif(state=processing and processing_count<3)then
            processing_count<=processing_count+1;
          elsif(state=processing and processing_count=3)then
            y0<=iy0;y1<=iy1;y2<=iy2;y3<=iy3;y4<=iy4;y5<=iy5;y6<=iy6;y7<=iy7;
            processing_count<=0;
            IRDY<='1';
            state<=dataout;
          elsif(state=dataout and data_out_count<8)then
            -- Write the results one word per clock
            case data_out_count is
              when 0 => DOUT<=y0;
              when 1 => DOUT<=y1;
              when 2 => DOUT<=y2;
              when 3 => DOUT<=y3;
              when 4 => DOUT<=y4;
              when 5 => DOUT<=y5;
              when 6 => DOUT<=y6;
              when others => DOUT<=y7;
            end case;
            data_out_count<=data_out_count+1;
          elsif(state=dataout and data_out_count=8)then
            data_out_count<=0;
            IRDY<='0';
            state<=idle;
            IRFD<='1';
          elsif(state=idle and ND='1')then
            state<=datain;
          end if;
        end if;
      end process;
      RFD<=IRFD;
      RDY<=IRDY;
    end;

[The beginning of the test bench listing is missing from the source. The listing resumes inside the function real_bit, which converts one bit to a real value.]

          ... => return 0.0;
          when others => return 1.0;
        end case;
      end real_bit;
      signal DIN: std_logic_vector(7 downto 0);
      signal ND,RST: std_logic;
      signal CLK: std_logic:='0';
      signal DOUT: std_logic_vector(22 downto 0);
      signal RDY,RFD: std_logic;
      signal dct: std_logic_vector(22 downto 0);
      signal dct_sign: std_logic;
      signal dct_real: real;
    begin
      control_chip: entity work.control_dct port map (DIN,ND,RST,CLK,DOUT,RDY,RFD);
      RST<='1' after 5 ns,'0' after 7 ns;
      CLK<=not CLK after 10 ns;
      ND<='1' after 15 ns,'0' after 300 ns;
      DIN<=x"E9" after 40 ns, x"0A" after 60 ns, x"64" after 80 ns, x"EF" after 100 ns,
           x"C7" after 120 ns, x"2B" after 140 ns, x"5B" after 160 ns, x"07" after 180 ns;
      -- Magnitude and sign of the 2's complement result
      dct<=DOUT when DOUT(22)='0' else (not DOUT)+1;
      dct_sign<=DOUT(22);
      -- Convert the 11.12 magnitude to a real number, bit by bit
      dct_real<=(real_bit(dct(22))*1024.0+real_bit(dct(21))*512.0+real_bit(dct(20))*256.0
        +real_bit(dct(19))*128.0+real_bit(dct(18))*64.0+real_bit(dct(17))*32.0
        +real_bit(dct(16))*16.0+real_bit(dct(15))*8.0+real_bit(dct(14))*4.0
        +real_bit(dct(13))*2.0+real_bit(dct(12))*1.0+real_bit(dct(11))*0.5
        +real_bit(dct(10))*0.25+real_bit(dct(9))*0.125+real_bit(dct(8))*0.0625
        +real_bit(dct(7))*0.03125+real_bit(dct(6))*0.015625+real_bit(dct(5))*0.0078125
        +real_bit(dct(4))*0.00390625+real_bit(dct(3))*0.001953125+real_bit(dct(2))*0.0009765625
        +real_bit(dct(1))*0.00048828125+real_bit(dct(0))*0.000244140625)
        when dct_sign='0' else
        -1.0*(real_bit(dct(22))*1024.0+real_bit(dct(21))*512.0+real_bit(dct(20))*256.0
        +real_bit(dct(19))*128.0+real_bit(dct(18))*64.0+real_bit(dct(17))*32.0
        +real_bit(dct(16))*16.0+real_bit(dct(15))*8.0+real_bit(dct(14))*4.0
        +real_bit(dct(13))*2.0+real_bit(dct(12))*1.0+real_bit(dct(11))*0.5
        +real_bit(dct(10))*0.25+real_bit(dct(9))*0.125+real_bit(dct(8))*0.0625
        +real_bit(dct(7))*0.03125+real_bit(dct(6))*0.015625+real_bit(dct(5))*0.0078125
        +real_bit(dct(4))*0.00390625+real_bit(dct(3))*0.001953125+real_bit(dct(2))*0.0009765625
        +real_bit(dct(1))*0.00048828125+real_bit(dct(0))*0.000244140625);
    end;

Synthesis results (Synplicity):
[Figures, in order of appearance: Synthesis settings; Timing Constraints; Synthesis Log; RTL view (I/O controller); Schematic 2; RTL view (I/O controller); Schematic 3; RTL view (DCT calculator); Schematic 1; RTL view (DCT calculator); Schematic 2; RTL view (DCT calculator); Schematic 3]

Xilinx Tools:
HDL Synthesis Report
Macro Statistics
    # Registers          : 31
      1-bit register     : 2
      4-bit register     : 3
      2-bit register     : 1
      8-bit register     : 16
      23-bit register    : 9
    # Multiplexers       : 3
      2-to-1
