Vivado从此开始 (2nd edition) course slides: 21-Design-Analysis-After-Synthesis-PartII- 25-UltraFast-Design(3)-RTL-Coding

Lauren Gao

Design Analysis After Synthesis, Part II

Working with Timing

- get_timing_paths: gets timing path objects that meet the specified criteria
  - Used to create custom reporting and analysis
  - Returns timing path objects, which can be queried for properties or passed to other Tcl commands for processing
- report_timing: performs timing analysis on the specified timing paths of the current synthesized or implemented design
  - Returns a file or a string
- Related commands: report_timing_summary, report_exceptions, reset_timing

    set paths [get_timing_paths -group clk_tx_clk_core_1 -max_paths 100]
    report_timing -of_objects $paths
    # Which is equivalent to:
    report_timing -group clk_tx_clk_core_1 -max_paths 100

get_timing_paths

    get_timing_paths [-from args] [-rise_from args] [-fall_from args]
                     [-to args] [-rise_to args] [-fall_to args]
                     [-through args] [-rise_through args] [-fall_through args]
                     [-delay_type arg] [-setup] [-hold] [-max_paths arg]
                     [-nworst arg] [-unique_pins] [-slack_lesser_than arg]
                     [-slack_greater_than arg] [-group args]
                     [-no_report_unconstrained] [-user_ignored] [-sort_by arg]
                     [-filter arg] [-regexp] [-nocase] [-match_style arg]
                     [-quiet] [-verbose]

- -from / -to: ports, cells, pins, clock objects
- -through: pins, cells, nets
- -delay_type max is equivalent to -setup
- -delay_type min is equivalent to -hold
- -slack_lesser_than: return only paths with slack less than the given value
- -max_paths: maximum number of paths to return
- -nworst: number of worst paths to return per endpoint
- -unique_pins: return only one path for each unique set of pins
- report_timing has options similar to those of get_timing_paths

Timing Path

[Figure: timing path, net, pin, port.]

    set mystart [get_cells {i_firctrl/raddrcoe_i_reg[1]}]
    set myend [get_cells {i_firctrl/raddrcoe_i_reg[3]}]
    set mypath [get_timing_paths -from $mystart -to $myend -setup]
    # returns: {i_firctrl/raddrcoe_i_reg[1]/C --> i_firctrl/raddrcoe_i_reg[3]/D}
    set mynets [get_nets -of_objects $mypath]
    set mypins [get_pins -of_objects $mypath]

Properties of Timing Path

- Timing paths can be selected by property with -filter

Multi-Corner Configuration

- Slow: low voltage, high temperature
- Fast: low temperature, high voltage
- Setup checks are most likely to fail at the slow process corner, and hold checks at the fast corner

Demo

Lauren Gao

UltraFast Design: Basic Introduction

What is UltraFast?

- Documented design methodology to improve designer productivity
  - Methodology recommendations
  - Best practices
  - Checklist
- Customer benefits
  - Faster time to market
  - Better QoR and runtimes
  - Less time with their favorite FAE

UltraFast: Collection of Best Practices

- The smarter way for:
  - PCB planning
  - HDL coding
  - Design closure
  - XDC constraints
  - Design analysis
  - Timing closure
- For Vivado Design Suite, v1.0
- Created by FAEs/SAEs from all over the world
  - Collection of best practices
  - Things to avoid
- Knowledge is provided in the form of UG949, checklists, and scripts
- "It's good to learn from your mistakes."
- "It's much better to learn from other people's mistakes!"

UltraFast Methodology: Why Now?

- Device density has been increasing exponentially
  - FPGAs are as large as ASICs were a few years ago
- Design complexity is increasing significantly
  - FPGAs are the center of the system, not just "glue logic"
- Proper front-end and back-end methodology is essential for project success
- Vivado
  - Enables easy validation of constraints
  - Powerful timing analysis
  - Full-featured DRC checks
  - Superior design analysis capabilities
  - Tcl access to the complete design database
  - ASIC-class FPGA tool, designed to handle these large FPGAs

Overall Strategy for UltraFast Design: Earlier Iterations

- Upfront analysis
- Design closure at each step

[Figure: impact on QoR by design stage; labels: 100x, 10x, 1.2x, 1.1x; stages: PCB/planning, device/IP selection; IP integration, RTL design, verification; implementation closure; config., bring-up, debug.]

Customer Advantage: Reduce Design Cycle Time and Cost

[Figure: two development-cycle timelines, each with the stages Architect, Device Planning, Design Creation, Implementation, Verification/Simulation, Config/Debug.]

- Normal development cycle
  - Longer debug cycle
  - Non-deterministic debug
- UltraFast Design Methodology development cycle
  - Shorter debug cycle
  - More deterministic debug
- Process review steps

UltraFast Checklists

- UltraFast Design Methodology Checklist, XTP301, V2014.1
  - Project Introduction
  - Board and Device Planning
  - Design Creation
  - Implementation
  - Configuration and Debug

Lauren Gao

UltraFast Design: Clocking

- Use MMCM or PLL Properly
- Create an Output Clock
- Clock Resource Selection Summary
- Source Synchronous Interface Clocking

Use MMCM or PLL Properly

UG949 > Ch4 > Clocking > Controlling the Phase...

While using an MMCM or PLL, pay attention to the following:

- Do not leave any inputs floating
- RST should be connected to the user logic
  - Grounding RST can cause problems if the clock is interrupted
- The LOCKED output should be used in the implementation of reset
  - Synchronous logic clocked by the clock coming out of the PLL should be held in reset until LOCKED is asserted
  - The LOCKED signal needs to be synchronized before it is used in a synchronous portion of the design
- A BUFG in the feedback path is needed only if the PLL/MMCM output clock must be phase aligned with the input reference clock
- Confirm the connectivity between CLKFBIN and CLKFBOUT
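The LOCKED guidance above can be sketched in Verilog as follows. This is a minimal illustration, not code from the slides: the module and signal names are hypothetical, and a two-stage synchronizer is assumed to be sufficient for the target clock rate. ASYNC_REG is the Vivado attribute that marks synchronizer registers.

```verilog
// Hypothetical module: derives an active-high reset from an MMCM/PLL
// LOCKED output and releases it synchronously to the generated clock.
module locked_reset_sync (
    input  wire clk,     // clock coming out of the MMCM/PLL
    input  wire locked,  // LOCKED output (asynchronous to clk)
    output wire rst      // active-high reset for logic clocked by clk
);
    // Both stages are forced to 1 while LOCKED is low. Once LOCKED is high,
    // zeros shift in, so rst is released on the second rising edge of clk.
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync = 2'b11;

    always @(posedge clk or negedge locked) begin
        if (!locked)
            sync <= 2'b11;
        else
            sync <= {sync[0], 1'b0};
    end

    assign rst = sync[1];
endmodule
```

Logic in the clk domain would then use rst as its synchronous reset, which keeps it held in reset until LOCKED is asserted.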

Safe Clock Startup and Sequencing

PG065 > Ch4 > Customizing and Generating the Core

- Safe Clock Startup
  - Enables a stable and valid clock at the output, using a BUFGCE, after LOCKED is sampled High for 8 input clocks
- Sequencing
  - Enables clocks in a sequence according to the number entered through the GUI
  - The delay between two enabled output clocks in the sequence is 8 cycles of the second clock in the sequence
  - Useful for a system where modules need to start operating one after the other

Safe Clock Startup and Sequencing Demo

Settings on MMCM or PLL

UG949 > Ch4 > Clocking > Controlling the Phase...

- Incorrect settings on the MMCM or PLL may:
  - Increase clock uncertainty due to increased jitter
  - Build incorrect phase relationships
  - Make timing more difficult
- Clock uncertainty in timing analysis

MMCM/PLL Settings, Your Goals: Power or Jitter

UG949 > Ch4 > Clocking > Controlling the Phase...

- If you select 'Minimize Power', 'Minimize Output Jitter' is removed!
- Depending on your goals, the settings in the Clocking Wizard may be changed to:
  - Further minimize jitter, and thus improve timing, at the cost of higher power
  - Go the other way, reducing power but increasing output jitter

[Figure: Clocking Wizard settings for MMCM and PLL: Balanced, Minimize Output Jitter.]

Jitter Comparison Between Different Settings

[Figure: jitter for MMCM: Minimize Output Jitter; PLL: Minimize Output Jitter; MMCM: Balanced; PLL: Balanced.]

Phase Between Output Clock

[Figure: output clock phases; labels: "146", "Same phase", "235", "Same phase".]

Creating an Output Clock

UG949 > Ch4 > Clocking > Creating an Output Clock

[Figure: ODDR primitive with ports D1, D2, CE, C, S, R, Q.]

- An effective way: an ODDR can forward a copy of the clock to the output
- This is useful for propagating a clock and DDR data with identical delays
  - Tie the D1 input of the ODDR primitive High and the D2 input Low
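The clock-forwarding technique above might look like this in Verilog. This is a sketch assuming the 7 series ODDR primitive; the wrapper module and net names are hypothetical, and simulating it requires the Xilinx UNISIM library.

```verilog
// Hypothetical wrapper: forwards a copy of clk to an output through an ODDR.
// Assumes the 7 series ODDR primitive from the Xilinx UNISIM library.
module clk_forward (
    input  wire clk,      // internal clock to forward
    output wire clk_out   // forwarded clock, to be connected to an output port
);
    ODDR #(
        .DDR_CLK_EDGE ("OPPOSITE_EDGE"),
        .INIT         (1'b0),
        .SRTYPE       ("SYNC")
    ) oddr_clk_fwd (
        .Q  (clk_out),
        .C  (clk),
        .CE (1'b1),
        .D1 (1'b1),   // tied High: driven on Q after the rising edge of clk
        .D2 (1'b0),   // tied Low: driven on Q after the falling edge of clk
        .R  (1'b0),
        .S  (1'b0)
    );
endmodule
```

Because DDR data sent through ODDRs clocked by the same clk takes the same kind of path, the forwarded clock and the data leave the device with matching delays.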

Clock Resource Selection Summary (1)

UG949 > Ch4 > Clocking > Clock Resource Selection Summary

- BUFG
  - Use when a high-fanout clock must be provided to several clock regions throughout the device
  - Use for very high fanout non-clock nets such as a global reset
- BUFGCE
  - Use to stop a large-fanout, several-region clock domain
- BUFGMUX/BUFGCTRL
  - Use to change clock frequencies or clock sources during the operation of your design

Clock Resource Selection Summary (2)

UG949 > Ch4 > Clocking > Clock Resource Selection Summary

- BUFH
  - Use for smaller clock domains of logic that can be contained within a single clock region
- BUFR
  - Use for small to medium sized clock networks that do not require performance higher than 450 MHz
- BUFIO
  - Use for externally provided high-speed I/O clocking, generally in source-synchronous data capture
- BUFMR
  - Use when you need to use BUFRs or BUFIOs in more than one vertically adjacent clock region for a single clock source

Clock Resource Selection Summary (3)

UG949 > Ch4 > Clocking > Clock Resource Selection Summary

- PLL and MMCM
  - A PLL provides better control of jitter
  - An MMCM can provide a wider range of output frequencies
  - For tighter timing requirements, PLLs might be best, provided they can provide the frequency of interest
- IDELAY/IODELAY
  - Use on an input clock to add small amounts of additional phase offset (delay)
  - Use on input data to add additional delay to the data, effectively reducing the clock phase offset in relation to the data
- ODDR
  - Use to create an external forwarded clock from the device

Source-Synchronous Interface

[Figure: source-synchronous interface; labels: CCIO, BUFIO, BUFR (÷N), IOCLK, DATA, ISERDES, FPGA fabric.]

Driving Multiple BUFIOs

UG107 > Appx. A: Multi-Region Clocking

- Although BUFRs can perform this function, BUFIOs supply the highest-performance operation and drive dedicated clock nets within the I/O column
- The placer software automatically places the buffers in the appropriate location

Driving Multiple BUFRs

UG107 > Appx. A: Multi-Region Clocking

- If the divide value in the BUFR is being used, then all BUFR instances must be reset while the BUFMRCE is disabled
- The placer software automatically places the buffers in the appropriate location

Driving Multiple BUFRs (with Divide) and BUFIO

UG107 > Appx. A: Multi-Region Clocking

- Manually place the buffers with a LOC constraint
- The logic driven by the buffers is automatically placed in the appropriate location

Driving Multiple BUFRs (With and Without Divide)

UG107 > Appx. A: Multi-Region Clocking

- Manually place the buffers with a LOC constraint
- The logic driven by the buffers is automatically placed in the appropriate location

Synchronizing BUFRs Driven by a BUFMR

UG107 > Appx. A: Multi-Region Clocking

[Figure: a BUFMRCE (CE) driving several BUFRs, each with a divider (÷) and CLR.]

- To clock a single interface that spans multiple banks, a BUFMR must be used to drive the BUFIO and BUFR in the different regions
- The dividers on each BUFR are independent; they must be synchronized to ensure proper operation of the interface
- Synchronization sequence:
  1. Use a BUFMRCE to disable the clock feeding the BUFRs
  2. Assert the CLR on all the BUFRs
     - This resets the dividers in the BUFRs
  3. Deassert the CLR on all the BUFRs
     - This allows the dividers to start on the next rising edge of the input clock (currently gated)
  4. Assert the CE on the BUFMR
     - Starts the clocks to all BUFRs
     - BUFRs are now in sync
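The four-step synchronization sequence above could be driven by a small sequencer such as the one below. This is only a sketch under stated assumptions: the free-running control clock ctrl_clk, the start input, the in_sync flag, and the WAIT_CYCLES spacing between steps are all hypothetical, and the real timing requirements of CE and CLR relative to the gated clock should be taken from the device documentation.

```verilog
// Hypothetical sequencer for the BUFMRCE CE and the BUFR CLR inputs.
// Assumes ctrl_clk is free running and independent of the gated clock.
module bufr_sync_seq #(
    parameter integer WAIT_CYCLES = 8   // assumed spacing between steps
) (
    input  wire ctrl_clk,
    input  wire start,            // hold low to keep the clock gated
    output reg  bufmr_ce = 1'b0,  // to the CE pin of the BUFMRCE
    output reg  bufr_clr = 1'b0,  // to the CLR pin of every BUFR
    output reg  in_sync  = 1'b0   // high once the sequence has completed
);
    localparam [1:0] S_GATE = 2'd0, S_CLR = 2'd1, S_RELEASE = 2'd2, S_RUN = 2'd3;

    reg [1:0]  state = S_GATE;
    reg [15:0] cnt   = 16'd0;
    wire step_done = (cnt == WAIT_CYCLES - 1);

    always @(posedge ctrl_clk) begin
        if (!start) begin
            state    <= S_GATE;
            cnt      <= 16'd0;
            bufmr_ce <= 1'b0;
            bufr_clr <= 1'b0;
            in_sync  <= 1'b0;
        end else begin
            cnt <= step_done ? 16'd0 : cnt + 16'd1;
            case (state)
                S_GATE: begin                 // step 1: clock is gated
                    bufmr_ce <= 1'b0;
                    if (step_done) begin      // step 2: assert CLR
                        bufr_clr <= 1'b1;
                        state    <= S_CLR;
                    end
                end
                S_CLR: if (step_done) begin   // step 3: deassert CLR
                    bufr_clr <= 1'b0;
                    state    <= S_RELEASE;
                end
                S_RELEASE: if (step_done) begin  // step 4: enable the clock
                    bufmr_ce <= 1'b1;
                    in_sync  <= 1'b1;
                    state    <= S_RUN;
                end
                S_RUN: ;                      // clocks running, nothing to do
            endcase
        end
    end
endmodule
```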

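Lauren Gao

RTL Coding Style, Part 1

Basic Functionality

- Blocking statements vs. non-blocking statements
- Incomplete sensitivity list
- Latch inference
  - An if statement without an else clause
  - An intended register without a rising-edge or falling-edge construct
  - WHY: latches make timing analysis more difficult
- Incomplete reset specification
  - The reset signal will get hooked to the CE pin, thereby creating another unique control set

Latch inferred from an if statement without an else clause (VHDL):

    process (G, D)
    begin
      if (G = '1') then
        Q <= D;
      end if;
    end process;

The same latch in Verilog:

    always @(G or D)
      if (G)
        Q = D;

Incomplete reset specification (reg2 is not reset, so rst ends up on its CE pin):

    always @(posedge clk)
      if (rst)
        reg1 <= 1'b0;
      else begin
        reg1 <= din1;
        reg2 <= din2;
      end

- all_latches: Tcl command that returns the latches in the design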
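One possible way to avoid both problems shown above is sketched here. The module is hypothetical and reuses the signal names from the snippets; it assumes that a latch was not intended (so a default value in the else branch is acceptable) and that reg2 really needs no reset.

```verilog
// Hypothetical module illustrating latch-free and control-set-friendly code.
module reset_and_latch_fixes (
    input  wire clk,
    input  wire rst,    // synchronous, active-high
    input  wire g,
    input  wire d,
    input  wire din1,
    input  wire din2,
    output reg  q,
    output reg  reg1,
    output reg  reg2
);
    // Combinational logic with an else branch: q is assigned on every
    // path through the block, so no latch is inferred.
    always @(*) begin
        if (g)
            q = d;
        else
            q = 1'b0;
    end

    // reg1 needs a reset, so it is described in its own always block.
    always @(posedge clk) begin
        if (rst)
            reg1 <= 1'b0;
        else
            reg1 <= din1;
    end

    // reg2 needs no reset. Keeping it out of the if (rst) block means
    // rst is not used as its clock enable.
    always @(posedge clk)
        reg2 <= din2;
endmodule
```

If reg2 does need a reset, the alternative is to assign it in the reset branch together with reg1 so that both registers share one control set.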

Slice Flip-Flops and Flip-Flop/Latches

- Each slice has four flip-flop/latches (FF/L)
  - Can be configured as either flip-flops or latches
  - The D input can come from the O6 LUT output, the carry chain, the wide multiplexer, or the AX/BX/CX/DX slice input
- Each slice also has four flip-flops (FF)
  - The D input can come from the O5 output or the AX/BX/CX/DX input
  - These do not have access to the carry chain or the wide multiplexers
- If any of the FF/Ls are configured as latches, the four FFs are not available

[Figure: slice with four LUT/RAM/SRL blocks, each feeding an FF/L and an FF.]

Use of Loops in Code

    reg [3:0] dout;
    integer i;
    always @(posedge clk) begin
      for (i = 0; i <= 3; i = i + 1)
        dout[3-i] <= din[i];
    end

Pros and cons:

- Minimizes coding effort
- May lead to inefficient structures, thereby degrading performance
- Xilinx recommends representing the same functionality using constructs that are easier for the tool to interpret

TIP

- It is acceptable to use loops for basic connectivity
- When the code infers hardware resources (other than just wires/interconnects), it is better to avoid loops

    always @(posedge clk) begin
      for (i = 0; i <= 3; i = i + 1) begin
        if (en[i])
          dout[i] <= i;
      end
    end
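For the first loop above, which only reverses the bit order, the same function can be written without a loop. The module wrapper below is hypothetical; only the concatenation is the point.

```verilog
// Hypothetical module: registered 4-bit bit reversal without a for loop.
module bit_reverse4 (
    input  wire       clk,
    input  wire [3:0] din,
    output reg  [3:0] dout
);
    // Equivalent to dout[3-i] <= din[i] for i = 0..3:
    // dout[3] = din[0], dout[2] = din[1], dout[1] = din[2], dout[0] = din[3].
    always @(posedge clk)
        dout <= {din[0], din[1], din[2], din[3]};
endmodule
```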

State-Machine Guidance

Mealy vs. Moore Styles

- Main difference:
  - Mealy: current state + input => output
  - Moore: current state => output
- In general, Moore state machines implement best in FPGA devices
  - Most often, one-hot is the chosen encoding method, and little decode logic is necessary for output values

One-Hot vs. Binary Encoding

- The two most popular encodings for FPGA designs are binary and one-hot
- Vivado FSM_ENCODING: "one_hot", "sequential", "johnson", "gray", "auto", and "none"; default: "auto"

Verilog:

    (* fsm_encoding = "one_hot" *) reg [7:0] my_state;

VHDL:

    type count_state is (zero, one, two, three, four, five, six);
    signal my_state : count_state;
    attribute fsm_encoding : string;
    attribute fsm_encoding of my_state : signal is "sequential";

Use of Debug Logic

- Debug logic
  - Logic that is not necessary for the design function but is useful in design analysis
- Several methods can assist in this objective:
  - Guard the logic with an `ifdef, parameter, or generic that can be set to disable or enable these sections of code
  - Code the logic in a way that makes it easy to comment out later
  - Have a separate debug version of a module or entity to interchange for this purpose
- Target
  - Have a good methodology for debugging the design code
  - Have a good way to remove that logic

[Figure: DUT containing user logic and debug logic.]
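As an illustration of the first method above, debug logic can be guarded with a parameter and a generate block. The module, its ports, and the ENABLE_DEBUG parameter are hypothetical names chosen for this sketch.

```verilog
// Hypothetical DUT whose debug counter exists only when ENABLE_DEBUG is set.
module dut_with_debug #(
    parameter ENABLE_DEBUG = 0
) (
    input  wire        clk,
    input  wire        valid,
    input  wire [7:0]  din,
    output reg  [7:0]  dout,
    output wire [15:0] dbg_valid_count   // constant zero when debug is disabled
);
    // User logic (placeholder function).
    always @(posedge clk)
        if (valid)
            dout <= din + 8'd1;

    // Debug logic: counts valid cycles for later analysis.
    generate
        if (ENABLE_DEBUG) begin : g_debug
            reg [15:0] cnt = 16'd0;
            always @(posedge clk)
                if (valid)
                    cnt <= cnt + 16'd1;
            assign dbg_valid_count = cnt;
        end else begin : g_no_debug
            assign dbg_valid_count = 16'd0;
        end
    endgenerate
endmodule
```

Setting ENABLE_DEBUG to 0 at instantiation removes the counter from the synthesized design without touching the user logic.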

Control Signals and Control Sets

- A control set is the grouping of control signals: set/reset, clock enable, and clock
- Registers within a slice all share common control signals
  - Only registers with a common control set may be packed into the same slice
- Designs with several unique control sets
  - Have a lot of wasted resources
  - Have fewer options for placement, resulting in higher power and lower performance
- Designs with fewer control sets
  - Have more options and flexibility in terms of placement, generally resulting in improved results

Control Sets

- All flip-flops and flip-flop/latches in a slice share the same CLK, SR, and CE signals
  - This is referred to as the "control set" of the flip-flops
  - CE and SR are active high
  - CLK can be inverted at the slice boundary
- If any one flip-flop uses a CE, all others must use the same CE
  - CE gates the clock at the slice boundary
  - Saves power
- If any one flip-flop uses the SR, all others must use the same SR
  - The reset value used for each flip-flop is individually set by the SRVAL attribute

[Figure: slice flip-flops (AFF to DFF) and flip-flop/latches, each with D, Q, CE, CK, and SR.]

Control Set

- report_control_sets
  - Indicator of possible packing fragmentation and fitting issues
  - Run with the -verbose option to generate a full list

When and Where to Use a Reset

- If an initial state is not specified, it defaults to logic zero
- It is not necessary to code a global reset for the sole purpose of initializing the device; avoiding one:
  - Limits the overall fanout of the reset net
  - Simplifies the timing of the reset paths
- Functional simulation should easily identify whether a reset is needed or not
- No reset brings much greater flexibility in selecting the FPGA resources to map the logic

[Figure: resource mapping without reset and with reset; labels: delay line, SRL, SRL + registers, all registers, LUT or block memory, registers with a common reset.]

Use Active-High Control Signals

[Figure: flip-flop.]

- Hierarchical design methods can proliferate LUT usage on active-low control signals
  - The inverters cannot be combined into the same slice
  - This consumes more power and makes timing difficult

Control a Localized Reset Network

[Figure: synchronous bridge; a chain of flip-flops clocked by clk, with rst_n on the asynchronous set (active low), producing a synchronous, active-high local reset.]

- The number of flip-flops in the chain determines the minimum duration of the reset pulse issued to the localized network

Control a Localized Reset Network: Verilog

    reg [3:0] synchronizer_ckt;

    always @(posedge clk or negedge rst_n)  // async. negedge reset
    begin
      if (!rst_n)
        synchronizer_ckt <= 4'hf;  // 4-stage reset synchronization
      else
        synchronizer_ckt <= {synchronizer_ckt[2:0], 1'b0};
    end

    // The final reset signal, which is used to reset the actual flops in the design
    assign synchronized_rst_n = ~synchronizer_ckt[3];

More Info

- UG949: UltraFast Design Methodology Guide for the Vivado Design Suite, chapter 4
- WP272: Get Smart About Reset: Think Local, Not Global

Lauren Gao

RTL Coding Style, Part 2

Know What You Infer

- Addition, subtraction, and add-sub larger than 4 bits
  - Carry chain + one LUT per 2-bit addition
  - 8-bit + 8-bit adder: 8 LUTs + the associated carry chain
- Ternary addition, with no register between the two additions
  - One LUT per 3-bit addition
  - 8-bit + 8-bit + 8-bit adder: 8 LUTs + the associated carry chain
- In general, multiplication is targeted to DSP blocks
  - Three levels of pipelining around it generate the best setup, clock-to-out, and power characteristics
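The ternary addition case above corresponds to code like the following sketch, in which the three operands are summed in one expression with no register between the two additions. The module name and the 10-bit result width (chosen so the sum cannot overflow) are assumptions of this example; the actual LUT count depends on the tool and the result width.

```verilog
// Hypothetical module: registered three-operand (ternary) addition.
module ternary_add8 (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire [7:0] c,
    output reg  [9:0] sum   // 10 bits hold the maximum value 3 * 255 = 765
);
    // A single expression, so no register sits between the two additions.
    always @(posedge clk)
        sum <= a + b + c;
endmodule
```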

Know What You Infer (continued)

- Shift registers or delay lines that do not require reset or multiple tap points are generally mapped into shift register LUTs (SRLs)
  - To best utilize SRLs, avoid using reset for those blocks
  - In 7 series FPGAs, each LUT can delay serial data by 1 to 32 clock cycles
- Conditional code resulting in standard MUX components
  - 4-to-1 MUX: 1 LUT, one logic level
  - 8-to-1 MUX: 2 LUTs + 1 MUXF7, one logic level
  - 16-to-1 MUX: 4 LUTs + 2 MUXF7 + 1 MUXF8, one logic level
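The shift-register guidance above can be sketched as a delay line with no reset and a single tap at the end, which is the form that synthesis can map into an SRL. The module name and the DEPTH parameter are hypothetical; DEPTH is assumed to be between 2 and 32 here.

```verilog
// Hypothetical delay line written so that it can map into an SRL:
// no reset, and only the last stage is read.
module delay_line #(
    parameter integer DEPTH = 16   // assumed range: 2 to 32
) (
    input  wire clk,
    input  wire ce,
    input  wire din,
    output wire dout
);
    reg [DEPTH-1:0] sr = {DEPTH{1'b0}};  // initial value instead of a reset

    always @(posedge clk)
        if (ce)
            sr <= {sr[DEPTH-2:0], din};

    assign dout = sr[DEPTH-1];           // din delayed by DEPTH enabled cycles
endmodule
```

Adding a reset to sr, or reading intermediate bits, would typically force the tool to use individual flip-flops instead.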

Performance Considerations When Implementing RAM

- Using Dedicated Blocks or Distributed RAMs
- Using the Output Pipeline Register
- Selecting the Proper Block RAM Write Mode

Using Dedicated Blocks or Distributed RAMs

- RAMs may be implemented in either:
  - The dedicated block RAM
  - LUTs, using distributed RAM
- The first choice criterion: required depth
  - Memory arrays deeper than 256 are generally implemented in block memory
- Each block RAM can be used as:
  - One 36Kb BRAM/FIFO, or
  - One 18Kb BRAM plus one 18Kb BRAM/FIFO

[Figure: CLB_LL (Slice_L, Slice_L) and CLB_LM (Slice_L, Slice_M).]

Using the Output Pipeline Register

- Using an output register is required for high-performance designs
  - It is recommended for all designs
  - This improves the clock-to-output timing of the block RAM
- Having both registers gives a total read latency of 3
  - Determine early whether an extra clock cycle of latency during reads is tolerable
- Using asynchronous reset impacts RAM inference and should be avoided

[Figure: block RAM followed by two registers; labels: register out of memory primitives, register out of memory core.]

Selecting the Proper Block RAM Write Mode

Xilinx recommends the following guidelines for selecting the best write mode for a particular operation:

- Consider functionality first
  - If you must see the prior value in the block RAM during a write, select READ_FIRST
  - If you want to read the new data being written to the block RAM, use WRITE_FIRST
  - If you do not care about the data read during writes, the next selection criterion has to do with memory collisions
- Use NO_CHANGE mode
  - In all other cases, Xilinx recommends NO_CHANGE mode
  - NO_CHANGE has the best power characteristics

[Figure: READ_FIRST, WRITE_FIRST, and NO_CHANGE.]
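A single-port RAM in NO_CHANGE mode with one output pipeline register might be described as follows. This is a sketch, not code from the slides: the module name, parameters, and widths are hypothetical, and ram_style is the Vivado synthesis attribute used here to request block RAM.

```verilog
// Hypothetical single-port RAM: NO_CHANGE write mode, one output register,
// no reset on the memory or its output.
module bram_no_change #(
    parameter integer ADDR_WIDTH = 10,
    parameter integer DATA_WIDTH = 16
) (
    input  wire                  clk,
    input  wire                  en,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output reg  [DATA_WIDTH-1:0] dout
);
    (* ram_style = "block" *)
    reg [DATA_WIDTH-1:0] mem [0:(1 << ADDR_WIDTH) - 1];

    reg [DATA_WIDTH-1:0] ram_q;   // read data straight out of the memory array

    always @(posedge clk) begin
        if (en) begin
            if (we)
                mem[addr] <= din;     // during a write, ram_q keeps its value
            else
                ram_q <= mem[addr];   // read only when not writing
        end
    end

    // Output pipeline register: adds one cycle of read latency.
    always @(posedge clk)
        dout <= ram_q;
endmodule
```

For READ_FIRST behavior, the read assignment would instead be made unconditionally inside the en branch, so the old contents are read while the new data is written.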

DSP Slice Features

[Figure: DSP slice block diagram; labels include A REG and B REG (2-deep), C REG, M REG, P REG, MULT, ADD, X/Y/Z multiplexers, 17-bit shift, OpMode, ALUMode, CarryIn, pattern detect.]

- The DSP blocks can perform many different functions
  - Multiplication
  - Addition and subtraction
  - Comparators
  - Counters
  - General logic
- Fully pipeline the code intended to map into the DSP48
- DSP48E1 slice registers contain only resets, not sets
  - Avoid asynchronous resets, since the DSP slice only supports synchronous reset operations
- The DSP48E1 blocks use a signed arithmetic implementation
  - Code using signed values in the HDL source to best match the resource capabilities
  - The bit precision for signed data is 18 bits by 25 bits
  - The bit precision for unsigned data is 17 bits by 24 bits
- For Verilog code, data is considered unsigned unless otherwise declared in the code

Coding for P
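The DSP guidelines listed above (signed operands, synchronous reset only, full pipelining) can be combined in a sketch like the one below. The module and signal names are hypothetical; the 25-bit by 18-bit signed operand widths follow the precision stated above, and the three register levels are intended to correspond to the input, multiplier, and output registers of the slice.

```verilog
// Hypothetical fully pipelined signed multiplier intended to map to a DSP48E1.
module dsp_mult_pipelined (
    input  wire               clk,
    input  wire               rst,   // synchronous, active-high
    input  wire signed [24:0] a,
    input  wire signed [17:0] b,
    output reg  signed [42:0] p      // 25 + 18 = 43-bit signed product
);
    reg signed [24:0] a_r;   // input register for A
    reg signed [17:0] b_r;   // input register for B
    reg signed [42:0] m_r;   // multiplier output register

    always @(posedge clk) begin
        if (rst) begin
            a_r <= 0;
            b_r <= 0;
            m_r <= 0;
            p   <= 0;
        end else begin
            a_r <= a;
            b_r <= b;
            m_r <= a_r * b_r;   // signed multiply: both operands are signed
            p   <= m_r;         // final output register
        end
    end
endmodule
```

The product for a given a and b appears on p three clock cycles after the inputs are sampled.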
