Lecture on High Performance Processor Architecture (CS05162)
An Hong
Fall 2007
University of Science and Technology of China
Department of Computer Science and Technology

TLP Architecture Case Study: Network Processors
2018/2/4, CS of USTC AN Hong

Outline
  • NP Overview
      • What
      • NP Functions, Objectives, Evolution, Speeds
  • Network Processor Applications, Workload, and Benchmark
      • Categorization: control and data planes
      • Characteristics
      • Requirements
      • Benchmarks
  • NP Architecture Modeling and Simulating

Outline
  • NP Architecture Case Study
      • Overview of current products
      • Special Purpose Hardware Comparison
      • Pipelining Model Architecture
      • Multiprocessing Model Architecture
  • NP Architecture Characteristics and Core Technologies
      • Key characteristics of the NP architecture
      • Architectural approaches
      • ISA
      • Parallel Memory Programming Model

NP Overview

NP Overview
  • What: A Network Processor (NP) is a programmable device that has been designed and highly optimized to perform networking functions.
  • NP Functions: specifically for network applications
      • Pattern matching (address lookup, bit-wise)
      • Data manipulation (TTL, CRC, SAR)
      • Queue and buffer management (QoS, rate, priority, ToS)
      • Statistics gathering
  • NP Objectives
      • Replace expensive ASICs when building network devices
      • Provide platform solutions through programmability
      • Extend product lifetime through software updates

NP Overview
  • NP evolution
      • GPP (General-Purpose Processor): programmable, but not optimized for networking applications
      • ASIC (Application-Specific Integrated Circuit): high processing capacity, but high design complexity, long development time, and little flexibility
      • NP (Network Processor): ASIC performance + GPP flexibility
          • Cheaper than GPP
  • 30 companies offering network processors; 350 design wins

History of Packet Processing
  • The Classic Router
  • Centralized CPU router architecture

History of Packet Processing
  • Emergence of Fast and Slow
    Path Processing
  • Distributed CPU router architecture

History of Packet Processing
  • Hybridization of Routers and Switches
  • Layer-2 switch based on distributed packet processing using ASICs

NP Overview
  • Why can't GPPs keep up?
      • Moore's law cannot keep up with the network processing speed requirement!
  • NP speeds
      • 1994-1996: OC-3 (155 Mbps, ns)
      • 1997-1999: OC-12 (622 Mbps, 640 ns)
      • 2000-2001: OC-48 (2.5 Gbps, 160 ns)
      • 2002-2003: OC-192 (10 Gbps, 40 ns)
      • 2003-2005: OC-768 (40 Gbps, 10 ns)

NP Overview
  • Why are ASICs not the answer? Four factors prevent ASIC-centered designs:
      • IP-based protocols are still evolving
      • Layer-2 protocols are in a greater degree of flux than ever
      • Increasing packet processing complexity
      • Time-to-market pressures
  • NPs address this need:
      • Time to market (TTM)
      • Time in market (TIM)
      • Expanded functionality
      • Leverage third-party development of applications

NP Overview
  • Where do NPs fit in a system? A networking device can be broken down into four overall functions:
      • Host processing
      • PHY (physical) layer processing
      • Switching
      • Packet processing
          • Framing
          • Parsing/classification
          • Modification
          • Encryption/compression
          • Queuing

NP Overview
  • Packet processing architecture

Components of a Generic Router

NP Overview
  • NP in a router application
  [Figure labels: Line interface, conditioning, framing; NP; Memories; CAMs; special functions; Switch; Other line cards; Host control processor; Line card]

Packet Processing in an IP Router
  1. Accept packet arriving on an incoming link.
  2. Look up packet destination address in the forwarding table to identify outgoing port(s).
  3. Edit packet header: e.g., decrement TTL, update header checksum.
  4. Send packet to the outgoing port(s).
  5. Buffer packet in the queue.
  6.
     Transmit packet onto outgoing link.

Another View of an IP Router
  [Figure labels: Control Plane (Routing Protocols, Routing Table); Datapath: per-packet processing (Forwarding Table, Switching)]

Packet Forwarding Engine
  [Figure labels: Packet (header, payload); Router; Destination Address; Routing Lookup Data Structure; Outgoing Port]

Size of the Forwarding Table
  Source: /ops/bgptable.html

Lookup Rate Required
  • Performance that applications require of a network processor (average packet size set to a typical value of 64 bytes)

Performance Estimation
  • 10 Gbps core router
      • Function: transport packets at OC-192
      • Processor running at 200 MHz = 200 MIPS
      • Assumption: 1 MIPS per 1 Mbit/s of I/O and 1 MByte of memory
      • Estimation: #uP = 10G/200 = 50! Memory: 10 GBytes!
  • Solutions:
      • Coprocessors: IP forwarding, classification, CRC and checksum
      • Multithreading
      • Memory hierarchy

NP Design Challenges
  • Shared with GPPs and ASICs
      • External memory bandwidth
      • Power dissipation
      • Pin limitations
      • Packaging
      • Verification
  • NP-specific
      • Line speed: real-time, link-rate processing
      • Application complexity
          • Applications that operate on individual packet headers (e.g., routing and forwarding)
          • Applications that operate principally on individual packet payloads (e.g., transcoding)
          • Applications that operate across multiple packets within a single flow (e.g., certain encryption algorithms) or across multiple flows (e.g., QoS and traffic shaping).
            A "flow" is considered to be a single source-destination session.

NP Design Challenges
  • Other NP-specific challenges
      • Port density
      • High level of device integration (on-chip interfaces and controllers for external memories, switch fabrics, co-processors, network interfaces, etc.)
      • Management of critical shared resources in a chip-multiprocessor environment (e.g., shared program state, memory interfaces)
      • Compiler and software design for high-performance, real-time, parallel, and heterogeneous systems
      • Real-time system verification

NP Design Techniques
  • Application-specific architectures
      • Extending the RISC instruction set
      • Use of customized on-chip or off-chip hardware assists
  • Parallelism
      • Thread-level parallelism
      • Instruction-level parallelism
  • Microarchitectures
      • Multiple processors
      • Pipelined processors

NP Application, Workload, and Benchmark

Application
  • Need to understand applications before understanding "application-specific" devices
  • Kernels
      • Control processing: encompasses a large number of different tasks that usually do not need to be performed at wire speed
      • Pattern matching: header parsing
      • Packet classification: identification of the packet type and attributes
      • Lookup: finding a specific entry in a table based on a key
      • Data manipulation: modifies the packet header
      • Field computation: checksum, CRC, time-to-live field decrement, data encryption
      • Queue management: scheduling and storage of incoming and outgoing packet data units

Application Categorization
  • NP applications
      • Carrier-class metro and core
      • Multi-service edge and access network
      • Enterprise and Ethernet edge
      • Storage networks
      • Network security
  • NP application systems
      • Routers
      • Switches
      • Firewalls

Application Categorization
  • Tasks and services
      • Routing table lookup: determine the next hop for incoming packets
      • Packet classification: classify packets using header fields against a set of rules
      • URL-based switching: distribute HTTP requests based on URLs
      • Transcoding
      • Encryption/decryption, intrusion detection, firewall, access control checking, denial-of-service

Application Categorization
  • All tasks required for control and management of the NPU.
    For example:
      • Table maintenance (classification tables, routing tables, QoS tables)
      • Port state
      • Timing and signaling to all components: PEs, switch fabric, queues
  • Processing tasks
      • Traffic management: queuing, scheduling, and policing
      • Transformation of packet data between layers (protocols)
      • Identifying packets against a criterion: flow, QoS
      • Parsing packet headers to extract protocol information
      • Low-level protocol implementation: Ethernet, ATM

Application Categorization
  • Control-plane tasks
      • Less time-critical
      • Control and management of device operation
      • Table maintenance, port states, etc.
  • Data-plane tasks
      • Operations occurring in real time on the "packet path"
      • Core device operations
      • Receive, process, and transmit packets

Data Plane Tasks
  • Media access control
      • Low-level protocol implementation
      • Ethernet, SONET framing, ATM cell processing, etc.
  • Data parsing
      • Parsing cell or packet headers for address or protocol information
  • Classification
      • Identify packets against a criterion (filtering/forwarding decision, QoS, accounting, etc.)
  • Data transformation
      • Transformation of packet data between protocols
  • Traffic management
      • Queuing, scheduling, and policing packet data

Data Plane Operations: Examples
  • Priority-based QoS mechanism
      • Supports different levels of QoS for each output port
      • Contains a QoS policy table for prioritizing packets
      • Ingress operations
          • Apply the QoS policy to the received packet
          • Get the packet priority from its header content
          • Place the packet in the appropriate output queue
      • Egress operations
          • Identify and schedule the highest-priority packet for transmission
          • Transmit the identified packet onto the output port
  • Security
      • Encryption/decryption, intrusion detection, access control checking, denial-of-service

Data Plane Operations: Examples
  • Monitoring
      • Capturing usage patterns, time information
  • Load balancing
      • Distribution of traffic among servers according to the server load, content, and client
        credentials

Protocol Processing Characteristics
  • Protocol processing requires intensive memory operations. Memory speed determines system performance.
  • Protocol processing requires powerful bit manipulation.
  • Layer 2-4 protocols require error detection (computation), e.g., CRC and checksum.
  • Multiple services (multiple protocols) coexist.

Packet Application Characteristics
  • Packet coverage: header only, or header + payload
  • Packet parsing: is the data location known/static?
  • QoS: classification and queuing
  • States are maintained between packets: stateful analysis

IPv4 Routing Table Lookup
  • Routers determine the next hop and forward packets
  [Figure labels: Router; A, B, C; P, P, P]

Routing Table Lookup Is a Search-Intensive Task
  • The search operation is not an exact match
  • Direct lookup needs 4G entries (32-bit IP address)
  • Longest prefix match
      • Tries
      • Hash tables
      • Balanced trees

Trie-based Routing Table Lookup
  • A trie block keeps pointers to route entries and to other trie blocks
  • Destination IP address bits are examined group by group (4 bits at a time)
  [Figure labels: Trie block with entries 0 to 15, each holding rt_ptr and trie_ptr; Next hop 1, Next hop 2, Next hop 3, Next hop 4]

Example
  • Packet destination IP address = 0x13fe2233 (0001,0011,1111,1110,)
  [Figure labels: rt_ptr, trie_ptr; entries 0 to 15; Next hop 1, Next hop 2, Next hop 3, Next hop 4]

IP Lookups using Multi-way Multi-column Search
  • Illustration of the idea with 6-bit addresses
  • Prefixes: 1*, 101*, 10101*
  • Binary search does not work with variable-length strings.
  • Prefixes padded with zeros: 100000, 101000, 101010
  • Addresses: 101011, 101110, 111110
  • Addresses end up far away from the matching prefix
  • Multiple addresses that match different prefixes end up in the same region

Multi-way Multi-column Search

Packet Forwarding Tasks
  • Header parsing
      • This consists of pattern matching of bits in the header field
  • Packet classification
      • Identification
        of the packet type (e.g., IP, MPLS, ATM) and attributes (e.g., quality-of-service requirement, encryption type)
  • Lookup
      • Consists of looking up data based on a key. It is mostly used in conjunction with pattern matching to find a specific entry in a table.
  • Computation
      • This varies widely by application. Examples include checksum, CRC, time-to-live field decrement, and data encryption.

Packet Forwarding Tasks
  • Data manipulation
      • Any function that modifies the packet header
  • Queue management
      • Scheduling and storage of incoming and outgoing packet data units
  • Control processing
      • Encompasses a large number of different tasks that usually do not need to be performed at wire speed. These are usually performed on a standard RISC processor linked to the NPU.

Packet Classification
  • Routers are required to distinguish packets for
      • Flow identification
      • Fair sharing of bandwidth
      • QoS
      • Security
      • Accounting, billing
      • etc.
  • Packets are classified by rules
      • Source IP, destination IP, source port number, destination port number, etc.
  • Classification algorithm metrics
      • Search speed
      • Storage cost
      • Scalability
      • Updates
      • etc.

Classification: Hierarchical Tries
  • Extension of the one-dimensional radix trie
  • Construct the trie recursively:
      • Construct the F1-trie on the set of prefixes {Rj1}
      • For each prefix p in the F1-trie, recursively construct a (d-1)-dimensional hierarchical trie on the rules {Rj : Rj1 = p}
  Source: Pankaj Gupta and Nick McKeown, Algorithms for Packet Classification, IEEE Network Special Issue, March/April 2001, vol. 15, no. 2, pp. 24-32.

Classification: Bitmap-intersection
  • The set of rules S that a packet matches is the intersection of d sets Si, where Si is the set of rules that match the packet in the i-th dimension alone.
  Source: Pankaj Gupta and Nick McKeown, Algorithms for Packet Classification, IEEE Network Special Issue, March/April 2001, vol. 15, no.
  2, pp. 24-32.

URL-based Switching
  • Increases efficiency
  • Tasks
      • Traverse the packet data (request) for each arriving packet and classify it:
          • Contains .jpg: to image server
          • Contains cgi-bin/: to application server
  [Figure labels: Internet; Switch; Image Server; Application Server; HTML Server; packet layers IP, TCP, APP. DATA; request "GET /cgi-bin/form HTTP/1.1 Host: "]
  Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik

Transcoders
  • Two important requirements
      • The receiver may not be capable of interpreting the stored data (multimedia transcoders): wireless receivers, hand-held devices, etc.
      • Compression for bandwidth and storage efficiency
  Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik

NP Workloads and Benchmarks
  • Available:
      • NPBench: 10 applications
      • CommBench [4]: 8 applications, /wolf/cb/
      • NetBench [3]: 10 applications, /NetBench
      • MiBench [2]
      • EEMBC: /benchmark
      • MediaBench
          • Transcoders
          • Some communications applications

Three Major Benchmarks

Comparison of Three Typical Benchmark Applications

Benchmarking Hierarchy
  • System level?
  • Function level?
  • Micro level?
  • Hardware level?

Benchmarking Reference Platform
  • A homogeneous or heterogeneous multi-core CMP (chip multiprocessor) structure, whose cores generally fall into three kinds:
      • A general-purpose processor for control-plane processing.
      • A group of optimized RISC processor cores for data-plane processing, called microengines (MEs), processing engines (PEs), channel processors, or task-optimized processors. Each microengine runs multiple hardware threads concurrently (hence it is also called a multithreaded processing unit).
      • In addition, some dedicated coprocessors (or programmable ASICs) are integrated to implement special functions such as routing table lookup, classification, deep packet analysis, and buffer and queue management.
  • All these core units are organized in one of two main ways, pipelined or parallel, and the chip provides high-speed memory and I/O interfaces.

NP Workloads and Benchmarks
  • Several weak points:
      • No consideration for interfaces
      • Assumes a traditional programming model
  • Metrics:
      • Processing time
      • Throughput
      • Memory latency

NP Architecture Modeling and Simulating

Architectural Comparisons
  • High-level organizations
      • Aggressive superscalar (SS)
      • Fine-grained multithreaded (FGMT)
      • Chip multiprocessor (CMP)
      • Simultaneous multithreaded
        (SMT)

Architectural Comparisons (cont.)
  [Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading; legend: Thread 1 to Thread 5, idle slot]

Performance Evaluation
  • Workloads: Forwarding (IP Forward), Authentication (MD5), Encryption (3DES)
  • Architectures compared: SS, FGMT, CMP, SMT
  • Workloads have little ILP
  • Need to exploit packet-level parallelism
  • CMP and SMT do just that
  • Systems must support some form of concurrent packet-level parallelism
  • SMT and CMP are nearly equivalent, with SMT always coming out ahead

NP Architecture: SS / FGMT / CMP / SMT [10]

NP Architecture: SoC: CMP + SMT + cluster [13]

NP Architecture: SoC: CMP + SMT + cluster [13]
  • Goal: maximize IPS/area within the pin count limit
  • Performance models:
      IPS: m, n, clkp, p
      area: m, n, Smchl, Sp, Sci, Scd
      p: t, pmiss, mem (mchl)
      Smchl: bwmchl
      bwmchl: p, clkp, , linesize, pmiss
      n: widthmchl, clkmchl, mchl, bwmchl
      bwIO: IPS, compl., IO

NP Architecture Case Study

Network Processor Companies
  Agere, Alchemy, Allayer, Applied Micro Circuits (MMC Networks), Bay Microsystems, Brecis Communications, Broadcom (SiByte), Cisco, ClearSpeed, Clearwater Networks, Cognigine, Conexant/Mindspeed (Maker), EZchip, Entridia Corporation, IBM, IP Semiconductors A/S, Intel, ishoni Networks, Lexra, Motorola (C-Port), Navarro Networks, Onex Communications, PMC-Sierra (QED), Vitesse (Sitera)

Overview of Current Product

Map of NP Market

Map of NP Market

Architectural Diversity

Performance Diversity

Special Purp
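
The trie-based routing table lookup slides describe trie blocks whose 16 entries each hold a route pointer (rt_ptr) and a pointer to another trie block (trie_ptr), with the destination address examined 4 bits at a time. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the lecture's implementation: the function names, the expansion of prefixes whose length is not a multiple of 4 across neighboring entries, and the next-hop labels are all hypothetical.

```python
STRIDE = 4                # address bits examined per trie level, as on the slide
FANOUT = 1 << STRIDE      # 16 entries (0..15) per trie block
ADDR_BITS = 32            # IPv4 address width


class TrieBlock:
    """One trie block: 16 entries, each with a route pointer and a child pointer."""

    def __init__(self):
        self.rt_ptr = [None] * FANOUT    # (prefix_len, next_hop) or None
        self.trie_ptr = [None] * FANOUT  # child TrieBlock or None


def insert(root, prefix, prefix_len, next_hop):
    """Add a route for prefix/prefix_len (prefix is a 32-bit int, length 0..32)."""
    block, depth = root, 0
    # Walk or create child blocks while more than one 4-bit group remains.
    while prefix_len - depth > STRIDE:
        nibble = (prefix >> (ADDR_BITS - depth - STRIDE)) & (FANOUT - 1)
        if block.trie_ptr[nibble] is None:
            block.trie_ptr[nibble] = TrieBlock()
        block = block.trie_ptr[nibble]
        depth += STRIDE
    # Assumption: a prefix that ends inside a 4-bit group is expanded over
    # every entry it covers; a longer prefix is never overwritten by a shorter one.
    free_bits = STRIDE - (prefix_len - depth)
    nibble = (prefix >> (ADDR_BITS - depth - STRIDE)) & (FANOUT - 1)
    first = nibble & ~((1 << free_bits) - 1)
    for slot in range(first, first + (1 << free_bits)):
        current = block.rt_ptr[slot]
        if current is None or current[0] <= prefix_len:
            block.rt_ptr[slot] = (prefix_len, next_hop)


def lookup(root, addr):
    """Longest-prefix match: remember the deepest route seen on the way down."""
    block, best = root, None
    for shift in range(ADDR_BITS - STRIDE, -1, -STRIDE):
        nibble = (addr >> shift) & (FANOUT - 1)
        if block.rt_ptr[nibble] is not None:
            best = block.rt_ptr[nibble][1]
        block = block.trie_ptr[nibble]
        if block is None:
            break
    return best


# Hypothetical routing table; 0x13fe2233 is the address from the Example slide.
root = TrieBlock()
insert(root, 0x00000000, 0, "default")
insert(root, 0x10000000, 4, "hop-A")
insert(root, 0x13000000, 8, "hop-B")
insert(root, 0x13FE0000, 16, "hop-C")
print(lookup(root, 0x13FE2233))
```

Each lookup touches at most eight trie blocks for a 32-bit address, which is why the stride trades memory (16 entries per block) against the number of memory accesses per packet.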

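
The arithmetic behind the Performance Estimation slide (a 10 Gbps core router, 200 MIPS per processor, 1 MIPS per 1 Mbit/s of I/O) and the per-packet time budgets on the NP speeds slide can be reproduced with a short Python sketch. The function names are hypothetical, and the 50-byte packet size is an assumption chosen only because it reproduces the 40 ns figure listed for OC-192; the slides do not state the packet size used.

```python
import math


def processors_needed(line_rate_bps, mips_per_processor, mips_per_mbps=1.0):
    """Processors required under the slide's rule of 1 MIPS per 1 Mbit/s of I/O."""
    required_mips = line_rate_bps / 1e6 * mips_per_mbps
    return math.ceil(required_mips / mips_per_processor)


def per_packet_budget_ns(line_rate_bps, packet_bytes):
    """Time available to process one packet at line rate, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9


# Slide figures: 10 Gbps line rate, 200 MHz processor = 200 MIPS.
print(processors_needed(10e9, 200))       # number of 200 MIPS processors
# Assumed 50-byte packet (not stated on the slides).
print(per_packet_budget_ns(10e9, 50))     # per-packet time at OC-192
print(per_packet_budget_ns(2.5e9, 50))    # per-packet time at OC-48
```

With these inputs the processor count matches the slide's estimate of 50, which is the motivation for the coprocessor, multithreading, and memory hierarchy solutions listed there.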