NVMe over Fabric: Technology and Performance Overview
Impact of Host Factors on iWARP, ROCEv2, TCP; Comparing Optane SSD LUN v NVMe SSD LUN

Overview: NVMe Over Fabric (NVMe-oF)
- NVMe-oF allows disaggregation of cloud and data center compute and storage
- Helps optimize access to SSD storage and servers at scale
- Extends the high-performance, low-latency NVMe protocol over networks
- Enables memory-to-memory data flow (RDMA)
- Allows bypass of local CPU resources and scale-out to DPU servers
- Disaggregation allows de-coupling of workloads, servers, storage and test
- IOProfiler allows capture of real-world workloads on any logical server
- CTS IO Generator can apply test IOs from any direct, remote or fabric server

NVMe-oF: What is it?
- NVMe is a storage protocol optimized for flash storage
- PCIe is the low-latency transport used for SSDs
- NVMe-oF maps that protocol onto fabric transports (host attachment sketched after the next list)
- Transports include Fibre Channel, InfiniBand and Ethernet
- Today's presentation compares Ethernet transports: RDMA iWARP, RDMA ROCEv2 and NVMe-TCP

NVMe-oF Optimization: Storage & Server Pools, Real-World Workloads, Fabric Storage
- Server Pools: Enterprise, Datacenter, VM, Fabric, DPU
- Storage Pools: SSD Cluster, LUN, JBOF, EBOF
- CPU On/Offload: NVMe-oF Transport (RDMA or TCP)
- Workloads: Cloud & Datacenter Application IO Capture
- Storage Test: Direct, Remote, Virtual & Fabric Storage
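Before comparing the transports, it helps to see how a host actually attaches an NVMe-oF namespace. A minimal sketch using the standard Linux nvme-cli tool wrapped in Python; the target address, port and NQN are hypothetical placeholders, and the transport string only selects NVMe/TCP versus RDMA (whether RDMA means iWARP or ROCEv2 is determined by the installed NIC, not by this command):

```python
import subprocess

# Hypothetical target parameters; substitute your own fabric target.
TARGET_ADDR = "192.168.1.10"                       # placeholder address
TARGET_NQN = "nqn.2024-01.com.example:optane-lun"  # placeholder NQN

def connect_nvmeof(transport: str) -> None:
    """Attach an NVMe-oF namespace with nvme-cli.

    transport: "tcp" for NVMe/TCP, "rdma" for iWARP/ROCEv2. Which RDMA
    flavor is used depends on the NIC/NAC and its driver, not this flag.
    """
    subprocess.run(
        ["nvme", "connect",
         "--transport", transport,
         "--traddr", TARGET_ADDR,
         "--trsvcid", "4420",     # conventional NVMe-oF IP port
         "--nqn", TARGET_NQN],
        check=True,
    )

# connect_nvmeof("tcp")   # the namespace then appears as a local /dev/nvme device
```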

CTS IOProfiler: Real-World Workload Module
- IO capture tools for multiple OSes
- Any logical server or storage
- File system or block IO level

CTS IO Stimulus Generator
- Direct, remote, virtual and fabric servers
- File system or block IO level
- Install on host and/or target server
- Transmit results to the control server DB

NVMe-oF Ethernet Transports: What are the Differences?
RDMA ROCEv2
- Low latency, best suited for rack scale
- Requires ROCE-capable NACs/switches
- TCP offload bypasses the local CPU
- "Lossy network": needs Priority Flow Control
RDMA iWARP
- Low latency, scales well in large data centers
- Requires iWARP-capable NACs/switches
- TCP offload bypasses the local CPU
- "Lossless network": no dropped packets
NVMe/TCP
- Low latency, scales well in large data centers
- Easily supported on simple NICs/switches
- TCP onload uses local CPU resources
- "Lossless network": no dropped packets

Host Factors: Impact on Performance
Examples of host factors (the focus of this presentation):
- On host: CPU, NVMe storage
- Ethernet: NIC, MTU frame size, frame rate
- Switch: settings for congestion management
- Network: oversubscription; fan-in ratio
- Transport: RDMA offload v TCP onload
- NVMe SSD attributes: IOPS, RT QoS, RW %
- Workload content: synthetic v real-world
- Test settings: MTU, QD, test flow
Future factors under study:
- Switch set-up
- Storage head nodes
- Network topology
- Congestion / flow control

Lossy v Lossless Ethernet Networks: What does it mean?
- Lossy, or best-effort, networks do not guarantee packet delivery or Quality of Service (QoS); ROCEv2, which runs over lossy UDP/Ethernet, is an example
- Lossless networks guarantee packet delivery; TCP over IP and iWARP (which runs over TCP) are examples
- Lossy networks often add Priority Flow Control (at L2) and Differentiated Services Code Point marking (at L3) to become lossless networks (see next slide); the per-socket L3 marking is sketched below
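PFC and DSCP are normally configured on the NICs and switches, but the L3 marking referred to above can also be requested per socket, which shows exactly where DSCP lives in the packet. A minimal sketch for a Linux host; the AF31 class is an illustrative value, not one taken from this test set-up:

```python
import socket

# DSCP occupies the upper 6 bits of the IP TOS byte; AF31 (DSCP 26) is
# used purely as an illustrative traffic class, not a value from the
# presentation's configuration.
DSCP_AF31 = 26
TOS_VALUE = DSCP_AF31 << 2   # shift DSCP into the TOS byte (ECN bits = 0)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
# Switches can then map this DSCP class to a priority queue so storage
# traffic is not dropped under congestion.
```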

Onload v Offload: TCP & RDMA Transports

TCP CPU Onload
- TCP/IP relies on the protocol stack and consumes host CPU cycles
- A NIC (Network Interface Card) uses the host CPU for TCP/IP (onload)
- CPUs can become fully saturated servicing high-speed 100Gb Ethernet, thereby adding latency to storage/compute IOs
- Performance can be increased by adding software polling and CPU affinity (e.g. SPDK); see the sketch after this list
- TCP Offload Engine (TOE) Network Accelerator Cards (NACs) are not yet readily available

RDMA CPU Offload
- RDMA: Remote Direct Memory Access
- A NAC (Network Accelerator Card) offloads host CPU cycles
- High performance / low latency
- Host offload, host bypass technology
- Allows direct memory-to-memory data communication over networks, but requires an offload engine on the NAC
- Reduces server resources dedicated to network functions (protocol stack servicing)
- 2 RDMA implementations: ROCE (lossy) and iWARP (lossless)
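As a concrete picture of the polling and CPU-affinity mitigations mentioned for TCP onload, here is a minimal sketch for a Linux host. The core set is an arbitrary placeholder and the poller object stands in for a real completion source such as an SPDK poller; this illustrates the idea, not SPDK's API:

```python
import os

# Pin this process to dedicated cores so TCP/IP servicing work does not
# migrate between CPUs and evict caches; cores {2, 3} are placeholders.
os.sched_setaffinity(0, {2, 3})

def poll_completions(poller, budget=64):
    """Busy-poll for IO completions instead of sleeping on interrupts.

    `poller` is a stand-in for a real completion source (e.g. an SPDK
    poller or an io_uring completion queue). Polling trades CPU cycles
    for lower, more predictable latency on a saturated 100Gb link.
    """
    done = 0
    while done < budget:
        completion = poller.try_pop()   # non-blocking check, no syscall
        if completion is None:
            continue                    # spin rather than yield the core
        completion.finish()
        done += 1
```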

MTU: Standard v Jumbo Frames
- Maximum Transmission Unit (MTU): the maximum IP packet size that can be sent without fragmentation
- Standard frame size is 1500 bytes (1500B); jumbo frame size is up to 9000B
- A 1500B frame carries more framing overhead per payload byte than a 9000B frame, so channel utilization is less efficient (worked out in the sketch below)
- Jumbo frames must be enabled on the complete data path
- Despite a jumbo setting, some OSes may default back to 1500B

MTU 1500B standard frame: Dest MAC Addr (6) | SRC MAC Addr (6) | FT (2) | Data (46-1500) | CRC (4)
MTU 9000B jumbo frame: Dest MAC Addr (6) | SRC MAC Addr (6) | FT (2) | Data (1501-8188) | CRC (4)
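The efficiency claim is easy to quantify from the frame layouts above. A minimal sketch that counts only the 18 bytes of framing shown on the slide (MAC addresses, frame type, CRC) and ignores preamble and inter-frame gap:

```python
# Per-frame framing overhead from the slide's layout:
# Dest MAC (6) + SRC MAC (6) + Frame Type (2) + CRC (4) = 18 bytes.
FRAME_OVERHEAD = 6 + 6 + 2 + 4

def channel_efficiency(payload_bytes: int) -> float:
    """Fraction of on-wire bytes that carry payload."""
    return payload_bytes / (payload_bytes + FRAME_OVERHEAD)

for mtu_payload in (1500, 9000):
    print(f"{mtu_payload}B payload: {channel_efficiency(mtu_payload):.2%}")
# 1500B payload: 98.81%
# 9000B payload: 99.80%
```

Per-frame costs compound the difference: a 128K IO spans roughly 88 standard frames but only about 15 jumbo frames, so per-frame interrupt and protocol-processing work drops as well.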

Target LUN Storage: Optane v NVMe SSDs
Two types of storage LUNs (capacities worked out below):
- Optane 2.25 TB LUN: 6 x 375GB Intel P4800
- NVMe 24.00 TB LUN: 6 x 4TB Intel P4500
Individual SSD drive specs target different ranges of performance:
- Optane: higher performance with symmetric RW
- NVMe: higher capacity with better read, lower write performance
- Optane: superior smaller-block RND RW performance
- NVMe: superior larger-block SEQ RW performance
- SSD specs do not target real-world workload content
NVMe-oF LUN performance is affected by each layer of abstraction:
- NVMe-oF level performance is influenced by the underlying SSD performance
- Intervening layers of abstraction mask/change SSD device performance
- IOs pooled in target server RAM are affected by "R/W-throughs" to media
- QoS policies, head node servers and RAID strategies also affect performance
- Each item in the data path can affect performance and response time QoS
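The LUN capacities follow directly from drive count times per-drive capacity; a one-line check of the two configurations:

```python
# LUN capacity = number of member SSDs x per-drive capacity.
optane_lun_tb = 6 * 375 / 1000    # 6 x 375 GB Intel P4800 -> 2.25 TB
nvme_lun_tb = 6 * 4.0             # 6 x 4 TB  Intel P4500  -> 24.0 TB
print(optane_lun_tb, nvme_lun_tb) # 2.25 24.0
```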

Test Plan: Objectives
- Compare NVMe-oF transports: RDMA (iWARP & ROCEv2) and NVMe/TCP
- Compare MTU frame sizes: standard 1500B frame v jumbo 9000B frame
- Compare storage LUNs: Optane (6 x 375GB, 2.25 TB LUN) v NVMe (6 x 4.0TB, 24.0 TB LUN)
- Compare workloads: synthetic corner-case benchmarks v real-world application workloads
- Test settings: demand intensity
- Metrics: IOPS, bandwidth and RT QoS (sketched below)
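Of the three metrics, RT QoS is the least standard: it reports the tail of the response-time distribution rather than the average. A minimal sketch of all three, assuming completions are recorded as (IO size in bytes, response time in seconds) pairs over a measured interval; the 99.99% QoS point is an illustrative choice:

```python
import numpy as np

def summarize(completions, interval_s, qos_pct=99.99):
    """completions: (io_bytes, response_time_s) records collected over
    interval_s seconds of run time."""
    sizes = np.array([b for b, _ in completions], dtype=float)
    rts = np.array([rt for _, rt in completions], dtype=float)
    return {
        "IOPS": len(rts) / interval_s,
        "Bandwidth_MBps": sizes.sum() / interval_s / 1e6,
        "ART_ms": rts.mean() * 1e3,                      # average RT
        "RT_QoS_ms": np.percentile(rts, qos_pct) * 1e3,  # tail RT
    }

# Toy usage: 4K IOs over a 1-second interval, invented latencies.
records = [(4096, 0.0002)] * 9_999 + [(4096, 0.005)]
print(summarize(records, interval_s=1.0))
```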

Test Plan: Topology
(topology diagram in the original deck)

Test Plan: Test Flow
(test flow diagram in the original deck)

Real-World Workloads: Comparison
- Retail Web Portal: different events; 64K RW back-up IO spike; 65% reads; 5,086 IO streams and 4.5M IOs captured; the top 9 IO streams carry 71% of total IOs (3.2M IOs)
- GPS Nav Portal: smaller IO sizes; SEQ 0.5K W IO spikes; 94% writes; 1,033 IO streams and 3.5M IOs captured; the top 9 IO streams carry 78% of total IOs (2.7M IOs)
- VDI Storage Cluster: typical storage IO sizes; composite of 6 drives; 75% writes; 1,223 IO streams and 167M IOs captured; the top 9 IO streams carry 65% of total IOs (108M IOs)
(The top-stream reduction is sketched below.)

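The "top 9 IO streams" reduction above amounts to sorting captured streams by IO count and keeping the dominant few. A minimal sketch, assuming a capture is summarized as a mapping from stream descriptor (IO size, read/write, random/sequential) to observed IO count; the toy numbers are invented:

```python
from collections import Counter

def top_stream_coverage(stream_counts: Counter, n=9):
    """Return the n most frequent IO streams and the fraction of all
    captured IOs they account for."""
    total = sum(stream_counts.values())
    top = stream_counts.most_common(n)
    coverage = sum(count for _, count in top) / total
    return top, coverage

# Toy capture: (io_size, op, access) -> IO count; values are invented.
capture = Counter({
    ("64K", "R", "SEQ"): 1_200_000,
    ("64K", "W", "SEQ"): 900_000,
    ("4K", "R", "RND"): 800_000,
    ("8K", "W", "RND"): 300_000,
    # ... thousands of minor streams omitted
})
top, coverage = top_stream_coverage(capture, n=9)
print(f"top streams carry {coverage:.0%} of captured IOs")
```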
Test Plan: Results
- Compare MTU frame size: synthetic corner-case benchmarks; standard 1500B v jumbo 9000B; iWARP v ROCEv2 v TCP
- Compare transports: real-world workloads; replay test; thread count/queue depth sweep
- Compare storage: Optane v NVMe LUNs; synthetic corner case; replay test; TC/QD sweep test

MTU: Standard v Jumbo Frame
Synthetic corner case: RND 4K RW, SEQ 128K RW

iWARP: Optane 1500B v 9000B
- IOPS: 1500B and 9000B substantially equivalent
- RND 4K R IOPS: 9000B v 1500B
- QoS: 1500B read workloads have high RT spikes

ROCEv2: Optane 1500B v 9000B
- IOPS & RTs: 1500B and 9000B substantially equivalent
- RND 4K R IOPS: ROCEv2 v iWARP
- Response time QoS: ROCEv2 v TCP

Response Time QoS: iWARP & ROCEv2 (Rtl Web, GPS Nav)
- QoS by workload: GPS Nav > VDI
At 9000B:
- IOPS by workload: VDI > GPS Nav > Rtl Web
- QoS by workload: GPS Nav > VDI
At 1500B (VDI & Rtl Web):
- IOPS: VDI > Rtl Web > GPS Nav
- QoS: GPS Nav > VDI > Rtl Web

Offload v Onload: RDMA v TCP
Synthetic corner case & real-world replay

RDMA v TCP: Synthetic Corner Case, Optane 1500B
- IOPS, 4K RW: RDMA substantially higher than TCP
- IOPS, 128K RW: substantially similar; RDMA has faster RTs

RDMA v TCP: Replay Test, Optane 1500B
- IOPS: RDMA significantly higher than TCP
- Response times: RDMA significantly faster than TCP

Summary
- Synthetic corner case: RND 4K RW IOPS, RDMA is significantly faster than TCP; SEQ 128K RW IOPS, RDMA and TCP are substantially similar
- Replay test (real-world workloads): RDMA is faster than TCP for all workloads

Demand Intensity Outside Curves: RDMA v TCP
Thread Count/Queue Depth Sweep Test: Rtl Web, GPS Nav, VDI Cluster
Note: The TC/QD sweep test applies the fixed 9-IO-stream composite workload at each step of the test. Demand intensity (DI) is applied over a range from T1Q1 to T36Q16, so max DI (TC x QD) = 576. DI increases from left to right and ART increases along the Y-axis, so a better max OIO lies toward the lower right.
Figure of merit: the max IOPS OIO point, where IOPS are highest and ART has not yet risen dramatically. (The sweep selection is sketched below.)
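In other words, the sweep walks a grid of (thread count, queue depth) pairs and picks the knee of the throughput/latency curve. A minimal sketch of that selection; the runner below is a toy saturating model standing in for a real replay step, the grid points are assumed, and the ART ceiling is an arbitrary stand-in for "has not yet risen dramatically":

```python
def run_workload(tc: int, qd: int) -> tuple[float, float]:
    """Toy stand-in for one replay step of the fixed 9-IO-stream
    composite workload; returns (IOPS, ART in ms). A real run would
    drive the IO generator; this model is for illustration only."""
    oio = tc * qd                        # outstanding IOs (demand intensity)
    iops = 500_000 * oio / (oio + 32)    # throughput saturates with OIO
    art_ms = oio / iops * 1000           # Little's law: OIO = IOPS x RT
    return iops, art_ms

def sweep(tcs=(1, 2, 4, 9, 18, 36), qds=(1, 2, 4, 8, 16), art_ceiling_ms=5.0):
    """Walk the DI grid from T1Q1 to T36Q16 (max DI = 36 x 16 = 576) and
    return the max-IOPS OIO point whose ART stays under the ceiling."""
    results = [(tc, qd, *run_workload(tc, qd)) for tc in tcs for qd in qds]
    eligible = [r for r in results if r[3] <= art_ceiling_ms]
    return max(eligible, key=lambda r: r[2])   # figure of merit: max IOPS

print(sweep())   # (tc, qd, iops, art_ms) at the chosen operating point
```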

Rtl Web Portal: iWARP v ROCEv2 v TCP
- DI outside curves are substantially equivalent
- TCP shows the best max IOPS OIO but the highest max ART
- The RDMA transports show similar DI OS curves and higher IOPS

GPS Nav Portal: iWARP v ROCEv2 v TCP
- TCP shows higher IOPS and lower ART at the max IOPS OIO
- RDMA iWARP and ROCEv2 are substantially similar
- All DI OS curves show similar max ART at max OIO

VDI Storage Cluster: iWARP v ROCEv2 v TCP
- TCP shows higher IOPS and lower ART at the max IOPS OIO
- RDMA iWARP and ROCEv2 are substantially similar
- All DI OS curves show similar max ART at max OIO

NVMe-oF Storage LUN: Optane v NVMe
Synthetic corner case: RND 4K RW, SEQ 128K RW
Note:
- RND 4K RW IOPS: Optane faster than NVMe
- RND 4K read RTs: iWARP shows significant RT QoS spikes
- SEQ 128K writes: Optane IOPS significantly higher than NVMe
- SEQ 128K reads: Optane and NVMe have substantially similar IOPS and RTs

iWARP: Optane v NVMe, 1500B
- W IOPS: Optane significantly higher than NVMe
- R IOPS: Optane substantially similar to NVMe
- R QoS: Optane has large RT spikes for reads
- SEQ 128K R: substantially equivalent bandwidth

ROCEv2: Optane v NVMe, 1500B
- RW IOPS: Optane significantly higher than NVMe
- RW IOPS: ROCEv2 substantially similar to iWARP
- RT QoS: Optane faster than NVMe
- SEQ 128K R: substantially equivalent bandwidth

TCP: Optane v NVMe, 1500B
- W IOPS: Optane significantly higher than NVMe
- R IOPS: Optane substantially similar to NVMe
- RT QoS: Optane substantially similar to NVMe
- SEQ 128K R: substantially equivalent bandwidth

NVMe-oF Storage LUN: Optane v NVMe
Demand Intensity OS Curves: Rtl Web, GPS Nav, VDI Cluster
Note: Comparing DI OS curves clearly shows the difference in performance between the Optane and NVMe storage LUNs. DI OS curves are better toward the lower right corner, where IOPS extend further to the right and RTs sit lower on the Y-axis. For Rtl Web and GPS Nav, Optane IOPS are substantially higher and RTs substantially faster than NVMe. For the VDI cluster, NVMe and Optane IOPS are substantially similar, and Optane RTs are nominally faster than NVMe.

Rtl Web Portal: Optane v NVMe, 1500B
- DI OS curves by storage are substantially equivalent
- Optane shows significantly higher IOPS
- Optane shows significantly faster response times

GPS Nav Portal: Optane v NVMe, 1500B
- DI OS curves by storage are substantially equivalent
- Optane shows significantly higher IOPS
- Optane shows significantly faster response times

VDI Storage Cluster: Optane v NVMe, 1500B
- The TCP DI OS curve differs from the RDMA DI OS curves
- Optane and NVMe IOPS are substantially equivalent
- Optane shows slightly faster response times

Findings
MTU, Standard v Jumbo Frame: standard and jumbo are substantially equivalent
- Synthetic corner case: IOPS substantially similar; RTs: iWARP RND 4K reads show large spikes
- Replay test: Rtl & GPS: IO
