




Graduation Project: Translation of Foreign-Language Material

Design topic: Improved Design of the Hydraulic Transmission System for a Small Special-Purpose Vehicle
Translation title: Accelerating Automotive Design with InfiniBand
Body: translated text of the foreign-language material
Attachment: original foreign-language text

Advisor's comments:

Signature:            Date:

Body: Accelerating Automotive Design with InfiniBand

Copyright © 2008 Mellanox Technologies

1.0 Abstract

CAE simulation and analysis are highly sophisticated applications that give engineers insight into complex phenomena and let them investigate physical behavior virtually. To produce the best possible results, these simulation solutions require high-performance compute platforms. In this paper we investigate the optimal use of high-performance clusters for maximum efficiency and productivity in CAE applications, and in automotive design in particular.

1.1 Introduction

High-performance computing (HPC) is a crucial tool for automotive design and manufacturing. It is used for computer-aided engineering (CAE) from component-level to full-vehicle analyses: crash simulation, structural integrity, thermal management, climate control, engine modeling, exhaust, acoustics and much more. HPC helps drive accelerated time to market, significant cost reductions and tremendous flexibility. The strength of HPC is its ability to achieve the best sustained performance by driving CPU performance toward its limits.

The motivation for HPC in the automotive industry has long been its tremendous cost savings and product improvements. The total cost of a real vehicle crash test to determine a car's safety characteristics is in the range of $250,000 or more. The cost of a high-performance compute cluster, on the other hand, can be just a fraction of the price of a single crash test, while providing a system that can be used for every test simulation going forward.

HPC is used for many aspects beyond crash simulation. Compute-intensive systems and applications simulate everything from airbag deployment to brake cooling, exhaust systems, thermal comfort and windshield washer nozzles. HPC-based simulation and analysis empower engineers and designers to create vehicles that are better prepared, and safer, for real-life environments.

1.2 Automotive Crash Simulations

One of the most demanding applications in automotive design is crash simulation (full-frontal, offset-frontal, angle-frontal, side-impact, rear-impact and more). Crash simulations are performed very early in the development process but are validated very late, once the vehicle is completely built. The more sophisticated and complex the simulation, the more parts and details can be analyzed. Computer-based analysis provides early insight into phenomena that are difficult to capture experimentally, and then only at a later stage and at substantial cost. Time and money are saved because costly prototypes need not be built.

Automotive makers rely increasingly on crash simulation throughout the design process while reducing the need for real prototypes, achieving faster time to market and lower design-phase cost. HPC clusters enable the vision of purely virtual development, with physical prototypes used for verification only. For the Volvo S80 model (1993-1998), Volvo performed 1,000 simulations and used 15 prototypes during the design. For the S40/V40 series (1999-2003), Volvo increased its simulations to 6,000 and used only 5 prototypes. For the V70N model (2005-2007), Volvo performed 10,000 crash simulations during product development without any real prototypes, and the simulations covered a large variety of scenarios, such as pedestrian, slide impact, rollover and much more. The performance of HPC clusters helped Volvo perform these complex crash simulations, and the flexibility and scalability of HPC clusters let Volvo grow its compute power to run a greater number of simulations and reach the market faster.

HPC clusters consist of off-the-shelf servers, a high-speed interconnect and a storage solution. The interconnect has a great influence on total cluster performance and scalability. A slow interconnect delays data transfers between servers and between servers and storage, causing poor utilization of compute resources and slow execution of simulations. An interconnect that consumes CPU cycles as part of the networking process reduces the compute resources available to the applications and therefore slows down, and limits, the number of simulations that can be executed on a given cluster. It also limits cluster scalability: as the number of CPUs grows, an ever higher burden is imposed on the CPUs to handle the networking.

InfiniBand, a high-speed interconnect distributed by Mellanox Technologies, provides the lowest latency and highest throughput between servers, so communication between compute nodes is fast enough to feed the CPUs with data and eliminate idle time. Moreover, InfiniBand was designed to be fully offloaded, meaning all communication is handled within the interconnect with no CPU involvement. This further guarantees the ability to scale with linear performance when more compute resources are required.

1.3 Multi-core Cluster Environments

Compute cluster solutions consist of multi-core servers. A multi-core environment places higher demands on the cluster components, especially on cluster connectivity. During simulation job execution, each CPU core imposes a separate demand on the network, so the cluster interconnect must handle these multiple data streams simultaneously while guaranteeing fast and reliable data transfer for each stream. In a multi-core environment it is essential to avoid overhead processing in the CPU cores. By providing low latency, high bandwidth and extremely low CPU overhead, InfiniBand delivers a balanced compute system and maximizes application performance: a large factor in why InfiniBand is emerging as the most widely deployed high-speed interconnect, replacing proprietary or low-performance solutions.

1.4 SMP versus MPI

A common multi-core environment consists of 8 to 16 CPU cores in a single server. In a typical single-server environment, application jobs can be executed in a Shared Memory Processing (SMP) fashion or with a Message Passing Interface (MPI) protocol. To compare the two options, we used Livermore Software Technology Corporation (LSTC) LS-DYNA benchmarks.

LS-DYNA is a general-purpose structural and fluid analysis simulation package capable of simulating complex real-world problems. It is widely used in the automotive industry for crashworthiness, occupant safety and metal forming, and also in aerospace, military and defense, and consumer products. Three main LS-DYNA benchmarks are used to evaluate a platform's performance, efficiency and scalability: 3 Vehicle Collision (a van crashes into the rear of a compact car, which in turn crashes into a midsize car), neon_refined (a frontal crash with an initial speed of 31.5 miles per hour) and car2car (the NCAC minivan model). Recently a revised version of neon_refined, named neon_refined_revised, was introduced.

The platform used for this evaluation is the Mellanox Technologies "Helios" cluster, part of the Mellanox Cluster Center, a compute resource available for performance testing and application development. The Helios cluster consists of 32 server nodes connected with Gigabit Ethernet and 20Gb/s InfiniBand. Each server node has dual-socket, quad-core 2.66GHz Intel Xeon CPUs (code-named Clovertown). The MPI used in the tests was Scali MPI Connect.

To compare the SMP and MPI approaches, a single server was used, with the number of jobs that can be completed per 24 hours as the comparison metric. The use of MPI improves the system's efficiency and parallel scalability, and as more cores are used, the MPI approach increasingly outperforms the traditional SMP approach. Using MPI on a single server not only provides better performance and efficiency; it also enables smooth integration of a single server into a cluster environment, and when more compute power is needed, no software changes are required.

1.5 Scaling in a cluster environment: the importance of interconnects

Clusters are scalable high-performance compute solutions built from commodity hardware on a private system network. Their main benefits are scalability, availability and high performance. A cluster uses the combined compute power of its server nodes to form a high-performance solution for cluster-based applications such as MCAD or CAE applications. When more compute power is needed, it can be achieved simply by adding server nodes to the cluster. In a cluster environment, each node can act as a backup for every other node in the event of a node failure, or when a server node must be taken out for planned service. To the application or user this operation is transparent, and no interruption to application runs occurs. The way the cluster nodes are connected has a great influence on overall application performance, especially when multi-core servers are used. The cluster interconnect is critical to the efficiency and scalability of the entire cluster, since it must handle the I/O requirements of each CPU core without imposing networking overhead on those same CPUs.

1.6 Maximizing crash simulations on multi-core InfiniBand clusters

On multi-core cluster platforms, Gigabit Ethernet becomes ineffective as cluster size grows, and InfiniBand is required to maximize application performance and the number of application jobs that can be completed per day. The more jobs that can be performed, the higher the product quality and the shorter the time to market. A typical way to evaluate a compute solution is the time it takes to run one application benchmark or one application job: the faster the run time, the more effective the solution. For real simulation work on multi-core platforms, however, this is not always the best approach. Multi-core platforms place greater demands not only on the cluster interconnect but also on CPU connectivity within a server node and between the CPUs and memory. While running a single job across the whole cluster yields the fastest run time for that particular job, the goal of maximum simulations per day may not be achieved this way.

1.7 Accelerating Automotive Design

From concept to engineering, and from design to test and manufacturing, the automotive industry relies on powerful virtual development solutions. CFD and crash simulations are performed to secure quality and speed up the development process. Cluster solutions maximize the total value of ownership of CAE environments and expand innovation in virtual product development. To maintain a balanced system and achieve high application performance and scaling, multi-core cluster environments place high demands on cluster connectivity: throughput, low latency, low CPU overhead, network flexibility and high efficiency. Low-performance interconnect solutions, or missing interconnect hardware capabilities, result in degraded system and application performance.

Three cases were investigated. In the first case it was evident that applications using MPI deliver more performance than the SMP mode, even on a single server. In the second case we investigated the importance of a high-speed, low-latency, low-CPU-overhead interconnect for crash simulations; the results show that slower interconnects such as Ethernet become ineffective with cluster size, reducing the cluster's compute capability even as more compute nodes are added, while InfiniBand shows greater efficiency and scalability with cluster size. The third case showed that CPU affinity and interconnect usage must be configured correctly to maximize cluster productivity: by reducing socket-to-socket and memory pressure while making better use of the interconnect, more application jobs can be completed. Crash simulation is a critical part of automotive design,
and the ability to run more crash simulations per day makes more complex simulations possible while reducing the design phase and the number of prototypes required.

Attachment: original text

Accelerating Automotive Design with InfiniBand

1.0 Abstract

CAE simulation and analysis are highly sophisticated applications which enable engineers to get insight into complex phenomena and to virtually investigate physical behavior. In order to produce the best results possible, these simulation solutions require high-performance compute platforms. In this paper we investigate the optimum usage of high-performance clusters for maximum efficiency and productivity, for CAE applications, and for automotive design in particular.

1.1 Introduction

High-performance computing is a crucial tool for automotive design and manufacturing. It is used for computer-aided engineering (CAE) from component-level to full vehicle analyses: crash simulations, structural integrity, thermal management, climate control, engine modeling, exhaust, acoustics and much more. HPC helps drive accelerated speed to market, significant cost reductions, and tremendous flexibility. The strength of HPC is the ability to achieve the best sustained performance by driving the CPU performance towards its limits.

The motivation for high-performance computing in the automotive industry has long been its tremendous cost savings and product improvements. The total cost of a real vehicle crash test to determine a vehicle's safety characteristics is in the range of $250,000 or more. On the other hand, the cost of a high-performance compute cluster can be just a fraction of the price of a single crash test, while providing a system that can be used for every test simulation going forward.

HPC is used for many other aspects than just crash simulations. Compute-intensive systems and applications are used to simulate everything from airbag deployment to brake cooling, exhaust systems, thermal comfort and windshield washer nozzles.
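The cost argument above lends itself to a toy calculation. A minimal sketch, assuming a hypothetical cluster price and treating each avoided prototype as one avoided physical crash test (a simplification); only the $250,000-per-test figure comes from the text:

```python
# Toy cost comparison for the paragraph above. The $250,000 crash-test
# cost comes from the text; the cluster price is an assumed placeholder
# ("a fraction of the price of a single crash test").

CRASH_TEST_COST = 250_000   # USD per physical crash test (from the text)
CLUSTER_COST = 100_000      # USD, hypothetical HPC cluster price

def net_savings(tests_replaced: int,
                cluster_cost: int = CLUSTER_COST,
                test_cost: int = CRASH_TEST_COST) -> int:
    """Savings from replacing physical crash tests with simulations,
    after paying for the cluster once."""
    return tests_replaced * test_cost - cluster_cost

# Replacing the 15 prototypes of an S80-era program:
print(net_savings(15))  # → 3650000
```

With these assumed numbers, a program the size of the S80's 15-prototype design would recoup the cluster cost many times over, which is the economic point the paragraph makes.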
HPC-based simulation and analysis empower engineers and designers to create vehicles that are better prepared, and safer, for real-life environments.

1.2 Automotive Crash Simulations

One of the most demanding applications of automotive design is crash simulation (full-frontal, offset-frontal, angle-frontal, side-impact, rear-impact and more). Crash simulations are performed very early in the development process but are validated very late, once the vehicle is completely built. The more sophisticated and complex the simulation, the more parts and details can be analyzed. Computer-based analyses provide an early insight into phenomena that are difficult to gather experimentally, and then only at a later stage and at a substantial cost. Time and money are saved without having to build costly prototypes.

Automotive makers increase their reliance on car crash simulations throughout the design process while reducing the need for real prototypes, thus achieving faster time to market with less cost associated with the design phase. HPC clusters enable the vision of purely virtual development, with physical prototypes for verification only. For the Volvo S80 car model, 1993-1998, Volvo performed 1,000 simulations and used 15 prototypes during the design. For the S40/V40 series, 1999-2003, Volvo increased their simulations to 6,000 and used only 5 prototypes. For their V70N model, 2005-2007, Volvo performed 10,000 crash simulations during product development without any real prototypes, and the crash simulations included a large variety of scenarios, such as pedestrian, slide impact, rollover and much more.
The performance of HPC clusters has helped Volvo perform these complex crash simulations, and the flexibility and scalability of HPC clusters enabled Volvo to increase its compute power in order to perform a greater number of simulations and achieve faster time to market.

HPC clusters consist of off-the-shelf servers, a high-speed interconnect and a storage solution. The interconnect has a great influence on the total cluster performance and scalability. A slow interconnect will cause delays in data transfers between servers and between servers and storage, causing poor utilization of the compute resources and slow execution of simulations. An interconnect that requires CPU cycles as part of the networking process will decrease the compute resources available to the applications and will therefore slow down and limit the number of simulations that can be executed on a given cluster. Furthermore, this will limit cluster scalability: as the number of CPUs increases, a higher burden is imposed on the CPUs to handle the networking.

InfiniBand, a high-speed interconnect distributed by Mellanox Technologies, provides the lowest latency and the highest throughput between servers, so the communication between the compute nodes is fast enough to feed the CPUs with data and eliminate idle time. Moreover, InfiniBand was designed to be fully offloaded, meaning all communication is handled within the interconnect, with no involvement from the CPU. This further guarantees the ability to scale up with linear performance when more compute resources are required.

1.3 Multi-core Cluster Environments

Compute cluster solutions consist of multi-core servers. A multi-core environment introduces higher demands on the cluster components, especially on the cluster connectivity.
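The scalability argument above — an interconnect that consumes CPU cycles erodes the compute resources available as the cluster grows — can be sketched with a toy model. The per-node overhead figures below are illustrative assumptions, not measured values:

```python
# Toy scaling model for the interconnect discussion above: an interconnect
# that consumes CPU cycles ("onloaded") shrinks the compute fraction of
# every node as the cluster grows, while a fully offloaded interconnect
# such as InfiniBand leaves the CPUs free. Overhead values are assumed.

def effective_speedup(nodes: int, overhead_per_node: float) -> float:
    """Speedup over one node when each added node costs the CPUs
    overhead_per_node of their cycles for networking."""
    compute_fraction = max(0.0, 1.0 - overhead_per_node * nodes)
    return nodes * compute_fraction

# Fully offloaded interconnect: linear scaling.
offloaded = [effective_speedup(n, 0.000) for n in (4, 16, 32)]
# Onloaded interconnect at an assumed 1.5% CPU cost per node:
onloaded = [effective_speedup(n, 0.015) for n in (4, 16, 32)]
print(offloaded, onloaded)
```

In this model the zero-overhead (offloaded) case scales linearly, while even a 1.5% per-node CPU cost roughly halves the effective speedup at 32 nodes — the "higher burden as CPUs increase" effect described above.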
Each CPU core imposes a separate demand on the network during simulation job execution, and therefore the cluster interconnect needs to be able to handle those multiple data streams simultaneously while guaranteeing fast and reliable data transfer for each of the streams.

In a multi-core environment, it is essential to avoid overhead processing in the CPU cores. By providing low latency, high bandwidth and extremely low CPU overhead, InfiniBand provides a balanced compute system and maximizes application performance: a large factor in why InfiniBand is emerging as the most widely deployed high-speed interconnect, replacing proprietary or low-performance solutions.

1.4 SMP versus MPI

A common multi-core environment consists of 8 to 16 CPU cores in a single server. In a typical one-server environment, application jobs can be executed in a Shared Memory Processing (SMP) fashion, or with a Message Passing Interface (MPI) protocol. In order to compare the two options, we have used Livermore Software Technology Corporation (LSTC) LS-DYNA benchmarks.

LS-DYNA is a general-purpose structural and fluid analysis simulation software package capable of simulating complex real-world problems. It is widely used in the automotive industry for crashworthiness, occupant safety and metal forming, and also for aerospace, military and defense, and consumer products. There are three main LS-DYNA benchmarks used for evaluating a platform's performance, efficiency and scalability: 3 Vehicle Collision (a van crashes into the rear of a compact car, which, in turn, crashes into a midsize car), neon_refined (a frontal crash with an initial speed of 31.5 miles/hour) and car2car (the NCAC minivan model). Recently, a revised version of neon_refined was introduced, named neon_refined_revised.

The platform used for this performance evaluation is the Mellanox Technologies "Helios" cluster.
It is part of the Mellanox Cluster Center, a compute resource available for performance testing and application development. The Helios cluster consists of 32 server nodes, connected with Gigabit Ethernet and 20Gb/s InfiniBand. Each server node has dual-socket, quad-core 2.66GHz Intel Xeon CPUs (code name Clovertown). The MPI used in the test was Scali MPI Connect.

In order to compare the SMP and MPI approaches, a single server was used. The comparison metric was the number of jobs that can be completed per 24 hours. According to figure 3, the usage of MPI improves the system's efficiency and parallel scalability, and as more cores were used, the MPI approach performed better than the traditional SMP approach.

The usage of MPI in a single server not only provides better performance and efficiency; it also enables a smooth integration of a single server into a cluster environment. In addition, when more compute power is needed, no software changes are required.

1.5 Scaling in a cluster environment: the importance of interconnects

Clusters are scalable high-performance compute solutions based on commodity hardware, on a private system network. The main benefits of clusters are scalability, availability, and high performance. A cluster uses the combined compute power of its server nodes to form a high-performance solution for cluster-based applications such as MCAD or CAE applications. When more compute power is needed, it can be achieved simply by adding server nodes to the cluster.

In a cluster setting, each node can be a backup option for each other node in the event of a node failure, or when a server node needs to be taken out for planned service. To the application or the user, this operation is transparent, and no interruption to application runs occurs.

The way the cluster nodes are connected together has a great influence on the overall application performance, especially when multi-core servers are used.
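The jobs-per-24-hours metric used in the SMP-versus-MPI comparison above is easy to compute from a per-job run time. A small sketch; the run times below are invented placeholders, not the LS-DYNA benchmark results:

```python
# Convert a per-job run time into the paper's throughput metric,
# "jobs completed per 24 hours". Run times here are assumed values,
# not actual LS-DYNA benchmark numbers.

def jobs_per_day(job_runtime_hours: float) -> float:
    """Number of back-to-back jobs that fit into a 24-hour day."""
    return 24.0 / job_runtime_hours

# Hypothetical single-server run times for the same simulation job:
smp_hours = 6.0   # assumed SMP run time
mpi_hours = 4.0   # assumed MPI run time

print(jobs_per_day(smp_hours))  # → 4.0
print(jobs_per_day(mpi_hours))  # → 6.0
```

On this metric, even a modest per-job run-time advantage compounds into visibly more completed simulations per day, which is why the comparison is framed as throughput rather than single-job speed.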
The cluster interconnect is critical to the efficiency and scalability of the entire cluster, as it needs to handle the I/O requirements of each CPU core while not imposing any networking overhead on those same CPUs.

1.6 Maximizing crash simulations on multi-core InfiniBand clusters

For multi-core cluster platforms, GigE becomes ineffective with cluster size, and InfiniBand is required in order to maximize the application performance and the number of application jobs that can be completed per day. As more jobs can be performed, the quality of the product increases and the time to market is reduced.

A typical method for evaluating a compute solution is the time it takes to run one application benchmark, or one application job: the faster the run time, the more effective the compute solution. However, it is not always the best approach for real simulation work on multi-core platforms. Multi-core platforms create more demand not only on the cluster interconnect, but also on the CPU connectivity within a server node and between the CPUs and memory. While running a single job on the whole cluster will provide the fastest run time for that specific job, the goal of maximum simulations per day might not be achieved in this manner.

1.7 Accelerating Automotive Design

From concept to engineering, and from design to test and manufacturing, the automotive industry relies on powerful virtual development solutions. CFD and crash simulations are performed in an effort to secure quality and speed up the development process. Cluster solutions maximize the total value of ownership of CAE environments and expand innovation in virtual product development. To maintain a balanced system and achieve high application performance and scaling, multi-core cluster environments place high demands on the cluster connectivity: throughput, low latency, low CPU overhead, network flexibility and high efficiency. Low-performance interconnect solutions, or missing interconnect hardware capabilities, will result in degraded system and application performance.

Three cases were investigated. In the first case, it was evident that applications using MPI deliver more performance than the SMP mode, even on a single server. In the second case, we investigated the importance of a high-speed, low-latency, low-CPU-overhead interconnect for crash simulations. The results show that slower interconnects such as Ethernet become ineffective with cluster size and reduce the cluster's compute capability even as more compute nodes are added, while InfiniBand shows greater efficiency and scalability with cluster size. The third case showed that CPU affinity and interconnect usage must be configured correctly in order to maximize cluster productivity: by reducing socket-to-socket and memory pressure while making better use of the interconnect, more application jobs can be completed. Crash simulation is a critical part of automotive design, and the ability to run more crash simulations per day makes more complex simulations possible while reducing the design phase and the number of prototypes required.
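The throughput argument of section 1.6 — the fastest single-job run time is not necessarily the most jobs per day — can be sketched numerically. The sublinear speedup exponent below is an assumed illustration, not a measured LS-DYNA scaling curve:

```python
# Compare one 32-node job at a time against four concurrent 8-node jobs
# on the same 32-node cluster, under an assumed sublinear speedup model
# (speedup = nodes ** 0.85). All numbers are illustrative only.

def runtime_hours(nodes: int, base_hours: float = 24.0, eff: float = 0.85) -> float:
    """Per-job run time when a one-node run takes base_hours."""
    return base_hours / nodes ** eff

def cluster_jobs_per_day(nodes_per_job: int, cluster_nodes: int = 32) -> float:
    """Jobs completed per 24 hours when the cluster is partitioned into
    concurrent jobs of nodes_per_job nodes each."""
    concurrent = cluster_nodes // nodes_per_job
    return concurrent * 24.0 / runtime_hours(nodes_per_job)

one_big = cluster_jobs_per_day(32)   # fastest single-job run time
four_small = cluster_jobs_per_day(8) # slower per job, higher throughput
print(one_big < four_small)  # → True
```

Under this model the 8-node partitioning completes roughly 23 jobs per day versus about 19 for whole-cluster runs, even though each individual 8-node job takes longer — precisely the trade-off the section describes.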