
Hadoop: the Intel Way (Hadoop的英特尔之道)
Bring New Analytics Capabilities to Hadoop Stack
He Jingxiang (何京翔), General Manager, Intel Asia-Pacific Research and Development Ltd.
Software and Services Group

Cloud and IoT: More Users, More Devices, More Data
  • Security & Trust
  • Workload Consolidation
  • Immersive Experiences
  • Cloud Connectivity
  • Data Analytics
  • Open Cloud Architecture

Intel's Vision
This decade we will create and extend computing technology to connect and enrich the lives of every person on earth.

Our Big Data Goal: Make Hadoop the Foundation of the Next-Gen Data Analytics Platform
  • All of your big data (structured and unstructured): sensor readings, logs, tables, images, documents
  • Existing IT and data systems: RDBMS, EDW, data marts, BI
  • Data mining and analytics: business intelligence, statistical modeling, machine learning

Hadoop in Telecom Carrier
  • Instantaneous query of 3G records by subscribers
  • User segmentation
  • Network optimizations
  • Data sources shown in the diagram: base stations, 3G
  • Components shown in the diagram: HBase, HDFS, Hive, MapReduce, ETL

Hadoop in Smart City
  • Instantaneous query (e.g., road image)
  • Stream processing (e.g., real-time road conditions)
  • Data mining (e.g., vehicle tracking)
  • Legacy applications
  • Components shown in the diagram: Hive, MapReduce, HBase

Hadoop: the Intel Way
Bring New Analytics Capabilities to Hadoop Stack
  • Instantaneous Analysis
  • Reduced Complexity
  • Improved Efficiency
  • Enterprise-Grade Solution: the Intel Hadoop Distribution, a stable enterprise-grade software product with functional enhancements for vertical industries
  • Advanced Development: "Project Panthera", advanced development and path-finding, open source and community driven

Intel Hadoop Distribution: An Optimized Big Data Processing Software Product
A stable enterprise-grade Hadoop distribution that brings instantaneous data processing capability to Hadoop.
  • Optimized using new hardware technologies
  • Industry-specific enhancements to meet the big data challenges of different industries
  • Intel Hadoop Manager: installation, deployment, configuration, monitoring, alerting, and access control
Data analysis, statistics, and mining:
  • Mahout: machine learning
  • R: statistics tool set (from Revolution Analytics)
  • Hive: interactive data warehouse
  • Pig: data flow processing language
Data processing and storage:
  • MapReduce: stable, efficient distributed computing framework
  • HBase: distributed, high-dimensional database; improvements and innovations on HBase 0.94 provide instantaneous data processing
  • HDFS: reliable distributed file system
  • Sqoop: relational data ETL tool
  • Flume: log collection tool
  • Zookeeper: distributed coordination service

"Project Panthera"
Open source initiatives to enable advanced analytics capabilities on Hadoop.
  • SQL engine for Hive/MapReduce: better integration with existing infrastructure using SQL
  • Document store on HBase: document semantics and significantly faster query processing on HBase
  • Efficient utilization of new HW platform technologies

Instantaneous Analysis
Instantaneous analysis with greatly enhanced HBase:
  • Stream new data into HBase for analysis in real time
  • Support high update rate workloads (to keep the system always up to date)
  • Allow very low latency, online data serving
  • Etc.

Interactive Query on HBase (Intel Hadoop Distribution)
10x faster than MapReduce for certain queries on HBase (e.g., group-by aggregation).
HBase Query Engine layer:
  • Fast, distributed aggregations directly inside HBase
  • Parallel scanning over multiple regions
  • Advanced, distributed filtering (CRC32 comparator, fuzzy row filter, etc.)
HBase Query Engine as a new Hive backend:
  • Most "SELECT" queries are automatically optimized to use the HBase Query Engine
  • "WHERE" uses the advanced scanner/filter
  • "GROUP-BY" uses distributed aggregations
  • "JOIN" still goes to MapReduce

A Document Store on HBase ("Project Panthera")
Up to 3x storage reduction and 3x query speedup for Hive/MapReduce query processing on HBase (See and HBASE-6800).
DOT (Document Oriented Table) on HBase:
  • Each row contains a collection of documents
  • Each document contains a collection of fields
  • A document is mapped to an HBase column and serialized using Avro
  • Completely transparent to existing HBase applications

Reduced Complexity
  • Better data mining and statistics capabilities
      • Full-text indexing and search
      • Statistical modeling with the R language
  • Better integration with existing infrastructures
      • Geo-distributed datacenters
      • Full SQL support for OLAP

Full-Text Indexing and Search (Intel Hadoop Distribution)
Full-text indexing and near real-time search for advanced data mining (e.g., log and click stream analysis, healthcare record analysis, etc.).
Incremental full-text indexing on HBase:
  • Full-text indexing for semi-structured data (text, strings, numbers, etc.)
  • Index built incrementally when records are inserted or updated
  • Supports very high data insertion / update rates
Near real-time search:
  • Distributed, keyword or logical expression based search
  • Zero delay in searching the latest data that has just been inserted
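
The idea of an index that is "incrementally built when records inserted or updated" and searchable with no delay can be illustrated with a toy, single-process inverted index. This is only a sketch under stated assumptions: the class `IncrementalIndex` and its methods `put`, `search_all`, and `search_any` are hypothetical names invented for this illustration, everything lives in memory, and nothing here reflects how the Intel Hadoop Distribution actually implements indexing on HBase.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Toy in-memory inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        self.rows = {}                     # row key -> current text
        self.postings = defaultdict(set)   # term -> set of row keys

    @staticmethod
    def tokenize(text):
        # Lower-case word tokens; a real indexer would do far more.
        return set(re.findall(r"\w+", text.lower()))

    def put(self, row_key, text):
        """Insert or update a record and refresh its postings right away."""
        old = self.rows.get(row_key)
        if old is not None:
            # Drop postings of the previous version so stale terms vanish.
            for term in self.tokenize(old):
                self.postings[term].discard(row_key)
        self.rows[row_key] = text
        for term in self.tokenize(text):
            self.postings[term].add(row_key)

    def search_all(self, *terms):
        """Logical AND: rows containing every term."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    def search_any(self, *terms):
        """Logical OR: rows containing at least one term."""
        result = set()
        for t in terms:
            result |= self.postings.get(t.lower(), set())
        return result


index = IncrementalIndex()
index.put("row1", "login failed for user alice")
index.put("row2", "login ok for user bob")
hits_before = index.search_all("login", "failed")

# Updating row1 changes search results immediately, with no batch rebuild.
index.put("row1", "login ok for user alice")
hits_after = index.search_all("login", "failed")
```

Because postings are adjusted inside `put`, a search issued right after a write already sees the new version of the record, which is the property the slide calls zero delay. A distributed system would additionally have to shard the postings and keep them consistent with the table, which this sketch does not attempt.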

Bring R Statistics into Hadoop (Intel Hadoop Distribution)
Distributed statistical modeling on Hadoop using the R language.

Cross-Datacenter BigTable/HBase (Intel Hadoop Distribution)
A virtual Big Table overlaid over existing geo-distributed data centers:
  • Global table view
  • Data stored in geo-distributed data centers
  • Better locality and higher availability
  • Data transfer eliminated through distributed aggregation
[Diagram: a virtual Big Table spanning Data Center A, Data Center B, and Data Center C, linked by async replication]

An Analytical SQL Engine for Hive/MapReduce ("Project Panthera")
Goal: provide full SQL support for OLAP in Hadoop.
  • Required by business users, enterprise applications, 3rd party tools (e.g., BI applications), etc. (See and HIVE-3472)
[Diagram: query processing paths through the Driver]
  • HiveQL path: HiveQL, Hive Parser, Hive-AST, Hive Semantic Analyzer, Hadoop MR
  • SQL path: SQL, (Open Source) SQL Parser*, SQL-AST, SQL-AST Analyzer & Translator, Hive-AST
  • SQL-AST Analyzer & Translator features: multi-table SELECT, subquery unnesting, INTERSECT support, MINUS support

Improved Efficiency
  • Performance benchmarks and tools
  • Efficient utilization of new HW platform technologies (e.g., SSD, InfiniBand)

Intel Hadoop Distribution Efficiently Supports Analysis of Massive Mobile Internet Access Records
China Unicom's nationwide query and analysis system for mobile users' internet access records is the first commercial telecom service system in China based on Hadoop/HBase.
System deployment:
  • Intel Hadoop Distribution
  • Meets the requirements for high-performance data import and fast queries
  • A stable enterprise-grade solution that is easy to deploy and manage
  • 180+ node Hadoop/HBase cluster
System performance metrics:
  • Record ingestion time: generally under 30 minutes, about 10 minutes in practice
  • Capacity to store at least 6 months of raw internet access records for mobile users nationwide
  • Intermediate report data for statistical analysis retained for at least 5 years
  • Record query time: no more than 1 second
  • Concurrent queries supported: 1000 requests/second
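
The phrase "data transfer eliminated through distributed aggregation", like the "distributed aggregations" of the HBase Query Engine, refers to a general pattern: each region or data center reduces its own rows to a small partial result, and only those partials travel to the client for merging. The following is a minimal sketch of that pattern, not the product's implementation. The function names `partial_aggregate` and `merge_partials` and the sample data are assumptions made up for this illustration, and plain Python lists stand in for HBase regions.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Runs next to the data: reduce (group, value) rows to {group: [sum, count]}."""
    acc = defaultdict(lambda: [0, 0])
    for group, value in rows:
        acc[group][0] += value
        acc[group][1] += 1
    return dict(acc)


def merge_partials(partials):
    """Runs at the client: combine per-region partials into final aggregates."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (total, count) in partial.items():
            merged[group][0] += total
            merged[group][1] += count
    return {
        group: {"sum": total, "count": count, "avg": total / count}
        for group, (total, count) in merged.items()
    }


# Hypothetical data: three regions (or data centers), each holding raw rows.
regions = [
    [("a", 10), ("b", 4)],
    [("a", 20)],
    [("b", 6), ("b", 2)],
]

# Only the small per-region dictionaries cross the network, not the raw rows.
partials = [partial_aggregate(rows) for rows in regions]
result = merge_partials(partials)
```

Sum and count merge exactly across partitions, so the average is computed only at the end from the merged totals. Averaging the per-region averages would give a wrong answer whenever regions hold different numbers of rows per group.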
