云数据库的发展和技术原理_第1页
云数据库的发展和技术原理_第2页
云数据库的发展和技术原理_第3页
云数据库的发展和技术原理_第4页
云数据库的发展和技术原理_第5页
已阅读5页,还剩31页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、云数据库的发展和技术原理技术创新,变革未来目录数据库背景云数据库Oushu Database数据生态系统应用用户行为分析、反欺诈、用户画像、信用模型BIQlik, Power BI分析挖掘机器学习/AISAS, SPSS,TensorflowETLInformatica Talend KettleOLAP数据仓库(Data Warehouse)MPP, SQL-on-Hadoop, New Data Warehouse数 据 治 理数 据 安 全OLTP关系数据库,NoSQL, NewSQL全球数据仓库市场规模2016年达数百亿美金Cloud (公有云和私有云)9201403209133025

2、070727852.50%49.04%57.91% 53.54%43.55%60.00%50.00%41.114%0.00%30.00%20.00%10.00%0.00%1000080006000400020000120001027070.00%全球大数据市场规模2014201520162017201820192020市场体量增长率亿美元1038169224853517590726.47%63.01%46.87%41.53%45.36%20%40%58.69%934760%5000100001500067.96%1362680%中国大数据市场规模00%201420152016201720182

3、0192020市场体量增长率数据库:55年Database: 1962年出现Inverted File Database SystemSystem Development Corporation数据库的几个阶段1960s: Navigational DBMS (网状& 层次模型)Integrated Data Store (IDS)Information Management System (IMS)1970s - 1990s: SQL/Relational DBMSOLTP, Data warehouse, MPP2000s - Present: Post RelationalNoSQL (

4、XML, KV, Graph, Tree), NewSQL, NewDW数据库的核心数据模型& 查询语言查询优化和执行索引与存储事务处理网状/层次模型找出住在Harrison的所有客户Charles Bachman 1973 Turing Award关系模型Edgar F. Codd1981 Turing AwardJim Gray1998 Turing AwardMichael Stonebraker 2014 Turing AwardSelect customer_name From customerWhere customer_city = Harrison;找出住在Harrison的所

5、有客户A Relational Model of Data for Large Shared Data Banks.Graph/Tree/KV模型Key-ValueCassandra: CQLHBase: APIGraph ModelNeo4jGiraph/PregelTreeXML DatabaseMongoDBStreaming其他分类方法事务处理 vs 分析处理并行 vs 串行硬件:CPU vs GPU vs FPGA vs Memory云数据库 vs 非云数据库?数据仓库的演进DB实例2DB实例1DB实例4DB实例3磁盘磁盘磁盘磁盘MPPshare-nothing硬件/软件架构传统数仓

6、DB实例2DB实例1DB实例4DB实例3share-storage硬件/软件架构共享存储DB实例2DB实例1DB实例3分布式文件系统新一代数仓 (New Data Warehouse)share-nothing硬件架构 +软件实现distributed shared-storage磁盘磁盘磁盘硬件配置架构可扩展性缺乏弹性不易调整大多工业标准的x86服务器面向传统BI分析复杂的计算需求几十个节点工业标准的x86服务器面向大数据和人工智能支持数据湖弹性伸缩,支持CaaS平台灵活配置上千个节点缺乏弹性不易调整适用场景大多专有硬件平台面向传统的BI分析十几个节点Oracle, DB2Teradata,

7、 Vertica, Greenplum, RedshiftHive, HAWQ, SparkSQL, Snowflake数仓代表数据仓库引擎比较开源&开放 & 线性可 扩展受限的性能 及SQL兼容性私有软件 & 闭源 & 非线性可扩展高性能及SQL兼容性Amazon AthenaSQLNewDW的细分类别SQL on HadoopSparkSQL, Hive, HAWQ 2.x, PrestoSQL on Object StoreSnowflake (on S3), Amazon Athena (on S3)Hybrid: 有自己的存储,对外部存储可插拔HAWQ 3.x, Oushu Data

8、baseImpalaNewDW特性比较SQL on HadoopSQL on Object StoreHybridFeaturesHiveSparkSQLPrestoSnowflakeAthenaHAWQOushuImpala性能lowmiddlelowlowlowhightopmiddle可扩展性highhighhighhighhighhighhighhighUpdate/DeletebadN/AN/AweakN/AN/AGoodweak索引badN/AN/AN/AN/AN/AYesweakSQL兼容性middlemiddlebadmiddlebadgoodgoodmiddle高并发查询no

9、nononononoyesno云数据库数据库服务:Database as a serviceAmazon Redshift (ParAccel MPP)RDS (PostgreSQL, MySQL et al) 等等虚拟机镜像:Virtual machine image容器镜像:Docker image云数据库的不同之处部署运维的简单收费模式的不同弹性伸缩Oushu Database的前世今生2011年 - 常雷博士在EMC/Pivotal提出创意,HAWQ项目启动。2013年 - HAWQ 1.0发布,性能是Hive的数百倍。2014年 - HAWQ SIGMOD论文发表,得到国际数据库界认

10、可。2014年 - HAWQ为全球多家大型企业客户采用。2015年 - HAWQ开源成为Apache项目。2016年 - 常雷博士及HAWQ核心团队创立偶数科技。2017年 - 偶数得到国际顶级VC投资,致力于HAWQ的发展。2017年 - Oushu Database 3.0 企业版本发布,全新执行器,世界上最快的数据仓库HAWQ主要发展历程10倍性能提升Greenplum database (2003)Primary SegmentSegment hostMaster hostInterconnectPrimary SegmentMirror SegmentMirror SegmentPr

11、imary SegmentSegment hostPrimary SegmentMirror SegmentMirror SegmentPrimary SegmentSegment hostPrimary SegmentMirror SegmentMirror SegmentPrimary SegmentSegment hostPrimary SegmentMirror SegmentMirror SegmentData/Catalog replication replication Data/CatalogDegree of Parallelism = 8#Segment Per Node

12、= 4HAWQ Alpha: Greenplum Database on HDFS (2011)Primary SegmentSegment hostMaster hostInterconnectPrimary SegmentMirror SegmentMirror SegmentPrimary SegmentPrimary SegmentMirror SegmentMirror SegmentPrimary SegmentSegment hostPrimary SegmentMirror SegmentMirror SegmentPrimary SegmentPrimary SegmentM

13、irror SegmentMirror SegmentNamenodeBRack1 Segment hostRack2 Segment hostDatanodeDatanodeDatanodeMeta OpsDatanodeCatalog replication replication CatalogDatareplicationDegree of Parallelism = 8#Segment Per Node = 4Issues:Recovery complexityExpansion complexityManagement complexity (many segments per n

14、ode)Fixed Degree of ParallelismHAWQ 1.0 GA Architecture (2013)SegmentSegment hostMaster hostInterconnectSegmentSegment hostNamenodeRack1 Segment hostRack2 Segment hostDatanodeDatanodeMeta OpsDatanodeDatareplicationStatelessDegree of Parallelism = 8#Segment Per Node = 2SegmentSegmentSegmentSegmentDat

15、anodeSegmentSegmentCatalogIssues:Recovery complexityExpansion complexityManagement complexity (many segments per node)Fixed Degree of ParallelismHAWQ 2.0: Architecture Change (2016 Q2)SegmentSegment hostMaster hostInterconnectSegment hostNamenodeRack1 Segment hostRack2 Segment hostMeta OpsCatalogSta

16、telessDegree of Parallelism = Any (#vseg)#Segment Per Node = 1Resource ManagervsegvsegvsegvsegSegmentvsegvsegvsegvsegSegmentvsegvsegvsegvsegDatanodeDatanodevsegvsegvsegvsegSegmentDatanodeDatanodeDatareplicationIssues:Recovery complexityExpansion complexityManagement complexity (many segments per nod

17、e)Fixed Degree of Parallelism世界上第一个和PaaS/Docker云平台原生结合的并行SQL引擎HAWQ+ 3.0: Hornet Execution Engine (2017 Q3)SegmentSegment hostInterconnectSegment hostNamenodeRack1 Segment hostRack2 Segment hostMeta OpsStatelessHornet Execution Engine: SIMD/New hardwareCatalogMaster hostResource ManagervsegvsegSegmen

18、tvsegvsegSegmentvsegDatanodeDatanodeDatanodeDatanodeDatareplicationHornetHornetvsegHornetvsegvsegHornetSegment10 times fasterThe Fastest Engine in the WorldOushu Database 3.0 vs SparkSQL 2.2单位(毫秒ms)OushuSparkratioselect count(*) from lineitem;21.282555120.06select count(*) from lineitem;22.772440107

19、.16AVERAGE22.032497.50113.61count 不同数据类型的列单位(毫秒ms)OushuSparkRatioselect count(l_orderkey) from lineitem;306.70392512.80select count(l_partkey) from lineitem;274.35367413.39select count(l_suppkey) from lineitem;244.77346614.16select count(l_linenumber) from lineitem;133.67326524.43select count(l_quan

20、tity) from lineitem;110.12368933.50select count(l_extendedprice) from lineitem;112.05362732.37select count(l_discount) from lineitem;108.64388635.77select count(l_tax) from lineitem;115.14372332.33select count(l_returnflag) from lineitem;70.41459165.20select count(l_linestatus) from lineitem;73.0142

21、0857.64select count(l_shipdate) from lineitem;127.12421833.18select count(l_commitdate) from lineitem;135.43450633.27select count(l_receiptdate) from lineitem;134.36419331.21select count(l_shipinstruct) from lineitem;236.63431118.22select count(l_shipmode) from lineitem;177.66417323.49select count(l

22、_comment) from lineitem;344.94588517.06AVERAGE169.064083.7529.88sum/avg 不同数据类型的列单位(毫秒ms)OushuSparkRatioselect sum(l_orderkey) from lineitem;323.16341410.56select sum(l_partkey) from lineitem;298.30332111.13select sum(l_suppkey) from lineitem;263.69324312.30select sum(l_linenumber) from lineitem;154.

23、20319320.71select sum(l_quantity) from lineitem;128.39400431.19select sum(l_extendedprice) from lineitem;138.48404229.19select sum(l_discount) from lineitem;141.68350024.70select sum(l_tax) from lineitem;143.07353624.72select avg(l_orderkey) from lineitem;327.68351110.71select avg(l_partkey) from li

24、neitem;303.51358311.81select avg(l_suppkey) from lineitem;269.36333112.37select avg(l_linenumber) from lineitem;161.41319619.80select avg(l_quantity) from lineitem;131.92361427.40select avg(l_extendedprice) from lineitem;138.48355425.66select avg(l_discount) from lineitem;134.01361827.00select avg(l

25、_tax) from lineitem;137.92354925.73AVERAGE199.703513.0620.31group by (某一列) 取count单位(毫秒ms)OushuSparkRatioselect l_orderkey, count(*) from lineitem group by l_orderkey;14314.14OOMNANselect l_partkey, count(*) from lineitem group by l_partkey;4127.98292997.10select l_suppkey, count(*) from lineitem gro

26、up by l_suppkey;1142.611818115.91select l_linenumber, count(*) from lineitem group by l_linenumber;363.51957026.33select l_quantity, count(*) from lineitem group by l_quantity;370.151136730.71select l_extendedprice, count(*) from lineitem group by l_extendedprice;4929.78297366.03select l_discount, c

27、ount(*) from lineitem group by l_discount;392.411037126.43select l_tax, count(*) from lineitem group by l_tax;352.991037129.38select l_returnflag, count(*) from lineitem group by l_returnflag;545.861134620.79select l_linestatus, count(*) from lineitem group by l_linestatus;329.301121734.06select l_s

28、hipdate, count(*) from lineitem group by l_shipdate;638.511607725.18select l_commitdate, count(*) from lineitem group by l_commitdate;642.311616125.16select l_receiptdate, count(*) from lineitem group by l_receiptdate;647.121564924.18select l_shipinstruct, count(*) from lineitem group by l_shipinstr

29、uct;823.091153914.02select l_shipmode, count(*) from lineitem group by l_shipmode;630.631137118.03select l_comment, count(*) from lineitem group by l_comment;39032.16OOMNANAVERAGE(除去spark OOM 语句)1138.3015161.0721.66group by 不同数据类型的列,取其sum和avg单位(毫秒ms)OushuSparkRatioselect l_partkey, sum(l_partkey), a

30、vg(l_partkey) fromlineitemgroup by l_partkey;8333.37544706.54select l_suppkey, sum(l_suppkey), avg(l_suppkey)from lineitemgroup by l_suppkey;1527.321950512.77select l_linenumber, sum(l_linenumber),avg(l_linenumber) from lineitemgroup by l_linenumber;416.03991423.83select l_quantity, sum(l_quantity),

31、 avg(l_quantity)from lineitemgroup by l_quantity;390.821194930.57select l_extendedprice, sum(l_extendedprice), avg(l_extendedprice) from lineitem group by l_extendedprice;9148.20320053.50select l_discount, sum(l_discount), avg(l_discount)from lineitemgroup by l_discount;418.811075725.68select l_tax,

32、 sum(l_tax), avg(l_tax) from lineitem groupby l_tax;357.991073329.98AVERAGE2941.7921333.2918.98Group by 多列单位(毫秒ms)OushuSparkRatioselect l_partkey, l_suppkey, count(*) from lineitemgroupby l_partkey, l_suppkey;13074.79OOMNANselect l_partkey, l_linenumber, count(*) from lineitemgroupby l_partkey, l_li

33、nenumber;18091.03OOMNANselect l_suppkey,l_extendedprice, count(*) from lineitem group byl_suppkey,l_extendedprice;145543.51OOMNANselect l_partkey, l_shipmode, count(*) from lineitemgroupby l_partkey, l_shipmode;21298.14OOMNANselect l_partkey, l_shipdate, count(*) from lineitemgroupby l_partkey, l_sh

34、ipdate;71890.82OOMNANselect l_suppkey, l_tax, count(*) from lineitem group by l_suppkey, l_tax;3994.25283347.09select l_shipdate,l_commitdate, count(*) from lineitemgroupby l_shipdate,l_commitdate;3159.433281110.39select count(l_orderkey) from lineitemgroup by l_linenumber, l_quantity , l_tax;1179.8

35、51808015.32AVERAGE2777.8426408.3310.93Group by 表达式单位(毫秒ms)OushuSparkRatioselect l_partkey + l_suppkey, count(*) from lineitem group by l_partkey + l_suppkey;4050.55316017.80select l_partkey + 1000 from lineitem group by l_partkey + 1000;2869.51270839.44select l_tax * 100 from lineitem group by l_tax

36、*100;426.141000523.48AVERAGEgroup by 表达式2448.7322896.3313.57多个聚集函数单位(毫秒ms)OushuSparkRatioselect l_partkey, count(*),count(l_orderkey),sum(l_orderkey), avg(l_orderkey) from lineitem group by l_partkey;11878.22OOMNANselect l_suppkey,count(*),count(l_orderkey) ,sum(l_orderkey),avg(l_orderkey) from line

37、item group by l_suppkey;2399.98237459.89select l_linenumber,count(*),count(l_orderkey),sum(l_orderkey),avg(l_orderkey) from lineitem group by l_linenumber;698.181094315.67select l_quantity,count(*),count(l_orderkey) ,sum(l_orderkey),avg(l_orderkey) from lineitem group by l_quantity;702.601349619.21s

38、elect l_discount,count(*),count(l_orderkey) ,sum(l_orderkey),avg(l_orderkey) from lineitem group by l_discount;741.171266817.09select l_tax,count(*),count(l_orderkey) ,sum(l_orderkey),avg(l_orderkey) from lineitem group by l_tax;670.631204617.96select l_returnflag,count(*),count(l_orderkey) ,sum(l_o

39、rderkey),avg(l_orderkey) from lineitem group by l_returnflag;913.231281214.03select l_linestatus,count(*),count(l_orderkey) ,sum(l_orderkey),avg(l_orderkey) from lineitem group by l_linestatus;675.941244418.41select l_shipdate,count(*),count(l_orderkey),sum(l_orderkey),avg(l_orderkey) from lineitem group by l_shi

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论