Opening

Yu Li (李钰, alias 绝顶)
ASF Member; PMC Member of Apache Celeborn, Flink, HBase, and Paimon
Head of EMR, Alibaba Cloud Intelligence

Data Trends: AIGC further promotes the explosion of big data
• Data Volume: AI further drives a massive data explosion, far exceeding the data growth of the previous era.
• Data Diversity: Multimodal data processing will become standard for future data processing, including storage, computation, and management.
• Data Governance: One dataset serves different roles, including Data Engineers, Data Analysts, Data Scientists, and AI Engineers.

[Figure: pie chart of data by type. Labels: Analytic Data, Pictures, AI Models, Others, Video. Values as extracted: 46%, 1%, 43%, 5%, 5%.]
[Figure: overview of the Data Warehouse, Data Lake, and Data Lakehouse architectures.]
The Evolution of Data Architecture

The Data Warehouse Architecture
[Figure: Databases and Applications feed an ETL pipeline into the Data Warehouse, which serves Reports.]
Strengths
• Excellent performance
• Out-of-box, easy to use
• Friendly to Data Analysts
Weaknesses
• Data format is not open
• Lack of support for non-structured and semi-structured data
• All data not immediately required will be discarded

The Data Lake Architecture
[Figure: Databases and Applications store all data (structured, semi-structured, and unstructured) in the Data Lake, which serves Real-time Analytics, Data Exploration, ETL into data warehouses, Reports, Data Science, and Machine Learning. ELT loop: Analyze, See Results, Model, Iterate.]
Strengths
• Unified storage with low cost
• Open data and metadata formats
• Fits both BI and AI
Weaknesses
• Performance is not as good as a data warehouse
• Data governance is not mature
• Hard to construct and operate

Data Lake + Data Warehouse = Data Lakehouse
[Figure: the Data Lakehouse holds structured, semi-structured, and unstructured data from Databases and Applications, and serves Reports, ETL, Streaming Analytics, Data Exploration, Data Science, and Machine Learning.]
Composable open-source Lakehouse solution
[Figure: building blocks: DevOps; Computing Engines; Management Services (Apache Gravitino); Governance Services; Data Formats (Apache Paimon); Data Storage (Alibaba Cloud OSS).]

Open Lake: the Lakehouse solution on Alibaba Cloud
[Figure: Applications, Databases, Ingestion, and Workflow; compute engines: Realtime Compute, MaxCompute, Hologres, E-MapReduce; DataWorks (IDE, Copilot, Data Governance, Data Quality); Data Lake Formation (MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction); Apache Paimon (Lake Format); OSS-HDFS (Lake Storage).]

Build an Open-Source-Compatible Lakehouse on Alibaba Cloud
Lakehouse processing pipeline
[Figure: elements shown: binlog, RDBMS, and Logs as sources; Flink SQL (streaming and batch) between layers; Paimon tables for the ODS, DWD, DWS, and ADS layers; Hologres and other data serving systems; streaming and batch queries; Lake Format, Lake Storage, and Lake Governance underneath.]

Recap: the Lakehouse solution on Alibaba Cloud
[Figure: the Open Lake stack shown earlier, with StarRocks also appearing among the compute engines.]
EMR Serverless Spark
Serverless Spark transforms data management with one-stop, fully managed services for seamless development, scheduling, and maintenance. It is 100% compatible with open-source Spark and 3X faster with Fusion, an enterprise native engine.

Resilient
• Enterprise remote shuffle service (RSS) solution to support better elasticity
• On-demand and seamless rescaling
• Native integration with DLF and OSS
Easy to Use
• One-stop data engineering support
• Visualized job and workflow monitoring
• Convenient resource and session management
Flexible
• Rich OpenAPI supplied for integration
• 100% compatible with open-source usage, at both the API and the binary level
• Rich ecosystem support
Fast
• Native engine supported, 3X faster than open-source Spark
• Enhanced RSS supplies 1.5X throughput for IO-intensive apps
Product Architecture
[Figure: layers: App Scenario, Control Plane, Compute Plane, Storage Layer. Components shown: Dashboard, Report, Operational Analytics, Data Discovery, Data Engineering, Data Science; Meta Service, Scheduling, Intelligent Maintenance, Version Control, Accounting, Connection Management; Spark Native Engine, Remote Shuffle, Data IO; Lake Formats, Object Storage Service, Enterprise Cache Service, Security and Auth (DLF), Enterprise Remote Shuffle Service.]

Control plane: Workspace Administration
• Resource Usage Monitoring
• Session Management (resources for interactive queries)
• Queue Management (resources for ETL)

Control plane: Data Engineering
• Version Control
• SQL Editor
• Artifacts Management
• Catalog View

Control plane: Job Monitor and Diagnose
• Intelligent Diagnose
• Job List
• Logs
• Metrics

Control plane: Workflow Management
• Workflow List
• Canvas Editor
• Workflow Instance Monitor: Global View
• Workflow Instance Monitor: Single Execution View
Compute plane: Fusion Engine
Fusion is an enterprise native engine which is 3X faster than the open-source Spark Java engine.

x86 (Intel/AMD) and ARM support
Hardware-aware optimization
• SVE SIMD acceleration
• zstd-ptg compression acceleration
Native C++ integration
• OSS-HDFS support
• Deep Parquet and ORC integration
• Paimon, Delta Lake, and Iceberg support
Vectorized execution engine
• Native operators
• SIMDJson optimization
Fast columnar shuffle
• Enterprise RSS based on Apache Celeborn
• Data shuffle reduced by up to 40%

Testing environment
• 6d3s.16xlarge ECS server
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0
Compute plane: Enterprise Remote Shuffle Service
RSS removes the dependency on local disk for shuffle data and enables 100% disaggregation of compute and storage.

Open Source
• Apache Top-Level Project, donated by Alibaba Cloud
• De facto RSS choice, used by Alibaba, LinkedIn, etc.
Multi-Tenancy
• Enterprise security assurance with data encryption
• Enhanced IO scheduling, flow control, and quota management
Scalability
• Widely adopted in Alibaba, used by both Spark and Flink
• Successfully supports jobs with 600TB+ of shuffle data
Performance
• 69% performance boost over the YARN external shuffle service
• Performance gain increases with shuffle data scale
Functionality
• Supports Spark DRA
• Supports Spark AQE
Test Environment
• 8d2s.10xlarge ECS servers
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0
• Spark 3.3.1
• Shuffle partitions = 8000
OpenAPI and Ecosystem
Workflow Integration
OpenAPI
• Workspace
• Job Runs
• SQL Editor
• Workflows
Tools
• Spark-submit compatible job submission
• Notebook
• Git integration (planning)
Alibaba Cloud Product Integration
• OSS-HDFS
• MaxCompute
• DLF
• DataWorks

EMR Serverless Spark vs. Databricks, function-wise

Function                      Databricks    EMR Serverless Spark
Native Engine                 YES           YES
SQL Editor                    YES           YES
Workflow Management           YES           YES
Debugging and Monitor         YES           YES
Intelligent Diagnose          NO            YES
Catalog and Authentication    YES           YES
Data & FS                     YES (DBFS)    YES (OSS-HDFS)
Auditing                      YES           YES
Notebook                      YES           YES
CI/CD with Git                YES           NO
Assistant/Copilot             YES           NO
ML & Vector Serving           YES           NO
EMR Serverless StarRocks
Serverless StarRocks offers a high-performance, all-scenario, blazing-fast, and unified data lakehouse analytics service. It is 100% compatible with open-source StarRocks and 3X faster than traditional OLAP engines (Presto/Trino, ClickHouse, Druid, and others).

Highlights, grouped on the slide under Fast, Unified, Easy-to-use, and Cloud-native:
• Large-scale data analytics
• SIMD-optimized query engine
• High-speed real-time data ingestion
• Innovative pipeline execution engine
• Full-stack vectorized technology
• Innovative CBO technology
• Multi-dimensional lakehouse analytics with rich lake data format support
• Materialized views and ETL support
• High concurrency support (10k per second)
• Real-time data analysis
• Diverse data model support
• Maintenance-free with high SLA
• Compatible with the MySQL protocol
• Compatible with multiple BI tools
• Supports slow query diagnosis
• Visual metadata management
• Easy migration with the cluster link tool
• Out-of-box, minute-level delivery
• Efficient resilience support
• Deep integration with DLF and VVP
• DisAgg and Virtual Warehouse support
Application scenarios: ad-hoc, dashboard, operation analytics, user profile, real-time analytics, self-service reporting, …

[Figure: layers: Product Layer, StarRocks-instance Layer, Storage Layer. Elements shown: Auto-Scaling, Lakehouse Analytics, Shared-Nothing Architecture, Hive, Data Lake Table Format, StarRocks Table Format, Data Lake.]

Fast and unified
• A comprehensive vectorized execution engine and a modernized cost-based optimizer (CBO), with concurrency reaching tens of thousands of queries per second (QPS).
• Fully compatible with data lake formats, offering more than a 3X performance improvement relative to Trino.
• Supports materialized view ELT scenarios, enabling one-step data tier processing.
Separation of storage and compute
• Optimized computational elasticity for on-demand usage, with the potential to reduce storage costs by up to 60%.
• Offers multi-computing cluster capabilities, ensuring resource isolation between different business units without interference.
• Various caching strategies are available, allowing customers to configure flexibly according to their business needs.
Use with ease
• Out of the box, the StarRocks Manager offers a wide range of enterprise-level features.
• Intelligent diagnostics and analysis, providing comprehensive analysis in conjunction with customer business operations.
operations.Data
Loading
Security
SQL
profiling
Audit
log
…Configuration
Monitoring
andManagement
alertVirtualWarehouseVirtualWarehouseVirtualWarehouseproduct
ArchitectFEFEFEData
CacheData
CacheData
CacheHealth
analysis
Upgrading
…InstanceManagementSQL
EditorCNCNCNCNCNCNCNCNCNStarRocks
ManagerStarRocks
ConsoleInstance
MonitorOne-Stop
SQL
Editand
QuerySlowSQL
Profileand
DiagnoseInstance
Diagnosecontrolplanestar
ROCKS
ManagerFully
ManagedExtreme
ElasticityOne-stop
DevandAnalyzeDis-aggregation
SupportHighlightsMaturityADSMVAccelerationcomputeplane
Fastand
stableLakehouse
Hierarchy3x-5xfasterthanTrinoSignificantlyfasterthan
ClickHouseandApache
DorisHive/Paimon/Iceberg/HudiHive/Paimon/Iceberg/HudiSupportexternal
MVand
Lakehouse
HierarchySophisticatedcachingandtieredstoragecapabilityOn-demandSecond-level
Elasticitywith
LowCostComprehensive
loadanalysisanddiagnosticHigh
PerfElasticityLakeQueryAccelerationDWDLocal
CacheCompute
NodeLocal
CacheCompute
NodeODSData
LakeData
LakeQueryAccelerationLakehouseBuild-upStarRocksStarRocksData
IngestionData
IngestionWarehouseWarehouseDWS
Recap: the Lakehouse solution on Alibaba Cloud
[Figure: the Open Lake stack shown earlier.]

Data Lake Formation: metadata management
APIs
• HMS compatible
• Import/export from and to HMS
• MySQL JDBC
• OpenAPI and SDKs
Functionality
• Table schema
• Table lineage (WIP)
• Meta retrieval
• Meta stats for CBO
Fully Managed
• Serverless, elastic
• Highly available
• High throughput
• OpenAPI / SDK
Lake Formats
• Apache Paimon
• Apache Iceberg
• Apache Hudi
• Databricks Delta

Data Lake Formation: enterprise-class security
Authentication
• OpenLDAP
• Kerberos (WIP)
• Alibaba Cloud RAM
Authorization
• RBAC
• Policy & ACL (WIP) modes
• Apache Ranger compatible
Auditing
• Audit log for authorization
• Audit log for meta operations
• Audit log for data operations (WIP)

Open Lake: intelligent optimization
[Figure: Hot, Warm, and Cold storage layers handled by a Tiered Storage Manager; a Compaction Manager compacts tables, using stats from the Meta Store.]

Thanks
Yu Li
liyu@
Paimon + DLF: Connecting Alibaba Cloud's Self-Developed and Open-Source Compute Engines
Li Jinsong (李劲松), Apache Paimon PMC Chair

Contents
1. Open Lake: one storage layer for the whole ecosystem
2. Apache Paimon and open-source compute engines
3. Apache Paimon and self-developed compute engines
4. Apache Paimon in practice

1. Open Lake: one storage layer for the whole ecosystem

From data lake to unified lakehouse
[Figure: elements shown: engines with their own internal tables (Hologres + internal tables, MaxCompute + internal tables, others + internal tables), Parquet, and Kafka exchanging data through OSS file reads and writes on the OSS data lake; then a shared lake format read and written through SDKs, unified lakehouse metadata, and "lake format + AI" marked "To be continued…".]

Choices of data architecture: batch data warehouse, real-time lakehouse, real-time data warehouse.

[Figure: the Open Lake stack on Alibaba Cloud: Apache Paimon (Lake Format), OSS-HDFS (Lake Storage), Data Lake Formation, Realtime Compute, MaxCompute, Hologres, E-MapReduce, DataWorks.]
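2. Apache Paimon and open-source compute engines

[Figure: ODS, DWD, and DWS layers on Apache Paimon. Batch Aggregate is upgraded to real time as Streaming Partial Update and Streaming Aggregate.]
• Shared storage, equal footing for all compute engines
• Unified stream and batch, upgraded to real time
• Real-time and offline data, extremely fast queries
• Industry-leading performance and cost

Paimon + open-source big data
[Figure: Applications and Ingestion write into Apache Paimon on OSS; workloads shown: Streaming Ingestion, Batch Left Join, real-time OLAP, OLAP; MaxCompute and Hologres integration marked "ongoing".]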
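Alibaba Cloud Flink + Paimon: Streaming Lakehouse
• Widening data across multiple tables: Partial-Update; large-scale Lookup Join
• Streaming upserts into the lake: high-performance updates on primary-key tables; a rich set of merge engines
• Accelerating offline data: streaming writes and streaming reads replace message queues; index-accelerated queries
• Streaming reads of change logs: a complete changelog is generated, which unlocks streaming reads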
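The partial-update idea in the first bullet can be illustrated with a minimal sketch. This is plain Python, not Paimon's implementation: the function name `partial_update_merge` and the sample rows are hypothetical, and the rule shown (a later non-null field overwrites, a null field leaves the existing value alone) is a simplified reading of how several streams can fill in different columns of one wide row.

```python
from typing import Dict, Iterable, Optional

Row = Dict[str, Optional[object]]


def partial_update_merge(rows: Iterable[Row], key: str) -> Dict[object, Row]:
    """Merge rows sharing a primary key; later non-null fields overwrite earlier ones."""
    merged: Dict[object, Row] = {}
    for row in rows:
        current = merged.setdefault(row[key], {})
        for column, value in row.items():
            # None means "this writer has no value for the column",
            # so it must not erase a value supplied by another stream.
            if value is not None or column not in current:
                current[column] = value
    return merged


# Three hypothetical streams, each knowing only part of the wide row.
orders = {"order_id": 1, "amount": 30, "city": None}
payments = {"order_id": 1, "amount": None, "city": None, "paid": True}
addresses = {"order_id": 1, "city": "Hangzhou"}

wide = partial_update_merge([orders, payments, addresses], key="order_id")
print(wide[1])
```

The point of the design is that no stream needs to join against the others before writing; the merge happens in storage, keyed by the primary key.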
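Alibaba Cloud Spark + Paimon: first-class offline processing performance
[Figure: TPC-DS SF1T performance. Bars: Baseline, +DPP, +adaptive scan parallelism, +native, +ALL. Y-axis: Normalized Performance (higher is better), from 0 to 2.5.]

Alibaba Cloud StarRocks + Paimon: extremely fast queries on offline data

Alibaba Cloud StarRocks + Paimon: Deletion Vectors mode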
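As a rough illustration of what a deletion-vectors mode buys a query engine, the sketch below keeps, per data file, a set of deleted row positions and filters them out at read time, so the reader never has to merge files by key. It is a conceptual model only: the file names, the `read_with_deletion_vectors` helper, and the in-memory layout are assumptions for the example, not Paimon's on-disk format.

```python
from typing import Dict, List, Set


def read_with_deletion_vectors(
    files: Dict[str, List[dict]],
    deletion_vectors: Dict[str, Set[int]],
) -> List[dict]:
    """Return visible rows: every row whose position is not marked deleted."""
    visible: List[dict] = []
    for file_name, rows in files.items():
        deleted = deletion_vectors.get(file_name, set())
        visible.extend(row for pos, row in enumerate(rows) if pos not in deleted)
    return visible


# Hypothetical table state: id=2 was updated, so its old copy in the first
# file is marked deleted and the new copy lives in the second file.
files = {
    "data-0.parquet": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "data-1.parquet": [{"id": 2, "v": "b2"}],
}
deletion_vectors = {"data-0.parquet": {1}}

rows = read_with_deletion_vectors(files, deletion_vectors)
print(rows)
```

Because each file can be scanned independently with a cheap position filter, this style of read suits vectorized OLAP engines.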
3. Apache Paimon and self-developed compute engines

DLF connects the self-developed compute engines
• MaxCompute: External Schema
• Hologres: External Database
[Figure: Data Lake Formation as the bridge to MaxCompute and Hologres, over Apache Paimon (Lake Format) and OSS-HDFS (Lake Storage).]

MaxCompute + Paimon (coming soon)
• Built-in Paimon
• Native acceleration
• Deletion Vectors support
• AliORC format
• Batch write support

Hologres + Paimon (coming soon)
• Native acceleration for Append (no primary key) tables and for Deletion Vectors mode
4. Apache Paimon in practice

A new-energy vehicle company on Alibaba Cloud
[Figure: Applications and Databases, Streaming Ingestion, Apache Paimon (LSM tree) on the data lake. ODS: primary-key table, streaming with asynchronous compaction, changelog=lookup. DWD: append table.]

A gaming company on Alibaba Cloud
[Figure: Applications and Databases, Streaming Ingestion, Apache Paimon (LSM tree) on the data lake, real-time OLAP. ODS: primary-key table, changelog=input, streaming with asynchronous compaction. DWD: primary-key table with deletion vectors. DWS: append table, built in batch.]

A local-services company on Alibaba Cloud
[Figure: Applications and Databases, Streaming Ingestion, Apache Paimon (LSM tree) on the data lake, high-performance OLAP. ODS: append table; clustering: Z-order; indexes: bloom filter / bitmap.]
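The Z-order clustering mentioned in the last case can be sketched as bit interleaving: two column values are woven into one sort key so that rows close in both dimensions end up close in the file layout, which lets a scan skip more data for filters on either column. The helper below is a generic textbook illustration with a hypothetical name (`z_order`); it says nothing about how Paimon encodes its clustering internally.

```python
from typing import List, Tuple


def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting points by their Z-value clusters them along both dimensions at once.
points: List[Tuple[int, int]] = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: z_order(p[0], p[1]))
print(ordered)
```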
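Thanks
Li Jinsong (李劲松), Apache Paimon PMC Chair

Alibaba Cloud Real-Time Lakehouse and Flink Product Technology Introduction
Li Lubing (李鲁兵, alias 云觉), Alibaba Cloud Computing Platform

Contents
1 Insights into big data real-time lakehouse trends
2 Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Apache Flink
3 Capabilities of Alibaba Cloud Realtime Compute for Apache Flink
4 Typical reference architectures and case studies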
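01 Insights into big data real-time lakehouse trends

[Figure: timeline.
1.0 (2009-2019): adopting the data warehouse; data warehouse, BI.
2.0 (2020-2022): merging into the lakehouse; structured, semi-structured, and unstructured data brought together; data lake, streaming analytics, data science, machine learning.
3.0 (2023~): leading the native lakehouse; real-time and AI-enabled.]

Big data has entered the era of the real-time lakehouse: AI-driven, public cloud first, real-time and AI-enabled.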
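02 Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Apache Flink

Streaming Lakehouse: the best overall price-performance choice
• Minute-level freshness
• Second-level query response
• Low cost, with a fully real-time pipeline
• Combines Streaming characteristics with Lakehouse characteristics

[Figure: trade-off among performance, freshness, and cost.
Streaming: T + 1s
Streaming Lakehouse: T + 1m
Lakehouse: T + 1 / T + 1h
Warehouse: T + 1]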
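Real-time lakehouse: overall solution
[Figure: Databases and Logs on the left; ① one-click ingestion into the lake with CTAS/CDAS; ② streaming reads and writes with Flink (stream/batch); ③ ad-hoc queries; ④ batch reads and writes, with scheduling and workflows. Engines shown: Flink, EMR, MaxCompute, Hologres. Storage: Data Lake (OSS/OSS-HDFS).]
How it works
• Paimon is built on low-cost OSS storage
• Deep integration with Flink makes the whole pipeline real-time
Core advantages
• Low cost, with a fully real-time pipeline
• Unified stream and batch storage and compute
• One platform provides data management, scheduling, ad-hoc queries, and more
• Open, with support for multiple engines
Applicable scenarios
• Real-time acceleration of the whole offline pipeline
• Cost reduction for real-time pipelines
• Unified stream and batch storage and compute

Real-time lakehouse: full-pipeline real-time acceleration
End to end, data flows and updates in real time across the whole pipeline with minute-level freshness; every stage is queryable, with second-level query response.
Data ingestion: Flink CDC
• Unified full and incremental synchronization
• Schema Evolution
• Whole databases, or sharded databases and tables
• Resumable transfer
Data computing: Flink and other engines
• Flink, the de facto standard for stream computing
• Open, with support for multiple compute engines
• Streaming writes and streaming reads
• Batch writes and batch reads
• Ad-hoc queries / point lookups
• Streaming ETL
Data storage: Paimon (OSS) table format
• Upsert / Partial-Update
• Real-Time Ingestion
• Changelog Producing
• Time Travel
• Lookup Join
• Batch Overwrite / Query
Data query: OLAP engines
• Open, with support for multiple OLAP engines
• Second-level response when querying through external tables
• Data can also be loaded directly into the engine
• Query performance optimized with in-memory processing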
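Real-time ingestion into the lake and warehouse: simplified operations
• CTAS: merges and synchronizes sharded databases and tables
• CDAS: synchronizes a whole database
[Figure: MySQL to Paimon (OSS), with ad-hoc queries.]

Real-time ingestion into the lake and warehouse: compatible with table changes (Schema Evolution)
• Metadata is discovered and managed automatically through the Catalog
• With the CTAS syntax, data is synchronized and table schema changes are synchronized automatically
• Both data changes and schema changes are read and synchronized downstream, and ordering is preserved for both
• When synchronizing to a Paimon table, PARTITION BY automatically handles sources with or without the partition field
[Figure: MySQL Order_db to Paimon (OSS) Paimon_order.]

Real-time ingestion into the lake and warehouse: multiple processing operations
[Figure: Flink CDC with the SQL API (SELECT, GROUP BY, WHERE, Top-N, JOIN, INSERT) and the DataStream API (aggregate, flatMap, map, join, keyBy, filter). Systems shown: Hudi, Iceberg, Hologres, Paimon, TiDB, ClickHouse. More sources are on the way.]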
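To make the "merge sharded tables and follow schema changes" behavior described for CTAS more concrete, here is a small conceptual sketch in Python. It is not Flink CDC code: `merge_shards`, the `_shard` column, and the shard names are hypothetical. It only shows the idea that the target schema is the union of the columns seen so far, and that rows from shards that predate a new column are padded with nulls.

```python
from typing import Dict, List, Tuple


def merge_shards(shards: Dict[str, List[dict]]) -> Tuple[List[str], List[dict]]:
    """Union the columns of all shards, then emit rows padded to that schema."""
    columns: List[str] = []
    for rows in shards.values():
        for row in rows:
            for column in row:
                if column not in columns:
                    columns.append(column)  # schema widens as new columns appear

    merged: List[dict] = []
    for shard_name, rows in shards.items():
        for row in rows:
            out = {"_shard": shard_name}  # hypothetical lineage column
            out.update({column: row.get(column) for column in columns})
            merged.append(out)
    return columns, merged


# Hypothetical shards: the second one has already gained a `coupon` column.
shards = {
    "orders_01": [{"id": 1, "amount": 10}],
    "orders_02": [{"id": 2, "amount": 20, "coupon": "A"}],
}
columns, merged = merge_shards(shards)
print(columns)
print(merged)
```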
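Real-time lakehouse: low-cost storage
• Built on low-cost storage such as OSS/HDFS
• LSM-based, balancing read and write performance
• Full support for Lakehouse features
• The changelog mechanism keeps data flowing in real time
[Figure: Paimon LSM tree files on a distributed file system (HDFS/OSS/S3). Keywords: low latency, low cost, unified stream and batch storage, easy to integrate.]

Apache Paimon: stream and batch computing on the same data
Flink SQL Sink
• Apache Paimon has a built-in sink that hides the complexity
• Two-phase commit guarantees exactly-once delivery
File Store (batch)
• The file storage form of a table
• LSM supports Update/Delete
• Columnar format, with optimizations such as compression
• Supports full batch reads
Log Store (stream)
• The operation log of a table
• Pluggable implementation
• Supports incremental streaming subscription
[Figure: Flink SQL writes in real time through the sink into the File Store and the Log Store; Flink SQL reads the File Store in batch and the Log Store as a stream.]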
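The way an LSM structure supports Update/Delete on immutable files can be shown with a toy model. The class below is a teaching sketch under simplifying assumptions (an in-memory buffer, sorted runs kept as Python lists, a single full compaction); `TinyLsm` and its methods are hypothetical names and do not mirror Paimon's file store.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # a delete is recorded as a key with no value


class TinyLsm:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, Optional[str]] = {}
        self.runs: List[List[Tuple[str, Optional[str]]]] = []  # oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: Optional[str]) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, TOMBSTONE)

    def flush(self) -> None:
        """Write the buffer out as one immutable, key-sorted run."""
        if self.memtable:
            self.runs.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
            self.memtable = {}

    def compact(self) -> None:
        """Merge all runs; newer runs win, and tombstones are dropped."""
        merged: Dict[str, Optional[str]] = {}
        for run in self.runs:  # oldest to newest
            merged.update(run)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.runs = [live] if live else []

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run first
            for k, v in run:
                if k == key:
                    return v
        return None


store = TinyLsm(memtable_limit=2)
store.put("a", "1")
store.put("b", "2")   # buffer full: flushed as run 1
store.put("a", "3")   # update of "a"
store.delete("b")     # buffer full again: flushed as run 2
print(store.get("a"), store.get("b"))
store.compact()
print(store.runs)
```

Writes stay cheap because they only append new runs; the cost of reconciling versions is deferred to reads and to compaction, which is the read/write balance the slide refers to.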
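03 Capabilities of Alibaba Cloud Realtime Compute for Apache Flink

Rich enterprise-grade capabilities
Data ingestion
• Flink CDC: unified full and incremental synchronization; whole-database and whole-table merging, sharded databases and tables; YAML templates; resumable transfer
• Data connectors: 30+ mainstream data products; custom connectors and formats
Job development and debugging
• Stream and batch computing; multiple languages; multiple versions; dynamic CEP; unified metadata (catalog)
• Development/production isolation; test data management; test data generation; quick run and debug; ad-hoc queries; integration with external development platforms such as Git
Operations
• Batch job scheduling; data lineage; intelligent diagnosis; automatic tuning; resource queue management; state management; variable management; key management; monitoring and alerting
Security
• Fine-grained permission management; RBAC; workspace isolation; SSL support for upstream and downstream systems

Upgraded enterprise-grade security capabilities
Security spans the infrastructure and the platform system, with comprehensive hardening across multiple dimensions to protect data.
Infrastructure
• Independent large-scale clusters and network-isolated environments
• Alibaba Cloud data centers: data center safeguards, multi-layer secure service deployment design, data center network security
Access control and permission management
• Alibaba Cloud account identity: fully adapted to the Alibaba Cloud account system, including Alibaba Cloud accounts, Resource Directory, and CloudSSO
• RAM permission control: integrated with RAM; RAM users and roles can log in and be authenticated
• Fine-grained RBAC: built-in and custom roles enable fine-grained operation authorization
Data security
• Key hosting: secrets can be configured to avoid the risk of plaintext AccessKeys
• Automatic backup and recovery: with the storage-compute separation architecture, data and job state are backed up
• Operation auditing: integration with ActionTrail provides event monitoring and alerting, timely auditing, and retrospective analysis
Security isolation
• Network isolation: VPC networks are secure, reliable, flexible, and controllable; domain names of upstream and downstream services can be managed; an Alibaba Cloud NAT gateway connects the VPC to the public network
• Tenant isolation: multi-tenant resource isolation; isolated storage of user data

How cloud big data services protect enterprise data and service security
Comprehensive, multi-layer security management continuously protects data and services on the cloud.
• Risks: business interruption, data leakage, insufficient permission control, security attacks
• Flink platform system security
• Flink infrastructure security: Flink service deployment environment, same-city disaster recovery, data center security controls
• High-availability design for data and services across the whole pipeline

OpenAPI v2 released: easier to integrate
• deploymentTarget rework
• Dynamic updates of deployments
• Custom connector management
• Lineage (data lineage)
• Catalog management
• UDF registration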
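Automatic job resource tuning (Autopilot)
Resources are adjusted dynamically based on business processing complexity and data traffic.
Problems addressed
• Under-provisioned jobs: low throughput and high latency, failovers are likely, startup is slow
• Over-provisioned jobs: low resource utilization and high cost
Tuning loop
• Collect metrics (Flink Metric)
• Analyze metrics and generate a tuning plan from all indicators; for example, when a job's AGG operator reaches its processing bottleneck, Autopilot infers that a MiniBatch conf can be added
• Execute the plan: update the job configuration, either dynamically through the Flink RESTful API or by restarting the job, and deploy to the cluster
• Other diagnosis systems and job management platforms can also drive the loop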
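The collect, analyze, and act loop above can be sketched as a simple rule function. Everything here is an assumption made for illustration: the `JobMetrics` fields, the 0.2 and 0.8 thresholds, and the doubling/halving policy are hypothetical and are not Autopilot's actual strategy, which the slides do not specify.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    cpu_utilization: float   # 0.0 to 1.0, averaged over a window (hypothetical metric)
    backlog_growing: bool    # True when input lag keeps increasing
    parallelism: int


def tuning_plan(m: JobMetrics, low: float = 0.2, high: float = 0.8) -> dict:
    """Turn collected metrics into a plan: scale up, scale down, or keep."""
    if m.backlog_growing or m.cpu_utilization > high:
        # Under-provisioned: throughput is low or latency is rising.
        return {"action": "scale_up", "parallelism": m.parallelism * 2}
    if m.cpu_utilization < low and m.parallelism > 1:
        # Over-provisioned: resources are idle and cost is wasted.
        return {"action": "scale_down", "parallelism": max(1, m.parallelism // 2)}
    return {"action": "keep", "parallelism": m.parallelism}


print(tuning_plan(JobMetrics(cpu_utilization=0.9, backlog_growing=False, parallelism=4)))
print(tuning_plan(JobMetrics(cpu_utilization=0.1, backlog_growing=False, parallelism=4)))
print(tuning_plan(JobMetrics(cpu_utilization=0.5, backlog_growing=False, parallelism=4)))
```

A real tuner would also have to damp oscillation (for example with cool-down periods) and decide whether a change can be applied dynamically or needs a restart.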
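04 Typical reference architectures and case studies

Typical reference architecture: Flink + Paimon + Hologres
• Hologres and Paimon both support streaming access, so each warehouse layer can choose between them based on storage cost and business timeliness
• Writing data directly into Hologres provides second-level timeliness plus extreme OLAP performance
• Building data on Paimon and using Hologres for query acceleration provides minute-level timeliness plus second-level OLAP performance
• The OLAP engine is optional; StarRocks, Trino, and others are supported
[Figure: ODS, DWD, DWS, and ADS layers connected by Flink / Flink SQL. Each layer is stored on Paimon (OSS) or on Hologres with binlog. Simple SQL exploration and OLAP query analysis run on the layers; Hologres serves dashboards.]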
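Typical customer case

Background
A well-known domestic mobility internet company with tens of millions of monthly active users. The customer built its own platform on the open-source Hadoop stack. Real-time workloads make up a large share, and real-time big data resources exceed those used for offline processing. Real-time data is processed through a Flink + Kafka pipeline; offline data is processed with Spark/Hive/Trino.

Pain points
• Two technology stacks, with high development and maintenance costs
• High storage cost, because offline and real-time data are stored separately
• Intermediate data in stream processing is hard to query

Solution
Flink + Paimon + StarRocks

Results
Development efficiency nearly doubled, storage cost savings of KW per year, and query efficiency improved 3x.
• Two pipelines were simplified into one, reducing system complexity and greatly easing operations
• One set of SQL/Table definitions and one schema greatly improve development efficiency
• Kafka clusters were cut back substantially, saving KW in cost per year
• Intermediate data can be queried directly through StarRocks, more than 3x faster than Presto/Impala

[Figure: original architecture: a real-time pipeline (application databases and logs, Flink processing and aggregation, Kafka for ODS/DWD/ADS, Presto) and an offline pipeline (Kafka dumps to Hive for ODS/DWD, offline processing and aggregation, Impala/Presto) feeding reports, application databases, and algorithm databases. Evolved architecture: data integration from databases (CDC) and logs into Paimon (OSS) for ODS, DWD, and ADS, processed and aggregated by Flink and queried through StarRocks.]

Thanks
云觉 (Yunjue), DingTalk: tute2014

Coffee break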
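Flink + Paimon + Hologres in Production at Alibaba Intelligent Engine
Wang Weijun (王伟骏, alias 鸿历), Technical Expert, Alibaba Intelligent Engine Division

Contents
1. Product background
2. Solution example: the search offline platform
3. Production job tuning and community collaboration
4. Future

1. Product background

Business scenario and product definition
[Figure: sources (Binlog, Transactions, Message Queue, Algorithm data, Events, Logs; Database, MySQL, ODPS, Paimon, …) flow into the Offline System (Stream Processing, Batch Processing), which writes to ODPS, Paimon, Hologres, FileSystem, Message Queue, … and serves the Search Engine, Advertising Engine, Recommendation Engine, Sample Engine, …]
1. Many heterogeneous data sources
2. Many businesses, with complex logic
3. Performance tuning is hard and the operations barrier is high
Based on this business scenario, we built a product that provides an end-to-end ETL data processing solution for the AI domain.

Product technical architecture
• Product front end: UI and WebIDE (development, configuration, operations, monitoring, alerting)
• Supported businesses: Taobao, Tmall, local services, Cainiao, Amap, AE, Fliggy, Lazada, OpenSearch, …
• Core functions: data integration, sample processing, SQL, ad hoc, OLAP, stream computing, batch computing, unified stream and batch, user plugins, scheduling and orchestration, Catalog (meta, versions, lineage, Dataset), Connector, CDC, Embedding computation, offline inference, features, …
• Connected systems shown: image retrieval, sample platform, HA3, ODPS, Paimon, vision platform
• Dependencies: Hologres; distributed KV storage; Airflow; ASI (unified scheduling and unified resource pool supporting the K8S protocol); Swift message queue; Pangu (distributed file system); Paimon lake format; lake table storage optimization service; VVP (job submission, development, operations); Celeborn (unified shuffle service); Restune (elastic job resources)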