2025 Alibaba Cloud Open Source Big Data Workshop · Hangzhou

OPENING

Yu Li (李钰, 绝顶)
ASF Member; Apache Celeborn/Flink/HBase/Paimon PMC Member
Head of EMR, Alibaba Cloud Intelligence

Data Trends

AIGC further promotes the explosion of big data
• Data Volume: AI further drives a massive data explosion, far exceeding the data growth of the previous era.
• Data Diversity: Multimodal data processing will become standard for future data processing, including storage, computation, and management.
• Data Governance: One set of data serves different roles, including Data Engineers, Data Analysts, Data Scientists, and AI Engineers.

[Chart: share of data by type. Labels as extracted: Analytic Data, Pictures, AI Models, Others, Video. Values as extracted: 46%, 1%, 43%, 5%, 5%. The label-to-value pairing is not recoverable from the extraction.]

The Evolution of Data Architecture

[Diagram labels.
 Data Warehouse: Database, ETL, Data warehouses, Reports, Applications.
 Data Lake: structured, semi-structured and unstructured data; Data Lake; ETL; Data warehouses; Realtime Analytics; Data Explore; Data Science; Machine Learning; Reports.
 Data Lakehouse: structured, semi-structured and unstructured data; Streaming Analytics; Data Science; Machine Learning; Reports.]

The Data Warehouse Architecture

Strengths
• Excellent performance
• Out-of-box, easy to use
• Friendly to Data Analysts
Weaknesses
• Data format is not open
• Lack of support for non-structured and semi-structured data
• All data not immediately required will be discarded
[Diagram labels: Application, Database, ETL Pipeline, Data Warehouse, Reports]

The Data Lake Architecture

Strengths
• Unified storage with low cost
• Open data and meta format
• Fits both BI and AI
Weaknesses
• Performance is not as good as a data warehouse
• Data governance is not mature
• Hard to construct and operate
[Diagram labels: Application, Database, Data Lake, ELT Model (Store All, Analyze, See Results, Iterate), Data Explore, Data Science, Machine Learning, Reports]

Data Lake + Data Warehouse = Data Lakehouse

[Diagram labels: Database, ETL, Data warehouses, Data Lake, Data Lakehouse, structured, semi-structured and unstructured data, Streaming Analytics, Data Science, Machine Learning, Reports, Applications]

Composable open source Lakehouse solution

[Diagram labels: DevOps, Computing Engines, Management Services, Apache Gravitino, Data Storage, Alibaba Cloud OSS, Governance Services, Data Formats, Apache Paimon]

OpenLake: The Lakehouse solution on Alibaba Cloud

[Diagram labels: Application, Database, Ingestion, Workflow; DataWorks IDE, Copilot, Data Governance, Data Quality; Realtime Compute, MaxCompute, Hologres, E-MapReduce; Data Lake Formation, MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction; Apache Paimon (Lake Format); OSS-HDFS (Lake Storage)]

BUILD OPEN SOURCE COMPATIBLE LAKEHOUSE ON ALIBABA CLOUD

Yu Li (李钰, 绝顶)
ASF Member; Apache Celeborn/Flink/HBase/Paimon PMC Member
Head of EMR, Alibaba Cloud Intelligence

Lakehouse processing pipeline

[Diagram labels: sources: binlog (RDBMS), Logs; warehouse layers on Paimon: ODS, DWD, DWS, ADS; Flink SQL (Streaming & Batch) between layers; Flink SQL Streaming & Batch Queries; Data Serving Systems: Hologres; Lake Governance; Lake Format; Lake Storage]

Recap: The Lakehouse solution on Alibaba Cloud

[Diagram labels: Application, Database, Ingestion, Workflow; DataWorks IDE, Copilot, Data Governance, Data Quality; Realtime Compute, MaxCompute, Hologres, StarRocks, E-MapReduce; Data Lake Formation, MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction; Apache Paimon (Lake Format); OSS-HDFS (Lake Storage); Open Lake]

EMR Serverless Spark

Serverless Spark transforms data management with one-stop, fully managed services for seamless development, scheduling, and maintenance. It is 100% compatible with open-source Spark and 3X faster with Fusion, an enterprise native engine.

Resilient
• Enterprise remote shuffle service (RSS) solution to support better elasticity
• On-demand and seamless rescaling
• Native integration with DLF and OSS
Easy to Use
• One-stop data engineering support
• Visualized job and workflow monitoring
• Convenient resource and session management
Flexible
• Rich Open API supplied for integration
• 100% compatible with open-source usage, in both API and binary aspects
• Rich ecosystem support
Fast
• Native engine supported, 3X faster than open-source Spark
• Enhanced RSS supplies 1.5X throughput for IO-intensive apps

Product Architecture

[Diagram labels: App Scenario: Dashboard Report, Operational Analytics, Data Discovery, Data Science; Control Plane: Meta Service, Data Engineer, Scheduling, Intelligent Maintenance, Version Control, Accounting; Compute Plane: Spark Native Engine, Remote Shuffle, Data IO; Storage Layer: Lake Formats, Object Storage Service; Enterprise Cache Service; Security and Auth (DLF); Enterprise Remote Shuffle Service]

Control plane: Workspace Administration
[Screen labels: Connection Management; Resource Usage Monitoring; Session Management (resources for interactive queries); Queue Management (resources for ETL)]

Control plane: Data Engineering
[Screen labels: Version Control; SQL Editor; Artifacts Management; Catalog View]

Control plane: Job Monitor and Diagnose
[Screen labels: Intelligent Diagnose; Job List; Logs; Metrics]

Control plane: Workflow Management
[Screen labels: Workflow List; Workflow Instance Monitor, Global View; Canvas Editor; Workflow Instance Monitor, Single Execution View]

Compute plane: Fusion Engine

Fusion is an enterprise native engine which is 3X faster than the open-source Spark Java engine.

x86 (Intel/AMD) and ARM support
Hardware-aware optimization
• SVE SIMD acceleration
• zstd-ptg compression acceleration
Native C++ Integration
• OSS-HDFS support
• Deep Parquet and ORC integration
• Paimon, Delta Lake and Iceberg support
Vectorized Execution Engine
• Native Operator
• SIMDJson optimization
Fast Columnar Shuffle
• Enterprise RSS based on Apache Celeborn
• Data shuffle reduced by up to 40%
Testing Environment
• 6d3s.16xlarge ECS server
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0

Compute plane: Enterprise Remote Shuffle Service

RSS removes the dependency on local disk for shuffle data and enables 100% disaggregation of compute and storage.

Open Source
• Apache Top Level Project, donated by Alibaba Cloud
• De facto RSS choice, used by Alibaba, LinkedIn, etc.
Multi-Tenancy
• Enterprise security assurance with data encryption
• Enhanced IO scheduling, flow control and quota management
Scalability
• Widely adopted in Alibaba, used by both Spark and Flink
• Successfully supports jobs with 600 TB+ of shuffle data
Performance
• 69% performance boost over YARN external shuffle
• Performance gain increases with shuffle data scale
Functionality
• Supports Spark DRA
• Supports Spark AQE
Test Environment
• 8d2s.10xlarge ECS servers
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0
• Spark 3.3.1
• Shuffle Partition = 8000

Open API and Ecosystem

Workflow Integration
OpenAPI
• Workspace
• Job Runs
• SQL Editor
• Workflows
Tools
• Spark-submit compatible job submission
• Notebook
• Git integration (planning)
Alibaba Cloud Product Integration
• OSS-HDFS
• MaxCompute
• DLF
• DataWorks

EMR Serverless Spark vs. Databricks, function-wise

Function                      Databricks    EMR Serverless Spark
Native Engine                 YES           YES
SQL Editor                    YES           YES
Workflow Management           YES           YES
Debugging and Monitor         YES           YES
Intelligent Diagnose          NO            YES
Catalog and Authentication    YES           YES
Data & FS                     YES (DBFS)    YES (OSS-HDFS)
Auditing                      YES           YES
Notebook                      YES           YES
CI/CD with Git                YES           NO
Assistant/Copilot             YES           NO
ML & Vector Serving           YES           NO

EMR Serverless StarRocks

Serverless StarRocks offers a high-performance, all-scenario, blazing-fast and unified data lakehouse analytics service. It is 100% compatible with open-source StarRocks and 3X faster than traditional OLAP engines (Presto/Trino, ClickHouse, Druid, ...).

Four pillars: Easy-to-use, Cloud-native, Fast, Unified
• Large-scale data analytics
• SIMD-optimized query engine
• High-speed real-time data ingestion
• Innovative pipeline execution engine
• Full-stack vectorized technology
• Innovative CBO technology
• Multi-dimensional lakehouse analytics with rich lake data format support
• Materialized views and ETL support
• High concurrency support (10k per second)
• Real-time data analysis
• Diverse data model support
• Maintenance free with high SLA
• Compatible with the MySQL protocol
• Compatible with multiple BI tools
• Supports slow query diagnosis
• Visual metadata management
• Easy migration with the cluster link tool
• Out-of-box, minute-level delivery
• Efficient resilience support
• Deep integration with DLF and VVP
• DisAgg and Virtual Warehouse support

Product Architecture

[Diagram labels: Application Scenario: Ad-hoc dashboard, Operation analytics, User profile, Real-time analytics, Self-service reporting; Product Layer: StarRocks Manager (Data Loading, Security, SQL profiling, Audit log, Configuration Management, Monitoring and alert, Health analysis, Upgrading, …), Instance Management, SQL Editor; StarRocks-instance Layer: Auto-Scaling, Lakehouse Analytics, Shared-Nothing Architecture, three Virtual Warehouses each with FE, Data Cache and CN nodes; Storage layer: HIVE, Data Lake Table Format, StarRocks Table Format, Data Lake]

Fast and unified
• A comprehensive vectorized execution engine and a modernized cost-based optimizer (CBO), with concurrency reaching tens of thousands of queries per second (QPS).
• Fully compatible with data lake formats, offering more than a 3X performance improvement relative to Trino.
• Supports materialized view ELT scenarios, enabling one-step data tier processing.
Separation of storage and compute
• Optimized computational elasticity for on-demand usage, with the potential to reduce storage costs by up to 60%.
• Offers multi-computing cluster capabilities, ensuring resource isolation between different business units without interference.
• Various caching strategies are available, allowing customers to configure them flexibly according to their business needs.
Use with ease
• Out of the box, the StarRocks Manager offers a wide range of enterprise-level features.
• Intelligent diagnostics and analysis, providing comprehensive analysis in conjunction with customer business operations.

Control plane: StarRocks Manager
[Screen labels: StarRocks Console, Instance Monitor; One-Stop SQL Edit and Query; Slow SQL Profile and Diagnose; Instance Diagnose]

Highlights
• Fully Managed
• Extreme Elasticity
• One-stop Dev and Analyze
• Dis-aggregation Support

Compute plane: fast and stable
Headings as extracted: High Perf, Lakehouse Hierarchy, Elasticity, Maturity
• 3x-5x faster than Trino
• Significantly faster than ClickHouse and Apache Doris
• Hive/Paimon/Iceberg/Hudi
• Supports external MV and Lakehouse Hierarchy
• Sophisticated caching and tiered storage capability
• On-demand, second-level elasticity with low cost
• Comprehensive load analysis and diagnostics

[Diagram labels. Lake Query Acceleration: Data Lake, Compute Node with Local Cache, StarRocks, Data Ingestion, Warehouse. Lakehouse Build-up: ODS, DWD, DWS, ADS, MV Acceleration, Data Lake, StarRocks, Data Ingestion, Warehouse.]

Recap: The Lakehouse solution on Alibaba Cloud

[Diagram labels: Application, Database, Ingestion, Workflow; DataWorks IDE, Copilot, Data Governance, Data Quality; Realtime Compute, StarRocks, E-MapReduce, MaxCompute, Hologres; Data Lake Formation, MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction; Apache Paimon (Lake Format); OSS-HDFS (Lake Storage); Open Lake]

Data Lake Formation: Metadata Management

APIs
• HMS compatible
• Import/export from/to HMS
• MySQL JDBC
• Open API & SDKs
Functionality
• Table Schema
• Table Lineage (WIP)
• Meta Retrieval
• Meta Stats for CBO
Fully Managed
• Serverless, elastic
• Highly available
• High throughput
• OpenAPI / SDK
Lake Formats
• Apache Paimon
• Apache Iceberg
• Apache Hudi
• Databricks Delta

Data Lake Formation: Enterprise-class security

Auditing
• Audit Log for Authorization
• Audit Log for Meta Operation
• Audit Log for Data Operation (WIP)
Authorization
• RBAC
• Policy & ACL (WIP)
Modes
• Apache Ranger compatible
Authentication
• Open LDAP
• Kerberos (WIP)
• Alibaba Cloud RAM

Open Lake: intelligent optimization
[Diagram labels: Hot Layer, Warm Layer, Cold Layer; Compaction Manager; Tiered Storage Manager; Meta Store; Compact; Stats]

Thanks
Yu Li
liyu@

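Paimon + DLF: Connecting Alibaba Cloud's In-House and Open Source Compute Engines

Li Jinsong (李劲松)
Apache Paimon PMC Chair

CONTENTS
1. Open Lake: one storage layer for the whole ecosystem
2. Apache Paimon and open source compute engines
3. Apache Paimon and in-house compute engines
4. Apache Paimon in practice

1. Open Lake: one storage layer for the whole ecosystem

From data lake to lakehouse
[Diagram labels: OSS data lake; Hologres + internal tables; MaxCompute + internal tables; + internal tables; + Parquet; + Kafka; data exchange; OSS file read/write; lake format; SDK read/write; unified lakehouse metadata; lake format + AI; To Be Continued…]

Choosing a data architecture
• Batch data warehouse
• Real-time lakehouse
• Real-time data warehouse

Open Lake: The Lakehouse solution on Alibaba Cloud
[Diagram labels: Application, Database, Ingestion, Workflow; DataWorks IDE, Copilot, Data Governance, Data Quality; Realtime Compute, MaxCompute, Hologres, E-MapReduce; Data Lake Formation, MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction; Apache Paimon (Lake Format); OSS-HDFS (Lake Storage)]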

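2. Apache Paimon and open source compute engines

Paimon + open source big data
• Shared storage, with every compute engine on an equal footing
• Unified stream and batch, upgraded to real time
• Real-time and offline data, fast queries
• Industry-leading performance and cost

[Diagram labels: Application; Ingestion; Streaming Ingestion; ODS, DWD, DWS on Apache Paimon over OSS; Streaming Partial Update; Streaming Aggregate; Batch Aggregate; Batch Left Join; real-time upgrade; real-time OLAP; OLAP; MaxCompute and Hologres (ongoing); Computing Platform business unit]

Alibaba Cloud Flink + Paimon: Streaming Lakehouse
• Widening data across multiple tables: Partial-Update; large-scale Lookup Join
• Streaming upserts into the lake: high-performance updates on primary-key tables; a rich set of merge engines
• Accelerating offline data: streaming writes and streaming reads replace the message queue; index-accelerated queries
• Streaming reads of the changelog: a complete changelog is generated, which unlocks streaming reads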

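Alibaba Cloud Spark + Paimon: first-class offline processing performance

[Chart: TPC-DS SF1T performance, normalized (higher is better). Series: Baseline, +DPP, + adaptive scan parallelism, +native, +ALL. Y-axis from 0 to 2.5. Per-series values are not recoverable from the extraction.]

Alibaba Cloud StarRocks + Paimon: very fast queries on offline data

Alibaba Cloud StarRocks + Paimon: Deletion Vectors mode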
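The slide above only names the Deletion Vectors mode, so the following is a hedged sketch of the general technique rather than of Paimon or StarRocks internals: data files stay immutable, an update records the old row position as deleted, and readers skip those positions instead of merging files at query time. The helpers `upsert` and `scan`, the in-memory `index`, and the file names are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

DataFiles = Dict[str, List[dict]]        # file name -> rows (append-only)
DeletionVectors = Dict[str, Set[int]]    # file name -> deleted row positions
KeyIndex = Dict[object, Tuple[str, int]] # primary key -> (file name, position)


def upsert(files: DataFiles, vectors: DeletionVectors, index: KeyIndex,
           new_file: str, row: dict, key: str = "id") -> None:
    """Append `row` to `new_file`; if the key already exists, mark the old position deleted."""
    old = index.get(row[key])
    if old is not None:
        old_file, old_pos = old
        vectors.setdefault(old_file, set()).add(old_pos)
    rows = files.setdefault(new_file, [])
    rows.append(row)
    index[row[key]] = (new_file, len(rows) - 1)


def scan(files: DataFiles, vectors: DeletionVectors) -> Iterator[dict]:
    """Read every file once, skipping positions listed in its deletion vector."""
    for name, rows in files.items():
        deleted = vectors.get(name, set())
        for pos, row in enumerate(rows):
            if pos not in deleted:
                yield row


files, vectors, index = {}, {}, {}
upsert(files, vectors, index, "f1", {"id": 1, "v": "a"})
upsert(files, vectors, index, "f1", {"id": 2, "v": "b"})
upsert(files, vectors, index, "f2", {"id": 1, "v": "c"})  # supersedes f1 position 0
print(list(scan(files, vectors)))
```

Because a reader only filters by position, each file can be scanned independently, which is what makes this layout friendly to vectorized OLAP engines.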

3. Apache Paimon and in-house compute engines

Data Lake Formation: bridge to MaxCompute and Hologres
DLF connects the in-house compute engines:
• MaxCompute: External Schema
• Hologres: External Database
[Diagram labels: MaxCompute, Hologres, Data Lake Formation, Apache Paimon (Lake Format), OSS-HDFS (Lake Storage)]

MaxCompute + Paimon (coming soon)
• Built-in Paimon
• Native acceleration
• Deletion Vectors support
• ALIORC format
• Batch write support
[Diagram labels: MaxCompute, Data Lake Formation, Apache Paimon (Lake Format), OSS-HDFS (Lake Storage)]

Hologres + Paimon (coming soon)
• Native acceleration
  - Append tables (no primary key)
  - Deletion Vectors mode
[Diagram labels: Hologres, Data Lake Formation, Apache Paimon (Lake Format), OSS-HDFS (Lake Storage)]

4. Apache Paimon in practice

A new-energy vehicle company on Alibaba Cloud
[Diagram labels: Application, Database; Streaming Ingestion; Data Lake (Apache Paimon, LSM Tree); ODS: primary-key table; streaming; asynchronous compaction; DWD: append table; changelog = lookup]

A gaming company on Alibaba Cloud
[Diagram labels: Application, Database; Streaming Ingestion; Data Lake (Apache Paimon, LSM Tree); ODS: primary-key table, changelog = input; DWD: primary-key table, deletion-vectors; DWS: append table; streaming; asynchronous compaction; Batch; real-time OLAP]

A local-services company on Alibaba Cloud
[Diagram labels: Application, Database; Streaming Ingestion; Data Lake (Apache Paimon, LSM Tree); ODS: append table; Cluster: Z-order; Index: bloom filter / bitmap; high-performance OLAP]
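The last case above clusters an append table by Z-order. As a rough illustration of why that helps OLAP filters on more than one column, the sketch below interleaves the bits of two integer columns into a single sort key, so rows that are close in both dimensions tend to end up in the same files. The function name `z_value`, the 16-bit width and the sample points are assumptions for illustration; real engines handle more columns and data types.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting by the interleaved key walks the 2x2 grid in a "Z" shape.
points = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered)
```

With files laid out in this order, per-file min/max statistics on either column become selective, which complements the bloom filter and bitmap indexes listed on the slide.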

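Thanks
Li Jinsong (李劲松), Apache Paimon PMC Chair

Alibaba Cloud Real-Time Lakehouse and Flink: Product and Technology Overview

Li Lubing (李鲁兵, 云觉)
Alibaba Cloud Computing Platform

CONTENTS
1 Trends in the big data real-time lakehouse
2 Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Apache Flink
3 Alibaba Cloud Realtime Compute for Apache Flink: product capabilities
4 Typical architectures and case studies

01 Trends in the big data real-time lakehouse

• 1.0 (2009-2019): introducing the data warehouse
• 2.0 (2020-2022): moving into the lakehouse, combining structured, semi-structured and unstructured data
• 3.0 (2023~): leading the native lakehouse toward real time and AI
[Diagram labels: data warehouse, data lake, streaming analytics, BI, data science, machine learning]

Big data has entered the era of the real-time lakehouse: AI-driven, public cloud first, real-time and AI-enabled.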

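02 Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Apache Flink

Real-time lakehouse (Streaming Lakehouse): the best overall price-performance choice
• Minute-level freshness
• Second-level query response
• Low cost
• Real time across the whole pipeline
• Has both Lakehouse and Streaming characteristics

[Chart: trade-off among performance, freshness and cost.
 Warehouse: T+1
 Lakehouse: T+1 / T+1h
 Streaming Lakehouse (Streaming + Lakehouse): T+1m
 Streaming: T+1s]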

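Real-time lakehouse: overall solution

How it works
• Build Paimon on low-cost OSS storage
• Integrate deeply with Flink to make the whole pipeline real time
Core advantages
• Low-cost, fully real-time pipeline
• Unified stream and batch storage and compute
• One platform for data management, scheduling, ad-hoc queries and more
• Open to multiple engines
Applicable scenarios
• Real-time acceleration of the entire offline pipeline
• Cost reduction for real-time pipelines
• Unified stream and batch storage and compute

[Diagram labels: sources: Database, EMR, Logs; ① one-click ingestion into the lake (CTAS, CDAS); ② streaming read and write (Flink stream/batch); ③ ad-hoc queries; ④ batch read and write (Flink stream/batch); scheduling and workflows; Data Lake (OSS/OSS-HDFS); Flink, MaxCompute, Hologres; Queries]

Real-time lakehouse: end-to-end real-time acceleration

Data flows and updates in real time from end to end with minute-level freshness, and every stage is queryable with second-level response.

Data ingestion: Flink CDC
• Streaming ETL
• Unified full and incremental synchronization
• Schema Evolution
• Whole-database sync and sharded database/table sync
• Resumable transfer
Data storage: Paimon (OSS) table format
• Upsert / Partial-Update
• Real-Time Ingestion
• Changelog Producing
• Time Travel
• Lookup Join
• Batch Overwrite/Query
Data computing: Flink and other engines
• Flink is the de facto standard for stream computing
• Open to multiple compute engines
• Streaming write and streaming read
• Batch write and batch read
• Ad-hoc and point queries
Data query: OLAP engines
• Open to multiple OLAP engines
• Second-level response when querying through external tables
• Data can also be uploaded directly into the engine
• Query performance optimized in memory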

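Real-time ingestion into the lake and warehouse: simplified operations
• CTAS: merges and synchronizes sharded databases and tables
• CDAS: synchronizes a whole database
[Diagram labels: MySQL, Paimon (OSS), ad-hoc query]

Real-time ingestion into the lake and warehouse: compatible with table changes (Schema Evolution)
• Metadata is discovered and managed automatically through a Catalog
• Combined with the CTAS syntax, data and table schema changes are synchronized automatically
• Data changes and schema changes are both read and propagated downstream, and their order is preserved
• When synchronizing to a Paimon table, PARTITION BY automatically handles sources with or without a partition column
[Diagram labels: MySQL (Order_db), Paimon (OSS) (Paimon_order)]

Real-time ingestion into the lake and warehouse: multiple processing operations
Flink CDC supports both the SQL API and the DataStream API.
• SQL API: SELECT, GROUP BY, WHERE, Top-N, JOIN, INSERT
• DataStream API: aggregate, flatMap, map, join, keyBy, filter
[Diagram labels: Hudi, Iceberg, Hologres, Paimon, TiDB, ClickHouse; more sources are on the way]

Real-time lakehouse: low-cost storage
• Built on low-cost storage such as OSS/HDFS
• LSM-based, balancing read and write performance
• Full support for Lakehouse features
• The changelog mechanism keeps data flowing in real time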

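[Diagram: Paimon LSM Tree files on a distributed file system (HDFS/OSS/S3). Properties: low latency, low cost, unified stream and batch storage, easy to integrate]

Flink SQL Sink
• Apache Paimon ships a built-in Sink that hides the complexity and supports both streaming and batch computation
• Two-phase commit guarantees exactly-once delivery
• Real-time writes

Apache Paimon File Store (batch): how a table's files are stored
• The LSM structure supports Update/Delete
• Columnar format, with compression and other optimizations
• Supports full batch reads

Apache Paimon Log Store (stream): the record of operations on a table
• Pluggable implementation
• Supports incremental streaming subscription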
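The File Store and Log Store roles described above can be sketched together: an LSM-style merge answers batch reads by letting the newest write per key win, and a changelog derived from the same writes feeds streaming subscribers. This is a simplified model under stated assumptions: runs are plain Python lists ordered oldest to newest, `None` stands for a delete tombstone, and the names `merge_runs` and `changelog` are hypothetical. The row kinds `+I`, `-U`, `+U`, `-D` follow Flink's changelog notation.

```python
from typing import Dict, List, Optional, Tuple

Run = List[Tuple[str, Optional[int]]]  # (key, value); None marks a delete


def merge_runs(runs: List[Run]) -> List[Tuple[str, int]]:
    """Batch read: merge runs ordered oldest to newest, newest write per key wins."""
    merged: Dict[str, Optional[int]] = {}
    for run in runs:
        for key, value in run:
            merged[key] = value
    return sorted((k, v) for k, v in merged.items() if v is not None)


def changelog(before: Dict[str, int], writes: Run) -> List[Tuple[str, str, int]]:
    """Streaming read: turn upserts and deletes into +I / -U / +U / -D records."""
    state = dict(before)
    log: List[Tuple[str, str, int]] = []
    for key, value in writes:
        old = state.get(key)
        if value is None:
            if old is not None:
                log.append(("-D", key, old))
                del state[key]
        elif old is None:
            log.append(("+I", key, value))
            state[key] = value
        elif old != value:
            log.append(("-U", key, old))
            log.append(("+U", key, value))
            state[key] = value
    return log


print(merge_runs([[("a", 1), ("b", 2)], [("b", 3), ("c", None)], [("a", None)]]))
print(changelog({"a": 1}, [("a", 2), ("b", 5), ("a", None)]))
```

Emitting the `-U` record requires knowing the previous value, which is why a complete changelog needs either a lookup of existing data or input that already carries it.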

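03 Alibaba Cloud Realtime Compute for Apache Flink: product capabilities

Rich enterprise-grade capabilities

Data ingestion
• Flink CDC: unified full and incremental sync; whole-database and whole-table merging, sharded database/table sync; YAML templates; resumable transfer
• Data connectors: 30+ mainstream data products; custom connectors and formats
Job development and debugging
• Stream and batch computing
• Multiple languages and multiple versions
• Dynamic CEP
• Unified metadata (catalog)
• Isolation between development and production
• Test data management and test data generation
• Fast run and debug
• Ad-hoc queries
• Integration with external development platforms such as Git
Operations
• Batch job scheduling
• Data lineage
• Intelligent diagnostics
• Automatic tuning
• Resource queue management
• State management
• Variable management
• Key management
• Monitoring and alerting
Security
• Fine-grained permission management (RBAC)
• Workspace isolation
• SSL support for upstream and downstream systems

Upgraded enterprise-grade security

How do cloud big data services keep enterprise data and services secure? The risks are business interruption, data leakage, insufficient access control and security attacks. The approach is to build comprehensive, multi-layered security management that continuously protects data and services in the cloud, with security hardening across multiple dimensions.

Flink infrastructure security
• Alibaba Cloud data centers: protection facilities, network security, security controls
• Multi-layered secure service deployment design
• Dedicated large-scale clusters and network-isolated environments
• High-availability design for data and services across the whole pipeline
• Same-city disaster recovery for the Flink service deployment environment

Flink platform and system security
Access control and permission management
• Alibaba Cloud account identity: full support for the Alibaba Cloud account system, including Alibaba Cloud accounts, resource directories and CloudSSO
• RAM permission control: integrated with RAM, so RAM users and roles can log in and be authenticated
• Fine-grained RBAC permission management: built-in and custom roles enable fine-grained authorization of operations
Data security
• Key hosting: keys can be configured to avoid the security risk of plaintext AccessKeys
• Automatic backup and recovery: a storage-compute separation architecture, with data and job state backed up
• Operation auditing: integration with ActionTrail for event monitoring and alerting, timely auditing and retrospective analysis
Security isolation
• Network isolation: VPC networking that is secure, reliable, flexible and controllable; management of upstream and downstream service domain names; VPC-to-internet connectivity through the Alibaba Cloud NAT gateway
• Tenant isolation: multi-tenant resource isolation; isolated storage of user data

OpenAPI v2 released: easier to integrate
• deploymentTarget rework
• Dynamic deployment updates
• Custom connector management
• Lineage (data lineage)
• Catalog management
• UDF registration

Automatic job resource tuning (Autopilot)

Resources are adjusted dynamically based on processing complexity and data traffic.
Problems addressed
• Over-provisioned jobs: low resource utilization, high cost
• Under-provisioned jobs: prone to failover, slow startup, low throughput, high latency
Workflow
• Collect metrics from Flink Metrics and other diagnostic systems
• Analyze the metrics and combine them into a tuning plan
• Execute the plan: update the job configuration through the job management platform, applying it dynamically through the Flink RESTful API or by restarting the job
• Deploy to the cluster
Example: when a job's AGG operator reaches its processing limit, Autopilot infers that the MiniBatch configuration can be added.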

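04 Typical architectures and case studies

Typical reference architecture
• Hologres and Paimon both support streaming access, so each warehouse layer can choose between them based on storage cost and business timeliness
• Data written directly into Hologres: second-level timeliness plus top OLAP performance
• Data built on Paimon with Hologres for query acceleration: minute-level timeliness plus second-level OLAP performance
• The OLAP engine is optional; StarRocks, Trino and others are supported
[Diagram labels: ODS, DWD, DWS and ADS layers on Paimon (OSS) or Hologres, connected by Flink through Binlog; Flink SQL; simple SQL exploration; OLAP query and analysis; Dashboards]

Typical customer case

Background
A well-known Chinese internet company in the mobility sector with tens of millions of monthly active users. The customer built its own platform on the open source Hadoop stack. Real-time workloads make up a large share of the business, and real-time big data resources exceed those used for offline processing. Real-time data is processed through a Flink + Kafka pipeline, and offline data is processed with Spark/Hive/Trino.

Pain points
• Two technology stacks, with high development and maintenance costs
• High storage cost, because offline and real-time data are stored separately
• Intermediate data in stream processing is hard to query

Original architecture
[Diagram labels: Log, application database, algorithm database, data integration; real-time pipeline: ODS Kafka, Flink processing, DWD Kafka, Flink aggregation, ADS Kafka (incremental); offline pipeline: dump, ODS Hive, offline processing, DWD Hive, offline aggregation, ADS, Presto; Impala/Presto; reports]

Evolved architecture: Flink + Paimon + StarRocks
[Diagram labels: Log, application database (CDC), algorithm database; Flink; ODS Paimon (OSS); DWD Paimon (OSS); ADS Paimon aggregation (OSS); StarRocks; reports]

Results
• Development efficiency nearly doubled, storage cost savings at the KW level per year, and query efficiency improved 3x
• Two pipelines simplified into one, which reduces system complexity and greatly lightens operations work
• One set of SQL/Table and one schema, which greatly improves development efficiency
• Kafka clusters greatly reduced, saving KW in cost per year
• Intermediate data can be queried directly; queries through StarRocks are more than 3x faster than Presto/Impala

Thanks
云觉
DingTalk: tute2014

Tea break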

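Flink + Paimon + Hologres in Production at Alibaba's Intelligent Engine Division

Wang Weijun (王伟骏, 鸿历)
Technical Expert, Alibaba Intelligent Engine Division

CONTENTS
1. Product background
2. Solution example: the search offline platform
3. Production job tuning and community collaboration
4. Future

1. Product background

Business scenario and product definition
[Diagram labels: sources: Binlog, Transactions, Message Queue, Algorithm data, Events, Logs, Database (MySQL, ODPS, Paimon, …); Offline System: Stream Processing, Batch Processing; outputs: Message Queue, ODPS, Paimon, Hologres, FileSystem, …; consumers: Search Engine, Advertising Engine, Recommendation Engine, Sample Engine, …]
Challenges
1. Many heterogeneous data sources
2. Many businesses with complex logic
3. Performance tuning is hard and the operations barrier is high
For this scenario we built a product that provides an end-to-end ETL data processing solution for the AI domain.

Product technical architecture
• Product side: UI and WebIDE (development, configuration, operations, monitoring, alerting)
• Core functions: data integration, sample processing, SQL, ad hoc, OLAP, stream computing, batch computing, unified stream and batch, user plugins, scheduling and orchestration, Catalog (meta, versions, lineage, dataset), Embedding computation
• Dependent components: Hologres; distributed KV storage; Airflow; ASI (unified scheduling and a unified resource pool supporting the K8S protocol); Swift message queue; Pangu (distributed file system); Paimon lake format; lake table storage optimization service; VVP (job submission, development, operations); Celeborn unified shuffle service; Restune elastic job resources
• Supported businesses: Taobao, Tmall, local services, Cainiao, Amap, AE, Fliggy, Lazada, OpenSearch, …
• Other labels: Connector, CDC, image retrieval, sample platform, HA3, ODPS, Paimon, vision platform, offline inference, …, features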