Flink Agents at LinkedIn: Exploration and Practice
Flink Agents at LinkedIn: Early Enterprise Exploration and the Road Ahead
Speaker: Alan Zhang, Staff Software Engineer, LinkedIn
Speaker: Weiqing Yang, Senior Staff Software Engineer, LinkedIn
01
Motivation
The Gap: Streaming Data vs. Autonomous Reasoning
Situation Before
● Streaming systems power Feed, Ads, and Search
● The tradeoff: Built for speed, not for autonomous reasoning or real-time decision-making
The Engineering Bottleneck: Context Persistence
● Managing short- and long-term memory
● Handling state recovery during failures
● Sharing context across multi-agent workflows
The Breakthrough: LinkedIn’s First Streaming AI Agents
What We Did: Launch LinkedIn’s first Streaming AI Agent on Apache Flink
Continuous Intelligence: Combining real-time streams with agentic workflows
● Observe: Monitor live data streams
● Reason: Apply LLMs for context and intent
● Act: Execute autonomously within infrastructure
The Impact:
● Smarter automation
● Faster, adaptive decision-making
● Enterprise-scale member experiences
02
What is Flink-Agents
Flink Agents Framework Architecture
Why Flink-Agents: Memory as a Feature
Native Memory Support
● Embedded RocksDB: Manages local state for sub-millisecond reasoning and flawless pause/resume
● Mem0 Integration: Natively supports Long-Term Semantic Memory for RAG and historical context
Event-Driven & Scalable
● Proactive: Built for event-driven workflows rather than passive querying
● Self-Healing at Scale: Natively handles massive scale using Flink’s robust checkpointing mechanism
Why not Others?
● True Stream Processing: Offers first-class windowing and aggregation
● Proven Tech Stack: Leverages infrastructure already deeply verified and adopted at LinkedIn
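The pause/resume idea above can be illustrated with a toy sketch in plain Python. It does not use the Flink Agents, Flink, or RocksDB APIs; `AgentMemory`, `snapshot`, `restore`, and `run_with_failure` are hypothetical names chosen for illustration. The only point is that when agent memory is captured in each checkpoint, a restarted agent can restore the last consistent state and replay the events that came after it.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy short-term memory; a stand-in for keyed state in a state backend."""
    offset: int = 0                               # number of events processed so far
    notes: dict = field(default_factory=dict)     # job -> latest observed status

    def process(self, event: dict) -> None:
        # Record the latest observation per job and advance the offset.
        self.notes[event["job"]] = event["status"]
        self.offset += 1

    def snapshot(self) -> dict:
        # A checkpoint is a deep copy, so later updates cannot mutate it.
        return copy.deepcopy({"offset": self.offset, "notes": self.notes})

    @classmethod
    def restore(cls, checkpoint: dict) -> "AgentMemory":
        return cls(offset=checkpoint["offset"], notes=copy.deepcopy(checkpoint["notes"]))


def run_with_failure(events: list, checkpoint_every: int, fail_at: int) -> AgentMemory:
    """Process events, crash once at index `fail_at`, then resume from the last checkpoint."""
    memory = AgentMemory()
    checkpoint = memory.snapshot()
    for i, event in enumerate(events):
        if i == fail_at:
            break  # simulated crash: state written since the checkpoint is lost
        memory.process(event)
        if memory.offset % checkpoint_every == 0:
            checkpoint = memory.snapshot()
    # Recovery: restore state, then replay events from the checkpointed offset.
    memory = AgentMemory.restore(checkpoint)
    for event in events[memory.offset:]:
        memory.process(event)
    return memory


events = [
    {"job": "a", "status": "ok"},
    {"job": "b", "status": "lagging"},
    {"job": "a", "status": "failed"},
    {"job": "b", "status": "ok"},
    {"job": "c", "status": "ok"},
]
recovered = run_with_failure(events, checkpoint_every=2, fail_at=3)
print(recovered.offset, recovered.notes)
```

In this sketch the crash discards the third event's update, but replaying from the checkpointed offset reapplies it, so the final memory matches a run without failure.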
LinkedIn's First Streaming AI Agents
LinkedIn Contributions to Apache Flink Agents
Models, Retrieval & Integrations
Expanding the framework's reach across LLMs, embeddings, and external tools
● Chat-model integrations: Anthropic, OpenAI, Azure OpenAI
● Vector-store & embedding stack with built-in RAG retrieval
● MCP tool support, event logging, and richer docs & examples
Runtime, State & Reliability
Hardening durable execution and exactly-once correctness for long-running agent workloads
● Refactored the core action-execution operator
● Checkpoint-safe agent memory & state-correctness fixes
● Configurable event-log observability
03
LinkedIn Ecosystem Integration
LinkedIn Ecosystem Integration: What we validated
● Runtime compatibility
○ Flink Agents can run as PyFlink + Flink jobs in our managed environment
● Platform integration path
○ Agent jobs can follow the same management path as other Flink jobs: control plane → job CR → Flink Kubernetes Operator → Kubernetes
● Ecosystem connectivity
○ Agents can reuse Flink connectors to consume and produce data across LinkedIn’s internal data systems
Architecture Overview: Agents as managed Flink jobs
04
Patterns from LinkedIn Exploration
Pattern 1: Proactive Diagnosis Agent for Flink Job Failure Signals
Current workflow: symptom-based alerts trigger human investigation
On LinkedIn’s managed Flink platform, thousands of jobs emit health, lag, and other operational metrics. Default alerts protect these jobs, but when an alert fires, it usually pages the job owner first, even when the root cause is platform infrastructure.
Pain Point: Alerting is automated, but diagnosis is still human-triggered.
New architecture: event-triggered diagnosis on failure streams
Each managed Flink job emits enriched failure events into a centralized Kafka topic, which triggers an agent-based diagnosis pipeline.
Failure events now trigger diagnosis directly, instead of waiting for symptom-based alerts and manual escalation.
Why Flink Agents fits: reasoning after stream processing
The hard part is not calling an LLM. It is continuously processing noisy failure streams, reducing them into meaningful diagnosis triggers, and keeping the path open for safe automation.
01 Continuous Signals
watch failures as they happen
Failure events arrive continuously across thousands of managed jobs.
→ 02 Stream Processing First
dedup, filter, correlate before LLM
Most signals are noisy or repeated. Flink dedups, filters, and routes only meaningful cases to agents.
→ 03 Multiple Agent Reasoning
triage + diagnosis with tools
Triage and diagnosis are different responsibilities, with different context, tools, and skills.
→ 04 Safe Action Path
diagnosis today, guarded remediation tomorrow
Today: a diagnosis report. Future: guarded remediation with replay-safe actions.
This pattern is not “ask an agent after an incident.” It is “let failure streams proactively trigger diagnosis.”
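The "stream processing first" step can be sketched in plain Python as follows. This is only an illustration of filtering and time-windowed deduplication, not the pipeline LinkedIn runs and not Flink or Flink Agents API code; `FailureEvent`, its fields, `select_triggers`, and the 300-second window are all assumptions. In a real job the same logic would live in Flink keyed state and windows, and correlation across jobs is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureEvent:
    job: str
    error: str       # normalized error fingerprint (hypothetical field)
    ts: float        # event time in seconds
    transient: bool  # e.g. a failure that a retry already resolved


def select_triggers(events, window_s=300.0):
    """Return the events worth sending to a diagnosis agent.

    Filter: drop transient failures.
    Dedup: keep one event per (job, error) within `window_s` seconds.
    """
    last_emitted = {}   # (job, error) -> timestamp of the last emitted trigger
    triggers = []
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.transient:
            continue
        key = (ev.job, ev.error)
        previous = last_emitted.get(key)
        if previous is not None and ev.ts - previous < window_s:
            continue  # repeat of a failure that already triggered diagnosis
        last_emitted[key] = ev.ts
        triggers.append(ev)
    return triggers


events = [
    FailureEvent("job-a", "OOM", 0.0, False),
    FailureEvent("job-a", "OOM", 10.0, False),     # duplicate within the window
    FailureEvent("job-b", "Timeout", 20.0, True),  # transient, filtered out
    FailureEvent("job-a", "OOM", 400.0, False),    # same error, outside the window
    FailureEvent("job-c", "OOM", 30.0, False),
]
triggers = select_triggers(events)
print([(t.job, t.ts) for t in triggers])
```

Of the five sample events, only three reach the agent, which is the effect the slide describes: the model sees meaningful cases rather than every raw signal.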
Pattern 2: Always-Current Job Context for Agentic Diagnosis
Current workflow: agents rebuild job context on every investigation
For any job diagnosis agent, one failure event is not enough. The agent needs always-current context about what the job does, what changed recently, what dependencies it has, and what happened before. Today, this context is scattered across dashboards, configs, deployment history, logs, incidents, and oncall notes.
Pain Point: The agent can reason, but it first has to rebuild the job context from scratch.
New architecture: maintain job context as a streaming memory plane
Instead of rebuilding context during every investigation, maintain an always-current job profile from event streams. A Flink Agents pipeline continuously joins deployment, failure, incident, performance, and maintenance events per job, summarizes them when needed, and exposes a pre-joined job profile to diagnosis agents through a query interface.
Move job context from “rebuilt during diagnosis” to “maintained continuously by the stream.”
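A minimal sketch of a continuously maintained job profile, in plain Python, might look like the following. It is an in-memory stand-in, not Flink Agents code: `JobProfileStore`, the event fields (`job`, `type`, `detail`), and the profile layout are assumptions made for illustration, and the summarization step is omitted.

```python
from collections import defaultdict, deque


class JobProfileStore:
    """Toy in-memory stand-in for a continuously maintained job profile."""

    def __init__(self, max_recent=3):
        self._max_recent = max_recent
        self._profiles = defaultdict(self._empty_profile)

    def _empty_profile(self):
        return {
            "last_deployment": None,
            "failure_count": 0,
            # Bounded history so the profile stays small enough for a prompt.
            "recent_events": deque(maxlen=self._max_recent),
        }

    def apply(self, event):
        """Fold one event into the profile of the job it belongs to."""
        profile = self._profiles[event["job"]]
        if event["type"] == "deployment":
            profile["last_deployment"] = event["detail"]
        elif event["type"] == "failure":
            profile["failure_count"] += 1
        profile["recent_events"].append((event["type"], event["detail"]))

    def get(self, job):
        """Query interface: return a plain-dict copy of the pre-joined profile."""
        profile = self._profiles.get(job)
        if profile is None:
            return None
        return {**profile, "recent_events": list(profile["recent_events"])}


store = JobProfileStore(max_recent=2)
for ev in [
    {"job": "job-a", "type": "deployment", "detail": "v42"},
    {"job": "job-a", "type": "failure", "detail": "OOM"},
    {"job": "job-b", "type": "incident", "detail": "kafka lag"},
    {"job": "job-a", "type": "failure", "detail": "OOM"},
]:
    store.apply(ev)
print(store.get("job-a"))
```

The diagnosis agent then calls `get` once instead of querying dashboards, deployment history, and incident systems separately during an investigation.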
Pattern 3: Guarded Supervisor Agent for Cross-System Self-Healing
Current workflow: domain agents are powerful, but coordination is fragmented
Enterprises may already have many domain agents. These agents can reason and act within their own domain, but incidents often involve shared dependencies across systems. Without a coordination layer, each agent works from its own signals and may investigate or act independently.
Pain Point: Domain agents are useful specialists, but self-healing needs shared coordination, memory, and guardrails.
New architecture: Flink Agents as the event-driven supervisor layer
Use Flink Agents to build a guarded supervisor that continuously listens to cross-system failure events, correlates related signals into incidents, coordinates existing domain agents through standard interfaces, maintains incident-level memory, and dispatches actions only through shared guardrails.
Flink Agents becomes the event-driven coordination and guardrail layer, not a replacement for every agent.
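The supervisor idea can be sketched in plain Python under some simplifying assumptions: events are correlated by a shared `dependency` field, and the guardrails are an action allowlist plus a per-incident action budget. `Guardrail`, `Supervisor`, the field names, and the action names are hypothetical; none of this is Flink Agents API.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    allowed_actions: set
    max_actions_per_incident: int = 1


@dataclass
class Supervisor:
    guardrail: Guardrail
    incidents: dict = field(default_factory=dict)   # dependency -> incident memory

    def observe(self, event):
        """Correlate failure events that share a dependency into one incident."""
        incident = self.incidents.setdefault(
            event["dependency"], {"systems": set(), "actions": []}
        )
        incident["systems"].add(event["system"])
        return incident

    def dispatch(self, dependency, action):
        """Dispatch an action proposed by a domain agent only if guardrails allow it."""
        incident = self.incidents.get(dependency)
        if incident is None:
            return "rejected: unknown incident"
        if action not in self.guardrail.allowed_actions:
            return "rejected: action not allowed"
        if len(incident["actions"]) >= self.guardrail.max_actions_per_incident:
            return "rejected: action budget exhausted"
        incident["actions"].append(action)
        return "dispatched"


supervisor = Supervisor(Guardrail(allowed_actions={"restart_job"}))
supervisor.observe({"system": "flink", "dependency": "kafka-cluster-1"})
supervisor.observe({"system": "search", "dependency": "kafka-cluster-1"})
results = [
    supervisor.dispatch("kafka-cluster-1", "restart_job"),
    supervisor.dispatch("kafka-cluster-1", "restart_job"),
    supervisor.dispatch("kafka-cluster-1", "drop_topic"),
]
print(results)
```

Two systems reporting the same dependency collapse into a single incident, and the second restart request is rejected, so independent domain agents cannot act twice on the same root cause.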
05
Lessons Learned & Road Ahead
Lessons Learned
● Stream processing should come before reasoning
○ Per-event LLM calls do not scale under real QPS, latency, quota, and cost constraints. Use Flink to filter, dedup, window, batch, and correlate events before invoking the model
● Use staged agents and model routing
○ Lightweight triage agents handle quick evaluation; stronger models are reserved for complex diagnosis
● Guardrails must be designed from day one
○ Once diagnosis moves toward self-healing, rate limits, bla
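The first two lessons (batch events before invoking a model, then route by a triage verdict) can be sketched as below. This is an illustrative assumption, not the talk's implementation: `batch`, `triage`, `route`, the "more than one job" rule, and the model names are hypothetical, and the triage step is a cheap rule standing in for a lightweight agent.

```python
def batch(events, size):
    """Group events into fixed-size batches so one model call covers many events."""
    return [events[i:i + size] for i in range(0, len(events), size)]


def triage(event_batch):
    """Stand-in for a lightweight triage agent: a cheap rule instead of a model call."""
    distinct_jobs = {e["job"] for e in event_batch}
    return "complex" if len(distinct_jobs) > 1 else "simple"


def route(event_batch, models):
    """Pick the model tier for a batch based on the triage verdict."""
    verdict = triage(event_batch)
    return models["strong"] if verdict == "complex" else models["light"]


# Hypothetical model tiers; the names are placeholders, not real model identifiers.
models = {"light": "small-model", "strong": "large-model"}
events = [
    {"job": "job-a", "error": "OOM"},
    {"job": "job-a", "error": "OOM"},
    {"job": "job-a", "error": "Timeout"},
    {"job": "job-b", "error": "Timeout"},
]
batches = batch(events, size=2)
routes = [route(b, models) for b in batches]
print(routes)
```

Four events become two model calls, and only the batch that spans more than one job is sent to the stronger tier.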