Research and Practice on the Intelligent Transformation of Data Analysis Powered by LLMs and Agents

CONTENTS
1. A New Era of Intelligent Data Analysis
2. Tencent's Intelligent Data Analysis Platform OlaChat: Feature Overview
3. OlaChat Agent Construction
4. OlaChat Key Capabilities
5. Appendix

1. A New Era of Intelligent Data Analysis

Data analysis helps people distill knowledge from data and turn it into wisdom.
The evolution of data analysis platforms: from reporting tools to intelligent BI.
Large language models open up entirely new possibilities for intelligent data analysis. (Zhao, Wayne Xin, et al. "A Survey of Large Language Models.")

Challenges for third-generation data intelligence products
Problems:
1. LLMs themselves have inherent limitations.
2. Current solutions have low accuracy and are difficult to put into production.
3. Current intelligent platforms only cover single-point needs such as data retrieval and data insight.
Challenges:
- C1: How do we design a complete agent/workflow tailored to the data analysis domain that can support intelligent use in industry? (Intelligence)
- C2: How do we enrich the feature set while keeping every feature continuously and iteratively improvable? (Richness)
- C3: How do we integrate these features efficiently into different platforms? (Easy integration, high availability)

OlaChat: capabilities spanning the full BI workflow, improving analysis efficiency from multiple angles.
OlaChat: LLMs and agents working together for intelligent data analysis.
- S1: A human-like-thinking agent improves the efficiency of data insight.

- S2: A toolbox design that provides a rich set of tool capabilities.

- S3: A layered architecture that is highly integrable and easy to extend.

2. Tencent's Intelligent Data Analysis Platform OlaChat: Feature Overview

OlaChat one-stop feature showcase (all data shown are non-business test data, for demonstration only):
- Dialogue system
- text2SQL
- text2DSL
- Intelligent chart drawing
- Intelligent dashboards
- SQL error correction

3. OlaChat Agent Construction

Research on agents and related applications
- Agent categories: General Agent, Multi-Modal Agent, Data Agent, Education Agent, Game Agent, Embodied Agent, Multi-Agent, Evaluation Agent.
- Representative works (2023.3-2023.9): AutoGPT (2023-03), HuggingGPT (2023-03), Dona (2023-03), VisualChatGPT (2023-03), CAMEL (2023-04), Stanford Town (2023-04), Voyager (2023-05), Data-Copilot (2023-05), OlaGPT (2023-05), AgentBench (2023-08), RoboAgent (2023-09), ...

OlaGPT: giving LLMs human problem-solving abilities (theory)
- Theory first: the OlaGPT architecture. OlaGPT studies cognitive architecture frameworks closely and proposes to simulate certain aspects of human cognition. (OlaGPT: Empowering LLMs With Human-like Problem-Solving Abilities; OlaGPT Paper / OlaGPT Code)

S1: A human-like-thinking agent improves data insight efficiency (practice)
- Overall construction framework (Agent).

4. OlaChat Key Capabilities

OlaChat: key capabilities supporting the platform

Intent & dialogue system: intelligent multi-task dialogue, a low-barrier human-computer interaction system
Intent & dialogue system: clarifying intent through interactive communication
Intent & dialogue system: efficient and extensible intent recognition

Scenario: user intent must be fully clarified so that the task type can be identified correctly. Different businesses have different candidate intent sets, and in the early stage there is no labeled data available, so a generic scheme for intent data generation and intent classification is needed.

Data scheme: cold start plus data augmentation (a minimal cold-start sketch follows)
1) Full cold start: based on the question categories and their descriptions, an LLM generates the training data.
2) Data augmentation: model-prediction augmentation; synonym augmentation; query rewriting; back-translation.
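A minimal sketch of the cold-start idea: one intent category plus its short description is enough to bootstrap labeled queries. It assumes a generic call_llm(prompt) -> str completion function (hypothetical; any chat-completion client could back it) and a one-query-per-line output format.

    import json
    from typing import Callable

    def generate_intent_examples(category: str, description: str,
                                 call_llm: Callable[[str], str],
                                 n: int = 20) -> list[dict]:
        """Bootstrap labeled queries for one intent category from its description."""
        prompt = (
            "You are helping build an intent classifier for a data-analysis assistant.\n"
            f"Intent category: {category}\n"
            f"Description: {description}\n"
            f"Write {n} different user queries belonging to this intent, one per line."
        )
        lines = [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]
        return [{"text": line, "label": category} for line in lines[:n]]

    if __name__ == "__main__":
        fake_llm = lambda prompt: "show me last week's DAU\nplot DAU by channel"
        print(json.dumps(generate_intent_examples("data_query", "fetch a metric", fake_llm, n=2),
                         ensure_ascii=False, indent=2))

The augmentation steps (synonym replacement, query rewriting, back-translation) can then be layered on top of this generated seed set.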

Multi-scheme voting: improves accuracy and supports fast online fixes of bad cases (a minimal voting sketch follows this list)
1) Classification models: train large classifiers (BERT, RoBERTa, ERNIE, etc.) over all question types; retrain on classification scores or adapt sub-classification to business needs via strategies; suited to long-term optimization with periodic model updates.
2) Retrieval strategy: recall a candidate set and rank it; targets accuracy on high-frequency questions and fast extensibility.
3) LLM-based: prompt the model to output the intent directly; targets generalization to new question phrasings and fast extensibility.
Module integration: a platform only needs to provide its question types and rough descriptions (optionally with a few example cases) to plug into the pipeline. (Classification scheme diagram.)
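The three routes above run independently, so a simple majority vote can combine them. A minimal sketch, assuming each route is a callable mapping a query to an intent label (classifier, retriever, and llm_router are stand-ins, not OlaChat's actual interfaces):

    from collections import Counter
    from typing import Callable

    Route = Callable[[str], str]

    def vote_intent(query: str, classifier: Route, retriever: Route, llm_router: Route,
                    tie_breaker: str = "retriever") -> str:
        """Majority vote over three intent predictions; ties fall back to one trusted route.

        classifier : a fine-tuned BERT/RoBERTa/ERNIE-style model (long-term accuracy)
        retriever  : nearest labeled question from a recall+rank index (fast bad-case fixes)
        llm_router : an LLM prompted to choose one candidate intent (generalization)
        """
        votes = {"classifier": classifier(query),
                 "retriever": retriever(query),
                 "llm": llm_router(query)}
        label, count = Counter(votes.values()).most_common(1)[0]
        return label if count > 1 else votes[tie_breaker]

    if __name__ == "__main__":
        print(vote_intent("help me fix this SQL",
                          classifier=lambda q: "sql_fix",
                          retriever=lambda q: "sql_fix",
                          llm_router=lambda q: "text2sql"))   # -> sql_fix

Because the retrieval route is just an index of labeled questions, an online bad case can be fixed immediately by inserting the corrected pair, without retraining the classifier.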

Memory extraction: metadata retrieval augmentation (MetaRAG)
- Metadata is a combination of metrics, fields, and dimensions organized in a specific structure and hierarchy, which does not match the basic assumptions of a language model: specific modifiers matter (e.g., "有效"/valid), and field names can be very short (e.g., "播放VV").
- Structured metadata retrieval: given the user's query, determine which "metric + dimension" or which "tables + fields" can produce the data that satisfies the need. Retrieval must be as precise as possible, to reduce the noise passed to the LLM and to avoid fetching data that does not meet the requirement. (A minimal retrieval sketch follows.)
- Memory extraction: metadata retrieval augmentation with FlattenedRAG
- Memory extraction: metadata retrieval augmentation with StructuredRAG
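A minimal sketch of the flattened variant: each metric's structured metadata is serialized into one searchable string and scored against the query. Character overlap stands in for the embedding model a production MetaRAG setup would use, and the MetricMeta fields are illustrative, not OlaChat's real catalog schema.

    from dataclasses import dataclass

    @dataclass
    class MetricMeta:
        metric: str            # e.g. "播放VV" -- names can be very short
        dimensions: list[str]  # e.g. ["日期", "渠道"]
        description: str       # modifiers like "有效" live here and matter

    def flatten(meta: MetricMeta) -> str:
        """FlattenedRAG-style: collapse the structure into one retrievable document."""
        return f"{meta.metric} {' '.join(meta.dimensions)} {meta.description}"

    def overlap_score(query: str, doc: str) -> float:
        """Crude character-overlap relevance; swap in an embedding model in practice."""
        q = set(query)
        return len(q & set(doc)) / (len(q) or 1)

    def retrieve_metadata(query: str, catalog: list[MetricMeta], top_k: int = 2) -> list[MetricMeta]:
        """Pick the few 'metric + dimension' entries worth passing to the LLM."""
        return sorted(catalog, key=lambda m: overlap_score(query, flatten(m)), reverse=True)[:top_k]

    if __name__ == "__main__":
        catalog = [
            MetricMeta("播放VV", ["日期", "渠道"], "有效播放次数"),
            MetricMeta("DAU", ["日期", "国家"], "日活跃用户数"),
        ]
        print([m.metric for m in retrieve_metadata("昨天各渠道的有效播放VV", catalog, top_k=1)])

A StructuredRAG-style variant would, as the name suggests, retrieve over the metric/dimension hierarchy itself and narrow it level by level rather than searching one flat string.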

S2: The toolbox composes rich features, with the SQL matrix as an example

SQL feature matrix
Text2SQL: existing solutions do not fit business scenarios
Text2SQL: a model on its own can only keep approaching the upper bound set by the data
Text2SQL: a semi-automated data construction pipeline
Text2SQL: agents safeguard model robustness (DEA-SQL)

1) Why do we need a workflow paradigm for text-to-SQL tasks? Single-step prompting is limited and has several major drawbacks:
   a) Attention dispersion inside the LLM reduces effectiveness.
   b) The LLM struggles to focus on a specific question within a large amount of text.
2) How do we design a good workflow paradigm for text-to-SQL tasks?
   - Information determination: a two-stage method removes distracting information so attention stays focused.
   - Classification & hint module: questions that cannot be handled generically are split into categories, each solved with its own simple hint.
   - SQL generation module: few-shot examples retrieved by question-template similarity.
   - Self-correction module: driven by a summary of common errors.
   - Active learning module: extends model capability from collected error cases.
   (DEA-SQL Paper / DEA-SQL Code; ACL'24: Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm. A minimal end-to-end sketch of this workflow follows.)

Text2SQL: performance gains from combining data, model, and agent
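A minimal orchestration sketch of the five-stage workflow just described. Each stage is a stand-in callable rather than the paper's actual prompts, and the Text2SQLContext fields are assumptions made for illustration (active learning is omitted here because it runs offline over logged errors).

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class Text2SQLContext:
        question: str
        schema: dict[str, list[str]]                     # table -> columns
        filtered_schema: dict[str, list[str]] = field(default_factory=dict)
        category: str = ""                               # easy / join / nested / join-nested
        hints: str = ""
        sql: str = ""

    Stage = Callable[[Text2SQLContext], None]

    def run_text2sql_workflow(ctx: Text2SQLContext, determine_information: Stage,
                              classify_and_hint: Stage, generate_sql: Stage,
                              self_correct: Stage) -> str:
        """Chain the DEA-SQL-style stages over a shared context object."""
        determine_information(ctx)   # keep only the tables/columns the question needs
        classify_and_hint(ctx)       # pick a category and the matching simple hint
        generate_sql(ctx)            # few-shot prompt built from the filtered context
        self_correct(ctx)            # checklist-based revision of the draft SQL
        return ctx.sql

    if __name__ == "__main__":
        def keep_all(ctx): ctx.filtered_schema = dict(ctx.schema)
        def mark_easy(ctx): ctx.category, ctx.hints = "easy", "single table, no join"
        def draft(ctx): ctx.sql = "SELECT count(*) FROM users"
        def polish(ctx): ctx.sql = ctx.sql.rstrip(";") + ";"
        ctx = Text2SQLContext("how many users do we have", {"users": ["id", "name"]})
        print(run_text2sql_workflow(ctx, keep_all, mark_easy, draft, polish))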

SQL error correction: existing solutions mainly target fully automatic correction of the SQL
(Jipeng Cen, et al. "SQLFixAgent: Towards Semantic-Accurate SQL Generation via Multi-Agent Collaboration"; Arian Askari, et al. "MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL")
- Incomplete coverage: in industry there are many non-generic syntax error cases that need special handling (for example, errors caused by unstable clusters or unsupported SQL dialects); the expectation is to give reasonable suggestions and fixes.
- Data security: relying on closed-source models risks data leakage.

SQL error correction: divide and conquer, with agents improving accuracy (a minimal correction-loop sketch follows)
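A minimal divide-and-conquer sketch: deterministic rules handle the known, non-generic cases first (dialect quirks, cluster-specific failures), and only the remainder is escalated to an LLM repair step. The rules and the call_llm(prompt) -> str function are illustrative assumptions, not OlaChat's actual rule set.

    import re
    from typing import Callable, Optional

    # "Divide": each known non-generic issue gets its own deterministic fixer.
    RULES: list[tuple[str, Callable[[str], Optional[str]]]] = [
        ("engine lacks FULL OUTER JOIN",
         lambda sql: sql.replace("FULL OUTER JOIN", "LEFT JOIN") if "FULL OUTER JOIN" in sql else None),
        ("trailing comma before FROM",
         lambda sql: re.sub(r",(\s+FROM\b)", r"\1", sql, flags=re.I) if re.search(r",\s+FROM\b", sql, re.I) else None),
    ]

    def fix_sql(sql: str, error_msg: str, call_llm: Callable[[str], str]) -> tuple[str, str]:
        """Return (fixed_sql, how_it_was_fixed). Rules first, LLM as the fallback ("conquer")."""
        for name, rule in RULES:
            fixed = rule(sql)
            if fixed is not None and fixed != sql:
                return fixed, f"rule: {name}"
        prompt = (f"The SQL below failed with this error:\n{error_msg}\n\n"
                  f"SQL:\n{sql}\n\n"
                  "Rewrite it so that it runs, changing as little as possible.")
        return call_llm(prompt).strip(), "LLM repair"

    if __name__ == "__main__":
        fixed, how = fix_sql("SELECT a, b, FROM t", "syntax error near FROM", call_llm=lambda p: p)
        print(how, "->", fixed)

Running the fallback against an in-house model rather than a closed-source API is one way to address the data-security concern noted above.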

Intelligent chart drawing: lighter workloads, low-barrier visualization
Intelligent data interpretation: completing the last mile of data analysis
An open, all-in-one intelligent analysis platform architecture
S3: the OlaChat layered architecture

THANKS
Speaker: 谢苑珍 (Xie Yuanzhen), Senior Algorithm Researcher

5. Appendix

Text2SQL Background && Related Work

Background: reduce the technical threshold of data analysis work for colleagues in product, operations, and data analysis roles.

Challenges:
- The LLM has limited understanding of tables, fields, and dimension values (Information Determination: two-stage).
- Single-step CoT capability is limited; how do we design a good workflow paradigm for text2sql tasks? (Workflow Paradigm)
- The capabilities of the LLM are limited; how do we maximize the potential of the model? (Check optimization, active learning)

Related Work
- Supervised learning. Advantages: controllable optimization; information security. Disadvantages: acquiring annotated text-to-SQL data is costly; training and fine-tuning the models entail significant engineering effort and consume substantial computational resources.
- In-context learning. Advantages: faster learning with minimal data; reduced consumption of computing resources. Disadvantages: uncontrollable optimization.

Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm

We propose Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm (DEA-SQL). It draws on human thinking patterns, adheres to the principle of making subtasks as simple as possible, and reduces irrelevant information in each step, specifically to enlarge the solvable scope of the LLM and sharpen its attention, thereby enhancing performance. It consists of five sub-modules imitating the common process humans follow when solving text-to-SQL tasks: Information Determination, Classification & Hint, SQL Generation, Self-Correction, Active Learning.

1) Information Determination: reduce irrelevant information
- Elements identification: identify the elements of the question.
- Information filter: select the required tables and columns based on those elements.
A minimal sketch of this two-stage filtering follows.
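The sketch below simplifies both stages: element identification is reduced to keyword extraction (DEA-SQL uses an LLM for this step), and the filter keeps any table or column that matches an identified element.

    def identify_elements(question: str) -> list[str]:
        """Stage 1 stand-in: pull candidate entities/metrics out of the question."""
        return [w.strip(",?") for w in question.lower().split() if len(w) > 3]

    def filter_schema(elements: list[str], schema: dict[str, list[str]]) -> dict[str, list[str]]:
        """Stage 2: keep only tables and columns related to the identified elements."""
        kept: dict[str, list[str]] = {}
        for table, cols in schema.items():
            hit_cols = [c for c in cols if any(e in c.lower() or c.lower() in e for e in elements)]
            if hit_cols or any(e in table.lower() for e in elements):
                kept[table] = hit_cols or cols   # keep all columns if only the table name matched
        return kept

    if __name__ == "__main__":
        schema = {"orders": ["order_id", "user_id", "amount"],
                  "users": ["user_id", "country"],
                  "suppliers": ["supplier_id", "city"]}
        elements = identify_elements("total order amount per country")
        print(filter_schema(elements, schema))   # suppliers is filtered out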

2) Classification && Hint: local solutions
- Four categories: easy, join, nested, and join-nested.
- Hints: foreign keys, with different hints attached to different categories.
A minimal classification-and-hint sketch follows.
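A minimal sketch of the category-plus-hint idea. The keyword heuristic below is only a stand-in for the LLM-based classification step, and the hint texts are illustrative rather than the paper's prompts.

    NESTED_CUES = ("more than the average", "at least one", "highest", "lowest", "for each")

    def classify_question(question: str, n_tables: int) -> str:
        """Map a question to easy / join / nested / join-nested."""
        nested = any(cue in question.lower() for cue in NESTED_CUES)
        if n_tables > 1:
            return "join-nested" if nested else "join"
        return "nested" if nested else "easy"

    HINTS = {
        "easy": "Use a single table; do not introduce joins.",
        "join": "Join tables only along the foreign keys listed below.",
        "nested": "Consider a subquery, e.g. comparing against an aggregate.",
        "join-nested": "Combine the join hints with a subquery where needed.",
    }

    def build_hint(category: str, foreign_keys: list[str]) -> str:
        """Attach category-specific guidance plus the schema's foreign keys."""
        fk_lines = "\n".join(f"FK: {fk}" for fk in foreign_keys)
        return (HINTS[category] + ("\n" + fk_lines if fk_lines else "")).strip()

    if __name__ == "__main__":
        cat = classify_question("countries with more than the average number of users", n_tables=1)
        print(cat)                       # nested
        print(build_hint(cat, []))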

3) SQL Generation
- The overall question prompt follows the format <F, D, H-FK, I, Q, H, S> shown in Figure 2.
- Few-shot selection: random, question similarity, and template similarity built on top of the Classification module. (A minimal template-similarity retrieval sketch follows.)

Model architecture design: the base pre-trained model is expected to be WizardCoder or SQLCoder.
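A minimal sketch of template-similarity retrieval for the few-shot slot: literals are masked so that structurally similar questions share a template, and difflib stands in for the real similarity ranking over the classification-partitioned example library (the example_bank entries are illustrative).

    import difflib
    import re

    def to_template(question: str) -> str:
        """Mask numbers and quoted values so only the question structure remains."""
        masked = re.sub(r"'[^']*'", "<val>", question.lower())
        return re.sub(r"\d+(\.\d+)?", "<num>", masked)

    def pick_few_shots(question: str, example_bank: list[dict], k: int = 3) -> list[dict]:
        """Rank stored (question, sql) pairs by template similarity and keep the top k."""
        target = to_template(question)
        sim = lambda ex: difflib.SequenceMatcher(None, target, to_template(ex["question"])).ratio()
        return sorted(example_bank, key=sim, reverse=True)[:k]

    if __name__ == "__main__":
        bank = [
            {"question": "top 5 products by sales in 2023", "sql": "SELECT ... LIMIT 5"},
            {"question": "list all suppliers located in 'Paris'", "sql": "SELECT ... WHERE city = 'Paris'"},
        ]
        best = pick_few_shots("top 10 products by sales in 2024", bank, k=1)[0]
        print(best["question"])   # the structurally matching example wins

The selected pairs are then placed into the overall question prompt alongside the filtered schema and hints.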

DEA-SQL: the structure of the method (DEA-SQL Paper / DEA-SQL Code)

4) Self-Correction: global correction
Typical errors the correction step targets:
- Extra fields: the LLM often selects an excessive number of fields rather than limiting its selection to those pertinent to the question.
- Incorrect fields: for instance, when fields bear identical names across different tables, the alias may be omitted, leading to errors.
- Table and field association errors: the tables and fields used may be inconsistent with each other.
- Fabricated conditions for table joins.
- Misuse of join keywords: for example, a tendency to habitually use 'left join' in place of 'join'.
- GROUP BY or ORDER BY errors: mistakes such as incorrect aggregation fields and conditions.
A minimal self-correction sketch follows.
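A minimal sketch of a checklist-driven global correction pass. The checklist restates the error types listed above; call_llm(prompt) -> str is a hypothetical completion function.

    from typing import Callable

    CHECKLIST = (
        "Are extra fields selected that the question does not ask for?",
        "Do same-named fields from different tables carry the correct alias?",
        "Is every field drawn from a table that is actually referenced?",
        "Are all join conditions real relations rather than fabricated ones?",
        "Is 'left join' used out of habit where a plain 'join' is correct?",
        "Are the GROUP BY / ORDER BY aggregation fields and conditions correct?",
    )

    def self_correct(question: str, draft_sql: str, call_llm: Callable[[str], str]) -> str:
        """One global revision pass over the draft SQL, driven by the checklist."""
        checks = "\n".join(f"- {c}" for c in CHECKLIST)
        prompt = (f"Question: {question}\n"
                  f"Draft SQL:\n{draft_sql}\n\n"
                  f"Check the draft against the list below and output only the corrected SQL:\n{checks}")
        return call_llm(prompt).strip()

    if __name__ == "__main__":
        echo_llm = lambda p: "SELECT name FROM users ORDER BY score DESC LIMIT 1"
        print(self_correct("which user has the highest score",
                           "SELECT max(score), name FROM users", echo_llm))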

5) Active Learning: error correction
- The model is more prone to errors on certain problem types (e.g., extremum problems) -> fix a few error cases and let the model learn from them. (A minimal sketch follows.)
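A minimal sketch of the error-case memory behind this step: corrected bad cases are stored, and similar future questions (extremum questions being the typical offenders) retrieve them as extra demonstrations. difflib similarity is a stand-in for a proper retriever.

    import difflib

    class ErrorCaseStore:
        """Corrected bad cases become extra demonstrations for similar future questions."""

        def __init__(self) -> None:
            self.cases: list[dict] = []

        def add(self, question: str, wrong_sql: str, fixed_sql: str) -> None:
            self.cases.append({"question": question, "wrong": wrong_sql, "fixed": fixed_sql})

        def similar(self, question: str, k: int = 2, threshold: float = 0.4) -> list[dict]:
            scored = sorted(
                ((difflib.SequenceMatcher(None, question.lower(), c["question"].lower()).ratio(), c)
                 for c in self.cases),
                key=lambda pair: pair[0], reverse=True)
            return [case for score, case in scored if score >= threshold][:k]

    if __name__ == "__main__":
        store = ErrorCaseStore()
        store.add("which user has the highest score",
                  "SELECT max(score) FROM users",                        # wrong: loses the user
                  "SELECT name FROM users ORDER BY score DESC LIMIT 1")
        print(store.similar("which product has the highest price"))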

DEA-SQL: the structure of the model
DEA-SQL: the prompt of the method

DEA-SQL: Experiments

Research questions:
- RQ1: How does DEA-SQL perform vs. state-of-the-art baselines?
- RQ2: Does each module of our method work effectively, and how does it impact the task?
- RQ3: What is the token and time cost of the method?
- RQ4: How do parameters like the number of few-shot examples or the information filter layers affect the method?

RQ1. How does DEA-SQL perform vs. state-of-the-art baselines?
- In terms of execution accuracy on the Spider dataset, our workflow-based approach outperforms the existing baselines.
- On the Spider-Realistic dataset, which is closer to real-world scenarios and has more difficult question formulations, our approach is more stable and achieves better performance than other LLM-based solutions. This validates the effectiveness of the proposed two-stage information determination, which can mitigate the impact of different question formulations to some extent.

RQ2. Does each module of our method work effectively, and how does it impact the task?
1) The effect of the sub-modules.
- In the information determination module, reducing irrelevant information to focus the attention of the LLM effectively improves performance on complex tasks.
- The results indicate that distinguishing problem types (classification), using simple hints for simple problems and targeted complex hints for complex problems, effectively improves the performance of the LLM.
- The active learning and self-correction modules are designed to raise the capability ceiling of the original base model, but they may to some extent impair the ability to solve otherwise easy questions.

RQ2 (continued). Does each module of our method work effectively, and how does it impact the task?
2) The effect of different few-shot schemes.
- Random selection is slightly detrimental to overall performance, while the retrieval strategy based on question-template similarity over the classification-partitioned retrieval library yields the best results.
- Question templates essentially provide a simple classification of the questions, and relying on question classification makes it easier to find the most relevant questions and their solutions, thereby stimulating the capabilities of LLMs.

RQ3. What is the token and time cost of the method?
- The method consumes comparatively little inference time, which keeps it efficient in real applications.
- In terms of token consumption, the model needs some in-context examples to learn from, which leads to some increase in tokens; even so, the method is still more economical than DIN-SQL.
