




arXiv:2406.14282v3 [cs.CL] 23 Oct 2024

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Junjie Wang 1,2,5*, Mingyang Chen 3*, Binbin Hu 2,5, Dan Yang 2,5, Ziqi Liu 2,5,
Yue Shen 2,5, Peng Wei 2,5, Zhiqiang Zhang 2,5, Jinjie Gu 2,5, Jun Zhou 2,5,
Jeff Z. Pan 4, Wen Zhang 1,5†, Huajun Chen 1,5†
1 Zhejiang University, 2 Ant Group, 3 Baichuan Inc., 4 The University of Edinburgh
5 Zhejiang University-Ant Group Joint Laboratory of Knowledge Graph
{wangjj2018,zhang.wen,huajunsir}@, chenmingyang@
/j.z.pan/
/zjukg/LPKG
Abstract
Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining stepwise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.
1 Introduction
The past few years have witnessed significant innovations in LLMs (Ouyang et al., 2022; Touvron et al., 2023; Chowdhery et al., 2023; AI@Meta, 2024). While LLMs excel in many natural language processing tasks, they still face challenges, particularly the smaller models, in handling complex question-answering (QA) tasks (Press et al., 2023; Shao et al., 2023; Yao et al., 2022; Xiong et al., 2024a; Huang et al., 2024).
* Equal contribution.
† Corresponding authors.

Figure 1: An example of a KG pattern, its grounded instance, and verbalized planning process. Q: What sports have Fluminense and Fran Walsh's spouse played in? Q1: Who is Fran Walsh's spouse? A1: Ans_1. Q2: What sports does {Ans_1} play? A2: Ans_2. Q3: What sports does Fluminense play? A3: Ans_3. Final Answer: A2 & A3.
To improve the performance of LLMs on complex QA tasks, past research has tried various methods: (1) employing carefully designed prompt strategies to guide the model in reasoning, such as the Chain of Thought (CoT) (Kojima et al., 2022; Wei et al., 2022) and Tree of Thought (ToT) (Yao et al., 2024) methods; (2) utilizing retrieval techniques to obtain supplemental information from external knowledge sources (Lewis et al., 2020; Guu et al., 2020); (3) combining prompt strategies with retrieval enhancements, as exemplified by methods like ReAct (Yao et al., 2022) and Self-Ask (Press et al., 2023). The third approach has garnered widespread research interest due to its integration of the advantages of the first two methods. The fundamental idea of this class of methods is to guide LLMs in breaking down a complex question into multiple simpler sub-questions and then use a retrieval-augmented generation (RAG) (Huang et al., 2023, 2024) method to answer each sub-question, thereby deducing the answer to the original complex question. However, planning for complex questions is non-trivial, especially for smaller LLMs (with fewer than 10 billion parameters), which often require supervised fine-tuning (Aksitov et al., 2023; Chen et al., 2023a; Qin et al., 2023).
This raises a pressing issue: how to obtain supervised data for learning to plan over complex questions. Manual annotation is time-consuming and labor-intensive, making it difficult to scale. Most existing methods attempt to distill knowledge from teacher LLMs (Yao et al., 2022; Aksitov et al., 2023), which places excessive trust in the teacher LLMs and, in reality, cannot guarantee the accuracy of the distilled knowledge. These challenges inspire us to explore new ways of obtaining supervised planning data.
Knowledge Graphs (KGs) (Pan et al., 2017b,a) usually store accurate knowledge in a structured way. We find that a KG pattern can be viewed as the abstract of a complex question, as shown in Figure 1, which reveals the connection between question planning and patterns. This opens up the possibility of constructing training data to enhance the planning capabilities of LLMs using KGs. Specifically, we start by grounding predefined patterns in an open-domain KG to extract numerous instances, which we then verbalize into complex questions and corresponding sub-questions in natural language. In this way, we effectively create a large amount of accurate planning data for fine-tuning. After being fine-tuned with this planning data, LLMs generate better plans for complex questions, which in turn yield better final answers once the plans are parsed and executed. We refer to this innovative framework as Learning to Plan from Knowledge Graphs (LPKG).
Additionally, we construct a Comprehensive Logical QA benchmark, CLQA-Wiki, from a subset of Wikidata (Vrandecic and Krötzsch, 2014) by grounding rich patterns as aforementioned. Existing complex QA benchmarks (Yang et al., 2018; Ho et al., 2020; Press et al., 2023; Trivedi et al., 2022) primarily focus on multi-hop and comparison-type questions and lack logical operations. Furthermore, most questions are labeled with only one answer, whereas in reality they often have multiple correct answers. The CLQA-Wiki benchmark evenly covers multi-hop, comparison, intersection, and union types of questions, which makes it more comprehensive and challenging for complex QA evaluation.
Our contributions can be summarized as follows: (1) We introduce a novel framework, LPKG, that enhances the planning ability of LLMs using data constructed from KG patterns; (2) We develop a comprehensive and challenging evaluation benchmark, named CLQA-Wiki, to more effectively assess the performance of LLMs on complex QA tasks; (3) Our proposed framework LPKG achieves better results than popular baselines on multiple conventional complex QA benchmarks, and we verify the effectiveness of introducing KG-sourced planning data.
2 Related Works
Reasoning and Planning with LLMs  In the context of LLMs, reasoning typically involves decomposing complex questions into sub-questions (Mialon et al., 2023; Hao et al., 2023). Prominent techniques include Chain-of-Thought (CoT) prompting (Wei et al., 2022), which elicits rationales that lead to the final answers, and its extensions using self-consistency (Wang et al., 2023) or automated demonstration selection (Zhang et al., 2023). Other methods, such as ReAct (Yao et al., 2022), generate reasoning steps sequentially by integrating planning, with additional strategies like Tree of Thoughts (ToT) (Yao et al., 2024), Reasoning via Planning (RAP) (Hao et al., 2023), and other methods (Khot et al., 2023; Zhou et al., 2023) facilitating complex question decomposition through varied planning approaches. Unlike most methods that rely on in-context learning through prompt engineering, our approach generates planning data from KGs to fine-tune LLMs, thereby enhancing their planning capabilities.
Retrieval-Augmented Generation  Retrieval-Augmented Generation (RAG) can enhance LLMs by incorporating external data, allowing models to access up-to-date information and factual knowledge to mitigate hallucinations (Gao et al., 2023; Guu et al., 2020; Lewis et al., 2020). Each module in the RAG pipeline can be optimized, for instance, through retriever tuning (Shi et al., 2023; Lin et al., 2023), self-reflection during retrieval (Asai et al., 2023; Yan et al., 2024), or query refinement (Chan et al., 2024). To address multi-hop questions, iterative RAG models (Shao et al., 2023; Feng et al., 2023; Press et al., 2023) have been developed, which iteratively conduct retrieval-enhanced generation and generation-enhanced retrieval. However, the multiple RAG steps in existing methods are not optimized and rely heavily on in-context learning. Our approach uses planning data from KGs to facilitate more efficient RAG.
LLMs with KGs  In the existing realm of LLMs, KGs are primarily utilized as sources of structured factual knowledge (Pan et al., 2023). For example, Think-on-Graph (Sun et al., 2023) extracts relevant triples from KGs to assist in QA. Reasoning on Graph (RoG) (Luo et al., 2023) generates relation-based plans and retrieves corresponding paths from these graphs. While aiding in KGQA tasks where answers are directly sourced from KGs, these graphs also support rationale generation. Chain-of-Knowledge (CoK) (Li et al., 2024) further leverages KGs along with other heterogeneous sources to generate faithful rationales. Unlike previous studies, our approach constructs planning data for complex questions from KGs, recognizing that patterns within KGs inherently represent multi-step plans. This data is utilized to enhance the planning capabilities of LLMs.

Figure 2: Overview of our Learning to Plan from Knowledge Graph (LPKG) framework. Step 1: Data Construction (grounding KG patterns, verbalization, filling planning templates); Step 2: Planning LLM Tuning and Inference (SFT and inference over code-formatted plans with Search, Get_Answer, Intersection, and Finish_The_Plan steps); Step 3: Plan Parsing and Execution (retrieval, QA LLM, set operations, final answer).
Complex Logical Query in KGs  Recent research on complex logic queries in KGs primarily focuses on first-order logical (FOL) queries that incorporate operations like conjunctions, disjunctions, negation, and existential quantifiers within incomplete KGs (Hamilton et al., 2018; Ren et al., 2020; Ren and Leskovec, 2020; Arakelyan et al., 2021; Chen et al., 2022; Xu et al., 2022; Xiong et al., 2024b; Wu et al., 2024). These works define diverse patterns to assess the capability of logical operations in vector spaces, specifically targeting logical forms rather than natural language. Nonetheless, their methodologies for pattern definition and extraction inspire our approach to deriving complex questions from KGs.
3 Method

3.1 Overview
As shown in Figure 2, there are three steps in our Learning to Plan from Knowledge Graphs (LPKG) framework. (1) In the data construction step, we construct planning data from KGs. Specifically, we define some basic KG patterns, as shown in Figure 3. We ground the patterns in an existing KG to extract instances. For each extracted instance, we sequentially verbalize the sub-queries within the instance into natural language sub-questions according to their order in the instance, eventually assembling them into a complex question. Afterward, we build input and output templates for the planning data, where complex questions are concatenated to the input prompt and sub-questions are filled into the corresponding positions in the output text according to the type of pattern. (2) In the planning LLM tuning and inference step, we fine-tune LLMs on this planning data so that they can follow instructions to infer the plan for each question in the downstream test sets. (3) In the third step, the plan is parsed and executed, thereby obtaining the final answer to each question.
3.2 Construction of Planning Data
Basic KG Patterns. Inspired by previous work on complex logic queries within KGs (Ren and Leskovec, 2020), we define the basic KG patterns shown in Figure 3. The set of KG patterns is denoted as P = {1p, 2p, 3p, 2i, 3i, 2u, ip, pi, compare}. Specifically, p, i, and u respectively indicate projection, intersection, and union. 1p, 2p, and 3p represent queries that span from one to three hops; 2i and 3i respectively represent the intersection of two and three sub-queries; 2u represents the union of two sub-queries; and ip and pi represent complex queries that combine two-hop queries with intersection logic. In addition, we also combine pairs of triples that have numeric tail entities and the same relations to construct comparison patterns, denoted as compare.

Figure 3: Basic KG patterns (1p, 2p, 3p, 2u, 2i, 3i, pi, ip, compare).
Grounding. Given a KG, we first ground these patterns in it to extract instances:

$$I_{pat} = f_{pat}(\mathrm{KG}), \quad pat \in \mathcal{P} \tag{1}$$

where $I_{pat}$ denotes the instances of pattern $pat$ grounded in the knowledge graph $\mathrm{KG}$, and $f_{pat}$ is the corresponding extraction function. For example, an instance of the 2p pattern can be "(Inkheart, (cast member, educated at))". To best meet the needs of open-domain QA, we use Wikidata15k (Chen et al., 2023b), a subset of the open-domain KG Wikidata, as $\mathrm{KG}$.
Verbalization. Subsequently, based on the grounded instances, we need to verbalize them bottom-up into sub-questions and assemble them into complex questions. There are several options for this step, such as a template-based method, manual annotation, or utilizing an LLM. Since the template-based approach often lacks fluency in language expression, and the manual method is time-consuming and labor-intensive, we opt for an LLM-based method. Specifically, we write a small number of verbalization examples for each pattern type. These examples are used as demonstrations $De_1$ to fill into the prompt. Finally, we concatenate a grounded instance $i \in I_{pat}$ to the prompt, asking an LLM to verbalize it into a natural language question:

$$\{\{Q_s^n\}_{n=1}^{N}, Q_c\} = \mathrm{llm}(\mathrm{concat}(De_1, i)) \tag{2}$$

where $\{Q_s^n\}_{n=1}^{N}$ and $Q_c$ represent the resulting sub-questions and complex question respectively, and $\mathrm{concat}$ is string-level concatenation. We use GPT-4 as $\mathrm{llm}$ here. It is important to note that the $\mathrm{llm}$'s role is merely to transform the data format; the sub-questions and complex question still originate from the structure of the KG itself, without introducing any knowledge from the $\mathrm{llm}$ into the task of question planning. The prompt we use can be found in Appendix C.1.
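As a rough illustration of Equation (2), the prompt can be assembled as below; the wording and the commented-out GPT-4 call are assumptions for illustration only (the actual prompt is given in Appendix C.1).

```python
def build_verbalization_prompt(demonstrations: str, instance) -> str:
    """Concatenate the hand-written demonstrations De_1 with a grounded instance,
    mirroring concat(De_1, i) in Equation (2). Wording here is illustrative."""
    return (
        f"{demonstrations}\n"
        "## Your Turn ##\n"
        f"Instance: {instance}\n"
        "Verbalize this instance into natural-language sub-questions and one complex question.\n"
    )

# A hypothetical GPT-4 client would then be called on the prompt, e.g.:
# output = call_gpt4(build_verbalization_prompt(De_1, ("Inkheart", ("cast member", "educated at"))))
```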
Filling. We then extract sub-questions and complex questions from the output of the $\mathrm{llm}$. Subsequently, we build a set of planning templates $T_{pat}$ for the planning process of questions corresponding to each pattern. The $\{Q_s^n\}_{n=1}^{N}$ obtained in the previous step are filled into fixed positions in $T_{pat}$ corresponding to their pattern type, thereby obtaining the output for training. The $Q_c$ obtained in the previous step is concatenated to the end of a fixed instruction $Ins$ and some planning demonstrations $De_2$ (also constructed from KGs), thus obtaining the input of the training data:

$$x = \mathrm{concat}(Ins, De_2, Q_c) \tag{3}$$

$$y = T_{pat}.\mathrm{fill}(\{Q_s^n\}_{n=1}^{N}), \quad pat \in \mathcal{P} \tag{4}$$

where $\mathrm{.fill}$ is the filling function of the templates $T_{pat}$. Inspired by Aksitov et al. (2023), we use a code-formatted input $x$ and output $y$ here (shown as "Input" and "Output" in Figure 2) to facilitate formatting and the subsequent parsing and execution of the output plan (more details in Appendix C.2). In the end, we obtain 9,000 training data entries $D_{train} = \{x_n, y_n\}$, with 1,000 entries for each pattern. We randomly select 100 items from the training set for manual verification, with an accuracy rate of over 95%.
3.3 Fine-tuning and Inference of Planning LLMs
We use the obtained training data $D_{train}$ to fine-tune the planning LLM $M_p$ directly with the standard next-token training objective:

$$\mathbb{E}_{(x,y) \in D_{train}} \log p_{M_p}(y \mid x) \tag{5}$$
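A minimal sketch of the objective in Equation (5), assuming tokenized (x, y) pairs and masking the prompt tokens so that only the plan y contributes to the loss; the token ids and random logits below merely stand in for a real tokenizer and the planning LLM.

```python
import torch
import torch.nn.functional as F

x_ids = torch.tensor([11, 42, 7, 99])      # stand-in prompt tokens (Ins, De_2, Q_c)
y_ids = torch.tensor([5, 23, 88, 2])       # stand-in plan tokens (the target y)

input_ids = torch.cat([x_ids, y_ids]).unsqueeze(0)   # (1, T)
labels = input_ids.clone()
labels[:, : x_ids.size(0)] = -100                    # ignore prompt positions in the loss

vocab_size = 32000
logits = torch.randn(1, input_ids.size(1), vocab_size)  # would come from M_p

# Standard next-token objective: predict token t+1 from tokens <= t.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
```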
The fine-tuned planning LLM $M_p$ can then be used to infer the plan $P$ for each question $Q_{test}$ in the downstream test set:

$$P = M_p(\mathrm{concat}(Ins, De_2, Q_{test})) \tag{6}$$

where $Ins$ and $De_2$ are the same as in Equation (3). It should be noted that in multi-hop questions, the specific sub-questions in the second and third hops need to be constructed based on the answers to the previous hop's sub-questions. Since our $P$ outputs all steps at once, $M_p$ cannot know the answers to the previous hop's sub-questions when outputting the plan. Therefore, we use a placeholder in place of the answer to the previous hop's sub-questions, allowing the planning to proceed smoothly (as shown in Tables 9, 10, 13, and 14 in Appendix C.1). These placeholders are then filled in during the subsequent parsing and execution process.

Table 1: Distribution of CLQA-Wiki.
Type           Count    Type               Count
2p question    200      3p question        200
2i question    200      3i question        200
ip question    50       pi question        50
2u question    200      compare question   100
3.4 Plan Parsing and Execution

The obtained plan $P$ needs to be parsed and executed to obtain the final answer to $Q_{test}$. Because we adopt code-formatted input and output for fine-tuning $M_p$, the plan $P$ is also highly formatted code, which facilitates parsing each step of the plan and executing it. In particular:

• When a step includes a "Search" function, we call an external retrieval tool.

• When a step includes a "Get_Answer" function, we invoke an external QA LLM $M_{QA}$ to answer the sub-question based on the retrieved information. Any placeholders in the sub-question are filled with previous answers. We ask the QA LLM to organize answers in the form of a list (the prompt is shown in Table 7 in Appendix C.3).

• When "Intersection" or "Union" appears in a step, we run the actual intersection or union functions. This is easy to do because the answers from the previous steps are in list format.

It is important to note that the planning LLM $M_p$ and the QA LLM $M_{QA}$ are completely decoupled in our framework, so any off-the-shelf LLM can handle the QA task. Ultimately, we obtain the answer to $Q_{test}$.
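A minimal sketch of such a parse-and-execute loop, assuming the plan format shown in Figure 2; the `search` and `qa_llm` callables stand in for the retrieval tool and the QA LLM $M_{QA}$, and their signatures are assumptions rather than the paper's actual interfaces.

```python
import re

def execute_plan(plan: str, search, qa_llm):
    """Parse a code-formatted plan line by line and execute it.
    `search` and `qa_llm` stand in for the external retrieval tool and the QA
    LLM M_QA; their call signatures here are assumptions, not the paper's API."""
    env = {}  # step name -> resolved value

    def fill(match):  # replace {Ans_i} placeholders with earlier answers
        val = env.get(match.group(1), match.group(0))
        return ", ".join(val) if isinstance(val, list) else str(val)

    for line in plan.strip().splitlines():
        if "=" not in line:
            continue
        name, expr = [s.strip() for s in line.split("=", 1)]
        name = name.split(":")[0].strip()              # drop the ":str" annotation
        if expr.startswith(('"', "'", 'f"', "f'")):    # a (possibly templated) sub-question
            raw = expr[1:] if expr[0] == "f" else expr
            env[name] = re.sub(r"\{(\w+)\}", fill, raw[1:-1])
        elif expr.startswith("Search("):
            env[name] = search(env[re.search(r"query=(\w+)", expr).group(1)])
        elif expr.startswith("Get_Answer("):
            question = env[re.search(r"query=(\w+)", expr).group(1)]
            info = env[re.search(r"info=(\w+)", expr).group(1)]
            env[name] = qa_llm(question, info)         # assumed to return a list of answers
        elif expr.startswith(("Intersection(", "Union(")):
            a, b = (env[v] for v in re.findall(r"Answer\d=(\w+)", expr))
            op = set.intersection if expr.startswith("Intersection(") else set.union
            env[name] = sorted(op(set(a), set(b)))
        elif expr.startswith("Finish_The_Plan("):
            env[name] = env[re.search(r"Answer=(\w+)", expr).group(1)]
    return env.get("Final_Answer")
```

For instance, `execute_plan(plan_text, search=my_retriever, qa_llm=my_qa)` would return the value bound to Final_Answer after all Search, Get_Answer, and set-operation steps have been resolved.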
4 New Benchmark: CLQA-Wiki

Conventional complex QA datasets include HotPotQA (Yang et al., 2018), 2WikiMultihopQA (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Despite their widespread use in evaluating the QA performance of language models, we identify some problems with these datasets:
(1) All these datasets are primarily focused on multi-hop and comparison-type questions. The types of questions are not balanced and comprehensive enough, and little attention is paid to questions involving intersection and union logic, which are also very common in reality.

(2) Except for MuSiQue, the questions in the other three datasets only have one answer, whereas many questions in reality have multiple answers. For example, the answer to the intersection question "Which country borders with Russia and China at the same time?" is a set [Mongolia, Kazakhstan, North Korea].
In light of this, we aim to construct a new testing benchmark that embodies more comprehensive logic and allows for an unrestricted number of answers, so as to more thoroughly evaluate the performance of language models on various logical questions. Considering the detailed pattern structures and unrestricted number of answer entities in KGs, we construct the test set based on Wikidata15k.

Similar to the method used to construct the planning data, we extract instances from Wikidata15k (which do not appear in the training data) and use GPT-4 for verbalization. Moreover, for each instance, we can obtain all the answer entities from Wikidata15k, which we then designate as the answers to the question. After manual quality checks, we obtain a test set called CLQA-Wiki, which contains 1,200 pieces of data featuring a variety of Comprehensive Logical QA pairs. The question types and their distribution are listed in Table 1. It is worth noting that we have constructed nine types of test questions so far; for newly defined patterns, we can also quickly construct corresponding questions using the above method, demonstrating the scalability of our dataset.
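As an illustration of how full answer sets fall out of the KG structure, the sketch below computes the answers to the intersection question from the example above directly from toy triples; the relation name and triples are illustrative, not actual Wikidata15k content.

```python
triples = [
    ("Mongolia", "shares border with", "Russia"),
    ("Mongolia", "shares border with", "China"),
    ("Kazakhstan", "shares border with", "Russia"),
    ("Kazakhstan", "shares border with", "China"),
    ("North Korea", "shares border with", "Russia"),
    ("North Korea", "shares border with", "China"),
    ("Finland", "shares border with", "Russia"),
]

def neighbors_of(entity, relation, kg):
    """Heads connected to `entity` via `relation`."""
    return {h for h, r, t in kg if r == relation and t == entity}

# 2i-style answer set: countries bordering both Russia and China.
answers = neighbors_of("Russia", "shares border with", triples) & \
          neighbors_of("China", "shares border with", triples)
# -> {"Mongolia", "Kazakhstan", "North Korea"}
```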
5 Experiment

We aim to answer the following research questions in our experiments:

• RQ1: Can LPKG outperform baseline methods on conventional complex QA datasets?

• RQ2: Can planning data derived from KGs help improve the planning ability of LLMs?

• RQ3: Can planning data derived from KGs be more helpful in improving LLMs' planning ability than normal distillation methods?

• RQ4: Can LPKG outperform baseline methods on the new benchmark CLQA-Wiki?
5.1 Experimental Settings
Datasets  We first conduct experiments on four conventional complex QA datasets: HotPotQA (Yang et al., 2018), 2WikiMultiHopQA (2WikiMQA) (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Among them, HotPotQA, 2WikiMQA, and MuSiQue contain complete training, development, and test sets, while Bamboogle is a small dataset with only 125 test examples. Similar to previous methods (Shao et al., 2023; Aksitov et al., 2023), we extract the first 500 entries from the development sets of HotPotQA and 2WikiMQA. For MuSiQue, we follow Press et al. (2023) and use only the 2-hop questions in the development set. For Bamboogle, we use all of its data as test data. Finally, we conduct testing on our benchmark CLQA-Wiki.
Baselines  We compare our framework to various baselines:

• Direct: Directly input the original question into the LLM.

• CoT: Following Kojima et al. (2022), we instruct the LLM to first "think step by step" and then give the final answer.

• Direct RAG: The prompt sent to the LLM contains the original question and retrieved information related to the original question.

• ReAct (Yao et al., 2022): Answers questions through iterative planning, action, and observation. The action here is the retrieval tool, and the observation is the retrieved information. Planning and QA are conducted by a single LLM.

• Self-Ask (Press et al., 2023): Similar to ReAct, it first instructs the LLM to judge whether sub-questions are needed. If so, it requests the LLM to generate the sub-questions, then conducts external retrieval based on the sub-questions, and lets the LLM provide answers based on the retrieved information.

• ICL-LPKG: A variant of the LPKG framework in which the planning LLM is not fine-tuned but instead uses in-context learning to plan with some KG-sourced planning demonstrations.
Evaluation Metrics  Exact Match (EM) is used as the evaluation metric on HotPotQA, 2WikiMQA, Bamboogle, and MuSiQue, while on CLQA-Wiki we use Recall and Precision.
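A minimal sketch of the two metric families, assuming simple lower-casing and whitespace normalization; the paper's exact normalization rules are not specified here, so treat the details below as assumptions.

```python
def exact_match(prediction: str, gold: str) -> float:
    """EM for the single-answer datasets (HotPotQA, 2WikiMQA, MuSiQue, Bamboogle)."""
    norm = lambda s: " ".join(s.lower().split())
    return float(norm(prediction) == norm(gold))

def recall_precision(predicted: list, gold: list) -> tuple:
    """Set-level Recall and Precision for CLQA-Wiki's multi-answer questions."""
    pred = {p.lower().strip() for p in predicted}
    gold_set = {g.lower().strip() for g in gold}
    hits = len(pred & gold_set)
    recall = hits / len(gold_set) if gold_set else 0.0
    precision = hits / len(pred) if pred else 0.0
    return recall, precision
```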
Implementation Details  All baselines are run with gpt-3.5-turbo-1106 (GPT-3.5). We wrote the prompts for "Direct", "CoT", and "Direct RAG" ourselves. ReAct and Self-Ask are replicated based on their source code with the GPT-3.5 API. To facilitate assessment, we ask the model to output only concise answer phrases. In our framework: (1) For pattern grounding, we use Wikidata15k as the KG, which contains about 15k entities and 263 relations. The extraction tool used in grounding is modified from existing work (Ren and Leskovec, 2020). (2) For the planning LLM $M_p$, we choose CodeQwen1.5-7B-Chat and Llama3-8B-Instruct; one excels at coding while the other excels at commonsense reasoning. We fine-tune them with LoRA tuning, running on 4x 80G A100 GPUs for about 3 hours. The fine-tuning is conducted for 2 epochs, with a learning rate of 5e-5 and a cosine learning rate scheduler. (3) For retrieval, following previous works (Shao