




arXiv:2406.14282v3 [cs.CL] 23 Oct 2024

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Junjie Wang 1,2,5*, Mingyang Chen 3*, Binbin Hu 2,5, Dan Yang 2,5, Ziqi Liu 2,5,
Yue Shen 2,5, Peng Wei 2,5, Zhiqiang Zhang 2,5, Jinjie Gu 2,5, Jun Zhou 2,5,
Jeff Z. Pan 4, Wen Zhang 1,5†, Huajun Chen 1,5†
1 Zhejiang University, 2 Ant Group, 3 Baichuan Inc., 4 The University of Edinburgh
5 Zhejiang University-Ant Group Joint Laboratory of Knowledge Graph
{wangjj2018,zhang.wen,huajunsir}@, chenmingyang@
/j.z.pan/
/zjukg/LPKG
Abstract
Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining stepwise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.
1 Introduction
The past few years have witnessed significant innovations in LLMs (Ouyang et al., 2022; Touvron et al., 2023; Chowdhery et al., 2023; AI@Meta, 2024). While LLMs excel in many natural language processing tasks, they still face challenges, particularly the smaller models, in handling complex question-answering (QA) tasks (Press et al., 2023; Shao et al., 2023; Yao et al., 2022; Xiong et al., 2024a; Huang et al., 2024).
* Equal contribution.
† Corresponding authors.

Figure 1: An example of a KG pattern, its grounded instance, and verbalized planning process. Q: What sports have Fluminense and Fran Walsh's spouse played in? Q1: Who is Fran Walsh's spouse? A1: Ans_1. Q2: What sports does {Ans_1} play? A2: Ans_2. Q3: What sports does Fluminense play? A3: Ans_3. Final Answer: A2 & A3.
To improve the performance of LLMs on complex QA tasks, past research has tried various methods: (1) employing carefully designed prompt strategies to guide the model in reasoning, such as the Chain of Thought (CoT) (Kojima et al., 2022; Wei et al., 2022) and Tree of Thought (ToT) (Yao et al., 2024) methods; (2) utilizing retrieval techniques to obtain supplemental information from external knowledge sources (Lewis et al., 2020; Guu et al., 2020); (3) combining prompt strategies with retrieval enhancements, as exemplified by methods like ReAct (Yao et al., 2022) and Self-Ask (Press et al., 2023). The third approach has garnered widespread research interest due to its integration of the advantages of the first two methods. The fundamental idea of this class of methods is to guide LLMs in breaking down a complex question into multiple simpler sub-questions and then use a retrieval-augmented generation (RAG) (Huang et al., 2023, 2024) method to answer each sub-question, thereby deducing the answer to the original complex question. However, planning for complex questions is non-trivial, especially for smaller LLMs (with fewer than 10 billion parameters), which often require supervised fine-tuning (Aksitov et al., 2023; Chen et al., 2023a; Qin et al., 2023).
This raises a pressing issue: how to obtain supervised data for learning to plan over complex questions. Manual annotation is time-consuming and labor-intensive, making it difficult to scale. Most existing methods attempt to distill knowledge from teacher LLMs (Yao et al., 2022; Aksitov et al., 2023), which places excessive trust in the teacher LLMs and, in reality, cannot guarantee the accuracy of the distilled knowledge. These challenges inspire us to explore new ways of obtaining supervised planning data.
Knowledge Graphs (KGs) (Pan et al., 2017b,a) usually store accurate knowledge in a structured way. We find that a KG pattern can be viewed as the abstract of a complex question, as shown in Figure 1, which reveals the connection between question planning and patterns. This opens up the possibility of constructing training data to enhance the planning capabilities of LLMs using KGs. Specifically, we start by grounding predefined patterns in an open-domain KG to extract numerous instances, which we then verbalize into complex questions and corresponding sub-questions in natural language. In this way, we effectively create a large amount of accurate planning data for fine-tuning. After being fine-tuned with this planning data, LLMs generate better plans for complex questions, which in turn yield better final answers once the plans are parsed and executed. We refer to this innovative framework as Learning to Plan from Knowledge Graphs (LPKG).
Additionally, we construct a Comprehensive Logical QA benchmark, CLQA-Wiki, from a subset of Wikidata (Vrandecic and Krötzsch, 2014) by grounding rich patterns as aforementioned. Existing complex QA benchmarks (Yang et al., 2018; Ho et al., 2020; Press et al., 2023; Trivedi et al., 2022) primarily focus on multi-hop and comparison-type questions and lack logical operations. Furthermore, most questions are labeled with only one answer, whereas in reality they often have multiple correct answers. The CLQA-Wiki benchmark evenly covers multi-hop, comparison, intersection, and union types of questions, which makes it more comprehensive and challenging for complex QA evaluation.
Our contributions can be summarized as follows: (1) We introduce a novel framework, LPKG, that enhances the planning ability of LLMs using data constructed from KG patterns; (2) We develop a comprehensive and challenging evaluation benchmark, named CLQA-Wiki, to more effectively assess the performance of LLMs on complex QA tasks; (3) Our proposed framework LPKG achieves better results than popular baselines on multiple conventional complex QA benchmarks, and we verify the effectiveness of introducing KG-sourced planning data.
2 Related Works
Reasoning and Planning with LLMs  In the context of LLMs, reasoning typically involves decomposing complex questions into sub-questions (Mialon et al., 2023; Hao et al., 2023). Prominent techniques include Chain-of-Thought (CoT) prompting (Wei et al., 2022), which elicits rationales that lead to the final answers, and its extensions using self-consistency (Wang et al., 2023) or automated demonstration selection (Zhang et al., 2023). Other methods, such as ReAct (Yao et al., 2022), generate reasoning steps sequentially by integrating planning, with additional strategies like Tree of Thoughts (ToT) (Yao et al., 2024), Reasoning via Planning (RAP) (Hao et al., 2023), and other methods (Khot et al., 2023; Zhou et al., 2023) facilitating complex question decomposition through varied planning approaches. Unlike most methods that rely on in-context learning through prompt engineering, our approach generates planning data from KGs to fine-tune LLMs, thereby enhancing their planning capabilities.
Retrieval-Augmented Generation  Retrieval-Augmented Generation (RAG) can enhance LLMs by incorporating external data, allowing models to access up-to-date information and factual knowledge to mitigate hallucinations (Gao et al., 2023; Guu et al., 2020; Lewis et al., 2020). Each module in the RAG pipeline can be optimized, for instance, through retriever tuning (Shi et al., 2023; Lin et al., 2023), self-reflection during retrieval (Asai et al., 2023; Yan et al., 2024), or query refinement (Chan et al., 2024). To address multi-hop questions, iterative RAG models (Shao et al., 2023; Feng et al., 2023; Press et al., 2023) have been developed, which iteratively conduct retrieval-enhanced generation and generation-enhanced retrieval. However, the multiple RAG steps in existing methods are not optimized and rely heavily on in-context learning. Our approach uses planning data from KGs to facilitate more efficient RAG.
LLMs with KGs  In the existing realm of LLMs, KGs are primarily utilized as sources of structured factual knowledge (Pan et al., 2023). For example, Think-on-Graph (Sun et al., 2023) extracts relevant triples from KGs to assist in QA. Reasoning on Graph (RoG) (Luo et al., 2023) generates relation-based plans and retrieves corresponding paths from these graphs. While aiding in KGQA tasks where answers are directly sourced from KGs, these graphs also support rationale generation. Chain-of-Knowledge (CoK) (Li et al., 2024) further leverages KGs along with other heterogeneous sources to generate faithful rationales. Unlike previous studies, our approach constructs planning data for complex questions from KGs, recognizing that patterns within KGs inherently represent multi-step plans. This data is utilized to enhance the planning capabilities of LLMs.

Figure 2: Overview of our Learning to Plan from Knowledge Graph (LPKG) framework. Step 1: Data Construction (grounding KG patterns, verbalization, filling planning templates); Step 2: Planning LLM Tuning and Inference (SFT and inference over code-formatted plans with Search, Get_Answer, Intersection, and Finish_The_Plan steps); Step 3: Plan Parsing and Execution (retrieval, QA LLM, set operations, final answer).
Complex Logical Query in KGs  Recent research on complex logic queries in KGs primarily focuses on first-order logical (FOL) queries that incorporate operations like conjunctions, disjunctions, negation, and existential quantifiers within incomplete KGs (Hamilton et al., 2018; Ren et al., 2020; Ren and Leskovec, 2020; Arakelyan et al., 2021; Chen et al., 2022; Xu et al., 2022; Xiong et al., 2024b; Wu et al., 2024). These works define diverse patterns to assess the capability of logical operations in vector spaces, specifically targeting logical forms rather than natural language. Nonetheless, their methodologies for pattern definition and extraction inspire our approach to deriving complex questions from KGs.
3 Method

3.1 Overview
As shown in Figure 2, there are three steps in our Learning to Plan from Knowledge Graphs (LPKG) framework. (1) In the data construction step, we construct planning data from KGs. Specifically, we define some basic KG patterns, as shown in Figure 3. We ground the patterns in an existing KG to extract instances. For each extracted instance, we sequentially verbalize the sub-queries within the instance into natural language sub-questions according to their order in the instance, eventually assembling them into a complex question. Afterward, we build input and output templates for the planning data, where complex questions are concatenated to the input prompt and sub-questions are filled into the corresponding positions in the output text according to the type of pattern. (2) In the planning LLM tuning and inference step, we fine-tune LLMs on this planning data so that they can follow instructions to infer the plan for each question in the downstream test sets. (3) In the third step, the plan is parsed and executed, thereby obtaining the final answer to each question.
3.2 Construction of Planning Data
Basic KG Patterns. Inspired by previous work on complex logic queries within KGs (Ren and Leskovec, 2020), we define the basic KG patterns shown in Figure 3. The set of KG patterns is denoted as P = {1p, 2p, 3p, 2i, 3i, 2u, ip, pi, compare}. Specifically, p, i, and u respectively indicate projection, intersection, and union. 1p, 2p, and 3p represent queries that span from one to three hops; 2i and 3i respectively represent the intersection of two and three sub-queries; 2u represents the union of two sub-queries; and ip and pi represent complex queries that combine two-hop queries with intersection logic. In addition, we also combine pairs of triples that have numeric tail entities and the same relations to construct comparison patterns, denoted as compare.

Figure 3: Basic KG patterns (1p, 2p, 3p, 2u, 2i, 3i, pi, ip, compare).
Grounding. Given a KG, we first ground these patterns in it to extract instances:

$$I_{pat} = f_{pat}(\mathrm{KG}), \quad pat \in \mathcal{P} \tag{1}$$

where $I_{pat}$ denotes the instances of pattern $pat$ grounded in the knowledge graph $\mathrm{KG}$, and $f_{pat}$ is the corresponding extraction function. For example, an instance of the 2p pattern can be "(Inkheart, (cast member, educated at))". To best meet the needs of open-domain QA, we use Wikidata15k (Chen et al., 2023b), a subset of the open-domain KG Wikidata, as $\mathrm{KG}$.
Verbalization. Subsequently, based on the grounded instances, we need to verbalize them bottom-up into sub-questions and assemble them into complex questions. There are several options for this step, such as a template-based method, manual annotation, or utilizing an LLM. Since the template-based approach often lacks fluency in language expression, and the manual method is time-consuming and labor-intensive, we opt for an LLM-based method. Specifically, we write a small number of verbalization examples for each pattern type. These examples are used as demonstrations $De_1$ to fill into the prompt. Finally, we concatenate a grounded instance $i \in I_{pat}$ to the prompt, asking an LLM to verbalize it into a natural language question:

$$\{\{Q_s^n\}_{n=1}^{N}, Q_c\} = \mathrm{llm}(\mathrm{concat}(De_1, i)) \tag{2}$$

where $\{Q_s^n\}_{n=1}^{N}$ and $Q_c$ represent the resulting sub-questions and complex question respectively, and $\mathrm{concat}$ is string-level concatenation. We use GPT-4 as $\mathrm{llm}$ here. It is important to note that the $\mathrm{llm}$'s role is merely to transform the data format; the sub-questions and complex question still originate from the structure of the KG itself, without introducing any knowledge from the $\mathrm{llm}$ into the task of question planning. The prompt we use can be found in Appendix C.1.
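As a rough illustration of Equation (2), the prompt can be assembled as below; the wording and the commented-out GPT-4 call are assumptions for illustration only (the actual prompt is given in Appendix C.1).

```python
def build_verbalization_prompt(demonstrations: str, instance) -> str:
    """Concatenate the hand-written demonstrations De_1 with a grounded instance,
    mirroring concat(De_1, i) in Equation (2). Wording here is illustrative."""
    return (
        f"{demonstrations}\n"
        "## Your Turn ##\n"
        f"Instance: {instance}\n"
        "Verbalize this instance into natural-language sub-questions and one complex question.\n"
    )

# A hypothetical GPT-4 client would then be called on the prompt, e.g.:
# output = call_gpt4(build_verbalization_prompt(De_1, ("Inkheart", ("cast member", "educated at"))))
```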
Filling. We then extract sub-questions and complex questions from the output of the $\mathrm{llm}$. Subsequently, we build a set of planning templates $T_{pat}$ for the planning process of questions corresponding to each pattern. The $\{Q_s^n\}_{n=1}^{N}$ obtained in the previous step are filled into fixed positions in $T_{pat}$ corresponding to their pattern type, thereby obtaining the output for training. The $Q_c$ obtained in the previous step is concatenated to the end of a fixed instruction $Ins$ and some planning demonstrations $De_2$ (also constructed from KGs), thus obtaining the input of the training data:

$$x = \mathrm{concat}(Ins, De_2, Q_c) \tag{3}$$

$$y = T_{pat}.\mathrm{fill}(\{Q_s^n\}_{n=1}^{N}), \quad pat \in \mathcal{P} \tag{4}$$

where $\mathrm{.fill}$ is the filling function of the templates $T_{pat}$. Inspired by Aksitov et al. (2023), we use a code-formatted input $x$ and output $y$ here (shown as "Input" and "Output" in Figure 2) to facilitate formatting and the subsequent parsing and execution of the output plan (more details in Appendix C.2). In the end, we obtain 9,000 training data entries $D_{train} = \{x_n, y_n\}$, with 1,000 entries for each pattern. We randomly select 100 items from the training set for manual verification, with an accuracy rate of over 95%.
3.3 Fine-tuning and Inference of Planning LLMs
We use the obtained training data $D_{train}$ to fine-tune the planning LLM $M_p$ directly with the standard next-token training objective:

$$\mathbb{E}_{(x,y) \in D_{train}} \log p_{M_p}(y \mid x) \tag{5}$$
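A minimal sketch of the objective in Equation (5), assuming tokenized (x, y) pairs and masking the prompt tokens so that only the plan y contributes to the loss; the token ids and random logits below merely stand in for a real tokenizer and the planning LLM.

```python
import torch
import torch.nn.functional as F

x_ids = torch.tensor([11, 42, 7, 99])      # stand-in prompt tokens (Ins, De_2, Q_c)
y_ids = torch.tensor([5, 23, 88, 2])       # stand-in plan tokens (the target y)

input_ids = torch.cat([x_ids, y_ids]).unsqueeze(0)   # (1, T)
labels = input_ids.clone()
labels[:, : x_ids.size(0)] = -100                    # ignore prompt positions in the loss

vocab_size = 32000
logits = torch.randn(1, input_ids.size(1), vocab_size)  # would come from M_p

# Standard next-token objective: predict token t+1 from tokens <= t.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
```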
The fine-tuned planning LLM $M_p$ can then be used to infer the plan $P$ for each question $Q_{test}$ in the downstream test set:

$$P = M_p(\mathrm{concat}(Ins, De_2, Q_{test})) \tag{6}$$

where $Ins$ and $De_2$ are the same as in Equation (3). It should be noted that in multi-hop questions, the specific sub-questions in the second and third hops need to be constructed based on the answers to the previous hop's sub-questions. Since our $P$ outputs all steps at once, $M_p$ cannot know the answers to the previous hop's sub-questions when outputting the plan. Therefore, we use a placeholder in place of the answer to the previous hop's sub-questions, allowing the planning to proceed smoothly (as shown in Tables 9, 10, 13, and 14 in Appendix C.1). These placeholders are then filled in during the subsequent parsing and execution process.

Table 1: Distribution of CLQA-Wiki.
Type           Count    Type               Count
2p question    200      3p question        200
2i question    200      3i question        200
ip question    50       pi question        50
2u question    200      compare question   100
3.4 Plan Parsing and Execution

The obtained plan $P$ needs to be parsed and executed to obtain the final answer to $Q_{test}$. Because we adopt code-formatted input and output for fine-tuning $M_p$, the plan $P$ is also highly formatted code, which facilitates parsing each step of the plan and executing it. In particular:

• When a step includes a "Search" function, we call an external retrieval tool.

• When a step includes a "Get_Answer" function, we invoke an external QA LLM $M_{QA}$ to answer the sub-question based on the retrieved information. Any placeholders in the sub-question are filled with previous answers. We ask the QA LLM to organize answers in the form of a list (the prompt is shown in Table 7 in Appendix C.3).

• When "Intersection" or "Union" appears in a step, we run the actual intersection or union functions. This is easy to do because the answers from the previous steps are in list format.

It is important to note that the planning LLM $M_p$ and the QA LLM $M_{QA}$ are completely decoupled in our framework, so any off-the-shelf LLM can handle the QA task. Ultimately, we obtain the answer to $Q_{test}$.
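A minimal sketch of such a parse-and-execute loop, assuming the plan format shown in Figure 2; the `search` and `qa_llm` callables stand in for the retrieval tool and the QA LLM $M_{QA}$, and their signatures are assumptions rather than the paper's actual interfaces.

```python
import re

def execute_plan(plan: str, search, qa_llm):
    """Parse a code-formatted plan line by line and execute it.
    `search` and `qa_llm` stand in for the external retrieval tool and the QA
    LLM M_QA; their call signatures here are assumptions, not the paper's API."""
    env = {}  # step name -> resolved value

    def fill(match):  # replace {Ans_i} placeholders with earlier answers
        val = env.get(match.group(1), match.group(0))
        return ", ".join(val) if isinstance(val, list) else str(val)

    for line in plan.strip().splitlines():
        if "=" not in line:
            continue
        name, expr = [s.strip() for s in line.split("=", 1)]
        name = name.split(":")[0].strip()              # drop the ":str" annotation
        if expr.startswith(('"', "'", 'f"', "f'")):    # a (possibly templated) sub-question
            raw = expr[1:] if expr[0] == "f" else expr
            env[name] = re.sub(r"\{(\w+)\}", fill, raw[1:-1])
        elif expr.startswith("Search("):
            env[name] = search(env[re.search(r"query=(\w+)", expr).group(1)])
        elif expr.startswith("Get_Answer("):
            question = env[re.search(r"query=(\w+)", expr).group(1)]
            info = env[re.search(r"info=(\w+)", expr).group(1)]
            env[name] = qa_llm(question, info)         # assumed to return a list of answers
        elif expr.startswith(("Intersection(", "Union(")):
            a, b = (env[v] for v in re.findall(r"Answer\d=(\w+)", expr))
            op = set.intersection if expr.startswith("Intersection(") else set.union
            env[name] = sorted(op(set(a), set(b)))
        elif expr.startswith("Finish_The_Plan("):
            env[name] = env[re.search(r"Answer=(\w+)", expr).group(1)]
    return env.get("Final_Answer")
```

For instance, `execute_plan(plan_text, search=my_retriever, qa_llm=my_qa)` would return the value bound to Final_Answer after all Search, Get_Answer, and set-operation steps have been resolved.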
4 New Benchmark: CLQA-Wiki

Conventional complex QA datasets include HotPotQA (Yang et al., 2018), 2WikiMultihopQA (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Despite their widespread use in evaluating the QA performance of language models, we identify some problems with these datasets:
(1) All these datasets are primarily focused on multi-hop and comparison-type questions. The types of questions are not balanced and comprehensive enough, and little attention is paid to questions involving intersection and union logic, which are also very common in reality.

(2) Except for MuSiQue, the questions in the other three datasets only have one answer, whereas many questions in reality have multiple answers. For example, the answer to the intersection question "Which country borders with Russia and China at the same time?" is a set [Mongolia, Kazakhstan, North Korea].
In light of this, we aim to construct a new testing benchmark that embodies more comprehensive logic and allows for an unrestricted number of answers, so as to more thoroughly evaluate the performance of language models on various logical questions. Considering the detailed pattern structures and unrestricted number of answer entities in KGs, we construct the test set based on Wikidata15k.

Similar to the method used to construct the planning data, we extract instances from Wikidata15k (which do not appear in the training data) and use GPT-4 for verbalization. Moreover, for each instance, we can obtain all the answer entities from Wikidata15k, which we then designate as the answers to the question. After manual quality checks, we obtain a test set called CLQA-Wiki, which contains 1,200 pieces of data featuring a variety of Comprehensive Logical QA pairs. The question types and their distribution are listed in Table 1. It is worth noting that we have constructed nine types of test questions so far; for newly defined patterns, we can also quickly construct corresponding questions using the above method, demonstrating the scalability of our dataset.
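As an illustration of how full answer sets fall out of the KG structure, the sketch below computes the answers to the intersection question from the example above directly from toy triples; the relation name and triples are illustrative, not actual Wikidata15k content.

```python
triples = [
    ("Mongolia", "shares border with", "Russia"),
    ("Mongolia", "shares border with", "China"),
    ("Kazakhstan", "shares border with", "Russia"),
    ("Kazakhstan", "shares border with", "China"),
    ("North Korea", "shares border with", "Russia"),
    ("North Korea", "shares border with", "China"),
    ("Finland", "shares border with", "Russia"),
]

def neighbors_of(entity, relation, kg):
    """Heads connected to `entity` via `relation`."""
    return {h for h, r, t in kg if r == relation and t == entity}

# 2i-style answer set: countries bordering both Russia and China.
answers = neighbors_of("Russia", "shares border with", triples) & \
          neighbors_of("China", "shares border with", triples)
# -> {"Mongolia", "Kazakhstan", "North Korea"}
```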
5 Experiment

We aim to answer the following research questions in our experiments:

• RQ1: Can LPKG outperform baseline methods on conventional complex QA datasets?

• RQ2: Can planning data derived from KGs help improve the planning ability of LLMs?

• RQ3: Can planning data derived from KGs be more helpful in improving LLMs' planning ability than normal distillation methods?

• RQ4: Can LPKG outperform baseline methods on the new benchmark CLQA-Wiki?
5.1 Experimental Settings
Datasets  We first conduct experiments on four conventional complex QA datasets: HotPotQA (Yang et al., 2018), 2WikiMultiHopQA (2WikiMQA) (Ho et al., 2020), MuSiQue (Trivedi et al., 2022), and Bamboogle (Press et al., 2023). Among them, HotPotQA, 2WikiMQA, and MuSiQue contain complete training, development, and test sets, while Bamboogle is a small dataset with only 125 test examples. Similar to previous methods (Shao et al., 2023; Aksitov et al., 2023), we extract the first 500 entries from the development sets of HotPotQA and 2WikiMQA. For MuSiQue, we follow Press et al. (2023) and use only the 2-hop questions in the development set. For Bamboogle, we use all of its data as test data. Finally, we conduct testing on our benchmark CLQA-Wiki.
Baselines  We compare our framework to various baselines:

• Direct: Directly input the original question into the LLM.

• CoT: Following Kojima et al. (2022), we instruct the LLM to first "think step by step" and then give the final answer.

• Direct RAG: The prompt sent to the LLM contains the original question and retrieved information related to the original question.

• ReAct (Yao et al., 2022): Answers questions through iterative planning, action, and observation. The action here is the retrieval tool, and the observation is the retrieved information. Planning and QA are conducted by a single LLM.

• Self-Ask (Press et al., 2023): Similar to ReAct, it first instructs the LLM to judge whether sub-questions are needed. If so, it requests the LLM to generate the sub-questions, then conducts external retrieval based on the sub-questions, and lets the LLM provide answers based on the retrieved information.

• ICL-LPKG: A variant of the LPKG framework in which the planning LLM is not fine-tuned but instead uses in-context learning to plan with some KG-sourced planning demonstrations.
Evaluation Metrics  Exact Match (EM) is used as the evaluation metric on HotPotQA, 2WikiMQA, Bamboogle, and MuSiQue, while on CLQA-Wiki we use Recall and Precision.
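A minimal sketch of the two metric families, assuming simple lower-casing and whitespace normalization; the paper's exact normalization rules are not specified here, so treat the details below as assumptions.

```python
def exact_match(prediction: str, gold: str) -> float:
    """EM for the single-answer datasets (HotPotQA, 2WikiMQA, MuSiQue, Bamboogle)."""
    norm = lambda s: " ".join(s.lower().split())
    return float(norm(prediction) == norm(gold))

def recall_precision(predicted: list, gold: list) -> tuple:
    """Set-level Recall and Precision for CLQA-Wiki's multi-answer questions."""
    pred = {p.lower().strip() for p in predicted}
    gold_set = {g.lower().strip() for g in gold}
    hits = len(pred & gold_set)
    recall = hits / len(gold_set) if gold_set else 0.0
    precision = hits / len(pred) if pred else 0.0
    return recall, precision
```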
Implementation Details  All baselines are run with gpt-3.5-turbo-1106 (GPT-3.5). We wrote the prompts for "Direct", "CoT", and "Direct RAG" ourselves. ReAct and Self-Ask are replicated based on their source code with the GPT-3.5 API. To facilitate assessment, we ask the model to output only concise answer phrases. In our framework: (1) For pattern grounding, we use Wikidata15k as the KG, which contains about 15k entities and 263 relations. The extraction tool used in grounding is modified from existing work (Ren and Leskovec, 2020). (2) For the planning LLM $M_p$, we choose CodeQwen1.5-7B-Chat and Llama3-8B-Instruct; one excels at coding while the other excels at commonsense reasoning. We fine-tune them with LoRA tuning, running on 4x 80G A100 GPUs for about 3 hours. The fine-tuning is conducted for 2 epochs, with a learning rate of 5e-5 and a cosine learning rate scheduler. (3) For retrieval, following previous works (Shao