Auditing Meta-Cognitive Hallucinations in Reasoning Large Language Models
Haolang Lu1, Yilian Liu1, Jingxin Xu1, Guoshun Nan1, Yuanlong Yu1, Zhican Chen1, and Kun Wang2
1Beijing University of Posts and Telecommunications, China
2Nanyang Technological University, Singapore
arXiv:2505.13143v1 [cs.CY] 19 May 2025
Abstract
The development of Reasoning Large Language Models (RLLMs) has significantly improved multi-step reasoning capabilities, but it has also made hallucination problems more frequent and harder to eliminate. While existing approaches address hallucination through external knowledge integration, model parameter analysis, or self-verification mechanisms, they fail to provide a comprehensive insight into how hallucinations emerge and evolve throughout the reasoning chain. In this work, we investigate hallucination causality under constrained knowledge domains by auditing the Chain-of-Thought (CoT) trajectory and assessing the model's cognitive confidence in potentially erroneous or biased claims. Analysis reveals that in long-CoT settings, RLLMs may iteratively reinforce biases and errors through flawed reflective processes, ultimately inducing hallucinated reasoning paths. Counterintuitively, even with interventions at hallucination origins, reasoning chains display pronounced "chain disloyalty", resisting correction and sustaining flawed trajectories. We further point out that existing hallucination detection methods are less reliable and interpretable than previously assumed, especially in complex multi-step reasoning contexts. Unlike Anthropic's circuit tracing that requires access to model parameters, our auditing enables more interpretable long-chain hallucination attribution in black-box settings, demonstrating stronger generalizability and practical utility. Our code is available at this link.
1 Introduction
Reasoning Large Language Models (RLLMs) [9, 70, 32] have gained increasing attention for their ability to perform multi-step reasoning through structured Chain-of-Thought (CoT) and self-reflection mechanisms [49, 26, 31, 64]. While these mechanisms improve performance in complex reasoning tasks [62, 9, 50], they also exacerbate the risk of hallucination by amplifying early-stage errors across extended reasoning chains. In particular, hallucinations in long-CoT settings may be iteratively revised, elaborated, or reframed through the reasoning process. This results in final answers that appear coherent yet embed deeply masked factual errors, while users often focus on the answer rather than the reasoning process, thus failing to recognize the presence of hallucinations [4, 37].
Numerous research institutions and groups have made significant efforts to address hallucination in LLMs [5, 25, 26, 66]. At the surface level, existing literature mainly focuses on detection and mitigation methods that leverage external knowledge sources (e.g., knowledge bases) [40, 6], or utilize self-checking mechanisms [26, 20]. Alternatively, other methods are algorithm-based, such as using perplexity [21, 16] or probing the model's hidden states [48, 14, 8, 72] to identify hallucinations in longer model outputs. In the context of CoT reasoning, some studies have explored the multi-step reasoning phenomenon inherent to CoT [27, 9, 32, 60], aiming to understand its implications for the reasoning model's output accuracy [28] and reliability [41, 55].
[Figure 1 image: (a) Knowledge Domain; (b) Knowledge & Reasoning; (c) CoT Trajectory]
Figure 1: Motivation. (a) Division of knowledge domains in different phases, distinguishing two hallucination patterns. (b) Incorrect and factual knowledge are transformed into claim propagation during the reasoning process. (c) Reflection reinforces the original claim, resulting in hallucination.
At a deeper level, understanding the underlying mechanisms of hallucination is critical for improving RLLMs, as the complexity of the reasoning chain often means that surface-level detection methods may not guarantee optimal outcomes. In this regard, works have made notable contributions by leveraging sparse encoders [1] and causal probing [71] to trace which components of the model contribute to specific outputs [47]. In this paper, we systematically investigate the emergence and evolution of hallucinations in reasoning chains without opening the black-box models, offering a more generalizable approach. Concretely, we construct a controlled knowledge domain that captures two types of hallucinated cases, overcoming the difficulty of reliably reproducing hallucinations in a controlled setting (Figure 1a). Then, we present a modeling system for long-CoT that tracks how knowledge is introduced, fed back, and refined across multiple reasoning steps, addressing the challenge of studying hallucination evolution within complex reasoning trajectories (Figure 1b). Going beyond this, we also audit hallucination instances to attribute the propagation of hallucinations in real-world cases, tackling the challenge of understanding the underlying mechanisms behind hallucinations in long-CoT reasoning. As illustrated in Figure 1c, k1 and k3 introduce hallucinations through erroneous knowledge, corrupting the initially correct CoT's step 1 (c1) into the hallucinated c4 via c3 reflection, thereby demonstrating potential risks in reasoning models.
Through comprehensive analysis, we identify the core mechanism behind hallucination in RLLMs. We list our pivotal experimental insights and contributions as follows:
The RLLM fails to accurately assess its metacognitive confidence in claims derived from incorrect knowledge, leading to the mistaken reinforcement of uncertain claims through reflective reasoning.
✤ Hallucination Origin. Hallucinations emerge from incorrect knowledge when the model overconfidently generates claims that it has not properly internalized, leading to the propagation of errors throughout the reasoning process. In long-CoT settings of 1,000+ tokens, the LLMs' overconfidence leads to hallucination passage rates of 62.54% and 56.08% across different settings (Type I and Type II in Figure 1a), respectively. Meanwhile, the model successfully resists erroneous guidance in only 10.66% of cases, demonstrating a critical tendency of over-alignment with the user prompt.
✤ Hallucination Propagation. Reflection in long-CoT reasoning amplifies hallucinations by reinforcing erroneous claims, with the metacognitive [42, 46] confidence increasing for these flawed claims despite their inaccuracy. In the hallucination group, we observe ~2.12× higher average reflection frequency compared to the control group, including 220% more hedging words and 219% increased hesitant tones, all demonstrating how reflection amplifies hallucination phenomena.
✤ Current Deficiencies. Our study reveals that interventions fail to alter the ultimate occurrence of hallucinations, and current models lack sufficient capability to address them. Despite our attempts to mitigate downstream hallucinations through intervention editing, only 22.5% of cases successfully reversed the hallucinated outcome. Further testing showed that even the optimal hallucination-handling approach achieved only 78.95% accuracy while requiring day-scale computational costs, and alternative detection methods yielded AUROC scores below 55%. These findings underscore the persistent challenges in hallucination mitigation, highlighting the need for extended exploration.
2 Modeling Hallucination in Reasoning Chains
To explore the propagation of knowledge-based hallucinations through multi-step reasoning in RLLMs, we begin by classifying hallucination cases, modeling knowledge flow within hallucinations, and presenting our insights and assumptions regarding Reflection and Metacognition, which are subsequently validated in Section 3.
2.1 Hallucination Modeling
To provide a complete perspective on hallucinations, we begin with the following assumption about the model's training environment to better model the problem of hallucinations later on:
Assumption A (Accurate but incomplete): The training corpus D contains only accurate knowledge units k, i.e., ∀k ∈ D, k ∈ W, where W denotes the set of all real-world knowledge. However, D is incomplete: there exist k* ∈ W such that k* ∉ D.
Let K_M denote the set of knowledge units learned by the model M trained from D, and let conf_M(k) denote the model's confidence in generating knowledge unit k. Figure 1a illustrates a taxonomy of hallucination behaviors, aligned with the source of knowledge exposure during training:
Type I Hallucination (Seen but Unlearned). When k ∈ D but k ∉ K_M, i.e., the model has seen the knowledge unit during training but failed to learn or generalize it properly. This hallucination may arise when the model exhibits high confidence conf_M(k) in a knowledge unit k ∈ D that has not been effectively internalized into its learned knowledge set K_M, indicating a potential gap between training data and actual knowledge acquisition.
Type II Hallucination (Unseen or Incorrect). This category occurs when k ∉ D and k ∉ K_M, such that the model has no knowledge basis to generate k. From the model's perspective, both unseen truths (k ∈ W, k ∉ D) and wrong knowledge (k ∉ W) are equally absent from training. Hallucinations may arise when the model fails to assign conf_M(k) ≈ 0 to such knowledge units.
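To make the taxonomy concrete, here is a minimal Python sketch that labels a knowledge unit by its training exposure. The sets D, K_M, and W and the classify helper are illustrative stand-ins for the paper's set-theoretic definitions, not released artifacts.

```python
# Minimal sketch of the Section 2.1 taxonomy. D = training corpus,
# K_M = knowledge the model actually internalized, W = real-world knowledge.
# These sets and this helper are illustrative, not the authors' code.

def classify(k: str, D: set, K_M: set, W: set) -> str:
    """Label a knowledge unit k by its exposure during training."""
    if k in K_M:
        return "learned"  # the model has a basis for generating k
    if k in D:
        # Seen during training but never internalized; high conf_M(k)
        # here is the Type I hallucination risk.
        return "Type I (seen but unlearned)"
    # From the model's perspective, unseen truths (k in W, k not in D)
    # and wrong knowledge (k not in W) are equally absent from training.
    return "Type II (unseen or incorrect)"

# Example: a truth the model saw but failed to learn vs. an unseen truth.
D, K_M, W = {"k1"}, set(), {"k1", "k2"}
assert classify("k1", D, K_M, W).startswith("Type I")
assert classify("k2", D, K_M, W).startswith("Type II")
```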
2.2 Knowledge Involved in the Reasoning Process
To understand how these defined hallucinations propagate through the sequential steps of reasoning in RLLMs, we next formalize the structure of reasoning chains. Following prior work [9], we formally define a long-CoT as a structured reasoning process. This process, expressed in Equation (1), incorporates knowledge, models reflection, and discards intermediate reasoning paths.
Here, each reasoning node c denotes an atomic claim, which may either be internally generated (c_i) or induced from external knowledge as k_i → c_{k_i}. The main reasoning trajectory is defined by directed edges c_i → c_j (j > i) or c_i → c_{k_j}', allowing both linear propagation of the reasoning process and the injection of knowledge. Prior work has observed reflection phenomena in long-CoTs, where models revisit earlier reasoning steps for verification. To capture this, we introduce reflection links refl(c_p ⇒ c_q), representing recursive revisiting of prior claims.
Additionally, we observe that not all claims contribute to subsequent reasoning. In practice, models may selectively drop specific claims, e.g., eliminating incorrect options in a multiple-choice decision. To capture this behavior, we define drop edges c_m ⊣, which mark the end of a reasoning branch, thereby allowing the model to abandon unpromising subchains.
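The chain structure above (claim nodes, main-trajectory edges, reflection links, and drop edges) can be summarized in a small graph sketch. The CoTGraph encoding below is a hypothetical rendering of ours, assuming a simple directed-graph representation rather than the authors' implementation.

```python
# Hypothetical encoding of the long-CoT structure from Section 2.2.
from dataclasses import dataclass, field

@dataclass
class Claim:
    idx: int
    text: str
    external: bool = False   # True for knowledge-induced claims k_i -> c_{k_i}
    dropped: bool = False    # drop edge c_m -| : branch abandoned

@dataclass
class CoTGraph:
    claims: list = field(default_factory=list)
    edges: list = field(default_factory=list)        # (i, j), j > i: main trajectory
    reflections: list = field(default_factory=list)  # (p, q): refl(c_p => c_q)

    def add_claim(self, text: str, external: bool = False) -> Claim:
        """Append a claim; by default it extends the linear trajectory."""
        c = Claim(len(self.claims), text, external)
        self.claims.append(c)
        if c.idx > 0:
            self.edges.append((c.idx - 1, c.idx))
        return c

    def reflect(self, p: int, q: int) -> None:
        self.reflections.append((p, q))  # step q revisits prior claim p

    def drop(self, m: int) -> None:
        self.claims[m].dropped = True    # end an unpromising subchain
```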
2.3 Reflection and Metacognition
Building on the established taxonomy of hallucination and the modeling of knowledge propagation in reasoning chains, we further aim to explain why models hallucinate with high confidence by explicitly modeling how claim-level confidence evolves during reasoning. In the subsequent modeling, we follow the assumption below.
Assumption B (Prompt-Aligned Belief Adaptation): During reflective reasoning, the model tends to re-evaluate prior claims in a way that aligns more closely with the semantic direction of the user input. This bias arises from the model's training on instruction-following datasets, which can lead to a prioritization of coherence with the prompt over factual correctness.
We follow prior CoT modeling work in decomposing the reflection process into two stages: feedback and refinement. Formally, the next claim after reflection is computed as:
$$c_{q+1} \leftarrow \mathrm{Refine}\big(c_q \mid \mathrm{Feedback}(c_{q-1}, c_q),\; g(c_q, \mathrm{prompt})\big), \tag{2}$$
$$\Delta\mathrm{conf}(c_p, c_q) = \mathrm{conf}(c_q) - \mathrm{conf}(c_p) = \alpha \cdot f(c_{q-1}, c_q) + (1-\alpha) \cdot g(c_q, \mathrm{prompt}). \tag{3}$$
In Equation (2), Feedback(c_{q-1}, c_q) captures the directional influence of the most recent reasoning step c_{q-1} before the reflection finishes, which may reinforce or weaken the belief in c_q depending on its factual consistency. The function g(·) models a prompt-aligned bias, characterizing the model's tendency to adjust its confidence in a claim based on how well the claim semantically aligns with the user input. The refinement step may preserve the claim content or yield a new reasoning step, depending on the joint influence of the two factors. Equation (3) provides an explicit formulation of this adjustment, showing how the updated confidence in c_q emerges from the weighted combination of internal feedback and prompt alignment.
According to Assumption B, the prompt-aligned bias g(c_q, prompt) is expected to increase with the semantic similarity between the revisited claim and the input, satisfying:
$$\frac{\partial\, g(c_q, \mathrm{prompt})}{\partial\, \mathrm{sim}(c_q, \mathrm{prompt})} > 0. \tag{4}$$
If the revisited claim c_q shows higher semantic similarity to the user input than its earlier counterpart (i.e., sim(c_q, prompt) > sim(c_p, prompt)), then the model is more likely to increase its confidence, resulting in a positive expected value, ∆conf(c_p, c_q) > 0.
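A toy sketch of this update rule follows, with illustrative choices for f, g, and α; the paper specifies only the roles of these terms (Equations 2-4), not their functional forms, so everything below is an assumption.

```python
# Toy instantiation of Equations (3) and (4). The linear g and the value
# of alpha are illustrative assumptions, not the paper's specification.

def g(sim_to_prompt: float) -> float:
    """Prompt-aligned bias: any function increasing in sim(c_q, prompt)
    satisfies Eq. (4)'s condition dg/dsim > 0; identity is the simplest."""
    return sim_to_prompt

def delta_conf(f_feedback: float, sim_to_prompt: float, alpha: float = 0.5) -> float:
    """conf(c_q) - conf(c_p) = alpha * f(c_{q-1}, c_q) + (1 - alpha) * g(c_q, prompt)."""
    return alpha * f_feedback + (1.0 - alpha) * g(sim_to_prompt)

# Assumption B in action: with neutral internal feedback (f = 0), a revisited
# claim that aligns better with the prompt still gains more confidence.
assert delta_conf(f_feedback=0.0, sim_to_prompt=0.8) > \
       delta_conf(f_feedback=0.0, sim_to_prompt=0.2)
```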
3 Hallucination Emergence and Evolution in Long-CoT Reasoning
In this section, we present our experimental results to validate the key findings related to hallucination emergence and evolution in long-CoT reasoning, addressing the four research questions below:
• RQ1: How can we construct a controlled knowledge environment that enables reliable reproduction and differentiation of hallucination types in reasoning language models?
• RQ2: How do reflective reasoning patterns interact with metacognitive confidence and prompt alignment to cause and amplify hallucinations during multi-step CoT generation?
• RQ3: To what extent can editing interventions at different stages of CoT influence downstream reasoning and final answers, and what limits their corrective impact?
• RQ4: Do existing hallucination detection methods effectively capture the reflective and metacognitive dynamics observed in long-CoT reasoning?
3.1 Controlled Knowledge Construction for Hallucination Reproduction (RQ1)
Table 1: Comparison of statistics across two types of hallucination and their respective control groups. Type I refers to questions based on factually correct knowledge. Type II involves questions with embedded factual errors. The Acceptance Rate represents the ratio of selected samples to total generated data, indicating the difficulty of each setting.
| Statistic | Type I (Seen but Unlearned) | Type I Control (Correct Answer) | Type II (Unseen or Incorrect) | Type II Control (Error Rejected) |
|---|---|---|---|---|
| Hallucination? | ✓ | ✗ | ✓ | ✗ |
| Sample Size (Questions) | 439 | 500 | 484 | 92 |
| Sample Size (Answers) | 439×5 | 500×5 | 484×5 | 92×5 |
| Relevant RFCs (number) | 314 | 50 | 50 | 38 |
| CoT Avg. Length (tokens) | 1409.30 | 1028.82 | 1173.46 | 1254.47 |
| Answer Avg. Length (tokens) | 210.71 | 621.11 | 416.73 | 412.04 |
| Acceptance Rate | 439/702 | 500/540 | 484/863 | 92/863 |
To enable rigorous analysis of hallucination, we construct a controlled knowledge environment d ⊂ W that satisfies two formal constraints:
1. Bounded Scope: The domain d is clearly bounded and explicitly defined, ensuring that all knowledge available to the model is fully known to the evaluator. No information outside of d (i.e., from W \ d) can influence the model's generation.
2. Verifiability: Each knowledge unit k ∈ d has a clearly defined truth value f(k) ∈ {0, 1}, enabling unambiguous evaluation of whether a question or model response is factually correct.
To create the environment d defined above, we construct a dataset based on Request for Comments (RFC) documents, a standardized collection of protocol specifications. RFCs are particularly well suited to our setting, as they offer a bounded technical knowledge domain with verifiable ground truth.
Specifically, hallucinations are identified through self-consistency checks and external verification using RFC references. We retain only those examples that meet strict agreement thresholds across multiple generations. Complete construction procedures and filtering criteria are detailed in Appendix B. The statistics on the construction process of the hallucination domain are presented in Table 1.
As shown in Table 1, our knowledge environment comprises 1,515 unique questions, paired with 7,575 answers to capture variability in reasoning. We observe that the CoT length in all settings significantly exceeds the final answer length, indicating that RLLMs allocate more effort to reasoning than to answer formulation. The longest CoTs (1409.30 tokens) and shortest answers (210.71 tokens) appear in Seen but Unlearned hallucinations, while the longest answers (621.11 tokens) appear in the control group, suggesting that longer reasoning chains, often driven by redundant reasoning, nevertheless yield shorter and overly confident answers.
Obs I. Low Error Rejection Rate Reveals Prompt-Aligned Bias. As shown in Table 1, the notably low acceptance rate in the Error Rejected category reveals the model's limited tendency to challenge factually incorrect prompts. This supports Assumption B: reflective reasoning in instruction-tuned models tends to prioritize semantic alignment with the prompt over factual correctness.
3.2 Behavioral Analysis of Hallucinations in Long-CoT (RQ2)
To better understand how hallucinations occur, we further annotated the dataset in detail and audited the model's response patterns. The annotation process combines both automated routines and human verification to ensure accuracy and scalability, with complete procedures detailed in Appendix C. We categorize behavioral patterns along several dimensions, as summarized in Table 2.
Table 2: Behavioral patterns for Hallucination Type I and Type II with Control Cases. (A) Overall characteristics of claims from CoT; (B/C) Statistics on the involvement of external/internal incorrect knowledge; (D) Evidence of model reflection, including hedging, interrogatives, and hesitation markers; and (E) Statistics on the repetition of key hallucinated claims.
| Behavioral Category | Metric Description | Control (Correct Answer) | Type I | Type II |
|---|---|---|---|---|
| A. Overall Claims | Avg. of total claims per CoT | 36.77 | 52.66 | 38.67 |
| | Avg. rate (count) of hallucinated claims | 0.68% (0.25) | 12.78% (6.73) | 18.14% (7.01) |
| | Avg. hallucinated claim depth | 11.53 | 38.10 | 24.42 |
| B. External Knowledge | Avg. of external incorrect knowledge | – | – | 2.95 ≈ 3 |
| | Adoption rate (count) of external errors | 0 | 0 | 25.93% (0.76) |
| | Correction rate (count) of external errors | 0 | 0 | 28.94% (0.85) |
| | Rejection rate (count) of external errors | 0 | 0 | 45.13% (1.33) |
| C. Internal Knowledge | Avg. of internal incorrect knowledge | 0.73 | 6.73 | 5.25 |
| | Adoption rate (count) of internal errors | 73.68% (0.53) | 45.55% (3.06) | 55.97% (2.94) |
| | Correction rate (count) of internal errors | 15.79% (0.12) | 41.65% (2.80) | 34.23% (1.80) |
| | Rejection rate (count) of internal errors | 10.53% (0.08) | 12.80% (0.86) | 9.61% (0.50) |
| D. Reflection Evidence | Avg. of explicit reflections observed | 4.40 | 9.33 | 7.12 |
| | Avg. of hedging words ("perhaps", "maybe") | 16.92 | 37.14 | 25.67 |
| | Avg. of interrogative sentences in CoT | 2.63 | 2.49 | 3.27 |
| | Avg. of hesitation words ("but wait", "hold on") | 12.73 | 27.85 | 15.83 |
| E. Amplification Effects | Total times key (hallucinated) claims are repeated | 6.57 | 7.09 | 10.31 |
| | Avg. repetition per key (hallucinated) claim | 1.31 | 1.42 | 2.06 |
In Table 2, five dimensions are used to evaluate the evolution of hallucinations in long-CoT. Type I and Type II cases exhibit more claims, higher hallucinated-claim rates and counts (12.78% (6.73) and 18.14% (7.01) vs. 0.68% (0.25)), and deeper hallucination positions (38.10 and 24.42 vs. 11.53) compared to the control group.
Obs II. Longer Chains Reflect Metacognitive Drift under Prompt-Aligned Bias. From Table 2, Type I (Seen but Unlearned) hallucinations exhibit longer reasoning chains (52.66 vs. 36.77 claims). Through a further audit of the CoT, we reveal that when the model tries to recall a Type I knowledge unit, it often extends the reasoning chain in an attempt to reinforce its initially uncertain claims.
This behavior aligns with our confidence modeling in Section 2.3, where conf(c_i) is dynamically updated across the reasoning chain. In Type I cases, since the knowledge has been seen during training, the model may misjudge its own metacognition, which can lead to hallucinations.
We now turn to the analysis of Parts B/C. In the Type II setting, where external errors were injected (three incorrect knowledge units per question), the model adopted some of these inputs, at a rate of 25.93%. The majority of the errors (28.94% corrected + 45.13% rejected) were either corrected or rejected by the model. While it seems that these adopted errors (0.76 per CoT on average) played a key role in generating hallucinations, our further analysis and detailed auditing of the CoT lead to a deeper observation.
Obs III. External Errors Lead to Fabrication of Internal Knowledge Errors. An audit of the CoT reveals that, in some cases, the model correctly identified errors in the external knowledge sources. However, it still propagated these errors due to its strong prompt-aligned bias. Rather than correcting or rejecting the factual errors, the model generated additional fake internal knowledge to support alignment with the prompt. The statistics of internal knowledge in Type II confirm this observation.
We now turn to the analysis of Parts D/E. In the hallucinated responses, we observed an increase in reflective behavior, particularly in the form of hedging and hesitation, which reveals the model's uncertainty during reasoning. These linguistic features suggest that the model engages in reflection, revisiting its reasoning through the process of feedback and refinement.
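The marker statistics in Part D can be approximated with simple counting. The sketch below uses the paper's example word lists plus a few obvious variants ("possibly", "hmm"), as assumptions; the authors' full annotation pipeline (Appendix C) also involves human verification.

```python
# Rough approximation of Table 2's Part D marker counts via substring
# matching; word lists beyond the paper's quoted examples are assumptions.
HEDGING = ["perhaps", "maybe", "possibly"]
HESITATION = ["but wait", "hold on", "hmm"]

def count_markers(cot: str) -> dict:
    """Count reflection-evidence markers in one CoT transcript."""
    text = cot.lower()
    return {
        "hedging": sum(text.count(w) for w in HEDGING),
        "hesitation": sum(text.count(w) for w in HESITATION),
        "interrogatives": text.count("?"),  # crude proxy for interrogative sentences
    }

# e.g. count_markers("Perhaps RFC 793 defines this... but wait, is that right?")
# -> {"hedging": 1, "hesitation": 1, "interrogatives": 1}
```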
[Figure 2 image: three annotated CoT trajectory graphs, (a) Type I: Seen but Unlearned, (b) Control: Error Rejected, (c) Type II: Unseen or Incorrect, with legend: reasoning claims Cn/CKn, wrong claims, self-query claims, reflection links, and drop edges]
Figure 2: Three cases illustrating the CoT trajectory. Type I: the model reflects on previously seen but unlearned claims; Control: errors are rejected through reflection; Type II: the model generates hallucinated answers and refines them through reflection.
Figure 2 presents three cases: Figure 2a (Type I) shows frequent self-queries, while Figure 2c (Type II) features many forced assumptions marked by "if". Notably, all three cases exhibit clear reflection structures (detailed analysis and case studies are provided in Appendix C). In Figure 2a, the self-query claim c9 → c10 (corresponding to c6) amplifies the error through reflection, enabling ck4 to propagate downstream and ultimately leading to a hallucinated answer. In Figure 2c, c5 reflects into a correct claim c6, though the model later self-persuades by introducing unreasonable assumptions ("if") and new internal knowledge (ck4), ultimately leading to hallucination.
Obs IV. Reflection Amplifies Metacognition without Logical Grounding. While reflection can increase or decrease confidence depending on ∆conf(c_p, c_q), further auditing reveals that such confidence changes are not always reasonable. Specifically, hallucinated cases often involve reflections where ∆conf(c_p, c_q) > 0 occurs despite the absence of valid support. Instead of grounded reasoning, the model often reinforces its metacognition using self-query questions or unsupported assumptions.
3.3 Impact of Upstream Reasoning on Downstream Fidelity (RQ3)
To examine how changes in upstream reasoning affect downstream reasoning, we conduct controlled edits on both hallucinated and non-hallucinated CoT trajectories. By intervening at key points, we assess how edits alter reasoning paths and final answers, as shown in Figure 3 (see Appendix D for details).
[Figure 3, left diagram: starting from the first hallucinated claim (k1 → CK1), an edit Ci' is applied at one of three edit points; the audit then tracks whether the edit is accepted or rejected, whether the following claims Ci+1' ... are influenced, and whether the final answer is influenced (still a hallucination?)]
| Metric | Description |
|---|---|
| M1 | Is the edit accepted? |
| M2 | Is the downstream CoT influenced? |
| M3 | Is the final answer influenced? |
| M4 | Is the new CoT consistent with the answer? |
| M5 | Does the edited claim propagate to the answer? |
| M6 | Is the new answer a hallucination? |
| Metric | Edit 1 | Edit 2 | Edit 3 | Control |
|---|---|---|---|---|
| M1 (Accepted?) | 83.5% | 65% | 65% | 53.3% |
| M2 (CoT Changed?) | 98.5% | 97.5% | 99% | 96.6% |
| M3 (Answer Changed?) | 98.5% | 95% | 90% | 23% |
| M4 (Consistent?) | 77.5% | 65% | 55% | 80% |
| M5 (Edit→Answer?) | 40% | 27.5% | 25% | 6% |
| M6 (Hallucination?) | 77.5% | 70% | 85% | 20% |
| Metric | Edit 1, Type I | Edit 1, Type II | Edit 2, Type I | Edit 2, Type II | Edit 3, Type I | Edit 3, Type II |
|---|---|---|---|---|---|---|
| M1 (Accepted?) | 75% | 90% | 55% | 75% | 35% | 95% |
| M4 (Consistent?) | 90% | 65% | 75% | 55% | 75% | 25% |
| M5 (Edit→Answer?) | 65% | 15% | 35% | 20% | 25% | 5% |
| M6 (Hallucination?) | 95% | 60% | 95% | 45% | 90% | 80% |
Figure 3: Design and results of our CoT editing experiments. (1) The left diagram illustrates the process of modifying the CoT, where edits are introduced at three distinct intervention points. (2) The right tables present the corresponding evaluation results. Top: metric indices and their descriptions. Middle: comparative statistics across different edit points for hallucinated cases and their respective controls. Bottom: type-wise breakdown across Type I and Type II hallucinations.
The tables in Figure 3 reveal two key trends. First, upstream edits (Edit 1) have a greater impact on downstream reasoning than later ones (Edits 2 and 3), indicating a decay in i