Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships

Bin Wang1, Hui Li1,∗, Liyang Zhang2, Qijia Zhuang2, Ao Yang1, Dong Zhang3, Xijun Luo3,∗, Bing Lin4

1 Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University
2 University of Electronic Science and Technology of China
3 Tencent Security Platform Department
4 China Unicom (Guangdong) Industrial Internet Co., Ltd

thebinking66@, lih64@, {2022090908021, 2022090917007}@, jarvisya@, {zalezhang, junjunluo}@, gds-cyhlw@

arXiv:2512.08326v1 [cs.CR] 9 Dec 2025

Abstract

Sensitive information leakage in code repositories has emerged as a critical security challenge. Traditional detection methods, which rely on regular expressions, fingerprint features, and high-entropy calculations, suffer from high false-positive rates, which not only reduce detection efficiency but also significantly increase the manual screening burden on developers. Recent advances in large language models (LLMs) and multi-agent collaborative architectures have demonstrated remarkable potential in tackling complex tasks, offering a novel technological perspective for sensitive information detection. In response to these challenges, we propose Argus, a multi-agent collaborative framework for detecting sensitive information. Argus employs a three-tier detection mechanism that integrates key content, file context, and project reference relationships to effectively reduce false positives and enhance overall detection accuracy. To comprehensively evaluate Argus in real-world repository environments, we developed two new benchmarks: one to assess genuine leak detection capabilities and another to evaluate false-positive filtering performance. Experimental results show that Argus achieves up to 94.86% accuracy in leak detection, with a precision of 96.36%, recall of 94.64%, and an F1 score of 0.955. Moreover, the analysis of 97 real repositories incurred a total cost of only $2.21. All code implementations and related datasets are publicly available at /TheBinKing/Argus-Guard for further research and application.

CCS Concepts

• Computing methodologies → Natural language processing;

• Security and privacy → Software security engineering.

Keywords

sensitive information leakage, code repository security, multi-agent systems, large language models, contextual semantic analysis

ACM Reference Format:

Bin Wang1, Hui Li1,∗, Liyang Zhang2, Qijia Zhuang2 and Ao Yang1, Dong Zhang3, Xijun Luo3,∗, Bing Lin4. 2026. Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships. In 2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE ’26), April 12–18, 2026, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 13 pages. /10.1145/3744916.3773208

This work is licensed under a Creative Commons Attribution 4.0 International License.

ICSE ’26, Rio de Janeiro, Brazil

© 2026 Copyright held by the owner/author(s).

ACM ISBN 979-8-4007-2025-3/2026/04

/10.1145/3744916.3773208

1 INTRODUCTION

Public code repositories, such as GitHub, have become central platforms for developer collaboration and version control in modern software development. These platforms enable developers to efficiently share code, track issues, and manage versions rigorously, thereby significantly enhancing both development efficiency and code quality. However, their open nature also introduces new security challenges, particularly regarding the management and protection of sensitive information [23]. According to monitoring data from GitGuardian [9], sensitive information leakage incidents on GitHub reached 12.8 million in 2023, a 28% increase over 2022, with the trend continuing upward. These leaks primarily involve API keys, database credentials, private keys, and other critical data, posing serious risks not only to individual privacy but also to enterprises by exposing them to severe security vulnerabilities and potential economic losses [46]. The paper How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories [28] discusses the prevalence of secret leakage in open-source Git repositories, highlighting the urgency of addressing this issue.

Current approaches to detecting sensitive information leaks can be broadly classified into two categories. The first comprises rule-based detection tools (e.g., Gitleaks and TruffleHog) that rely on regular expressions and entropy calculations [38]. The second involves machine learning methods designed to reduce false positives through model training. However, both approaches have inherent limitations. While rule-based tools offer extensive coverage, some tools have a false positive rate of over 80% [2], which substantially undermines their utility. As Chess and McGraw [4] have noted, “an excessively high false positive rate ultimately leads to 100% of leaks being overlooked because users will eventually disregard the detection results.” Conversely, machine learning methods [32], though effective in reducing false positives, lack a deep understanding of code semantics, rendering them less effective in managing complex contextual relationships.

In recent years, the advent of LLMs has opened a new technical pathway for sensitive information detection [12]. Compared to traditional methods, LLMs offer superior text comprehension, enabling them to deeply analyze code context and identify potential sensitive information. However, relying solely on LLMs presents challenges: they may struggle to precisely verify key formats and identify placeholders, and their output stability can diminish when processing lengthy texts. To overcome these limitations, the concept of “AI-empowered software engineering” has emerged. This approach leverages multiple AI agents working in collaboration to address complex tasks. The principle of “collaborative AI for SE” involves the coordinated operation of several AI agents, each compensating for the limitations of a single agent when tackling intricate problems. For instance, in tasks such as code review and generation, multi-agent systems have demonstrated significant advantages, such as reducing security vulnerabilities by 13% [29] when an LLM responsible for code generation collaborates with agents for static analysis and fuzz testing, while ensuring functional correctness. These findings underscore the potential of a collaborative multi-agent strategy in handling the diverse and highly accurate detection requirements of source code sensitive information.

Motivated by these insights, we propose a multi-agent sensitive information detection framework named Argus. This framework employs a three-tier detection mechanism that integrates key content, file context, and project reference relationships, effectively compensating for the limitations of a single LLM. Each agent focuses on a specific detection task, and through their coordinated efforts, the system achieves stable and precise detection outcomes. Additionally, we have developed a comprehensive evaluation dataset that encompasses common sensitive information scenarios found in open-source projects. Experimental results demonstrate that Argus attains a detection accuracy of 94.86% on this dataset, significantly outperforming existing methods.

The main contributions of this paper are as follows:

(1) We propose a novel three-level detection mechanism that offers a comprehensive analysis of sensitive information, providing a fresh perspective on applying LLMs in the field of security detection.

(2) We construct two benchmark datasets based on real-world code repository scenarios, covering a wide range of sensitive information types and usage scenarios, thereby establishing a unified standard for evaluating detection tools. In addition, we verify the validity of the secrets in each repository to mitigate potential security risks.

(3) We design and implement a multi-agent sensitive information detection framework named Argus, which achieves a precision of 96.36% and a recall of 94.64% on the benchmark dataset, significantly outperforming previous baseline tools by efficiently identifying genuine leaks while effectively filtering out false positives.
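As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The short Python sketch below recomputes it from the precision and recall quoted above; the function name `f1_score` is our own and is not part of Argus.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the text: 96.36% and 94.64%.
reported_f1 = f1_score(0.9636, 0.9464)
print(round(reported_f1, 3))  # 0.955, the F1 score quoted in the abstract
```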

2 PROBLEM AND MOTIVATION

2.1 Problem Description and Definition of Secret Leak Detection

In this study, a “secret” refers to sensitive information that appears in plain text within a code repository without any form of masking or encryption. Such information typically exhibits the following characteristics:

(1) Format Characteristics: These secrets often have fixed prefixes or distinct character structures. For example, an AWS access key might start with “AKIA”, or an RSA private key may be identified by markers such as “-----BEGIN PRIVATE KEY-----”. Existing literature indicates that rule-based methods primarily target these formatted patterns.

(2) Semantic Relevance: The content is semantically tied to authentication, authorization, or secure communications and is closely linked to actual business operations. A leak of such information (for instance, an API key from OpenAI) could directly cause service disruptions or financial losses.

Based on these features, this work defines a “secret leak” as the occurrence where any file in a code repository contains plain-text information that meets the definition of a secret and has not been properly masked or encrypted. It is important to note that some texts meeting these characteristics may also be present in a repository; however, if contextual cues or repository indicators make it clear that the secret was intentionally made public by the developer, it should not be considered a secret leak. Thus, the core task of this work is to accurately identify and pinpoint unintentional secret disclosures by developers.

2.2 Excessive False Positives

Current methods for detecting secret leaks in code repositories (e.g., TruffleHog, Gitleaks) suffer from severe false positive issues. This not only reduces the practical efficiency of these tools but also significantly increases the manual review burden on developers, effectively rendering a high false positive rate equivalent to low detection accuracy in practice [1]. Most existing detection tools rely on custom rules based on regular expressions, fingerprint features, and high-entropy calculations. However, these approaches have clear limitations. For example, many tools mistakenly classify commit hash strings as sensitive information. Similarly, placeholder strings intentionally left by developers (e.g., key templates in the form “sk-xxxxxxxxxxxxxxxxx”) are erroneously flagged as leaks even though they are merely intended to guide the user in entering the actual key.
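To make these two failure modes concrete, the following Python sketch shows how a naive detector that combines an entropy threshold with a prefix regex could flag both a commit-hash-like string and the placeholder template. The threshold, the regex, and the hash-shaped sample string are illustrative assumptions of ours; they are not the actual rules of TruffleHog or Gitleaks.

```python
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical rules, chosen only to illustrate the failure modes described above.
ENTROPY_THRESHOLD = 3.5                            # assumed cut-off (bits per character)
PREFIX_RULE = re.compile(r"sk-[A-Za-z0-9]{10,}")   # assumed "sk-" key pattern


def naive_rule_flags(candidate: str) -> bool:
    """Flag a string if it is high-entropy or matches the prefix regex."""
    return (shannon_entropy(candidate) > ENTROPY_THRESHOLD
            or PREFIX_RULE.search(candidate) is not None)


# A synthetic 40-character hexadecimal string shaped like a Git commit hash.
commit_like = "0123456789abcdef0123456789abcdef01234567"
# The placeholder template quoted in the text.
placeholder = "sk-xxxxxxxxxxxxxxxxx"

for s in (commit_like, placeholder):
    print(f"{s}: entropy={shannon_entropy(s):.2f}, flagged={naive_rule_flags(s)}")
```

The hash-shaped string is flagged because of its high entropy, and the placeholder is flagged by the prefix rule despite its very low entropy. Neither signal alone separates genuine secrets from benign strings.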

Table 1: Leak Data Statistics

Platform   TE          RL       LR (%)   RL>5    TR
GitLab     1,606,827   9,803    23.44    2,330   41,826
GitHub     2,295,293   37,149   6.21     8,287   597,933
Gitee      494,247     14,750   7.27     4,512   203,012

Note: TE = Total Entries, RL = Repositories with Leaks, LR = Leak Repo Ratio, RL>5 = Repositories with >5 Leak Entries, TR = Total Repositories.

2.2.1 General Evaluation. In our comprehensive evaluation, we employed the actively maintained open-source tool TruffleHog to survey 2,022 mirror backups from GitHub, GitLab and Gitee. Scanning the entire dataset proved prohibitively expensive, so we randomly sampled a large number of repositories to estimate the false positive rate. Given the manpower required to manually verify every detection, we initially treated TruffleHog’s outputs as ground truth. The detailed results appear in Table 1.

Our analysis shows that over 7.3% of repositories contained at least one reported leak, rising to 23.44% on GitLab, and that approximately 5.57% of repositories reported more than five leaks. Structured data files (.csv, .json) accounted for roughly 520,000 detections while document files (.md, .txt) comprised about 10% of all findings. Notably, repositories reporting more than 50 leaks contributed 75.69% of the total detection volume.
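The leak repository ratios in Table 1 follow directly from LR = RL / TR. The Python sketch below recomputes them from the table's counts, together with the pooled ratio across the three platforms; the variable and function names are ours.

```python
# Repositories with leaks (RL) and total repositories (TR), copied from Table 1.
table1 = {
    "GitLab": {"repos_with_leaks": 9_803, "total_repos": 41_826},
    "GitHub": {"repos_with_leaks": 37_149, "total_repos": 597_933},
    "Gitee": {"repos_with_leaks": 14_750, "total_repos": 203_012},
}


def leak_ratio(repos_with_leaks: int, total_repos: int) -> float:
    """Leak repo ratio (LR) as a percentage."""
    return 100 * repos_with_leaks / total_repos


for platform, row in table1.items():
    ratio = leak_ratio(row["repos_with_leaks"], row["total_repos"])
    print(f"{platform}: {ratio:.2f}%")

# Pooled ratio over all three platforms.
overall = leak_ratio(
    sum(row["repos_with_leaks"] for row in table1.values()),
    sum(row["total_repos"] for row in table1.values()),
)
print(f"Overall: {overall:.2f}%")  # 7.32%, consistent with "over 7.3%"
```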

A closer examination of TruffleHog’s outputs revealed a high prevalence of false positives concentrated in just a few rules. For example, entries flagged by the “Github” and “Gitlab” rules made up 72.4% of all detections, yet many of these corresponded to commit hashes or default configuration files misclassified as secrets. Likewise, the “JDBC” and “URI” rules proved overly broad, frequently tagging test data and boilerplate as sensitive. On Gitee, such spurious entries totaled around 26,000, representing 5% of all flags. We found that these noisy files typically feature templated structures, highly repetitive fields and rigid formatting, causing the same rule to trigger repeatedly across multiple projects. This systematic amplification of false alarms places a heavy burden on downstream analysis.

To validate our false positive assessment, we randomly sampled 2,000 entries from TruffleHog’s full output. Two independent annotators with security expertise reviewed each record, classifying it as a genuine secret or a false positive based on contextual semantics, structural patterns and known non-sensitive markers. A third reviewer resolved any disagreements. The final annotations showed that fewer than 3.4% of sampled records represented genuine leaks; the vast majority consisted of default values, placeholders, debug information or highly repetitive strings. These findings confirm that while secret leaks are indeed widespread, false positives are pervasive.

Table 2: False Positive Statistics with Version Information

Repository     Version   TH    GL    ST            WP
moby           c710b88   83    181   (148,11,0)    73
kubernetes     9253c9b   142   306   (110,66,19)   27
bitcoin        bf03c45   3     71    (2,2,102)     12
neovim         8b98642   5     7     (2,0,0)       3
webpack        3612d36   17    1     (1,1,0)       4
spring-boot    8964203   56    26    (28,11,11)    2
fastapi        113da5b   1     28    (45,0,0)      1
pandas         0691c5c   2     1     (1,2,0)       4
vue            13f4e7d   56    1     (1,0,0)       3
transformers   5d7739f   5     13    (0,66,8)      2

Note: TH = TruffleHog, GL = Gitleaks, ST = SpectralOps, WP = Whispers. The values of ST represent the number of detections with (high, mid, low) severity levels.

2.2.2 False Positive Experiment. To assess the limitations of current sensitive information detection methods, we selected 10 high-star repositories from GitHub and analyzed the false positive counts using four tools: TruffleHog, Gitleaks, SpectralOps, and Whispers. The experimental results indicate that TruffleHog generated 370 false positives. Although its deep scan strategy helps in effectively filtering out high-entropy strings, there is still room for improvement in handling complex encodings and boundary cases. Gitleaks, despite offering broad detection coverage, produced 635 false positives due to overly broad rule settings that led to numerous false positives in test code and template files, thereby increasing the manual review burden. SpectralOps reported the highest number of false positives. Although it categorizes the results into high, medium, and low risk to provide developers with a prioritization reference, its reliance on machine learning and context analysis has not sufficiently reduced false positives from non-sensitive content. In contrast, Whispers generated only 131 false positives, a lower count primarily attributable to its limited detection scope (focusing solely on hard-coded files) and restricted language support (limited to JavaScript, Java, Go, and PHP) (see Table 2 for details).
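The per-tool totals quoted above can be recovered by summing the columns of Table 2. The Python sketch below does so. The SpectralOps total is not stated in the text, so the figure computed here is our own arithmetic over the (high, mid, low) triples, not a number reported by the authors.

```python
# (TH, GL, (ST high, ST mid, ST low), WP) per repository, copied from Table 2.
table2 = {
    "moby": (83, 181, (148, 11, 0), 73),
    "kubernetes": (142, 306, (110, 66, 19), 27),
    "bitcoin": (3, 71, (2, 2, 102), 12),
    "neovim": (5, 7, (2, 0, 0), 3),
    "webpack": (17, 1, (1, 1, 0), 4),
    "spring-boot": (56, 26, (28, 11, 11), 2),
    "fastapi": (1, 28, (45, 0, 0), 1),
    "pandas": (2, 1, (1, 2, 0), 4),
    "vue": (56, 1, (1, 0, 0), 3),
    "transformers": (5, 13, (0, 66, 8), 2),
}

totals = {
    "TH": sum(row[0] for row in table2.values()),
    "GL": sum(row[1] for row in table2.values()),
    "ST": sum(sum(row[2]) for row in table2.values()),  # all severities combined
    "WP": sum(row[3] for row in table2.values()),
}

print(totals)  # {'TH': 370, 'GL': 635, 'ST': 637, 'WP': 131}
```

Summed over all severity levels, SpectralOps edges out Gitleaks, which is consistent with the statement that it reported the highest number of false positives.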

Overall, although the actual occurrence of sensitive information leaks in these high-star repositories is relatively low, the prevalent issue of excessive false positives not only increases the manual review workload for developers but also risks overlooking genuine sensitive information. To enhance detection efficiency and security, future strategies must aim to reduce false positives further, through the incorporation of context analysis and dynamic rule adjustment, while maintaining a broad coverage.

3 DATASETS

Current datasets in the secret leak detection domain exhibit several shortcomings. First, most datasets focus exclusively on a single type of secret (e.g., keys or credentials), resulting in a significant gap between the collected data and what is observed in real-world code repositories [27]. Second, these datasets generally lack hierarchical grading and detailed categorization of sensitive information, making it difficult to thoroughly evaluate the distinct characteristics and risks associated with various types of secrets. Moreover, due to the variable quality of projects on GitHub, many existing datasets inadvertently include a large number of low-quality or inactive projects, which introduces sample bias. Lastly, even when some datasets are sourced from reputable repositories, the secrets contained therein may still be active, thereby posing additional sensitivity and security risks [8][3].

To address the limitations found in existing datasets, such as limited secret types, lack of hierarchical annotation, and insufficient validation, we construct two new datasets: CommonLeak and TrustedFalseSecrets. CommonLeak is based on a TruffleHog scan of a GitHub snapshot from June 2022. We manually selected 97 representative projects covering ten common secret types, including AWS, GitHub, Huggingface, JDBC, MongoDB, OpenAI, PostgreSQL, PrivateKey, Redis, and URI. Each candidate was reviewed by two independent annotators using context and semantics to distinguish real secrets from false positives. Disagreements were resolved by a third reviewer. All confirmed true leaks were deactivated for safe release. The final dataset contains 57 true positives and 40 false positives. Details are shown in Figure 2 and Table 3. TrustedFalseSecrets focuses on representative false positives; we curated 20 typical cases from ten well-maintained open-source repositories to illustrate common misclassifications made by regex-based tools. This dataset offers a clean benchmark for evaluating false positive mitigation techniques (see Table 8).

4 METHODOLOGY

In this section, we present the design of Argus. From a methodological perspective, Argus employs a three-level analysis framework to assess secrets, leveraging a multi-agent collaboration mechanism to distribute and coordinate tasks. Additionally, it utilizes a shared memory pool to record intermediate processes and facilitate information sharing among agents.

Table 3: Composition of Config and Others

Config                            Others
Subcategory   Total   Prop.       Subcategory   Total   Prop.
Env           9       25.71%      Java          5       23.81%
Json          5       14.29%      CS            2       9.52%
Properties    4       11.43%      Dockerfile    2       9.52%
Ipynb         3       8.57%       Shell         2       9.52%
Markdown      3       8.57%       Typescript    2       9.52%
Key           3       8.57%       PHP           2       9.52%
Git           2       5.71%       C             2       9.52%
Data          2       5.71%       Gradle        1       4.76%
Pem           2       5.71%       Html          1       4.76%
Txt           1       2.86%       TCL           1       4.76%
Conf          1       2.86%       CPP           1       4.76%

(a) Composition of dataset categories (b) Composition of Dataset Language

Figure 2: Composition of Dataset and Subcategories

4.1 Three-Level Contextual Semantic Analysis

Traditional tools for scanning code repositories for secrets are frequently overwhelmed by false positives. To address this challenge, we propose a detection method based on three-tier contextual semantic analysis, implemented through a multi-agent system that automates decision-making. This approach decomposes the secret detection task into three interconnected layers: the analysis of intrinsic key features, the semantic interpretation of its immediate context, and the examination of project-level reference relationships. Together, these layers form a hierarchical, traceable, and interpretable detection process (as shown in Figure 3).

4.1.1 Level 1: Analysis of Intrinsic Semantics. At this initial level, the focus is solely on the secret’s own features. The goal is to rapidly dismiss obvious false positives by inspecting characteristics such as readability, placeholder usage, and adherence to specific key formats. False positives at this stage generally fall into three categories:

(1) Readable Keys: For example, a string like https://readonly:readonly@www.pauldreik.se is semantically clear and lacks the high entropy or specific structure expected of a genuine key. Traditional tools that rely solely on entropy and regex matching often fail to properly filter such “readable” pseudo-keys, whereas LLMs can use their semantic analysis capabilities to recognize these non-genuine characteristics.

(2) Keys with Placeholders: For instance, mongodb://username:password@server appears in documentation (e.g., Markdown files) as an example, using fixed placeholders like username or password. By analyzing large-scale data, we have identified a set of common placeholders. Our placeholder detection tool checks for these markers within the key, thereby flagging such cases as likely false positives.

(3) Keys Not Conforming to Specific Formats: For example, jdbc:postgresql:// might be a truncated version of a legitimate key format, omitting the required username and password. Traditional regex-based detection does not differentiate between valid and invalid formats across key types. To remedy this, we have designed precise regex patterns for major key types (e.g., AWS, JDBC, MongoDB) to verify compliance with their expected formats.
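The placeholder and format checks in items (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the placeholder vocabulary and the three regex patterns below are hypothetical stand-ins of ours, since the text does not list the authors' mined placeholder set or their exact patterns.

```python
import re

# Hypothetical placeholder vocabulary (the real list was mined from large-scale data).
PLACEHOLDERS = {"username", "password", "user", "pass", "your_key", "changeme"}

# Illustrative format patterns (assumptions, not the authors' patterns).
FORMAT_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 upper-case letters or digits.
    "aws_access_key_id": re.compile(r"^AKIA[0-9A-Z]{16}$"),
    # MongoDB URI carrying both a user name and a password.
    "mongodb_uri": re.compile(r"^mongodb(\+srv)?://[^:/@\s]+:[^@/\s]+@[^/\s]+"),
    # PostgreSQL JDBC URL with host, database and a password parameter.
    "jdbc_postgresql": re.compile(
        r"^jdbc:postgresql://[^/\s?]+/[^\s?]+\?.*\bpassword=[^&\s]+"
    ),
}


def contains_placeholder(candidate: str) -> bool:
    """True if any token of the candidate is a known placeholder word."""
    tokens = re.split(r"[^A-Za-z0-9_]+", candidate.lower())
    return any(token in PLACEHOLDERS for token in tokens)


def matches_format(kind: str, candidate: str) -> bool:
    """True if the candidate satisfies the expected format for its key type."""
    return FORMAT_PATTERNS[kind].search(candidate) is not None


def level1_evidence(kind: str, candidate: str) -> dict:
    """Collect Level 1 signals; these are hints for later judgment, not verdicts."""
    return {
        "placeholder": contains_placeholder(candidate),
        "format_ok": matches_format(kind, candidate),
    }


print(level1_evidence("mongodb_uri", "mongodb://username:password@server"))
print(level1_evidence("jdbc_postgresql", "jdbc:postgresql://"))
```

The first candidate is well-formed but built from placeholder words, while the second fails the format check outright. Both are the kinds of signals Level 1 is meant to surface.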

Implementation-wise, each tool comprises an LLM and a specific function. For example, a key format checker integrates a GPT-4o model with a custom prompt and a regular expression matching function. The function’s parameters, return values, and usage are embedded in the prompt to enable the LLM to accurately utilize the tool and provide correct feedback based on its detection results. Each agent consists of an LLM and multiple tools. For instance, a preliminary inspection agent includes an LLM with a custom prompt and tools such as placeholder and key format checkers. The definitions of these tools are incorporated into the prompt to guide the agent in selecting appropriate tools for secret inspection. Unlike traditional secret detection tools, the detection results (e.g., regular expression matches) are not treated as final conclusions but rather as evidence for the LLM’s judgment. The LLM performs secondary analysis and inference, considering the actual characteristics of the key, to infer whether the key is genuine or merely resembles one.
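One way to picture this "tool results as evidence" design is the Python sketch below. It is an illustration of ours rather than the Argus implementation: the `Tool` dataclass, the prompt wording, and both checker functions are hypothetical, and `stub_llm` is a trivial stand-in for a real model call such as GPT-4o.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str              # embedded in the prompt so the model knows what is reported
    func: Callable[[str], bool]


def has_placeholder(candidate: str) -> bool:
    return re.search(r"\b(username|password|changeme)\b", candidate, re.I) is not None


def has_credentials_in_uri(candidate: str) -> bool:
    return re.search(r"://[^:/@\s]+:[^@/\s]+@", candidate) is not None


TOOLS: List[Tool] = [
    Tool("placeholder_check", "candidate contains a common placeholder word", has_placeholder),
    Tool("uri_format_check", "candidate embeds user:password credentials", has_credentials_in_uri),
]


def build_prompt(candidate: str, tools: List[Tool], evidence: Dict[str, bool]) -> str:
    """Embed tool definitions and their outputs in the prompt as evidence."""
    lines = [
        "Decide whether the candidate is a genuine secret.",
        f"Candidate: {candidate}",
        "Tool evidence (hints, not verdicts):",
    ]
    for tool in tools:
        lines.append(f"- {tool.name}: {evidence[tool.name]} ({tool.description})")
    return "\n".join(lines)


def inspect(candidate: str, tools: List[Tool], llm: Callable[[str], str]) -> str:
    """Run every tool, then let the model make the final judgment."""
    evidence = {tool.name: tool.func(candidate) for tool in tools}
    return llm(build_prompt(candidate, tools, evidence))


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; uses a trivial rule purely for demonstration."""
    if "- placeholder_check: True" in prompt:
        return "likely_false_positive"
    return "needs_further_analysis"


print(inspect("mongodb://username:password@server", TOOLS, stub_llm))
```

The point of the structure is that `inspect` never returns a tool result directly; the regex outcomes only ever reach the caller through the model's judgment.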

In short, the relevant tools are encapsulated within individual agents. Instead of directly using regex match results as the final verdict, these results are provided to an LLM, which combines the clues with the key’s intrinsic features to infer whether the key is genuine or merely resembles one.

4.1.2 Level 2: Semantic Analysis of the Secret’s Immediate Context. At the second level, the focus shifts to cases where a key, while appearing authentic on its own, is intended solely for demonstration, teaching, or testing. Because such keys exhibit all the intrinsic characteristics of genuine secrets, a standalone analysis is insufficient. Instead, the surrounding context must be examined.

Figure 3: Overview of the Argus framework and its operational flow

For instance, a document might include a SECRET_ACCESS_KEY example accompanied by explanatory text clarifying its instructional purpose. Traditional tools might simply flag the key as a risk, but our multi-agent system features an advanced context analysis module. This module scrutinizes annotations, comments, and nearby narrative cues to determine if the key is merely illustrative rather than operational.
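A minimal sketch of how surrounding context might be gathered for such a judgment is shown below in Python. The cue-word list, the window radius, and the sample document are assumptions of ours. In Argus the interpretation is performed by an LLM-based module, whereas this sketch only collects the nearby lines and surface cues that such a module could be given.

```python
from typing import List

# Hypothetical cue words that suggest instructional or test usage.
CUE_WORDS = ("example", "sample", "demo", "dummy", "placeholder", "tutorial", "for testing")


def context_window(lines: List[str], hit_index: int, radius: int = 3) -> List[str]:
    """Return the lines within `radius` of the line where the candidate was found."""
    start = max(0, hit_index - radius)
    return lines[start:hit_index + radius + 1]


def instructional_cues(window: List[str]) -> List[str]:
    """List the cue words that occur anywhere in the context window."""
    found = set()
    for line in window:
        lowered = line.lower()
        found.update(cue for cue in CUE_WORDS if cue in lowered)
    return sorted(found)


# Synthetic documentation snippet; the key value is deliberately elided.
doc = [
    "## Configuration",
    "The snippet below is an example only; replace the value with your own key.",
    'SECRET_ACCESS_KEY = "<40-character value>"',
    "Never commit real credentials.",
]

window = context_window(doc, hit_index=2, radius=1)
print(instructional_cues(window))  # ['example']
```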

4.1.3 Level 3: Global Reference Analysis at the Project Level. In cases where keys are embedded as standalone files (e.g., RSA private keys or certificates) and display all the attributes of genuine secrets, relying solely on intrinsic feature analysis or immediate contextual evaluation may not yield definitive results. To address this, Level 3 detection examines the key’s role and its relationships within the entire project. Below is an illustration of the Level 3 detection process using an RSA private key inspection as an example:

(1) Initial Discovery: The scanning tool detects a file matching the RSA private key format. Since it does not trigger obvious false positive conditions in tiers one or two, its authenticity remains undetermined.

(2) Reference Path Check: The advanced module retrieves the file’s reference location within the project. For example:

The RSA private key is referenced in the file: final_dataset\PrivateKey\...\pay.py

This suggests the key is likely utilized by a functional module.

(3) Contextual Analysis of the Reference: A further examination of the code in pay.py shows that the key file is read and assigned to a variable (e.g., app_private_key_string) in conjunction with terms like alipay_public_key_string, indicating its role in genuine payment or encryption operations.

(4) Final Determination: Lacking any indicators that the key is used for testing or demonstration, and given its active usage in core functionalities, the system concludes that it is a genuine secret leak.

Overall, Level three focuses on project-level usage and reference relationships. If a key cannot be ruled out as a false positive via intrinsic or contextual analyses, examining its practical deployment (through reference paths, function calls, or file dependencies) often yields the final determination: if it is employed in production, it is treated as a genuine leak requiring immediate remediation.
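The reference path check in step (2) can be approximated by searching the project for files that mention the key file's name. The Python sketch below is a simplified illustration under our own assumptions: the suffix list, the key file name `app_private_key.pem`, and the throwaway project built in `demo` are hypothetical, and a plain substring search stands in for whatever reference resolution Argus actually performs.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumed set of file types worth searching for references.
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".java", ".go", ".php", ".yml", ".yaml", ".json"}


def find_references(project_root: Path, key_file: Path) -> list[Path]:
    """Return source files under `project_root` whose text mentions the key file's name."""
    hits = []
    for path in sorted(project_root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if key_file.name in text:
            hits.append(path)
    return hits


def demo() -> list[str]:
    """Build a throwaway project and report which files reference the key file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        key_file = root / "keys" / "app_private_key.pem"   # hypothetical file name
        key_file.parent.mkdir()
        key_file.write_text("(key material omitted)\n", encoding="utf-8")
        (root / "pay.py").write_text(
            'app_private_key_string = open("keys/app_private_key.pem").read()\n',
            encoding="utf-8",
        )
        (root / "README.md").write_text("Payment demo project.\n", encoding="utf-8")
        return [path.name for path in find_references(root, key_file)]


print(demo())  # ['pay.py']
```

Each referencing file found this way would then be passed on for the contextual analysis of step (3), where how the key is used matters more than the mere fact that it is mentioned.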

4.2 Role Specialization

In our multi-agent system, we first designate an initial screening agent to locate high-entropy or feature-based secret candidates. Next, a Commander acts as the ultimate decision-maker, delegating tasks to two specialized roles: the Basic Check Agent and the Advanced Check Agent. Each role is equipped with specific tool and functional capabilities, working together to determine
