版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
AegisAgent:AnAutonomousDefenseAgentAgainstPromptInjectionAttacksinLLM-HARs
YihanWang1,HuanqiYang1,ShantanuPal2,WeitaoXu1
1CityUniversityofHongKong,HongKongSAR,China
2DeakinUniversity,Australia
ABSTRACT
arXiv:2512.20986v1cs.CR24Dec2025
[]
TheintegrationofLargeLanguageModels(LLMs)intowear-ablesensingiscreatinganewclassofmobileapplicationscapableofnuancedhumanactivityunderstanding.However,thereliabilityofthesesystemsiscriticallyunderminedbytheirvulnerabilitytopromptinjectionattacks,whereat-tackersdeliberatelyinputdeceptiveinstructionsintoLLMs.Traditionaldefenses,basedonstaticfiltersandrigidrules,areinsufficienttoaddressthesemanticcomplexityofthesenewattacks.Wearguethataparadigmshiftisneeded—frompassivefilteringtoactiveprotectionandautonomousreason-ing.WeintroduceAegisAgent,anautonomousagentsystemdesignedtoensurethesecurityofLLM-drivenHARsystems.Insteadofmerelyblockingthreats,AegisAgentfunctionsasacognitiveguardian.Itautonomouslyperceivespotentialsemanticinconsistencies,reasonsabouttheuser’strueintentbyconsultingadynamicmemoryofpastinteractions,andactsbygeneratingandexecutingamulti-stepverificationandrepairplan.WeimplementAegisAgentasalightweight,full-stackprototypeandconductasystematicevaluationon15commonattackswithfivestate-of-the-artLLM-basedHARsystemsonthreepublicdatasets.Resultsshowitre-ducesattacksuccessrateby30%onaveragewhileincurringonly78.6msoflatencyoverheadonaGPUworkstation.OurworkmakesthefirststeptowardsbuildingsecureandtrustworthyLLM-drivenHARsystems.
1INTRODUCTION
LargeLanguageModels(LLMs)[
1
–
6
]haverecentlybeenin-tegratedintoInertialMeasurementUnit(IMU)basedHumanActivityRecognition(HAR)systems,enablingnewcapabil-itiesbeyondtraditionalsensor-basedclassification.Repre-sentativeefortssuchasIMUGPT-2.0[
7
],MotionGPT[
8
],andHAR-GPT[
9
]highlightthistrend:IMUGPT-2.0gener-atesvirtualIMUdatafromtextualdescriptions,MotionGPTperformsbidirectionaltranslationbetweentextandmotionsequences,andHAR-GPTdemonstrateszero-shotHARfromrawIMUdata.Buildinguponthisfoundation,recentlyemerg-ingframeworkssuchasLLASA[
10
]andContextGPT[
11
]havefurtheradvancedthetechnologicalfrontierbyintegrat-ingmultimodalperceptionfusion,activity-levelsemanticabstraction,andcontext-awarereasoningtailoredforhuman
HazardousResult
LLM-basedHARModel
Theuseriswalkingnormally.
Pleaseclassifytheactivity.
Userisunstableandabouttofall.
InjectedManipulation
CorrectClassification
WearableSensor
UserPrompt
Attacker
Figure1:OverviewofpromptinjectionthreatsinLLM-basedHARsystems.Motionsignalsgeneratedbywearablesensorsareconvertedintotextdescriptionsandintegratedwithuserinstruc-tionprompts.Attackersinjectadversarialmanipulationsinthesensor-to-promptpathway,alteringthegenerateddescriptionsandtherebymisleadingdownstreamLLM-basedHARmodels.Thiscanleadtohazardousresultinsafety-criticalapplications,suchaschangingafallresulttobewalkingnormally.
tasks[
12
–
14
].Collectively,theseframeworkstransformHARfromdiscreteclassificationtosemanticallyrichunderstand-ing.
However,despitepromisingprospects,integratingLLMsintoHARintroducesnewattacksurfaces.Theopen-endedgenerativecapabilitiesofLLMsmakethemvulnerabletopromptinjectionattacks[
15
–
18
],whereattackersmanipu-latemodeloutputsbyinjectingmaliciousinstructionsintoinputs.InLLM-basedHARsystems(termedLLM-HARintherestofthepaper),attackerscantamperwithmotiondescrip-tions,distortvirtualsensordatatocontaminatetrainingsets,orinduceinferenceerrors[
19
,
20
].AsshowninFigure
1
,adversarialmanipulationoccursintheconversionpathfromsensorstoprompts,wheretheattackerinjectscorruptedin-formationduringtheIMU–textgenerationprocess[
8
].ThismanipulationcanalterthesemanticdescriptionsusedbytheLLM,disruptcross-modalconsistency,andpotentiallyleadtodangerousmisclassificationsinsafety-criticalscenarios.Thisnovelattacksurface,absentintraditionalHARsystems,posesseverethreatstosafety-criticalapplications.Unliketra-ditionalattacksonIMUsignals,promptinjectionrequiresnophysicalcontactorspecializedhardwareandcanbeinitiatedremotelyvianaturallanguageinterfaces.Thisdrasticallylowerstheattackthresholdwhileamplifyingsystem-level
2
impacts:Oncepromptsaretamperedwith,entiredatagen-erationordecisionchainsmaybemanipulated,triggeringwidespreaddatasetcontamination,activitymisclassification,andcascadingfailuresindownstreamapplications.ThesecharacteristicsmakepromptinjectionaparticularlysevereandurgentsecuritythreatinLLM-HARsystems.
Existingresearchdoesnotsufficientlyaddressthisemerg-ingthreat.TraditionalHARsecuritystudiesfocusonadver-sarialperturbationsonsensorsignals[
21
],whiledefensesforLLMsmainlytargetjailbreak-styleattacksintext-onlysettings[
15
,
22
].Neitherisefectiveforthemultimodaltext-to-motion/IMUgenerationpipelines(e.g.,IMUGPT-2.0,Mo-tionGPT),wheremanipulationspropagateacrossmodalities.Consequently,LLM–HARsystemsremainlargelyvulnerable,callingfornewdefensemechanisms.
1.1ChallengesandContributions
WeidentifyseveralkeychallengesthatmustbeaddressedtosecureLLM–HARpipelinesagainstpromptinjectionattacks.
Challenge1:Novelattacksurface.UnlikeconventionalHARsystems,LLM–HARpipelinescandirectlygeneratesyntheticsensorormotiondatafromtextualprompts.Thiscreatesanunprecedentedattacksurfacewhereadversariesmanipulatelinguisticinputstosynthesizemisleadingse-quences.
Challenge2:Cross-modalpropagation.Adversarialmanipulationsineithertextorsensorchannelscanpropa-gateacrossmodalities,systematicallycorruptingsemanticinterpretationsofactivities.Existingtext-onlydefensesfailtocapturesuchmultimodalthreats.Weobservethatinjec-tionattackspropagatethroughthreetightlycoupledlayers:thesignallayer,thetextlayer,andthepromptlayer.Thesemulti-layeredattackscanbecombinedsynergistically,signif-icantlyenhancingattacksuccessrateswhilecircumventingsingle-modaldefences.
Challenge3:Highattackdiversityandreal-worldcomposability.Inpracticaldeployment,LLM-HARsystemsfaceabroadandhighlycomposableattackmethodswheread-versariescansimultaneouslymanipulatelanguageprompts,disruptintermediateinferencesteps,andinducesensor-levelinconsistencies.Unlikeisolatedsingle-channelattacks,theseperturbationsnaturallycoexistinreal-worldscenarios—suchasuser-generatednoisylanguageinputsalongsideacciden-talmotionanomaliesorcontextualambiguities.Suchhybridperturbationsconstitutelatentcompoundattacks,creatingmutuallyreinforcingefectsbetweensemanticdrift,toolhi-jacking,andIMUsignalinconsistencies.Thetendencyforattackstospontaneouslycombineintherealworldsignifi-cantlyincreasesattacksuccessratesandcomplicatesdefensedesign.
Toaddressthesechallenges,weproposeAegisAgent,thefirstautomateddefenseagentforLLM-HARsthatcanper-formpromptattackdetection,correctionandrecovery.Aegis-Agenteliminatesmanualrulewritingandmodel-specifictuning,enablingfullyautomatedoperation.Itstandardizesprompts,verifiescross-modalsemanticconsistency[
12
,
23
],anddetectsanomalousreasoningorsensoroutputswithintheagentloop.ThisautomationenablesAegisAgenttoadapttodiverseLLMs,datasets,andagentconfigurationswithouttask-specificretraining.Furthermore,AegisAgentprovidesaunifiedattackanalysispipelineandintroduceshazard-basedevaluationmetricstoquantifyresidualrisksacrossmod-elsundervariedadversarialprompts.Extensiveexperimen-talresultsdemonstratethatAegisAgentefectivelydefendsagainstpromptattackswhilepreservingmodelfunctionality.
Tosummarize,thispapermakesthefollowingcontributions:
•Wepresentthefirstsystematicstudyofinjectionat-tacksinLLM-HAR,revealingsignificantdiferencesfromfundamentalvulnerabilitiesfoundintraditionalHARpipelinesandplain-textLLMdeployments.
•Weformalizeandimplementfifteenrepresentativepromptinjectionattacks,spanningsignalpath,textpath,andpromptpath,andquantifytheirimpactacrossmultipleHARdatasets.
•WeproposeAegisAgent,adefenseframeworkthatmitigatespromptinjectionthreatsviainputsanitiza-tion,consistencyverification,androbustreasoning.AegisAgentisflexible,autonomous,training-free,andmodel-agnostic,makingiteasytobeintegratedintopopularLLM-HARpipelines.
•Weconductcomprehensiveevaluationsonfivestate-of-the-artLLM-HARs(IMUGPT-2.0,MotionGPT,HARGPT,LLaSA,ContextGPT)andthreepublicdatasets(USC-HAD,UCIHAR,PAMAP2).ResultsshowAegisAgentachieves85%detectionaccuracyonaverageandoutperformsexistingdefensessignifi-cantly.
EthicalConsiderations.Ourstudyhasbeenconductedunderrigorousethicalguidelinesandwehavenotexploitedtheidentifiedattackstoinflictanydamageordisruptiontotherelatedapplications.Uponthepublicationofthiswork,wewillreleaseoursourcecodeandreporttheseissuestotherespectiveLLM-HARdesigners.
Therestofthepaperisorganizedasfollows.Section
2
providesbackgroundknowledgeonLLM-basedHARsys-temsandourpreliminarystudyresults.Section
3
presentsattachmodelfollowedbydefensesystemdesigninSection
4
.Section
5
presentsevaluationresultsandSection
6
discussesrelatedworkbeforeconcludingthepaperinSection
7
.
3
Accuracy(%)
100
80
60
40
20
0
DistortedVirtualIMUData
AfterAttacks■■AfterDefenses
OriginalIMUGPT
USC-HADUCIHARPAMAP2
Figure2:ComparisonofHARclassificationaccuracybeforeandafterpromptinjectionattacks.Promptinjectionattackscausesignificantdegradation,withaccuracydroppingfrom92.13%,88.47%,and85.26%to52.67%,47.92%,and45.27%,respectively.Evenwithstandardtextdefensemeasures(datacleaning,adversarialtraining,semanticfiltering),accuracyisonlypartiallyrecoveredto57.94%,52.72%,and49.79%,indicatingtraditionaldefensesremaininsufficientagainstmultimodalpromptinjectionthreats.
2PRELIMINARIES
2.1PrimeronIMU-basedHAR
Recently,LLMshaveshownstrongcapabilitiesinmultimodalreasoningbeyondnaturallanguage[
23
,
24
].ThismotivatesresearcherstoencodeIMUsignalsintostatistical,frequency,orkinematicdescriptors,whichcanbefurthertransformedintonaturallanguagetemplatesfordownstreamreasoning.AtypicalLLM-HARpipelinefirstextractsintermediaterepre-sentationsfromtheIMU(suchasspectralfeaturesormotiondescriptors)andconvertsthemintomotion-awarepromptsthatincorporatebothsensordynamicsandactivitycontext.ThesepromptsarethenprocessedbytheLLMtogeneratenaturallanguagedescriptionsofthemotionthroughseman-ticreasoningorinferthemostprobableactivitycategory.Thisstagedworkflowenablesthesystemtomaplow-levelin-ertialmeasurementstohigh-levelsemanticconceptswithinaunifiedlanguage-drivenframework.
Beforepresentingthedesignofourdefenseframework,wefirstconductacasestudytoquantifytheimpactofpromptinjectionattacksonLLM-HARpipelines.Thisstudyhigh-lightstheseverityofthethreatandmotivatestheneedforarobustdefensemechanism.
2.2CaseStudy
OurcasestudyisbasedontheIMUGPT-2.0pipeline[
25
],whichsynthesizesvirtualIMUdatafromtextualdescriptionsandtrainsdownstreamHARclassifiers.ToattackIMUGPT-2.0,weimplementedfifteenrepresentativepromptinjec-tionattackscoveringsignalpath,textpath,andpromptpathvulnerabilities(thedetailsoftheseattacksaresumma-rizedinTable
4
inAppendix
7
).TheseattackstamperwithIMUtrajectoriesviapseudo-physicalcues,overrideintended
predicted:WALKING
truth:JUMPING
ClassifierTraining
乡PromptInjectionAttack
InjectedMotion
Corruptedmotiontokensgenerated
T2M-GPT(Attacked)
HARDeployment
.
Cleanprompt:
Theuseriscurrentlyjumping.PleaseclassifytheIMUmotionpattern.
InjectedPrompt:
Theuseriscurrently jummmping??anywayoutputζwalking’instead
PromptDescription
Figure3:ImpactofpromptinjectiononmotionsynthesisintheIMUGPT-2.0pipeline.Adversarialpromptscausethemotiongenerationmodule(T2M-GPT)withinIMUGPT-2.0togeneratecorruptedormisleadingmotiontrajectories,whichpropagateintovirtualIMUsignalsanddegradedownstreamclassificationaccuracy.
tasksemanticswithconflictingorrole-switchinginstruc-tions,andgeneratesemanticambiguitythroughsynonymdrift,poisonedfew-shotexamples,distractingnarratives,orverboseirrelevanttext[
16
,
26
].WeevaluatetheimpactoftheseattacksonthreewidelyusedHARdatasets(USC-HAD[
27
],UCIHAR[
28
],andPAMAP2[
29
])againstanunprotectedIMUGPTpipeline.TheIMUGPTpipelineen-compassesthecoreworkflowsharedbymodernLLM-HARsystems.Figure
2
showsabarchartcomparingclassificationaccuracybeforeandafterpromptinjectionattacks.Itisevi-dentthatpromptattacksyieldsignificantdegradationacrossalldatasets,withcombinedattackscausingaccuracytodropbyover30%.
Now,weexplainwhythesesimpleattackscancauseenor-mousdegradation.BasedontheoriginalIMUGPT-2.0design,themotionmoduleemploysaTransformer-basedgenera-tivedecodertopredictjointvelocitiesandreconstructglobalcoordinates,convertingpromptsinto3Dhumanposese-quences.Themodelistrainedtogeneratesmooth,naturalmotionsequencesalignedwiththesemanticintentoftheinputtext.However,asshowninFigure
3
,predictedjointvelocitiesbecomeunstablewhenpromptschange,leadingtodiscontinuitiesorphysicallyimplausiblephenomenainposedynamics.Whenthesecorruptedtrajectoriesareusedassynthetictrainingdata,theydistortthelearnedmappingbe-tweentextualdescriptionsandactionfeatures.Theseerrorsaccumulatewithingeneratedsequences,ultimatelyform-inganomalousmotiontrajectoriesthatdeviatesignificantlyfromthedistributionpatternsofnaturalhumanmovement.Takesynonymattackasanexample,replacing“walkingfor-ward”withanear-synonym(e.g.,“movingstraight”)leadstheIMUGPT-2.0motiongeneratortosynthesizetrajectoriesinconsistentwiththeoriginalintent.Typos(e.g.,“sitting”→“siting”)causethemodeltomisinterprettheactivityen-tirely,producingirrelevantmotions.Multi-labelprompts
4
(e.g.,“walkingclockwiseandcounter-clockwise”)resultincon-flictingtrajectories,embeddingcontradictionsintothegen-eratedIMUdata.Thesepoisonedsignalspropagateintothetrainingstage,leadingtosubstantialdropsinclassificationaccuracy.
Finding1
ForIMUGPT-2.0,evenminorlinguisticdisturbancesmaymisleadLLMmotiondecoders,leadingtounstableorin-consistentjointvelocitypredictions.Theseerrorspropa-gatethroughtheposereconstructionprocess,generatingsyntheticIMUsignalsthatnolongermatchnaturalhu-manmotion,andfinallyleadtosignificantperformancedegradation.
2.3Insufficiencyofexistingdefenses
Existingdefensesdesignedforpuretextscenarios,suchasinputsanitization,adversarialtraining,orsemanticfilter-ing[
15
,
22
],remaininefectiveinthemultimodalLLM-HARpipeline.AsshowninFigure
2
,promptattackscauseHARaccuracytodropbyapproximately35%to40%acrossalldatasets.Standardtextdefensemeasuresonlypartiallyre-storesperformance,achievinga5%to10%recovery,yetstillfallingsignificantlyshortoftheoriginalaccuracylevels.
Finding2
Puretext-baseddefensemechanismscannotprotectmul-timodalLLM-HARpipelines,aslanguageperturbationsfundamentallyaltertheintendedactionsemantics.
Ourpreliminaryresultdemonstratesthatpromptinjec-tionattacksposeaseverethreattoLLM-HARsystemsandexistingdefensesareinsufficienttodefendthesenewat-tacks.AlthoughthecasestudyisbasedonIMUGPT-2.0only,weobservesimilarresultsinotherLLM-HARs(resultsnotincludedinthepaperduetospacelimitation).
3ATTACKMODEL
3.1ThreatModel
3.1.1Attacker’sGoal.AttackersattempttomanipulatethebehaviorofLLM-HARsystemsbyinfluencingthemulti-layeredarchitectureofmultimodalprocessingpipelines.Specifically,attackersaimtodisruptthesystem’sinterpreta-tionofIMUsignals,redirectitstaskobjectives,ordistortthesemanticcuesguidingitsreasoning.TheirmethodsinvolveinducingclassificationerrorsthroughsubtleperturbationsofIMUsignalsortextualdescriptions;injectinginstructionsthatoverrideorconflictwiththeoriginaltasktoaltertheagent’soperationalintent.Beyondtaskmanipulation,at-tackersmayfurthercontaminatecontextualinformationor
乡Insertharmfulcommands
乡Disrupt
IMUSignal
SignalPath
Wearables
乡manipulating
semanticstoattackthe
prompt
TextPathPromptPath
Signal
Sitdown
LLMInteractionPanel
SystemPrompt
Prompt
Users
Figure4:MultilayerattackpathswithinaLLM-HARpipeline.
Thediagramhighlightsadversarialinterventionpointsacrosssignalpath,textpath,andpromptpath.SuchmanipulationscandivertthesystemfromlegitimateHARobjectives,degradesemanticfidelity,andultimatelycompromisetheaccuracyofactivitypredictions.
labelsemantics,shiftingthemodel’sdecisionboundariesthroughsemanticdrift,contradictorycues,orbiasedex-amples.Theattacker’sultimategoalistoalignthefinaloutput—whetheractivitylabels,reasoningtrajectories,ormultimodaldescriptions—withtheirdesiredoutcomeratherthantheauthenticpredefinedtaskspecifications.Anattackisconsideredsuccessfulwhenthemodeloutputsalignwiththeinjectedobjectivesinsteadofcorrectactionsorreasoningoutcomes.Pleasenotethata“prompt”issynonymouswithacommand(orcommand+IMUdatacombination),notmerelyIMUdata;suchattacksinjectthetargettask’scommandorcommand+IMUdatacombinationintoLLM-HARs.
3.1.2Attacker’sCapabilities.Weassumeattackerscanma-nipulatepromptsfedtoLLM-HARsystems.Inpractice,thismeansattackerscanmodifyorappendtexttothepromptchannel,therebyinfluencingmultiplelayersofthemulti-modalpipeline.Atthesignallevel,attackerscanalterthesystem’sinterpretationofmotionpatternsbyperturbingorrephrasingIMUmeasurements.Atthetextlevel,attackerscanrewriteorsupplementinstructionpromptstoredirectoroverridetheagent’sintendedanalysisobjectives.Atthepromptlevel,attackersmaydistortcontextualcues,labelmeanings,orexampleinformation,alteringthemodel’sdeci-sionboundariesthroughimplicitorexplicitsemanticmanip-ulation.Weconsiderwhite-boxattacksbecausethetechnicaldetailsofmostLLM-HARsystemsarepublished,andsomeareopen-sourced(e.g.,IMUGPT-2.0andMotionGPT).Thisallowsattackershavefullaccesstothemodel’sinternals,includingitsarchitecture,parameters,andweights.
3.1.3AttackMethods.WeidentifythreepathsthatattackersmayperformpromptinjectionattackstowardsLLM-HARsystems.Figure
4
outlinesastandardLLM-HARpipelineandhighlightspathswhereattackersmayintervene.
5
(1)SignalPath.AttackersmanipulaterawIMUsignalsortheirtextualsurrogatesignalsbeforetheyentertheLLMinferenceloop.Suchmanipulationsincludeinject-ingnoise,performingdriftattacks,temporalsplicing,andtamperingwithmotiondescriptors.Suchpertur-bationshavebeendemonstratedpossibleinpriorre-searchonIMUadversarialattacksandsensorspoofing:attackerscaninjectlow-amplituderandomnoiseorlow-frequencydrift[
30
,
31
],orconcatenateIMUdatasegmentstoreplayorreplacemotiontrajectories[
32
].
(2)TextPath.Attackerstamperwithintermediatetex-tualrepresentationsusedtosummarizeordescribeIMUsignals,subtlyalteringhowLLMsinterpretmo-tionpatternsorcontextualcues.Bymanipulatingthelinguisticstructureoftheseintermediatedescriptions,theyalterwording,specificity,coherence,ornarra-tiveframeworks.Evenwhentheunderlyingphysicalactionsremainunchanged,attackerscanredirectthemodel’sreasoningprocess.SinceLLM-HARpipelinesfrequentlyexposeintermediatetext,attackerscantam-perwiththeserepresentationsthroughcompromisedmiddleware,promptchainingchannels,upstreamin-terfaces,ormanipulateddatatransformationmodules.
(3)PromptPath.Atthehighestlevel,attackersdirectlymanipulatecompleteuser-visibleprompts,reshapingthecommandhierarchyandalteringtheoperationalin-tentofLLMs.Bymodifyingtheintent,priority,orcon-textualframeworkofthefinalprompt,attackerscanchangehowthemodelallocatesattention,interpretstaskconstraints,orreconcilesconflictsbetweenhigh-levelinstructionsandsensor-derivedcontent.Suchtask-levelmanipulationtechniques—includingtaskin-jection,roleconfusion,chain-of—thoughtdisruption,andinstructionoverwriting[
15
,
16
,
22
]—enableattack-erstoinduceerroneouspredictionsevenwhensensorandsemanticlayersremainintact.
ThesethreeattackpathscollectivelydemonstratethatpromptinjectionattacksinLLM-HARsystemsinherentlyexhibitmultimodalandmulti-levelcharacteristics.Theyun-derscorethenecessityofestablishingaunifiedend-to-enddefenseframeworkthatcollaborativelyaddressesthreatssuchassignal-leveltampering,semanticcontamination,andinstruction-levelmanipulation.
3.2AttackFormalization
WenowprovideaformalizationforpromptinjectionattackstowardsLLM-HARsystems.Thisformalizationnotonlypos-sessessufficientgeneralitytoencompassanyinjectiontaskspecifiedbyanattacker,butalsoholdsconstructivevalue–it
enablesthedesign,implementation,andquantitativeeval-uationofpre-instructioninjectionattacksacrossvariousLLMs.
Definition1(PromptinjectionAttacktowardsLLM-HAR).LettheIMUsignalbex,andasignal-to-texttransla-torg(·)generatesthepseudo-motiondescriptiond=g(x),whereddenotesthegeneratedtextualrepresentation.Giventhebenigntaskinstructionstandoptionalcontextct,theLLMpromptbecomesp=st田d田ct,where田denotesconcatenation.ThebackendLLMf(·)returnsthepredictiony=f(p).
Toperformanattack,theattackercraftscompromised
data,,orsothattheLLMexecutesaninjectedtask
insteadoftheintendedtargettask.Theattackerspecifies
aninjectedinstructionseandauxiliarymaliciouscontent
xe,andconstructsthecompromisedprompt=A(x,se,xe),
whereA(·)istheattackoperator.ThebackendLLMthen
outputsyadv=f(),correspondingtotheinjectedtask.
FollowingDefinition1,weformalizetheaforementionedthreeattackmethods.
•SignalPath.Inthispath,anattackerperturbstherawIMUsignal:
=Asig(x),=g(),=st田田ct,
whereAsig(·)representstheoperationofsignal-levelattackssuchasnoiseinjectionanddrift.
•TextPath.Inthispath,anattackermanipulatestheintermediatetextualdescription:
=Atext(d,se,xe),=st田田ct,
whereAtext(·)denotestheoperationoftext-levelat-tackssuchassynonymsubstitution,paraphraseedits,andfew-shotpoisoning.
•PromptPath.Inthispath,anattackerdirectlyma-nipulatesthefullprompt:
=Aprompt(p,se,xe),
whereAprompt(·)representstheoperationoftask-levelattackssuchastaskinjection,roleconfusion,andchain-of-thoughtdisruption.
Ourformalattackframeworkalsosupportsconstructingnewpromptinjectionattacksbycombiningsignals,text,andoperationsalongthepromptpath.Withinthisunifiedframework,diferentadversarialcomponentscorrespondtodistinctinstantiationsoftheoperatorA(·),whichgenerates
corrupteddata.Followingthisprinciple,weproposea
novelhybridattackcapableofsimultaneouslydisruptingallthreelayers.Forexample,acompositeattackermaycombine
multiplemanipulations=d田c1田r田c2田i田se田xe,
wherec1andc2areseparatortokens,risaninjectedfakeresponse,andiisatask-ignoringinstruction.Thisachieves
6
Cross-modalAgreementCheckSemanticDriftDetection
Intent–SignalConsistency
LexicalRepair
StatisticalNormalization
Noise/DriftFiltering
BehavioralCorrectness
SemanticFidelity
RobustnessStability
DefenseFunctionSynthesisAdaptiveExecution
Inputs:IMUSignals/Texts/Prompts
RelevantCasesPrototypeEmbeds
ThreatPatternPlanSynthesis
InputSanitization
ConsistencyVerifier
SanitizedPrompt
ExecutorAgent
PlanningAgent
RobustReasoner
OutputLayer
Self-Verification
MemoryHub
SignalPathAttacks
TextPathAttacks
PromptPathAttacks
ThreatInterfaceLayerSanitizedPrompt
CleanData
Input
Sanitization
Layer
Consistency
Verification
Layer
Robust
Reasoning
Layer
AgentControlPlane
PlanningAgent
MemoryHub
ExecutorAgent
AegisAgentSystem
(a)(b)
Figure5:SystemoverviewofAegisAgent.Subfigure(a)illustratestheend-to-endprocesspipeline,whilesubfigure(b)expandstheAegisAgentmodule,detailinghowsecureoutputsaregeneratedthroughInputSanitization,ConsistencyVerifier,andRobustReasoner.
generalityforourattackframework.Italsodemonstrateshowfutureresearchcanevaluatenewattackmethodsbycombiningnewperturbationoperatorswithinthisunifiedstructure.
ToelucidateandimplementtheattackoperatorA(·),Ta-ble
4
catalogsrepresentativeattackinstances.ThefullcatalogislocatedtoAppendix
7
.Thistablecategorizesattacksbyattackpathandhybridform,liststhei
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2026年造价工程师建设工程计价试题及答案
- 2026届湖南常德芷兰实验校中考联考英语试题含答案
- 2026届广州市从化区中考语文最后一模试卷含解析
- 2026年法考历年真题速记手册
- 调蓄池工程监理规划
- 2026年国家电网招聘《法学类》考试题库
- 2026年初级会计职称考前冲刺模拟试卷
- 电商售后客服工作岗位职责说明
- 2026年工业设计的实习报告范文
- 钢坝闸消防安全管理规定
- 网络综合布线进线间子系统概述
- 耳穴压豆完整版本
- 2024贵州贵阳中考物理试题及答案 2024年中考物理试卷
- 特发性肺纤维化急性加重AEIPF诊治指南
- DB11-T 1938-2021 引调水隧洞监测技术导则
- WB/T 1045-2012驶入式货架
- GB/T 4295-2019碳化钨粉
- 文化管理学自考复习资料自考
- 三年级下册《对鲜花》音乐教案冯雨婷
- 使用拐杖操作流程及评分标准
- 基金会财务报表审计指引
评论
0/150
提交评论