AegisAgent：面向大语言模型驱动人体活动识别系统的自主防御智能体

上传人：1*** IP属地：山西上传时间：2026-05-05 格式：DOCX 页数：32 大小：280.37KB 积分：19.9 举报 版权申诉

AegisAgent：面向大语言模型驱动人体活动识别系统的自主防御智能体_第2页

AegisAgent：面向大语言模型驱动人体活动识别系统的自主防御智能体_第3页

AegisAgent：面向大语言模型驱动人体活动识别系统的自主防御智能体_第4页

AegisAgent：面向大语言模型驱动人体活动识别系统的自主防御智能体_第5页

已阅读5页，还剩27页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

AegisAgent:AnAutonomousDefenseAgentAgainstPromptInjectionAttacksinLLM-HARs

YihanWang1,HuanqiYang1,ShantanuPal2,WeitaoXu1

1CityUniversityofHongKong,HongKongSAR,China

2DeakinUniversity,Australia

ABSTRACT

arXiv:2512.20986v1cs.CR24Dec2025

[]

TheintegrationofLargeLanguageModels(LLMs)intowear-ablesensingiscreatinganewclassofmobileapplicationscapableofnuancedhumanactivityunderstanding.However,thereliabilityofthesesystemsiscriticallyunderminedbytheirvulnerabilitytopromptinjectionattacks,whereat-tackersdeliberatelyinputdeceptiveinstructionsintoLLMs.Traditionaldefenses,basedonstaticfiltersandrigidrules,areinsufficienttoaddressthesemanticcomplexityofthesenewattacks.Wearguethataparadigmshiftisneeded—frompassivefilteringtoactiveprotectionandautonomousreason-ing.WeintroduceAegisAgent,anautonomousagentsystemdesignedtoensurethesecurityofLLM-drivenHARsystems.Insteadofmerelyblockingthreats,AegisAgentfunctionsasacognitiveguardian.Itautonomouslyperceivespotentialsemanticinconsistencies,reasonsabouttheuser’strueintentbyconsultingadynamicmemoryofpastinteractions,andactsbygeneratingandexecutingamulti-stepverificationandrepairplan.WeimplementAegisAgentasalightweight,full-stackprototypeandconductasystematicevaluationon15commonattackswithfivestate-of-the-artLLM-basedHARsystemsonthreepublicdatasets.Resultsshowitre-ducesattacksuccessrateby30%onaveragewhileincurringonly78.6msoflatencyoverheadonaGPUworkstation.OurworkmakesthefirststeptowardsbuildingsecureandtrustworthyLLM-drivenHARsystems.

1INTRODUCTION

LargeLanguageModels(LLMs)[

–

]haverecentlybeenin-tegratedintoInertialMeasurementUnit(IMU)basedHumanActivityRecognition(HAR)systems,enablingnewcapabil-itiesbeyondtraditionalsensor-basedclassification.Repre-sentativeefortssuchasIMUGPT-2.0[

],MotionGPT[

],andHAR-GPT[

]highlightthistrend:IMUGPT-2.0gener-atesvirtualIMUdatafromtextualdescriptions,MotionGPTperformsbidirectionaltranslationbetweentextandmotionsequences,andHAR-GPTdemonstrateszero-shotHARfromrawIMUdata.Buildinguponthisfoundation,recentlyemerg-ingframeworkssuchasLLASA[

]andContextGPT[

]havefurtheradvancedthetechnologicalfrontierbyintegrat-ingmultimodalperceptionfusion,activity-levelsemanticabstraction,andcontext-awarereasoningtailoredforhuman

HazardousResult

LLM-basedHARModel

Theuseriswalkingnormally.

Pleaseclassifytheactivity.

Userisunstableandabouttofall.

InjectedManipulation

CorrectClassification

WearableSensor

UserPrompt

Attacker

Figure1:OverviewofpromptinjectionthreatsinLLM-basedHARsystems.Motionsignalsgeneratedbywearablesensorsareconvertedintotextdescriptionsandintegratedwithuserinstruc-tionprompts.Attackersinjectadversarialmanipulationsinthesensor-to-promptpathway,alteringthegenerateddescriptionsandtherebymisleadingdownstreamLLM-basedHARmodels.Thiscanleadtohazardousresultinsafety-criticalapplications,suchaschangingafallresulttobewalkingnormally.

tasks[

–

].Collectively,theseframeworkstransformHARfromdiscreteclassificationtosemanticallyrichunderstand-ing.

However,despitepromisingprospects,integratingLLMsintoHARintroducesnewattacksurfaces.Theopen-endedgenerativecapabilitiesofLLMsmakethemvulnerabletopromptinjectionattacks[

–

],whereattackersmanipu-latemodeloutputsbyinjectingmaliciousinstructionsintoinputs.InLLM-basedHARsystems(termedLLM-HARintherestofthepaper),attackerscantamperwithmotiondescrip-tions,distortvirtualsensordatatocontaminatetrainingsets,orinduceinferenceerrors[

].AsshowninFigure

,adversarialmanipulationoccursintheconversionpathfromsensorstoprompts,wheretheattackerinjectscorruptedin-formationduringtheIMU–textgenerationprocess[

].ThismanipulationcanalterthesemanticdescriptionsusedbytheLLM,disruptcross-modalconsistency,andpotentiallyleadtodangerousmisclassificationsinsafety-criticalscenarios.Thisnovelattacksurface,absentintraditionalHARsystems,posesseverethreatstosafety-criticalapplications.Unliketra-ditionalattacksonIMUsignals,promptinjectionrequiresnophysicalcontactorspecializedhardwareandcanbeinitiatedremotelyvianaturallanguageinterfaces.Thisdrasticallylowerstheattackthresholdwhileamplifyingsystem-level

impacts:Oncepromptsaretamperedwith,entiredatagen-erationordecisionchainsmaybemanipulated,triggeringwidespreaddatasetcontamination,activitymisclassification,andcascadingfailuresindownstreamapplications.ThesecharacteristicsmakepromptinjectionaparticularlysevereandurgentsecuritythreatinLLM-HARsystems.

Existingresearchdoesnotsufficientlyaddressthisemerg-ingthreat.TraditionalHARsecuritystudiesfocusonadver-sarialperturbationsonsensorsignals[

],whiledefensesforLLMsmainlytargetjailbreak-styleattacksintext-onlysettings[

].Neitherisefectiveforthemultimodaltext-to-motion/IMUgenerationpipelines(e.g.,IMUGPT-2.0,Mo-tionGPT),wheremanipulationspropagateacrossmodalities.Consequently,LLM–HARsystemsremainlargelyvulnerable,callingfornewdefensemechanisms.

1.1ChallengesandContributions

WeidentifyseveralkeychallengesthatmustbeaddressedtosecureLLM–HARpipelinesagainstpromptinjectionattacks.

Challenge1:Novelattacksurface.UnlikeconventionalHARsystems,LLM–HARpipelinescandirectlygeneratesyntheticsensorormotiondatafromtextualprompts.Thiscreatesanunprecedentedattacksurfacewhereadversariesmanipulatelinguisticinputstosynthesizemisleadingse-quences.

Challenge2:Cross-modalpropagation.Adversarialmanipulationsineithertextorsensorchannelscanpropa-gateacrossmodalities,systematicallycorruptingsemanticinterpretationsofactivities.Existingtext-onlydefensesfailtocapturesuchmultimodalthreats.Weobservethatinjec-tionattackspropagatethroughthreetightlycoupledlayers:thesignallayer,thetextlayer,andthepromptlayer.Thesemulti-layeredattackscanbecombinedsynergistically,signif-icantlyenhancingattacksuccessrateswhilecircumventingsingle-modaldefences.

Challenge3:Highattackdiversityandreal-worldcomposability.Inpracticaldeployment,LLM-HARsystemsfaceabroadandhighlycomposableattackmethodswheread-versariescansimultaneouslymanipulatelanguageprompts,disruptintermediateinferencesteps,andinducesensor-levelinconsistencies.Unlikeisolatedsingle-channelattacks,theseperturbationsnaturallycoexistinreal-worldscenarios—suchasuser-generatednoisylanguageinputsalongsideacciden-talmotionanomaliesorcontextualambiguities.Suchhybridperturbationsconstitutelatentcompoundattacks,creatingmutuallyreinforcingefectsbetweensemanticdrift,toolhi-jacking,andIMUsignalinconsistencies.Thetendencyforattackstospontaneouslycombineintherealworldsignifi-cantlyincreasesattacksuccessratesandcomplicatesdefensedesign.

Toaddressthesechallenges,weproposeAegisAgent,thefirstautomateddefenseagentforLLM-HARsthatcanper-formpromptattackdetection,correctionandrecovery.Aegis-Agenteliminatesmanualrulewritingandmodel-specifictuning,enablingfullyautomatedoperation.Itstandardizesprompts,verifiescross-modalsemanticconsistency[

],anddetectsanomalousreasoningorsensoroutputswithintheagentloop.ThisautomationenablesAegisAgenttoadapttodiverseLLMs,datasets,andagentconfigurationswithouttask-specificretraining.Furthermore,AegisAgentprovidesaunifiedattackanalysispipelineandintroduceshazard-basedevaluationmetricstoquantifyresidualrisksacrossmod-elsundervariedadversarialprompts.Extensiveexperimen-talresultsdemonstratethatAegisAgentefectivelydefendsagainstpromptattackswhilepreservingmodelfunctionality.

Tosummarize,thispapermakesthefollowingcontributions:

•Wepresentthefirstsystematicstudyofinjectionat-tacksinLLM-HAR,revealingsignificantdiferencesfromfundamentalvulnerabilitiesfoundintraditionalHARpipelinesandplain-textLLMdeployments.

•Weformalizeandimplementfifteenrepresentativepromptinjectionattacks,spanningsignalpath,textpath,andpromptpath,andquantifytheirimpactacrossmultipleHARdatasets.

•WeproposeAegisAgent,adefenseframeworkthatmitigatespromptinjectionthreatsviainputsanitiza-tion,consistencyverification,androbustreasoning.AegisAgentisflexible,autonomous,training-free,andmodel-agnostic,makingiteasytobeintegratedintopopularLLM-HARpipelines.

•Weconductcomprehensiveevaluationsonfivestate-of-the-artLLM-HARs(IMUGPT-2.0,MotionGPT,HARGPT,LLaSA,ContextGPT)andthreepublicdatasets(USC-HAD,UCIHAR,PAMAP2).ResultsshowAegisAgentachieves85%detectionaccuracyonaverageandoutperformsexistingdefensessignifi-cantly.

EthicalConsiderations.Ourstudyhasbeenconductedunderrigorousethicalguidelinesandwehavenotexploitedtheidentifiedattackstoinflictanydamageordisruptiontotherelatedapplications.Uponthepublicationofthiswork,wewillreleaseoursourcecodeandreporttheseissuestotherespectiveLLM-HARdesigners.

Therestofthepaperisorganizedasfollows.Section

providesbackgroundknowledgeonLLM-basedHARsys-temsandourpreliminarystudyresults.Section

presentsattachmodelfollowedbydefensesystemdesigninSection

.Section

presentsevaluationresultsandSection

discussesrelatedworkbeforeconcludingthepaperinSection

Accuracy(%)

100

DistortedVirtualIMUData

AfterAttacks■■AfterDefenses

OriginalIMUGPT

USC-HADUCIHARPAMAP2

Figure2:ComparisonofHARclassificationaccuracybeforeandafterpromptinjectionattacks.Promptinjectionattackscausesignificantdegradation,withaccuracydroppingfrom92.13%,88.47%,and85.26%to52.67%,47.92%,and45.27%,respectively.Evenwithstandardtextdefensemeasures(datacleaning,adversarialtraining,semanticfiltering),accuracyisonlypartiallyrecoveredto57.94%,52.72%,and49.79%,indicatingtraditionaldefensesremaininsufficientagainstmultimodalpromptinjectionthreats.

2PRELIMINARIES

2.1PrimeronIMU-basedHAR

Recently,LLMshaveshownstrongcapabilitiesinmultimodalreasoningbeyondnaturallanguage[

].ThismotivatesresearcherstoencodeIMUsignalsintostatistical,frequency,orkinematicdescriptors,whichcanbefurthertransformedintonaturallanguagetemplatesfordownstreamreasoning.AtypicalLLM-HARpipelinefirstextractsintermediaterepre-sentationsfromtheIMU(suchasspectralfeaturesormotiondescriptors)andconvertsthemintomotion-awarepromptsthatincorporatebothsensordynamicsandactivitycontext.ThesepromptsarethenprocessedbytheLLMtogeneratenaturallanguagedescriptionsofthemotionthroughseman-ticreasoningorinferthemostprobableactivitycategory.Thisstagedworkflowenablesthesystemtomaplow-levelin-ertialmeasurementstohigh-levelsemanticconceptswithinaunifiedlanguage-drivenframework.

Beforepresentingthedesignofourdefenseframework,wefirstconductacasestudytoquantifytheimpactofpromptinjectionattacksonLLM-HARpipelines.Thisstudyhigh-lightstheseverityofthethreatandmotivatestheneedforarobustdefensemechanism.

2.2CaseStudy

OurcasestudyisbasedontheIMUGPT-2.0pipeline[

],whichsynthesizesvirtualIMUdatafromtextualdescriptionsandtrainsdownstreamHARclassifiers.ToattackIMUGPT-2.0,weimplementedfifteenrepresentativepromptinjec-tionattackscoveringsignalpath,textpath,andpromptpathvulnerabilities(thedetailsoftheseattacksaresumma-rizedinTable

inAppendix

).TheseattackstamperwithIMUtrajectoriesviapseudo-physicalcues,overrideintended

predicted:WALKING

truth:JUMPING

ClassiﬁerTraining

乡PromptInjectionAttack

InjectedMotion

Corruptedmotiontokensgenerated

T2M-GPT(Attacked)

HARDeployment

Cleanprompt:

Theuseriscurrentlyjumping.PleaseclassifytheIMUmotionpattern.

InjectedPrompt:

Theuseriscurrently jummmping??anywayoutputζwalking’instead

PromptDescription

Figure3:ImpactofpromptinjectiononmotionsynthesisintheIMUGPT-2.0pipeline.Adversarialpromptscausethemotiongenerationmodule(T2M-GPT)withinIMUGPT-2.0togeneratecorruptedormisleadingmotiontrajectories,whichpropagateintovirtualIMUsignalsanddegradedownstreamclassificationaccuracy.

tasksemanticswithconflictingorrole-switchinginstruc-tions,andgeneratesemanticambiguitythroughsynonymdrift,poisonedfew-shotexamples,distractingnarratives,orverboseirrelevanttext[

].WeevaluatetheimpactoftheseattacksonthreewidelyusedHARdatasets(USC-HAD[

],UCIHAR[

],andPAMAP2[

])againstanunprotectedIMUGPTpipeline.TheIMUGPTpipelineen-compassesthecoreworkflowsharedbymodernLLM-HARsystems.Figure

showsabarchartcomparingclassificationaccuracybeforeandafterpromptinjectionattacks.Itisevi-dentthatpromptattacksyieldsignificantdegradationacrossalldatasets,withcombinedattackscausingaccuracytodropbyover30%.

Now,weexplainwhythesesimpleattackscancauseenor-mousdegradation.BasedontheoriginalIMUGPT-2.0design,themotionmoduleemploysaTransformer-basedgenera-tivedecodertopredictjointvelocitiesandreconstructglobalcoordinates,convertingpromptsinto3Dhumanposese-quences.Themodelistrainedtogeneratesmooth,naturalmotionsequencesalignedwiththesemanticintentoftheinputtext.However,asshowninFigure

,predictedjointvelocitiesbecomeunstablewhenpromptschange,leadingtodiscontinuitiesorphysicallyimplausiblephenomenainposedynamics.Whenthesecorruptedtrajectoriesareusedassynthetictrainingdata,theydistortthelearnedmappingbe-tweentextualdescriptionsandactionfeatures.Theseerrorsaccumulatewithingeneratedsequences,ultimatelyform-inganomalousmotiontrajectoriesthatdeviatesignificantlyfromthedistributionpatternsofnaturalhumanmovement.Takesynonymattackasanexample,replacing“walkingfor-ward”withanear-synonym(e.g.,“movingstraight”)leadstheIMUGPT-2.0motiongeneratortosynthesizetrajectoriesinconsistentwiththeoriginalintent.Typos(e.g.,“sitting”→“siting”)causethemodeltomisinterprettheactivityen-tirely,producingirrelevantmotions.Multi-labelprompts

(e.g.,“walkingclockwiseandcounter-clockwise”)resultincon-flictingtrajectories,embeddingcontradictionsintothegen-eratedIMUdata.Thesepoisonedsignalspropagateintothetrainingstage,leadingtosubstantialdropsinclassificationaccuracy.

Finding1

ForIMUGPT-2.0,evenminorlinguisticdisturbancesmaymisleadLLMmotiondecoders,leadingtounstableorin-consistentjointvelocitypredictions.Theseerrorspropa-gatethroughtheposereconstructionprocess,generatingsyntheticIMUsignalsthatnolongermatchnaturalhu-manmotion,andfinallyleadtosignificantperformancedegradation.

2.3Insufficiencyofexistingdefenses

Existingdefensesdesignedforpuretextscenarios,suchasinputsanitization,adversarialtraining,orsemanticfilter-ing[

],remaininefectiveinthemultimodalLLM-HARpipeline.AsshowninFigure

,promptattackscauseHARaccuracytodropbyapproximately35%to40%acrossalldatasets.Standardtextdefensemeasuresonlypartiallyre-storesperformance,achievinga5%to10%recovery,yetstillfallingsignificantlyshortoftheoriginalaccuracylevels.

Finding2

Puretext-baseddefensemechanismscannotprotectmul-timodalLLM-HARpipelines,aslanguageperturbationsfundamentallyaltertheintendedactionsemantics.

Ourpreliminaryresultdemonstratesthatpromptinjec-tionattacksposeaseverethreattoLLM-HARsystemsandexistingdefensesareinsufficienttodefendthesenewat-tacks.AlthoughthecasestudyisbasedonIMUGPT-2.0only,weobservesimilarresultsinotherLLM-HARs(resultsnotincludedinthepaperduetospacelimitation).

3ATTACKMODEL

3.1ThreatModel

3.1.1Attacker’sGoal.AttackersattempttomanipulatethebehaviorofLLM-HARsystemsbyinfluencingthemulti-layeredarchitectureofmultimodalprocessingpipelines.Specifically,attackersaimtodisruptthesystem’sinterpreta-tionofIMUsignals,redirectitstaskobjectives,ordistortthesemanticcuesguidingitsreasoning.TheirmethodsinvolveinducingclassificationerrorsthroughsubtleperturbationsofIMUsignalsortextualdescriptions;injectinginstructionsthatoverrideorconflictwiththeoriginaltasktoaltertheagent’soperationalintent.Beyondtaskmanipulation,at-tackersmayfurthercontaminatecontextualinformationor

乡Insertharmfulcommands

乡Disrupt

IMUSignal

SignalPath

Wearables

乡manipulating

semanticstoattackthe

prompt

TextPathPromptPath

Signal

Sitdown

LLMInteractionPanel

SystemPrompt

Prompt

Users

Figure4:MultilayerattackpathswithinaLLM-HARpipeline.

Thediagramhighlightsadversarialinterventionpointsacrosssignalpath,textpath,andpromptpath.SuchmanipulationscandivertthesystemfromlegitimateHARobjectives,degradesemanticfidelity,andultimatelycompromisetheaccuracyofactivitypredictions.

labelsemantics,shiftingthemodel’sdecisionboundariesthroughsemanticdrift,contradictorycues,orbiasedex-amples.Theattacker’sultimategoalistoalignthefinaloutput—whetheractivitylabels,reasoningtrajectories,ormultimodaldescriptions—withtheirdesiredoutcomeratherthantheauthenticpredefinedtaskspecifications.Anattackisconsideredsuccessfulwhenthemodeloutputsalignwiththeinjectedobjectivesinsteadofcorrectactionsorreasoningoutcomes.Pleasenotethata“prompt”issynonymouswithacommand(orcommand+IMUdatacombination),notmerelyIMUdata;suchattacksinjectthetargettask’scommandorcommand+IMUdatacombinationintoLLM-HARs.

3.1.2Attacker’sCapabilities.Weassumeattackerscanma-nipulatepromptsfedtoLLM-HARsystems.Inpractice,thismeansattackerscanmodifyorappendtexttothepromptchannel,therebyinfluencingmultiplelayersofthemulti-modalpipeline.Atthesignallevel,attackerscanalterthesystem’sinterpretationofmotionpatternsbyperturbingorrephrasingIMUmeasurements.Atthetextlevel,attackerscanrewriteorsupplementinstructionpromptstoredirectoroverridetheagent’sintendedanalysisobjectives.Atthepromptlevel,attackersmaydistortcontextualcues,labelmeanings,orexampleinformation,alteringthemodel’sdeci-sionboundariesthroughimplicitorexplicitsemanticmanip-ulation.Weconsiderwhite-boxattacksbecausethetechnicaldetailsofmostLLM-HARsystemsarepublished,andsomeareopen-sourced(e.g.,IMUGPT-2.0andMotionGPT).Thisallowsattackershavefullaccesstothemodel’sinternals,includingitsarchitecture,parameters,andweights.

3.1.3AttackMethods.WeidentifythreepathsthatattackersmayperformpromptinjectionattackstowardsLLM-HARsystems.Figure

outlinesastandardLLM-HARpipelineandhighlightspathswhereattackersmayintervene.

(1)SignalPath.AttackersmanipulaterawIMUsignalsortheirtextualsurrogatesignalsbeforetheyentertheLLMinferenceloop.Suchmanipulationsincludeinject-ingnoise,performingdriftattacks,temporalsplicing,andtamperingwithmotiondescriptors.Suchpertur-bationshavebeendemonstratedpossibleinpriorre-searchonIMUadversarialattacksandsensorspoofing:attackerscaninjectlow-amplituderandomnoiseorlow-frequencydrift[

],orconcatenateIMUdatasegmentstoreplayorreplacemotiontrajectories[

(2)TextPath.Attackerstamperwithintermediatetex-tualrepresentationsusedtosummarizeordescribeIMUsignals,subtlyalteringhowLLMsinterpretmo-tionpatternsorcontextualcues.Bymanipulatingthelinguisticstructureoftheseintermediatedescriptions,theyalterwording,specificity,coherence,ornarra-tiveframeworks.Evenwhentheunderlyingphysicalactionsremainunchanged,attackerscanredirectthemodel’sreasoningprocess.SinceLLM-HARpipelinesfrequentlyexposeintermediatetext,attackerscantam-perwiththeserepresentationsthroughcompromisedmiddleware,promptchainingchannels,upstreamin-terfaces,ormanipulateddatatransformationmodules.

(3)PromptPath.Atthehighestlevel,attackersdirectlymanipulatecompleteuser-visibleprompts,reshapingthecommandhierarchyandalteringtheoperationalin-tentofLLMs.Bymodifyingtheintent,priority,orcon-textualframeworkofthefinalprompt,attackerscanchangehowthemodelallocatesattention,interpretstaskconstraints,orreconcilesconflictsbetweenhigh-levelinstructionsandsensor-derivedcontent.Suchtask-levelmanipulationtechniques—includingtaskin-jection,roleconfusion,chain-of—thoughtdisruption,andinstructionoverwriting[

]—enableattack-erstoinduceerroneouspredictionsevenwhensensorandsemanticlayersremainintact.

ThesethreeattackpathscollectivelydemonstratethatpromptinjectionattacksinLLM-HARsystemsinherentlyexhibitmultimodalandmulti-levelcharacteristics.Theyun-derscorethenecessityofestablishingaunifiedend-to-enddefenseframeworkthatcollaborativelyaddressesthreatssuchassignal-leveltampering,semanticcontamination,andinstruction-levelmanipulation.

3.2AttackFormalization

WenowprovideaformalizationforpromptinjectionattackstowardsLLM-HARsystems.Thisformalizationnotonlypos-sessessufficientgeneralitytoencompassanyinjectiontaskspecifiedbyanattacker,butalsoholdsconstructivevalue–it

enablesthedesign,implementation,andquantitativeeval-uationofpre-instructioninjectionattacksacrossvariousLLMs.

Definition1(PromptinjectionAttacktowardsLLM-HAR).LettheIMUsignalbex,andasignal-to-texttransla-torg(·)generatesthepseudo-motiondescriptiond=g(x),whereddenotesthegeneratedtextualrepresentation.Giventhebenigntaskinstructionstandoptionalcontextct,theLLMpromptbecomesp=st田d田ct,where田denotesconcatenation.ThebackendLLMf(·)returnsthepredictiony=f(p).

Toperformanattack,theattackercraftscompromised

data,,orsothattheLLMexecutesaninjectedtask

insteadoftheintendedtargettask.Theattackerspecifies

aninjectedinstructionseandauxiliarymaliciouscontent

xe,andconstructsthecompromisedprompt=A(x,se,xe),

whereA(·)istheattackoperator.ThebackendLLMthen

outputsyadv=f(),correspondingtotheinjectedtask.

FollowingDefinition1,weformalizetheaforementionedthreeattackmethods.

•SignalPath.Inthispath,anattackerperturbstherawIMUsignal:

=Asig(x),=g(),=st田田ct,

whereAsig(·)representstheoperationofsignal-levelattackssuchasnoiseinjectionanddrift.

•TextPath.Inthispath,anattackermanipulatestheintermediatetextualdescription:

=Atext(d,se,xe),=st田田ct,

whereAtext(·)denotestheoperationoftext-levelat-tackssuchassynonymsubstitution,paraphraseedits,andfew-shotpoisoning.

•PromptPath.Inthispath,anattackerdirectlyma-nipulatesthefullprompt:

=Aprompt(p,se,xe),

whereAprompt(·)representstheoperationoftask-levelattackssuchastaskinjection,roleconfusion,andchain-of-thoughtdisruption.

Ourformalattackframeworkalsosupportsconstructingnewpromptinjectionattacksbycombiningsignals,text,andoperationsalongthepromptpath.Withinthisunifiedframework,diferentadversarialcomponentscorrespondtodistinctinstantiationsoftheoperatorA(·),whichgenerates

corrupteddata.Followingthisprinciple,weproposea

novelhybridattackcapableofsimultaneouslydisruptingallthreelayers.Forexample,acompositeattackermaycombine

multiplemanipulations=d田c1田r田c2田i田se田xe,

wherec1andc2areseparatortokens,risaninjectedfakeresponse,andiisatask-ignoringinstruction.Thisachieves

Cross-modalAgreementCheckSemanticDriftDetection

Intent–SignalConsistency

LexicalRepair

StatisticalNormalization

Noise/DriftFiltering

BehavioralCorrectness

SemanticFidelity

RobustnessStability

DefenseFunctionSynthesisAdaptiveExecution

Inputs:IMUSignals/Texts/Prompts

RelevantCasesPrototypeEmbeds

ThreatPatternPlanSynthesis

InputSanitization

ConsistencyVerifier

SanitizedPrompt

ExecutorAgent

PlanningAgent

RobustReasoner

OutputLayer

Self-Verification

MemoryHub

SignalPathAttacks

TextPathAttacks

PromptPathAttacks

ThreatInterfaceLayerSanitizedPrompt

CleanData

Input

Sanitization

Layer

Consistency

Verification

Layer

Robust

Reasoning

Layer

AgentControlPlane

PlanningAgent

MemoryHub

ExecutorAgent

AegisAgentSystem

(a)(b)

Figure5:SystemoverviewofAegisAgent.Subfigure(a)illustratestheend-to-endprocesspipeline,whilesubfigure(b)expandstheAegisAgentmodule,detailinghowsecureoutputsaregeneratedthroughInputSanitization,ConsistencyVerifier,andRobustReasoner.

generalityforourattackframework.Italsodemonstrateshowfutureresearchcanevaluatenewattackmethodsbycombiningnewperturbationoperatorswithinthisunifiedstructure.

ToelucidateandimplementtheattackoperatorA(·),Ta-ble

catalogsrepresentativeattackinstances.ThefullcatalogislocatedtoAppendix

.Thistablecategorizesattacksbyattackpathandhybridform,liststhei

人人文库> 全部分类> 应用文书 > 研究报告

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

AegisAgent：面向大语言模型驱动人体活动识别系统的自主防御智能体

文档简介

温馨提示

最新文档

评论

AegisAgent：面向大语言模型驱动人体活动识别系统的自主防御智能体

文档简介

温馨提示

最新文档

评论

相关文档