版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
GoogleMay2025
Google’sApproachforSecureAIAgents:
AnIntroduction
SantiagoDíaz,ChristophKern,KaraOlive
01
02
03
04
05
06
Introduction:thepromiseandrisksofAIagents
SecuritychallengesofhowAIagentswork
KeyrisksassociatedwithAIagents
Coreprinciplesforagentsecurity
Google’sapproach:ahybriddefense-in-depth
Navigatingthefutureofagentssecurely
ThispaperispartofourongoingefforttoshareGoogle’sbestpracticesforbuildingsecureAIsystems.ReadmoreaboutGoogle’sSecureAI
Frameworkatsaif.google.
01–Introduction:ThepromiseandrisksofAIagents
Introduction:ThepromiseandrisksofAIagents
WeareenteringaneweradrivenbyAIagents—AIsystemsdesignedtoperceivetheirenvironment,makedecisions,andtakeautonomousactionstoachieveuser-definedgoals.UnlikestandardLargeLanguageModels(LLMs)thatprimarilygeneratecontent,agentsact.TheyleverageAIreasoningtointeractwithothersystemsandexecutetasks,rangingfromsimpleautomationlikecategorizingincomingservicerequeststocomplex,multi-stepplanningsuchasresearchingatopicacrossmultiplesources,summarizingthefindings,anddraftinganemailtoateam.
Thisincreasingcapabilityandautonomypromisessignificantvalue,poten-tiallyreshapinghowbusinessesoperateandindividualsinteractwithtechnology.TherapiddevelopmentofagentframeworkslikeGoogle’sAgentDevelopmentKit1andopensourcetoolssuchasLangChainsignalsamovetowardwidespreaddeployment,suggesting“fleets”ofagentsoper-atingatscaleratherthanjustisolatedinstances.Atthesametime,thepromiseofagentsintroducesuniqueandcriticalsecuritychallengesthatdemandexecutiveattention.
Keyrisks:Rogueactionsandsensitivedatadisclosure
TheverynatureofAIagentsintroducesnewrisksstemmingfromseveralinherentcharacteristics.TheunderlyingAImodelscanbeunpredictable,astheirnon-deterministicnaturemeanstheirbehaviorisn’talwaysrepeatableevenwiththesameinput.Complex,emergentbehaviorscanarisethatweren’texplic-itlyprogrammed.Higherlevelsofautonomyindecision-makingincreasethepotentialscopeandimpactoferrorsaswellaspotentialvulnerabilitiestomaliciousactors.Ensuringalignment—thatagentactionsrea-sonablymatchuserintent,especiallywheninterpretingambiguousinstructionsorprocessinguntrustedinputs—remainsasignificanthurdle.Finally,therearechallengesinmanagingagentidentityandprivilegeseffectively.
ThesefactorscreatetheneedforAgentSecurity,aspecializedfieldfocusedonmitigatingthenovelrisksthesesystemspresent.Theprimaryconcernsdemandingstrategicfocusarerogueactions(unintended,harmful,orpolicy-violatingactions)andsensitivedatadisclosure(unauthorizedrevelationofprivateinfor-mation).Afundamentaltensionexists:increasedagentautonomyandpower,whichdriveutility,correlatedirectlywithincreasedrisk.
Traditionalsecurityparadigmsaloneareinsufficient
SecuringAIagentsinvolvesachallengingtrade-off:enhancinganagent’sutilitythroughgreaterautonomyandcapabilityinherentlyincreasesthecomplexityofensuringitssafetyandsecurity.Traditionalsystemssecurityapproaches(suchasrestrictionsonagentactionsimplementedthroughclassicalsoftware)lackthecontextualawarenessneededforversatileagentsandcanoverlyrestrictutility.Conversely,purelyreason-ing-basedsecurity(relyingsolelyontheAImodel’sjudgment)isinsufficientbecausecurrentLLMsremainsusceptibletomanipulationslikepromptinjectionandcannotyetoffersufficientlyrobustguarantees.Neitherapproachissufficientinisolationtomanagethisdelicatebalancebetweenutilityandrisk.
1
https://google.github.io/adk-docs/
01–Introduction:ThepromiseandrisksofAIagents
Ourpathforward:Ahybridapproach
Buildingonwell-establishedprinciplesofsecuresoftwareandsystemsdesign,andinalignmentwithGoogle’sSecureAIFramework(SAIF),2Googleisadvocatingforandimplementingahybridapproach,combiningthestrengthsofbothtraditional,deterministiccontrolsanddynamic,reasoning-baseddefenses.Thiscreatesalayeredsecurityposture—a“defense-in-depthapproach”3—thataimstoconstrainpotentialharmwhilepreservingmaximumutility.Thisstrategyisbuiltuponthreecoresecurityprinciplesdetailedlaterinthisdocument.
ThispaperfirstexplainsthetypicalworkflowofanAIagentanditsinherentsecuritytouchpoints.Itthenaddresseskeyrisksagentspose,introducescoresecurityprinciples,anddetailsGoogle’shybriddefense-in-depthstrategy.Throughout,guidingquestionsaresuggestedtohelpframeyourthinking.Aforthcoming,comprehensivewhitepaperwilldelvedeeperintothesetopics,offeringmoreextensivetechnicaldetailsandmitigations.
2
www.saif.google
3
https://google.github.io/building-secure-and-reliable-systems/raw/ch08.html#defense_in_depth
Google’sApproachforSecureAIAgents:AnIntroduction5
02–SecuritychallengesofhowAIagentswork
SecuritychallengesofhowAIagentswork
Tounderstandtheuniquesecurityrisksofagents,it’shelpfultostartwithamentalmodelthatdescribesacommonagentarchitecture.Whiledetailsvary,thereareseveralbroadlyapplicableconcepts.Wewillbrieflydiscusseachandidentifytherisksthatapplytoeachcomponent.
OrchestrationAgentApplication
UserInteraction
Application
System
instructions
Userquerydetails
Rendering
Outputtransformation
Perception
Inputtransformation
Model(s)
ReasoningandplanningLLM
Reasoningcore
Model(s)
Dataprocessing
Agentmemory
Content(RAG)
Tools
Figure1:Asimplifiedconceptualagentarchitectureforvisualizingrelevantsecurityconsiderations
02–SecuritychallengesofhowAIagentswork
Google’sApproachforSecureAIAgents:AnIntroduction7
Input,perceptionandpersonalization:Agentsbeginbyreceivinginput.Thisinputcanbeadirectuserinstruction(typedcommand,voicequery)orcontextualdatagatheredfromtheenvironment(sensorread-ings,applicationstate,recentdocuments).Theinput,whichcanbemultimodal(text,image,audio),isprocessedandperceivedbytheagentandoftentransformedintoaformattheAImodelcanunderstand.
Securityimplication:Acriticalchallengehereisreliablydistinguishingtrustedusercommandsfrompotentiallyuntrustedcontextualdataandinputsfromothersources(forexample,contentwithinanemailorwebpage).Failuretodosoopensthedoortopromptinjectionattacks,wheremaliciousinstruc-tionshiddenindatacanhijacktheagent.Secureagentsmustcarefullyparseandseparatetheseinputstreams.Personalizationfeatures,whereagentslearnuserpreferences,alsoneedcontrolstopreventmanipulationordatacontaminationacrossusers.
Questionstoconsider
Whattypesofinputsdoestheagentprocess,andcanitclearlydistinguishtrusteduserinputsfrompotentiallyuntrustedcontextualinputs?
Doestheagentactimmediatelyinresponsetoinputsordoesitperformactionsasynchronouslywhentheusermaynotbepresenttoprovideoversight?
Istheuserabletoinspect,approve,andrevokepermissionsforagentactions,memory,andpersonalizationfeatures?
Ifanagenthasmultipleusers,howdoesitensureitknowswhichuserisgivinginstructions,applytherightpermissionsforthatuser,andkeepeachuser’smemoryisolated?
Systeminstructions:Theagent’scoremodeloperatesonacombinedinputintheformofastructuredprompt.Thispromptintegratespredefinedsysteminstructions(whichdefinetheagent’spurpose,capabil-ities,andboundaries)withthespecificuserqueryandvariousdatasourceslikeagentmemoryorexternallyretrievedinformation.
Securityimplication:Acrucialsecuritymeasureinvolvesclearlydelimitingandseparatingthesedif-ferentelementswithintheprompt.Maintaininganunambiguousdistinctionbetweentrustedsysteminstructionsandpotentiallyuntrusteduserdataorexternalcontentisimportantformitigatingpromptinjectionattacks.
Reasoningandplanning:Theprocessedinput,combinedwithsysteminstructionsdefiningtheagent’spur-poseandcapabilities,isfedintothecoreAImodel.Thismodelreasonsabouttheuser’sgoalanddevelopsaplan—oftenasequenceofstepsinvolvinginformationretrievalandtoolusage—toachieveit.Thisplanningcanbeiterative,refiningtheplanbasedonnewinformationortoolfeedback.
Securityimplication:BecauseLLMplanningisprobabilistic,it’sinherentlyunpredictableandpronetoerrorsfrommisinterpretation.Furthermore,currentLLMarchitecturesdonotproviderigoroussepara-tionbetweenconstituentpartsofaprompt(inparticular,systemanduserinstructionsversusexternal,untrustworthyinputs),makingthemsusceptibletomanipulationlikepromptinjection.Thecommonpracticeofiterativeplanning(ina“reasoningloop”)exacerbatesthisrisk:eachcycleintroducesoppor-tunitiesforflawedlogic,divergencefromintent,orhijackingbymaliciousdata,potentiallycompoundingissues.Consequently,agentswithhighautonomyundertakingcomplex,multi-stepiterativeplanningpresentasignificantlyhigherrisk,demandingrobustsecuritycontrols.
02–SecuritychallengesofhowAIagentswork
Questionstoconsider
Howdoestheagenthandleambiguousinstructionsorconflictinggoals,andcanitrequestuserclarification?
Whatlevelofautonomydoestheagenthaveinplanningandselectingwhichplantoexecute,andarethereconstraintsonplancomplexityorlength?
Doestheagentrequireuserconfirmationbeforeexecutinghigh-riskorirreversibleactions?
Orchestrationandactionexecution(tooluse):Toexecuteitsplan,theagentinteractswithexternal
systemsorresourcesvia“tools”or“actions.”ThesecouldbethroughAPIsforsendingemails,querying
databases,accessingfilesystems,controllingsmartdevices,oreveninteractingwithwebbrowserelements.
Theagentselectstheappropriatetoolandprovidesthenecessaryparametersbasedonitsplan.
Securityimplication:Thisstageiswhererogueplanstranslateintoreal-worldimpact.Eachtoolgrantstheagentspecificpowers.Uncontrolledaccesstopowerfulactions(suchasdeletingfiles,makingpur-chases,transferringdata,andevenadjustingsettingsonmedicaldevices)ishighlyriskyiftheplanningphaseiscompromised.Secureorchestrationrequiresrobustauthenticationandauthorizationfortooluse,ensuringtheagenthasappropriatelyconstrainedpermissions(reducedprivilege)forthetaskathand.Dynamicallyincorporatingnewtools,especiallythird-partyones,introducesrisksrelatedtodeceptivetooldescriptionsorinsecureimplementations.
Questionstoconsider
Isthesetofavailableagentactionsclearlydefined,andcanuserseasilyinspectactions,under-standtheirimplications,andprovideconsent?
Howareactionswithpotentiallysevereconsequencesidentifiedandsubjectedtospecificcontrolsorconfinement?
Whatsafeguards(suchassandboxingpolicies,usercontrols,andsensitivedeploymentexclu-sions)preventagentactionsfromimproperlyexposinghigh-privilegeinformationorcapabilitiesinlow-privilegecontexts?
Agentmemory:Manyagentsmaintainsomeformofmemorytoretaincontextacrossinteractions,storelearneduserpreferences,orrememberfactsfromprevioustasks.
Securityimplication:Memorycanbecomeavectorforpersistentattacks.Ifmaliciousdatacontainingapromptinjectionisprocessedandstoredinmemory(forexample,asa“fact”summarizedfromamaliciousdocument),itcouldinfluencetheagent’sbehaviorinfuture,unrelatedinteractions.Memoryimplementationsmustensurestrictisolationbetweenusersandpotentiallybetweendifferentcon-textsforthesameusertopreventcontamination.Usersalsoneedtransparencyandcontroloveragentmemory.Understandingthesestageshighlightshowvulnerabilitiescanarisethroughouttheagent’soperationalcycle,necessitatingsecuritycontrolsateachcriticaljuncture.
02–SecuritychallengesofhowAIagentswork
Google’sApproachforSecureAIAgents:AnIntroduction9
Responserendering:Thisstagetakestheagent’sfinalgeneratedoutputandformatsitfordisplaywithintheuser’sapplicationinterfacesuchasawebbrowserormobileapp.
Securityimplication:Iftheapplicationrendersagentoutputwithoutpropersanitizationorescapingbasedoncontenttype,vulnerabilitieslikeCross-SiteScripting(XSS)ordataexfiltration(frommaliciouslycraftedURLsinimagetags,forexample)canoccur.Robustsanitizationbytherenderingcomponentiscrucial.
Questionstoconsider
Howisagentmemoryisolatedbetweendifferentusersandcontextstopreventdataleakageorcross-contamination?
Whatstopsstoredmaliciousinputs(likepromptinjections)fromcausingpersistentharm?
Whatsanitizationandescapingprocessesareappliedwhenrenderingagent-generatedoutputtopreventexecutionvulnerabilities(suchasXSS)?
Howisrenderedagentoutput,especiallygeneratedURLsorembeddedcontent,validatedtopreventsensitivedatadisclosure?
03–KeyrisksassociatedwithAIagents
KeyrisksassociatedwithAIagents
Wethinktheinherentdesignofagents,combinedwiththeirpowerfulcapabilities,canexposeuserstotwomajorrisks,whatwecallrogueactionsandsensitivedatadisclosure.Thefollowingsectionexaminesthesetworisksandmethodsattackersusetorealizethem.
OrchestrationAgentApplication
Application
Perception
Inputtransformation
Rendering
Outputtransformation
System
instructions
Userquery
details
Reasoningcore
UserInteraction
2
1
Model(s)
ReasoningandplanningLLM
Model(s)
Dataprocessing
Content(RAG)
Agentmemory
1
2
Tools
Figure2:RisksassociatedwithAIagentsacrosstheagentarchitecture:Rogueactions(1)andSensitivedatadisclosure(2)
03–KeyrisksassociatedwithAIagents
Google’sApproachforSecureAIAgents:AnIntroduction,,
Risk1:Rogueactions
Rogueactions—unintended,harmful,orpolicy-violatingagentbehav-iors—representaprimarysecurityriskforAIagents.
Akeycauseispromptinjection:maliciousinstructionshiddenwithinprocesseddata(likefiles,emails,orwebsites)cantricktheagent’scoreAImodel,hijackingitsplanningorreasoningphases.Themodelmisinterpretsthisembeddeddataasinstructions,causingittoexecuteattackercommandsusingtheuser’sauthority.Forexample,anagentprocessingamaliciousemailmightbemanipulatedintoleakinguserdatainsteadofperformingtherequestedtask.
Rogueactionscanalsooccurwithoutmaliciousinput,stemminginsteadfromfundamentalmisalignmentormisinterpretation.Theagentmightmisunderstandambiguousinstructionsorcontext.Forinstance,anambiguousrequestlike“emailMikeabouttheprojectupdate”couldleadtheagenttoselectthewrongcon-tact,inadvertentlysharingsensitiveinformation.Suchcasesinvolveharmfuldivergencefromuserintentduetotheagent’sinterpretation,notexternalcompromise.
Additionally,unexpectednegativeoutcomescanariseiftheagentmisinterpretscomplexinteractionswithexternaltoolsorenvironments.Forexample,itmightmisinterpretthefunctionofbuttonsorformsonacomplexwebsite,leadingtoaccidentalpurchasesorunintendeddatasubmissionswhentryingtoexecuteaplannedaction.
Thepotentialimpactofanyrogueactionscalesdirectlywiththeagent’sauthorizedcapabilitiesandtoolaccess.Thepotentialforfinancialloss,databreaches,systemdisruption,reputationaldamage,andevenphysicalsafetyrisksescalatesdramaticallywiththesensitivityandreal-worldimpactoftheactionstheagentispermittedtotake.
Risk2:Sensitivedatadisclosure
Thiscriticalriskinvolvesanagentimproperlyrevealingprivateorcon-
fidentialinformation.Aprimarymethodforachievingsensitivedata
disclosureisdataexfiltration.Thisinvolvestrickingtheagentintomak-
ingsensitiveinformationvisibletoanattacker.Attackersoftenachieve
thisbyexploitingagentactionsandtheirsideeffects,typically
drivenbypromptinjection.Attackerscanmethodicallyguideanagent
throughasequenceofactions.Theymighttricktheagentintoretrieving
sensitivedataandthenleakingitthroughactions,suchasembedding
datainaURLtheagentispromptedtovisit,orhidingsecretsincode
commitmessages.
Alternatively,datacanbeleakedbymanipulatingtheagent,soutputgeneration.Anattackermighttricktheagentintoincludingsensitivedatadirectlyinitsresponse(liketextorMarkdown).Ifthisoutputisren-deredinsecurelybytheapplication(becauseitlacksappropriatevalidationorsanitizationfordisplayinabrowser,forexample),thedatacanbeexposed.ThiscanhappenthroughcraftedimageURLshiddeninMarkdownthatleakdatawhenfetched,forinstance.ThisvectorcanalsoleadtoCross-SiteScripting(XSS).
Theimpactofdatadisclosureissevere,potentiallyleadingtoprivacybreaches,intellectualpropertyloss,complianceviolations,orevenaccounttakeover,andthedamageisoftenirreversible.
Mitigatingthesediverseandpotentrisksrequiresadeliberate,multi-facetedsecuritystrategygroundedinclear,actionableprinciples.
04–Coreprinciplesforagentsecurity
Coreprinciplesforagentsecurity
Tomitigatetherisksofagentswhilebenefitingfromtheirimmensepotential,weproposethatagenticproductdevelopersshouldadoptthreecoreprinciplesforagentsecurity.Foreachprinciple,werecommendcontrolsortechniquesforyoutoconsider.
OrchestrationAgentApplication
UserInteraction
1Application
Perception
Inputtransformation
2
Rendering
Outputtransformation
Userquerydetails
System
instructions
Reasoningcore
Model(s)
ReasoningandplanningLLM
2
Content(RAG)
Agentmemory
Model(s)
Dataprocessing
Tools
,.
>3
>3
Figure3:ControlsrelevanttoAIagents:Agentusercontrols(1),Agentpermissions(2),andAgentobservability(3)
04–Coreprinciplesforagentsecurity
Google’sApproachforSecureAIAgents:AnIntroduction13
Principle1:Agentsmusthavewell-definedhumancontrollers
Agentstypicallyactasproxiesorassistantsforhumans,inheritingprivilegestoaccessresourcesandperformactions.Therefore,itisessentialforsecurityandaccountabilitythatagentsoperateunderclearhumanoversight.Everyagentmusthaveawell-definedsetofcontrollinghumanuser(s).Thisprinciplemandatesthatsystemsmustbeabletoreliablydistinguishinstructionsoriginatingfromanautho-rizedcontrollinguserversusanyotherinput,especiallypotentiallyuntrusteddataprocessedbytheagent.Foractionsdeemedcriticalorirreversible—suchasdeletinglargeamountsofdata,authorizingsignif-icantfinancialtransactions,orchangingsecuritysettings—thesystemshouldrequireexplicithumanconfirmationbeforeproceeding,ensur-ingtheuserremainsintheloop.
Furthermore,scenariosinvolvingmultipleusersoragentsrequirecarefulconsideration.Agentsactingonbehalfofteamsorgroupsneeddistinctidentitiesandclearauthorizationmodelstopreventunauthorizedcross-userdataaccessoroneuserinadvertentlytriggeringactionsimpactinganother.Usersshouldbegiventhetoolstograntmoregranularpermissionswhentheagentisshared,comparedtothecoarse-grainedpermissionsthatmightbeappropriateforasingle-useragent.Similarly,ifagentconfigurationsorcustompromptscanbeshared,theprocessmustbetransparent,ensuringusersunderstandexactlyhowasharedconfigurationmightaltertheagent’sbehaviorandpotentialactions.
Controls:ThisprinciplereliesoneffectiveAgentUserControls,supportedbyinfrastructurethatprovidesdistinctagentidentitiesandsecureinputchannelstodifferentiateusercommands.
Principle2:Agentpowersmusthavelimitations
Anagent’spowers—theactionsitcantakeandtheresourcesitcanaccess—mustbecarefullylimitedinalignmentwithitsintendedpur-poseanditscontrollinguser’srisktolerance.Forexample,anagentdesignedforresearchshouldnotpossessthepowertomodifyfinancialaccounts.General-purposeagentsneedmechanismstodynamicallyconfinetheircapabilitiesatruntime,ensuringonlyrelevantpermissionsareactiveforanygivenquery(forexample,disallowingfiledeletionactionswhenthetaskiscreativewriting).
nl,
Thisprincipleextendstraditionalleastprivilegebyrequiringanagent’spermissionstobedynamicallyalignedwithitsspecificpurposeandcurrentuserintent,ratherthanjustbeingstaticallyminimized.Thisdistinctioniscrucialbecausemodel-basedagentsoperateinpotentiallyunboundedecosystemsandaresusceptibletoreasoningerrors,necessitatingcontext-awarelimitationsthatgobeyondthetypicalscopeconsideredforhumanusersortraditionalsoftware.
Thisprinciplenecessitatesthatmaximumpermissionlevelsmustbedefinableandenforceable.Critically,agentsmustbepreventedfromescalatingtheirownprivilegesbeyondexplicitlypre-authorizedscopes.Usersmustalsoretaintheabilitytoinspectandrevokeanyauthoritydelegatedtoanagent.ImplementingthisrequiresadaptingandstrengtheningfundamentalAuthentication,Authorization,andAuditing(AAA)infrastructureforagents,includingverifiableagentidentities,granularpermissionsystems,andsecureman-agementofcredentialslikescopedOAuthtokens.
04–Coreprinciplesforagentsecurity
Controls:ImplementingthisprinciplerequiresdefinedAgentPermissionscontrols,enforcedbyrobustAuthentication,Authorization,andAuditing(AAA)infrastructureadaptedforagents,andutilizingscopedcredentialslikeOAuthtokenstomanageaccesssecurely.
Principle3:Agentactionsandplanningmustbeobservable
Trust,effectivedebugging,securityauditing,incidentresponse,anduserunderstandingallhingeontransparencyintotheagent’sactivities.Wecannotensureanagentisactingfaithfullyordiagnoseproblemsifitsoperationsareentirelyopaque.Therefore,agentactions,andwherefeasible,theirplanningprocesses,mustbeobservableandauditable.Thisrequiresimplementingrobustloggingacrosstheagent’sarchi-tecturetocapturecriticalinformationsuchasinputsreceived,toolsinvoked,parameterspassed,outputsgenerated,andideally,interme-diatereasoningsteps.Thisloggingmustbedonesecurely,protectingsensitivedatawithinthelogsthemselves.
Effectiveobservabilityalsomeansthatthepropertiesoftheactionsanagentcantake—suchaswhetheranactionisread-onlyversusstate-changing,orifithandlessensitivedata—mustbeclearlycharacterized.Thismetadataiscrucialforautomatedsecuritymechanismsandhumanreviewers.Finally,userinterfacesshouldbedesignedtopromotetransparency,providinguserswithinsightsintotheagent’s“thoughtprocess,”thedatasourcesitconsulted,ortheactionsitintendstotake,especiallyforcomplexorhigh-riskoperations.Thisrequiresinfrastructureinvestmentsinsecure,centralizedloggingsystemsandAPIsthatexposeactioncharacteristicsunderstandably.
Controls:EffectiveAgentObservabilitycontrolsarecrucial,necessitatinginfrastructureinvestmentsinsecure,centralizedloggingsystemsandstandardizedAPIsthatclearlycharacterizeactionpropertiesandpotentialsideeffects.
Thesethreeprinciplescollectivelyformastrategicframeworkformitigatingagentrisks.
Principle
1.Human
controllers
Summary
Ensuresaccountability,usercontrol,andpreventsagentsfromactingautonomouslyincriticalsituationswithoutclearhuman
oversightorattribution.
KeyControlFocus
Agentuser
controls
InfrastructureNeeds
Distinctagentidentities,
userconsentmechanisms,
secureinputs
2.Limitedpowers
Enforcesappropriate,dynamicallylimitedprivileges,ensuringagentshaveonlythe
capabilitiesandpermissionsnecessaryfortheirintendedpurposeandcannotescalateprivilegesinappropriately.
Agentpermissions
RobustAAAforagents,scopedcredentialmanagement,
sandboxing
3.Observableactions
Requirestransparencyandauditability
throughrobustloggingofinputs,reasoning,actions,andoutputs,enablingsecurity
decisionsanduserunderstanding.
Agent
observability
Secure/centralizedlogging,characterizedactionAPIs,transparentUX
Figure4:Asummaryof
agentsecurityprinciples,controls,andhigh-level
infrastructureneeds
05–Google’sapproach:Ahybriddefense-in-depth
Google’sApproachforSecureAIAgents:AnIntroduction15
Google’sapproach:Ahybriddefense-in-depth
GiventheinherentlimitationsofcurrentAImodelsandthepracticalimpossibilityofguaranteeingper-fectalignmentagainstallpotentialthreats,Googleemploysadefense-in-depthstrategycenteredaroundahybridapproach.Thisapproachstrategicallycombinestraditional,deterministicsecuritymeasureswithdynamic,reasoning-baseddefenses.Thegoalistocreaterobustboundariesaroundtheagentsoperationalenvironment,significantlymitigatingtheriskofharmfuloutcomes,particularlyrogueactionsstemmingfrompromptinjection,whilestrivingtopreservetheagentsutility.
Thisdefense-in-depthapproachreliesonenforcedboundariesaroundtheAIagentsoperationalenviron-menttopreventpotentialworst-casescenarios,actingasguardrailseveniftheagentsinternalreasoningprocessbecomescompromisedormisalignedbysophisticatedattacksorunexpectedinputs.Thismulti-layeredapproachrecognizesthatneitherpurelyrule-basedsystemsnorpurelyAI-basedjudgmentaresufficientontheirown.
Examplesofnew
vulnerabilities
Runtimepolicyenforcement
Dependableconstraintson
agentprivileges
Hardeningofthebasemodel,classifiers,andsafetyfine-tuning
Reasoning-baseddefenses
Regressiontesting
VariantAnalysis
Testingforregressions,variants,andnewvulnerabilities
RedTeams&HumanReviewers
AIAgent
Application
PerceptionRendering
Reasoningcore
Orchestration
Figure5:Googleshybri
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 智能家居系统功能与使用说明
- 2020届高三英语模拟试题解析
- 网络营销推广方案设计与实施指南
- 小学生英语单词记忆法与练习
- 2025-2030中国环保技术行业供需分析及投资评估规划发展报告
- 2025-2030中国环保废弃物处理行业政策支持与可持续发展分析研究报告
- 2025-2030中国海鲜市场现状供需分析及投资评估规划分析研究报告
- 2025-2030中国机械装备制造业现状与发展研究分析
- 江苏省南京市浦口区江浦高级中学2026届生物高三第一学期期末质量跟踪监视模拟试题含解析
- 镀锌钢管连接技术施工流程
- 北师大四年级数学上册《总复习》课件
- 家庭农场的商业计划书(6篇)
- 高处安全作业培训
- 2023-2024学年北京市通州区数学九年级第一学期期末综合测试试题含解析
- 泌尿外科降低持续膀胱冲洗患者膀胱痉挛的发生率根本原因分析柏拉图鱼骨图对策拟定
- 图形创意应用课件
- 浙江省中医医疗技术感染预防与控制标准操作规程
- 胸痛中心联合例会与质控分析会-ACS患者如何更好的管理时间
- 保险管选型指导书
- 建筑风景速写课件
- 高强度螺栓连接施拧记录
评论
0/150
提交评论