谷歌的安全AI智能体简介 Googles Approach for Secure AI Agents An Introduction_第1页
谷歌的安全AI智能体简介 Googles Approach for Secure AI Agents An Introduction_第2页
谷歌的安全AI智能体简介 Googles Approach for Secure AI Agents An Introduction_第3页
谷歌的安全AI智能体简介 Googles Approach for Secure AI Agents An Introduction_第4页
谷歌的安全AI智能体简介 Googles Approach for Secure AI Agents An Introduction_第5页
已阅读5页,还剩28页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

GoogleMay2025

Google’sApproachforSecureAIAgents:

AnIntroduction

SantiagoDíaz,ChristophKern,KaraOlive

01

02

03

04

05

06

Introduction:thepromiseandrisksofAIagents

SecuritychallengesofhowAIagentswork

KeyrisksassociatedwithAIagents

Coreprinciplesforagentsecurity

Google’sapproach:ahybriddefense-in-depth

Navigatingthefutureofagentssecurely

ThispaperispartofourongoingefforttoshareGoogle’sbestpracticesforbuildingsecureAIsystems.ReadmoreaboutGoogle’sSecureAI

Frameworkatsaif.google.

01–Introduction:ThepromiseandrisksofAIagents

Introduction:ThepromiseandrisksofAIagents

WeareenteringaneweradrivenbyAIagents—AIsystemsdesignedtoperceivetheirenvironment,makedecisions,andtakeautonomousactionstoachieveuser-definedgoals.UnlikestandardLargeLanguageModels(LLMs)thatprimarilygeneratecontent,agentsact.TheyleverageAIreasoningtointeractwithothersystemsandexecutetasks,rangingfromsimpleautomationlikecategorizingincomingservicerequeststocomplex,multi-stepplanningsuchasresearchingatopicacrossmultiplesources,summarizingthefindings,anddraftinganemailtoateam.

Thisincreasingcapabilityandautonomypromisessignificantvalue,poten-tiallyreshapinghowbusinessesoperateandindividualsinteractwithtechnology.TherapiddevelopmentofagentframeworkslikeGoogle’sAgentDevelopmentKit1andopensourcetoolssuchasLangChainsignalsamovetowardwidespreaddeployment,suggesting“fleets”ofagentsoper-atingatscaleratherthanjustisolatedinstances.Atthesametime,thepromiseofagentsintroducesuniqueandcriticalsecuritychallengesthatdemandexecutiveattention.

Keyrisks:Rogueactionsandsensitivedatadisclosure

TheverynatureofAIagentsintroducesnewrisksstemmingfromseveralinherentcharacteristics.TheunderlyingAImodelscanbeunpredictable,astheirnon-deterministicnaturemeanstheirbehaviorisn’talwaysrepeatableevenwiththesameinput.Complex,emergentbehaviorscanarisethatweren’texplic-itlyprogrammed.Higherlevelsofautonomyindecision-makingincreasethepotentialscopeandimpactoferrorsaswellaspotentialvulnerabilitiestomaliciousactors.Ensuringalignment—thatagentactionsrea-sonablymatchuserintent,especiallywheninterpretingambiguousinstructionsorprocessinguntrustedinputs—remainsasignificanthurdle.Finally,therearechallengesinmanagingagentidentityandprivilegeseffectively.

ThesefactorscreatetheneedforAgentSecurity,aspecializedfieldfocusedonmitigatingthenovelrisksthesesystemspresent.Theprimaryconcernsdemandingstrategicfocusarerogueactions(unintended,harmful,orpolicy-violatingactions)andsensitivedatadisclosure(unauthorizedrevelationofprivateinfor-mation).Afundamentaltensionexists:increasedagentautonomyandpower,whichdriveutility,correlatedirectlywithincreasedrisk.

Traditionalsecurityparadigmsaloneareinsufficient

SecuringAIagentsinvolvesachallengingtrade-off:enhancinganagent’sutilitythroughgreaterautonomyandcapabilityinherentlyincreasesthecomplexityofensuringitssafetyandsecurity.Traditionalsystemssecurityapproaches(suchasrestrictionsonagentactionsimplementedthroughclassicalsoftware)lackthecontextualawarenessneededforversatileagentsandcanoverlyrestrictutility.Conversely,purelyreason-ing-basedsecurity(relyingsolelyontheAImodel’sjudgment)isinsufficientbecausecurrentLLMsremainsusceptibletomanipulationslikepromptinjectionandcannotyetoffersufficientlyrobustguarantees.Neitherapproachissufficientinisolationtomanagethisdelicatebalancebetweenutilityandrisk.

1

https://google.github.io/adk-docs/

Google

01–Introduction:ThepromiseandrisksofAIagents

Ourpathforward:Ahybridapproach

Buildingonwell-establishedprinciplesofsecuresoftwareandsystemsdesign,andinalignmentwithGoogle’sSecureAIFramework(SAIF),2Googleisadvocatingforandimplementingahybridapproach,combiningthestrengthsofbothtraditional,deterministiccontrolsanddynamic,reasoning-baseddefenses.Thiscreatesalayeredsecurityposture—a“defense-in-depthapproach”3—thataimstoconstrainpotentialharmwhilepreservingmaximumutility.Thisstrategyisbuiltuponthreecoresecurityprinciplesdetailedlaterinthisdocument.

ThispaperfirstexplainsthetypicalworkflowofanAIagentanditsinherentsecuritytouchpoints.Itthenaddresseskeyrisksagentspose,introducescoresecurityprinciples,anddetailsGoogle’shybriddefense-in-depthstrategy.Throughout,guidingquestionsaresuggestedtohelpframeyourthinking.Aforthcoming,comprehensivewhitepaperwilldelvedeeperintothesetopics,offeringmoreextensivetechnicaldetailsandmitigations.

2

www.saif.google

3

https://google.github.io/building-secure-and-reliable-systems/raw/ch08.html#defense_in_depth

Google’sApproachforSecureAIAgents:AnIntroduction5

02–SecuritychallengesofhowAIagentswork

Google

SecuritychallengesofhowAIagentswork

Tounderstandtheuniquesecurityrisksofagents,it’shelpfultostartwithamentalmodelthatdescribesacommonagentarchitecture.Whiledetailsvary,thereareseveralbroadlyapplicableconcepts.Wewillbrieflydiscusseachandidentifytherisksthatapplytoeachcomponent.

OrchestrationAgentApplication

UserInteraction

Application

System

instructions

Userquerydetails

Rendering

Outputtransformation

Perception

Inputtransformation

Model(s)

ReasoningandplanningLLM

Reasoningcore

Model(s)

Dataprocessing

Agentmemory

Content(RAG)

Tools

Figure1:Asimplifiedconceptualagentarchitectureforvisualizingrelevantsecurityconsiderations

02–SecuritychallengesofhowAIagentswork

Google’sApproachforSecureAIAgents:AnIntroduction7

Input,perceptionandpersonalization:Agentsbeginbyreceivinginput.Thisinputcanbeadirectuserinstruction(typedcommand,voicequery)orcontextualdatagatheredfromtheenvironment(sensorread-ings,applicationstate,recentdocuments).Theinput,whichcanbemultimodal(text,image,audio),isprocessedandperceivedbytheagentandoftentransformedintoaformattheAImodelcanunderstand.

Securityimplication:Acriticalchallengehereisreliablydistinguishingtrustedusercommandsfrompotentiallyuntrustedcontextualdataandinputsfromothersources(forexample,contentwithinanemailorwebpage).Failuretodosoopensthedoortopromptinjectionattacks,wheremaliciousinstruc-tionshiddenindatacanhijacktheagent.Secureagentsmustcarefullyparseandseparatetheseinputstreams.Personalizationfeatures,whereagentslearnuserpreferences,alsoneedcontrolstopreventmanipulationordatacontaminationacrossusers.

Questionstoconsider

Whattypesofinputsdoestheagentprocess,andcanitclearlydistinguishtrusteduserinputsfrompotentiallyuntrustedcontextualinputs?

Doestheagentactimmediatelyinresponsetoinputsordoesitperformactionsasynchronouslywhentheusermaynotbepresenttoprovideoversight?

Istheuserabletoinspect,approve,andrevokepermissionsforagentactions,memory,andpersonalizationfeatures?

Ifanagenthasmultipleusers,howdoesitensureitknowswhichuserisgivinginstructions,applytherightpermissionsforthatuser,andkeepeachuser’smemoryisolated?

Systeminstructions:Theagent’scoremodeloperatesonacombinedinputintheformofastructuredprompt.Thispromptintegratespredefinedsysteminstructions(whichdefinetheagent’spurpose,capabil-ities,andboundaries)withthespecificuserqueryandvariousdatasourceslikeagentmemoryorexternallyretrievedinformation.

Securityimplication:Acrucialsecuritymeasureinvolvesclearlydelimitingandseparatingthesedif-ferentelementswithintheprompt.Maintaininganunambiguousdistinctionbetweentrustedsysteminstructionsandpotentiallyuntrusteduserdataorexternalcontentisimportantformitigatingpromptinjectionattacks.

Reasoningandplanning:Theprocessedinput,combinedwithsysteminstructionsdefiningtheagent’spur-poseandcapabilities,isfedintothecoreAImodel.Thismodelreasonsabouttheuser’sgoalanddevelopsaplan—oftenasequenceofstepsinvolvinginformationretrievalandtoolusage—toachieveit.Thisplanningcanbeiterative,refiningtheplanbasedonnewinformationortoolfeedback.

Securityimplication:BecauseLLMplanningisprobabilistic,it’sinherentlyunpredictableandpronetoerrorsfrommisinterpretation.Furthermore,currentLLMarchitecturesdonotproviderigoroussepara-tionbetweenconstituentpartsofaprompt(inparticular,systemanduserinstructionsversusexternal,untrustworthyinputs),makingthemsusceptibletomanipulationlikepromptinjection.Thecommonpracticeofiterativeplanning(ina“reasoningloop”)exacerbatesthisrisk:eachcycleintroducesoppor-tunitiesforflawedlogic,divergencefromintent,orhijackingbymaliciousdata,potentiallycompoundingissues.Consequently,agentswithhighautonomyundertakingcomplex,multi-stepiterativeplanningpresentasignificantlyhigherrisk,demandingrobustsecuritycontrols.

02–SecuritychallengesofhowAIagentswork

Google

Questionstoconsider

Howdoestheagenthandleambiguousinstructionsorconflictinggoals,andcanitrequestuserclarification?

Whatlevelofautonomydoestheagenthaveinplanningandselectingwhichplantoexecute,andarethereconstraintsonplancomplexityorlength?

Doestheagentrequireuserconfirmationbeforeexecutinghigh-riskorirreversibleactions?

Orchestrationandactionexecution(tooluse):Toexecuteitsplan,theagentinteractswithexternal

systemsorresourcesvia“tools”or“actions.”ThesecouldbethroughAPIsforsendingemails,querying

databases,accessingfilesystems,controllingsmartdevices,oreveninteractingwithwebbrowserelements.

Theagentselectstheappropriatetoolandprovidesthenecessaryparametersbasedonitsplan.

Securityimplication:Thisstageiswhererogueplanstranslateintoreal-worldimpact.Eachtoolgrantstheagentspecificpowers.Uncontrolledaccesstopowerfulactions(suchasdeletingfiles,makingpur-chases,transferringdata,andevenadjustingsettingsonmedicaldevices)ishighlyriskyiftheplanningphaseiscompromised.Secureorchestrationrequiresrobustauthenticationandauthorizationfortooluse,ensuringtheagenthasappropriatelyconstrainedpermissions(reducedprivilege)forthetaskathand.Dynamicallyincorporatingnewtools,especiallythird-partyones,introducesrisksrelatedtodeceptivetooldescriptionsorinsecureimplementations.

Questionstoconsider

Isthesetofavailableagentactionsclearlydefined,andcanuserseasilyinspectactions,under-standtheirimplications,andprovideconsent?

Howareactionswithpotentiallysevereconsequencesidentifiedandsubjectedtospecificcontrolsorconfinement?

Whatsafeguards(suchassandboxingpolicies,usercontrols,andsensitivedeploymentexclu-sions)preventagentactionsfromimproperlyexposinghigh-privilegeinformationorcapabilitiesinlow-privilegecontexts?

Agentmemory:Manyagentsmaintainsomeformofmemorytoretaincontextacrossinteractions,storelearneduserpreferences,orrememberfactsfromprevioustasks.

Securityimplication:Memorycanbecomeavectorforpersistentattacks.Ifmaliciousdatacontainingapromptinjectionisprocessedandstoredinmemory(forexample,asa“fact”summarizedfromamaliciousdocument),itcouldinfluencetheagent’sbehaviorinfuture,unrelatedinteractions.Memoryimplementationsmustensurestrictisolationbetweenusersandpotentiallybetweendifferentcon-textsforthesameusertopreventcontamination.Usersalsoneedtransparencyandcontroloveragentmemory.Understandingthesestageshighlightshowvulnerabilitiescanarisethroughouttheagent’soperationalcycle,necessitatingsecuritycontrolsateachcriticaljuncture.

02–SecuritychallengesofhowAIagentswork

Google’sApproachforSecureAIAgents:AnIntroduction9

Responserendering:Thisstagetakestheagent’sfinalgeneratedoutputandformatsitfordisplaywithintheuser’sapplicationinterfacesuchasawebbrowserormobileapp.

Securityimplication:Iftheapplicationrendersagentoutputwithoutpropersanitizationorescapingbasedoncontenttype,vulnerabilitieslikeCross-SiteScripting(XSS)ordataexfiltration(frommaliciouslycraftedURLsinimagetags,forexample)canoccur.Robustsanitizationbytherenderingcomponentiscrucial.

Questionstoconsider

Howisagentmemoryisolatedbetweendifferentusersandcontextstopreventdataleakageorcross-contamination?

Whatstopsstoredmaliciousinputs(likepromptinjections)fromcausingpersistentharm?

Whatsanitizationandescapingprocessesareappliedwhenrenderingagent-generatedoutputtopreventexecutionvulnerabilities(suchasXSS)?

Howisrenderedagentoutput,especiallygeneratedURLsorembeddedcontent,validatedtopreventsensitivedatadisclosure?

03–KeyrisksassociatedwithAIagents

Google

KeyrisksassociatedwithAIagents

Wethinktheinherentdesignofagents,combinedwiththeirpowerfulcapabilities,canexposeuserstotwomajorrisks,whatwecallrogueactionsandsensitivedatadisclosure.Thefollowingsectionexaminesthesetworisksandmethodsattackersusetorealizethem.

OrchestrationAgentApplication

Application

Perception

Inputtransformation

Rendering

Outputtransformation

System

instructions

Userquery

details

Reasoningcore

UserInteraction

2

1

Model(s)

ReasoningandplanningLLM

Model(s)

Dataprocessing

Content(RAG)

Agentmemory

1

2

Tools

Figure2:RisksassociatedwithAIagentsacrosstheagentarchitecture:Rogueactions(1)andSensitivedatadisclosure(2)

03–KeyrisksassociatedwithAIagents

Google’sApproachforSecureAIAgents:AnIntroduction,,

Risk1:Rogueactions

Rogueactions—unintended,harmful,orpolicy-violatingagentbehav-iors—representaprimarysecurityriskforAIagents.

Akeycauseispromptinjection:maliciousinstructionshiddenwithinprocesseddata(likefiles,emails,orwebsites)cantricktheagent’scoreAImodel,hijackingitsplanningorreasoningphases.Themodelmisinterpretsthisembeddeddataasinstructions,causingittoexecuteattackercommandsusingtheuser’sauthority.Forexample,anagentprocessingamaliciousemailmightbemanipulatedintoleakinguserdatainsteadofperformingtherequestedtask.

Rogueactionscanalsooccurwithoutmaliciousinput,stemminginsteadfromfundamentalmisalignmentormisinterpretation.Theagentmightmisunderstandambiguousinstructionsorcontext.Forinstance,anambiguousrequestlike“emailMikeabouttheprojectupdate”couldleadtheagenttoselectthewrongcon-tact,inadvertentlysharingsensitiveinformation.Suchcasesinvolveharmfuldivergencefromuserintentduetotheagent’sinterpretation,notexternalcompromise.

Additionally,unexpectednegativeoutcomescanariseiftheagentmisinterpretscomplexinteractionswithexternaltoolsorenvironments.Forexample,itmightmisinterpretthefunctionofbuttonsorformsonacomplexwebsite,leadingtoaccidentalpurchasesorunintendeddatasubmissionswhentryingtoexecuteaplannedaction.

Thepotentialimpactofanyrogueactionscalesdirectlywiththeagent’sauthorizedcapabilitiesandtoolaccess.Thepotentialforfinancialloss,databreaches,systemdisruption,reputationaldamage,andevenphysicalsafetyrisksescalatesdramaticallywiththesensitivityandreal-worldimpactoftheactionstheagentispermittedtotake.

Risk2:Sensitivedatadisclosure

Thiscriticalriskinvolvesanagentimproperlyrevealingprivateorcon-

fidentialinformation.Aprimarymethodforachievingsensitivedata

disclosureisdataexfiltration.Thisinvolvestrickingtheagentintomak-

ingsensitiveinformationvisibletoanattacker.Attackersoftenachieve

thisbyexploitingagentactionsandtheirsideeffects,typically

drivenbypromptinjection.Attackerscanmethodicallyguideanagent

throughasequenceofactions.Theymighttricktheagentintoretrieving

sensitivedataandthenleakingitthroughactions,suchasembedding

datainaURLtheagentispromptedtovisit,orhidingsecretsincode

commitmessages.

Alternatively,datacanbeleakedbymanipulatingtheagent,soutputgeneration.Anattackermighttricktheagentintoincludingsensitivedatadirectlyinitsresponse(liketextorMarkdown).Ifthisoutputisren-deredinsecurelybytheapplication(becauseitlacksappropriatevalidationorsanitizationfordisplayinabrowser,forexample),thedatacanbeexposed.ThiscanhappenthroughcraftedimageURLshiddeninMarkdownthatleakdatawhenfetched,forinstance.ThisvectorcanalsoleadtoCross-SiteScripting(XSS).

Theimpactofdatadisclosureissevere,potentiallyleadingtoprivacybreaches,intellectualpropertyloss,complianceviolations,orevenaccounttakeover,andthedamageisoftenirreversible.

Mitigatingthesediverseandpotentrisksrequiresadeliberate,multi-facetedsecuritystrategygroundedinclear,actionableprinciples.

04–Coreprinciplesforagentsecurity

Google

Coreprinciplesforagentsecurity

Tomitigatetherisksofagentswhilebenefitingfromtheirimmensepotential,weproposethatagenticproductdevelopersshouldadoptthreecoreprinciplesforagentsecurity.Foreachprinciple,werecommendcontrolsortechniquesforyoutoconsider.

OrchestrationAgentApplication

UserInteraction

1Application

Perception

Inputtransformation

2

Rendering

Outputtransformation

Userquerydetails

System

instructions

Reasoningcore

Model(s)

ReasoningandplanningLLM

2

Content(RAG)

Agentmemory

Model(s)

Dataprocessing

Tools

,.

>3

>3

Figure3:ControlsrelevanttoAIagents:Agentusercontrols(1),Agentpermissions(2),andAgentobservability(3)

04–Coreprinciplesforagentsecurity

Google’sApproachforSecureAIAgents:AnIntroduction13

Principle1:Agentsmusthavewell-definedhumancontrollers

Agentstypicallyactasproxiesorassistantsforhumans,inheritingprivilegestoaccessresourcesandperformactions.Therefore,itisessentialforsecurityandaccountabilitythatagentsoperateunderclearhumanoversight.Everyagentmusthaveawell-definedsetofcontrollinghumanuser(s).Thisprinciplemandatesthatsystemsmustbeabletoreliablydistinguishinstructionsoriginatingfromanautho-rizedcontrollinguserversusanyotherinput,especiallypotentiallyuntrusteddataprocessedbytheagent.Foractionsdeemedcriticalorirreversible—suchasdeletinglargeamountsofdata,authorizingsignif-icantfinancialtransactions,orchangingsecuritysettings—thesystemshouldrequireexplicithumanconfirmationbeforeproceeding,ensur-ingtheuserremainsintheloop.

Furthermore,scenariosinvolvingmultipleusersoragentsrequirecarefulconsideration.Agentsactingonbehalfofteamsorgroupsneeddistinctidentitiesandclearauthorizationmodelstopreventunauthorizedcross-userdataaccessoroneuserinadvertentlytriggeringactionsimpactinganother.Usersshouldbegiventhetoolstograntmoregranularpermissionswhentheagentisshared,comparedtothecoarse-grainedpermissionsthatmightbeappropriateforasingle-useragent.Similarly,ifagentconfigurationsorcustompromptscanbeshared,theprocessmustbetransparent,ensuringusersunderstandexactlyhowasharedconfigurationmightaltertheagent’sbehaviorandpotentialactions.

Controls:ThisprinciplereliesoneffectiveAgentUserControls,supportedbyinfrastructurethatprovidesdistinctagentidentitiesandsecureinputchannelstodifferentiateusercommands.

Principle2:Agentpowersmusthavelimitations

Anagent’spowers—theactionsitcantakeandtheresourcesitcanaccess—mustbecarefullylimitedinalignmentwithitsintendedpur-poseanditscontrollinguser’srisktolerance.Forexample,anagentdesignedforresearchshouldnotpossessthepowertomodifyfinancialaccounts.General-purposeagentsneedmechanismstodynamicallyconfinetheircapabilitiesatruntime,ensuringonlyrelevantpermissionsareactiveforanygivenquery(forexample,disallowingfiledeletionactionswhenthetaskiscreativewriting).

nl,

Thisprincipleextendstraditionalleastprivilegebyrequiringanagent’spermissionstobedynamicallyalignedwithitsspecificpurposeandcurrentuserintent,ratherthanjustbeingstaticallyminimized.Thisdistinctioniscrucialbecausemodel-basedagentsoperateinpotentiallyunboundedecosystemsandaresusceptibletoreasoningerrors,necessitatingcontext-awarelimitationsthatgobeyondthetypicalscopeconsideredforhumanusersortraditionalsoftware.

Thisprinciplenecessitatesthatmaximumpermissionlevelsmustbedefinableandenforceable.Critically,agentsmustbepreventedfromescalatingtheirownprivilegesbeyondexplicitlypre-authorizedscopes.Usersmustalsoretaintheabilitytoinspectandrevokeanyauthoritydelegatedtoanagent.ImplementingthisrequiresadaptingandstrengtheningfundamentalAuthentication,Authorization,andAuditing(AAA)infrastructureforagents,includingverifiableagentidentities,granularpermissionsystems,andsecureman-agementofcredentialslikescopedOAuthtokens.

04–Coreprinciplesforagentsecurity

Google

Controls:ImplementingthisprinciplerequiresdefinedAgentPermissionscontrols,enforcedbyrobustAuthentication,Authorization,andAuditing(AAA)infrastructureadaptedforagents,andutilizingscopedcredentialslikeOAuthtokenstomanageaccesssecurely.

Principle3:Agentactionsandplanningmustbeobservable

Trust,effectivedebugging,securityauditing,incidentresponse,anduserunderstandingallhingeontransparencyintotheagent’sactivities.Wecannotensureanagentisactingfaithfullyordiagnoseproblemsifitsoperationsareentirelyopaque.Therefore,agentactions,andwherefeasible,theirplanningprocesses,mustbeobservableandauditable.Thisrequiresimplementingrobustloggingacrosstheagent’sarchi-tecturetocapturecriticalinformationsuchasinputsreceived,toolsinvoked,parameterspassed,outputsgenerated,andideally,interme-diatereasoningsteps.Thisloggingmustbedonesecurely,protectingsensitivedatawithinthelogsthemselves.

Effectiveobservabilityalsomeansthatthepropertiesoftheactionsanagentcantake—suchaswhetheranactionisread-onlyversusstate-changing,orifithandlessensitivedata—mustbeclearlycharacterized.Thismetadataiscrucialforautomatedsecuritymechanismsandhumanreviewers.Finally,userinterfacesshouldbedesignedtopromotetransparency,providinguserswithinsightsintotheagent’s“thoughtprocess,”thedatasourcesitconsulted,ortheactionsitintendstotake,especiallyforcomplexorhigh-riskoperations.Thisrequiresinfrastructureinvestmentsinsecure,centralizedloggingsystemsandAPIsthatexposeactioncharacteristicsunderstandably.

Controls:EffectiveAgentObservabilitycontrolsarecrucial,necessitatinginfrastructureinvestmentsinsecure,centralizedloggingsystemsandstandardizedAPIsthatclearlycharacterizeactionpropertiesandpotentialsideeffects.

Thesethreeprinciplescollectivelyformastrategicframeworkformitigatingagentrisks.

Principle

1.Human

controllers

Summary

Ensuresaccountability,usercontrol,andpreventsagentsfromactingautonomouslyincriticalsituationswithoutclearhuman

oversightorattribution.

KeyControlFocus

Agentuser

controls

InfrastructureNeeds

Distinctagentidentities,

userconsentmechanisms,

secureinputs

2.Limitedpowers

Enforcesappropriate,dynamicallylimitedprivileges,ensuringagentshaveonlythe

capabilitiesandpermissionsnecessaryfortheirintendedpurposeandcannotescalateprivilegesinappropriately.

Agentpermissions

RobustAAAforagents,scopedcredentialmanagement,

sandboxing

3.Observableactions

Requirestransparencyandauditability

throughrobustloggingofinputs,reasoning,actions,andoutputs,enablingsecurity

decisionsanduserunderstanding.

Agent

observability

Secure/centralizedlogging,characterizedactionAPIs,transparentUX

Figure4:Asummaryof

agentsecurityprinciples,controls,andhigh-level

infrastructureneeds

05–Google’sapproach:Ahybriddefense-in-depth

Google’sApproachforSecureAIAgents:AnIntroduction15

Google’sapproach:Ahybriddefense-in-depth

GiventheinherentlimitationsofcurrentAImodelsandthepracticalimpossibilityofguaranteeingper-fectalignmentagainstallpotentialthreats,Googleemploysadefense-in-depthstrategycenteredaroundahybridapproach.Thisapproachstrategicallycombinestraditional,deterministicsecuritymeasureswithdynamic,reasoning-baseddefenses.Thegoalistocreaterobustboundariesaroundtheagentsoperationalenvironment,significantlymitigatingtheriskofharmfuloutcomes,particularlyrogueactionsstemmingfrompromptinjection,whilestrivingtopreservetheagentsutility.

Thisdefense-in-depthapproachreliesonenforcedboundariesaroundtheAIagentsoperationalenviron-menttopreventpotentialworst-casescenarios,actingasguardrailseveniftheagentsinternalreasoningprocessbecomescompromisedormisalignedbysophisticatedattacksorunexpectedinputs.Thismulti-layeredapproachrecognizesthatneitherpurelyrule-basedsystemsnorpurelyAI-basedjudgmentaresufficientontheirown.

Examplesofnew

vulnerabilities

Runtimepolicyenforcement

Dependableconstraintson

agentprivileges

Hardeningofthebasemodel,classifiers,andsafetyfine-tuning

Reasoning-baseddefenses

Regressiontesting

VariantAnalysis

Testingforregressions,variants,andnewvulnerabilities

RedTeams&HumanReviewers

AIAgent

Application

PerceptionRendering

Reasoningcore

Orchestration

Figure5:Googleshybri

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论