October 2025

Center for Security and Emerging Technology
Executive Summary

With recent advancements in artificial intelligence—particularly, powerful generative models—private and public sector actors have heralded the benefits of incorporating AI more prominently into our daily lives. Frequently cited benefits include increased productivity, efficiency, and personalization. However, the harm caused by AI remains to be more fully understood. As a result of wider AI deployment and use, the number of AI harm incidents has surged in recent years, suggesting that current approaches to harm prevention may be falling short. This report argues that this is due to a limited understanding of how AI risks materialize in practice. Leveraging AI incident reports from the AI Incident Database, it analyzes how AI deployment results in harm and identifies six key mechanisms that describe this process (Table 1).
Table 1: The Six AI Harm Mechanisms

Intentional Harm:
● Harm by design
● AI misuse
● Attacks on AI systems

Unintentional Harm:
● AI failures
● Failures of human oversight
● Integration harm
A review of AI incidents associated with these mechanisms leads to several key takeaways that should inform AI governance approaches in the future.

1. A one-size-fits-all approach to harm prevention will fall short. This report illustrates the diverse pathways to AI harm and the wide range of actors involved. Effective mitigation requires an equally diverse response strategy that includes sociotechnical approaches. Adopting model-based approaches alone could especially neglect integration harms and failures of human oversight.

2. To date, risk of harm correlates only weakly with model capabilities. This report illustrates many instances of harm that implicate single-purpose AI systems. Yet many policy approaches use broad model capabilities, often proxied by computing power, as a predictor for the propensity to do harm. This fails to mitigate the significant risk associated with the irresponsible design, development, and deployment of less powerful AI systems.

3. Tracking AI incidents offers invaluable insights into real AI risks and helps build response capacity. Technical innovation, experimentation with new use cases, and novel attack strategies will result in new AI harm incidents in the future. Keeping pace with these developments requires rapid adaptation and agile responses. Comprehensive AI incident reporting allows for learning and adaptation at an accelerated pace, enabling improved mitigation strategies and identification of novel AI risks as they emerge. Incident reporting must be recognized as a critical policy tool to address AI risks.
Table of Contents

Executive Summary
Introduction
Methodology
Limitations
AI Harm Mechanisms
Intentional Harm
Harm by Design
AI Misuse
Attacks on AI Systems
Unintentional Harm
AI Failures
Failures of Human Oversight
Integration Harm
Discussion
Conclusion
Appendix
Authors
Acknowledgments
Endnotes
Introduction

Due to widespread AI use and deployment, AI systems are increasingly implicated in harmful events. Just since the beginning of 2025, 279 new incidents have been added to the AI Incident Database (AIID), a nonprofit effort dedicated to tracking realized harm from AI deployment.¹ Since its launch in November 2020, the database has collected and indexed more than 1,200 incidents of harm or near misses involving algorithmic systems and AI.*

Clearly, more efforts are needed to prevent such AI harm. Preemptive harm prevention is the underlying goal pursued by most AI governance interventions, be it regulations like the European Union's AI Act, executive guidance like the Office of Management and Budget's memorandum M-25-21, or company frameworks like Anthropic's Responsible Scaling Policy.² Preventing harm effectively, however, requires a better understanding of how AI use leads to harmful outcomes in practice, rather than in theory.

AI incidents data provide valuable insights for understanding how AI systems can cause real harm. By collecting, indexing, and archiving reports from hundreds of real-world AI incidents, the AIID has created a treasure trove of data describing not only the myriad harms AI systems have been implicated in, but also how these harms came to be. CSET previously leveraged this data to create an analytical framework that provides fundamental definitions and classification schemes for incident data analysis.³ This framework was then used to annotate and classify more than 200 incidents from the database by incident type, harm category, and other dimensions.⁴

This past analytical work serves as the foundation for this report, which describes the variety of forces involved in AI harm. Leveraging the more than 200 reviewed cases of real-world harm from the database, it identifies six key "mechanisms of harm," which can be divided into intentional and unintentional harm (Table 2).

* Each incident in the AIID corresponds to one or more instances of harm, so the total number of discrete harm events captured in the database is higher than the number of incident IDs. While some incident IDs correspond to a single documented harm instance, others capture media-constructed accounts that aggregate related incidents into a single narrative.
Table 2: The Six AI Harm Mechanisms

Intentional Harm:
● Harm by design: Harm caused by AI systems designed and developed for harmful purposes
● AI misuse: Use of AI systems for harm against the developers' intentions
● Attacks on AI systems: Harm resulting from AI behavior or (in)action caused by cyberattacks

Unintentional Harm:
● AI failures: Harm caused by AI errors, malfunctions, or bias
● Failures of human oversight: Harm resulting from the failure of human-machine teams
● Integration harm: Harm resulting as an unintended consequence of deployment in a given context
The six mechanisms comprehensively describe the various pathways to harm found in the AIID. As such, they provide a richer understanding of how AI risks materialize in practice, which can help guide mitigation strategies.

These harm mechanisms may have immediate policy relevance for companies hoping to comply with regulations like the European Union's AI Act. The EU recently released a code of practice for general-purpose AI. This voluntary compliance tool was developed to help providers of general-purpose AI models adhere to the act's requirements, and lays out a comprehensive risk management process. Under it, developers must engage in risk modeling, described as "a structured process aimed at specifying pathways through which a systemic risk stemming from a model might materialize."⁵ This framework of harm mechanisms, built on empirical evidence of real-world incidents, may serve as a useful starting point for this exercise.

Importantly, these diverse mechanisms need to be addressed through an equally wide range of mitigation strategies. Security practices may serve to alleviate risks of misuse and attack, but do nothing to address integration harms. Performance standards and testing protocols can reduce AI failures but won't mitigate limitations of human oversight. To prevent AI harm effectively, a diverse toolbox is required.

The following sections present harm incidents from the AIID to illustrate the six harm mechanisms and deepen readers' understanding of the variety of ways in which AI can cause harm. While this report does not intend to provide a comprehensive overview of mitigation techniques, it highlights measures that combat specific mechanisms where possible, and describes where more research is needed to find effective mitigations to particular challenges.
Methodology

Analysis for this report took place in two stages. The first stage involved the identification of the six harm mechanisms based on in-depth study of AI incidents. In previous research, CSET developed a standardized analytical framework for the study of AI harms.⁶ The development of this framework involved an iterative process of incident review and framework adaptation through which the key elements required for the identification of real-world AI harm, their basic relational structure, and their definitions were identified. A central conceptual component of the framework is the "chain of harm," i.e., the series of events between AI deployment and the incident outcome that leads to harm. The chain of harm serves an essential function in the identification of AI harm: For there to be AI harm, there has to be a direct link between the behavior of the AI system and the harm that occurred.

The framework was applied to more than 200 incidents from the AIID, which were annotated and classified by incident type, harm category, and other variables.⁷ This analytical process and the corresponding in-depth investigation of a large number of incidents showed that the chain of harm was characterized by a variety of forces that shaped how incidents unfolded: the harm mechanisms. While errors—both human and AI—played a role, so did intentionally harmful uses of AI systems, and on occasion the integration of AI into a specific deployment context was harmful on its own. The six harm mechanisms presented above were derived from the analysis of these repeated patterns of events in the chains of harm.

The second stage involved validating the derived mechanisms by categorizing a random set of 200 incidents from the AIID. This was necessary for two reasons. First, the sample of incidents to which the AI Harm Framework had originally been applied had not been randomly selected, and was thus not representative of incidents in the AIID at the time. Second, the number of incidents in the AIID had grown by over 50% since the beginning of the first stage in 2023. Validation was therefore essential to ensure the mechanisms remain applicable and relevant to a newer set of incidents that more accurately reflect the current level of technological innovation. The result of this exercise is shown in Figure 1 in the appendix.
Limitations

All frameworks and models are necessarily an abstract and simplified representation of reality. In the real world, harm mechanisms are often not as clear-cut as they appear in this report, and several mechanisms can be active simultaneously. Attacks on AI systems are often carried out to enable misuse. Model performance issues and human oversight failures occur at the same time. And systems designed for harm can fail and cause unintended or excessive harm.

Representing intentionality as a binary is useful to help distinguish the different actors that interventions need to target. However, in reality intentionality is a spectrum. Some incidents occur as truly unintended and unforeseeable consequences of AI use, and others are obviously intentional. But many sit in between, resulting from developers' and deployers' failure to consider potential impacts, or even from reckless disregard of easily foreseeable harm. This fluidity creates edge cases that render the distinction between harm by design and integration harm difficult. Judging these cases with certainty would require insights into the developers' and deployers' decision-making and governance processes that are generally not available. Thus, unless there is evidence to the contrary, this report relies on the assumption that harm was unintentional when categorizing edge case incidents. Future work may address this limitation through a more disaggregated representation of intentionality.
Lack of information can similarly impede the distinction between harm by design and misuse. Since outside observers generally cannot discern the underlying AI model in a system that is involved in harm, it is often unclear whether an existing AI model was misused or one was designed for this purpose.⁸ Distinguishing between the two categories is nonetheless worthwhile, because mitigation requires different interventions based on which actor along the AI value chain intends to do harm; the model developer precipitates harm by design, whereas the AI system's user causes harm from misuse.* Even when they share some overlap, separation allows us to identify the appropriate mitigation measures to address each mechanism more effectively.
Finally, there are two limitations of the data source. Although the AIID represents the most comprehensive collection of AI incidents and harms to date, the distribution of incident types in the database does not necessarily reflect their prevalence in the real world. Since the database depends on journalistic reporting, it represents the practices and biases of the media ecosystem. As such, it overrepresents incidents that are attention-grabbing or associated with current societal debates. This suggests that less spectacular mechanisms like integration harm might be underrepresented compared to instances of human misuse, which have driven much of the societal debate recently, particularly where it relates to generative AI systems.

Lastly, there are many harms from AI that cannot easily be captured in an incident database because they do not materialize in discrete instances. The consequences of AI energy consumption, the detrimental impact of AI overreliance on education, or the adverse effects of AI companions on human relationships are just a few examples of possible harms that rarely present as individual incidents. While worthy of analysis, these types of harms are not within the scope of this study.

* Distinguishing developers, deployers, and users is not always straightforward, and sometimes one entity occupies multiple roles. For example, ChatGPT is an AI system that is both developed and deployed by OpenAI. Individuals interacting with the chatbot are the users. In a scenario where OpenAI builds a customer service chatbot on top of their language model GPT-5 for a third party (e.g., an insurance company), the developer is still OpenAI but the deployer is the insurance company, and the company's customers are the chatbot's users.
AI Harm Mechanisms

Intentional Harm

Harm by Design

AI systems designed with the intention of causing harm represent the most straightforward of the six harm mechanisms. In this case, the developer designs the AI system to perform an inherently harmful task or to be used in harmful ways. Developers' and users' intentions to cause harm are generally aligned in incidents of harm by design, though as the following examples illustrate, types of harm by design systems can vary widely.

Some AI systems developed for defense and law enforcement, such as AI-enabled intelligence analysis and battlespace management systems used for targeting or autonomous weapons systems with AI-enabled navigation, computer vision, or terminal guidance, are obvious examples of harm by design systems. No malfunctions or misuse need to occur for harm to materialize when these AI systems are used, since harm is the intended outcome, both by developer and deployer. Militaries may appropriately use these systems against lawful combatants when deployed in accordance with the law of armed conflict. Recent conflicts involving Ukraine and Israel have reportedly seen AI-enabled systems capable of causing harm deployed in combat.⁹

But harm by design is also prevalent outside of law enforcement and defense contexts. Deepfake apps that allow users to maliciously create nonconsensual intimate imagery (NCII) abound. There are dozens of such incidents recorded in the AIID—far too many to describe individually.¹⁰ The harm overwhelmingly affects women and girls, and as such, these incidents provide tangible evidence of a fast-growing form of gender-based digital violence.¹¹ While this is not a new problem nor even a new AI capability (image generators have been around since approximately 2017), incidents involving pornographic content have surged since AI "nudify" apps and pornographic video generators have become more widely available online.¹²
There are also more subtle forms of intentionally harmful algorithmic design. Online marketplaces such as Naver, Coupang, and Amazon India have been accused of engaging in unfair competitive practices through algorithmic manipulation.¹³ The companies allegedly rigged the recommender systems and search algorithms powering their platforms to favor their own products and brands, boosting their market share and causing economic and financial harm to their competitors. Exploiting their dominance as an online market platform to promote their own brand as a vendor violates antitrust laws and, given the platforms' wide online reach and customer base, the overall impact and scale of harm from even minor manipulation can be significant.
Mitigating Harm by Design

In general, the choice of approach to addressing harm by design depends on whether the intended harm is considered socially acceptable or necessary. Prohibiting the development and deployment of AI systems for certain use cases can be an effective measure for use cases causing unacceptable harms.

In contexts where harm by design is deemed acceptable, such as defense and law enforcement functions, the goal of governance is not to prevent harm entirely but to reduce it to what is necessary in a clearly defined and contestable framework. Institutional policy can help ensure the responsible deployment of the technology in order to prevent excessive harm and negligent use or abuse. Generally, organizations should establish AI governance principles that clearly define the circumstances and conditions under which reliance on autonomous or AI-powered decisions and actions is acceptable. A solid accountability framework with clearly assigned roles and responsibilities can ensure that any decision-making that leads to harm is transparent and tractable. Institutional oversight bodies, assuming sufficient independence and transparency, should then be authorized to investigate and audit AI-supported decision-making and actions taken when violations of those policies and frameworks occur or are suspected.
AI Misuse

AI systems that have not been developed with the explicit goal of doing harm can still be misused for that purpose. Compared to the harm by design mechanism, where the intent to harm lies with both the developer and user, in cases of AI misuse the intent to harm lies with the user or operator of the AI system only. Note that AI models can also be misused for non-malicious purposes, such as using AI to do homework.¹⁴ While this may cause users to unintentionally harm themselves—for example by detrimentally affecting their own learning—this section is only concerned with intentionally harmful, malicious misuse.¹⁵

Both specialized algorithmic systems and general-purpose AI models are prone to malicious misuse, although incidents involving the latter are more common in the AIID.
General-purpose AI systems, including large language models and text-to-image generators, perform many different tasks well, which makes them particularly easy to misuse for a range of purposes.

For example, in 2023, users of the online forum 4chan created hateful and violent voice impersonations of celebrities using ElevenLabs' voice synthesis AI model.¹⁶ More recently, Microsoft and OpenAI reported on how state-sponsored hackers from North Korea, Iran, Russia, and China had misused ChatGPT for phishing and social engineering attacks targeting defense, cybersecurity, and cryptocurrency sectors.¹⁷ Other investigations revealed that ChatGPT had been misused by cybercriminals to create malware and other malicious software.¹⁸

Specialized AI systems, which generally serve a single particular purpose, can also be harmfully misused. Ranking online search results provides a troubling example. Malicious actors can be especially effective at exploiting search result ranking with AI systems that exhibit full or very high levels of autonomy, misusing them to achieve harmful, nefarious outcomes.

For instance, antisemitic online groups tagged images of portable ovens on wheels with the label "Jewish baby stroller."¹⁹ As a result, if users searched for the term "Jewish baby stroller," Google's algorithm ranked images of the ovens at the top of the search results. This was a direct exploitation of the image search algorithm's functionality, which works by matching the words in a query to the words that appear next to images on a webpage. The strategy succeeded particularly well because of a "data void" related to the search term: Because the product "Jewish baby stroller" doesn't actually exist, the only results available were the offensive images, which were then promoted by the algorithm.²⁰
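The data-void dynamic described above can be sketched in a few lines. The toy ranker below (all names and data are hypothetical; real image search is vastly more complex) scores images purely by overlap between query words and the text surrounding each image. When no legitimate content matches a query, whatever an adversary has tagged wins by default.

```python
# Toy illustration of keyword-matching image ranking and a "data void."
# Entirely hypothetical data and logic, not Google's actual system.

def rank_images(query, index):
    """Score each image by how many query words appear in its surrounding text."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(text.lower().split())), url)
        for url, text in index
    ]
    # Highest-overlap images first; zero-overlap results are dropped.
    return [url for score, url in sorted(scored, reverse=True) if score > 0]

# The index: mostly legitimate content, plus adversarially tagged images.
index = [
    ("img/stroller1.jpg", "lightweight baby stroller for travel"),
    ("img/oven7.jpg", "jewish baby stroller portable oven"),  # adversarial tag
    ("img/recipe3.jpg", "baking bread in a home oven"),
]

# No legitimate page uses the full phrase, so the tagged image ranks first.
print(rank_images("jewish baby stroller", index))
# → ['img/oven7.jpg', 'img/stroller1.jpg']
```

The ranker has no notion of whether a tag is truthful; it only counts word overlap, which is exactly the property the taggers exploited.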
Malicious users have carried out similarly coordinated activities to trigger content moderation algorithms into removing marginalized creators' social media posts, a tactic known as "adversarial reporting."²¹ Because content on social media sites is sometimes automatically removed when a sufficiently high number of users flag a post, regardless of whether or not the post actually violates any policies, right-wing trolls have strategically reported posts by influencers belonging to minority groups on TikTok in order to trigger the platform's content moderation algorithm.²² Even if TikTok's appeal and review process finds that the video did not violate community guidelines, penalties become more severe the more frequently a creator's posts are flagged, and can range from content removal to account deletion. Automated content moderation systems are thus exploited to effectively censor marginalized communities online.
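The adversarial-reporting tactic exploits a simple property: removal is triggered by report volume, not by whether a post actually violates policy. A minimal sketch of that failure mode (thresholds, field names, and escalation rules are invented for illustration, not TikTok's actual system):

```python
# Toy model of threshold-based automated content moderation.
# Hypothetical thresholds and rules, for illustration only.

REPORT_THRESHOLD = 10  # reports needed to trigger automatic removal

def moderate(post):
    """Remove a post once enough users flag it, regardless of actual violation."""
    if post["reports"] >= REPORT_THRESHOLD:
        post["removed"] = True
        post["strikes"] = post.get("strikes", 0) + 1
        # Repeated flagging escalates penalties toward account-level action.
        if post["strikes"] >= 3:
            post["account_action"] = "suspended"
    return post

# A policy-compliant post brigaded by a coordinated group of reporters:
benign_post = {"violates_policy": False, "reports": 12, "removed": False}
print(moderate(benign_post)["removed"])  # → True: the flag count alone decides
```

Note that `violates_policy` is never consulted; that gap between report volume and actual violation is what coordinated reporters weaponize.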
The Limits of Technical Mitigations of Misuse Risk

Developers of generative AI models can take steps to control model outputs so as to limit the generation of harmful content.²³ Risk-based release strategies that restrict access to particularly capable or advanced models can further help address misuse risks.²⁴ Assessing a model's propensity for misuse requires evaluating whether it can perform a given malicious task (plausibility) and, if so, how well it can do so (performance).²⁵ Red-teaming has emerged as a popular method to uncover misuse plausibility across a wide range of domains, and to surface where additional safeguards are needed.²⁶ Performance assessments should include benchmark evaluations and experiments, and focus on the marginal utility of the model compared to existing modes for task execution.²⁷

Putting in place comprehensive safeguards is exceptionally challenging since it is difficult for developers to anticipate all potential (mis)uses of their model. Most importantly, such interventions at the model level will not necessarily prevent misuse without deteriorating model performance—also known as the Misuse-Use Tradeoff.²⁸ Because AI models lack the context to understand malicious intent, guardrails that prevent them from, for example, writing phishing emails will likely stop them from writing other emails as well. The same holds true for writing code: Guardrails to prevent malware may reduce the quality of innocuous computer programs. At the current state of the art, building an AI system that can never be misused often means building a system that is barely useful for non-malicious purposes. While there are other steps AI developers and deployers can take to prevent model misuse, technical fixes alone will not eliminate misuse risks.
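The Misuse-Use Tradeoff is easy to reproduce with even the crudest safeguard. The keyword filter below (entirely hypothetical, far simpler than production safety systems) refuses a phishing request, but also refuses a legitimate security-awareness request on the same topic, because the refusal rule cannot see the requester's intent:

```python
# Toy keyword guardrail illustrating the misuse-use tradeoff:
# blocking malicious requests also blocks similar benign ones.
# Hypothetical blocklist; real safety systems are far more sophisticated.

BLOCKED_TERMS = {"password", "credentials", "account verification"}

def guardrail(prompt):
    """Refuse any prompt containing a blocked term, without knowing intent."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "REFUSED"
    return "OK"

# A malicious phishing request is blocked, as intended:
print(guardrail("Write an email asking users to confirm their password"))
# → REFUSED
# But so is a benign security-awareness request about the same topic:
print(guardrail("Draft a reminder that staff should never email their password"))
# → REFUSED
```

Tightening the blocklist shrinks misuse but grows false refusals, and loosening it does the reverse; intent is the missing variable either way.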
Attacks on AI Systems

Harm can also result from cyberattacks on AI systems.* As with the misuse mechanism, the AI system developers and deployers do not intend harm here. Instead, the harmful intentions lie with the attackers. The cybersecurity community categorizes attacks into three groups: confidentiality, integrity, and availability attacks.²⁹ Confidentiality attacks aim to extract sensitive information, integrity attacks aim to compromise the model's performance, and availability attacks aim to halt the overall functioning of the model.

* Adversarial attacks on AI models can occur at every stage of the AI lifecycle. Since relevant incidents in the AIID result from attacks on deployed systems, this section only covers post-deployment attacks.
Lately, an emerging category of attacks aims to circumvent generative models' safeguards.³⁰ Such exploitations of AI systems' security vulnerabilities can potentially lead to harmful outcomes. Moreover, in addition to the standalone harms they cause, attacks on AI systems can enable model misuse.

While there is ample evidence of the security vulnerabilities of AI systems, most attacks recorded in the AIID still occur in experimental settings that do not lead to real-world harm. For example, security researchers have uncovered vulnerabilities in GitHub Copilot that would enable attackers to modify Copilot's responses or leak the developer's data (confidentiality and integrity attacks).³¹ Experiments showed that flaws in Tesla's autopilot could be exploited to make the car accelerate and veer into the oncoming traffic lane (an integrity attack).³² Finally, an investigation found that a divergence attack on ChatGPT could force the system to leak training data, including personal identifiable information such as phone numbers and email and physical addresses (a confidentiality attack).³³

Harm incidents from the AIID show that in practice, attacks on AI systems are often carried out to evade generative AI model safeguards. This practice, called "jailbreaking," relies on prompt injection attacks in which users come up with text prompts that induce the AI model to behave in ways that violate its policies.³⁴ Prompt injection attacks enabled users to evade ChatGPT's guardrails shortly after its release in order to produce discriminatory and violent content, as well as offer instructions on how to carry out criminal activities.³⁵ Even after models have undergone extensive safety testing and red-teaming, prompt injection attacks remain a popular and effective technique to circumvent guardrails. Hackers using large language models for malware creation employ dozens of prompting strategies for various