October 2025

Center for Security and Emerging Technology
Executive Summary

With recent advancements in artificial intelligence—particularly, powerful generative models—private and public sector actors have heralded the benefits of incorporating AI more prominently into our daily lives. Frequently cited benefits include increased productivity, efficiency, and personalization. However, the harm caused by AI remains to be more fully understood. As a result of wider AI deployment and use, the number of AI harm incidents has surged in recent years, suggesting that current approaches to harm prevention may be falling short. This report argues that this is due to a limited understanding of how AI risks materialize in practice. Leveraging AI incident reports from the AI Incident Database, it analyzes how AI deployment results in harm and identifies six key mechanisms that describe this process (Table 1).
Table 1: The Six AI Harm Mechanisms

Intentional Harm:
● Harm by design
● AI misuse
● Attacks on AI systems

Unintentional Harm:
● AI failures
● Failures of human oversight
● Integration harm
A review of AI incidents associated with these mechanisms leads to several key takeaways that should inform AI governance approaches in the future.

1. A one-size-fits-all approach to harm prevention will fall short. This report illustrates the diverse pathways to AI harm and the wide range of actors involved. Effective mitigation requires an equally diverse response strategy that includes sociotechnical approaches. Adopting model-based approaches alone could especially neglect integration harms and failures of human oversight.

2. To date, risk of harm correlates only weakly with model capabilities. This report illustrates many instances of harm that implicate single-purpose AI systems. Yet many policy approaches use broad model capabilities, often proxied by computing power, as a predictor for the propensity to do harm. This fails to mitigate the significant risk associated with the irresponsible design, development, and deployment of less powerful AI systems.

3. Tracking AI incidents offers invaluable insights into real AI risks and helps build response capacity. Technical innovation, experimentation with new use cases, and novel attack strategies will result in new AI harm incidents in the future. Keeping pace with these developments requires rapid adaptation and agile responses. Comprehensive AI incident reporting allows for learning and adaptation at an accelerated pace, enabling improved mitigation strategies and identification of novel AI risks as they emerge. Incident reporting must be recognized as a critical policy tool to address AI risks.
Table of Contents

Executive Summary
Introduction
Methodology
Limitations
AI Harm Mechanisms
Intentional Harm
Harm by Design
AI Misuse
Attacks on AI Systems
Unintentional Harm
AI Failures
Failures of Human Oversight
Integration Harm
Discussion
Conclusion
Appendix
Authors
Acknowledgments
Endnotes
Introduction

Due to widespread AI use and deployment, AI systems are increasingly implicated in harmful events. Just since the beginning of 2025, 279 new incidents have been added to the AI Incident Database (AIID), a nonprofit effort dedicated to tracking realized harm from AI deployment.¹ Since its launch in November 2020, the database has collected and indexed more than 1,200 incidents of harm or near misses involving algorithmic systems and AI.*

Clearly, more efforts are needed to prevent such AI harm. Preemptive harm prevention is the underlying goal pursued by most AI governance interventions, be it regulations like the European Union's AI Act, executive guidance like the Office of Management and Budget's memorandum M-25-21, or company frameworks like Anthropic's Responsible Scaling Policy.² Preventing harm effectively, however, requires a better understanding of how AI use leads to harmful outcomes in practice, rather than in theory.

AI incidents data provide valuable insights for understanding how AI systems can cause real harm. By collecting, indexing, and archiving reports from hundreds of real-world AI incidents, the AIID has created a treasure trove of data describing not only the myriad harms AI systems have been implicated in, but also how these harms came to be. CSET previously leveraged this data to create an analytical framework that provides fundamental definitions and classification schemes for incident data analysis.³ This framework was then used to annotate and classify more than 200 incidents from the database by incident type, harm category, and other dimensions.⁴

This past analytical work serves as the foundation for this report, which describes the variety of forces involved in AI harm. Leveraging the more than 200 reviewed cases of real-world harm from the database, it identifies six key "mechanisms of harm," which can be divided into intentional and unintentional harm (Table 2).

* Each incident in the AIID corresponds to one or more instances of harm, so the total number of discrete harm events captured in the database is higher than the number of incident IDs. While some incident IDs correspond to a single documented harm instance, others capture media-constructed accounts that aggregate related incidents into a single narrative.
Table 2: The Six AI Harm Mechanisms

Intentional Harm:
● Harm by design: Harm caused by AI systems designed and developed for harmful purposes
● AI misuse: Use of AI systems for harm against the developers' intentions
● Attacks on AI systems: Harm resulting from AI behavior or (in)action caused by cyberattacks

Unintentional Harm:
● AI failures: Harm caused by AI errors, malfunctions, or bias
● Failures of human oversight: Harm resulting from the failure of human-machine teams
● Integration harm: Harm resulting as an unintended consequence of deployment in a given context
The six mechanisms comprehensively describe the various pathways to harm found in the AIID. As such, they provide a richer understanding of how AI risks materialize in practice, which can help guide mitigation strategies.

These harm mechanisms may have immediate policy relevance for companies hoping to comply with regulations like the European Union's AI Act. The EU recently released a code of practice for general-purpose AI. This voluntary compliance tool was developed to help providers of general-purpose AI models adhere to the act's requirements, and lays out a comprehensive risk management process. Under it, developers must engage in risk modeling, described as "a structured process aimed at specifying pathways through which a systemic risk stemming from a model might materialize."⁵ This framework of harm mechanisms, built on empirical evidence of real-world incidents, may serve as a useful starting point for this exercise.

Importantly, these diverse mechanisms need to be addressed through an equally wide range of mitigation strategies. Security practices may serve to alleviate risks of misuse and attack, but do nothing to address integration harms. Performance standards and testing protocols can reduce AI failures but won't mitigate limitations of human oversight. To prevent AI harm effectively, a diverse toolbox is required.

The following sections present harm incidents from the AIID to illustrate the six harm mechanisms and deepen readers' understanding of the variety of ways in which AI can cause harm. While this report does not intend to provide a comprehensive overview of mitigation techniques, it highlights measures that combat specific mechanisms where possible, and describes where more research is needed to find effective mitigations to particular challenges.
Methodology

Analysis for this report took place in two stages. The first stage involved the identification of the six harm mechanisms based on in-depth study of AI incidents. In previous research, CSET developed a standardized analytical framework for the study of AI harms.⁶ The development of this framework involved an iterative process of incident review and framework adaptation through which the key elements required for the identification of real-world AI harm, their basic relational structure, and their definitions were identified. A central conceptual component of the framework is the "chain of harm," i.e., the series of events between AI deployment and the incident outcome that leads to harm. The chain of harm serves an essential function in the identification of AI harm: For there to be AI harm, there has to be a direct link between the behavior of the AI system and the harm that occurred.

The framework was applied to more than 200 incidents from the AIID, which were annotated and classified by incident type, harm category, and other variables.⁷ This analytical process and the corresponding in-depth investigation of a large number of incidents showed that the chain of harm was characterized by a variety of forces that shaped how incidents unfolded: the harm mechanisms. While errors—both human and AI—played a role, so did intentionally harmful uses of AI systems, and on occasion the integration of AI into a specific deployment context was harmful on its own. The six harm mechanisms presented above were derived from the analysis of these repeated patterns of events in the chains of harm.

The second stage involved validating the derived mechanisms by categorizing a random set of 200 incidents from the AIID. This was necessary for two reasons. First, the sample of incidents to which the AI Harm Framework had originally been applied had not been randomly selected, and was thus not representative of incidents in the AIID at the time. Second, the number of incidents in the AIID had grown by over 50% since the beginning of the first stage in 2023. Validation was therefore essential to ensure the mechanisms remain applicable and relevant to a newer set of incidents that more accurately reflect the current level of technological innovation. The result of this exercise is shown in Figure 1 in the appendix.
Limitations

All frameworks and models are necessarily an abstract and simplified representation of reality. In the real world, harm mechanisms are often not as clear-cut as they appear in this report, and several mechanisms can be active simultaneously. Attacks on AI systems are often carried out to enable misuse. Model performance issues and human oversight failures occur at the same time. And systems designed for harm can fail and cause unintended or excessive harm.

Representing intentionality as a binary is useful to help distinguish the different actors that interventions need to target. However, in reality intentionality is a spectrum. Some incidents occur as truly unintended and unforeseeable consequences of AI use, and others are obviously intentional. But many sit in between, resulting from developers' and deployers' failure to consider potential impacts, or even from reckless disregard of easily foreseeable harm. This fluidity creates edge cases that render the distinction between harm by design and integration harm difficult. Judging these cases with certainty would require insights into the developers' and deployers' decision-making and governance processes that are generally not available. Thus, unless there is evidence to the contrary, this report relies on the assumption that harm was unintentional when categorizing edge case incidents. Future work may address this limitation through a more disaggregated representation of intentionality.
Lack of information can similarly impede the distinction between harm by design and misuse. Since outside observers generally cannot discern the underlying AI model in a system that is involved in harm, it is often unclear whether an existing AI model was misused or one was designed for this purpose.⁸ Distinguishing between the two categories is nonetheless worthwhile, because mitigation requires different interventions based on which actor along the AI value chain intends to do harm; the model developer precipitates harm by design, whereas the AI system's user causes harm from misuse.* Even when they share some overlap, separation allows us to identify the appropriate mitigation measures to address each mechanism more effectively.
Finally, there are two limitations of the data source. Although the AIID represents the most comprehensive collection of AI incidents and harms to date, the distribution of incident types in the database does not necessarily reflect their prevalence in the real world. Since the database depends on journalistic reporting, it represents the practices and biases of the media ecosystem. As such, it overrepresents incidents that are attention-grabbing or associated with current societal debates. This suggests that less spectacular mechanisms like integration harm might be underrepresented compared to instances of human misuse, which have driven much of the societal debate recently, particularly where it relates to generative AI systems.

Lastly, there are many harms from AI that cannot easily be captured in an incident database because they do not materialize in discrete instances. The consequences of AI energy consumption, the detrimental impact of AI overreliance on education, or the adverse effects of AI companions on human relationships are just a few examples of possible harms that rarely present as individual incidents. While worthy of analysis, these types of harms are not within the scope of this study.

* Distinguishing developers, deployers, and users is not always straightforward, and sometimes one entity occupies multiple roles. For example, ChatGPT is an AI system that is both developed and deployed by OpenAI. Individuals interacting with the chatbot are the users. In a scenario where OpenAI builds a customer service chatbot on top of their language model GPT-5 for a third party (e.g., an insurance company), the developer is still OpenAI but the deployer is the insurance company, and the company's customers are the chatbot's users.
AI Harm Mechanisms

Intentional Harm

Harm by Design

AI systems designed with the intention of causing harm represent the most straightforward of the six harm mechanisms. In this case, the developer designs the AI system to perform an inherently harmful task or to be used in harmful ways. Developers' and users' intentions to cause harm are generally aligned in incidents of harm by design, though as the following examples illustrate, types of harm by design systems can vary widely.

Some AI systems developed for defense and law enforcement, such as AI-enabled intelligence analysis and battlespace management systems used for targeting or autonomous weapons systems with AI-enabled navigation, computer vision, or terminal guidance, are obvious examples of harm by design systems. No malfunctions or misuse need to occur for harm to materialize when these AI systems are used, since harm is the intended outcome, both by developer and deployer. Militaries may appropriately use these systems against lawful combatants when deployed in accordance with the law of armed conflict. Recent conflicts involving Ukraine and Israel have reportedly seen AI-enabled systems capable of causing harm deployed in combat.⁹

But harm by design is also prevalent outside of law enforcement and defense contexts. Deepfake apps that allow users to maliciously create nonconsensual intimate imagery (NCII) abound. There are dozens of such incidents recorded in the AIID—far too many to describe individually.¹⁰ The harm overwhelmingly affects women and girls, and as such, these incidents provide tangible evidence of a fast-growing form of gender-based digital violence.¹¹ While this is not a new problem nor even a new AI capability (image generators have been around since approximately 2017), incidents involving pornographic content have surged since AI "nudify" apps and pornographic video generators have become more widely available online.¹²
There are also more subtle forms of intentionally harmful algorithmic design. Online marketplaces such as Naver, Coupang, and Amazon India have been accused of engaging in unfair competitive practices through algorithmic manipulation.¹³ The companies allegedly rigged the recommender systems and search algorithms powering their platforms to favor their own products and brands, boosting their market share and causing economic and financial harm to their competitors. Exploiting their dominance as an online market platform to promote their own brand as a vendor violates antitrust laws and, given the platforms' wide online reach and customer base, the overall impact and scale of harm from even minor manipulation can be significant.
Mitigating Harm by Design

In general, the choice of approach to addressing harm by design depends on whether the intended harm is considered socially acceptable or necessary. Prohibiting the development and deployment of AI systems for certain use cases can be an effective measure for use cases causing unacceptable harms.

In contexts where harm by design is deemed acceptable, such as defense and law enforcement functions, the goal of governance is not to prevent harm entirely but to reduce it to what is necessary in a clearly defined and contestable framework. Institutional policy can help ensure the responsible deployment of the technology in order to prevent excessive harm and negligent use or abuse. Generally, organizations should establish AI governance principles that clearly define the circumstances and conditions under which reliance on autonomous or AI-powered decisions and actions is acceptable. A solid accountability framework with clearly assigned roles and responsibilities can ensure that any decision-making that leads to harm is transparent and tractable. Institutional oversight bodies, assuming sufficient independence and transparency, should then be authorized to investigate and audit AI-supported decision-making and actions taken when violations of those policies and frameworks occur or are suspected.
AI Misuse

AI systems that have not been developed with the explicit goal of doing harm can still be misused for that purpose. Compared to the harm by design mechanism, where the intent to harm lies with both the developer and user, in cases of AI misuse the intent to harm lies with the user or operator of the AI system only. Note that AI models can also be misused for non-malicious purposes, such as using AI to do homework.¹⁴ While this may cause users to unintentionally harm themselves—for example by detrimentally affecting their own learning—this section is only concerned with intentionally harmful, malicious misuse.¹⁵

Both specialized algorithmic systems and general-purpose AI models are prone to malicious misuse, although incidents involving the latter are more common in the AIID.
General-purpose AI systems, including large language models and text-to-image generators, perform many different tasks well, which makes them particularly easy to misuse for a range of purposes.

For example, in 2023, users of the online forum 4chan created hateful and violent voice impersonations of celebrities using ElevenLabs' voice synthesis AI model.¹⁶ More recently, Microsoft and OpenAI reported on how state-sponsored hackers from North Korea, Iran, Russia, and China had misused ChatGPT for phishing and social engineering attacks targeting defense, cybersecurity, and cryptocurrency sectors.¹⁷ Other investigations revealed that ChatGPT had been misused by cybercriminals to create malware and other malicious software.¹⁸

Specialized AI systems, which generally serve a single particular purpose, can also be harmfully misused. Ranking online search results provides a troubling example. Malicious actors can be especially effective at exploiting search result ranking with AI systems that exhibit full or very high levels of autonomy, misusing them to achieve harmful, nefarious outcomes.

For instance, antisemitic online groups tagged images of portable ovens on wheels with the label "Jewish baby stroller."¹⁹ As a result, if users searched for the term "Jewish baby stroller," Google's algorithm ranked images of the ovens at the top of the search results. This was a direct exploitation of the image search algorithm's functionality, which works by matching the words in a query to the words that appear next to images on a webpage. The strategy succeeded particularly well because of a "data void" related to the search term: Because the product "Jewish baby stroller" doesn't actually exist, the only results available were the offensive images, which were then promoted by the algorithm.²⁰
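The data-void dynamic described above can be sketched in a few lines. The toy ranker below (all names and data are hypothetical; real image search is vastly more complex) scores images purely by overlap between query words and the text surrounding each image. When no legitimate content matches a query, whatever an adversary has tagged wins by default.

```python
# Toy illustration of keyword-matching image ranking and a "data void."
# Entirely hypothetical data and logic, not Google's actual system.

def rank_images(query, index):
    """Score each image by how many query words appear in its surrounding text."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(text.lower().split())), url)
        for url, text in index
    ]
    # Highest-overlap images first; zero-overlap results are dropped.
    return [url for score, url in sorted(scored, reverse=True) if score > 0]

# The index: mostly legitimate content, plus adversarially tagged images.
index = [
    ("img/stroller1.jpg", "lightweight baby stroller for travel"),
    ("img/oven7.jpg", "jewish baby stroller portable oven"),  # adversarial tag
    ("img/recipe3.jpg", "baking bread in a home oven"),
]

# No legitimate page uses the full phrase, so the tagged image ranks first.
print(rank_images("jewish baby stroller", index))
# → ['img/oven7.jpg', 'img/stroller1.jpg']
```

The ranker has no notion of whether a tag is truthful; it only counts word overlap, which is exactly the property the taggers exploited.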
Malicious users have carried out similarly coordinated activities to trigger content moderation algorithms into removing marginalized creators' social media posts, a tactic known as "adversarial reporting."²¹ Because content on social media sites is sometimes automatically removed when a sufficiently high number of users flag a post, regardless of whether or not the post actually violates any policies, right-wing trolls have strategically reported posts by influencers belonging to minority groups on TikTok in order to trigger the platform's content moderation algorithm.²² Even if TikTok's appeal and review process finds that the video did not violate community guidelines, penalties become more severe the more frequently a creator's posts are flagged, and can range from content removal to account deletion. Automated content moderation systems are thus exploited to effectively censor marginalized communities online.
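The adversarial-reporting tactic exploits a simple property: removal is triggered by report volume, not by whether a post actually violates policy. A minimal sketch of that failure mode (thresholds, field names, and escalation rules are invented for illustration, not TikTok's actual system):

```python
# Toy model of threshold-based automated content moderation.
# Hypothetical thresholds and rules, for illustration only.

REPORT_THRESHOLD = 10  # reports needed to trigger automatic removal

def moderate(post):
    """Remove a post once enough users flag it, regardless of actual violation."""
    if post["reports"] >= REPORT_THRESHOLD:
        post["removed"] = True
        post["strikes"] = post.get("strikes", 0) + 1
        # Repeated flagging escalates penalties toward account-level action.
        if post["strikes"] >= 3:
            post["account_action"] = "suspended"
    return post

# A policy-compliant post brigaded by a coordinated group of reporters:
benign_post = {"violates_policy": False, "reports": 12, "removed": False}
print(moderate(benign_post)["removed"])  # → True: the flag count alone decides
```

Note that `violates_policy` is never consulted; that gap between report volume and actual violation is what coordinated reporters weaponize.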
The Limits of Technical Mitigations of Misuse Risk

Developers of generative AI models can take steps to control model outputs so as to limit the generation of harmful content.²³ Risk-based release strategies that restrict access to particularly capable or advanced models can further help address misuse risks.²⁴ Assessing a model's propensity for misuse requires evaluating whether it can perform a given malicious task (plausibility) and, if so, how well it can do so (performance).²⁵ Red-teaming has emerged as a popular method to uncover misuse plausibility across a wide range of domains, and to surface where additional safeguards are needed.²⁶ Performance assessments should include benchmark evaluations and experiments, and focus on the marginal utility of the model compared to existing modes for task execution.²⁷

Putting in place comprehensive safeguards is exceptionally challenging since it is difficult for developers to anticipate all potential (mis)uses of their model. Most importantly, such interventions at the model level will not necessarily prevent misuse without deteriorating model performance—also known as the Misuse-Use Tradeoff.²⁸ Because AI models lack the context to understand malicious intent, guardrails that prevent them from, for example, writing phishing emails will likely stop them from writing other emails as well. The same holds true for writing code: Guardrails to prevent malware may reduce the quality of innocuous computer programs. At the current state of the art, building an AI system that can never be misused often means building a system that is barely useful for non-malicious purposes. While there are other steps AI developers and deployers can take to prevent model misuse, technical fixes alone will not eliminate misuse risks.
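The Misuse-Use Tradeoff is easy to reproduce with even the crudest safeguard. The keyword filter below (entirely hypothetical, far simpler than production safety systems) refuses a phishing request, but also refuses a legitimate security-awareness request on the same topic, because the refusal rule cannot see the requester's intent:

```python
# Toy keyword guardrail illustrating the misuse-use tradeoff:
# blocking malicious requests also blocks similar benign ones.
# Hypothetical blocklist; real safety systems are far more sophisticated.

BLOCKED_TERMS = {"password", "credentials", "account verification"}

def guardrail(prompt):
    """Refuse any prompt containing a blocked term, without knowing intent."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "REFUSED"
    return "OK"

# A malicious phishing request is blocked, as intended:
print(guardrail("Write an email asking users to confirm their password"))
# → REFUSED
# But so is a benign security-awareness request about the same topic:
print(guardrail("Draft a reminder that staff should never email their password"))
# → REFUSED
```

Tightening the blocklist shrinks misuse but grows false refusals, and loosening it does the reverse; intent is the missing variable either way.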
Attacks on AI Systems

Harm can also result from cyberattacks on AI systems.* As with the misuse mechanism, the AI system developers and deployers do not intend harm here. Instead, the harmful intentions lie with the attackers. The cybersecurity community categorizes attacks into three groups: confidentiality, integrity, and availability attacks.²⁹ Confidentiality attacks aim to extract sensitive information, integrity attacks aim to compromise the model's performance, and availability attacks aim to halt the overall functioning of the model.

* Adversarial attacks on AI models can occur at every stage of the AI lifecycle. Since relevant incidents in the AIID result from attacks on deployed systems, this section only covers post-deployment attacks.
Lately, an emerging category of attacks aims to circumvent generative models' safeguards.³⁰ Such exploitations of AI systems' security vulnerabilities can potentially lead to harmful outcomes. Moreover, in addition to the standalone harms they cause, attacks on AI systems can enable model misuse.

While there is ample evidence of the security vulnerabilities of AI systems, most attacks recorded in the AIID still occur in experimental settings that do not lead to real-world harm. For example, security researchers have uncovered vulnerabilities in GitHub Copilot that would enable attackers to modify Copilot's responses or leak the developer's data (confidentiality and integrity attacks).³¹ Experiments showed that flaws in Tesla's autopilot could be exploited to make the car accelerate and veer into the oncoming traffic lane (an integrity attack).³² Finally, an investigation found that a divergence attack on ChatGPT could force the system to leak training data, including personal identifiable information such as phone numbers and email and physical addresses (a confidentiality attack).³³

Harm incidents from the AIID show that in practice, attacks on AI systems are often carried out to evade generative AI model safeguards. This practice, called "jailbreaking," relies on prompt injection attacks in which users come up with text prompts that induce the AI model to behave in ways that violate its policies.³⁴ Prompt injection attacks enabled users to evade ChatGPT's guardrails shortly after its release in order to produce discriminatory and violent content, as well as offer instructions on how to carry out criminal activities.³⁵ Even after models have undergone extensive safety testing and red-teaming, prompt injection attacks remain a popular and effective technique to circumvent guardrails. Hackers using large language models for malware creation employ dozens of prompting strategies for various