
Conference Proceedings

ANTON SHENK

Evaluating Artificial Intelligence for National Security and Public Safety

Insights from Frontier Model Evaluation Science Day

Frontier Model Evaluation Science Day assembled more than 100 leading experts in artificial intelligence (AI), national security, and policy to address the emerging challenges of evaluating threats from advanced AI systems. The day's agenda was structured around four tracks, each focusing on a unique aspect of AI evaluation science and policy. These tracks were developed to address fundamental issues in the field while keeping the meeting agenda and invitation list manageable. The meeting's focus on evaluation methodology provided a specialized forum for in-depth discussion, distinguishing it from broader AI security topics covered in other venues. The four tracks were as follows:

• The chemistry and biology track focused on the intersection of AI with chemical and biological risks. This track utilized insights from previous evaluations of general-purpose and domain-specific AI models and aimed to identify current and future evaluation needs, including integrating wet lab validation and automated lab processes.

• The loss of control track explored scenarios in which AI systems could operate beyond the intended boundaries set by their developers or users, including AI systems deceiving humans or acting autonomously. These discussions aimed to identify early warning signs and explore strategies to prevent loss of control of AI systems.

• The risk-agnostic methods track sought to outline comprehensive and universal approaches to evaluating AI models, spanning such topics as red teaming, automated benchmarking, and task design. Its objective was to forge a versatile framework for assessing AI systems' capabilities, applicable across varied risk scenarios, to ensure that evaluations are consistently rigorous and at the forefront of the science.

• The collaboration and coordination track aimed to connect stakeholders in government, industry, and civil society to develop a shared understanding of the objectives of evaluation science. Discussions in this track centered on establishing key policy timelines and deliverables, thresholds for dangerous AI capabilities, and voluntary risk management policies for scaling AI capabilities.

The workshop proceedings synthesize insights from these sessions, outline the complexities of evaluating AI for dangerous capabilities, and highlight the collaborative effort required to formulate effective policy.

Track 1: Chemistry and Biology

The chemistry and biology (chem-bio) track illuminated the intersection of AI with chem-bio risks, incorporating insights from evaluations of general-purpose and domain-specific models. This section details lessons learned from completed model evaluations, needs and priorities for subsequent rounds of evaluations, and considerations for wet lab validation of model outputs.

Lessons Learned from Completed Model Evaluations

Embracing Complexity in Chem-Bio Model Assessments

This session highlighted the persistence of threat actors and the complex evolution of chem-bio threats. During the discussion, one participant observed a potential limitation of existing evaluation methods, suggesting that marking an entire task as failed because of early setbacks might not fully capture the resilience and adaptability of threat actors. This critique posits that a more nuanced approach accounting for threat progression and troubleshooting, such as knowing the proportion of sub-steps that succeed, could provide a more comprehensive and continuous understanding of the threat landscape. Tabletop exercises were proposed to further explore the dynamics of troubleshooting and iteration; however, their effectiveness in this context remains to be tested.
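To make the sub-step critique concrete, here is a minimal sketch, in Python, of the difference between conventional pass/fail scoring and a graded score based on the proportion of sub-steps that succeed. The task and step names are hypothetical; this illustrates the participant's suggestion, not a method adopted at the workshop.

```python
# Minimal sketch of graded sub-step scoring versus binary pass/fail.
# All task and step names are invented for illustration.

from dataclasses import dataclass


@dataclass
class SubStepResult:
    name: str
    succeeded: bool


def binary_score(steps: list[SubStepResult]) -> bool:
    """Conventional scoring: the task passes only if every sub-step does."""
    return all(s.succeeded for s in steps)


def graded_score(steps: list[SubStepResult]) -> float:
    """Fraction of sub-steps completed: a continuous signal of how far a
    persistent actor could progress before needing to troubleshoot."""
    if not steps:
        return 0.0
    return sum(s.succeeded for s in steps) / len(steps)


run = [
    SubStepResult("obtain-protocol", True),
    SubStepResult("source-materials", True),
    SubStepResult("execute-procedure", False),  # early setback
    SubStepResult("validate-output", False),
]
print(binary_score(run))  # False: the whole task is marked failed
print(graded_score(run))  # 0.5: half the sub-steps still succeeded
```

A graded score of this kind preserves the information that an actor cleared the first half of the task, which a single pass/fail bit discards.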

Navigating the Complexities of Dual-Use Dangers and Domain-Specific Models

In this session, participants noted that formulating concrete, detailed threat models is crucial for understanding the landscape of potential AI-enabled chem-bio threats and the actors behind them. This process might involve exploring new capabilities that malicious actors previously did not have access to or accelerating and simplifying existing processes, making them more accessible to a broader variety of actors. Once threat models are identified and the appropriate evaluation undertaken, a notable challenge still remains, as identified by one participant: the difficulty for evaluators to accurately embody malicious actors. Although not unique to AI/chem-bio red teaming, this challenge arises, according to the participant, from a tendency to underestimate the likelihood that certain actions could succeed, leading evaluators to potentially overlook the full extent of what a malicious actor might attempt and achieve. Moreover, this difficulty sets the stage for addressing an even more formidable challenge: developing countermeasures against dual-use threats. These threats embody the inherent difficulty in differentiating between chem-bio knowledge that is helpful or benign and that which can be misused or pose a significant threat, such as designing or reconstituting pathogens more severe and deadly than those found in nature. Addressing these threats, which might involve masking malicious objectives behind seemingly benign actions, necessitates an approach to mitigation broader than model-level interventions, including such measures as know-your-customer rules.

Building on this complexity, session participants underscored the critical need to meticulously evaluate both domain-specific models (e.g., biological design tools) and general-purpose foundation models. The emphasis on domain-specific models stems from the unique risk profile associated with their misuse.

Needs and Priorities for Next Round of Model Evaluations

Access to Models and Evaluation Tools

The next session underscored the crucial need for independent researchers to have access to both proprietary (closed-source) models and robust evaluation tools. This access could take various forms, such as black-box testing, white-box testing, fine-tuning, or support from model developers to facilitate researcher interaction.1 Currently, legal and contractual frameworks governing data sharing and the conduct of evaluations are significant barriers to such access; nondisclosure agreements create opacity over study designs, evaluations already performed on models, and evaluation outputs. This lack of transparency inhibits the ability of the research community to thoroughly assess model capabilities and potential risks. To bridge this gap, participants highlighted the importance of establishing mechanisms for greater visibility into the model development phase, particularly for models planned to be open-sourced because of the inability to control their diffusion once deployed. The formation of a consortium or another independent body can play a crucial role in coordinating on each of these challenges. Such an entity could facilitate discussions, mediate among stakeholders, and help clarify the legal and contractual aspects of running dangerous capability evaluations, thereby streamlining the process for all involved.

1 For more information on various forms of access and their implications for AI audits, see Casper et al., 2024.

Identifying Risks

A core component of this session was dedicated to identifying and categorizing AI-enabled chem-bio risks. Participants delineated two primary "buckets" of risk: universally acknowledged risks and risks that emerge contingent on the level of access and application. For instance, a model that could assist in developing nuclear weapons would fall under the category of universally acknowledged risks, but the risks associated with AI-assisted screening for substance toxicity depend on who has access to models and how they are applied. This distinction underscores the complexity of defining and mitigating risks in a field in which the dual-use nature of technologies can blur the lines between beneficial and harmful applications. The conversation also revolved around the potential of AI models to expand access to chem-bio information, lower barriers to understanding, and foster the generation of new hypotheses and knowledge. Although these capabilities offer value to scientific advancement, they also raise concerns about how information unearthed by frontier models could be misused.

Wet Lab and Lab Automation Evaluations

Concerns About Wet Lab Validation of Evaluation Outputs

In the third session of the track, participants delved into the intricacies of wet lab validation of model outputs, which involves verifying the efficacy of potentially harmful compounds designed by models. Central to the discussions were concerns about the fine line between enhancing the understanding of model capabilities and the potential misinterpretation of such validation efforts as steps toward creating harmful substances. Moreover, the session highlighted apprehensions regarding the validation of models' potential to facilitate chem-bio threats, emphasizing the dilemma that dissemination of such evaluative results could inadvertently arm malicious actors with harmful knowledge.

Track 1 Proposed Actions

Enhance Evaluation Methods to Capture Complex Threat Dynamics

To effectively tackle the complexity of chem-bio threats, it is essential to advance beyond conventional evaluation methods and embrace dynamic and interactive assessment techniques, such as simulation-based tools. These methods, including but not limited to tabletop exercises, are crucial for capturing the nuanced behaviors of threat actors who continually adapt their strategies. By simulating diverse real-world scenarios, these evaluation tools provide a deeper understanding of the evolving nature of threats, thereby enabling the development of more effective and resilient mitigations.

Address Legal and Ethical Concerns of Wet Lab Evaluations

Formulating standards for performing wet lab evaluations would mitigate the concerns identified in the validation process discussions. Specifically, this policy action tackles the challenges of navigating the legal and ethical landscape of chem-bio research, as highlighted by the potential for misinterpretation of validation efforts and the dissemination of sensitive evaluation findings.

Ensure Adequate and Appropriate Access to Models and Evaluation Tools

The "Needs and Priorities for Next Round of Model Evaluations" session highlighted the need for independent researchers to access both proprietary and open-source models, as well as robust evaluation tools, to effectively assess the capabilities and risks associated with frontier models.

However, the opacity in legal and contractual frameworks, such as nondisclosure agreements, currently hinders independent research by obscuring crucial aspects of evaluations. To address these barriers, efforts are needed to establish mechanisms that provide greater transparency and facilitate discussions among stakeholders.

Track 2: Loss of Control

This track focused on evaluating models for capabilities that exceed the boundaries of developer or user intent, and it featured presentations by the evaluation organizations METR and Apollo Research. This section explores identifying and mitigating risks associated with model autonomy and deception, a growing concern as models act increasingly independently.

Autonomy (METR)

In this session, METR, formerly ARC Evals, presented their research on autonomy evaluations, followed by an open discussion.

Focus on Autonomous Threats

Unlike threat models centered on AI augmentation of malicious actors, METR assesses risks posed by AI acting independently to execute potentially harmful actions, such as conducting phishing attacks or manipulating digital infrastructure. Importantly, METR argued, this does not require an AI to have harmful goals; a malicious actor might initiate harm by prompting the AI to carry out tasks autonomously. METR suggested that although the full scope of autonomous AI actions remains to be seen, there is a critical need for preparedness and vigilance to anticipate these developments. This session demonstrated examples of tasks that METR developed to test for relevant autonomous capabilities, including the implementation of machine learning research; improvements to AI agent scaffolding; and the management of large, complicated codebases. Furthermore, METR showcased an example of a model capability: a "capture the flag" task that required reverse-engineering C code, which would otherwise take a domain expert 30 minutes to complete.

Evaluation Methodologies

METR presented its methodology for evaluating AI systems, which focuses on specificity, objectivity, and cost-effectiveness. The session covered tasks, methodological guidance, and evaluation checks aimed at supporting test validity. Tasks, structured around METR's Task Standard, measure an AI system's ability to perform activities relevant to threat models. The Task Standard, according to METR, aims to increase uniformity, scalability, and reproducibility and decrease duplication of work. METR's methodological guidance emphasized the importance of eliciting capability and removing spurious failures, such as those caused by ethical refusals or tooling limitations (see the "Unlocking AI Capabilities" section). Evaluation checks, including reviewing outputs for spurious failures, were also proposed by METR to ensure the integrity of test results and avoid underestimating agent capabilities. Altogether, METR shared, this evaluation methodology seeks to continuously measure dangerous model capabilities, allowing for the development of scaling laws and appropriate mitigations.
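The transcript-review step described above lends itself to an automated first pass. The sketch below, with invented regex patterns and category labels, illustrates the general idea of triaging failed runs for spurious causes before counting them as capability evidence; it is not METR's actual tooling.

```python
# Hedged sketch: triage failed evaluation transcripts so that refusals and
# tooling errors are re-examined rather than counted as capability limits.
# Patterns and labels are illustrative assumptions.

import re

SPURIOUS_PATTERNS = {
    "ethical_refusal": re.compile(r"(?i)\bi (cannot|can't|won't)\b.*\b(assist|help|comply)\b"),
    "tooling_error": re.compile(r"(?i)(traceback|command not found|connection refused)"),
}


def triage_failure(transcript: str) -> str:
    """Label a failed run for review instead of treating every failure as
    evidence that the capability is absent."""
    for label, pattern in SPURIOUS_PATTERNS.items():
        if pattern.search(transcript):
            return label
    return "possible_genuine_failure"


print(triage_failure("I cannot assist with that request."))  # ethical_refusal
print(triage_failure("bash: nmap: command not found"))       # tooling_error
print(triage_failure("Final answer: 42 (incorrect)"))        # possible_genuine_failure
```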

Unlocking AI Capabilities

Regarding unlocking model capabilities, participants recognized the difficulty of guaranteeing that AI systems lack specific capabilities. METR highlighted the importance of identifying spurious failures, in which models fail not because of capability but because of bugs, limitations of tooling, poor prompting, or model refusals on the basis of ethics or claimed inability. Further post-training enhancements, METR suggested, might remove these limitations, especially in threat scenarios in which weights are stolen or open-sourced, thus invalidating initial capability evaluations. METR suggested adding a safety buffer to thresholds for policy action as one way to address this risk.
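A safety buffer can be read as simple threshold arithmetic: act on the measured score before it reaches the level of concern, leaving headroom for enhancements that are not reflected in today's elicitation. The numbers below are invented for illustration; the proceedings do not specify buffer sizes.

```python
# Minimal sketch of a safety buffer on a capability threshold.
# All numeric values are illustrative assumptions.

def buffered_threshold(danger_threshold: float, elicitation_gap: float) -> float:
    """Score at which to trigger policy action, given an assumed gap between
    today's elicited capability and post-training-enhanced capability."""
    return danger_threshold - elicitation_gap


DANGER_THRESHOLD = 0.75  # capability score considered dangerous
ELICITATION_GAP = 0.20   # assumed headroom from fine-tuning, better scaffolding, etc.
measured_score = 0.58    # today's evaluation result

if measured_score >= buffered_threshold(DANGER_THRESHOLD, ELICITATION_GAP):
    print("Within the safety buffer: trigger mitigations now.")
else:
    print("Below the buffered threshold: continue monitoring.")
```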

Deception (Apollo Research)

In this session, Apollo Research presented its research on strategic deception evaluations, followed by an open discussion.

The Amplifying Risk of Deception

Apollo Research's presentation on AI deception began by articulating how deception is a risk amplifier, including for misuse and for chemical, biological, radiological, and nuclear risks, through obfuscation that impedes governance, oversight, and transparency. Apollo Research argued that deceptive behaviors could emerge from two primary sources: intentional malicious acts by bad actors (e.g., scams) or unintended outcomes of complex AI systems (e.g., misalignment). Furthermore, as AI models grow in sophistication, Apollo Research believes that their capacity for deception might also grow, further hampering efforts to identify and counteract AI deception.

Evaluating for Deception

Apollo Research's demonstrations provided an existence proof of AI engaging in deception. First, a "research assistant" GPT presented deception resulting from AI misuse. Here, a model deployed to support government officials was fine-tuned to gather sensitive information illicitly and deceive its users. The second demonstration presented deception as a result of unintended outcomes of complex AI systems, with no fine-tuning or steering. Here, an insider-trading demo depicted an AI model operating as a stock trading agent, which, under specific scenarios, resorted to insider trading and subsequently concealed its activities, in some cases doubling down on its deception (Scheurer, Balesni, and Hobbhahn, 2023). These demonstrations, Apollo Research argued, underscored the importance of employing a comprehensive suite of tools for eliciting and detecting deceptive behaviors in AI models, including direct elicitation, honeypots, and interpretability methods (Casper et al., 2024). Apollo Research advocated for a "Science of Evals" to increase the robustness of methods for detecting AI deception, calling for drawing on literature and expertise on other AI risks, such as fairness and bias, as well as on assurance ecosystems for other high-stakes systems (e.g., commercial aviation), to address the ethical, technical, and operational challenges in an effective AI assurance ecosystem (Apollo Research, 2024).

Track 2 Proposed Actions

Refine Evaluation Techniques for Detecting Deception in AI Systems

After Apollo Research's presentation on early findings of AI deception, session participants discussed the necessity of refining and strengthening evaluation methods for detecting deceptive AI systems. This session's demonstrations of AI deception, Apollo Research argued, highlight the need for a wide variety of technical strategies to uncover and understand model deception. According to Apollo Research, proposed frameworks for evaluations considering the inner workings of models, enabled by greater access to models, would provide more-thorough insight into the causal drivers of model behavior. This, Apollo Research believes, would help techniques for detection keep pace with AI's evolving complexity and capabilities for obfuscation.


Track 3: Risk-Agnostic Methods

This track set out to produce a risk-agnostic methodological framework and strengthen the robustness of model evaluations. The first session explored evaluation methods, such as red teaming, automated benchmarking, and sophisticated task design. The session that followed shared best practices for evaluating model capabilities.

Evaluation Methods: From Red Teaming to Automated Benchmarking

Framework for Evaluation

In the evolving landscape of model development, robust evaluation methods are paramount to assess the potentially dangerous capabilities of frontier models. A RAND facilitator presented a draft table breaking down prominent model evaluation methods (e.g., red teaming, multiple-choice benchmarking) by key attributes (e.g., repeatability, depth, generalizability) to stimulate discussion. Table A.1 (see the appendix) presents my interpretation of a new risk-agnostic framework of evaluation methodologies, structured to remain agnostic to threats, based on the discussion in this session. This approach acknowledges the importance of well-crafted threat models for model evaluation; however, the process of determining a threat model fell beyond the purview of this session. The group's focus, instead, was on mapping the space of methodologies that can be generally applicable across risks or threat types. By dissecting evaluation methodologies into manageable dimensions, the session aimed to facilitate a greater understanding of the strengths and limitations of different evaluation methods for establishing evidence of model capabilities.
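As a reader's aid, the kind of draft table described above can be pictured as methods scored along shared attributes. The entries below are placeholders, not the contents of Table A.1 or any rating agreed at the session.

```python
# Hypothetical sketch of an evaluation-methods-by-attributes table.
# Methods and attributes come from the discussion above; the ratings
# are placeholder assumptions, not Table A.1.

methods = {
    "red teaming":               {"repeatability": "low",    "depth": "high", "generalizability": "low"},
    "multiple-choice benchmark": {"repeatability": "high",   "depth": "low",  "generalizability": "high"},
    "agentic task suite":        {"repeatability": "medium", "depth": "high", "generalizability": "medium"},
}

for method, attrs in methods.items():
    row = "  ".join(f"{k}={v}" for k, v in attrs.items())
    print(f"{method:26s} {row}")
```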

Enhancing Access to AI Evaluation Tools While Preserving Integrity

The session also addressed challenges in broadening access to evaluation tools (e.g., open-sourcing to foster innovation and community scrutiny) while maintaining evaluation integrity, robustness, and validity. The conversation underscored the importance of hold-out tasks that remain unknown to models until the time of evaluation and of exploring the use of cryptographic hashing to safeguard the integrity of evaluation data. Such approaches are critical to prevent the incorporation of evaluation data into the training datasets of the models they are intended to evaluate, which could compromise the robustness of future evaluations. Ensuring that evaluations do not become part of model training data necessitates meticulous consideration of best practices for publishing and sharing evaluations, striking a delicate balance that fosters transparency and maintains the integrity and robustness of model evaluation.
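One way to apply cryptographic hashing here is a commit-and-reveal scheme: publish only a digest of a hold-out task set in advance, keep the plaintext private so it cannot leak into training corpora, and let anyone verify the revealed tasks against the digest at evaluation time. The sketch below is an assumption about how such a scheme could look, not a protocol specified at the workshop.

```python
# Minimal commit-and-reveal sketch for hold-out evaluation tasks.
# The task content is a placeholder; the scheme is an illustrative assumption.

import hashlib
import json


def commit(tasks: list[dict]) -> str:
    """Deterministic SHA-256 digest of the task set (canonical JSON)."""
    canonical = json.dumps(tasks, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(tasks: list[dict], published_digest: str) -> bool:
    """Check that revealed tasks match the digest published in advance."""
    return commit(tasks) == published_digest


holdout = [{"id": "task-001", "prompt": "..."}]  # kept private until evaluation
digest = commit(holdout)                          # only the digest is published
print(digest)
print(verify(holdout, digest))                    # True: reveal matches commitment
```

A scheme like this lets evaluators demonstrate that tasks were fixed before a model's training cutoff without circulating the tasks themselves.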

Ensuring Evaluation Robustness and Validity (METR)

Key Questions for Evaluation Robustness

For developing model evaluations, METR highlighted a series of questions that support evaluation robustness. A selection of these questions serves as a foundation for constructing effective evaluations while acknowledging methodological limitations.

• What is the evaluation measuring? If an evaluation is being used to inform a particular action, then it needs to be faithful to the relevant threat model. For example, an evaluation to assess whether a model is safe to open-source needs to assess the safety of the model under fine-tuning.


• Does the evaluation scale well? How is it intended to be run? An evaluation might be valid only under certain assumptions: for example, that the dataset has not appeared in a model's pre-training corpus or that the model has certain tools available.

• What are the warning signs of a problem with the evaluation? Recognizing warning signs of potential evaluation issues and understanding the causes of misleadingly high or low scores are crucial for maintaining evaluation integrity and accuracy.

Model Versus System Evaluation

Session attendees also addressed the distinction between evaluating stand-alone models that could be used or deployed adversarially and evaluating the broader systems in which these models are embedded when deployed. Depending on the threat model being considered, participants suggested that either of these possibilities might be appropriate, but it is important to evaluate thoughtfully.

Track 3 Proposed Actions

Develop a Framework for Robust Evaluation

Discussions on risk-agnostic methodological frameworks emphasized the importance of holistically exploring the space of evaluation methods to understand potential harms and highlighted a need for a framework for robust evaluation. The session's focus on crafting a methodological framework that transcends specific threats lays the groundwork for the development of such a framework.

Implement Measures to Guarantee the Robustness and Validity of AI Evaluations

The challenges identified with democratizing model evaluation while preserving evaluation integrity inform the policy action to implement measures to guarantee the robustness and validity of AI evaluations. Session participants suggested using such strategies as employing hold-out tasks and cryptographic techniques to safeguard evaluation integrity. These approaches aim to prevent the contamination of model training datasets with evaluation data, ensuring the future robustness of evaluations.

Develop and Disseminate Best Practices

Informed by METR's key questions for ensuring evaluation robustness, this action seeks to establish and share best practices for effective evaluations. By clarifying what evaluations measure, ensuring scalability, and identifying warning signs of evaluation problems, this initiative aims to equip stakeholders with the knowledge to design, select, and interpret evaluations effectively.

Track 4: Collaboration and Coordination

This section summarizes discussions held during the collaboration and coordination track, which brought AI research and development experts into conversation with policy researchers and professionals. These discussions covered key upcoming policy timelines, processes for discerning risk thresholds, and frameworks for responsibly managing future model capabilities.

Policy Timelines and Deliverables

In this session, stakeholders discussed the scope and status of several key AI policy milestones in 2024, including deliverables assigned by the fall 2023 executive order on AI (Biden, 2023). Participants broadly agreed that 2024 milestones should be only one step in ongoing AI governance efforts and that significant future work and coordination would be required. High hopes were expressed that the National Institute of Standards and Technology AI Safety Institute Consortium would facilitate this continued engagement, and participants highlighted several open questions that should be addressed in future work:

• How should the timing and venue for publishing evaluation results be determined, and what information should be included to strike an appropriate balance between transparency and responsible disclosure?

• What activities (e.g., evaluations, standards) are best suited for government involvement?

• What evaluation methods are most effective at eliciting harmful model capabilities?

• When should evaluations be executed? How can initial, less costly assessments reliably indicate the need for more-comprehensive and resource-intensive evaluations?

• What systems, mechanisms, or incentives can be implemented to ensure that evaluators thoroughly explore and reveal the full spectrum of AI capabilities?

• How can evaluation insights contribute to a risk-benefit analysis for AI systems?

Red Lines and Risk Thresholds

Conceptualizing Risk and Mitigation

This session underscored the importance of preemptive risk management for model development, given the difficulty of predicting future model capability. It highlighted the dual-use and general nature of frontier models, which, although designed for beneficial purposes, can be used by malicious actors to cause significant harm.
