Conference Proceedings

ANTON SHENK

Evaluating Artificial Intelligence for National Security and Public Safety

Insights from Frontier Model Evaluation Science Day
Frontier Model Evaluation Science Day assembled more than 100 leading experts in artificial intelligence (AI), national security, and policy to address the emerging challenges of evaluating threats from advanced AI systems. The day's agenda was structured around four tracks, each focusing on a unique aspect of AI evaluation science and policy. These tracks were developed to address fundamental issues in the field while keeping the meeting agenda and invitation list manageable. The meeting's focus on evaluation methodology provided a specialized forum for in-depth discussion, distinguishing it from broader AI security topics covered in other venues. The four tracks were as follows:
• The chemistry and biology track focused on the intersection of AI with chemical and biological risks. This track utilized insights from previous evaluations of general-purpose and domain-specific AI models and aimed to identify current and future evaluation needs, including integrating wet lab validation and automated lab processes.
• The loss of control track explored scenarios in which AI systems could operate beyond the intended boundaries set by their developers or users—including AI systems deceiving humans or acting autonomously. These discussions aimed to identify early warning signs and explore strategies to prevent loss of control of AI systems.
•Therisk-agnosticmethodstracksought
tooutlinecomprehensiveanduniversal
approachestoevaluatingAImodels,spanningsuchtopicsasredteaming,automatedbench-marking,andtaskdesign.ItsobjectivewastoforgeaversatileframeworkforassessingAIsystems’capabilities,applicableacrossvariedriskscenarios,toensurethatevaluationsare
consistentlyrigorousandattheforefrontofthescience.
• The collaboration and coordination track aimed to connect stakeholders in government, industry, and civil society to develop a shared understanding of the objectives of evaluation science. Discussions in this track centered on establishing key policy timelines and deliverables, thresholds for dangerous AI capabilities, and voluntary risk management policies for scaling AI capabilities.
The workshop proceedings synthesize insights from these sessions, outline the complexities of evaluating AI for dangerous capabilities, and highlight the collaborative effort required to formulate effective policy.
Track 1: Chemistry and Biology
The chemistry and biology (chem-bio) track illuminated the intersection of AI with chem-bio risks, incorporating insights from evaluations of general-purpose and domain-specific models. This section details lessons learned from completed model evaluations, needs and priorities for subsequent rounds of evaluations, and considerations for wet lab validation of model outputs.
Lessons Learned from Completed Model Evaluations
Embracing Complexity in Chem-Bio Model Assessments
This session highlighted the persistence of threat actors and the complex evolution of chem-bio threats. During the discussion, one participant observed a potential limitation of existing evaluation methods, suggesting that marking an entire task as failed because of early setbacks might not fully capture the resilience and adaptability of threat actors. This critique posits that a more nuanced approach accounting for threat progression and troubleshooting—such as knowing the proportion of sub-steps that succeed—could provide a more comprehensive and continuous understanding of the threat landscape. Tabletop exercises were proposed to explore the dynamics of troubleshooting and iteration further; however, their effectiveness in this context remains to be tested.
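The proportion-of-sub-steps idea raised in this discussion can be illustrated with a simple scoring function. This is a hypothetical sketch only; the outcomes and weights below are invented for illustration and are not drawn from any actual evaluation.

```python
# Hypothetical illustration of partial-credit scoring for multi-step tasks,
# as an alternative to marking a whole task failed at the first setback.

def partial_credit(substep_results, weights=None):
    """Score a task as the (weighted) fraction of sub-steps that succeeded."""
    if weights is None:
        weights = [1.0] * len(substep_results)
    total = sum(weights)
    earned = sum(w for ok, w in zip(substep_results, weights) if ok)
    return earned / total

# A threat actor (or model) that fails an early sub-step but recovers on
# later ones still registers meaningful progress under this scheme.
results = [False, True, True, True]    # sub-step outcomes, in order
print(partial_credit(results))         # 0.75, vs. 0.0 under all-or-nothing
```

A continuous score like this would let evaluators track threat progression across rounds rather than collapsing each attempt to pass/fail.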
Navigating the Complexities of Dual-Use Dangers and Domain-Specific Models
In this session, participants noted that formulating concrete, detailed threat models is crucial for understanding the landscape of potential AI-enabled chem-bio threats and the actors behind them. This process might involve exploring new capabilities that malicious actors previously did not have access to or accelerating and simplifying existing processes, making them more accessible to a broader variety of actors. Once threat models are identified and the appropriate evaluation undertaken, a notable challenge still remains, as identified by one participant: the difficulty for evaluators to accurately embody malicious actors. Although not unique to AI/chem-bio red teaming, this challenge arises, according to the participant, from a tendency to underestimate the likelihood that certain actions could succeed, leading evaluators to potentially overlook the full extent of what a malicious actor might attempt and achieve.

Moreover, this difficulty sets the stage for addressing an even more formidable challenge: developing countermeasures against dual-use threats. These threats embody the inherent difficulty in differentiating between chem-bio knowledge that is helpful or benign and that which can be misused or pose a significant threat, such as designing or reconstituting pathogens more severe and deadly than those found in nature. Addressing these threats, which might involve masking malicious objectives behind seemingly benign actions, necessitates a broader approach to mitigation—including such measures as know-your-customer rules—than model-level interventions. Building on this complexity, session participants underscored the critical need to evaluate both domain-specific models (e.g., biological design tools) and general-purpose foundation models meticulously. The emphasis on domain-specific models stems from the unique risk profile associated with their misuse.
Needs and Priorities for Next Round of Model Evaluations
Access to Models and Evaluation Tools
The next session underscored the crucial need for independent researchers to have access to both proprietary (closed-source) models and robust evaluation tools. This access could take various forms, such as black-box testing, white-box testing, fine-tuning, or support from model developers to facilitate researcher interaction.[1] Currently, legal and contractual frameworks governing data sharing and the conduct of evaluations are significant barriers to such access; nondisclosure agreements create opacity over study designs, evaluations already performed on models, and evaluation outputs. This lack of transparency inhibits the ability of the research community to thoroughly assess model capabilities and potential risks. To bridge this gap, participants highlighted the importance of establishing mechanisms for greater visibility into the model development phase—particularly for models planned to be open-sourced because of the inability to control their diffusion once deployed. The formation of a consortium or another independent body can play a crucial role in coordinating on each of these challenges. Such an entity could facilitate discussions, mediate among stakeholders, and help clarify the legal and contractual aspects of running dangerous capability evaluations, thereby streamlining the process for all involved.
Identifying Risks
A core component of this session was dedicated to identifying and categorizing AI-enabled chem-bio risks. Participants delineated two primary "buckets" of risk: universally acknowledged risks and risks that emerge contingent on the level of access and application. For instance, a model that could assist in developing nuclear weapons would fall under the category of universally acknowledged risks, but the risks associated with AI-assisted screening for substance toxicity depend on who has access to models and how they are applied. This distinction underscores the complexity of defining and mitigating risks in a field in which the dual-use nature of technologies can blur the lines between beneficial and harmful applications. The conversation also revolved around the potential of AI models to expand access to chem-bio information, lower barriers to understanding, and foster the generation of new hypotheses and knowledge. Although these capabilities offer value to scientific advancement, they also raise concerns about how information unearthed by frontier models could be misused.

[1] For more information on various forms of access and their implications for AI audits, see Casper et al., 2024.
Wet Lab and Lab Automation Evaluations

Concerns About Wet Lab Validation of Evaluation Outputs

In the third session of the track, participants delved into the intricacies of wet lab validation of model outputs, which involves verifying the efficacy of potentially harmful compounds designed by models. Central to the discussions were concerns about the fine line between enhancing the understanding of model capabilities and the potential misinterpretation of such validation efforts as steps toward creating harmful substances. Moreover, the session highlighted apprehensions regarding the validation of models' potential to facilitate chem-bio threats, emphasizing the dilemma that dissemination of such evaluative results could inadvertently arm malicious actors with harmful knowledge.
Track 1 Proposed Actions
Enhance Evaluation Methods to Capture Complex Threat Dynamics
To effectively tackle the complexity of chem-bio threats, it is essential to advance beyond conventional evaluation methods and embrace dynamic and interactive assessment techniques, such as simulation-based tools. These methods, including but not limited to tabletop exercises, are crucial for capturing the nuanced behaviors of threat actors who continually adapt their strategies. By simulating diverse real-world scenarios, these evaluation tools provide a deeper understanding of the evolving nature of threats, thereby enabling the development of more effective and resilient mitigations.
Address Legal and Ethical Concerns of Wet Lab Evaluations
Formulating standards for performing wet lab evaluations would mitigate the concerns identified in the validation process discussions. Specifically, this policy action tackles the challenges of navigating the legal and ethical landscape of chem-bio research, as highlighted by the potential for misinterpretation of validation efforts and the dissemination of sensitive evaluation findings.
Ensure Adequate and Appropriate Access to Models and Evaluation Tools
The "Needs and Priorities for Next Round of Model Evaluations" session highlighted the need for independent researchers to access both proprietary and open-source models, as well as robust evaluation tools, to effectively assess the capabilities and risks associated with frontier models.

However, the opacity in legal and contractual frameworks, such as nondisclosure agreements, currently hinders independent research by obscuring crucial aspects of evaluations. To address these barriers, efforts are needed to establish mechanisms that provide greater transparency and facilitate discussions among stakeholders.
Track 2: Loss of Control
This track focused on evaluating models for capabilities that exceed the boundaries of developer or user intent—and featured presentations by evaluation organizations METR and Apollo Research. This section explores identifying and mitigating risks associated with model autonomy and deception—a growing concern as models act increasingly independently.
Autonomy (METR)
In this session, METR, formerly ARC Evals, presented their research on autonomy evaluations, followed by an open discussion.
Focus on Autonomous Threats
Unlike threat models centered on AI augmentation of malicious actors, METR assesses risks posed by AI acting independently to execute potentially harmful actions, such as conducting phishing attacks or manipulating digital infrastructure. Importantly, METR argued, this does not require an AI to have harmful goals—a malicious actor might initiate harm by prompting the AI to carry out tasks autonomously. METR suggested that although the full scope of autonomous AI actions remains to be seen, there is a critical need for preparedness and vigilance to anticipate these developments. This session demonstrated examples of tasks that METR developed to test for relevant autonomous capabilities—including the implementation of machine learning research; improvements to AI agent scaffolding; and the management of large, complicated codebases. Furthermore, METR showcased an example of a model capability: a "capture the flag" task that required reverse-engineering C code, which would otherwise take a domain expert 30 minutes to complete.
Evaluation Methodologies
METR presented its methodology for evaluating AI systems, which focuses on specificity, objectivity, and cost-effectiveness. The session covered tasks, methodological guidance, and evaluation checks aimed at supporting test validity. Tasks, structured around METR's Task Standard, measure an AI system's ability to perform activities relevant to threat models. The Task Standard, according to METR, aims to increase uniformity, scalability, and reproducibility and decrease duplication of work. METR's methodological guidance emphasized the importance of eliciting capability and removing spurious failures, such as those caused by ethical refusals or tooling limitations (see the "Unlocking AI Capabilities" section). Evaluation checks, including reviewing outputs for spurious failures, were also proposed by METR to ensure the integrity of test results and avoid underestimating agent capabilities. Altogether, METR shared, this evaluation methodology seeks to continuously measure dangerous model capabilities, allowing for the development of scaling laws and appropriate mitigations.
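As a loose illustration of such evaluation checks, a reviewer's tooling might first screen failed transcripts for refusal or tooling signatures before treating a failure as evidence about capability. The marker phrases below are invented examples for this sketch, not METR's actual criteria or code.

```python
# Toy sketch: triage failed runs so reviewers can separate spurious failures
# (refusals, broken tooling) from genuine capability limits before scoring.
# The marker strings are hypothetical examples, not a real methodology.

REFUSAL_MARKERS = ("i cannot assist", "i'm unable to help", "against my guidelines")
TOOLING_MARKERS = ("traceback (most recent call last)", "command not found")

def classify_failure(transcript):
    """Label a failed run as 'refusal', 'tooling', or 'capability'."""
    text = transcript.lower()
    if any(m in text for m in REFUSAL_MARKERS):
        return "refusal"
    if any(m in text for m in TOOLING_MARKERS):
        return "tooling"
    return "capability"

print(classify_failure("bash: command not found: curl"))  # tooling
```

In practice such string matching would only flag candidates for human review; counting a refusal or a tooling bug as a capability failure is exactly the underestimation the session warned against.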
Unlocking AI Capabilities
Regarding unlocking model capabilities, participants recognized the difficulty of guaranteeing that AI systems lack specific capabilities. METR highlighted the importance of identifying spurious failures, in which models fail not because of capability, but because of bugs, limitations of tooling, poor prompting, or model refusals on the basis of ethics or claimed inability. Further post-training enhancements, METR suggested, might remove these limitations, especially in threat scenarios in which weights are stolen or open-sourced, thus invalidating initial capability evaluations. METR suggested adding a safety buffer to thresholds for policy action as one way to address this risk.
Deception (Apollo Research)
In this session, Apollo Research presented its research on strategic deception evaluations, followed by an open discussion.
The Amplifying Risk of Deception
Apollo Research's presentation on AI deception began with articulating how deception is a risk amplifier, including for misuse and chemical, biological, radiological, and nuclear risks, through obfuscation—impeding governance, oversight, and transparency. Apollo Research argued that deceptive behaviors could emerge from two primary sources: intentional malicious acts by bad actors (e.g., scams) or unintended outcomes of complex AI systems (e.g., misalignment). Furthermore, as AI models grow in sophistication, Apollo Research believes that their capacity for deception might also grow, further hampering efforts to identify and counteract AI deception.
Evaluating for Deception
Apollo Research's demonstrations provided an existence proof for AI to engage in deception. First, a "research assistant" GPT presented deception resulting from AI misuse. Here, a model deployed to support government officials was fine-tuned to gather sensitive information illicitly and deceive its users. The second demonstration presented deception as a result of unintended outcomes of complex AI systems, with no fine-tuning or steering. Here, an insider-trading demo depicted an AI model operating as a stock trading agent, which, under specific scenarios, resorted to insider trading and subsequently concealed its activities—in some cases doubling down on its deception (Scheurer, Balesni, and Hobbhahn, 2023). These demonstrations, Apollo Research argued, underscored the importance of employing a comprehensive suite of tools for eliciting and detecting deceptive behaviors in AI models—including direct elicitation, honeypots, and interpretability methods (Casper et al., 2024). Apollo Research advocated for a "Science of Evals" to increase the robustness of methods for detecting AI deception—calling for drawing from literature and expertise on other such AI risks as fairness and bias, as well as assurance ecosystems for other high-stakes systems (e.g., commercial aviation), to address the ethical, technical, and operational challenges in an effective AI assurance ecosystem (Apollo Research, 2024).
Track 2 Proposed Actions
Refine Evaluation Techniques for Detecting Deception in AI Systems
After Apollo Research's presentation on early findings of AI deception, session participants discussed the necessity to refine and strengthen evaluation methods for detecting deceptive AI systems. This session's demonstrations of AI deception, Apollo Research argued, highlight the need for a wide variety of technical strategies to uncover and understand model deception. According to Apollo Research, proposed frameworks for evaluations considering the inner workings of models, enabled by greater access to models, would provide more-thorough insight into the causal drivers of model behavior. This, Apollo Research believes, would help techniques for detection keep pace with AI's evolving complexity and capabilities for obfuscation.
Track 3: Risk-Agnostic Methods
This track set out to produce a risk-agnostic methodological framework and strengthen the robustness of model evaluations. The first session explored evaluation methods—such as red teaming, automated benchmarking, and sophisticated task design. The session that followed shared best practices for evaluating model capabilities.
Evaluation Methods: From Red Teaming to Automated Benchmarking
Framework for Evaluation
In the evolving landscape of model development, robust evaluation methods are paramount to assess the potentially dangerous capabilities of frontier models. A RAND facilitator presented a draft table breaking down prominent model evaluation methods (e.g., red teaming, multiple choice benchmarking) by key attributes (e.g., repeatability, depth, generalizability) to stimulate discussion. Table A.1 (see the appendix) presents my interpretation of a new risk-agnostic framework of evaluation methodologies, structured to remain agnostic to threats, based on the discussion in this session. This approach acknowledges the importance of well-crafted threat models for model evaluation; however, the process of determining a threat model fell beyond the purview of this session. The group's focus, instead, was on mapping the space of methodologies that can be generally applicable across risks or threat types. By dissecting evaluation methodologies into manageable dimensions, the session aimed to facilitate a greater understanding of the strengths and limitations of different evaluation methods for establishing evidence of model capabilities.
Enhancing Access to AI Evaluation Tools While Preserving Integrity
The session also addressed challenges in broadening access to evaluation tools (e.g., open-sourcing to foster innovation and community scrutiny) while maintaining evaluation integrity, robustness, and validity. The conversation underscored the importance of hold-out tasks that remain unknown to models until the time of evaluation and of exploring the use of cryptographic hashing to safeguard the integrity of evaluation data. Such approaches as these are critical to prevent the incorporation of evaluation data into the training datasets of the models they are intended to evaluate, which could compromise the robustness of future evaluations. Ensuring that evaluations do not become part of model training data necessitates meticulous consideration of best practices for publishing and sharing evaluations, striking a delicate balance that fosters transparency and maintains the integrity and robustness of model evaluation.
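One way the cryptographic hashing idea could work in practice is a commit-and-reveal scheme: the evaluator publishes only a digest of the hidden task set before evaluation, then reveals the tasks afterward for verification. The sketch below is a minimal illustration under that assumption, not any organization's actual protocol, and the task fields are invented.

```python
# Minimal commit-and-reveal sketch for held-out evaluation data.
# Publishing only the SHA-256 digest lets the tasks stay private (and out of
# training corpora) until evaluation, while still allowing anyone to verify
# afterward that the revealed tasks match what was committed to.

import hashlib
import json

def commit(tasks):
    """Return a SHA-256 digest of a canonical serialization of the task set."""
    canonical = json.dumps(tasks, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def verify(tasks, published_digest):
    """Check revealed tasks against the previously published digest."""
    return commit(tasks) == published_digest

held_out = [{"id": "task-001", "prompt": "..."}]   # hypothetical task record
digest = commit(held_out)       # publish this digest before evaluation
assert verify(held_out, digest) # later, anyone can confirm the reveal
```

Canonical serialization (here, `sort_keys=True`) matters: the same tasks must always hash to the same digest, or honest reveals would fail verification.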
Ensuring Evaluation Robustness and Validity (METR)
Key Questions for Evaluation Robustness
For developing model evaluations, METR highlighted a series of questions that support evaluation robustness. A selection of these questions serves as a foundation for constructing effective evaluations while acknowledging methodological limitations.
• What is the evaluation measuring? If an evaluation is being used to inform a particular action, then it needs to be faithful to the relevant threat model. For example, an evaluation to assess whether a model is safe to open-source needs to assess the safety of the model under fine-tuning.
• Does the evaluation scale well? How is it intended to be run? An evaluation might be valid only under certain assumptions—for example, that the dataset has not appeared in a model's pre-training corpus or the model has certain tools available.
• What are the warning signs of a problem with the evaluation? Recognizing warning signs of potential evaluation issues and understanding the causes of misleadingly high or low scores are crucial for maintaining evaluation integrity and accuracy.
Model Versus System Evaluation
Session attendees also addressed the distinction between evaluating stand-alone models that could be used or deployed adversarially and evaluating the broader systems in which these models are embedded when deployed. Depending on the threat model being considered, participants suggested that either of these possibilities might be appropriate, but it is important to evaluate thoughtfully.
Track 3 Proposed Actions
Develop a Framework for Robust Evaluation
Discussions on risk-agnostic methodological frameworks emphasized the importance of holistically exploring the space of evaluation methods to understand potential harms and highlighted a need for a framework for robust evaluation. The session's focus on crafting a methodological framework that transcends specific threats lays the groundwork for the development of such a framework.
Implement Measures to Guarantee the Robustness and Validity of AI Evaluations
The challenges identified with democratizing model evaluation while preserving evaluation integrity inform the policy action to implement measures to guarantee the robustness and validity of AI evaluations. Session participants suggested using such strategies as employing hold-out tasks and cryptographic techniques to safeguard evaluation integrity. These approaches aim to prevent the contamination of model training datasets with evaluation data, ensuring the future robustness of evaluations.
Develop and Disseminate Best Practices
Informed by METR's key questions for ensuring evaluation robustness, this action seeks to establish and share best practices for effective evaluations. By clarifying what evaluations measure, ensuring scalability, and identifying warning signs of evaluation problems, this initiative aims to equip stakeholders with the knowledge to design, select, and interpret evaluations effectively.
Track 4: Collaboration and Coordination
This section summarizes discussions held during the collaboration and coordination track, which brought AI research and development experts into conversation with policy researchers and professionals. These discussions covered key upcoming policy timelines, processes for discerning risk thresholds, and frameworks for responsibly managing future model capabilities.
Policy Timelines and Deliverables
In this session, stakeholders discussed the scope and status of several key AI policy milestones in 2024, including deliverables assigned by the fall 2023 executive order on AI (Biden, 2023). Participants broadly agreed that 2024 milestones should be only one step in ongoing AI governance efforts and that significant future work and coordination would be required. High hopes were expressed that the National Institute of Standards and Technology AI Safety Institute Consortium would facilitate this continued engagement, and participants highlighted several open questions that should be addressed in future work:
• How should the timing and venue for publishing evaluation results be determined, and what information should be included to strike an appropriate balance between transparency and responsible disclosure?
• What activities (e.g., evaluations, standards) are best suited for government involvement?
• What evaluation methods are most effective at eliciting harmful model capabilities?
• When should evaluations be executed? How can initial, less costly assessments reliably indicate the need for more-comprehensive and resource-intensive evaluations?
• What systems, mechanisms, or incentives can be implemented to ensure that evaluators thoroughly explore and reveal the full spectrum of AI capabilities?
• How can the evaluation insights contribute to a risk-benefit analysis for AI systems?
Red Lines and Risk Thresholds
Conceptualizing Risk and Mitigation
This session underscored the importance of preemptive risk management for model development, given the difficulty of predicting future model capability. It highlighted the dual-use and general nature of frontier models, which, although designed for beneficial purposes, can be used by malicious actors to cause signif