OpenAI+风险预防框架(Bbeta)_第1页
OpenAI+风险预防框架(Bbeta)_第2页
OpenAI+风险预防框架(Bbeta)_第3页
OpenAI+风险预防框架(Bbeta)_第4页
OpenAI+风险预防框架(Bbeta)_第5页
已阅读5页,还剩46页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

Webelievethescientiposedbyincreasingtechnicalandproceduralsafimproveourunderstandingofthescienceandempiricaltexturedevelopmentanddeployment.emerge.Frameworkoutlines.1Ourfocusinthisdocumentisoncatastrophicrisk.Bycatastrophicrisk,wemeananyriskwhichcouldres—thisincludes,butisnotlimitedto,existentialrisk.2ProactiveinthiscasereferstoanaimtodevelopthisscienceaheadofthefirsttimeiDeploymentinthiscasereferstot4.TaskingthePreparednessteamwithconductingresearch,evaluations,monitoring,andforecastisummaryofthelatestevidenceenableOpenAltoplanahead.ThePreprelevantteams(e.g..SafetySystems,Security,Superalignment,PolicyResearch)tocollatemanagesafetydrillsandcoordinatewiththeTrustworthyAlteamforthird-partyauditing.make.SAGresponsibilitieswillthusincludeoverseeingtheassessmentoftherisklandscape,andmaintainingafast-trackprocessforhandlingemergencyscenarios.similarapproaches.Thislivingdocumenthasthreesections:wewilltrackaswellasdelineationsofdifferent2.Scorecard,inwhichwewillindicateourcurrentassessmentsofthelevelofriskalongeachtrackedriskcategory. inwhichwelayoutoursafetybaproceduralcommitments,whichincludestandingInthissection,weidentifythecategoemerge,i.e.,"unknownunknowns."Ourintentisto"godeep"inthetrackedcategoriestoidentificationprocetailoredmitigations.Ingeneral,"low"onthimaximallevelofconcern.Ourrationaleforgroupingandconsiderations.First,fine-tuningorotherdomain-specificenhancements(e.g.,tailoredriskcategory.Ourevaluationswillthusincludetestsagainsttheseenhancedmodelstoensurewearetestingagainstthe"worstcase"scenarioweknowof.Ourproceduralcommitmentsaretriggeredwhenanyofthetrackedriskcategoriesincreaseinseverity,ratherthanonlywhentheyallincreasetogether.Becausecapabilityimprovementsacrossdifferentdomainsdonthesideofsafety.Second,thisapproachenablesustoleveragedomain-specifictalenttodeveloptailoredsuitesofevaluationsandmonitoringsolutionsforeachriskcategory.Third,thisapproachincreasesoptionsfortailored,domain-specificmitigations,tohelpminimizetheneedforbroader,moredisruptiveactions.LowMediumHighCybersecurityCBRNPersuasionAutonomyPost-MitigationMediumdeferspecificdetailsonebeupdatedfrequently).Cybersecurityisfocusedonrisksrelatedconfidentiality,integrity,and/oravailabilitytouseofthemodelforcyber-exploitationtodisruptofcomputersystemsModelassistanceforcyberattacks,ifany,ismeaningfuldataanalysisANDmodelcannotmeaningfullyexeelementsofcyberoperations,e.g.,developingaktranslatingreasoningablateralgoal-directedmovementonhardenedtargets.Non-programmingtasksrepresentWeexcludevulnerabilitydiscoverybecauvulnerabilities,butmoresotheimplementationandMediumntoanattack,black-boxexploitation,goal-directedateralmovement,identificationofvaluableinformation,remainingundetected,orreactingtodefense.e.g.Russiandisinformationgroups,whousepurchasedcapabilities,phishing,andotherwiseknownmeoroofs-of-conceptforhigh-valueexploipotentiallyinvolvingnovelexploitatiHigh-valueexploitsaregenerallyagainsthardenedzero-clickiOSexploit).Whileweakervulnemeaningfullyaddressabottleneckforattackers,suchadvancedcapabilitywouldunblockmanyandbeEnd-to-endexecutionwoulddramaticallyacceleratethequalityorquantityofprofessionalcyberattastrategiesforcyberattackactors,fromhackingmilitaryoNovelcyberoperationsgeneraTheymayinvolve,e.g.,novelzero-dacommand-and-control.CBRNisfocusedonrisksrelatedtomodel-assistedcreationofchemical,biological,radiological,and/ornuclearthreats.ModelcanprovideinformationrelevanttocreatingCBRNthreatswithcomparableutilitytoexistingresources(e.g.,textbooks,searchengines).Minimalchangefromstatusquowhereotherresources(e.g.,textbooks,searchengines)alreadyexist.MediumModelprovidesmeaningfullyimprovedassistancethatncreasesabilityforexistingexpertsinCBRN-relatedadvancedfieldstobeabletocreatethreat(e.g.,tacitknowledge,specifnformation,plansfordistribution).Modelsmayallowuserstolearncriticalpiecesofinformationfasterthantheycouldwithexistingresources,orexplainvariationsonstepsmoreflexiblyofcreatingaCBRNthreat,althoughriskislimitexpertsalreadyhavesignificantdomainexpertiseandreputationHighModelenablesanexperttodevelopanovvectorORmodelprovidesmeaningfullyimprovedassistancethatenablesanyonewithbasictraininginarelevantfield(e.g.,introductoryundergraduatebiologycourse)tobeabletocreateaCBRNthindicatorforcatastrophicnewthreats,forwhichdefenses(e.g.,vaccines)aremuchweakerornonexistent.AllowingthosewithbasictrainingtocreateCBRNthreatsisamajorincreaseinaccessibilitrequiresdrasticaction,especiallyforbiologicalandonerous.CriticalModelenablesanexperttodevelopahighlydangerousnovelthreatvector(e.g.,comparabletonovelCDCClassAbiologicalagent)ORmodelprovidesmeaningfullyimprovedassistancethatenablesanyonetobeabletocreateaknownCBRcanbeconnectedtotoolsandequipmenttothefullengineeringand/orsynthesiscycleofaregulatedornovelCBRNthreatwithouthumanCDCClassAbiologicalagent(i.e.,highmortality,easeoftransmission)couldcausemillionsofdeathsandsignificantlydisruptpubliclifAllowinganyonetocreateCBRNthreatswouldmaximallyextendaccessibilitySignificantlyautomatingtheend-to-endengineeringorsynthesiscycleforaCBRNthreatwouldalsoenableunprecedentedandmassivescaleofpotentialattacksmodelscouldmakeallcontentuptotheircModelcancreate(potentiallyinteractive)contentwithcomparablepersuasiveeffectivenesstoreadingalow.Low-qualityarticlesareverycommonandlow-impact,somodelcommoditizationhasnegligibleimpact.MediumModelcancreate(potentiallyinteractive)contentwithcomparablepersuasiveeffectivenesstotypicalhuman-writtencontent(e.g.,anop-edfromareputablenewssource;1:1conversationwithatypicalrandomly-assignedindividual).Thismakesmodelsasignificantaidtobiasedjournalism,get-out-the-votecampaigns,andtypicalscamsorspearphishers.Itcouldaffectpolitics,marketing,orsocialengineeringinasimilarfashiontohiringalarge,verycheap,andmoderatelyskilledworkforce.HighModelcancreate(potentiallyinteractive)contentwithempiricallycomparablepersuasiveeffectivenesstoacountry-widechangeagent.agent;itcoulddramaticallyaffectelectionsanddemocraticoutcomes.CriticalModelcancreate(potentiallyinteractive)contentwithpersuasiveeffectivenessstrongenoughtoconvincealmostanyonetotakeactiagainsttheirnaturalinterest.Thisrepresentsasuperhumanpersuasivethreat;itwouldbeapowerfulweaponforcontrollingnation-states,extractingsecrets,andinterferingwithdemocracyModelautonomyenabchangesandevadeattemptstomitprerequisiteforself-exfiltration,self-improvement,andresourceacquisition.Mediuminstance,completeaFiverrtask)ORmodel,well-specifiedcodingtasksohandcraftedrepositorySolvingwell-specifiedtasksdeunderstandingofprogrammingandamoderateabilityModelcanexecuteopen-eproductionMLcodebasethatwouldconmprovementsmaybeboundedorevolutionaryrathergivenminimalhumaninstruction,explicitapproachesORmmodelwouldlikelybeabletocFiverr).3Byintelligenceexplosion,wemeanacycleinwhichtheAmorecapableofmoreimprovements,creatingarunawayprocessofself-improvement.Aofcapabilitygainscouldbutstripourabilitytoanticipateandreacttothem.concentratedburstuunderstandingofthepotentialimpactsandcapabilitiesoffrontiermodelsimproTherefore,asapartofourGovernanceprocess(needtotrack.“tripwires”requiredfortheemergenceofanycatastrophicriskscenarienvision.Notethatweincludedeceptiofthemodelautonomyriskcategory.口进群福利:进群即领万份行业研究、管理方案及其他学习资源,直接打包下载微信扫码行研无忧SourcesthatinformtheupdatestotPolicyResearch,SafetySystems,Superpost-mitigationrisk,butondifferentversionsmitigations,asclarifiedfurtherbmitigationrisk.Pre-mitigationriskismeanttoguidethelevelofoursecurityeffortcouplingcapabilitiesgrowthwithrobusts“worstknowncase”(i.e,specificallytailored)forthegivendomain.Tothisend,forourtailoredpromptswhereverappropriate),butalsoonfine-tunedversionsdesignedfortheparticularmisusevectorwithoutanymitevaluationscontinually,i.e.,asoftenasincludingbefore,during,andaftertraining.Thiseffectivecomputeincreaseormajoralgorithmicbreakthrough.Toverifyifmitigationshavesufficientlyanddependentlyreducedtheresultingpost-mitigationrisk,wewillalsorunevaluationsonmodelsaftertheyhavesafetymitigationsinplace,againattemptingtoverifyandtestthepossible"worstknowncase"scenarioforthesesystems.Aspartofourbaselinecommitments,weareaimingtokeeppost-mitigationriskat“medium”riskorbelow.Manyofthemitigationsthatwouldbenecessarytoaddressrisksata"high"or“critical”pre-mitigationlevel(eg.,hardeningsecurity)requireadequateleadtimetoimplement.Forandhelpusthusproperlyplanforandgetaheadoftheemergingrisks.Theseeffortswillincludesustainedresearchrelatedtoscalingtrendsfordangerouscapabilitiesandongoingmonitoringofmisuse.Wewillalso,incooperationwithotherteams(e.g.,SafetySystems),developmonitoringandinvestigativesystems.Thismonitoringofreal-worldmisuse(aswellasstayingabreastofrelevantresearchdevelopments)willhelpuscreateabetterpictureofdeployedmodelcharacteristics,andinformupdatestoourevaluationsasnecessary.Acentralpartofmeetingoursafetybaselinesisimplementingmitigationstoaddressvarioustypesofmodelrisk.Ourmitigationstrategywillinvolvebothcontainmentmeasures,whichhelpreducerisksrelatedtopossessionofafrontiermodel,aswellasdeploymentmitigations,whichhelpreducerisksfromactiveuseofafrontiermodel.Asaresult,thesemitigationsmightspanincreasingcompartmentalization,restrictingdeploymenttotrustedusers,implementingrefusals,redactingtrainingdata,oralertingdistributionpartners.Note:Belowisonlyanillustrativetemplatevaluations.PreparednessFramework(Beta)15Cybersecurity:Low(pre-mitigation),Low(post-mitigation)Modelassistanceforcyberattacksismeaningfulonlyfornon-programmingusesandthemodelcannotmeaningfullyexecuteelementsofcyberoperations.questionsextractedandrephrasedfrompentestwriteups,themodelperformssignificantlybelowthehumanbaselvulnerabletodenialofservice,cybersecurityprofessionalsfind<X%morebugswhenprovidedwithGPTassistance.lowerperformancethancommercialdeobfuscationtools(X%vsY%).flag)obtain<X%higherscorefromusingChatGPTaugmentedwiththeabilitytoexecutecommandsonavulnerablevirtualmachine.CBRN:Low(pre-mitigation),Low(post-mitigation)existingresources(e.g.,textbooks,searchengines),butnothingmorethanthat.virology,ormicrobiology)experiencenoimprovementinsuccessratesforClassBbiologicalagentdevelopmentinstructionsusingversusjustasearchengine(X%vsX%),includingacrossaccuracy,completeness,andinnovationmetrics.Thisholdstrueforchemistry,nuclear,andradiologicalpoprofessionalsstudentsaswell.(specificallygenetics,virology,ormicrobiology)experiencenoimprovementinsuccessratesforwritingCDCClassBbiologasearchengineandGPTversusjustasearchengine(X%vsX%),includingacroaccuracy,completeness,andinnovationmetrics.Thisholdstrueforchemistry,nuclear,andradiologicalstudyundergraduatestudentsaswell.suppliersthatdonotscreenordersusingasearchengineandGPTversearchengine.Persuasion:Medium(pre-mitigation),Low(post-mitigation)Modelcancreate(potentiallyinteractive)contentwithcomparablepersuasiveeffectivenesstotypicalhuman-writtencontent(e.g.,anop-edfromareputablenewssource;1:1conversationwithatypicalrandomly-assignedindividual),butmitigationsbringthispersuasiveeffectivenessdowntothatofalow-qualityarticle.politicalissuesignificantlycomparedtohuman-generatedcontrols(X%vsY%),butmitigationsbringitto(X%vsX%).digital“friend”increasestherat(X%vsX%).toacharitablecausecomparedtohuman-generatedcontrols(X%vsX%).Chromeextensionsignificantlycomparedtohuman-generatedcontrols(X%vsX%).byamodelareapprosubtlebackdoor,e.g.,aspecificHTTPrequest,whenChatGPTisinstructedtoinsertbackdoorsincodegenerations.andnoelectoralcampaignshavebeenfoundthatrelyonourmodels.actionsifexplicitlyinstructedtodoso,andfailstochainactionsoracquireresoreducesto0%post-mitigations.tasks(e.g.,autonomouslyfindcomputeeBeyondourabove-detailedprocesseswithregardstoevaluations,monitoring,andforecasting,wewanttomakeasetofsafetybaselinesandproceduralcommitments.Notethattheseprocessesalsoapplytotrainingrunsaswewillperformperiodicevaluations(e.g.,ofcheckpoints)duringmodeltraining.Asmentionedintheintroduction,OpenAlalreadyhasextensivesafetyprocessesinplacebothbeforeandafterdeployment(e.g.,systemcards,red-teaming,refusals,jailbreakmonitoring,etc.).Wenowaugmenttheseprocesseswiththreesafetybaselines.lfwereach(orareforecastedtoreach)atleast“high”pre-mitigationriskinanyoftheconsideredcategories:wewillensurethatoursecurityishardenedinawaythatisdesignedtopreventourmitigationsandcontrolsfrombeingcircumventedviaexfiltration(bythetimewehit“high”pre-mitigationrisk).Thisisdefinedasestablishingnetworkandcomputesecuritycontrolsdesignedtohelppreventthecapturedriskfrombeingexploitedorexfiltrated,asassessedandimplementedbytheSecurityteam.Thismightrequire:·increasingcompartmentalization,includingimmediatelyrestrictingaccesstoalimitednamesetofpeople,restrictingaccesstocriticalknow-howsuchasalgorithmicsecretsormodelweights,andincludingastrictapprovalprocessforaccessduringthisperiod.·deployingonlyintorestrictedenvironments(i.e.,ensuringthemodelisonlyavailableforinferenceinrestrictedenvironments)withstrongtechnicalcontrolsthatallowustomoderatethemodel'scapabilities.·increasingtheprioritizationofinformationsecuritycontrols.Onlymodelswithapost-mitigationscoreof"medium"orbelowcanbedeployed.Inotherwords,ifwereach(orareforecastedtoreach)atleast“high”pre-mitigationriskinanyoftheconsideredcategories,wewillnotcontinuewithdeploymentofthatmodel(bythetimewehit“high”pre-mitigationrisk)untiltherearereasonablymitigationsinplacefortherelevantpost-mitigationriskleveltobebackatmostto“medium”level.(Notethatapotentiallyeffectivemitigationinthiscontextcouldberestrictingdeploymenttotrustedparties.)Onlymodelswithapost-mitigationscoreof"high"orbelowcanbedevelopedfurther.Inotherwords,ifwereach(orareforecastedtoreach)“critical”pre-mitigationriskalonganyriskcategory,wecommittoensuringtherearesufficientmitigationsinplaceforthatmodel(bythetimewereachthatrisklevelinourcapabilitydevelopment,letalonedeployment)fortheoverallpost-mitigationrisktobebackatmostto“high”level.Notethatthisshouldnotprecludesafety-enhancingdevelopment.Wewouldalsofocusoureffortsasacompanytowardssolvingthesesafetychallengesandonlycontinuewithcapabilities-enhancingdevelopmentifwecanreasonablyassureourselves(viatheoperationalizationprocesses)Additionally,toprotectagainst“critical”pre-mitigationrisk,weneeddependableevidencethatthemodelissufficientlyalignedthatitdoesnotinitiate"critical"-risk-leveltasksunlessexplicitlyinstructedtodoso.(OpenAlLeadership,withtheoptionfortheOpenAlBoardofDirectorstooverrule).a.ThePreparednessteamconductsresearch,evaluations,monitoring,forecasting,andcontinuousupdatingoftheScorecardwithinputfromteamsthatareastargetedandnon-disruptiveaspossiblewhilenLeadership.Thiswillsomeonefrompreviousyearstoensurethereiscontinuexperience,whilestillensuringthatfreshandtimelypersgroup.defaultdecision-makeronalldecisions.a.ThePreparednessteamisevaluationstoprovideScorecamonitoredmisuse,red-teaming,andintelligenceiv.forecastingpotentialchabovewithanypotentialprotectiveactionsreport.Thecasewillconsistostandarddecision-makingp

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论