
arXiv:2302.08261v1 [cs.LG] 16 Feb 2023

KNOWLEDGE-AUGMENTED GRAPH MACHINE LEARNING FOR DRUG DISCOVERY

A SURVEY FROM PRECISION TO INTERPRETABILITY

Davide Mottin
Aarhus University
davide@cs.au.dk

Zhiqiang Zhong
Aarhus University
zzhong@cs.au.dk

Anastasia Barkova
WhiteLab Genomics
abarkova@

February 17, 2023

ABSTRACT

The integration of Artificial Intelligence (AI) into the field of drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and in providing interpretations for their outputs, which hinders their practical application. Recently, Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still suffer from several deficiencies, such as a limited ability to handle supervision sparsity and to provide interpretability in learning and inference processes, and their ineffectiveness in utilising relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery with limited training instances. However, a systematic definition for this burgeoning research direction is yet to be established. This survey presents a comprehensive overview of long-standing drug discovery principles, provides the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. Related KaGML works, collected following a carefully designed search methodology, are thoroughly reviewed and organised into four categories following a newly defined taxonomy. To facilitate research in this rapidly emerging field, we also share collected practical resources that are valuable for intelligent drug discovery and provide an in-depth discussion of potential avenues for future advancements.

1 Introduction

Drug discovery and development have been among the most prominent and challenging research tasks for decades. Before a drug is marketed and distributed to patients, it must undergo a multitude of research validations. From initial early drug discovery to preclinical development, and subsequently to clinical trials and final regulatory approval, it usually takes 10-15 years and costs around 2 billion US dollars. The drug development process typically begins with the identification of the target protein or nucleic acid involved in a specific disease during early drug discovery. This is followed by the identification of a small molecule or biologic drug (such as an antibody or protein) that will interact with the target and modulate its activity with the aim of curing or preventing the disease. In the case of small molecules, high-throughput screening experiments are performed to identify promising compounds, a process known as "hit identification". From these hits, some compounds are selected through in vitro and in vivo assays and are chemically optimised to improve properties such as stability, affinity or solubility, giving rise to the "lead" compounds. After several rounds of structural optimisation, the lead molecule becomes a drug candidate and can proceed to preclinical studies in animals, followed by clinical studies in humans. The ideal drug should be non-toxic and have as few side effects as possible for patients, while being soluble and interacting effectively with the target. Each step of the process is characterised by a high rate of failure and substantial costs.

A PREPRINT - FEBRUARY 17, 2023

[Figure 1: image not recoverable. Surviving labels: (a) "Biomedical Data are Graphs": molecule network, protein structure, virus structure, drug-drug interaction. (b) "Human Biomedical Knowledge": theories and equations, description context, other proficient biomedical data, and a knowledge graph connecting Gene, Drug, Disease and Phenotype entities (with Gene Ontology, Disease Ontology and Phenotype Ontology) through relations such as "has function", "has phenotype", "has indication" and "has side effect".]

Figure 1: Illustration of real-world biomedical data in the form of graphs and examples of human biomedical knowledge. (a) Graphs are a natural way to represent biomedical data: a molecule can be represented as a 2D or 3D network, a protein structure as a 3D graph that records the chemical forces between amino acid residues, and interactions between drugs as a drug-drug interaction graph. (b) Toy examples representing human biomedical knowledge in different forms. For instance, formal scientific knowledge includes well-defined theories and equations. There is also more general experimental knowledge, including description context, other proficient data and knowledge graphs.

To reduce the financial burden and increase the success rate, researchers have been working on accelerating drug discovery by taking advantage of remarkable Artificial Intelligence (AI) techniques. Technological advances now allow for the creation of vast amounts of data in areas such as genomics, proteomics, and imaging, which can be used to inform the drug discovery process. AI can analyse these data and identify patterns and relationships that might not otherwise have been noticed, leading to the identification of new targets and the optimisation of existing ones. AI-based drug discovery is also being used to streamline the process of drug development by predicting the likelihood of success of a candidate drug, reducing the time and cost required to bring a new drug to market. Furthermore, AI is being used to predict potential side effects and toxicity, allowing for the identification of potential safety issues before clinical trials. With these advancements, AI has the potential to transform the drug discovery process, enabling faster and more efficient drug development and bringing new therapies to patients more quickly.

Biomedical data is highly interconnected and can be easily represented as graphs (or networks), which have a variety of applications at different stages of the drug discovery and development process. For instance, as illustrated in Figure 1(a), biomedical data can be hierarchically represented as graphs. Starting from the molecular level, atoms can be represented as nodes, and chemical bonds as edges, of (2D or 3D) molecular graphs. On the macro-molecule level, interactions (edges) between amino acid residues (nodes) organise as (2D or 3D) protein graphs. At the compound level, edges in the drug-drug interaction (DDI) network can indicate chemical interactions between drugs (nodes) measured by long-term clinical screens.
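As a small illustration of the molecular-level representation described above, the sketch below encodes ethanol as a graph with atoms as nodes and bonds as edges. It is a minimal sketch under stated assumptions: hydrogens are left implicit, the one-hot element vocabulary ("C", "N", "O") and the helper name `molecule_to_graph` are hypothetical choices made for this example, and real pipelines typically use much richer atom and bond features.

```python
# Ethanol (CH3-CH2-OH), heavy atoms only; hydrogens are left implicit.
ATOMS = ["C", "C", "O"]      # node list: index -> element symbol
BONDS = [(0, 1), (1, 2)]     # undirected edges: the C-C and C-O bonds


def molecule_to_graph(atoms, bonds, elements=("C", "N", "O")):
    """Return (X, A): one-hot atom features and a symmetric adjacency matrix."""
    # Node attribute matrix: one row per atom, one column per element type.
    X = [[1 if symbol == e else 0 for e in elements] for symbol in atoms]
    n = len(atoms)
    # Adjacency matrix: entry 1 where two atoms share a chemical bond.
    A = [[0] * n for _ in range(n)]
    for u, v in bonds:
        A[u][v] = A[v][u] = 1
    return X, A


X, A = molecule_to_graph(ATOMS, BONDS)
```

The same two-matrix layout carries over to protein graphs (residues as nodes) and DDI networks (drugs as nodes); only the meaning of the rows and of the edges changes.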

Nevertheless, conventional AI tools struggle with the handling of complex graph-structured data. The feature extractors employed by machine learning models are often not transferable, requiring manual design for each specific dataset and task. Although deep learning models have the capability to learn from raw data, they are still limited in their ability to handle complex graph structures. In response, Graph Machine Learning (GML), a new class of AI methods, has been proposed to investigate graph-structured data. The essential idea of GML is to learn effective feature representations of nodes (e.g., drugs in DDI networks), edges (e.g., relations or interactions between drug-drug or drug-disease pairs), or (sub)graphs (e.g., molecular graphs). The corresponding node-, edge- and (sub)graph-level downstream tasks can then be realised based on these learned representations. According to their representation learning mechanisms, GML approaches can be broadly categorised into "shallow" and "deep" classes. In particular, a type of deep GML method called Graph Neural Networks (GNNs), which are deep neural network architectures specifically designed for graph-structured data, is attracting growing interest. GNNs iteratively update the features of graph nodes by propagating information from their neighbouring nodes. These methods have already been successfully applied to a range of tasks and domains, including drug discovery.

However, despite the rapid progress of GML in drug discovery, these methods suffer from several serious deficiencies, including high data dependency (i.e., strong performance relies on high-quality training datasets) and poor generalisation (i.e., uncertain model performance on instances that have never been observed in the training data). These deficiencies originate primarily from the models' data-driven nature and their inability to exploit domain knowledge effectively. In addition, there has been an increased demand for methods that help people understand and interpret the underlying models and that provide more trustworthiness. In an effort to mitigate the lack of interpretability and trustworthiness of certain machine learning models and to augment human reasoning and decision-making, attention has been drawn to eXplainable Artificial Intelligence (XAI) and Trustworthy Artificial Intelligence (TAI) approaches, which provide human-comprehensible explanations for a model's inherent mechanism and outputs.

To address these limitations, researchers have recently paid attention to a new AI paradigm, which we refer to as Knowledge-augmented Graph Machine Learning (KaGML in short), for superior drug discovery. Its core idea is to integrate external human biomedical knowledge into different components of the GML pipeline to achieve more accurate drug discovery, along with user-friendly interpretations, which ensures that the expert's knowledge is not substituted. Biomedical knowledge may exist in various forms, as shown in Figure 1(b), including formal scientific knowledge (e.g., well-established laws or theories in a domain that govern the properties or behaviours of target variables) and informal experimental knowledge (e.g., well-known facts or rules that are extracted from long-term observations and can also be inferred through human reasoning). The contributions of this survey are the following:

• We are the first to propose the concept of KaGML and comprehensively summarise existing work. The comparison between KaGML and other existing paradigms emphasises the novelty of KaGML and its promising potential for practical medical applications.

• We propose a novel taxonomy of KaGML approaches according to the different schemes used to incorporate knowledge into the GML pipeline. This makes it easier for readers to identify the core design of different models and to locate the categories that interest them. We created a public folder to share collected resources and will contribute to it continuously.

• We carefully discuss practical tools and knowledge databases that have been (or are highly likely to be) used by KaGML methods to solve practical drug discovery problems. We provide a schematic representation of possible schemes to organise different knowledge databases about small molecule drugs into one knowledge graph.

• We cover methodologies not only for solving scientific problems in a computer science setting but, more importantly, for real-world biomedical applications. Our survey is hence of interest not only to AI researchers but also to biologists in different fields. We also discuss promising future work for researchers from both disciplines to exploit.

Table 1: Summary of the keywords used in the literature search.

| Area | Keywords |
|---|---|
| Drug Discovery | Drug Discovery, Drug Design, Drug Development, Medicine Discovery, Medicine Design, Medicine Development |
| Knowledge Graph | Knowledge-augmented, Knowledge-aware, Knowledge-informed, Knowledge-guided, Knowledge-enhanced, Knowledge-driven |
| Graph Machine Learning | Graph Machine Learning, Graph Neural Network, Geometric Machine Learning |

Search methodology. All studies were retrieved in one of the following three ways: (i) a comprehensive top-down approach that conducted an extensive search of KaGML papers in major academic databases such as Google Scholar, IEEE Xplore, ACM Digital Library, DBLP Computer Science Bibliography, and ScienceDirect, using the keywords listed in Table 1; (ii) a bottom-up approach that surveyed recent research outputs in AI conferences and workshops; (iii) a thorough examination of the related work, discussion sections, and cited references of the papers obtained in steps (i) and (ii) to identify overlooked works. We keyword-searched for works containing a conjunction of any of the terms summarised in Table 1, leading to a selection of more than 1,000 articles. Approximately 100 were thoroughly scanned according to the criteria, and about 20 were identified directly from the related work sections. Whenever possible, we prioritised peer-reviewed publications in major journals and conferences (e.g., Nature, Nat. Commun., Nat. Mach. Intell., NeurIPS, ICML, ICLR, AAAI, KDD) over white papers or unreviewed submissions. Studies were selected only if they presented a subsymbolic system, including some form of incorporating biomedical knowledge into GML models for precision drug discovery and of producing explanations using background knowledge with GML models. The papers finally identified are summarised by category in a summary table.

Public resource folder: /zhiqiangzhongddu/Awesome-Knowledge-augmented-GML-for-Drug-Discovery
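The keyword conjunction mentioned in the search methodology can be sketched as a Boolean query: synonyms within one area of Table 1 are joined with OR, and the three areas are joined with AND. This is only an illustration. The survey does not state the exact query syntax submitted to each database, so the quoting style and the helper name `build_query` are assumptions made for this example.

```python
# Keyword groups copied from Table 1.
KEYWORDS = {
    "Drug Discovery": [
        "Drug Discovery", "Drug Design", "Drug Development",
        "Medicine Discovery", "Medicine Design", "Medicine Development",
    ],
    "Knowledge Graph": [
        "Knowledge-augmented", "Knowledge-aware", "Knowledge-informed",
        "Knowledge-guided", "Knowledge-enhanced", "Knowledge-driven",
    ],
    "Graph Machine Learning": [
        "Graph Machine Learning", "Graph Neural Network",
        "Geometric Machine Learning",
    ],
}


def build_query(keywords):
    """OR the synonyms inside each area, then AND the areas together."""
    groups = []
    for terms in keywords.values():
        quoted = ['"' + term + '"' for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = build_query(KEYWORDS)
print(query)
```

A paper matches such a query only if it mentions at least one term from every area, which is what keeps the search focused on work at the intersection of drug discovery, knowledge integration and graph learning.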

Plan of the following sections. The rest of this paper is organised as follows. Section 2 introduces the concept of graph machine learning and the diverse tasks it addresses. We then discuss human knowledge databases and the knowledge graph concept, followed by a technical exposition of prevailing intelligent drug discovery paradigms and a description of key applications of graph machine learning and knowledge graphs in drug discovery. Following a newly defined taxonomy, we next discuss the collected KaGML for drug discovery papers. Relevant practical resources are presented afterwards, including scientific tools, knowledge databases and a schematic representation of possible schemes to organise knowledge databases about small molecule drugs from various aspects into one knowledge graph. Finally, we scrutinise the potential directions of KaGML and conclude the paper. To assist readers in finding relevant content, the paper uses boxes to highlight closely related topics, figures to illustrate examples, and tables to present contrasting topics.

2 Graph Machine Learning

[Figure 2: image not recoverable. Surviving labels: a graph with its node attribute matrix X and adjacency matrix A; example tasks (node property prediction, link prediction, graph property prediction, graph modification/generation, etc.); vector-based representations; P_R(v|u), the estimated probability of visiting node v on a random walk starting from node u using some random walk strategy R; 1st-hop aggregation.]

Figure 2: Toy examples of a graph and typical graph representation learning approaches. (a) A graph can be represented using a node attribute matrix X and an adjacency matrix A. (b) Graph representation learning converts a graph into a set of vectors, which record information about the graph. (c) A toy example of random walk-based shallow GRL approaches. (d) A toy example of the GNN mechanism.

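Box 1. Fundamentals of Graph Machine Learning

Definition 1 (Graph). A graph with $n$ nodes can be formally represented as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, which consists of $n$ nodes $u \in \mathcal{V}$ with $|\mathcal{V}| = n$. $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ denotes the set of edges, where $e_{u,v}$ denotes the edge between $u$ and $v$. The node attribute vector $x_u \in \mathbb{R}^{d}$ describes side information and metadata of node $u$. The node attribute matrix $X \in \mathbb{R}^{n \times d}$ contains the attribute vectors of all nodes in the graph. Similarly, the edge attributes $x_{u,v} \in \mathbb{R}^{\tau}$ of each edge $e_{u,v}$ can be taken together to form an edge attribute matrix $X^{e} \in \mathbb{R}^{m \times \tau}$, with $m = |\mathcal{E}|$. A path from node $u_1$ to node $u_k$ is a sequence of edges $u_1 \xrightarrow{e_{u_1,u_2}} u_2 \rightarrow \dots \rightarrow u_k$. For subsequent discussion, we summarise $\mathcal{V}$ and $\mathcal{E}$ into an adjacency matrix $A \in \{0,1\}^{n \times n}$, where each entry $A_{u,v}$ is 1 if $e_{u,v}$ exists, and 0 otherwise. An example graph with its node attribute matrix and adjacency matrix is shown in Figure 2(a).

Definition 2 (Neighbourhood). For a node $v$, its neighbourhood $\mathcal{N}(v)$ is the set of nodes directly connected to $v$ in $\mathcal{G}$, and the node degree is the size $|\mathcal{N}(v)|$.

Definition 3 ($k$-hop Neighbourhood). The $k$-hop neighbourhood of node $v$ is the set of nodes that are at a distance less than or equal to $k$ from node $v$, that is, $\mathcal{N}^{k}(v) = \{u \mid 0 < d(v,u) \le k\}$, where $d(\cdot)$ denotes the shortest path distance.

Definition 4 ($k$-hop Subgraph). The subgraph $\mathcal{S}^{k}(v) = (\mathcal{V}_{\mathcal{S}}, \mathcal{E}_{\mathcal{S}})$ is a subset of the graph $\mathcal{G}$, where $\mathcal{V}_{\mathcal{S}} := (\mathcal{N}^{k}(v) \cup \{v\})$ and $\mathcal{E}_{\mathcal{S}} := ((\mathcal{V}_{\mathcal{S}} \times \mathcal{V}_{\mathcal{S}}) \cap \mathcal{E})$.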
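The definitions in Box 1 translate almost directly into code. The sketch below is a minimal illustration on an undirected, unweighted toy graph; the hop count is written `k`, the centre node `v`, and the function names are hypothetical helpers introduced for this example rather than part of any library.

```python
from collections import deque


def adjacency_matrix(n, edges):
    """Binary adjacency matrix A of an undirected graph with nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def neighbourhood(A, v):
    """N(v): the nodes directly connected to v; its size is the degree of v."""
    return {u for u, a in enumerate(A[v]) if a == 1}


def k_hop_neighbourhood(A, v, k):
    """Nodes u with 0 < d(v, u) <= k, found by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        w = queue.popleft()
        if dist[w] == k:          # do not expand beyond k hops
            continue
        for u in neighbourhood(A, w):
            if u not in dist:
                dist[u] = dist[w] + 1
                queue.append(u)
    return {u for u, d in dist.items() if d > 0}


def k_hop_subgraph(A, v, k):
    """Node set (k-hop neighbourhood plus v) and the graph edges inside it."""
    nodes = k_hop_neighbourhood(A, v, k) | {v}
    edges = {(u, w) for u in nodes for w in nodes if u < w and A[u][w] == 1}
    return nodes, edges


# Path graph 0 - 1 - 2 - 3 - 4
A = adjacency_matrix(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(neighbourhood(A, 2))
print(k_hop_neighbourhood(A, 0, 2))
print(k_hop_subgraph(A, 2, 1))
```

Breadth-first search is used because, in an unweighted graph, the order in which it first reaches a node gives the shortest path distance that the k-hop definition relies on.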

Graph Analysis in the Artificial Intelligence Era. To process graph-structured data, Graph Machine Learning (GML) [30] is designed as a predominant approach to finding effective data representations from graph data. The principal target of GML is to extract the desired features of a graph as informative representations that can be easily used by downstream tasks, such as node-level, edge-level and graph-level analysis, classification and regression tasks. Traditional GML approaches mainly rely on handcrafted features, including graph statistics [41] (e.g., degree, centrality and clustering coefficient), kernel functions and expert-designed features. However, because traditional GML models are built on top of manually designed or processed feature sets, the resulting feature extractors are often not transferable and need to be designed specifically for each dataset and task. These conventional approaches often suffer from practical limits on large-scale graphs with rich node and edge attributes. Recently, graph representation learning has emerged as a promising direction.

Definition 5 (Graph Representation Learning). Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the task of graph representation learning (or equivalently graph embedding) is to learn a mapping function $f_{GRL}: \mathcal{G} \to Z$ that generates vector representations for graph elements, such that the learned representations $Z$, i.e., embeddings, capture the structure and semantics of the graph. The mapping function's effectiveness is evaluated by applying $Z$ to different downstream tasks. A toy example of the pipeline is shown in Figure 2(a)-(b).

Depending on the Graph Representation Learning (GRL) model's inherent architecture, existing GRL methods can be categorised into "shallow" or "deep" groups. Shallow GRL methods comprise an embedding lookup table that directly encodes each node as a vector and is optimised during training. The deep GRL methods, Graph Neural Networks (GNNs), have recently shown promising results in modelling structural and relational data.

Definition 6 (Graph Machine Learning Training). Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a graph representation learning model $f_{GRL}$, the graph machine learning training mechanism can be defined as optimising the parameters of $f_{GRL}$ to minimise the differences between predictions and training signals:

$$\text{find} \quad \operatorname*{arg\,min}_{\theta} \; \mathcal{L}\big(f_{GRL}^{\theta}(\mathcal{G}), Y\big) \qquad (1)$$

where $\theta$ represents the trainable parameters of $f_{GRL}$ and $\mathcal{L}$ is the loss function that measures the differences between predictions and training signals. The training signal $Y$ can be a discrete one-hot/multi-hot vector (classification) or a continuous vector (regression, link prediction). Different loss functions and optimisation approaches can be adopted according to the requirements of $f_{GRL}$ and the downstream tasks.
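To make the training objective of Definition 6 concrete, the sketch below instantiates it in the simplest "shallow" setting. All modelling choices here are assumptions made for illustration and are not prescribed by the survey: the model is an embedding lookup table, the prediction for a node pair is the sigmoid of the inner product of the two embeddings, the training signal is the adjacency matrix, the loss is binary cross-entropy, and the optimiser is plain gradient descent.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def loss_and_grad(Z, A):
    """Binary cross-entropy between sigmoid(z_u . z_v) and A[u][v], pairs u < v."""
    n, d = len(Z), len(Z[0])
    grad = [[0.0] * d for _ in range(n)]
    loss = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            p = sigmoid(sum(a * b for a, b in zip(Z[u], Z[v])))
            y = A[u][v]
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            # Gradient of the cross-entropy with respect to both embeddings.
            for k in range(d):
                grad[u][k] += (p - y) * Z[v][k]
                grad[v][k] += (p - y) * Z[u][k]
    return loss, grad


def train(A, dim=2, steps=200, lr=0.1, seed=0):
    """Gradient descent on the embedding table Z (the trainable parameters)."""
    rng = random.Random(seed)
    Z = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in A]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(Z, A)
        history.append(loss)
        Z = [[z - lr * g for z, g in zip(zr, gr)] for zr, gr in zip(Z, grad)]
    return Z, history


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
Z, history = train(A)
print(history[0], history[-1])
```

Swapping the embedding table for a GNN, or the adjacency matrix for class labels, changes the model and the training signal but leaves the structure of the loop unchanged.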

2.1 Shallow Graph Representation Learning

Shallow GRL methods comprise an embedding lookup table which directly encodes each node as a vector and is optimised during training. Within this group, several Skip-Gram-based network embedding (NE) methods have been proposed, such as DeepWalk and node2vec, as well as their matrix factorisation interpretation NetMF, LINE and PTE. As depicted in Figure 2(c), DeepWalk generates walk sequences for each node on a network using truncated random walks and learns node representations by maximising the similarity of representations for nodes that occur in the same walks, thus preserving neighbourhood structures. Node2vec increases the expressivity of DeepWalk by defining a flexible notion of a node's network neighbourhood and designing a second-order random walk strategy to sample the neighbourhood nodes; LINE is a special case of DeepWalk when the size of the node's context is set to one; PTE can be viewed as the joint factorisation of multiple networks' Laplacians. To capture the structural identity of nodes independent of network position and the neighbourhood's labels, struc2vec constructs a hierarchy to encode structural node similarities at different scales. Despite their relative success, shallow GRL methods often ignore the richness of node attributes and focus only on the network's structural information, which severely limits their performance.


2.2 Deep Graph Representation Learning with Neural Network

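Graph Neural Networks (GNNs) are a class of neural network models suitable for processing graph-structured data. They use the graph structure $A$ and the node features $X$ to learn a representation vector of a node, $z_v$, or of the entire graph, $z_{\mathcal{G}}$. Modern GNNs follow a common idea of a recursive neighbourhood aggregation or message-passing scheme, in which the representation of a node is iteratively updated by aggregating the representations of its neighbouring nodes. After $l$ iterations of aggregation or message passing, a node's representation captures the graph structural information within its $l$-hop neighbourhood. Thus, we can formally define the $l$-th layer of a GNN as:

$$
\begin{aligned}
m_{\mathcal{N}(v)}^{(l)} &= \text{AGGREGATE}^{N}\big(\{A_{u,v}, h_u^{(l-1)} \mid u \in \mathcal{N}(v)\}\big), \\
m_v^{(l)} &= \text{AGGREGATE}^{I}\big(\{A_{u,v} \mid u \in \mathcal{N}(v)\}\big)\, h_v^{(l-1)}, \\
h_v^{(l)} &= \text{COMBINE}\big(m_{\mathcal{N}(v)}^{(l)}, m_v^{(l)}\big)
\end{aligned}
\qquad (2)
$$

where $\text{AGGREGATE}^{N}(\cdot)$ and $\text{AGGREGATE}^{I}(\cdot)$ are two parameterised functions learned during the training process. $m_{\mathcal{N}(v)}^{(l)}$ is the aggregated message from node $v$'s neighbourhood nodes $\mathcal{N}(v)$ with their structural coefficients, and $m_v^{(l)}$ is the residual message from node $v$ after performing an adjustment operation to account for structural effects from its neighbourhood nodes. Then, $h_v^{(l)}$ is the learned representation vector of node $v$, obtained by combining $m_{\mathcal{N}(v)}^{(l)}$ and $m_v^{(l)}$ with a $\text{COMBINE}(\cdot)$ function at the $l$-th iteration/layer. Note that we initialise $h_v^{(0)} = x_v$, and the final learned representation vector after $L$ iterations/layers is $z_v = h_v^{(L)}$. We illustrate the learning mechanism of GNN models in Figure 2(d). In addition, to obtain the representation of an entire graph ($z_{\mathcal{G}}$), we can apply a READOUT function that aggregates the node representations of all nodes of the graph $\mathcal{G}$, as

$$z_{\mathcal{G}} = \text{READOUT}\big(\{z_v \mid \forall v \in \mathcal{V}\}\big) \qquad (3)$$

where READOUT can be a simple permutation-invariant function such as summation or a more sophisticated graph-level pooling function.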
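The message-passing layer and the readout described above can be sketched without any learnable parameters. The choices below are assumptions made purely for illustration: the neighbourhood message is the mean of the neighbours' previous representations, the residual message is the node's own previous representation, COMBINE is a sum followed by a ReLU, and READOUT is a sum over nodes. A practical GNN would insert trainable weight matrices into these steps.

```python
def gnn_layer(A, H):
    """One layer: h_v <- ReLU(mean of neighbour representations + h_v)."""
    n, d = len(H), len(H[0])
    H_new = []
    for v in range(n):
        neigh = [u for u in range(n) if A[v][u] == 1]
        # Neighbourhood message: mean of the neighbours' previous vectors.
        m_neigh = [sum(H[u][k] for u in neigh) / len(neigh) if neigh else 0.0
                   for k in range(d)]
        # Residual (self) message: the node's own previous vector.
        m_self = H[v]
        # COMBINE: element-wise sum followed by a ReLU nonlinearity.
        H_new.append([max(0.0, a + b) for a, b in zip(m_neigh, m_self)])
    return H_new


def gnn(A, X, num_layers):
    """Stack `num_layers` layers, starting from the node attributes X."""
    H = X
    for _ in range(num_layers):
        H = gnn_layer(A, H)
    return H


def readout(Z):
    """Permutation-invariant sum readout over all node representations."""
    return [sum(column) for column in zip(*Z)]


# Path graph 0 - 1 - 2 with two-dimensional node attributes.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
X = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

Z1 = gnn(A, X, num_layers=1)
Z2 = gnn(A, X, num_layers=2)
z_graph = readout(Z2)
```

In this toy run the first feature starts only on node 0. After one layer it has not yet reached node 2, which is two hops away, while after two layers it has: this is the sense in which a representation after l layers summarises the l-hop neighbourhood.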

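Following the general structure of GNNs as defined in Equation (2), we can further generalise the existing GNNs as variants of it. For instance, several classic and popular GNNs can be summarised as in Table 2.

Table 2: Different GNN variants defined according to Equation (2).

| GNN Model | $\text{AGGREGATE}^{N}(\cdot)$ | $\text{AGGREGATE}^{I}(\cdot)$ | $\text{COMBINE}(\cdot)$ |
|---|---|---|---|
| GCN | $\sum_{u \in \mathcal{N}(v)} \frac{W^{(l)} h_u^{(l-1)}}{\sqrt{\lvert\mathcal{N}(u)\rvert \lvert\mathcal{N}(v)\rvert}}$ | $\frac{W^{(l)} h_v^{(l-1)}}{\sqrt{\lvert\mathcal{N}(v)\rvert \lvert\mathcal{N}(v)\rvert}}$ | $\sigma\big(\text{SUM}(m_{\mathcal{N}(v)}^{(l)}, m_v^{(l)})\big)$ |
| GraphSAGE | $\text{AGG}\big(\{h_u^{(l-1)} \mid u \in \mathcal{N}(v)\}\big)$ | $h_v^{(l-1)}$ | $\sigma\big(W^{(l)} \cdot \text{CONCAT}(m_{\mathcal{N}(v)}^{(l)}, m_v^{(l)})\big)$ |
| GAT | $\sum_{u \in \mathcal{N}(v)} \alpha_{u,v} W^{(l)} h_u^{(l-1)}$ | $\alpha_{v,v} W^{(l)} h_v^{(l-1)}$ | $\sigma\big(\text{SUM}(m_{\mathcal{N}(v)}^{(l)}, m_v^{(l)})\big)$ |
| GIN | $\sum_{u \in \mathcal{N}(v)} h_u^{(l-1)}$ | $(1 + \epsilon)\, h_v^{(l-1)}$ | $\text{MLP}_{\theta}\big(\text{SUM}(m_{\mathcal{N}(v)}^{(l)}, m_v^{(l)})\big)$ |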
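As a hedged illustration of how two rows of Table 2 differ in practice, the sketch below implements one layer in the style of the GCN row and one in the style of the GIN row. It follows one reading of the table: neighbour degrees are taken without self-loops, so every node is assumed to have at least one neighbour, the nonlinearity is assumed to be a ReLU, and the weight matrix `W` and the function `mlp` are passed in by the caller instead of being learned. All helper names are hypothetical.

```python
import math


def matvec(W, h):
    """Multiply the matrix W (list of rows) by the vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def neighbours(A, v):
    return [u for u, a in enumerate(A[v]) if a == 1]


def gcn_layer(A, H, W):
    """GCN-style layer: degree-normalised messages, summed, then ReLU."""
    out = []
    for v in range(len(H)):
        Nv = neighbours(A, v)  # assumed non-empty for every node
        m_neigh = [0.0] * len(W)
        for u in Nv:
            norm = math.sqrt(len(neighbours(A, u)) * len(Nv))
            m_neigh = [m + x / norm for m, x in zip(m_neigh, matvec(W, H[u]))]
        # Self message: sqrt(|N(v)| * |N(v)|) simplifies to |N(v)|.
        m_self = [x / len(Nv) for x in matvec(W, H[v])]
        out.append([max(0.0, a + b) for a, b in zip(m_neigh, m_self)])
    return out


def gin_layer(A, H, eps, mlp):
    """GIN-style layer: neighbour sum plus (1 + eps) times the node itself."""
    out = []
    for v in range(len(H)):
        m_neigh = [sum(H[u][k] for u in neighbours(A, v))
                   for k in range(len(H[v]))]
        m_self = [(1 + eps) * x for x in H[v]]
        out.append(mlp([a + b for a, b in zip(m_neigh, m_self)]))
    return out


# Path graph 0 - 1 - 2 with scalar node features.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H = [[1.0], [2.0], [3.0]]

gcn_out = gcn_layer(A, H, W=[[1.0]])
gin_out = gin_layer(A, H, eps=0.0, mlp=lambda h: h)  # identity stands in for the MLP
```

The contrast to note is the normalisation: the GCN-style layer scales each message by node degrees, which smooths representations across neighbours, whereas the GIN-style layer keeps raw sums and therefore preserves information about neighbourhood size.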

We only review prior and concurrent work on GML related to our contributions where necessary. For an overview of recent variants and applications of GML, we refer the reader to comprehensive review articles on the topic.
