
arXiv:2302.08261v1 [cs.LG] 16 Feb 2023

KNOWLEDGE-AUGMENTED GRAPH MACHINE LEARNING

FOR DRUG DISCOVERY

A SURVEY FROM PRECISION TO INTERPRETABILITY

Davide Mottin
Aarhus University
davide@cs.au.dk

Zhiqiang Zhong
Aarhus University
zzhong@cs.au.dk

Anastasia Barkova
WhiteLab Genomics
abarkova@

February 17, 2023

ABSTRACT

The integration of Artificial Intelligence (AI) into the field of drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and providing interpretations for outputs, which hinders their practical application. As of late, Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still suffer from several deficiencies, such as the limited ability to handle supervision sparsity and provide interpretability in learning and inference processes, and their ineffectiveness in utilising relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery with limited training instances. However, a systematic definition for this burgeoning research direction is yet to be established. This survey presents a comprehensive overview of long-standing drug discovery principles, provides the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. Related KaGML works, collected following a carefully designed search methodology, are thoroughly reviewed and organised into four categories following a newly defined taxonomy. To facilitate research in this rapidly emerging field, we also share collected practical resources that are valuable for intelligent drug discovery and provide an in-depth discussion of the potential avenues for future advancements.

1 Introduction

Drug discovery and development have been among the most prominent and challenging research tasks for decades [1, 2, 3]. Before a drug is marketed and distributed to patients, it must undergo a multitude of research validations. From initial early drug discovery to preclinical development, and subsequently to clinical trials and final regulatory approval, it usually takes 10-15 years and costs around 2 billion US dollars [4, 5, 6]. The drug development process typically begins with the identification of the target protein or nucleic acid involved in a specific disease during early drug discovery. This is followed by the identification of a small molecule or biologic drug (such as an antibody or protein) that will interact with the target and modulate its activity with the aim of curing or preventing the disease. In the case of small molecules, high-throughput screening experiments are performed to identify promising compounds, a process known as “hit identification”. From these hits, some compounds are selected through in vitro and in vivo assays and are chemically optimised to improve properties such as stability, affinity or solubility, giving rise to the “lead” compounds. After several rounds of structural optimisation, the lead molecule becomes a drug candidate and can proceed to preclinical studies in animals, followed by clinical studies in humans. The ideal drug should be non-toxic and have as few side effects as possible for patients while being soluble and interacting effectively with the target. Each step of the process is characterised by a high rate of failure and substantial costs.

A PREPRINT - FEBRUARY 17, 2023

[Figure 1. Panel (a), “Biomedical Data are Graphs”: Molecule Network, Protein Structure, Virus Structure, Drug-Drug Interaction. Panel (b), “Human Biomedical Knowledge”: Theories and Equations, Description Context, Other Proficient Biomedical Data, and a Knowledge Graph in which Gene, Drug, Phenotype and Disease nodes are linked to the Gene Ontology, Phenotype Ontology and Disease Ontology through relations such as “has function”, “has phenotype”, “has indication” and “has side effect”.]

Figure 1: Illustration of real-world biomedical data in the form of graphs and examples of human biomedical knowledge. (a) Graphs are a natural way to represent biomedical data: a molecule can be represented as a 2D or 3D network, a protein structure is a 3D graph that records the chemical forces between amino acid residues, and interactions between drugs can be represented as a graph. (b) Toy examples of human biomedical knowledge in different forms. For instance, formal scientific knowledge includes well-defined theories and equations. There is also more general experimental knowledge, including description context, other proficient data and knowledge graphs.

To reduce the financial burden and increase the success rate, researchers have been working on accelerating drug discovery by taking advantage of remarkable Artificial Intelligence (AI) techniques [7, 8, 9, 10, 11]. Technological advances now allow for the creation of vast amounts of data in areas such as genomics, proteomics, and imaging, which can be used to inform the drug discovery process [9, 12]. AI can analyse these data and identify patterns and relationships that might not have been otherwise noticeable, leading to the identification of new targets and the optimisation of existing ones [13, 8]. AI-based drug discovery is also being used to streamline the process of drug development by predicting the likelihood of success of a candidate drug, reducing the time and cost required to bring a new drug to market [10, 14]. Furthermore, AI is being used to predict potential side effects and toxicity, allowing for the identification of potential safety issues before clinical trials [15, 11]. With these advancements, AI has the potential to transform the drug discovery process [7], enabling faster and more efficient drug development and bringing new therapies to patients more quickly.

Biomedical data is highly interconnected [16, 17] and can be easily represented as graphs (or networks), which have a variety of applications at different stages of the drug discovery and development process. For instance, as illustrated in Figure 1-(a), biomedical data can be hierarchically represented as graphs. Starting from the molecular level, atoms can be represented as nodes, and chemical bonds as edges of (2D or 3D) molecular graphs [18, 19]. On the macro-molecule level, interactions (edges) between amino acid residues (nodes) organise as (2D or 3D) protein graphs [20, 21]. At the compound level, edges in the drug-drug interaction (DDI) network can indicate chemical interactions between drugs (nodes) measured by long-term clinical screens [22, 23].

Nevertheless, conventional AI tools struggle with the handling of complex graph-structured data. The feature extractors employed by machine learning models are often not transferable, requiring manual design for each specific dataset and task. Although deep learning models [24, 25, 26] have the capability to learn from raw data, they are still limited in their ability to handle complex graph structures. In response, Graph Machine Learning (GML), a new class of AI methods, has been proposed to investigate graph-structured data. The essential idea of GML is to learn effective feature representations of nodes (e.g., drugs in DDI networks), edges (e.g., relations or interactions between drug-drug or drug-disease pairs), or (sub)graphs (e.g., molecular graphs) [27]. The corresponding node-, edge- and (sub)graph-level downstream tasks can be realised based on these learned representations. According to different representation learning mechanisms, GML approaches can be broadly categorised into “shallow” and “deep” classes. In particular, a type of deep GML method called Graph Neural Networks (GNNs) [28, 29, 30, 31, 32], which are deep neural network architectures specifically designed for graph-structured data, is attracting growing interest. GNNs iteratively update the features of graph nodes by propagating the information from their neighbouring nodes. These methods have already been successfully applied to a range of tasks and domains, including drug discovery [16, 33, 34].

However, despite the current pace of GML in drug discovery, these methods suffer from several serious deficiencies, including high data dependency (i.e., strong performance relies on high-quality training datasets) [35, 36] and poor generalisation (i.e., uncertain model performance on instances that have never been observed in training data) [37, 38]. These deficiencies originate primarily from the models’ data-driven nature and their inability to exploit domain knowledge effectively. In addition, there has been an increased demand for methods that help people understand and interpret the underlying models and provide more trustworthiness. In an effort to mitigate the lack of interpretability and trustworthiness of certain machine learning models and to augment human reasoning and decision-making, attention has been drawn to eXplainable Artificial Intelligence (XAI) [39] and Trustworthy Artificial Intelligence (TAI) [40] approaches, which provide human-comprehensible explanations for the model’s inherent mechanism and outputs.

To address these limitations, researchers have recently paid attention to a new AI paradigm, which we refer to as Knowledge-augmented Graph Machine Learning (KaGML in short), for superior drug discovery. Its core idea is to integrate external human biomedical knowledge into different components of the GML pipeline to achieve more accurate drug discovery, along with user-friendly interpretations, which ensures that the expert’s knowledge is not substituted. Biomedical knowledge may exist in various forms, as shown in Figure 1-(b), including formal scientific knowledge (e.g., well-established laws or theories in a domain that govern the properties or behaviours of target variables) and informal experimental knowledge (e.g., well-known facts or rules that are extracted from long-term observations and can also be inferred through human reasoning). The contributions of this survey are the following:

• We are the first to propose the concept of KaGML and comprehensively summarise existing work. The comparison between KaGML and other existing paradigms emphasises the novelty of KaGML and its promising potential for practical medical applications.

• We propose a novel taxonomy of KaGML approaches according to the different schemes used to incorporate knowledge into the GML pipeline. This makes it easier for readers to identify the core design of different models and locate the categories of interest (Section 5). We created a public folder to share collected resources¹ and will contribute to it continuously.

• We carefully discuss practical tools and knowledge databases that have been (or are highly likely to be) used by KaGML methods to solve practical drug discovery problems (Section 6). We provide a schematic representation of possible schemes to organise different knowledge databases about small molecule drugs into one knowledge graph.

• We cover methodologies not only for solving scientific problems in a computer science setting but, more importantly, for real-world biomedical applications. Our survey is hence of interest not only to AI researchers but also to biologists in different fields. Section 7 discusses promising future work for researchers from both disciplines to exploit.

Table 1: Summary of the keywords used in the literature search.

Area | Keywords
Drug Discovery | Drug Discovery, Drug Design, Drug Development, Medicine Discovery, Medicine Design, Medicine Development
Knowledge Graph | Knowledge-augmented, Knowledge-aware, Knowledge-informed, Knowledge-guided, Knowledge-enhanced, Knowledge-driven
Graph Machine Learning | Graph Machine Learning, Graph Neural Network, Geometric Machine Learning

Search methodology. All studies were retrieved in one of the following three ways: (i) a comprehensive top-down approach that conducted an extensive search of KaGML papers from major academic databases such as Google Scholar, IEEE Xplore, ACM Digital Library, DBLP Computer Science Bibliography, and ScienceDirect, using the keywords listed in Table 1; (ii) a bottom-up approach that surveyed recent research outputs in AI conferences and workshops; (iii) a thorough examination of the related work, discussion sections, and cited references from the papers obtained in steps (i) and (ii) to identify overlooked works. We keyword-searched for works containing a conjunction of any of the terms summarised in Table 1, leading to a selection of more than 1,000 articles. Approximately 100 were thoroughly scanned according to the criteria, and about 20 were identified directly from the related work sections. Whenever possible, we prioritised peer-reviewed publications and major journals/conferences (e.g., Nature, Nat. Commun., Nat. Mach. Intell., NeurIPS, ICML, ICLR, AAAI, KDD) over white papers or unreviewed submissions. Studies were selected only if they presented a subsymbolic system, including some form of incorporating biomedical knowledge into GML models for precision drug discovery and producing explanations using background knowledge with GML models. The finally identified papers are summarised into different categories in Table 4.

¹ /zhiqiangzhongddu/Awesome-Knowledge-augmented-GML-for-Drug-Discovery

Plan of the following sections. The rest of this paper is organised as follows. Section 2 introduces the concept of graph machine learning and the diverse tasks it addresses. Section 3 discusses the human knowledge database and knowledge graph concepts. Section 4 provides a technical exposition of prevailing intelligent drug discovery paradigms and describes key applications of graph machine learning and knowledge graphs in drug discovery. Following a newly defined taxonomy, we discuss the collected KaGML for drug discovery papers in Section 5. Relevant practical resources are shown in Section 6, including relevant scientific tools, knowledge databases and a schematic representation of possible schemes to organise knowledge databases about small molecule drugs from various aspects into one knowledge graph. In the end, we scrutinise the potential directions of KaGML and conclude the paper in Section 7. To assist readers in finding relevant content, the paper uses boxes to highlight closely related topics, figures to illustrate examples, and tables to present contrasting topics.

2 Graph Machine Learning

[Figure 2. Panel (a): an example graph with its node attribute matrix X and adjacency matrix A. Panel (b): vector-based representations used for node property prediction, link prediction, graph property prediction, graph modification/generation, etc. Panel (c): estimate the probability P_R(v|u) of visiting node v on a random walk starting from node u using some random walk strategy R. Panel (d): 1st-hop aggregation.]

Figure 2: Toy examples of the graph and typical graph representation learning approaches. (a) A graph can be basically represented using a node attribute matrix X and an adjacency matrix A. (b) Graph representation learning can convert a graph into a set of vectors, which record information about the graph. (c) A toy example of random walk-based shallow GRL approaches. (d) A toy example of the GNN mechanism.

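Box 1. Fundamentals of Graph Machine Learning

Definition 1 (Graph). A graph with $n$ nodes can be formally represented as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, which consists of $n$ nodes $u \in \mathcal{V}$ with $|\mathcal{V}| = n$. $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ denotes the set of edges, where $e_{u,v}$ denotes the edge between $u$ and $v$. The node attribute vector $x_u \in \mathbb{R}^{d}$ describes side information and metadata of node $u$. The node attribute matrix $X \in \mathbb{R}^{n \times d}$ contains attribute vectors for all nodes in the graph. Similarly, edge attributes $x_{u,v} \in \mathbb{R}^{\tau}$ for edge $e_{u,v}$ can be taken together to form an edge attribute matrix $X^{e} \in \mathbb{R}^{m \times \tau}$, where $m$ is the number of edges. A path from node $u_1$ to node $u_k$ is a sequence of edges $u_1 \xrightarrow{e_{1,2}} u_2 \cdots u_{k-1} \xrightarrow{e_{k-1,k}} u_k$. For subsequent discussion, we summarise $\mathcal{V}$ and $\mathcal{E}$ into an adjacency matrix $A \in \{0,1\}^{n \times n}$, where each entry $A_{u,v}$ is 1 if $e_{u,v}$ exists, and 0 otherwise. An example graph and its node attribute matrix and adjacency matrix are shown in Figure 2-(a).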
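As a minimal sketch of Definition 1, the snippet below builds the adjacency matrix of a small undirected graph from an edge list. The four-node graph, its edge list and the two-dimensional attribute values are hypothetical and chosen only for illustration; they are not the example drawn in Figure 2.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix A for an undirected graph."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1  # entry is 1 if the edge e_{u,v} exists
        A[v][u] = 1  # undirected graph: mirror the entry
    return A

# Hypothetical toy graph with n = 4 nodes and m = 3 edges.
edges = [(0, 1), (0, 2), (2, 3)]
A = adjacency_matrix(4, edges)

# Hypothetical node attribute matrix X with d = 2 attributes per node.
X = [[0.4, 3.0], [0.3, 6.0], [0.1, 1.0], [0.7, 2.0]]

# The degree of each node is the corresponding row sum of A.
degrees = [sum(row) for row in A]
print(A)
print(degrees)
```

A dense matrix is used here only because it mirrors the definition directly; for large sparse graphs an adjacency list or a sparse matrix format would normally be preferred.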


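Definition 2 (Neighbourhood). For node $v$, its neighbourhood $\mathcal{N}(v)$ is the set of nodes directly connected to $v$ in $\mathcal{G}$, and the node degree is the size $|\mathcal{N}(v)|$.

Definition 3 ($k$-hop Neighbourhood). The $k$-hop neighbourhood of node $v$ is the set of nodes that are at a distance less than or equal to $k$ from node $v$, that is, $\mathcal{N}^{k}(v) = \{u \mid 0 < d(v, u) \le k\}$, where $d(\cdot)$ denotes the shortest path distance.

Definition 4 ($k$-hop Subgraph). The subgraph $\mathcal{S}^{k}(v) = (\mathcal{V}_{\mathcal{S}}, \mathcal{E}_{\mathcal{S}})$ is a subset of a graph $\mathcal{G}$, where $\mathcal{V}_{\mathcal{S}} := (\mathcal{N}^{k}(v) \cup \{v\})$ and $\mathcal{E}_{\mathcal{S}} := ((\mathcal{V}_{\mathcal{S}} \times \mathcal{V}_{\mathcal{S}}) \cap \mathcal{E})$.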
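The hop-based neighbourhood and subgraph definitions above can be sketched with a breadth-first search. This is an illustrative implementation under stated assumptions, not code from the survey: the graph is undirected and unweighted, it is stored as an adjacency list, and the function names and the five-node path graph are hypothetical.

```python
from collections import deque

def k_hop_neighbourhood(adjacency, source, k):
    """Return the nodes u with 0 < d(source, u) <= k (shortest-path distance)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # nodes at distance k are kept but not expanded further
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {node for node, d in distance.items() if d > 0}

def k_hop_subgraph(adjacency, source, k):
    """Return (nodes, edges) induced by the k-hop neighbourhood plus the source."""
    nodes = k_hop_neighbourhood(adjacency, source, k) | {source}
    # Keep only edges whose two endpoints are both inside the node set.
    edges = {(u, v) for u in nodes for v in adjacency[u] if v in nodes and u < v}
    return nodes, edges

# Hypothetical path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_hop_neighbourhood(path, 2, 1))
print(k_hop_subgraph(path, 0, 2))
```

Each undirected edge is stored once as an ordered pair `(u, v)` with `u < v`, which assumes that node identifiers are comparable.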

Graph Analysis in the Artificial Intelligence Era. To process graph-structured data, Graph Machine Learning (GML) [30] is designed as a predominant approach to finding effective data representations from graph data. The principal target of GML is to extract the desired features of a graph as informative representations that can be easily used by downstream tasks such as node-level, edge-level and graph-level analysis, classification and regression tasks. Traditional GML approaches mainly rely on handcrafted features, including graph statistics [41] (e.g., degree, centrality and clustering coefficient), kernel functions [42] and expert-designed features [43]. However, because traditional GML models are built on top of manually designed or processed feature sets, the developed feature extractors are often not transferable and need to be designed specifically for each dataset and task. These conventional approaches often suffer from practical limits on large-scale graphs with rich node and edge attributes. Recently, graph representation learning [28, 29, 30] has emerged as a promising direction.

Definition 5 (Graph Representation Learning). Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, the task of graph representation learning (or equivalently graph embedding) is to learn a mapping function $f_{GRL}: \mathcal{G} \rightarrow Z$ that generates vector representations for graph elements, such that the learned representations ($Z$), i.e., embeddings, can capture the structure and semantics of the graph. The mapping function’s effectiveness is evaluated by applying $Z$ to different downstream tasks. A toy example of the pipeline is shown in Figure 2-(a)-(b).

Depending on the Graph Representation Learning (GRL) model’s inherent architecture, existing GRL methods can be categorised into “shallow” or “deep” groups. Shallow GRL methods comprise an embedding lookup table that directly encodes each node as a vector and is optimised during training. The deep GRL methods, Graph Neural Networks (GNNs), have recently shown promising results in modelling structural and relational data [31].

Definition 6 (Graph Machine Learning Training). Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a graph representation learning model $f_{GRL}$, the graph machine learning training mechanism can be defined as optimising the parameters of $f_{GRL}$ to minimise the differences between predictions and training signals:

$$\text{find} \;\; \arg\min_{\theta} \; \mathcal{L}\big(f^{\theta}_{GRL}(\mathcal{G}), Y\big) \qquad (1)$$

where $\theta$ represents the trainable parameters of $f_{GRL}$, and $\mathcal{L}$ is the loss function that measures the differences between predictions and training signals. The training signal $Y$ can be a discrete one-hot/multi-hot vector (classification) or a continuous vector (regression, link prediction). Different loss functions and optimisation approaches can be adopted according to the requirements of $f_{GRL}$ and the downstream tasks.
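To make the training objective in Equation 1 concrete, the following sketch minimises a mean squared error loss with plain gradient descent. It rests on several assumptions that are not part of the survey: a linear model on node attribute vectors stands in for the graph representation learning model, the loss is mean squared error, the optimiser is fixed-step gradient descent, and the three nodes, their attributes and their targets are hypothetical.

```python
def predict(theta, x):
    """Stand-in model: a linear score theta . x for one node attribute vector."""
    return sum(t * xi for t, xi in zip(theta, x))

def mse_loss(theta, X, Y):
    """Loss: mean squared difference between predictions and training signals."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

def gradient(theta, X, Y):
    """Gradient of the mean squared error with respect to theta."""
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        error = predict(theta, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * error * xj / len(X)
    return grad

def train(X, Y, steps=200, learning_rate=0.1):
    """Approximate arg min over theta of the loss by gradient descent."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = gradient(theta, X, Y)
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical node attributes X and continuous training signals Y (regression).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [1.0, 2.0, 3.0]
theta = train(X, Y)
print(theta, mse_loss(theta, X, Y))
```

In practice the model would be a shallow embedding table or a GNN and the loss would depend on the task, for example cross-entropy for classification; only the overall structure of the optimisation loop carries over.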

2.1 Shallow Graph Representation Learning

Shallow GRL methods comprise an embedding lookup table which directly encodes each node as a vector and is optimised during training. Within this group, several Skip-Gram [44]-based network embedding (NE) methods have been proposed, such as DeepWalk [45] and node2vec [46], as well as their matrix factorisation interpretation NetMF [47], LINE [48] and PTE [49]. As depicted in Figure 2-(c), DeepWalk generates walk sequences for each node on a network using truncated random walks and learns node representations by maximising the similarity of representations for nodes that occur in the same walks, thus preserving neighbourhood structures. Node2vec increases the expressivity of DeepWalk by defining a flexible notion of a node’s network neighbourhood and designing a second-order random walk strategy to sample the neighbourhood nodes; LINE is a special case of DeepWalk when the size of the node’s context is set to one; PTE can be viewed as the joint factorisation of multiple networks’ Laplacians [47]. To capture the structural identity of nodes independent of network position and neighbourhood’s labels, struc2vec [50] constructs a hierarchy to encode structural node similarities at different scales. Despite their relative success, shallow GRL methods often ignore the richness of node attributes and only focus on the network structural information, which hugely limits their performance.


2.2 Deep Graph Representation Learning with Neural Network

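Graph Neural Networks (GNNs) are a class of neural network models suitable for processing graph-structured data. They use the graph structure $A$ and node features $X$ to learn a representation vector of a node, $z_v$, or of the entire graph, $z_{\mathcal{G}}$. Modern GNNs [51] follow a common idea of a recursive neighbourhood aggregation or message-passing scheme, where we iteratively update the representation of a node by aggregating representations of its neighbouring nodes. After $\ell$ iterations of aggregation or message-passing, a node’s representation captures the graph structural information within its $\ell$-hop neighbourhood. Thus, we can formally define the $\ell$-th layer of a GNN as:

$$
\begin{aligned}
m_{\mathcal{N}(v)}^{(\ell)} &= \text{AGGREGATE}^{N}\big(\{A_{u,v}, h_u^{(\ell-1)} \mid u \in \mathcal{N}(v)\}\big), \\
m_{v}^{(\ell)} &= \text{AGGREGATE}^{I}\big(\{A_{u,v} \mid u \in \mathcal{N}(v)\}\big)\, h_v^{(\ell-1)}, \\
h_v^{(\ell)} &= \text{COMBINE}\big(m_{\mathcal{N}(v)}^{(\ell)}, m_{v}^{(\ell)}\big)
\end{aligned}
\qquad (2)
$$

where $\text{AGGREGATE}^{N}(\cdot)$ and $\text{AGGREGATE}^{I}(\cdot)$ are two parameterised functions learned during the training process. $m_{\mathcal{N}(v)}^{(\ell)}$ is the aggregated message from node $v$’s neighbourhood nodes $\mathcal{N}(v)$ with their structural coefficients, and $m_{v}^{(\ell)}$ is the residual message from node $v$ after performing an adjustment operation to account for structural effects from its neighbourhood nodes. Then, $h_v^{(\ell)}$ is the learned representation vector of node $v$ at the $\ell$-th iteration/layer, obtained by combining $m_{\mathcal{N}(v)}^{(\ell)}$ and $m_{v}^{(\ell)}$ with a $\text{COMBINE}(\cdot)$ function. Note that we initialise $h_v^{(0)} = x_v$, and the final learned representation vector after $L$ iterations/layers is $z_v = h_v^{(L)}$. We illustrate the learning mechanism of GNN models in Figure 2-(d). In addition, in terms of the representation of an entire graph ($z_{\mathcal{G}}$), we can apply a READOUT function to aggregate the node representations of all nodes of the graph $\mathcal{G}$, as

$$z_{\mathcal{G}} = \text{READOUT}\big(\{z_v \mid \forall v \in \mathcal{V}\}\big) \qquad (3)$$

where READOUT can be a simple permutation-invariant function such as summation or a more sophisticated graph-level pooling function.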
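The message-passing and readout steps described above can be sketched without any learned parameters. The simplifications here are assumptions made for illustration only: the neighbourhood aggregation is a plain sum, the node's own message is its previous representation, the combine step is an addition, and the readout is a sum over nodes. The three-node path graph and the function names are hypothetical.

```python
def message_passing_layer(adjacency, h):
    """One parameter-free message-passing step on vector-valued node features."""
    new_h = {}
    for v, neighbours in adjacency.items():
        dim = len(h[v])
        # Aggregated message from the neighbourhood of v (sum aggregation).
        neighbour_message = [sum(h[u][i] for u in neighbours) for i in range(dim)]
        # Message from node v itself (its previous representation).
        self_message = h[v]
        # Combine both messages; here simply by element-wise addition.
        new_h[v] = [a + b for a, b in zip(neighbour_message, self_message)]
    return new_h

def readout(h):
    """Permutation-invariant graph-level readout: sum of all node vectors."""
    dim = len(next(iter(h.values())))
    return [sum(vec[i] for vec in h.values()) for i in range(dim)]

# Hypothetical path graph 0 - 1 - 2; only node 0 starts with a non-zero feature.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
h0 = {0: [1.0], 1: [0.0], 2: [0.0]}

h1 = message_passing_layer(adjacency, h0)
h2 = message_passing_layer(adjacency, h1)
print(h1)  # after one layer, node 2 has not yet received anything from node 0
print(h2)  # after two layers, the signal from node 0 has reached node 2
print(readout(h2))
```

The example shows why the number of layers bounds the receptive field: node 2 is two hops away from node 0, so its representation only changes after the second layer.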

Following the general structure of GNNs as defined in Equation 2, we can further generalise the existing GNNs as variants of it. For instance, several classic and popular GNNs can be summarised as in Table 2.

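Table 2: Different GNN variants defined according to Equation 2.

GNN Model | AGGREGATE^N(.) | AGGREGATE^I(.) | COMBINE(.)
GCN [52] | $\sum_{u \in \mathcal{N}(v)} \frac{W^{(\ell)} h_u^{(\ell-1)}}{\sqrt{|\mathcal{N}(u)||\mathcal{N}(v)|}}$ | $\frac{W^{(\ell)} h_v^{(\ell-1)}}{\sqrt{|\mathcal{N}(v)||\mathcal{N}(v)|}}$ | $\sigma\big(\text{SUM}(m_{\mathcal{N}(v)}^{(\ell)}, m_{v}^{(\ell)})\big)$
GraphSAGE [53] | $\text{AGG}\big(\{h_u^{(\ell-1)} \mid u \in \mathcal{N}(v)\}\big)$ | $h_v^{(\ell-1)}$ | $\sigma\big(W^{(\ell)} \cdot \text{CONCAT}(m_{\mathcal{N}(v)}^{(\ell)}, m_{v}^{(\ell)})\big)$
GAT [54] | $\sum_{u \in \mathcal{N}(v)} \alpha_{u,v} W^{(\ell)} h_u^{(\ell-1)}$ | $\alpha_{v,v} W^{(\ell)} h_v^{(\ell-1)}$ | $\sigma\big(\text{SUM}(m_{\mathcal{N}(v)}^{(\ell)}, m_{v}^{(\ell)})\big)$
GIN [55] | $\sum_{u \in \mathcal{N}(v)} h_u^{(\ell-1)}$ | $(1 + \epsilon)\, h_v^{(\ell-1)}$ | $\text{MLP}_{\theta}\big(\text{SUM}(m_{\mathcal{N}(v)}^{(\ell)}, m_{v}^{(\ell)})\big)$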
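As a worked illustration of the GCN row of Table 2, the sketch below applies the degree-based normalisation to scalar node features. It is a simplification under stated assumptions: the weight matrix is fixed to 1, the nonlinearity is omitted, every node has at least one neighbour, and the normalisation follows the table as written (neighbourhood sizes without added self-loops). The star graph and feature values are hypothetical.

```python
import math

def gcn_update(adjacency, h):
    """One GCN-style update on scalar features (weight = 1, no nonlinearity)."""
    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
    new_h = {}
    for v, neighbours in adjacency.items():
        # Neighbour messages are scaled by 1 / sqrt(|N(u)| * |N(v)|).
        neighbour_message = sum(
            h[u] / math.sqrt(degree[u] * degree[v]) for u in neighbours
        )
        # The node's own message is scaled by 1 / sqrt(|N(v)| * |N(v)|).
        self_message = h[v] / math.sqrt(degree[v] * degree[v])
        # Combine by summation.
        new_h[v] = neighbour_message + self_message
    return new_h

# Hypothetical star graph: node 0 is connected to nodes 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
h = {0: 3.0, 1: 1.0, 2: 1.0, 3: 1.0}
updated = gcn_update(star, h)
print(updated)
```

The symmetric scaling damps messages that pass through high-degree nodes: the hub contributes 3 / sqrt(3) to each leaf rather than its full value of 3, while each leaf contributes 1 / sqrt(3) to the hub.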

We only review prior and concurrent work on GML related to our contributions where necessary. For an overview of recent variants and applications of GML, we recommend the comprehensive review articles [56,
