版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
QueryingBigGraphswithinBoundedResources1YinghuiWuUCSantaBarbaraWenfeiFanUniversityofEdinburghSouthwestJiaotongUniversityXinWangBigreal-lifegraphs2socialscale100B(1011)Webscale1T(1012)brainscale,100T(1014)Real-lifescope100M(108)USroadHumanConnectome,(TheHumanConnectomeProject,NIH)knowledgegraphBTCSemanticWebWebgraph(Google)Internet(Opteproject)AnNSABigGraphexperiment,P.Burkhardt,etal,US.NationalSecurityAgency,May2013Socialgraph(300PBuserdata)Queryingbiggraphs3GivenaqueryQandadatagraphG,findanswersQ(G)Graphpatternmatching:knowledgediscovery,socialrecommendation,drugdesigning…Reachability:cybersecurity,metabolicanalysis,softwareengineering,Internetofthings…ChallengesGraphsaretoobigHardtoreducecomputationcomplexityLimitedresourceState-of-the-artTractableapproachesSSDlinearscanfornodesearch:1PB->1.9days,1EB->5.28yrsIndexing&CompressionCanwestillanswerQwithlimitedresource?Outline4ResourceboundedqueryansweringLocalized:GraphPatternQueriesResourceboundedsimulationqueriesResourceboundedsubgraphisomorphismNon-localized:ReachabilityResourceboundedreachabilityExperimentalstudyConclusion&FutureworkQueriesanddatagraph5Localizedqueries:canbeansweredlocallyGraphpatternqueries:simulationqueries(personalizedsocialsearch,egonetworkanalysis…)matchingrelationoverdQ-neighborhood
ofapersonalizednodeNon-localizedqueriesReachabilityqueriesMichael(Personalizednode)hikinggroupcyclingclub
member?cyclingloversMichael(uniquematch)hikinggroup………cycling
club
membercyclingfanshgmhg1hg2cc1cc2cc3cl1cl2cln-1clnMichael:“findcyclingfanswhoknowbothmyfriendsincyclingclubandmyfriendsinhikinggroups”(IBMWatson,FacebookGraphSearch,AppleSiri,WolframAlphaSearch…)Michaelcc1cl3cl7cln-1cl4cl9cl5……cl6cl16Eric…CanwestillanswerQwithlimitedresource?Makingbiggraph“small”6Idea:usingasmallgraphinsteadofGtomakeitfeasibletoanswerexpensivequeriesinbiggraphs.
Reduction(boundedresources:time,space,energy…)queryexactresultsqueryapproximate
resultsApproximation(guaranteedquality:
accuracy,errorrate,…)biggraphsmallgraphexpensive!Resource-boundedqueryanswering7onlinereductionsize|GQ|<=α|G|visitα*c|G|amountofdata(α*c<1)queryresultsqueryresultsApproximationAccuracy>=ηbiggraphGsmallgraphGQexpensive!Resource-boundedalgorithmAforqueryclassL
andanyG:withresourceboundαhasaccuracyguaranteeη
Hardnessresults8Exactresource-boundedquerying:η=100%IntractabilityNP-hardforsimulationqueries(evenwhenQisapathandGisaDAG)ReductionfromSetCoverNP-hardforsubgraphqueriesImpossibilityForanyα,thereexistsNOalgorithmforreachabilityquerieswithresource-boundαand100%accuracybound
Resource-boundedsimulation9
Reductionsize|GQ|<=α|G|inO(dG|Q||GQ|)timeSimulationqueryresultsqueryresultsApproximation100%forα>=biggraphGsmallgraphGQO(|Q||G|+|G|2)dG:maximumdegreeofdQ-neighborhoodgraphofp-node;d:diameterofQ;l:distinctlabelsizeinQf:maxnumberofnodeswithasamelabel&neighborinQResource-boundedsimulation:dynamicreduction10Preprocessing(auxiliaryinformation)dynamicreduction(computereducedsubgraph)ApproximatequeryevaluationoverreducedsubgraphlocalauxiliaryinformationGBooleanguardedcondition:labelmatchingCostfunctionc(u,v)Potentialfunctionp(u,v),estimatedprobabilitythatvmatchesuboundb,determinesanupperboundofthenumberofnodestobevisitedQdegree|neighbor|<label,frequency>…uvuvlabelmatchDynamicallyupdatedauxiliaryinformationuv?Resource-boundedsimulation:dynamicreduction11preprocessingdynamicreduction(computereducedsubgraph)ApproximatequeryevaluationoverreducedsubgraphMichaelhikinggroupcyclingclub?cyclingloversMichaelhikinggroup…cycling
club
membercyclingfanshgmhg1hg2cc1cc2cc3cl1cl2cln-1clncyclingclub
cc1cc2cc3cycling
club
member?cyclingloverscln-1clncyclingfanshgmhikinggrouphikinggroupFALSE---TRUECost=1Potential=3Bound=2TRUECost=1Potential=2Bound=2bound=14visited=16Matchrelation:(Michael,Michael),(hikinggroup,hgm),(cyclingclub,cc1),(cyclingclub,cc3),(cyclinglover,cln-1),(cyclinglover,cln)Resource-boundedreachability12
Reductionsize|GQ|<=α|G|ReachabilityqueryresultsReachabilityqueryresultsApproximation(experimentallyVerified;nofalsepositive,intimeO(α|G|)biggraphGsmalltreeindexGQO(|G|)Preprocessing:landmarks13Preprocessingdynamicreduction(computelandmarkindex)ApproximatequeryevaluationoverlandmarkindexMichaelcc1cl3cl7cln-1cl4cl9cl5……cl6cl16Eric…LandmarksalandmarknodecoverscertainnumberofnodepairsReachabilityofthepairsitcoverscanbecomputedbylandmarklabelscc1“Icanreachcl3”cl3cln-1,“cl3canreachme”cl4…cl6cl16HierarchicallandmarkIndex14LandmarkIndexlandmarknodesareselectedtoencodepairwisereachabilityHierarchicalindexing:applymultipleroundsoflandmarkselectiontoconstructatreeoflandmarkscc1cl7cln-1Michaelcc1cl3cl7cln-1cl4cl9cl5……cl6cl16Eric……cl16cl3cl5cl6cl4cl9…Booleanguardedcondition(v,source,dst)Costfunctionc(v):sizeofunvisitedlandmarksinthesubtreerootedatvPotentialP(v),totalcoversizeofunvisitedlandmarksasthechildrenofvCoversizeLandmarklabels/encodingTopologicalrank/rangeResource-boundedreachability15Michaelcc1cl3cl7cln-1cl4cl9cl5……cl6cl16Ericcc1…cl7cln-1…cl16cl3cl5cl6cl4MichaelEric“drilldown”?cl9…localauxiliaryinformation“rollup”Preprocessingdynamicreduction(computelandmarkindex)Approximatequeryevaluationoverlandmarkindexbi-directedguidedtraversalCondition=FALSE--Condition=?Cost=9Potential=46Condition=?Cost=2Potential=9Condition=TRUE……ExperimentalStudy16DatasetYoutube(1.61millionnodes,4.51millionedges)
(http://netsg.cs.sfu.ca/youtubedata)
YahooWebgraph(3millionnodes,14.98millionedges)(/catalog.php?datatype=g)AlgorithmsGraphpatternmatching:ResourceboundedsimulationalgorithmRBSimOptimizedstrongsimulationpatternmatchingMatchOptResourceboundedsubgraphisomorphismRBSubOptimizedVF2Reachability:ResourceboundedreachabilityRBReachBFSandoptimizedBFSovercompressedgraphsLM:applyinglandmarkvectors(4*Log|V|landmarks)Efficiencyofresourceboundedsimulation17Varyingα(10-5),YahooRbsimis5.5timesfasterthanMatch-OPT;RBSubis6.25timesfasterthanVF2-OPTonaverageVaryingα(10-5),YoutubeAccuracy18Varyingα(10-5),accuracy,Yahoo89%-100%forsimulationqueriesbothachieves100%accuracywhenα>0.0015%,Efficiencyofresourceboundedreachability19RBreachis62.5timesfasterthanBFSand5.7timesfasterthanBFS-OPT
Varyingα(10-4),YahooVaryingα(10-4),YoutubeAccuracy20Varyingα(10-4),accuracy,Yahoo>=96%achieves100%accuracywhenα>0.05%,Conclusion21ResourceboundedqueryingforbiggraphprocessingDynamicreduction+approximatequeryansweringLocalqueries:strongsimulation,subgraphisomorphismNon-localqueries:reachabilitytunableperformance,abalanceofresourceandanswerqualityMoretobedone…Maximumaccuracyratioη
resourceboundedalgorithmscanguarantee?Graphquerypatternswithoutpersonalizednodes,moregraphqueryclasses…Distributeddeployment(MapReduce,GraphLab)Deploymentinemergingapplications(knowledgegraph,cyber
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2026国航股份贵州分公司地面综合服务岗位实习生招收12人笔试参考题库及答案解析
- 2026台州市路桥区事业单位招聘40人-统考笔试备考题库及答案解析
- 吊顶施工质量检测方法
- 2026福建事业单位统考宁德市周宁县招聘57人笔试参考题库及答案解析
- 2026贵州省第三人民医院长期招聘考试备考题库及答案解析
- 2026云南省通信产业服务有限公司招聘3人笔试备考试题及答案解析
- 2026茅台集团招聘89人笔试参考题库及答案解析
- 物业管理企业客户投诉处理流程指南
- 建筑施工安全一级风险管控清单
- 4S店汽车运营管理制度
- SB/T 11094-2014中药材仓储管理规范
- GB/T 3452.4-2020液压气动用O形橡胶密封圈第4部分:抗挤压环(挡环)
- GB/T 23339-2018内燃机曲轴技术条件
- GB/T 15382-2021气瓶阀通用技术要求
- GB/T 15242.4-2021液压缸活塞和活塞杆动密封装置尺寸系列第4部分:支承环安装沟槽尺寸系列和公差
- GB/T 1176-2013铸造铜及铜合金
- 寿险经营的根本命脉-辅专课件
- 实验12土壤微生物的分离及纯化课件
- 2022年4月自考00402学前教育史试题及答案
- 工艺指标变更通知单
- 磁粉检测技术(ii级)学习培训模板课件
评论
0/150
提交评论