
Social Networks in the Cloud Computing Era: Platforms and Technologies

Edward Chang (张智威), Deputy Dean, Research Institute, Google China; Professor, Department of Electrical Engineering, University of California.

China Opportunity: China and the U.S. in 2006-07

                        China                U.S.
Internet population     180 million (↑25%)   208 million (↑3%)
Broadband users         60 million (↑90%)    60 million (↑29%)
Mobile phones           500 million          180 million
Engineering graduates   600k                 72k

Google China
- Size (~700): 200 engineers, 400 other employees, almost 100 interns
- Locations: Beijing (2005), Taipei (2006), Shanghai (2007)

Organizing the World's Information, Socially
- Social Platform (社区平台)
- Cloud Computing (云运算)
- Concluding Remarks (结论与前瞻)

Web 1.0: a web of documents (.htm, .jpg, .doc, .msg pages linked to one another).

Web with People (2.0): the same documents, now connected to the people who create and share them.

+ Social Platforms: applications (gadgets) plugged into this web of documents and people.

Open Social Platform (开放社区平台). A social platform exposes:
1. Who I am (我是谁)
2. My friends (我的朋友)
3. Their activities (他的活动)
4. Their things (他的东西)

Social Graph

What Do Users Want?
- People care about other people: they care about people they know, and they want to connect to people they do not know
- They want to discover interesting information based on other people: who other people are and what other people are doing

Information Overflow Challenge
- Too many people, too many choices of forums and apps
- "I will soon need to hire a full-time assistant to manage my online social networks"
- Hence the desire for a social-network recommendation system

Recommendation System
- Friend recommendation
- Community/forum recommendation
- Application suggestion
- Ads matching

Agenda recap: Social Platform done; next, Cloud Computing (云运算).

Industry Trend: The Coming Age of Cloud Computing
1. Data lives in the cloud: no fear of loss, no need for backups
2. Software lives in the cloud: no downloads, automatic upgrades
3. Ubiquitous cloud computing: any device becomes yours once you log in
4. Unlimited cloud computing power: unlimited storage, unlimited speed

Web Search: An Example of Cloud Computing
1. The user enters a query, e.g. "Cloud Computing"
2. Data is preprocessed in a distributed fashion so search can be served:
   - Google infrastructure (thousands of commodity servers around the world)
   - MapReduce for mass data processing
   - Google File System
3. Search results are returned

Collaborative Filtering
Given a matrix that "encodes" data, there are many applications (collaborative filtering):

- User–Community
- User–User
- Ads–User
- Ads–Community
- etc.

(In the running example, the matrix rows are users and the columns are communities.)

Collaborative Filtering (CF)

[Breese, Heckerman and Kadie 1998]

Memory-based
- Given user u, find "similar" users (k nearest neighbors): users who bought similar items, saw similar movies, or have similar profiles
- Different similarity measures yield different techniques
- Make predictions based on the preferences of these "similar" users

Model-based
- Build a model of the relationships between subject matters
- Make predictions based on the constructed model

Memory-Based Model [Goldberg et al. 1992; Resnick et al. 1994; Konstan et al. 1997]
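The memory-based scheme just described (find the k nearest neighbors, then predict from their preferences) can be sketched as follows. This is a minimal illustration, not code from the talk; the cosine similarity measure is one common choice among the "different similarity measures", and the toy user-community matrix and names are invented.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def predict(ratings, user, item, k=2):
    """Memory-based CF: average the k most similar users' ratings for `item`.

    `ratings` maps user -> list of ratings (0 = unrated). Note that every
    prediction scans the entire database, the weakness listed under Cons.
    """
    neighbors = sorted(
        (u for u in ratings if u != user),
        key=lambda u: cosine(ratings[user], ratings[u]),
        reverse=True,
    )[:k]
    votes = [ratings[u][item] for u in neighbors if ratings[u][item] > 0]
    return sum(votes) / len(votes) if votes else 0.0

# Toy user-community matrix: rows are users, columns are communities.
R = {
    "alice": [5, 4, 0, 1],
    "bob":   [4, 5, 1, 0],
    "carol": [1, 0, 5, 4],
    "dave":  [5, 5, 0, 0],
}
print(predict(R, "alice", 2))  # estimate alice's interest in community 2
```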

Pros
- Simplicity: avoids a model-building stage
Cons
- Memory- and time-consuming: uses the entire database every time a prediction is made
- Cannot make a prediction if the user has no items in common with other users

Model-Based Model [Breese et al. 1998; Hofmann 1999; Blei et al. 2004]

Pros
- Scalability: the model is much smaller than the actual dataset
- Faster prediction: query the model instead of the entire dataset
Cons
- Model building takes time

Algorithm Selection Criteria
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable
- Can deal with data scarcity
- Cloud computing!

Model-Based Prior Work
- Latent Semantic Analysis (LSA)
- Probabilistic LSA (PLSA)
- Latent Dirichlet Allocation (LDA)

Latent Semantic Analysis (LSA)

[Deerwester et al. 1990]
- Maps high-dimensional count vectors to a lower-dimensional representation called the latent semantic space
- By SVD decomposition: A = U Σ V^T, where
  - A is the word-document co-occurrence matrix (W×D)
  - U_ij is how likely word i belongs to topic j (W×T)
  - Σ_jj is how significant topic j is (T×T)
  - V^T_ij is how likely topic i belongs to document j (T×D)

Latent Semantic Analysis (cont.)
- LSA keeps the k largest singular values, giving a low-rank approximation Â (factors of size W×K, K×K, K×D) to the original matrix
- Saves space, de-noises, and reduces sparsity
- Make recommendations using Â:
  - Word-word similarity: ÂÂ^T
  - Doc-doc similarity: Â^TÂ
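The truncated SVD behind LSA can be sketched with NumPy as below. This is a minimal illustration, not the talk's code; the tiny word-document matrix is invented.

```python
import numpy as np

# Toy word-document count matrix A (4 words x 3 docs), invented for illustration.
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 0.0],
])

# Full SVD: A = U @ diag(s) @ Vt, singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest singular values: the rank-k approximation A_hat.
k = 2
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

word_word = A_hat @ A_hat.T   # word-word similarity
doc_doc = A_hat.T @ A_hat     # doc-doc similarity
print(np.round(A_hat, 2))
```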

  - Word-doc relationship: Â

Probabilistic Latent Semantic Analysis (PLSA) [Hofmann 1999; Hofmann 2004]
- A document is viewed as a bag of words
- A latent semantic layer z is constructed between documents and words:
  P(w,d) = P(d) P(w|d) = P(d) Σ_z P(w|z) P(z|d)
- Probability delivers explicit meaning: P(w|w), P(d|d), P(d,w)
- Model learning via the EM algorithm

PLSA Extensions
- PHITS [Cohn & Chang 2000]: models document-citation co-occurrence
- A linear combination of PLSA and PHITS [Cohn & Hofmann 2001]: models the contents (words) and the inter-connectivity of documents
- LDA [Blei et al. 2003]: provides a complete generative model with a Dirichlet prior
- AT [Griffiths & Steyvers 2004]: includes authorship information; a document is categorized by authors and topics
- ART [McCallum 2004]: includes email recipients as additional information; an email is categorized by author, recipients and topics

Combinational Collaborative Filtering (CCF)
- Fuses multiple kinds of information, alleviating the information-sparsity problem
- Hybrid training scheme: Gibbs sampling as initialization for the EM algorithm
- Parallelization: achieves linear speedup with the number of machines

Notations
Given a collection of co-occurrence data:
- Communities: C = {c1, c2, …, cN}
- Users: U = {u1, u2, …, uM}
- Descriptions: D = {d1, d2, …, dV}
- Latent aspects: Z = {z1, z2, …, zK}

Models
- Baseline models: the Community-User (C-U) model and the Community-Description (C-D) model
- CCF: Combinational Collaborative Filtering, which combines both baseline models

Baseline Models
Community-User (C-U) model: a community is viewed as a bag of users.

- c and u are rendered conditionally independent by introducing z
- Generative process for each user u:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A user u is generated from P(u|z)

Community-Description (C-D) model: a community is viewed as a bag of words.
- c and d are rendered conditionally independent by introducing z
- Generative process for each word d:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A word d is generated from P(d|z)

Baseline Models (cont.): comparing the C-U and C-D models
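The generative process above can be made concrete with a toy sampler. This is a sketch only: the communities, topics, and multinomial parameters below are invented, not taken from the talk. The C-D model is identical except that P(d|z) replaces P(u|z).

```python
import random

random.seed(0)

communities = ["c1", "c2"]
topics = ["z1", "z2"]

# Invented multinomials for illustration: P(z|c) and P(u|z).
p_z_given_c = {"c1": {"z1": 0.9, "z2": 0.1}, "c2": {"z1": 0.2, "z2": 0.8}}
p_u_given_z = {"z1": {"u1": 0.7, "u2": 0.3}, "z2": {"u1": 0.1, "u2": 0.9}}

def draw(dist):
    """Sample a key from a {outcome: probability} dict."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point rounding

def generate_user():
    c = random.choice(communities)   # 1. a community is chosen uniformly
    z = draw(p_z_given_c[c])         # 2. a topic is selected from P(z|c)
    u = draw(p_u_given_z[z])         # 3. a user is generated from P(u|z)
    return c, z, u

sample = [generate_user() for _ in range(5)]
print(sample)
```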

C-U model
- Pros: personalized community suggestion
- Cons:
  1. The C-U matrix is sparse and may suffer from the information-sparsity problem
  2. Cannot take advantage of content similarity between communities

C-D model
- Pros: clusters communities based on community content (description words)
- Cons:
  1. No personalized recommendation
  2. Does not consider the users that overlap between communities

CCF Model
- The Combinational Collaborative Filtering (CCF) model combines both baseline models: a community is viewed as a bag of users AND a bag of words
- By adding C-U, CCF can perform personalized recommendation, which C-D alone cannot
- By adding C-D, CCF can perform better personalized recommendation than C-U alone, which may suffer from sparsity
- Things CCF can do that C-U and C-D cannot: estimate P(d|u), relating users to words; useful for user-targeted ads

Algorithm Requirements
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable

Parallelizing CCF
(Details omitted.)
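The user-to-word relation that CCF enables can be sketched by marginalizing over the latent aspects, P(d|u) = Σ_z P(d|z) P(z|u). The parameters below are invented for illustration; CCF's actual estimation procedure (hybrid Gibbs/EM training) is not shown.

```python
# Invented parameters: two latent aspects, two words, one user.
p_z_given_u = {"u1": {"z1": 0.6, "z2": 0.4}}
p_d_given_z = {"z1": {"d1": 0.8, "d2": 0.2}, "z2": {"d1": 0.3, "d2": 0.7}}

def p_d_given_u(d, u):
    """P(d|u) = sum_z P(d|z) * P(z|u), marginalizing over latent aspects z."""
    return sum(p_d_given_z[z][d] * p_z for z, p_z in p_z_given_u[u].items())

print(p_d_given_u("d1", "u1"))  # 0.8*0.6 + 0.3*0.4 = 0.6
```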

Experiments on an Orkut Dataset
- Data collected on July 26, 2007; two types of co-occurrence data extracted: community-user and community-description
- 312,385 users; 109,987 communities; 191,034 unique English words
- Evaluated on community recommendation, community similarity/clustering, user similarity, and speedup

Community Recommendation: Evaluation Method
- No ground truth and no user clicks available, so leave-one-out: randomly delete one community membership for each user and test whether the deleted community can be recovered
- Evaluation metric: precision and recall

Results
Observations:

- CCF outperforms C-U: for the top 20 recommendations, the precision and recall of CCF are twice those of C-U
- The more communities a user has joined, the better CCF/C-U can predict

Runtime Speedup
- The Orkut dataset enjoys a linear speedup when the number of machines is up to 100, reducing training time from one day to less than 14 minutes
- But what makes the speedup slow down beyond 100 machines?

Runtime Speedup (cont.)
- Training time consists of two parts: computation time (Comp) and communication time (Comm)

CCF Summary
- Combinational Collaborative Filtering fuses bag-of-words and bag-of-users information
- Hybrid training provides better initializations for EM than random seeding
- Parallelized to handle large-scale datasets

China's Contributions to Cloud Computing
- Parallel CCF
- Parallel SVMs (kernel machines)
- Parallel SVD
- Parallel spectral clustering
- Parallel expectation maximization
- Parallel association mining
- Parallel LDA
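The Comp/Comm split above suggests a simple intuition for why speedup saturates: computation time shrinks as machines are added while communication time grows. A sketch with invented constants (not measurements from the Orkut experiments):

```python
def speedup(machines, comp=86400.0, comm_per_machine=1.0):
    """Speedup under a toy cost model (invented constants, for intuition only):
    time on m machines = comp/m (parallel work) + comm_per_machine*m (coordination)."""
    t1 = comp + comm_per_machine                       # time on one machine
    tm = comp / machines + comm_per_machine * machines # time on m machines
    return t1 / tm

# Near-linear at first, then communication dominates and speedup declines.
for m in (10, 100, 1000):
    print(m, round(speedup(m), 1))
```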

Speeding up SVMs

[NIPS 2007]
- Approximate matrix factorization; parallelization
- Open source @ /p/psvm; 350+ downloads since December '07
- A task that takes 7 days on 1 machine takes 1 hour on 500 machines

Incomplete Cholesky Factorization (ICF)
- Approximates the n×n kernel matrix by the product of an n×p factor and its p×n transpose, with p << n, to conserve storage

Matrix Product
- A (p×n) matrix times an (n×p) matrix yields a small (p×p) matrix

Organizing the World's Information, Socially
- Social Platform (社区平台)
- Cloud Computing (云运算)
- Concluding Remarks (结论与前瞻)

Web with People: documents (.htm, .jpg, .doc, .xls, .msg) and the people connected to them.

What Next for Web Search?

Personalization
- Return query results considering personal preferences
- Example: disambiguate an ambiguous term like "fuji"
- Oops: several have tried, and the problem is hard
  - Training data are difficult to collect in sufficient quantity (for collaborative filtering)
  - Supporting personalization is computationally intensive (e.g., personalizing PageRank)
  - User profiles may be incomplete or erroneous

Personal search, intelligent search: a search for "Fuji" (富士) could return Mount Fuji, Fuji apples, or Fuji cameras.

Organizing the World's Information, Socially
- The web is a collection of documents AND people
- Recommendation is a personalized, push model of search
- Collaborative filtering requires dense information to be effective
- Cloud computing is essential

References
[1] Alexa Internet. http://
[2] D. M. Blei and M. I. Jordan. Variational methods for the Dirichlet process. In Proc. of the 21st International Conference on Machine Learning, pages 373-380, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the Seventeenth International Conference on Machine Learning, pages 167-174, 2000.
[5] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
[6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391-407, 1990.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Artificial Intelligence, pages 289-296, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1):89-115, 2004.
[11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004.
[12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007.
[13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proc. of the 24th International Conference on Machine Learning, pages 791-798, 2007.
[15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 678-684, 2005.
[16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 306-315, 2004.
[17] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
[18] T. Zhang and V. S. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2:313-334, 2002.
[19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005.
[20] L. Adamic and E. Adar. How to search a social network. 2004.
[21] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004.
[22] H. Kautz, B. Selman, and M. Shah. ReferralWeb: Combining social networks and collaborative filtering. Communications of the ACM, 3:63-65, 1997.
[23] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Rec., 22:207-216, 1993.
[24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998.
[25] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
[26] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.
