Web Search: Filling the Information Gap
Yanyan Lan (兰艳艳)
Web Data Science and Engineering Research Center, Institute of Computing Technology, CAS

About Our Team
Web Data Science and Engineering Research Center, ICT
Lab head: Researcher Xueqi Cheng (程学旗)
Research department: Xiaolong Jin (靳小龙), Jiafeng Guo (郭嘉丰)
• Web Search and Data Mining (WSDM): Yanyan Lan (兰艳艳)
• Network Analysis and Social Computing (NASC): Huawei Shen (沈华伟)
• Research Platform (Soscholar 天玑学术网): Lei Cao (曹雷)
[Diagram: Group of WSDM, Group of NASC, Group of Research Platform; big data research platform; vertical collaborative system; Analysis Engine, Social Engine, Search Engine, Recommendation Engine; learning to rank, query refinement, name disambiguation, learning to recommend, collaborative filtering, heterogeneous network mining]

Web Search: Filling the Information Gap
[Diagram: web search, machine learning]

Research Focus: Filling the Information Gap
• Sparse topic modeling (WSDM13, WWW13, SDM13)
• Short text modeling (CIKM12)
• Document summarization (SIGIR11, CIKM10, CIKM08)
• Document annotation (WWW09)
• Online indexing (CIKM07)
• Query recommendation (ECIR13, CIKM12, WWW11, SIGIR11 QRU, SIGIR10 QRU, CIKM10, ECAI10)
• Query similarity (CIKM11)
• Query parsing (SIGIR09)
• Query refinement (SIGIR08)
• Top-k LTR (SIGIR12, CIKM12)
• Ranking consistency (NIPS12)
• Diversity rank (TKDE11)
• Learning to rank (FCSC, SIGIR08)
• Learning with noisy data (IPM)
• Language model (WI08)
• Feature selection (SIGIR07)
[Diagram: Query Space ↔ Matching ↔ Document Space]
• Query understanding and representation: classification/clustering, structure prediction, topic modeling, similarity learning, performance prediction, parsing/NER
• Document understanding and processing: topic modeling, classification/clustering, learning on graph, summarization, language model
• Ranking: learning to rank

Today's Topics
• Part I: Query Understanding and Representation
• Part II: Ranking Algorithms and Theories

Part I: Query Understanding and Representation
[Diagram: User → query → Search Engine = "Black Box"]

Query Understanding and Representation: The First Step of IR
User: it is never easy to formulate a proper query to find what he/she needs.
• Lack of knowledge
• Unfamiliar with the search engine
• Unclear search intent
Search engine: understanding and representing users' search intent is critical for search success.
• Short: lack of context
• Ambiguous: multiple intents
• Noisy: ill-formed
• Word ambiguity

Different Levels of Understanding (Task, Representation, Structure)
• Understand the semantics of the query (single query; SIGIR08, SIGIR09): e.g. Q: "M Jordan Berkele" → "Michael Jordan Berkeley", with Michael Jordan: PersonName and Berkeley: Location
• Understand the similarity between queries (pair; CIKM11): e.g. "Michael Jordan Berkeley" (intent: academic researcher) vs. "NBA Michael Jordan" (intent: NBA star)
• Understand the goal of the query (sequence; CIKM10, CIKM12): e.g. "Michael Jordan" → "Michael Jordan Berkeley" or "Michael Jordan" → "NBA Michael Jordan"; interests = search interests + exploratory interests; utility = perceived utility + posterior utility

Understanding the Relation: Intent-Aware Query Similarity (CIKM11, Best Paper Award)

Motivation
• "Apple tree": search intent is looking for apple fruits
• "Apple store": search intent is finding products of the Apple company
Similarity between queries should be defined upon search intent: intent-aware query similarity.

Existing Methods (intent-not-aware)
Pair-wise measures (similarity measured independently on each pair):
• Jaccard coefficient (Beeferman et al. 2000)
• Cosine similarity (Baeza-Yates et al. 2004; Wen et al. 2002)
• Hybrid methods (Zhang et al. 2006; Jones et al. 2006)
• Jaccard & cosine (Deng et al. 2009)
• Kernel method (Sahami et al. 2006)
Problem: mixed representation, biased by the popular intent while ignoring unpopular ones.
Graph-based measures (similarity propagated over the query relation graph):
• Random walk (Craswell et al. 2007)
• Hitting time (Mei et al. 2008)
• SimRank (Antonellis et al. 2008)
• Matrix factorization (Ma et al. 2008)
• Graph projection (Bordino et al. 2010)
Problem: similarity propagates across intent boundaries and wrongly connects queries from different search intents (e.g. "Apple tree" and "Apple store" through "Apple").

Overview
A. Identify the potential search intents of queries
B. Intent-aware similarity measure
   I. Extract intent-aware representations
   II. Apply different types of similarity measures

Example: query "office" (related queries: microsoft office, office tv show, ms office download, the office, office shoes, openoffice, footware office uk, office season 6; clicked URLs: /The_office/, /title/tt0386676/, /en-us/products/; intents: software, shoe supplier, TV show)
• Clickthrough: precise information from the wisdom of crowds. Pro: higher precision. Con: sparse.
• Search result snippets: great context describing the query. Pro: higher recall. Con: irrelevant/spam/advertisement/ambiguity.

A. Identify Search Intents (Data)
Leverage two types of auxiliary data: search result snippets and clickthrough.

A. Identify Search Intents (Algorithm): Regularized Topic Model
• Topic model (PLSI, log-likelihood) on search result snippets: top search result snippets → virtual documents; words in snippets → words; potential search intents → topics
• Regularization from clickthrough (co-click matrix): two queries that share many clicked URLs convey similar search intents, which is a powerful constraint

B. Intent-Aware Similarity Measure (Pair-wise)
Similarity is measured independently by pair-wise metrics.
I. Extract intent-aware representations: instead of the original word vector representation, each query gets a word vector representation under the k-th search intent, built from the expected search intent distribution of each word occurrence w_l given query q_i.
II. Apply pair-wise similarity measures to these vectors to obtain the similarity under the k-th search intent.

B. Intent-Aware Similarity Measure (Graph-based)
Similarity is calculated over the query graph.
I. Extract intent-aware representations. Original: a query similarity graph whose adjacency matrix uses the Jaccard coefficient, followed by spectral embedding. Intent-aware: a query similarity graph under the k-th search intent, where an edge reflects the probability that an edge will be generated between query q_i with search intent s_k and query q_j with search intent s_k; this yields the query representation under the k-th search intent.
II. Apply graph-based similarity measures to obtain the similarity under the k-th search intent.

Result
[Figure: expected inter-intra ratio]

Part II: Ranking
Ranking Is a Central Problem!
[Diagram: ranking; web search, information filtering, recommendation]

Different Levels of Ranking
• Relevance: ranking according to the degree of relevance (SIGIR12, CIKM12, NIPS12)
• Relation between items, e.g. diversity: considering the relation between items (ICDM11, TKDE11)
• Aggregation: output a perfect ranking from multiple ranking inputs (UAI12)

Relevance Ranking (Algorithm): Top-k Learning to Rank: Labeling, Ranking and Evaluation (SIGIR12, Best Student Paper Award)
A Central Problem in LTR
How to
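The intent-not-aware pair-wise measures from Part I (Jaccard coefficient and cosine similarity over clickthrough) can be sketched as follows. The URLs and click counts are made up for illustration; they are chosen so that the Apple-company intent dominates the clicks for "apple", which reproduces the "biased by popular intent" problem named on the slide.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (e.g. of clicked-URL sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse click-count vectors {url: clicks}."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical click counts: most "apple" clicks go to the company site.
apple = {"apple.com": 8, "wiki/Apple": 2}
apple_store = {"apple.com": 5, "store.apple.com": 5}
apple_tree = {"wiki/Apple": 3, "orchard.example": 4}

print(round(jaccard(set(apple), set(apple_store)), 2))  # 0.33
print(round(jaccard(set(apple), set(apple_tree)), 2))   # 0.33
print(round(cosine(apple, apple_store), 2))             # 0.69
print(round(cosine(apple, apple_tree), 2))              # 0.15
```

Each measure returns a single score per pair, so both intents of "apple" are mixed into one number: cosine makes "apple tree" look far less related than "apple store" only because the fruit intent is the less popular one in the clicks.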
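Step A (the regularized topic model) can be sketched as PLSI over one virtual document per query, built from its top result snippets, with topics read as search intents. The slides only name the ingredients (PLSI log-likelihood plus a co-click regularizer), so the regularization below is an assumption: after each EM iteration, each query's intent distribution P(s_k | q_i) is pulled toward the co-click-weighted average of its neighbours, in the spirit of graph-regularized PLSA variants, and not necessarily the optimizer used in the paper. The helper names `coclick_matrix` and `regularized_plsi`, the parameter `lam`, and the toy data are all hypothetical.

```python
import random


def _normalize(values):
    """Rescale non-negative numbers to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def coclick_matrix(clicks):
    """Sparse co-click weights: coclick[i][j] = number of URLs clicked for both q_i and q_j."""
    n = len(clicks)
    return [{j: len(clicks[i] & clicks[j]) for j in range(n)
             if j != i and clicks[i] & clicks[j]} for i in range(n)]


def regularized_plsi(docs, coclick, n_intents, iters=50, lam=0.3, seed=0):
    """PLSI on per-query virtual documents plus an assumed co-click smoothing step.

    docs[i]: {word: count} from the top result snippets of query q_i
    coclick[i]: {j: weight} as returned by coclick_matrix
    Returns p_z_q[i][k] = P(s_k | q_i) and p_w_z[k][w] = P(w | s_k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    p_z_q = [_normalize([rng.random() for _ in range(n_intents)]) for _ in docs]
    p_w_z = [dict(zip(vocab, _normalize([rng.random() for _ in vocab])))
             for _ in range(n_intents)]
    for _ in range(iters):
        acc_zq = [[0.0] * n_intents for _ in docs]
        acc_wz = [dict.fromkeys(vocab, 0.0) for _ in range(n_intents)]
        for i, doc in enumerate(docs):
            for w, c in doc.items():
                # E-step: P(s_k | q_i, w) is proportional to P(s_k | q_i) * P(w | s_k)
                post = _normalize([p_z_q[i][k] * p_w_z[k][w] for k in range(n_intents)])
                for k in range(n_intents):
                    acc_zq[i][k] += c * post[k]
                    acc_wz[k][w] += c * post[k]
        # M-step: re-estimate both distributions from the expected counts
        p_z_q = [_normalize(row) for row in acc_zq]
        p_w_z = [dict(zip(vocab, _normalize([acc[w] for w in vocab]))) for acc in acc_wz]
        # Assumed regularizer: move P(s | q_i) toward the co-click-weighted
        # average of its neighbours, so co-clicked queries share intents.
        smoothed = []
        for i, row in enumerate(p_z_q):
            total = sum(coclick[i].values())
            if total == 0:
                smoothed.append(row)
                continue
            avg = [sum(wt * p_z_q[j][k] for j, wt in coclick[i].items()) / total
                   for k in range(n_intents)]
            smoothed.append([(1 - lam) * row[k] + lam * avg[k] for k in range(n_intents)])
        p_z_q = smoothed
    return p_z_q, p_w_z


# Hypothetical toy data: two "company" queries and two "fruit" queries.
queries = ["apple store", "apple iphone", "apple tree", "apple orchard"]
clicks = [{"apple.com", "store.apple.com"}, {"apple.com", "store.apple.com"},
          {"wiki/Apple_tree"}, {"wiki/Apple_tree", "orchard.example"}]
docs = [{"iphone": 3, "mac": 2, "buy": 1}, {"iphone": 4, "mac": 1},
        {"fruit": 3, "orchard": 2}, {"orchard": 3, "fruit": 2}]
p_z_q, p_w_z = regularized_plsi(docs, coclick_matrix(clicks), n_intents=2)
for q, row in zip(queries, p_z_q):
    print(q, [round(p, 2) for p in row])
```

Snippets drive the likelihood (recall), while the co-click term only redistributes intent mass among queries that users actually treated alike (precision), mirroring the pro/con split of the two data sources on the slides.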
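The pair-wise variant of step B can be sketched as below, assuming a PLSI-style model already provides P(s_k | q) for each query and P(w | s_k) for each intent. Each word occurrence is split across intents by P(s_k | q, w) ∝ P(s_k | q) · P(w | s_k), which gives one word vector per intent; cosine similarity is then applied intent by intent. The exact weighting used in the CIKM11 paper is not preserved in these slides, so this is an illustration under that assumption; the function names and all numbers are hypothetical.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {word: weight}."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def intent_vectors(doc, p_z, p_w_z):
    """Split a query's word counts into one vector per search intent.

    Each occurrence of w is shared among intents in proportion to
    P(s_k | q) * P(w | s_k), i.e. its expected intent distribution.
    """
    n_intents = len(p_z)
    vecs = [{} for _ in range(n_intents)]
    for w, count in doc.items():
        joint = [p_z[k] * p_w_z[k].get(w, 0.0) for k in range(n_intents)]
        total = sum(joint)
        for k in range(n_intents):
            if joint[k] > 0:
                vecs[k][w] = count * joint[k] / total
    return vecs


def intent_aware_similarity(doc_i, p_z_i, doc_j, p_z_j, p_w_z):
    """One cosine similarity per intent k between queries q_i and q_j."""
    vi = intent_vectors(doc_i, p_z_i, p_w_z)
    vj = intent_vectors(doc_j, p_z_j, p_w_z)
    return [cosine(a, b) for a, b in zip(vi, vj)]


# Hypothetical model: intent 0 = Apple company, intent 1 = apple fruit.
p_w_z = [{"apple": 0.2, "iphone": 0.4, "store": 0.4},
         {"apple": 0.2, "tree": 0.4, "fruit": 0.4}]
apple = ({"apple": 2, "iphone": 2, "fruit": 1}, [0.6, 0.4])
apple_store = ({"apple": 1, "store": 2, "iphone": 1}, [1.0, 0.0])
apple_tree = ({"apple": 1, "tree": 2, "fruit": 1}, [0.0, 1.0])

print(intent_aware_similarity(*apple, *apple_store, p_w_z))  # about [0.56, 0.0]
print(intent_aware_similarity(*apple, *apple_tree, p_w_z))   # about [0.0, 0.57]
```

Unlike a single mixed score, "apple" is now comparably similar to "apple store" under the company intent and to "apple tree" under the fruit intent, with no cross-intent leakage.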
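For the graph-based variant of step B, the sketch below builds the query graph with Jaccard edge weights over clicked-URL sets, then an intent-k graph that scales edge (i, j) by P(s_k | q_i) · P(s_k | q_j). That scaling is our own simple reading of "the probability that an edge will be generated between q_i with intent s_k and q_j with intent s_k", not the paper's formula. It also swaps in random walk with restart (one of the graph-based measures listed in Part I) for the spectral embedding on the slide, to show how restricting propagation to one intent keeps "apple tree" from reaching "apple store". All names and data are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient of two clicked-URL sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_graph(clicks):
    """Adjacency matrix of the query graph; edge weight = Jaccard of clicked URLs."""
    n = len(clicks)
    return [[0.0 if i == j else jaccard(clicks[i], clicks[j]) for j in range(n)]
            for i in range(n)]


def intent_graph(W, p_z_q, k):
    """Assumed intent-k graph: scale edge (i, j) by P(s_k | q_i) * P(s_k | q_j)."""
    n = len(W)
    return [[W[i][j] * p_z_q[i][k] * p_z_q[j][k] for j in range(n)] for i in range(n)]


def random_walk_with_restart(W, start, restart=0.15, iters=100):
    """Visiting probabilities of a walk that jumps back to `start` with prob. `restart`."""
    n = len(W)
    out = [sum(row) for row in W]
    r = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            if out[i] == 0:
                # Dangling node: send its walking mass back to the start node.
                nxt[start] += (1 - restart) * r[i]
                continue
            for j in range(n):
                nxt[j] += (1 - restart) * r[i] * W[i][j] / out[i]
        nxt[start] += restart
        r = nxt
    return r


# Hypothetical queries: 0 "apple", 1 "apple store", 2 "apple tree", 3 "iphone"
clicks = [{"apple.com", "wiki/Apple"}, {"apple.com", "store.apple.com"},
          {"wiki/Apple", "orchard.example"}, {"apple.com", "store.apple.com"}]
p_z_q = [[0.6, 0.4], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # intents: company, fruit

W = query_graph(clicks)
W_fruit = intent_graph(W, p_z_q, k=1)
print(random_walk_with_restart(W, start=2)[1] > 0)         # True: leaks into "apple store"
print(random_walk_with_restart(W_fruit, start=2)[1] == 0)  # True: stays within the fruit intent
```

On the plain graph, the walk from "apple tree" reaches "apple store" through the ambiguous "apple" node; on the fruit-intent graph those cross-intent edges have zero weight, so the same walk never leaves the fruit intent.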
