Information Filtering (IF): A Survey

Wang Bin, Software Division, Institute of Computing Technology, Chinese Academy of Sciences, 2001.12.10

Outline
- Basic concepts of IF
- Classification of IF systems
- Components of IF systems
- Evaluation of IF systems
- Current status and trends of IF

1. Basic Concepts

Definition
- IF: selecting, from a dynamic information stream, the information that matches a user's interests. A user's interests usually stay the same over a fairly long period (they are static).
- Related terms:
  - Selective Dissemination of Information (SDI), from the library field
  - Routing, from Message Understanding
  - Current Awareness, Data Mining

IF vs. IR / Categorization / IE
- IF & IR: broadly speaking, IF is a part of IR.
  - IF: the database is dynamic and the information need is static; IR: the database is static and the information need is dynamic (ad hoc queries).
  - User profile vs. query
  - In IF the user needs some understanding of the system; in IR this is not required.
  - IF raises social issues such as user modeling and personal privacy.
- IF & Categorization: the categories in categorization rarely change, whereas a user profile changes dynamically.
- IF & IE: IF is concerned with relevance; IE only cares about the parts to be extracted, regardless of relevance.

IF applications
- Internet search results filter
- Personal email filter
- List server/newsgroup filter
- Browser filter
- Filter for children
- Filter for customers: recommendation

2. Classification of IF Systems

[Figure: IF classification diagram]

Initiative of operation
- Active IF systems
  - Collect relevant information and send it to users
  - Push to users
  - Risk of information overload, so an accurate user profile is needed
- Passive IF systems
  - Do not collect information for users
  - e.g., email or Usenet news

Location of operation
- At the information source
  - Users post their profiles to the information provider
  - Clipping service
  - Usually requires a fee
- At a filtering server
  - The information provider sends information to the server
  - The server distributes the information to users
- At the user site
  - Local filtering system
  - e.g., Outlook, Netscape Email, Foxmail

Filtering approach
- Cognitive filtering
  - Content-based filtering
  - Matches document content against user profiles
- Sociological filtering
  - Collaborative filtering, or properties-based filtering
  - Based on similarity between users
  - Recommendation systems
  - User modeling & user clustering
  - Complements content-based systems

Methods of acquiring knowledge about users
- Explicit approach
  - User interrogation
  - Filling in forms
- Implicit approach
  - Recording user behavior: time spent, frequency, context, activity (save/discard/print/browse/click), etc.
- Explicit & implicit approach
  - Document space (case-based)
  - Stereotypic inference (start from a predefined default profile, then change it while the user scans documents)

3. Components of IF Systems

General structure
[Figure: general architecture of an IF system. Components: (a) Data-Analyzer Component, (b) Filtering Component, (c) User-Model Component, (d) Learning Component, together with the Information Provider and the User. Flow labels: data items, represented data items, personal details, user profile, relevant data items, feedback, updates.]

Data-analyzer component
- Sits close to the information provider
- Obtains or collects data from the information provider
- Analyzes and represents documents (e.g., Boolean model, VSM)
- Passes the representation to the filtering component

User-model component
- Gathers information about users (explicitly and/or implicitly)
- Constructs the user profiles or other user models (rules, VSM, document centers)
- Passes the user models to the filtering component
- User models must be compatible with the document representation

Filtering component
- The heart of the IF system
- Matches the user profiles against the represented data items
- The decision may be binary or probabilistic (ordered by rank)
- The user can judge the relevance of the selected items
- This relevance information can be sent to the learning component (feedback)

Learning component
- Improves subsequent filtering
- Detects shifts in users' interests
- Updates the user model

Two concepts used in IF systems
- Systems based on the statistical concept
- Systems based on the knowledge-based concept

Statistical concept
- User-model component: the profile is a weighted vector of index terms (e.g., VSM, LSI)
- Filtering component: correlation, cosine measure; Robertson & Sparck Jones formula (PRM); (naive) Bayesian classifier
- Learning component: feedback, query reconstruction (e.g., Rocchio)

Knowledge-based concept
- Rule-based and semantic-net filtering systems
  - Rules (if ... then take action); obsolescence problem
  - User profile represented by a semantic net (WordNet)
- Neural-network filtering systems
- Genetic-based filtering systems

User modeling for IF systems
- Acquisition of the data for the model
  - Implicit approach: observation of user behavior
  - Explicit approach: filling in forms, interaction (feedback)
- Data included in the model
  - Shallow semantics: keywords
  - Enhanced user model: high-level knowledge about the user (background, past experience)
  - Semantic networks / stereotypic inference / statistical inference on the relationships between words in documents
- Underlying architecture
  - Agents/neural networks for automatically inferred models
  - VSM/LSI for explicit inference
  - Concept model for intelligent systems
  - Keyword system for statistically based systems

Learning in IF systems
- Methods of learning
  - Learning by observation
  - Learning by feedback
  - User-training learning
- Frequency of learning
  - Critical learning
  - Periodic learning

4. Evaluation of IF Systems

Methods & Measures

Evaluation methods of IF systems
- Evaluation by experiments
- Evaluation by simulation, e.g., TREC
- Analytical evaluation

Measures of evaluation of IF systems
- Simple precision & recall
- Statistical measurements
  - Correlation (user evaluation vs. system evaluation): rank vector
- Set-based measurements
  - Utility = A·R+ + B·N+ + C·R- + D·N-, then normalized, where R+ / N+ are the relevant / non-relevant documents retrieved, R- / N- are the relevant / non-relevant documents not retrieved, and A, B, C, D are weights
  - ASP (average set precision) = P × R; if P or R is 0, ASP is 0 whatever the other value is, so ASP is not suitable in that case
- User-oriented measures
  - Coverage ratio = |Rk| / |U| = |A ∩ U| / |U|, where U is the set of relevant documents known to the user, A is the set of retrieved documents, and Rk = A ∩ U is the set of retrieved documents the user already knew
  - Novelty = |Ru| / (|Ru| + |Rk|), where Ru is the set of retrieved relevant documents previously unknown to the user

5. Current Status and Trends of IF

Current situation
- IF systems are indispensable
- But IF systems are unreliable
  - The relevance rate of commercial IF systems is about 50%
  - Results of the TREC experiments are poor
  - Users prefer to read non-relevant information rather than risk losing important information
- Much remains to be done to improve the effectiveness of IF systems

User modeling
- Integrate several methods to model users (not only keywords, but also user properties and other parameters)
- Profile updating & update timing
- Include a learning module
- Query formulation and tracking its changes over time

Filtering techniques
- Goal: get more relevant documents, even at the cost of some non-relevant ones
- Combine several methods
- Research directions:
  - Intelligent agents: decentralized, trust-based, able to evolve, compete & collaborate
  - Visualization techniques: maps
  - Multiple implicit sources of user-behavior data: open profiling standard
  - Filtering of multimedia repositories: VOD, …
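
The statistical-concept filtering component above (a profile stored as a weighted vector of index terms and matched against each incoming document with the cosine measure) can be illustrated with a minimal Python sketch. The function names `tf_vector`, `cosine` and `filter_stream`, the whitespace tokenizer, the raw term-frequency weights and the 0.2 threshold are hypothetical choices for illustration, not taken from the survey.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector; whitespace tokenization is a simplifying assumption."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine measure between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_stream(profile, stream, threshold=0.2):
    """Binary decision: pass each incoming document whose score reaches the (hypothetical) threshold."""
    for doc in stream:
        score = cosine(profile, tf_vector(doc))
        if score >= threshold:
            yield doc, score


# Hypothetical profile: a weighted vector of index terms.
profile = {"filtering": 1.0, "profile": 0.5}
incoming = ["Information filtering with a user profile", "Stock market news today"]
for doc, score in filter_stream(profile, incoming):
    print(f"{score:.2f}  {doc}")
```

Yielding only the documents above the threshold gives the binary decision mentioned for the filtering component; sorting all scores instead of thresholding them would give the rank-ordered variant.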
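
For the Robertson & Sparck Jones formula (PRM) listed under the filtering component, one commonly used form is the term relevance weight w = log[((r + 0.5) / (R - r + 0.5)) / ((n - r + 0.5) / (N - n - R + r + 0.5))], with the counts defined in the docstring below; a document's score is the sum of the weights of the profile terms it contains. This sketch assumes the counts come from the user's relevance feedback on documents filtered so far; `rsj_weight`, `score` and the example counts are hypothetical.

```python
import math


def rsj_weight(N, n, R, r):
    """Robertson/Sparck Jones relevance weight of one term, with 0.5 added to each count.

    N: documents judged so far, n: judged documents containing the term,
    R: documents judged relevant, r: relevant documents containing the term.
    """
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))


def score(doc_terms, weights):
    """Sum the weights of the profile terms that occur in the document."""
    return sum(w for t, w in weights.items() if t in doc_terms)


# Hypothetical feedback counts over 100 judged documents.
weights = {
    "filtering": rsj_weight(N=100, n=10, R=10, r=9),  # concentrated in relevant docs
    "system": rsj_weight(N=100, n=50, R=10, r=5),     # spread evenly, so weight 0
}
print(score({"adaptive", "filtering", "system"}, weights))
```

A term spread evenly between relevant and non-relevant documents gets a weight of zero, so it no longer influences the filtering decision.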
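
The (naive) Bayesian classifier named for the filtering component treats filtering as a two-class decision: relevant or non-relevant to the profile. A minimal multinomial naive Bayes sketch with add-one smoothing, assuming whitespace tokenization and training documents taken from user judgments; the class `NaiveBayesFilter` and the sample texts are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes: 'relevant' vs. 'nonrelevant'."""

    LABELS = ("relevant", "nonrelevant")

    def __init__(self):
        self.doc_counts = Counter()
        self.term_counts = {label: Counter() for label in self.LABELS}
        self.vocab = set()

    def train(self, text, label):
        """Add one judged document (e.g. from user feedback) to the model."""
        terms = text.lower().split()
        self.doc_counts[label] += 1
        self.term_counts[label].update(terms)
        self.vocab.update(terms)

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text); assumes every label has training documents."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.term_counts[label]
        denom = sum(counts.values()) + len(self.vocab)  # Laplace (add-one) smoothing
        for term in text.lower().split():
            score += math.log((counts[term] + 1) / denom)
        return score

    def is_relevant(self, text):
        """Binary filtering decision: pass the document if 'relevant' is more probable."""
        return self.log_posterior(text, "relevant") > self.log_posterior(text, "nonrelevant")


nb = NaiveBayesFilter()
nb.train("information filtering user profile", "relevant")
nb.train("adaptive filtering feedback", "relevant")
nb.train("football match score", "nonrelevant")
nb.train("weather forecast rain", "nonrelevant")
print(nb.is_relevant("filtering profile"))
```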
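
The learning component's query reconstruction "such as Rocchio" moves the profile vector toward documents the user judged relevant and away from those judged non-relevant: q' = α·q + β·centroid(relevant) - γ·centroid(non-relevant). In the sketch below, α = 1.0, β = 0.75, γ = 0.15 are illustrative values, and dropping terms whose weight becomes non-positive is an assumed convention; `rocchio_update` and the sample vectors are hypothetical.

```python
from collections import defaultdict


def rocchio_update(profile, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a new profile vector moved toward relevant and away from non-relevant docs."""
    new = defaultdict(float)
    for term, w in profile.items():
        new[term] += alpha * w
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                # Each judged set contributes coeff times its centroid.
                new[term] += coeff * w / len(docs)
    # Drop terms whose weight is no longer positive (an assumed convention).
    return {term: w for term, w in new.items() if w > 0}


profile = {"filtering": 1.0}
liked = [{"filtering": 1.0, "adaptive": 1.0}]
disliked = [{"spam": 1.0}]
print(rocchio_update(profile, liked, disliked))
```

Running this update periodically, or whenever new feedback arrives, corresponds to the periodic and critical learning frequencies listed in the outline.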
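
Sociological (collaborative) filtering relies on similarity between users rather than on document content. Here is a minimal user-based sketch. It assumes ratings on a 1 to 5 scale stored as nested dicts, uses cosine similarity over the items both users rated, and predicts a rating as the similarity-weighted average of other users' ratings. `similarity`, `predict` and the sample users are hypothetical.

```python
import math


def similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item, or None."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = similarity(ratings[user], theirs)
        num += s * theirs[item]
        den += abs(s)
    return num / den if den else None


ratings = {
    "alice": {"x": 5, "y": 3},
    "bob": {"x": 5, "y": 3, "z": 4},
    "carol": {"x": 1, "y": 5, "z": 2},
}
print(predict(ratings, "alice", "z"))
```

Because alice's ratings match bob's exactly, the prediction for item z lands closer to bob's rating than to carol's. This illustrates how the similarity between users drives the recommendation.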
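
The set-based and user-oriented measures in the evaluation section can be computed directly from document-id sets. In the sketch below, `known` plays the role of U and must be a subset of `relevant`. The utility coefficients (3, -2, 0, 0) are arbitrary example values, and the normalization step the outline mentions is not shown. `evaluate` and the sample sets are hypothetical.

```python
def evaluate(retrieved, relevant, known, collection_size, coeffs=(3, -2, 0, 0)):
    """Set-based and user-oriented measures; set arguments hold document ids."""
    A, B, C, D = coeffs
    r_plus = len(retrieved & relevant)    # R+: relevant, retrieved
    n_plus = len(retrieved - relevant)    # N+: non-relevant, retrieved
    r_minus = len(relevant - retrieved)   # R-: relevant, not retrieved
    n_minus = collection_size - r_plus - n_plus - r_minus  # N-: non-relevant, not retrieved
    precision = r_plus / len(retrieved) if retrieved else 0.0
    recall = r_plus / len(relevant) if relevant else 0.0
    rk = retrieved & known                # Rk = A ∩ U: retrieved docs the user already knew
    ru = (retrieved & relevant) - known   # Ru: retrieved relevant docs new to the user
    return {
        "precision": precision,
        "recall": recall,
        "utility": A * r_plus + B * n_plus + C * r_minus + D * n_minus,
        "asp": precision * recall,
        "coverage": len(rk) / len(known) if known else 0.0,
        "novelty": len(ru) / (len(ru) + len(rk)) if (ru or rk) else 0.0,
    }


m = evaluate(retrieved={1, 2, 5}, relevant={1, 2, 3, 4}, known={1, 3}, collection_size=10)
print(m)
```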
