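Information Filtering (IF): A Survey
Wang Bin, Software Laboratory, Institute of Computing Technology, Chinese Academy of Sciences
2001.12.10

Outline
- Basic concepts of IF
- Classification of IF systems
- Components of an IF system
- Evaluation of IF systems
- Current state and trends of IF

I. Basic Concepts

Definition
- IF: selecting, from a dynamic information stream, the items that satisfy a user's interests. The user's interests generally do not change over a fairly long period (static).
- Selective Dissemination of Information (SDI), from the library field.
- Routing, from Message Understanding.
- Current Awareness, Data Mining

IF vs IR / Categorization / IE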

IF & IR
- Broadly speaking, IF is a part of IR.
- IF: the database is dynamic and the need is static. IR: the database is static and the need is dynamic.
- User Profile (IF) vs Query (IR)
- IF users need some understanding of the system; IR users do not.
- IF involves user modeling and social issues such as personal privacy.

IF & Categorization
- The categories in Categorization do not change often. By comparison, a User Profile changes dynamically.

IF & IE
- IF is concerned with relevance; IE is concerned only with the parts to be extracted, regardless of relevance.

IF applications
- Internet Search Results Filter
- Personal Email Filter
- List Server/Newsgroup Filter
- Browser Filter
- Filter for children
- Filter for customers: recommendation

II. Classification of IF Systems

IF classification diagram

Initiative of operation
- Active IF systems
  - Collect and send relevant info to users
  - Push to users
  - Risk of info overload, so an accurate user profile is needed
- Passive IF systems
  - Do not collect info for users
  - Email or Usenet news

Location of operation
- At the info source
  - Post profiles to the info provider
  - Clipping service
  - Usually a fee is paid
- At a filtering server
  - Info provider sends info to the server
  - Server distributes info to users
- At the user site
  - Local filtering system
  - Such as Outlook, Netscape Email, and Foxmail

Filtering approach
- Cognitive filtering
  - Content-based filtering
  - Document content vs user profiles
- Sociological filtering
  - Collaborative filtering, or properties-based filtering
  - Similarity between users
  - Recommendation systems
  - User modeling & user clustering
  - Complement for content-based systems

Methods of acquiring knowledge about users
- Explicit approach
  - User interrogation
  - Filling in forms
- Implicit approach
  - Recording user behavior
  - Time/times/context/activity (save/discard/print/browsing/click), etc.
- Explicit & implicit approach
  - Document space (case-based)
  - Stereotypic inference (predefined default profile, then change during scanning)

III. Components of an IF System

General structure
[Diagram] Components and data flows:
- Information Provider -> data items -> (a) Data Analyzer Component
- (a) Data Analyzer Component -> represented data items -> (b) Filtering Component
- User -> personal details -> (c) User-Model Component
- (c) User-Model Component -> user profile -> (b) Filtering Component
- (b) Filtering Component -> relevant data items -> User
- User -> feedback -> (d) Learning Component
- (d) Learning Component -> updates -> (c) User-Model Component

Data-analyzer component
- Is close to the info provider
- Obtains or collects data from the info provider
- Analyzes & represents documents (such as Boolean Model, VSM, etc.)
- Passes the representation to the filtering component

User-model component
- Gathers info about users (explicitly and/or implicitly)
- Constructs the user profiles or other user models (rules, VSM, document centers)
- Passes the user models to the filtering component
- User models must be suitable for the document representation

Filtering component
- The heart of the IF system
- Matches the user profiles with the represented data items
- The decision may be binary or probabilistic (ordered by rank)
- The relevance of the selected items can be determined by the user
- The relevance info can be sent to the learning component (feedback info)

Learning component
- Serves to improve further filtering
- Detects shifts in users' interests
- Updates the user model

Two concepts used in IF systems
- Systems based on the statistical concept
- Systems based on the knowledge-based concept

Statistical concept
- User-model component: the profile is a weighted vector of index terms (such as VSM, LSI)
- Filtering component
  - Correlation, cosine measure
  - Robertson & Sparck-Jones formula (PRM)
  - (Naive) Bayesian classifier
- Learning component
  - Feedback,
    query reconstruction (such as Rocchio)

Knowledge-based concept
- Rule-based and semantic-net filtering systems
  - Rules (if ... then take action), obsolescence problem
  - User profile represented by a semantic net (WordNet)
- Neural-network filtering systems
- Genetic-based filtering systems

User modeling for IF systems
- Acquisition of the data for the model
  - Implicit approach: observation of user behavior
  - Explicit approach: fill in forms, interact (feedback)
- Data included in the model
  - Shallow semantics: keywords
  - Enhanced user model, high-level knowledge about the user (background, past experience)
  - Semantic networks / stereotypic inference / statistical inference on the relationship between words in docs
- Underlying architecture
  - Agent/neural networks for an auto-inferred model
  - VSM/LSI for explicit inference
  - Concept model for intelligent systems
  - Keyword system for statistically-based systems

Learning in IF systems
- Methods of learning
  - Learning by observation
  - Learning by feedback
  - User-training learning
- Frequency of learning
  - Critical learning
  - Periodic learning

IV. Evaluation of IF Systems

Methods & Measures

Evaluation methods of IF systems
- Evaluation by experiments
- Evaluation by simulation, such as TREC
- Analytical evaluation

Measures of evaluation of IF systems
- Simple precision & recall
- Statistical measurements
  - Correlation (user evaluation vs. system evaluation): rank vector
- Set-based measurements
  - Utility = (A*R+) + (B*N+) + (C*R-) + (D*N-), normalized
  - ASP (average set precision) = P*R; ASP is not suitable if P or R = 0
- User-oriented measures
  - Coverage ratio = |Rk|/|U| = |A∩U|/|U|, where Rk is the set of retrieved relevant documents already known to the user
  - Novelty = |Ru|/(|Ru|+|Rk|)

V. Current State and Trends of IF

Current situation
- IF systems are indispensable
- But IF systems are unreliable
  - The relevance achieved by commercial IF systems is about 50%
  - Results of the TREC experiments are poor
  - Users prefer to read non-relevant info for fear of losing important info
- Much remains to be done to improve the effectiveness of IF systems

User modeling
- Integrate several methods to model the users (not only keywords, but also properties of users and other parameters)
- Profile updating & updating time
- Include a learning module
- Query formulation and tracking of query changes over time

Filtering techniques
- Goal: get more relevant docs, even at the cost of some non-relevant docs
- Combining several methods
- Research directions:
  - Intelligent agents: decentralized, based on trust, evolve, compete & collaborate
  - Visualization techniques: map
  - Variety of multiple implicit resources on user behavior: open profiling standard
  - Filtering of multimedia repositories: VOD, not t
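
Example: content-based filtering with the cosine measure. The slides name VSM for the profile and document representation and the cosine measure for the filtering component, but give no implementation. The following is a minimal sketch under assumptions that are not in the slides: raw term frequencies as weights, whitespace tokenization, and a hypothetical threshold of 0.2 for the binary decision. The names term_vector, cosine, and filter_stream are illustrative.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a raw term-frequency vector (a very simple VSM)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors (dict: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)


def filter_stream(profile, stream, threshold=0.2):
    """Yield (score, doc) for each incoming doc whose score reaches the threshold.

    The profile stays fixed while documents arrive one by one, which mirrors
    the static-need / dynamic-stream setting of IF. The threshold value is an
    assumption chosen for illustration.
    """
    for doc in stream:
        score = cosine(profile, term_vector(doc))
        if score >= threshold:
            yield score, doc


if __name__ == "__main__":
    profile = term_vector("information filtering user profile")
    stream = ["user profile for information filtering", "cooking pasta tonight"]
    for score, doc in filter_stream(profile, stream):
        print(round(score, 3), doc)
```

Sorting the yielded pairs by score instead of thresholding would give the ranked (probabilistic) variant of the decision that the filtering-component slide mentions.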
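
Example: profile learning by feedback (Rocchio). The slides list Rocchio as an example of query reconstruction in the learning component without stating the formula. A sketch of the standard Rocchio update, new = alpha * profile + beta * mean(relevant) - gamma * mean(non-relevant), follows. The coefficient defaults (1.0, 0.75, 0.15) and the choice to drop terms whose weight becomes non-positive are common conventions assumed here, not values taken from the slides; rocchio_update is an illustrative name.

```python
from collections import defaultdict


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated profile vector from user relevance feedback.

    profile:       dict term -> weight (the current user profile)
    relevant:      list of document vectors the user judged relevant
    non_relevant:  list of document vectors the user judged non-relevant
    alpha, beta, gamma: assumed coefficients, chosen for illustration
    """
    updated = defaultdict(float)
    for term, weight in profile.items():
        updated[term] += alpha * weight
    # Add the mean relevant vector and subtract the mean non-relevant vector.
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)
    # Assumed convention: keep only terms with a positive weight.
    return {term: w for term, w in updated.items() if w > 0}


if __name__ == "__main__":
    profile = {"filtering": 1.0}
    relevant = [{"filtering": 1.0, "profile": 2.0}]
    non_relevant = [{"sports": 1.0}]
    print(rocchio_update(profile, relevant, non_relevant))
```

Calling the update after every judged item or once per batch corresponds to the "critical" versus "periodic" learning frequencies named in the slides; which one suits a system is a design choice the slides leave open.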
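
Example: computing the evaluation measures. The formulas for Utility, ASP, coverage ratio, and novelty in section IV can be worked through on small sets. The sketch below assumes the usual reading of the symbols, which the slides do not spell out: R+ and N+ are the relevant and non-relevant documents that were retrieved, R- and N- are those that were not retrieved, A is the answer set, U is the set of relevant documents known to the user, Rk = A∩U, and Ru is the set of retrieved relevant documents the user did not know. All function names and the sample data are illustrative.

```python
def utility(r_plus, n_plus, r_minus, n_minus, a, b, c, d):
    """Linear utility: (A*R+) + (B*N+) + (C*R-) + (D*N-), before normalization."""
    return a * r_plus + b * n_plus + c * r_minus + d * n_minus


def precision_recall(answer, relevant):
    """Set-based precision and recall of an answer set."""
    hits = len(answer & relevant)
    precision = hits / len(answer) if answer else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_set_precision(precision, recall):
    """ASP = P * R, as defined in the slides (uninformative when P or R is 0)."""
    return precision * recall


def coverage_ratio(answer, known_relevant):
    """|A ∩ U| / |U|: share of the user's known relevant docs that were retrieved."""
    if not known_relevant:
        return 0.0
    return len(answer & known_relevant) / len(known_relevant)


def novelty(answer, relevant, known_relevant):
    """|Ru| / (|Ru| + |Rk|): share of retrieved relevant docs new to the user."""
    r_k = answer & known_relevant
    r_u = (answer & relevant) - known_relevant
    total = len(r_u) + len(r_k)
    return len(r_u) / total if total else 0.0


if __name__ == "__main__":
    answer = {1, 2, 3, 4}          # documents the filter delivered
    relevant = {1, 2, 3, 5, 6}     # all relevant documents
    known = {1, 2, 5}              # relevant documents the user already knew
    p, r = precision_recall(answer, relevant)
    print(p, r, average_set_precision(p, r))
    print(coverage_ratio(answer, known), novelty(answer, relevant, known))
    # Coefficients below are arbitrary illustration values.
    print(utility(3, 1, 2, 0, a=3, b=-2, c=0, d=0))
```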
