




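A Survey of Information Filtering (IF)
Wang Bin, Software Lab, Institute of Computing Technology, Chinese Academy of Sciences
2001.12.10

Outline
- Basic concepts of IF
- Classification of IF systems
- Components of IF systems
- Evaluation of IF systems
- Current state and trends of IF

I. Basic concepts

Definition
- IF: selecting, from a dynamic information stream, the information that matches a user's interests; the user's interests generally do not change over a fairly long period (they are static).
- Related terms:
  - Selective Dissemination of Information (SDI), from the library field
  - Routing, from Message Understanding
  - Current Awareness, Data Mining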
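
IF vs. IR / Categorization / IE
- IF & IR: broadly speaking, IF is a part of IR
  - IF: the database is dynamic and the information need is static; IR: the database is static and the information need is dynamic
  - User profile vs. query
  - IF users need some understanding of the system; IR users do not
  - IF involves social issues such as user modeling and personal privacy
- IF & Categorization
  - The categories in categorization do not change often; a user profile, by comparison, changes dynamically
- IF & IE
  - IF is concerned with relevance; IE is concerned only with the parts to be extracted, regardless of relevance

IF applications
- Internet search results filter
- Personal email filter
- List server / newsgroup filter
- Browser filter
- Filter for children
- Filter for customers: recommendation

II. Classification of IF systems

[Figure: IF classification diagram]

Initiative of operation
- Active IF systems
  - Collect relevant information and send it to users
  - Push to users
  - Can cause information overload, so an accurate user profile is needed
- Passive IF systems
  - Do not collect information for users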
  - Examples: email or Usenet news

Location of operation
- At the information source
  - Users post their profiles to the information provider
  - Clipping service
  - Usually for a fee
- At a filtering server
  - Information providers send information to the server
  - The server distributes information to users
- At the user site
  - Local filtering system
  - Such as Outlook, Netscape Email, and Foxmail

Filtering approach
- Cognitive filtering
  - Content-based filtering
  - Document content vs. user profiles
- Sociological filtering
  - Collaborative filtering, or properties-based filtering
  - Similarity between users
  - Recommendation systems
  - User modeling & user clustering
  - Complement to content-based systems

Methods of acquiring knowledge about users
- Explicit approach
  - User interrogation
  - Filling in forms
- Implicit approach
  - Recording user behavior
  - Time / number of times / context / activity (save, discard, print, browse, click), etc.
- Explicit & implicit approach
  - Document space (case-based)
  - Stereotypic inference (a predefined default profile that then changes during scanning)

III. Components of IF systems

General composition

[Figure: general composition of an IF system. Components: (a) data-analyzer component, (b) filtering component, (c) user-model component, (d) learning component; external parties: information provider, user. Arrow labels: data items, represented data items, relevant data items, personal details, user profile, feedback, updates.]

Data-analyzer component
- Stays close to the information provider
- Obtains or collects data from the information provider
- Analyzes and represents documents (e.g., Boolean model, VSM)
- Passes the representation to the filtering component

User-model component
- Gathers information about users (explicitly and/or implicitly)
- Constructs the user profiles or other user models (rules, VSM, document centers)
- Passes the user models to the filtering component
- User models must be compatible with the document representation

Filtering component
- The heart of the IF system
- Matches the user profiles against the represented data items
- The decision may be binary or probabilistic (ordered by rank)
- The relevancy of the selected items can be determined by the user
- The relevancy information can be sent to the learning component (feedback information)

Learning component
- Improves further filtering
- Detects shifts in users' interests
- Updates the user model

Two concepts are used in IF systems: the statistical concept and the knowledge-based concept.
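
As one illustration of how the filtering and learning components described above might interact under the statistical concept, here is a minimal sketch. It assumes a bag-of-words term-frequency vector as a stand-in for a VSM representation, cosine similarity for matching, and a Rocchio-style update for feedback. The function names, the threshold, and the `alpha`/`beta`/`gamma` weights are hypothetical choices made for this example, not values taken from the slides.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent text as a sparse term-frequency vector (simple VSM stand-in)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_items(profile, items, threshold=0.2):
    """Filtering component: keep items scoring at or above the threshold
    (binary decision) and return them ordered by score (ranked decision)."""
    scored = [(cosine(profile, to_vector(item)), item) for item in items]
    passed = [pair for pair in scored if pair[0] >= threshold]
    return sorted(passed, key=lambda pair: pair[0], reverse=True)


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Learning component: Rocchio-style update that moves the profile toward
    items judged relevant and away from items judged non-relevant.
    The alpha/beta/gamma defaults are illustrative assumptions."""
    updated = Counter()
    for term, weight in profile.items():
        updated[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in to_vector(doc).items():
                updated[term] += coeff * weight / len(docs)
    # Keep only terms with positive weight.
    return {term: w for term, w in updated.items() if w > 0}


# Hypothetical profile and information stream.
profile = to_vector("information filtering user profile")
stream = [
    "a survey of information filtering systems",
    "football scores from last night",
    "building a user profile for filtering news",
]
selected = filter_items(profile, stream)

# The user marks one item relevant and one non-relevant (feedback).
new_profile = rocchio_update(profile, relevant=[stream[2]],
                             non_relevant=[stream[1]])
```

In this sketch the second stream item shares no terms with the profile and is filtered out, while the feedback step adds terms such as "news" to the profile. A real system would typically use weighted index terms (for example TF-IDF) rather than raw counts.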

Statistical concept
- User-model component: the profile is a weighted vector of index terms (e.g., VSM, LSI)
- Filtering component
  - Correlation, cosine measure
  - Robertson & Sparck Jones formula (PRM)
  - (Naive) Bayesian classifier
- Learning component
  - Feedback, query reconstruction (e.g., Rocchio)

Knowledge-based concept
- Rule-based and semantic-net filtering systems
  - Rules (if ... then take action); obsolescence problem
  - The user profile is represented by a semantic net (WordNet)
- Neural-network filtering systems
- Genetic-based filtering systems

User modeling for IF systems
- Acquisition of the data for the model
  - Implicit approach: observation of user behavior
  - Explicit approach: filling in forms, interaction (feedback)
- Data included in the model
  - Shallow semantics: keywords
  - Enhanced user model: high-level knowledge about the user (background, past experience)
- Semantic networks / stereotypic inference / statistical inference on the relationships between words in documents
- Underlying architecture
  - Agents / neural networks for automatically inferred models
  - VSM / LSI for explicit inference
  - Concept model for intelligent systems
  - Keyword system for statistically based systems

Learning in IF systems
- Methods of learning
  - Learning by observation
  - Learning by feedback
  - User-training learning
- Frequency of learning
  - Critical learning
  - Periodic learning

IV. Evaluation of IF systems

Methods & measures

Evaluation methods of IF systems
- Evaluation by experiments
- Evaluation by simulation, such as TREC
- Analytical evaluation

Measures of evaluation of IF systems
- Simple precision & recall
- Statistical measurements
  - Correlation (user evaluation vs. system evaluation): rank vector
- Set-based measurements
  - Utility = (A*R+) + (B*N+) + (C*R-) + (D*N-), normalized
  - ASP (average set precision) = P*R; ASP is not suitable if P or R = 0
- User-oriented measures
  - Coverage ratio = |Rk|/|U| = |A ∩ U|/|U|, where Rk is the set of retrieved documents known to the user
  - Novelty = |Ru|/(|Ru| + |Rk|)

V. Current state and trends of IF

Current situation
- IF systems are indispensable
- But IF systems are unreliable
  - The relevancy of commercial IF systems is about 50%
  - Results of the TREC experiments are poor
  - Users would rather read some non-relevant information than risk missing important information
- Much remains to be done to improve the effectiveness of IF systems

User modeling
- Integrate several methods to model users (not only keywords, but also user properties and other parameters)
- Profile updating & updating time
- Include a learning module
- Query formulation and tracking of query changes over time

Filtering techniques
- Goal: get more relevant documents, even at the cost of some non-relevant ones
- Combine several methods
- Research directions:
  - Intelligent agents: decentralized, based on trust; evolve, compete & collaborate
  - Visualization techniques: map
  - Variety of multiple implicit resources on user behavior: open profiling standard
  - Filtering of multimedia repositories: VOD,
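
The evaluation measures listed earlier (precision and recall, utility, ASP, coverage ratio, novelty) can be made concrete with a small worked sketch. The slides do not define the symbols in the utility formula, so this example assumes the common reading: R+ and N+ are the relevant and non-relevant items delivered, and R- and N- are the relevant and non-relevant items not delivered. It also assumes U is the set of relevant documents the user already knows about and Ru is the set of retrieved relevant documents the user did not know. The function names, coefficient defaults, and document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Simple precision and recall over sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_set_precision(precision, recall):
    """ASP = P * R (not informative when P or R is 0)."""
    return precision * recall


def utility(r_plus, n_plus, r_minus, n_minus, a=1.0, b=-1.0, c=0.0, d=0.0):
    """Linear utility A*R+ + B*N+ + C*R- + D*N-.
    The coefficient defaults are illustrative assumptions."""
    return a * r_plus + b * n_plus + c * r_minus + d * n_minus


def coverage_ratio(retrieved, known_relevant):
    """|A ∩ U| / |U|: share of the user's known relevant docs that were retrieved."""
    retrieved, known_relevant = set(retrieved), set(known_relevant)
    if not known_relevant:
        return 0.0
    return len(retrieved & known_relevant) / len(known_relevant)


def novelty_ratio(retrieved, relevant, known_relevant):
    """|Ru| / (|Ru| + |Rk|): share of retrieved relevant docs new to the user."""
    retrieved, relevant = set(retrieved), set(relevant)
    known_relevant = set(known_relevant)
    rk = retrieved & known_relevant
    ru = (retrieved & relevant) - known_relevant
    total = len(ru) + len(rk)
    return len(ru) / total if total else 0.0


# Hypothetical judgments for one user profile.
relevant = {"d1", "d2", "d3", "d5"}   # all relevant documents
retrieved = {"d1", "d2", "d3", "d4"}  # answer set A delivered by the filter
known = {"d1", "d5"}                  # U: relevant documents the user already knew

p, r = precision_recall(retrieved, relevant)
asp = average_set_precision(p, r)
u = utility(r_plus=3, n_plus=1, r_minus=1, n_minus=0)
cov = coverage_ratio(retrieved, known)
nov = novelty_ratio(retrieved, relevant, known)
```

Here three of the four delivered documents are relevant, so precision and recall are both 0.75; one of the two documents the user already knew was retrieved, and two of the three retrieved relevant documents were new to the user.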