[经济学]Applications and Trends in Data Mining.ppt_第1页
[经济学]Applications and Trends in Data Mining.ppt_第2页
[经济学]Applications and Trends in Data Mining.ppt_第3页
[经济学]Applications and Trends in Data Mining.ppt_第4页
[经济学]Applications and Trends in Data Mining.ppt_第5页
已阅读5页,还剩46页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

2019年5月3日星期五,Data Mining: Concepts and Techniques,1,Data Mining: Concepts and Techniques Chapter 11 Applications and Trends in Data Mining,楊尚儒,王國華 2006 Jiawei Han and Micheline Kamber. All rights reserved.,2019年5月3日星期五,Data Mining: Concepts and Techniques,2,2019年5月3日星期五,Data Mining: Concepts and Techniques,3,Applications and Trends in Data Mining,Data mining applications Data mining system products and research prototypes Additional themes on data mining Social impacts of data mining Trends in data mining Summary,2019年5月3日星期五,Data Mining: Concepts and Techniques,4,Data Mining Applications,Data mining is an interdisciplinary field with wide and diverse applications There exist nontrivial gaps between data mining principles and domain-specific applications Some application domains Financial data analysis Retail industry Telecommunication industry Biological data analysis,2019年5月3日星期五,Data Mining: Concepts and Techniques,5,Data Mining for Financial Data Analysis,Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality Design and construction of data warehouses for multidimensional data analysis and data mining View the debt and revenue changes by month, by region, by sector, and by other factors Access statistical information such as max, min, total, average, trend, etc. Loan payment prediction/consumer credit policy analysis feature selection and attribute relevance ranking Loan payment performance Consumer credit rating,2019年5月3日星期五,Data Mining: Concepts and Techniques,6,Financial Data Mining,Classification and clustering of customers for targeted marketing multidimensional segmentation by nearest-neighbor, classification, decision trees, etc. to identify customer groups or associate a new customer to an appropriate customer group Detection of money laundering and other financial crimes integration of from multiple DBs (e.g., bank transactions, federal/state crime history DBs) Tools: data visualization, linkage analysis, classification, clustering tools, outlier analysis, and sequential pattern analysis tools (find unusual access sequences),2019年5月3日星期五,Data Mining: Concepts and Techniques,7,Data Mining for Retail Industry,Retail industry: huge amounts of data on sales, customer shopping history, etc. Applications of retail data mining Identify customer buying behaviors Discover customer shopping patterns and trends Improve the quality of customer service Achieve better customer retention and satisfaction Enhance goods consumption ratios Design more effective goods transportation and distribution policies,2019年5月3日星期五,Data Mining: Concepts and Techniques,8,Data Mining in Retail Industry (2),Ex. 1. Design and construction of data warehouses based on the benefits of data mining Multidimensional analysis of sales, customers, products, time, and region Ex. 2. Analysis of the effectiveness of sales campaigns Ex. 3. Customer retention: Analysis of customer loyalty Use customer loyalty card information to register sequences of purchases of particular customers Use sequential pattern mining to investigate changes in customer consumption or loyalty Suggest adjustments on the pricing and variety of goods Ex. 4. Purchase recommendation and cross-reference of items,2019年5月3日星期五,Data Mining: Concepts and Techniques,9,Data Mining for Telecomm. Industry (1),A rapidly expanding and highly competitive industry and a great demand for data mining Understand the business involved Identify telecommunication patterns Catch fraudulent activities Make better use of resources Improve the quality of service Multidimensional analysis of telecommunication data Intrinsically multidimensional: calling-time, duration, location of caller, location of callee, type of call, etc.,2019年5月3日星期五,Data Mining: Concepts and Techniques,10,Data Mining for Telecomm. Industry (2),Fraudulent pattern analysis and the identification of unusual patterns Identify potentially fraudulent users and their atypical usage patterns Detect attempts to gain fraudulent entry to customer accounts Discover unusual patterns which may need special attention Multidimensional association and sequential pattern analysis Find usage patterns for a set of communication services by customer group, by month, etc. Promote the sales of specific services Improve the availability of particular services in a region Use of visualization tools in telecommunication data analysis,2019年5月3日星期五,Data Mining: Concepts and Techniques,11,Biomedical Data Analysis,DNA sequences: 4 basic building blocks (nucleotides): adenine (A), cytosine (C), guanine (G), and thymine (T). Gene: a sequence of hundreds of individual nucleotides arranged in a particular order Humans have around 30,000 genes Tremendous number of ways that the nucleotides can be ordered and sequenced to form distinct genes Semantic integration of heterogeneous, distributed genome databases Current: highly distributed, uncontrolled generation and use of a wide variety of DNA data Data cleaning and data integration methods developed in data mining will help,2019年5月3日星期五,Data Mining: Concepts and Techniques,12,DNA Analysis: Examples,Similarity search and comparison among DNA sequences Compare the frequently occurring patterns of each class (e.g., diseased and healthy) Identify gene sequence patterns that play roles in various diseases Association analysis: identification of co-occurring gene sequences Most diseases are not triggered by a single gene but by a combination of genes acting together Association analysis may help determine the kinds of genes that are likely to co-occur together in target samples Path analysis: linking genes to different disease development stages Different genes may become active at different stages of the disease Develop pharmaceutical interventions that target the different stages separately Visualization tools and genetic data analysis,2019年5月3日星期五,Data Mining: Concepts and Techniques,13,Applications and Trends in Data Mining,Data mining applications Data mining system products and research prototypes Additional themes on data mining Social impacts of data mining Trends in data mining Summary,2019年5月3日星期五,Data Mining: Concepts and Techniques,14,How to Choose a Data Mining System?,Commercial data mining systems have little in common Different data mining functionality or methodology May even work with completely different kinds of data sets Need multiple dimensional view in selection Data types: relational, transactional, text, time sequence, spatial? System issues running on only one or on several operating systems? a client/server architecture? Provide Web-based interfaces and allow XML data as input and/or output?,2019年5月3日星期五,Data Mining: Concepts and Techniques,15,How to Choose a Data Mining System? (2),Data sources ASCII text files, multiple relational data sources support ODBC connections (OLE DB, JDBC)? Data mining functions and methodologies One vs. multiple data mining functions One vs. variety of methods per function More data mining functions and methods per function provide the user with greater flexibility and analysis power Coupling with DB and/or data warehouse systems Four forms of coupling: no coupling, loose coupling, semitight coupling, and tight coupling Ideally, a data mining system should be tightly coupled with a database system,2019年5月3日星期五,Data Mining: Concepts and Techniques,16,How to Choose a Data Mining System? (3),Scalability Row (or database size) scalability Column (or dimension) scalability Curse of dimensionality: it is much more challenging to make a system column scalable that row scalable Visualization tools “A picture is worth a thousand words” Visualization categories: data visualization, mining result visualization, mining process visualization, and visual data mining Data mining query language and graphical user interface Easy-to-use and high-quality graphical user interface Essential for user-guided, highly interactive data mining,2019年5月3日星期五,Data Mining: Concepts and Techniques,17,Examples of Data Mining Systems (1),Mirosoft SQLServer 2005 Integrate DB and OLAP with mining Support OLEDB for DM standard SAS Enterprise Miner A variety of statistical analysis tools Data warehouse tools and multiple data mining algorithms IBM Intelligent Miner A wide range of data mining algorithms Scalable mining algorithms Toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools Tight integration with IBMs DB2 relational database system,2019年5月3日星期五,Data Mining: Concepts and Techniques,18,Examples of Data Mining Systems (2),SGI MineSet Multiple data mining algorithms and advanced statistics Advanced visualization tools Clementine (SPSS) An integrated data mining development environment for end-users and developers Multiple data mining algorithms and visualization tools,2019年5月3日星期五,Data Mining: Concepts and Techniques,19,11.3 Additional Themes on Data Mining,2019年5月3日星期五,Data Mining: Concepts and Techniques,20,Theoretical Foundations of Data Mining,資料化約: 根據此理論資料探勘的基礎是減少資料的描述。在大型資料庫裡,資料化約能換來對查詢的快速近似應答。 資料壓縮: 根據此理論,資料探勘的基礎是對給定的資料進行壓縮,一般是依照bits, association rules, decision trees, clusters等進行編碼實做。,2019年5月3日星期五,Data Mining: Concepts and Techniques,21,Theoretical Foundations of Data Mining,模式發現: 在這個理論中,資料探勘的基礎是在於資料庫中發現模式,例如關聯法則、分類模型、序列模式。 機率理論: 根據統計理論而來的這一個理論中,資料探勘的基礎是發現隨機變數的聯合機率分布。 微觀經濟觀點: 把資料探勘當作發現模式的任務,透過資料探勘來發現那些對企業決策過程有用的並在一定程度上有趣的模式。這個觀點認為,如果模式有用的話,則認為是有趣的觀點。,2019年5月3日星期五,Data Mining: Concepts and Techniques,22,Theoretical Foundations of Data Mining,歸納資料庫 根據此理論,一個資料庫要被看作是由儲存在資料庫中的模式和資料所組成的。資料探勘的問題變成了對資料庫進行歸納的問題,他的任務是查詢資料庫中的資料和理論(模式)。,2019年5月3日星期五,Data Mining: Concepts and Techniques,23,Statistical Data Mining,書上所介紹的資料探勘技術主要都是資料庫導向的。但是還有很多用於統計資料尤其是數值資料分析的技術。這些技術被過展到應用到科學以及經濟或是社會科學資料中。 回歸: 用來預測一個或是多個預 測變數來的響應變數的值 。同時也有很多回歸方法 ,例如線性回歸、多重回 歸、加權回歸、多項式回 歸等。,2019年5月3日星期五,Data Mining: Concepts and Techniques,24,Statistical Data Mining,廣義化線性迴歸: 這些模型和他們的廣義化模型 ,允許一個分類響應數和一系 列預測變數相關,這和使用線 性迴歸模型中的數值響應變數 類似 變異數分析: 用一個數值響應變數和 一個或多個分類變量來 描述兩個或多個群組的資料。,2019年5月3日星期五,Data Mining: Concepts and Techniques,25,Statistical Data Mining,回歸樹: 利用MSE(平均平方差) 所建立的一個二元樹, 可以利用在分類或是預 測之上。 混合效應模型: 用來分析依個或是多個 組變數分類的資料。他 們透過一個或多個因數 欄描述一個響應變數和 一些共變數之間的關連。,2019年5月3日星期五,Data Mining: Concepts and Techniques,26,Statistical Data Mining,因數分析: 用來決定哪些變數一起產生了一個給定因子。譬如對於許多精神病學資料,不可能接測量某個特別的因子;然而,用於測量其他的數量(考試成績)是可能的。,2019年5月3日星期五,Data Mining: Concepts and Techniques,27,Statistical Data Mining,判別式分析 這種技術可以用來預測分類響應變數,他不像廣義 化線性模型假設自變數遵循多元正態分佈,整個過 程企圖決定幾個判別函式,用來區別由響應變數定 義的群組。,2019年5月3日星期五,Data Mining: Concepts and Techniques,28,Statistical Data Mining,時間序列: 很多技術用來分析時間序列資料。例如:自回歸方法,univariate ARIM,自動回歸組合模型,長記憶的時間序列模型。 品質控制: 各種統計法可以用來準備品質控制的圖表。例如:Shewhart圖表和Cusum圖表。這些統計包括:平均值、標準差、區間、計數、移動平均、移動標準差和移動區間。,2019年5月3日星期五,Data Mining: Concepts and Techniques,29,Statistical Data Mining,倖存分析: 此分析起初是用來預測一個病人經過治療之後能活到多少t的時間。此分析也可用在製造設備,估計工業設備的生命週期。,2019年5月3日星期五,Data Mining: Concepts and Techniques,30,Visual Data Mining,Visualization: 使用CG(電腦繪圖)去建立影像來輔助了解複雜或大型的資料。 Visual Data Mining: 結合資料視覺化和資料探勘兩門科學。 CG 多媒體系統 使用者介面 高效能的運算,2019年5月3日星期五,Data Mining: Concepts and Techniques,31,Visual Data Mining,Visual Data Mining 資料視覺化 資料探勘結果視覺化 資料探勘過程視覺化 互動式的視覺化資料探勘 資料視覺化 在資料庫或是資料倉儲中的資料可以建立不同等級或是分級,也可以看作是由不同屬性和維度所組成的,由盒狀圖、三維立方體、資料分布圖、曲線曲面等等方式顯示。,2019年5月3日星期五,Data Mining: Concepts and Techniques,32,2019年5月3日星期五,Data Mining: Concepts and Techniques,33,Visual Data Mining,資料探勘結果視覺化 將探勘後所得的資料利用視覺化的形式去表示出來。形式包括散列圖、盒狀圖、決策樹、關聯原則、叢集、孤立點以及廣義化規則等。,Visualization of Data Mining Results in SAS Enterprise Miner,2019年5月3日星期五,Data Mining: Concepts and Techniques,34,Visual Data Mining,資料探勘過程視覺化 利用視覺化的形描述探勘過程,使用者可以瞭解: 資料的取出 資料的出處 清理、整合、前處理及探勘的過程。 探勘方法的選取 結果儲存,2019年5月3日星期五,Data Mining: Concepts and Techniques,35,Visual Data Mining,資料探勘過程視覺化可以提供使用者許多不同的資料探勘處理方式。,Understand variations with visualized data,See your solution discovery process clearly,2019年5月3日星期五,Data Mining: Concepts and Techniques,36,Visual Data Mining,互動式的視覺化資料探勘 利用視覺化工具協助使用者作出正確的資料探勘決策。 例如使用彩色的扇形區來表示,這種表示可以幫助使用者決定哪個扇形區會先被選中,哪個地方是最好的分隔點。,2019年5月3日星期五,Data Mining: Concepts and Techniques,37,Visual Data Mining,Interactive Visual Mining by Perception-Based Classification (PBC),2019年5月3日星期五,Data Mining: Concepts and Techniques,38,Data Mining and Collaborative Filtering,現今使用者會在網路上面對成千上萬的產品以及服務,建議系統(Recommender System)可以透過產品建議來協助使用者進行線上交易。在此(Collaborative Filtering)被廣泛的應用。,2019年5月3日星期五,Data Mining: Concepts and Techniques,39,11.4 Social Impacts of Data Mining,2019年5月3日星期五,Data Mining: Concepts and Techniques,40,Ubiquitous and Invisible Data Mining,Data mining 已經被廣泛的在日常生活中被使用,即使使用者本身不了解或是沒有注意到,資料探勘已經被大量的使用在日常生活中。 Customer relationship management Web-wide tracking,2019年5月3日星期五,Data Mining: Concepts and Techniques,41,Data Mining, Privacy, and Data Security,隨著資料探勘及電子資料的發展,越來越多的資訊被記錄。這樣的狀況連帶的影響到我們的隱私及資料安全。 1980年,OECD提出了Fair information practices來保護使用者的隱私及資料安全。 Purpose specification and use limitation Openness Security Safeguards Individual Participation,2019年5月3日星期五,Data Mining: Concepts and Techniques,42,Data Mining, Privacy, and Data Security,Data Mining for Counterterrorism Data security-enhancing techniques Privacy-preserving data mining,2019年5月3日星期五,Data Mining: Concepts and Techniques,43,Applications and Trends in Data Mining,Data mining applications Data mining system products and research prototypes Additional themes on data mining Social impacts of data mining Trends in data mining Summary,2019年5月3日星期五,Data Mining: Concepts and Techniques,44,Trends in Data Mining (1),Application exploration development of application-specific data mining system Invisible data mining (mining as built-in function) Scalable data mining methods Constraint-based mining: use of constraints to guide data mining systems in their search for interesting patterns Integration of data mining with database systems, data warehouse systems, and Web database systems Invisible data mining,2019年5月3日星期五,Data Mining: Concepts and Techniques,45,Trends in Data Mining (2),Standardization of data mining language A standard will facilitate systematic development, improve interoperability, and promote the education and use of data mining systems in industry and society Visual data mining New methods for mining complex types of data More research is required towards the integration of data mining methods with existing data analysis techniques for the complex types of data Web mining Privacy protection and information security in data mining,2019年5月3日星期五,Data Mining: Concepts and Techniques,46,Applications and Trends in Data Mining,Data mining applications Data mining system products and research prototypes Additional themes on data mining Social impacts of data mining Trends in data mining Summary,2019年5月3日星期五,Data Mining: Concepts and Techniques,47,Summary,Domain-specific applications include biomedicine (DNA), finance, retail and telecommunication data mining There exist some data mining systems and it is important to know their power and limitations Visual data mining include data visualization, mining result visualization, mining process visualization and interactive visual mining There are many other scientific and statistical data mining methods developed but not covered in this book Also, it is important to study theoretical foundations of data mining Intelligent query answering can be integrated with mining It is important to watch privacy and security issues in data mining,2019年5月3日星期五,Data Mining: Concepts and Techniques,48,References (1),M. Ankerst, C. Elsen, M. Ester, and H.-P. Kriegel. Visual classification: An interactive approach to decision tree construction. KDD99, San Diego, CA, Aug. 1999. P. Baldi and S. Brunak. Bioinformatics: The Machine Learning Approach. MIT Press, 1998. S. Benninga and B. Czaczkes. Financial Modeling. MIT Press, 1997. L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984. M. Berthold and D. J. Hand. Intelligent Data Analysis: An Introduction. Springer-Verlag, 1999. M. J. A. Berry and G. Linoff. Mastering Data Mining: The Art and Science of Customer Relationship Management. John Wiley & Sons, 1999. A. Baxevanis and B. F. F. Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. John Wiley & Sons, 1998. Q. Chen, M. Hsu, and U. Dayal. A data-warehouse/OLAP framework for scalable telecommunication tandem traffic analysis. ICDE00, San Diego, CA, Feb. 2000. W. Cleveland. Visualizing Data. Hobart Press, Summit NJ, 1993. S. Chakrabarti, S. Sarawagi, and B. Dom. Mining surprising patterns using temporal description length. VLDB98, New York, NY, Aug. 1998.,2019年5月3日星期五,Data Mining: Concepts and Techniques,49,References (2),J. L. Devore. Probability and Statistics for Engineering and the Science, 4th ed. Duxbury Press, 1995. A. J. Dobson. An Introduction to Generalized Linear Models. Chapman and Hall, 1990. B. Gates. Business the Speed of Thought. New York: Warner Books, 1999. M. Goebel and L. Gruenwald. A survey of data mining and knowledge discovery softw

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论