Graduation Project (Thesis): Foreign Literature Translation
Major: Computer Science and Technology
Student name: Li Lei
Class / Student ID:
Supervisor:
School of Information Engineering

Applications of Web Mining for Marketing of Online Bookstores

Abstract: The purpose of this study is to identify potential customers of online bookstores through web content mining, without using customers' transaction records or demographic information. Our study first creates a list of scholars whose research field is information technology (IT) and a list of categories of IT expertise. We then use a search engine to count the numbers of web pages related to each scholar and each type of expertise. These data are pre-processed in three key steps before use: filtering abnormal data, normalizing data, and generating binary data. Association analysis and hierarchical cluster analysis are employed to generate clusters of scholars and clusters of expertise. To test the accuracy of using web mining to predict the booklists that clients are interested in, our study evaluates the accuracy of prediction through a survey. The results show that the accuracy rate of the recommended booklists targeted at potential customers (scholars) is statistically significant.

Keywords: Web mining; Association analysis; Hierarchical cluster analysis; Marketing; Online bookstore

1. Introduction

The exponential growth of the Internet and the evolution of multimedia technology have fostered electronic commerce (e-commerce) and offered a new business model for industries that rely on physical distribution systems. Because the Internet imposes no limitations of time and space, customers can browse products and place orders easily. In the midst of an information explosion, demand from customers hungry for knowledge is increasing. Under these circumstances, the online bookstore, with the aid of the Internet, not only makes reading material conveniently available but also offers customized individual service; together, these satisfy readers' demand for knowledge.
At present, one marketing approach used by online bookstores is to email booklists to all customers in the database, hoping that mass contact will increase response and purchase rates. This “one-to-all” approach involves no pre-classification or screening. An improved approach is to email each customer one of several pre-designed booklists, chosen according to the customer's past transaction records and interests. The ideal method, however, is “one-to-one” marketing, which concentrates on serving one customer at a time by identifying and then meeting that customer's individual needs.

Data mining is the technique of sorting through large amounts of data and picking out relevant information (Han & Kamber, 2001), and it can be used to achieve one-to-one marketing (Berry & Linoff, 1997). By using sophisticated algorithms that identify trends in data beyond what simple analysis reveals, users can identify key attributes of business processes and target opportunities.

Web mining, in general, is the application of data mining techniques to discover patterns from the Web (Baraglia and Silvestri, 2007, Chakrabarti, 2002, Cooley et al., 1999, Eirinaki and Vazirgiannis, 2003, Liu, 2007, Mobasher et al., 2000 and Olson and Shi, 2007). For example, association analysis can be applied to users' usage data, which record users' behavior as they browse or make transactions on a website, and the results can be used to fit the website's content to users' needs. Unlike conventional data mining, web mining does not start from an existing dataset: web miners use names or terminology to search for and collect data. There is a great deal of valuable information on the Web, but it is not easy to find.
Search engines provide the initial step needed to conduct more complex forms of web mining.

The research objectives of this article are to use search engines to collect data, to identify the characteristics of potential customers through association analysis and cluster analysis, and to provide recommended booklists to the targeted potential customers, thereby improving the marketing mode currently employed by most online bookstores, which target customers blindly. Online bookstores use two modes of marketing: finding customers for each book and finding books for each customer. Both modes can be connected to commercial value through the concept of communities of practice.

The concept of a community of practice was introduced by Lave and Wenger (1991), who defined it as the process of social learning that occurs when people who share an interest in some subject or problem collaborate over an extended period to share ideas, find solutions, and build innovations. The term also refers to the stable group formed by such regular interactions. More recently, communities of practice have become associated with knowledge management, as people have begun to see them as ways of developing social capital, nurturing new knowledge, stimulating innovation, or sharing existing tacit knowledge within an organization (Wenger, McDermott, & Snyder, 2002). A community in knowledge management resembles a cluster in marketing: both show high internal homogeneity; for example, people within a cluster have similar purchase preferences or a common interest in a product. There are two types of communities of practice: face-to-face communities and web-based communities of practice (virtual communities).
In short, from the perspective of Internet marketing, our study employs web mining to search for and collect customer data, identifies two kinds of clusters (communities) in those data through association analysis and cluster analysis, and ultimately aims to increase customer satisfaction and marketing success.

The major strength of our research is that it finds potential customers without existing customers' background information or transaction records. In the past, most online bookstores marketed products or services based on their existing customer databases and could not actively contact potential customers, that is, customers who had never transacted with the bookstore or whose background information was unavailable. Therefore, using web mining to find potential customers may increase online bookstores' sales and profits.

2. Literature review

2.1. Web mining

Web mining integrates information gathered through traditional data mining methodologies and techniques with information gathered over the World Wide Web. The Web presents new challenges to traditional data mining algorithms, which work on flat data. Kosala and Blockeel (2000) surveyed research in web mining, proposed three categories (web content mining, web structure mining, and web usage mining), and situated existing research with respect to these categories. The three types of web mining are defined as follows (Pierrakos et al., 2003 and Wikipedia, 2008):

Web usage mining applies data mining to analyze and discover interesting patterns in users' usage data on the Web. The usage data record users' behavior as they browse or make transactions on a website.

Web structure mining uses graph theory to analyze the node and connection structure of a website.

Web content mining is the automatic process of discovering useful information from the content of web pages.
Web content may consist of text, image, audio, or video data. Web content mining is sometimes called web text mining, because text content is the most widely researched area. There are two groups of web content mining strategies: those that directly mine the content of documents and those that improve on the content search of other tools such as search engines (Galeas, 2008). The information gathered through web mining is evaluated (sometimes with the aid of software graphing applications) using traditional data mining techniques such as clustering and classification, association, and the examination of sequential patterns.

2.2. Association analysis

Association analysis is defined as the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data (Han & Kamber, 2001). Association rules are widely used for market basket analysis. For example, when a customer purchases a computer, such a rule gives the probability that software will be purchased in the same transaction. The results can be used to develop marketing or advertising strategies, and items that are frequently purchased together can be placed in close proximity to promote their joint sales (Berry and Linoff, 1997 and Han and Kamber, 2001).

Following the original definition by Agrawal, Imielinski, and Swami (1993), association mining is defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ binary attributes called items. Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database. Each transaction $t$ in $D$ has a unique transaction ID and contains a subset of the items in $I$. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$.
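As a minimal sketch of these definitions, the code below stores each transaction as a Python set of items and scores a rule X ⇒ Y with the standard support and confidence measures (Han & Kamber, 2001) defined in the next paragraphs. The toy database, the item names (echoing the computer-and-software example above), and the thresholds are hypothetical; the brute-force counting is for illustration only and is neither the Apriori algorithm nor the authors' implementation.

```python
def support(itemset, transactions):
    """support(A): fraction of transactions that contain every item of A."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support(A ∪ B) / support(A)."""
    a, b = frozenset(antecedent), frozenset(consequent)
    if a & b:
        # The rule definition requires X ∩ Y = ∅.
        raise ValueError("antecedent and consequent must be disjoint")
    sup_a = support(a, transactions)
    return support(a | b, transactions) / sup_a if sup_a else 0.0


def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """Keep a rule only if it clears both minimum thresholds."""
    a, b = frozenset(antecedent), frozenset(consequent)
    return (support(a | b, transactions) >= min_sup
            and confidence(a, b, transactions) >= min_conf)


# Hypothetical toy database D; each transaction is a set of items from I.
D = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
]

print(support({"computer"}, D))                   # 0.75
print(confidence({"computer"}, {"software"}, D))  # 2/3, about 0.667
print(is_strong({"computer"}, {"software"}, D, min_sup=0.5, min_conf=0.6))  # True
```

Raising either threshold (for example `min_conf=0.7`) rejects this rule, which illustrates why small data sets call for moderate thresholds.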
The sets of items $X$ and $Y$ are called the antecedent (left-hand side, LHS) and the consequent (right-hand side, RHS) of the rule, respectively.

Two important parameters are used in association analysis to measure the accuracy and utility of an association rule: support and confidence (Han and Kamber, 2001 and Paolo, 2003). The support of an item set is the percentage of transactions in the database that contain it. For example, for item set $A$, the support is defined as:

$$\mathrm{support}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}}$$

Let $A$ and $B$ be sets of items. Confidence measures how often the items in $B$ appear in transactions that contain $A$. Given a set of transaction data, the confidence of $A \Rightarrow B$ is defined as:

$$\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} \tag{1}$$

High support means that the item set in the association rule appears very often; high confidence means that the inference expressed by the rule is reliable. To assure the quality of association rules, we set a minimum support threshold so that the item set in a rule appears often, and a minimum confidence threshold so that the rule's inference is reliable.

However, high thresholds for both support and confidence also yield fewer interesting rules, which is a problem for small data sets. It is therefore better to refer to users' demands when balancing high parameter values against the number of interesting rules.

A thorough discussion of association analysis algorithms is beyond the scope of this paper; the basic algorithm has been applied widely (Berry and Linoff, 1997, Han and Kamber, 2001, Olson and Shi, 2007 and Paolo, 2003).

2.3. Cluster analysis

The primary objective of cluster analysis is to classify objects (respondents, products, events, etc.) so that each object is very similar to the others in its cluster (Hair, Anderson, Tatham, & Black, 1998).
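One way to pursue this objective is agglomerative hierarchical clustering, the approach the paper adopts and describes in the paragraphs that follow. The minimal sketch below assumes each object is a 0/1 profile (like the scholar-by-expertise data produced in Section 3.1.2) and uses Jaccard distance with average linkage; both choices and the toy profiles are illustrative assumptions, not the authors' stated settings.

```python
from itertools import combinations


def jaccard_distance(u, v):
    """Jaccard distance between two equal-length 0/1 vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def agglomerate(vectors, n_clusters):
    """Agglomerative clustering with average linkage.

    Returns the remaining clusters (frozensets of object indices) and the
    merge history, which is the dendrogram in list form.
    """
    # The n(n-1)/2 object-to-object distances, computed once up front.
    dist = {
        frozenset((i, j)): jaccard_distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[frozenset((i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [frozenset([i]) for i in range(len(vectors))]
    history = []
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((sorted(a), sorted(b), linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, history


# Hypothetical scholar-by-expertise binary profiles (rows = scholars).
profiles = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
clusters, history = agglomerate(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3]]
```

Passing `n_clusters=1` continues merging down to a single cluster, so `history` then records the full dendrogram; replacing the mean in `linkage` with a minimum or maximum over the cross-cluster pairs would give single or complete linkage instead.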
The resulting clusters of objects should show high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity (Hair et al., 1998). Cluster analysis may reveal associations and structure in data that, though not previously evident, are nevertheless sensible and useful once found (Clustan, 2008).

Hierarchical cluster analysis (hierarchical clustering) is a general approach to cluster analysis. A key component of the analysis is the repeated calculation of distance measures between objects and, once objects begin to be grouped, between clusters. The outcome is represented graphically as a dendrogram (tree graph).

The initial data for the hierarchical cluster analysis of $n$ objects are a set of $n(n-1)/2$ object-to-object distances and a linkage function for computing cluster-to-cluster distances. The most common hierarchical clustering algorithms are single linkage, complete linkage, average linkage, the centroid method, and Ward's method; they differ mainly in how the distance between clusters is calculated (Hair et al., 1998 and Statistics, 2008).

3. Methodology

3.1. Data collection and processing

In Taiwan, a variety of books and textbooks are sold through online bookstores, and university professors are important online shoppers. Moreover, they are the key persons who decide which textbooks are used in class. Most online bookstores promote books and communicate with customers (e.g. university professors) through email. Because of email spam, the abuse of electronic messaging systems to send unsolicited bulk messages indiscriminately, most customers tend to delete advertising mail as soon as they receive it. Attracting recipients' interest and encouraging them to read the mail is therefore a real challenge.

An effective email must meet two requirements. First, its keynote should entice potential clients.
Second, its content should be advantageous to them. To achieve these two objectives, we must know clients' interests and expertise in advance. Usually it is very difficult to acquire customers' individual information: to protect their privacy, most respondents are very sensitive to questions about personal matters and are unwilling to offer their profiles to companies. Unlike online bookstores, which have well-established customer databases or transaction records, our study has to find an alternative approach to obtain data on customers' interests and expertise.

With the evolution of information technology, our lives are increasingly inseparable from the Internet. Most of the data on the Internet are public and accessible, so the Internet is an effective and convenient source of customer data. Data collection and data processing are described as follows (the framework is shown in Fig. 1).

Fig. 1. Data collection, processing, and data mining.

3.1.1. Data collection

1. Form a list of scholars and types of expertise. Our study first used the database of the National Science Council, Taiwan ( .tw), to obtain a random list of scholars (university professors) in Taiwan whose research interests are in information technology, and to form types of IT expertise. After screening and deleting invalid data, we obtained a list of 200 entries for both scholars and expertise in the IT field.

2. Count the numbers of web pages. We used search engines to search for and count the numbers of web pages related to both scholars and expertise (see Table 1).

Table 1. The numbers of web pages related to both scholars and expertise.
Expertise / Scholar: 001 002 003 004 005 006 007 008 … 200
Artificial neural network: 1822395898312
Genetic algorithms: 6610131286811123
Evolution computation: 71181621011256
Fuzzy logic: 40101655111125
Grey theory: 4848326941
Data mining: 6310341661110
Web mining: 1116797085141
Expert system: 23335382793
Computer-aided instruction: 354811262261
Distance education: 5213131213441015
E-learning: 655126870912
Internet-based education: 1697131318516
Multimedia: 16331111489115
Information retrieval: 210445811555
Computer vision: 12137672392
Computer graphics: 11710944130511
Natural language processing: 2852459963
Machine translation: 41073986812115
CAD: 0611167861151
IC design: 111111191490612
VLSI design: 11110102411167
Computer architecture: 39323122936
⋮
Wireless communication: 37261501062

3.1.2. Data processing

These data were pre-processed in three key steps before use.

1. Filter abnormal data. First, we filtered out abnormal data. For example, some scholars' names are common or identical to those of famous people, which normally yields very large numbers of web pages in a search, many of them unrelated to our target audience: scholars in the IT field in Taiwan. To solve this problem, we deleted names that returned extremely large or extremely small numbers of web pages.

2. Normalize data. After deleting abnormal data, the next step is data normalization. Denoting a scholar's expertise by $x$ and the scholar's name by $y$, the normalized web page index of $x$ and $y$ is given as follows:

(2)

Data normalization ensures that the value reflects the intensity of a specific expertise for a specific scholar, and avoids bias caused by names shared with famous people or by the abundance of web pages produced by the popularity of certain expertise.

3. Generate binary data. After normalizing the data, the last step of data processing is to generate binary data according to the following rule:

If the normalized web page index is greater than the threshold,
then the binary web page index of $x$ and $y$ = 1;
else the binary web page index of $x$ and $y$ = 0,

where 1 indicates that the scholar has the expertise and 0 indicates that the scholar does not. For each expertis
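The three pre-processing steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' procedure: page counts are held in a hypothetical dictionary keyed by scholar; the filter bounds `low` and `high` and the binarization threshold are hypothetical, since no values are given here; and because the exact form of equation (2) is not given here, `normalize` substitutes a hypothetical lift-style index (observed count divided by the count expected from the scholar's and the expertise's totals), chosen only because it damps the two biases the text names.

```python
def filter_abnormal(counts, low, high):
    """Step 1: drop scholars whose total page count is extremely small or large.

    counts maps scholar -> {expertise: page count}. The bounds are
    hypothetical cut-offs, applied to each scholar's total.
    """
    return {s: row for s, row in counts.items() if low <= sum(row.values()) <= high}


def normalize(counts):
    """Step 2: hypothetical stand-in for equation (2).

    Observed count divided by the count expected from the scholar's and the
    expertise's overall totals, which damps popular names and popular terms.
    """
    total = sum(sum(row.values()) for row in counts.values())
    col_totals = {}
    for row in counts.values():
        for x, n in row.items():
            col_totals[x] = col_totals.get(x, 0) + n
    index = {}
    for s, row in counts.items():
        row_total = sum(row.values())
        index[s] = {
            x: n * total / (row_total * col_totals[x]) if row_total and col_totals[x] else 0.0
            for x, n in row.items()
        }
    return index


def binarize(index, threshold):
    """Step 3: 1 if the normalized index is greater than the threshold, else 0."""
    return {s: {x: int(v > threshold) for x, v in row.items()} for s, row in index.items()}


# Hypothetical page counts; scholar "C" shares a name with a celebrity.
counts = {
    "A": {"data mining": 30, "fuzzy logic": 10},
    "B": {"data mining": 5, "fuzzy logic": 15},
    "C": {"data mining": 5000, "fuzzy logic": 4000},
}
kept = filter_abnormal(counts, low=1, high=1000)
binary = binarize(normalize(kept), threshold=1.0)
print(binary)
# {'A': {'data mining': 1, 'fuzzy logic': 0}, 'B': {'data mining': 0, 'fuzzy logic': 1}}
```

The resulting 0/1 matrix is the kind of input that association analysis and hierarchical clustering then operate on.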
