




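Hangzhou Dianzi University
Graduation Project (Thesis): Translation of Foreign-Language Literature

Graduation project (thesis) title: Personal Library Management System Based on ASP
Translation (1) title: Research of Improved Recommendation Algorithm Based on Collaborative Filtering and Content Prediction
Translation (2) title: A Personalized Recommendation Algorithm Based on Hybrid Genres
School: School of Computer Science
Major: Software Engineering
Name: 万佳琦
Class: 13108411
Student ID: 13108103
Supervisor: 傅婷婷

Translation 1: Research of Improved Recommendation Algorithm Based on Collaborative Filtering and Content Prediction

Source: Wei Jiang, Liping Yang. Research of improved recommendation algorithm based on collaborative filtering and content prediction. 2016 11th International Conference on Computer Science & Education (ICCSE), IEEE, pp. 598-602.

Abstract

This paper proposes an algorithm that combines a sparse-matrix filling method with the collaborative filtering algorithm, in order to solve the “cold start” problem that collaborative filtering recommendation systems face when they meet a new item and sparse data. The algorithm improves the accuracy of the similarity calculation for users or items. When it fills the sparse user-item scores matrix, it first predicts the entries to be filled. In this way it obtains accurate virtual scores and fills out a virtual user-item scores form, and then carries out prediction based on that form. We ran experiments on the MovieLens dataset. The results show that the algorithm effectively improves the accuracy of rating prediction and, to a certain extent, solves the “cold start” problem.

Index terms: recommendation system, cold start, collaborative filtering, sparse matrix.

1. Introduction

With the rapid development of Web 2.0 and e-commerce, the huge number of Internet users produces massive amounts of data. The problem facing Internet users has changed from how to find more information to how to find more useful information. Conventional information retrieval methods struggle to meet the needs of different users. Because differences among users are not considered, a search system returns the same results to all users. In fact, different users look for different information even when they use the same keyword.

Against this background, personalized recommendation, which meets the different needs of different users, has become the new development direction for e-commerce and information providers. Personalized recommendation methods based on recommendation algorithms have become a hot research topic [1]. Among the recommendation techniques proposed so far, the collaborative filtering algorithm is well known as the most popular and successful method. However, the conventional collaborative filtering algorithm has problems with sparsity, scalability, “cold start” and accuracy [2].

A recommendation algorithm based on collaborative filtering depends heavily on the user-item scores. Recommendation results can be produced only once the user-item scores form exists. For a new item that nobody has rated, the item's scores are filled in with the mean. The item is therefore unremarkable and is unlikely ever to be recommended. New items are thus hard to get started, which is the well-known “cold start” problem.

To solve this problem, this paper proposes an improved recommendation algorithm based on collaborative filtering and content prediction. When it fills the user-item scores matrix, the algorithm analyzes the item's related content. It then predicts scores for the item from that content and performs the recommendation calculation with the collaborative filtering recommendation algorithm.

2. Steps of the Conventional Collaborative Recommendation Algorithm

The most commonly used conventional collaborative algorithm produces recommendations in the following steps [3,4]. First, build the user-item scores matrix. Second, fill the blanks in the matrix. Third, calculate the similarity of the users and then find the neighbor users or items. Finally, compute and generate the recommendation results.

2.1 Building the User-item Scores Matrix

First, the users' preferences must be collected. Users can submit their preferences to the system in various ways. After enough data has been collected, the algorithm processes it. Depending on the behavioral analysis method, the algorithm applies statistical methods such as weighting or grouping to obtain a two-dimensional matrix of user preferences, namely the user-item scores form.

2.2 Filling the User-item Scores Matrix

The scores matrix generated by the methods above is very sparse. If similarity is calculated from the users' ratings alone, errors are inevitable. The user-item scores matrix is therefore filled with data to make it denser. For a conventional scores matrix the main methods are mean filling, mode filling and cluster filling. In the conventional collaborative recommendation calculation, this paper fills the user-item scores matrix by mean filling. A missing score is set to a fixed value, which is generally the mean of the scoring system, the user's average score, or the item's average score.

2.3 Calculating the Similarity and Finding the Neighbor Users or Neighbor Items

After the user-item scores matrix has been filled, the next step is to find similar users or items according to the user's preferences. The algorithm then produces recommendations based on the similar users or similar items. The most typical collaborative filtering algorithm has two branches: user-based collaborative filtering and item-based collaborative filtering. What they have in common is that both calculate similarity and then use it to find neighbor users or neighbor items [5]. The commonly used similarity equations are (1) the cosine similarity, (2) the correlation similarity, and (3) the modified cosine similarity.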
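The three formulas themselves did not survive in this copy. As a hedged reconstruction, assuming the standard textbook definitions and the symbols explained in the next paragraph ($R_{x,i}$, $I(x,y)$, $I(x)$, $I(y)$, and the user means $\bar{R}_x$, $\bar{R}_y$), they would read:

$$
sim(x,y) = \frac{\sum_{i \in I(x,y)} R_{x,i}\, R_{y,i}}{\sqrt{\sum_{i \in I(x)} R_{x,i}^{2}}\;\sqrt{\sum_{i \in I(y)} R_{y,i}^{2}}} \tag{1}
$$

$$
sim(x,y) = \frac{\sum_{i \in I(x,y)} (R_{x,i} - \bar{R}_x)(R_{y,i} - \bar{R}_y)}{\sqrt{\sum_{i \in I(x,y)} (R_{x,i} - \bar{R}_x)^{2}}\;\sqrt{\sum_{i \in I(x,y)} (R_{y,i} - \bar{R}_y)^{2}}} \tag{2}
$$

$$
sim(x,y) = \frac{\sum_{i \in I(x,y)} (R_{x,i} - \bar{R}_x)(R_{y,i} - \bar{R}_y)}{\sqrt{\sum_{i \in I(x)} (R_{x,i} - \bar{R}_x)^{2}}\;\sqrt{\sum_{i \in I(y)} (R_{y,i} - \bar{R}_y)^{2}}} \tag{3}
$$

The index sets in the denominators are an assumption based on common usage. The source only defines the symbols, so the paper's exact forms may differ.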
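Take user-based collaborative filtering as an example. In the three equations above, $sim(x, y)$ denotes the similarity between user $x$ and user $y$. $R_{x,i}$ (or $R_{y,i}$) denotes the score given by user $x$ (or $y$) to item $i$. $I(x, y)$ denotes the set of items rated by both user $x$ and user $y$. $I(x)$ (or $I(y)$) denotes the set of items rated by user $x$ (or user $y$). $\bar{R}_x$ (or $\bar{R}_y$) denotes the average score given by user $x$ (or $y$) to items.

2.4 Calculating and Generating the Recommendation Results

After the calculation above, the neighbor users or neighbor items are known. On the basis of these data, the user's recommendation value for any item is calculated with the classical Eq. (4). Finally, the recommendation results are generated. Take user-based collaborative filtering as an example. In that equation, $p_{y,i}$ denotes the recommendation value of item $i$ for the target user $y$. $R_{x,i}$ is the score given to item $i$ by user $x$, one of the nearest neighbors of the target user $y$. $k$ is the number of nearest neighbors. It can be fixed directly, determined by a threshold, or taken as the top $k$ users whose similarity exceeds the threshold.

3. Improvement and Optimization of the Conventional Algorithm

To address the “cold start” problem, we improve the way the conventional algorithm fills the scores matrix. We optimize the algorithm mainly in the following four aspects.

We build a feature-based item indication matrix by means of a filtering method. A collaborative filtering recommendation system generally gives a simple description of each item. For example, when Douban recommends a film, it introduces the film's main actors and its style. The system states whether the film is a comedy, a tragedy, or a mix of various elements. These descriptive labels can be treated as the item's keywords. Thus each item $X_i = \{A_1, A_2, \ldots, A_j, \ldots, A_n\}$ has keywords or labels that describe its content, where $A_j$ denotes the $j$-th feature of $X_i$ and is a Boolean value. If it equals 1, $X_i$ has this feature; otherwise it does not. All the items therefore form a binary two-dimensional matrix, as shown in Table 1.

Table 1. The feature-based item indication matrix

(1) We train the weights of the user's ratings over the item content, according to the correlation between the items' content features and the users' ratings of the items. Once the items' features have been extracted, we can use the users' ratings of the items to analyze the corresponding feature information and filter it. By matching the user's preferences against the item's features, the system judges whether it can recommend the item to the user. This paper adopts the Winnow algorithm [6] to analyze film ratings. Winnow is widely recognized in the field of text classification. In Winnow, $X_i$ is a Boolean feature value. Winnow sets a weight for each word, and the weights form a linear threshold function $\sum_i W_i X_i > \theta$, where $\theta$ is the threshold and $W_i$ is a weight whose initial value is 0.5. The user parameter is set according to the keywords of the items that the user has rated. For example, if the user has rated an action film, this user parameter is set to 1; otherwise it is set to 0. If the user rates a film highly, the weights of the corresponding item keywords increase; otherwise they decrease. During training, every item of every user is evaluated and the weighted sum for that user is obtained. If this sum is below the threshold $\theta$ while the user's rating is above the rating cutoff, we double the weight of each keyword. If the sum is above $\theta$ while the user's rating is below the cutoff, we divide the weight of each keyword by 2. If the weights are already suitable, they are left unchanged. Over the training set, the weights are adjusted repeatedly until every item is handled correctly, or for a set number of passes until the weights stop changing. After training, each user $y$ has a set of weights $W_{y,k}$ for the different film types.

(2) Based on content prediction, we can obtain the virtual user rating $R_{y,i}$. Its value is assigned by Eq. (5), where $r_{y,i} = 0$ indicates that user $y$ has given no score to item $i$ in the user-item scores matrix. We make a preliminary prediction for a new item that the user has not yet rated. We then have each user's film feature weights (for example, those of user $y$). Thus, for a film $i$, a predicted rating $p_{y,i}$ is generated, where $W_{y,k}$ is user $y$'s weight for the $k$-th film type and $I_{k,i}$ is the value of film $i$'s $k$-th feature.

(3) We filter out predictions that are not accurate enough. The content-prediction-based algorithm fills the blanks in the user-item rating matrix and so modifies the sparse matrix. The matrix must therefore represent the users' preferences accurately, or the subsequent predicted values will be very inaccurate. To make sure the prediction method is effective, we screen the prediction results first. Only results that are accurate enough show that the weights produced by training match the user's preferences. We take the following measures.

a) Filtering in advance. We pre-filter the users. If a user has rated too few items, the sample is too small to yield accurate predicted values, and the user's ratings are considered invalid. Predictions are therefore generated only when the user's rating number (RN) exceeds a certain value. In this paper, RN is set to values from 90 to 100 (the experiments below use 90 and 100).

b) Checking against the user's other ratings. We measure agreement with the corresponding proportion (CP): the ratio of the number of the user's predictions that match his actual ratings to the user's RN. Only when CP is high enough can the user's predicted values be regarded as accurate. In this paper, CP is set to 75% and 80%.

After these optimization steps, we can be confident that a user's ratings are sufficient to yield accurate predictions.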
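The Winnow training rule and the RN/CP filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `train_winnow`, `passes_filters`, `rating_cutoff`, `max_passes`, `rn_min` and `cp_min` are hypothetical, the rating cutoff stands in for a symbol that is missing from this copy, and only the weights of features present in the item are updated, as in standard Winnow.

```python
def train_winnow(items, ratings, theta, rating_cutoff, max_passes=50):
    """Train one user's keyword weights with a Winnow-style update.

    items:   list of Boolean feature vectors (0/1), one per rated film
    ratings: the user's score for each film
    theta:   threshold of the linear function sum(W_i * X_i)
    rating_cutoff: assumed boundary between a high and a low rating
    """
    weights = [0.5] * len(items[0])  # initial weight 0.5, as in the text
    for _ in range(max_passes):
        changed = False
        for features, rating in zip(items, ratings):
            total = sum(w * x for w, x in zip(weights, features))
            if total < theta and rating > rating_cutoff:
                factor = 2.0   # sum too low for a well-rated film: double
            elif total > theta and rating < rating_cutoff:
                factor = 0.5   # sum too high for a poorly rated film: halve
            else:
                continue       # weights already suitable for this film
            # Assumption: update only the features the film actually has.
            weights = [w * factor if x == 1 else w
                       for w, x in zip(weights, features)]
            changed = True
        if not changed:        # a full pass without updates: stop early
            break
    return weights


def passes_filters(rating_count, matching_predictions, rn_min, cp_min):
    """Apply the RN pre-filter, then the CP check."""
    if rating_count <= rn_min:
        return False
    return matching_predictions / rating_count >= cp_min


films = [[1, 0], [0, 1]]   # columns: two hypothetical genre features
scores = [5, 1]
print(train_winnow(films, scores, theta=1.0, rating_cutoff=3))  # [1.0, 0.5]
```

Stopping after a pass with no updates mirrors the text's "until the weights no longer change", and `max_passes` covers the alternative of a fixed number of passes.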
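(4) Using the user-item scores matrix and the virtual user ratings, we obtain a virtual user-item scores matrix. On this basis, similarity is calculated with the similarity measures of the conventional collaborative filtering algorithm.

4. Experiments and Evaluation

4.1 Dataset and Metrics

In this paper we run experiments on the MovieLens dataset, whose data are film scores from 1 to 5 given by users who have watched the films. MovieLens offers two datasets of different sizes, suited to algorithms of different scales. We chose the MovieLens 1M dataset as the experimental data for this study. The evaluation metrics in our experiments are recommendation accuracy and coverage rate (CR).

1) Accuracy. Score prediction is generally evaluated with one of two metrics: the mean absolute error (MAE) or the root mean square error (RMSE) [7,8]. Because MAE is more popular and easier to understand, this paper uses MAE to evaluate the experimental data. Assume that the set of ratings recommended to the target user on the test dataset is $Y = \{y_i \mid i = 1, 2, \ldots, n\}$ and the set of true ratings is $R = \{r_i \mid i = 1, 2, \ldots, n\}$. Over the predicted ratings that are not 0, MAE is given by Eq. (6), where $N$ is the number of test items for which neither the predicted value nor the target user's true rating is 0. The smaller the MAE, the higher the recommendation accuracy.

2) Coverage rate (CR). CR is the ratio of the number of items for which a prediction can be made to the total number of items. Assume that the set of predicted values for a user is $Y = \{y_i \mid i = 1, 2, \ldots, n\}$. If the number of $y_i \neq 0$ is $K_i$, the user's coverage rate is $CR = K_i / N$.

4.2 Experimental Results and Analysis

Many experimental and research papers in this field report that the cosine similarity measure gives better prediction accuracy than other measures. We therefore compare the following four strategies under different parameter settings:

a) the conventional recommendation algorithm using cosine similarity (CRA);
b) the conventional recommendation algorithm using modified cosine similarity;
c) the improved recommendation algorithm based on content prediction and collaborative filtering, using cosine similarity (IRA);
d) the improved recommendation algorithm based on content prediction and collaborative filtering, using modified cosine similarity.

Our experiments produced results for CRA, IRA and the non-optimized IRA. Tables 2 to 5 show the results of CRA and IRA under different parameter settings. As noted above, the recommendation algorithms using modified cosine similarity perform better than the others. The tables also show that the recommendation results become more accurate as CP increases. For example, the results with CP = 80% are more accurate than those with CP = 75%.

Table 2. Results of the conventional recommendation algorithm
Table 3. Results of the improved recommendation algorithm (RN = 90, CP = 75%)
Table 4. Results of the improved recommendation algorithm (RN = 90, CP = 80%)
Table 5. Results of the improved recommendation algorithm (RN = 100, CP = 80%)

Figure 1 shows the experimental results with modified cosine similarity, RN = 100 and CP = 80%. It shows how MAE varies with the number of neighbors for IRA, CRA and the non-optimized IRA. The results show that the content-based hybrid collaborative filtering algorithm proposed in this paper outperforms the other two algorithms and achieves higher accuracy as the parameters CP and RN are adjusted. In addition, although the coverage of the non-optimized IRA is almost 100%, so that it can recommend nearly all items, its MAE is much worse than that of IRA and CRA. This shows that the optimized recommendation algorithm is more accurate than the non-optimized hybrid algorithm.

Figure 1. MAE as a function of the number of neighbors
Figure 2. CR of IRA and CRA under different parameter settings

Figure 2 shows the CR values of CRA and IRA under different parameter settings. The CR of IRA is better than that of CRA under every setting. Coverage also increases as the number of neighbors grows. The results show that content-based prediction can maintain high accuracy together with high coverage, which indicates that the method can help overcome the “cold start” problem. The results above therefore show that the proposed recommendation algorithm, which combines content-based prediction with collaborative filtering, is effective and performs well.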
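The two metrics of Section 4.1 can be sketched in Python as follows. The sketch assumes the usual definition $MAE = \frac{1}{N}\sum_{i}|y_i - r_i|$ taken over the $N$ items whose predicted and true ratings are both nonzero, and it treats 0 as "no value", as the text does. The function names `mean_absolute_error` and `coverage_rate` are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """MAE over items where both the prediction and the true rating are nonzero."""
    pairs = [(y, r) for y, r in zip(predicted, actual) if y != 0 and r != 0]
    if not pairs:
        raise ValueError("no item has both a prediction and a true rating")
    return sum(abs(y - r) for y, r in pairs) / len(pairs)


def coverage_rate(predicted):
    """Share of items for which a (nonzero) prediction could be made."""
    if not predicted:
        raise ValueError("empty prediction list")
    return sum(1 for y in predicted if y != 0) / len(predicted)


predicted = [4, 0, 3, 5]   # 0 means the algorithm could not predict
actual = [5, 2, 3, 0]      # 0 means the user gave no rating
print(mean_absolute_error(predicted, actual))  # 0.5
print(coverage_rate(predicted))                # 0.75
```

A lower MAE means more accurate predictions, while a higher CR means the algorithm can score more of the catalogue, which is why the two are reported together.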
5. Conclusion

This paper proposes a new collaborative filtering recommendation algorithm that uses content-based prediction, so that recommendations can be generated even for items that have not yet been rated. Our experimental results show that the method not only preserves prediction accuracy but also improves coverage. It is an effective and feasible algorithm.

6. Acknowledgment

This work was supported by the Fundamental Research Funds for the Central Universities of China under Grant No. 2662015QC040.

Original 1: Research of Improved Recommendation Algorithm Based on Collaborative Filtering and Content Prediction

Abstract

This paper proposes an algorithm combining the sparse matrix filling method and the collaborative filtering algorithm, in order to solve the collaborative filtering recommendation system's “cold start” problem when the system confronts a new item and sparse data. The algorithm improves the accuracy of the similarity calculation for the user or the item. It predicts in advance the entries to be filled when it fills the sparse user-item scores matrix. The algorithm obtains accurate virtual scores and fills out a virtual user-item scores form. It then carries out the prediction based on this scores form. We experimented on the MovieLens dataset. The experimental results show that the algorithm effectively improves the accuracy of rating prediction. To a certain extent, it solves the “cold start” problem.

Index Terms: Recommendation system, cold start, collaborative filtering, sparse matrix.

INTRODUCTION

With the rapid development of Web 2.0 and e-commerce, the massive number of Internet users produces massive data. The problem confronting Internet users has changed from how to find more information to how to find more useful information. Conventional information search methods can hardly satisfy the demands of different users. Because the differences among users are not considered, the search system returns the same results to all users. In fact, different users focus on different information even when they use the same keyword.
Against this background, in order to satisfy the different demands of different users, personalized recommendation, which offers different content to different users, has become the new development direction for e-commerce and information providers. Personalized recommendation methods based on recommendation algorithms have become a hot research topic [1]. Among the recommendation techniques proposed so far, the collaborative filtering algorithm is well known as the most popular and successful method. However, the conventional collaborative filtering algorithm has problems with sparsity, scalability, “cold start” and accuracy [2].

A recommendation algorithm based on collaborative filtering depends heavily on the user-item scores form. Recommendation results can be produced only once the user-item scores form exists. For a new item that nobody has evaluated, the item's scores are filled in with the mean. The item is therefore unremarkable and is unlikely ever to be recommended. New items are thus hard to get started, which is the well-known “cold start” problem.

To solve this problem, this paper proposes an improved recommendation algorithm based on collaborative filtering and content prediction. The algorithm analyzes the item's related content when it fills the user-item scores matrix. It then predicts scores for the item from that content and performs the recommendation calculation by means of the collaborative filtering recommendation algorithm.

STEPS OF CONVENTIONAL COLLABORATIVE RECOMMENDATION ALGORITHM

The most commonly used conventional collaborative algorithm produces recommendations in the following steps [3,4]. First, build the user-item scores matrix. Second, fill the blanks in the matrix. Third, calculate the similarity of the users and then find the neighbor users or items.
Finally, compute and produce the recommendation results.

Building the User-item Scores Matrix

First, the users' preferences must be collected. A user can submit his preferences to the system through various means. Once sufficient data has been collected, the algorithm processes it. Depending on the behavioral analysis method, the algorithm applies statistical methods such as weighting or grouping to obtain a two-dimensional matrix of the users' preferences, namely the user-item scores form.

Filling the User-item Scores Matrix

The scores matrix generated by the methods above is very sparse. Errors are therefore inevitable if the similarity is calculated from the users' evaluation scores alone. The user-item scores matrix should thus be filled with data to make it denser. For a conventional scores matrix, the main methods are mean filling, mode filling and cluster filling. In the conventional collaborative recommendation calculation, this paper fills the user-item scores matrix by mean filling. A missing score is set to a fixed value, which is generally the mean of the scoring system, the user's average score, or the item's average score.

Calculating the Similarity and Finding the Neighbor Users or Neighbor Items

After the user-item scores matrix has been filled, the next step is to find the similar users or items according to the user's preferences. The algorithm then produces the recommendation based on the similar users or similar items. The most typical collaborative filtering algorithm has two branches: collaborative filtering based on users and collaborative filtering based on items. What they have in common is that both calculate the similarity and then use it to find the neighbor users or neighbor items [5].
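The filling and neighbor-finding steps described above can be sketched in Python as follows. This is a minimal illustration rather than the paper's implementation. The function names (`mean_fill`, `cosine_similarity`, `nearest_neighbors`), the use of 0 for a missing score, and the choice of the user's own average as the fill value are assumptions made for the example.

```python
import math


def mean_fill(matrix):
    """Replace each missing score (0) with that user's average score."""
    filled = []
    for row in matrix:
        rated = [score for score in row if score != 0]
        mean = sum(rated) / len(rated) if rated else 0.0
        filled.append([score if score != 0 else mean for score in row])
    return filled


def cosine_similarity(a, b):
    """Cosine of the angle between two users' rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(matrix, target, k):
    """Indices of the k users most similar to the user at row `target`."""
    sims = [(cosine_similarity(matrix[target], row), idx)
            for idx, row in enumerate(matrix) if idx != target]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in sims[:k]]


ratings = [
    [5, 3, 0],   # rows are users, columns are items, 0 = not rated
    [4, 0, 2],
    [1, 5, 4],
]
filled = mean_fill(ratings)
print(nearest_neighbors(filled, target=0, k=1))  # [1]
```

Item-based collaborative filtering would apply the same similarity function to the columns of the matrix instead of its rows.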
The common equations for calculating the similarity, Eqs. (1) to (3), are the cosine similarity, the correlation similarity, and the modified cosine similarity.

Take user-based collaborative filtering as an example. In these three equations, $sim(x, y)$ denotes the similarity between user $x$ and user $y$. $R_{x,i}$ (or $R_{y,i}$) denotes the evaluation score given by user $x$ (or $y$) to item $i$. $I(x, y)$ denotes the set of items evaluated by both user $x$ and user $y$. $I(x)$ (or $I(y)$) denotes the set of items evaluated by user $x$ (or user $y$). $\bar{R}_x$ (or $\bar{R}_y$) denotes the average score given by user $x$ (or $y$) to the items.

Calculating and Generating the Recommendation Results

After the calculation above, the neighbor users or neighbor items are known. On the basis of these data, the user's recommendation value for any item is calculated by means of the classical Eq. (4). Finally, the recommendation results are generated. Take user-based collaborative filtering as an example. In that equation, $p_{y,i}$ denotes the recommendation value of item $i$ for the target user $y$. $R_{x,i}$ is the score given to item $i$ by user $x$, one of the nearest neighbors of the target user $y$. $k$ is the number of nearest neighbors, which can be prescribed directly, decided by means of a threshold, or taken as the top $k$ users whose similarity exceeds the threshold.

IMPROVEMENT AND OPTIMIZATION OF CONVENTIONAL ALGORITHM

In view of the “cold start” problem, we improve the way the conventional algorithm fills the scores matrix. We optimize the algorithm mainly in the following four aspects.

We build the item indication matrix based on features by means of the filtration method.
Generally, a collaborative filtering recommendation system describes an item simply. For example, when Douban recommends a film, it introduces the main actors in the film and its style. The system states whether the film is a comedy, a tragedy, or a mixed style of various elements. These descriptive labels can be considered the item's keywords. Thus each item $X_i = \{A_1, A_2, \ldots, A_j, \ldots, A_n\}$ has keywords or labels that describe its content, where $A_j$ denotes the $j$-th feature of $X_i$. $A_j$ is a Boolean value. If it equals 1, $X_i$ possesses this feature; otherwise $X_i$ does not possess it. All the items therefore form a binary two-dimensional matrix of keywords, as shown in Table I.

TABLE I. THE ITEMS INDICATION MATRIX BASED ON FEATURES

            Feature A1   Feature A2   Feature An
Item X1         0            1            1
Item X2         1            1            0
Item Xm         1            0            1

We train the weights of the user's evaluation over the item content according to the correlation between the items' content features and the users' evaluations of the items. After the items' features are extracted, we can use the users' evaluations of the items to analyze the corresponding feature information and filter it. By matching the user's preferences against the item's features, the system judges whether it can recommend the item to the user. This paper adopts the Winnow algorithm [6] to analyze the evaluation of films. In the field of text classification, the effectiveness of Winnow is widely recognized. In Winnow, $X_i$ is a Boolean feature value. Winnow sets a weight for each word, and the weights compose a linear threshold function $\sum_i W_i X_i > \theta$, where $\theta$ is the threshold and $W_i$ is a weight whose initial value is 0.5. The user parameter is set according to the keywords of the items that the user has evaluated. For example, if the user has evaluated an action film, this user parameter is set to 1; otherwise it is set to 0.
If the user rates a film highly, the weights of the film's corresponding item keywords increase; otherwise they decrease. In training, every item of every user is evaluated, and the weighted sum for that user is obtained. If this sum is below the threshold $\theta$ while the user's evaluation score is above the rating cutoff, we double the weight of each keyword. If the sum is above $\theta$ while the user's evaluation score is below the cutoff, we divide the weight of each keyword by 2. If the weights are already suitable, they are left unchanged. Over the training set, the weights are adjusted repeatedly until every item is handled correctly, or for a set number of passes until the weights stop changing. After training, each user $y$ has a set of weights $W_{y,k}$ for the different types of films.

We can obtain the virtual user evaluation score $R_{y,i}$ based on content prediction. Its value is assigned as shown in Eq. (5), where $r_{y,i} = 0$ indicates that, in the user-item scores matrix, user $y$ has given no score to item $i$. We perform a preliminary prediction for a new item that the user has not yet evaluated. We then have each user's film feature weights (for example, those of user $y$). Thus, for a film $i$, a predicted evaluation value $p_{y,i}$ is generated, where $W_{y,k}$ is user $y$'s weight for the $k$-th type of film and $I_{k,i}$ is film $i$'s value for its $k$-th feature.

We filter out the predictions that are not accurate enough. The content-prediction-based recommendation algorithm described above fills the blanks in the user-item evaluation matrix and so modifies the sparse matrix. The matrix must therefore represent the users' preferences accurately, or else the next prediction values will be wildly inaccurate. To ensure that the prediction method is effective, we should screen the prediction results first.
Only if the results are accurate enough can they show that the weights produced by training conform to the user's preferences. We take the following optimization measures.

Filtering in advance. We filter the users in advance. If a user has rated too few items, the sample is too small to yield accurate predicted values, and the user's evaluations are considered invalid. Therefore, only if the user's rating number (RN) is m