Slope One Algorithm

Uploaded: 2020-04-14. Format: DOCX. Pages: 7.


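Overview

Slope One is a family of algorithms used for collaborative filtering, introduced in a 2005 paper by Daniel Lemire and Anna Maclachlan. It is arguably the simplest form of non-trivial item-based collaborative filtering based on ratings. This simplicity makes the algorithms easy to implement efficiently, while their accuracy is often on par with more complicated and computationally expensive algorithms. They have also been used as building blocks to improve other algorithms.

Slope One tries to meet five goals at once:

1. Easy to implement and maintain: an average engineer can readily interpret all of the aggregated data, and the algorithm is easy to implement and test.
2. Updatable on the fly: adding a new rating should change the predictions immediately.
3. Efficient at query time: queries should be fast, possibly at the cost of extra storage.
4. Expects little from first-time visitors: a user with very few ratings should still receive useful recommendations.
5. Reasonably accurate: the scheme should be competitive with the most accurate methods, and a small gain in accuracy is not worth a large sacrifice in simplicity or scalability.

Basic concept

The basic idea of Slope One is simple. In example 1, users X, Y and A have all rated Item1. Users X and Y have also rated Item2. How is user A likely to rate Item2?

User    Rating to Item 1    Rating to Item 2
X       5                   3
Y       4                   3
A       4                   ?

According to the Slope One algorithm, the answer is: 4 - ((5 - 3) + (4 - 3)) / 2 = 2.5.

To explain: user X rated Item1 5 and Item2 3, so X rates Item2 two points lower than Item1. User Y rates Item2 one point lower than Item1. Across all users who rated both items, Item2 is therefore rated 1.5 points lower than Item1 on average, so we can reasonably predict that user A will give Item2 a rating of 4 - 1.5 = 2.5.

Simple, isn't it? Find the users who rated both Item1 and Item2, compute the average difference between their ratings, and use it to estimate how user A, who rated Item1, would rate Item2. New items can then be recommended to A on that basis.

This shows one of the main strengths of Slope One: it can produce a relatively accurate recommendation even when very little data is available, which helps with the cold start problem.

Weighted algorithm

Next, consider the weighted variant (Weighted Slope One). Suppose 100 users have rated both Item1 and Item2, while 1000 users have rated both Item3 and Item2. The two rating differences clearly should not carry the same weight. Writing $P_{1 \to 2}$ for the rating of Item2 predicted from Item1 and $P_{3 \to 2}$ for the rating of Item2 predicted from Item3, the weighted prediction is

$$P_2 = \frac{100 \cdot P_{1 \to 2} + 1000 \cdot P_{3 \to 2}}{100 + 1000}$$

Code implementation

```csharp
using System;
using System.Collections.Generic;

namespace SlopeOne
{
    // Accumulated rating difference for one pair of items.
    public class Rating
    {
        public float Value { get; set; }  // Sum of the rating differences
        public int Freq { get; set; }     // Number of users who rated both items

        public float AverageValue
        {
            get { return Value / Freq; }
        }
    }

    // Difference matrix keyed by an unordered item pair ("smallerId/largerId").
    public class RatingDifferenceCollection : Dictionary<string, Rating>
    {
        private string GetKey(int Item1Id, int Item2Id)
        {
            return (Item1Id < Item2Id) ? Item1Id + "/" + Item2Id : Item2Id + "/" + Item1Id;
        }

        public bool Contains(int Item1Id, int Item2Id)
        {
            return ContainsKey(GetKey(Item1Id, Item2Id));
        }

        public Rating this[int Item1Id, int Item2Id]
        {
            get { return this[GetKey(Item1Id, Item2Id)]; }
            set { this[GetKey(Item1Id, Item2Id)] = value; }
        }
    }

    public class SlopeOne
    {
        // The dictionary that holds the difference matrix
        public RatingDifferenceCollection _DiffMatrix = new RatingDifferenceCollection();
        // All item ids seen so far
        public HashSet<int> _Items = new HashSet<int>();

        // Add the ratings of a single user (item id -> rating).
        public void AddUserRatings(IDictionary<int, float> userRatings)
        {
            foreach (var item1 in userRatings)
            {
                int item1Id = item1.Key;
                float item1Rating = item1.Value;
                _Items.Add(item1.Key);

                foreach (var item2 in userRatings)
                {
                    if (item2.Key <= item1Id) continue; // Eliminate redundancy: visit each pair once
                    int item2Id = item2.Key;
                    float item2Rating = item2.Value;

                    Rating ratingDiff;
                    if (_DiffMatrix.Contains(item1Id, item2Id))
                    {
                        ratingDiff = _DiffMatrix[item1Id, item2Id];
                    }
                    else
                    {
                        ratingDiff = new Rating();
                        _DiffMatrix[item1Id, item2Id] = ratingDiff;
                    }

                    // Stored as rating(smaller id) - rating(larger id)
                    ratingDiff.Value += item1Rating - item2Rating;
                    ratingDiff.Freq += 1;
                }
            }
        }

        // Input ratings of all users
        public void AddUserRatings(IList<IDictionary<int, float>> Ratings)
        {
            foreach (var userRatings in Ratings)
            {
                AddUserRatings(userRatings);
            }
        }

        // Predict ratings for every known item the user has not rated yet.
        public IDictionary<int, float> Predict(IDictionary<int, float> userRatings)
        {
            Dictionary<int, float> Predictions = new Dictionary<int, float>();
            foreach (var itemId in this._Items)
            {
                if (userRatings.ContainsKey(itemId)) continue; // User has rated this item, just skip it

                Rating itemRating = new Rating();

                foreach (var userRating in userRatings)
                {
                    if (userRating.Key == itemId) continue;
                    int inputItemId = userRating.Key;
                    if (_DiffMatrix.Contains(itemId, inputItemId))
                    {
                        Rating diff = _DiffMatrix[itemId, inputItemId];
                        // The stored difference is rating(smaller id) - rating(larger id),
                        // so flip its sign when the target item has the larger id.
                        itemRating.Value += diff.Freq *
                            (userRating.Value + diff.AverageValue * ((itemId < inputItemId) ? 1 : -1));
                        itemRating.Freq += diff.Freq;
                    }
                }

                // Skip items with no co-rated pair to avoid a 0 / 0 (NaN) prediction.
                if (itemRating.Freq > 0)
                {
                    Predictions.Add(itemId, itemRating.AverageValue);
                }
            }
            return Predictions;
        }

        public static void Test()
        {
            SlopeOne test = new SlopeOne();

            Dictionary<int, float> userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(2, 4);
            userRating.Add(3, 4);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 5);
            userRating.Add(3, 3);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            userRating = new Dictionary<int, float>();
            userRating.Add(1, 4);
            userRating.Add(2, 4);
            userRating.Add(4, 5);
            test.AddUserRatings(userRating);

            // The user to predict for has rated items 1 and 3 only.
            userRating = new Dictionary<int, float>();
            userRating.Add(1, 5);
            userRating.Add(3, 4);

            IDictionary<int, float> Predictions = test.Predict(userRating);
            foreach (var rating in Predictions)
            {
                Console.WriteLine("Item " + rating.Key + " Rating: " + rating.Value);
            }
        }

        public static void Main()
        {
            Test();
        }
    }
}
```

Formal description

The basic idea comes from a simple one-variable linear model $w = f(v) = v + b$. Given a set of training points $(v_i, w_i)_{i=1}^{n}$, minimizing the sum of squared prediction errors $\sum_i (w_i - b - v_i)^2$ under this model gives

$$b = \frac{\sum_{i=1}^{n} (w_i - v_i)}{n}$$

Once $b$ has been obtained from this formula, the prediction for a new data point $v_{new}$ is $w_{new} = b + v_{new}$. Intuitively, the formula for the offset $b$ is just the average of the differences between $w_i$ and $v_i$.

Using this intuition, we define the average deviation of item $i$ with respect to item $j$:

$$\mathrm{dev}_{j,i} = \sum_{u \in S_{j,i}(\chi)} \frac{u_j - u_i}{\mathrm{card}(S_{j,i}(\chi))}$$

where $S_{j,i}(\chi)$ is the set of users in the training set $\chi$ who have rated both item $i$ and item $j$, and $\mathrm{card}()$ is the number of elements in a set.

With this definition, $\mathrm{dev}_{j,i} + u_i$ is a prediction of user $u$'s rating for item $j$. Averaging all such predictions gives

$$P(u)_j = \frac{1}{\mathrm{card}(R_j)} \sum_{i \in R_j} (\mathrm{dev}_{j,i} + u_i)$$

where $R_j$ is the set of all items $i$ that user $u$ has rated and that satisfy $i \neq j$ with $S_{j,i}(\chi)$ non-empty.

For sufficiently dense data sets, we can use an approximation …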
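As a check on the notation, the table from the basic example can be run through the formal definitions. The lines below are a worked sketch added for illustration, assuming Item2 is the target item $j = 2$, Item1 is the only item user A has rated (so $R_2 = \{1\}$), and $\mathrm{dev}_{j,i}$ is the average of $u_j - u_i$ over users who rated both items.

```latex
% Average deviation of item 2 with respect to item 1, over users X and Y
\mathrm{dev}_{2,1} = \frac{(3 - 5) + (3 - 4)}{2} = -1.5

% User A rated only item 1 (u_1 = 4), so the average over R_2 has a single term
P(A)_2 = \mathrm{dev}_{2,1} + u_1 = -1.5 + 4 = 2.5
```

This agrees with the value of 2.5 obtained informally in the basic example.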
