




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、前言随着大数据思想实施的落地,推荐系统也开始倍受关注。不光是用推荐系统,像搜索,社交网络,音乐,餐饮,地图服务等等。,各种互联网应用都开始应在以前,我们没有使用推荐算法的时候,我们是通过设置各种约束条件,匹配数据的自然属性呈现给用户,这种就是基于规则的系统。比如,用户了一个商品,我们会推荐同类别的其他商品,通过类别属性作为推荐的规则。后来问题就出现了,当用户买了多种类别的不同商品的时候,前一条规则就失败了,我们要进一步设计规则,IT 类别优先推荐,价格高的优先推荐.几个回合下来,我们要不停的增加规则,以至于规则有可能的会前后的规则会让推荐结果越来越不好,而且还无法解释是为什么。,增加一条新推荐
2、算法从另一角度入手,解决了基于规则设置的问题。下面将用 Mahout 来构建一个职位推荐算法引擎。目录Mahout 推荐框架概述需求分析:职位推荐引擎指标设计算法模型:推荐算法架构设计:职位推荐引擎系统架构程序开发:基于 Mahout 的推荐算法实现1.2.3.4.5.1. Mahout 推荐系统框架概述Mahout 框架包含了一套完整的推荐系统引擎,标准化的数据结构,多样的算法实现,简单的开发流程。Mahout 推荐的推荐系统引擎是模块化的,分为 5 个主要部分组成:数据模型,相似度算法,近邻算法,推荐算法,算法评分器。更详细的介绍,请参考文章:从源代码剖析 Mahout 推荐引擎2. 需求
3、分析:职位推荐引擎指标设计下面从一个公司案例出发来全面的解释,如何进行职位推荐引擎指标设计。案例介绍:互联网某职业社交,主要包括 个人简历展示页,人脉圈,及,职位发布,职位申请,教育培训等。用户在完成后,需的个人信息,包括教育背景,工作经历,项目经历,技能专长,你是否想找工作!当你选择“是”(求职中),的职位。等等信息。然后,你要告诉库中为你推荐你可能感会从数据通过简短的描述,我们可以粗略地看出,这家职业社交的和主营业务。点有 2 个:··用户:尽可能多的保存有效完整的用户资料服务:帮助用户找到工作,帮助猎头和企业找到员工因此,职位推荐引擎 将成为这个的功能。KPI 指标设
4、计通过推荐带来的职位浏览量: 职位网页的 PV通过推荐带来的职位申请量: 职位网页的有效转化··3. 算法模型:推荐算法2 个测试数据集:··pv.csv: 职位被浏览的信息,包括用户 ID,职位 IDjob.csv: 职位基本信息,包括职位 ID,工资标准1). pv.csv2 列数据:用户 ID,职位 ID(userid,jobid)········浏览:2500 条用户数:1000 个,用户 ID:1-1000职位数:200 个,职位 ID:1-200部分数据:2).
5、job.csv3 列数据:职位 ID,··,工资标准(jobid,create_date,salary)职位数:200 个,职位 ID:1-200部分数据:1,2013-01-24,56002,2011-03-02,54003,2011-03-14,81004,2012-10-05,22005,2011-09-03,141006,2011-03-05,65007,2012-06-06,370008,2013-02-18,55009,2010-07-05,750010,2010-01-23,670011,2011-09-19,520012,2010-01-19,2970013
6、,2013-09-28,600014,2013-10-23,330015,2010-10-09,270016,2010-07-14,51001,112,1362,1873,1653,13,244,84,1995,325,1006,147,597,1478,929,1659,809,17110,4510,3110,110,152为了完成 KPI 的指标,我们把问题用“技术”语言转化一下:我们需要让职位的推荐结果更准确,从而增加用户的点击。····1. 组合使用推荐算法,选出“评估推荐器”验证得分较高的算法2. 人工验证推荐结果3. 职位有时效荐的结果应该是
7、发布半年内的职位4. 工资的标准,应不低于用户浏览职位工资的平均值的 80%我们选择 UserCF,ItemCF,SlopeOne 的 3 种推荐算法,进行 7 种组合的测试。·userCF1: LogLikelihoodSimilarity + NearestNUserNeighborhood +GenericBooleanPrefUserBasedRecommenderuserCF2: CityBlockSimilarity+ NearestNUserNeighborhood +GenericBooleanPrefUserBasedRecommenderuserCF3: User
8、Tanimoto + NearestNUserNeighborhood +GenericBooleanPrefUserBasedRecommenderitemCF1: LogLikelihoodSimilarity + GenericBooleanPrefItemBasedRecommender itemCF2: CityBlockSimilarity+ GenericBooleanPrefItemBasedRecommender itemCF3: ItemTanimoto + GenericBooleanPrefItemBasedRecommenderslopeOne:SlopeOneRec
9、ommender······关于的推荐算法的详细介绍,请参考文章:Mahout 推荐算法 API 详解关于算法的组合的详细介绍,请参考文章:从源代码剖析 Mahout 推荐引擎4. 架构设计:职位推荐引擎系统架构17,2010-05-13,2900018,2010-01-16,2180019,2013-05-23,570020,2011-04-24,5900上图中,左边是 Application 业务系统,右边是 Mahout,下边是 Hadoop 集群。·1. 当数据量不太大时,并且算法复杂,直接选择用 MahoutCSV
10、 或者 Database 数据,在单机内存中进行计算。Mahout 是多线程的应用,会并行使用单机所有系统。·2. 当数据量很大时,选择并行化算法(ItemCF),先业务系统的数据导入到 Hadoop 的 HDFS 中,然后用 MahoutHDFS 实现算法,这时算法的性能与整个 Hadoop 集群有关。·3. 计算后的结果,保存到数据库中,方便5. 程序开发:基于 Mahout 的推荐算法实现开发环境 mahout 版本为 0.8。 ,请参考文章:用 Maven 构建 Mahout 项目新建 Java 类:····Recommend
11、erEvaluator.java, 选出“评估推荐器”验证得分较高的算法RecommenderResult.java, 对指定数量的结果人工比较RecommenderFilterOutdateResult.java,排除过期职位RecommenderFilterSalaryResult.java,排除工资过低的职位1). RecommenderEvaluator.java, 选出“评估推荐器”验证得分较高的算源代码:public class RecommenderEvaluator final static int NEIGHBORHOOD_NUM = 2; final static int
12、RECOMMENDER_NUM = 3; public static void main(String args) throws TasteException, IOException String file = "datafile/job/pv.csv" DataM dataM = RecommendFactory.buildDataMNoPref(file); userLoglikelihood(dataM); userCityBlock(dataM); userTanimoto(dataM); itemLoglikelihood(dataM); itemCityBlo
13、ck(dataM); itemTanimoto(dataM); slopeOne(dataM); public static RecommenderBuilder userLoglikelihood(DataM dataM) throwsTasteException, IOException System.out.println("userLoglikelihood"); UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIH
14、OOD, dataM); UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataM, NEIGHBORHOOD_NUM); RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);RecommendFactory.evalu
15、ate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder userCityBlock(DataM dataM) throws TasteException, IOException System.out
16、.println("userCityBlock"); UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataM); UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataM, NEIGHBORHOOD_NUM); R
17、ecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2
18、); return recommenderBuilder; public static RecommenderBuilder userTanimoto(DataM dataM) throws TasteException, IOException System.out.println("userTanimoto"); UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataM); UserNeighborhood use
19、rNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataM, NEIGHBORHOOD_NUM); RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AV
20、ERAGE_ABSOLUTE_DIFFERENCE,recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder itemLoglikelihood(DataM dataM) throwsTasteException, IOException System.out.println("itemLoglikelihoo
21、d"); ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataM); RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENC
22、E, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder itemCityBlock(DataM dataM) throws TasteException, IOException System.out.println("itemCityBlock"); ItemSimilarity itemS
23、imilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataM); RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,recommenderBuilder, null, dataM,
24、 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder itemTanimoto(DataM dataM) throws TasteException, IOException System.out.println("itemTanimoto"); ItemSimilarity itemSimilarity = RecommendFactory.itemSimi
25、larity(RecommendFactory.SIMILARITY.TANIMOTO, dataM); RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluato
26、r(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder slopeOne(DataM dataM) throws TasteException, IOException System.out.println("slopeOne"); RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();RecommendFactory.evalua
27、te(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder knnLoglikelihood(DataM dataM) throws TasteException, IOException System.o
28、ut.println("knnLoglikelihood"); ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataM); RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);RecommendF
29、actory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder knnTanimoto(DataM dataM) throws TasteException, IOException
30、System.out.println("knnTanimoto"); ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataM); RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);RecommendFac
31、tory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder knnCityBlock(DataM dataM) throws TasteException, IOException S
32、ystem.out.println("knnCityBlock"); ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataM); RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);RecommendFa
33、ctory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; public static RecommenderBuilder svd(DataM dataM) throws TasteException System.out.println(&quo
34、t;svd"); RecommenderBuilder recommenderBuilder = RecommendFactory.svdRecommender(new ALSWRFactorizer(dataM, 5, 0.05, 10);RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataM, 0.7); RecommendFactory.statsEvaluator(recommenderBuilder, n
35、ull, dataM, 2); return recommenderBuilder; public static RecommenderBuilder treeClusterLoglikelihood(DataM dataM)throws TasteException System.out.println("treeClusterLoglikelihood"); UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD,
36、dataM); ClusterSimilarity clusterSimilarity = RecommendFactory.clusterSimilarity(RecommendFactory.SIMILARITY.FARTHEST_NEIGHBOR_ CLUSTER, userSimilarity); RecommenderBuilder recommenderBuilder = RecommendFactory.treeClusterRecommender(clusterSimilarity, 3);RecommendFactory.evaluate(RecommendFactory.E
37、VALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE,recommenderBuilder, null, dataM, 0.7);运行结果,台输出:可视化“评估推荐器”输出:userLoglikelihoodAVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.2741487771272658Recommender IR Evaluator: Precision:0.6424242424242422,Recall:0.4098360655737705 userCityBlockAVERAGE_ABSOLUTE_DIFFERENCE Ev
38、aluater Score:0.575306732961736Recommender IR Evaluator: Precision:0.919580419580419,Recall:0.4371584699453552 userTanimotoAVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.5546485136181523Recommender IR Evaluator: Precision:0.6625766871165644,Recall:0.41803278688524603 itemLoglikelihoodAVERAGE_ABSOLUTE
39、_DIFFERENCE Evaluater Score:0.5398332608612343Recommender IR Evaluator: Precision:0.26229508196721296,Recall:0.26229508196721296itemCityBlockAVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.9251437840891661Recommender IR Evaluator: Precision:0.02185792349726776,Recall:0.02185792349726776itemTanimotoAVE
40、RAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.9176432856689655Recommender IR Evaluator: Precision:0.26229508196721296,Recall:0.26229508196721296slopeOneAVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.0 Recommender IR Evaluator: Precision:0.01912568306010929,Recall:0.01912568306010929 RecommendFactory.sta
41、tsEvaluator(recommenderBuilder, null, dataM, 2); return recommenderBuilder; UserCityBlock 算法评估的结果是最好的,基于UserCF 的算法比 ItemCF 都要好,SlopeOne算法几乎没有得分。2). RecommenderResult.java, 对指定数量的结果人工比较为得到差异化结果,我们分别取 UserCityBlock,itemLoglikelihood,对推荐结果人工比较。源代码:public class RecommenderResult final static int NEIGHBO
42、RHOOD_NUM = 2; final static int RECOMMENDER_NUM = 3; public static void main(String args) throws TasteException, IOException String file = "datafile/job/pv.csv" DataM dataM = RecommendFactory.buildDataMNoPref(file); RecommenderBuilder rb1 = RecommenderEvaluator.userCityBlock(dataM);台输出:只截取
43、部分结果我们查看 uid=974 的用户推荐信息:搜索 pv.csv:.userCityBlock=>uid:968,(61,0.333333) itemLoglikelihood=>uid:968,(121,1.429362)(153,1.239939)(198,1.207726)userCityBlock=>uid:969, itemLoglikelihood=>uid:969,(75,1.326499)(30,0.873100)(85,0.763344)userCityBlock=>uid:970, itemLoglikelihood=>uid:970
44、,(13,0.748417)(156,0.748417)(122,0.748417)userCityBlock=>uid:971, itemLoglikelihood=>uid:971,(38,2.060951)(104,1.951208)(83,1.941735)userCityBlock=>uid:972, itemLoglikelihood=>uid:972,(131,1.378395)(4,1.349386)(87,0.881816)userCityBlock=>uid:973, itemLoglikelihood=>uid:973,(196,1.4
45、32040)(140,1.398066)(130,1.380335)userCityBlock=>uid:974,(19,0.200000) itemLoglikelihood=>uid:974,(145,1.994049)(121,1.794289)(98,1.738027). RecommenderBuilder rb2 = RecommenderEvaluator.itemLoglikelihood(dataM); LongPrimitiveIterator iter = dataM.getUserIDs(); while (iter.hasNext() long uid =
46、 iter.nextLong(); System.out.print("userCityBlock=>"); result(uid, rb1, dataM); System.out.print("itemLoglikelihood=>"); result(uid, rb2, dataM); public static void result(long uid, RecommenderBuilder recommenderBuilder, DataMdataM) throws TasteException List list = recomme
47、nderBuilder.buildRecommender(dataM).recommend(uid, RECOMMENDER_NUM); RecommendFactory.showItems(uid, list, false); 搜索 job.csv:上面两种算法,推荐的结果都是 2010 年的职位,这些结果并不是太好,接下来我们要排除过期职位,只保留 2013 年的职位。3).RecommenderFilterOutdateResult.java,排除过期职位源代码:public class RecommenderFilterOutdateResult final static int NE
48、IGHBORHOOD_NUM = 2; final static int RECOMMENDER_NUM = 3; public static void main(String args) throws TasteException, IOException String file = "datafile/job/pv.csv" DataM dataM = RecommendFactory.buildDataMNoPref(file); RecommenderBuilder rb1 = RecommenderEvaluator.userCityBlock(dataM); R
49、ecommenderBuilder rb2 = RecommenderEvaluator.itemLoglikelihood(dataM); LongPrimitiveIterator iter = dataM.getUserIDs(); while (iter.hasNext() long uid = iter.nextLong(); System.out.print("userCityBlock=>"); filterOutdate(uid, rb1, dataM); System.out.print("itemLoglikelihood=>&qu
50、ot;); filterOutdate(uid, rb2, dataM); > jobjob$jobid %in% c(145,121,98,19), jobid create_date salary 1919 2013-05-2357009898 2010-01-152900121 121 2010-06-195300145 145 2013-08-026800> pvwhich(pv$userid=974), userid jobid 2426974 1062427974 1732428974822429974 188243097478 public static void f
51、ilterOutdate(long uid, RecommenderBuilder recommenderBuilder, DataMdataM) throws TasteException, IOException Set jobids = getOutdateJobID("datafile/job/job.csv"); IDRescorer rescorer = new JobRescorer(jobids); List list = recommenderBuilder.buildRecommender(dataM).recommend(uid, RECOMMENDE
52、R_NUM, rescorer); RecommendFactory.showItems(uid, list, true); public static Set getOutdateJobID(String file) throws IOException BufferedReader br = new BufferedReader(new FileReader(new File(file); Set jobids = new HashSet(); String s = null; while (s = br.readLine() != null) String cols = s.split(
53、","); SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd"); Date date = null; try date = df.parse(cols1); if (date.getTime() < df.parse("2013-01-01").getTime() jobids.add(Long.parseLong(cols0); catch (ParseException e) e.printStackTrace(); br.close(); return jobi
54、ds; class JobRescorer implements IDRescorer final private Set jobids; public JobRescorer(Set jobs) this.jobids = jobs; Override public double rescore(long id, double originalScore) return isFiltered(id) ? Double.NaN : originalScore; 台输出:只截取部分结果我们查看 uid=994 的用户推荐信息:搜索 pv.csv:搜索 job.csv:> jobjob$jobid %in% c(19,145,89),> pvwhich(pv$userid=974), userid jobid 2426974 1062427974 1732428974822429974 1882
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 工程项目的利益冲突案例试题及答案
- 2025标准的企业租赁合同
- 公共关系的传播模型试题及答案
- 英语教学课件Unit 5 What are the shirts made of?Revision 一轮复习课
- 2025货车租赁合同模板
- 沙发创新设计方案
- 市政工程考试时间管理策略分享及试题及答案
- 2025年经济法考生热点试题及答案
- 助力成才的2025年工程项目管理试题及答案
- 有效备考中级经济师试题及答案分享
- 家长开放日家长意见反馈表
- 初中英语2023年中考专题训练任务型阅读-完成表格篇
- 数据中台-项目需求规格说明书
- 田径运动会检查员报告表
- 高级政工师职称面试题
- 老年人能力评估师高级第六章-需求评估
- 业主维权授权委托书范文
- 第四代EGFR-C797S药物管线及专利调研报告
- 骨科基础知识解剖篇
- 梁山伯与祝英台小提琴谱乐谱
- 有机硅化学课件-有机硅化合物的化学键特性
评论
0/150
提交评论