Pandas Data Processing courseware, Project 4: Getting Information from Multiple DataFrames

Removing stop words

Problem description

Given the bullet-screen comments (danmu) of a TV series, remove the stop words from the comments, then output, as a list, the 10 most frequent words.

Sample of the comment data:

```
       contents              likeCount  tv_name
0      二刷的朋友有吗              201        1
1      我希望一切能重来              3        1
2      这段眼神变化的太妙了          9        1
3      良心啊,一小时              184        1
4      基本都好                    20        1
...    ...                        ...      ...
59995  这个叶爸有点东西            227       12
59996  眼镜掉在案发现场了           90       12
59997  俺的眼睛掉在厂里了           10       12
59998  他不戴假发你更不习惯         17       12
59999  那是什么药呀                 33       12
```

Expected output

```
Word    Frequency
孩子    2030
爬山    1913
严良    1511
真的    1407
一个    1305
妈妈     939
演技     902
一起     865
普普     846
感觉     782
```

Problem analysis

- How can a sentence be cut into words? With cut().
- How can the comment table and the stop-word table be combined? With merge().
- How can word frequencies be counted? With value_counts().

Operation hints

Segment the comments with the cut() function from the jieba library and convert the result to a DataFrame. Merge it with the stop-word DataFrame, keep only the words that are not in the stop-word table, and count how often each remaining word occurs. This produces the result the task asks for.

Program code

```python
import pandas as pd
import jieba

# Read the comment data
data = pd.read_csv(r"D:\pydata\项目四\某电视剧弹幕信息.csv")

# Read the stop-word file and split it on whitespace into a list
with open(r"D:\pydata\项目四\停用词.txt", "r", encoding='utf-8') as f:
    stop_word = f.read().split()
stop_word = pd.DataFrame(stop_word, columns=["stopword"])

# Join all comments into one string and segment it, one word per row
word = pd.DataFrame(jieba.cut("".join(data.iloc[:, 1].astype(str))), columns=["word"])

# Left merge: words that are not stop words get NaN in the "stopword" column
word = pd.merge(word, stop_word, left_on=["word"], right_on=["stopword"], how="left")

# Keep words that are not stop words and are longer than one character
word = word.query("stopword.isnull() and word.str.len() > 1", engine='python')["word"]

# Count occurrences (sorted from most to least frequent) and show the top 10
word = word.value_counts()
print(word.head(10))
```

Notes on the imports: pandas provides a large number of functions and methods for processing data quickly and conveniently. jieba is a Chinese word-segmentation library for Python; it is fast, accurate and extensible.

Step: data = pd.read_csv(r"D:\pydata\项目四\某电视剧弹幕信息.csv")
This loads the comment file into a DataFrame with the contents shown in the problem description.

Step: reading the stop-word file and calling read().split()
The list after splitting is:

```
['$', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ..., '非独', '靠', '顺', '顺着', '首先', '!', ',', ':', ';', '?']
```

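The merge-then-filter idea from the operation hints can be tried on a handful of words before running it on the full data. The sketch below is only an illustration: the word list and the stop-word list are made up, jieba is left out so that it runs with pandas alone, and the `isin()` variant at the end is an alternative that the course code does not use.

```python
import pandas as pd

# Hypothetical words, standing in for the output of jieba.cut()
tokens = pd.DataFrame({"word": ["二刷", "的", "朋友", "有", "吗", "朋友", "药"]})
# Hypothetical stop-word table (the course reads the real one from a text file)
stop_words = pd.DataFrame({"stopword": ["的", "有", "吗"]})

# Left merge: every word is kept; words that are not stop words
# end up with NaN in the "stopword" column.
merged = pd.merge(tokens, stop_words, left_on="word", right_on="stopword", how="left")

# Keep rows without a stop-word match whose word has more than one character.
kept = merged[merged["stopword"].isna() & (merged["word"].str.len() > 1)]["word"]

# Count how often each remaining word occurs.
counts = kept.value_counts()
print(counts)

# Alternative without a merge: test membership directly with isin().
alt = tokens[~tokens["word"].isin(stop_words["stopword"])
             & (tokens["word"].str.len() > 1)]["word"]
```

On this toy input both routes keep the same words. The merge route has the side benefit of leaving the matched stop word next to each word, which makes the intermediate table easy to inspect.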
Step: stop_word = pd.DataFrame(stop_word, columns=["stopword"])
This turns the stop-word list into a one-column DataFrame, the generated stop-word table:

```
     stopword
0    $
1    0
2    1
3    2
4    3
...  ...
741  !
742  ,
743  :
744  ;
745  ?
```

Step: word = pd.DataFrame(jieba.cut("".join(data.iloc[:, 1].astype(str))), columns=["word"])
"".join(...) first concatenates every comment in the column at position 1 into one long string:

“二刷的朋友有吗我希望一切能重来这段眼神变化的太妙了良心啊,一小时……好了警官你是下一个好一个不戴眼镜的斯文败类儿子你啥时候学习啊居然还不说实话?我不戴假发更厉害演完这部电影,伊能静开始怕了你看我还有机会吗这个叶爸有点东西眼镜掉在案发现场了俺的眼睛掉在厂里了他不戴假发你更不习惯那是什么药呀”

jieba.cut() then segments that string into words:

```
['二刷', '的', '朋友', '有', '吗', '我', '希望', ..., '他', '不戴', '假发', '你', '更不', '习惯', '那是', '什么', '药', '呀']
```

Finally, pd.DataFrame(..., columns=["word"]) stores one word per row:

```
        word
0       二刷
1       的
2       朋友
3       有
4       吗
...     ...
339467  那
339468  是
339469  什么
339470  药
339471  呀
```

Step: word = pd.merge(word, stop_word, left_on=["word"], right_on=["stopword"], how="left")
The left merge keeps every word. A word that appears in the stop-word table gets that stop word in the stopword column; every other word gets NaN:

```
        word  stopword
0       二刷  NaN
1       的    的
2       朋友  NaN
3       有    有
4       吗    吗
...     ...   ...
339467  那    那
339468  是    是
339469  什么  什么
339470  药    NaN
339471  呀    呀
```

Step: word = word.query("stopword.isnull() and word.str.len() > 1", engine='python')["word"]
The query keeps the rows whose stopword is NaN and whose word is longer than one character; the trailing ["word"] then selects the word column. Rows kept by the filter:

```
        word      stopword
0       二刷      NaN
2       朋友      NaN
6       希望      NaN
9       重来      NaN
10      这段      NaN
...     ...       ...
339451  案发现场  NaN
339455  眼睛      NaN
339458  厂里      NaN
339462  戴假发    NaN
339466  习惯      NaN
```

Step: word = word.value_counts()
This counts how many times each word occurs, sorted from most to least frequent:

```
Word      Count
孩子      2023
爬山      1911
严良      1511
真的      1407
一个      1305
...       ...
卡尔         1
胡成         1
亲热         1
碰过         1
案发现场     1
```

Task summary

The merge() function matches related rows of two DataFrames on columns or on the index and combines each matched pair into a single row of a new DataFrame. Its parameters, such as how, left_on and right_on used in this task, make it flexible enough to cover the needs of practical work.

Try it yourself

Given the bullet-screen comments of a TV series, remove the stop words from the comments, then output, as a list, the 10 most frequent words in the comments on the first episode. The result is the following list:

```
['爬山', '真实', '一起', '电影', '丰田', '不错', '感觉', '欺负', '秦昊', '真的']
```

Production team

Produced by: Liu Xue, Chongqing Jiulongpo Vocational Education Center

Selecting men's favourite movies

Presenter: Liu Xue, Chongqing Jiulongpo Vocational Education Center

Problem description

There are three tables: "users" (user information), "ratings" (ratings) and "movies" (movie information). Their fields are listed below. Find the information for the 10 movies that men like best.

users: UserID (user ID), Gender (gender), Age (age), Occupation (occupation), Zip-code (postal code)
movies: MovieID (movie ID), Title (movie title), Genres (genres)
ratings: UserID (user ID), MovieID (movie ID), Rating (rating), Timestamp (timestamp)

Expected output

```
MovieID  Title                                      Genres       Rating
787      Gate of Heavenly Peace, The (1995)         Documentary  5.0
985      Small Wonders (1996)                       Documentary  5.0
3233     Smashing Time (1967)                       Comedy       5.0
3280     Baby, The (1973)                           Horror       5.0
3172     Ulysses (Ulisse) (1954)                    Adventure    5.0
439      Dangerous Game (1993)                      Drama        5.0
130      Angela (1995)                              Drama        5.0
3656     Lured (1947)                               Crime        5.0
1830     Follow the Bitch (1998)                    Comedy       5.0
989      Schlafes Bruder (Brother of Sleep) (1995)  Drama        5.0
```

Problem analysis

- Which tables does the final output draw on? All three.
- How are the tables combined? With merge().
- How are the movies that men rate highest found? Merge the tables first, then aggregate.

Operation hints

First merge the ratings table with the users table and compute, for male users, the ID and mean rating of each movie. Then merge the resulting table with the movies table. Finally, sort by rating in descending order to obtain the movies that men like best.

Program code

```python
import pandas as pd

# The .dat files use "::" as the field separator and have no header row
movies = pd.read_table(r"D:\pydata\项目四\movies.dat", sep='::', header=None,
                       names=['MovieID', 'Title', 'Genres'],
                       engine='python', encoding='iso-8859-15')
ratings = pd.read_table(r"D:\pydata\项目四\ratings.dat", sep='::', header=None,
                        names=['UserID', 'MovieID', 'Rating', 'Timestamp'],
                        engine='python', encoding='iso-8859-15')
users = pd.read_table(r"D:\pydata\项目四\users.dat", sep='::', header=None,
                      names=['UserID', 'Gender', 'Age', 'Occupation', 'Zip-code'],
                      engine='python', encoding='iso-8859-15')

info = pd.merge(ratings, users, on="UserID", how="inner")  # ① merge ratings with users
info = info[info["Gender"] == "M"]                         # ② keep male users only
info = info.groupby("MovieID")["Rating"].mean()            # ③ mean rating per movie
res = pd.merge(movies, info, on="MovieID")                 # ④ attach the movie information
res = res.sort_values(by="Rating", ascending=False)        # ⑤ sort by mean rating, descending
res = res.round({"Rating": 2})                             # round the mean to two decimals
res.head(10)                                               # first 10 rows (shown in a notebook)
```

Note on the import: pandas provides a large number of functions and methods for processing data quickly and conveniently.
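The five steps in the operation hints can be followed on three tiny in-memory tables, where every intermediate result is small enough to check by hand. This is a sketch under stated assumptions: the rows below are invented, only the column names match the course data, and the extra `Votes` column (how many ratings each mean is based on) is an addition that the course code does not compute.

```python
import pandas as pd

# Hypothetical miniature tables using the same column names as the course data
users = pd.DataFrame({"UserID": [1, 2, 3], "Gender": ["F", "M", "M"]})
movies = pd.DataFrame({
    "MovieID": [10, 20, 30],
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Genres": ["Drama", "Comedy", "Horror"],
})
ratings = pd.DataFrame({
    "UserID":  [1, 1, 2, 2, 3, 3],
    "MovieID": [10, 20, 10, 20, 20, 30],
    "Rating":  [5, 4, 3, 4, 5, 2],
})

# 1. Attach the user attributes to every rating.
info = pd.merge(ratings, users, on="UserID", how="inner")
# 2. Keep only the ratings given by male users.
info = info[info["Gender"] == "M"]
# 3. Mean rating per movie, plus the number of ratings behind each mean.
stats = (info.groupby("MovieID")["Rating"]
             .agg(["mean", "count"])
             .rename(columns={"mean": "Rating", "count": "Votes"})
             .reset_index())
# 4. Attach title and genres.
res = pd.merge(movies, stats, on="MovieID")
# 5. Highest mean rating first.
res = res.sort_values(by="Rating", ascending=False)
print(res)
```

Keeping a count next to the mean is a design choice rather than part of the task: it shows at a glance whether a high average rests on many ratings or on a single one.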

Step: movies = pd.read_table(r"D:\pydata\项目四\movies.dat", ...)
Data in the movies table:

```
      MovieID  Title                                Genres
0     1        Toy Story (1995)                     Animation|Children's|Comedy
1     2        Jumanji (1995)                       Adventure|Children's|Fantasy
2     3        Grumpier Old Men (1995)              Comedy|Romance
3     4        Waiting to Exhale (1995)             Comedy|Drama
4     5        Father of the Bride Part II (1995)   Comedy
...   ...      ...                                  ...
3878  3948     Meet the Parents (2000)              Comedy
3879  3949     Requiem for a Dream (2000)           Drama
3880  3950     Tigerland (2000)                     Drama
3881  3951     Two Family House (2000)              Drama
3882  3952     Contender, The (2000)                Drama|Thriller
```

Step: ratings = pd.read_table(r"D:\pydata\项目四\ratings.dat", ...)
Data in the ratings table:

```
         UserID  MovieID  Rating  Timestamp
0        1       1193     5       978300760
1        1       661      3       978302109
2        1       914      3       978301968
3        1       3408     4       978300275
4        1       2355     5       978824291
...      ...     ...      ...     ...
1000204  6040    1091     1       956716541
1000205  6040    1094     5       956704887
1000206  6040    562      5       956704746
1000207  6040    1096     4       956715648
1000208  6040    1097     4       956715569
```

Step: users = pd.read_table(r"D:\pydata\项目四\users.dat", ...)
Data in the users table:

```
      UserID  Gender  Age  Occupation  Zip-code
0     1       F       1    10          48067
1     2       M       56   16          70072
2     3       M       25   15          55117
3     4       M       45   7           02460
4     5       M       25   20          55455
...   ...     ...     ...  ...         ...
6035  6036    F       25   15          32603
6036  6037    F       45   1           76006
6037  6038    F       56   1           14706
6038  6039    F       45   0           01060
6039  6040    M       25   6           11106
```

Step: info = pd.merge(ratings, users, on="UserID", how="inner")
The DataFrame after merging the ratings table with the users table:

```
UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
1       1193     5       978300760  F       1    10          48067
1       661      3       978302109  F       1    10          48067
1       914      3       978301968  F       1    10          48067
1       3408     4       978300275  F       1    10          48067
1       2355     5       978824291  F       1    10          48067
...     ...      ...     ...        ...     ...  ...         ...
6040    1091     1       956716541  M       25   6           11106
6040    1094     5       956704887  M       25   6           11106
6040    562      5       956704746  M       25   6           11106
6040    1096     4       956715648  M       25   6           11106
6040    1097     4       956715569  M       25   6           11106
```

Step: info = info[info["Gender"] == "M"]
The table after keeping only male users:

```
UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
2       1357     5       978298709  M       56   16          70072
2       3068     4       978299000  M       56   16          70072
2       1537     4       978299620  M       56   16          70072
2       647      3       978299351  M       56   16          70072
2       2194     4       978299297  M       56   16          70072
...     ...      ...     ...        ...     ...  ...         ...
6040    1091     1       956716541  M       25   6           11106
6040    1094     5       956704887  M       25   6           11106
6040    562      5       956704746  M       25   6           11106
6040    1096     4       956715648  M       25   6           11106
6040    1097     4       956715569  M       25   6           11106
```

Step: info = info.groupby("MovieID")["Rating"].mean()
The mean rating that male users gave each movie:

```
MovieID
1       4.130552
2       3.175238
3       2.994152
4       2.482353
5       2.888298
...     ...
3948    3.641838
3949    4.174107
3950    3.681818
3951    4.043478
3952    3.787986
```

Step: res = pd.merge(movies, info, on="MovieID")
Each movie together with its mean rating from male users:

```
MovieID  Title                                Genres                        Rating
1        Toy Story (1995)                     Animation|Children's|Comedy   4.130552
2        Jumanji (1995)                       Adventure|Children's|Fantasy  3.175238
3        Grumpier Old Men (1995)              Comedy|Romance                2.994152
4        Waiting to Exhale (1995)             Comedy|Drama                  2.482353
5        Father of the Bride Part II (1995)   Comedy                        2.888298
...      ...                                  ...                           ...
3948     Meet the Parents (2000)              Comedy                        3.641838
3949     Requiem for a Dream (2000)           Drama                         4.174107
3950     Tigerland (2000)                     Drama                         3.681818
3951     Two Family House (2000)              Drama                         4.043478
3952     Contender, The (2000)                Drama|Thriller                3.787986
```

Step: res = res.sort_values(by="Rating", ascending=False)
The table after sorting:

```
MovieID  Title                                   Genres       Rating
787      Gate of Heavenly Peace, The (1995)      Documentary  5.0
985      Small Wonders (1996)                    Documentary  5.0
3233     Smashing Time (1967)                    Comedy       5.0
3280     Baby, The (1973)                        Horror       5.0
3172     Ulysses (Ulisse) (1954)                 Adventure    5.0
...      ...                                     ...          ...
3460     Hillbillys in a Haunted House (1967)    Comedy       1.0
834      Phat Beach (1996)                       Comedy       1.0
3136     James Dean Story, The (1957)            Documentary  1.0
3904     Uninvited Guest, An (2000)              Drama        1.0
684      Windows (1980)                          Drama        1.0
```

Step: res = res.round({"Rating": 2})
The Rating column rounded to two decimal places:

```
MovieID  Title                                        Genres       Rating
787      Gate of Heavenly Peace, The (1995)           Documentary  5.00
985      Small Wonders (1996)                         Documentary  5.00
3233     Smashing Time (1967)                         Comedy       5.00
3280     Baby, The (1973)                             Horror       5.00
3172     Ulysses (Ulisse) (1954)                      Adventure    5.00
439      Dangerous Game (1993)                        Drama        5.00
130      Angela (1995)                                Drama        5.00
3656     Lured (1947)                                 Crime        5.00
1830     Follow the Bitch (1998)                      Comedy       5.00
989      Schlafes Bruder (Brother of Sleep) (1995)    Drama        5.00
3517     Bells, The (1926)                            Crime|Drama  5.00
2931     Time of the Gypsies (Dom za vesanje) (1989)  Drama        4.83
3245     I Am Cuba (Soy Cuba/Ya Kuba) (1964)          Drama        4.75
598      Window to Paris (1994)                       Comedy       4.67
53       Lamerica (1994)                              Drama        4.67
```

Step: res.head(10)
The first 10 rows, which are the required result:

```
MovieID  Title                                      Genres       Rating
787      Gate of Heavenly Peace, The (1995)         Documentary  5.0
985      Small Wonders (1996)                       Documentary  5.0
3233     Smashing Time (1967)                       Comedy       5.0
3280     Baby, The (1973)                           Horror       5.0
3172     Ulysses (Ulisse) (1954)                    Adventure    5.0
439      Dangerous Game (1993)                      Drama        5.0
130      Angela (1995)                              Drama        5.0
3656     Lured (1947)                               Crime        5.0
1830     Follow the Bitch (1998)                    Comedy       5.0
989      Schlafes Bruder (Brother of Sleep) (1995)  Drama        5.0
```

Task summary

The merge() function matches related rows of two DataFrames on columns or on the index and combines each matched pair into a single row of a new DataFrame. Its parameters, such as on and how used in this task, make it flexible enough to cover the needs of practical work.

Try it yourself

Using the "movies" (movie information), "users" (user information) and "ratings" (ratings) tables, find the 10 movies that women like least. The result is shown below.

```
MovieID  Title                                              Genres              Rating
3695     Toxic Avenger Part III: The Last Temptation of...  Comedy|Horror       1.0
75       Big Bully (1996)                                   Comedy|Drama        1.0
1439     Meet Wally Sparks (1997)                           Comedy              1.0
2207     Jamaica Inn (1939)                                 Drama               1.0
2256     Parasite (1982)                                    Horror|Sci-Fi       1.0
3899     Circus (2000)                                      Comedy              1.0
3027     Slaughterhouse 2 (1988)                            Horror              1.0
3592     Time Masters (Les Maîtres du Temps) (1982)         Animation|Sci-Fi    1.0
3574     Carnosaur 3: Primal Species (1996)                 Horror|Sci-Fi       1.0
2039     Cheetah (1989)                                     Adventure|Children's  1.0
```

Production team

Produced by: Liu Xue, Chongqing Jiulongpo Vocational Education Center

Counting the participants in each competition event

Presenter: Liu Xue, Chongqing Jiulongpo Vocational Education Center

Problem description

After the school skills competition was launched, the teacher received a sign-up sheet from every class. How can the number of participants in each event be counted quickly? The sign-up files the teacher received are shown in the figure.

Expected output

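```
比赛项目 (event)          人数 (participants)
2019C程序设计             52
2019VF数据库              48
2020C程序设计             44
2020VF数据库              41
三维动画                   3
二维动画制作              18
二维动画制作(2021级)    129
图像处理(2021级)        168
图文混排                 147
幻灯片制作               129
表格处理                 113
视频剪辑(2021级)        129
```

Problem analysis

- How can the data from several files be read into one DataFrame? Append the files one after another.
- Which column should the data be grouped by to compute the number of participants in each event? The "比赛项目" (event) column.

Operation hints

First read the first data table in the class-sheet folder, then append the other tables after it, and finally, through gro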
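The two answers in the problem analysis (append the tables one after another, then group by the 比赛项目 column) can be sketched on in-memory data. Everything below is an assumption made for illustration: the three small tables stand in for the class files, the 姓名 (name) column is invented, and `groupby(...).size()` is one possible way to count rows per group, not necessarily the call used on the slides. `pd.concat` is used for appending because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical sign-up sheets for three classes; in the task each one
# would be read from a file in the class-sheet folder.
class_tables = [
    pd.DataFrame({"姓名": ["A", "B"], "比赛项目": ["图文混排", "表格处理"]}),
    pd.DataFrame({"姓名": ["C", "D"], "比赛项目": ["图文混排", "幻灯片制作"]}),
    pd.DataFrame({"姓名": ["E"], "比赛项目": ["图文混排"]}),
]

# Start from the first table and append the remaining ones in turn.
all_entries = class_tables[0]
for table in class_tables[1:]:
    all_entries = pd.concat([all_entries, table], ignore_index=True)

# Number of sign-ups per event.
counts = all_entries.groupby("比赛项目").size()
print(counts)
```

With many files, calling `pd.concat(class_tables, ignore_index=True)` once on the whole list gives the same table and avoids copying the growing DataFrame on every iteration.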
