MOOC answer key: Python Web Crawler Programming Technology, Shenzhen Institute of Information Technology (China University MOOC)




Quiz

1. Question: Given the Flask program:

    import flask
    app = flask.Flask(__name__)

    @app.route("/")
    def index():
        try:
            fobj = open("index.htm", "rb")
            data = fobj.read()
            fobj.close()
            return data
        except Exception as err:
            return str(err)

    if __name__ == "__main__":
        app.run()

and the file index.htm:

    <h1>Welcome Python Flask Web</h1>
    It is very easy to make a website by Python Flask.

Then visiting http://127.0.0.1:5000 shows the contents of index.htm.
Options: A. True  B. False
Correct answer: A (True)

GET & POST

1. Question: The server program accepts data submitted by both GET and POST:

    import flask
    app = flask.Flask(__name__)

    @app.route("/", ____________________)
    def index():
        try:
            province = flask.request.values.get("province") if "province" in flask.request.values else ""
            city = flask.request.values.get("city") if "city" in flask.request.values else ""
            note = flask.request.values.get("note") if "note" in flask.request.values else ""
            return province + "," + city + "\n" + note
        except Exception as err:
            return str(err)

    if __name__ == "__main__":
        app.run()

The missing statement is:
Options:
A. methods=["GET", "POST"]
B. method=["GET", "POST"]
C. methods=["POST"]
D. method=["POST"]
Correct answer: A

POST

1. Question: The client program client.py is:

    import urllib.parse
    import urllib.request

    url = "http://127.0.0.1:5000"
    try:
        province = urllib.parse.quote("广东")
        city = urllib.parse.quote("深圳")
        data = "province=" + province + "&city=" + city
        ___________________________
        ____________________________
        html = html.read()
        html = html.decode()
        print(html)
    except Exception as err:
        print(err)

The server program server.py is:

    import flask
    app = flask.Flask(__name__)

    @app.route("/", methods=["POST"])
    def index():
        try:
            province = flask.request.form.get("province") if "province" in flask.request.form else ""
            city = flask.request.form.get("city") if "city" in flask.request.form else ""
            return province + "," + city
        except Exception as err:
            return str(err)

    if __name__ == "__main__":
        app.run()

The missing statements are:
Options:
A. data = data.decode(); html = urllib.request.urlopen("http://127.0.0.1:5000", data=data)
B. data = data.encode(); html = urllib.request.urlopen("http://127.0.0.1:5000", data=data)
C. data = data.encode(); html = urllib.request.urlopen("http://127.0.0.1:5000?data=" + data)
D. data = data.decode(); html = urllib.request.urlopen("http://127.0.0.1:5000?data=" + data)
Correct answer: B
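The client question above can be checked without a running server: urllib.parse.quote percent-encodes the Chinese form values, and urlopen's data argument must be bytes, which is why the data.encode() option is correct. A sketch of just the encoding step (no request is actually sent):

```python
import urllib.parse

# Percent-encode non-ASCII form values, as in the client question.
province = urllib.parse.quote("广东")   # UTF-8 bytes, percent-encoded
city = urllib.parse.quote("深圳")
body = "province=" + province + "&city=" + city

# urlopen(url, data=...) accepts only bytes, so the str body must be encoded.
payload = body.encode()
print(province)                  # %E5%B9%BF%E4%B8%9C
print(type(payload).__name__)    # bytes
```

Passing a str instead of the encoded bytes raises a TypeError inside urlopen, which is why options A and D fail.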
Download file

1. Question: The server program lets a client download the file 图像.jpg:

    import flask
    import os
    app = flask.Flask(__name__)

    @app.route("/")
    def index():
        if "fileName" not in flask.request.values:
            return "图像.jpg"
        else:
            data = b""
            try:
                _____________________________________________
                if fileName != "" and os.path.exists(fileName):
                    fobj = open(fileName, "rb")
                    _________________________
                    fobj.close()
            except Exception as err:
                data = str(err).encode()
            return data

    if __name__ == "__main__":
        app.run()

The missing statements are:
Options:
A. fileName = flask.request.values.get("fileName"); data = fobj.read()
B. fileName = flask.request.args.get("fileName"); data = fobj.read()
C. fileName = flask.request.form.get("fileName"); data = fobj.read()
D. None of the above
Correct answer: A

Upload file

1. Question: The server receives the uploaded file name fileName, then reads the file data and saves it:

    import flask
    app = flask.Flask(__name__)

    @app.route("/upload", methods=["POST"])
    def uploadFile():
        msg = ""
        try:
            if "fileName" in flask.request.values:
                fileName = flask.request.values.get("fileName")
                __________________________________
                fobj = open("upload " + fileName, "wb")
                fobj.write(data)
                fobj.close()
                msg = "OK"
            else:
                msg = "没有按要求上传文件"
        except Exception as err:
            print(err)
            msg = str(err)
        return msg

    if __name__ == "__main__":
        app.run()

The missing statement is:
Options:
A. data = flask.request.read()
B. data = flask.request.get_data()
C. data = flask.request.values.read()
D. data = flask.request.values.get_data()
Correct answer: B
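Stripped of Flask, the download handler in the question reduces to an existence-checked binary read that falls back to the encoded error text. A standard-library sketch of just that core (the temporary file stands in for 图像.jpg):

```python
import os
import tempfile

def read_file_bytes(fileName):
    """Return the file's raw bytes, or b"" / the encoded error, mirroring the quiz server."""
    data = b""
    try:
        if fileName != "" and os.path.exists(fileName):
            fobj = open(fileName, "rb")
            data = fobj.read()
            fobj.close()
    except Exception as err:
        data = str(err).encode()
    return data

# Round-trip check against a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"jpeg-bytes-here")
    path = f.name
result = read_file_bytes(path)
os.remove(path)
print(result)
```

Because the route returns raw bytes, Flask sends them to the client unchanged, which is what makes the file download work.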
Database

1. Question: Given the class:

    class StudentDB:
        def openDB(self):
            self.con = sqlite3.connect("students.db")
            self.cursor = self.con.cursor()

        def closeDB(self):
            self.con.commit()
            self.con.close()

        def initTable(self):
            res = {}
            try:
                self.cursor.execute("create table students (No varchar(16) primary key, Name varchar(16), Sex varchar(8), Age int)")
                res["msg"] = "OK"
            except Exception as err:
                res["msg"] = str(err)
            return res

        def insertRow(self, No, Name, Sex, Age):
            res = {}
            try:
                ___________________________________________
                res["msg"] = "OK"
            except Exception as err:
                res["msg"] = str(err)
            return res

The program inserts one student record; the missing statement is:
Options:
A. self.cursor.execute("insert into students (No,Name,Sex,Age) values (%s,%s,%s,%s)", (No, Name, Sex, Age))
B. self.cursor.execute("insert into students (No,Name,Sex,Age) values (%s,%s,%s,%d)", (No, Name, Sex, Age))
C. self.cursor.execute("insert into students (No,Name,Sex,Age) values (@No,@Name,@Sex,@Age)", (No, Name, Sex, Age))
D. self.cursor.execute("insert into students (No,Name,Sex,Age) values (?,?,?,?)", (No, Name, Sex, Age))
Correct answer: D

Quiz

1. Question:

    import re
    s = "testing search"
    reg = r"[A-Za-z]+\b"
    m = re.search(reg, s)
    while m != None:
        start = m.start()
        end = m.end()
        print(s[start:end], end=" ")
        s = s[end:]
        m = re.search(reg, s)

The result is:
Options: A. testing  B. testing search  C. search  D. search testing
Correct answer: B

Quiz 1

1. Question:

    import flask
    app = flask.Flask("web")

    @app.route("/", ___________)
    def index():
        # ......
        return "hello"

    app.run()

The program must be able to receive POST data; the missing statement is:
Options: A. methods=["GET"]  B. methods=["POST"]  C. method=["GET"]  D. method=["POST"]
Correct answer: B

2. Question:

    import re
    s = "abbcabab"
    ______________
    print(re.search(reg, s))

To find "abab", the missing statement is:
Options: A. reg = r"(ab)+"  B. reg = r"(ab)+$"  C. reg = r"ab+$"  D. reg = r"ab+"
Correct answer: B

3. Question:

    import urllib.request
    resp = urllib.request.urlopen("http://127.0.0.1:5000")
    ______________
    print(html)

To get the site's HTML text, the missing statement is:
Options: A. html = resp.read.decode()  B. html = resp.read().decode()  C. html = resp.read.encode()  D. html = resp.read().encode()
Correct answer: B

4. Question:

    import re
    s = "searching search"
    _______________
    print(re.search(reg, s))

To find the first "search..." string in s, the missing statement is:
Options: A. reg = r"[a-zA-Z]+"  B. reg = r"[a-zA-Z]+$"  C. reg = r"^[a-zA-Z]+$"  D. reg = r"$[a-zA-Z]+^"
Correct answer: A

5. Question:

    import re
    s = "searching search"
    _______________
    print(re.search(reg, s))

To find the last "search" word in s, the missing statement is:
Options: A. reg = r"[A-Za-z]+$"  B. reg = r"^[A-Za-z]+$"  C. reg = r"^[A-Za-z]+"  D. reg = r"[A-Za-z]+"
Correct answer: A
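The insertRow question above turns on placeholder style: Python's sqlite3 module uses qmark ("?") placeholders, not "%s" or "@name". This can be verified against an in-memory database (the sample row is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cursor = con.cursor()
cursor.execute("create table students (No varchar(16) primary key,"
               " Name varchar(16), Sex varchar(8), Age int)")
# sqlite3 uses '?' qmark placeholders; the values are passed as a tuple.
cursor.execute("insert into students (No,Name,Sex,Age) values (?,?,?,?)",
               ("001", "Alice", "F", 20))
con.commit()
cursor.execute("select No, Name, Sex, Age from students")
row = cursor.fetchone()
con.close()
print(row)
```

Using "%s" here raises sqlite3.OperationalError; parameter binding also avoids SQL injection, which string formatting would not.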
6. Question:

    import re
    reg = r"x[^ab0-9]y"
    m = re.search(reg, "xayx2yxcy")
    print(m)

The result matches xcy: <_sre.SRE_Match object; span=(6, 9), match='xcy'>
Options: A. True  B. False
Correct answer: A (True)

7. Question:

    import re
    reg = r"x[0-9]y"
    m = re.search(reg, "xyx2y")
    print(m)

The result matches x2y: <_sre.SRE_Match object; span=(0, 2), match='xy'>
Options: A. True  B. False
Correct answer: B (False)

8. Question:

    import re
    reg = r"car\b"
    m = re.search(reg, "The car is black")
    print(m)

The result matches car, because car is followed by a space: <_sre.SRE_Match object; span=(4, 7), match='car'>
Options: A. True  B. False
Correct answer: A (True)

9. Question:

    import re
    reg = r"a\nb?"
    m = re.search(reg, "ca\nbcabc")
    print(m)

The result matches a\nb: <_sre.SRE_Match object; span=(1, 4), match='a\nb'>
Options: A. True  B. False
Correct answer: B (False)

10. Question:

    import re
    s = "xaabababy"
    m = re.search(r"ab|ba", s)
    print(m)

The result can match either ab or ba: <_sre.SRE_Match object; span=(2, 4), match='ba'>
Options: A. True  B. False
Correct answer: B (False)

BeautifulSoup

1. Question:

    from bs4 import BeautifulSoup
    doc = '''<html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="/elsie" class="sister" id="link1">Elsie</a>,
    <a href="/lacie" class="sister" id="link2">Lacie</a> and
    <a href="/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    </body></html>'''
    soup = BeautifulSoup(doc, "lxml")
    _______________________________
    print(tag)

The program finds the p element with class=story; the missing statement is:
Options:
A. tag = soup.find("p", attrs={"class": "story"})
B. tag = soup.find("p", attr={"class": "story"})
C. tag = soup.find("p")
D. tag = soup.find("p", class="story")
Correct answer: A

BeautifulSoup

1. Question: (same doc as above)

    soup = BeautifulSoup(doc, "lxml")
    ___________________________________
    for tag in tags:
        print(tag)

To find the elements with class=sister, the missing statement is:
Options:
A. tags = soup.find(name=None, attrs={"class": "sister"})
B. tags = soup.find(attrs={"class": "sister"})
C. tags = soup.find_all(attrs={"class": "sister"})
D. tags = soup.find_all(name=None, attrs={"class": "sister"})
Correct answer: D
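The two BeautifulSoup questions above both turn on find versus find_all: find returns only the first matching element, while find_all returns every match as a list, and attrs={...} filters by attribute with the tag name left out. A small sketch (assumes the beautifulsoup4 package is installed; the built-in html.parser is used so lxml is not required, and the HTML is a cut-down version of the question's document):

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

doc = '''<p class="story"><a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a></p>'''
soup = BeautifulSoup(doc, "html.parser")

first = soup.find(attrs={"class": "sister"})         # first match only
all_tags = soup.find_all(attrs={"class": "sister"})  # every match, as a list
print(first["id"], len(all_tags))
```

Because find returns a single Tag (or None), iterating over it as the wrong options do would walk its children rather than the matches.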
Find

1. Question: (same doc as above)

    soup = BeautifulSoup(doc, "lxml")
    ______________________________________
    for tag in tags:
        print(tag["href"])

The program prints:

    /elsie
    /lacie
    /tillie

The missing statement is:
Options:
A. tags = soup.select("p a")
B. tags = soup.select("p[] a")
C. tags = soup.select("p[class] a")
D. tags = soup.select("p[class='story'] a")
Correct answer: D

Quiz 2

1. Question: Find the text contained in each p element of the document. (same doc as above)

    soup = BeautifulSoup(doc, "lxml")
    __________________________________
    for tag in tags:
        ________________________

The missing statements are:
Options:
A. tags = soup.find("p"); print(tag.text)
B. tags = soup.find("p"); print(tag["text"])
C. tags = soup.find_all("p"); print(tag.text)
D. tags = soup.find_all("p"); print(tag["text"])
Correct answer: C

2. Question: Print the names of all parent nodes of the b element in <p class="title"><b>The Dormouse's story</b></p>. (same doc as above)

    soup = BeautifulSoup(doc, "lxml")
    ________________________________
    while tag:
        print(tag.name)
        ____________________________

The missing statements are:
Options:
A. tag = soup.find("b"); tag = tag.parent
B. tag = soup.find("b"); tag = tag["parent"]
C. tag = soup.find_all("b"); tag = tag.parent
D. tag = soup.find_all("b"); tag = tag["parent"]
Correct answer: A
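The select questions test CSS selector syntax: p[class='story'] a walks to a elements inside p nodes whose class is exactly story. A cut-down check (again assuming beautifulsoup4 is installed; soup.select relies on the soupsieve package that ships with modern bs4):

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

doc = '''<p class="title"><b>heading</b></p>
<p class="story"><a href="/elsie">Elsie</a><a href="/lacie">Lacie</a></p>'''
soup = BeautifulSoup(doc, "html.parser")

# Attribute-value selector: only <a> under <p class="story"> match,
# so the <b> under <p class="title"> is skipped.
hrefs = [tag["href"] for tag in soup.select("p[class='story'] a")]
print(hrefs)
```

The bare p a selector would also match links under p class="title", which is why the attribute-qualified option is the one that reproduces the /elsie, /lacie, /tillie output.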
3. Question: Get all direct child element nodes of the p element:

    from bs4 import BeautifulSoup
    doc = '''<html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The <i>Dormouse's</i> story</b> Once upon a time ...</p>
    </body></html>'''
    soup = BeautifulSoup(doc, "lxml")
    ________________
    for x in __________________:
        print(x)

The missing parts are:
Options:
A. tag = soup.find("p"); tag.children
B. tag = soup.find("p"); tag.child
C. tag = soup.find_all("p"); tag.children
D. tag = soup.find_all("p"); tag.child
Correct answer: A

4. Question: Get all descendant element nodes of the p element. (same doc as in question 3)

    ______________________________
    for x in ________________________:
        print(x)

The missing parts are:
Options:
A. tag = soup.find("p"); tag.children
B. tag = soup.find("p"); tag.descendants
C. tag = soup.find_all("p"); tag.children
D. tag = soup.find_all("p"); tag.descendants
Correct answer: B

5. Question: soup.select("a") finds all a element nodes in the document.
Options: A. True  B. False
Correct answer: A (True)

6. Question: soup.select("p a") finds all a element nodes under all p nodes.
Options: A. True  B. False
Correct answer: A (True)

7. Question: soup.select("p[class='story'] a") finds all a element nodes under all p nodes whose class attribute is story.
Options: A. True  B. False
Correct answer: A (True)

8. Question: soup.select("p[class] a") finds all a element nodes under all p nodes that have a class attribute.
Options: A. True  B. False
Correct answer: A (True)

9. Question: soup.select("a[id='link1']") finds the a node whose id attribute is link1.
Options: A. True  B. False
Correct answer: A (True)

10. Question: soup.select("body head title") finds the title node under head under body.
Options: A. True  B. False
Correct answer: A (True)

Recursion

1. Question: A small site consists of these pages:

    (1) books.htm:
        <h3>计算机</h3>
        <ul>
        <li><a href="database.htm">数据库</a></li>
        <li><a href="program.htm">程序设计</a></li>
        <li><a href="network.htm">计算机网络</a></li>
        </ul>
    (2) database.htm:
        <h3>数据库</h3>
        <ul><li><a href="mysql.htm">MySQL数据库</a></li></ul>
    (3) program.htm:
        <h3>程序设计</h3>
        <ul>
        <li><a href="python.htm">Python程序设计</a></li>
        <li><a href="java.htm">Java程序设计</a></li>
        </ul>
    (4) network.htm: <h3>计算机网络</h3>
    (5) mysql.htm: <h3>MySQL数据库</h3>
    (6) python.htm: <h3>Python程序设计</h3>
    (7) java.htm: <h3>Java程序设计</h3>

    from bs4 import BeautifulSoup
    import urllib.request

    def spider(url):
        try:
            data = urllib.request.urlopen(url)
            data = data.read()
            data = data.decode()
            soup = BeautifulSoup(data, "lxml")
            print(soup.find("h3").text)
            ____________________________
            for link in links:
                href = link["href"]
                ___________________________________
                spider(url)
        except Exception as err:
            print(err)

    start_url = "http://127.0.0.1:5000"
    spider(start_url)
    print("The End")

The missing statements for the recursive crawl are:
Options:
A. links = soup.select("a"); url = start_url + href
B. links = soup.select("li"); url = start_url + "/" + href
C. links = soup.select("a"); url = start_url + "/" + href
D. links = soup.select("li"); url = start_url + href
Correct answer: C
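The recursive spider above depends on a running web server; its control flow can be rehearsed offline by replacing urlopen and BeautifulSoup with an in-memory table of the question's seven pages (a simplification for illustration: the h3 headings stand in for the printed text):

```python
# In-memory stand-ins for the question's pages: name -> (h3 heading, linked pages).
site = {
    "books.htm":    ("计算机", ["database.htm", "program.htm", "network.htm"]),
    "database.htm": ("数据库", ["mysql.htm"]),
    "program.htm":  ("程序设计", ["python.htm", "java.htm"]),
    "network.htm":  ("计算机网络", []),
    "mysql.htm":    ("MySQL数据库", []),
    "python.htm":   ("Python程序设计", []),
    "java.htm":     ("Java程序设计", []),
}

visited = []

def spider(page):
    heading, links = site[page]
    visited.append(heading)   # stands in for print(soup.find("h3").text)
    for href in links:
        spider(href)          # recursive call, one level deeper

spider("books.htm")
print(visited)
```

The output order is depth-first: each branch is followed to its leaves before the next sibling link is taken, which is exactly what the recursive answer C produces against the real site.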
Test

1. Question: A depth-first spider using a stack:

    from bs4 import BeautifulSoup
    import urllib.request

    class Stack:
        def __init__(self):
            self.st = []
        def pop(self):
            return self.st.pop()
        def push(self, obj):
            self.st.append(obj)
        def empty(self):
            return len(self.st) == 0

    def spider(url):
        stack = Stack()
        stack.push(url)
        while not stack.empty():
            url = stack.pop()
            try:
                data = urllib.request.urlopen(url)
                data = data.read()
                data = data.decode()
                soup = BeautifulSoup(data, "lxml")
                print(soup.find("h3").text)
                links = soup.select("a")
                for i in _______________________________:
                    href = links[i]["href"]
                    url = start_url + "/" + href
                    stack.push(url)
            except Exception as err:
                print(err)

    start_url = "http://127.0.0.1:5000"
    spider(start_url)
    print("The End")

Options:
A. range(len(links)-1, -1, -1)
B. range(len(links), -1, -1)
C. range(len(links)-1, 0, -1)
D. range(len(links), 0, -1)
Correct answer: A

Test

1. Question: Start a child thread in the main thread to run the reading function:

    import threading
    import time
    import random

    def reading():
        for i in range(10):
            print("reading", i)
            time.sleep(random.randint(1, 2))

    _______________________________
    r.setDaemon(False)
    r.start()
    print("The End")

Options:
A. r = threading.Thread(reading)
B. r = threading.Thread(target=reading())
C. r = threading.Thread(target=reading)
D. r = Thread(target=reading)
Correct answer: C
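The correct option passes the function object itself (target=reading, no parentheses): threading.Thread stores the callable and invokes it in the new thread, whereas target=reading() would call reading in the main thread first and pass its return value. A minimal check, with the sleeps and prints replaced by list appends so it finishes instantly:

```python
import threading

results = []

def reading():
    for i in range(3):
        results.append(i)   # stands in for print("reading", i)

r = threading.Thread(target=reading)  # pass the function object, not reading()
r.daemon = False                      # modern spelling of r.setDaemon(False)
r.start()
r.join()                              # wait for the child thread to finish
print(results)
```

Without the join, the main thread could print before the child thread has appended anything; in the quiz program the non-daemon flag is what keeps the process alive until reading completes.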
Test

1. Question: Avoid visiting the same page twice, using a queue:

    from bs4 import BeautifulSoup
    import urllib.request

    class Queue:
        def __init__(self):
            self.st = []
        def fetch(self):
            return self.st.pop(0)
        def enter(self, obj):
            self.st.append(obj)
        def empty(self):
            return len(self.st) == 0

    def spider(url):
        global urls
        queue = Queue()
        queue.enter(url)
        while ________________________:
            url = queue.fetch()
            if url not in urls:
                try:
                    urls.append(url)
                    data = urllib.request.urlopen(url)
                    data = data.read()
                    data = data.decode()
                    soup = BeautifulSoup(data, "lxml")
                    print(soup.find("h3").text)
                    links = soup.select("a")
                    for link in links:
                        ________________
                        url = start_url + "/" + href
                        queue.enter(url)
                except Exception as err:
                    print(err)

    start_url = "http://127.0.0.1:5000"
    urls = []
    spider(start_url)
    print("The End")

Options:
A. queue.empty(); href = link["href"]
B. not queue.empty(); href = link["href"]
C. queue.empty(); href = link.href
D. not queue.empty(); href = link.href
Correct answer: B

Quiz 3

1. Question:

    def spider(url):
        # get the new address newUrl
        if newUrl:
            spider(newUrl)

Which statement is correct:
Options:
A. This is not a recursive call
B. It is bound to loop forever
C. The recursion ends when no newUrl can be found
D. The recursion does not end even when no newUrl can be found
Correct answer: C

2. Question: About depth-first crawling, which is correct:
Options:
A. The result is the same as crawling by recursive calls
B. The result differs from crawling by recursive calls
C. It is less efficient than crawling by recursive function calls
D. It is more efficient than crawling by recursive function calls
Correct answer: A

3. Question: About breadth-first crawling, which is correct:
Options:
A. The crawling order differs from depth-first
B. The crawling order is the same as depth-first
C. The crawling order is the same as the recursive method
D. None of the above
Correct answer: A

4. Question: There is a download(url) function that downloads the image at url:

    import threading
    def download(url):
        pass

To call it with a thread:
Options:
A. T = threading.Thread(target=download, args=[url]); T.start()
B. T = threading.Thread(target=download, args=url); T.start()
C. T = threading.Thread(target=download, args=(url)); T.start()
D. None of the above
Correct answer: A

5. Question: When crawling many images from a website, which is correct:
Options:
A. A single thread is more efficient and the program is simpler
B. A single thread is more efficient and the program is more complex
C. Multiple threads are more efficient and the program is simpler
D. Multiple threads are more efficient and the program is more complex
Correct answer: D

6. Question:

    url = "/weather/101280601.shtml"
    headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.0 x64; en-US; rv:1.9pre) Gecko/2008072421 Minefield/3.0.2pre"}
    req = urllib.request.Request(url, headers=headers)
    data = urllib.request.urlopen(req)
    data = data.read()

Here headers is used to simulate a browser.
Options: A. True  B. False
Correct answer: A (True)

7. Question: soup.select("body [class] a") finds the a nodes under all nodes with a class attribute under body.
Options: A. True  B. False
Correct answer: A (True)

8. Question: soup.select("body [class]") finds all nodes with a class attribute under body.
Options: A. True  B. False
Correct answer: A (True)

9. Question: soup.select("body head title") finds the title node under head under body.
Options: A. True  B. False
Correct answer: A (True)

10. Question: soup.select("a[id='link1']") finds the a node whose id attribute is link1.
Options: A. True  B. False
Correct answer: A (True)

Test

1. Question:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "mySpider"

        def start_requests(self):
            url = "http://127.0.0.1:5000"
            _________________________________________

        def parse(self, response):
            print(response.url)
            data = response.body.decode()
            print(data)

Options:
A. yield scrapy.Request(url=url, callback=self.parse)
B. yield scrapy.Request(url=url, callback=parse)
C. return scrapy.Request(url=url, callback=self.parse)
D. return scrapy.Request(url=url, callback=parse)
Correct answer: A
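start_requests works because Scrapy iterates over whatever the method returns, and yield turns the method into a generator that can produce any number of Request objects lazily. The mechanics can be shown without Scrapy installed; Request below is a stand-in class, not scrapy.Request:

```python
class Request:
    # Stand-in for scrapy.Request: just records its arguments.
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

def parse(response):
    pass

def start_requests():
    for url in ["http://127.0.0.1:5000/a", "http://127.0.0.1:5000/b"]:
        # yield emits one request at a time instead of ending the function
        yield Request(url=url, callback=parse)

urls = [req.url for req in start_requests()]
print(urls)
```

A plain return could hand back at most one request and would end the method immediately, which is why the yield option is the idiomatic one.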
Quiz

1. Question:

    from scrapy.selector import Selector

    htmlText = '''<html><body>
    <bookstore>
    <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
    </book>
    <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
    </book>
    </bookstore>
    </body></html>'''
    selector = Selector(text=htmlText)
    print(type(selector)); print(selector)
    _______________
    print(type(s))
    print(s)

To find all title elements, the missing statement is:
Options:
A. s = selector.xpath("title")
B. s = selector.xpath("//title")
C. s = selector.xpath("/title")
D. s = selector.xpath("///title")
Correct answer: B

Quiz

1. Question:

    from scrapy.selector import Selector

    htmlText = '''<html><body>
    <bookstore>
    <title>books</title>
    <book>
    <title>Novel</title>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
    </book>
    <book>
    <title>TextBook</title>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
    </book>
    </bookstore>
    </body></html>'''
    selector = Selector(text=htmlText)
    _____________________________________
    for e in s:
        print(e)

The program prints:

    <Selector xpath='//book/title' data='<title>Novel</title>'>
    <Selector xpath='//book/title' data='<title lang="eng">Harry Potter</title>'>
    <Selector xpath='//book/title' data='<title>TextBook</title>'>
    <Selector xpath='//book/title' data='<title lang="eng">Learning XML</title>'>

Options:
A. s = selector.xpath("/book").xpath("./title")
B. s = selector.xpath("//book").xpath("./title")
C. s = selector.xpath("//book").xpath("/title")
D. s = selector.xpath("/book").xpath("/title")
Correct answer: B

Quiz

1. Question:

    from scrapy.selector import Selector

    htmlText = '''<html><body>
    <bookstore>
    <book id="b1">
    <title lang="english">Harry Potter</title>
    <price>29.99</price>
    </book>
    <book id="b2">
    <title lang="chinese">学习XML</title>
    <price>39.95</price>
    </book>
    </bookstore>
    </body></html>'''
    selector = Selector(text=htmlText)
    ____________________________________________________
    print(s.extract_first())
    s = selector.xpath("//book[@id='b1']/title")
    print(s.extract_first())

The program prints:

    学习XML
    <title lang="english">Harry Potter</title>

Options:
A. s = selector.xpath("//book/title[@lang='chinese']/text")
B. s = selector.xpath("//book/title[lang='chinese']/text")
C. s = selector.xpath("//book/title[@lang='chinese']/text()")
D. s = selector.xpath("//book/title[@lang='chinese']/text")
Correct answer: C
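The XPath answers above (attribute predicates and text extraction) can be tried without Scrapy using the standard library's xml.etree.ElementTree, which implements a subset of XPath; it has no text() step, so .text is read off the matched element instead:

```python
import xml.etree.ElementTree as ET

xml_text = '''<bookstore>
<book id="b1"><title lang="english">Harry Potter</title><price>29.99</price></book>
<book id="b2"><title lang="chinese">学习XML</title><price>39.95</price></book>
</bookstore>'''
root = ET.fromstring(xml_text)

# Attribute predicate on the element itself, as in //book/title[@lang='chinese']
chinese = root.find(".//book/title[@lang='chinese']").text
# Predicate on the parent, as in //book[@id='b1']/title
first_title = root.find(".//book[@id='b1']/title").text
print(chinese, first_title)
```

The @ sign marks an attribute test, which is why the option written [lang='chinese'] (no @) fails: without @ the predicate would test for a child element named lang.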
Test

1. Question: (same htmlText as above)

    selector = Selector(text=htmlText)
    ____________________________________
    print(s.extract_first())
    ____________________________________
    print(s.extract_first())

The program prints:

    <title lang="english">Harry Potter</title>
    <title lang="chinese">学习XML</title>

Options:
A. s = selector.xpath("//book[position=1]/title"); s = selector.xpath("//book[position=2]/title")
B. s = selector.xpath("//book[position()=2]/title"); s = selector.xpath("//book[position()=1]/title")
C. s = selector.xpath("//book[position=2]/title"); s = selector.xpath("//book[position=1]/title")
D. s = selector.xpath("//book[position()=1]/title"); s = selector.xpath("//book[position()=2]/title")
Correct answer: D

Quiz 4

1. Question: selector.xpath("//bookstore/book") searches for book elements one level below bookstore and finds 2.
Options: A. True  B. False
Correct answer: A (True)

2. Question: selector.xpath("//body/book") searches for book elements one level below body; the result is empty.
Options: A. True  B. False
Correct answer: A (True)

3. Question: selector.xpath("//body//book") searches for book elements under body and finds 2.
Options: A. True  B. False
Correct answer: A (True)

4. Question: selector.xpath("/body//book") searches for book elements under a top-level body; the result is empty, because the document's top-level element is html, not body.
Options: A. True  B. False
Correct answer: A (True)

5. Question: selector.xpath("/html/body//book") or selector.xpath("/html//book") searches for book elements and finds 2.
Options: A. True  B. False
Correct answer: A (True)

6. Question: selector.xpath("//book/title") searches for all title elements one level below a book and finds 2; the result is the same as selector.xpath("//title") and selector.xpath("//bookstore//title").
Options: A. True  B. False
Correct answer: A (True)

7. Question: selector.xpath("//book//price") gives the same result as selector.xpath("//price"): both find the 2 price elements.
Options: A. True  B. False
Correct answer: A (True)

8. Question: selector.xpath("/book//price") gives the same result as selector.xpath("//price").
Options: A. True  B. False
Correct answer: B (False)

9. Question: selector.xpath("//book[id='book1']//price") gives the same result as selector.xpath("//price").
Options: A. True  B. False
Correct answer: B (False)

10. Question: selector.xpath("//price/text()").extract() returns the text strings under price.
Options: A. True  B. False
Correct answer: B (False)
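Quiz 4's "/" versus "//" distinction can likewise be checked with ElementTree: a path starting at the root only sees direct children, while ".//" descends to any depth. ElementTree writes positional predicates as a bare index ([1]) rather than position()=1:

```python
import xml.etree.ElementTree as ET

html_text = '''<html><body><bookstore>
<book><title>Harry Potter</title><price>29.99</price></book>
<book><title>Learning XML</title><price>39.95</price></book>
</bookstore></body></html>'''
root = ET.fromstring(html_text)   # root is the html element

books_anywhere = root.findall(".//book")   # like //book: matches at any depth
books_top = root.findall("./book")         # direct children of html only: none
first_title = root.find("body/bookstore/book[1]/title").text  # positional predicate
print(len(books_anywhere), len(books_top), first_title)
```

This mirrors the quiz: the root-anchored path finds nothing because html's only child here is body, while the descendant axis reaches both book elements.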
Course exam

1. Question:

    import flask
    ____________

    @app.route("/")
    def index():
        return "hello"

    app.run()

Options:
A. app = flask.Flask("web")
B. app = flask("web")
C. app = Flask("web")
D. app = flask.Flask()
Correct answer: A

2. Question:

    import re
    s = "abbcabab"
    ______________
    print(re.

(The free preview of the source document ends here; the remaining pages are not included.)
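The final (truncated) question repeats quiz 1's pattern question: to extract "abab" from the end of s = "abbcabab", the expression needs the $ anchor, because an unanchored r"(ab)+" stops at the first "ab". A quick check:

```python
import re

s = "abbcabab"
# Without the anchor, the scan succeeds at the first "ab" and stops there.
m1 = re.search(r"(ab)+", s)
# Anchored at the end of the string, (ab)+ must cover the trailing "abab".
m2 = re.search(r"(ab)+$", s)
print(m1.group(), m2.group())
```

The same reasoning explains the earlier word-boundary questions: anchors and \b constrain where a match may start or end rather than what characters it contains.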
