Supercharge your coding workflow with ChatGPT and GitHub Copilot

#NICAR23

Jonathan Soma, js4571@, @dangerscarf
/supercharge-coding-ai-tools
Don't hold your questions until the end, just raise your hand.

There's always something new

We would like to scrape a website, but we just learned a little tiny bit of Python at NICAR

Help knowing where to
start

There are many tools that you can use to scrape websites, including:

Beautiful Soup: A Python library for pulling data out of HTML and XML files.

Scrapy: An open-source and collaborative web crawling framework for Python.

ParseHub: A web scraping tool that allows you to extract data from dynamic websites (JavaScript, AJAX, etc.)

Octoparse: A web scraping tool that can be used to extract data from websites without coding.

WebHarvy: A visual web scraper that can be used to scrape data from websites without coding.

Cheerio: A fast, flexible, and lean implementation of core jQuery designed specifically for the server.

Selenium: A browser automation tool that can be used to scrape data from websites that use JavaScript to load content.

Puppeteer: A Node.js library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.

These are just a few examples of the many web scraping tools available. The choice of tool will depend on your specific needs and the type of website you are trying to scrape.

What are some tools I could use to scrape web
sites?

Here are some popular web scraping tools that you can use with Python:

Beautiful Soup: A Python library for pulling data out of HTML and XML files. It allows you to parse the HTML and XML documents and find the data you need using selectors like CSS and XPath.

Scrapy: An open-source and collaborative web crawling framework for Python. It provides an integrated way to follow links and extract data from websites.

Selenium: A browser automation tool that can be used to scrape data from websites that use JavaScript to load content. Selenium can be used with Python using the Selenium Python bindings.

PyQuery: A Python library similar to jQuery that allows you to make jQuery queries on XML documents. It is a lightweight library that allows you to manipulate HTML and XML documents in a natural way.

Requests-HTML: A Python library for parsing HTML and XML with an HTTP client. It is built on top of requests and allows you to interact with web pages in a simple and intuitive way.

lxml: A Python library for processing XML and HTML. It is built on top of libxml2 and libxslt, and provides a simple and efficient way to extract data from web pages.

Mechanize: A Python library for automating interaction with websites. It allows you to fill in forms and simulate a browser's behavior.

These tools are widely used in Python web scraping; they all have their own advantages and disadvantages. The choice of tool will depend on your specific needs and the type of website you are trying to scrape.

What are some tools I could use to scrape web sites using
Python?

Beautiful Soup is often considered one of the easiest tools to use for web scraping with Python. It is a Python library for pulling data out of HTML and XML files, and it allows you to parse the HTML and XML documents and find the data you need using selectors like CSS and XPath. It is designed to make it easy to navigate, search, and modify the parse tree.

It has a simple and intuitive API, and its documentation is well-organized and easy to follow, making it a great choice for beginners. It also has a large community of users, so there are plenty of resources available to help you with any questions or issues you may have.

Another tool that is easy to use for scraping web pages is requests-html. It is built on top of requests and allows you to interact with web pages in a simple and intuitive way; it has similar methods as BeautifulSoup for parsing pages, but is also compatible with JS-rendered pages.

It's important to note that when scraping web pages, it's important to check the website's terms of use and to be respectful of the website's resources and servers. Some websites may block scrapers or have CAPTCHAs to prevent scraping.

What is the easiest tool I could use to scrape web sites using
Python?

Caveat? Benefit?: Prompt engineering

Caveat: Staying up to
date

There are several popular CSS frameworks that are widely used in web development:

Bootstrap: Bootstrap is one of the most popular CSS frameworks. It is an open-source framework that provides a responsive grid system and pre-designed UI components such as buttons, forms, and navigation.

Foundation: Foundation is another popular CSS framework that provides a responsive grid system and pre-designed UI components. It also includes additional features such as flexbox, accessibility, and an advanced HTML/CSS/JS framework.

Bulma: Bulma is a lightweight CSS framework that focuses on providing a consistent and modern design. It includes a responsive grid system, pre-designed UI components, and support for Sass.

Tailwind CSS: Tailwind CSS is a utility-first CSS framework that provides low-level utility classes for controlling layout, spacing, and more. It is designed to be customizable and composable, and does not include pre-designed UI components.

Materialize: Materialize is a CSS framework that is based on the Material Design guidelines created by Google. It provides a responsive grid system and pre-designed UI components that are designed to look and feel like Material Design.

These are just a few examples of widely used CSS frameworks, and there are many other good options available as well. The best choice for you will depend on your specific needs and preferences.

What are some minimalist CSS frameworks to use for designing a web
page?

My training data was current as of 2021, so my knowledge is current up until that point. I am not aware of any updates or changes that have occurred since then.

When is your information current
to?

/Kantrowitz/status/1613168223054188545

Help
with examples and sample code

Source: https://diataxis.fr/

Here is a simple example of how you can use BeautifulSoup to scrape headlines from a news
website:
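A minimal sketch of the example described, assuming beautifulsoup4 is installed. The inline HTML sample and its headlines are made up so the snippet runs without the live requests.get() call the slide uses:

```python
from bs4 import BeautifulSoup

# An inline HTML sample stands in for the live page; in the deck this
# would come from requests.get(url).text instead.
html = """
<html><body>
<h2>First headline</h2>
<h2>Second headline</h2>
</body></html>
"""

# Parse the HTML content
soup = BeautifulSoup(html, "html.parser")

# Find all the h2 tags, which hold the headlines on this made-up page
headlines = soup.find_all("h2")

# Print out the text of each headline
for headline in headlines:
    print(headline.text)
```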
In the above code snippet, we first send a request to the news website using the requests library. Then we parse the HTML content using BeautifulSoup's html.parser.

After that, we are using the find_all() method to find all the h2 tags in the HTML, which represent the headlines in this case. Then we are using a for loop to iterate over the headlines and printing out the text of each headline using the text attribute.

It is important to note that the exact selectors and tags used to find the headlines will depend on the specific website and structure of the HTML.

Please keep in mind that some websites may block scraping or have CAPTCHAs to prevent scraping. It's important to be respectful of a website's resources and servers and check the website's terms of use before
scraping.

Please write a very simple example of scraping with BeautifulSoup where you print out all of the headlines on a news
website

[Screenshot: bbc.com homepage, with browser DevTools open inspecting a headline element: an a tag inside an h3, using the "media title"/"media link" classes referenced in the following slides]

[Screenshot: Jupyter notebook "Scraping headlines from the BBC homepage", running:]

import requests
from bs4 import BeautifulSoup

# Make an HTTP request to the website
url = 'https:///'
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all headlines
headlines = soup.find_all('h3')

# Print out the text of each headline
for headline in headlines:
    print(headline.text)

UK will provide Challenger 2 tanks to Ukraine - Sunak
Iran executes British-Iranian Akbari on spy charges
Top secret documents reportedly found in Biden cache

Caveat:
Nothing's perfect!

Caveat: Nothing's perfect
does things exactly how I, Jonathan Soma, the person with the most opinions, wants

"get all the links"

"normal" BeautifulSoup:

links = []
for title in doc.find_all("h3", class_='title'):
    link = title.find('a')
    links.append(link)

CSS selectors:

links = doc.select("h3.title a")
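The two approaches can be checked against a small made-up page (hypothetical class names, beautifulsoup4 assumed installed) to confirm they collect the same links:

```python
from bs4 import BeautifulSoup

# Hypothetical page: two h3s with class "title", one with another class
html = """
<h3 class="title"><a href="/a">A</a></h3>
<h3 class="title"><a href="/b">B</a></h3>
<h3 class="other"><a href="/c">C</a></h3>
"""
doc = BeautifulSoup(html, "html.parser")

# "normal" BeautifulSoup: loop over the matching h3s, pull out each link
links_loop = []
for title in doc.find_all("h3", class_="title"):
    link = title.find("a")
    links_loop.append(link)

# CSS selectors: the same thing in one line
links_select = doc.select("h3.title a")

print(len(links_loop), len(links_select))
```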
Iteration and the value of
context

Yes, you can use CSS selectors with BeautifulSoup to find elements on a webpage instead of the find_all() method. BeautifulSoup provides a select() method that allows you to use CSS selectors to find elements.

Here is an example of how you can use CSS selectors with BeautifulSoup to scrape headlines from a
news website:
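A minimal sketch of the select() version, again with a made-up inline page standing in for the live site (beautifulsoup4 assumed installed):

```python
from bs4 import BeautifulSoup

# Made-up inline page standing in for the fetched site
html = "<html><body><h2>Rain expected</h2><h2>Markets rally</h2></body></html>"
soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector string; "h2" matches every h2 tag
headlines = soup.select("h2")

for headline in headlines:
    print(headline.text)
```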
In the above code snippet, we are using the select() method to find all the h2 tags in the HTML, which represent the headlines in this case, using the CSS selector h2. Then we are using a for loop to iterate over the headlines and printing out the text of each headline using the text attribute.

It's worth noting that you can use more complex selectors to select elements; you can use the select() method with selectors like .class to select elements by class and #id to select elements by id.

It's important to note that the selectors and CSS classes used to find the headlines will depend on the specific website and structure of the HTML.

Please keep in mind that some websites may block scraping or have CAPTCHAs to prevent scraping. It's important to be respectful of a website's resources and servers and check the website's terms of use before scraping.

Can we do that using CSS selectors instead
of .find_all?

Yes, you can remove whitespace from before and after the headline text using the strip() method in Python.

Here is an example of how you can use the strip() method to remove whitespace from before and after the
headlines:
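A minimal sketch of the strip() step, with an invented whitespace-padded headline (beautifulsoup4 assumed installed):

```python
from bs4 import BeautifulSoup

# Invented headline with leading and trailing whitespace
html = "<html><body><h2>\n   Storm warning issued   \n</h2></body></html>"
soup = BeautifulSoup(html, "html.parser")

# strip() removes leading and trailing spaces, tabs, and newlines
cleaned = [headline.text.strip() for headline in soup.find_all("h2")]

for headline in cleaned:
    print(headline)
```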
In the above code snippet, we are using the strip() method to remove any whitespace from before and after the headline text. This method removes any leading or trailing whitespace characters (spaces, tabs, newlines, etc.) in the string.

You can also use the lstrip() or rstrip() method if you want to remove whitespace only from the left or right side of the string respectively.

It's important to note that some websites may have non-printable characters in their HTML; in such cases you can use the .replace() method or a regular expression to remove
those characters.

Can we remove whitespace from before/after the
headline?

The CSS selector for an a tag with the class of "media link" immediately inside of an h3 tag with the class of "media title" is h3.media title > a.media link.

This selector uses the child combinator > to match only a tags that are immediate children of h3 tags. The h3.media title selector targets an h3 tag with the class of "media title", and a.media link targets an a tag with the class of "media link".

You can also use the space combinator to select a tags that are descendants of h3 tags. For example, h3.media title a.media link will select all the a tags with class of "media link" that are inside of an h3 tag with class of "media title".

It's important to note that the above selectors assume that the website's HTML structure follows the exact format as described; in some cases the website's structure may
be different.
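A small sketch of the two combinators side by side; the markup and class names here are hypothetical, and beautifulsoup4 is assumed installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link directly inside the h3, one nested deeper
html = """
<h3 class="media-title">
  <a class="media-link" href="/direct">Direct child</a>
  <span><a class="media-link" href="/nested">Nested descendant</a></span>
</h3>
"""
soup = BeautifulSoup(html, "html.parser")

# ">" (child combinator): only a tags that are immediate children of the h3
children = soup.select("h3.media-title > a.media-link")

# " " (descendant combinator): a tags anywhere inside the h3
descendants = soup.select("h3.media-title a.media-link")

print(len(children), len(descendants))
```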
What is the CSS selector for an a tag with the class of "media link" immediately inside of an h3 tag with the class of "media
title"?

You can use the get() method to access the href attribute of an a tag that is selected by BeautifulSoup.

Here is an example of how you can print out the href attribute of an a
tag:
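A minimal sketch of reading href with get(), including the missing-attribute case the answer warns about; the markup and class names are made up, and beautifulsoup4 is assumed installed:

```python
from bs4 import BeautifulSoup

# Made-up page: one link with an href, one without
html = """
<h3 class="media-title"><a class="media-link" href="/world-asia-india">Headline one</a></h3>
<h3 class="media-title"><a class="media-link">Headline with no link</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for link in soup.select("a.media-link"):
    href = link.get("href")  # returns None if the attribute is missing
    if href is not None:
        hrefs.append(href)

print(hrefs)
```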
In the above code snippet, we are using the select() method to find all the a tags with class of "media link", and then using a for loop to iterate over the links, using the get() method to access the href attribute of each link.

It's worth noting that if the href attribute is not present in the HTML, the get() method will return None, so you should check the returned value of the get() method before
using it.

How do I print out the href for an element selected by
BeautifulSoup?

Caveat: Not everything is understandable...

Anti-caveat: AI accepts infinite
follow-ups

requests.get() is used to make an HTTP GET request to a specified URL. The get() method is part of the requests library, which is a popular library for sending HTTP requests in Python.

An HTTP GET request is the most common type of request and is used to retrieve data from a server. The request contains a URL, which specifies the location of the resource to be retrieved. The server then responds with the requested resource, if it exists, and a status code indicating whether the request was successful.

The get() method takes one argument, which is the URL of the resource to be retrieved. It returns a response object, which contains the server's response to the request. The response object has various properties and methods, such as text, content, status_code, etc., that can be used to access and analyze the response.

You can also use the post() method to make an HTTP POST request, put() to make an HTTP PUT request, and delete() to make an HTTP DELETE request, etc.

It's important to note that the requests library is not part of the Python standard library, so you need to install it before using it in your code using pip
install requests.

Why is it
requests.get?

The requests.get() function is used to retrieve information from a website. It is a function from a Python library called requests. When you use this function, you provide the website's address, and the function sends a request to that website asking for information. The website then sends back a response which includes the information you asked for. The get() method is just one of the many HTTP requests that you can use to interact with a website. There are also post(), put(), delete(), etc.

It's important to note that requests is not part of the Python standard library and you need to install it before using it in your code by running pip install requests in your command line.

Please keep in mind that some websites may block scraping or have CAPTCHAs to prevent scraping. It's important to be respectful of a website's resources and servers and check the website's terms of use before scraping.

Explain that in much less
technical language

I have a dataframe with two columns, country and social spending. I am using the following R code to plot a horizontal bar graph of each country's social spending:

library(ggplot2)
ggplot(df) +
  aes(x = `Social Spending`, y = Country) +
  geom_col()

how do i make it sort the bars by
social spending

You can sort the bars in descending order of social spending by adding the "desc" argument to the "arrange" function within the ggplot call. Here's the modified code:

The "coord_flip" function flips the x and y axes to create a horizontal bar chart. The "labs" function sets the x-axis label to "Social Spending". The "theme_bw" function sets the plot background to white, and the "theme" function centers the plot title. The "scale_x_continuous" function formats the x-axis labels as dollar amounts. The "scale_y_discrete" function reverses the order of the y-axis categories. Finally, the "geom_text" function adds dollar values as labels to the bars, and the "geom_col" function

Yes, you can add the reorder function inside the aes call to sort the bars by the Social Spending column. Here's the modified code with this approach:

is there a way to do this that doesn't add as
much code

In this code, the reorder function sorts the Country variable by the Social Spending variable, which is done inside the aes call. This creates a bar chart with bars sorted by social spending, without the need for additional geom_col or arrange calls. The labs function adds axis labels, and the theme_bw function sets a
white background.

Translating your skills

Here is an example of how you can use Playwright to scrape headlines and hrefs from a news website, instead
of using BeautifulSoup:

Translate the code below into using playwright instead of
using BeautifulSoup:

Caveat: Very much not perfect

Fixing
bugs

I apologize for the confusion. You should use import playwright instead of from playwright import async_playwright in the first line of the code.

Here is the
corrected code:

I'm getting the error "ImportError: cannot import name 'async_playwright'
from'playwright'"The
error
"RuntimeError:
This
event
loop
is
already
running"
israised
because
the
event
loop
is
already
running
and
you
can'tcreate
a
new
one.You
could
fix
this
error
by
removing
the
lineasyncio.get_event_loop().run_until_complete(main())
from
the
code,
because
it
is
trying
to
run
the
event
loopagain,
and
the
event
loop
is
already
running.Here
is
the
corrected
corrected code:

I'm getting the error "RuntimeError: This event loop is already running"

The error "UnboundLocalError: local variable 'playwright' referen