
Supercharge your coding workflow with ChatGPT and GitHub Copilot
#NICAR23

Jonathan Soma, js4571@, @dangerscarf, /supercharge-coding-ai-tools

Don't hold your questions until the end, just raise your hand.

There's always something new

We would like to scrape a website, but we just learned a little tiny bit of Python at NICAR

Help knowing where to start

There are many tools that you can use to scrape websites, including:

Beautiful Soup: A Python library for pulling data out of HTML and XML files.
Scrapy: An open-source and collaborative web crawling framework for Python.
ParseHub: A web scraping tool that allows you to extract data from dynamic websites (JavaScript, AJAX, etc.)
Octoparse: A web scraping tool that can be used to extract data from websites without coding.
WebHarvy: A visual web scraper that can be used to scrape data from websites without coding.
Cheerio: A fast, flexible, and lean implementation of core jQuery designed specifically for the server.
Selenium: A browser automation tool that can be used to scrape data from websites that use JavaScript to load content.
Puppeteer: A Node.js library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.

These are just a few examples of the many web scraping tools available. The choice of tool will depend on your specific needs and the type of website you are trying to scrape.

What are some tools I could use to scrape web sites?

Here are some popular web scraping tools that you can use with Python:

Beautiful Soup: A Python library for pulling data out of HTML and XML files. It allows you to parse the HTML and XML documents and find the data you need using selectors like CSS and XPath.
Scrapy: An open-source and collaborative web crawling framework for Python. It provides an integrated way to follow links and extract data from websites.
Selenium: A browser automation tool that can be used to scrape data from websites that use JavaScript to load content. Selenium can be used with Python using the Selenium Python bindings.
PyQuery: A Python library similar to jQuery that allows you to make jQuery queries on XML documents. It is a lightweight library that allows you to manipulate HTML and XML documents in a natural way.
Requests-HTML: A Python library for parsing HTML and XML with an HTTP client. It is built on top of requests and allows you to interact with web pages in a simple and intuitive way.
lxml: A Python library for processing XML and HTML. It is built on top of libxml2 and libxslt, and provides a simple and efficient way to extract data from web pages.
Mechanize: A Python library for automating interaction with websites. It allows you to fill in forms and simulate a browser's behavior.

These tools are widely used in Python web scraping; they all have their own advantages and disadvantages. The choice of tool will depend on your specific needs and the type of website you are trying to scrape.

What are some tools I could use to scrape web sites using Python?

Beautiful Soup is often considered one of the easiest tools to use for web scraping with Python. It is a Python library for pulling data out of HTML and XML files, and it allows you to parse the HTML and XML documents and find the data you need using selectors like CSS and XPath. It is designed to make it easy to navigate, search, and modify the parse tree. It has a simple and intuitive API, and its documentation is well-organized and easy to follow, making it a great choice for beginners. It also has a large community of users, so there are plenty of resources available to help you with any questions or issues you may have.

Another tool that is easy to use for scraping web pages is requests-html. It is built on top of requests and allows you to interact with web pages in a simple and intuitive way; it has similar methods as BeautifulSoup for parsing pages, but is also compatible with JS-rendered pages.

It's important to note that when scraping web pages, it's important to check the website's terms of use and to be respectful of the website's resources and servers. Some websites may block scrapers or have CAPTCHAs to prevent scraping.

What is the easiest tool I could use to scrape web sites using Python?
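The parse-and-select workflow this answer describes can be sketched in a few lines; the HTML string below is invented for illustration rather than fetched from a real site:

```python
from bs4 import BeautifulSoup

# A tiny invented document standing in for a real page
html = "<html><body><p class='lead'>Hello, NICAR</p></body></html>"

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag; .text gives its contents
print(soup.find('p', class_='lead').text)
```

The same find() and find_all() calls work identically on HTML fetched with requests.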

Caveat? Benefit?: Prompt engineering

Caveat: Staying up to date

There are several popular CSS frameworks that are widely used in web development:

Bootstrap: Bootstrap is one of the most popular CSS frameworks. It is an open-source framework that provides a responsive grid system and pre-designed UI components such as buttons, forms, and navigation.
Foundation: Foundation is another popular CSS framework that provides a responsive grid system and pre-designed UI components. It also includes additional features such as flexbox, accessibility, and an advanced HTML/CSS/JS framework.
Bulma: Bulma is a lightweight CSS framework that focuses on providing a consistent and modern design. It includes a responsive grid system, pre-designed UI components, and support for Sass.
Tailwind CSS: Tailwind CSS is a utility-first CSS framework that provides low-level utility classes for controlling layout, spacing, and more. It is designed to be customizable and composable, and does not include pre-designed UI components.
Materialize: Materialize is a CSS framework that is based on the Material Design guidelines created by Google. It provides a responsive grid system and pre-designed UI components that are designed to look and feel like Material Design.

These are just a few examples of widely used CSS frameworks, and there are many other good options available as well. The best choice for you will depend on your specific needs and preferences.

What are some minimalist CSS frameworks to use for designing a web page?

My training data was current as of 2021, so my knowledge is current up until that point. I am not aware of any updates or changes that have occurred since then.

When is your information current to?

/Kantrowitz/status/1613168223054188545


Help with examples and sample code

Source: https://diataxis.fr/

Here is a simple example of how you can use BeautifulSoup to scrape headlines from a news website:

In the above code snippet, we first send a request to the news website using the requests library. Then we parse the HTML content using BeautifulSoup's html.parser. After that, we are using the find_all() method to find all the h2 tags in the HTML which represent the headlines in this case. Then we are using a for loop to iterate over the headlines and printing out the text of each headline using the text attribute.

It is important to note that the exact selectors and tags used to find the headlines will depend on the specific website and structure of the HTML.

Please keep in mind that some websites may block scraping or have CAPTCHAs to prevent scraping. It's important to be respectful of a website's resources and servers and check the website's terms of use before scraping.

Please write a very simple example of scraping with BeautifulSoup where you print out all of the headlines on a news website

[Screenshot: bbc.com homepage with browser DevTools open to the Elements panel, inspecting a headline's markup: an h3 "media title" tag containing an a "media link" tag]

[Jupyter notebook screenshot: "Scraping headlines from the BBC homepage"]

```python
import requests
from bs4 import BeautifulSoup

# Make an HTTP request to the website
url = 'https:///'
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all headlines
headlines = soup.find_all('h3')

# Print out the text of each headline
for headline in headlines:
    print(headline.text)
```

Output:
UK will provide Challenger 2 tanks to Ukraine - Sunak
Iran executes British-Iranian Akbari on spy charges
Top secret documents reportedly found in Biden cache

Caveat: Nothing's perfect!

Caveat: Nothing's perfect. It does things exactly how I, Jonathan Soma, the person with the most opinions, want.

"get all the links"

"normal" BeautifulSoup:

```python
links = []
for title in doc.find_all("h3", class_='title'):
    link = title.find('a')
    links.append(link)
```

CSS selectors:

```python
links = doc.select("h3.title a")
```

Iteration and the value of context

Yes, you can use CSS selectors with BeautifulSoup to find elements on a webpage instead of the find_all() method. BeautifulSoup provides a select() method that allows you to use CSS selectors to find elements.

Here is an example of how you can use CSS selectors with BeautifulSoup to scrape headlines from a news website:

In the above code snippet, we are using the select() method to find all the h2 tags in the HTML which represent the headlines in this case, using the CSS selector h2. Then we are using a for loop to iterate over the headlines and printing out the text of each headline using the text attribute.

It's worth noting that you can use more complex selectors to select elements; you can use the select() method with selectors like .class to select elements by class and #id to select elements by id.

It's important to note that the selectors and CSS classes used to find the headlines will depend on the specific website and structure of the HTML.

Please keep in mind that some websites may block scraping or have CAPTCHAs to prevent scraping. It's important to be respectful of a website's resources and servers and check the website's terms of use before scraping.

Can we do that using CSS selectors instead of .find_all?
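A minimal sketch of the select() approach described above, run against a small invented HTML snippet instead of a live news site:

```python
from bs4 import BeautifulSoup

# Invented stand-in for a fetched news page
html = """
<html><body>
<h2>First headline</h2>
<h2>Second headline</h2>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# select() takes a CSS selector string; 'h2' matches every h2 tag
for headline in soup.select('h2'):
    print(headline.text)
```

Swapping 'h2' for a class or id selector ('.headline', '#top-story') works the same way.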

Yes, you can remove whitespace from before and after the headline text using the strip() method in Python.

Here is an example of how you can use the strip() method to remove whitespace from before and after the headlines:

In the above code snippet, we are using the strip() method to remove any whitespace from before and after the headline text. This method removes any leading or trailing whitespace characters (spaces, tabs, newlines, etc.) in the string.

You can also use the lstrip() or rstrip() method if you want to remove whitespace only from the left or right side of the string respectively.

It's important to note that some websites may have non-printable characters in their HTML; in such cases you can use the .replace() method or a regular expression to remove those characters.

Can we remove whitespace from before/after the headline?
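A quick sketch of strip() on a whitespace-padded headline; the string here is invented for the example:

```python
# Whitespace-padded text, as often comes back from scraped HTML
headline = "\n   Top secret documents found   \n"

# strip() removes leading and trailing whitespace (spaces, tabs, newlines)
print(headline.strip())

# lstrip()/rstrip() trim only one side; chained here they match strip()
print(headline.lstrip().rstrip())
```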

The CSS selector for an a tag with the class of "media-link" immediately inside of an h3 tag with the class of "media-title" is h3.media-title > a.media-link.

This selector uses the child combinator > to match only a tags that are immediate children of h3 tags. The h3.media-title selector targets an h3 tag with the class of "media-title" and a.media-link targets an a tag with the class of "media-link".

You can also use the space combinator to select a tags that are descendants of h3 tags. For example, h3.media-title a.media-link will select all the a tags with class of "media-link" that are inside of an h3 tag with class of "media-title".

It's important to note that the above selectors assume that the website's HTML structure follows the exact format as described; in some cases the website's structure may be different.

What is the CSS selector for an a tag with the class of "media-link" immediately inside of an h3 tag with the class of "media-title"?
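Both combinators can be tried directly with BeautifulSoup's select(); the markup below is invented to mirror the structure described in the answer:

```python
from bs4 import BeautifulSoup

# Invented markup: the first link is a direct child of its h3,
# the second is nested one level deeper inside a span
html = """
<h3 class="media-title"><a class="media-link" href="/story">Headline</a></h3>
<h3 class="media-title"><span><a class="media-link" href="/other">Nested</a></span></h3>
"""

soup = BeautifulSoup(html, 'html.parser')

# Child combinator: only a.media-link tags that are immediate children
direct = soup.select('h3.media-title > a.media-link')

# Descendant (space) combinator: a.media-link tags anywhere inside the h3
descendants = soup.select('h3.media-title a.media-link')

print(len(direct), len(descendants))
```

The child combinator matches one link here, the descendant combinator matches both.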

You can use the get() method to access the href attribute of an a tag that is selected by BeautifulSoup.

Here is an example of how you can print out the href attribute of an a tag:

In the above code snippet, we are using the select() method to find all the a tags with class of "media-link", then using a for loop to iterate over the links, and using the get() method to access the href attribute of each link.

It's worth noting that if the href attribute is not present in the HTML, the get() method will return None, so you should check the returned value of the get() method before using it.

How do I print out the href for an element selected by BeautifulSoup?
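A short sketch of the get() pattern with the None check, using a couple of invented links:

```python
from bs4 import BeautifulSoup

# Invented links; the second one has no href attribute at all
html = '<a class="media-link" href="/news/1">One</a><a class="media-link">Two</a>'

soup = BeautifulSoup(html, 'html.parser')

for link in soup.select('a.media-link'):
    # get() returns None when the attribute is missing, so check first
    href = link.get('href')
    if href is not None:
        print(href)
```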

Caveat: Not everything is understandable...

Anti-caveat: AI accepts infinite follow-ups

requests.get() is used to make an HTTP GET request to a specified URL. The get() method is part of the requests library, which is a popular library for sending HTTP requests in Python.

An HTTP GET request is the most common type of request and is used to retrieve data from a server. The request contains a URL, which specifies the location of the resource to be retrieved. The server then responds with the requested resource, if it exists, and a status code indicating whether the request was successful.

The get() method takes one argument, which is the URL of the resource to be retrieved. It returns a response object, which contains the server's response to the request. The response object has various properties and methods, such as text, content, status_code, etc., that can be used to access and analyze the response.

You can also use the post() method to make an HTTP POST request, put() to make an HTTP PUT request, delete() to make an HTTP DELETE request, etc.

It's important to note that the requests library is not part of the Python standard library, so you need to install it before using it in your code using pip install requests.

Why is it requests.get?

The requests.get() function is used to retrieve information from a website. It is a function from a Python library called requests. When you use this function, you provide the website's address, and the function sends a request to that website asking for information. The website then sends back a response which includes the information you asked for. The get() method is just one of the many HTTP requests that you can use to interact with a website. There are also post(), put(), delete(), etc.

It's important to note that requests is not part of the Python standard library and you need to install it before using it in your code by running pip install requests in your command line.

Please keep in mind that some websites may block scraping or have CAPTCHAs to prevent scraping. It's important to be respectful of a website's resources and servers and check the website's terms of use before scraping.

Explain that in much less technical language

I have a dataframe with two columns, country and social spending. I am using the following R code to plot a horizontal bar graph of each country's social spending:

```r
library(ggplot2)

ggplot(df) +
  aes(x = `Social Spending`, y = Country) +
  geom_col()
```

how do i make it sort the bars by social spending

You can sort the bars in descending order of social spending by adding the "desc" argument to the "arrange" function within the ggplot call. Here's the modified code:

The "coord_flip" function flips the x and y axes to create a horizontal bar chart. The "labs" function sets the x-axis label to "Social Spending". The "theme_bw" function sets the plot background to white, and the "theme" function centers the plot title. The "scale_x_continuous" function formats the x-axis labels as dollar amounts. The "scale_y_discrete" function reverses the order of the y-axis categories. Finally, the "geom_text" function adds dollar values as labels to the bars, and the "geom_col" function

Yes, you can add the reorder function inside the aes call to sort the bars by the Social Spending column. Here's the modified code with this approach:

is there a way to do this that doesn't add as much code

In this code, the reorder function sorts the Country variable by the Social Spending variable, which is done inside the aes call. This creates a bar chart with bars sorted by social spending, without the need for additional geom_col or arrange calls. The labs function adds axis labels, and the theme_bw function sets a white background.

Translating your skills

Here is an example of how you can use Playwright to scrape headlines and hrefs from a news website, instead of using BeautifulSoup:

Translate the code below into using playwright instead of using BeautifulSoup:

Caveat: Very much not perfect

Fixing bugs

I apologize for the confusion. You should use import playwright instead of from playwright import async_playwright in the first line of the code.

Here is the corrected code:

I'm getting the error "ImportError: cannot import name 'async_playwright' from 'playwright'"

The error "RuntimeError: This event loop is already running" is raised because the event loop is already running and you can't create a new one.

You could fix this error by removing the line asyncio.get_event_loop().run_until_complete(main()) from the code, because it is trying to run the event loop again, and the event loop is already running.

Here is the corrected code:

I'm getting the error "RuntimeError: This event loop is already running"

The error "UnboundLocalError: local variable 'playwright' referen
