[Python][Crawling][Scraping] Selecting Elements with BeautifulSoup and a Parser (3)

Kamangs 2019. 5. 11. 00:01


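from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the sample page and parse it with Python's built-in html.parser
html = urlopen('http://www.pythonscraping.com/pages/page1.html')
bs = BeautifulSoup(html.read(), 'html.parser')

# The <body> element and everything nested inside it
print(bs.body)
print('*' * 100)
# The element one level above <body> (here, <html>)
print(bs.body.parent)
print('*' * 100)
# Iterate over the direct children of <body>
for child in bs.body.children:
    print('child.name:', end='')
    print(child.name)
    print('child.string:', end='')
    print(child.string)
    print('*' * 100)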

Now let's use the code above to see how to select individual elements.

print(bs.body)

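As you would expect, to see the body element you access it as above.

Printing it as-is shows everything: the body tag together with all of the tags and text nested inside it.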
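To try this without a network connection, here is a minimal sketch that parses a small, hypothetical HTML string (it is not the real content of page1.html) and prints bs.body. It shows that the output starts at the body tag and includes everything nested inside it, but nothing from head.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page (NOT the real page1.html)
html = """<html>
<head><title>A sample page</title></head>
<body>
<h1>A heading</h1>
<div>Some text.</div>
</body>
</html>"""

bs = BeautifulSoup(html, 'html.parser')

# bs.body is the first <body> tag; printing it shows the tag and all of its contents
print(bs.body)
body_markup = str(bs.body)
```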

print(bs.body.parent)

Used as above, .parent refers to the element directly above body, which here is the html element.

Printing it as-is therefore outputs the full html element, i.e. the whole document.
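As a small sketch of moving upward through the tree, the example below uses a hypothetical one-line HTML string. .parent gives the single element directly above, while .parents (a related BeautifulSoup attribute) walks all the way up to the BeautifulSoup object itself, whose name is '[document]'.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used instead of fetching the page over the network
html = "<html><head><title>T</title></head><body><p>Hello</p></body></html>"
bs = BeautifulSoup(html, 'html.parser')

# The element directly above <body>
parent = bs.body.parent
print(parent.name)

# .parents walks upward one level at a time, ending at the BeautifulSoup object
ancestor_names = [p.name for p in bs.body.p.parents]
print(ancestor_names)
```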

for child in bs.body.children:
    print('child.name:', end='')
    print(child.name)
    print('child.string:', end='')
    print(child.string)
    print('*' * 100)

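To look at an element's children, use .children.

However, you need a for loop to see them, because .children is an iterable (an iterator over the child nodes), not a list you can print directly.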
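The sketch below, using a hypothetical HTML string, shows why the loop is needed: printing .children directly only shows an iterator object. If you want the children as a list, wrap it in list(), or use .contents, which BeautifulSoup already stores as a list. Note that the newlines between tags count as children too.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; the "\n" characters between tags become text-node children
html = "<html><body>\n<h1>Title</h1>\n<p>First</p>\n</body></html>"
bs = BeautifulSoup(html, 'html.parser')

children = bs.body.children
# Printing the iterator shows an iterator object, not the child elements
print(children)

# Materialize the children with list(), or use .contents, which is already a list
child_list = list(bs.body.children)
print(len(child_list))
print(bs.body.contents == child_list)
```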

print('child.name:', end='')
print(child.name)
print('child.string:', end='')
print(child.string)

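name holds the element's tag name.

string holds the element's text content. Note that .string is only set when the element contains a single child; if it contains several (for example text plus a nested tag), .string is None.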
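Here is a minimal sketch of .name and .string on a hypothetical snippet. The h1 has a single text child, so .string returns it; the div mixes text with a b tag, so .string is None. get_text() is shown as the usual way to collect all nested text in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet for illustration
html = "<body><h1>Title</h1><div>Text <b>bold</b></div></body>"
bs = BeautifulSoup(html, 'html.parser')

h1 = bs.body.h1
div = bs.body.div

print(h1.name, h1.string)    # a single text child, so .string returns it
print(div.name, div.string)  # two children (text and <b>), so .string is None
print(div.get_text())        # get_text() joins all nested text instead
```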

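While printing, you will sometimes see child.name print None.

These children are not tags but text nodes (NavigableString objects), here just the line breaks between tags in the HTML source, so they carry no useful information.

You can simply filter them out.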
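One way to do the filtering, sketched on a hypothetical HTML string: keep only children that are Tag objects, which drops the newline text nodes whose .name is None. As an alternative, find_all(recursive=False) returns only the direct child tags in one call.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical HTML with newlines between tags
html = "<html><body>\n<h1>Title</h1>\n<div>Body text</div>\n</body></html>"
bs = BeautifulSoup(html, 'html.parser')

# Newline text nodes are NavigableString objects (name is None), so keep only Tags
tag_children = [child for child in bs.body.children if isinstance(child, Tag)]
for child in tag_children:
    print('child.name:', child.name)
    print('child.string:', child.string)

# Alternative: only the direct child tags of <body>
direct_tags = bs.body.find_all(recursive=False)
```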
