[Python][Crawling][Scraping] Handling errors when scraping (4)

Kamangs 2019. 5. 11. 11:56


from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup


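def get_element(element, url):
    # Return the text of the first <element> tag on the page, or None on failure.
    try:
        html = urlopen(url)
    except HTTPError as e:
        # The server answered with an error status (e.g. 404 or 500).
        return None
    try:
        bs = BeautifulSoup(html.read(), 'html.parser')
        # If the tag is missing, getattr() gives None and .string raises AttributeError.
        result = getattr(bs, element).string
    except AttributeError as e:
        return None
    return result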


result = get_element('title', 'http://www.pythonscraping.com/pages/page1.html')
if result is None:
    print('Title could not be found')
else:
    print(result)

When scraping or crawling, you run into all kinds of error conditions.

Let's look at a simple way to handle them.

try:
    html = urlopen(url)
except HTTPError as e:
    return None

First, when we request the page's HTML, we need to know whether the request actually succeeded.

So we wrap the urlopen call in try/except, as above.
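HTTPError only covers the case where the server answers with an error status such as 404 or 500. If the server cannot be reached at all, urlopen raises URLError instead. Below is a minimal sketch that handles both. The helper name fetch_html and its opener parameter are hypothetical, added here only so the function can be tried without a network connection; they are not part of the original code.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def fetch_html(url, opener=urlopen):
    """Return the raw page bytes, or None if the page could not be fetched.

    fetch_html and the opener parameter are hypothetical names used for
    illustration only.
    """
    try:
        response = opener(url)
    except HTTPError as e:
        # The server responded, but with an error status (404, 500, ...).
        print(f'HTTP error {e.code} for {url}')
        return None
    except URLError as e:
        # The server could not be reached (bad host name, refused connection, ...).
        print(f'Could not reach {url}: {e.reason}')
        return None
    return response.read()
```

HTTPError has to be listed first: it is a subclass of URLError, so the broader handler would otherwise catch both cases.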

try:
    bs = BeautifulSoup(html.read(), 'html.parser')
    result = getattr(bs, element).string
except AttributeError as e:
    return None
return result

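We also wrap the BeautifulSoup calls in an error-catching block.

If the requested tag is not in the page, looking it up on bs gives None, so accessing .string raises AttributeError; that error is caught here.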
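To see where the AttributeError comes from, it helps to try the lookup on a small in-memory page. This is a rough sketch: the sample markup and the helper name tag_text are assumptions made for illustration. It uses an explicit None check instead of catching the exception, and calls bs.find(), which performs the same tag lookup as attribute access such as bs.title.

```python
from bs4 import BeautifulSoup

# Sample markup invented for this example.
SAMPLE_HTML = (
    '<html><head><title>A Useful Page</title></head>'
    '<body><h1>Heading</h1></body></html>'
)


def tag_text(html, element):
    """Return the text of the first <element> tag, or None if it is absent.

    tag_text is a hypothetical helper, not part of the original code.
    """
    bs = BeautifulSoup(html, 'html.parser')
    tag = bs.find(element)  # None when the page has no such tag
    if tag is None:
        return None
    return tag.string


print(tag_text(SAMPLE_HTML, 'title'))
print(tag_text(SAMPLE_HTML, 'nav'))
```

Either style works; the None check makes the failure case visible in the code, while try/except keeps the happy path shorter.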

You will run into many other error situations as well, and each one needs to be caught as it comes up if you want to build a crawling bot later.
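As one illustration of why this matters for a bot, here is a sketch of a loop that keeps going when a single page fails. The function name crawl_titles and the get_title callable are hypothetical; get_title stands in for something like get_element('title', url) and is passed in as an argument so the loop can be tried without network access.

```python
def crawl_titles(urls, get_title):
    """Collect a title for every URL, skipping pages that fail.

    crawl_titles and get_title are hypothetical names for illustration.
    get_title(url) is expected to return a string, or None on failure.
    """
    titles = {}
    failed = []
    for url in urls:
        try:
            title = get_title(url)
        except Exception as e:
            # Deliberately broad: one bad page should not stop the whole crawl.
            print(f'Skipping {url}: {e!r}')
            failed.append(url)
            continue
        if title is None:
            # The page was fetched, but no title was found.
            failed.append(url)
        else:
            titles[url] = title
    return titles, failed
```

Recording the failed URLs separately makes it possible to retry them or inspect them afterwards.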


Kamang's IT Blog
