Writing web crawling code with ChatGPT (BlackRock ETF example)

Can ChatGPT write web crawling code well, too?

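ChatGPT's remarkable capabilities are being demonstrated everywhere these days, so I ran an experiment to see whether ChatGPT could write the crawling code for the ETF website that I covered in an earlier blog post.
For background, please first read my earlier post on crawling individual ETF information from BlackRock.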

From that crawling code, I will take one task and turn it into a prompt: fetching the text value of "Net Assets of Fund" (for example, "$15,216,074,479") under the "Key Facts" section of an ETF page.

Writing the prompt

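I assume you can inspect HTML elements with a tool such as Chrome DevTools. As in the earlier post, the approach is to analyze the HTML tags around the part you want to extract beforehand, and then instruct ChatGPT to do the job.

First, select the element you want to extract on the page in Chrome DevTools and examine its HTML structure.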

<div class="float-left in-left col-totalNetAssetsFundLevel ">
    <span class="caption" data-label="" data-hascontent="no">
    	Net Assets of Fund
        <span class="as-of-date">
            as of Feb 13, 2023
        </span>
    </span>
    <span class="data">
    	$15,646,884,475
    </span>
</div>
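
Before writing the prompt, it can help to check offline that the structure above really identifies the value. The following is a minimal sketch, assuming the snippet copied from DevTools is pasted into a string; the name `SNIPPET` and the CSS selector are my own choices for illustration, and I am assuming that `col-totalNetAssetsFundLevel` is the class that identifies this field while the other classes are layout helpers.

```python
from bs4 import BeautifulSoup

# The snippet copied from Chrome DevTools (value as of Feb 13, 2023).
SNIPPET = """
<div class="float-left in-left col-totalNetAssetsFundLevel ">
    <span class="caption" data-label="" data-hascontent="no">
        Net Assets of Fund
        <span class="as-of-date">
            as of Feb 13, 2023
        </span>
    </span>
    <span class="data">
        $15,646,884,475
    </span>
</div>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# Select a <span class="data"> nested inside a <div> that carries the
# col-totalNetAssetsFundLevel class (assumed to be the identifying class).
span = soup.select_one("div.col-totalNetAssetsFundLevel span.data")

# strip=True removes the surrounding whitespace and newlines.
net_assets_text = span.get_text(strip=True)
print(net_assets_text)  # $15,646,884,475
```

If the selector returns the expected text here, the same description (a span with class "data" inside that div) is what the prompt needs to convey.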

With that, I wrote the prompt. For a prompt that asks for code, listing the tasks in order as below seems to work well.

The finished ChatGPT prompt

write a python code that performs the following tasks :
1) crawl https://www.ishares.com/us/products/239565/ishares-iboxx-high-yield-corporate-bond-etf
2) extract text from span tag(wrapped by div tag whose class name contains "float-left in-left col-totalNetAssetsFundLevel") whose class name is "data"
3) convert text from currency type to number
4) return the number

The code ChatGPT wrote in response to the prompt above is as follows.

import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.ishares.com/us/products/239565/ishares-iboxx-high-yield-corporate-bond-etf'

# Send a GET request to the URL
response = requests.get(url)

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the div tag with class "float-left in-left col-totalNetAssetsFundLevel"
div_tag = soup.find('div', {'class': 'float-left in-left col-totalNetAssetsFundLevel'})

# Find the span tag with class "data" inside the div tag
span_tag = div_tag.find('span', {'class': 'data'})

# Extract the text from the span tag and convert it to a number
text = span_tag.text.strip()
match = re.search(r'\$([\d,]+(\.\d{1,2})?)', text)  # Find the currency symbol and the number
if match:
    number = float(match.group(1).replace(',', ''))
else:
    number = None

# Return the number
print(number)

crawler.py

I ran the script... and the value I wanted was extracted cleanly on the first try 😲


python crawler.py 


(out)15216074479.0
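
One thing to keep in mind is that the generated script assumes the div and span are always found: if the page layout changes, `soup.find` returns `None` and the following `.find` call raises an `AttributeError`. The conversion step can be isolated and made defensive. The sketch below is my own variation, not ChatGPT's output; the helper name `parse_currency` is hypothetical, and using `Decimal` instead of `float` is a design choice on my part to keep the digits exact.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_currency(text: str) -> Optional[Decimal]:
    """Convert a string such as '$15,216,074,479' to a Decimal.

    Returns None when no dollar amount is found in the text.
    """
    # A dollar sign, optional whitespace, then digits with optional
    # thousands separators and an optional decimal part.
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


print(parse_currency("$15,216,074,479"))  # 15216074479
print(parse_currency("N/A"))              # None
```

In the script above, the same idea would apply to the lookups: check that `div_tag` and `span_tag` are not `None` before using them, and consider calling `response.raise_for_status()` after the request so that an HTTP error is not parsed as if it were the ETF page.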

ChatGPT's potential is truly impressive

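By varying the prompt to specify additional data to extract and how to process it afterwards, I expect ChatGPT could also write code that covers 100% of the work in the earlier example.
Web crawling looks set to become one of ChatGPT's major use cases. I will close this post with a memorable line I recently heard on YouTube.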

The most promising programming language of the future will be "English".
