Writing web crawling code using ChatGPT (BlackRock ETF example)

Can ChatGPT write web crawling code well?

In this post, I test whether ChatGPT can write the crawling code for the ETF website that I covered in an earlier blog post.
For background, please refer first to that earlier post, in which I wrote code to crawl individual ETF information from BlackRock's website.

Mission of this post

I am going to write a ChatGPT prompt that fetches the text value (e.g. "$15,216,074,479") of the "Net Assets of Fund" field under the "Key Facts" section of an individual ETF page.

Writing the Prompt

I assume that you have basic knowledge of HTML documents and can inspect HTML elements using tools such as Chrome DevTools. As I wrote in the previous post, first analyze the HTML tags that contain the value you want to extract, then give ChatGPT instructions based on that structure.

To begin, select the element you want to extract in the browser and identify its HTML structure.

<div class="float-left in-left col-totalNetAssetsFundLevel ">
    <span class="caption" data-label="" data-hascontent="no">
    	Net Assets of Fund
        <span class="as-of-date">
            as of Feb 13, 2023
        </span>
    </span>
    <span class="data">
    	$15,646,884,475
    </span>
</div>

HTML element to examine
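
Before writing the prompt, it can help to confirm that the structure you identified really pins down the value. The following is a minimal sketch, assuming BeautifulSoup (`bs4`) is installed. The variable names and the CSS selector are my own choices for illustration; the selector is just one way to say "the span with class `data` inside the div with class `col-totalNetAssetsFundLevel`".

```python
from bs4 import BeautifulSoup

# HTML fragment copied from DevTools (same structure as the snippet above).
SNIPPET = """
<div class="float-left in-left col-totalNetAssetsFundLevel ">
    <span class="caption" data-label="" data-hascontent="no">
        Net Assets of Fund
        <span class="as-of-date">as of Feb 13, 2023</span>
    </span>
    <span class="data">
        $15,646,884,475
    </span>
</div>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# Match on the one distinctive class rather than the full class string,
# so the lookup does not depend on the layout classes (float-left, in-left).
span = soup.select_one("div.col-totalNetAssetsFundLevel span.data")
value_text = span.get_text(strip=True) if span else None
print(value_text)
```

If this prints the dollar amount from the snippet, the div and span classes are enough to describe the target in the prompt.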

Now I have written the prompt. It helps to list the tasks the code should perform as sequential steps, as shown below.

ChatGPT Prompt

write a python code that performs the following tasks :
1) crawl https://www.ishares.com/us/products/239565/ishares-iboxx-high-yield-corporate-bond-etf
2) extract text from span tag(wrapped by div tag whose class name contains "float-left in-left col-totalNetAssetsFundLevel") whose class name is "data"
3) convert text from currency type to number
4) return the number

The code written by ChatGPT is as follows.

import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.ishares.com/us/products/239565/ishares-iboxx-high-yield-corporate-bond-etf'

# Send a GET request to the URL
response = requests.get(url)

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the div tag with class "float-left in-left col-totalNetAssetsFundLevel"
div_tag = soup.find('div', {'class': 'float-left in-left col-totalNetAssetsFundLevel'})

# Find the span tag with class "data" inside the div tag
span_tag = div_tag.find('span', {'class': 'data'})

# Extract the text from the span tag and convert it to a number
text = span_tag.text.strip()
match = re.search(r'\$([\d,]+(\.\d{1,2})?)', text)  # Find the currency symbol and the number
if match:
    number = float(match.group(1).replace(',', ''))
else:
    number = None

# Return the number
print(number)

crawler.py
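
The generated script assumes that the request succeeds and that both tags exist. If `soup.find` returns `None`, the next line raises `AttributeError`. The following is a hedged sketch of a more defensive variant, not something ChatGPT produced. The function names `parse_net_assets` and `fetch_net_assets`, the 10-second timeout, and the CSS selector are my own assumptions, and the site may change its markup or require extra request headers.

```python
import re
from typing import Optional

import requests
from bs4 import BeautifulSoup

URL = "https://www.ishares.com/us/products/239565/ishares-iboxx-high-yield-corporate-bond-etf"


def parse_net_assets(html: str) -> Optional[float]:
    """Return the 'Net Assets of Fund' value from page HTML, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("div.col-totalNetAssetsFundLevel span.data")
    if span is None:
        return None
    # Same pattern as the generated code: a dollar sign followed by digits and commas.
    match = re.search(r"\$([\d,]+(?:\.\d{1,2})?)", span.get_text(strip=True))
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def fetch_net_assets(url: str = URL, timeout: float = 10.0) -> Optional[float]:
    """Download the page and parse it; raises requests.HTTPError on a 4xx/5xx response."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return parse_net_assets(response.text)
```

Calling `print(fetch_net_assets())` performs the real request. Keeping the parsing in its own function means it can be checked against a saved HTML fragment without touching the network.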

When I ran the script, the value I wanted was extracted neatly 😲


$ python crawler.py
15216074479.0
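
The result is printed with a trailing `.0` because the generated code converts the text with `float()`. If you would rather keep the amount exact, one option is `decimal.Decimal` from the standard library. This is a minimal sketch with a hypothetical helper name, `currency_to_decimal`, and it assumes a US-style format: a dollar sign, commas as thousands separators, and an optional decimal part.

```python
import re
from decimal import Decimal
from typing import Optional


def currency_to_decimal(text: str) -> Optional[Decimal]:
    """Convert a string such as '$15,216,074,479' to an exact Decimal, or None if no amount is found."""
    # Require a digit right after the dollar sign so a stray "$," is not treated as an amount.
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


amount = currency_to_decimal("$15,216,074,479")
print(amount)       # 15216074479
print(int(amount))  # 15216074479
```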

The potential of ChatGPT seems to be great

ChatGPT could likely write code that performs the entire task of the earlier example as well, including extracting the additional data and processing it.
Web crawling will probably become one of the major uses of ChatGPT. I will finish the post by sharing an impressive line I recently saw on YouTube.

The most popular programming language in the future will be "English".
