NBA Custom Web Scraper


Professional project
Python code solution and explanation
requests, bs4, BeautifulSoup

Introduction

Although I’m a huge NBA fan, I usually don’t have the time to catch up on games every single day. So today we’ll be making a program that automates collecting the current NBA standings from ESPN. It retrieves each team’s wins, losses, winning percentage, and the other standings columns (13 values per team). This information is then organized into a dictionary keyed by team.

Solution

Let’s start by importing the required libraries. The requests library makes HTTP requests to websites and gives us the response data. BeautifulSoup (from the bs4 package) parses HTML and XML documents so we can extract data from them.

import requests
from bs4 import BeautifulSoup

I will be retrieving the data from ESPN. This is the link: https://www.espn.com/nba/standings

This is what the page looks like:

[Image: ESPN NBA standings page]

URL = 'https://www.espn.com/nba/standings'

Now we can send a GET request to the URL and save the text of the response.

response = requests.get(URL)
page = response.text
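
The two lines above assume the request always succeeds. Here is a slightly more defensive sketch of the same step. The fetch_page name, the 10-second timeout, and the User-Agent header are my own assumptions for illustration; the original code sends no custom headers, and I have not confirmed whether ESPN needs one.

```python
import requests

URL = "https://www.espn.com/nba/standings"

# Assumption: a browser-like User-Agent; the original code sends none.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; nba-standings-example)"}


def fetch_page(url, get=requests.get, timeout=10):
    """Return the page HTML, raising an exception if the request fails."""
    # `get` defaults to requests.get; it is a parameter only so the
    # function can be tried out without touching the network.
    response = get(url, headers=HEADERS, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.text


# Usage (performs a real network request):
# page = fetch_page(URL)
```

With this in place, a blocked or failed request stops the program with a clear error instead of handing an error page to the parser.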

We can use BeautifulSoup to parse the document with Python’s built-in HTML parser.

soup = BeautifulSoup(page, "html.parser")

Now how will we grab the team names and the data that goes with them? We need to find a class shared by all the team names, and a class shared by all the stat cells (wins, losses, and so on).

Press Ctrl + Shift + C to open the browser’s element inspector, then click the text you want. Once you click on it, you will see the HTML code corresponding to it. I noticed that all the team names had the ‘hide-mobile’ class and all the stats, including wins, had the ‘stat-cell’ class.

Now we can use the soup.find_all() method and pass in the class name. Remember the trailing underscore in class_, since class is a reserved keyword in Python.

teams = soup.find_all(class_='hide-mobile')
wins = soup.find_all(class_='stat-cell')
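
To see what find_all() returns without hitting ESPN, here is a small sketch on a made-up HTML fragment. The fragment is an assumption for illustration only; the real page is much more deeply nested, and only the two class names come from the page itself.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, not ESPN's actual markup
SAMPLE_HTML = """
<table>
  <tr><td><span class="hide-mobile"><a href="#">Boston Celtics</a></span></td></tr>
  <tr><td><span class="hide-mobile"><a href="#">Milwaukee Bucks</a></span></td></tr>
</table>
<table>
  <tr><td><span class="stat-cell">38</span></td><td><span class="stat-cell">16</span></td></tr>
  <tr><td><span class="stat-cell">37</span></td><td><span class="stat-cell">17</span></td></tr>
</table>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns Tag objects in document order
sample_teams = sample_soup.find_all(class_="hide-mobile")
sample_stats = sample_soup.find_all(class_="stat-cell")

# .text strips the markup and keeps only the visible text
team_names = [tag.text for tag in sample_teams]
stat_values = [tag.text for tag in sample_stats]

print(team_names)   # ['Boston Celtics', 'Milwaukee Bucks']
print(stat_values)  # ['38', '16', '37', '17']
```

Notice that the stats come back as one flat list, so they still have to be split up per team.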

If we print teams and wins, we will get the HTML tags, not just the text. For each team, we can simply grab the text, but the stats need more work: wins is one flat list with 13 stat cells per team. I developed this logic to build key:value pairs where the key is the team and the value is a list of that team’s 13 stats, taken as a slice of wins. I used a list comprehension for the stats. Remember to use .text on each individual item.

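start = 0
end = 13
num = 1  # rank within the conference

# The first 15 names are the Eastern Conference teams
eastern_teams = {}
for eastern_team in teams[0:15]:
    eastern_teams[f"{num}. {eastern_team.text}"] = [x.text for x in wins[start:end]]
    start += 13
    end += 13
    num += 1
print("Eastern Conference")
print(eastern_teams)

# The next 15 names are the Western Conference teams; start and end carry over
num = 1
western_teams = {}
for western_team in teams[15:30]:
    western_teams[f"{num}. {western_team.text}"] = [x.text for x in wins[start:end]]
    start += 13
    end += 13
    num += 1
print("Western Conference")
print(western_teams)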
OUTPUT
Eastern Conference

{
'1. Boston Celtics': ['38', '16', '.704', '-', '20-7', '18-9', '7-1', '22-11', '117.6', '111.6', '+6.0', 'W1', '6-4'],

'2. Milwaukee Bucks': ['37', '17', '.685', '1', '23-5', '14-12', '7-4', '21-13', '114.5', '111.8', '+2.7', 'W8', '9-1'], 

'3. Philadelphia 76ers': ['34', '18', '.654', '3', '20-8', '14-10', '5-4', '19-12', '114.6', '110.9', '+3.7', 'L1', '8-2'],
 
'4. Cleveland Cavaliers': ['34', '22', '.607', '5', '22-6', '12-16', '9-3', '20-10', '111.8', '106.5', '+5.3', 'W3', '6-4'], 

'5. Brooklyn Nets': ['32', '22', '.593', '6', '16-10', '16-12', '6-5', '22-11', '114.4', '112.6', '+1.8', 'L2', '5-5'], 

'6. Miami Heat': ['29', '25', '.537', '9', '17-9', '12-16', '6-3', '13-16', '108.5', '108.4', '+0.1', 'L2', '5-5'], 

'7. New York Knicks': ['30', '26', '.536', '9', '14-15', '16-11', '4-7', '21-15', '114.2', '112.4', '+1.8', 'W2', '5-5'],
 
'8. Atlanta Hawks': ['27', '28', '.491', '11.5', '13-11', '14-17', '5-4', '17-17', '116.1', '116.5', '-0.4', 'L2', '4-6'], 

'9. Chicago Bulls': ['26', '28', '.481', '12', '16-11', '10-17', '5-4', '20-15', '114.3', '113.8', '+0.5', 'L1', '6-4'], 

'10. Indiana Pacers': ['25', '30', '.455', '13.5', '17-12', '8-18', '3-5', '17-15', '114.7', '116.9', '-2.2', 'L1', '2-8'], 

'11. Toronto Raptors': ['25', '30', '.455', '13.5', '15-12', '10-18', '4-9', '15-19', '113.0', '112.4', '+0.6', 'W2', '5-5'], 

'12. Washington Wizards': ['24', '29', '.453', '13.5', '12-12', '12-17', '5-3', '13-17', '112.8', '113.5', '-0.7', 'L3', '6-4'], 

'13. Orlando Magic': ['22', '33', '.400', '16.5', '13-14', '9-19', '3-7', '11-23', '111.1', '114.2', '-3.1', 'L1', '5-5'], 

'14. Charlotte Hornets': ['15', '40', '.273', '23.5', '7-17', '8-23', '5-7', '8-27', '112.0', '118.6', '-6.6', 'L4', '4-6'], 

'15. Detroit Pistons': ['14', '41', '.255', '24.5', '7-21', '7-20', '0-8', '6-24', '112.3', '119.7', '-7.4', 'L2', '2-8']
}


Western Conference
{
'1. Denver Nuggets': ['38', '17', '.691', '-', '26-4', '12-13', '10-5', '28-11', '117.4', '113.0', '+4.4', 'W1', '6-4'], 

'2. Memphis Grizzlies': ['33', '21', '.611', '4.5', '22-5', '11-16', '6-2', '15-16', '116.0', '112.2', '+3.8', 'W1', '2-8'], 

'3. Sacramento Kings': ['30', '23', '.566', '7', '16-11', '14-12', '5-5', '18-11', '119.3', '116.6', '+2.7', 'W1', '5-5'], 

'4. LA Clippers': ['31', '26', '.544', '8', '14-11', '17-15', '4-4', '17-15', '111.2', '110.9', '+0.3', 'W2', '8-2'], 

'5. Phoenix Suns': ['30', '26', '.536', '8.5', '19-9', '11-17', '8-0', '20-14', '112.7', '111.3', '+1.4', 'W3', '8-2'], 

'6. Dallas Mavericks': ['29', '26', '.527', '9', '19-9', '10-17', '7-2', '21-13', '112.5', '112.0', '+0.5', 'W1', '5-5'], 

'7. Golden State Warriors': ['28', '26', '.519', '9.5', '21-6', '7-20', '4-4', '17-11', '118.4', '118.2', '+0.2', 'W2', '6-4'], 

'8. New Orleans Pelicans': ['29', '27', '.518', '9.5', '20-9', '9-18', '7-4', '18-14', '115.5', '113.5', '+2.0', 'W3', '3-7'], 

'9. Minnesota Timberwolves': ['29', '28', '.509', '10', '20-12', '9-16', '7-7', '20-18', '115.4', '115.5', '-0.1', 'L1', '6-4'], 

'10. Utah Jazz': ['27', '28', '.491', '11', '18-11', '9-17', '4-5', '19-16', '117.5', '116.6', '+0.9', 'L2', '5-5'], 

'11. Portland Trail Blazers': ['26', '28', '.481', '11.5', '14-12', '12-16', '5-7', '18-15', '114.5', '114.3', '+0.2', 'L2', '5-5'], 

'12. Oklahoma City Thunder': ['25', '28', '.472', '12', '16-11', '9-17', '4-6', '12-16', '117.3', '116.4', '+0.9', 'L1', '5-5'], 

'13. Los Angeles Lakers': ['25', '29', '.463', '12.5', '13-12', '12-17', '1-9', '12-18', '117.1', '118.4', '-1.3', 'L1', '5-5'], 

'14. San Antonio Spurs': ['14', '40', '.259', '23.5', '9-21', '5-19', '2-7', '5-30', '112.6', '122.6', '-10.0', 'L9', '1-9'], 

'15. Houston Rockets': ['13', '41', '.241', '24.5', '8-19', '5-22', '1-8', '7-29', '110.1', '118.0', '-7.9', 'L3', '3-7']
}
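
Each list above holds 13 unlabeled values. As a minimal sketch, we could attach column names to them. The labels below are my reading of the ESPN table header (for example, the Celtics’ 20-7 and 18-9 add up to their 38-16 record, which fits home and away), so treat them as an assumption; label_stats is a hypothetical helper name.

```python
# Assumed column labels, in the order the values appear in each list
STAT_COLUMNS = [
    "W", "L", "PCT", "GB", "HOME", "AWAY", "DIV",
    "CONF", "PPG", "OPP PPG", "DIFF", "STRK", "L10",
]


def label_stats(standings):
    """Turn {team: [13 values]} into {team: {column: value}}."""
    labeled = {}
    for team, values in standings.items():
        if len(values) != len(STAT_COLUMNS):
            raise ValueError(f"expected {len(STAT_COLUMNS)} stats for {team}")
        labeled[team] = dict(zip(STAT_COLUMNS, values))
    return labeled


sample = {
    "1. Boston Celtics": ['38', '16', '.704', '-', '20-7', '18-9', '7-1',
                          '22-11', '117.6', '111.6', '+6.0', 'W1', '6-4'],
}
labeled = label_stats(sample)
print(labeled["1. Boston Celtics"]["W"], labeled["1. Boston Celtics"]["STRK"])  # 38 W1
```

Looking up a stat by name is easier to read than remembering that index 11 is the streak.
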
Full Code

To view my full code, please visit my GitHub repository:
https://github.com/Gursehaj-Singh/nba-standings-webscraper

import requests
from bs4 import BeautifulSoup

URL = 'https://www.espn.com/nba/standings'

response = requests.get(URL)
page = response.text

soup = BeautifulSoup(page, "html.parser")
teams = soup.find_all(class_='hide-mobile')
wins = soup.find_all(class_='stat-cell')

start = 0
end = 13
num = 1

eastern_teams = {}
for eastern_team in teams[0:15]:
    eastern_teams[f"{num}. {eastern_team.text}"] = [x.text for x in wins[start:end]]
    start += 13
    end += 13
    num += 1
print("Eastern Conference")
print(eastern_teams)

num = 1
western_teams = {}
for western_team in teams[15:30]:
    western_teams[f"{num}. {western_team.text}"] = [x.text for x in wins[start:end]]
    start += 13
    end += 13
    num += 1

print("Western Conference")
print(western_teams)
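
The start/end bookkeeping in the loops can also be written as a small helper. This is only an alternative sketch of the same grouping idea, not the code from my repository; chunk and build_standings are hypothetical names, and the demo uses 2 stats per team to keep it short.

```python
def chunk(values, size):
    """Split a flat list into consecutive lists of `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def build_standings(team_names, stat_values, stats_per_team=13):
    """Pair each team name with its slice of the flat stats list."""
    rows = chunk(stat_values, stats_per_team)
    if len(rows) != len(team_names):
        raise ValueError("number of stat rows does not match number of teams")
    return {
        f"{rank}. {name}": row
        for rank, (name, row) in enumerate(zip(team_names, rows), start=1)
    }


demo = build_standings(
    ["Boston Celtics", "Milwaukee Bucks"],
    ["38", "16", "37", "17"],
    stats_per_team=2,
)
print(demo)
# {'1. Boston Celtics': ['38', '16'], '2. Milwaukee Bucks': ['37', '17']}
```

With the real data, each conference would pass its 15 team names and its 15 × 13 stat values.
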
Further Steps

Now that we’ve retrieved a dictionary of team:stats pairs, we can work further with it. Instead of running the code by hand every time, we can make the program send an email every day or every week, according to your preference. This way, you can stay caught up with your favorite team and the NBA standings without even having to look them up.
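
As a rough sketch of the email idea, the standard library’s email and smtplib modules are enough. Everything specific here is an assumption: the addresses are placeholders, the function names are hypothetical, and send_email assumes an SMTP server that supports STARTTLS and password login. Scheduling (daily or weekly) would be handled separately, for example by the operating system’s task scheduler.

```python
import smtplib
from email.message import EmailMessage


def format_standings(title, standings):
    """Render {team: [stats]} as plain text, one 'team: W-L' line each."""
    lines = [title]
    for team, stats in standings.items():
        wins, losses = stats[0], stats[1]  # first two values are W and L
        lines.append(f"{team}: {wins}-{losses}")
    return "\n".join(lines)


def build_email(sender, recipient, body):
    """Create the message without sending it."""
    message = EmailMessage()
    message["Subject"] = "NBA standings update"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(body)
    return message


def send_email(message, host, port, username, password):
    """Send the message; assumes the server supports STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(message)


sample = {"1. Boston Celtics": ["38", "16"], "2. Milwaukee Bucks": ["37", "17"]}
body = format_standings("Eastern Conference", sample)
message = build_email("me@example.com", "you@example.com", body)
print(body)
```

Only send_email touches the network, so the formatting can be checked on its own before wiring in real credentials.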

Hope you enjoyed it.
