The brief for my first task using SQLite in Python was being employed as a sports data consultant providing insights to basketball teams, broadcasters and sports media outlets.
My first task was to find out which teams are the most consistent and which players are the most impactful.
To complete the task you decide to analyze the NBA Boxscore Dataset, which includes 3 tables:
game_info: contains information about each game between two teams, including things like the scores and the outcome.
team_stats: contains detailed statistics for each team in each game, such as points scored, rebounds, assists, and more.
player_stats: contains individual game stats for each player, including points, assists, rebounds, and other performance details.
I began by importing SQLite3 and pandas modules
I then established a connection to the SQLite database before running queries to retrieve the data.
To check the connection, I first decided to write a query to retrieve the first row from one of the tables. I used "SELECT *" to retrieve all features of the dataset, then "FROM game_info" as the database to get those features from and lastly "LIMIT 1" to return the first row of data.
I then ran the query and displayed the result.
The next query I wanted to run was to find the top ten players for points scored (PTS) and the team that they play for. Here I used "SELECT" to return "player", "team" and "PTS" FROM "player_stats". Then ORDER BY "PTS" decending with LIMIT 10 for a top ten list. This list showed Donovan Mitchell and Damian Lillard joint top with 71 points. The list however did call out that this data set may need to be cleansed as you can see there are duplicate entries for some players, such as James Harden, meaning this is not a fully accurate top ten list.