For my second learning using SQLite in Python, my brief was working at a media analytics company focused on streaming insights. My task was to analyze the MUBI movie dataset to uncover trends between ratings, genres, and movie success
The tables I used in the MUBI dataset were
movies
ratings
I began by importing SQLite3, pandas, matplotlib, seaborn and wordcloud modules
I then established a connection to the SQLite database before running queries to retrieve the data.
To check the connection, I first wrote a query to retrieve the first row from one of the tables. I used "SELECT *" to retrieve all features of the dataset, then "FROM movies" as the database to get those features from and lastly "LIMIT 1" to return the first row of data.
I then ran the query and displayed the result.
My first task was to analyse some of the most popular movies, exploring the year they were released. To do this I used "SELECT movie_title, movie_release_year" FROM "movies" table and sorted in descending order of "movie_popularity" with a LIMIT of 10. Then df.head() was used to call 5 rows, ensuring the query worked as expected.
In order to improve marketing campaigns and promotional efforts, I was then tasked with identifying critically acclaimed films. To do this I used "SELECT movie_title, movie_release_year, rating" FROM "movies" table and sorted in descending order of "movie_popularity". Then df.head() was used to call 5 rows, ensuring the query worked as expected.
I then used this data to visualize how many successful movies were released each year using matplotlib. You can clearly see an upwards trend in releases from 1990 until 2020. The numbers then plummet, based on the year, this is like due to the challenges the movie industry faced during the Covid19 pandemic.
In order to improve marketing campaigns and promotional efforts, I was then tasked with identifying critically acclaimed films. To do this I used "SELECT movie_title, movie_release_year, rating" FROM "movies" table and sorted in descending order of "movie_popularity". Then df.head() was used to call 5 rows, ensuring the query worked as expected.
I then used this data to visualize