Analysing Movie Data

Practice Lab from Coursera - Using SQLite in Python

SQLITE IN PYTHON

For my second learning using SQLite in Python, my brief was working at a media analytics company focused on streaming insights. My task was to analyze the MUBI movie dataset to uncover trends between ratings, genres, and movie success

The tables I used in the MUBI dataset were

movies
ratings

Import modules

I began by importing SQLite3, pandas, matplotlib, seaborn and wordcloud modules

Connect to database

I then established a connection to the SQLite database before running queries to retrieve the data.

Checking the connection

To check the connection, I first wrote a query to retrieve the first row from one of the tables. I used "SELECT *" to retrieve all features of the dataset, then "FROM movies" as the database to get those features from and lastly "LIMIT 1" to return the first row of data.

I then ran the query and displayed the result.

Most popular movies

My first task was to analyse some of the most popular movies, exploring the year they were released. To do this I used "SELECT movie_title, movie_release_year" FROM "movies" table and sorted in descending order of "movie_popularity" with a LIMIT of 10. Then df.head() was used to call 5 rows, ensuring the query worked as expected.

Top-rated movies

In order to improve marketing campaigns and promotional efforts, I was then tasked with identifying critically acclaimed films. To do this I used "SELECT movie_title, movie_release_year, rating" FROM "movies" table and sorted in descending order of "movie_popularity". Then df.head() was used to call 5 rows, ensuring the query worked as expected.

I then used this data to visualize how many successful movies were released each year using matplotlib. You can clearly see an upwards trend in releases from 1990 until 2020. The numbers then plummet, based on the year, this is like due to the challenges the movie industry faced during the Covid19 pandemic.

Popular Directors

In order to improve marketing campaigns and promotional efforts, I was then tasked with identifying critically acclaimed films. To do this I used "SELECT movie_title, movie_release_year, rating" FROM "movies" table and sorted in descending order of "movie_popularity". Then df.head() was used to call 5 rows, ensuring the query worked as expected.

I then used this data to visualize