Box Office Mojo Predictions

Predicting Studio Quarterly Gross

Start Date | End Date   | Duration | Members
01/25/2016 | 02/05/2016 | 1 week   | 1

The IPython Notebook for this project is not available yet.


I scraped studio data from http://www.boxofficemojo.com to predict total quarterly gross revenue for different studios with 76% accuracy.


Data was scraped from Box Office Mojo

Tools Used

Use        | Tool
Code       | Python
Scraping   | Beautiful Soup
Prediction | Linear Regression


This was my first solo data science project.

I found it challenging to scrape all of the data from boxofficemojo.com because the HTML elements don't have any ids.
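Without ids to select on, one workable approach is to locate a table by the text of one of its header cells and then walk its rows by position. The sketch below does this with Beautiful Soup on a small hypothetical HTML snippet. The markup, the column layout and the helper names (`find_table_by_header`, `parse_rows`, `parse_money`) are assumptions for illustration, not Box Office Mojo's actual page structure or the code used in this project.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables, neither with an id or class attribute.
SAMPLE_HTML = """
<html><body>
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><td><b>Rank</b></td><td><b>Studio</b></td><td><b>Total Gross</b></td></tr>
  <tr><td>1</td><td>Studio A</td><td>$1,234,567</td></tr>
  <tr><td>2</td><td>Studio B</td><td>$987,654</td></tr>
</table>
</body></html>
"""


def find_table_by_header(soup, header_text):
    """Return the first <table> that has a cell whose text equals header_text, else None."""
    for table in soup.find_all("table"):
        for cell in table.find_all("td"):
            if cell.get_text(strip=True) == header_text:
                return table
    return None


def parse_rows(table):
    """Return the cleaned cell text of every row after the header row."""
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")[1:]
    ]


def parse_money(text):
    """Convert a string such as '$1,234,567' to an int."""
    return int(text.replace("$", "").replace(",", ""))


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_header(soup, "Studio")
grosses = {studio: parse_money(gross) for _, studio, gross in parse_rows(table)}
```

Matching on visible header text breaks if the site renames a column, but it avoids depending on attributes the site never provides.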


My first mistake was not paying attention to how the data would be used. I started collecting all of the data into a multidimensional (nested) Python dictionary, the kind of structure one would have when storing data in MongoDB. However, I couldn't build a CSV table from it.
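Flattening the nested dictionary into one record per leaf value is what makes a CSV table possible. A minimal sketch using only the standard `csv` module, assuming a hypothetical layout of studio, then year, then quarter, then gross (the project's actual dictionary structure isn't shown):

```python
import csv
import io

# Hypothetical nested layout: studio -> year -> quarter -> gross (illustrative values only).
nested = {
    "Studio A": {2014: {"Q1": 100, "Q2": 150}, 2015: {"Q1": 120}},
    "Studio B": {2015: {"Q1": 90, "Q2": 110}},
}


def flatten(data):
    """Yield one flat record per (studio, year, quarter) leaf of the nested dict."""
    for studio, years in data.items():
        for year, quarters in years.items():
            for quarter, gross in quarters.items():
                yield {"studio": studio, "year": year, "quarter": quarter, "gross": gross}


def to_csv(records, fieldnames):
    """Write flat records to a CSV string that starts with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = list(flatten(nested))
csv_text = to_csv(records, ["studio", "year", "quarter", "gross"])
```

Each row of the resulting table is one observation, which is also the shape a regression library expects as input.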

My second mistake was not being more careful about how much data I would need to build a reliable and accurate linear regression model. I only had a couple hundred rows for each studio, instead of the thousands or millions I could have had for movies.


My project was quite unique, and that is one of the reasons I enjoyed it.

I also enjoyed finding that one model could be reused from one studio to the next.


The only difficulties I had came from the data I had and the mistakes I made.

What I Would Do Differently

I would probably have picked a different problem with a more practical, real-world use.

I would also pay more attention to avoiding the mistakes I made.