DAVID DUPUIS

New York Times

Building a New York Times Article Recommendation Engine

Start Date End Date Duration Members
02/29/2016 03/11/2016 2 weeks 1

Summary

I built a scalable pipeline for a New York Times article recommendation engine using auto-summarisation and LDA modeling. I presented my results in a web interface built with Flask and Material Design Lite.

Resources

I used the New York Times Article API to download metadata on articles.
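As a rough illustration of the metadata step, the sketch below builds a request URL and pulls a few fields out of a decoded JSON response. It assumes the Article Search endpoint (version 2) and its `response` / `docs` / `keywords` layout; the function names `build_search_url` and `extract_metadata` are hypothetical, and no network call is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Article Search API; check the NYT developer docs.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, page=0):
    """Return the request URL for one page of search results."""
    params = {"q": query, "page": page, "api-key": api_key}
    return ARTICLE_SEARCH_URL + "?" + urlencode(params)


def extract_metadata(payload):
    """Pull the URL, headline and keywords out of a decoded JSON response."""
    articles = []
    for doc in payload.get("response", {}).get("docs", []):
        articles.append({
            "url": doc.get("web_url"),
            "headline": (doc.get("headline") or {}).get("main"),
            # Each keyword entry is assumed to be a dict with a "value" field.
            "keywords": [kw["value"] for kw in doc.get("keywords", []) if "value" in kw],
        })
    return articles
```

The records returned by `extract_metadata` are plain dictionaries, so they could be inserted into a MongoDB collection as they are.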

I also used BeautifulSoup to scrape the full text of the articles from the website.
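A minimal sketch of the scraping step with BeautifulSoup follows. It assumes the article body sits in plain `<p>` tags, which is a simplification: the real page markup may need a more specific selector. The helper name `extract_article_text` is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_article_text(html):
    """Join the text of every <p> tag in the page, one paragraph per line."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(" ", strip=True) flattens inline tags such as <b> or <a>.
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    # Drop paragraphs that contain only whitespace.
    return "\n".join(text for text in paragraphs if text)
```

The HTML itself would be fetched separately (for example with the article URL returned by the API) and passed in as a string.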

Tools Used

Use Tool
Code Python
Scraping Beautiful Soup
Recommendation Latent Semantic Analysis (LSA)
Storage MongoDB
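
To illustrate the LSA entry in the table, here is a small sketch using NumPy rather than the tooling from the project (which is not specified beyond "LSI tools"). It assumes a term-document count matrix with one column per article; `lsa_embed` and `most_similar` are hypothetical names.

```python
import numpy as np


def lsa_embed(term_doc, k):
    """Project documents (columns of term_doc) into a k-dimensional latent space."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ vt[:k]).T


def most_similar(doc_vectors, index):
    """Return the index of the document closest to `index` by cosine similarity."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.where(norms == 0, 1.0, norms)
    scores = unit @ unit[index]
    scores[index] = -np.inf  # never recommend the article itself
    return int(np.argmax(scores))


# Toy matrix: rows are terms, columns are articles.
# Articles 0 and 1 share the first two terms, articles 2 and 3 the last two.
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
], dtype=float)
doc_vectors = lsa_embed(term_doc, k=2)
recommended = most_similar(doc_vectors, 0)
```

Comparing articles in the reduced space rather than on raw counts is what lets two articles match even when they do not share exactly the same words.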

Challenges

I only had a few keywords to build my word vectorizer. To solve that problem, I also scraped the articles, summarized them, and used the summaries as additional input.
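
The idea of enriching sparse keywords with summaries can be sketched as follows. The summarization method used in the project is not stated, so this uses a simple frequency-based extractive summarizer as a stand-in; the stop-word list and the names `tokenize`, `summarize` and `vectorizer_input` are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for", "was"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=2):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        # Average corpus frequency of the sentence's non-stop-word tokens.
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


def vectorizer_input(keywords, summary):
    """Merge API keywords and summary tokens into one bag of words per article."""
    return Counter(tokenize(" ".join(keywords)) + tokenize(summary))
```

The resulting bag of words per article can then be handed to whichever vectorizer the pipeline uses.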

It was my first time working with Flask, and I found it to be a challenge.

Mistakes/Failures

I was coding in Python 3, but the LSI tools I wanted to try did not support it.

I spent a lot of time on Flask to present my results, and as a consequence I didn't make a great presentation.

Enjoyed

I enjoyed learning how to build a recommendation engine and thinking about how other companies built their own.

Conflicts

I ran into compatibility conflicts between my Python version and the LSI tools.

What I Would Do Differently

I would have spent more time finding ways to improve the recommendation engine than on the presentation.