DAVID DUPUIS

MTA Turnstile Data

Analyzing MTA Turnstile Data to Strike More Effectively

Start Date | End Date | Duration | Members
01/19/2016 | 01/25/2016 | 1 week | 4

Summary

For this project, we used New York MTA turnstile data to determine which subway line was used the most during the month.

Resources

The data came from the New York MTA turnstile dataset: MTA-Turnstile-data

Tools Used

Use | Tool
Code | Python

Challenges

Turnstiles report cumulative counts, so the number of individuals who went through each turnstile on a given day had to be computed from the difference between successive readings.
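A minimal sketch of how daily counts can be recovered from cumulative readings. The turnstile IDs, dates, and values here are hypothetical, and the real MTA files carry more columns than this simplified layout assumes.

```python
from collections import defaultdict

# Hypothetical readings: (turnstile_id, date, cumulative_entries).
# One reading per turnstile per day is assumed for simplicity.
readings = [
    ("T1", "2016-01-19", 5_500_000),
    ("T1", "2016-01-20", 5_501_200),
    ("T1", "2016-01-21", 5_502_900),
    ("T2", "2016-01-19", 880_000),
    ("T2", "2016-01-20", 880_450),
]


def daily_entries(readings):
    """Return {(turnstile_id, date): entries} from cumulative readings."""
    by_turnstile = defaultdict(list)
    for turnstile_id, date, cumulative in readings:
        by_turnstile[turnstile_id].append((date, cumulative))

    daily = {}
    for turnstile_id, rows in by_turnstile.items():
        rows.sort()  # ISO dates sort chronologically as strings
        # A day's count is the next reading minus the current one.
        for (date, start), (_, end) in zip(rows, rows[1:]):
            daily[(turnstile_id, date)] = end - start
    return daily


counts = daily_entries(readings)
```

The last reading of each turnstile has no successor, so it yields no daily count; a longer date range would simply add more pairs.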

Some turnstiles had negative cumulative counts. To compensate, we took the absolute value of any negative count that was no lower than -10,000 and set any count below that threshold to 0.
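The cleaning rule can be sketched as a small function. The name `clean_count` and the `floor` parameter are illustrative, and the sketch assumes that non-negative counts are kept unchanged, which the rule above implies but does not state.

```python
def clean_count(count, floor=-10_000):
    """Apply the negative-count rule to a single value.

    Assumption: non-negative counts pass through untouched.
    """
    if count >= 0:
        return count
    if count >= floor:
        # Small negative values are treated as sign errors.
        return abs(count)
    # Anything below the floor is considered unusable.
    return 0


cleaned = [clean_count(c) for c in [250, -300, -10_000, -2_000_000]]
```

Keeping the threshold as a parameter makes it easy to check how sensitive the final ranking is to the chosen cutoff.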

We had to estimate how many individuals were taking a given subway line at a given time. To do this, we assumed that, at every station, riders were split evenly across the lines serving that station and across both directions.
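A minimal sketch of this even-split assumption. The station names, line names, direction labels, and counts are all hypothetical; only the splitting rule comes from the description above.

```python
from collections import defaultdict

DIRECTIONS = ("north", "south")  # hypothetical direction labels

# Hypothetical inputs: entries per station and the lines serving each one.
station_entries = {"Station A": 1200, "Station B": 900}
station_lines = {"Station A": ["X", "Y", "Z"], "Station B": ["X"]}


def riders_per_line_direction(station_entries, station_lines):
    """Split each station's entries evenly over its lines and directions."""
    totals = defaultdict(float)
    for station, entries in station_entries.items():
        lines = station_lines[station]
        share = entries / (len(lines) * len(DIRECTIONS))
        for line in lines:
            for direction in DIRECTIONS:
                totals[(line, direction)] += share
    return dict(totals)


estimate = riders_per_line_direction(station_entries, station_lines)

# Sum both directions to rank whole lines.
line_totals = defaultdict(float)
for (line, _direction), riders in estimate.items():
    line_totals[line] += riders
busiest = max(line_totals, key=line_totals.get)
```

With these made-up numbers, line X collects riders from both stations and comes out as the busiest.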

Mistakes/Failures

The data was flawed, and we did not analyze it over a period long enough for the errors to dissipate.

Enjoyed

To make this project fun, we imagined we were a data science group attempting to get a contract with the New York MTA Labor Union. We were therefore looking for the busiest line so that the Union could strike more effectively.

Leadership

I strongly encouraged our team to use classes, which I believed would have allowed a better analysis of our problem.

Conflicts

A conflict arose when I attempted to get the group to write the Python script with classes, because the other members of my group were not familiar with object-oriented programming.

What I would do differently

For this project, I would have taken a more systematic approach. In other words, I would have built separate functions, each computing one reliable piece of information about our problem, and combined them into an efficient analytical solution.