ETL Project

ISSUED PARKING TICKETS ON TORONTO GREEN P PARKING SPACES

MEMBERS

Anna Francesca Gatus
Christopher Habib
Siddharth Krishnan

INTRODUCTION

The Toronto Parking Authority is a local Board of the City of Toronto which owns and operates the system of Municipal off-street parking lots ('Green P') and the on-street metered parking. Approximately 2.8 million parking tickets are issued annually across the City of Toronto. The Issued Parking Tickets dataset contains non-identifiable information relating to each parking ticket issued for each calendar year. The tickets are issued by Toronto Police Services (TPS) personnel as well as persons certified and authorized to issue tickets by TPS.

Our group chose to combine the 2015 Issued Parking Tickets dataset with the Green P Parking dataset. The final table has the following columns: Parking ID, Parking Rate, Address, Infraction Description and Set Fine Amount. Link to code can be found here.

METHODS

Following the ETL process, we carried out these tasks:

Extract:

Extracted the 2015 Issued Parking Tickets data from the Toronto Open Data Catalogue.

Source here

Link to three .csv files here

Extracted the 2015 Green P Parking data from the Toronto Parking Authority Open Data Catalogue.

Source here

Link to JSON file here

Transform:

Issued Parking Tickets Data

Used the Python pandas library to load and read the three .csv files.

Used the pd.concat function to combine the three resulting DataFrames.

Stored the addresses by selecting the location 2 column and converting it to a list.
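The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the in-memory CSV snippets stand in for the three downloaded files, and the column names (location2, infraction_description, set_fine_amount) and sample rows are assumptions made for the example.

```python
import io

import pandas as pd

# Made-up stand-ins for the three 2015 ticket .csv files. With real files,
# pass each file path to pd.read_csv instead of a StringIO object.
# The column names below are assumptions for this sketch.
header = "infraction_description,set_fine_amount,location2\n"
csv_parts = [
    io.StringIO(header + "PARK ON PRIVATE PROPERTY,30,20 Charles St. East\n"),
    io.StringIO(header + "PARK FAIL TO DISPLAY RECEIPT,30,15 Wellesley St. East\n"),
    io.StringIO(header + "PARK ON PRIVATE PROPERTY,30,1 Main Street\n"),
]

# Read each part into its own DataFrame, then stack them into one.
frames = [pd.read_csv(part) for part in csv_parts]
tickets = pd.concat(frames, ignore_index=True)

# Pull the address column out as a plain Python list.
ticket_addresses = tickets["location2"].tolist()
```

Passing ignore_index=True gives the combined DataFrame a fresh 0..n-1 index, so rows from different files do not share index labels.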

Green P Parking Data

Used Python's json library to load and read the JSON file.

Used a for loop to collect the parking ID, address and rate of each lot and stored them in corresponding lists.

Used the Python pandas library to convert the lists to a DataFrame.

Stored the addresses by selecting the address column and converting it to a list.
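A rough sketch of the Green P steps is shown below. The JSON layout is an assumption: the top-level key "carparks" and the field names "id", "address" and "rate" are hypothetical, and the two records are made up for illustration.

```python
import json

import pandas as pd

# Made-up sample in an assumed layout; with a real file, use
# json.load(open_file) instead of json.loads(text).
raw = """
{"carparks": [
  {"id": "1", "address": "20 Charles Street East", "rate": "$2.00 / Half Hour"},
  {"id": "2", "address": "15 Wellesley Street East", "rate": "$1.50 / Half Hour"}
]}
"""
data = json.loads(raw)

# Collect the three fields of interest into parallel lists.
parking_ids, addresses, rates = [], [], []
for lot in data["carparks"]:
    parking_ids.append(lot["id"])
    addresses.append(lot["address"])
    rates.append(lot["rate"])

# Convert the lists to a DataFrame, one column per list.
green_p = pd.DataFrame(
    {"parking_id": parking_ids, "address": addresses, "rate": rates}
)

# Pull the address column out as a plain Python list.
green_p_addresses = green_p["address"].tolist()
```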

Addresses of Issued Parking Tickets must be transformed to be identical to the addresses of Green P Parking so that the two datasets can be merged.

Converted the address list to upper case.

Removed dots from the addresses.

Used a for loop and if, elif, else statements to change the following:

east to E

west to W

street to ST

blvd to BLVD

avenue to AVE

road to RD

dr to DR

circle to CRCL

lane to LANE

drive to DRIVE

Stored the cleaned-up addresses in a list called streets.

Verified that the counts of common addresses in both datasets match.

Merged the two DataFrames on the clean address column.
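The clean-up steps above could look roughly like this. Note that this sketch swaps the if, elif, else chain for a dictionary lookup on whole words, which gives the same replacements in less code. The helper name normalize_address and the sample addresses are hypothetical. Entries from the list that only differ by case (blvd, dr, lane, drive) are already handled by the upper-case step, so they do not appear in the mapping.

```python
# Whole-word replacements applied after upper-casing.
ABBREVIATIONS = {
    "EAST": "E",
    "WEST": "W",
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "CIRCLE": "CRCL",
}


def normalize_address(address):
    """Upper-case an address, drop dots, and abbreviate known words."""
    words = address.upper().replace(".", "").split()
    # Matching whole words leaves names such as EASTERN untouched.
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


# Made-up sample addresses for illustration.
sample = ["20 Charles Street East", "15 Wellesley St. E."]
streets = [normalize_address(address) for address in sample]
```

Splitting on whitespace and replacing whole words avoids a pitfall of plain substring replacement, where "EAST" inside "EASTERN" would also be rewritten.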
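One possible reading of the check and merge is sketched below: find the addresses present in both datasets, merge on the cleaned address column, and confirm the merged result covers exactly those addresses. The column name clean_address, the inner join, and the sample rows are assumptions for this example.

```python
import pandas as pd

# Made-up rows; clean_address holds the normalized address in both frames.
tickets = pd.DataFrame({
    "clean_address": ["20 CHARLES ST E", "20 CHARLES ST E", "1 MAIN ST"],
    "infraction_description": [
        "PARK FAIL TO DISPLAY RECEIPT",
        "PARK ON PRIVATE PROPERTY",
        "PARK ON PRIVATE PROPERTY",
    ],
    "set_fine_amount": [30, 30, 30],
})
green_p = pd.DataFrame({
    "parking_id": ["1", "2"],
    "rate": ["$2.00 / Half Hour", "$1.50 / Half Hour"],
    "clean_address": ["20 CHARLES ST E", "15 WELLESLEY ST E"],
})

# Addresses that appear in both datasets.
common = set(tickets["clean_address"]) & set(green_p["clean_address"])

# An inner join keeps only tickets issued at a Green P address.
merged = green_p.merge(tickets, on="clean_address", how="inner")

# Sanity check: the merged table covers exactly the common addresses.
addresses_match = set(merged["clean_address"]) == common
```

An inner join drops tickets that were not issued at a Green P lot; a left join from green_p would instead keep lots with no tickets, with missing values in the ticket columns.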

Load:

Created SQL connection.

Exported the final DataFrame to MySQL. Since the final output is a DataFrame, which is tabular, we decided to load the data into a relational database.
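The load step can be sketched with pandas' to_sql. To keep the example self-contained, an in-memory SQLite database stands in for MySQL here, which is a substitution and not what the project used. The table name parking_tickets and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Made-up final table for illustration.
final = pd.DataFrame({
    "parking_id": ["1", "1"],
    "rate": ["$2.00 / Half Hour", "$2.00 / Half Hour"],
    "address": ["20 CHARLES ST E", "20 CHARLES ST E"],
    "infraction_description": [
        "PARK FAIL TO DISPLAY RECEIPT",
        "PARK ON PRIVATE PROPERTY",
    ],
    "set_fine_amount": [30, 30],
})

# SQLite stand-in. For MySQL, one option is to pass a SQLAlchemy engine
# created from a "mysql+pymysql://user:password@host/dbname" URL instead.
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing it if it already exists.
final.to_sql("parking_tickets", conn, if_exists="replace", index=False)

# Read the row count back to confirm the load.
row_count = pd.read_sql(
    "SELECT COUNT(*) AS n FROM parking_tickets", conn
)["n"].iloc[0]
conn.close()
```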

The final table to be used in the production database has the following columns: Parking ID, Parking Rate, Address, Infraction Description and Set Fine Amount. These columns were selected to explore possible relationships between parking rate and infractions, and between parking rate and set fine amount. Other questions that could be analyzed include: Which location has the most infractions? Which infraction is the most common? Do high parking rates lead to more infractions? Do high fines prevent infractions?
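As a starting point for the first two questions, a count per address and per infraction type could be computed as below. The data is made up and the column names are assumptions; this is a sketch of the approach, not a result from the project.

```python
import pandas as pd

# Made-up final table for illustration.
final = pd.DataFrame({
    "address": ["20 CHARLES ST E", "20 CHARLES ST E", "15 WELLESLEY ST E"],
    "infraction_description": [
        "PARK FAIL TO DISPLAY RECEIPT",
        "PARK FAIL TO DISPLAY RECEIPT",
        "PARK ON PRIVATE PROPERTY",
    ],
    "set_fine_amount": [30, 30, 30],
})

# Number of tickets per address, and the address with the most tickets.
tickets_per_address = final["address"].value_counts()
busiest_address = tickets_per_address.idxmax()

# The infraction type that occurs most often.
most_common_infraction = final["infraction_description"].value_counts().idxmax()
```

The last two questions concern cause and effect, so counts like these can only show an association, not that rates or fines drive behaviour.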