ISSUED PARKING TICKETS ON TORONTO GREEN P PARKING SPACES
MEMBERS
- Anna Francesca Gatus
- Christopher Habib
- Siddharth Krishnan
INTRODUCTION
The Toronto Parking Authority is a local Board of the City of Toronto which owns and operates the system of Municipal off-street parking lots ('Green P') and the on-street metered parking. Approximately 2.8 million parking tickets are issued annually across the City of Toronto. The Issued Parking Tickets dataset contains non-identifiable information relating to each parking ticket issued for each calendar year. The tickets are issued by Toronto Police Services (TPS) personnel as well as persons certified and authorized to issue tickets by TPS.
Our group chose to combine the 2015 Issued Parking Tickets and Green P Parking datasets. The final table has the following columns: Parking ID, Parking Rate, Address, Infraction Description and Set Fine Amount. A link to the code can be found here.
METHODS
Following the ETL (extract, transform, load) process, we carried out the following tasks:
Extract:
- Extracted 2015 Issued Parking Tickets Data from Toronto Open Data Catalogue.
Source here
Link to three .csv files here
- Extracted 2015 Green P Parking Data from the Toronto Parking Authority Open Data Catalogue.
Source here
Link to Json file here
Transform:
- Issued Parking Tickets Data
- Used the Python pandas library to load and read the three .csv files.
- Used the pd.concat function to combine the three resulting DataFrames.
- Stored the addresses by selecting the location 2 column and putting its values in a list.
- Green P Parking Data
- Used the Python json library to load and read the JSON file.
- Used a for loop to collect parking id, address and rate data and stored information to corresponding lists.
- Used the Python pandas library to convert the lists to a DataFrame.
- Stored addresses by selecting address column and putting it in a list.
- Addresses in the Issued Parking Tickets data must be transformed to match the address format of the Green P Parking data so that the two datasets can be merged.
- Converted the address list to upper case.
- Removed dots from the addresses.
- Used a for loop and if, elif, else statements to make the following replacements:
- east to E
- west to W
- street to ST
- blvd to BLVD
- avenue to AVE
- road to RD
- dr to DR
- circle to CRCL
- lane to LANE
- drive to DRIVE
- Stored the cleaned-up data in a list called streets.
- Verified that the counts of common addresses in both datasets match.
- Merged the two DataFrames on the cleaned address column.
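The address clean-up and merge steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `REPLACEMENTS`, `normalize_address`, `tickets`, `lots` and `merged` are hypothetical, the sample records are made up, and a dictionary lookup on whole words stands in for the if, elif, else chain. Plain Python lists and dictionaries are used here in place of DataFrames to keep the sketch self-contained.

```python
# Hypothetical replacement table. Keys are upper case because each
# address is upper-cased before the lookup. Entries such as BLVD, DR,
# LANE and DRIVE need no rule here: upper-casing already produces them.
REPLACEMENTS = {
    "EAST": "E",
    "WEST": "W",
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "CIRCLE": "CRCL",
}


def normalize_address(address):
    """Upper-case, remove dots, and abbreviate known street words."""
    words = address.upper().replace(".", "").split()
    return " ".join(REPLACEMENTS.get(word, word) for word in words)


# Made-up sample records, for illustration only.
tickets = [
    {"address": "10 Example Street East", "infraction": "SAMPLE INFRACTION A", "fine": 30},
    {"address": "99 Nowhere Road", "infraction": "SAMPLE INFRACTION B", "fine": 50},
]
lots = [
    {"parking_id": "1", "address": "10 EXAMPLE ST E", "rate": 2.5},
]

# Index the parking lots by cleaned address, then keep only the
# tickets whose cleaned address matches a lot (an inner join).
lots_by_address = {normalize_address(lot["address"]): lot for lot in lots}
merged = []
for ticket in tickets:
    key = normalize_address(ticket["address"])
    lot = lots_by_address.get(key)
    if lot is not None:
        merged.append({
            "parking_id": lot["parking_id"],
            "parking_rate": lot["rate"],
            "address": key,
            "infraction_description": ticket["infraction"],
            "set_fine_amount": ticket["fine"],
        })
```

Replacing whole words rather than substrings avoids accidentally rewriting street names that merely contain a word such as EAST or ROAD.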
Load:
- Created SQL connection.
- Exported to MySQL. Since the final output is a DataFrame, we decided to load the data into a relational database.
- The final table to be used in the production database has the following columns: Parking ID, Parking Rate, Address, Infraction Description and Set Fine Amount. These columns were selected to explore possible relationships between parking rate and infractions, and between parking rate and set fine amount. Other questions the data could help answer include: Which location has the most infractions? Which infraction is the most common? Does a high parking rate lead to more infractions? Does a high fine prevent infractions?
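The load step and one of the follow-up questions can be sketched as below. Note the substitution: the project exported to MySQL, whereas this sketch uses Python's built-in sqlite3 module with an in-memory database so that it runs without a database server. The table name `parking_tickets`, the column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Made-up rows in the order: parking ID, parking rate, address,
# infraction description, set fine amount.
rows = [
    ("1", 2.5, "10 EXAMPLE ST E", "SAMPLE INFRACTION A", 30),
    ("1", 2.5, "10 EXAMPLE ST E", "SAMPLE INFRACTION A", 30),
    ("2", 1.5, "5 SAMPLE AVE", "SAMPLE INFRACTION B", 50),
]

# In-memory SQLite database standing in for the MySQL target.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE parking_tickets (
        parking_id TEXT,
        parking_rate REAL,
        address TEXT,
        infraction_description TEXT,
        set_fine_amount INTEGER
    )
    """
)
conn.executemany("INSERT INTO parking_tickets VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Example analysis question: which infraction is the most common?
most_common = conn.execute(
    """
    SELECT infraction_description, COUNT(*) AS n
    FROM parking_tickets
    GROUP BY infraction_description
    ORDER BY n DESC
    LIMIT 1
    """
).fetchone()
```

The same GROUP BY pattern, grouped by address instead of infraction description, would answer which location has the most infractions.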