Capstone Project blog
Capstone Final Report 2020
June 29
Authored by:
Capstone Project - The Battle of Neighborhoods (Week 2)
Report
Introduction: Business Problem
The aim of this project is to find a safe and secure location for opening a commercial establishment in Vancouver, Canada. Specifically, this report is aimed at stakeholders interested in opening a business such as a grocery store in the City of Vancouver. The first task is to choose the safest borough by analysing crime data, and then to shortlist a neighbourhood where grocery stores are not among the most common venues, yet which is as close to the city as possible. We will use data science tools to analyse the data, focus on the safest borough, and explore its neighbourhoods and the 10 most common venues in each, so that the best neighbourhood, one where a grocery store is not among the most common venues, can be selected.
Data
Based on the definition of our problem, the factors that will influence our decision are the number of crimes in each borough, the most common venues in each neighbourhood, and proximity to the city.
We will use the geographical coordinates of Vancouver's neighbourhoods to plot the neighbourhoods of a borough that is safe and close to the city, and finally cluster those neighbourhoods and present our findings. The following data sources will be needed to extract or generate the required information:
· Part 1: Using a real-world dataset from Kaggle containing Vancouver crimes from 2003 to 2019. The dataset records the crimes in each neighbourhood of Vancouver, along with the type of crime and the year, month and hour it was recorded.
· Part 2: Gathering additional information from Wikipedia on the officially categorized boroughs of Vancouver. The borough information will be used to map the existing data so that each neighbourhood can be assigned to the right borough.
· Part 3: Creating a new consolidated dataset of the neighbourhoods, along with their boroughs, crime data and coordinates. The coordinates will be fetched using the OpenCage Geocoder; the dataset will be used to find the safest borough, plot its neighbourhoods on maps using Folium, and perform exploratory data analysis.
· Part 4: Creating a new consolidated dataset of the neighbourhoods, their boroughs, coordinates and most common venues. The venue data will be fetched using the Foursquare API to explore each neighbourhood's venues, apply a machine learning algorithm to cluster the neighbourhoods, and present the findings on maps using Folium.
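To make the Part 3 consolidation step more concrete, here is a minimal pandas sketch, assuming one row per reported crime plus two lookup tables for borough membership and coordinates. All neighbourhood and borough names (N1, B1 and so on), column names and coordinates are hypothetical placeholders rather than the project's data, and ranking boroughs by raw crime counts is a simplifying assumption.

```python
import pandas as pd

# Hypothetical crime records: one row per reported crime (placeholder values).
crimes = pd.DataFrame({
    "neighbourhood": ["N1", "N1", "N2", "N3", "N3", "N3", "N3"],
    "type": ["Theft", "Mischief", "Theft", "Theft",
             "Break and Enter", "Mischief", "Theft"],
})

# Hypothetical borough assignments (Part 2) and geocoded coordinates (Part 3).
boroughs = pd.DataFrame({
    "neighbourhood": ["N1", "N2", "N3"],
    "borough": ["B1", "B1", "B2"],
})
coords = pd.DataFrame({
    "neighbourhood": ["N1", "N2", "N3"],
    "latitude": [49.28, 49.26, 49.25],
    "longitude": [-123.12, -123.10, -123.07],
})

# Count crimes per neighbourhood, then attach borough and coordinates.
counts = crimes.groupby("neighbourhood").size().reset_index(name="crime_count")
consolidated = (
    counts.merge(boroughs, on="neighbourhood", how="left")
          .merge(coords, on="neighbourhood", how="left")
)

# Sum crimes per borough and take the borough with the fewest recorded crimes.
# These are raw counts, not adjusted for population or area.
per_borough = consolidated.groupby("borough")["crime_count"].sum()
safest_borough = per_borough.idxmin()

print(consolidated)
print("Safest borough:", safest_borough)
```

A left merge keeps every neighbourhood that appears in the crime data even if a borough or coordinate lookup is missing, so gaps show up as NaN instead of silently dropping rows.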
Part 1: Using a real-world dataset from Kaggle containing Vancouver crimes from 2003 to 2019
Vancouver Crime Report
Properties of the Crime Report
Importing all the necessary libraries
Reading from the dataset
Due to the sheer amount of data (~600,000 rows), it was not practical to process all of it; instead, this project considers only the recent crime data from 2018.
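A minimal sketch of reading the file and keeping only the 2018 rows with pandas follows. It assumes the Kaggle CSV has an upper-case YEAR column; the helper name `load_crimes_for_year` is hypothetical, and a tiny in-memory CSV with made-up rows stands in for the real file.

```python
import io
import pandas as pd

def load_crimes_for_year(source, year=2018):
    """Read the crime CSV and keep only rows recorded in `year`.

    Assumes an upper-case YEAR column, as in the Kaggle export.
    """
    df = pd.read_csv(source)
    return df[df["YEAR"] == year].reset_index(drop=True)

# Tiny in-memory stand-in for the real file (hypothetical rows).
sample_csv = io.StringIO(
    "TYPE,YEAR,MONTH,HOUR,NEIGHBOURHOOD\n"
    "Theft of Bicycle,2017,5,14,Kitsilano\n"
    "Mischief,2018,1,22,West End\n"
    "Other Theft,2018,7,9,Fairview\n"
)
crimes_2018 = load_crimes_for_year(sample_csv)
print(crimes_2018)
```

With the real download this would be called as `load_crimes_for_year("crime.csv")`, where the file name is a placeholder for wherever the Kaggle CSV is saved.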
Changing the column names to lowercase
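In pandas, renaming every column to lowercase is a one-liner using the vectorised string methods on the column index. The upper-case headers below are assumed to match the Kaggle file and are used here only for illustration.

```python
import pandas as pd

# Hypothetical empty frame with the upper-case headers of the Kaggle file.
df = pd.DataFrame(columns=["TYPE", "YEAR", "MONTH", "HOUR", "NEIGHBOURHOOD"])

# Lower-case every column name so later code can use df["neighbourhood"], etc.
df.columns = df.columns.str.lower()
print(list(df.columns))  # ['type', 'year', 'month', 'hour', 'neighbourhood']
```

An equivalent form is `df = df.rename(columns=str.lower)`, which returns a new frame instead of modifying the existing one in place.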
Part 2: Gathering additional information about the neighbourhoods from Wikipedia
The dataset does not record which borough each neighbourhood belongs to, so we will create a dictionary mapping each neighbourhood to its borough, based on data in the following Wikipedia page.
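The dictionary approach can be sketched with `Series.map`. The neighbourhood-to-borough pairings below are hypothetical placeholders ("Borough A" and so on), not the assignments taken from the Wikipedia page, and the dictionary name `NEIGHBOURHOOD_TO_BOROUGH` is illustrative.

```python
import pandas as pd

# Hypothetical neighbourhood -> borough dictionary. The real assignments come
# from the Wikipedia page mentioned in the text; these pairings are placeholders.
NEIGHBOURHOOD_TO_BOROUGH = {
    "Kitsilano": "Borough A",
    "Fairview": "Borough A",
    "West End": "Borough B",
    "Strathcona": "Borough C",
}

crimes = pd.DataFrame({
    "neighbourhood": ["Kitsilano", "West End", "Strathcona", "Stanley Park"],
    "type": ["Theft", "Mischief", "Theft", "Mischief"],
})

# Series.map looks each neighbourhood up in the dictionary; names missing from
# the dictionary become NaN, which makes gaps in the mapping easy to spot.
crimes["borough"] = crimes["neighbourhood"].map(NEIGHBOURHOOD_TO_BOROUGH)
unmapped = crimes.loc[crimes["borough"].isna(), "neighbourhood"].tolist()

print(crimes)
print("Unmapped:", unmapped)
```

Checking for unmapped names after the lookup is a cheap way to catch spelling differences between the crime data and the borough list before any aggregation.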
Methodology
The methodology is divided into two parts:
- Exploratory Data Analysis: Visualise the crime reports in the different Vancouver boroughs to identify the safest borough, and normalise the data for that borough's neighbourhoods. We will then use the resulting data to find the 10 most common venues in each neighbourhood.
- Modelling: To help stakeholders choose the right neighbourhood within a borough, we will cluster similar neighbourhoods using K-means clustering, an unsupervised machine learning algorithm that partitions data into a predefined number of clusters, k. Grouping the neighbourhoods by their existing venues will help in the decision-making process.
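The two steps above can be sketched as follows: one-hot encode venue categories, average them per neighbourhood to get normalised venue frequencies, and cluster those rows with scikit-learn's `KMeans`. The venue rows, the column names and the choice of k=2 are hypothetical and chosen only so that the toy data has an obvious answer; they are not the project's Foursquare results.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical Foursquare-style results: one row per (neighbourhood, venue).
venues = pd.DataFrame({
    "neighbourhood": ["N1", "N1", "N1", "N2", "N2", "N2",
                      "N3", "N3", "N3", "N4", "N4", "N4"],
    "venue_category": ["Café", "Café", "Park", "Café", "Café", "Park",
                       "Grocery Store", "Bank", "Grocery Store",
                       "Grocery Store", "Bank", "Grocery Store"],
})

# One-hot encode venue categories, then average per neighbourhood so each row
# holds the share of each category among that neighbourhood's venues.
onehot = pd.get_dummies(venues["venue_category"], dtype=float)
onehot["neighbourhood"] = venues["neighbourhood"]
freq = onehot.groupby("neighbourhood").mean()

# K-means needs the number of clusters k up front; k=2 suits this toy data.
k = 2
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(freq)
labels = pd.Series(kmeans.labels_, index=freq.index, name="cluster")
print(labels)
```

Because K-means does not choose k itself, one common heuristic is to fit several values of k and compare `kmeans.inertia_` (the elbow method); the cluster numbers themselves are arbitrary labels.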
Results and Discussion
The objective of the business problem was to help stakeholders identify one of the safest boroughs in Vancouver, and an appropriate neighbourhood within that borough, in which to set up a commercial establishment, especially a grocery store. This was achieved by first using Vancouver crime data to identify a safe borough with a considerable number of neighbourhoods for a business to be viable. After selecting the borough, it was essential to choose a neighbourhood where grocery stores were not among the most common venues nearby. We achieved this by grouping the neighbourhoods into clusters, giving stakeholders relevant data about the venues and safety of each neighbourhood.
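The final selection step, keeping neighbourhoods where grocery stores are not among the most common venues, can be sketched on a venue-frequency table like the one produced by one-hot encoding and averaging. The frequencies, neighbourhood names and `TOP_N = 2` below are hypothetical; the project itself looked at the 10 most common venues.

```python
import pandas as pd

# Hypothetical venue-frequency table: share of each category per neighbourhood.
freq = pd.DataFrame(
    {
        "Café":          [0.50, 0.20, 0.10],
        "Grocery Store": [0.00, 0.40, 0.05],
        "Park":          [0.30, 0.10, 0.60],
        "Bank":          [0.20, 0.30, 0.25],
    },
    index=pd.Index(["N1", "N2", "N3"], name="neighbourhood"),
)

TOP_N = 2  # the project used the 10 most common venues; 2 fits this toy table

# For each neighbourhood, list its TOP_N most frequent venue categories.
most_common = {
    nb: row.nlargest(TOP_N).index.tolist() for nb, row in freq.iterrows()
}

# Keep neighbourhoods where "Grocery Store" is NOT among the top categories.
candidates = [nb for nb, top in most_common.items() if "Grocery Store" not in top]

print(most_common)
print("Candidates:", candidates)
```

In practice this filter would be applied within the chosen borough and alongside the cluster labels, so that stakeholders see both venue mix and safety for each remaining candidate.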
Conclusion
We explored the crime data to understand the different types of crime in all neighbourhoods of Vancouver, and then categorized the neighbourhoods into boroughs. This let us compare boroughs and choose the safest one first. Once the borough was confirmed, the number of neighbourhoods under consideration dropped, and we further shortlisted them based on their most common venues to choose the neighbourhood that best suits the business problem.