City of Dreams: Mumbai

Apr 18, 2021

Introduction

This article is based upon my final project for IBM’s Data Science Professional course.

Mumbai, the city of dreams, is also the financial capital of India. Anyone who has tried looking for houses in the city knows how difficult and tormenting the task is. This is also true for people looking to pursue different entrepreneurial openings in the city’s already saturated markets. Our aim for this article is to analyze and cluster the different neighborhoods of Mumbai city based upon a variety of factors.

This should help anyone looking for a compatible neighborhood while searching for a new home.

Data Sources

I collected the data from www.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai. This page contains a list of neighborhoods in Mumbai, along with the latitude and longitude of each area.

Other data sources include the Foursquare API, which is used to find the most common venues in each neighborhood, and the HERE API, which helps us discover the services and facilities available in the locality.

Methodology

The web scraping was done with pandas. Using a geocoding library (Nominatim), I then plotted the dataset on the map.

Now that I have identified all the neighborhoods to be analyzed, I plot them on a map of Mumbai using their coordinates. Python’s Folium library is used for this purpose.

The next task is to find the most common venues around each neighborhood. For this, I used the Foursquare API, which returns the venues and their categories within a 1 kilometer radius of each neighborhood. (The data returned by this API is old. I tried to replace it with Google's, but I couldn’t figure out the API deployment.)
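As a rough sketch of what such a request could look like, the snippet below only builds the query URL and does not call the service. It assumes the legacy Foursquare v2 "explore" endpoint. The article does not say which endpoint was used, and the credentials, coordinates, version date and result limit are placeholders.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "explore" endpoint (not confirmed by the article).
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius_m=1000, limit=50, version="20210401"):
    """Build the request URL for venues within radius_m metres of (lat, lng)."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",            # centre of the search
        "radius": radius_m,              # 1 km, as described in the text
        "limit": limit,                  # illustrative cap on results
    }
    return f"{FOURSQUARE_EXPLORE_URL}?{urlencode(params)}"


# Illustrative coordinates inside Mumbai.
url = build_explore_url(19.0544, 72.8402, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL would then be fetched once per neighborhood, and the venue names and categories collected from the JSON response.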

After one-hot encoding the venue categories and computing their mean values for each neighborhood, I can easily see which venue categories are most frequent in each neighborhood. The dataset at this point looks as follows:

Sorting the values of all the category columns in descending order for each neighborhood gives the top ten most common venue categories in each locality.

Top 10 most common venues in each neighborhood
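The one-hot encoding, averaging and ranking steps described above can be sketched with pandas roughly as follows. The toy data, the column names and the helper top_categories are made up for illustration and are not taken from the original notebook.

```python
import pandas as pd

# Toy venue data; neighborhoods and categories are illustrative only.
venues = pd.DataFrame({
    "Neighborhood": ["Bandra", "Bandra", "Bandra", "Colaba", "Colaba", "Colaba"],
    "Venue Category": ["Cafe", "Indian Restaurant", "Cafe",
                       "Hotel", "Hotel", "Indian Restaurant"],
})

# One-hot encode the categories, then average per neighborhood.
# The mean of a 0/1 column is the share of venues in that category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
grouped = onehot.groupby("Neighborhood").mean()


def top_categories(row, n=10):
    """Return the n most frequent venue categories for one neighborhood."""
    return row.sort_values(ascending=False).head(n).index.tolist()


# n=2 here because the toy data has only three categories; the article uses 10.
top_venues = {name: top_categories(row, n=2) for name, row in grouped.iterrows()}
print(top_venues)
```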

I then used the HERE API’s neat free-text query feature to look for facilities of my choice within a 1 kilometer radius of each neighborhood. The search was limited to 100 results per facility. The amenities that I considered include hospitals, schools, emergency services, leisure, banks and cinemas. This data, combined with the sorted venues dataset and the average rent data, gives us a dataset that is extremely helpful for analyzing the final results, i.e. once we obtain our cluster labels.

The data is saved as ‘Mumbai Neighborhood Data.csv’.

For clustering purposes, however, we use the venues dataset combined with the amenities data found using the HERE API. The amenities data is merged only after it has been standardized.

We drop the ‘Neighborhood’ column from the dataset and finish the data preparation procedure.
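A minimal sketch of this preparation step is shown below, assuming z-score standardization (the article does not say which scaler was used). All table names, column names and numbers are illustrative.

```python
import pandas as pd

# Toy inputs; names and numbers are illustrative only.
venue_freq = pd.DataFrame(
    {"Neighborhood": ["A", "B", "C"], "Cafe": [0.5, 0.2, 0.0], "Hotel": [0.1, 0.3, 0.6]}
)
amenities = pd.DataFrame(
    {"Neighborhood": ["A", "B", "C"], "Hospitals": [10, 20, 30], "Schools": [5, 5, 20]}
)

# Z-score each amenity column: subtract the column mean and divide by the
# population standard deviation, so raw counts become comparable in scale
# to the venue frequencies.
counts = amenities.set_index("Neighborhood")
scaled = (counts - counts.mean()) / counts.std(ddof=0)

# Merge the standardized amenities with the venue frequencies.
features = venue_freq.merge(scaled.reset_index(), on="Neighborhood")

# Drop the label column so that only numeric features reach the clustering step.
X = features.drop(columns="Neighborhood")
print(X.round(3))
```

Standardizing before the merge matters because K-means relies on Euclidean distances, so raw counts in the tens would otherwise dominate frequencies that lie between 0 and 1.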

For clustering, we use the K-means clustering algorithm, which needs us to define the number of clusters (k) that we want our data divided into. To find the ideal value of k, we use the Elbow Method. This involves plotting the distortion score against different values of k. The elbow of the curve gives us the ideal value of k. The Yellowbrick Python library is used to visualize this easily.

As seen in the graph, the ideal value for k is found to be 5. We therefore fit our data using K-means clustering with k=5. The resultant cluster labels are added to our final exploration dataset.
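The elbow search can be sketched without the plot by computing the distortion for each k directly. This is only an illustration: it uses scikit-learn's KMeans on synthetic data rather than the Yellowbrick visualizer and the neighborhood dataset from the article, and the distortion is taken to be the sum of squared distances to the assigned centroid (scikit-learn's inertia_).

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data with three well-separated blobs; not the article's dataset.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Distortion for each candidate k: sum of squared distances from every
# point to the centroid of the cluster it was assigned to.
distortions = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    distortions[k] = model.inertia_

for k, d in distortions.items():
    print(k, round(d, 1))

# Fit the final model with the k found at the elbow
# (3 for this toy data; the article found 5 for its dataset).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

The distortion drops sharply until k reaches the true number of blobs and flattens afterwards; that bend is the "elbow".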

We have now completed clustering the neighborhoods of Mumbai. It is now time to preview the results.

Results

Plotting the results of our clustering procedure on a map of the city, we obtain the following result.

It is now possible to find the distinguishing properties of each neighborhood and the various characteristics that make them similar or dissimilar to one another.

For each cluster, we check the constituent neighborhoods, plot them on the map, and also plot a series of bar graphs representing the top 3 venues in each of the most common venue columns of the final exploration dataset. This not only helps us find the resemblance in prevailing businesses in the cluster but also helps enterprises as well as tenants understand their locality. The observations for the first cluster are below.

After exploring each cluster individually, we can also plot the amenities offered by the various clusters against each other, so that home-hunters can make personalized choices when making their decision.

This is done by comparing the mean value of each amenity in each cluster.
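As a small illustration of that comparison, the per-cluster means can be computed with a pandas groupby. The cluster labels, column names and counts below are made up.

```python
import pandas as pd

# Toy data; cluster labels and amenity counts are illustrative only.
data = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 1],
    "Hospitals": [10, 20, 40, 50, 60],
    "Schools": [4, 6, 1, 2, 3],
})

# Mean of every amenity within each cluster (one row per cluster).
cluster_means = data.groupby("Cluster").mean()
print(cluster_means)

# For each amenity, the cluster with the highest average.
best_cluster = cluster_means.idxmax()
print(best_cluster)
```

The cluster_means table is what a grouped bar chart of amenities per cluster would be drawn from.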

Conclusions

All the clusters except Cluster 5 are completely saturated with shopping facilities. Cluster 5 may therefore be a viable location for up-and-coming shopping complexes.

A large number of schools and hospitals are located in specific clusters, while there seems to be a lack of them in other neighborhoods.

Cinema lovers should check the properties in Cluster 1.

Indian Restaurants are spread all across Mumbai and are the most common places across all the clusters.

In the end, it all boils down to an individual’s preferences and choices. Using the above investigations one can interpret the results to fit their needs and find the best possible solutions for their situation.

The complete code and analysis are available here

References

www.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai
Foursquare API
HERE Maps API
Mapping Your Favorite Coffee Shop using Google Places API and Folium
A tale of Two Cities