Why Heatmaps?

Heatmaps are useful for visualizing the density of data points. In data science they are often drawn as a grid, or combined with data about our data (metadata), such as correlations, to make correlation plots.

But what else can we use heatmaps for?

Here's a real-world example of a heatmap I created for a client.

Imagine that you're the owner of a gym and you've done a fantastic job of growing the business. You've opened several gyms in your area and now have 15,000 members, along with their address information. You're planning to open a new gym, but real estate prices are high in your area, so you need to make sure there are plenty of customers near the new location if you want to maximize revenue.

Let's get started

This tutorial will show you how to create an interactive heatmap overlaid on Google Maps. The end result will be an HTML file that you can open, zoom in and out of, and pan through to visualize your customer addresses. Our end result will look like this.

For this tutorial we will be using the gmplot library. More information and requirements can be found here.

To install it, run: pip install gmplot

The dataset we will be using is the Addresses in the City of Los Angeles dataset from data.gov.

# Import the necessary libraries
import pandas as pd
import gmplot
# For improved table display in the notebook
from IPython.display import display

raw_data = pd.read_csv("Addresses_in_the_City_of_Los_Angeles.csv")

# Success! Display the first 5 rows of the dataset
display(raw_data.head(n=5))
    HSE_ID            PIN           PIND  HSE_NBR  HSE_FRAC_NBR  HSE_DIR_CD      STR_NM  STR_SFX_CD  STR_SFX_DIR_CD  UNIT_RANGE  ZIP_CD       LAT         LON   X_COORD_NBR   Y_COORD_NBR  ASGN_STTS_IND  ENG_DIST  CNCL_DIST
0  1137623   222B149 1161   222B149-1161    14707           NaN           W       SUNNY          DR             NaN         NaN   91342  34.30125  -118.45398  6.424560e+06  1.932322e+06              A         V        7.0
1  1137668    192B141 592    192B141-592    16057           NaN           W    COLUMBUS        LANE             NaN         NaN   91343  34.22328  -118.48269  6.415752e+06  1.903987e+06              A         V       12.0
2  1137673    192B141 540    192B141-540    16068           NaN           W    COUSTEAU        LANE             NaN         NaN   91343  34.22420  -118.48299  6.415666e+06  1.904324e+06              A         V       12.0
3  1137736  133-5A211 269  133-5A211-269     1027           NaN           W  MIGNONETTE          ST             NaN         NaN   90012  34.06015  -118.25233  6.485240e+06  1.844369e+06              U         C        1.0
4  1137643    192B141 595    192B141-595    16058           NaN           W        COOK        LANE             NaN         NaN   91343  34.22362  -118.48269  6.415754e+06  1.904113e+06              A         V       12.0

# Summarize the column types and non-null counts
raw_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 998067 entries, 0 to 998066
Data columns (total 18 columns):
HSE_ID            998067 non-null int64
PIN               998067 non-null object
PIND              998067 non-null object
HSE_NBR           998067 non-null int64
HSE_FRAC_NBR      55445 non-null object
HSE_DIR_CD        997408 non-null object
STR_NM            998065 non-null object
STR_SFX_CD        979724 non-null object
STR_SFX_DIR_CD    2175 non-null object
UNIT_RANGE        19424 non-null object
ZIP_CD            998067 non-null int64
LAT               998067 non-null float64
LON               998067 non-null float64
X_COORD_NBR       998067 non-null float64
Y_COORD_NBR       998067 non-null float64
ASGN_STTS_IND     998067 non-null object
ENG_DIST          998000 non-null object
CNCL_DIST         996943 non-null float64
dtypes: float64(5), int64(3), object(10)
memory usage: 137.1+ MB


In this dataset we will be using the LAT and LON columns, which hold the latitude and longitude of each address. The dataset contains almost 1 million records, so for this example we will reduce it to just the first 15,000.
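Taking the first rows keeps the example simple, but if the file happens to be ordered by area, those rows may cluster in one part of the city. A random sample is one alternative. The sketch below uses pandas' DataFrame.sample on a tiny stand-in frame; the name demo and its values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Tiny stand-in for raw_data; the values are invented for illustration only.
demo = pd.DataFrame({
    "LAT": [34.30, 34.22, 34.06, 34.10, 33.95, 34.01],
    "LON": [-118.45, -118.48, -118.25, -118.30, -118.40, -118.20],
})

# Draw 3 rows at random; random_state makes the draw repeatable.
sample = demo.sample(n=3, random_state=42)

print(len(sample))
```

With the real data, the equivalent call would be raw_data.sample(n=15000, random_state=42) in place of raw_data.head(n=15000).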

# Let's limit the dataset to the first 15,000 records for this example
data = raw_data.head(n=15000)

# Store our latitude and longitude
latitudes = data["LAT"]
longitudes = data["LON"]
# Create the map, centered on the location where we want the initial focus
# Parameters: latitude, longitude, zoom
gmap = gmplot.GoogleMapPlotter(34.0522, -118.2437, 10)

# Overlay our datapoints onto the map
gmap.heatmap(latitudes, longitudes)
# Generate the heatmap into an HTML file
gmap.draw("my_heatmap.html")

Click here to view my_heatmap.html
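The heatmap answers the "where are my customers?" question visually. To put a rough number on it, one option is to bin the coordinates into a grid and count the points per cell. The sketch below is one simple way to do that using only the standard library; the function name densest_cells and the 0.01 degree cell size (roughly 1 km) are assumptions chosen for illustration, and the sample coordinates are the five rows from the preview above.

```python
from collections import Counter
import math


def densest_cells(latitudes, longitudes, cell_size=0.01, top=3):
    """Count points per grid cell and return the `top` busiest cells.

    cell_size is in degrees; 0.01 degrees of latitude is roughly 1.1 km.
    Each cell is identified by (floor(lat / cell_size), floor(lon / cell_size)).
    """
    counts = Counter()
    for lat, lon in zip(latitudes, longitudes):
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts.most_common(top)


# Coordinates of the five preview rows shown earlier
preview_lats = [34.30125, 34.22328, 34.22420, 34.06015, 34.22362]
preview_lons = [-118.45398, -118.48269, -118.48299, -118.25233, -118.48269]

busiest = densest_cells(preview_lats, preview_lons, top=1)
print(busiest)
```

Here the three addresses in ZIP code 91343 land in the same cell. With the full data you could pass latitudes and longitudes directly and compare the busiest cells against the hot spots on the map.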