Performing Spatial Queries#

Introduction#

Spatial queries allows you to select features in a layer by their spatial relationships with features from another layer. GeoPandas provides sjoin() and sjoin_nearest() functions that can be used to perform a spatial query.

Overview of the Task#

We will be working with 2 data layers for the city of Melbourne, Australia. Given the data layers for the pubs and bars in the city and the locations of all metro stations, we want to find out which bars and pubs are within 500 meters of a metro station.

Input Layers:

  • metro_stations_accessbility.zip: A shapefile of metro stations in Melbourne

  • Bars_and_pubs__with_patron_capacity.csv: CSV file with location of bars and pubs in Melbournse

Output:

  • spatial_query.gpkg: A GeoPackage containing all bars and pubs within 500 meters of a metro station

Data Credit

  • 2019 The City of Melbourne Open Data Portal. Data provided by Metro Trains Melbourne and Census of Land Use and Employment (CLUE)

Watch Video Walkthrough https://github.com/spatialthoughts/geopython-tutorials/blob/main/images/yt_logo.png?raw=1

Setup and Data Download#

The following blocks of code will install the required packages and download the datasets to your Colab environment.

import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)
def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

data_url = 'https://github.com/spatialthoughts/geopython-tutorials/releases/download/data/'

download(data_url + 'Bars_and_pubs__with_patron_capacity.csv')
download(data_url + 'metro_stations_accessbility.zip')
Downloaded data/Bars_and_pubs__with_patron_capacity.csv
Downloaded data/metro_stations_accessbility.zip

Procedure#

Read the Bars_and_pubs__with_patron_capacity.csv file and convert it to a GeoDataFrame.

csv_file = 'Bars_and_pubs__with_patron_capacity.csv'
csv_path = os.path.join(data_folder, csv_file)
barspubs_df = pd.read_csv(csv_path) 
barspubs_df.iloc[:, :5]
Census year Block ID Property ID Base property ID Street address
0 2002 247 106238 106238 192-202 Lygon Street
1 2002 252 106244 106244 160-162 Lygon Street
2 2006 16 104018 104018 172-192 Flinders Street
3 2002 214 106186 106186 414-422 Lygon Street
4 2008 76 589841 105749 221 Little Lonsdale Street
... ... ... ... ... ...
3333 2017 85 105746 105746 183-265 La Trobe Street
3334 2017 74 104662 104662 106-112 Hardware Street
3335 2017 268 108154 108153 49 Rathdowne Street
3336 2017 104 104085 104085 167-175 Franklin Street
3337 2017 33 100730 100730 12-16 Bank Place

3338 rows × 5 columns

You will note that the data contains multiple features for each establishment from different years. Let’s sort by Property ID and check.

barspubs_df = barspubs_df.sort_values('Property ID')
barspubs_df.iloc[:, :5]
Census year Block ID Property ID Base property ID Street address
3129 2017 103 100160 100160 196-200 A'Beckett Street
709 2011 103 100160 100160 196-200 A'Beckett Street
2553 2015 103 100160 100160 196-200 A'Beckett Street
247 2008 103 100160 100160 196-200 A'Beckett Street
2201 2010 103 100160 100160 196-200 A'Beckett Street
... ... ... ... ... ...
2579 2015 1105 628712 628712 717-731 Collins Street
2832 2016 66 635138 635138 13 Heffernan Lane
3069 2017 66 635138 635138 13 Heffernan Lane
3058 2016 270 664626 104468 230 Grattan Street
3228 2017 270 664626 104468 230 Grattan Street

3338 rows × 5 columns

We are interested only in the location of the establishment. So we can de-duplicate the dataframe and keep only 1 record per unique Property ID.

barspubs_df = barspubs_df.drop_duplicates(subset=['Property ID'], keep='first')
barspubs_df.iloc[:, :5]
Census year Block ID Property ID Base property ID Street address
3129 2017 103 100160 100160 196-200 A'Beckett Street
1087 2011 409 100441 100441 118-126 Ireland Street
410 2004 315 100514 100514 204-206 Arden Street
1501 2009 33 100727 100727 5-9 Bank Place
1159 2013 33 100730 100730 12-16 Bank Place
... ... ... ... ... ...
1601 2009 2391 616966 616966 2 Boundary Road
1182 2013 1110 620312 593737 23-37 Star Crescent
3028 2016 1105 628712 628712 717-731 Collins Street
2832 2016 66 635138 635138 13 Heffernan Lane
3058 2016 270 664626 104468 230 Grattan Street

308 rows × 5 columns

Now that we are done with data cleaning, let’s turn this dataframe in to a spatial layer. The location of the establishment is defined using the x coordinate and y coordinate columns.

barspubs_df.iloc[:, -5:]
Trading name Number of patrons x coordinate y coordinate Location
3129 Nomads Industry Backpackers 200 144.957445 -37.809985 (-37.80998494, 144.9574447)
1087 Railway Hotel 241 144.942118 -37.806054 (-37.80605366, 144.9421177)
410 North Melbourne Football Club Social Club 210 144.941312 -37.799055 (-37.79905531, 144.9413118)
1501 Mitre Tavern 300 144.960311 -37.816805 (-37.81680457, 144.9603112)
1159 Melbourne Savage Club 90 144.960577 -37.816545 (-37.81654467, 144.9605765)
... ... ... ... ... ...
1601 Vodka Locka & Red Leaf Restaurant 40 144.939233 -37.795318 (-37.79531826, 144.9392333)
1182 Harbour Town Hotel 498 144.937830 -37.813150 (-37.81315031, 144.9378299)
3028 Bar Nicional 75 144.950093 -37.820687 (-37.82068699, 144.9500934)
2832 Union Electric 72 144.966591 -37.811802 (-37.81180178, 144.9665913)
3058 HJC Bar 200 144.961247 -37.797398 (-37.79739836, 144.9612468)

308 rows × 5 columns

Create a geometry column and use it to define a new GeoDataFrame.

geometry = gpd.points_from_xy(barspubs_df['x coordinate'],barspubs_df['y coordinate'])
barspubs_gdf = gpd.GeoDataFrame(barspubs_df, crs='EPSG:4326', geometry=geometry)

Next, we will read the zipped shapefile of metro stations as a GeoDataFrame.

zip_file = 'metro_stations_accessbility.zip'
zip_path = os.path.join('zip://', data_folder, zip_file)
metrostations_gdf = gpd.read_file(zip_path)
metrostations_gdf
station pids he_loop lift geometry
0 Alamein No No No POINT (145.07956 -37.86884)
1 Albion Dot Matrix No No POINT (144.82471 -37.77766)
2 Alphington Dot Matrix No No POINT (145.03125 -37.7784)
3 Altona LCD No No POINT (144.8306 -37.86725)
4 Anstey No No No POINT (144.96056 -37.7619)
... ... ... ... ... ...
214 Williams Landing Dot Matrix Yes Yes POINT (144.74719 -37.8701)
215 Aircraft No No No POINT (144.76081 -37.8666)
216 Flemington Racecourse No No No POINT (144.9072 -37.78759)
217 Showgrounds No No No POINT (144.91498 -37.78355)
218 Southland LCD Yes Yes POINT (145.04867 -37.95818)

219 rows × 5 columns

Visualize the layers.

fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15,7)
barspubs_gdf.plot(ax=ax, color='red', alpha=0.5, marker='+')
metrostations_gdf.plot(ax=ax, color='blue', alpha=1, marker='o')

minx, miny, maxx, maxy = barspubs_gdf.total_bounds
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)

legend_elements = [
    plt.plot([],[], color='red', alpha=0.5, marker='+', label='Bars and Pubs', ls='')[0],
    plt.plot([],[], color='blue', alpha=1, marker='o', label='Metro Stations', ls='')[0]]
ax.legend(handles=legend_elements, loc='upper right')
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
../_images/79b4be07cb8203d53beba2fbe0e94d14048d98170fe53e5db2ff05a903736fc0.png

For performing any analysis, we must use a Projected CRS. Let’s reproject to a local CRS for Melbourne. GDA 2020 / MGA zone 55 EPSG:7855.

metrostations_gdf_reprojected = metrostations_gdf.to_crs('EPSG:7855')
barspubs_gdf_reprojected = barspubs_gdf.to_crs('EPSG:7855')

We now buffer the metro stations to 500 meters. GeoPandas uses shapely for buffer operation. We specify buffer parameters to match QGIS’s default style.

radius = 500

buffer_geometry = metrostations_gdf_reprojected.buffer(
    radius, resolution=5, cap_style=1, join_style=1, mitre_limit=2)
metrostations_gdf_reprojected['geometry'] = buffer_geometry

Visualize the buffers.

fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15,7)
barspubs_gdf_reprojected.plot(ax=ax, color='red', alpha=0.5, marker='+')
metrostations_gdf_reprojected.plot(ax=ax, facecolor='none', edgecolor='blue',  alpha=1, marker='o')

minx, miny, maxx, maxy = barspubs_gdf_reprojected.total_bounds
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)

legend_elements = [
    plt.plot([],[], color='red', alpha=0.5, marker='+', label='Bars and Pubs', ls='')[0],
    plt.plot([],[], color='blue', alpha=1, marker='o', label='Metro Station Buffers', ls='')[0]]
ax.legend(handles=legend_elements, loc='upper right')
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
../_images/f7981656e372a38df0df911f477200107a470d2a431dbf5350394a71bbd6942c.png

As we want to do a spatial join, we need to dissolve the resulting buffers. Otherwise we will get 1 feature for every intersecting buffer

metrostations_gdf_reprojected['dissolvefield'] = 1
dissolved_buffers = metrostations_gdf_reprojected.dissolve(by='dissolvefield')
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15,7)
barspubs_gdf_reprojected.plot(ax=ax, color='red', alpha=0.5, marker='+')
dissolved_buffers.plot(ax=ax, facecolor='none', edgecolor='blue',  alpha=1)

minx, miny, maxx, maxy = barspubs_gdf_reprojected.total_bounds
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)

legend_elements = [
    plt.plot([],[], color='red', alpha=0.5, marker='+', label='Bars and Pubs', ls='')[0],
    plt.plot([],[], color='blue', alpha=1, marker='o', label='Metro Station Buffers (Dissolved)', ls='')[0]]
ax.legend(handles=legend_elements, loc='upper right')
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
../_images/2544b2c422030260e8a1389c549303a04c5d38a8ba0d78c2c778353775af7080.png

Now do a spatial join to select all bars and pubs within the dissolved buffer region.

selected = barspubs_gdf_reprojected.sjoin(
    dissolved_buffers, how='inner', predicate='within')

For our final visualization, we show the selected establishments within the buffer zone.

fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15,7)
barspubs_gdf_reprojected.plot(ax=ax, color='red', alpha=0.5, marker='+')
dissolved_buffers.plot(ax=ax, facecolor='none', edgecolor='blue',  alpha=1)
selected.plot(ax=ax, color='yellow', alpha=0.5, marker='+')

minx, miny, maxx, maxy = barspubs_gdf_reprojected.total_bounds
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)

legend_elements = [
    plt.plot([],[], color='red', alpha=0.5, marker='+', label='Bars and Pubs', ls='')[0],
    plt.plot([],[], color='blue', alpha=1, marker='o', label='Metro Station Buffers', ls='')[0],
    plt.plot([],[], color='yellow', alpha=1, marker='+', label='Selected Bars and Pubs', ls='')[0]
]

ax.legend(handles=legend_elements, loc='upper right')
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

plt.show()
../_images/af4d2d9fe5c961a78931ddb218dff3a71518cc25d20bc1c8169441c098514741.png

Finally, we save all our layers into a output geopackage.

output_file = 'spatial_query.gpkg'
output_path = os.path.join(output_folder, output_file)

barspubs_gdf_reprojected.to_file(filename=output_path, layer='bars_and_pubs')
metrostations_gdf_reprojected.to_file(filename=output_path, layer='metro_stations')
selected.to_file(filename=output_path, layer='selected_locations')

If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)