Skip to main content
Ctrl+K
Geospatial Python Tutorials - Home
  • Geospatial Python Tutorials

GeoPandas

  • Bulk Geocoding Addresses
  • Performing Spatial Queries
  • Extract a Shapefile Subset
  • Performing Fuzzy Table Joins

XArray for Raster Data Processing

  • Basic Raster Styling and Analysis
  • Raster Mosaicing and Clipping
  • Sampling Raster Data with XArray
  • Zonal Statistics with XArray
  • Extracting a Time Series
  • Create a Raster from Array

XArray for Climate Data Processing

  • Aggregating Precipitation Time Series
  • Calculating Climate Anomaly
  • Computing Climate Trends
  • Transform Longitude Values

Dask for Parallel Processing

  • Creating Spatial Data
  • Creating a Median Composite with Dask

XEE (XArray + Google Earth Engine)

  • Exporting an ImageCollection to NetCDF
  • Downloading Images with XEE

Segment Geospatial

  • Extracting Farm Boundaries using Segment Geospatial
  • Detecting Mine Perimeter with Segment Geospatial

Web APIs

  • Locating Nearest Facility with Origin-Destination Matrix
  • Natural Language Processing using OpenAI API
  • Weather Forecast using OpenMeteo API
  • Colab logo Colab
  • Repository
  • Open issue
  • .ipynb

Computing Climate Trends

Contents

  • Introduction
  • Overview of the Task
  • Setup and Data Download
  • Selecting a Region of Interest
  • Local Compute Cluster
  • Calculate Yearly Time-Series
  • Compute the Trend

Computing Climate Trends#

Introduction#

We have many high-quality global climate datasets with a lon historic record of various climatic variables. Using cloud-hosted datasets and XArray, we can compute pixel-wise long-term trends for the climate variable of interest. This analysis helps us identify hotspots experienceing extreme climate change.

Overview of the Task#

We will be using the TerraClimate gridded dataset of monthly climate and climatic water balance at high spatial resolution (~4km grid size) with a long time-series (1958-current). We will extract a time-series of monthly maximum temperatures and compute per-pixel linear trend. The data is processed in distributed manner using Dask and the results are saved as a GeoTIFF file.

Input Layers:

  • agg_terraclimate_tmax_1958_CurrentYear_GLOBE.nc: Cloud-hosted NetCDF file for each climate variable.

Output:

  • tmax_slope.tif: A GeoTIFF file with slope of trend at each pixel for the chosen region.

  • tmax_yearly_subset.nc: A NetCDF file of yearly aggregated subset for the chosen region.

Data Credit:

  • Abatzoglou, J.T., S.Z. Dobrowski, S.A. Parks, K.C. Hegewisch, 2018, Terraclimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958-2015, Scientific Data 5:170191, doi:10.1038/sdata.2017.191

  • Runfola, D. et al. (2020) geoBoundaries: A global database of political administrative boundaries. PLoS ONE 15(4): e0231866. https://doi.org/10.1371/journal.pone.0231866

Setup and Data Download#

%%capture
if 'google.colab' in str(get_ipython()):
    !pip install geopandas rioxarray cartopy dask[distributed] netCDF4
import cartopy
import cartopy.crs as ccrs
import os
import geopandas as gpd
import xarray as xr
import rioxarray as rxr
import matplotlib.pyplot as plt
import dask
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

Selecting a Region of Interest#

GeoBondaries is an open databse of political administrative boundaries at different administrative levels. We download the Admin1 Boundaries.

def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print(f'Downloaded {filename}')
    else:
        print(f'File {filename} exists.')

admin_boundaries = 'geoBoundariesCGAZ_ADM1.gpkg'

geoboundaries_url = 'https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/'
download(geoboundaries_url + admin_boundaries)
admin_boundaries_filepath = os.path.join(data_folder, admin_boundaries)
admin_gdf = gpd.read_file(admin_boundaries_filepath, encoding='utf-8')

Select the country by its 3-letter ISO code. Here we use the code GHA for Ghana.

country = 'GHA'
country_gdf = admin_gdf[admin_gdf['shapeGroup'] == country]
country_gdf

Select a region. We select the Greater Accra Region.

region_gdf = country_gdf[country_gdf['shapeName'] == 'Greater Accra Region']
region_gdf
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(5,5)
region_gdf.plot(
    ax=ax,
    edgecolor='#969696',
    facecolor='none',
    alpha=0.5)
ax.set_axis_off()
plt.show()
../_images/5181fa3a16c25c682e196d1bbcf34361406bc98b102ce56bb4d9876c9c6bde3e.png

Get the bounding box to filter the XArray dataset.

bounds = region_gdf.total_bounds
lon_min, lat_min, lon_max, lat_max = bounds

Local Compute Cluster#

Setup a Local Dask Cluster. This will enable parallel computing on your computer using available CPU cores.

from dask.distributed import Client, progress
client = Client()  # set up local cluster on the machine
client

Calculate Yearly Time-Series#

TerraClimate large dataset that is hosted on a THREDDS Data Server (TDS) and served using the OPeNDAP (Open Data Access Protocol) protocol. XArray has built-in support to efficiently read and process OPeNDAP data where we can stream and process only the required pixels without downloading entire dataset.

We want to analyze the long-term yearly trend of monthly maximum temperatures. Since the input data comes as monthly values, we first aggregate the data to yearly time steps.

terraclimate_url = 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/'
variable = 'tmax'
filename = f'agg_terraclimate_{variable}_1958_CurrentYear_GLOBE.nc'

remote_file_path = os.path.join(terraclimate_url, filename)
ds = xr.open_dataset(
    remote_file_path,
    chunks='auto',
    engine='netcdf4',
)
ds
# Select the variable
da = ds.tmax

# Select data within the bounding box

# Make sure the data is sorted for slice() to work correctly
da = da.sortby([da.lon, da.lat])

# Add 0.1 degree buffer
da_subset = da.sel(
    lon=slice(lon_min - 0.1, lon_max + 0.1),
    lat=slice(lat_min - 0.1, lat_max + 0.1)
)
da_subset

Aggregate the data to yearly means.

da_yearly = da_subset.groupby('time.year').mean('time')
da_yearly

Process and load the data into memory. This may take a few minutes. Check the Dask dashboard to see the progress

Note: The dask dashboard is not available on Colab.

%%time
da_yearly = da_yearly.compute()

Plot a time-series at a single location to see the trend.

# Choose a location within the region for plots
location = (5.65, -0.20) # Accra

time_series = da_yearly.interp(lat=location[0], lon=location[1])

fig, ax = plt.subplots(1, 1)

fig.set_size_inches(15, 7)
time_series.plot.line(ax=ax, x='year', marker='o', linestyle='-', linewidth=1)
ax.set_title('Annual Mean Monthly Maximum Temperature')
plt.show()
../_images/b3f8f198719b67d5a7aeedb9191350bb595582e74dcd8fcb3a961121d2da81ae.png

Compute the Trend#

We can fit a polynomial at each pixel of the DataArray using the xarray.DataArray.polyfit method. Here we fit a linear trendline. As out input dataset has yearly time steps, the slope of the fitted trendline will be degrees per year. To make the trend more interpretable, we multiply it by 100 to get the results in the unit degrees per century.

trend = da_yearly.polyfit('year', 1) # fit polynomial of degree 1
slope = trend.polyfit_coefficients[0,...] * 100 # per year -> per century

Visualize the slope of trendline.

projection = ccrs.PlateCarree()

cbar_kwargs = {
    'orientation':'horizontal',
    'fraction': 0.025,
    'pad': 0.05,
    'extend':'neither',
    'label': '°C per century'
}

fig, ax = plt.subplots(1, 1, subplot_kw={'projection': projection})
fig.set_size_inches(8, 8)
slope.plot.imshow(
    ax=ax,
    cmap='YlOrBr',
    transform=ccrs.PlateCarree(),
    add_labels=False,
    cbar_kwargs=cbar_kwargs)

region_gdf.plot(
    ax=ax,
    edgecolor='#000000',
    facecolor='none',
    alpha=0.5)

ax.set_extent((lon_min,lon_max,lat_min,lat_max), crs = ccrs.PlateCarree())

plt.title(f'Slope of Monthly Maximum Temperature Trend', fontsize = 12)
plt.show()
../_images/8a1bd552de3ee7e3d29b3005d0349dabf711eead0d16731cc2a2a3d4a4be7cff.png

Save the resulting slope raster as a GeoTIFF.

# Assign a CRS to the DataArray
slope.rio.write_crs('EPSG:4326', inplace=True)
# Clip to the GeoDataFrame coundary
clipped = slope.rio.clip(region_gdf.geometry.values)
# Write the file
output_slope_file = f'{variable}_slope.tif'
output_slope_path = os.path.join(output_folder, output_slope_file)
if not os.path.exists(output_slope_path):
    clipped.rio.to_raster(output_slope_path)
    print('Saved the file at ', output_slope_path)

We can also save the yearly aggregated subset as a NetCDF for other downstream analysis.

local_subset_file = f'{variable}_yearly_subset.nc'
local_subset_filepath = os.path.join(output_folder, local_subset_file)
if not os.path.exists(local_subset_filepath):
    da_yearly.to_netcdf(path=local_subset_filepath)
    print('Saved the file at ', local_subset_filepath)

If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)

previous

Calculating Climate Anomaly

next

Transform Longitude Values

Contents
  • Introduction
  • Overview of the Task
  • Setup and Data Download
  • Selecting a Region of Interest
  • Local Compute Cluster
  • Calculate Yearly Time-Series
  • Compute the Trend

By Ujaval Gandhi

This work is licensed under a CC BY 4.0 license.