{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "VCi5DKHB7jag"
},
"source": [
"# Sampling Raster Data with XArray\n",
"\n",
"## Introduction\n",
"\n",
"Many scientific and environmental datasets come as gridded rasters. If you want to know the value of a variable at a single or multiple locations, you can use sampling techniques to extract the values. XArray has powerful indexing methods that allow us to extract values at multiple coordinates easily.\n",
"\n",
"## Overview of the Task\n",
"\n",
"In this tutorial, we will take a raster file of temperature anomalies and a CSV file with locations of all urban areas in the US. We will use Pandas and Xarray to find the temperature anomaly in all the urban areas and find the top 10 areas experiencing the highest anomaly. We will also use GeoPandas to save the results as a vector layer.\n",
"\n",
"**Input Layers**:\n",
"* `t.anom.202207.tif`: Raster grid of temprature anomaly for the month of July 2022 in the US.\n",
"* `2021_Gaz_ua_national.zip`: A CSV file with point locations representing urban areas in the US.\n",
"\n",
"**Output Layers**:\n",
"* `tanomaly.gpkg` : A GeoPackage containing a vector layer of point locations with anomaly values sampled from the raster.\n",
"\n",
"\n",
"**Data Credit**:\n",
"* [US July 2021 Temperature Anomaly](https://www.cpc.ncep.noaa.gov/products/GIS/GIS_DATA/). NOAA Climate Prediction Center. Retrieved 2022-09\n",
"* [US Gazetteer files: 2021](https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.2021.html) United States Census Bureau. Retrieved 2022-09.\n",
"\n",
"\n",
"**Watch Video Walkthrough** "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YauILMlxtHa_"
},
"source": [
"## Setup and Data Download\n",
"\n",
"The following blocks of code will install the required packages and download the datasets to your Colab environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "aQpTzbC0l73O"
},
"outputs": [],
"source": [
"%%capture\n",
"if 'google.colab' in str(get_ipython()):\n",
" !pip install rioxarray"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ghkgm9n_ZX-u"
},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"import geopandas as gpd\n",
"import rioxarray as rxr\n",
"import rasterio as rio\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sSvffB_ToEE9"
},
"outputs": [],
"source": [
"data_folder = 'data'\n",
"output_folder = 'output'\n",
"\n",
"if not os.path.exists(data_folder):\n",
" os.mkdir(data_folder)\n",
"if not os.path.exists(output_folder):\n",
" os.mkdir(output_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "t_fRwiWBqyM9",
"outputId": "b9b99f89-523a-4697-af20-ecf08297f3a7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloaded data/t.anom.202207.tif\n",
"Downloaded data/2021_Gaz_ua_national.zip\n"
]
}
],
"source": [
"def download(url):\n",
" filename = os.path.join(data_folder, os.path.basename(url))\n",
" if not os.path.exists(filename):\n",
" from urllib.request import urlretrieve\n",
" local, _ = urlretrieve(url, filename)\n",
" print('Downloaded ' + local)\n",
"\n",
"raster_file = 't.anom.202207.tif'\n",
"csv_file = '2021_Gaz_ua_national.zip'\n",
"\n",
"data_url = 'https://github.com/spatialthoughts/geopython-tutorials/releases/download/data/'\n",
"\n",
"download(data_url + raster_file)\n",
"download(data_url + csv_file)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2xOPeZxZw9RN"
},
"source": [
"## Data Pre-Processing"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pq7nppDh-BVg"
},
"source": [
"First we read the raster file using `rioxarray`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 351
},
"id": "J_brtBxhxFzg",
"outputId": "c0ed025f-7d0c-44ec-8be3-5e6aecde987b"
},
"outputs": [
{
"data": {
"text/html": [
"
<xarray.DataArray (band: 1, y: 32, x: 64)> Size: 8kB\n", "[2048 values with dtype=float32]\n", "Coordinates:\n", " * band (band) int64 8B 1\n", " * x (x) float64 512B -130.0 -129.0 -128.0 ... -69.0 -68.0 -67.0\n", " * y (y) float64 256B 52.0 51.0 50.0 49.0 ... 24.0 23.0 22.0 21.0\n", " spatial_ref int64 8B 0\n", "Attributes:\n", " TIFFTAG_SOFTWARE: GrADS version 2.0.2 \n", " TIFFTAG_XRESOLUTION: 0.171875\n", " TIFFTAG_YRESOLUTION: 0.265625\n", " TIFFTAG_RESOLUTIONUNIT: 2 (pixels/inch)\n", " AREA_OR_POINT: Area
<xarray.DataArray (band: 1, y: 32, x: 64)> Size: 8kB\n", "array([[[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]], dtype=float32)\n", "Coordinates:\n", " * band (band) int64 8B 1\n", " * x (x) float64 512B -130.0 -129.0 -128.0 ... -69.0 -68.0 -67.0\n", " * y (y) float64 256B 52.0 51.0 50.0 49.0 ... 24.0 23.0 22.0 21.0\n", " spatial_ref int64 8B 0\n", "Attributes:\n", " TIFFTAG_SOFTWARE: GrADS version 2.0.2 \n", " TIFFTAG_XRESOLUTION: 0.171875\n", " TIFFTAG_YRESOLUTION: 0.265625\n", " TIFFTAG_RESOLUTIONUNIT: 2 (pixels/inch)\n", " AREA_OR_POINT: Area
<xarray.DataArray (y: 32, x: 64)> Size: 8kB\n", "array([[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)\n", "Coordinates:\n", " band int64 8B 1\n", " * x (x) float64 512B -130.0 -129.0 -128.0 ... -69.0 -68.0 -67.0\n", " * y (y) float64 256B 52.0 51.0 50.0 49.0 ... 24.0 23.0 22.0 21.0\n", " spatial_ref int64 8B 0\n", "Attributes:\n", " TIFFTAG_SOFTWARE: GrADS version 2.0.2 \n", " TIFFTAG_XRESOLUTION: 0.171875\n", " TIFFTAG_YRESOLUTION: 0.265625\n", " TIFFTAG_RESOLUTIONUNIT: 2 (pixels/inch)\n", " AREA_OR_POINT: Area
\n", " | GEOID | \n", "NAME | \n", "UATYPE | \n", "ALAND | \n", "AWATER | \n", "ALAND_SQMI | \n", "AWATER_SQMI | \n", "INTPTLAT | \n", "INTPTLONG | \n", "
---|---|---|---|---|---|---|---|---|---|
0 | \n", "37 | \n", "Abbeville, LA Urban Cluster | \n", "C | \n", "29189598 | \n", "298416 | \n", "11.270 | \n", "0.115 | \n", "29.967156 | \n", "-92.095966 | \n", "
1 | \n", "64 | \n", "Abbeville, SC Urban Cluster | \n", "C | \n", "11271136 | \n", "19786 | \n", "4.352 | \n", "0.008 | \n", "34.179273 | \n", "-82.379776 | \n", "
2 | \n", "91 | \n", "Abbotsford, WI Urban Cluster | \n", "C | \n", "5426584 | \n", "13221 | \n", "2.095 | \n", "0.005 | \n", "44.948612 | \n", "-90.315875 | \n", "
3 | \n", "118 | \n", "Aberdeen, MS Urban Cluster | \n", "C | \n", "7416338 | \n", "52820 | \n", "2.863 | \n", "0.020 | \n", "33.824742 | \n", "-88.554591 | \n", "
4 | \n", "145 | \n", "Aberdeen, SD Urban Cluster | \n", "C | \n", "33032902 | \n", "120864 | \n", "12.754 | \n", "0.047 | \n", "45.463186 | \n", "-98.471033 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
3596 | \n", "98101 | \n", "Zapata--Medina, TX Urban Cluster | \n", "C | \n", "13451264 | \n", "0 | \n", "5.194 | \n", "0.000 | \n", "26.889081 | \n", "-99.266192 | \n", "
3597 | \n", "98182 | \n", "Zephyrhills, FL Urbanized Area | \n", "U | \n", "112593840 | \n", "1615599 | \n", "43.473 | \n", "0.624 | \n", "28.285373 | \n", "-82.198969 | \n", "
3598 | \n", "98209 | \n", "Zimmerman, MN Urban Cluster | \n", "C | \n", "24456008 | \n", "2495147 | \n", "9.443 | \n", "0.963 | \n", "45.455850 | \n", "-93.606705 | \n", "
3599 | \n", "98236 | \n", "Zumbrota, MN Urban Cluster | \n", "C | \n", "4829469 | \n", "0 | \n", "1.865 | \n", "0.000 | \n", "44.292793 | \n", "-92.670931 | \n", "
3600 | \n", "98263 | \n", "Zuni Pueblo, NM Urban Cluster | \n", "C | \n", "11876786 | \n", "0 | \n", "4.586 | \n", "0.000 | \n", "35.071066 | \n", "-108.823725 | \n", "
3601 rows × 9 columns
\n", "\n", " | AWATER_SQMI | \n", "INTPTLAT | \n", "INTPTLONG | \n", "geometry | \n", "tanomaly | \n", "
---|---|---|---|---|---|
0 | \n", "0.115 | \n", "29.967156 | \n", "-92.095966 | \n", "POINT (-92.09597 29.96716) | \n", "-0.272462 | \n", "
1 | \n", "0.008 | \n", "34.179273 | \n", "-82.379776 | \n", "POINT (-82.37978 34.17927) | \n", "-0.704404 | \n", "
2 | \n", "0.005 | \n", "44.948612 | \n", "-90.315875 | \n", "POINT (-90.31588 44.94861) | \n", "0.130515 | \n", "
3 | \n", "0.020 | \n", "33.824742 | \n", "-88.554591 | \n", "POINT (-88.55459 33.82474) | \n", "0.018811 | \n", "
4 | \n", "0.047 | \n", "45.463186 | \n", "-98.471033 | \n", "POINT (-98.47103 45.46319) | \n", "1.311932 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
3596 | \n", "0.000 | \n", "26.889081 | \n", "-99.266192 | \n", "POINT (-99.26619 26.88908) | \n", "-1.866690 | \n", "
3597 | \n", "0.624 | \n", "28.285373 | \n", "-82.198969 | \n", "POINT (-82.19897 28.28537) | \n", "-0.064647 | \n", "
3598 | \n", "0.963 | \n", "45.455850 | \n", "-93.606705 | \n", "POINT (-93.6067 45.45585) | \n", "0.607703 | \n", "
3599 | \n", "0.000 | \n", "44.292793 | \n", "-92.670931 | \n", "POINT (-92.67093 44.29279) | \n", "0.007632 | \n", "
3600 | \n", "0.000 | \n", "35.071066 | \n", "-108.823725 | \n", "POINT (-108.82372 35.07107) | \n", "-0.732679 | \n", "
3601 rows × 5 columns
\n", "\n", " | NAME | \n", "tanomaly | \n", "
---|---|---|
0 | \n", "Osburn, ID Urban Cluster | \n", "5.214408 | \n", "
1 | \n", "Kellogg, ID Urban Cluster | \n", "4.543093 | \n", "
2 | \n", "Libby, MT Urban Cluster | \n", "4.543093 | \n", "
3 | \n", "Bonners Ferry, ID Urban Cluster | \n", "4.487802 | \n", "
4 | \n", "Orofino, ID Urban Cluster | \n", "4.411723 | \n", "
5 | \n", "Grangeville, ID Urban Cluster | \n", "4.411723 | \n", "
6 | \n", "La Grande, OR Urban Cluster | \n", "4.300000 | \n", "
7 | \n", "Baker City, OR Urban Cluster | \n", "4.300000 | \n", "
8 | \n", "Rathdrum, ID Urban Cluster | \n", "4.138633 | \n", "
9 | \n", "Deer Park, WA Urban Cluster | \n", "4.138633 | \n", "