Downloading Images with XEE#
Introduction#
When using data hosted in Google Earth Engine in Python-based workflows - one of the main pain points is the extraction of large datasets. This typically involves a long running Export process which makes subsequent processing of the results challenging. With the introduction of new ee.data.computePixels() API from GEE, we can now directly fetch image data as an array. The XEE package makes this process seamless and extracts the results as an XArray Dataset. We can then use rioxarray to clip and save the extracted XArray DataSet to a GeoTIFF file directly without going through an Export task.
For advanced XEE workflows with Dask, see the Processing Time-Series with XEE and XArray tutorial.
Overview of the Task#
LandScan Population Data Global 1km is one of the highest quality global gridded population datasets and is made easily accessible via the Awesome GEE Community Catalog. In this tutorial, we will download the 2021 population grid for Kenya as a GeoTIFF file directly from the cloud-hosted dataset.
Note: You must have a Google Earth Engine account to complete this section. If you do not have one, follow our guide to sign up.
Input Layers:
ne_10m_admin_0_countries_ind.zip: A shapefile of country boundaries
Output Layers:
population.tif: A GeoTIFF file of gridded population
Data Credit:
Sims, K., Reith, A., Bright, E., McKee, J., & Rose, A. (2022). LandScan Global 2021. Oak Ridge National Laboratory. https://doi.org/10. 48690/1527702
Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com.
Running the Notebook:
The preferred way to run this notebook is on Google Colab.
Setup and Data Download#
The following blocks of code will install the required packages and download the datasets to your Colab environment.
%%capture
if 'google.colab' in str(get_ipython()):
!pip install xee rioxarray
import datetime
import ee
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import rioxarray as rxr
import xarray as xr
from xee import helpers
data_folder = 'data'
output_folder = 'output'
if not os.path.exists(data_folder):
os.mkdir(data_folder)
if not os.path.exists(output_folder):
os.mkdir(output_folder)
def download(url):
filename = os.path.join(data_folder, os.path.basename(url))
if not os.path.exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
shapefile = 'ne_10m_admin_0_countries_ind.zip'
data_url = 'https://github.com/spatialthoughts/geopython-tutorials/releases/download/data/'
download('{}/{}'.format(data_url,shapefile))
Downloaded data/ne_10m_admin_0_countries_ind.zip
Initialize EE with the High-Volume EndPoint which is recommended to be used with XEE for workflows that do not use a lot of server side processing and are primarily for extracting data from stored collections.. Replace the value of the cloud_project variable with your own project id that is linked with GEE.
cloud_project = 'spatialthoughts' # replace with your project id
try:
ee.Initialize(
project=cloud_project,
opt_url='https://earthengine-highvolume.googleapis.com')
except:
ee.Authenticate()
ee.Initialize(
project=cloud_project,
opt_url='https://earthengine-highvolume.googleapis.com')
Data Preparation#
We read the Natural Earth administrative regions shapefile and select a country.
shapefile_path = os.path.join(data_folder, shapefile)
boundaries_gdf = gpd.read_file(shapefile_path)
Select the country of your choice using the 3-letter ISO code for your country. Here we use KEN for Kenya.
country = boundaries_gdf[boundaries_gdf['ADM0_A3'] == 'KEN']
geometry = country.geometry.union_all()
geometry
Select the year and dataset.
year = 2023
start_date = ee.Date.fromYMD(year, 1, 1)
end_date = ee.Date.fromYMD(year+1, 1, 1)
landscan = ee.ImageCollection('projects/sat-io/open-datasets/ORNL/LANDSCAN_GLOBAL')
Apply the filters and select the image to download using the Earth Engine Python API syntax.
filtered = landscan.filter(ee.Filter.date(start_date, end_date))
image = filtered.first()
print(image.get('system:index').getInfo())
landscan-global-2023
We now open the image. XEE works only with ImageCollections so we convert the image to an ImageCollection.
image_col = ee.ImageCollection([image])
XEE requires explicit grid parameters. We extract these using the helper function fit_geometry.
output_crs = 'EPSG:3857' # Web Mercator
output_scale = 1000 # 1km
grid_params = helpers.fit_geometry(
geometry=geometry,
geometry_crs='EPSG:4326', # Input geometry CRS
grid_crs=output_crs,
grid_scale=(output_scale, -output_scale)
)
grid_params
{'crs': 'EPSG:3857',
'crs_transform': (1000.0, 0.0, 3772000.0, 0.0, -1000.0, 561000.0),
'shape_2d': (891, 1083)}
Now we can use XEE to load the ImageCollection as an XArray Dataset.
ds = xr.open_dataset(
image_col,
engine='ee',
**grid_params
)
ds
<xarray.Dataset> Size: 4MB
Dimensions: (time: 1, y: 1083, x: 891)
Coordinates:
* time (time) datetime64[ns] 8B 2023-01-01
* y (y) float64 9kB 5.605e+05 5.595e+05 ... -5.205e+05 -5.215e+05
* x (x) float64 7kB 3.772e+06 3.774e+06 ... 4.662e+06 4.662e+06
Data variables:
b1 (time, y, x) float32 4MB ...The Dataset has only 1 time coordinate (the chosen year) and 1 variable (the band b1). Select it to get a DataArray.
da = ds.isel(time=0).b1
da
<xarray.DataArray 'b1' (y: 1083, x: 891)> Size: 4MB
[964953 values with dtype=float32]
Coordinates:
* y (y) float64 9kB 5.605e+05 5.595e+05 ... -5.205e+05 -5.215e+05
* x (x) float64 7kB 3.772e+06 3.774e+06 ... 4.662e+06 4.662e+06
time datetime64[ns] 8B 2023-01-01
Attributes:
id: b1
data_type: {'type': 'PixelType', 'precision': 'int', 'min': -2147483...
dimensions: [43200, 21600]
crs: EPSG:3857
crs_transform: [0.008333333333, 0, -180, 0, -0.008333333333, 89.99999999...Downloading Data#
We can now clip and format the dataset and save it as a GeoTIFF.
country_reprojected = country.to_crs(output_crs)
da_clipped = da \
.rio.clip(country_reprojected.geometry)
da_clipped
<xarray.DataArray 'b1' (y: 1081, x: 889)> Size: 4MB
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
* y (y) float64 9kB 5.595e+05 5.585e+05 ... -5.195e+05 -5.205e+05
* x (x) float64 7kB 3.774e+06 3.774e+06 ... 4.66e+06 4.662e+06
time datetime64[ns] 8B 2023-01-01
spatial_ref int64 8B 0
Attributes:
id: b1
data_type: {'type': 'PixelType', 'precision': 'int', 'min': -2147483...
dimensions: [43200, 21600]
crs_transform: [0.008333333333, 0, -180, 0, -0.008333333333, 89.99999999...Finally we save the results as a GeoTIFF file.
output_file = 'population.tif'
output_path = os.path.join(output_folder, output_file)
da_clipped.rio.to_raster(output_path, driver='COG')
If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)