Extracting a Time Series

Extracting a Time Series#

Introduction#

XArray makes it very easy to work with time-series data. The DataArray structure provides an efficient way to organize and represent earth observation time-series. We also get the benefit of many time-series processing functiions for temporal interpolation and smoothing. This tutorial shows how we can take individual GeoTIFF files and organize them as a XArray Dataset, extract time-series at any coordinate and do linear interpolation of masked observation.

Overview of the Task#

We will take MODIS Vegetation Indices Version 6.1 data which are generated every 16 days at 250 meter (m) spatial resolution for 2020 year and extract a time-series of Normalised Differnece Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) at different a point location.

Input Layers:

modis_vegetation_indices_2020.zip: A zip file containing GeoTIFF files for MODIS Vegetation Indicies for the state of Karnataka, India.

Output:

time_series.csv: A CSV file containing extracted values

Data Credit:

Didan, K. (2015). MOD13Q1 MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061. NASA EOSDIS Land Processes DAAC. Accessed 2023-05 from https://doi.org/10.5067/MODIS/MOD13Q1.006

Setup and Data Download#

The following blocks of code will install the required packages and download the datasets to your Colab environment.

%%capture
if 'google.colab' in str(get_ipython()):
    !pip install rioxarray

from cycler import cycler
import datetime
import geopandas as gpd
import glob
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import rioxarray as rxr
import xarray as xr
import zipfile

data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

filename = 'modis_vegetation_indices_2020.zip'
data_url = 'https://github.com/spatialthoughts/geopython-tutorials/releases/download/data/'

download(data_url + filename)

Downloaded data/modis_vegetation_indices_2020.zip

Data Pre-Processing#

First we unzip and extract the images to a folder.

zipfile_path = os.path.join(data_folder, filename)
with zipfile.ZipFile(zipfile_path) as zf:
  zf.extractall(data_folder)

We get 3 individual GeoTIFF files per time step.

NDVI: Normalised Differnece Vegetation Index (NDVI) values
EVI: Enhanced Vegetation Index (EVI) values
VI_Quality: Bitmask containing information about cloudy pixels

The information about the image date and measured variable is part of the file name.

For example the file MOD13A1.006__500m_16_days_NDVI_doy2020161_aid0001.tif contains NDVI values for the Day of the Year (DOY) 161 of the year 2020.

We will now go through the directory and extract this information to create a XArray DataArray.

def path_to_datetimeindex(filepath):
  filename = os.path.basename(filepath)
  pattern = r'doy(\d+)'
  match = re.search(pattern, filepath)
  if match:
      doy_value = match.group(1)
      timestamp = datetime.datetime.strptime(doy_value, '%Y%j')
      return timestamp
  else:
    print('Could not extract DOY from filename', filename)


timestamps = []
filepaths = []

files = os.path.join(data_folder, 'modis_vegetation_indices_2020', '*.tif')
for filepath in glob.glob(files):
  timestamp = path_to_datetimeindex(filepath)
  filepaths.append(filepath)
  timestamps.append(timestamp)

unique_timestamps = sorted(set(timestamps))

scenes = []

for timestamp in unique_timestamps:
  ndvi_filepattern = r'NDVI_doy{}'.format(timestamp.strftime('%Y%j'))
  evi_filepattern = r'EVI_doy{}'.format(timestamp.strftime('%Y%j'))
  qa_filepattern = r'VI_Quality_doy{}'.format(timestamp.strftime('%Y%j'))
  ndvi_filepath = [
      filepath for filepath in filepaths
      if re.search(ndvi_filepattern, filepath)][0]
  evi_filepath = [
      filepath for filepath in filepaths
      if re.search(evi_filepattern, filepath)][0]
  qa_filepath = [
      filepath for filepath in filepaths
      if re.search(qa_filepattern, filepath)][0]

  ndvi_band = rxr.open_rasterio(ndvi_filepath, chunks={'x':512, 'y':512})
  ndvi_band.name = 'NDVI'
  evi_band = rxr.open_rasterio(evi_filepath, chunks={'x':512, 'y':512})
  evi_band.name = 'EVI'
  qa_band = rxr.open_rasterio(qa_filepath, chunks={'x':512, 'y':512})
  qa_band.name = 'DetailedQA'

  # First 2 bits are MODLAND QA Bits with summary information
  # Extract the value of the 2-bits using bitwise AND operation
  summary_qa = qa_band & 0b11
  qa_band.name = 'SummaryQA'

  # The pixel values of NDVI/EVI files are stored as integers
  # We need to apply the scaling factor to get the actual values as floats
  scale_factor = 0.0001
  scaled_bands = [
      ndvi_band * scale_factor,
      evi_band * scale_factor,
      qa_band,
      summary_qa]
  scene = xr.concat(scaled_bands, dim='band')
  scene = scene.assign_coords(band=['NDVI', 'EVI', 'DetailedQA', 'SummaryQA'])
  scenes.append(scene)

We now have a list of individual XArray Datasets for each time step. We merge them to a single DataSet along the time dimension.

time_var = xr.Variable('time', list(unique_timestamps))
time_series_scenes = xr.concat(scenes, dim=time_var)
time_series_scenes

<xarray.DataArray 'NDVI' (time: 22, band: 4, y: 1656, x: 1090)> Size: 1GB
dask.array<concatenate, shape=(22, 4, 1656, 1090), dtype=float64, chunksize=(1, 1, 512, 512), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 9kB 74.05 74.06 74.06 74.06 ... 78.58 78.59 78.59
  * y            (y) float64 13kB 18.48 18.47 18.47 18.46 ... 11.59 11.59 11.58
    spatial_ref  int64 8B 0
  * band         (band) <U10 160B 'NDVI' 'EVI' 'DetailedQA' 'SummaryQA'
  * time         (time) datetime64[ns] 176B 2020-01-01 2020-01-17 ... 2020-12-18

Masking Clouds#

Before we can extract the time-series, we need to mask cloudy-pixels. We have saved the first 2-bits of the mask in the SummaryQA band. The pixel values range from 0-3 and represent the following.

0: Good data, use with confidence
1: Marginal data, useful but look at detailed QA for more information
2: Pixel covered with snow/ice
3: Pixel is cloudy

We select all pixels with value 0 or 1 and mask the remaining ones. We can apply this as a vectorized operation across all scenes instead of iterating over each image. This is a very fast and efficient operation enabled by XArray.

summary_qa = time_series_scenes.sel(band='SummaryQA')
time_series_scenes_masked = time_series_scenes \
  .sel(band=['NDVI', 'EVI']) \
  .where(summary_qa <= 1)

Let’s preview the result of our cloud masking operation for one of the images

image = time_series_scenes.isel(time=8).squeeze()
masked = time_series_scenes_masked.isel(time=8).squeeze()

fig, (ax0, ax1) = plt.subplots(1,2)
fig.set_size_inches(10,7)


image.sel(band='NDVI').plot.imshow(
    ax=ax0, vmin=0, vmax=0.7, add_colorbar=False, cmap='Greens')
masked.sel(band='NDVI').plot.imshow(
    ax=ax1, vmin=0, vmax=0.7, add_colorbar=False, cmap='Greens')

ax0.set_title('Original')
ax1.set_title('Masked')

for ax in [ax0, ax1]:
    ax.set_axis_off()
    ax.set_aspect('equal')

plt.show()

../_images/0c6493204be5d9289d52b2b1602b6e22a510095ace138ccd9cc2be724c6bf8af.png

Plotting the time-series#

We can extract the time-series at a specific X,Y coordinate. As our coordinates may not exactly match the DataArray coords, we use the interp() method to get the nearest value.

coordinate = (13.16, 77.35) # Lat/Lon
time_series = time_series_scenes_masked \
  .sel(band=['NDVI', 'EVI']) \
  .interp(y=coordinate[0], x=coordinate[1], method='nearest')

Let’s plot a time-series chart. We use Matplotlib cycler to set custom colors for each line.

fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)

# Select custom colors for NDVI and EVI series
colorlist = ['#006837', '#78c679']
custom_cycler = cycler(color=colorlist)
ax.set_prop_cycle(custom_cycler)
time_series.plot.line(
    ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
ax.set_title('MODIS NDVI and EVI Time-Series')
plt.show()

../_images/ecae3c17b2cc9ccd6cafef31da61a4a930294050c4782b576c997b32d9923e6e.png

We have gaps in the time-series due to masked cloudy-pixels. Replace them with linearly interpolated values using XArray’s interpolate_na() function.

# As we are proceesing the time-series,
# it needs to be in a single chunk along the time dimension
time_series = time_series.chunk(dict(time=-1))
time_series_interpolated = time_series\
  .interpolate_na(dim='time', method='linear')

fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)

# Select custom colors for NDVI and EVI series
colorlist = ['#006837', '#78c679']
custom_cycler = cycler(color=colorlist)
ax.set_prop_cycle(custom_cycler)
time_series_interpolated.plot.line(
    ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
ax.set_title('MODIS NDVI and EVI Time-Series')
plt.show()

../_images/e94485def26d836c701a1d9c9ecc9e6aacd10c9000c40fc41ddf8de4c223b34e.png

Saving the Time-Series#

Convert the extracted time-series to a Pandas DataFrame.

df = time_series_interpolated.to_pandas().reset_index()
df.head()

band	time	NDVI	EVI
0	2020-01-01	0.5311	0.2707
1	2020-01-17	0.4827	0.2563
2	2020-02-02	0.4367	0.2451
3	2020-02-18	0.4206	0.2433
4	2020-03-05	0.4408	0.2313

Save the DataFrame as a CSV file.

output_filename = 'time_series.csv'
output_filepath = os.path.join(output_folder, output_filename)
df.to_csv(output_filepath, index=False)

If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)