Aggregating Time Series#

Introduction#

XArray that can perform time-series aggregation with a simple and concise syntax using the resample() method. We can easily change the frequency of the original time-series using the desired aggregation method. This tutorial shows how to take a daily precipitation time-series and aggregate it to a monthly total precipitation time-series.

Overview of the Task#

We will download the gridded precipitation data from Indian Meterological department (IMD) using imdlib, aggregate the daily time-series to monthly total rainfall and download the resulting rasters as GeoTIFF files.

Input Data:

  • Gridded daily precipitation binary data from IMD.

Output:

  • Monthly precipitation rasters as GeoTIFF files.

Data and Software Credit:

  • Pai D.S., Latha Sridhar, Rajeevan M., Sreejith O.P., Satbhai N.S. and Mukhopadhyay B., 2014: Development of a new high spatial resolution (0.25° X 0.25°)Long period (1901-2010) daily gridded rainfall data set over India and its comparison with existing data sets over the region; MAUSAM, 65, 1(January 2014), pp1-18.

  • Nandi, S., Patel, P., and Swain, S. (2024). IMDLIB: An open-source library for retrieval, processing and spatiotemporal exploratory assessments of gridded meteorological observation datasets over India. Environmental Modelling and Software, 71 (105869), https://doi.org/10.1016/j.envsoft.2023.105869

Setup and Data Download#

The following blocks of code will install the required packages and download the datasets to your Colab environment.

%%capture
if 'google.colab' in str(get_ipython()):
    !pip install rioxarray imdlib
import os
import imdlib as imd
import rioxarray as xr
from matplotlib import pyplot as plt
import pandas as pd
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

Download gridded binary datasets using imdlib.

start_year = 2015
end_year = 2020
variable = 'rain' # other options are ('tmin'/ 'tmax')
data = imd.get_data(
    variable,
    start_year,
    end_year,
    fn_format='yearwise',
    file_dir=data_folder
)
Downloading: rain for year 2015
Downloading: rain for year 2016
Downloading: rain for year 2017
Downloading: rain for year 2018
Downloading: rain for year 2019
Downloading: rain for year 2020
Download Successful !!!

Data Pre-Processing#

First we read the data and create a XArray Dataset. The NoData values are stored as -999 so we set those to NaN.

data = imd.open_data(
    variable,
    start_year,
    end_year,
    'yearwise',
    data_folder)

ds = data.get_xarray()
ds = ds.where(ds['rain'] != -999.)
ds
<xarray.Dataset> Size: 305MB
Dimensions:  (time: 2192, lat: 129, lon: 135)
Coordinates:
  * lat      (lat) float64 1kB 6.5 6.75 7.0 7.25 7.5 ... 37.75 38.0 38.25 38.5
  * lon      (lon) float64 1kB 66.5 66.75 67.0 67.25 ... 99.25 99.5 99.75 100.0
  * time     (time) datetime64[ns] 18kB 2015-01-01 2015-01-02 ... 2020-12-31
Data variables:
    rain     (time, lat, lon) float64 305MB nan nan nan nan ... nan nan nan nan
Attributes:
    Conventions:  CF-1.7
    title:        IMD gridded data
    source:       https://imdpune.gov.in/
    history:      2025-04-08 18:33:42.491078 Python
    references:   
    comment:      
    crs:          epsg:4326

Aggregating Data#

The input data containsdaily precipitation values. We want to aggregate these to monthly totals. XArray allows specifying the target frequency using a concise frequncy string from Pandas. The string 1MS creates a time-series grouping from Month Start as 1 month intervals. We use sum() after resampling to calculate the monthly totals.

The result is a XArray Dataset with 12 time-steps for each year.

ds_monthly = ds.resample(time='1MS').sum()

Plotting the time-series#

Select the rain variable.

da_original = ds['rain']
da_monthly = ds_monthly['rain']

We can extract the time-series at a specific X,Y coordinate. As our coordinates may not exactly match the DataArray coords, we use the interp() method to get the nearest value.

coordinate = (13.16, 77.35) # Lat/Lon

original_time_series = da_original \
  .interp(lat=coordinate[0], lon=coordinate[1], method='nearest')

aggregated_time_series = da_monthly \
  .interp(lat=coordinate[0], lon=coordinate[1], method='nearest')
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)

original_time_series.plot.line(
    ax=ax, x='time', marker='o', markersize=1,
    linestyle='-', linewidth=1,)
ax.set_title('Daily Precipitation Time-Series')
plt.show()
../_images/c15e269b8884c5b02ebcfa52ce1a55b68018e9b14884c05168c13c19f07ba200.png

Let’s plot the aggregated time-series chart.

fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)

aggregated_time_series.plot.line(
    ax=ax, x='time', marker='o',
    linestyle='-', linewidth=1, markersize=2)
ax.set_title('Monthly Precipitation Time-Series')

plt.show()
../_images/a0a49d8e2eecb71ec71e032c259b6ed05a7d5b8dff066d6a7067e3f5c99a8682.png

Export Monthly Images as GeoTIFFs#

We can iterate through each time step and export the monthly rasters as GeoTIFF files.

for time in da_monthly.time.values:
  image = da_monthly.sel(time=time)
  # transform the image to suit rioxarray format
  image = image \
    .rio.write_crs('EPSG:4326') \
    .rio.set_spatial_dims('lon', 'lat')
  date = pd.to_datetime(time).strftime('%Y-%m')
  output_file = f'image_{date}.tif'
  output_path = os.path.join(output_folder, output_file)
  image.rio.to_raster(output_path, driver='COG')
  print('Wrote file:', output_path)

If you want to give feedback or share your experience with this tutorial, please comment below. (requires GitHub account)