Cal-Adapt Analytics Engine: Threshold Tools Basics

A notebook on how to use the climakitae package and its threshold_tools module to calculate quantities of interest for extreme weather events:

  • return values (e.g., the magnitude of a daily precipitation event with a 10-year return period, meaning it occurs once every 10 years on average),
  • return probabilities (e.g., the probability of exceeding a temperature of 45 °C),
  • and return periods (e.g., how often, on average, a 150 mm daily precipitation event will occur).

Step 0: Import

Import the necessary packages before running the analysis.

!pip install git+https://github.com/OpenHydrology/lmoments3.git
import panel as pn
pn.extension()
import xarray as xr

import climakitae as ck
from climakitae import threshold_tools

Step 1: Select

Create a new application object, then call select() to display an interface for choosing the location, variables, and scenarios, and for designating warming levels of interest.

app = ck.Application()

Call app.select() to display the interface and choose the data to examine.

For this example, please select an area or region of interest without the “Area average” option.

Note:

  • This version only offers the dynamically-downscaled data.
  • To streamline later analysis, it’s helpful to select just one scenario.
  • If you select ‘daily’ for ‘Timescale’, the result is a daily average of the hourly data.

app.select()

Step 2: Retrieve

Call app.retrieve() to load the subset or combination of data specified in the selection interface.

generated_data = app.retrieve()

Note: Retrieval may take several minutes. Once it completes, you can preview the retrieved, aggregated dataset.

generated_data

Subset data by scenario and simulation to prepare it for threshold_tools functions

Replace the scenario= and simulation= values with selections that are present in your generated data.

subsetted_data = generated_data.sel(scenario='Historical Climate', simulation='cnrm-esm2-1')
subsetted_data
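To find out which scenario and simulation labels are present before subsetting, the coordinates can be inspected directly. The sketch below uses a small synthetic xarray DataArray as a stand-in for the retrieved data. Its dimension names follow the .sel() call above, but the array itself, the variable name, and the second label in each coordinate are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the retrieved data; the coordinate values are made up
# for illustration and will differ from what app.retrieve() returns.
rng = np.random.default_rng(0)
demo = xr.DataArray(
    rng.normal(290.0, 5.0, size=(2, 2, 4)),
    dims=("scenario", "simulation", "time"),
    coords={
        "scenario": ["Historical Climate", "SSP 3-7.0"],
        "simulation": ["cnrm-esm2-1", "cesm2"],
        "time": np.arange(4),
    },
    name="T2",
)

# List the labels that actually exist before choosing one.
print(demo.scenario.values.tolist())
print(demo.simulation.values.tolist())

# Selecting a single label drops that dimension, leaving only "time" here.
subset = demo.sel(scenario="Historical Climate", simulation="cnrm-esm2-1")
print(subset.dims)
```

Selecting a label that is not in the coordinate raises a KeyError, so listing the values first avoids trial and error.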

Step 3: Transform

Pull Annual Maximum Series (AMS) for all grid cells

This is the first step of extreme value analysis: identifying which conditions are extreme. In this example, we treat each annual maximum as one sample of an extreme event. This is the annual block maxima approach, which takes the maximum value within each block period (here, a year). Its limitation is that it returns only one value per year, so when several extremes occur in a single year, all but the largest are excluded, even if they exceed the maxima of other years. Keep this in mind when applying these tools in California, for example to atmospheric river events or when evaluating wet and dry years.

Future approaches will include the option to specify a threshold (e.g., a critical value for infrastructure or a high percentile) as the basis for identifying extremes.
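The limitation of block maxima described above can be seen in a toy example. The sketch below uses made-up daily precipitation totals and plain Python rather than threshold_tools. The threshold value is hypothetical and only illustrates what a threshold-based selection would keep.

```python
from collections import defaultdict
from datetime import date

# Made-up daily precipitation totals (mm); 2001 has two large storms.
daily_precip = [
    (date(2001, 1, 5), 120.0),
    (date(2001, 2, 17), 140.0),
    (date(2001, 7, 1), 3.0),
    (date(2002, 1, 9), 60.0),
    (date(2002, 12, 2), 45.0),
    (date(2003, 3, 21), 95.0),
]

# Annual block maxima: keep only the largest value in each calendar year.
values_by_year = defaultdict(list)
for day, value in daily_precip:
    values_by_year[day.year].append(value)
annual_max = {year: max(values) for year, values in sorted(values_by_year.items())}
print(annual_max)

# Threshold-based alternative: keep every day above a chosen critical value.
threshold_mm = 100.0  # hypothetical critical value
exceedances = [(day, value) for day, value in daily_precip if value > threshold_mm]
print(exceedances)
```

Here the 120 mm storm in 2001 is dropped from the annual maximum series even though it exceeds the maxima of both 2002 and 2003, while the threshold-based selection retains it.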

After pulling the AMS, run .compute() to evaluate it and bring the reduced data into memory for later computations.

Note: Running .compute() may take several minutes.

ams = threshold_tools.get_ams(subsetted_data, extremes_type='max')
ams = ams.compute()
ams

Use the KS test to assess the goodness of fit of the selected distribution

The Kolmogorov-Smirnov (KS) test compares sample data with a reference probability distribution. It helps assess how well the selected distribution fits the sample.

Note: You can input the following distributions in threshold_tools functions that have a distr= argument:

  • gev: Generalized extreme value distribution. It allows for a continuous range of shapes and reduces to the Gumbel, Fréchet, or Weibull distribution depending on its shape parameter. The GEV often provides a better fit than any of the three individual distributions and is a common choice in hydrological applications.
  • gumbel: Range of interest is unlimited
  • weibull: Range of interest has an upper limit
  • pearson3: Range of interest has a lower limit
  • genpareto: This distribution is often used for river flood events and has been suggested as a good general fit for precipitation in the United States.

goodness_of_fit = threshold_tools.get_ks_stat(ams, distr='gev', multiple_points=True)
goodness_of_fit
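For a single grid cell, a goodness-of-fit check of this kind could be sketched with SciPy as follows. This is an illustration on synthetic data, not the implementation behind get_ks_stat. The sample, its parameters, and the variable names are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Synthetic annual maxima drawn from a Gumbel distribution (illustrative only).
rng = np.random.default_rng(42)
annual_maxima = rng.gumbel(loc=300.0, scale=2.0, size=60)

# Fit a GEV by maximum likelihood. Note that SciPy's shape parameter c has the
# opposite sign to the shape parameter xi used in much of the climate literature.
shape, loc, scale = stats.genextreme.fit(annual_maxima)

# Compare the sample with the fitted GEV.
result = stats.kstest(annual_maxima, "genextreme", args=(shape, loc, scale))
print(f"KS statistic: {result.statistic:.3f}, p-value: {result.pvalue:.3f}")
```

A small p-value suggests the distribution fits the sample poorly, while a large one only means the test found no evidence against the fit. Because the parameters are estimated from the same sample that is being tested, the p-value tends to be optimistic and is best read as a relative guide when comparing candidate distributions.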

Calculate return value for a selected return period

Evaluate the return value for a particular return period (i.e., a 1-in-X-year event). Specify the return period of interest, in years, with the return_period= argument. In this example, we evaluate the return value for a 1-in-10-year temperature event.

Note: The bootstrap_runs, conf_int_lower_bound, and conf_int_upper_bound arguments are shown below with their default values, so they can be omitted unless you want to change them.

return_value = threshold_tools.get_return_value(ams, return_period=10, distr='gev',
                                                bootstrap_runs=100,
                                                conf_int_lower_bound=2.5,
                                                conf_int_upper_bound=97.5,
                                                multiple_points=True)
return_value
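To show what a return value and its bootstrap confidence bounds represent, here is a minimal standard-library sketch for one location. It swaps in a Gumbel distribution fitted by the method of moments in place of the GEV, to keep the example short. The data are made up, and whether threshold_tools resamples in exactly this way is an assumption.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_return_value(sample, return_period):
    """Return level from a Gumbel fit by the method of moments."""
    scale = statistics.stdev(sample) * math.sqrt(6) / math.pi
    loc = statistics.fmean(sample) - EULER_GAMMA * scale
    non_exceedance = 1.0 - 1.0 / return_period
    return loc - scale * math.log(-math.log(non_exceedance))


def percentile(sorted_values, pct):
    """Linear-interpolation percentile of an already sorted list."""
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


random.seed(0)
# Made-up annual maximum temperatures in K.
ams_values = [random.gauss(305.0, 2.0) for _ in range(40)]

point_estimate = gumbel_return_value(ams_values, return_period=10)

# Bootstrap: refit on resamples drawn with replacement from the original sample.
estimates = sorted(
    gumbel_return_value(random.choices(ams_values, k=len(ams_values)), 10)
    for _ in range(100)
)
ci_lower = percentile(estimates, 2.5)
ci_upper = percentile(estimates, 97.5)
print(f"10-year return value: {point_estimate:.2f} K "
      f"(interval {ci_lower:.2f} to {ci_upper:.2f} K)")
```

The 2.5 and 97.5 percentiles of the resampled estimates bracket a 95% interval, which mirrors the default conf_int_lower_bound and conf_int_upper_bound values.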

Calculate return probability for a selected threshold

Evaluate the probability of a certain threshold being exceeded by setting threshold= to the value of interest. In the example, we evaluate the probability of exceeding a 300 K temperature event.

Note: The threshold= input needs to be in equivalent units to those of the variable in the AMS.

Note: The bootstrap_runs, conf_int_lower_bound, and conf_int_upper_bound arguments are not explicitly specified here, so they take their default values: bootstrap_runs=100, conf_int_lower_bound=2.5, and conf_int_upper_bound=97.5.

return_prob = threshold_tools.get_return_prob(ams, threshold=300, distr='pearson3', multiple_points=True)
return_prob
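Because the threshold must be in the same units as the AMS, a threshold given in °C needs converting when the data are in K. The sketch below converts 45 °C to kelvin and then computes a simple empirical exceedance fraction on made-up annual maxima as a sanity check. It does not fit a distribution, so it is not a substitute for get_return_prob, and the helper names are illustrative.

```python
def celsius_to_kelvin(temp_c: float) -> float:
    """Convert degrees Celsius to kelvin."""
    return temp_c + 273.15


def empirical_exceedance_probability(annual_maxima, threshold):
    """Fraction of years whose annual maximum is above the threshold."""
    exceed_count = sum(1 for value in annual_maxima if value > threshold)
    return exceed_count / len(annual_maxima)


# Made-up annual maximum temperatures in K.
ams_kelvin = [311.2, 315.8, 318.9, 313.4, 319.6, 316.1, 312.7, 320.3, 317.5, 314.0]

threshold_k = celsius_to_kelvin(45.0)  # 45 °C expressed in the AMS units
prob = empirical_exceedance_probability(ams_kelvin, threshold_k)
print(f"Threshold: {threshold_k:.2f} K, empirical exceedance probability: {prob:.0%}")
```

In this toy sample, 3 of the 10 annual maxima exceed 318.15 K. A fitted distribution would smooth this estimate and extend it to thresholds beyond the observed range.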

Calculate return period for a selected return value

Evaluate the return period (i.e., 1-in-X-year) for a certain return value of interest. In this example, we evaluate the return period of a 300 K return value.

Note: The return value will have units equivalent to those of the variable in the AMS.

Note: The bootstrap_runs, conf_int_lower_bound, and conf_int_upper_bound arguments are not explicitly specified here, so they take their default values: bootstrap_runs=100, conf_int_lower_bound=2.5, and conf_int_upper_bound=97.5.

return_period = threshold_tools.get_return_period(ams, return_value=300, distr='weibull', multiple_points=True)
return_period
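Conceptually, a return period is the reciprocal of the annual exceedance probability under the fitted distribution. The sketch below illustrates this with a Gumbel CDF and hypothetical fitted parameters. It is not the code path used by get_return_period, and the parameter values are invented for the example.

```python
import math


def gumbel_return_period(value, loc, scale):
    """Return period in years: T = 1 / (1 - F(value)) for a Gumbel CDF F."""
    cdf = math.exp(-math.exp(-(value - loc) / scale))
    exceedance = 1.0 - cdf
    # A value the fitted distribution essentially never exceeds has no finite period.
    return math.inf if exceedance <= 0.0 else 1.0 / exceedance


# Hypothetical fitted parameters for annual maximum temperature in K.
loc_k, scale_k = 298.0, 1.5

for value_k in (298.0, 300.0, 305.0):
    period = gumbel_return_period(value_k, loc_k, scale_k)
    print(f"{value_k:.0f} K -> about {period:.1f} years")
```

Return periods grow steeply in the upper tail, so a few kelvin of difference in the return value can change the result by orders of magnitude.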

Step 4: Visualize

Visualize goodness of fit of distribution

Observe a geospatial map of p-values from the KS test.

threshold_tools.get_geospatial_plot(goodness_of_fit, data_variable='p_value')

Visualize return value

Observe a geospatial map of return values for selected return period.

threshold_tools.get_geospatial_plot(return_value, data_variable='return_value')

Visualize return probability

Observe a geospatial map of return probabilities of exceedance for selected threshold.

threshold_tools.get_geospatial_plot(return_prob, data_variable='return_prob')

Visualize return period

Observe a geospatial map of return periods for selected return value.

threshold_tools.get_geospatial_plot(return_period, data_variable='return_period', bar_max=1000)

Step 5: Export

Use the code below to export a dataset as a NetCDF, GeoTIFF, or CSV file. Provide the name of the dataset to export and a quoted string containing the file name. If the dataset contains multiple variables, add an argument specifying which variable to export (e.g., variable="T2"). To save the data as a GeoTIFF or CSV file when the dataset contains scenarios or simulations, also provide arguments specifying the scenario (scenario="historical") and the simulation (simulation="cesm2").

app.export_as()
app.export_dataset(return_period, 'my_filename')