Exploring Uncertainty in Extreme Climate Events

There are several kinds of scientific uncertainty that arise when working with long-term projections of future climates:

  1. Model Uncertainty, or the differences between models (namely, how model physics, settings, or parameters can change the outcome).
  2. Internal Variability, which represents the variations inherent within the climate system itself.
  3. Scenario Uncertainty, which arises from differences in outcomes between emissions trajectories.

This notebook explores Internal Variability in the Cal-Adapt: Analytics Engine by focusing on projected changes in extreme precipitation. We show results across the suite of models currently available in the Cal-Adapt Data Catalog as well as across the full set of CMIP6 models, and we provide a reasonable strategy for handling data with large internal variability.

Intended application: While the assessment and handling of extreme events shown here applies to any extreme event for any variable, we focus on precipitation because it tends to have especially high natural background variability, which is represented here by internal variability. In this notebook, we examine projected changes in extreme precipitation accumulations, in particular to meet needs associated with planning and preparedness for precipitation impacts.

Terms used throughout this notebook:

  • Ensemble : A set of simulations run using a single model. For the purposes of this analysis, each ensemble member might differ in the following ways:
    • What point in time of the control data set was used as input data for the ensemble member.
    • What physics options are used.
    • Since the underlying model structure and emissions scenario remain constant across an ensemble, we call the differences between ensemble members internal variability.
  • Downscaling: Global climate models are often run with a coarse grid size (on the order of 1 x 1 degree [roughly 100 x 100 km]), which is much larger than the scale of many weather events of interest (on the order of 1-10 km). To capture smaller-scale processes, one ensemble member from each Cal-Adapt: Analytics Engine model was used to provide input data for the Weather Research and Forecasting (WRF) Model. The WRF output provides data at 45 km, 9 km, and 3 km grid spacing for the WECC region. The following table shows the original grid sizes of the downscaled models.
Model name       Longitude spacing (deg)   Latitude spacing (deg)
CESM2            1.3                       0.9
CNRM-ESM2-1      1.4                       1.4
EC-Earth3-Veg    0.7                       0.7
FGOALS-g3        2.0                       2.3
  • Data pooling: In our case, this refers to placing all the model output into a single data “bucket” to increase our sample size before computing statistics. This notebook will show why this is an appropriate practice; a minimal sketch of the idea follows this list.
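As a toy sketch of what pooling looks like (synthetic numbers, not the retrieved data; the actual pooling in Step 2 uses xr.Dataset.stack on the retrieved precipitation):

import numpy as np
import xarray as xr

# Toy monthly "precipitation" for 4 simulations x 360 months (values are synthetic)
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.gamma(shape=2.0, scale=30.0, size=(4, 360)),
    dims=["simulation", "time"],
    name="precip",
)

# Per-simulation 99th percentile: each estimate relies on only 360 samples
per_sim_p99 = da.quantile(0.99, dim="time")

# Pooled 99th percentile: stack simulation and time into a single index,
# so the estimate is based on 4 x 360 = 1440 samples
pooled_p99 = da.stack(index=["simulation", "time"]).quantile(0.99, dim="index")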
Scientific note: Why do extreme events benefit from special handling? Extreme events are often poorly sampled (a 1-in-100-year event may occur only once, or not at all, in a given ensemble run), leading to higher model uncertainty and internal variability for these events.
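To make the sampling problem concrete, a quick back-of-the-envelope calculation (not part of the analysis below): for an event with a 1-in-100 chance of occurring in any given year, the probability of seeing it at least once in a 30-year record is only about 26%, so a single simulation can easily miss it entirely.

# Probability of observing at least one 1-in-100-year event in a 30-year record
p_annual = 1 / 100
n_years = 30
p_at_least_one = 1 - (1 - p_annual) ** n_years
print(f"{p_at_least_one:.0%}")  # ~26%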

Step 0: Setup

Import the climakitae library and other dependencies.

import climakitae as ck
import panel as pn
pn.extension()
from climakitae.explore.uncertainty import (
    CmipOpt, get_ensemble_data,
    get_warm_level, grab_multimodel_data,
    _area_wgt_average, get_ks_pval_df)
from climakitae.util.utils import read_ae_colormap

import holoviews as hv
from bokeh.models import HoverTool
import xarray as xr
import pandas as pd
import numpy as np
import fnmatch
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
%config InlineBackend.figure_format = 'svg' # Make plots look better in the notebook environment

Step 1: Introduction to uncertainty

1a) Model uncertainty

We begin by illustrating differences in results across the four downscaled Cal-Adapt models.

Get the Cal-Adapt model output over California for the historical and SSP 3-7.0 scenarios

selections = ck.Select()
selections.scenario_historical=['Historical Climate']
selections.scenario_ssp=['SSP 3-7.0 -- Business as Usual']
selections.append_historical = True
selections.variable = 'Precipitation (total)'
selections.time_slice = (1981, 2100)
selections.resolution = '9 km'
selections.timescale = 'monthly'
selections.downscaling_method = ["Dynamical"]
selections.area_subset = 'states'
selections.cached_area = ['CA']
selections.area_average = 'No'

wrf_ds = selections.retrieve().squeeze()

cmip_names = ['CESM2', 'CNRM-ESM2-1', 'EC-Earth3-Veg', 'FGOALS-g3'] # Simulation names in CMIP6
wrf_cmip_lookup_dict = { # WRF simulation names have additional information about activity ID and ensemble
    # Lookup dictionary used to rename simulation values
    fnmatch.filter(wrf_ds.simulation.values, "*{0}*".format(cmip))[0]:cmip for cmip in cmip_names
}

wrf_ds = wrf_ds.sortby("simulation") # Sort simulations alphabetically
wrf_ds["simulation"] = [wrf_cmip_lookup_dict[sim] for sim in wrf_ds.simulation.values] # Rename simulations
wrf_ds = wrf_ds.clip(0.1) # Remove values less than 0.1

Specify the warming threshold in deg C. You don’t have to change anything, but feel free to specify any of the following: 1.5, 2.0, 3.0, 4.0.

warm_level = 3.0

Process the data and compute the 99th percentile

hist_wrf = wrf_ds.sel(
    time=slice('1981','2010'))

sim_idx = list(wrf_ds.simulation.values)
ssp_wrf_list = [get_warm_level(
            warm_level, wrf_ds.sel(
            simulation=s).squeeze(), ipcc=False)
            for s in sim_idx]

ssp_wrf_list = list(filter(lambda item: item is not None, ssp_wrf_list))
ssp_wrf = xr.concat(ssp_wrf_list,dim='simulation')

hist_wrf = hist_wrf.sel(simulation = ssp_wrf.simulation.values)

# 99th percentile for all models
cads_hist_percentile = hist_wrf.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

cads_ssp_percentile = ssp_wrf.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

cads_delta_percentile = (cads_ssp_percentile
                         - cads_hist_percentile).compute()

Show the inter-model spread by plotting the following maps for each of the four downscaled Cal-Adapt models:

  • Present-day (1981 - 2010) 99th percentile monthly precipitation accumulation
  • 99th percentile monthly precipitation accumulation under warming conditions (+/- 15 years from the year in which the warming threshold was reached; this year varies by model; see the sketch after this list for one way such a window can be identified)
  • The change in 99th percentile monthly precipitation realized under those warming conditions
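In this notebook the warming-level windows are selected by get_warm_level. As a rough, hypothetical sketch of the idea (not the climakitae implementation, which also handles reference periods and ensemble-member details), one could find the first year a smoothed global-mean temperature anomaly crosses the threshold and keep the surrounding 30 years. Here gmst_anom, warming_level_window, and the 20-year smoothing are illustrative assumptions.

def warming_level_window(ds, gmst_anom, warm_level=3.0):
    """Return the +/- 15 year slice of `ds` around the warming-level crossing.

    `gmst_anom` is assumed to be an annual global-mean temperature anomaly
    (deg C relative to the model baseline) with a "year" dimension.
    """
    smoothed = gmst_anom.rolling(year=20, center=True).mean()  # smooth out year-to-year noise
    crossing = smoothed.where(smoothed >= warm_level, drop=True)
    if crossing.size == 0:
        return None  # this simulation never reaches the warming level
    center_year = int(crossing["year"][0])
    # 30-year window centered on the crossing year
    return ds.sel(time=slice(str(center_year - 15), str(center_year + 14)))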
## define a plotting function
def make_precip_unc_map(
    data,
    title,
    vmin,
    vmax,
    sopt=False,
    width=250,
    height=250,
    xlim=(None, None),
    ylim=(None, None),
    xticks=[None],
    yticks=[None],
    xaxis=None,
    yaxis=None,
    absolute=True,
):
    """Make single map"""

    if absolute:
        val_str = "Precipitation (mm/month)"
    else:
        val_str = "% change"

    hover = HoverTool(
        description="Custom Tooltip",
        tooltips=[
            ("Longitude (deg E)", "@x"),
            ("Latitude (deg N)", "@y"),
            (val_str, "@z"),
        ],
    )

    if sopt:
        cmap = read_ae_colormap(
            cmap="ae_diverging_r", cmap_hex=True
        )  # sets to a diverging colormap
    else:
        cmap = read_ae_colormap(cmap="ae_blue", cmap_hex=True)

    _plot = hv.QuadMesh((data["lon"], data["lat"], data)).opts(
        tools=[hover],
        colorbar=True,
        cmap=cmap,
        symmetric=sopt,
        clim=(vmin, vmax),
        xaxis=None,
        yaxis=None,
        clabel="Precipitation (mm/month)",
        title=title,
        width=width,
        height=height,
        xlim=xlim,
        ylim=ylim,
    )
    return _plot
vmin = 0
vmax = 2000
delta_vmin = None
delta_vmax = None
num_simulations = len(cads_delta_percentile.simulation.values)

cads_hist_plots = make_precip_unc_map(
        data=cads_hist_percentile.isel(simulation=0),
        title=(cads_hist_percentile.isel(simulation=0
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False, width=300, height=300)

for sim_i in range(1, num_simulations): # plot remaining simulations
    pl_i = make_precip_unc_map(
        data=cads_hist_percentile.isel(simulation=sim_i),
        title=(cads_hist_percentile.isel(simulation=sim_i
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False, width=300, height=300)
    cads_hist_plots += pl_i

# additional aesthetic settings to tidy figure
cads_hist_plots.cols(4)  # organize columns
cads_hist_plots.opts(hv.opts.Layout(merge_tools=True))  # merge toolbar
cads_hist_plots.opts(toolbar="below")  # set toolbar location
cads_hist_plots.opts(
            title="99th percentile monthly precipitation accumulation"
            + " across models, 1981-2010"
        )  # add title

# now for EOC
cads_ssp_plots = make_precip_unc_map(
        data=cads_ssp_percentile.isel(simulation=0),
        title=(cads_ssp_percentile.isel(simulation=0
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False, width=300, height=300)

for sim_i in range(1, num_simulations): # plot remaining simulations
    pl_i = make_precip_unc_map(
        data=cads_ssp_percentile.isel(simulation=sim_i),
        title=(cads_ssp_percentile.isel(simulation=sim_i
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False, width=300, height=300)
    cads_ssp_plots += pl_i

# additional aesthetic settings to tidy figure
cads_ssp_plots.cols(4)  # organize columns
cads_ssp_plots.opts(hv.opts.Layout(merge_tools=True))  # merge toolbar
cads_ssp_plots.opts(toolbar="below")  # set toolbar location
cads_ssp_plots.opts(
            title="99th percentile monthly precipitation accumulation"
            + " across models, +/- 15 years of warming level"
        )  # add title

# now for the difference
cads_diff_plots = make_precip_unc_map(
        data=cads_delta_percentile.isel(simulation=0),
        title=(cads_delta_percentile.isel(simulation=0
        ).simulation.item()), vmin=delta_vmin,
        vmax=delta_vmax, sopt=True, width=300, height=300)

for sim_i in range(1, num_simulations): # plot remaining simulations
    pl_i = make_precip_unc_map(
        data=cads_delta_percentile.isel(simulation=sim_i),
        title=(cads_delta_percentile.isel(simulation=sim_i
        ).simulation.item()), vmin=delta_vmin,
        vmax=delta_vmax, sopt=True, width=300, height=300)
    cads_diff_plots += pl_i

# additional aesthetic settings to tidy figure
cads_diff_plots.cols(4)  # organize columns
cads_diff_plots.opts(hv.opts.Layout(merge_tools=True))  # merge toolbar
cads_diff_plots.opts(toolbar="below")  # set toolbar location
cads_diff_plots.opts(
            title="Change in 99th percentile monthly precipitation accumulation"
            + "  across models"
        )  # add title

cads_tabs = pn.Tabs(('Present-day',cads_hist_plots),
               ('+ '+ str(warm_level)+' C',cads_ssp_plots),
               ('Difference',cads_diff_plots),
               dynamic=False)
cads_tabs

Interpretation

Looking across all four Cal-Adapt models reveals considerable disagreement in precipitation accumulations, as well as in the sign and magnitude of the projected changes. These plots show that sampling a single simulation can limit our understanding, while comparing across simulations can give conflicting results that are difficult to interpret from a practical perspective. In the next section, we examine how internal variability contributes to these differences.

1b) Internal variability

Internal variability describes intra-model differences, i.e., those which occur across a given model’s ensemble (recall that an ensemble is a series of runs of the same model that feature small tweaks in initial conditions, and sometimes different physics options governing smaller-scale processes). Internal variability results from the inherently chaotic nature of the climate system, and as a result it provides an analog for natural background variability.
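As a rough illustration of the distinction (toy numbers, not the retrieved data; the notebook's actual comparison is made with the maps below and the summary figure in Section 1d), one can compare the spread across ensemble members of a single model with the spread across models:

import numpy as np
import xarray as xr

# Toy relative changes in the 99th percentile: 4 models x 10 ensemble members (% change)
rng = np.random.default_rng(1)
delta = xr.DataArray(
    rng.normal(loc=10.0, scale=15.0, size=(4, 10)),
    dims=["simulation", "member_id"],
)

# Internal variability: spread across members within each model, averaged over models
internal_variability = delta.std(dim="member_id").mean().item()
# Model uncertainty: spread of the per-model ensemble means
model_uncertainty = delta.mean(dim="member_id").std().item()
print(f"internal variability ~ {internal_variability:.1f}%, model uncertainty ~ {model_uncertainty:.1f}%")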

We illustrate internal variability by comparing the ensemble members of the CMIP6 models which provided the large-scale input for the downscaled Cal-Adapt models shown above.

Set options for CMIP6 data

# select data options
copt = CmipOpt()
copt.variable = 'pr' # precipitation
copt.area_subset = 'states'
copt.location = 'California' # location of interest to subset for
copt.area_average = False # ensures that we grab spatiotemporal data
copt.timescale = 'Amon' # monthly frequency

Get the CMIP6 ensemble runs from the catalog for the relevant GCMs (those which correspond to the downscaled models shown in Section 1a).

hist_cae_ds, warm_cae_ds = get_ensemble_data(
    variable=copt.variable, selections=selections, cmip_names=cmip_names)

The following cell computes the 99th percentile for each ensemble member.

# Compute the 99th percentile
hist_cae_percentile = hist_cae_ds.quantile(.99, dim="time").squeeze()
ssp_cae_percentile = warm_cae_ds.quantile(.99, dim="time").squeeze()

# Compute the difference between 99th percentile for future - present-day
cae_delta_percentile = ssp_cae_percentile - hist_cae_percentile

Show the intra-model spread by plotting the following maps for each ensemble member of the four Cal-Adapt models:

  • Present-day (1981 - 2010) 99th percentile monthly precipitation accumulation
  • 99th percentile monthly precipitation accumulation under warming conditions (+/- 15 years from the year in which the warming threshold was reached)
  • The change in 99th percentile monthly precipitation realized under those warming conditions
# Plotting helper functions
def _hvplot_one_sim(data, sim_name, vmin=0, vmax=900, sopt=False, num_cols=6):
    """
    Using make_precip_unc_map, plot all member_id data for an input simulation.
    This will make a single row of maps.
    """
    plots_by_sim = None
    # Get the data just at the input simulation
    # Rename x --> lon and y --> lat so that the make_precip_unc_map can read the dataset
    to_plot_by_sim = data.where(data.simulation == sim_name, drop=True).rename(
        {"x": "lon", "y": "lat"}
    )
    # Make a plot for each individual member_id
    for member_id_i in range(len(to_plot_by_sim.member_id.values)):
        plot_i = make_precip_unc_map(
            to_plot_by_sim.drop("simulation").isel(member_id=member_id_i),
            title="{0} member {1}".format(sim_name, member_id_i + 1),
            vmin=vmin,
            vmax=vmax,
            sopt=sopt,
        )
        plots_by_sim = plot_i if plots_by_sim is None else plots_by_sim + plot_i
    return plots_by_sim.cols(num_cols)

def hvplot_percentile_column(
    data, col_title="", vmin=0, vmax=900, sopt=False, num_cols=6
):
    """
    Create a pn.Column object, in which each row is a different simulation.
    Set col_title to give the Column a name.
    Set num_cols to indicate how many maps you want to show in each row.
    """
    col = pn.Column(col_title)
    for sim in np.unique(data.simulation.values):
        pl_by_sim = _hvplot_one_sim(
            data, sim_name=sim, vmin=vmin, vmax=vmax, sopt=sopt, num_cols=num_cols
        )
        col += pn.Row(pl_by_sim)
    return col
all_hist_ens = hvplot_percentile_column(
    data=hist_cae_percentile.pr,
    col_title="### 99th percentile monthly precipitation accumulation across model ensemble members, 1981-2010",
    vmin=0, vmax=900,
    sopt=False
)

all_ssp_ens = hvplot_percentile_column(
    data=ssp_cae_percentile.pr,
    col_title="### 99th percentile monthly precipitation accumulation across model ensemble members, 2071-2100",
    vmin=0, vmax=900,
    sopt=False
)

all_diff_ens = hvplot_percentile_column(
    data=cae_delta_percentile.pr,
    col_title="### Change in 99th percentile monthly precipitation accumulation across ensemble members by end-of-century",
    vmin=-150, vmax=150,
    sopt=True
)

pn.Tabs(('Present-day',all_hist_ens),
        ('+ '+ str(warm_level)+' C',all_ssp_ens),
        ('Difference',all_diff_ens))

Interpretation

We have now expanded our analysis to the larger model ensembles from which the Cal-Adapt models were derived. Doing so shows that not only are differences across models substantial, but so are differences within models. Since each downscaled Cal-Adapt model is driven by a single ensemble member, the model differences shown in the first set of maps result from a combination of model uncertainty and internal variability.

1c) Model uncertainty across the CMIP6 catalog

Get the available CMIP6 output from the catalog. It can take a couple minutes to read in and prepare the data.

mdls_ds = grab_multimodel_data(copt,alpha_sort=True)

hist_ds = mdls_ds.sel(time=slice('1981','2010'))
ssp_ds = mdls_ds.sel(time=slice('2014','2100'))

num_simulations = range(len(ssp_ds.simulation.values))
ssp_ds_list = [get_warm_level(warm_level,
                ssp_ds.isel(simulation=i), ipcc=False)
                for i in num_simulations]
ssp_ds_list = [i for i in ssp_ds_list if i is not None] # drop models that never reach the warming level
ssp_ds = xr.concat(ssp_ds_list, dim='simulation').squeeze()
hist_ds = hist_ds.sel(simulation=ssp_ds.simulation.values)

Next we compute statistics.

# 99th percentile all models
hist_percentile = hist_ds.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

ssp_percentile = ssp_ds.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

delta_percentile = (ssp_percentile
                    - hist_percentile).compute()

Show the inter-model spread by plotting the following maps for one ensemble member of each of the available CMIP6 simulations:

  • Present-day (1981 - 2010) 99th percentile monthly precipitation accumulation
  • 99th percentile monthly precipitation accumulation under warming conditions (+/- 15 years from the year in which the warming threshold was reached)
  • The change in 99th percentile monthly precipitation realized under those warming conditions
vmin = 0
vmax = 900
delta_vmin = -150
delta_vmax = 150
ncols = 5
num_simulations = len(hist_percentile.simulation)

hist_plots = make_precip_unc_map(
        data=hist_percentile.pr.isel(simulation=0
        ).rename({'x' : 'lon', 'y' : 'lat'}),
        title=(hist_percentile.isel(simulation=0
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False)

for sim_i in range(1, num_simulations): # plot remaining simulations
    pl_i = make_precip_unc_map(
        data=hist_percentile.pr.isel(simulation=sim_i
        ).rename({'x' : 'lon', 'y' : 'lat'}),
        title=(hist_percentile.isel(simulation=sim_i
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False)
    hist_plots += pl_i

# additional aesthetic settings to tidy figure
hist_plots.cols(ncols)  # organize columns
hist_plots.opts(hv.opts.Layout(merge_tools=True))  # merge toolbar
hist_plots.opts(toolbar="below")  # set toolbar location
hist_plots.opts(
            title="99th percentile monthly precipitation accumulation"
            + " in each model, 1981-2010"
        )  # add title

# now for warming
ssp_plots = make_precip_unc_map(
        data=ssp_percentile.pr.isel(simulation=0
        ).rename({'x' : 'lon', 'y' : 'lat'}),
        title=(ssp_percentile.isel(simulation=0
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False)

for sim_i in range(1, num_simulations): # plot remaining simulations
    pl_i = make_precip_unc_map(
        data=ssp_percentile.pr.isel(simulation=sim_i
        ).rename({'x' : 'lon','y' : 'lat'}),
        title=(ssp_percentile.isel(simulation=sim_i
        ).simulation.item()), vmin=vmin,
        vmax=vmax, sopt=False)
    ssp_plots += pl_i

# additional aesthetic settings to tidy figure
ssp_plots.cols(ncols)  # organize columns
ssp_plots.opts(hv.opts.Layout(merge_tools=True))  # merge toolbar
ssp_plots.opts(toolbar="below")  # set toolbar location
ssp_plots.opts(
            title="99th percentile monthly precipitation accumulation"
            + " in each model, 2071-2100"
        )  # add title

# now for the difference
diff_plots = make_precip_unc_map(
        data=delta_percentile.pr.isel(simulation=0
        ).rename({'x' : 'lon','y' : 'lat'}),
        title=(delta_percentile.isel(simulation=0
        ).simulation.item()), vmin=delta_vmin,
        vmax=delta_vmax, sopt=True)

for sim_i in range(1, num_simulations): # plot remaining simulations
    pl_i = make_precip_unc_map(
        data=delta_percentile.pr.isel(simulation=sim_i
        ).rename({'x' : 'lon', 'y' : 'lat'}),
        title=(delta_percentile.isel(simulation=sim_i
        ).simulation.item()), vmin=delta_vmin,
        vmax=delta_vmax, sopt=True)
    diff_plots += pl_i

# additional aesthetic settings to tidy figure
diff_plots.cols(ncols)  # organize columns
diff_plots.opts(hv.opts.Layout(merge_tools=True))  # merge toolbar
diff_plots.opts(toolbar="below")  # set toolbar location
diff_plots.opts(
            title="Change in 99th percentile monthly precipitation accumulation"
            + " in each model by end-of-century"
        )  # add title

tabs = pn.Tabs(('1981-2010',hist_plots),
               ('+ '+ str(warm_level)+' C',ssp_plots),
               ('Difference',diff_plots),
               dynamic=False)
tabs

Interpretation

The above maps show substantial differences across models, which are particularly apparent in the difference between the end-of-century and present-day results. Importantly, the intra-model differences shown in the previous set of maps were also substantial. In light of this, we conclude that differences across models arise both from actual structural model differences and from internal variability. In the next section, we show how these two sources of uncertainty compare to one another.

1d) Uncertainty summarized

We will compactly summarize model uncertainty and internal variability by examining the area-averaged change over a small region; here we use a box surrounding the Bay Area.

Note that here we analyze relative changes (expressed as a percent, i.e., 100 x (warming level - historical) / historical) rather than absolute changes. This is because models with smaller grid sizes (such as the Cal-Adapt downscaled models) generally produce higher extreme precipitation magnitudes than models with larger grid sizes (such as the CMIP6 models). These grid-size effects arise from differences in model physics and from surface effects tied to the representation of topography.

Get the data for the subregion (a lat/lon box around the Bay Area)

selections.timescale = 'monthly'
selections.area_subset = 'lat/lon'
selections.cached_area = ['coordinate selection']
selections.latitude = (37,38.5)
selections.longitude = (-123,-121.5)
selections.area_average = 'Yes'

reg_wrf_ds = selections.retrieve().squeeze()
reg_wrf_ds = reg_wrf_ds.sortby("simulation") # Sort simulations alphabetically
reg_wrf_ds["simulation"] = [wrf_cmip_lookup_dict[sim] for sim in reg_wrf_ds.simulation.values] # Rename simulations
reg_wrf_ds = reg_wrf_ds.clip(0.1)

Show maps of the relative change in 99th percentile monthly precipitation accumulation under warming conditions, with the subarea boxed for reference

lat0 = selections.latitude[0]
lat1 = selections.latitude[1]
lon0 = selections.longitude[0]
lon1 = selections.longitude[1]

cads_rel_delta_percentile = (
    (cads_delta_percentile / cads_hist_percentile
     )*100).compute()
lons = cads_rel_delta_percentile.lon[0]
lats = cads_rel_delta_percentile.lat[0]
num_simulations = len(cads_rel_delta_percentile.simulation.values)

cads_rel_diff_plots = make_precip_unc_map(
        data=cads_rel_delta_percentile.isel(simulation=0),
        title=(cads_rel_delta_percentile.isel(simulation=0
         ).simulation.item()), vmin=-60, vmax=60, sopt=True,
        ylim=(lat0,lat1), xlim=(lon0,lon1),
        yticks=[lat0,lat1], xticks=[lon0,lon1], absolute=False)
cads_rel_diff_plots.opts(clabel = '% change from present-day')

for sim_i in range(1, num_simulations): # plot remaining simulations
    pl_i = make_precip_unc_map(
        data=cads_rel_delta_percentile.isel(simulation=sim_i),
        title=(cads_rel_delta_percentile.isel(simulation=sim_i
        ).simulation.item()), vmin=-60, vmax=60, sopt=True,
     ylim=(lat0,lat1), xlim=(lon0,lon1),
    yticks=[lat0,lat1], xticks=[lon0,lon1], absolute=False)
    pl_i.opts(clabel = '% change from present-day')
    cads_rel_diff_plots += pl_i

# additional aesthetic settings to tidy figure
cads_rel_diff_plots.cols(4)  # organize columns
cads_rel_diff_plots.opts(hv.opts.Layout(merge_tools=True))  # merge toolbar
cads_rel_diff_plots.opts(toolbar="below")  # set toolbar location

box = hv.Bounds((lon0,lat0,lon1,lat1)).opts(color='k')
diff_with_box = cads_rel_diff_plots*box

diff_with_box.opts(
            title="Relative change in 99th percentile monthly precipitation accumulation"
            + "  across models under " + str(warm_level)+' C warming',
            toolbar="below" )

diff_with_box

Process data and compute statistics. As with the other cells which run statistics, this can take some time to run.

# first for the downscaled results
reg_hist_wrf = reg_wrf_ds.sel(
    time=slice('1981','2010'))

sim_idx = list(reg_wrf_ds.simulation.values)
reg_ssp_wrf_list = [get_warm_level(
            warm_level, reg_wrf_ds.sel(
            simulation=s).squeeze(), ipcc=False)
            for s in sim_idx]
reg_ssp_wrf_list = list(filter(lambda item:
                item is not None, reg_ssp_wrf_list))
reg_ssp_wrf = xr.concat(reg_ssp_wrf_list,dim='simulation')

reg_cads_hist_percentile = reg_hist_wrf.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

reg_cads_ssp_percentile = reg_ssp_wrf.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

reg_cads_delta_percentile = (((reg_cads_ssp_percentile
                              - reg_cads_hist_percentile)
                             /reg_cads_hist_percentile
                             )*100).compute()
# then for the GCMs
cae_rel_delta_percentile = (cae_delta_percentile
                            / hist_cae_percentile
                            *100).compute()
reg_cae_delta_percentile = _area_wgt_average(
                            cae_rel_delta_percentile.sel(
                            y=slice(lat0,lat1),
                            x=slice(lon0,lon1)))

Plot relative changes in 99th percentile precipitation accumulation for the Cal-Adapt GCM ensemble and downscaled Cal-Adapt members

bar_tops = []
bar_floors = []
line_pos = []
for sim in np.unique(
    reg_cads_delta_percentile.simulation.values):
    data = reg_cae_delta_percentile.where(
            reg_cae_delta_percentile.simulation==sim,
            drop=True)
    bar_tops.append(data.pr.max(
            dim="member_id",skipna=True).item())
    bar_floors.append(data.pr.min(
            dim="member_id",skipna=True).item())
    line_pos.append(data.pr.median(
            dim="member_id",skipna=True).item())

fig,ax = plt.subplots(figsize=(8,5))
plt.vlines(x=reg_cads_delta_percentile['simulation'],
        ymax=bar_tops,ymin=bar_floors,
        linewidths=20,color='#009292',
        alpha=.5,label="Ensemble range")
ax.scatter(reg_cads_delta_percentile['simulation'],
           reg_cads_delta_percentile,zorder=2,ec='k',fc='#ffff6d',
          s=200,label="Downscaled \n simulation")
ax.scatter(reg_cads_delta_percentile['simulation'],
           line_pos,zorder=2,ec='k',
          s=700,marker="_",label="Ensemble median")
ax.legend(ncol=1,bbox_to_anchor=(1.4, .9))
ax.set_title("Relative change in 99th percentile monthly"
             + " precipitation accumulation, subregion average")
ax.set_ylabel("% change",fontsize=14)
ax.set_xlabel("Model",fontsize=14)
plt.show()

Interpretation

  • The intra-model range (i.e., internal variability, represented by the shaded bars) for each model is larger than the differences between models; in other words, internal variability dominates model uncertainty.
  • Accordingly, we would expect a similar range in the response from the downscaled Cal-Adapt models if we had multiple ensemble members for each of them.
  • While we do not necessarily expect the downscaled change for a single member (yellow circles) to fall within the shaded bars, we would expect overlap between the shaded bars and the range of responses across ensemble members for each downscaled model, if such ensembles existed.
  • The range in responses, which results from both internal variability and model uncertainty, makes it difficult to use the data presented above to form actionable conclusions.

The next section shows a reasonable analysis technique in light of the uncertainty we have illustrated so far.

Step 2: Handling uncertainty

Given that internal variability is large compared to model uncertainty (as shown above), and assuming that each model run adequately captures the processes relevant to extreme precipitation, we can treat each ensemble member of each model as an equally plausible representation of the climate. Under this treatment, we can pool the data from all simulations (which increases our sample size by a factor of four) and sample from the larger distribution.
selections.area_subset='states'
selections.cached_area = ['CA']
selections.area_average = 'No'
selections.timescale = 'monthly'

box_wrf_ds = selections.retrieve().squeeze()

box_wrf_ds = box_wrf_ds.sortby("simulation") # Sort simulations alphabetically
box_wrf_ds["simulation"] = [wrf_cmip_lookup_dict[sim] for sim in box_wrf_ds.simulation.values] # Rename simulations
box_wrf_ds = box_wrf_ds.clip(0.1) # Remove values less than 0.1
box_hist_wrf = box_wrf_ds.sel(
    time=slice('1981','2010'))

sim_idx = list(box_wrf_ds.simulation.values)
box_ssp_wrf_list = [get_warm_level(
            warm_level, box_wrf_ds.sel(
            simulation=s).squeeze(), ipcc=False)
            for s in sim_idx]

box_ssp_wrf_list = list(filter(lambda item: item is not None, box_ssp_wrf_list))
box_ssp_wrf = xr.concat(box_ssp_wrf_list,dim='simulation')

box_hist_wrf = box_hist_wrf.sel(simulation = box_ssp_wrf.simulation.values)

box_cads_hist_percentile = box_hist_wrf.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

box_cads_ssp_percentile = box_ssp_wrf.chunk(dict(
    time=-1)).quantile([.99],
    dim='time').compute().squeeze()

box_cads_delta_percentile = (((box_cads_ssp_percentile
                              - box_cads_hist_percentile)
                             /box_cads_hist_percentile
                             )*100).compute()

ssp_percentile_mmm = box_cads_ssp_percentile.mean(
    dim="simulation").compute().squeeze()
hist_percentile_mmm = box_cads_hist_percentile.mean(
    dim="simulation").compute().squeeze()
delta_percentile_mmm = (ssp_percentile_mmm
            - hist_percentile_mmm).compute().squeeze()
hist_wrf_pool = box_hist_wrf.stack(index=['simulation','time'])
ssp_wrf_pool = box_ssp_wrf.stack(index=['simulation','time'])

Here we perform basic statistics as before, as well as a Kolmogorov-Smirnov (KS) test between our present-day and warming-level data to determine statistical significance (p < 0.05). The KS test essentially answers the question, “How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?” (via Wikipedia). In other words, the difference is considered statistically significant when the two samples (in this case, the pooled present-day and warming-level monthly precipitation accumulations) are sufficiently different from one another.
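The stippling overlaid on the maps below is built from the output of get_ks_pval_df, which applies this comparison to the pooled data. As a self-contained illustration of the test itself (synthetic samples, not the retrieved data), assuming scipy is available in the environment:

from scipy.stats import ks_2samp
import numpy as np

# Illustration only: compare a synthetic "present-day" and "warming-level" sample
# of pooled monthly accumulations (4 simulations x 360 months each)
rng = np.random.default_rng(42)
present = rng.gamma(shape=2.0, scale=30.0, size=1440)
future = rng.gamma(shape=2.0, scale=36.0, size=1440)

res = ks_2samp(present, future)
print(f"KS statistic = {res.statistic:.3f}, p-value = {res.pvalue:.4f}")
print("significant at p < 0.05:", res.pvalue < 0.05)  # points like this get stippled on the maps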

There is a lot of computation here, and it will take a few minutes to run.

hist_wrf_pool_perc = hist_wrf_pool.chunk(
    dict(index=-1)).quantile([.99],
    dim='index').compute().squeeze()

ssp_wrf_pool_perc = ssp_wrf_pool.chunk(
    dict(index=-1)).quantile([.99],
    dim='index').compute().squeeze()

delta_wrf_pool_perc = (ssp_wrf_pool_perc
                       - hist_wrf_pool_perc
                      ).compute()

hist_wrf_pool = hist_wrf_pool.compute()
ssp_wrf_pool = ssp_wrf_pool.compute()
pooled_p_df = get_ks_pval_df(hist_wrf_pool, ssp_wrf_pool)

hist_wrf_mmm = box_hist_wrf.mean(dim='simulation').rename({'time':'index'}).compute()
ssp_wrf_mmm = box_ssp_wrf.mean(dim='simulation').rename({'time':'index'}).compute()
mmm_p_df = get_ks_pval_df(hist_wrf_mmm, ssp_wrf_mmm)

Get ready to make the relevant plots

vmin = 0
vmax = 2000
delta_vmin = None
delta_vmax = None

mmm_hist_plot = make_precip_unc_map(
        data=hist_percentile_mmm, title=("Multi-model mean"),
        vmin=vmin, vmax=vmax, sopt=False,
        height=300, width=300)

mmm_ssp_plot = make_precip_unc_map(
        data=ssp_percentile_mmm, title=("Multi-model mean"),
        vmin=vmin, vmax=vmax, sopt=False,
        height=300, width=300)

mmm_diff_plot = make_precip_unc_map(
        data=delta_percentile_mmm, title=("Multi-model mean"),
        vmin=delta_vmin, vmax=delta_vmax, sopt=True,
        height=300, width=300)

pool_hist_plot = make_precip_unc_map(
        data=hist_wrf_pool_perc, title=("Multi-model pool"),
        vmin=vmin, vmax=vmax, sopt=False,
        height=300, width=300)

pool_ssp_plot = make_precip_unc_map(
        data=ssp_wrf_pool_perc, title=("Multi-model pool"),
        vmin=vmin, vmax=vmax, sopt=False,
        height=300, width=300)

pool_diff_plot = make_precip_unc_map(
        data=delta_wrf_pool_perc, title=("Multi-model pool"),
        vmin=delta_vmin, vmax=delta_vmax, sopt=True,
        height=300, width=300)

pool_sig_diff = pool_diff_plot*hv.Points(
        pooled_p_df).opts(color='k',
        marker='dot', size=5)

mmm_sig_diff = mmm_diff_plot*hv.Points(
        mmm_p_df).opts(color='k',
        marker='dot', size=5)

pooled_hist = mmm_hist_plot + pool_hist_plot
pooled_ssp = mmm_ssp_plot + pool_ssp_plot
pooled_diff = mmm_diff_plot + pool_diff_plot
pooled_stippled = mmm_sig_diff + pool_sig_diff

pool_cads_tabs = pn.Tabs(
                ('Present-day',pooled_hist),
                ('+ '+ str(warm_level)+' C',pooled_ssp),
                ('Difference',pooled_diff),
                ('Difference, with significance',
                 pooled_stippled), dynamic=False)

Show maps of the results of taking the multi-model mean and pooling the data across models:

  • Present-day (1981 - 2010) 99th percentile monthly precipitation accumulation
  • 99th percentile monthly precipitation accumulation under warming conditions (+/- 15 years from the year in which the warming threshold was reached)
  • The change in 99th percentile monthly precipitation realized under those warming conditions

We also highlight the points where the difference is statistically significant (p<0.05) via the KS test.

pool_cads_tabs

Interpretation

Differences between the results obtained via the multi-model mean and the multi-model pool are subtle, and many points show a statistically significant difference in both the mean and the pooled results. This might suggest that either method is acceptable. However, extreme events are rare (i.e., they give us a very small sample size), there are large differences between models (Section 1a), and internal variability is even more substantial (Section 1d). In light of this, statistical significance derived from the multi-model mean can lead to overconfidence in the results. Instead, the available data should be pooled to increase the sample size.

Summary and key points

As discussed in Section 1b, the model uncertainty shown in this notebook arises from both actual model differences and internal variability.

  • The uncertainty related to these sources is exacerbated by small sample size.
  • The ‘Difference’ results in Section 1a show that cross-model results can conflict with one another, with some models showing marked decreases in extreme precipitation where others show large increases. This makes it difficult to use these projections for impacts research or for planning and preparedness purposes.

Here we overcame some of these issues by pooling the data across models before computing statistics, a strategy which provided the following advantages:

  • Increased the underlying sample size from which to derive statistics.
  • Provided interpretable results which are useful from an impacts perspective.

Want to know more about model uncertainty? Check out the model_uncertainty.ipynb notebook too!