AE Analytics
Methods
Fetching Global Warming Level data in the Analytics Engine
This section outlines the steps a user takes to access data on global warming levels (GWLs) using climakitae. The tools in the AE platform can calculate global warming levels for dynamically downscaled (WRF) or statistically downscaled (LOCA) data using the same methods. For more information on how to select a dataset for an application of interest, refer to the Guidance section on the AE website titled Using Climate Data in Decision-Making.
The specific methodology and calculations underlying the generation of GWL data are covered in the next section.
California climate data on GWLs can be retrieved in two ways depending on the user’s preferred workflow:
1. Use the get_data() function in climakitae
2. Use a GUI that visually shows all available options
Option 1: Retrieve WLs directly using the get_data() function
This method uses a simple function from the data_interface module in climakitae to retrieve warming levels data directly with a Python function; no GUI is required. This might be a good approach for a user who already knows what data they want and doesn’t need to examine the full set of options. More information and additional options for get_data() are available in the climakitae_direct_data_download.ipynb notebook.
Some basic information on how to use this method is shown below. Users are encouraged to work through the notebook climakitae_direct_data_download.ipynb in the Analytics Engine for more details on how this function works.
1. Import the function from the data_interface module:
from climakitae.core.data_interface import get_data
2. Optional: Print the function docstring to see the possible inputs. Note that some arguments are required (variable, downscaling_method, resolution, and timescale), and some are ignored for a warming level approach (time_slice and scenario).
print(get_data.__doc__)
3. Set the arguments to retrieve warming levels data:
get_data(
    variable="Precipitation (total)",
    downscaling_method="Dynamical",
    resolution="45 km",
    timescale="monthly",
    approach="Warming Level",
    warming_level_window=10,
    warming_level=[2.0, 2.5, 3.0, 4.0],
)
Make sure the argument approach is set to “Warming Level”; the function defaults to a time-based approach. The argument warming_level_window defaults to 15 (years on either side, i.e. a 30-year window), and the argument warming_level defaults to 2.0 (2 deg C).
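As a minimal sketch of the defaults described above, the helper below assembles keyword arguments for a warming-level request. The name warming_level_kwargs is hypothetical and not part of climakitae; it only builds a dictionary from the argument names shown in the example call, and wrapping the default warming level in a list is an assumption made to match that example.

```python
def warming_level_kwargs(variable, downscaling_method, resolution, timescale,
                         warming_level=None, warming_level_window=15):
    """Build keyword arguments for a warming-level request (illustrative only)."""
    if warming_level is None:
        # Documented default is 2.0 deg C; a list is assumed here.
        warming_level = [2.0]
    return {
        "variable": variable,
        "downscaling_method": downscaling_method,
        "resolution": resolution,
        "timescale": timescale,
        # Set explicitly, because the function defaults to a time-based approach.
        "approach": "Warming Level",
        "warming_level_window": warming_level_window,
        "warming_level": list(warming_level),
    }


kwargs = warming_level_kwargs("Precipitation (total)", "Dynamical", "45 km", "monthly")
print(kwargs["approach"], kwargs["warming_level_window"], kwargs["warming_level"])
```

If climakitae is installed, a dictionary like this could presumably be passed on as get_data(**kwargs); that call is left out here so the sketch runs without the package.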
Option 2: Retrieving WLs through the GUI
To visualize the available data options and minimize the amount of coding, a GUI-based approach is also provided. To display the selections GUI in a notebook, run the following lines of code (which are also listed in the getting_started.ipynb notebook):
import climakitae as ck
import climakitaegui as ckg
selections = ckg.Select()
selections.show()
After the last line is executed, the following panel pops up:

The two important sections that need to be defined before retrieving WL data are boxed in red. Similarly to the manual parameter definition above, the “Approach” must first be set to “Warming Level”, then the desired time window (“Years around GWL”) and desired warming levels can be selected (1.5 deg C - 4.0 deg C).
After making these selections, running data = selections.retrieve() in a following cell will load a warming levels data object. The next section shows how this data is structured and how to interact with it.
Working with the WL data object
The warming levels object will look like the following:

The dimensions can be interpreted as follows:
- warming_level: the warming levels contained in the data object. For example, if warming levels of 2.0 and 2.5 were requested, both appear along this dimension.
- time_delta: the number of time steps from the central warming level year. With monthly data and a 30-year window (warming_level_window = 15), there are 360 time steps (30 years x 12 months), labeled with coordinate values from -180 to 179; time_delta = 0 marks the year in which the climate simulation reaches the specified global warming level.
- y, x: spatial dimensions for WRF data.
- simulation: the simulation names grabbed for this set of parameters, where the names are listed as: [Downscaling_Method]_[Model]_[Ensemble Member]_[Historical Data Used]_[SSP]
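The time_delta labelling described above comes down to simple arithmetic. The sketch below assumes the window is given as years on either side of the central year and that labels run from minus half the number of steps up to, but not including, plus half; time_delta_coords is a hypothetical helper, not a climakitae function.

```python
def time_delta_coords(warming_level_window, steps_per_year):
    """Return time_delta labels for a window of +/- warming_level_window years."""
    n_steps = 2 * warming_level_window * steps_per_year
    half = n_steps // 2
    return list(range(-half, half))


# Monthly data with 15 years on either side: 30 years x 12 months.
monthly = time_delta_coords(warming_level_window=15, steps_per_year=12)
print(len(monthly), monthly[0], monthly[-1])  # 360 -180 179
```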
There may be warnings that pop up when using warming level data that look like the following:

These are only printed to illustrate the limitations of the data object returned, as different warming levels may have a different number of simulations that reach that given warming level.
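To see why such warnings arise, it can help to count how many simulations actually contain data at each warming level. The sketch below uses made-up values in plain Python rather than the real data object, with NaN standing in for a simulation that never reaches a level.

```python
import math

# Hypothetical values: one entry per simulation, NaN where the simulation
# never reaches that warming level.
values_by_level = {
    2.0: [1.2, 0.8, 1.1, 0.9],
    3.0: [1.5, math.nan, 1.3, 1.0],
    4.0: [math.nan, math.nan, 1.9, math.nan],
}


def count_valid(values):
    """Count the entries that are not NaN."""
    return sum(1 for v in values if not math.isnan(v))


valid_counts = {level: count_valid(vals) for level, vals in values_by_level.items()}
print(valid_counts)  # {2.0: 4, 3.0: 3, 4.0: 1}
```

Statistics computed across simulations at higher warming levels therefore rest on fewer ensemble members.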
How California-focused Warming Level data is Calculated
This section outlines the methodology for calculating data on GWL in the Analytics Engine. These are the calculations that take place “under the hood” (i.e., in the climakitae code for AE) when you retrieve GWL data as described in the previous section. The methods described here follow the approach used by the IPCC AR6 report as closely as possible.
Calculating GWL on the Analytics Engine is a two-step process:
1. Generate the GWL lookup tables for models
For each global climate model (GCM) simulation in the CMIP6 archive, the average global temperature increase relative to pre-industrial conditions (1850-1900) is calculated for each year. This time series of global warming is smoothed with a 20-year running average and used to create a lookup table that records when each simulation reaches a given global warming level.
2. Retrieve the Analytics Engine model data at selected GWL years
Using the years that each global simulation reaches a specified warming level as determined by the GWL lookup table in step 1, the corresponding years of data are taken from each regionally downscaled climate simulation. A slice of the time series (typically 30 years) is taken from each simulation, centered on the year that simulation reaches the specified warming level. This data therefore represents the estimated regional climate impacts that will be felt at a given level of global warming.
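The two steps can be illustrated end to end with synthetic data. The sketch below is not the climakitae implementation: the warming series is invented, and the centering convention of the 20-year window (the value labelled year y averages y-10 through y+9) is an assumption. It shows that a single hot year can cross 2.0 deg C in the raw series long before the smoothed series does.

```python
def running_mean_centered(years, values, window=20):
    """Centered running mean; years without a full window are dropped."""
    half = window // 2
    smoothed = {}
    for i in range(half, len(values) - half + 1):
        chunk = values[i - half:i + half]
        smoothed[years[i]] = sum(chunk) / window
    return smoothed


def first_crossing(series, level):
    """Return the first year whose value reaches the level, or None."""
    for year in sorted(series):
        if series[year] >= level:
            return year
    return None


# Synthetic anomaly: flat until 1900, then 0.02 deg C per year.
years = list(range(1850, 2101))
values = [max(0.0, 0.02 * (y - 1900)) for y in years]
values[years.index(1980)] += 1.0  # one unusually hot year

crossing_raw = first_crossing(dict(zip(years, values)), 2.0)
crossing_smooth = first_crossing(running_mean_centered(years, values), 2.0)
print(crossing_raw, crossing_smooth)  # 1980 2001

# Step 2: take a 30-year slice centered on the smoothed crossing year.
window_years = list(range(crossing_smooth - 15, crossing_smooth + 15))
print(window_years[0], window_years[-1])  # 1986 2015
```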
How these two steps are implemented in the Analytics Engine is described in detail below:
Step 1: Generate the GWL lookup tables for models
The GWL lookup table captures what year each global climate simulation reaches each warming level. This table is pre-generated in the climakitae repository, and is only updated if changes are made to the methodology or new warming levels are added to the platform. No actions are required by the user for this step, but the section is provided as technical documentation and outlines the method used to generate this table for transparency.
1. From the CMIP6 catalog, all CMIP6 models and their ensemble members are selected via the Pangeo CMIP6 CSV.
2. Global average surface air temperature for each ensemble member is then calculated as a spatially weighted average of the tas (or appropriate surface air temperature) variable over all grid cells around the world. The weighting accounts for the fact that grid cells towards the poles are smaller than grid cells near the equator:
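import numpy as np
# lat and lon are assumed to hold the names of the latitude and longitude dimensions
weightlat = np.sqrt(np.cos(np.deg2rad(ensemble_mem[lat])))
weightlat = weightlat / np.sum(weightlat)  # normalize the weights to sum to 1
timeseries = (ensemble_mem * weightlat).sum(lat).mean(lon)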
3. Each time series is smoothed with a 20-year running average window. Then, the month/year in which each time series first exceeds a given degree of warming (1.5, 2.0, 2.5, 3.0, or 4.0) relative to the average temperature of a given reference period is computed and saved per model in a GWL lookup table, with the model as the index.
4. The lookup tables are saved in the data directory of the climakitae repository:
a. gwl_1850-1900ref.csv uses the reference period 1850-1900, consistent with the IPCC warming level definitions. This reference period cannot be used to calculate anomalies (differences from a historical baseline), because the downscaled data only extends back to 1950.
b. gwl_1980-2010ref.csv uses the reference period 1980-2010, and is only used when calculating anomalies.
c. A 20-year running average window is used to determine the “crossing year” for each GWL, defined as the center of the window. This ensures that a single unusually warm year does not skew the results when the overall average temperature trend has not yet reached the warming level.
5. Additionally, the GWL at each month is saved for each ensemble member from 1860-2090 into gwl_1850-1900ref_timeidx.csv and gwl_1980-2010ref_timeidx.csv. These act as translations between time and WLs.
a. These files have time as the index, whereas the files generated in Step 3 have the model as the index.
b. The time range of these files is 1860-2090 because the 20-year running average window clips the first and last 10 years.
6. The above steps are calculated for all CMIP6 models. The CESM2-LENS data is processed separately due to a slightly different data structure, but follows the same methodology.
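The latitude weighting in step 2 can be tried on a tiny grid without xarray. The sketch below mirrors the three-line formula above (including the square root of the cosine) using plain Python lists; global_mean and the sample temperatures are hypothetical and only meant to show the effect of the weights.

```python
import math


def global_mean(field, lats):
    """Weighted global mean of a 2-D field stored as field[i_lat][i_lon]."""
    weights = [math.sqrt(math.cos(math.radians(lat))) for lat in lats]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1
    n_lon = len(field[0])
    # Weighted sum over latitude for each longitude, then mean over longitude.
    column_sums = [
        sum(w * row[j] for w, row in zip(weights, field))
        for j in range(n_lon)
    ]
    return sum(column_sums) / n_lon


lats = [-60.0, 0.0, 60.0]
field = [
    [270.0, 270.0],  # 60S
    [300.0, 300.0],  # equator
    [270.0, 270.0],  # 60N
]
weighted = global_mean(field, lats)
unweighted = sum(sum(row) for row in field) / 6
print(round(weighted, 2), unweighted)
```

Because the equatorial row carries more weight than the high-latitude rows, the weighted mean is higher than the simple average of 280.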
Example of GWL lookup tables
WL-based indexing (Step 1, part 3): gwl_1850-1900ref.csv

Time-based indexing (Step 1, part 5): gwl_1850-1900ref_timeidx.csv

Step 2: Retrieval of the Analytics Engine model data at selected GWLs
When data on warming levels is retrieved using get_data() or the Select GUI, the following procedure is run:
For each warming level:
For each simulation within our AE WRF/LOCA2-Hybrid catalog (depending on downscaling method):
A. Find the centered year in which that simulation passes this global warming level, using the table generated in step 3 of the Global Warming Level calculations (gwl_1850-1900ref.csv).
B. Slice the window (i.e. +/- 15 years) around the centered year from step A for the current simulation.
C. Filter for desired months and remove leap days.
D. Reset the time index so that all simulations can be stacked on top of each other.
a. Change timestamps to timedeltas with centered_year coordinates.
b. For example, a 30-year simulation with monthly frequency data from 2010-2040 is transformed into a dataset with time-deltas from -180 to 179 (360 months in 30 years) with an added centered_year coordinate of 2025.
c. The time dimension is now called time_delta because it represents the time distance from the central year.
E. Simulations that don’t reach this given warming level are set to NaN.
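The procedure above can be sketched in plain Python with annual toy data. This is an illustration under stated assumptions, not the climakitae code: the lookup table, simulation names, and slice_at_level helper are all hypothetical, and the month filtering and leap-day removal of step C are skipped because the toy data is annual.

```python
import math

# Hypothetical lookup table: simulation -> {warming level: crossing year or None}.
gwl_lookup = {
    "sim_a": {2.0: 2041, 3.0: 2068},
    "sim_b": {2.0: 2049, 3.0: None},  # never reaches 3.0
}


def annual_series(start=1980, end=2100):
    """Toy annual data: the value is simply the year itself."""
    return {year: float(year) for year in range(start, end + 1)}


simulations = {name: annual_series() for name in gwl_lookup}


def slice_at_level(series, centered_year, window=15):
    """Return {time_delta: value} for +/- window years around centered_year."""
    if centered_year is None:
        # Step E: simulations that never reach the level are filled with NaN.
        return {delta: math.nan for delta in range(-window, window)}
    # Steps B and D: slice the window and re-index it by distance from the center.
    return {
        delta: series.get(centered_year + delta, math.nan)
        for delta in range(-window, window)
    }


# Outer loop over warming levels, inner loop over simulations.
stacked = {
    level: {
        name: slice_at_level(simulations[name], gwl_lookup[name][level])
        for name in simulations
    }
    for level in (2.0, 3.0)
}

print(stacked[2.0]["sim_a"][0], stacked[2.0]["sim_b"][0])  # 2041.0 2049.0
```

Because every slice shares the same time_delta labels (-15 to 14 here), simulations that cross a level in different calendar years line up on a common axis.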
Now, you will be able to view your warming level data through the data object.
