About the Data

What climate data does the Analytics Engine provide?

The Analytics Engine hosts climate change projections and related data, much of which was generated for California's Fifth Climate Change Assessment. The derived projections were created using two methods: 1) dynamical downscaling and 2) hybrid-statistical downscaling, both applied to outputs of global climate models (GCMs) from CMIP6 (Coupled Model Intercomparison Project Phase 6).

California academic institutions implement dynamical and hybrid-statistical downscaling methods in support of California Energy Commission (CEC) initiatives. The dynamically downscaled data developed at UCLA used the Weather Research & Forecasting Model (WRF). This method produced several physically based projections that are available for public use and for use on the Analytics Engine. The dynamically downscaled data were then used to train the Localized Constructed Analogs (LOCA2-Hybrid) method at Scripps Institution of Oceanography. The LOCA2-Hybrid hybrid-statistical datasets are bias-corrected GCM outputs that are adjusted to be consistent with station observations. WRF and LOCA2-Hybrid downscaled datasets differ in the number of GCMs, variables, temporal resolution, and Shared Socioeconomic Pathways (SSPs) that are included. Depending on the desired analysis, one dataset might be preferred; this is covered in the Analytics Engine Guidance.

A summary of the datasets hosted on the Analytics Engine is listed in the table below; they can be viewed in detail in the Data Catalog. For more information on how to access these datasets, see the Accessing Data section. For additional details on Dynamical versus Statistical downscaling approaches, see the California Climate Change Assessment Justification Memos, as well as our forthcoming About Climate Models and Data and Glossary of Terms sections. To understand how to use WRF and LOCA2-Hybrid data to answer application-specific questions for decision-making and planning purposes, refer to our forthcoming Guidance section. For further information on these introductory comments, refer to:

How are datasets defined and used in the Analytics Engine?

Downscaled simulations of California’s climate

The climate projections hosted on the Cal-Adapt: Analytics Engine are regional simulations over the Western United States and California, produced by downscaling Global Climate Models (GCMs) from the sixth phase of the Coupled Model Intercomparison Project (CMIP6). These downscaled simulations were created in support of California's Fifth Climate Change Assessment, and detailed information about the models used and the data produced can be found in the Data section.

Additional data sources for calculating data on Global Warming Levels

When using the Analytics Engine to calculate climate variables at global warming levels, data from two additional sources are utilized in the calculations: the CMIP6 archive hosted by Pangeo, and data from the IPCC AR6 Report. A detailed methodology of how this data is used will be included in an upcoming section “How California-focused Warming Level Time Series are Calculated”, and additional guidance on how to incorporate global warming levels into planning is given here.

Warming level years from global GCM runs

Global warming levels are defined as the average increase in global surface air temperature relative to pre-industrial conditions (1850-1900). Combining data from several climate simulations around the years in which each one reaches a specified global warming level provides a clear standard for measuring how regional impacts will scale with different levels of global warming. Because the WRF and LOCA2-Hybrid simulations in the Analytics Engine are run only over the Western US, they cannot be used to evaluate global temperature, so data from the parent GCM simulations are used to determine the range of years from each simulation that corresponds to a given global warming level.
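The idea of locating the years in which a simulation reaches a warming level can be illustrated with a minimal sketch. It is not the Analytics Engine's implementation: the function name, the 20-year averaging window, and the synthetic temperature series are all assumptions made for illustration, and only the 1850-1900 baseline comes from the definition above.

```python
from statistics import mean


def warming_level_crossing_year(years, temps, level, baseline=(1850, 1900), window=20):
    """Return the central year of the first `window`-year period whose mean
    anomaly (relative to the baseline mean) reaches `level`, or None.

    Assumes `years` is a contiguous list of annual steps and `temps` holds the
    matching global mean surface air temperatures. The window length is an
    illustrative assumption, not a documented Analytics Engine setting.
    """
    # Mean temperature over the pre-industrial baseline period (inclusive).
    base_vals = [t for y, t in zip(years, temps) if baseline[0] <= y <= baseline[1]]
    if not base_vals:
        raise ValueError("series does not cover the baseline period")
    base = mean(base_vals)

    # Warming relative to pre-industrial conditions.
    anomalies = [t - base for t in temps]

    # Slide the averaging window forward and stop at the first crossing.
    for start in range(len(anomalies) - window + 1):
        if mean(anomalies[start:start + window]) >= level:
            return years[start + window // 2]
    return None


# Synthetic series (hypothetical): flat until 1950, then warming 0.03 °C per year.
years = list(range(1850, 2101))
temps = [14.0 + max(0, y - 1950) * 0.03 for y in years]

crossing_2c = warming_level_crossing_year(years, temps, level=2.0)
never_reached = warming_level_crossing_year(years, temps, level=10.0)
```

Averaging over a multi-year window rather than testing single years keeps year-to-year variability from triggering an early crossing; the years inside the selected window are then the ones that would be pooled across simulations.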

For this calculation, the Analytics Engine utilizes a publicly available archive of CMIP6 simulations hosted on AWS by the Pangeo project.

GWL timing for scenarios overall

One benefit of analyzing climate change projections by global warming level is that the years in which each warming level is reached can be estimated independently of the trajectories of the GCMs themselves. This is particularly beneficial because the CMIP6 models are known to have a wide range in climate sensitivity, which introduces significant uncertainty into estimates of when each warming level will be reached.

Because of this, the Intergovernmental Panel on Climate Change (IPCC) draws on data from a variety of other scientific studies, including observational constraints and climate emulators, to produce a refined estimate of likely warming trajectories under each climate scenario. The inputs to these likely trajectories are described in Chapter 4 of the IPCC Sixth Assessment Report (AR6), and an overview of the resulting best estimates and ranges is provided in the Technical Summary.

When using warming levels, the Analytics Engine provides estimates for the crossing year and very likely range (90% confidence interval) based on the publicly available data used in the IPCC report.

Figure 1. Best estimate and 90% confidence interval for the year the world will reach 2.0 °C of warming under the SSP 3-7.0 scenario. The median year of crossing is denoted by the vertical dashed line and is provided. The solid lines represent the first and last occurrence of the 5th and 95th percentiles crossing the designated warming level, with the range in years provided. For example, the median year of GCMs reaching 2.0°C of warming is in 2047, with a range of 24 years between the first and last occurrence. Figure produced by the Cal-Adapt: Analytics Engine based on data from the IPCC.

 

Global warming level year exceedance and range (surface air temperature increase relative to 1850-1900)
Scenario | 1.5°C Best Estimate | 1.5°C Very Likely Range (5-95%) | 2.0°C Best Estimate | 2.0°C Very Likely Range (5-95%) | 2.5°C Best Estimate | 2.5°C Very Likely Range (5-95%) | 3.0°C Best Estimate | 3.0°C Very Likely Range (5-95%)
SSP 1-2.6 | 2033 | 2024-2100+ | - | - | - | - | - | -
SSP 2-4.5 | 2031 | 2024-2043 | 2053 | 2039-2081 | 2080 | 2054-2100+ | - | -
SSP 3-7.0 | 2031 | 2024-2041 | 2047 | 2037-2061 | 2062 | 2049-2080 | 2076 | 2060-2097
SSP 5-8.5 | 2028 | 2022-2037 | 2042 | 2034-2054 | 2054 | 2044-2069 | 2065 | 2053-2083
Table 1. Summary of best estimates and ranges for reaching global warming levels under each SSP scenario. Summarized from data from the IPCC.
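As a rough illustration of how the values in Table 1 might be used programmatically, the sketch below stores them in a dictionary and reports the best estimate together with the width of the very likely range. The dictionary layout and the `crossing_summary` helper are hypothetical and are not part of ClimaKitAE; only the years themselves are taken from the table.

```python
# (best estimate, 5th percentile, 95th percentile) crossing years from Table 1.
# An upper bound of None stands for "2100+"; entries shown as "-" in the
# table are omitted.
CROSSING_YEARS = {
    "SSP 1-2.6": {1.5: (2033, 2024, None)},
    "SSP 2-4.5": {1.5: (2031, 2024, 2043), 2.0: (2053, 2039, 2081),
                  2.5: (2080, 2054, None)},
    "SSP 3-7.0": {1.5: (2031, 2024, 2041), 2.0: (2047, 2037, 2061),
                  2.5: (2062, 2049, 2080), 3.0: (2076, 2060, 2097)},
    "SSP 5-8.5": {1.5: (2028, 2022, 2037), 2.0: (2042, 2034, 2054),
                  2.5: (2054, 2044, 2069), 3.0: (2065, 2053, 2083)},
}


def crossing_summary(scenario, level):
    """Return the best estimate, very likely range, and range width in years,
    or None when the table has no entry for this scenario and level."""
    entry = CROSSING_YEARS.get(scenario, {}).get(level)
    if entry is None:
        return None
    best, low, high = entry
    # The width is undefined when the upper bound is the open-ended "2100+".
    width = None if high is None else high - low
    return {"best_estimate": best, "range": (low, high), "range_width_years": width}


summary = crossing_summary("SSP 3-7.0", 2.0)
```

For SSP 3-7.0 at 2.0°C this gives a best estimate of 2047 and a range width of 2061 - 2037 = 24 years, matching the example in the Figure 1 caption.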

What additional data does the Analytics Engine provide?

In addition to hosting dynamical and hybrid-statistical downscaled climate datasets, the Analytics Engine also hosts vector datasets containing administrative boundaries, hydrologic boundaries, and key regions of interest to the energy sector. Within the Jupyter notebooks available on the Analytics Engine, the below vector datasets can be used to select, view, aggregate, and summarize climate data for a geography of interest:

  • California counties
  • Watersheds (HUC10)
  • California Electricity Demand Forecast Zones
  • California Electric Balancing Authority Areas
  • Investor- and public-owned electrical utility service territories
  • State boundaries

What derived variables and indices are available on the Analytics Engine?

In addition to the climate variables available on the Analytics Engine, several derived variables and indices may be of interest. From a computational standpoint, a derived variable and an index are constructed the same way: both calculate a new variable from input variables. However, an index usually has a “next step of interpretation” (e.g. a public safety warning may be issued when Heat Index exceeds 80°F). A derived variable expands the available variable list beyond the native variables within the data.

  • Specific humidity: the amount of water vapor within a unit amount of air. Specific humidity is only computed for hourly WRF data; LOCA2-Hybrid provides specific humidity directly.

  • Heating and cooling degree days: a measure of how cold or warm a location is relative to a standard temperature threshold. Heating degree days indicate how far the ambient temperature falls below the threshold; cooling degree days indicate how far it rises above the threshold. A common threshold for heating and cooling degree days is 65°F.

  • Heating and cooling degree hours: similar to heating and cooling degree days, but counted in hours within each day. Heating degree hours are the number of hours in which the temperature is below the reference threshold, while cooling degree hours are the number of hours in which it is above the threshold.

  • Heating and cooling degree days and hours are accessible through ClimaKitAE functions. A demonstration of their computation and interpretation can be found in the annual_consumption_model.ipynb Jupyter Notebook.
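The degree-day and degree-hour definitions above can be sketched in a few lines. This is an illustration only: the function names are hypothetical and are not the ClimaKitAE API, the use of daily mean temperature for degree days is an assumption, and degree hours follow the description given here (a count of hours below or above the threshold).

```python
def degree_days(daily_mean_temps_f, threshold_f=65.0):
    """Sum heating and cooling degree days over a sequence of daily mean
    temperatures in °F (use of the daily mean is an assumption)."""
    # Heating: how far each day falls below the threshold.
    hdd = sum(max(0.0, threshold_f - t) for t in daily_mean_temps_f)
    # Cooling: how far each day rises above the threshold.
    cdd = sum(max(0.0, t - threshold_f) for t in daily_mean_temps_f)
    return hdd, cdd


def degree_hours(hourly_temps_f, threshold_f=65.0):
    """Count the hours below (heating) and above (cooling) the threshold,
    following the definition given in the text."""
    heating = sum(1 for t in hourly_temps_f if t < threshold_f)
    cooling = sum(1 for t in hourly_temps_f if t > threshold_f)
    return heating, cooling


# Three hypothetical days: one cool, one at the threshold, one warm.
hdd, cdd = degree_days([50.0, 65.0, 80.0])

# One hypothetical day of hourly values: 10 cool hours, 4 at the threshold, 10 warm.
heating_hours, cooling_hours = degree_hours([60.0] * 10 + [65.0] * 4 + [70.0] * 10)
```

Hours or days exactly at the threshold contribute to neither measure in this sketch.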

Derived Indices

  • Effective temperature: a comparison measure of the current day’s air temperature and the day prior in order to consider consumer behavior and perception of the weather. Calculated by taking half of the prior day’s temperature added to half of the current day’s temperature.

  • NOAA Heat Index: a measure of how hot weather “actually feels” on the body by accounting for air temperature and humidity.

  • The heat_index.ipynb Jupyter Notebook on the Analytics Engine provides a walkthrough that describes how to retrieve and interpret NOAA Heat Index values.

  • Fosberg Fire Weather Index: provides a quantitative assessment of weather impacts on fire management.
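Of the indices above, effective temperature has the simplest formula: half of the prior day's temperature plus half of the current day's. A minimal sketch follows; the function name and the choice to return `None` for the first day (which has no prior day) are assumptions for illustration, not ClimaKitAE behavior.

```python
def effective_temperature(daily_temps):
    """Return 0.5 * prior day + 0.5 * current day for each day.

    The first day has no prior day, so its value is None (an illustrative
    choice; a real implementation might drop or mask it instead).
    """
    if not daily_temps:
        return []
    result = [None]
    # Pair each day with the day before it.
    for previous, current in zip(daily_temps, daily_temps[1:]):
        result.append(0.5 * previous + 0.5 * current)
    return result


# Hypothetical daily temperatures in °F.
effective = effective_temperature([60.0, 70.0, 80.0])
```

Because each value blends two consecutive days, a sudden one-day spike is damped, which is the intended way of reflecting how people perceive and respond to the weather.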

Each of the derived variable and index options is available through the Select function within the Jupyter notebooks. In the “Choose Data Available with the Cal-Adapt Analytics Engine” panel that the Select function displays, choose “Derived Index” to access them.

Image of Cal-Adapt: Analytics Engine data selection user interface

Is the data on the Analytics Engine credible?

All of the downscaled data available on the Analytics Engine comes from models that have undergone rigorous skill evaluation in terms of how well they capture specific relevant characteristics of California’s climate. These models perform well for both process-based and local climate metrics (Dynamical Downscaling memo) and are able to simulate key physical processes and patterns that strongly influence the hydrological cycle and extreme weather in California. They are also able to capture local climatic patterns such as annual and seasonal temperature and precipitation patterns. This state-level model evaluation and assessment, described in Krantz et al. 2021, is unique to California and lends credibility to the data hosted on the Cal-Adapt: Analytics Engine.

Although all the models on the Analytics Engine are skilled at representing California’s climate, not all models perform equally well for any specific sub-region within California or for any given metric - particularly ones that the models were not specifically evaluated for. Therefore, the Analytics Engine also provides tools and guidance (see the forthcoming Guidance section) to help a user conduct additional, context-specific credibility analyses, such as examining the skill of the data or models for their specific region, metric, and/or application.

What data challenges are the Analytics Engine team currently aware of?

The Analytics Engine team is aware of the following data issues and will provide updates when available.

Can users utilize external and/or private data on the Analytics Engine?

Yes, users are provided with a private directory space in the JupyterHub as a workspace associated with their account. Users retain this workspace between sessions, and any data uploaded to the workspace is inaccessible to other users.

How can users contribute data or code to the Analytics Engine?

Contributing data

The Analytics Engine team welcomes users to submit data for consideration for inclusion in the Data Catalog. Please submit a request via email to analytics@cal-adapt.org. These requests will be considered on a case-by-case basis depending on size, quality, and relationship to other funded work. All datasets must comply with CF conventions, as outlined in our Metadata Standards.

Contributing Python functions or Jupyter Notebooks

We welcome contributions of analyses using Analytics Engine data and ClimaKitAE Python library tools to demonstrate specific applications for climate data in California. Please see our contribution guidelines for guidance on contributing example analyses to the Analytics Engine.