About the Data

What climate data does the Analytics Engine provide?

The Analytics Engine hosts climate change projections and related data, much of which was generated for California's Fifth Climate Change Assessment. The derived projections were created using two methods: 1) dynamical downscaling and 2) hybrid-statistical downscaling, both applied to outputs of global climate models (GCMs) from CMIP6 (Coupled Model Intercomparison Project Phase 6).

California academic institutions implement dynamical and hybrid-statistical downscaling methods in support of California Energy Commission (CEC) initiatives. The dynamical downscaling data developed at UCLA utilized the Weather Research & Forecasting (WRF) model, producing several physically based projections that are available for public use and for use on the Analytics Engine. This dynamically downscaled data was then used to train the Localized Constructed Analogs (LOCA2) method at Scripps Institution of Oceanography. The LOCA2 hybrid-statistical datasets are bias-corrected GCM outputs adjusted to be consistent with station observations. The WRF and LOCA2 downscaled datasets differ in the number of GCMs, variables, temporal resolutions, and Shared Socioeconomic Pathways (SSPs) they include. Depending on the analysis desired, one dataset may be preferred over the other; this is covered in the Analytics Engine Guidance.

A summary of the datasets hosted on the Analytics Engine is provided in the table below; they can be viewed in detail in the Data Catalog. For more information on how to access these datasets, see the Accessing Data section. For additional details on dynamical versus statistical downscaling approaches, see the California Climate Change Assessment Justification Memos, as well as our forthcoming About Climate Models and Data and Glossary of Terms sections. To understand how to use WRF and LOCA2 data to answer application-specific questions for decision-making and planning purposes, refer to our forthcoming Guidance section.

What additional data does the Analytics Engine provide?

In addition to hosting dynamical and hybrid-statistical downscaled climate datasets, the Analytics Engine also hosts vector datasets containing administrative boundaries, hydrologic boundaries, and key regions of interest to the energy sector. Within the Jupyter notebooks available on the Analytics Engine, the vector datasets below can be used to select, view, aggregate, and summarize climate data for a geography of interest:

  • California counties
  • Watersheds (HUC10)
  • California Electricity Demand Forecast Zones
  • California Electric Balancing Authority Areas
  • Investor- and public-owned electrical utility service territories
  • State boundaries

What derived variables and indices are available on the Analytics Engine?

In addition to the climate variables available on the Analytics Engine, several derived variables and indices may be of interest. From a computational standpoint, there is no difference between a derived variable and an index: both calculate a new variable from input variables. However, an index usually carries a "next step" of interpretation (e.g., a public safety warning may be issued when the Heat Index exceeds 80°F), while a derived variable simply expands the available variable list beyond the native variables within the data.

  • Specific humidity: the amount of water vapor within a unit amount of air. Specific humidity is only computed for hourly WRF data; LOCA2 provides specific humidity directly.

  • Heating and cooling degree days: a measure of how cold or warm a location is in reference to a standard temperature. Heating degree days are indicative of how much lower the ambient temperature is from the standard temperature; cooling degree days are indicative of how much higher the temperature is above the threshold temperature. A common standard temperature threshold for heating/cooling degree days is 65°F.

  • Heating and cooling degree hours: similar to heating and cooling degree days, but counted hourly. Heating degree hours are the number of hours below the reference temperature threshold, while cooling degree hours are the number of hours above it.

  • Heating and cooling degree days and hours are accessible through ClimaKitAE functions. A demonstration of their computation and interpretation can be found in the annual_consumption_model.ipynb Jupyter Notebook.
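The degree day and degree hour definitions above can be sketched in a few lines of plain Python. This is an illustrative example only, not the ClimaKitAE API; the function names and the 65°F base temperature follow the description above.

```python
BASE_TEMP_F = 65.0  # common standard temperature threshold for degree days/hours

def degree_days(daily_mean_temps_f):
    """Return (heating, cooling) degree days accumulated over a series of
    daily mean temperatures in °F."""
    hdd = sum(max(0.0, BASE_TEMP_F - t) for t in daily_mean_temps_f)
    cdd = sum(max(0.0, t - BASE_TEMP_F) for t in daily_mean_temps_f)
    return hdd, cdd

def degree_hours(hourly_temps_f):
    """Return (heating, cooling) degree hours: counts of hours below and
    above the base temperature."""
    heating = sum(1 for t in hourly_temps_f if t < BASE_TEMP_F)
    cooling = sum(1 for t in hourly_temps_f if t > BASE_TEMP_F)
    return heating, cooling

# Three daily means of 55, 65, and 75 °F yield 10 heating and 10 cooling degree days
hdd, cdd = degree_days([55.0, 65.0, 75.0])  # → (10.0, 10.0)
```

On the Analytics Engine itself, these quantities are computed for you by ClimaKitAE functions, as noted above.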

Derived Indices

  • Effective temperature: a measure that blends the current day’s air temperature with the prior day’s in order to account for consumer behavior and perception of the weather. It is calculated as half of the prior day’s temperature plus half of the current day’s temperature.

  • NOAA Heat Index: a measure of how hot weather “actually feels” on the body by accounting for air temperature and humidity.

  • The heat_index.ipynb Jupyter Notebook on the Analytics Engine provides a walkthrough that describes how to retrieve and interpret NOAA Heat Index values.

  • Fosberg Fire Weather Index: provides a quantitative assessment of weather impacts on fire management.
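As a concrete illustration of the effective-temperature calculation described above (half the prior day's temperature plus half the current day's), here is a minimal sketch in plain Python. The function name is an assumption for illustration; it is not part of ClimaKitAE.

```python
def effective_temperature(daily_temps):
    """Given a series of daily air temperatures, return the effective
    temperature for each day after the first: half the prior day's
    temperature plus half the current day's."""
    return [0.5 * prev + 0.5 * curr
            for prev, curr in zip(daily_temps, daily_temps[1:])]

# Days at 70, 80, and 60 °F give effective temperatures of 75 and 70 °F
effective_temperature([70.0, 80.0, 60.0])  # → [75.0, 70.0]
```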

Each of the derived variable and index options is available through the Select function within the Jupyter notebooks: choosing “Derived Index” in the “Choose Data Available with the Cal-Adapt Analytics Engine” panel displays these options.

[Image: Cal-Adapt: Analytics Engine data selection user interface]

Is the data on the Analytics Engine credible?

All of the downscaled data available on the Analytics Engine comes from models that have undergone rigorous skill evaluation in terms of how well they capture specific relevant characteristics of California’s climate. These models perform well for both process-based and local climate metrics (Dynamical Downscaling memo) and are able to simulate key physical processes and patterns that strongly influence the hydrological cycle and extreme weather in California. They are also able to capture local climatic patterns such as annual and seasonal temperature and precipitation patterns. This state-level model evaluation and assessment, described in Krantz et al. 2021, is unique to California and lends credibility to the data hosted on the Cal-Adapt: Analytics Engine.

Although all the models on the Analytics Engine are skilled at representing California’s climate, not all models perform equally well for every sub-region within California or for every metric, particularly ones the models were not specifically evaluated for. Therefore, the Analytics Engine also provides tools and guidance (see the forthcoming Guidance section) to help users conduct additional, context-specific credibility analyses, such as examining the skill of the data or models for their specific region, metric, and/or application.

Can users utilize external and/or private data on the Analytics Engine?

Yes, users are provided with a private directory space in the JupyterHub as a workspace associated with their account. Users retain this workspace between sessions, and any data uploaded to the workspace is inaccessible to other users.

How can users contribute data or code to the Analytics Engine?

Contributing data

The Analytics Engine team welcomes users to submit data for consideration for inclusion in the Data Catalog. Please submit a request via email to analytics@cal-adapt.org. These requests will be considered on a case-by-case basis depending on size, quality, and relationship to other funded work. All datasets must comply with CF conventions, as outlined in our Metadata Standards.

Contributing Python functions or Jupyter Notebooks

We welcome contributions of analyses using Analytics Engine data and ClimaKitAE Python library tools to demonstrate specific applications for climate data in California. Please see our contribution guidelines for guidance on contributing example analyses to the Analytics Engine.