Glossary

This glossary is a non-exhaustive source of definitions for key terms and concepts related to the Analytics Engine and its associated tools and applications. It explains the data products, use cases, analytical tools, and methodologies used in climate data analysis, to support a user’s understanding and application of the information in the Analytics Engine. It is designed to complement existing extensive meteorological glossaries, such as the American Meteorological Society’s Glossary of Meteorology, with terminology specific to the Analytics Engine.

Annual Maximum Series (AMS)

A time series made up of the maximum observed value from each year of the original time series.
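
As an illustration, the sketch below builds an Annual Maximum Series from a daily record using pandas. The daily values are synthetic and the variable names are hypothetical; this is not taken from the Analytics Engine codebase.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering three full calendar years (illustrative data only).
dates = pd.date_range("2001-01-01", "2003-12-31", freq="D")
rng = np.random.default_rng(0)
daily = pd.Series(rng.gamma(shape=2.0, scale=5.0, size=len(dates)), index=dates)

# Annual Maximum Series: keep only the largest value within each calendar year.
ams = daily.groupby(daily.index.year).max()
```

The result has one entry per year, indexed by the year itself.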

Bias correction

Techniques applied to global climate model output to adjust for systematic errors (biases). For example, a bias adjustment can be applied to model output that underestimates extreme precipitation events or produces temperatures that are consistently too high. Model biases have several sources, including spatial resolutions too coarse to resolve all types of weather phenomena and model formulations that rely on simplifications of physical processes.
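
The simplest form of this idea can be sketched as an additive mean-bias adjustment: estimate the model's average error over a historical period with observations, then remove it from future output. The function name and the numbers are hypothetical, and this is only one of many possible techniques, not necessarily one used on the Analytics Engine.

```python
import numpy as np

def mean_bias_adjust(model_hist, obs_hist, model_future):
    """Subtract the model's historical mean bias from future model output."""
    bias = np.mean(model_hist) - np.mean(obs_hist)
    return np.asarray(model_future, dtype=float) - bias

# Illustrative values: the model runs 2 degrees too warm over the historical period.
obs_hist = np.array([14.0, 15.0, 16.0])
model_hist = np.array([16.0, 17.0, 18.0])
model_future = np.array([18.0, 19.0, 20.0])

adjusted = mean_bias_adjust(model_hist, obs_hist, model_future)
```

A mean shift like this leaves the shape of the distribution untouched, which is why quantile-based methods are often preferred when extremes matter.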

Block maximum series

A block maximum series is a time series of extreme values obtained by segmenting a longer time series into “blocks” of equal duration and taking the maximum value within each block. It generalizes the Annual Maximum Series, which is the special case where the block size is one year.
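
A minimal sketch of the segmentation step, assuming a regularly sampled series held in a NumPy array. The function name is hypothetical, and dropping an incomplete trailing block is a design choice made here for simplicity.

```python
import numpy as np

def block_maxima(values, block_size):
    """Split values into consecutive equal-length blocks and return each block's maximum."""
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size
    # Drop any incomplete trailing block so every block has the same duration.
    trimmed = values[: n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size).max(axis=1)

series = np.array([1, 5, 2, 7, 3, 4, 9, 0, 6, 8])
maxima = block_maxima(series, block_size=3)
```

Here the ten values form three complete blocks of three, and the final value is discarded.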

ClimaKitAE

An open-source Python library that contains functionality to work with cloud-optimized gridded climate data. It includes basic tools for creating data objects and visualizations, as well as more advanced tools that support various applications of climate data including analyzing extreme events, understanding regional responses at different global warming levels, and describing uncertainty across different simulations. ClimaKitAE stands for “climate toolkit Analytics Engine”.

CF conventions

Metadata conventions for formatting and documenting netCDF format data, to ensure standardized reporting and variable naming across different data sources. CF stands for “climate and forecast”. Data supported on the Analytics Engine adhere to CF conventions.

CMIP6

The Coupled Model Intercomparison Project, Phase 6 is a coordinated international effort to produce climate model output with consistent standards in areas such as variable naming and experimental design. This consistency allows different models to be compared to each other more easily. CMIP6 is the latest generation of global climate models (ca. 2020), used in the Intergovernmental Panel on Climate Change Sixth Assessment Report (IPCC AR6), and in California’s Fifth Climate Change Assessment.

Derived Variable

Derived variables are variables that are computed from one or more other variables. An example is “heating degree days” which uses the air temperature variable as an input. Derived variables expand the available variable list beyond the native variables available within the Analytics Engine data sources.
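
For instance, heating degree days can be sketched as the accumulated shortfall of daily mean temperature below a base temperature. The 65°F base used here is a common convention and an assumption of this example; the function name is hypothetical and the Analytics Engine's own definition may differ.

```python
import numpy as np

def heating_degree_days(daily_mean_temp_f, base_temp_f=65.0):
    """Sum of (base - daily mean temperature) over days colder than the base."""
    temps = np.asarray(daily_mean_temp_f, dtype=float)
    # Days at or above the base temperature contribute zero.
    return float(np.sum(np.clip(base_temp_f - temps, 0.0, None)))

# Four illustrative days: 15 + 5 + 0 + 0 degree days.
hdd = heating_degree_days([50.0, 60.0, 70.0, 65.0])
```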

Dynamical downscaling

A method to generate fine scale spatial resolution data from coarse scale global climate models using a dynamical regional weather model. The regional model uses the global climate model output as the inputs to generate additional projections. The Analytics Engine hosts dynamically downscaled model output created using the Weather Research and Forecasting (WRF) model.

Extreme value analysis

An area of statistical analysis focused on the tails of a probability distribution, that is, on values far from its median. Concepts from extreme value analysis can be used to analyze extreme weather events (such as a heat wave or extreme precipitation event), including event return levels and return periods. Successful extreme value analysis requires a larger number of samples than other analyses (typically drawing on multiple global climate models and ensemble members) to accurately describe rare events.
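
One common way to estimate return levels, sketched below, is to fit a generalized extreme value (GEV) distribution to a series of annual maxima with SciPy. The data are synthetic and the helper name is hypothetical; the choice of a GEV fit is an assumption of this example, not a statement of the Analytics Engine's method.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic annual maxima (illustrative only), drawn from a known GEV distribution.
rng = np.random.default_rng(42)
annual_maxima = genextreme.rvs(c=-0.1, loc=30.0, scale=5.0, size=200, random_state=rng)

# Fit the three GEV parameters. SciPy's shape parameter c has the opposite
# sign to the shape parameter xi used in much of the climate literature.
shape, loc, scale = genextreme.fit(annual_maxima)

def return_level(return_period_years):
    """Value exceeded with probability 1 / return_period in any given year."""
    return genextreme.isf(1.0 / return_period_years, shape, loc, scale)

rl_10 = return_level(10)
rl_100 = return_level(100)
```

Longer return periods correspond to rarer events and therefore to larger return levels.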

Global Warming Level

A global warming level is defined as the difference in the global mean air temperature from the historical period (defined on the Analytics Engine as the pre-industrial period 1850-1900). Global warming levels are frequently used in international policy discussions (for example, goals to constrain global warming to 1.5 or 2 degrees Celsius). Global warming levels can also be useful in planning processes to compare potential regional responses to climate change at different warming levels, which enables comparison of models across different emissions scenarios. The standard global warming levels are 1.5°C, 2°C, 3°C, and 4°C.
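
The sketch below shows one way to find the year in which a simulation first reaches a given warming level: compute anomalies relative to the 1850-1900 mean, smooth them, and look for the first crossing. The 20-year centered window, the synthetic temperature series, and the function name are all assumptions of this example.

```python
import numpy as np
import pandas as pd

years = np.arange(1850, 2101)
# Synthetic global mean temperature: flat until 1950, then linear warming (illustrative only).
temps = 14.0 + np.where(years > 1950, 0.03 * (years - 1950), 0.0)
gmt = pd.Series(temps, index=years)

# Anomaly relative to the pre-industrial baseline period 1850-1900.
baseline = gmt.loc[1850:1900].mean()
anomaly = gmt - baseline

# Smooth out year-to-year variability with a centered rolling mean.
smoothed = anomaly.rolling(window=20, center=True).mean()

def first_year_reaching(level):
    """First year whose smoothed anomaly meets or exceeds the level, or None."""
    reached = smoothed[smoothed >= level]
    return int(reached.index[0]) if len(reached) else None
```

Calling `first_year_reaching(1.5)` and `first_year_reaching(2.0)` on different simulations gives the time windows over which regional responses can be compared at the same warming level.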

Internal variability

Internal variability represents the natural (stochastic) variations in the Earth’s climate, due to interactions between the atmosphere, ocean, land surface, and sea ice. In global climate models with multiple simulations, this uncertainty can result in slightly different outputs based on small differences in the model’s physical processes design. For example, the timing of events such as the El Niño pattern in the Pacific Ocean may vary between simulations, depending on the global climate model’s parameterization.

Localization

A statistical method that produces time series of future weather at the location of a currently operating weather station. Localization involves bias adjusting gridded climate model output based on observations from the weather station. One example of a localization methodology used in the Analytics Engine is quantile delta mapping, which preserves changes in individual quantiles rather than applying a single correction to the data mean. This approach produces data that are consistent with existing workflows built or trained on historical weather observations.
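
To make the quantile-preserving idea concrete, here is a simplified additive quantile delta mapping using empirical quantiles. The function name and the synthetic inputs are hypothetical, and the Analytics Engine's implementation may differ in detail (for example, in how quantiles are estimated or whether changes are treated as ratios).

```python
import numpy as np

def quantile_delta_mapping(obs_hist, model_hist, model_future):
    """Additive QDM: observed quantile plus the model's change at that quantile."""
    obs_hist = np.asarray(obs_hist, dtype=float)
    model_hist = np.asarray(model_hist, dtype=float)
    model_future = np.asarray(model_future, dtype=float)

    # Empirical non-exceedance probability of each future value within the future sample.
    ranks = np.argsort(np.argsort(model_future))
    tau = (ranks + 0.5) / len(model_future)

    # Modeled change at each quantile: future model minus historical model.
    delta = model_future - np.quantile(model_hist, tau)

    # Observed value at the same quantile, shifted by the modeled change.
    return np.quantile(obs_hist, tau) + delta

# Illustrative data: the historical model is uniformly 2 degrees warmer than observations.
rng = np.random.default_rng(7)
obs_hist = rng.normal(15.0, 3.0, size=100)
model_hist = obs_hist + 2.0
model_future = rng.normal(20.0, 4.0, size=100)

adjusted = quantile_delta_mapping(obs_hist, model_hist, model_future)
```

In this contrived case the bias is the same at every quantile, so the adjustment reduces to removing 2 degrees from each future value.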

KS test

The Kolmogorov-Smirnov (KS) test is a nonparametric test of whether two one-dimensional samples are drawn from the same distribution. It can be used to determine whether two samples (e.g., present-day and future 99th percentile extremes) differ by a statistically significant amount. As such, it is commonly used to identify whether climate change is producing statistically significant changes relative to historical conditions, and is sometimes preferred to other approaches because it is nonparametric, that is, agnostic of the shape of the distribution.
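
A two-sample comparison of this kind can be run with SciPy's `ks_2samp`. The samples below are synthetic, and the 0.05 significance threshold is a conventional choice assumed for this example.

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic "historical" and "future" samples (illustrative only).
rng = np.random.default_rng(1)
historical = rng.normal(loc=30.0, scale=2.0, size=500)
future = rng.normal(loc=32.0, scale=2.0, size=500)

# The statistic is the largest gap between the two empirical cumulative distributions.
result = ks_2samp(historical, future)
significant = result.pvalue < 0.05
```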

Model run

A model run is a single initial-condition ensemble member of a global climate model (GCM). Climate models are often run multiple times with slightly different initial conditions, and each of these runs is termed a “model run”. For instance, the Analytics Engine contains LOCA2-Hybrid data for different model runs of the ACCESS-CM2 model. ACCESS-CM2 r1i1p1f1 refers to one model run while ACCESS-CM2 r2i1p1f1 is another model run.
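
Labels such as r1i1p1f1 follow the CMIP6 variant label pattern, whose four indices identify the realization, initialization method, physics version, and forcing. A small parser, with a hypothetical function name, might look like this:

```python
import re

# CMIP6 variant label: r<realization>i<initialization>p<physics>f<forcing>
VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")

def parse_variant_label(label):
    """Split a variant label such as 'r2i1p1f1' into its four integer indices."""
    match = VARIANT_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Not a CMIP6 variant label: {label!r}")
    realization, initialization, physics, forcing = map(int, match.groups())
    return {
        "realization": realization,
        "initialization": initialization,
        "physics": physics,
        "forcing": forcing,
    }

run_a = parse_variant_label("r1i1p1f1")
run_b = parse_variant_label("r2i1p1f1")
```

The two example runs differ only in the realization index, which is the index that distinguishes initial-condition ensemble members.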

Model uncertainty

Uncertainty in global climate model output that arises from design differences between models. Global climate models are developed by different research institutions and differ in how they represent the global climate system. In some approaches, it may be appropriate to average responses from different models to obtain a consensus model estimate (referred to as a multi-model mean). Alternatively, it may be informative to look at the spread of a particular response across multiple models.
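
Both summaries are straightforward to compute once each model's response is available. In the sketch below the model names and warming values are placeholders invented for illustration, not results from any real model.

```python
import numpy as np

# Placeholder responses (degrees of warming) from three hypothetical models.
projected_warming = {"model_a": 2.1, "model_b": 2.8, "model_c": 3.4}
values = np.array(list(projected_warming.values()))

# Consensus estimate across models.
multi_model_mean = values.mean()

# Spread across models, reported here as the full range.
model_spread = values.max() - values.min()
```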

Notebook

The Analytics Engine uses Jupyter Notebooks to provide user-friendly, interactive example workflows that walk through an application of climate data from start to finish. Notebooks contain examples pertaining to aspects of use cases or applications of data products.

Scenario uncertainty

Uncertainty that arises from not knowing how people, policies, the economy, and technology will evolve in the future to address the issue of climate change. Projecting levels of future emissions requires inherent assumptions about economic production, land use, technological advancements, and energy use (see Shared Socioeconomic Pathways) that create differences in the timing and strength of climate response between scenarios.

Shared Socioeconomic Pathways (SSPs)

Shared Socioeconomic Pathways (SSPs) are inputs into the latest generation of IPCC reports. They describe potential pathways the world could take in terms of political, economic, and other societal dynamics and choices that affect greenhouse gas emissions, anthropogenic aerosol generation, and land use changes. Shared Socioeconomic Pathways are an update to the “Representative Concentration Pathways” (RCPs) used in an earlier CMIP phase. The Analytics Engine has multiple data sources with several SSPs, including SSP2-4.5, SSP3-7.0, and SSP5-8.5.

Statistical downscaling

A method to generate fine scale spatial resolution data outputs from coarser scale global climate models using statistical relationships between the coarse global climate model output and observed climatological conditions at fine scale spatial resolutions. The Analytics Engine hosts statistically downscaled model output from LOCA2-Hybrid.

Tool

A Python function to work with climate data, analyze data, or produce visualizations. Tools are located within ClimaKitAE.

Toolkit

A Python package consisting of a set of functions (tools) for climate data operations, analyses, and visualizations.

Typical meteorological year

A typical meteorological year (TMY) is a complete set of meteorological variables at a given location for every hour in a year. The TMY functionality in the Analytics Engine can replicate the most likely hourly conditions for current and future climate conditions. TMYs are used in some building and energy system modeling applications to describe typical annual weather conditions at a specific location. A TMY is a specific kind of 8760 hourly profile (see 8760).

Use-case

Use-cases are written in the following format: [ data product ] + [ application of data product ]

  • ‘Data product’ – the climate data itself, developed by the Analytics Engine development team using the Analytics Engine analytical tools. These data products (e.g., hourly climate profiles, threshold-based analytics, distributions of extreme events, sector-specific metrics, and more) are designed to be generalizable, relevant to future use-cases, and modifiable by end users to address unanticipated applications.
  • ‘Application of the data product’ – a specific application, identified by energy industry partners, that explains how the data products can be used within the electricity sector to address an identified need. These applications of climate data may be broader or narrower depending on an agency’s specific planning and decision-making needs (e.g., conducting asset-by-asset vulnerability assessments, informing peak load and demand forecasts, or examining impacts on renewable energy generation).
  • Example: “Threshold-based analytics for asset-by-asset vulnerability assessments and updating design standards”. Here, “threshold-based analytics” is the data product, and “asset-by-asset vulnerability assessments and updating design standards” is the application of the data product.

Zarr

A file format for storing large, multidimensional arrays (such as gridded climate data with spatial and temporal dimensions), optimized for cloud storage.

8760

A representative hourly time series for a variable for one year (24 hours × 365 days = 8,760 hours), used to characterize typical environmental conditions in many energy system modeling applications. 8760s are also referred to as hourly profiles.
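
The name follows directly from the arithmetic, as the short check below shows by building an hourly index for a non-leap year. The year 2023 is an arbitrary choice for this example. A leap year would contain 8,784 hours, and how a given profile treats the leap day is not specified here.

```python
import pandas as pd

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # non-leap year
n_hours = HOURS_PER_DAY * DAYS_PER_YEAR

# One timestamp per hour, starting at midnight on January 1.
hourly_index = pd.date_range("2023-01-01 00:00", periods=n_hours, freq=pd.Timedelta(hours=1))
```

The final timestamp falls on the last hour of December 31, so the index spans exactly one calendar year.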