Glossary

This glossary is a non-exhaustive source for definitions of key terms and concepts related to the Analytics Engine and its associated tools and applications. This page includes detailed explanations of data products, use cases, analytical tools, and methodologies used in climate data analysis to support a user’s understanding and application of the information in the Analytics Engine. This glossary is designed to complement existing extensive meteorological glossaries, such as the American Meteorological Society’s Glossary of Meteorology, with specific terminologies that are focused on the Analytics Engine.

Annual Maximum Series (AMS)

A time series built by taking the maximum observed value in each year of a longer record, giving one value per year.

Bias correction

Various techniques applied to global climate model output to adjust for systematic errors (biases). For example, bias adjustment can be applied to model output that underestimates extreme precipitation events or that simulates temperatures that are consistently too high. Model biases have several sources that bias adjustment can address, including spatial resolutions too coarse to resolve all types of weather phenomena and model formulations that rely on simplified representations of physical processes.
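The simplest form of bias correction shifts model output by its mean error over a historical period that has observations. The sketch below is an illustration only: it assumes a shared historical period, and `mean_bias_correct` is a hypothetical helper, not a ClimaKitAE function. Because it corrects only the mean, it would not fix an underestimate of extreme precipitation. That is why quantile-based methods, such as the quantile delta mapping described under Localization, are often preferred.

```python
import numpy as np

def mean_bias_correct(model_hist, obs_hist, model_target):
    """Additive mean bias correction (a simple "linear scaling" approach).

    Hypothetical helper: subtracts the model's mean bias over a shared
    historical period from the values in model_target.
    """
    bias = np.mean(model_hist) - np.mean(obs_hist)
    return np.asarray(model_target, dtype=float) - bias

# Illustrative values: the model runs 2 degrees warm in the historical period
obs = np.array([20.0, 22.0, 24.0])
model = np.array([22.0, 24.0, 26.0])
future = np.array([25.0, 27.0])
print(mean_bias_correct(model, obs, future))  # [23. 25.]
```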

Block maximum series

A block maximum series is a time series of extreme values. It is formed by dividing a longer time series into non-overlapping “blocks” of equal duration and taking the maximum value within each block. It generalizes the Annual Maximum Series, which is the special case where each block is one year.
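This minimal NumPy sketch builds a block maximum series. It assumes the data form a regular series that divides into equal blocks, for example daily values in 365-day years with leap days removed. The helper name `block_maxima` is hypothetical. For real calendars, a calendar-aware grouping such as xarray's or pandas' `resample` is usually safer.

```python
import numpy as np

def block_maxima(values, block_size):
    """Return the maximum of each consecutive, non-overlapping block.

    Trailing values that do not fill a complete block are dropped.
    """
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size
    trimmed = values[: n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size).max(axis=1)

# Two synthetic 365-day "years" of daily data; block_size=365 gives an AMS
daily = np.arange(730.0)
print(block_maxima(daily, 365))  # [364. 729.]
```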

Cascading event

A cascading event, or successive event, is one in which two or more hazards occur in sequence. Cascading events can lead to a build-up of impacts, and can also result from the intersection of climate hazards and other environmental hazards. For example, the 2018 Montecito mudslide was preconditioned by a severe wildfire season. A strong low-pressure system and cold front then brought extreme inland flooding.

ClimaKitAE

An open-source Python library that contains functionality to work with cloud-optimized gridded climate data. It includes basic tools for creating data objects and visualizations, as well as more advanced tools that support various applications of climate data including analyzing extreme events, understanding regional responses at different global warming levels, and describing uncertainty across different simulations. ClimaKitAE stands for “climate toolkit Analytics Engine”.

Climate sensitivity

Climate sensitivity is a metric describing the equilibrium change in global mean surface temperature that results from a doubling of atmospheric carbon dioxide (CO2) concentrations relative to pre-industrial levels (1850-1900).
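Climate sensitivity is defined for one doubling of CO2, but it can be scaled to other concentration ratios. This sketch assumes that radiative forcing grows with the logarithm of CO2 concentration and that the equilibrium response is linear in forcing, which gives ΔT ≈ S · log2(C/C0). The sensitivity value of 3.0 °C below is illustrative, not an Analytics Engine parameter, and `equilibrium_warming` is a hypothetical helper.

```python
import math

def equilibrium_warming(sensitivity_c, co2_ratio):
    """Approximate equilibrium warming (deg C) for a CO2 ratio C / C0.

    Assumes logarithmic forcing, so each doubling of CO2 adds
    `sensitivity_c` degrees of equilibrium warming.
    """
    return sensitivity_c * math.log2(co2_ratio)

print(equilibrium_warming(3.0, 2.0))  # 3.0 (one doubling)
print(equilibrium_warming(3.0, 4.0))  # 6.0 (two doublings)
```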

CF conventions

Metadata conventions for formatting and documenting data in netCDF format, which ensure standardized reporting and variable naming across different data sources. CF stands for “Climate and Forecast”. Data available on the Analytics Engine adhere to the CF conventions.
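CF metadata is stored as variable attributes, for example in an xarray `DataArray.attrs` dictionary. This plain-dict sketch shows what such attributes look like for near-surface air temperature. The helper `missing_cf_attrs` is hypothetical and only checks that attributes are present. Validating that values are legal CF standard names or units requires a dedicated CF compliance checker.

```python
# Example attributes for a near-surface air temperature variable.
# "standard_name", "long_name", and "units" are CF attribute names;
# the values are illustrative.
tas_attrs = {
    "standard_name": "air_temperature",
    "long_name": "Near-surface air temperature",
    "units": "K",
}

def missing_cf_attrs(attrs, required=("standard_name", "units")):
    """Return the names in `required` that are absent from `attrs`.

    Hypothetical helper: checks presence only, not validity of values.
    """
    return [name for name in required if name not in attrs]

print(missing_cf_attrs(tas_attrs))              # []
print(missing_cf_attrs({"long_name": "temp"}))  # ['standard_name', 'units']
```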

CMIP6

The Coupled Model Intercomparison Project, Phase 6 (CMIP6), is a coordinated international effort to produce climate model output with consistent standards in areas such as variable naming and experimental design. This consistency makes it easier to compare different models with each other. CMIP6 models are the latest generation of global climate models (ca. 2020). They were used in the Intergovernmental Panel on Climate Change Sixth Assessment Report (IPCC AR6) and in California’s Fifth Climate Change Assessment.

Compound event

A compound climate event is the combination of two or more hazards that occur at the same time. Because compound events involve multiple, potentially interacting meteorological processes, they require different analysis methods than univariate events. These events are also known as concurrent events and can lead to disproportionately large impacts. Two examples of compound events of importance to energy partners in California are a heatwave during a severe drought and a high wind event coupled with extreme precipitation.
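A first step in many compound event analyses is to find the time steps where two hazards exceed their thresholds at the same time. This sketch uses the high wind plus extreme precipitation example. The thresholds, the data, and the helper name `concurrent_exceedances` are all illustrative assumptions.

```python
import numpy as np

def concurrent_exceedances(a, b, threshold_a, threshold_b):
    """Boolean mask of time steps where both variables exceed their thresholds."""
    return (np.asarray(a) > threshold_a) & (np.asarray(b) > threshold_b)

# Illustrative daily values, not real data
wind = np.array([5, 18, 22, 9, 25])     # daily maximum wind speed (m/s)
precip = np.array([0, 40, 10, 60, 55])  # daily precipitation (mm)

mask = concurrent_exceedances(wind, precip, 15, 30)
print(int(mask.sum()))  # 2 (days with index 1 and 4)
```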

Derived Variable

Derived variables are computed from one or more other variables. One example is “heating degree days”, which uses air temperature as an input. Derived variables expand the list of available variables beyond the native variables provided by the Analytics Engine data sources.
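Heating degree days illustrate how a derived variable is computed from a native one. This sketch assumes daily mean temperature in °F and the common US base temperature of 65 °F. The Analytics Engine's own definition may use a different base, units, or aggregation, and `heating_degree_days` is a hypothetical helper.

```python
import numpy as np

def heating_degree_days(daily_mean_temp_f, base_f=65.0):
    """Total heating degree days: sum over days of max(base - T_mean, 0)."""
    t = np.asarray(daily_mean_temp_f, dtype=float)
    return float(np.sum(np.maximum(base_f - t, 0.0)))

# Days at 50, 60, and 70 deg F contribute 15, 5, and 0 degree days
print(heating_degree_days([50, 60, 70]))  # 20.0
```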

Dynamical downscaling

A method to generate fine spatial resolution data from coarse global climate model output using a dynamical regional weather model. The regional model uses global climate model output as its initial and boundary conditions and simulates the climate at higher resolution over a limited region. The Analytics Engine hosts dynamically downscaled model output created using the Weather Research and Forecasting (WRF) model.

Ensemble member

A single simulation, or GCM run, within a larger set of simulations (an ensemble) generated to assess the uncertainty and variability in climate model projections. Ensemble members are typically produced by slightly varying the initial conditions or model parameters, or by using different models altogether. Using multiple ensemble members helps account for the natural variability of the climate system and the uncertainties inherent in model projections.

Extreme value analysis

An area of statistics focused on the tails of a probability distribution, that is, values far from its center. Concepts from extreme value analysis, including return levels and return periods, can be used to analyze extreme weather events such as heat waves or extreme precipitation events. Extremes are by definition rare, so extreme value analysis typically needs more samples than other analyses to describe them accurately. In practice this often means drawing on multiple GCMs and ensemble members.

Generalized Pareto distribution

A probability distribution used to model the tail of a distribution, specifically the amounts by which values exceed a high threshold. It is particularly useful for assessing the probabilities of extreme events, such as rare climate events that exceed a certain threshold.
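For an exceedance y above a threshold u, the generalized Pareto distribution gives P(X > u + y | X > u) = (1 + ξy/σ)^(-1/ξ). Here σ > 0 is the scale and ξ is the shape, and ξ = 0 reduces to the exponential case exp(-y/σ). This is a minimal sketch of that formula. The helper name is hypothetical, and fitting σ and ξ to data is not shown. Multiplying the result by the observed rate of threshold exceedances gives an unconditional probability.

```python
import math

def gpd_exceedance_prob(y, sigma, xi):
    """P(X > u + y | X > u) for an exceedance y >= 0 over a threshold u.

    sigma > 0 is the scale, xi the shape; xi == 0 is the exponential case.
    """
    if y < 0 or sigma <= 0:
        raise ValueError("need y >= 0 and sigma > 0")
    if xi == 0:
        return math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:  # beyond the finite upper endpoint that exists when xi < 0
        return 0.0
    return z ** (-1.0 / xi)

print(gpd_exceedance_prob(10.0, 10.0, 1.0))  # 0.5
```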

Generalized Extreme Value (GEV) distribution

A family of continuous probability distributions developed within extreme value theory to model the largest or smallest value from a large collection of random observations. The GEV distribution is used to model the behavior of extreme events within a given period, such as the highest or lowest temperatures, maximum wind speeds, or most intense rainfall. The family includes the Gumbel, Fréchet, and reversed Weibull distributions as special cases.
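The GEV cumulative distribution function, with location μ, scale σ, and shape ξ, is F(x) = exp(-(1 + ξ(x - μ)/σ)^(-1/ξ)). A shape ξ > 0 gives the Fréchet type, ξ < 0 the reversed Weibull type, and ξ → 0 the Gumbel type, F(x) = exp(-exp(-(x - μ)/σ)). This pure-Python sketch uses a hypothetical helper name. Note that SciPy's `scipy.stats.genextreme` uses a shape parameter `c` equal to -ξ in this convention, so signs must be flipped when comparing.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """CDF of the GEV distribution for maxima (xi == 0 is the Gumbel case)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (x - mu) / sigma
    if xi == 0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# At x == mu the CDF is exp(-1) for every shape value
print(gev_cdf(10.0, 10.0, 2.0, 0.2))
```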

Global Climate Model (GCM)

A mathematical model that represents the physical processes in the atmosphere, ocean, cryosphere, and land surface. GCMs are used to simulate the climate system and project future climate conditions based on different scenarios of greenhouse gas emissions and other factors. For more information on how GCMs work, check out this blog post.

Global Warming Level

A global warming level is the difference in global mean air temperature from a baseline period (defined on the Analytics Engine as the pre-industrial period 1850-1900). Global warming levels are frequently used in international policy discussions, for example in goals to limit global warming to 1.5 or 2 degrees Celsius. They are also useful in planning processes for comparing potential regional responses to climate change at different warming levels. Because a warming level is defined by the amount of warming rather than by a date, it also allows models to be compared across different emissions scenarios. The standard global warming levels are 1.5°C, 2°C, 3°C, and 4°C.
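One common way to locate when a simulation reaches a warming level is to smooth annual global mean temperature anomalies with a running mean and find the first window that reaches the level. This sketch assumes the anomalies are already computed relative to 1850-1900 and uses a 20-year centered window. The window length and centering rule are assumptions, and the Analytics Engine's exact method may differ. `gwl_crossing_year` is a hypothetical helper, and the data below are synthetic.

```python
import numpy as np

def gwl_crossing_year(years, anomalies, level, window=20):
    """Central year of the first running-mean window whose mean reaches `level`.

    anomalies: annual global mean temperature anomalies (deg C) relative
    to 1850-1900. Returns None if the level is never reached.
    """
    years = np.asarray(years)
    anomalies = np.asarray(anomalies, dtype=float)
    kernel = np.ones(window) / window
    running = np.convolve(anomalies, kernel, mode="valid")
    # Central year of each window (the later middle year when window is even)
    centers = years[window // 2 : window // 2 + len(running)]
    hits = np.nonzero(running >= level)[0]
    return int(centers[hits[0]]) if hits.size else None

years = np.arange(2000, 2100)
anoms = 1.0 + 0.03 * (years - 2000)  # synthetic linear warming, not model output
print(gwl_crossing_year(years, anoms, 2.0))  # 2034
```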

Goodness of fit

A statistical measure of how well a model’s outputs, or a fitted statistical distribution, agree with observed data. It is used to assess how accurately models or statistical methods replicate real-world phenomena. Goodness of fit tests can help determine whether the assumptions made in a model are valid. They are also critical for evaluating climate model performance, particularly when comparing simulated data against historical or observed data.

Gumbel distribution

A probability distribution used to model the maximum (or the minimum) of a number of samples drawn from various distributions. It is commonly applied in extreme value theory, particularly for modeling extreme events such as floods, heatwaves, or other climate extremes.
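A quick way to turn a series of annual maxima into a return level is to fit a Gumbel distribution by the method of moments. The scale is β = s√6/π, where s is the sample standard deviation, and the location is μ = mean - γβ, where γ is the Euler-Mascheroni constant. The T-year return level is then μ - β ln(-ln(1 - 1/T)). This is a sketch with hypothetical helper names and illustrative data. Maximum-likelihood or L-moment fits are common alternatives, and a Gumbel fit is only appropriate if the GEV shape is close to zero.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_fit_moments(sample):
    """Method-of-moments estimates (mu, beta) of a Gumbel distribution for maxima."""
    beta = statistics.stdev(sample) * math.sqrt(6) / math.pi
    mu = statistics.mean(sample) - EULER_GAMMA * beta
    return mu, beta

def gumbel_return_level(mu, beta, return_period):
    """Value exceeded with probability 1 / return_period per block (return_period > 1)."""
    p_non_exceed = 1.0 - 1.0 / return_period
    return mu - beta * math.log(-math.log(p_non_exceed))

annual_maxima = [31.2, 33.5, 30.8, 35.1, 32.4, 34.0]  # illustrative values
mu, beta = gumbel_fit_moments(annual_maxima)
print(gumbel_return_level(mu, beta, 100))  # estimated 100-year return level
```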

Internal variability

Internal variability represents the natural (stochastic) variations in the Earth’s climate that arise from interactions between the atmosphere, ocean, land surface, and sea ice. When a global climate model is run multiple times with slightly different initial conditions, internal variability causes the simulations to produce different outputs even though the model’s physics is unchanged. For example, the timing of El Niño events in the Pacific Ocean may differ between simulations of the same global climate model.

Localization

A statistical method to produce time series of future weather at a point location where a weather station is currently in operation. Localization involves bias adjusting gridded climate model output based on observations from the weather station. One localization methodology used in the Analytics Engine is quantile delta mapping, which preserves the model-projected changes in individual quantiles rather than applying a correction only to the data mean. This approach produces data that are consistent with existing workflows built or trained on historical weather observations.
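The core idea of quantile delta mapping can be shown in a few lines. Each future model value is first assigned its quantile τ within the future model distribution. The model's change at that quantile is then added to the observed value at the same quantile. This is a simplified, additive sketch using empirical quantiles over the whole record. The Analytics Engine's implementation may differ, for example in moving windows, multiplicative treatment of precipitation, or interpolation. `qdm_additive` is a hypothetical helper.

```python
import numpy as np

def qdm_additive(obs_hist, model_hist, model_future):
    """Simplified additive quantile delta mapping with empirical quantiles."""
    obs_hist = np.asarray(obs_hist, dtype=float)
    model_hist = np.asarray(model_hist, dtype=float)
    model_future = np.asarray(model_future, dtype=float)
    # Non-exceedance probability of each future value within the future
    # model distribution (plotting position (rank + 0.5) / n keeps tau in (0, 1))
    ranks = np.argsort(np.argsort(model_future))
    tau = (ranks + 0.5) / model_future.size
    # Model-projected change at each quantile
    delta = model_future - np.quantile(model_hist, tau)
    # Apply that change to the observed value at the same quantile
    return np.quantile(obs_hist, tau) + delta

obs = np.arange(10.0)  # synthetic station record
mh = obs + 2.0         # model historical runs 2 degrees warm
mf = mh + 3.0          # model projects 3 degrees of warming
print(qdm_additive(obs, mh, mf))  # approximately obs + 3
```

Because the projected change is preserved quantile by quantile, a model that warms its extremes more than its median would carry that difference into the localized series.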

KS test

The Kolmogorov-Smirnov (KS) test is a nonparametric test that compares two one-dimensional distributions, for example to test whether two samples are drawn from the same distribution. The KS test can be used to determine whether two samples (e.g., present-day and future 99th percentile extremes) differ by a statistically significant amount. As such, it is commonly used to identify whether climate change is producing statistically significant changes relative to historical conditions. It is sometimes preferred over other approaches because it is nonparametric, meaning it makes no assumption about the shape of the distribution.
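The two-sample KS statistic is the largest vertical gap between the two empirical cumulative distribution functions (ECDFs). This NumPy sketch computes only that statistic. For a p-value in practice, `scipy.stats.ks_2samp` performs the full test. `ks_statistic` is a hypothetical helper name.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    # The maximum gap occurs at one of the observed values
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```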

Model run

Model run refers to data from one of several initial-condition ensemble runs of a GCM. Climate models are often run multiple times with slightly different initial conditions (ensemble members), and each of these runs is termed a “model run”. For instance, the Analytics Engine contains LOCA2-Hybrid data for several model runs of the ACCESS-CM2 model: ACCESS-CM2 r1i1p1f1 refers to one model run, while ACCESS-CM2 r2i1p1f1 is another.
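Labels such as r1i1p1f1 are CMIP6 variant labels. The numbers after r, i, p, and f index the realization, initialization method, physics version, and forcing set. Two runs that differ only in the r index, like the ACCESS-CM2 examples, are initial-condition ensemble members of the same model configuration. This is a small parsing sketch, and `parse_variant_label` is a hypothetical helper.

```python
import re

VARIANT_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")

def parse_variant_label(label):
    """Split a CMIP6 variant label such as 'r2i1p1f1' into its four indices."""
    match = VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"not a CMIP6 variant label: {label!r}")
    r, i, p, f = (int(g) for g in match.groups())
    return {"realization": r, "initialization": i, "physics": p, "forcing": f}

print(parse_variant_label("r2i1p1f1"))
# {'realization': 2, 'initialization': 1, 'physics': 1, 'forcing': 1}
```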

Model uncertainty

Uncertainty in global climate model output that arises from design differences between models. Global climate models are developed by different research institutions and differ in how they represent the global climate system. In some approaches, it may be appropriate to average responses from different models to obtain a consensus model estimate (referred to as a multi-model mean). Alternatively, it may be informative to look at the spread of a particular response across multiple models.
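The multi-model mean and the spread across models can be computed side by side. This sketch gives each model one vote and reports the minimum-to-maximum range as the spread. Model names and values are hypothetical. If some models contribute several ensemble members, averaging each model's members first keeps those models from being overweighted.

```python
import numpy as np

def multi_model_summary(projections):
    """Multi-model mean and min/max spread.

    projections: mapping of model name -> array, all with the same shape.
    """
    stacked = np.stack([np.asarray(v, dtype=float) for v in projections.values()])
    return stacked.mean(axis=0), stacked.min(axis=0), stacked.max(axis=0)

# Hypothetical projected warming (deg C) at two locations from three models
changes = {"model_a": [1.8, 2.1], "model_b": [2.4, 2.9], "model_c": [2.1, 2.5]}
mean, low, high = multi_model_summary(changes)
print(mean, low, high)
```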

Multivariate event

An event in which several hazards affect the same region at the same time. Multivariate events can also result from the intersection of climate hazards and other environmental hazards. Multivariate event analysis draws on a complex mix of climate science, geography, and statistics. Multivariate approaches include the likelihood multiplication factor and joint return periods / bivariate probability. Multivariate events can be:

  • Temporally compounding
  • Spatially compounding
  • Pre-conditioned (extreme event superimposed on long-term trends, such as higher sea levels, heavier precipitation, and/or changing storm seasonality resulting in more frequent and severe coastal flooding)
  • Complex (non-climate stressors that exacerbate climate hazards)
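The likelihood multiplication factor compares how often two hazards actually occur together with how often they would co-occur if they were independent. Values above 1 indicate that joint occurrence is more frequent than independence would suggest. This sketch estimates it from boolean exceedance series using empirical frequencies. The helper name and inputs are illustrative, and published analyses may define the thresholds and estimation differently.

```python
import numpy as np

def likelihood_multiplication_factor(exceed_a, exceed_b):
    """Joint exceedance frequency divided by the product of the marginal frequencies."""
    a = np.asarray(exceed_a, dtype=bool)
    b = np.asarray(exceed_b, dtype=bool)
    p_a, p_b = a.mean(), b.mean()
    if p_a == 0 or p_b == 0:
        raise ValueError("each variable needs at least one exceedance")
    p_joint = (a & b).mean()
    return float(p_joint / (p_a * p_b))

# Exceedances that always coincide: twice as frequent as under independence
print(likelihood_multiplication_factor([1, 1, 0, 0], [1, 1, 0, 0]))  # 2.0
```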

Notebook

The Analytics Engine uses Jupyter Notebooks to provide user-friendly, interactive example workflows that walk through an application of climate data from start to finish. Notebooks contain examples pertaining to aspects of use cases or applications of data products.

Pearson Type III distribution

A probability distribution that is commonly used in hydrology and climatology to model skewed data, particularly in the context of flood frequency analysis. The Pearson Type III distribution is often applied to model natural phenomena that exhibit an asymmetric distribution in frequency, such as precipitation or river discharge events.

Projection

A potential future evolution of a quantity or set of quantities, often associated with climate variables such as temperature or sea level. Projections are typically based on simulations produced by climate models, assuming specific scenarios of greenhouse gas emissions, land use, and other factors. Unlike predictions, projections do not imply certainty but rather illustrate a range of possible outcomes based on different assumptions and conditions. Projections are used to explore the potential impacts of climate change under various future pathways.

Return period

The average time interval between events, such as floods or extreme temperatures, that equal or exceed a given magnitude. For example, an event with a 100-year return period has a 1% chance of being equaled or exceeded in any given year. Return periods are used in risk assessment and management to estimate the likelihood of extreme events. They are also important for planning and designing infrastructure that is resilient to climate-related hazards.
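A return period describes a per-year chance, but planning often needs the chance of at least one event over a project's lifetime. Assuming independent years and a stationary climate, that chance is 1 - (1 - 1/T)^N for a T-year event over N years. The stationarity assumption is exactly what climate change undermines, which is one reason future return periods are recomputed from projections. The helper name is hypothetical.

```python
def prob_at_least_one(return_period_years, horizon_years):
    """Chance of at least one T-year event within a horizon of N years.

    Assumes independent years and an unchanging (stationary) climate.
    """
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** horizon_years

# A "100-year" event over a 30-year horizon
print(round(prob_at_least_one(100, 30), 3))  # 0.26
```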

Return probability

The probability that an event of a specified magnitude will occur in any given year. It is the reciprocal of the return period, which represents the average time between occurrences of such an event. For example, a return period of 100 years corresponds to a return probability of 1% per year, indicating that there is a 1% chance of the event occurring in any single year. Return probability is used to assess the likelihood of extreme events and to inform risk management and planning.

Return value

The magnitude or intensity of a climate or weather event that is expected to be equaled or exceeded, on average, once per specified return period. For example, the 100-year return value is the amount of rainfall, or the temperature, expected to be exceeded on average once every 100 years. Return values are used in risk assessment and planning to understand the severity of extreme events and to design infrastructure and systems that can withstand such events.
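Before fitting a distribution, return values are often inspected empirically. The observed annual maxima are sorted and each one is assigned a return period with the Weibull plotting position T = (n + 1) / rank, where rank 1 is the largest value. This sketch uses that plotting position, which is one of several in use, and a hypothetical helper name. Return values beyond the length of the record require a fitted distribution such as the GEV rather than this empirical table.

```python
import numpy as np

def empirical_return_periods(annual_maxima):
    """Annual maxima sorted high to low, paired with T = (n + 1) / rank."""
    values = np.sort(np.asarray(annual_maxima, dtype=float))[::-1]
    ranks = np.arange(1, values.size + 1)
    periods = (values.size + 1) / ranks
    return values, periods

values, periods = empirical_return_periods([10, 30, 20, 40])
for v, t in zip(values, periods):
    print(f"{v:5.1f}  ~{t:.2f}-year return period")
```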

Scenario uncertainty

Uncertainty that arises from not knowing how people, policies, the economy, and technology will evolve in the future to address the issue of climate change. Projecting levels of future emissions requires inherent assumptions about economic production, land use, technological advancements, and energy use (see Shared Socioeconomic Pathways) that create differences in the timing and strength of climate response between scenarios.

Shared Socioeconomic Pathways (SSPs)

Shared Socioeconomic Pathways (SSPs) describe potential pathways the world could take in terms of political, economic, and other societal dynamics and choices that affect greenhouse gas emissions, anthropogenic aerosol emissions, and land use change. SSPs are inputs into the latest generation of IPCC reports. They are an update to the “Representative Concentration Pathways” (RCPs) used in an earlier CMIP phase (CMIP5). The Analytics Engine has multiple data sources with several SSPs, including SSP2-4.5, SSP3-7.0, and SSP5-8.5.

Simulation

A computational process used to model and analyze the behavior of complex systems by representing them with mathematical models. In climate science, simulations involve running climate models under different conditions, such as different greenhouse gas emission pathways, to project future climate. These simulations help in understanding potential climate changes and their impacts by generating projections of climate variables and assessing possible outcomes under different assumptions. In the context of the Analytics Engine, a simulation is a specific ensemble member from a GCM.

Statistical downscaling

A method to generate fine spatial resolution data from coarse global climate model output. It uses statistical relationships between the coarse model output and observed climate conditions at fine spatial resolution. The Analytics Engine hosts statistically downscaled model output from LOCA2-Hybrid.

Tool

A Python function to work with climate data, analyze data, or produce visualizations. Tools are located within ClimaKitAE.

Toolkit

A Python package consisting of a set of functions (tools) for climate data operations, analyses, and visualizations.

Typical meteorological year

A typical meteorological year (TMY) is a complete year of hourly values of meteorological variables at a given location. The TMY functionality in the Analytics Engine can replicate the most likely hourly conditions for current and future climates. TMYs are used in some building and energy system modeling applications to describe typical annual weather conditions at a specific location. A TMY is a specific kind of 8760 hourly profile (see 8760).

Use-case

Use-cases are written in the following format: [ data product ] + [ application of data product ]

  • ‘Data product’ – the climate data itself, developed by the Analytics Engine team using the Analytics Engine’s analytical tools. These data products (e.g. hourly climate profiles, threshold-based analytics, distributions of extreme events, sector-specific metrics, and more) are designed to be generalizable, relevant to future use-cases, and modifiable by end users to address unanticipated applications.
  • ‘Application of the data product’ – a specific application identified by energy industry partners, which explains how the data products can be used within the electricity sector to address an identified need. These applications may be broader or narrower depending on an agency’s specific planning and decision-making needs (e.g. conducting asset-by-asset vulnerability assessments, informing peak load and demand forecasts, examining impacts on renewable energy generation, etc.).
  • Example: “Threshold-based analytics for asset-by-asset vulnerability assessments and updating design standards”. Here, “threshold-based analytics” is the data product, and “asset-by-asset vulnerability assessments and updating design standards” is the application of the data product.

Weibull distribution

A probability distribution used to model extreme values, particularly for analyzing the reliability of systems and the occurrence of extreme weather events. The Weibull distribution is commonly applied in fields such as hydrology and meteorology to model data such as wind speeds, precipitation, and failure times. A reversed form of the Weibull distribution, which has a finite upper bound, is one of the three types within the GEV distribution family.
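For wind speeds, the two-parameter Weibull distribution with shape k and scale λ has CDF F(x) = 1 - exp(-(x/λ)^k) for x ≥ 0, so the chance of exceeding a speed x is exp(-(x/λ)^k). This sketch evaluates that formula. The parameter values are illustrative, not fitted to any Analytics Engine data, and the helper name is hypothetical.

```python
import math

def weibull_cdf(x, shape_k, scale_lam):
    """P(X <= x) for a two-parameter Weibull distribution."""
    if shape_k <= 0 or scale_lam <= 0:
        raise ValueError("shape and scale must be positive")
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale_lam) ** shape_k))

# Chance that wind speed exceeds 15 m/s with illustrative k = 2, lambda = 7.5 m/s
print(1.0 - weibull_cdf(15.0, 2.0, 7.5))  # exp(-4), about 0.018
```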

Working group

In the context of the Analytics Engine, a working group refers to a focused discussion involving a diverse group of climate scientists, social scientists, energy sector partners, platform developers, data users, and other stakeholders. The purpose of working group sessions is to gather feedback on the data analytics and platform development approaches of the Analytics Engine, ensuring that the resulting climate data and tools are relevant and useful for sectoral applications. These discussions critically shape the development of climate data and tools within the Analytics Engine, with the overarching goal of enhancing the usability of climate science for decision-makers and stakeholders.

Zarr

A file format for storing large, multidimensional arrays (such as gridded climate data with spatial and temporal dimensions) as compressed chunks, optimized for cloud storage.

8760

A representative hourly time series of a variable for one year (24 hours × 365 days = 8,760 hours), used to characterize typical environmental conditions in many energy system modeling applications. 8760s are also referred to as hourly profiles.
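The simplest way to turn several years of hourly data into a single 8760 is to average hour by hour. This sketch assumes each year has already been trimmed to 8,760 hours, for example by removing February 29 from leap years. Hour-by-hour averaging smooths out weather variability and extremes, which is why TMY methods typically assemble a year from actual historical periods instead. This naive average is only an illustration, not the Analytics Engine's method, and `naive_hourly_profile` is a hypothetical helper.

```python
import numpy as np

HOURS_PER_YEAR = 24 * 365  # 8760; leap years have 8784 hours

def naive_hourly_profile(hourly_by_year):
    """Average several years of hourly data hour by hour into one 8760 profile.

    hourly_by_year: array of shape (n_years, 8760).
    """
    data = np.asarray(hourly_by_year, dtype=float)
    if data.ndim != 2 or data.shape[1] != HOURS_PER_YEAR:
        raise ValueError("expected shape (n_years, 8760)")
    return data.mean(axis=0)

rng = np.random.default_rng(0)
synthetic = rng.normal(20.0, 5.0, size=(3, HOURS_PER_YEAR))  # synthetic data
profile = naive_hourly_profile(synthetic)
print(profile.shape)  # (8760,)
```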