AE Guidance

Glossary

This glossary is a non-exhaustive source for definitions of key terms and concepts related to the Analytics Engine and its associated tools and applications. This page includes detailed explanations of data products, use cases, analytical tools, and methodologies used in climate data analysis to support a user’s understanding and application of the information in the Analytics Engine. This glossary is designed to complement existing extensive meteorological glossaries, such as the American Meteorological Society’s Glossary of Meteorology, with specific terminologies that are focused on the Analytics Engine.

Annual Maximum Series (AMS)

A time series constructed by extracting the maximum (or minimum) value for each year from a longer time series of observations or model outputs. The AMS is a foundational input to extreme value analysis and can be derived from data at any sub-annual time resolution, including hourly, daily, or monthly. The AMS is a specific case of the more general Block Maxima/Minima Series, where the block size is set to one year. When working with climate model simulations, the AMS should be constructed on a simulation-by-simulation basis rather than aggregating across simulations prior to analysis.
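
As a minimal sketch of the construction described above, assuming daily values are stored as (date, value) pairs for each simulation, the AMS can be built as follows. The function name `annual_maximum_series` and the sample values are hypothetical and are not part of climakitae.

```python
from collections import defaultdict
from datetime import date


def annual_maximum_series(records):
    """Return {year: maximum value} for an iterable of (date, value) pairs."""
    by_year = defaultdict(list)
    for day, value in records:
        by_year[day.year].append(value)
    return {year: max(values) for year, values in sorted(by_year.items())}


# Hypothetical daily maximum temperatures (degF) for two simulations.
simulations = {
    "sim_a": [(date(2001, 7, 1), 98.0), (date(2001, 8, 1), 104.0),
              (date(2002, 7, 1), 101.0), (date(2002, 8, 1), 99.0)],
    "sim_b": [(date(2001, 7, 1), 96.0), (date(2001, 8, 1), 97.5),
              (date(2002, 7, 1), 103.0), (date(2002, 8, 1), 100.0)],
}

# One AMS per simulation, rather than pooling the simulations first.
ams_by_simulation = {name: annual_maximum_series(recs)
                     for name, recs in simulations.items()}
```

Replacing `max` with `min` gives the annual minimum series.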

Bias correction / adjustment

Techniques applied to global climate model output to adjust for systematic errors (or biases). For example, a bias adjustment can be applied to model output that underestimates extreme precipitation events or produces temperatures that are consistently too high. Model biases have several sources, including spatial resolutions that are too coarse to resolve all types of weather phenomena and model formulations that rely on simplifications of physical processes.

Block maximum series

A block maximum series is a time series of extreme values constructed by segmenting a longer time series into “blocks” of equal duration and extracting the maximum value from each block. The block maximum series generalizes the Annual Maximum Series, which is the special case where the block size is one year.
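
A short sketch of the blocking step, assuming an evenly spaced series and a block size given as a number of time steps. The helper `block_maxima` is hypothetical, and dropping a trailing partial block is one possible choice rather than a documented rule.

```python
def block_maxima(values, block_size):
    """Maximum of each consecutive, complete block of `block_size` values."""
    n_blocks = len(values) // block_size  # a trailing partial block is dropped
    return [max(values[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]


series = [3, 7, 2, 9, 4, 1, 8, 6, 5, 10]
maxima = block_maxima(series, block_size=3)
```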

Cascading event

A cascading event is a type of compound event that can occur when two or more climate or environmental hazards are temporally sequenced. Their combination can lead to exacerbated impacts, as each hazard builds on the previous one.

CF conventions

Metadata conventions for formatting and documenting netCDF format data, to ensure standardized reporting and variable naming across different data sources. CF stands for “climate and forecast”. Data supported on the Analytics Engine adhere to CF conventions.

Change signal

The difference between two time periods, typically projected future climate conditions and a historical reference period. Also known as a "delta signal". A change signal is used to quantify how much a climate variable (such as temperature or precipitation) is expected to change relative to past conditions.
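
The calculation can be sketched as a difference of period means. The function `change_signal`, the period boundaries, and the synthetic temperature series below are illustrative assumptions only.

```python
from statistics import mean


def change_signal(values_by_year, reference, future):
    """Mean over the future period minus mean over the reference period.

    Periods are inclusive (start_year, end_year) tuples.
    """
    def period_mean(start, end):
        return mean(v for y, v in values_by_year.items() if start <= y <= end)

    return period_mean(*future) - period_mean(*reference)


# Synthetic annual mean temperatures (degC) warming by 0.03 degC per year.
annual_temp = {year: 15.0 + 0.03 * (year - 1981) for year in range(1981, 2101)}
delta = change_signal(annual_temp, reference=(1981, 2010), future=(2071, 2100))
```

For variables such as precipitation, a relative (percent) change is sometimes reported instead of an absolute difference.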

climakitae

An open-source Python library that contains functionality to work with cloud-optimized gridded climate data. It includes basic tools for creating data objects and visualizations, as well as more advanced tools that support various applications of climate data including analyzing extreme events, understanding regional responses at different global warming levels, and describing uncertainty across different simulations. climakitae stands for “climate toolkit Analytics Engine”.

Climate sensitivity

Climate sensitivity is a measurement that describes the equilibrium change in global mean surface temperature resulting from a doubling of atmospheric carbon dioxide (CO2) concentrations compared to pre-industrial levels (1850-1900).

Climatology

The long-term average or baseline conditions for a given climate variable (such as temperature or precipitation) over a specified historical period. These values are typically calculated from multi-year datasets and are used as a reference point to assess changes, anomalies, or trends in future climate projections. Best scientific practice is to use a 30-year period to ensure the reference period captures a range of variability.
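
As an illustration, a monthly climatology over a 30-year baseline and an anomaly relative to it might be computed as below. The record layout, the function `monthly_climatology`, and the synthetic data are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(records, start_year, end_year):
    """Mean value per calendar month over the inclusive baseline period.

    `records` is an iterable of (year, month, value) tuples.
    """
    by_month = defaultdict(list)
    for year, month, value in records:
        if start_year <= year <= end_year:
            by_month[month].append(value)
    return {month: mean(vals) for month, vals in sorted(by_month.items())}


# Synthetic monthly mean temperatures: a fixed seasonal cycle plus a trend.
records = [(year, month, 10.0 + month + 0.02 * (year - 1981))
           for year in range(1981, 2021) for month in range(1, 13)]

clim = monthly_climatology(records, 1981, 2010)  # 30-year baseline
july_2020 = 10.0 + 7 + 0.02 * (2020 - 1981)
anomaly_july_2020 = july_2020 - clim[7]          # departure from the baseline
```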

CMIP6

The Coupled Model Intercomparison Project, Phase 6 is a coordinated international effort to produce climate model output with consistent standards in areas such as variable naming and experimental design. This consistency allows different models to be compared to each other more easily. CMIP6 is the latest generation of global climate models (ca. 2020), used in the Intergovernmental Panel on Climate Change Sixth Assessment Report (IPCC AR6), and in California’s Fifth Climate Change Assessment.

Compound event

A type of climate event that arises due to the combination of two or more climate drivers or hazards occurring simultaneously or successively with negative implications to environmental and human systems. The weather phenomena contributing to the compound hazard may not be extreme events on their own, but combine to create negative impacts. Impacts are the resulting consequences of compound hazard occurrences and can include physical or socioeconomic damages, or even the triggering of additional climate hazards. The spatial component of compounding events occurs when a location experiences multiple climate drivers or hazards that overlap within a given time frame. The temporal component of compounding events focuses on the timing of event occurrence, which may be in close succession or occur after a lag period. The threshold for the time between events is flexible and is dependent on whether the succession of events is close enough to create a more severe impact.

Cooling Degree Days (CDDs)

Cooling degree days (CDDs) measure how much and for how long outdoor temperatures exceed a specific threshold, indicating the need for indoor cooling. For each day, CDDs are calculated as the number of degrees the day’s temperature is above the chosen base temperature; days that do not exceed the threshold are assigned zero. The threshold (commonly 65 °F) represents the temperature above which cooling is typically required. CDDs are widely used in the energy sector for forecasting electricity demand, informing building standards, and evaluating long-term trends in cooling needs.
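
The daily rule above can be sketched in a few lines. This example assumes each day is represented by a single temperature in °F (for instance a daily mean); the function name and sample values are hypothetical.

```python
def cooling_degree_days(daily_temps_f, base_temp_f=65.0):
    """Sum of degrees above the base temperature; cooler days contribute zero."""
    return sum(max(t - base_temp_f, 0.0) for t in daily_temps_f)


week = [60.0, 65.0, 70.0, 82.5, 64.0, 90.0, 75.0]
cdd_week = cooling_degree_days(week)  # 0 + 0 + 5 + 17.5 + 0 + 25 + 10
```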

Cumulative distribution function (CDF)

A cumulative distribution function (CDF) is a function that gives the probability that a variable takes a value less than or equal to a given threshold. In the context of climate data, representing the long-term historical record of a variable as a CDF shows how often certain values have occurred, much like a historical climatology. This allows users to understand the full range and frequency of past conditions, which is useful for comparing historical baselines with projected future changes.
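
A minimal sketch of an empirical CDF built from a sample, assuming each observation carries equal weight. The helper `empirical_cdf` and the temperature values are illustrative.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function giving the fraction of the sample <= a threshold."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(threshold):
        return bisect_right(ordered, threshold) / n

    return cdf


temps = [88, 91, 95, 95, 99, 102, 104, 97, 93, 90]
cdf = empirical_cdf(temps)
p_le_95 = cdf(95)  # fraction of values at or below 95
```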

Derived Variable

Derived variables are variables that are computed from one or more other variables. An example is “heating degree days” which uses the air temperature variable as an input. Derived variables expand the available variable list beyond the native variables available within the Analytics Engine data sources.

Dynamical downscaling

A method to generate fine scale spatial resolution data from coarse scale global climate models using a dynamical regional weather model. The regional model uses the global climate model output as the inputs to generate additional projections. The Analytics Engine hosts dynamically downscaled model output created using the Weather Research and Forecasting (WRF) model.

Effective Sample Size (ESS)

An estimate of the number of independent samples in a dataset, accounting for the reduction in statistical information caused by temporal autocorrelation. When observations within a block are strongly correlated with one another (as is common in meteorological data), the effective number of independent data points is lower than the actual number of observations. The ESS is calculated by estimating a variance inflation factor that quantifies the degree of autocorrelation, then dividing the actual sample size by that factor.
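
One common way to estimate the variance inflation factor is to assume a first-order autoregressive, AR(1), process, in which case the factor is (1 + r1) / (1 - r1) with r1 the lag-1 autocorrelation. The sketch below uses that assumption; it is not necessarily the estimator used on the Analytics Engine, and the function names are hypothetical.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation (undefined for a constant series)."""
    n = len(values)
    m = sum(values) / n
    num = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in values)
    return num / den


def effective_sample_size(values):
    """Actual sample size divided by an AR(1) variance inflation factor."""
    r1 = max(lag1_autocorrelation(values), 0.0)  # negative r1 is clipped to 0
    vif = (1 + r1) / (1 - r1)
    return len(values) / vif


persistent = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ess = effective_sample_size(persistent)  # far fewer than 10 independent values
```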

Effective Temperature

A heat metric that represents the cumulative influence of temperatures over multiple consecutive days, rather than relying on a single day’s measurement. It is calculated as a weighted sum of the current day’s temperature and temperatures from preceding days, with weights that vary depending on the application. This approach captures how heat retention in buildings affects heating and cooling demand over time and has been widely used in the energy sector to forecast gas consumption. The Analytics Engine applies a formulation based on the National Gas weighting scheme, using a four-day range to reflect industry-standard demand forecasting practices.
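
The weighted-sum structure can be sketched as follows. The weights shown are placeholders chosen only so that they sum to one over four days; they are not the National Gas weights or the values used by the Analytics Engine, and the function name is hypothetical.

```python
def effective_temperature(daily_temps, weights=(0.5, 0.25, 0.15, 0.10)):
    """Weighted sum of the current day and preceding days.

    weights[0] applies to the current day, weights[1] to the previous day,
    and so on. Days without a full history are skipped.
    """
    window = len(weights)
    result = []
    for i in range(window - 1, len(daily_temps)):
        result.append(sum(w * daily_temps[i - lag]
                          for lag, w in enumerate(weights)))
    return result


temps = [50.0, 50.0, 50.0, 50.0, 70.0]
et = effective_temperature(temps)  # the jump to 70 is damped by earlier days
```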

Ensemble member

A single simulation or GCM run within a larger set of simulations (referred to as an ensemble) that are generated to assess the uncertainty and variability in climate model projections. Each ensemble member is typically produced by slightly varying the initial conditions, model parameters, or by using different models altogether. Using multiple ensemble members helps account for the natural variability of the climate system and the uncertainties inherent within model predictions.

Extreme value analysis

An area of statistical analysis focused on extremes relative to the median of a probability distribution. Concepts from extreme value analysis can be used to analyze extreme weather events (such as a heat wave or extreme precipitation event), including event return levels and return periods. Successful extreme value analysis requires a larger number of samples than other analyses (typically requiring multiple GCMs and ensemble members) to accurately describe rare events.

Gamma Distribution

A flexible probability distribution commonly used in climate and hydrological applications to model variables that are strictly positive and right-skewed, such as precipitation totals or soil moisture. The gamma distribution is parameterized by a shape and a scale (or rate) parameter, which together determine the spread and skewness of the distribution.

Generalized Pareto distribution

A probability distribution used to model the tail behavior of a variety of data types; it is particularly useful for assessing the probabilities of extreme events, such as rare climate events that exceed a certain threshold.

Generalized Extreme Value (GEV) distribution

A family of continuous probability distributions developed within extreme value theory to model the largest or smallest value from a large collection of random observations. The GEV distribution is used to model the behavior of extreme events, such as the highest or lowest temperatures, maximum wind speeds, or the most intense rainfall within a given period. Some distributions within this family include the Gumbel, Fréchet, and Weibull distributions.

Global Climate Model (GCM)

A mathematical model that represents the physical processes in the atmosphere, ocean, cryosphere, and land surface. GCMs are used to simulate the climate system and predict future climate conditions based on different scenarios of greenhouse gas emissions and other factors. For more information on how GCMs work, check out this blog post.

Global Warming Level

A global warming level is defined as the difference in the global mean air temperature from the historical period (defined on the Analytics Engine as the pre-industrial period 1850-1900). Global warming levels are frequently used in international policy discussions (for example, goals to constrain global warming to 1.5 or 2 °C). Global warming levels can also be useful in planning processes to compare potential regional responses to climate change at different warming levels, which enables comparison of models across different emissions scenarios. The standard global warming levels are 1.5 °C, 2 °C, 3 °C, and 4 °C.
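
One way to locate when a simulation reaches a warming level is to find the first year in which a running mean of the global mean temperature anomaly, relative to 1850-1900, meets the level. The sketch below assumes a 20-year window and a contiguous annual series; both the window length and the function `year_reaching_warming_level` are assumptions for illustration, not the Analytics Engine's documented method.

```python
from statistics import mean


def year_reaching_warming_level(gmst_by_year, level, window=20,
                                baseline=(1850, 1900)):
    """First window-central year whose running-mean anomaly is >= level."""
    base = mean(v for y, v in gmst_by_year.items()
                if baseline[0] <= y <= baseline[1])
    years = sorted(gmst_by_year)  # assumed contiguous, one value per year
    half = window // 2
    for i in range(half, len(years) - half + 1):
        in_window = years[i - half:i + half]
        anomaly = mean(gmst_by_year[y] for y in in_window) - base
        if anomaly >= level:
            return years[i]
    return None  # the level is never reached in this simulation


# Synthetic global mean temperature: flat until 1950, then +0.02 degC per year.
gmst = {y: 14.0 + max(0, y - 1950) * 0.02 for y in range(1850, 2101)}
year_1p5 = year_reaching_warming_level(gmst, 1.5)
```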

Goodness of fit

A statistical measure that describes how well a model’s outputs align with observed data. It is used to assess the accuracy of models or the accuracy of statistical methods in replicating real-world phenomena. Goodness of fit tests can help determine the validity of assumptions made in the model and are critical in evaluating the performance of climate models, particularly when comparing simulated data against historical or observed data.

Gumbel distribution

A probability distribution used to model the distribution of the maximum (or the minimum) value from a number of samples of various distributions. It is commonly applied in extreme value theory, particularly for modeling the distribution of extreme events such as floods, heatwaves, or other climate extremes.

Heat Index

A metric that combines air temperature and relative humidity to estimate how hot conditions feel to the human body, often referred to as “apparent temperature”. Because higher humidity reduces the body’s ability to cool itself through evaporation of sweat, humid conditions typically feel warmer than dry conditions at the same air temperature. The heat index is widely used to assess heat-related health risk and predict cooling energy demand, as it better reflects human thermal comfort than temperature alone. It is calculated using regression equations developed by Rothfusz (1990) and implemented by NOAA, with adjustments for very high or low humidity conditions. Since it does not account for wind speed or solar radiation, the heat index is best interpreted as the perceived temperature in shaded, light-wind environments.
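
The core Rothfusz regression can be written directly, as sketched below. This sketch omits the low- and high-humidity adjustments mentioned above and the simpler formula used for mild conditions, so it should only be read as an approximation for temperatures of roughly 80 °F and above. The function name is hypothetical.

```python
def heat_index_rothfusz(temp_f, rh_percent):
    """Heat index (degF) from air temperature (degF) and relative humidity (%).

    Core Rothfusz regression only; humidity adjustments are not applied.
    """
    t, rh = temp_f, rh_percent
    return (-42.379
            + 2.04901523 * t
            + 10.14333127 * rh
            - 0.22475541 * t * rh
            - 0.00683783 * t * t
            - 0.05481717 * rh * rh
            + 0.00122874 * t * t * rh
            + 0.00085282 * t * rh * rh
            - 0.00000199 * t * t * rh * rh)


hi = heat_index_rothfusz(90.0, 50.0)  # feels warmer than the 90 degF air
```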

Heating Degree Days (HDDs)

Heating degree days (HDDs) quantify how much and for how long outdoor temperatures fall below a specific threshold, reflecting demand for indoor heating. For a given day, HDDs are calculated as the number of degrees the day’s temperature is below the base temperature; days above the threshold are counted as zero. The base temperature (commonly 65 °F) approximates the point at which heating becomes necessary. HDDs are commonly aggregated monthly or annually to assess heating demand, support gas consumption forecasting, and guide infrastructure and efficiency investments.

Internal variability

Internal variability represents the natural (stochastic) variations in the Earth’s climate, due to interactions between the atmosphere, ocean, land surface, and sea ice. In global climate models with multiple simulations, this uncertainty can result in slightly different outputs based on small differences in the model’s physical processes design. For example, the timing of events such as the El Niño pattern in the Pacific Ocean may vary between simulations, depending on the global climate model’s parameterization.

Localization

A statistical method to produce time series of future weather at a point location where a weather station is presently in operation. Localization involves bias adjusting gridded climate model output based on observations from the weather station. One example of a localization methodology used in the Analytics Engine is quantile delta mapping, which preserves changes in individual quantiles rather than applying a correction to the data mean. This approach produces data that are consistent with existing workflows built or trained on historical weather observations.

KS test

The Kolmogorov-Smirnov (KS) test is a nonparametric test of whether two one-dimensional distributions are equal. The KS test can be used to determine whether two samples (e.g., present-day and future 99th percentile extremes) differ by a statistically significant amount. As such, it is commonly used to identify whether climate change is producing statistically significant changes relative to historical conditions, and is sometimes preferred to other approaches because it is nonparametric, that is, agnostic of the shape of the distribution.
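
The test statistic is the largest vertical distance between the two empirical CDFs. The sketch below computes only that statistic; it does not compute a p-value (in practice a library routine such as `scipy.stats.ks_2samp` would typically be used for the full test). The function name and samples are illustrative.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum occurs at one of the observed values
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


historical = [1, 2, 3, 4]
future = [3, 4, 5, 6]
d_stat = ks_statistic(historical, future)
```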

Maximum Likelihood Estimation (MLE)

A statistical method used to estimate the parameters of a probability distribution by finding the parameter values that make the observed data most probable under that distribution. MLE is widely used in climate applications because it provides consistent and asymptotically efficient parameter estimates.
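
To make the idea concrete, the sketch below uses the exponential distribution, chosen only because its maximum likelihood estimate has a simple closed form (the rate equals the sample size divided by the sum of the data). The data values are hypothetical, and distributions used in extreme value analysis generally require numerical optimization instead.

```python
import math


def exponential_log_likelihood(rate, data):
    """Log-likelihood of independent exponential observations."""
    return len(data) * math.log(rate) - rate * sum(data)


def exponential_mle(data):
    """Closed-form maximum likelihood estimate of the rate parameter."""
    return len(data) / sum(data)


data = [0.5, 1.2, 0.3, 2.0, 1.0]
rate_hat = exponential_mle(data)

# The estimate gives a higher log-likelihood than nearby candidate rates.
ll_hat = exponential_log_likelihood(rate_hat, data)
ll_low = exponential_log_likelihood(0.8 * rate_hat, data)
ll_high = exponential_log_likelihood(1.2 * rate_hat, data)
```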

Model run

Model run refers to data from different initial-condition ensemble runs of a GCM. Climate models are often run multiple times with slightly different initial conditions (ensemble members), and each of these runs is termed a “model run”. For instance, the Analytics Engine contains LOCA2-Hybrid data for different model runs of the ACCESS-CM2 model. ACCESS-CM2 r1i1p1f1 refers to one model run while ACCESS-CM2 r2i1p1f1 is another model run.
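
Labels such as r1i1p1f1 follow the CMIP6 variant label pattern, in which the four numbers are the realization, initialization, physics, and forcing indices. A small parsing sketch, with a hypothetical helper name, is shown below.

```python
import re

VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label):
    """Split a CMIP6 variant label such as 'r2i1p1f1' into its four indices."""
    match = VARIANT_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a CMIP6 variant label: {label!r}")
    keys = ("realization", "initialization", "physics", "forcing")
    return dict(zip(keys, map(int, match.groups())))


run = parse_variant_label("r2i1p1f1")
```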

Model uncertainty

Uncertainty in global climate model output that arises from design differences between models. Global climate models are developed by different research institutions and differ in how they represent the global climate system. In some approaches, it may be appropriate to average responses from different models to obtain a consensus model estimate (referred to as a multi-model mean). Alternatively, it may be informative to look at the spread of a particular response across multiple models.

Multivariate event

When several hazards affect the same region at the same time; can also result from the intersection of climate hazards and other environmental hazards. Multivariate event analysis is a complex mix of climate science, geography, and statistics. Multivariate approaches include the likelihood multiplication factor and joint return periods / bivariate probability. Multivariate events can be:

Temporally compounding
Spatially compounding
Pre-conditioned (extreme event superimposed on long-term trends, such as higher sea levels, heavier precipitation, and/or changing storm seasonality resulting in more frequent and severe coastal flooding)
Complex (non-climate stressors that exacerbate climate hazards)

Notebook

The Analytics Engine uses Jupyter Notebooks to provide user-friendly, interactive example workflows that walk through an application of climate data from start to finish. Notebooks contain examples pertaining to aspects of use cases or applications of data products.

Pearson Type III distribution

A probability distribution that is commonly used in hydrology and climatology to model skewed data, particularly in the context of flood frequency analysis. The Pearson Type III distribution is often applied to model natural phenomena that exhibit an asymmetric distribution in frequency, such as precipitation or river discharge events.

Percentage Point Function (PPF)

The inverse of the cumulative distribution function (CDF); also known as the quantile function. Given a specified probability, the PPF returns the value of the variable at or below which that fraction of the distribution lies. The PPF is used in extreme value analysis to calculate return values: given a desired return probability, the PPF of the fitted probability distribution returns the corresponding return value (in the maximum case, the PPF is evaluated at one minus the return probability).
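
Because the PPF is the inverse of the CDF, it can be obtained numerically from any continuous, increasing CDF. The sketch below inverts a Gumbel CDF by bisection; the location and scale values are arbitrary, and the function names are hypothetical.

```python
import math


def gumbel_cdf(x, loc, scale):
    """CDF of the Gumbel distribution for maxima."""
    return math.exp(-math.exp(-(x - loc) / scale))


def ppf_by_bisection(cdf, probability, lower, upper, tol=1e-9):
    """Invert a CDF on [lower, upper]; requires cdf(lower) <= p <= cdf(upper)."""
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if cdf(mid) < probability:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2


loc, scale = 100.0, 4.0
x_90 = ppf_by_bisection(lambda x: gumbel_cdf(x, loc, scale), 0.9, 50.0, 200.0)
```

Applying `gumbel_cdf` to `x_90` recovers the input probability of 0.9, which is the defining property of the PPF.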

Projection

A potential future evolution of a quantity or set of quantities, often associated with climate variables such as temperature or sea level. Projections are typically based on simulations produced by climate models, assuming specific scenarios of greenhouse gas emissions, land use, and other factors. Unlike predictions, projections do not imply certainty but rather illustrate a range of possible outcomes based on different assumptions and conditions. Projections are used to explore the potential impacts of climate change under various future pathways.

Return period

The reciprocal of the return probability. For example, if the maximum temperature at a location has a 10% annual return probability of exceeding 105 °F, the return period of that event is 10 years. It is important to note that a 10-year return period does not mean the event occurs exactly once every 10 years; it means that the event has a 10% chance of occurring in any given year, and multiple occurrences within a 10-year window are possible. Return periods are used in risk assessment and infrastructure design to characterize the likelihood of extreme events. When working with climate model simulations, the return period should be constructed on a simulation-by-simulation basis rather than aggregating across simulations prior to analysis.
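
The point about a 10-year event not occurring exactly once per decade can be checked with a short calculation. The sketch assumes that years are statistically independent, which is a simplification; the function names are hypothetical.

```python
def return_period(annual_probability):
    """Return period in years for a given annual return probability."""
    return 1.0 / annual_probability


def prob_at_least_once(annual_probability, n_years):
    """Chance of one or more occurrences in n_years, assuming independent years."""
    return 1.0 - (1.0 - annual_probability) ** n_years


period = return_period(0.10)
p_decade = prob_at_least_once(0.10, 10)  # about 0.65, not 1.0
```

Under that assumption, a 1-in-10 year event has roughly a 65% chance of occurring at least once in any 10-year window.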

Return probability

The probability that a climate or weather event will exceed a specified magnitude in any given year. Return probability is the reciprocal of the return period. For example, if the maximum temperature at a location has a 10% chance of exceeding 105 °F in any given year, the annual return probability of that temperature is 0.10. In the maximum case, return probability is calculated as one minus the value of the fitted cumulative distribution function at that magnitude (that is, one minus the non-exceedance probability). In the minimum case, where the event of interest is a value falling below the specified magnitude, it is equal to the value of the fitted cumulative distribution function directly. Return probabilities are used in risk assessment and planning to characterize the likelihood of extreme events. When working with climate model simulations, the return probability should be constructed on a simulation-by-simulation basis rather than aggregating across simulations prior to analysis.

Return value

The magnitude of a climate or weather event associated with a given return probability. Also referred to as a return level. For example, if there is a 10% annual probability of maximum temperature exceeding 105 °F at a location, then 105 °F is the 1-in-10 year return value for that variable. Return values are calculated using the percentage point function (inverse CDF) of a fitted probability distribution and are used in risk assessment and planning to characterize the severity of rare events. When working with climate model simulations, the return value should be constructed on a simulation-by-simulation basis rather than aggregating across simulations prior to analysis.
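
A sketch of the calculation for annual maxima is shown below. It assumes a Gumbel distribution fitted by the method of moments, chosen here only to keep the example dependency-free; it is not the Analytics Engine's fitting procedure, and the series values and function names are hypothetical. Each simulation is fitted separately.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(ams):
    """Method-of-moments estimates of Gumbel location and scale."""
    scale = stdev(ams) * math.sqrt(6) / math.pi
    loc = mean(ams) - EULER_GAMMA * scale
    return loc, scale


def gumbel_return_value(loc, scale, return_period_years):
    """Gumbel PPF evaluated at the non-exceedance probability 1 - 1/T."""
    non_exceedance = 1.0 - 1.0 / return_period_years
    return loc - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maximum temperatures (degF), one series per simulation.
ams_by_simulation = {
    "sim_a": [101.0, 104.5, 99.0, 107.0, 102.5, 105.0, 100.0, 103.0],
    "sim_b": [98.5, 102.0, 100.5, 106.0, 99.0, 103.5, 101.0, 104.0],
}

# 1-in-10 year return value, computed separately for each simulation.
return_values = {name: gumbel_return_value(*fit_gumbel_moments(ams), 10)
                 for name, ams in ams_by_simulation.items()}
```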

Scenario uncertainty

Uncertainty that arises from not knowing how people, policies, the economy, and technology will evolve in the future to address the issue of climate change. Projecting levels of future emissions requires inherent assumptions about economic production, land use, technological advancements, and energy use (see Shared Socioeconomic Pathways) that create differences in the timing and strength of climate response between scenarios.

Shared Socioeconomic Pathways (SSPs)

Shared Socioeconomic Pathways (SSPs) are inputs into the latest generation of IPCC reports which describe potential pathways the world could take in terms of features such as political, economic, and other societal dynamics and choices which impact greenhouse gas emissions, anthropogenic aerosol generation, and land use changes. Shared Socioeconomic Pathways are an update to the “Representative Concentration Pathways” (RCPs) used in an earlier CMIP phase. The Analytics Engine has multiple data sources with several SSPs, including SSP2-4.5, SSP3-7.0, and SSP5-8.5.

Simulation

A computational process used to model and analyze the behavior of complex systems by representing them through mathematical models. Simulations involve running climate models with different parameters to project future climate conditions under various scenarios, such as different greenhouse gas emission pathways. These simulations help in understanding potential climate changes and their impacts by generating projections of various climate variables and assessing their possible outcomes under different assumptions. In the context of the Analytics Engine, a simulation is a specific ensemble member from a GCM.

Statistical downscaling

A method to generate fine scale spatial resolution data outputs from coarser scale global climate models using statistical relationships between the coarse global climate model output and observed climatological conditions at fine scale spatial resolutions. The Analytics Engine hosts statistically downscaled model output from LOCA2-Hybrid.

Tool

A Python function to work with climate data, analyze data, or produce visualizations. Tools are located within climakitae.

Toolkit

A Python package consisting of a set of functions [tools] for climate data operations, analyses, and visualizations.

Typical meteorological year

A typical meteorological year (TMY) is a complete set of meteorological variables at a given location for every hour in a year. The TMY functionality in the Analytics Engine can replicate the most likely hourly conditions for current and future climate conditions. TMYs are used in some building and energy system modeling applications to describe typical annual weather conditions at a specific location. A TMY is a specific kind of 8760 hourly profile (see 8760).

Use-case

Use-cases are written in the following format: [ data product ] + [ application of data product ]

‘Data product’ – consists of the climate data itself being developed by the Analytics Engine development team as a result of the Analytics Engine analytical tools. These data products (e.g. hourly climate profiles, threshold-based analytics, distribution of extreme events, sector-specific metrics, and more) are designed to be generalizable, relevant to future use-cases, and able to be modified by end users to address unanticipated applications.
‘Application of the data product’ – a specific application as identified by energy industry partners, which explains how the data products can be used within the electricity sector to address an identified need. These applications of climate data may be broader or narrower depending on an agency’s specific planning/decision-making needs (e.g. conducting asset-by-asset vulnerability assessments, informing peak load and demand forecasts, examining impacts on renewable energy generation, etc.).
Example: “Threshold-based analytics for asset-by-asset vulnerability assessments and updating design standards”, where “threshold-based analytics” is the data product and “asset-by-asset vulnerability assessments and updating design standards” is the application of the data product.

Weibull distribution

A probability distribution used to model the distribution of extreme values, particularly for analyzing the reliability of systems and the occurrence of extreme weather events. The Weibull distribution is commonly applied to fields such as hydrology and meteorology to model data such as wind speeds, precipitation, and failure times. This distribution falls within the GEV distribution family.

Wet Bulb Globe Temperature (WBGT)

A heat stress metric that estimates the combined effects of air temperature, humidity, wind speed, and solar radiation on the human body, particularly in direct sunlight. Because WBGT incorporates multiple meteorological variables, it more comprehensively reflects physiological heat strain than temperature alone and is widely used to assess heat hazards for outdoor workers. While highly useful for evaluating heat exposure and related cooling demand, WBGT has limited applicability in cold conditions.

Working group

In the context of the Analytics Engine, a working group refers to a focused discussion involving a diverse group of climate scientists, social scientists, energy sector partners, platform developers, data users, and other partners. The purpose of working group sessions is to gather feedback on the data analytics and platform development approaches of the Analytics Engine, ensuring that the resulting climate data and tools are relevant and useful for sectoral applications. These discussions critically shape the development of climate data and tools within the Analytics Engine, with the overarching goal of enhancing the usability of climate science for decision-makers and users.

Zarr

Filetype for storing large, multidimensional arrays (such as gridded climate data with spatial and temporal dimensions), optimized for cloud storage.

8760

A representative hourly time series for a variable for one year (24 hours x 365 days in a year), used to characterize typical environmental conditions in many energy system modeling applications. 8760s are also referred to as hourly profiles.
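
The name comes from the hour count of a non-leap year, as the sketch below confirms. How a leap day is handled when building a profile is not specified here, so the leap-year count is shown only for comparison; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def hourly_timestamps(year):
    """One timestamp per hour of the given calendar year (no time zone)."""
    start = datetime(year, 1, 1)
    n_hours = int((datetime(year + 1, 1, 1) - start).total_seconds() // 3600)
    return [start + timedelta(hours=h) for h in range(n_hours)]


hours_2021 = hourly_timestamps(2021)  # non-leap year: 24 x 365 hours
hours_2020 = hourly_timestamps(2020)  # leap year: 24 x 366 hours
```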