AE Analytics
Getting started with the Analytics Engine (AE)
Intended Application: As a user, I want to understand how to use the Analytics Engine by learning about:
- How to select data of interest to my work
- How to retrieve a dataset from the catalog
- How to visualize data
- How to export data
To execute a given ‘cell’ of this notebook, place the cursor in the cell and press the ‘play’ icon, or simply press Shift and Enter together. Some cells will take longer to run, and you will see a [$\ast$] to the left of the cell while AE is still working.
Step 0: Setup
This cell imports the Python library climakitae, our AE toolkit for climate data analysis, along with any other specialized libraries needed for a given notebook (here, climakitaegui, which provides the interactive selection and visualization tools).
import climakitae as ck
import climakitaegui as ckg
Step 1: Select data
Now we can call Select to display an interface from which to select the data to examine. Execute the cell, and read on for more explanation.
There are multiple datasets you can use on the Analytics Engine:
- Dynamically downscaled WRF data, produced at hourly intervals. If you select ‘daily’ or ‘monthly’ for ‘Timescale’, you will receive an average of the hourly data. The spatial resolution options, on the other hand, are each the output of a different simulation, nesting to higher resolution over smaller areas.
- In addition to the gridded WRF data, you may also be interested in point-based data at a weather station. Station data is produced by bias-correcting, or localizing, the dynamically downscaled gridded data to the exact station location.
- Hybrid-statistically downscaled LOCA2-Hybrid data, available at daily and monthly timescales. Multiple LOCA2-Hybrid simulations are available (100+) at a fine spatial resolution of 3km.
Historical:
- “Historical Climate” includes data from 1980-2014 simulated by the same GCMs used to produce the Shared Socioeconomic Pathway (SSP) projections. It will be automatically appended to an SSP time series when both are selected. Because this historical data is obtained through simulations, it represents average weather during the historical period and is not meant to reproduce the historical time series as it actually occurred.
- “Historical Reconstruction” provides a reference downscaled reanalysis dataset based on atmospheric models fit to satellite and station observations, and as a result will reflect observed historical time-evolution of the weather.
Future:
- Future projections are available through 2100 for the greenhouse gas emission scenario SSP 3-7.0 (Shared Socioeconomic Pathway) from the dynamically downscaled General Circulation Models (GCMs).
- One GCM was also downscaled for two additional SSPs (SSP 5-8.5 and SSP 2-4.5).
To learn more about the data available on the Analytics Engine, see our data catalog.
selections = ckg.Select()
selections.show()
Nothing further is required to register these selections; they are stored behind the scenes, so you can simply move on to Step 2.
However, if you want to preview what has been selected, you can type “selections” alone in a new cell.
(The + button will create a new cell following the currently selected one.)
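For example, a minimal preview cell would contain nothing but the variable name:
selections  # display the stored selections object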
Step 2: Retrieve data
Call selections.retrieve() to assign the subset/combination of data specified to a variable name of your choosing, in xarray DataArray or Dataset format.
data_to_use = selections.retrieve()
data_to_use
You can preview the data in the retrieved, aggregated dataset when this is complete.
Next, load the data into memory. Up to this point the data has only been loaded “lazily”, which allows the previous steps to run faster; actually loading it here, so that it can be visualized or exported, may take a few minutes.
data_to_use = ck.load(data_to_use)
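If you are unsure whether a selection is small enough to load comfortably, you can check its approximate size and dimensions first; nbytes and sizes are standard xarray attributes rather than AE-specific calls:
print(f"{data_to_use.nbytes / 1e9:.2f} GB")  # approximate in-memory size of the selection
print(data_to_use.sizes)  # length of each dimension (e.g. scenario, simulation, time, x, y)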
Step 3: Visualize data
Preview the data before doing further calculations.
ckg.view(data_to_use)
The data previewer is also customizable: check out an example where the display colors and coordinates are modified in gridded data. If you selected station data above, uncomment the second line in the cell below and comment out the first by using the # character.
ckg.view(data_to_use, lat_lon = False, cmap = 'viridis') # gridded data (with x-y coordinates)
# ckg.view(data_to_use, lat_lon = False, cmap = 'green') # station, or area-averaged data selection
More plotting helper-functions will be forthcoming.
See other notebooks for example analyses, or add your own.
# [insert your own code here]
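As one starting point, here is a minimal sketch of an analysis you could run on the retrieved data, assuming it has a time dimension with datetime values (these are standard xarray operations, not AE-specific functions):
# Compute an annual mean from the retrieved data, keeping all other dimensions.
annual_mean = data_to_use.groupby("time.year").mean("time")
annual_mean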
You can load up another variable or resolution by modifying your selections and calling: next_data = selections.retrieve()
If you do this a lot, and things are starting to get slow, you might want to try: data_to_use.close()
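Put together, that follow-up workflow might look like the sketch below (this assumes you have adjusted the selections in the GUI above before retrieving again):
# After changing the selections above, retrieve and load the new data.
next_data = selections.retrieve()
next_data = ck.load(next_data)

# Release the earlier data if you no longer need it.
data_to_use.close()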
Step 4: Export data
To save data as a file, call export and input your desired:
- data to export – an xarray DataArray or Dataset, as output by e.g. selections.retrieve()
- output file name (without file extension)
- file format (“NetCDF” or “CSV”)
We recommend NetCDF, which suits data and outputs from the Analytics Engine well – it efficiently stores large data containing multiple variables and dimensions. Metadata will be retained in NetCDF files.
CSV can also store Analytics Engine data with any number of variables and dimensions. It works the best for smaller data with fewer dimensions. The output file will be compressed to ensure efficient storage. Metadata will be preserved in a separate file.
CSV stores data in tabular format. Rows will be indexed by the index coordinate(s) of the DataArray or Dataset (e.g. scenario, simulation, time). Columns will be formed by the data variable(s) and non-index coordinate(s).
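If you want to preview that tabular layout before exporting, you can convert the object to a pandas DataFrame yourself; to_dataframe() is a standard xarray method rather than an AE-specific call:
df = data_to_use.to_dataframe()  # rows indexed by the coordinates, one column per data variable
df.head()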
ck.export(data_to_use, "my_filename2", "NetCDF")
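If you prefer a CSV instead, the same call works with a different format string; it is commented out here (the filename is just an example) so that running the notebook does not write two files:
# ck.export(data_to_use, "my_filename2_csv", "CSV")  # writes a compressed CSV plus a separate metadata file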