This notebook serves as supplementary material to the manuscript on low-level jets submitted to Wind Energy Science. The supplementary material consists of 6 notebooks:
In this notebook (4/6), we perform most of the analysis shown in the paper.
We have prepared and validated the wind data from ERA5 and Lidar observations. In this notebook, we detect low-level jets in both datasets, and we investigate whether the low-level jet climatologies correspond.
We identify low-level jets by searching for local maxima in the wind profiles. Having identified a local maximum, the falloff is calculated by comparing the maximum with the subsequent (moving upwards) local minimum or, if no local minimum is present, the top of the wind profile. If two local maxima occur within one wind profile, the one with the largest falloff is used. Typically, a wind profile with a local maximum and a falloff larger than 2 m/s is classified as a low-level jet, but by storing the falloff itself rather than just boolean flags (jet or no jet), we can easily assess the sensitivity of the climatology to this threshold.
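The detection logic described above can be sketched for a single profile as follows. This toy `detect_llj` function is only an illustration of the idea; the notebook itself relies on the vectorized implementation in the `lljs` module.

```python
import numpy as np

def detect_llj(wspd):
    """Toy single-profile jet detector: return (falloff, index) of the
    local maximum with the largest falloff, or (0.0, -1) if none."""
    best_falloff, best_index = 0.0, -1
    for i in range(1, len(wspd) - 1):
        if wspd[i] > wspd[i - 1] and wspd[i] > wspd[i + 1]:  # local maximum
            # Walk upwards to the subsequent local minimum (or profile top)
            j = i
            while j + 1 < len(wspd) and wspd[j + 1] < wspd[j]:
                j += 1
            falloff = wspd[i] - wspd[j]
            if falloff > best_falloff:
                best_falloff, best_index = falloff, i
    return best_falloff, best_index

# A toy profile (bottom to top) with a 10 m/s jet nose at index 2
profile = np.array([6.0, 9.0, 10.0, 7.0, 8.0, 9.5])
falloff, index = detect_llj(profile)
print(falloff, index)  # 3.0 2
```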
The observations are available at 10-minute intervals; each record is an average of the past 10 minutes. In order to compare them to the ERA5 data, we need to resample the observations at hourly intervals. We investigated several methods to do so. The most straightforward method is to take only the data records at each hour and throw away the rest. Alternatively, we can compute hourly averages. To do so, we used a rolling window with a width of 5 records, so for example at 12 UTC we use the observations from 11:40, 11:50, 12:00, 12:10 and 12:20. The last method we explored is to detect the low-level jets based on the 10-minute data and aggregate the low-level jet counts using the same rolling window. The rationale behind this last approach is that it is less sensitive to differences in timing of the low-level jets.
To fully appreciate how the first two methods compare to ERA5, we note that the ERA5 data are 'instantaneous mean wind profiles'. Instantaneous, because no aggregation (e.g. an average over the past hour) is performed, and at the same time mean, because the IFS model (which is used to produce the ERA5 dataset) does not explicitly resolve turbulence. Moreover, since the grid size of the IFS is about 30 km, the ERA5 data can be interpreted as a spatial average. Considering all this, it actually makes sense to compare hourly averaged observations with the instantaneous ERA5 data.
The ERA5 data are available for a long period of time and also over a large vertical extent. In preliminary research, we found that the low-level jet climatology is quite sensitive to the vertical extent of the data, so we start with the lower 500 m of the ERA5 data. In contrast, the temporal extent of the observations is quite limited; some platforms don't even cover a full year. Given that there is a clear annual cycle in the low-level jet climatology, we may expect that the short observation periods have a pronounced impact on the resulting climatology. To investigate and illustrate the impact of the limited extent of the observations (both spatially and temporally), we will show various 'representations' of the ERA5 data:
The last representation can be used for a fair comparison between ERA5 and the observations. The relations between the various representations of the ERA5 data, on the other hand, can be used to assess the impact of the limited extent of the observations, and to either correct the ERA5 data or extend the observation data (whichever perspective you prefer). As such, we strive towards a combined data product that is both extensive in space and time, and reliable.
The approach we take to synthesize/blend the most useful information from both data sources (combining the best of both worlds) is quite similar to some existing techniques. In the wind energy sector, various measure-correlate-predict (MCP) methods have been developed to estimate long-term characteristics of the wind climate from short measurement series. See Carta et al. (2013) for a comprehensive review. In general terms, data from a short measurement campaign is correlated with one or several longer data records (e.g. a nearby observation site or a modelled time series), and the correlation is then used to infer the long-term statistics for the measurement site.
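The generic MCP idea can be sketched in a few lines; the synthetic data and coefficients below are purely illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Long-term reference record (e.g. ~10 years of hourly model wind speeds)
reference = rng.weibull(2.0, size=87600) * 10.0

# Short campaign: one year of site observations, concurrent with the start
# of the reference (synthetic relation: obs = 0.9*ref + 0.5, plus noise)
campaign_ref = reference[:8760]
campaign_obs = 0.9 * campaign_ref + 0.5 + rng.normal(0.0, 0.5, size=8760)

# "Correlate": least-squares fit of obs = a*ref + b on the concurrent period
a, b = np.polyfit(campaign_ref, campaign_obs, deg=1)

# "Predict": apply the fit to the full reference record to estimate the
# long-term wind climate at the measurement site
longterm_estimate = a * reference + b
print(a, b, longterm_estimate.mean())
```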
In meteorological forecasting, model output statistics (MOS) is a standard technique used to correct forecasts from numerical weather prediction models, combine the best features of multiple models, or to derive secondary statistics that are not explicitly forecast. The statistics are based on comparison between historical forecasts and available observations.
Techniques like MCP and MOS are well-suited to address the differences between the ERA5 and observed wind speed (and direction) data that we've shown before. For example, Staffell and Pfenninger (2016) developed a procedure to post-process reanalysis wind speed data in order to derive bias-corrected national capacity factors. Although they celebrate the accuracy of their results, their approach involves, among others, the assumption of a logarithmic wind profile. This assumption is violated by the presence of low-level jets, so their correction may partly be compensating for the bias this assumption introduces. The impact on their results is probably limited by the high level of aggregation (national capacity factors), but anyone interested in bias-corrected wind profiles should be cautious.
In contrast to wind speed time series, low-level jets are boolean events (there's a jet or not), and we cannot immediately correlate jet occurrence between the two datasets. (Ironically, the occurrence of low-level jets violates the implicit assumption in MCP methods that equal wind speeds are representative of equal wind profiles.)
What we can do, however, is to estimate the probability of low-level jets based on other proxies. We can scale the seasonal and diurnal cycles, we can scale the jet height and strength, and we can analyse the jet occurrence for various weather types or classes of stability and baroclinicity. This is what we will do. In the next notebook, we'll take this one step further and use a 'machine learning' model to predict the probabilities.
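The simplest such proxy-based estimate is a scaling of the model's long-term cycle by the ratio of observed to modelled jet rates over the concurrent period; a minimal sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical monthly LLJ rates (%) from a long model record (Jan..Dec)
model_longterm = np.array([2, 3, 5, 8, 10, 9, 6, 4, 3, 2, 2, 2], dtype=float)

# Over the short concurrent period, the observations show jets 1.5x as
# often as the model does (again, hypothetical rates)
obs_rate_concurrent = 7.5    # %
model_rate_concurrent = 5.0  # %
scale = obs_rate_concurrent / model_rate_concurrent

# Scaled estimate: long-term shape from the model, level from the observations
obs_longterm_estimate = model_longterm * scale
print(obs_longterm_estimate)
```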
First we need to load scripts, data and detect LLJs.
# NetCDF Interface
import xarray as xr
# Computation
import numpy as np
# Tables
import pandas as pd
# Plotting library
import matplotlib.pyplot as plt
import seaborn as sns # seaborn modifies the rc
plt.rcdefaults()
plt.xkcd()
# Access to code base
import sys
sys.path.append('../../code/datasets')
# Reading ECN and ERA5 data
import ecn
import era5
# Low-level jet detection
sys.path.append('../../code/utils')
import lljs
The observations come at 10-minute intervals, but for a fair comparison we use two other 'versions' of the observations, i.e. hourly resampled observations and hourly averaged observations. One could say that the resampled profiles are 'instantaneous'. In fact, they are 10-minute averages over the last 10 minutes before the full hour, but since we're interested in mean profiles, this averaging effectively removes turbulent fluctuations and the 10-minute mean is more or less representative of a model 'instantaneous' mean profile.
# Load measurement data
obs_10mins = ecn.load_platform_data()
obs_1hfreq = {}
obs_1hmean = {}
for name, ds in obs_10mins.items():
obs_1hfreq[name] = ds.wspd.resample(time='1h').asfreq()
obs_1hmean[name] = ds.wspd.rolling(time=6, center=False).mean().shift(time=-3).resample(time='1h').asfreq()
# Load ERA5 data
era_full = {} # 10 years, 500 m
era_coll = {} # subsets, 500 m
era_intp = {} # 10 years, aligned/interpolated to obs heights
era_fair = {} # subsets, aligned/interpolated to obs heights
for name, ds in obs_10mins.items(): # this version has lat/lon info
# In case the dataset is prepared with NCO, we need to squeeze the lat and lon dimensions
era_full[name]= xr.open_dataset('../../data/interim/ERA5platforms/'+name+'.nc').squeeze().sel(level=slice(None, 124, -1))
era_full[name]['z_nd'] = xr.open_dataset('../../data/interim/ERA5platforms/'+name+'_z.nc').z_nd
era_intp[name]= xr.open_dataset('../../data/interim/ERA5platforms/'+name+'_interpolated.nc')
era_coll[name] = era_full[name].where(np.isfinite(ds.wspd.mean(dim='height')))
era_fair[name] = era_intp[name].where(np.isfinite(ds.wspd)) #.mean(dim='height') is not necessary here
# for the 'fair' dataset it is not necessary to take the mean of ds.wspd, because the heights are aligned.
# While this affects averaged wind profiles, for low-level jet statistics, there is no difference
The detection procedure below differs slightly between the datasets, e.g. in the axis along which the profiles are stored and in how the jet height is retrieved:
lljs_obs_10mins = {}
lljs_obs_1hfreq = {}
lljs_obs_1hmean = {}
lljs_full = {} # 10 years, 500 m
lljs_coll = {} # subsets, 500 m
lljs_intp = {} # 10 years, aligned/interpolated to obs heights
lljs_fair = {} # subsets, aligned/interpolated to obs heights
for name, ds in obs_10mins.items():
falloff, strength, index = lljs.detect_llj_vectorized(ds.wspd.values, axis=0, output='all', mask_inv=True, inverse=False)
falloff = xr.DataArray(falloff, coords={'time': ds.time}, dims=['time'])
strength = xr.DataArray(strength, coords={'time': ds.time}, dims=['time'])
height = xr.DataArray(ds.height[index], coords={'time': ds.time}, dims=['time']).where(falloff>0, -1)
lljs_obs_10mins[name] = xr.merge([falloff.rename('falloff'), strength.rename('strength'), height.rename('height')])
for name, ds in obs_1hfreq.items():
falloff, strength, index = lljs.detect_llj_vectorized(ds.values, axis=0, output='all', mask_inv=True, inverse=False)
falloff = xr.DataArray(falloff, coords={'time': ds.time}, dims=['time'])
strength = xr.DataArray(strength, coords={'time': ds.time}, dims=['time'])
height = xr.DataArray(ds.height[index], coords={'time': ds.time}, dims=['time']).where(falloff>0, -1)
lljs_obs_1hfreq[name] = xr.merge([falloff.rename('falloff'), strength.rename('strength'), height.rename('height')])
for name, ds in obs_1hmean.items():
falloff, strength, index = lljs.detect_llj_vectorized(ds.values, axis=0, output='all', mask_inv=True, inverse=False)
falloff = xr.DataArray(falloff, coords={'time': ds.time}, dims=['time'])
strength = xr.DataArray(strength, coords={'time': ds.time}, dims=['time'])
height = xr.DataArray(ds.height[index], coords={'time': ds.time}, dims=['time']).where(falloff>0, -1)
lljs_obs_1hmean[name] = xr.merge([falloff.rename('falloff'), strength.rename('strength'), height.rename('height')])
for name, ds in era_full.items():
wspd = (ds.u**2+ds.v**2)**.5
falloff, strength, index = lljs.detect_llj_vectorized(wspd.values, axis=1, output='all', mask_inv=True, inverse=False)
falloff = xr.DataArray(falloff, coords={'time': ds.time}, dims=['time'])
strength = xr.DataArray(strength, coords={'time': ds.time}, dims=['time'])
index = xr.DataArray(index, coords={'time': ds.time}, dims=['time'])
height = ds.z_nd.isel(level=index).where(falloff>0).drop('level')
lljs_full[name] = xr.merge([falloff.rename('falloff'), strength.rename('strength'), height.rename('height')])
for name, ds in era_coll.items(): # This used to be lljs_subs and era_subs
wspd = (ds.u**2+ds.v**2)**.5
falloff, strength, index = lljs.detect_llj_vectorized(wspd.values, axis=1, output='all', mask_inv=True, inverse=False)
falloff = xr.DataArray(falloff, coords={'time': ds.time}, dims=['time'])
strength = xr.DataArray(strength, coords={'time': ds.time}, dims=['time'])
index = xr.DataArray(index, coords={'time': ds.time}, dims=['time'])
height = ds.z_nd.isel(level=index).where(falloff>0).drop('level')
lljs_coll[name] = xr.merge([falloff.rename('falloff'), strength.rename('strength'), height.rename('height')])
for name, ds in era_intp.items(): # the underlying data has changed here (intp_all is now intp)
wspd = (ds.u**2+ds.v**2)**.5
falloff, strength, index = lljs.detect_llj_vectorized(wspd.values, axis=1, output='all', mask_inv=True, inverse=False)
falloff = xr.DataArray(falloff, coords={'time': ds.time}, dims=['time'])
strength = xr.DataArray(strength, coords={'time': ds.time}, dims=['time'])
height = xr.DataArray(ds.height[index], coords={'time': ds.time}, dims=['time']).where(falloff>0, -1)
lljs_intp[name] = xr.merge([falloff.rename('falloff'), strength.rename('strength'), height.rename('height')])
for name, ds in era_fair.items(): # used to be era_mask and lljs_mask
wspd = (ds.u**2+ds.v**2)**.5
falloff, strength, index = lljs.detect_llj_vectorized(wspd.values, axis=1, output='all', mask_inv=True, inverse=False)
falloff = xr.DataArray(falloff, coords={'time': ds.time}, dims=['time'])
strength = xr.DataArray(strength, coords={'time': ds.time}, dims=['time'])
height = xr.DataArray(ds.height[index], coords={'time': ds.time}, dims=['time']).where(falloff>0, -1)
lljs_fair[name] = xr.merge([falloff.rename('falloff'), strength.rename('strength'), height.rename('height')])
# First, concat all platforms
names = [name for name in lljs_full.keys()]
allj_full = xr.concat([lljs_full[name] for name in names], dim='platform').assign(platform=names)
allj_coll = xr.concat([lljs_coll[name] for name in names], dim='platform').assign(platform=names)
allj_intp = xr.concat([lljs_intp[name] for name in names], dim='platform').assign(platform=names)
allj_fair = xr.concat([lljs_fair[name] for name in names], dim='platform').assign(platform=names)
allj_obs_10mins = xr.concat([lljs_obs_10mins[name] for name in names], dim='platform').assign(platform=names)
allj_obs_1hfreq = xr.concat([lljs_obs_1hfreq[name] for name in names], dim='platform').assign(platform=names)
allj_obs_1hmean = xr.concat([lljs_obs_1hmean[name] for name in names], dim='platform').assign(platform=names)
# Then, concat all 'versions' (keep obs separate)
lljs_era = xr.concat([allj_full, allj_coll, allj_intp, allj_fair], dim='version', coords='minimal').assign(version=['full', 'coll', 'intp', 'fair'])
lljs_obs = xr.concat([allj_obs_10mins, allj_obs_1hfreq, allj_obs_1hmean], dim='version', coords='minimal').assign(version=['10mins', '1hfreq', '1hmean'])
# Save to intermediate files
lljs_era.to_netcdf('../../data/interim/lljs_era5_platforms.nc')
lljs_obs.to_netcdf('../../data/interim/lljs_obs_platforms.nc')
# Display the dataset (xarray's interactive repr)
lljs_era
fig, axs = plt.subplots(1,2, figsize=(10, 4), sharex=True, sharey=True, dpi=100)
random_platform = 'MMIJ'
random_observed_jets = np.random.choice(np.where(lljs_obs.sel(platform=random_platform, version='1hmean').falloff>2)[0], 20)
random_analysis_jets = np.random.choice(np.where(lljs_era.sel(platform=random_platform, version='full').falloff>2)[0], 20)
for i in random_observed_jets:
axs[0].plot(obs_10mins[random_platform].wspd.isel(time=i), obs_10mins[random_platform].height)
wspd = (era_full[random_platform].u**2+era_full[random_platform].v**2)**.5
for i in random_analysis_jets:
axs[1].plot(wspd.isel(time=i), era_full[random_platform].isel(time=i).z_nd)
axs[0].set_title('in observations')
axs[1].set_title('in analysis')
axs[0].set_ylabel('Height (m)')
axs[0].set_xlabel('Wind speed (m/s)')
axs[1].set_xlabel('Wind speed (m/s)')
fig.suptitle('Random low-level jets')
plt.show()
Most jets look realistic. Notice how the observed low-level jets are much more erratic in nature. The ERA5 jets are typically very smooth, and whether or not they exceed the falloff criterion of, say, 2 m/s really depends on the vertical extent of the data. It seems that there is more shear in the observed jets than in the ERA5 jets.
# This will be useful for looping through the platforms
names = ['MMIJ', 'LEG', 'BWF1', 'BWF2', 'EPL', 'HKZA', 'HKZB', 'HKNA', 'HKNB', 'K13']
labels = ['Met Mast IJmuiden', 'Lichteiland Goeree', 'Borssele Wind Farm - lot 1',
'Borssele Wind Farm - lot 2', 'Europlatform', 'Hollandse Kust Zuid A',
'Hollandse Kust Zuid B', 'Hollandse Kust Noord A', 'Hollandse Kust Noord B', 'K13-A Oil Platform']
def jitter(arr):
stdev = .01*(np.max(arr)-np.min(arr))
return arr + np.random.randn(len(arr)) * stdev
def keyfigure(sharey=True):
fig, axs = plt.subplots(2, 3, figsize=(12, 7), sharex=True, sharey=sharey, dpi=150)
axs = axs.ravel()
# Add titles to subplots
axs[0].set_title('A. 2008-2017, 0-500m')
axs[1].set_title('B. Subset, 0-500m') # or concurrent or collocated?
axs[3].set_title('C. 2008-2017 at obs. heights')
axs[4].set_title('D. Subset at obs. heights')
axs[5].set_title('E. Observations')
# The third axes will hold the legend
axs[2].set_axis_off()
# Connect the axes with arrows
axs[1].annotate('', (0, .5), (-.18, .5), clip_on=False, arrowprops=dict(facecolor='black', arrowstyle='-|>', lw=3), xycoords=axs[1].transAxes)
axs[3].annotate('', (.5, .9), (.5, 1.15), clip_on=False, arrowprops=dict(facecolor='black', arrowstyle='-|>', lw=3), xycoords=axs[3].transAxes)
axs[4].annotate('', (0, .5), (-.18, .5), clip_on=False, arrowprops=dict(facecolor='black', arrowstyle='-|>', lw=3), xycoords=axs[4].transAxes)
axs[4].annotate('', (.5, .9), (.5, 1.15), clip_on=False, arrowprops=dict(facecolor='black', arrowstyle='-|>', lw=3), xycoords=axs[4].transAxes)
axs[5].annotate('', (0, .5), (-.2, .5), clip_on=False, arrowprops=dict(facecolor='black', arrowstyle='<|-|>', lw=3), xycoords=axs[5].transAxes)
return fig, axs
fig, axs = keyfigure()
for name, label in zip(names, labels):
ds_full = lljs_era.sel(platform=name, version='full')
ds_coll = lljs_era.sel(platform=name, version='coll')
ds_intp = lljs_era.sel(platform=name, version='intp').dropna(dim='time', how='all')
ds_fair = lljs_era.sel(platform=name, version='fair').dropna(dim='time', how='all')
ds_obs = lljs_obs.sel(platform=name, version='1hmean').dropna(dim='time', how='all')
axs[0].scatter(ds_full.height, ds_full.falloff, s=1)
axs[1].scatter(ds_coll.height, ds_coll.falloff, s=1)
axs[3].scatter(jitter(ds_intp.height.values), ds_intp.falloff, s=1)
axs[4].scatter(jitter(ds_fair.height.values), ds_fair.falloff, s=1)
axs[5].scatter(jitter(ds_obs.height.values), ds_obs.falloff, s=1, label=label+' ('+name+')')
# Plot (as text) the total number of jets in each panel, and the part that is above the falloff threshold
njets = lljs_era.where(lljs_era.falloff>0).falloff.count(dim=['time','platform'])
njets_ltFOT = lljs_era.where(lljs_era.falloff>2).falloff.count(dim=['time','platform'])
nobs = lljs_obs.where(lljs_obs.falloff>0).falloff.count(dim=['time','platform'])
nobs_ltFOT = lljs_obs.where(lljs_obs.falloff>2).falloff.count(dim=['time','platform'])
axs[0].text(0.5, 9, '%i / %i'%(njets_ltFOT.sel(version='full').values, njets.sel(version='full').values), fontsize=12)
axs[1].text(0.5, 9, '%i / %i'%(njets_ltFOT.sel(version='coll').values, njets.sel(version='coll').values), fontsize=12)
axs[3].text(0.5, 9, '%i / %i'%(njets_ltFOT.sel(version='intp').values, njets.sel(version='intp').values), fontsize=12)
axs[4].text(0.5, 9, '%i / %i'%(njets_ltFOT.sel(version='fair').values, njets.sel(version='fair').values), fontsize=12)
axs[5].text(0.5, 9, '%i / %i'%(nobs_ltFOT.sel(version='1hmean').values, nobs.sel(version='1hmean').values), fontsize=12)
for ax in axs[[0,1,3,4,5]]:
ax.axhline(2.0, color='k', linestyle='--')
ax.axvline(300, color='k', linestyle='--')
lgnd = axs[5].legend(ncol=1, markerscale=7, bbox_to_anchor=(-.05, 1.), bbox_transform=axs[2].transAxes, loc='upper left')
axs[3].set_xlabel('jet height (m)')
axs[4].set_xlabel('jet height (m)')
axs[5].set_xlabel('jet height (m)')
axs[0].set_ylabel('falloff (m/s)')
axs[3].set_ylabel('falloff (m/s)')
axs[2].set_axis_off()
axs[0].set_ylim(0, 10)
fig.savefig('../../reports/figures/fivestep_falloffvsjetheight', bbox_inches='tight')
plt.show()
The above figure shows how the detection of low-level jets is influenced by spatial and temporal subsetting. The first subplot is based on 10 years of ERA5 data and the lower 500 m (model levels). The second subplot is based on the same data, but only where observation data is also available. Going to the third subplot, I interpolated the (10 years of) model data to the observation heights. This removes jets higher than roughly 300 m, but some of the lower jets can also no longer be detected because the corresponding decrease in wind speed above the jet is no longer captured. In going from subplot 3 to 4, I combined the effects of interpolation and subsetting. Subplot 5, finally, shows the observations.
A fair ground for comparison between ERA5 and the observations is thus based on the lower two subplots. This suggests that hardly any jets are present in the ERA5 data (especially considering we typically classify a jet only when the falloff > 2 m/s), and the jet frequency is underestimated. A contingency table based on 1:1 jet correspondence shows very low critical success index (~0.1) and probability of detection (~0.1; see appendix B).
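For reference, these verification scores follow directly from the counts of hits, misses and false alarms; a minimal sketch with illustrative counts (not the actual values from appendix B):

```python
# Verification scores from a 2x2 contingency table of jet occurrence;
# the counts below are illustrative, not the values from appendix B.
hits, misses, false_alarms = 50, 400, 150

pod = hits / (hits + misses)                 # probability of detection
csi = hits / (hits + misses + false_alarms)  # critical success index
print(round(pod, 2), round(csi, 2))
```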
However, as the figure clearly illustrates the loss of information due to interpolation and subsetting, we might wonder whether the ERA5 data does in fact 'see' the jets, but perhaps at the wrong altitude. Therefore, we will now focus on the statistical/climatological jet characteristics.
fig, axs = plt.subplots(1, 4, figsize=(12, 3), sharex=True, sharey=True, dpi=150)
for name, label in zip(names, labels):
ds_obs_10mins = lljs_obs.sel(platform=name, version='10mins').dropna(dim='time', how='all')
ds_obs_1hfreq = lljs_obs.sel(platform=name, version='1hfreq').dropna(dim='time', how='all')
ds_obs_1hmean = lljs_obs.sel(platform=name, version='1hmean').dropna(dim='time', how='all')
ds_fair = lljs_era.sel(platform=name, version='fair').dropna(dim='time', how='all')
axs[0].scatter(jitter(ds_obs_10mins.height.values), ds_obs_10mins.falloff, s=1)
axs[1].scatter(jitter(ds_obs_1hfreq.height.values), ds_obs_1hfreq.falloff, s=1)
axs[2].scatter(jitter(ds_obs_1hmean.height.values), ds_obs_1hmean.falloff, s=1, label=label+' ('+name+')')
axs[3].scatter(jitter(ds_fair.height.values), ds_fair.falloff, s=1)
for ax in axs[:]:
ax.axhline(2.0, color='k', linestyle='--')
ax.axvline(300, color='k', linestyle='--')
axs[0].set_title('A. 10-minute obs')
axs[1].set_title('B. 1-hour resampled')
axs[2].set_title('C. 1-hour means')
axs[3].set_title('D. ERA5 Fair')
# lgnd = axs[3].legend(ncol=1, markerscale=7, bbox_to_anchor=(0, 1), bbox_transform=axs[-1].transAxes, loc='upper left')
axs[0].set_xlabel('jet height (m)')
axs[1].set_xlabel('jet height (m)')
axs[2].set_xlabel('jet height (m)')
axs[3].set_xlabel('jet height (m)')
axs[0].set_ylabel('falloff (m/s)')
# axs[-1].set_axis_off()
axs[0].set_ylim(0, 10)
fig.savefig('../../reports/figures/obs_repr_falloffvsjetheight', bbox_inches='tight')
plt.show()
Interpretation: obviously, the 10-minute data shows more jets than the hourly data. The difference between the two hourly representations of the observations is small, though, and it does not affect how they compare to the ERA5 data.
Plotting the falloff in the observations versus the falloff in the ERA5 data. Only points for which both datasets report an LLJ are plotted.
fig, axs = plt.subplots(3, 4, figsize=(15, 15), sharex=True, sharey=True)
axs=axs.ravel()
for i, name in enumerate(names):
ds_obs_1hmean = lljs_obs.sel(platform=name, version='1hmean').dropna(dim='time', how='all')
ds_intp = lljs_era.sel(platform=name, version='intp').dropna(dim='time', how='all')
# obsh = ds_obs.falloff.rolling(time=5, center=True).mean().resample(time='1h').asfreq()
a, b = xr.align(ds_obs_1hmean.falloff, ds_intp.falloff)
axs[i].scatter(a, b, s=a)
axs[i].set_title(name)
axs[i].plot([2, 0], [0, 2], 'k--')
plt.show()
Since the number of jets (falloff > 2 m/s) can be counted with the naked eye, it is obvious that this comparison will deliver very poor statistics.
Let's make a contingency table of the ERA5 performance. I just compare the hourly averaged observations and the ERA5 data in the subsets. Previously, I played with the 10-minute data and used different windows to count the low-level jets in the past hour, etc. This led to an extra dimension and therefore many figures, and it hardly mattered for the interpretation. Therefore, these results are not included in the notebook. However, if I want to do this again, I can use the code below:
# threshold (m/s) and window (number of 10-minute records) are free parameters here
obsc = (ds_obs_10mins.falloff>threshold).rolling(time=window, center=True).sum()#.resample(time='1h').asfreq()
Because most information is lost upon interpolation, I also made a contingency table in which I compare the not-interpolated ERA5 data to the observations, but it does not improve much.
# Contingency table using binary confusion matrix from sklearn
from sklearn.metrics import confusion_matrix
contingency = []
for i, name in enumerate(names):
for threshold in [0.5, 1, 1.5, 2.0, 2.5]:
# Get comparable data formats
obsc = (lljs_obs.falloff>threshold).sel(platform=name, version='1hmean').dropna(dim='time', how='all')
erah = (lljs_era.falloff>threshold).sel(platform=name, version='coll').dropna(dim='time', how='all')
# Change version to 'fair' to compare between the height-aligned/interpolated data.
a, b = xr.align(obsc, erah)
# Determine contingency table
tn, fp, fn, tp = confusion_matrix(a, b).ravel()/len(a)*100
contingency.append([name, threshold, 'hit', tp])
contingency.append([name, threshold, 'false', fp])
contingency.append([name, threshold, 'miss', fn])
contingency.append([name, threshold, 'negative', tn])
df = pd.DataFrame(contingency, columns=['name', 'threshold', 'quadrant', 'value'])
sns.catplot(x="threshold", y="value", hue="name", kind="bar", col='quadrant', col_wrap=2, data=df)
ERA5 and the observations agree on many non-occurrences of low-level jets. Whenever a jet does occur, ERA5 is sometimes able to predict it, but there is also a fair amount of missed jets and false predictions. These are summarized by the following statistics:
# Aggregate over all platforms; avoid shadowing the built-in `all`
totals = df.query('threshold==1.').groupby('quadrant')['value'].sum()
print('pod', totals.hit/(totals.hit+totals.miss))
print('csi', totals.hit/(totals.hit+totals.miss+totals.false))
print('far', totals.false/(totals.hit+totals.false))  # false alarm ratio
# lljs_era.to_dataframe().query('falloff>2').plot.kde(ax=ax, legend=False, title='Histogram: A vs. B')
fig, axs = plt.subplots(1, 2, figsize=(10, 4), dpi=100)
lljs_era.to_dataframe().query('falloff>2').groupby('version')['strength'].plot.kde(ax=axs[0], bw_method=0.5)
lljs_obs.to_dataframe().query('falloff>2').groupby('version')['strength'].plot.kde(ax=axs[0], bw_method=0.5, ls='--')
lljs_era.to_dataframe().query('falloff>2').groupby('version')['height'].plot.kde(ax=axs[1], bw_method=1)
lljs_obs.to_dataframe().query('falloff>2').groupby('version')['height'].plot.kde(ax=axs[1], bw_method=1, ls='--')
axs[0].set_title('A.')
axs[1].set_title('B.')
_handles, _labels = axs[0].get_legend_handles_labels()
shuffle = [_handles[i] for i in [2,0,3,1,4,5,6]]
_labels = ['full', 'subset', 'interpolated', 'subset & interp', '10 minutes', '1 h values', '1 h means']
axs[0].legend(shuffle, _labels, loc='best')
axs[0].set_xlim(0, 25)
axs[1].set_xlim(0, 300)
axs[0].ticklabel_format(axis='y', scilimits=(-1, 1))
axs[1].ticklabel_format(axis='y', scilimits=(-1, 1))
axs[0].set_xlabel('Jet strength (m/s)')
axs[1].set_xlabel('Jet height (m)' )
fig.savefig('../../reports/figures/jet_morphology', bbox_inches='tight')
plt.show()
Interpolation has a strong effect on the low-level jet morphology. The average jet strength before interpolation is about 8 m/s and reduces to 5 or 6 m/s upon interpolation. The jet height distribution is much narrower after interpolation, and the average jet height also reduces from 180 m to about 75 m. The effect of taking a subset in time is more pronounced for the interpolated dataset, and in this case it increases the average jet height and strength, but this could obviously be different for other subsets. Below I plot the relative contributions of each month to the overall climatology. It can be seen that the months January through May are somewhat underrepresented.
Comparison with the observations reveals that the jet height is better represented after interpolation, but the jet strength is better represented before interpolation. A reason for this could be that ERA5 tends to smear out the low-level jets: they would be too high, too smooth, and too weak.
Can we use this data to infer a correlation and obtain an overall distribution representative of long-term observations? I'm tempted to say that we should just trust the observed distribution, especially since taking a subset does not seem to have a big influence on the distributions.
fig, ax = plt.subplots()
lljs_era.sel(version='fair').groupby('time.month').count(dim='time')['falloff'].to_pandas().plot(kind='bar', stacked=True, legend=False, ax=ax)
lgnd = ax.legend(ncol=1, markerscale=7, bbox_to_anchor=(1.05, 1), bbox_transform=ax.transAxes, loc='upper left', fontsize=12)
ax.set_title('Contribution of each month to subset climatology')
plt.show()
Before we continue, I'll make a similar figure for jet height and strength on a per-platform basis in order to assess the spatial differences. However, these plots are much more impacted by the scarcity of the data.
fig, axs = plt.subplots(10, 2, figsize=(10, 16), dpi=100)
for i, name in enumerate(names):
lljs_era.sel(platform=name).to_dataframe().query('falloff>2').groupby('version')['strength'].plot.kde(ax=axs[i, 0], bw_method=0.5)
lljs_obs.sel(platform=name).to_dataframe().query('falloff>2').groupby('version')['strength'].plot.kde(ax=axs[i, 0], bw_method=0.5, ls='--')
lljs_era.sel(platform=name).to_dataframe().query('falloff>2').groupby('version')['height'].plot.kde(ax=axs[i, 1], bw_method=1)
lljs_obs.sel(platform=name).to_dataframe().query('falloff>2').groupby('version')['height'].plot.kde(ax=axs[i, 1], bw_method=1, ls='--')
axs[i, 0].set_ylabel(name)
axs[i, 1].set_ylabel(None)
axs[i, 0].set_xlim(0, 25)
axs[i, 1].set_xlim(0, 300)
axs[i, 0].ticklabel_format(axis='y', scilimits=(-1, 1))
axs[i, 1].ticklabel_format(axis='y', scilimits=(-1, 1))
axs[i, 0].set_title('Jet strength (m/s)')
axs[i, 1].set_title('Jet height (m)' )
fig.tight_layout()
plt.show()
Generally, jets near the coast seem to be somewhat closer to the surface than jets further offshore.
Let's use the same format as before to visualize the seasonal cycle for different locations and different versions of the datasets.
def lljrate(ds, falloff, dim=None):
return (ds.falloff>falloff).where(np.isfinite(ds.falloff)).groupby('time.month').mean(dim=dim)*100
def seasonal_cycle(ds, ax, shade=True, label=None):
mean = lljrate(ds, 2.0)
x = mean.month
ax.plot(x, mean, label=label)
if shade:
maxi = lljrate(ds, 1.5)
mini = lljrate(ds, 2.5)
ax.fill_between(x, maxi, mini, alpha=.3)
return mean
# fig, ax = plt.subplots()
# seasonal_cycle(lljs_full['MMIJ'], ax)
# plt.show()
fig, axs = keyfigure()
for name, label in zip(names, labels):
ds_full = lljs_era.sel(platform=name, version='full')
ds_coll = lljs_era.sel(platform=name, version='coll')
ds_intp = lljs_era.sel(platform=name, version='intp').dropna(dim='time', how='all')
ds_fair = lljs_era.sel(platform=name, version='fair').dropna(dim='time', how='all')
ds_obs = lljs_obs.sel(platform=name, version='1hmean').dropna(dim='time', how='all')
for ds, idx in zip([ds_full, ds_coll, ds_intp, ds_fair, ds_obs],[0,1,3,4,5]):
result = seasonal_cycle(ds, axs[idx], label=label+' ('+name+')')
all_platforms = []
for ds in [lljs_era.sel(version=v) for v in ['full', 'coll', 'intp', 'fair']]+[lljs_obs.sel(version='1hmean')]:
all_platforms.append(lljrate(ds, 2.0).rolling(month=3, min_periods=2, center=True).mean())
for i_ax, ds in zip([0, 1, 3, 4, 5], all_platforms):
scale = all_platforms[-1].mean()/ds.mean()
axs[i_ax].plot(np.arange(1,13), ds*scale, 'k--', label='All; scaled & smoothed')
axs[0].set_ylabel('LLJS (%)')
axs[3].set_ylabel('LLJS (%)')
axs[3].set_xticks(range(1,13,))
axs[3].set_xticklabels([str(x) for x in 'jfmamjjasond'])
lgnd = axs[5].legend(ncol=1, markerscale=7, bbox_to_anchor=(-.05, 1.), bbox_transform=axs[2].transAxes, loc='upper left')
fig.savefig('../../reports/figures/fivestep_seasonal', bbox_inches='tight')
plt.show()
The figure above shows, in a similar fashion as before, how the seasonal cycle of low-level jets (falloff > 2 m/s) is impacted by the limited temporal and vertical extent of the observations. For the full dataset, spanning 10 years and 500 m, the seasonal cycle is quite smooth and similar for each platform. Subsetting the data to the limited time windows of the observations already makes the seasonal cycle much more erratic. Some platforms don't even cover a complete seasonal cycle. Given the clear annual cycle that is visible in both the model and the collective body of measurement data, such short datasets can lead to very biased climatologies. Upon interpolating the model data to the observation heights (thereby losing everything above roughly 300 m), we lose almost all of the low-level jets. As a result, the fair comparison between ERA5 and the observations shows a much lower LLJ rate for the ERA5 data. The observations seem to capture the seasonal cycle quite well, though.
The dashed black lines are aggregates of all platforms. They are smoothed with a rolling window of 3 months and scaled by the ratio between the mean of each representation and the mean of the full dataset. This leads to very comparable climatologies between each representation, and we might wonder whether we can find appropriate scaling relations per platform. As such, we could reconstruct the seasonal cycle from the observations based on a longer period.
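The falloff-based detection described earlier (seek local maxima, measure the drop to the next local minimum or the profile top, keep the largest drop) can be sketched as follows. This is an illustrative re-implementation, not the exact code used to build `lljs_era` and `lljs_obs`:

```python
import numpy as np

def jet_falloff(speed):
    """Largest falloff (m/s) among the local maxima of a wind profile,
    ordered bottom to top; 0.0 if there is no local maximum."""
    best = 0.0
    for i in range(1, len(speed) - 1):
        if speed[i] > speed[i - 1] and speed[i] > speed[i + 1]:  # local max
            # Walk up to the next local minimum, or to the profile top
            j = i + 1
            while j < len(speed) - 1 and speed[j + 1] < speed[j]:
                j += 1
            best = max(best, speed[i] - speed[j])
    return best

profile = np.array([5.0, 8.0, 6.0, 7.0, 9.0])  # jet nose at the second level
print(jet_falloff(profile))  # 2.0 -> an LLJ at the 2 m/s threshold
```

Storing the falloff value itself, rather than a boolean flag, is what allows the threshold sensitivity analysis in this notebook.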
# I chose to use scikit-learn to perform the linear regression
from sklearn.linear_model import LinearRegression
reg = LinearRegression(fit_intercept=False)
fig, ax = plt.subplots()
# Determine overall scaling parameter
ds_obs = lljs_obs.sel(version='1hmean')
ds_coll = lljs_era.sel(version='coll')
sc_obs = lljrate(ds_obs, 2.0, dim=None).values.ravel()
sc_coll = lljrate(ds_coll, 2.0, dim=None).values.ravel()
# Plot the data
ax.scatter(sc_coll, sc_obs)
ax.set_xlabel('coll')
ax.set_ylabel('obs')
# Fit y = a*x with y = sc_obs and x = sc_coll (X needs to be a column vector)
# a, _, _, _ = np.linalg.lstsq(sc_coll[:, None], sc_obs, rcond=None) # in an older version I used numpy's linear regression
a = reg.fit(sc_coll[:, None], sc_obs).coef_
# Show the fit
ax.plot(np.sort(sc_coll), np.sort(sc_coll)*a, 'k--')
plt.show()
print('least squares gives slope a={:.2f}'.format(a[0]))
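The through-origin fit (`fit_intercept=False`) has the closed-form solution a = Σxy / Σx², which offers a quick sanity check of the scikit-learn result on synthetic data with a known slope (the 0.44 here is just an illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, 50)
y = 0.44 * x  # synthetic data with a known, hypothetical slope
a = (x * y).sum() / (x * x).sum()  # least squares through the origin
print(round(a, 2))  # 0.44
```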
The figure below illustrates the methods discussed in section 6 of the paper.
fig, axs = plt.subplots(3, 2, figsize=(12, 8), sharex=True, sharey=True)
axs = axs.ravel()
data = []
for name, label in zip(names, labels):
ds_full = lljs_era.sel(platform=name, version='full')
ds_coll = lljs_era.sel(platform=name, version='coll')
ds_obs = lljs_obs.sel(platform=name, version='1hmean')
sc_full = lljrate(ds_full, 2.0)
sc_coll = lljrate(ds_coll, 2.0)
sc_obs = lljrate(ds_obs, 2.0)
months = np.arange(1, 13)
# Instead of using one global value for <a>, we can try to compute it for each platform
# Comment out these lines to use the global <a> value determined above
nans = np.isnan(sc_coll.values) | np.isnan(sc_obs.values)
a = reg.fit(sc_coll.values[~nans, None], sc_obs.values[~nans]).coef_
print(name+', a={:.2f}'.format(a[0]))
data.append([name, ds_full.longitude.values, ds_full.latitude.values, a[0]])
axs[0].plot(months, sc_obs)
axs[1].plot(months, sc_full/sc_coll*sc_obs)
axs[2].plot(months, sc_coll)
axs[3].plot(months, sc_coll*a)
axs[4].plot(months, sc_full)
axs[5].plot(months, sc_full*a)
axs[0].set_xticks(range(1,13))
axs[0].set_xticklabels('jfmamjjasond')
axs[0].set_title('obs')
axs[1].set_title('corrected obs')
axs[2].set_title('coll')
axs[3].set_title('reconstructed obs')
axs[4].set_title('full')
axs[5].set_title('long-term corrected')
# axs[0].legend()
plt.show()
The figure above shows on the left (top to bottom), the seasonal cycle of the observations, the ERA5 data at the same time stamps (up to 500 m) and the full 10 years of ERA5 data.
The top right figure shows the observations multiplied by the ratio between the full and subset ERA5 data, i.e. the representativity correction discussed in the paper. It can be seen that the results are still far from smooth. For K13 (the light blue line), the collocated data show that in May, the low-level jet rate was much lower than the climatological average. Therefore, in the corrected version, the low-level jet rate in May is drastically increased. This is unlikely to be a robust and reliable result.
The middle right figure is based on optimized scaling factors per platform, i.e. the fitted linear slopes. We determined the slope for each platform and then applied this correction factor to the collocated/subset ERA5 data. That should lead to a seasonal cycle similar to the collocated data, but with the amplitude adjusted to the observations.
The bottom right figure is the overall scaling factor of 0.44 applied to the full (10-year) ERA5 data (bottom left). That results in exactly the same cycles but with a smaller amplitude.
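The fragility of the ratio correction is easy to demonstrate with hypothetical numbers: whenever the collocated subset happens to have an anomalously low rate, the correction factor explodes.

```python
import numpy as np

# Hypothetical monthly LLJ rates (%) for one platform
sc_obs  = np.array([4.0, 6.0, 10.0])  # observations
sc_coll = np.array([5.0, 6.0,  2.0])  # ERA5, collocated subset
sc_full = np.array([5.0, 7.0,  8.0])  # ERA5, full 10 years
corrected = sc_full / sc_coll * sc_obs
print(corrected)  # the last month blows up because sc_coll is anomalously low
```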
df = pd.DataFrame(data, columns=['platform', 'longitude', 'latitude', 'scaling'])
df
from cartopy import crs as ccrs
from scipy.interpolate import griddata
proj = ccrs.LambertConformal(central_longitude=3.5, central_latitude=52, standard_parallels=(45,55))
xi = np.linspace(2, 5, 25)
yi = np.linspace(50, 54, 25)
xi, yi = np.meshgrid(xi, yi)
grid = griddata((df.longitude, df.latitude), df.scaling, (xi, yi), method='cubic')
fig = plt.figure()
ax = fig.add_subplot(111, projection=ccrs.PlateCarree())
ax.set_extent([2, 5.5, 51, 53.5])
cont = ax.contourf(xi, yi, grid, transform=ccrs.PlateCarree())
ax.scatter(df.longitude, df.latitude, s=df.scaling*500, c=df.scaling, edgecolor='k', transform=ccrs.PlateCarree())
fig.colorbar(cont, ax=ax, label='<obs/coll>')
ax.coastlines('10m')
plt.show()
It seems that the larger the distance to the coast, the closer the scaling is to 1. In other words, closer to the coast, ERA5 shows a more severe overestimation of the low-level jet rate. Note that this is based on monthly low-level jet rates. We can repeat this procedure for other aggregations, such as the diurnal cycle, Lamb weather type, etc.
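Repeating the procedure for other aggregations only requires swapping the groupby key in the `lljrate` definition. A hypothetical pandas sketch of that pattern (the notebook itself uses xarray; names here are illustrative):

```python
import numpy as np
import pandas as pd

def make_lljrate(key):
    """Build an LLJ-rate function grouping by a datetime attribute
    ('month', 'hour', ...). Hypothetical sketch of the xarray pattern
    used in the cells below."""
    def lljrate(falloff, threshold=2.0):
        is_jet = (falloff > threshold).astype(float).where(falloff.notna())
        return is_jet.groupby(getattr(falloff.index, key)).mean() * 100
    return lljrate

times = pd.date_range('2020-01-01', periods=48, freq='h')
# One day with jets (falloff 3 m/s), one day without (falloff 1 m/s)
falloff = pd.Series(np.r_[np.full(24, 3.0), np.full(24, 1.0)], index=times)
hourly = make_lljrate('hour')(falloff)
print(hourly.loc[0])  # 50.0: one jet record and one jet-free record at 00 UTC
```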
def paperfig_regression(lljrate, lljcount):
""" This reproduces the paper figure
But by taking the lljrate_definition as input, it can
also be applied to LLJ-rates per hour of day, stability class,
weather types, etc.
"""
from sklearn.linear_model import LinearRegression
reg = LinearRegression(fit_intercept=False)
fig = plt.figure(figsize=(9, 4), dpi=200)
# First axes hold VERY CLEAR overall fit and very subtle individual fits
ax0 = fig.add_subplot(121)
# First plot individual platforms
data = []
sens_coll = []
sens_obs = []
for name, label in zip(names, labels):
ds_coll = lljs_era.sel(platform=name, version='coll')
ds_obs = lljs_obs.sel(platform=name, version='1hmean')
# Determine season cycle (sc; i.e. monthly means)
sc_coll = lljrate(ds_coll, 2.0).values
sc_obs = lljrate(ds_obs, 2.0).values
# # Sensitivity to falloff threshold
# # sens_coll.append((lljrate(ds_coll, 2.5).values - lljrate(ds_coll, 1.5).values).mean())
# # sens_obs.append((lljrate(ds_obs, 2.5).values - lljrate(ds_obs, 1.5).values).mean())
# # alternatively: determine standard deviations
# # mean of binomial distribution: n*p
# # --> for 100 events, with probability 0.15, the average/expected LLJ rate is 15 (%)
# # std of a binomial distribution: sqrt( n*p*(1-p) )
# # --> for 100 events, with probability 0.15, the std is sqrt(15*(1-.15)) = 3.57 (%)
# # sens_coll.append(np.sqrt(sc_coll*(1-sc_coll/100)).mean())
# # sens_obs.append(np.sqrt(sc_obs*(1-sc_obs/100)).mean())
# # alternatively: determine standard errors: https://stats.stackexchange.com/a/29644
# # SE = sqrt ( kpq/n ) where k = 100 (100 bernoulli trials), p=probability, q = 1-p, n = sample size
# count_coll = lljcount(ds_coll, 2.0).values
# count_obs = lljcount(ds_obs, 2.0).values
# se_coll = np.sqrt(sc_coll*(1-sc_coll/100)/count_coll)
# se_obs = np.sqrt(sc_obs*(1-sc_obs/100)/count_obs)
# sens_coll.append(np.sqrt(sc_coll*(1-sc_coll/100)/count_coll).mean())
# sens_obs.append(np.sqrt(sc_obs*(1-sc_obs/100)/count_obs).mean())
# Fit a slope
nans = np.isnan(sc_coll) | np.isnan(sc_obs)
a = reg.fit(sc_coll[~nans, None], sc_obs[~nans]).coef_
data.append([name, ds_coll.longitude.values, ds_coll.latitude.values, a[0]])
# Plot the data with high transparency
ax0.scatter(sc_coll, sc_obs, alpha=0.4)
ax0.plot(np.sort(sc_coll), np.sort(sc_coll)*a, alpha=0.4, lw=1)
# Plot average error bar in bottom right
sens_coll = np.asarray(sens_coll)
sens_obs = np.asarray(sens_obs)
# ax0.errorbar(x=18, y=1, xerr=sens_coll[~np.isnan(sens_coll)].mean(), yerr=sens_obs[~np.isnan(sens_obs)].mean(), fmt='k', capsize=5)
# Plot overall statistics on top
# Determine overall scaling parameter
ds_obs = lljs_obs.sel(version='1hmean')
ds_coll = lljs_era.sel(version='coll')
sc_obs = lljrate(ds_obs, 2.0, dim=None).values.ravel()
sc_coll = lljrate(ds_coll, 2.0, dim=None).values.ravel()
# Fit y = a*x with y = sc_obs and x = sc_coll (X needs to be a column vector)
nans = np.isnan(sc_coll) | np.isnan(sc_obs)
a = reg.fit(sc_coll[~nans, None], sc_obs[~nans]).coef_[0]
# Plot the data
ax0.scatter(sc_coll, sc_obs, color='k', s=50, zorder=4, alpha=0.8)
ax0.plot(np.arange(20), np.arange(20)*a, 'k', lw=3)
ax0.set_xlabel('ERA5 LLJ rate (%)')
ax0.set_ylabel('Observed LLJ rate (%)')
ax0.set_title('A. Fits to monthly mean LLJ rates')
ax0.plot([0, 10], [0, 10], 'k--', lw=1)
#####################################################################
# Second axes holds the spatial variability of the scaling parameters
proj = ccrs.LambertConformal(central_longitude=3.5, central_latitude=52, standard_parallels=(45,55))
xi = np.linspace(2, 5, 25)
yi = np.linspace(50, 54, 25)
xi, yi = np.meshgrid(xi, yi)
df = pd.DataFrame(data, columns=['platform', 'longitude', 'latitude', 'scaling'])
grid = griddata((df.longitude, df.latitude), df.scaling, (xi, yi), method='cubic')
ax1 = fig.add_subplot(122, projection=ccrs.PlateCarree())
ax1.set_extent([2, 5.5, 51, 53.5])
cont = ax1.contourf(xi, yi, grid, transform=ccrs.PlateCarree())
ax1.scatter(df.longitude, df.latitude, s=df.scaling*200, c=df.scaling, edgecolor='k', transform=ccrs.PlateCarree())
fig.colorbar(cont, ax=ax1, label='Slope', orientation='horizontal')
ax1.coastlines('10m')
ax1.set_title('B. Fitted parameters')
return fig
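As a quick check of the binomial reasoning in the commented-out error-bar code above (the worked example with 100 records and p = 0.15):

```python
import math

# 100 records with jet probability p = 0.15:
n, p = 100, 0.15
expected = n * p                     # expected LLJ rate: 15 (%)
spread = math.sqrt(n * p * (1 - p))  # std of the binomial count
print(round(spread, 2))  # 3.57, as quoted in the comments
```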
def lljrate_seasonal(ds, falloff, dim=None):
return (ds.falloff>falloff).where(np.isfinite(ds.falloff)).groupby('time.month').mean(dim=dim)*100
def lljcount_seasonal(ds, falloff, dim=None):
return (ds.falloff>falloff).where(np.isfinite(ds.falloff)).groupby('time.month').count(dim=dim)
fig = paperfig_regression(lljrate_seasonal, lljcount_seasonal)
fig.savefig('../../reports/figures/slope_llj_monthly', transparent=False, bbox_inches='tight')
plt.show()
def lljrate(ds, falloff, dim=None):
return (ds.falloff>falloff).where(np.isfinite(ds.falloff)).groupby('time.hour').mean(dim=dim)*100
def diurnal_cycle(ds, ax, shade=True, label=None):
mean = lljrate(ds, 2.0)
x = mean.hour
ax.plot(x, mean, label=label)
if shade:
maxi = lljrate(ds, 1.5)
mini = lljrate(ds, 2.5)
ax.fill_between(x, maxi, mini, alpha=.3)
return mean
fig, axs = keyfigure()
means=[]
for name, label in zip(names, labels):
ds_full = lljs_era.sel(platform=name, version='full')
ds_coll = lljs_era.sel(platform=name, version='coll')
ds_intp = lljs_era.sel(platform=name, version='intp').dropna(dim='time', how='all')
ds_fair = lljs_era.sel(platform=name, version='fair').dropna(dim='time', how='all')
ds_obs = lljs_obs.sel(platform=name, version='1hmean').dropna(dim='time', how='all')
for ds, idx in zip([ds_full, ds_coll, ds_intp, ds_fair, ds_obs],[0,1,3,4,5]):
result = diurnal_cycle(ds, axs[idx], label=label)
all_platforms = []
for ds in [lljs_era.sel(version=v) for v in ['full', 'coll', 'intp', 'fair']]+[lljs_obs.sel(version='1hmean')]:
all_platforms.append(lljrate(ds, 2.0).rolling(hour=3, min_periods=2, center=True).mean())
for i_ax, ds in zip([0, 1,3,4,5], all_platforms):
scale = all_platforms[0].max()/ds.max()
axs[i_ax].plot(np.arange(24), ds*scale, 'k--', label='All; scaled & smoothed')
axs[0].set_ylabel('LLJS (%)')
axs[3].set_ylabel('LLJS (%)')
lgnd = axs[5].legend(ncol=1, markerscale=7, bbox_to_anchor=(-.05, 1.), bbox_transform=axs[2].transAxes,
loc='upper left', fontsize=12)
axs[3].set_xticks(range(0, 24, 6))
# fig.savefig('../../reports/figures/fivestep_lljdiurnal', bbox_inches='tight')
plt.show()
def lljrate_diurnal(ds, falloff, dim=None):
return (ds.falloff>falloff).where(np.isfinite(ds.falloff)).groupby('time.hour').mean(dim=dim)*100
def lljcount_diurnal(ds, falloff, dim=None):
return (ds.falloff>falloff).where(np.isfinite(ds.falloff)).groupby('time.hour').count(dim=dim)
fig = paperfig_regression(lljrate_diurnal, lljcount_diurnal)
plt.show()
We won't tune all the details here; this is just for illustration.
lwtfile = '/home/peter919/NoordzeeData/ERA5/synoptic_classification/data/processed/labels_LWT_Jones.csv'
lwt = pd.read_csv(lwtfile, index_col=0, parse_dates=True).LWT.to_xarray().rename({'index':'time'})
# LWTs are automatically sorted on groupby, I want to sort them in a different order:
mapping = ['CN', 'N', 'AN',
'CNE', 'NE', 'ANE',
'CE', 'E', 'AE',
'CSE', 'SE', 'ASE',
'CS', 'S', 'AS',
'CSW', 'SW', 'ASW',
'CW', 'W', 'AW',
'CNW', 'NW', 'ANW',
'C', 'U', 'A']
lwt
Let's forget about the differences between the platforms for now, and see how the dataset manipulations alter the climatology.
def lljrate(ds, falloff, dim=None):
a, b = xr.align(ds.falloff, lwt)
return (a>falloff).where(np.isfinite(a)).groupby(b).mean(dim=dim)*100
def barplot(ds, ax):
lljrate(ds, 2.0).to_dataframe().loc[mapping].plot(kind='bar', ax=ax, legend=False)
fig, axs = keyfigure(sharey=False)
fig.set_size_inches(20,12) # this effectively reduces the font size
barplot(lljs_era.sel(version='full'), axs[0])
barplot(lljs_era.sel(version='coll'), axs[1])
barplot(lljs_era.sel(version='intp'), axs[3])
barplot(lljs_era.sel(version='fair'), axs[4])
barplot(lljs_obs.sel(version='1hmean'), axs[5])
plt.show()
There are substantially more low-level jets for circulation patterns with a pronounced easterly component, and hardly any jets for westerly flows. The undefined weather type corresponds to a weak synoptic circulation, which gives local processes that may lead to the formation of low-level jets the chance to develop.
def barplot(ds, ax):
lljrate(ds, 2.0, dim='time').to_dataframe().falloff.unstack('platform').loc[mapping].plot(kind='bar', ax=ax, legend=False, stacked=True)
fig, axs = keyfigure(sharey=False)
fig.set_size_inches(20,12) # this effectively reduces the font size
barplot(lljs_era.sel(version='full'), axs[0])
barplot(lljs_era.sel(version='coll'), axs[1])
barplot(lljs_era.sel(version='intp'), axs[3])
barplot(lljs_era.sel(version='fair'), axs[4])
barplot(lljs_obs.sel(version='1hmean'), axs[5])
lgnd = axs[5].legend(ncol=1, markerscale=7, bbox_to_anchor=(-.05, 1), bbox_transform=axs[2].transAxes,
loc='upper left', fontsize=12)
plt.show()
def lljrate_LWT(ds, falloff, dim=None):
a, b = xr.align(ds.falloff, lwt)
return (a>falloff).where(np.isfinite(a)).groupby(b).mean(dim=dim)*100
def lljcount_LWT(ds, falloff, dim=None):
a, b = xr.align(ds.falloff, lwt)
return (a>falloff).where(np.isfinite(a)).groupby(b).count(dim=dim)
fig = paperfig_regression(lljrate_LWT, lljcount_LWT)
plt.show()
Here we just compute a single 'representative' value for the bulk Richardson number and use it to group all aggregated platform data together, for illustration. Later on (for the paper figure), we also use platform-specific bulk Richardson numbers.
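The bulk Richardson expression used in the cells below combines g/T, the air-sea temperature difference, and the 10 m wind speed. As a scalar sanity check with made-up values for a sea that is 1 K warmer than the air:

```python
# Scalar sanity check of the bulk Richardson expression used below:
#   Rib = g / SST * (T2m - SST) / 2 / (u10**2 + v10**2) * 100
# (made-up values, not data from the platforms)
g, sst, t2m, u10, v10 = 9.81, 285.0, 284.0, 5.0, 0.0
rib = g / sst * (t2m - sst) / 2 / (u10**2 + v10**2) * 100
print(rib < 0)  # True: negative Rib (unstable) when the sea is warmer
```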
fig, axs = plt.subplots(1, 2, figsize=(10, 4))
sfc = xr.open_mfdataset('../../data/external/ECMWF/ERA5/era5_20*_sfc.nc').sel(longitude=3, latitude=52, method='nearest')
rib = 9.81/(sfc.sst)*(sfc.t2m-sfc.sst)/2/(sfc.u10**2+sfc.v10**2)*100
rib.plot.hist(bins=np.linspace(-0.3, 0.3, 100), ax=axs[0])
# rib.to_pandas().plot.kde(bw_method=0.05, ax=axs[1])
axs[1].axvline(0, c='k')
def lljrate(ds, falloff, dim=None):
a, b = xr.align(ds.falloff, rib)
bins = np.linspace(-0.3, 0.3, 51)
return (a>falloff).where(np.isfinite(a)).groupby_bins(b.rename('stab'), bins).mean(dim=dim)*100
binned = lljrate(lljs_era.sel(version='full'), 2.0)
fig, ax = plt.subplots()
bins = np.linspace(-0.3, 0.3, 51)
bin_centers = .5*(bins[1:]+bins[:-1])
ax.scatter(bin_centers, binned)
plt.show()
bins = np.linspace(-0.3, 0.3, 51)
bin_centers = .5*(bins[1:]+bins[:-1])
fig, axs = keyfigure()
axs[0].scatter(bin_centers, lljrate(lljs_era.sel(version='full'), 2.0))
axs[1].scatter(bin_centers, lljrate(lljs_era.sel(version='coll'), 2.0))
axs[3].scatter(bin_centers, lljrate(lljs_era.sel(version='intp'), 2.0))
axs[4].scatter(bin_centers, lljrate(lljs_era.sel(version='fair'), 2.0))
axs[5].scatter(bin_centers, lljrate(lljs_obs.sel(version='1hmean'), 2.0))
plt.show()
def lljrate_Rib(ds, falloff, dim=None):
a, b = xr.align(ds.falloff, rib)
bins = np.linspace(-0.3, 0.3, 51)
return (a>falloff).where(np.isfinite(a)).groupby_bins(b.rename('stab'), bins).mean(dim=dim)*100
def lljcount_Rib(ds, falloff, dim=None):
a, b = xr.align(ds.falloff, rib)
bins = np.linspace(-0.3, 0.3, 51)
return (a>falloff).where(np.isfinite(a)).groupby_bins(b.rename('stab'), bins).count(dim=dim)
fig = paperfig_regression(lljrate_Rib, lljcount_Rib)
plt.show()
Next, we combine the diurnal cycle with the bulk Richardson number Ri_b.
## As preprocessing, we compute the Richardson number for each platform
stab = lljs_era.sel(version='full').falloff.copy().rename('Rib').drop('version')
for name in names:
lon = lljs_era.sel(platform=name).longitude.values
lat = lljs_era.sel(platform=name).latitude.values
sfc = xr.open_mfdataset('../../data/external/ECMWF/ERA5/era5_20*_sfc.nc').sel(longitude=lon, latitude=lat, method='nearest')
rib = 9.81/(sfc.sst)*(sfc.t2m-sfc.sst)/2/(sfc.u10**2+sfc.v10**2)*100
stab.loc[dict(platform=name)] = rib.values # See https://stackoverflow.com/a/40030606/6012085
def diurnal_cycle(ds, ax=None, shade=True, label=None):
def _lljrate(ds, falloff, dim=None):
return (ds.falloff>falloff).where(np.isfinite(ds.falloff)).groupby('time.hour').mean(dim=dim)*100
mean = _lljrate(ds, 2.0)
x = mean.hour
if ax is not None:
ax.plot(x, mean, label=label)
if shade:
maxi = _lljrate(ds, 1.5)
mini = _lljrate(ds, 2.5)
ax.fill_between(x, maxi, mini, alpha=.3)
return mean
def stability_cycle(ds, stab, ax=None, **kwargs):
def _lljrate(ds, bins, falloff=2.0, dim=None):
a, b = xr.align(ds.falloff, stab)
return (a>falloff).where(np.isfinite(a)).groupby_bins(b.rename('stab'), bins).mean(dim=dim)*100
bins = np.linspace(-0.3, 0.3, 51)
bin_centers = .5*(bins[1:]+bins[:-1])
means = _lljrate(ds, bins, falloff=2.0)
if ax is not None:
ax.scatter(bin_centers, means, **kwargs)
return means
fig, axs = plt.subplots(2, 3, figsize=(9, 6), dpi=200)
axs = axs.ravel()
#######################################################
means=[]
for name, label in zip(names, labels):
ds_full = lljs_era.sel(platform=name, version='full')
ds_coll = lljs_era.sel(platform=name, version='coll')
ds_obs = lljs_obs.sel(platform=name, version='1hmean').dropna(dim='time', how='all')
for ds, idx in zip([ds_full, ds_coll, ds_obs],[0,1,2]):
result = diurnal_cycle(ds, axs[idx], label=label)
# Add all-platform means
all_platforms = []
for ds in [lljs_era.sel(version='full'), lljs_era.sel(version='coll'), lljs_obs.sel(version='1hmean')]:
all_platforms.append(diurnal_cycle(ds, ax=None).rolling(hour=3, min_periods=2, center=True).mean())
for i, ds in enumerate(all_platforms):
scale = all_platforms[-1].mean()/ds.mean()
axs[i].plot(np.arange(24), ds*scale, 'k--', label='All; scaled & smoothed')
for ax in axs[[0, 1, 2]]:
ax.set_ylim(0, 15)
########################################################
# Similarly for stability
for name, label in zip(names, labels):
ds_full = lljs_era.sel(platform=name, version='full')
ds_coll = lljs_era.sel(platform=name, version='coll')
ds_obs = lljs_obs.sel(platform=name, version='1hmean').dropna(dim='time', how='all')
for ds, idx in zip([ds_full, ds_coll, ds_obs],[3,4,5]):
result = stability_cycle(ds, stab.sel(platform=name), axs[idx], alpha=0.5)
# Add all-platform means
all_platforms = []
for ds in [lljs_era.sel(version='full'), lljs_era.sel(version='coll'), lljs_obs.sel(version='1hmean')]:
all_platforms.append(stability_cycle(ds, stab, ax=None).rolling(stab_bins=3, min_periods=2, center=True).mean())
for i, ds in zip([3,4,5], all_platforms):
scale = all_platforms[-1].mean()/ds.mean()
bins = np.linspace(-0.3, 0.3, 51)
bin_centers = .5*(bins[1:]+bins[:-1])
axs[i].plot(bin_centers, ds*scale, 'k--')
for ax in axs[[3, 4, 5]]:
ax.set_ylim(0, 40)
# Decoration
axs[0].set_title('A. ERA5 10 years')
axs[1].set_title('B. ERA5 collocated')
axs[2].set_title('C. Observations')
axs[3].set_title('D.', loc='left')
axs[4].set_title('E.', loc='left')
axs[5].set_title('F.', loc='left')
axs[0].set_xlabel('Hour of day (UTC)')
axs[1].set_xlabel('Hour of day (UTC)')
axs[2].set_xlabel('Hour of day (UTC)')
axs[3].set_xlabel('Bulk Richardson number [-]')
axs[4].set_xlabel('Bulk Richardson number [-]')
axs[5].set_xlabel('Bulk Richardson number [-]')
axs[0].set_ylabel('LLJ rate (%)')
axs[3].set_ylabel('LLJ rate (%)')
fig.tight_layout()
fig.savefig('../../reports/figures/lljs_diurnal_and_stability', bbox_inches='tight')
plt.show()
Anticyclonic weather types occur much more often. This is consistent with the climatology of Fealy and Mills.
Jones (2014) has a nice overview with some phase-space figures of the LWT classification. It can be seen that the A area is bigger than most others.
lwt.to_dataframe().assign(test=1).groupby('LWT').count().plot(kind='bar', legend=False)