Wind Energy Science The interactive open-access journal of the European Academy of Wind Energy
Wind Energ. Sci., 3, 845-868, 2018
https://doi.org/10.5194/wes-3-845-2018

Research articles 05 Nov 2018


# Assessing variability of wind speed: comparison and validation of 27 methodologies

Joseph C. Y. Lee1,2, M. Jason Fields1, and Julie K. Lundquist1,2
• 1National Renewable Energy Laboratory, Golden, CO 80401, USA
• 2Department of Atmospheric and Oceanic Sciences, University of Colorado Boulder, Boulder, CO 80309, USA
Abstract

Because wind resources vary from year to year, the intermonthly and interannual variability (IAV) of wind speed is a key component of the overall uncertainty in the wind resource assessment process, thereby creating challenges for wind farm operators and owners. We present a critical assessment of several common approaches for calculating variability by applying each of the methods to the same 37-year monthly wind-speed and energy-production time series to highlight the differences between these methods. We then assess the accuracy of the variability calculations by correlating the wind-speed variability estimates to the variabilities of actual wind farm energy production. We recommend the robust coefficient of variation (RCoV) for systematically estimating variability, and we underscore its advantages as well as the importance of using a statistically robust and resistant method. Using normalized spread metrics, including RCoV, high variability of monthly mean wind speeds at a location effectively denotes strong fluctuations of monthly total energy generation, and vice versa. Meanwhile, the wind-speed IAVs computed with annual-mean data fail to adequately represent energy-production IAVs of wind farms. Finally, we find that estimates of energy-generation variability require 10±3 years of monthly mean wind-speed records to achieve a 90 % statistical confidence. This paper also provides guidance on the spatial distribution of wind-speed RCoV.

1 Introduction

The P50, a widely used parameter in the wind-energy industry, is an estimate of the threshold of annual energy production of a wind farm that the facility is expected to exceed 50 % of the time (Clifton et al., 2016). The P50 is usually estimated to apply over the lifetime of a wind farm, typically 20 years. To estimate P50 in the wind resource assessment process, a single percentage value is usually assigned to represent the uncertainty for the desired time period at a wind site (Brower, 2012). The interannual variability (IAV) of wind resources, along with site measurements and wind-power-plant performance, is an important component of the overall uncertainty in power production (Clifton et al., 2016; Klink, 2002; Lackner et al., 2008; Pryor et al., 2006). The IAV is also incorporated in the measure–correlate–predict process (Lackner et al., 2008), which usually considers wind measurements spanning less than 2 years.

Analysts and researchers use numerous metrics to quantify wind-speed variability, and the most common method is standard deviation (σ). For instance, the variability in historical or future wind resources is often represented as the σ from the annual-mean wind speed of a certain location (Brower, 2012). As wind turbine power generation is a function of wind speed, the variability of wind resources has important implications for the resultant long-term energy production. Financially, when the wind resource is projected to fluctuate more from year to year (Hdidouan and Staffell, 2017), the levelized cost of wind energy increases as well.

Because the profitability of wind farms depends on wind variability, past research has explored the implications of interannual and long-term variability in wind energy. Pryor et al. (2009) analyze trends of annual wind speed and IAV, without explicitly quantifying IAV values. Archer and Jacobson (2013) evaluate the seasonal variability of wind-energy capacity factor. Lee et al. (2018) assess the spatial discrepancies between wind-speed variabilities of different temporal scales, from hourly mean to annual-mean data. Bett et al. (2013) use σ and Weibull parameters to assess the wind variability in Europe. Extreme event analysis offers another perspective to assess variability. For example, Cannon et al. (2015) examine extreme wind-energy generation events via reanalysis data and qualitatively discuss the associated seasonal variability and IAV. Leahy and McKeogh (2013) also quantify the return periods of multiweek wind droughts.

To quantify variability, a commonly used tool is the normalized σ, or the coefficient of variation (CoV), which is the σ divided by the mean of a time series. Justus et al. (1979) calculate and compare the CoVs of monthly and annual wind speeds at different sites across the United States. Baker et al. (1990) quantify interannual and interseasonal variations of both wind speed and energy production at three locations in the Pacific Northwest. They find that the annual CoVs range from 4 % to 10 %, matching the conclusions from Justus et al. (1979). More recently, Li et al. (2010) calculate hub-height wind-speed variance and σ over 30 years to spatially evaluate seasonal variability and IAV in the Great Lakes region. Bodini et al. (2016) estimate the IAV of wind resources with a modified version of CoV, using observed meteorological data in Canada. As the sample period increases, the IAVs of most sites gradually increase, averaging 5 % to 6 % among the chosen sites (Bodini et al., 2016). Krakauer and Cohan (2017) correlate the CoVs of monthly mean wind speeds with different climate oscillation indices and find a global mean CoV of 8 %. In addition to characterizing wind speed, the metric is also used to evaluate the benefits of grid integration. For example, Rose and Apt (2015) conclude that the interannual CoV of aggregate wind-energy generation in the central United States is 3±0.1 %, much smaller than that of individual wind plants, which varies between 5.4 % and 12 %, ±4.2 %.

Aside from CoV, other metrics representing the spread of data have also been chosen to estimate variability in the literature. For example, the robust coefficient of variation (RCoV) normalizes the median absolute deviation (MAD) with the median. Gunturu and Schlosser (2012) quantify the spatial RCoV of wind-power density in the United States and demonstrate that the regions east of the Rockies, especially the Plains, generally have weaker variability and higher availability of wind resources. The seasonality index, originally used in Walsh and Lawler (1981) for precipitation purposes, is another measure to express variability. The seasonality index is defined as the sum of the absolute deviations of monthly averages from the annual mean, normalized with the annual mean. Chen et al. (2013) use the seasonality index to assess the interannual trend and the variability of wind speed in China, and they relate wind-speed IAVs to climate oscillations.

Alternative variability metrics emphasize the long-term trends via contrasting wind speeds of different periods. The “wind index”, used in Pryor et al. (2006) and Pryor and Barthelmie (2010), is a ratio of wind speeds of a reference period and an analysis period. An entirely different wind index evaluated in Watson et al. (2015) is a ratio of spatially averaged wind speeds during two different periods.

Despite the importance of long-term variability, the wind-energy industry lacks a systematic method to quantify this uncertainty. As various metrics to assess variability exist, a comprehensive comparison of measures is necessary. Therefore, the goal of this study is to evaluate various methods of estimating intermonthly variability and IAV in a reliable way using a long-term, consistent database. Specifically, our objective is to determine an optimal metric or metrics for relating wind-speed variability to energy-production variability. We describe the wind-speed and energy-generation data, the methodology, and the chosen variability metrics in Sect. 2. We evaluate different variability measures via two case studies in Sect. 3. We also contrast the results computed from monthly mean and annual-mean data, and we illustrate the spatial distribution of wind-speed variability in Sect. 3. We then recommend the best practice in using the ideal method in Sect. 4. We focus on the applicability of such metrics for quantifying the variabilities of wind speeds and wind-energy production.

2 Data and methodology

## 2.1 Wind and energy data

In this study, we use a 37-year time series of monthly mean wind speed and monthly total wind-energy production in the contiguous United States (CONUS). For wind speed, we use hourly horizontal wind components in the National Aeronautics and Space Administration's Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), reanalysis data set (Gelaro et al., 2017; GMAO, 2015) from 1980 to 2016. We use these components to derive the monthly mean wind speed at 80 m above the surface, which represents hub height in this study, via the power law (Eq. 1) and the hypsometric equation (Eq. 2):

$\frac{u(z_2)}{u(z_1)}=\left(\frac{z_2}{z_1}\right)^{\alpha}, \qquad \text{(1)}$

$z_2-z_1=R_{\mathrm{d}}\,\overline{T}\,\ln\left(\frac{p_2}{p_1}\right). \qquad \text{(2)}$

In Eq. (1), $u(z_1)$ and $u(z_2)$ are the horizontal wind speeds at heights $z_1$ and $z_2$, in which wind speeds are the square root of the sum of squared horizontal wind components, and $\alpha$ is the shear exponent. In Eq. (2), $R_{\mathrm{d}}$ is the dry air gas constant, $\overline{T}$ is the average temperature between levels $z_1$ and $z_2$, and $p_1$ and $p_2$ are the atmospheric pressures at $z_1$ and $z_2$. In most grid cells, we use the MERRA-2 meteorological output at 10 and 50 m above the surface to calculate $\alpha$, so as to extrapolate the wind speed at 80 m. In mountainous regions, the heights at 850 or 500 hPa may be closer to 80 m than 10 m above the surface; in that case, we use data at the next available level of 850 or 500 hPa to derive the height of that level and thus to extrapolate the wind speed at 80 m.
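The power-law step in Eq. (1) can be sketched as follows. This is a minimal illustration rather than the authors' processing code: the function names and the sample wind components are hypothetical, and the hypsometric step used for pressure levels is not shown.

```python
import numpy as np

def shear_exponent(speed_low, speed_high, z_low, z_high):
    """Solve the power law (Eq. 1) for the shear exponent alpha."""
    return np.log(speed_high / speed_low) / np.log(z_high / z_low)

def extrapolate_speed(speed_ref, z_ref, z_target, alpha):
    """Apply the power law (Eq. 1) to move a wind speed from z_ref to z_target."""
    return speed_ref * (z_target / z_ref) ** alpha

# Hypothetical hourly horizontal wind components (m/s); not MERRA-2 values.
u10, v10 = np.array([3.0, 4.0]), np.array([4.0, 3.0])
u50, v50 = np.array([4.5, 6.0]), np.array([6.0, 4.5])

# Wind speed is the square root of the sum of squared components.
speed10 = np.hypot(u10, v10)
speed50 = np.hypot(u50, v50)

# Shear exponent from the 10 m and 50 m levels, then extrapolation to 80 m.
alpha = shear_exponent(speed10, speed50, 10.0, 50.0)
speed80 = extrapolate_speed(speed50, 50.0, 80.0, alpha)

# A monthly mean would average the hourly 80 m speeds within each month.
monthly_mean_80m = speed80.mean()
```

Hours with zero wind speed at either level would need separate handling, because the logarithm of the speed ratio is then undefined.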

The horizontal resolution of MERRA-2 is 0.5° in latitude (about 56 km) and 0.625° in longitude (about 53 km). The MERRA-2 reanalysis interpolates the data and the metadata at the exact output latitude and longitude; hence the wind speed, air density, and elevation refer to the grid points with the particular sets of latitude and longitude (Bosilovich et al., 2016). Thus, the longest possible distance between a wind farm and the closest MERRA-2 grid-cell center, namely half the diagonal of a grid cell, is about 39 km.

For energy-production data, we use the net monthly energy production of wind farms in megawatt hours (MWh) from the US Energy Information Administration (EIA) between 2003 and 2016. Each of the wind farms has a unique EIA identification number. After we leave out about 300 wind sites with incomplete or substantially zero production data, a total of 607 wind farms in the CONUS are selected for this analysis. For simplicity, the CONUS in this analysis is defined as the area bounded by 127° W, 65° W, 24° N, and 50° N, and geographically includes the 48 states in CONUS and Washington, D.C. (Fig. 1).

## 2.2 Methodology

### 2.2.1 Linear regression and data post-processing

We focus on the direct relationship between wind speed and energy production to investigate approaches for calculating long-term variability. Therefore, we must minimize the influence of other determinants of energy production, such as curtailment and maintenance. First, we eliminate months with zero energy production, which are typical in the first months of a new wind farm. Next, we linearly regress the monthly total energy production on the monthly mean MERRA-2 80 m wind speed at the closest grid point to each wind farm from 2003 to 2016. In other words, each wind site is assigned its own regression equation. We then remove any production data below the lower bound of the 90 % prediction interval to exclude underproduction for reasons other than low wind speeds, and we omit the data above the upper bound of the 99 % prediction interval, which represent potentially erroneous overproduction. Prediction intervals are calculated via the t values and the standard error of prediction (Montgomery and Runger, 2014). In other words, we define the outliers of energy production as data more than 1.64 standard errors below or more than 2.58 standard errors above the site-specific regression. We also apply a third-order polynomial fit (Archer and Jacobson, 2013), which leads to results very similar to those of the linear model. Hence, we focus on presenting the results from the linear fit in this study.
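The outlier screening described above might look like the following sketch. It assumes the textbook prediction interval for simple linear regression; the function name, the synthetic data, and the injected underproduction month are all hypothetical.

```python
import numpy as np
from scipy import stats

def prediction_interval_mask(wind, energy, lower_level=0.90, upper_level=0.99):
    """Return True for months kept after the asymmetric prediction-interval screen."""
    wind = np.asarray(wind, dtype=float)
    energy = np.asarray(energy, dtype=float)
    n = wind.size

    # Site-specific linear regression of energy on wind speed.
    fit = stats.linregress(wind, energy)
    predicted = fit.intercept + fit.slope * wind

    # Standard error of prediction for simple linear regression.
    residual_std = np.sqrt(np.sum((energy - predicted) ** 2) / (n - 2))
    sxx = np.sum((wind - wind.mean()) ** 2)
    se_pred = residual_std * np.sqrt(1.0 + 1.0 / n + (wind - wind.mean()) ** 2 / sxx)

    # Two-sided t values: about 1.64 (90 %) and 2.58 (99 %) for large samples.
    t_low = stats.t.ppf(0.5 + lower_level / 2.0, n - 2)
    t_high = stats.t.ppf(0.5 + upper_level / 2.0, n - 2)

    above_lower = energy >= predicted - t_low * se_pred
    below_upper = energy <= predicted + t_high * se_pred
    return above_lower & below_upper

# Synthetic 14-year monthly record (hypothetical numbers, not EIA data).
rng = np.random.default_rng(0)
wind = rng.uniform(5.0, 9.0, 168)
energy = 1000.0 * wind + rng.normal(0.0, 300.0, 168)
energy[10] *= 0.2  # a month of heavy underproduction, e.g. curtailment

keep = prediction_interval_mask(wind, energy)
```

Months with zero production would be dropped before this step, and the regression would be refitted on the retained months.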

After regressing the outlier-free energy data on wind speed, we then filter the wind farms based on the coefficient of determination (R2), which indicates the goodness of fit of the linear regression. We select the R2 threshold of 0.75: 349 of the original 607 wind farms pass this filter. Through this filter, we ensure that wind speed is the primary driver of energy production in the wind farms with high R2 values. Lunacek et al. (2018) also use a similar R2-filtering method with a threshold of 0.7. Considering some farms lack years of complete generation data, we extend the monthly energy production to 37 years using the same site-specific linear models with the monthly MERRA-2 wind speed. In other words, we compute any missing energy-production data from 1980 to 2016 based on the linear fit from the years that do exist in the data set. Herein, we refer to this long-term extension of data as the predicted energy production. Of the 349 wind farms, the median length of the energy data derived via the linear fit is 7.5 years, given the available EIA records between 2003 and 2016.

We then apply a second filter using Pearson's correlation coefficient (r) between the predicted and actual monthly energy production, and we only choose the 195 wind farms with r larger than 0.8. As a result, for the r-filtered wind sites, we ensure wind speed is the primary driver of wind-power production, and we confirm the energy predictions match well with those observed.
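The two filters and the 37-year extension can be sketched as below. Several details are assumptions, because the text does not specify them: the regression is fitted on outlier-free months only, r is computed against all available actual records, and every month outside the outlier-free set is filled with the regression prediction. The function name and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

R2_MIN = 0.75  # coefficient-of-determination threshold from the text
R_MIN = 0.8    # predicted-versus-actual Pearson's r threshold from the text

def fit_and_extend(wind_full, energy_actual, clean_mask):
    """Fit on outlier-free months and extend energy to the full wind record.

    energy_actual holds NaN where no record exists; clean_mask marks the
    months retained after outlier removal (assumed inputs).
    """
    fit = stats.linregress(wind_full[clean_mask], energy_actual[clean_mask])
    predicted = fit.intercept + fit.slope * wind_full

    # Assumption: compare predictions with every available actual record.
    have = ~np.isnan(energy_actual)
    r_value = stats.pearsonr(predicted[have], energy_actual[have])[0]

    # Assumption: keep retained actual months, fill the rest with predictions.
    extended = np.where(clean_mask, energy_actual, predicted)
    return extended, fit.rvalue ** 2, r_value

# Synthetic example: 37 years of wind, 14 years of energy (hypothetical).
rng = np.random.default_rng(1)
wind_full = rng.uniform(5.0, 9.0, 444)
energy_actual = np.full(444, np.nan)
energy_actual[-168:] = 1000.0 * wind_full[-168:] + rng.normal(0.0, 300.0, 168)
clean_mask = ~np.isnan(energy_actual)

extended, r_squared, r_value = fit_and_extend(wind_full, energy_actual, clean_mask)
site_passes = bool(r_squared > R2_MIN and r_value > R_MIN)
```

In this synthetic case no outliers are removed, so r equals the square root of R2; the two filters only differ when the comparison set differs from the fitting set.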

The nonfiltered, R2-filtered, and r-filtered wind farms carpet most of the popular wind farm regions across the CONUS (Fig. 1), even with the high r threshold of 0.8. Thus, the r-filtered samples provide a sufficient representation of the wind farms across the United States. To illustrate our analysis with examples, we select one site in Oregon (OR) and another site in Texas (TX) that demonstrate distinct wind-speed distributions. We choose the two sites to contrast the results of different variability metrics throughout the paper; both sites pass the r filter (Fig. 1).

Figure 1. Wind farm locations in the CONUS: nonfiltered 607 sites in dark red, R2-filtered 349 sites in orange, and r-filtered 195 sites in yellow. The yellow square represents the Oregon site and the yellow star indicates the Texas site (Table 2). The grey box illustrates the boundary of the CONUS used in this study.

Recognizing that the horizontal resolution of the MERRA-2 data could be perceived as undermining the linear regressions, we explore any possible role of the distance between the closest MERRA-2 grid point and the actual wind farm, but we find no statistical relationship. In particular, horizontal and vertical discrepancies between the model and the observations do not affect the resultant R2 in the linear regressions. More than half of the 607 wind farms pass the R2 filter, and more than half of those pass the r filter (Fig. 2a). Additionally, the correlation between R2 and the horizontal distance between the closest MERRA-2 grid point and the actual wind farm is close to zero (Fig. 2b); the correlation between R2 and the vertical difference between the modeled grid point and the actual wind site is also weak (Fig. 2c). In other words, the horizontal and vertical distances between the MERRA-2 grid points and the wind farms have no apparent impact on the representativeness of the wind farms in the linear regression.

Figure 2. (a) Histogram of R2 of all nonfiltered sites (dark red), R2-filtered sites (orange), and r-filtered sites (yellow); (b) scatterplot of the R2 and the horizontal distance between the closest MERRA-2 grid cell and the actual locations of the sites using the same color scheme as in (a); (c) scatterplot of the R2 and the elevation difference between the closest MERRA-2 grid cell and the actual locations of the wind sites using the same color scheme as in (a). The r in (b) and (c) represents the Pearson's r using all nonfiltered sites.

Additionally, we analyze the uncertainty of the linear-regression method. We first test the influence of the error term in the regression, to account for the uncertainty associated with the input data. After a wind farm passes the R2 threshold of 0.75, we add a random value within 1 standard error to the predicted energy production of each month. This random error term introduces uncertainty to the regression process but does not affect the R2 of the site-specific regression. Furthermore, we also test the sensitivity of the R2 and r thresholds by analyzing the results after modifying those limits. Specifically, we loosen the R2 and r thresholds to 0.6 and 0.7, and we tighten the R2 and r thresholds to 0.85 and 0.9. Loosening these thresholds increases the sample sizes of the wind farms that pass the filters and tightening the thresholds results in the opposite.

We test other factors that could undermine these regressions. We consider the hub-height air density extrapolated from MERRA-2 as another regressor in the regressions, but air density is a statistically insignificant predictor and thus is not discussed in the rest of this study. When we replace the prediction interval with the confidence interval, the sample sizes increase from 349 and 195 sites to 555 and 209 wind farms. However, at least 7 years of energy data are derived from the regression for 99 % of the samples, because confidence intervals are smaller than prediction intervals by definition. We also consider removing the long-term means and the impacts of annual cycles, yet the sample sizes decrease to 121 and 69 locations, and the regression fills at least some of the energy data for more than 99 % of the sites. Finally, to ensure these results are not specific to the MERRA-2 data set, we perform the same analysis on the ERA-Interim reanalysis data set (Dee et al., 2011). The results of the key variability parameters such as σ, CoV, and RCoV resemble the findings using MERRA-2; hence we focus on the MERRA-2 findings in this study.

Our analysis, although comprehensive, is constrained by the quality of our data. On the one hand, reanalysis data sets have errors and biases in wind-speed predictions from complexities in elevation and surface roughness (Rose and Apt, 2016). Reanalysis data sets also demonstrate long-term trends of surface wind speeds (Torralba et al., 2017). The MERRA-2 data set can also depict different meteorological environments than those at the wind farm locations, especially in complex terrain. The MERRA-2 data of coarse temporal and spatial resolutions may also represent a lower intermonthly or IAV than the wind sites actually experience. Thus, regressing actual energy production on reanalysis wind speed adds uncertainty to our analysis. On the other hand, constrained by the monthly total energy-production data from the EIA, our analysis ignores the signals finer than monthly cycles. The quality of the EIA data also varies across wind sites; therefore the filtering process via linear regression is necessary.

### 2.2.2 Variability metrics relating wind speeds and energy production

To evaluate the variabilities of both the wind speeds and the predicted energy generation from the filtered wind farms, we investigate a total of 27 combinations and variations of existing methods describing the spread of data. We categorize different variability metrics according to statistical robustness (insensitivity to assumptions about the data; for example, Gaussian distribution) and statistical resistance (insensitivity to outliers) (Wilks, 2011). Of the 27 variability methods tested, we select four representative measures to perform a comparison and discuss in detail, according to their robustness, resistance, and the nature of normalization by an average metric:

1. RCoV, defined as the MAD divided by the median (Gunturu and Schlosser, 2012; Watson, 2014), is a spread metric divided by an average metric and is both statistically robust and resistant.

2. Range (maximum minus minimum) divided by trimean (weighted average among quartiles) is a spread metric normalized by an average metric, and the numerator is not resistant.

3. CoV (Baker et al., 1990; Bodini et al., 2016; Hdidouan and Staffell, 2017; Krakauer and Cohan, 2017; Rose and Apt, 2015; Wan, 2004), defined as the σ divided by the mean, is a spread metric normalized by an average metric, and neither the denominator nor the numerator is robust or resistant.

4. σ is simply a spread metric that is not robust or resistant.

Among the four measures, only RCoV is completely statistically robust and resistant, and the first three methods are all normalized spread metrics. We further describe all the tested variability methods comprehensively in Table B1 in Appendix B. Each of these metrics is easy to implement via basic Python packages such as NumPy and SciPy with no more than a few lines of code. In addition, based on the exponential scaling relationship between power and wind speed developed by Bandi and Apt (2016), we also analyze the results from the exponential CoV and the exponential RCoV in this paper (Table B1).
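As an illustration of how compact these metrics are, the four representative measures can be written with NumPy alone. This is a sketch under stated assumptions: the function names and sample values are hypothetical, σ uses the sample (n−1) definition, and quartiles use NumPy's default linear interpolation.

```python
import numpy as np

def rcov(x):
    """Robust coefficient of variation: median absolute deviation over median."""
    med = np.median(x)
    return np.median(np.abs(x - med)) / med

def range_over_trimean(x):
    """Range divided by trimean, the weighted average of the quartiles."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    trimean = (q1 + 2.0 * q2 + q3) / 4.0
    return (np.max(x) - np.min(x)) / trimean

def cov(x):
    """Coefficient of variation: standard deviation over mean."""
    return np.std(x, ddof=1) / np.mean(x)

def sigma(x):
    """Sample standard deviation (not normalized)."""
    return np.std(x, ddof=1)

# Hypothetical monthly mean wind speeds (m/s), then the same with one outlier.
speeds = np.array([6.0, 7.0, 8.0, 7.0, 6.0, 8.0, 7.0])
speeds_with_outlier = np.append(speeds, 20.0)

for name, metric in [("RCoV", rcov), ("range/trimean", range_over_trimean),
                     ("CoV", cov), ("sigma", sigma)]:
    print(name, metric(speeds), metric(speeds_with_outlier))
```

In this toy series the single outlier leaves RCoV unchanged while inflating the other three metrics, which is the resistance property discussed above.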

In addition to calculating variabilities with the spread measures, we evaluate other diagnostics that describe distribution characteristics. These diagnostics include averaging metrics, such as the arithmetic mean (not resistant) and median (the 50th percentile, which is resistant); symmetry metrics, such as skewness (involving the third moment, not robust or resistant) and the Yule–Kendall Index (YKI, robust and resistant); a tailedness metric, namely kurtosis (involving the fourth moment, not robust or resistant); the Weibull scale and shape parameters (not robust); and the autocorrelation with a 1-year lag to dissect the interannual cycles. We summarize the diagnostics evaluated in this analysis in Table B2. Along with the regression results, results from the four representative variability metrics and other distribution diagnostics demonstrate differences between the two selected sites (Table 2).
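The distribution diagnostics listed above could be computed along the following lines. The sketch assumes the Yule–Kendall Index as defined in Wilks (2011), excess (Fisher) kurtosis, and a two-parameter Weibull fit with the location fixed at zero; the function names and the synthetic series are hypothetical.

```python
import numpy as np
from scipy import stats

def yule_kendall_index(x):
    """Quartile-based symmetry measure: (Q1 - 2*Q2 + Q3) / IQR."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return (q1 - 2.0 * q2 + q3) / (q3 - q1)

def lag_autocorrelation(x, lag=12):
    """Pearson correlation between the series and itself shifted by `lag`."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

def describe(x):
    """Collect the averaging, symmetry, tailedness, and Weibull diagnostics."""
    shape, _, scale = stats.weibull_min.fit(x, floc=0)  # location fixed at 0
    return {
        "mean": np.mean(x),
        "median": np.median(x),
        "skewness": stats.skew(x),
        "excess_kurtosis": stats.kurtosis(x),  # Gaussian gives 0
        "yki": yule_kendall_index(x),
        "weibull_shape": shape,
        "weibull_scale": scale,
        "lag12_autocorrelation": lag_autocorrelation(x, 12),
    }

# Synthetic 37-year monthly series with an annual cycle (hypothetical values).
rng = np.random.default_rng(2)
months = np.arange(444)
monthly_speed = 7.0 + np.sin(2.0 * np.pi * months / 12.0) + rng.normal(0.0, 0.3, 444)
diagnostics = describe(monthly_speed)
```

For monthly data, a 12-month lag corresponds to the 1-year lag used to examine the annual cycle.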

Herein, we quantify the variabilities of the 37-year extended time series of wind speed and energy production via different methods, using a range of time frames: 1 year, 2 years, and up to 37 years for each wind farm. A metric is considered useful when the resultant wind-speed variability correlates well with the resultant energy-production variability across wind farms, even when random errors are introduced and the R2 and r thresholds are changed. In this analysis, we compare results with three correlation metrics: Pearson's r, Spearman's rank correlation coefficient (rs), and Kendall's rank correlation coefficient (τ) (Table 1).

Table 1. Details of the three correlation metrics applied, adapted from Wilks (2011). All three metrics yield values between −1 and 1.
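The three correlation metrics are available in SciPy. The sketch below correlates per-site wind-speed variability with per-site energy variability; the function name and the five sample values are hypothetical and do not come from the study.

```python
import numpy as np
from scipy import stats

def correlate_variabilities(wind_var, energy_var):
    """Correlate per-site wind-speed and energy-production variabilities."""
    return {
        "pearson_r": stats.pearsonr(wind_var, energy_var)[0],
        "spearman_rs": stats.spearmanr(wind_var, energy_var)[0],
        "kendall_tau": stats.kendalltau(wind_var, energy_var)[0],
    }

# Hypothetical variabilities at five sites with a monotonic, nonlinear link.
wind_rcov = np.array([0.10, 0.12, 0.15, 0.20, 0.30])
energy_rcov = 10.0 * wind_rcov ** 2

result = correlate_variabilities(wind_rcov, energy_rcov)
```

Because the toy relationship is monotonic but not linear, the two rank-based coefficients equal 1 while Pearson's r stays slightly below 1.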

To assess the applicable time frames of various variability metrics, we evaluate the asymptote period of correlations for each method. In most cases, the correlation coefficients approach the 37-year value after a certain analysis time frame. Using RCoV as an example, the Pearson's r values of shorter analysis periods (1-year, 2-year, etc.) gradually converge to the 37-year value of 0.856 as the RCoV-calculation time frame expands (Fig. 5a). Hence, for each metric, assuming the 37-year correlation coefficient represents the long-term correlation, we calculate the normalized differences between the correlation coefficients and the 37-year value in each time frame, starting from 1 year. When the absolute mean of the normalized differences drops below 0.05 in a particular year, we determine that year as the length of data required for reliable results via that variability method. In other words, the asymptote year of a certain metric indicates that the resultant correlation between wind-speed and energy-production variability using that data length deviates by less than 5 % from the long-term value. For example, the asymptote period of RCoV correlations is 3 years according to Pearson's r (Table 3).
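One possible reading of the asymptote-period rule is sketched below. It assumes that each time-frame length yields one or more correlation coefficients (for example, from different sub-periods), that differences are normalized by the longest-period value, and that the first length meeting the 0.05 tolerance is reported. The function name and the sample coefficients are hypothetical.

```python
import numpy as np

def asymptote_year(corr_by_length, tolerance=0.05):
    """Return the first time-frame length whose mean normalized difference
    from the longest-period correlation is below `tolerance` in magnitude.

    corr_by_length[k] holds the correlation coefficients obtained with
    k + 1 years of data (an assumed input layout).
    """
    long_term = float(np.mean(corr_by_length[-1]))
    for years, corr in enumerate(corr_by_length, start=1):
        normalized_diff = (np.asarray(corr, dtype=float) - long_term) / long_term
        if abs(np.mean(normalized_diff)) < tolerance:
            return years
    return None

# Hypothetical correlations for 1-, 2-, 3-, and 4-year time frames, where the
# last entry stands in for the long-term (37-year) value of 0.856.
corr_by_length = [[0.60, 0.70], [0.78, 0.80], [0.84, 0.85], [0.856]]
years_needed = asymptote_year(corr_by_length)
```

A stricter variant would also require every longer time frame to stay within the tolerance, which the text does not state.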

To relate the IAVs between wind speed and energy production, we also perform the same analysis for annual-mean data. Strictly speaking, calculating the variabilities using monthly mean data yields intermonthly variabilities, because the results account for monthly, seasonal, and annual signals. To isolate the signals from interannual variations, we also examine the metrics and their correlations between the annual means of hub-height wind speeds and energy production, after linearly regressing and filtering via monthly data. However, the samples from each site are then limited to 37 data points of annual wind speed and energy production. In addition, using data de-trended from long-term means to calculate variabilities and their correlations leads to trivial results because of the small sample sizes and hence is omitted in this study.

### 2.2.3 Investigation of wind-speed RCoV

After we demonstrate that RCoV is the most systematic approach in linking wind-speed and energy-generation variabilities in Sect. 3.2, we further examine the details of using RCoV, specifically determining the minimum length of wind-speed data necessary to quantify variability effectively. We use 37 years of wind speed in every MERRA-2 grid cell in the CONUS (a total of 5049 grid points), and we calculate the RCoVs with 1 to 37 years of data for each grid cell. Because the RCoVs calculated using data between 1980 and 2016 are only samples of the true long-term wind-speed variability and hence the results involve uncertainty, we select a confidence interval approach.

Table 2. Site details, monthly means, and annual means of various metrics at the two selected sites based on 37 years of monthly and annual wind speeds, and 37 years of predicted and actual energy production; and the CONUS medians of wind-speed metrics using 37 years of monthly and annual-mean data.

We assume that the distribution of RCoV is Gaussian with infinite years of wind speed. Hence, we use a chi-square ($\chi^2$) distribution to set bounds for the σ values from samples of RCoV. In other words, because the derived RCoVs differ with the years of wind speeds sampled, we use the $\chi^2$ distribution to quantify the confidence intervals of RCoV for each sample size. To determine the minimum data required for RCoV calculation, we use the following criterion (Montgomery and Runger, 2014):

$\sigma_{37}\ge\left|\sqrt{\frac{(n_i-1)\,\sigma_i^2}{\chi^2_{\alpha/2,\,n_i-1}}}\right|, \qquad \text{(3)}$

where $\sigma_{37}$ is the predetermined 37-year σ of RCoV; $n_i$ is the sample size of n years in year i, which is between 1 and 36 years; $\sigma_i^2$ is the variance of the sample of RCoVs in year i; and $\chi^2_{\alpha/2,\,n_i-1}$ is the percentage point of the $\chi^2$ distribution given the confidence level of α and the degrees of freedom of $n_i-1$. We select a pair of α levels, 90 % and 95 %; hence we use four percentage points of the $\chi^2$ distribution at 0.025, 0.05, 0.95, and 0.975 to construct the respective confidence intervals. Because the 37-year RCoV is an estimate of the truth, which is the wind-speed RCoV of infinite years, its single value does not yield any variance or possess any distribution shape. Thus, to construct the confidence interval of the σ of the truth, we set the predetermined $\sigma_{37}$ as a fraction of the 37-year RCoV. Specifically, the $\sigma_{37}$ values are 10 % and 5 % of the 37-year RCoV for the 90 % and 95 % confidence levels, respectively.

Figure 3. (a) Time series of MERRA-2 monthly mean 80 m wind speed (black), actual monthly net EIA energy production (lime), and extended monthly energy production from 1980 to 2016 based on linear regression (green) at the OR site; (b) time series at the TX site with the same annotations as in (a); (c) histograms of MERRA-2 monthly mean wind-speed distribution (black) and yearly mean wind-speed distribution (grey) at the OR site from 1980 to 2016. The blue curve indicates the Gaussian fit of the monthly mean wind speeds via the mean and the σ, and the cyan curve represents the Gaussian fit of the annual-mean data; (d) histograms and curves of the Gaussian fit of wind-speed distributions at the TX site with the same annotations as in (c).

In summary, for each grid point, we first determine an uncertainty bound based on the 37-year wind-speed RCoV of the location: we assign a 37-year σ, which is either 5 % or 10 % of the 37-year RCoV for the 95 % or 90 % confidence level, respectively. For each year i, from 1 to 37 years, we calculate the pair of $\chi^2$-derived σ values of year i, which represent the lower and upper bounds of the confidence interval. When both of the $\chi^2$-derived σ values become smaller than the predetermined 37-year σ, year i becomes the minimum length of data required to calculate RCoV effectively at the specific confidence level. We analyze the wind-speed RCoV via both monthly mean and annual-mean wind speeds. We label the resultant minimum length of wind-speed data based on the $\chi^2$ method as the convergence year, in contrast to the asymptote period, which determines the asymptote year of correlation coefficients.
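The convergence-year search can be sketched as follows for a single grid point. The sampling scheme is an assumption, since the text does not spell it out: for each data length of i years, RCoV is computed over every contiguous i-year window of the record, and the spread of those RCoVs feeds Eq. (3). The function names and the synthetic series are hypothetical.

```python
import numpy as np
from scipy import stats

def rcov(x):
    """Robust coefficient of variation: median absolute deviation over median."""
    med = np.median(x)
    return np.median(np.abs(x - med)) / med

def convergence_year(monthly, confidence=0.90, fraction=0.10):
    """First data length (years) whose chi-square bounds on sigma both fall
    below the predetermined sigma, a fraction of the full-record RCoV."""
    monthly = np.asarray(monthly, dtype=float)
    n_years = monthly.size // 12
    sigma_target = fraction * rcov(monthly)
    tail = (1.0 - confidence) / 2.0  # e.g. 0.05 for the 90 % level

    for years in range(1, n_years):
        # Assumed sampling: all contiguous windows of `years` years.
        n_windows = n_years - years + 1
        samples = np.array([rcov(monthly[12 * s:12 * (s + years)])
                            for s in range(n_windows)])
        dof = n_windows - 1
        scaled = dof * np.var(samples, ddof=1)
        lower = np.sqrt(scaled / stats.chi2.ppf(1.0 - tail, dof))
        upper = np.sqrt(scaled / stats.chi2.ppf(tail, dof))
        if max(lower, upper) <= sigma_target:
            return years
    return None  # no convergence within the record

# Synthetic 37-year monthly wind speeds (hypothetical values).
rng = np.random.default_rng(3)
months = np.arange(444)
noisy = 7.0 + np.sin(2.0 * np.pi * months / 12.0) + rng.normal(0.0, 0.5, 444)
periodic = np.tile(7.0 + np.sin(2.0 * np.pi * np.arange(12) / 12.0), 37)

years_noisy = convergence_year(noisy)
years_periodic = convergence_year(periodic)
```

A perfectly periodic series converges after 1 year because every window has the same RCoV, whereas a noisy series needs a longer record or may not converge at all.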

3 Results

## 3.1 Case studies: Oregon and Texas sites

We select two sites from two different geographical regions with considerable wind-energy deployment, the southern Plains and the Pacific Northwest in the United States, to contrast the results of various variability metrics. Based on the site-specific regressions, we extend the monthly energy-production time series to 37 years (Fig. 3a and b) for the two sites. Both sites pass the R2 filter at 0.75 and the r filter at 0.8. Although the OR site is farther from the closest MERRA-2 grid point in a region with more complex terrain, the resultant R2 (0.87) and predicted–actual-energy Pearson's r (0.91) are larger than those of the TX site (0.79 and 0.81, respectively) (Table 2). The 37-year-average wind speed of about 7.6 m s−1 at the TX site is larger than that of the OR site at about 6.8 m s−1 (Table 2). Additionally, the 12-month-lag autocorrelations demonstrate that the annual cycle of monthly wind speeds of the TX site is stronger than that of the OR site, yet the autocorrelations of the sites, 0.53 and 0.32, are still lower than the CONUS median of 0.58 (Table 2).

None of the monthly and annual wind-speed distributions of the sites are perfectly Gaussian. According to the kurtosis, skewness, and YKI values of the monthly mean wind speeds (Table 2), the monthly wind-speed distribution at the OR site skews towards lower wind speeds with more and stronger extremes (Fig. 3c). The skewed distribution at the OR site leads to 71.2 % of the monthly wind speeds located within 1σ from the mean, compared to the classic Gaussian of 68.3 %. Nevertheless, although the TX site monthly wind-speed distribution is very close to symmetric with fewer outliers (Fig. 3d), which is supported by near-zero skewness and YKI (Table 2), only 64.6 % of monthly data fall within 1σ from its mean. For annual-mean wind speeds, the averaging with a 12-month time span at both sites reduces the ranges and thus leads to kurtosis close to −1 (Table 2). Although the skewness and YKI are close to 0 (Table 2), only 59.5 % and 56.8 % of the annual-mean wind speeds fall within 1σ from the means of the OR and TX sites, respectively.
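The share of data within 1σ of the mean, and the Gaussian reference of about 68.3 %, can be checked with a few lines. The function name and the sample array are hypothetical, and σ is taken as the sample standard deviation.

```python
import numpy as np
from scipy import stats

def fraction_within_one_sigma(x, ddof=1):
    """Fraction of values lying within one standard deviation of the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()) <= x.std(ddof=ddof))

# Theoretical share for a Gaussian distribution (about 0.683).
gaussian_reference = stats.norm.cdf(1.0) - stats.norm.cdf(-1.0)

# Toy sample (hypothetical values): 3 of the 5 points lie within 1 sigma.
sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sample_fraction = fraction_within_one_sigma(sample)
```

Comparing the sample fraction with the Gaussian reference gives a quick indication of how far a wind-speed distribution departs from normality.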

Figure 4. Scatterplots of 37-year wind-speed variability and energy variability via four metrics: (a) RCoV, (b) $\frac{\mathrm{range}}{\mathrm{trimean}}$, (c) CoV, and (d) σ, based on monthly data from the 195 r-filtered wind sites. Each black dot represents one filtered site, and the r value at the corner of each panel indicates the Pearson's r between each pair of wind-speed and energy-production spread metrics. The yellow square and the yellow star denote the OR and the TX sites, respectively.

The four selected variability methods yield similar resultant monthly variabilities that are close to the respective CONUS medians based on the 37-year monthly data. For variabilities of monthly wind speeds, the differences between the two sites are slight, and the comparison among the results of the four metrics is inconclusive as to which site is more variable (Table 2): the monthly variabilities are not far from the national medians (Table 2). However, results from the normalized spread metrics (RCoVs, range divided by trimean, and CoV) using the 37-year and the observed energy production illustrate that the OR site generates more variable wind power than the TX site (Table 2). The magnitudes of the variabilities between the 37-year and the actual monthly energy production are also comparable, and the discrepancies between them are larger at the TX site than the OR site. Nonetheless, the predicted and the observed monthly energy production of the two sites demonstrate similar variability characteristics overall.

Moreover, when we apply the four selected methods to the annual-mean data, the metrics describe IAV. For both variables, wind speed and energy generation, nearly all metrics illustrate that the OR site has stronger IAV than the TX site, except for using σ to quantify energy-production IAV (Table 2). Echoing the results of the monthly data mentioned previously, the use of normalized metrics suggests the energy production at the OR site varies more than that at the TX site, intermonthly and interannually. Note that all the IAVs are smaller than the variabilities calculated using monthly data (Table 2), because the annual averaging collapses variations in the data.

Additionally, the magnitudes of energy variabilities and IAVs are also nearly or more than twice as large as those of wind speed (Table 2). The reason is the nature of the power curve: wind-power generation is a function of wind speed cubed at wind speeds below rated. Therefore, small wind-speed variations propagate into large energy-production fluctuations that are discernible in monthly and yearly data.
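The amplification described above can be illustrated with a short calculation. The sketch below assumes an idealized power law, P proportional to v cubed, that holds only below rated wind speed; it ignores cut-in, rated, and cut-out behavior of a real power curve, and the function name `relative_power_change` is hypothetical.

```python
def relative_power_change(relative_speed_change, exponent=3.0):
    """Relative change in power for a relative change in wind speed,
    assuming power is proportional to wind speed raised to `exponent`."""
    return (1.0 + relative_speed_change) ** exponent - 1.0

# Compare wind-speed perturbations with the resulting power perturbations
for pct in (-10, -5, 5, 10):
    change = relative_power_change(pct / 100)
    print(f"{pct:+d} % wind speed -> {100 * change:+.1f} % power")
```

Under this idealization, a 5 % change in wind speed corresponds to a change in power of roughly 15 %. Because real turbines flatten out at rated power, the ratio seen in monthly energy production can be lower, which is in line with the factor of about 2 reported in Table 2.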

## 3.2 Variability metrics comparisons

Matching the wind-speed and energy variabilities over 37 years at each r-filtered site, RCoV, as a statistically robust and resistant metric, yields the highest Pearson's r (0.86) among the four highlighted methods as well as all the variability metrics evaluated (Fig. 4 and Table B1). A perfect variability measure would link wind-speed and wind-power variations closely together with a correlation of unity; RCoV, with the highest Pearson's r, comes closest to this ideal among the metrics tested. On the one hand, a strong correlation between the wind-speed RCoV and the energy-production RCoV implies that the high wind-speed variability at a wind farm translates to high energy-generation variability, and vice versa (Fig. 4a). For instance, the moderate 37-year wind-speed RCoVs of the OR and TX sites indicate modest fluctuations in energy production between months (Fig. 4a). On the other hand, a nonresistant method, range divided by trimean, leads to a lower r (0.64) and suggests the OR site has variable wind speed and energy production (Fig. 4b). For the other two nonrobust and nonresistant methods, the CoV results in a modest r (0.70) with a similar scatter as the RCoV (Fig. 4c); the σ, not normalized by an average metric, does not relate wind-speed and energy variabilities effectively (Fig. 4d). The positions of the two wind farms relative to the rest of the sites in Fig. 4 illustrate that the TX site experiences average variabilities in wind resource and energy production, whereas the OR site has above-average energy-generation variability. Overall, the four methods lead to different representations of energy variability at the OR site.

Figure 5. Box plots of Pearson's r between wind-speed variability and energy variability for different analysis time frames, from 1 to 37 years: (a) RCoV, (b) $\frac{\mathrm{range}}{\mathrm{trimean}}$, (c) CoV, and (d) σ, based on the monthly data from the 195 r-filtered wind sites. Each r represents the correlation using all the filtered sites of a particular time frame. The 37-year correlations are equal to the r values listed in Fig. 4. From top to bottom, the box and whiskers represent the third quartile plus 1.5 times the interquartile range (IQR), the third quartile, the median, the first quartile, and the first quartile minus 1.5 times the IQR.

By increasing the years included in the variability calculations using monthly data, the resultant correlations of most metrics vary less, the correlations gradually converge to their 37-year values, and their asymptote periods vary. The 37-year Pearson's r values from the four selected metrics between wind-speed and energy-production variabilities in Fig. 4 transform into the 37-year marks in Fig. 5, and we use a 5 % threshold of normalized deviation to determine the asymptote periods. Particularly, the r's from RCoV and CoV (Fig. 5a and c) reach their respective asymptotes steadily with longer length of data, whereas the r's from range divided by trimean do not (Fig. 5b). The 37-year correlation using σ is weak and thus the method is not actually useful: while the r's approach the 37-year benchmark (Fig. 5d), this correlation value is so low (0.2) as to be ineffective. Paired with a high long-term r, the asymptote period of a metric indicates the appropriate time span of wind-speed data required to represent the variability of wind-energy production. For example, the resultant r's using RCoV approach a high value after just 3 years, meaning one needs 3 years of wind-speed data to estimate the wind-speed variability so as to adequately infer the energy-production variability of a certain or potential wind farm via RCoV.
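One way to read the 5 % normalized-deviation threshold is sketched below. The procedure is an assumption made for illustration, not the exact algorithm of the study: it takes a single representative correlation per record length (for example, the median of each box in Fig. 5) and returns the shortest record length from which all longer records stay within the threshold of the longest-record value. The function name `asymptote_period` and the example correlations are hypothetical.

```python
def asymptote_period(r_by_years, threshold=0.05):
    """Smallest record length (years) from which every correlation stays within
    `threshold` normalized deviation of the longest-record correlation.

    r_by_years maps record length in years to a correlation coefficient.
    The benchmark correlation must be nonzero.
    """
    lengths = sorted(r_by_years)
    benchmark = r_by_years[lengths[-1]]
    period = lengths[-1]
    # Walk backwards from the longest record until the deviation exceeds the threshold
    for n in reversed(lengths):
        if abs(r_by_years[n] - benchmark) / abs(benchmark) > threshold:
            break
        period = n
    return period

# Hypothetical correlations by record length (illustrative values only)
example = {1: 0.60, 2: 0.75, 3: 0.83, 4: 0.85, 5: 0.86, 6: 0.86}
print(asymptote_period(example))
```

Because the deviation is normalized by the benchmark, a metric with a high long-term correlation tolerates a larger absolute deviation and therefore tends to reach its asymptote sooner.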

The three correlation coefficients (Pearson's r, Spearman's rs, and Kendall's τ) yield consistent results among all variability metrics tested; hence we primarily present the results using Pearson's r here. Table 3 summarizes the 37-year correlations (r, rs, and τ), between the wind-speed variabilities and the energy-production variabilities using the r-filtered data, and the respective asymptote periods of the methods. The r and τ of RCoV are the largest (0.86 and 0.67, respectively) among all variability metrics, and the associated asymptote periods are also relatively short (2 to 3 years) (Table 3). Another normalized, robust, and resistant spread metric, interquartile range (IQR) divided by median, results in the highest rs, and the rs of RCoV is the second largest (Table 3). More importantly, the asymptote periods of RCoV are the smallest of all the spread metrics, regardless of the choice of correlation coefficient. In other words, fewer years of data are necessary to calculate RCoV to effectively relate wind-speed and energy variabilities than for any other spread metric. Overall, when a spread metric yields strong correlations between variabilities of wind speed and energy generation, the correlation metrics agree with each other (Table 3). Therefore, the results in this paper focus on Pearson's r, which is a commonly used correlation coefficient.
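To make the difference between the correlation coefficients concrete, the following sketch implements Pearson's r and Spearman's rs in plain Python; Kendall's τ is omitted for brevity. The helper names `pearson_r`, `ranks`, and `spearman_rs` are hypothetical, and the example data are synthetic rather than taken from the wind farm sample.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Synthetic example: a monotonic but nonlinear relationship
speed_spread = [1, 2, 3, 4, 5]
energy_spread = [v ** 3 for v in speed_spread]
print(round(pearson_r(speed_spread, energy_spread), 3))
print(round(spearman_rs(speed_spread, energy_spread), 3))
```

For a monotonic but nonlinear relationship, the rank-based coefficient reaches unity while Pearson's r stays below it, which is why agreement among the coefficients is a useful consistency check.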

Table 3. Correlations and the associated asymptote periods of wind-speed variability and energy variability using various spread methods and distribution diagnostics with different correlation metrics, based on the monthly data of the 195 r-filtered wind sites.

In addition to the spread metrics, other distribution diagnostics also yield strong correlations between the 37-year monthly wind speed and energy production. For example, kurtosis and skewness result in r and rs above 0.9. Because we determine the asymptote periods based on normalized deviations, when the 37-year correlation benchmark of a metric is high, the respective asymptote period tends to be shorter. Therefore, only 1 year of monthly data is required to compute kurtosis and skewness adequately, except for using rs in kurtosis, where those rs's of the smaller number of years are low (Table 3). Moreover, the symmetry and the shape of the energy-production distribution can be characterized using wind-speed data, given the moderately strong correlations of YKI and the Weibull shape parameter (Table 3).

Additionally, we also perform the same correlation and asymptote analyses on the data from changing the R2 and r filter thresholds as well as the data with random error, and RCoV again yields the strongest correlations and the shortest asymptote periods among all methods. We adjust the R2 and r requirements in the linear-regression process, thus changing the filtered sample sizes. On the one hand, reducing the R2 threshold to 0.6 and the r threshold to 0.7 increases the respective sample sizes to 461 and 306 wind farms, but weakens the correlations between wind-speed and energy variabilities for all methods (Table B3). On the other hand, increasing the R2 threshold to 0.85 and the r threshold to 0.9 strengthens the wind-speed–energy correlations of all the metrics and shrinks the sample sizes to 212 and 83 wind farms, respectively (Table B3). Modifying the filtering thresholds leads to different r's yet similar asymptote periods among all metrics. Moreover, we also test the robustness of our findings by introducing an error term, randomized based on the standard error, in predicting the 37-year energy production. The error term adds uncertainty to resemble the reality of noisy wind-speed and power-production data. We introduce the error term to the predicted energy production for each of the 349 wind farms that pass the original R2 threshold of 0.75. This approach weakens the correlations and lengthens the asymptote periods for most metrics (Table B3). Overall, according to the results from the R2 and r threshold tests and the random error test, RCoV yields the highest r's among all methods, and its asymptote periods remain reasonably short.

Figure 6. Similar to Fig. 4, but for scatterplots to compare 37-year wind-speed variability metrics: (a) RCoV and CoV, (b) RCoV and MAD, (c) σ and CoV, and (d) σ and MAD, based on monthly data from the 195 r-filtered wind sites. Each black dot represents one filtered site, and the r, rs, and τ at the corner of each panel indicate the Pearson's r, the Spearman's rank correlation coefficient, and the Kendall's rank correlation coefficient between each pair of wind-speed spread metrics. The yellow square and the yellow star denote the OR and the TX sites, respectively.

Meanwhile, using annual-mean data to compute IAVs can lead to misleading interpretations. Scatterplots of the 37-year wind-speed and energy IAVs similar to Fig. 4 are illustrated in Fig. A1, via the same 195 r-filtered sites. The correlations via yearly averages are generally weaker except for a few metrics, including range divided by mean, which yields the largest r of all (Table B4). However, the 37-year correlations do not adequately represent the long-term values (Table B4), so even though the resultant asymptote periods are longer than those using monthly data, the asymptote analysis method is unsuitable for annual data. Moreover, using annual averages greatly limits the sample size at each site even with 37 years of hourly wind-speed data. Statistically, a smaller sample leads to a smaller spread of that distribution. Accordingly, with few years of data, small spreads in annual-mean wind speeds result in a tight cluster of IAVs among all the wind farms. Therefore, the compact collection of wind-speed and energy-production IAVs causes strong correlations, solely because of the small number of annual averages used in the IAV calculation. Thus, the correlations via annual means demonstrate a downward trend with increasing length of data, regardless of the variability metrics chosen (Fig. 7). Although the correlations approach the 37-year values, the weakening correlations with more years included in the IAV calculations imply that using less data is preferred in connecting the two IAVs. Note that the spread cannot be computed with one data point and hence the correlations between wind-speed IAVs and energy IAVs do not exist with a single year of data (Fig. 7). Overall, the asymptote analysis causes deceptive results, and, given the nature of the annual data, we cannot determine the sufficient length of data to effectively link the IAVs of wind speed and energy production. In other words, relating wind-speed IAV and energy-generation IAV with annual-mean data is flawed.

Figure 7. As in Fig. 5, but for annual-mean data.

Figure 8. Box plots of wind-speed RCoV using monthly MERRA-2 data for different time frames from 1 year to 37 years at (a) the OR site and (b) the TX site.

## 3.3 Wind-speed RCoV calculation and spatial distribution

Now that we have established that RCoV is a powerful and accurate way to relate wind-speed and energy-generation variations, we assess the required amount of data to calculate the RCoV of wind speed. We compute the site-specific RCoVs using different spans of monthly mean wind speeds, including the OR and the TX sites (Fig. 8). The variations of RCoVs decrease as more years are included in the calculations, and for each location we use the 37-year wind-speed RCoV as the long-term benchmark. For example, the 37-year wind-speed RCoV of 0.082 at the OR site means that the median among the absolute deviations from the median is 8.2 % of the median monthly mean wind speed (Fig. 8a and Table 2). We determine the 37-year σ's as 10 % and 5 % of the 37-year RCoV, and we apply the χ2 approach at 90 % and 95 % confidence levels, respectively, to derive the convergence years, or the minimum length of wind-speed data required to calculate RCoV effectively. The convergence years of the OR and TX sites are 12 and 25 years with a 90 % confidence, and 20 and 31 years with a 95 % confidence, respectively (Table B5). In other words, for the OR site, one needs 12 years of monthly mean wind speeds to compute RCoV with a 90 % confidence that the resultant RCoV is within a 10 % deviation from the 37-year RCoV.
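The RCoV definition used above, the median absolute deviation from the median divided by the median, translates directly into code. The sketch below is a minimal illustration with made-up monthly wind speeds; the function name `rcov` is hypothetical.

```python
from statistics import median

def rcov(values):
    """Robust coefficient of variation: the median of the absolute deviations
    from the median, divided by the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return mad / med

# Illustrative monthly mean wind speeds in m/s (made-up values, not from the paper)
speeds = [6.0, 6.5, 7.0, 7.5, 8.0]
print(round(rcov(speeds), 4))

# Replacing one value with an extreme outlier leaves RCoV unchanged,
# which is what makes the metric resistant.
speeds_with_outlier = [6.0, 6.5, 7.0, 7.5, 80.0]
print(round(rcov(speeds_with_outlier), 4))
```

In this toy sample, both calls give the same result because neither the median nor the median absolute deviation responds to the single extreme value, whereas σ and CoV would change substantially.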

Figure 9. (a) Box plots of σ's of wind-speed RCoVs, where the RCoVs are calculated using monthly mean MERRA-2 data of 1 to 37 years. For each year, each box summarizes the σ from each MERRA-2 grid cell in the CONUS; (b) the time series of the cumulative fraction of grid cells in the CONUS that satisfies the threshold: when the pair of the χ2-derived σ's from the grid cell, calculated using the particular amount of data, become smaller than the 37-year σ. The solid black, dashed black, solid orange, and dashed orange lines, respectively, indicate the minimum length of data: when the wind-speed RCoV using monthly mean data yields a 10 % deviation at maximum from the 37-year value at a 90 % confidence level, when the wind-speed RCoV using monthly mean data yields a 5 % deviation at maximum from the 37-year value at a 95 % confidence level, when the wind-speed RCoV using yearly mean data yields a 10 % deviation at maximum from the 37-year value at a 90 % confidence level, and when the wind-speed RCoV using yearly mean data yields a 5 % deviation at maximum from the 37-year value at a 95 % confidence level.

To quantify the intermonthly variability of wind speed at a wind farm, RCoV requires 10 years of monthly wind-speed records with a 90 % confidence. In general, the σ's of wind-speed RCoVs across the CONUS decrease with more years included in the RCoV calculation (Fig. 9a). For each grid point, the sample size of RCoV also becomes smaller, from 37 RCoVs of 1 year of data to 1 RCoV of 37 years of data, and hence the σ of RCoV decreases as the length of the analysis period of wind speed increases (Fig. 9a). With the σ's of RCoVs across 37 years, we determine the convergence years via the χ2 method. For a certain confidence level, the cumulative fraction of the CONUS grid cells that exceed the associated threshold of χ2-derived confidence intervals increases with the length of data (Fig. 9b). Among all of the MERRA-2 grid cells in the CONUS, the median convergence year is 10 years and the associated MAD is 3 years at a 90 % confidence level (Fig. 9b and Table B5). In other words, to assess the wind-speed variability via RCoV with a maximum of 10 % error from the long-term value and a 90 % confidence, one needs 10±3 years of monthly mean wind-speed records.
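The shrinking sample of RCoVs, from 37 values for 1-year records to a single value for the 37-year record, can be reproduced with a sliding window over the monthly series. The sketch below assumes blocks of consecutive calendar years and uses a synthetic series; the names `rcov` and `windowed_rcovs` are hypothetical, the population standard deviation is an assumption, and the χ2 step that converts these spreads into convergence years is not reproduced here.

```python
import math
from statistics import median, pstdev

def rcov(values):
    """Robust coefficient of variation: MAD about the median divided by the median."""
    med = median(values)
    return median([abs(v - med) for v in values]) / med

def windowed_rcovs(monthly_speeds, n_years):
    """RCoV of every block of n_years consecutive years in a monthly series."""
    total_years = len(monthly_speeds) // 12
    return [
        rcov(monthly_speeds[12 * start:12 * (start + n_years)])
        for start in range(total_years - n_years + 1)
    ]

# Synthetic 37-year monthly series in m/s (illustrative only, not MERRA-2 data):
# an annual cycle plus a slower oscillation so that individual years differ.
synthetic = [
    7.0 + 1.2 * math.cos(2 * math.pi * m / 12) + 0.5 * math.sin(m / 17.0)
    for m in range(37 * 12)
]

for n_years in (1, 5, 10, 37):
    values = windowed_rcovs(synthetic, n_years)
    # Number of blocks is 37 - n_years + 1; the spread is zero for a single block
    print(n_years, len(values), round(pstdev(values), 4))
```

With a 37-year record, a window of n years yields 37 − n + 1 blocks, so the spread of RCoV is estimated from fewer values as the window grows.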

Figure 10. (a) Map of the convergence years, or years of monthly mean wind-speed data required to derive a maximum of 10 % deviation from the 37-year RCoV at each grid point, at a 90 % confidence level. The CONUS median is 10 years with the MAD of 3 years; (b) map of RCoV of monthly mean wind speed using the grid-cell-specific convergence years in (a), normalized using the CONUS RCoV median at 0.100. The RCoVs illustrated are averaged over (37 − convergence year + 1) available year blocks. The MAD of the normalized RCoV in the CONUS is 0.224; (c) map of the mean monthly wind speed at 80 m of 37 years from 1980 to 2016. The CONUS median is 6.45 m s−1 with the MAD of 1.03 m s−1; (d) map of wind resource and its variability, by summarizing (b) and (c) into four categories: regions with below-median wind speed and above-median RCoV (grey), regions with below-median wind speed and below-median RCoV (orange), regions with above-median wind speed and above-median RCoV (orange red), and regions with above-median wind speed and below-median RCoV (dark red), based on the CONUS median wind speed and RCoV.

Moreover, raising the confidence level extends the minimum length of wind-speed data to compute RCoV. At the 95 % confidence level, the median convergence year is 20 years, and 2.5 % of grid points in the CONUS require more than 37 years of monthly mean data to calculate RCoV (Fig. 9b and Table B5). Additionally, using yearly mean wind speeds instead of monthly data to calculate RCoV requires much longer time to reach convergence. At a 95 % confidence, 33 years of annual-mean data is the average required length, and half of the CONUS grid points have convergence years of more than 37 years (Fig. 9b and Table B5). We also perform the same analysis on CoV and σ of wind speeds (Table B5). Although CoV and σ need fewer years to attain convergence, these nonrobust and nonresistant methods yield worse correlations between wind-speed and energy-production variabilities than RCoV, and hence we focus on demonstrating the RCoV results.

Spatial distributions of wind-speed RCoVs across the CONUS identify locations with reliable wind resources. Based on the site-specific convergence years at a 90 % confidence level (Fig. 10a), we calculate the RCoVs with monthly mean wind speeds of the particular time spans at each grid point and normalize with the CONUS median (Fig. 10b). Regions requiring long wind-speed records are irregularly scattered across the continent, such as the Northeast, the Dakotas, and Texas. The mountainous states generally illustrate high RCoVs, including the Appalachians and the Rockies. Given the strong correlations between the wind-speed RCoV and energy-production RCoV, Fig. 10b offers a realistic estimation of the general spatial pattern of the variability in wind-energy production as well. Note that, qualitatively, Fig. 10b is similar to the maps of wind-speed variability in Fig. 13a of Gunturu and Schlosser (2012) and in Fig. 3 in Hamlington et al. (2015), which also illustrate the variability of wind resources in the CONUS. In addition, using a 10-year fixed length of wind-speed data for all CONUS grid points to compute RCoV results in a nearly identical spatial distribution to the pattern in Fig. 10b.

Further, an ideal location for wind farms should exhibit ample wind speeds with low variability. We combine the spatial variations of the normalized RCoV and the long-term wind resource (Fig. 10b and c), and we differentiate regions according to the CONUS median RCoV and wind speed (Fig. 10d). Favorable candidates for wind farm developments have above-median wind speeds and below-median variabilities, such as the Plains, parts of the upper Midwest, spots in the Columbia River region, and pockets near the coasts of the Carolinas; poor places for wind power with weak winds and strong variabilities include the Appalachians and most of the Northeast.
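The four-way categorization of Fig. 10d amounts to two comparisons against the CONUS medians. The sketch below uses the medians quoted in the Fig. 10 caption; the function name `classify_site`, the treatment of values equal to the median, and the example grid cells are assumptions made for illustration.

```python
def classify_site(wind_speed, rcov_value, median_speed, median_rcov):
    """Place a site into one of four wind-resource categories relative to the medians."""
    windy = wind_speed > median_speed      # ties count as below median (assumption)
    variable = rcov_value > median_rcov    # ties count as below median (assumption)
    if windy and not variable:
        return "above-median wind speed, below-median RCoV"   # most favorable
    if windy and variable:
        return "above-median wind speed, above-median RCoV"
    if not windy and not variable:
        return "below-median wind speed, below-median RCoV"
    return "below-median wind speed, above-median RCoV"       # least favorable

# CONUS medians quoted in the Fig. 10 caption
MEDIAN_SPEED = 6.45   # m/s
MEDIAN_RCOV = 0.100

# Hypothetical grid cells: (mean wind speed in m/s, RCoV)
for speed, variability in [(7.6, 0.08), (7.2, 0.13), (5.8, 0.09), (5.5, 0.14)]:
    print(speed, variability, classify_site(speed, variability, MEDIAN_SPEED, MEDIAN_RCOV))
```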

The convergence years in some CONUS grid points are beyond 37 years when we increase the confidence level from 90 % to 95 % (Fig. 9b and Table B5), and those grid points do not demonstrate any geographical pattern as in Fig. 10a. Additionally, when using RCoV to represent IAV, the spatial patterns of required data lengths and the resultant normalized RCoVs for annual data are notably different from the monthly mean results, and geographical features seem to be irrelevant (Fig. A3). Furthermore, the categorical features of CoV resemble those of RCoV for onshore wind resources in the CONUS, whereas using σ results in notably distinct classifications of CONUS wind resources (Figs. 10d and A4).

4 Discussion

When using statistically robust and resistant variability metrics, higher correlations between variabilities of wind speed and energy production emerge. Statistically robust methods do not assume or require any underlying wind-speed distributions, and statistically resistant methods are insensitive to wind-speed extremes. Of all methods, three robust and resistant metrics, RCoV, MAD divided by trimean, and IQR divided by median, result in the largest three r's in Tables 3 and B1, which suggests that they are the most useful metrics to quantify long-term variability. Depending on the meteorological data availability, wind-speed characteristics, and terrain complexity, different methods are appropriate in different conditions. Nevertheless, robust and resistant methods are best able to relate wind-speed variability and energy-generation variability, and RCoV is the most effective of all the metrics.

Overall, of all the methods we considered, RCoV consistently yields the strongest correlations between wind-speed and energy variabilities and exhibits reasonable asymptote periods (Tables 3 and B1), even after accounting for random standard errors and modifying the R2 and r thresholds (Table B3). In addition, assessing wind-speed RCoV with a 90 % confidence requires 10±3 years of wind-speed data (Fig. 9 and Table B5), which exceeds the asymptote periods of 2 to 6 years to yield strong wind-speed and energy-production correlations (Table 3). Even though different locations require various spans of data (Fig. 10a), the average of the resultant RCoVs using 10 years of wind speeds leads to nearly identical spatial distributions (Fig. 10b). Therefore, to effectively quantify wind-speed variability and thus adequately derive energy-generation variability, we recommend using the RCoV with 10 years of monthly mean wind-speed data.

Annual-mean data are inadequate to relate wind-speed and energy-production IAVs or to represent wind-speed IAVs. We cannot determine the minimum years of data to relate annual wind-speed and energy IAVs because their correlations decline with the length of data (Fig. 7). Moreover, the coarse time resolution of annual averages smooths out the fluctuations of smaller timescales. Yearly mean wind speeds also possess different distribution characteristics, such as skewness and kurtosis, compared to those of finer temporal resolutions (Lee et al., 2018). The nonzero kurtosis and skewness in Table 2 and in Lee et al. (2018) illustrate that most of the distributions of annual-mean wind speeds in the CONUS are non-Gaussian. Hence, using nonrobust metrics, such as σ, to evaluate IAV with samples of annual means from non-Gaussian distributions can lead to incorrect representations of variability.

Additionally, extended years of wind-speed data are also necessary to compute RCoV and represent IAV (Fig. A3a), and the resultant IAVs (Fig. A3b) differ from the variabilities calculated via monthly wind speeds (Fig. 10b). For instance, the low IAVs in the Appalachians (Fig. A3b) calculated with yearly mean wind speeds contradict the pattern of high monthly mean wind-speed RCoVs in mountainous areas (Fig. 10b) as well as the findings in past research (Gunturu and Schlosser, 2012; Hamlington et al., 2015). Furthermore, some of the grid points require more than 37 years of yearly mean data to calculate wind-speed RCoV with statistical confidence (Fig. 9 and Table B5). Although RCoV does not yield the strongest 37-year r in relating wind-speed and energy IAVs, readers should be cautious when using a limited number of annual-mean data to derive IAVs. In short, to effectively assess the long-term variability of wind farm productivity, one should use wind speeds finer than yearly mean data.

Regions with ample wind resources and low variability favor wind-energy developments, coinciding with the locations of many existing wind farms in the CONUS (Fig. 10d). Wind farms in the Plains and parts of the upper Midwest benefit from the above-average wind speeds and the below-average wind-speed RCoVs. Other regions, such as parts of the Columbia River region and the Carolinas, also experience strong, consistent winds. The Northeast and the Appalachians are relatively unfavorable for producing a stable, onshore wind-energy supply, whereas the area east of Cape Cod in Massachusetts and the sections along the West Coast exhibit a promising offshore wind resource. Wind farm developers should account for wind resource as well as its long-term variability in repowering existing turbines and building new wind farms.

Distribution diagnostics, other than the variability metrics, are also effective in identifying the characteristics of wind-energy production. We examine distribution parameters resulting in strong wind-speed–energy correlations, including kurtosis and YKI (Tables 3 and B2), which assess the degree of deviations from a Gaussian distribution. For example, we confirm that the monthly and annual wind-speed distributions for our case studies in OR and TX are not perfectly Gaussian because of their nonzero kurtosis and skewness values (Table 2), as well as their portions of data within 1σ. Moreover, a multimodal or an asymmetric wind-speed distribution (Fig. 3c and d) also implies a non-Gaussian energy-production distribution. Gaussian distribution is invalid for wind speeds across averaging timescales in general (Lee et al., 2018). Hence, understanding the underlying distribution of wind resources can validate the applications and the legitimacy of Gaussian statistics, especially in quantifying P50 and the associated losses and uncertainties.

5 Conclusions

Wind-speed variability is a crucial component in assessing the overall uncertainty of P50, which is the estimated average energy production of a wind farm. This study highlights the importance of using rigorous methods to estimate intermonthly and interannual variability. To search for suitable ways to quantify this uncertainty under different conditions, we investigate 27 combinations of spread metrics over 607 wind farms in the United States, with closer examination of two geographically distinct sites. We evaluate the methods for robustness to non-Gaussian distributions and resistance to extreme values, in contrast to the common practice of using only standard deviation (σ). We calculate variabilities using monthly and annual mean wind speeds from the MERRA-2 reanalysis data set and wind farm monthly net energy production from the EIA. We find that within the contiguous United States (CONUS), statistically robust and resistant methods predict variabilities more accurately, particularly in that wind-speed variabilities strongly correlate with observed energy-production variabilities.

We recommend using the robust coefficient of variation (RCoV) to quantify variabilities of wind resource and energy production. RCoV, defined as the median of the absolute deviations from the median wind speed divided by the median wind speed, is a robust and resistant spread metric, in contrast to σ. RCoV yields strong correlations consistently (a Pearson's correlation coefficient, or a Pearson's r, of 0.856 with 37 years of monthly means) in various sensitivity tests via different correlation coefficients, whereas σ does not. In other words, using RCoV, a wind farm with high wind-speed fluctuations also possesses high variations in wind-energy generation and vice versa, whereas other metrics do not reflect that relationship as effectively. RCoV, as a normalized spread metric, also leads to a more accurate depiction of wind-speed variabilities than σ, a simple spread metric. Contrary to the custom of displaying uncertainty in one percentage value, we advise users to assess both the RCoV and the median in estimating intermonthly variability. Moreover, depending on the location, on average 10±3 years of monthly wind-speed data are necessary to compute wind-speed RCoV with a 90 % statistical confidence, such that the resultant RCoV deviates within 10 % of the long-term RCoV.

RCoV characterizes the spreads of the distributions of wind resources and wind-energy production. The relatively low monthly mean wind-speed RCoVs in the central United States indicate stable long-term wind resources, and the RCoV overall spatial distribution in the CONUS agrees with the findings from past research. Other distribution diagnostics, such as kurtosis and skewness, also result in strong correlations between monthly mean wind speed and energy generation, and thus they adequately represent energy-production characteristics.

Because the long-term correlations between the wind-speed and energy-production interannual variabilities (IAVs) are weak (a Pearson's r of 0.668 for RCoV with 37 years of data) and decrease with the length of data, we cannot determine the minimum length of annual mean data required for skillful assessment of IAV. Hence, we do not recommend calculating IAVs with annual-mean data. Although the concept of IAV has been essential in determining the annual energy production in the wind resource assessment process, annual-mean wind speeds mask signals of finer temporal scales and thus lead to unreliable representations of long-term variability. Overall, uncertainty arises in the process of calculating IAVs based on limited samples, whereas RCoV yields credible intermonthly variabilities considering the adequate amount of monthly mean data.

Now that we have highlighted the preferred structure of using RCoV, we can assess finer-scale variations using high-resolution wind-speed and energy-production data. With data of different temporal scales, the autocorrelation of wind resources and its relationship with long-term energy-production variations can also be quantified. The influence of climatic cycles on energy production can be explored. Furthermore, applying the concept of RCoV to reduce the uncertainty of P50 and assist financial decisions can be beneficial to the industry.

Data availability

The MERRA-2 data and the EIA data used in this study are publicly available at http://disc.sci.gsfc.nasa.gov/ (last access: 31 October 2017; Gelaro et al., 2017) and http://www.eia.gov/renewable (last access: 31 October 2017).

Appendix A

Figure A1. As in Fig. 4, but the metrics are calculated using annual-mean wind speed and energy production.

Figure A2. As in Fig. 6, but the metrics are calculated using yearly mean wind speed.

Figure A3. As in Fig. 10a and b, but the data plotted are annual-mean wind speeds: (a) map of the convergence years, or years of wind-speed data required to derive a maximum of 10 % deviation from the 37-year RCoV at each grid point at a 90 % confidence level. Because 12.6 % of the CONUS grid points yield convergence years beyond 37 years using annual data (solid orange line in Fig. 9 and first column in Table B5), we assign 37 years as the convergence years for those grid points. After excluding the non-numeric values, the CONUS median is 27 years and the MAD is 4 years; (b) map of RCoV of annual-mean wind speed using the grid-cell-specific convergence years in (a), normalized using the CONUS RCoV median at 0.020. The RCoVs illustrated are averaged over (37 − convergence year + 1) available year blocks. The MAD of the normalized RCoV in the CONUS is 0.205.

Figure A4. As in Fig. 10d, but the spread metrics are (a) σ and (b) CoV, calculated using monthly mean wind speeds of 37 years.

Appendix B

Table B1. Description of the 26 spread metrics tested, adapted from Wilks (2011), and the 37-year r's from the r-filtered monthly data. $q_{0.25}$ is the 25th percentile (first quartile), $q_{0.5}$ is the 50th percentile (median), and $q_{0.75}$ is the 75th percentile (third quartile). $\mathrm{Trimean}=\frac{1}{4}\left(q_{0.25}+2\times q_{0.5}+q_{0.75}\right)$, $\mathrm{range}(x)=\max(x)-\min(x)$, and an overbar ($\overline{x}$) indicates the arithmetic mean. Reason I: the metric is not robust because the metric possesses distribution constraints, for example, assuming a Gaussian distribution, and the metric is not resistant because outliers influence it; Reason II: the metric is not resistant because outliers influence it; Reason III: the numerator of the metric is not robust or resistant; Reason IV: the denominator of the metric is not robust or resistant; Reason V: the numerator of the metric is not resistant.
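
The trimean and range definitions in the Table B1 caption can be sketched in Python as follows. This is only an illustration: the linearly interpolated quantile used here is one common convention and is an assumption, as are the function names and the sample values; the study does not state which quantile convention it applied.

```python
def quantile(values, q):
    """Linearly interpolated quantile for 0 <= q <= 1 (assumed convention)."""
    data = sorted(values)
    position = q * (len(data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (position - lower) * (data[upper] - data[lower])


def trimean(values):
    """Trimean = (q0.25 + 2 * q0.5 + q0.75) / 4, as in the Table B1 caption."""
    q25 = quantile(values, 0.25)
    q50 = quantile(values, 0.50)
    q75 = quantile(values, 0.75)
    return 0.25 * (q25 + 2.0 * q50 + q75)


def value_range(values):
    """range(x) = max(x) - min(x), as in the Table B1 caption."""
    return max(values) - min(values)


# Hypothetical sample values, for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(trimean(sample), value_range(sample))
```

Because the trimean weights only the quartiles and the median, it is less sensitive to extreme values than the arithmetic mean, whereas the range depends entirely on the two most extreme values.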

Table B2. Description of the distribution diagnostics tested, adapted from Wilks (2011), and the 37-year r's from the r-filtered monthly data. Reason I: the metric is not robust because the metric possesses distribution constraints, for example, assuming a Gaussian distribution, and the metric is not resistant because outliers influence it; Reason II: the metric is not robust because it assumes a Weibull distribution.

Table B3. As in Table 3, but with the calculated metrics, the associated correlations, and asymptote periods using different $R^2$ and r filters and adding the randomized standard error to predicted monthly total energy production. The sample sizes of the 0.7-r threshold test, the 0.9-r threshold test, and the random error test are 306, 83, and 195 wind farms, respectively.

Table B4. As in Table 3, but with the calculated metrics, the associated correlations, and asymptote periods using annual-mean wind speed and energy production at the 195 r-filtered sites.

Table B5. Convergence years based on the $\chi^2$ approach of wind-speed RCoV (as in Figs. 8 and 9), wind-speed CoV, and wind-speed σ, using monthly and yearly wind speeds. The calculations of median and MAD exclude the data with convergence years beyond 37 years in the CONUS.

Author contributions

All authors formulated the research idea and designed the methodology together. JCYL performed the analysis; MJF and JKL provided critical feedback. JCYL prepared the manuscript with contributions from the two co-authors.

Competing interests

Julie K. Lundquist is an Associate Editor of Wind Energy Science. Joseph C. Y. Lee and M. Jason Fields have no conflict of interest.

Acknowledgements

This work was authored by the National Renewable Energy Laboratory, operated by the Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE), under contract no. DE-AC36-08GO28308. Funding was provided by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy's Wind Energy Technologies Office. The views expressed in the article do not necessarily represent the views of the DOE or U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.

The authors would like to thank our collaborators, Vineel Yettella and Mark Handschy of the Cooperative Institute for Research in Environmental Sciences (CIRES) at the University of Colorado Boulder; our colleagues at NREL, especially Paul Veers; and Cory Jog at EDF Renewable Energy.

Edited by: Christian Masson
Reviewed by: two anonymous referees

References

Archer, C. L. and Jacobson, M. Z.: Geographical and seasonal variability of the global “practical” wind resources, Appl. Geogr., 45, 119–130, https://doi.org/10.1016/j.apgeog.2013.07.006, 2013.

Baker, R. W., Walker, S. N., and Wade, J. E.: Annual and seasonal variations in mean wind speed and wind turbine energy production, Sol. Energy, 45, 285–289, https://doi.org/10.1016/0038-092X(90)90013-3, 1990.

Bandi, M. M. and Apt, J.: Variability of the Wind Turbine Power Curve, Appl. Sci., 6, 262, https://doi.org/10.3390/app6090262, 2016.

Bett, P. E., Thornton, H. E., and Clark, R. T.: European wind variability over 140 yr, Adv. Sci. Res., 10, 51–58, https://doi.org/10.5194/asr-10-51-2013, 2013.

Bodini, N., Lundquist, J. K., Zardi, D., and Handschy, M.: Year-to-year correlation, record length, and overconfidence in wind resource assessment, Wind Energ. Sci., 1, 115–128, https://doi.org/10.5194/wes-1-115-2016, 2016.

Bosilovich, M. G., Lucchesi, R., and Suarez, M.: MERRA-2: File Specification, GMAO Office Note No. 9 (Version 1.1), available at: https://gmao.gsfc.nasa.gov/pubs/docs/Bosilovich785.pdf (last access: 1 August 2017), 2016.

Brower, M. C.: Wind resource assessment: a practical guide to developing a wind project, Wiley, Hoboken, New Jersey, USA, 2012.

Cannon, D. J., Brayshaw, D. J., Methven, J., Coker, P. J., and Lenaghan, D.: Using reanalysis data to quantify extreme wind power generation statistics: A 33 year case study in Great Britain, Renew. Energ., 75, 767–778, https://doi.org/10.1016/j.renene.2014.10.024, 2015.

Chen, L., Li, D., and Pryor, S. C.: Wind speed trends over China: quantifying the magnitude and assessing causality, Int. J. Climatol., 33, 2579–2590, https://doi.org/10.1002/joc.3613, 2013.

Clifton, A., Smith, A., and Fields, M.: Wind Plant Preconstruction Energy Estimates: Current Practice and Opportunities, NREL/TP-5000-64735, National Renewable Energy Laboratory, Golden, Colorado, USA, available at: http://www.nrel.gov/docs/fy16osti/64735.pdf (last access: 19 July 2017), 2016.

Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C. M., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C., Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V, Isaksen, L., Kållberg, P., Köhler, M., Matricardi, M., McNally, A. P., Monge-Sanz, B. M., Morcrette, J.-J., Park, B.-K., Peubey, C., de Rosnay, P., Tavolato, C., Thépaut, J.-N., and Vitart, F.: The ERA-Interim reanalysis: configuration and performance of the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828, 2011.

Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G.-K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1, 2017.

GMAO (Global Modeling and Assimilation Office): MERRA2 tavg1_2d_slv_Nx: 2d, 1-Hourly, Time-Averaged, Single-Level, Assimilation, Single-Level Diagnostics V5.12.4, Greenbelt, MD, USA, 2015.

Gunturu, U. B. and Schlosser, C. A.: Characterization of wind power resource in the United States, Atmos. Chem. Phys., 12, 9687–9702, https://doi.org/10.5194/acp-12-9687-2012, 2012.

Hamlington, B. D., Hamlington, P. E., Collins, S. G., Alexander, S. R., and Kim, K.-Y.: Effects of climate oscillations on wind resource variability in the United States, Geophys. Res. Lett., 42, 145–152, https://doi.org/10.1002/2014GL062370, 2015.

Hdidouan, D. and Staffell, I.: The impact of climate change on the levelised cost of wind energy, Renew. Energ., 101, 575–592, https://doi.org/10.1016/j.renene.2016.09.003, 2017.

Justus, C. G., Mani, K., and Mikhail, A. S.: Interannual and Month-to-Month Variations of Wind Speed, J. Appl. Meteorol., 18, 913–920, https://doi.org/10.1175/1520-0450(1979)018<0913:IAMTMV>2.0.CO;2, 1979.

Klink, K.: Trends and Interannual Variability of Wind Speed Distributions in Minnesota, J. Climate, 15, 3311–3317, https://doi.org/10.1175/1520-0442(2002)015<3311:TAIVOW>2.0.CO;2, 2002.

Krakauer, N. and Cohan, D.: Interannual Variability and Seasonal Predictability of Wind and Solar Resources, Resources, 6, 29, https://doi.org/10.3390/resources6030029, 2017.

Lackner, M. A., Rogers, A. L., and Manwell, J. F.: Uncertainty Analysis in MCP-Based Wind Resource Assessment and Energy Production Estimation, J. Sol. Energy Eng., 130, 31006–31010, https://doi.org/10.1115/1.2931499, 2008.

Leahy, P. G. and McKeogh, E. J.: Persistence of low wind speed conditions and implications for wind power variability, Wind Energy, 16, 575–586, https://doi.org/10.1002/we.1509, 2013.

Lee, J. C.-Y., Fields, M. J., Lundquist, J. K., and Lunacek, M.: Determining variabilities of non-Gaussian wind-speed distributions using different metrics and timescales, J. Phys. Conf. Ser., 1037, 072038, https://doi.org/10.1088/1742-6596/1037/7/072038, 2018.

Li, X., Zhong, S., Bian, X., and Heilman, W. E.: Climate and climate variability of the wind power resources in the Great Lakes region of the United States, J. Geophys. Res., 115, D18107, https://doi.org/10.1029/2009JD013415, 2010.

Lunacek, M., Fields, M. J., Craig, A., Lee, J. C. Y., Meissner, J., Philips, C., Sheng, S., and King, R.: Understanding Biases in Pre-Construction Estimates, J. Phys. Conf. Ser., 1037, 062009, https://doi.org/10.1088/1742-6596/1037/6/062009, 2018.

Montgomery, D. C. and Runger, G. C.: Applied statistics and probability for engineers, 6th Edn., Wiley, Hoboken, New Jersey, USA, 2014.

Pryor, S. C. and Barthelmie, R. J.: Climate change impacts on wind energy: A review, Renew. Sust. Energ. Rev., 14, 430–437, https://doi.org/10.1016/j.rser.2009.07.028, 2010.

Pryor, S. C., Barthelmie, R. J., and Schoof, J. T.: Inter-annual variability of wind indices across Europe, Wind Energy, 9, 27–38, https://doi.org/10.1002/we.178, 2006.

Pryor, S. C., Barthelmie, R. J., Young, D. T., Takle, E. S., Arritt, R. W., Flory, D., Gutowski, W. J., Nunes, A., and Roads, J.: Wind speed trends over the contiguous United States, J. Geophys. Res., 114, D14105, https://doi.org/10.1029/2008JD011416, 2009.

Rose, S. and Apt, J.: What can reanalysis data tell us about wind power?, Renew. Energ., 83, 963–969, https://doi.org/10.1016/j.renene.2015.05.027, 2015.

Rose, S. and Apt, J.: Quantifying sources of uncertainty in reanalysis derived wind speed, Renew. Energ., 94, 157–165, https://doi.org/10.1016/j.renene.2016.03.028, 2016.

Torralba, V., Doblas-Reyes, F. J., and Gonzalez-Reviriego, N.: Uncertainty in recent near-surface wind speed trends: a global reanalysis intercomparison, Environ. Res. Lett., 12, 114019, https://doi.org/10.1088/1748-9326/aa8a58, 2017.

Walsh, R. P. D. and Lawler, D. M.: Rainfall seasonality: description, spatial patterns and change through time, Weather, 36, 201–208, https://doi.org/10.1002/j.1477-8696.1981.tb05400.x, 1981.

Wan, Y.-H.: Wind Power Plant Behaviors: Analyses of Long-Term Wind Power Data, NREL/TP-500-36551, National Renewable Energy Laboratory, Golden, Colorado, USA, available at: https://www.nrel.gov/docs/fy04osti/36551.pdf (last access: 19 July 2017), 2004.

Watson, S.: Quantifying the variability of wind energy, WIREs Energy Environ., 3, 330–342, https://doi.org/10.1002/wene.95, 2014.

Watson, S. J., Kritharas, P., and Hodgson, G. J.: Wind speed variability across the UK between 1957 and 2011, Wind Energy, 18, 21–42, 2015.

Wilks, D. S.: Statistical methods in the atmospheric sciences, Academic Press, Amsterdam, the Netherlands, 2011.