Interannual variability of wind speeds presents a fundamental source of uncertainty in preconstruction energy estimates. Our analysis of one of the longest and geographically most widespread extant sets of instrumental wind-speed observations (62-year records from 60 stations in Canada) shows that deviations from mean resource levels persist over many decades, substantially increasing uncertainty. As a result of this persistence, the performance of each site's last 20 years diverges more widely than expected from the P50 level estimated from its first 42 years: half the sites have either fewer than 5 or more than 15 years exceeding the P50 estimate. In contrast to this 10-year-wide interquartile range, a 4-year-wide range (2.5 times narrower) was found for “control” records where statistical independence was enforced by randomly permuting each station's historical values. Similarly, for sites with a capacity factor of 0.35 and an interannual variability of 6 %, one would expect 9 years in 10 to fall in the range 0.32–0.38; we find the actual 90 % range to be 0.27–0.43, or three times wider. The previously unquantified effect of serial correlations favors a shift in resource-assessment thinking from a climatology-focused approach to a persistence-focused approach: for this data set, no improvement in P50 error is gained by using records longer than 4–5 years, and use of records longer than 20 years actually degrades accuracy.

Wind power is becoming less expensive and nowadays represents a very
attractive low-emissions choice for electricity production. Its economic
viability depends on generation plants being sited where enough wind blows to make
development worthwhile; “resource assessments” are intended to identify
such sites. Assessments can be carried out in different ways

Resource assessment inaccuracies and uncertainties arise not only from less
than perfect target and/or reference wind-speed correlation, including missing data

Given wind's natural variability, an important question is how long a history
is needed to both adequately estimate its mean level and to characterize the
range of expected year-to-year variation. Previous authors have been of two
minds.

Here, we focus on how statistics derived from historical time-series records
are used to predict the future exceedance levels employed in resource
assessments. We use 62 years of homogenized monthly wind speed records from
60 Canadian stations

We base our analysis here on one of the longer observational data sets of
instrumental wind speed records available, a 62-year (1953–2014) record of
monthly average wind speeds from 156 Canadian meteorological stations

For more relevance to wind-turbine energy production, we convert monthly Environment Canada wind speeds to modeled turbine capacity factor (monthly electric energy production divided by turbine capacity). Since we begin with monthly averages, the conversion is necessarily crude; because our interest is in correlation effects, though, it suffices that low wind speeds translate to low capacity factor and high wind speeds to high capacity factor, with a variability scale comparable to that of real wind plants. Since we judge resource-assessment accuracy by comparing the exceedance levels in the final 20 years of each site's data to estimates based on the previous years' data, systematic conversion errors will not affect accuracy so long as we treat data from the final 20 years and from the preceding years in the same way.

As shown by

To avoid any effects of seasonal cycles in the subsequent analysis, we average the 12 monthly capacity factor values in a calendar year and then work with these annual averages, 62 per station. To deal with data gaps, unavoidably present in such an extended data set, we assign null weights to missing monthly data. Furthermore, for each station we calculate an average seasonal cycle as the average of all the Januaries, all the Februaries, and so on (for both wind speed and modeled capacity factor) over the station's 62-year record. In years where our annual average is calculated from fewer than 12 data points, we then make an adjustment according to the fraction of the average seasonal cycle that the available data represent. All the statistical functions calculated in the following analysis are thus weighted to take into account how many data points are available.
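The gap handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact procedure used for the results: the function name `annual_averages` is hypothetical, missing months are assumed to be marked as NaN, the "adjustment" is assumed to be a rescaling of the available months by the share of the average seasonal cycle they cover, and every calendar month is assumed to be observed at least once in the record.

```python
import numpy as np

def annual_averages(monthly):
    """Gap-adjusted annual means from monthly values (hypothetical helper).

    monthly: array of shape (n_years, 12); NaN marks a missing month.
    Returns (annual, weights), where weights is the number of months
    available in each year (missing months get null weight).
    Assumes each calendar month is observed in at least one year.
    """
    monthly = np.asarray(monthly, dtype=float)
    present = ~np.isnan(monthly)
    weights = present.sum(axis=1)

    # Average seasonal cycle: mean of all Januaries, all Februaries, ...
    cycle = np.nanmean(monthly, axis=0)

    # Sum of the observed months and of the seasonal cycle over the same months.
    observed_sum = np.where(present, monthly, 0.0).sum(axis=1)
    cycle_sum = np.where(present, cycle, 0.0).sum(axis=1)

    # Assumed adjustment: scale the climatological annual mean by the ratio of
    # observed to climatological values for the months actually available.
    annual = np.full(monthly.shape[0], np.nan)
    ok = weights > 0
    annual[ok] = observed_sum[ok] / cycle_sum[ok] * cycle.mean()
    return annual, weights
```

For a year with all 12 months present, this reduces to the plain mean of the 12 values; for an incomplete year, a missing windy month no longer drags the annual value down simply by being absent.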

Ratio of the standard deviation

In line with our focus on energy resource assessment, we eliminated from
further consideration those stations with wind speeds too low for practical
wind turbine deployment. We calculated, for each station, the ratio of the
(weighted, always according to the number of available data) standard
deviation

Histogram for the final 20-year average capacity factor

Difference

Resource assessments use quantiles or exceedance indices to characterize a
site's wind resource. P50 expresses an energy production level chosen so that
the fraction of time it is exceeded is expected to be half or 50 %, and
thus in some sense characterizes the expected “average” production level. A
higher-denominated index value, such as P90, expresses a lower energy
production level or “floor” that should be exceeded more often. Thus, it
gives a sense of the financial “downside” risk. Estimators of this type are
known in statistical quality control as

Average interannual variability

We hold out the final 20 years of each station's record, representing a typical lifetime of a wind plant, as its “actual” production, and attempt to “predict” the actual production using estimates derived from capacity-factor values sampled from preceding years. To analyze the performance of the P50 and P90 estimates we count the (weighted) number of years in the station's final 20-year segment having capacity factor in excess of the estimator's value and divide by 20 (adjusted for weight), “expecting” the result to match the estimator's denomination (i.e. 50 or 90 %).
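As a minimal sketch of this hold-out scoring, the weighted exceedance fraction might be computed as below. The function name `exceedance_fraction` is hypothetical, and the normalisation by the sum of the weights is an assumption about what "divide by 20 (adjusted for weight)" means in practice; the weights are taken to be the number of months available in each year.

```python
import numpy as np

def exceedance_fraction(final_cf, final_weights, estimate):
    """Weighted fraction of held-out years whose capacity factor exceeds
    an estimated exceedance level such as P50 or P90 (hypothetical helper).

    final_cf:      annual capacity factors of the final segment (NaN = no data)
    final_weights: weight of each year, e.g. number of months available
    estimate:      the estimator's value derived from preceding years
    """
    cf = np.asarray(final_cf, dtype=float)
    w = np.asarray(final_weights, dtype=float)
    valid = ~np.isnan(cf)
    exceeds = cf[valid] > estimate
    # Weighted count of exceeding years divided by the total available weight
    # (assumed reading of "divide by 20, adjusted for weight").
    return float(np.sum(w[valid] * exceeds) / np.sum(w[valid]))
```

With complete data, every weight is 12 and the result is simply the number of exceeding years divided by 20; a perfectly calibrated P50 estimate would give 0.5 and a perfectly calibrated P90 estimate 0.9.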

To facilitate comparison to standard resource assessment metrics, we portray the error of our estimates or forecasts both as exceedance errors and as energy errors. Exceedance errors represent the difference of the actual from the targeted exceedance (e.g. “only 40 % of the stations exceeded their estimated P50 level”). Energy errors represent the difference of the actual from the targeted capacity-factor quantile (e.g. “the median capacity factor of the station's final 20 years was 0.32 compared to an estimated P50 of 0.37”). Further, for both portrayals, we quantify both the spread and the “bias” – bias in the forecasting sense of the difference between mean forecast and mean observation rather than in the statistical sense of difference between true value and expected value of an estimator.

Assuming each station's final 20 capacity-factor values are independent and
symmetrically distributed, the distribution of energy errors should, as

We first examine the potential effect of record length on resource assessment
statistics by quantifying the interannual variability (IAV) of wind speed.
For each station, we calculate the sample means

Figure

Fraction of each of the selected 60 stations' final 20 years with
capacity factor exceeding its P50 estimate

Under the assumptions of independence and normality, a “prediction” of the
median, or P50, can be calculated from a sample of historical values simply
as the sample mean

Focusing on the 30-station low-trend subset, the histograms of
Fig.

P50 estimator performance.

To characterize bias in the P50 estimator, we also calculate the exceedance
fraction averaged across all 60 stations, as shown in Fig.

To characterize the magnitude of the energy estimation error, we calculate
the

We also calculate the mean absolute energy error (MAE) of the P50 estimates:

To provide additional insight into the effect of year-to-year correlation, we
again estimate the P50 of each station's final 20-year segment, but now using
a fixed 5-year sample, separated from the final segment by a

Sixty-station mean absolute P50 error derived from estimates based on a 5-year sample segment separated from the final 20-year segment by the indicated interlude.

As these results show, year-to-year correlation has a substantial impact on the estimation of P50. When the preponderance of the stations appears to exhibit a secular trend, it is perhaps not surprising that P50 estimates exhibit bias. However, even for a group of stations without a predominant trend, where bias is essentially eliminated, the error in the P50 estimate is still larger than would be expected on the basis of uncorrelated samples, for all record lengths. Little improvement in error is gained by using records longer than 4–6 years, and in fact using records longer than 18 years actually degrades accuracy.
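The record-length comparison summarized above can be illustrated with a short Python sketch. It is not the code behind the reported results: the function names are hypothetical, years are left unweighted for brevity, and the synthetic records come from a simple AR(1) process, used here only as a convenient stand-in for serially correlated data (the mean of 0.35 and variability of 6 % are borrowed from the example in the abstract). The P50 estimate is taken as the mean of the `n` years immediately preceding the final 20, and the "actual" P50 as the median of the final 20 years.

```python
import numpy as np

def p50_mae(records, n, holdout=20):
    """Mean absolute P50 energy error across stations (unweighted sketch).

    records: array (n_stations, n_years) of annual capacity factors.
    n:       number of years, immediately preceding the hold-out, used
             to form the P50 estimate (the sample mean).
    """
    records = np.asarray(records, dtype=float)
    sample = records[:, -(holdout + n):-holdout]
    estimate = sample.mean(axis=1)
    actual = np.median(records[:, -holdout:], axis=1)
    return float(np.mean(np.abs(actual - estimate)))

def permuted_control(records, rng):
    """Randomly permute each station's years, destroying serial correlation."""
    return np.array([rng.permutation(row) for row in np.asarray(records)])

def synthetic_ar1(n_stations, n_years, phi, rng, mean=0.35, iav=0.06):
    """Hypothetical test data: AR(1) anomalies with unit marginal variance."""
    x = np.empty((n_stations, n_years))
    x[:, 0] = rng.standard_normal(n_stations)
    for t in range(1, n_years):
        noise = rng.standard_normal(n_stations)
        x[:, t] = phi * x[:, t - 1] + np.sqrt(1.0 - phi ** 2) * noise
    return mean * (1.0 + iav * x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chrono = synthetic_ar1(60, 62, phi=0.8, rng=rng)
    control = permuted_control(chrono, rng)
    for n in (2, 5, 10, 20, 42):
        print(n, p50_mae(chrono, n), p50_mae(control, n))
```

Comparing the two printed columns for increasing `n` shows how the benefit of longer records differs between chronological and permuted sequences; the exact numbers depend on the assumed correlation model and should not be read as a reproduction of the station results.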

The degree of resource variability, and hence financial risk, can be
indicated by a “floor”, or a production level enough lower than P50 that it
is only rarely

According to the definition of P90 we desire an estimator such that

Values of

Fraction of each of the 60 stations' final 20 years with capacity
factor exceeding P90 estimate for

To evaluate how P90 estimation errors depend on record length, we calculate
P90 estimates using the weighted means and standard deviations of the
immediately preceding

As for the P50 estimates, we examine the distribution of the fractions of the
final 20 years' capacity factors exceeding the estimated P90 for the
low-trend subset of 30 stations. If the final 20 years were independent, the
exceedance counts would have approximately binomial distributions, this time
with

The average across all 60 stations of the exceedance of the final 20 capacity
factors, as seen in Fig.

Figures

P90 “thought experiment”.

We also calculate the mean absolute value of this error vs.

Our P90 estimator

For the first half of the thought experiment, we utilize the actual mean

For the sample standard deviation of the preceding

For the second half of the thought experiment, we utilize the actual sample
standard deviation

The primary purpose of resource assessment is to quantify financial risk and
returns, and to this end it is important that resource assessments quantify
their degree of certainty. Using simple estimators generated from sample mean
and variance (

The performance of the estimators using the actual chronological records is
quite another story. When considering the entire set of 60 stations, both the
P50 and P90 estimates exhibited strong bias, grossly over-predicting resource
levels actually attained in the final 20 years of each station's record. This
bias is consistent with widespread decreasing wind speeds identified by

The higher errors of the estimates made from the chronological data must
arise from non-zero correlation (lack of statistical independence) since
these data are identical to the randomized control data except for sequence.
One might hope to account for the higher errors in estimates made from the
correlated data in terms of an “effective number” of independent samples

Previous work finds, though, that wind speeds seem to exhibit “long-term
persistence”, with autocorrelation of a hyperbolic form

In light of the persistence behavior, it is premature to dismiss the larger
estimation errors from the 60-station set as being somehow a spurious result
attributable to nonstationarity. Persistent processes are characterized by
seeming “trends” that spontaneously appear and disappear in a way that is
actually entirely random

We have shown here that year-to-year correlations in resource level produce
large effects, seemingly not recognized or incorporated into current
estimation practice, degrading the certainty of pre-construction wind energy
estimates. For primitive estimators, of the type utilized here, longer
records do not provide better estimates of future energy production. Since
ignoring available data would not – and must not – be a reasonable
solution, statistical approaches that explicitly account for the observed
year-to-year correlation should be considered. One parsimonious approach
would be to utilize estimation procedures based on long-term-persistence
phenomenology, such as the pioneering comprehensive MCP technique proposed
some time ago by

The homogenized monthly average windspeed source data used in this work are
freely available on the Environment Canada website

We thank Environment Canada for making the monthly wind speed data used in this work openly accessible. Nicola Bodini was partially supported by a grant from Opera Universitaria of Trento. This material is based upon work funded by the National Science Foundation under Grant IIP-1332147. The authors appreciate helpful discussions with Jason Fields of the National Renewable Energy Laboratory and with Chris Gifford of DBRS Limited, Toronto.

Edited by: J. Mann
Reviewed by: four anonymous referees