The risks of extreme load extrapolation

Abstract. An important problem in wind turbine design is the prediction of the 50-year load, as set by the IEC 61400-1 Design Load Case 1.1. In most cases, designers work with limited simulation budgets and are forced into using extrapolation schemes to obtain the required return level. That this is no easy task is evidenced by the many studies dedicated to finding the best distribution and fitting method to capture the extreme load behavior as well as possible. However, the issue that is often overlooked is the effect that the sheer uncertainty around the 50-year load has on a design process. In this paper, we use a collection of 96 years' worth of extreme loads to perform a large number of hypothetical design problems. The results show that, even with sample sizes exceeding N = 10^4 ten-minute extremes, designs are often falsely rejected or falsely accepted based on an over- or underpredicted 50-year load. Therefore, designers are advised to be critical of the outcome of DLC 1.1 and should be prepared to invest in large sample sizes.

The aim of this paper is to demonstrate this with a simple exercise, using a collection of 96 years' worth of ten-minute load maxima released by Barone et al. (2012a). The uncertainty distribution is constructed by repeatedly sampling subsets of this data set and obtaining the 50-year loads through an automated extrapolation scheme. We then simulate a problem where a hypothetical designer has to choose between two or more concepts and record how often this uncertainty leads to wrong choices. The results of this paper should help designers to estimate the required sample sizes for their problem, but also to form a critical attitude concerning the quality and reliability of extrapolated 50-year loads.

Methodology
Since the focus of this work is on the impact of uncertainty, rather than obtaining the highest possible quality result, the workflow is kept as simple as possible. Loads were extracted by drawing a random sample from a large set of crude Monte Carlo results, and the 50-year return level was found by a graphical fit.

Loads data set
The data set that was used for this study was generated by Barone et al. (2012a). It features the onshore version of the NREL 5 MW reference wind turbine, operating for 96 years in an IEC class 1B climate (IEC, 2005). Ten-minute mean wind speeds were randomly drawn from a Rayleigh distribution, bounded by the cut-in and cut-out wind speeds of 3 and 25 m/s, respectively.
Turbulent wind fields were generated by TurbSim on a 20×20 grid with a width and height of 137 m and were fed to the FAST v7 aeroelastic code. Every simulation ran for eleven minutes, of which the first minute was discarded to avoid any start-up transients. More details can be found in the original paper.
Each output channel contains over 5 million ten-minute extremes. In this paper, we will use the tower base overturning moment, which plays a major role in the design of foundations. Figure 1 shows the entire set of loads at the respective wind speeds.

Extreme load distribution
The wind speeds were drawn directly from the Rayleigh distribution. Therefore, the cumulative distribution of extreme loads follows naturally from ranking a set of N loads in ascending order and assigning a plotting position:

$$\hat{F}(m_i) = \frac{i}{N + 1}.$$

In this case, however, the wind speeds outside of the operating range have to be accounted for:

$$F(m) = 1 - p_{\mathrm{op}}\left(1 - \hat{F}(m)\right), \qquad p_{\mathrm{op}} = \int_{U_{\mathrm{in}}}^{U_{\mathrm{out}}} f(\bar{U})\,\mathrm{d}\bar{U},$$

where $f(\bar{U})$ is the mean wind speed distribution. Plotting the entire data set then yields the return level plot shown in Figure 2. With 96 years' worth of load data, the 50-year return value can be interpolated directly, which yields 115 MN·m.

[Figure 1: The data set, containing over 5 million ten-minute extreme overturning moments between the cut-in and cut-out wind speeds. The box plots indicate the scatter per 1-m/s bin, where the boxes mark the 25th and 75th percentiles, the whiskers mark the 2.5th and 97.5th percentiles, and the bar is the median.]

The tail of the distribution shows a characteristic bend, or "knee", which hints that more than one process is at work. Indeed, tracing back the wind speeds belonging to the 10% highest loads points towards a region well above the rated wind speed (see Figure 3). It turns out that this is due to a particular controller response to negative gust amplitudes (e.g., Bos et al., 2015; Bos and Veldkamp, 2016), which also explains the shape of the scatter plot in Figure 1.

[Figure 3 caption (fragment): ... IEC class 1B climate. The light and dark filled areas correspond to the lowest 90% and highest 10% loads, respectively.]
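As a rough illustration, the plotting-position construction described above can be sketched in a few lines of Python. The function name and the figure of 52 596 ten-minute periods per year are choices of this sketch, not taken from the original study:

```python
import numpy as np

def empirical_return_levels(maxima, p_op=1.0):
    """Empirical return periods for a set of ten-minute load maxima.

    p_op is the probability that the mean wind speed falls inside the
    operating range, i.e., the integral of f(U) from cut-in to cut-out.
    """
    m = np.sort(np.asarray(maxima, dtype=float))  # rank the loads
    n = m.size
    F_op = np.arange(1, n + 1) / (n + 1)          # plotting position
    F = 1.0 - p_op * (1.0 - F_op)                 # account for non-operation
    per_year = 365.25 * 24 * 6                    # ten-minute periods per year
    T = 1.0 / ((1.0 - F) * per_year)              # return period in years
    return m, T
```

Interpolating the return periods T against the ranked loads m at T = 50 years then gives the empirical 50-year return level.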

Extrapolation scheme
In many practical situations, a designer does not have the computational resources available to simulate several decades of operation. In that case, the 50-year load has to be found by extrapolation. The objective is then to match the tail behavior with a distribution function, for example by least-squares fitting. A good candidate for this is the generalized extreme value (GEV) distribution:

$$F(m) = \exp\left\{-\left[1 + \xi\left(\frac{m - \mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$

where µ is the location parameter, σ the scale parameter, and ξ the shape parameter.
To estimate the uncertainty that comes from repeatedly extrapolating different sets of loads, this process has to be automated.
However, the difficult part is then to decide where exactly the tail starts under a varying sample size. A simple solution that seems to work in most cases is to assume that the tail covers the second half of the distribution when drawn on Gumbel paper; i.e., all points whose reduced variate, $y = -\ln(-\ln F)$, lies above the midpoint of the plot. For the full data set, this means that the GEV distribution is fitted to the upper 0.07% of the data, which results in the Q-Q plot shown in Figure 4. This procedure is automated for k = 10^4 sets of loads, which yields a collection of extrapolated 50-year return levels; the median and other quantiles are then estimated by sorting. In order to catch very bad fits, any extrapolated 50-year load that is more than 50% higher than the "real" value is discarded and resampled (as it would be by an experienced engineer).
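A minimal sketch of such an automated scheme is given below. It assumes SciPy's `genextreme` parameterization (whose shape parameter c corresponds to −ξ) and a least-squares fit of the GEV quantile function to the tail points; the function name, bounds, and tolerances are illustrative choices, not the paper's exact implementation:

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def extrapolate_50yr(maxima, p_op=1.0):
    """Fit a GEV to the upper half of the distribution on Gumbel paper
    and extrapolate to the 50-year return level."""
    m = np.sort(np.asarray(maxima, dtype=float))
    n = m.size
    F = 1.0 - p_op * (1.0 - np.arange(1, n + 1) / (n + 1))
    y = -np.log(-np.log(F))                 # reduced (Gumbel) variate
    tail = y >= 0.5 * (y.min() + y.max())   # "second half" on Gumbel paper

    def q(p, c, loc, scale):                # GEV quantile function
        return stats.genextreme.ppf(p, c, loc=loc, scale=scale)

    p0 = (0.0, m[tail].mean(), m[tail].std() + 1e-9)
    (c, loc, scale), _ = curve_fit(
        q, F[tail], m[tail], p0=p0,
        bounds=([-0.5, -np.inf, 1e-9], [0.5, np.inf, np.inf]))
    # Non-exceedance probability of the 50-year level for a single
    # ten-minute interval (52 596 intervals per year):
    F50 = 1.0 - 1.0 / (50 * 365.25 * 24 * 6)
    return stats.genextreme.ppf(F50, c, loc=loc, scale=scale)
```

Repeating this for many resampled subsets, and discarding results more than 50% above the known value, produces the collection of 50-year levels used in the rest of the paper.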

The end result is the situation depicted in Figure 5, which can be repeated for different sample sizes.

Results
Based on the load set and the extrapolation scheme, we can estimate how far a single 50-year load prediction would be off from the true value.

Uncertainty surrounding the 50-year overturning moment
First, Figure 6 shows how the median and confidence intervals around the 50-year level vary as a function of the sample size, N. Evidently, the larger the sample size, the smaller the error. Overall, the GEV distribution has a tendency to overpredict the 50-year level, which was also found by fitting a straight line (i.e., a Gumbel distribution with ξ = 0), as done by Barone et al. (2012b). Even with N = 10^4 (which is already more than two months of simulated time), the median prediction still lies about 5% above the true 50-year level.
For sample sizes smaller than N = 10^4, it can be very hard to establish the appropriate tail behavior. The GEV fit will often return a 50-year level that is more than 50% higher than the true value, partially owing to large values of the shape parameter, ξ. In such cases, a simple straight line will produce better predictions.
In addition, the root-mean-square (RMS) error provides a single measure for the quality of the result:

$$\varepsilon = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(\frac{\hat{M}_{50,i} - M_{50\,\mathrm{yrs}}}{M_{50\,\mathrm{yrs}}}\right)^{2}},$$

where $M_{50\,\mathrm{yrs}}$ is the "true" 50-year level and $\hat{M}_{50,i}$ are the extrapolated values. As shown in Figure 7, the RMS error stays fairly constant up until a sample size of N = 10^4 when predictions above +50% are discarded. It is not until sample sizes of close to 10^5 that the tail shape of the empirical extreme load distribution becomes reliable and most of the fitting problems disappear. Ultimately, the RMS error falls into the classic 1/√N rule that is often found with Monte Carlo methods.
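For reference, the relative RMS error over k repeated extrapolations is a direct one-line computation (the function name is ours):

```python
import numpy as np

def relative_rms_error(predictions, true_value):
    """Relative root-mean-square error of a collection of
    extrapolated 50-year levels against the known value."""
    p = np.asarray(predictions, dtype=float)
    return np.sqrt(np.mean(((p - true_value) / true_value) ** 2))
```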

Effect on conceptual design
How this uncertainty affects the design process is demonstrated here with a very simple example. A new concept is proposed that is an exact copy of the NREL 5 MW machine, but with a different wall thickness at the base of the tower. For a thin-walled cylindrical section, the second moment of area then changes according to

$$I = \pi r^3 t,$$

where r = 3 m is the base radius and t = 35 mm is the original wall thickness (Jonkman et al., 2009). An extreme overturning moment, M, would cause a compressive stress of

$$\sigma_z = \frac{M r}{I} + \frac{mg}{A},$$

where mg = 6.82 MN is the total weight of the wind turbine and $A = 2\pi r t$ is the cross-sectional area of the tower base section.
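Assuming the thin-walled cylindrical section just described, the stress evaluation is a one-liner; the function name is ours, and the numbers follow from the values quoted in this section:

```python
import math

def sigma_z(M, t, r=3.0, mg=6.82e6):
    """Compressive stress at the tower base [Pa] for an overturning
    moment M [N*m] and wall thickness t [m] (thin-walled cylinder)."""
    I = math.pi * r**3 * t      # second moment of area
    A = 2.0 * math.pi * r * t   # cross-sectional area
    return M * r / I + mg / A   # bending plus axial compression

# 50-year moment of 115 MN*m with the original 35-mm wall:
stress_old = sigma_z(115e6, 0.035)   # on the order of 10^8 Pa
```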
The objective of the exercise is to find a new wall thickness to reduce the 50-year stress levels; i.e., σ z,new < σ z,old .
This might seem trivial at this point (any thicker wall is guaranteed to reduce the stresses), but the actual difficulty is to determine the 50-year moment. Whereas the original design has already gone through an extensive load analysis from which the 50-year load level is known, any new concept has to go through this process again.² Due to the uncertainty that surrounds this 50-year level, the new design can be falsely rejected or falsely accepted. Figure 8 shows how often this happens when the load analysis is carried out 10^4 times with a sample size of N = 10^3. When the wall thickness is reduced by 10% to 31.5 mm, the new design will appear to have lower stresses in 25% of the cases (i.e., the false positives). On the other hand, even when the wall thickness is increased by 20% to 42 mm, the new design has a 17% chance of still being rejected (i.e., the false negatives). The closer the new design is to the original, the harder it will be to make the right choice.
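The false-decision mechanism can be mimicked with a toy model. Purely for illustration (unlike the paper, which resamples actual loads), we assume here that an extrapolated 50-year moment equals the true value multiplied by lognormal noise with a 10% coefficient of variation:

```python
import numpy as np

def false_decision_rate(t_new, t_old=0.035, r=3.0, mg=6.82e6,
                        M50=115e6, cv=0.10, k=10_000, seed=0):
    """Fraction of trials in which the stress comparison between the
    old and new wall thickness comes out wrong."""
    rng = np.random.default_rng(seed)

    def sigma_z(M, t):  # thin-walled tower base stress
        return M * r / (np.pi * r**3 * t) + mg / (2 * np.pi * r * t)

    s_old = sigma_z(M50, t_old)                   # known reference stress
    truly_better = sigma_z(M50, t_new) < s_old    # ground truth
    M_hat = M50 * rng.lognormal(0.0, cv, size=k)  # noisy 50-yr moments
    appears_better = sigma_z(M_hat, t_new) < s_old
    return np.mean(appears_better != truly_better)
```

Even in this simplified setting, designs close to the original produce non-negligible false-positive and false-negative rates.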
The positive bias that is often present in extreme load extrapolations (e.g., Barone et al., 2012b; Van Eijk, 2016) makes it particularly difficult to prove that new designs are capable of reducing 50-year levels. With a crude Monte Carlo method, the only solution is to further increase the sample size, as shown in Figure 9. However, it will take an immense computational effort to completely remove the uncertainty from the design process.
² Maybe not for something like a new wall thickness, but more so for different control schemes or for rigorous changes to the blade design.

Another case is a comparison between several concepts, where the 50-year stress levels all contain the same degree of uncertainty. Five concepts, from 25- to 45-mm wall thickness, are ranked among each other, such that σ_z,1 ≤ σ_z,2 ≤ σ_z,3 ≤ σ_z,4 ≤ σ_z,5.
In the ideal case, the 45-mm wall thickness should end up at rank 1, the 40-mm one at rank 2, and so on. How often this ideal ranking happens in practice is shown in Figure 10. For sample sizes smaller than, say, N = 5·10^3, the right order is found in only 25% or less of the cases. It is not until N = 3·10^5 that the uncertainty is small enough for the order to be right roughly 100% of the time. From N = 10^4 to N = 10^5, the quality of the predictions increases sharply, which is when the tail of the distribution starts to take its proper shape. How often each rank is assigned to each concept is shown in Figure 11. Clearly, the 45-mm wall thickness does not always appear to be the best and the 25-mm wall thickness does not always appear to be the worst. The closer the concepts are to each other, the harder it becomes to distinguish them by their 50-year stress levels. In fact, the chances of selecting a concept other than the thickest, 45-mm one are higher than 15% for N ≤ 10^4.
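The ranking experiment can be sketched with the same toy noise model (again an illustrative stand-in for repeated load extrapolation, with thicknesses and constants taken from this section):

```python
import numpy as np

def correct_ranking_rate(cv=0.10, k=10_000, seed=1):
    """Probability that five noisy 50-year stress estimates rank the
    wall thicknesses in the ideal order (thickest wall = rank 1)."""
    rng = np.random.default_rng(seed)
    t = np.array([0.025, 0.030, 0.035, 0.040, 0.045])  # thicknesses [m]
    r, mg, M50 = 3.0, 6.82e6, 115e6

    def sigma_z(M, t):  # thin-walled tower base stress
        return M * r / (np.pi * r**3 * t) + mg / (2 * np.pi * r * t)

    ideal = np.argsort(sigma_z(M50, t))                # true order
    noise = rng.lognormal(0.0, cv, size=(k, t.size))   # independent errors
    s = sigma_z(M50 * noise, t)                        # k trials x 5 concepts
    return (np.argsort(s, axis=1) == ideal).all(axis=1).mean()
```

Shrinking the coefficient of variation (i.e., increasing the sample size behind each extrapolation) drives the correct-ranking rate towards 100%, mirroring the trend in Figure 10.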

The uncertainty around the 50-year level clearly has a very large impact on the design process. In this paper, we have focused on wall thickness in order to produce results that are counter-intuitive. This is to demonstrate that extrapolated 50-year values can be misleading and can easily trick the designer into making bad choices. It is therefore very important that the designer is skeptical enough of their own results. The most obvious solution to reduce the uncertainty is to use high-performance computers in order to run extensive simulation campaigns (e.g., Barone et al., 2012a, b). An alternative remedy is to rely on importance sampling, which is a well-known variance reduction method that allows the user to allocate the computational resources for the most severe conditions (e.g., Bos et al., 2015;Bos and Veldkamp, 2016).

The goal of this paper was to demonstrate the effects of the uncertainty around extrapolated 50-year loads. It showed that, unless very large sample sizes are used, DLC 1.1 is a very unreliable measure for the performance of a design. This uncertainty has a pronounced effect on early phases of the design, when computational resources are often scarce.
One should always take into account that it is very time-consuming to prove that concepts are able to reduce the 50-year load, unless the design changes are very radical. In an example where the bottom tower wall thickness of the NREL 5 MW reference turbine was varied, a 10% increase in wall thickness was identified as a way to reduce the stress in only 75% of the cases with a sample size of N = 10^3. In fact, more than 10^5 simulations were required to decrease the probability of a false rejection to 10%. Another example, where five wall thicknesses ranging from 25 to 45 mm were ranked in order from the lowest to the highest stress, showed a similar trend. With N = 10^3, the correct order appeared in only 25% of the cases, which improved to roughly 90% for N = 10^5.

These results show that a critical attitude is required when judging extrapolated extreme loads. When DLC 1.1 is not the design-driver, it might be best to avoid it altogether. Otherwise, using high-performance computing or importance sampling methods will be the best approach.