Adaptive stratified importance sampling: hybridization of extrapolation and importance sampling Monte Carlo methods for estimation of wind turbine extreme loads

Wind turbine extreme loads estimation is especially difficult because turbulent inflow drives nonlinear turbine physics and control strategies, so there can be huge differences in turbine response to essentially equivalent environmental conditions. The two main current approaches, extrapolation and Monte Carlo sampling, are both unsatisfying: extrapolationbased methods are dangerous because by definition they make predictions outside the range of available data, but Monte Carlo methods converge too slowly to routinely reach the desired 50-year return period estimates. Thus a search for a better method 5 is warranted. Here we introduce an adaptive stratified importance sampling approach that allows for treating the choice of environmental conditions at which to run simulations as a stochastic optimization problem that minimizes the variance of unbiased estimates of extreme loads. Furthermore, the framework, built on the traditional "bin"-based approach used in extrapolation methods, provides a close connection between sampling and extrapolation, and thus allows the solution of the stochastic optimization (i.e., the optimal distribution of simulations in different wind speed bins) to guide and recalibrate the extrapolation. 10 Results show that indeed this is a promising approach, as the variance of both the Monte Carlo and extrapolation estimates are reduced quickly by the adaptive procedure. We conclude, however, that due to the extreme response variability of turbine loads to the same environmental conditions, our method and any similar method quickly reaches its fundamental limits, and that therefore our efforts going forward are best spent elucidating the underlying causes of the response variability.

The essential task in wind turbine extreme loads estimation is to evaluate the probability of exceedance (POE) integral P (Y > l) = P (Y > l|x)f (x)dx. (1) Here P (Y > l) is the probability of load Y exceeding target/threshold l, P (Y > l|x) is the conditional probability of exceedance given wind speed x, and f (x) is the distribution of wind speeds (or other environmental conditions). Because we are interested in extremely low probability events and Y (x) is stochastic and only available via simulation, standard methods of 5 integration do not apply.
Existing approaches fall generally into two main classes. The first is based on extrapolation: data is gathered in different wind speed "bins", extreme value distributions are fit to the empirical distribution function for each bin, and these are then integrated. The second is based on Monte Carlo methods: exceedance probabilities are written as expectations of indicator functions, samples are drawn from an assumed wind distribution, and unbiased estimates made by the usual Monte Carlo 10 summation. Unfortunately, to date neither of these approaches is satisfactory. The crux of the difficulty is that on the one hand too many samples are required for converged Monte Carlo estimates, but on the other hand reliable extrapolation of nonlinear physics under uncertain forcing is extremely problematic, especially without knowledge of the form (e.g. quadratic, etc.) of the nonlinearity. Nevertheless, the computational expense of MC implies that except in rare cases, some sort of extrapolation will be necessary in order to reach the desired 50 year return period estimates. This paper is motivated by the intuition that perhaps 15 we can at least use MC/IS to make sure extrapolations are accurate to the resolution of data we actually have, and to gather data in ways that accelerates their convergence.
The difficulty of estimating these "tail-probabilities" of interest in extreme loads estimation is one of timescales. We are trying to estimate loads seen roughly once in 50 years using a set of simulations whose total length is only a few hours.
This large difference in timescales means that any uncertainty in the data is necessarily magnified by the extrapolation. Small 20 variations in short-term data could lead to significant over-or underestimation of long-term extreme loads.
One of our main conclusions will be that while we may have reasonable knowledge of the distribution of environmental conditions a turbine faces, we have very little knowledge regarding the distribution of the response of the turbine to its environment, and this response variability may in fact be so large that our knowledge of the distribution of environmental conditions is of limited use. A conceptual aid is provided by the Inverse First Order Reliability Method (IFORM) (Winterstein et al., 1993;25 Sultania and Manuel, 2017) and a variant of it called the Environmental Contour (EC) method, which will be discussed briefly below. These methods, though highly practical in their own right, are not the main subject of the present paper, but they will help us appreciate the important distinction between environment and response.
Our goal is to develop methods that make unbiased estimates that minimize variance as a function of the number of samples/simulations, and to use these to dynamically update extrapolations. Our proposed method, Adaptive Stratified Importance 30 Sampling (ASIS), is essentially a global stochastic optimization method where the search variables are the number of samples from each wind speed bin we use, and the objective function is the variance of our Monte Carlo estimates. The key tool here is importance sampling, which allows us to continually produce unbiased estimates of exceedance probabilities even as the dis-tribution of bins changes. These quasi-optimal samples are then used to make the best possible extrapolations. Results below show that this is indeed a promising approach.
The organization of this paper is as follows. First we present the necessary background on the existing extrapolation method (e.g., as recommended in the IEC standard (IEC, 2005)), Monte Carlo methods, and importance sampling. Next we describe our ASIS algorithm. Then we present a brief study illustrating some of its properties. The paper provides a context for discussing 5 extrapolation and importance sampling in the same framework. Our conclusion highlights the potential for this approach, reiterates some of the fundamental difficulties with the endeavor, and leads to suggestions for where we should next focus our efforts to crack this difficult problem.
2 Background on turbine simulation, extrapolation, Monte Carlo, and importance sampling 2.1 Turbine simulation 10 Throughout this paper, we use FAST, NREL's aeroelastic simulation tool (Jonkman, 2013). FAST is a widely-used industry and academic tool for wind turbine loads estimation. NREL's WISDEM software allows for the execution of FAST and its companion tool TurbSim (which generates turbulent wind fields for input to FAST) in a programmatic fashion from python, as has been reported previously (Graf et al., 2016).
The particular turbine on which we are testing these methods is the NREL 5 MW reference turbine, often used for such 15 studies (Jonkman et al., 2015;Choe et al., 2016), in an onshore configuration. The "environmental conditions" are thus described by hub height mean wind speed (modeled by a Weibull distribution with scale and shape parameters of 11.28 m/s and 2, respectively). Additional environmental parameters of turbulence intensity, spectrum, coherence function, and wind shear are kept fixed at nominal values. It should be noted that the stochasticity in the combined TurbSim/FAST simulation comes from the "random seed" that is input to TurbSim. This seed governs the exact starting conditions for the generation of the 20 turbulent inflow wind field. Because this is an ergodic system, a long enough simulation would eventually cover all possible wind conditions. However, for finite simulations, multiple runs using different random seeds allow us to "sample" the space of all possible turbulent flow field snapshots with the same mean wind speed, turbulence intensity, etc. This is common practice in extreme loads analysis. We can regard random seed as a proxy for sampling over a uniform distribution of turbulent inflows for each set of environmental conditions.

25
For this study, we selected two output channels of interest, tower base side-side bending ("TwrBsMxt" in FAST nomenclature) and tower base fore-aft bending ("TwrBsMyt"), which provide contrast because the wind speeds where their highest loads occur overlap differently with the typical wind speed distribution. The side-side moments grow with hub-height wind speed, making their extremes hard to estimate with traditional MC sampling because they do not overlap well with typical wind distributions. The fore-aft counterparts do overlap quite closely with the typical wind distributions.

Extrapolation
The current standard for estimating extreme loads relies on extrapolation. We refer the reader to the relevant literature for a detailed exposition of the extrapolation method (Ragan and Manuel, 2008;Moriarty, 2008;Toft et al., 2011;Graf et al., 2017).
Here we present a concise statement of the method and discuss one or two subtleties. The protocol to construct exceedance curves using binning and extrapolation is as follows: 5 1. Run Turbsim/FAST N i times per wind speed x i at center of bin i (Typically N i ∼ 6).
2. For each bin i, concatenate the data from each seed and extract peaks (see below r.e. peak extraction and time scales).
For future reference we refer to the resulting dataset as {Y i,k } where i indexes over wind speed bins and k indexes over the peaks we have extracted at that wind speed 3. For each bin i, form empirical cumulative distribution functions (CDFs) and fit a chosen distribution F i (x) = P (Y < l|x) 10 to them. In this paper we use a 3 parameter Weibull distribution.
4. (optional) For each bin: convert each fitted distribution to desired time scale (see below r.e. peak extraction and time scales).

Finally
is pdf of wind distribution, and ∆x i is the width of bin i. The extrapolated estimate of the probability of exceedance is then P (Y > l) = 1 − P (Y < l).

15
The distributions chosen to fit the bin-wise CDFs are the theoretically appropriate extreme value distributions (Generalized Extreme Value (GEV), 3 parameter Weibull, etc.). However, this does not mean they accurately represent the behavior of the particular FAST loads in a specific context. The optimality properties of extreme value distributions are asymptotic properties, but we are doing "intermediate asymptotics": long term-but not infinitely long term-trends. Nevertheless, these distributions are the appropriate starting point.

20
In this paper we are using a 3 parameter Weibull distribution, but this is not meant as a claim that this choice is better than any other in the literature (Gumbel, GEV, etc). We used the 3 parameter Weibull because we have used it with success in previous work (Graf et al., 2017). There are many excellent studies examining the choice of distribution (Ragan and Manuel, 2008;Moriarty, 2008;Toft et al., 2011), but this is not the focus of the present work.
Regarding the fitting procedure, in light of the interest in extrapolation, rather than just fitting, we have fit the empirical 25 cumulative distribution function of the data directly to the theoretical cumulative distribution function (CDF) of the distribution by nonlinear least squares. We have done this separately for the data from each wind speed bin. Furthermore, in order to emphasize the largest peaks (i.e. the lowest probability values) we do not use all the data, just the M pks largest peaks in each bin, where M pks is an algorithmic parameter. M pks plays a role in our studies similar to the "threshold" used in peak-overthreshold methods, but for purposes of connecting to the bin-based approach it has the advantage that there are always the 30 same number of peaks extracted from a given length simulation. As an exercise, we experimented with using different values of M pks , as discussed below in section 4 and illustrated in Figure 2. The broad conclusion is that there is a window of values of this parameter that provides similar performance, which suggests this parameter will not be an impediment to practical implementation of this algorithm.
It is important to be clear regarding various time spans at play here. First, there is the ultimate time of interest, typically in wind studies the "50 year return period". This does not mean that in 50 years the event in question happens with probability 1. Sometimes it is loosely defined as an event having probability 1/50 of happening in one year. Strictly speaking it is an 5 event that happens on average one or more times in 50 years according to a Poisson process whose "events per interval" parameter is one in 50 years. Next there is the simulation time, i.e., the length of each FAST run. This is almost always 10 minutes in the literature. Relatedly, there is the time span over which our estimates of exceedance probability apply. These are also traditionally 10 minutes, i.e. reported probabilities of exceedance are probabilities of exceedance in 10 minutes, but there is nothing in principle to make this fixed. Finally, there is the length of time between independent peaks. This time can be 10 estimated empirically by examining the autocorrelation of the data; values as low as 4 seconds have been justified in previous studies (Ragan and Manuel, 2008), and 10 seconds seems to be more than adequate.
The various time spans come into play as we extract peaks and make estimates. The rule that connects them is the simple "AND" rule of probability: If Y < l for time T , and T = Kt, then Y < l for time t for K times in a row, so For example, the former is typically used to convert peak distributions collected on a 10 second, 1 minute, or peak-over-threshold basis to a 10 minute basis, while the latter is used when we derive the ubiquitous value of 3.8 x 10 −7 that represents the probability of the 50 year return period event happening in 10 minutes.
In this paper we adopt the following setup: All our simulations are 11 minutes long, where we discard the first minute as 20 a transient and retain the final 10 minutes for our studies (we will occasionally be loose with the terminology and refer to these as "10 minute simulations"). To gather peaks, we take the maximum of each 1 minute segment in our simulations. This provides exactly 10 peaks per simulation, which allows for building 1-minute empirical cumulative distribution functions (i.e. probability of exceedance in 1 minute) in a consistent manner. The resulting 1 minute empirical POEs are converted to 10 minute POEs as described above (i.e. K = 10) to conform to standard practice. Below we experiment with how many of these 25 peaks should be used in the fitting of the extreme value distributions used in the extrapolation method. This alternative to the peak-over-threshold method makes comparison and correspondence between extrapolation and sampling easier, because we have a fixed number of peaks per FAST run. The tradeoff between gathering more peaks (at the risk of sacrificing statistical independence) versus less peaks (at the risk of not having enough data) is an algorithmic detail that we could study further, but it is not the focus of this paper. Our motivation for using 1 minute interval peak separations is to pick a reasonable point along 30 this tradeoff that avoids the pitfalls of either extreme. Finally, we have divided the wind speed range into 5 bins centered at 8, 12, 16, 20, and 24 m/s.
To acquire a sense of the basic variability of the response, we have run 20 independent sequences of the simulations described above for the extrapolation method (6 random seeds per bin). Figure 1 consists of box and whiskers plots of the peaks from the first 6000 peaks (10 peaks per run X 6 seeds X 20 repetitions X 5 bins). For both the side-side and fore-aft loads, the variability 35 within each bin is large. For example, the difference between 95th and 5th quantile for the 24 m/s side-side bin is comparable to its median value. For the side-side load the dependence on wind speed is clearly also very strong, whereas for the fore-aft load, the variability within the bins is as large as it is between the bins. Figure 1. Box and whiskers plots of the distribution of raw response (specifically, all 1-minute maxima) of the combined TurbSim/FAST simulation as a function of wind speed bin for side-side (left) and fore-aft (right) loads. The difference in general trend between side-side and fore-aft loads is clearly evident. The variability within each bin is extreme (esp. for the fore-aft load), which puts an upper bound on the utility of sampling methods (e.g., importance sampling) targeting certain wind speeds. The boxes show the median, 25th, and 75th percentile.
The whiskers are positioned at the 5th and 95th percentile. The data are the absolute maxima in 1 minute segments of 120 separate 10 minute simulations per bin (1200 total peaks for each bin).

Monte Carlo importance sampling for extreme loads
Monte Carlo (MC) methods are widely used to estimate expectations of quantities calculated by stochastic simulations (Robert 5 and Casella, 2004). Importance Sampling (IS) is an MC method in which an auxiliary importance distribution q(x) is used to focus sampling on areas of the target distribution f (x) that are most relevant with respect to the functions of interest (e.g., the loads Y (x)). Often "relevant" means minimal variance (see below). The broad applicability of the method arises from the so-called Importance Sampling Identity: where E f and E q represent the expectation with respect to f and q, respectively. This means, from an MC standpoint, that and (where M tot is the total number of samples we have of the quantity we are estimating) are unbiased estimates of the same For us M tot will be the number of peaks gathered from the FAST runs. In what follows we will use M to represent numbers of peaks, and N to represent numbers of FAST runs. With our convention of taking the maximum over 1 minute spans of 10 minute simulations, we will have M = 10N throughout. The subscript " tot " will be the total number (over all bins), whereas index subscripts (e.g. " i ") will refers to the peaks or runs within the corresponding bin. Thus as above N i is the number of 15 FAST runs (i.e. random seeds) in the ith bin, and M i = 10N i is the number of peaks extracted from the N i runs, and N tot is the total number of FAST runs.
Although the above estimates (5) and (6) are both unbiased, they could have drastically different variance, which means that they may converge at drastically different rates. The minimal variance importance distribution can be derived and is 20 From a practical standpoint there are two obvious problems with this result. First, it depends on the expectation we wanted to calculate in the first place. This objection we can overcome using some form of accept-reject sampling that does not depend on the normalization constant. [As long as we can evaluate Y (x), even if by simulation (we assume we can also evaluate f (x)), we can sample from any distribution proportional to Y (x)f (x) by the accept-reject algorithm (see, e.g., (Robert and Casella, 2004)), which involves sampling uniformly in a 2D region containing the function Y (x)f (x). The probability of x 25 w.r.t. the Y * f distribution is just the proportion of these uniform samples below Y * f in this 2D "box". This procedure does require assumptions on the bounds of Y and the support of Y and f , but in principle these can be made large enough to sample any reasonable probability Y * f ]. A more significant problem, however, is that q * also depends on the function Y (x) whose expectation we are trying to calculate with as few evaluations as possible. Finally, in our case, an even worse problem is that Y (x) is not a deterministic function of x, so, as stated, q * is also a stochastic function. Nevertheless, Eqn. (7) provides a guide for our quest to find the minimal variance importance distribution: it should be as close as possible to proportional to the product of load Y (x) and wind probability f (x).
For our purposes, finally, note that Eqn.
(1) can be written as an expectation of the so-called "indicator" function that is 1 if Y (x) > l and 0 otherwise: Eqns. (1) and (8) form the basis of the mathematical bridge between extrapolation and MC/IS methods described in section 3.

IFORM
Here, keeping in mind the goal-minimal variance unbiased estimates of extreme loads through importance sampling, minimizing the use of extrapolation-we summarize the Inverse First Order Reliability Method (IFORM) and Environmental Contour 10 (EC) methods. IFORM was introduced by Winterstein (Winterstein et al., 1993) and addresses the estimation of extreme loads from a different perspective. Instead of directly computing the integral in Eqn.
(1), IFORMS seeks to find the combined set of environmental conditions and resulting loads that have a desired joint return period probability. The EC variant explicitly separates the environment from the turbine's response to it, in effect being a method that seeks to directly find the conditions that cause the extreme load.

15
In the general IFORM approach, the combined environmental and response variable space is considered as one joint probability distribution, and the quantile corresponding to the desired return period is explored to find the maximal response. In practice, the distribution of the environmental part of this combined space is assumed known. For example, in this paper, the environmental component is wind speed, which is assumed to have a Weibull distribution with shape and scale 11.28 m/s and 2, respectively. Then response data is gathered for samples in the environmental space, and a conditional distribution "response 20 | environment" is fit to this data, which allows finally for extrapolation to the desired return period. This use case of IFORM is thus a form of extrapolation, with its strengths (it requires very few samples to build arbitrarily low POE estimates) and weaknesses (there is no guarantee, as outlined above, that extrapolation outside the range of data, i.e., from short to long times, is valid). In this sense everything we can say about using adaptive importance sampling to reduce the variance of traditional extrapolations applies to IFORM estimates as well. The more data we have in the relevant bins, the more accurate our statistical 25 fits and resulting extrapolations will be. These notions help to explain why IFORM or EC applied to wind turbine extreme loads estimation may not be much different than other extrapolation methods. On the one hand, as noted above, though systematic and efficient, IFORM relies on extrapolation. Just like the standard extrapolation method, it relies on being able to extrapolate from easily observable quantiles (5th,25th,50th,75th,95th,...) to the very difficult to observe quantiles corresponding to the 50 year return period. On the other hand, EC is not applicable when the main driver of variation in a system is the response variability, which (see Figure 1) is 5 largely the case for wind turbines; the main driver of extreme loads is not a particular wind speed but some idiosyncratic chaotic process that even for a prosaic wind speed just might come together to cause an extreme load.
The present task, that of estimating extreme loads with wind speed as the only environmental variable, is governed mostly by the response variation. Therefore we will not estimate extreme loads by IFORM in this paper. However, IFORM is critical for conceptual understanding: where possible, our goal should be to "convert" response variation (intractable) to environmental 10 variation (tractable) through better understanding of its physical cause. It is the extreme response variability (different random seeds for the same environmental conditions can cause very different FAST output, because they cause very different turbulent inflow) that makes extreme loads estimation a difficult problem. We return to this subject in section 5.

15
When we perform the bin-wise simulations used in the extrapolation methods, we are performing stratified sampling. Recognizing that these samples can be described as a probability distribution provides a bridge to using them in an IS context, as discussed in (Graf et al., 2017). The basic idea is as follows. Assume a set of samples {x i } from an arbitrary distribution g(x) of wind speeds (g may be either f (x), q(x), or an empirical one derived from binning the data). By running FAST, the set {Y i } of corresponding loads can be generated and sorted from the lowest to the highest. Then for any given load Y j , the IS estimate 20 is: To apply this formula to data from the binning method, we need the appropriate g(x), i.e. the appropriate weights w i of each sample. Recall, we write the dataset as {Y i,k } where i indexes over wind speed bins and k indexes over peaks extracted at that wind speed. We rewrite the integral over wind speeds as a sum of integrals, and then approximate each separate integral by the 5 M i peaks derived from the N i runs (in our case, over TurbSim random seeds) at the fixed wind speed x i for bin i: From this, we see that the "weight" contributed by sample i, k to the probability of non-exceedance To calculate the empirical CDF from the bin data, then, we assign weight w i = 1 Mi f (x i )∆x i to all 10 samples from bin i and apply Equation (15). Table 1 summarizes the exact correspondence between extrapolation and IS/MC.
The importance density corresponding to stratified sampling is seen to be g(x i ) = Mi Mtot 1 ∆xi = Ni Ntot 1 ∆xi . Table 1. Bin-wise empirical cumulative distribution functions provide a bridge from extrapolation, which builds POE from bin-wise fitted distributions Fi, and importance sampling, which builds an unbiased estimate from the appropriately defined distribution g.
Fi(l)f (xi)∆xi P (Y < l|xi) ∼ Fi(l), fitted to above Importance Sampling Thus we have a "bridge" between fitting and sampling. Bin-wise empirical CDFs can be directly compared with fitted distributions (in fact, they are what we fit to). But the estimate over all wind speeds can then be expressed generically in the IS language suited to comparison with MC and IS estimates. IS/MC do not provide any "bin-wise" information (there are no 15 bins), so it is otherwise impossible to "debug" their divergence from extrapolation. This formulation allows us to see, first, that error accrues from lack of convergence of empirical CDFs, for both methods. Additionally though, for extrapolation, the error is compounded by lack of fit between the chosen extreme value distribution and the empirical CDFs, which is the price we pay for being able to extrapolate to arbitrarily low POEs with small numbers of samples.

ASIS as stochastic optimization
The discussion above indicates that the samples from the bin based methods can alternatively be used to make empirical estimates via their implied importance distributions. This orientation suggests, also, that there is no barrier to changing the distribution of samples as we go. So we can think of the estimation procedure as an optimization problem: find the distribution of bins (number of samples per bin) that results in unbiased estimates with minimal variance. In (Graf et al., 2017) we have 5 used a heuristic algorithm that looked for "gaps" in the empirical peak distribution. Here instead we introduce a gradient-based approach. As above, let N i be the number of FAST runs performed in the ith wind speed bin, and let N be the vector of bin-counts. Now, the variance of the estimate using importance distribution g is Note the second term does not depend on g (in the corresponding integral, the g in the denominator cancels out). So Here we have used the fact that Mi Mtot = Ni Ntot to write the expression in terms of bin counts instead of peak counts. Our algorithm begins by running the standard 6 seeds per bin from the extrapolation method (i.e., N i is initialzed to 6 for all  Note the algorithm as stated does not explicitly recognize the stochasticity of the underlying quantity Y (x). Because we are using an unbiased estimate of the gradient of the variance, our approach, naive as it is, is known to converge "almost surely" to a local optimum (Robbins and Monro, 1951). In fact it is a form of Stochastic Gradient Descent (Goodfellow et al., 2016).

25
Casting the problem in this form allows for taking advantage of ongoing research in this area. There are two reasons for step 2 (b) of the algorithm. First, it is not clear a priori that our optimization problem is convex. Thus there could be multiple local minima. Steps 2 (a) and 2 (b) correspond to the tradeoff between "exploitation" and "exploration" common to all global optimization algorithms. Second, because the gradient is calculated from an unconverged statistical estimate, there is an error associated with this vector. Preventing unconverged estimates from steering us in the wrong direction is an additional reason to include the "explore" step 2 (b).
A slight complication comes from the need for the N i to be integers (we can only have an integer number of runs per wind 5 speed bin). Currently we simply round N i to the nearest integer. Thus we are following the gradient as closely as possible subject to the integral nature of N i . We claim that this is a reasonable procedure, because the variance of the estimates is observed to be a rather slowly varying function of the bin distribution, so taking the nearest feasible (i.e. integral N i ) point to the point suggested by our gradient will incur a bounded and likely small error. Also, the gradient itself is an estimate, not an exact value, so we are already working with inexact quantities from the standpoint of traditional gradient-based optimization.

10
A more refined approach we could explore in the future is to apportion simulation time spent in each bin according to N i . This would amount to a "relaxation" (from the discrete space to the continuous space (Parker and Rardin, 1988)) of the problem and would allow for exactly following the (albeit still stochastic) gradient.
Next, as stated J only concerns one load type (e.g. tower base side-side bending moment). But previously (Graf et al., 2017) and above (i.e. Fig.1) we have seen that side-side and fore-aft loads favor different importance distributions. Since it defeats the 15 purpose of the method to have to repeat it for every load, we have adopted an "umbrella" concept; we compute the desired bin distribution for all the loads of interest, and form the minimal superset of bins that includes them all. Also, note that the gradient (e.g., see Equation (21)) is always negative; increasing any bin count will reduce the variance, which makes obvious sense.
The purpose of the algorithm is to guide the distribution of bin counts to have optimal proportions from each bin. Our current implementation works in a cumulative fashion. At every iterations, we are always adding more simulations, never removing 20 them.
Another issue, even for a single load type, is how many of its peaks Y i are considered in computing the gradient of the variance. We remind the reader that ASIS per se does not require any extrapolation. This question of how many peaks to use to compute ∇ N J(N ) is different from the question of how many peaks M pks to use for extrapolation. It is another algorithmic parameter that one could tune. Using all the peaks would reduce the variance of the POE estimates of all the peaks, even the 25 small peaks we don't care about. Using just the single largest peak would overemphasize the bin that this single peak happened to come from. The choice of the 5 largest peaks is rather arbitrary, enough so that more than the single largest peak contributes, but not so many as to deemphasize the goal of finding large peaks.
Finally, because we are still refining the mechanics of the algorithm, ASIS as stated does not include a stopping criteria.
Since what ASIS minimizes is the variance of our load estimates, stopping should be based on driving the variance below a 30 user-defined threshold. The difficulty, of course, is that unlike a deterministic gradient descent procedure, our only access to the actual variance is through further statistical estimates. In the results below we simply repeat the stochastic optimization procedure 100 times and compute the variance of the estimated loads directly. A less computationally expensive approach is bootstratpping (Fogle et al., 2008;Sultania and Manuel, 2017), which we would recommend for a production implementation of ASIS. In bootstrapping, the variance of our load estimates are estimated as follows: At every iteration, there are M i peaks 35 extracted from the N i runs at wind speed x i for bin i. Normally we use these directly in Equation (15) to form a single POE estimate. Instead, in bootstrapping, we resample the M i peaks with replacement some number, say L, times to form L resampled sets of peaks, all of length M i , but all containing different subsets of the original M i peaks. These are used to compute L independent POE estimates, from which an empirical variance can be computed and used for stopping criteria.

5
In this section we demonstrate the basic mechanics of the algorithm in the context of a study of the effect of the number of peaks, M pks , used to fit the bin-wise extrapolation distributions (F i ∼ P (Y < l|x i ), above). First, Figure 2 illustrates the variability of the extrapolated exceedance probability as a function of M pks . We have run FAST for 10 minutes 20 separate times for each bin (the figure shows only the results for the 20 m/s bin, but the others are similar). This results in 200 minutes of total simulation time, thus according to our 1-peak-per-minute convention (which is fixed throughout the paper), 2000 peaks.

10
Each line on the figure (e.g. "5 peaks", "10 peaks", etc.) is the result of fitting the 3 parameter Weibull CDF to just the largest M pks (e.g. M pks = 5, 10, etc) of these 2000. The line labelled "empiricial CDF" is the empirical CDF of the 2000 peaks. It is important that each peak always represents the same amount of simulation time. Otherwise we would have to re-weight the contribution of each peak to the POE in Equation (15).
As an exercise, we examine the sensitivity of the ASIS results to M pks . For this, we study the variance of the resulting extrapolation, which we can estimate simply by repeating the entire sampling, simulating, fitting, and extrapolation procedure 100 times. (Note, to estimate the variance in a production environment where compute time was of paramount importance we would recommend the more efficient if less straightforward bootstrapping procedure described above). The results are summarized in Figures 3, 4, and 5.
For each of the 100 independent tests, we ran extrapolation and ASIS for 25 iterations (an iteration of extrapolation is 20 simply re-performing the extrapolation procedure with the current ASIS bin data), which adds a varying number of new samples to each bin at each iteration. The most obvious observation is that indeed the variance of the estimates is decreasing quickly as a function of iteration. ASIS reliably drives the variance of the estimates of POE down, and simply recalculating the extrapolations to "keep up" with ASIS drives the variance of the extrapolation estimates down as well. There is a slight dependence on M pks , but it does appear there is a "sweet spot" around 40 peaks that is good for both loads (further study of 25 the optimal M pks is beyond the scope of this paper; it is akin to the study of the optimal threshold in the peak-over-threshold method). The standard deviation drops by roughly a factor of 3 after only about 100 FAST runs, (compared to 30 for the original extrapolation (iteration 0)). This is closer to a " 1 N " rate of convergence than the theoretical " 1 √ N " convergence of Monte Carlo integration.
Thus our "adaptive extrapolation" approach appears capable of reducing variance somewhat dramatically with minimal 30 additional computation. The two approaches maintain correspondence even while adapting the bin distribution, which allows for leveraging the variance reduction of the empirical ASIS estimate to reduce the variance of the extrapolation estimate. And the latter is the estimate of real importance, because that is what will be used in practice. Note that ASIS could as well drive Figure 2. Weibull 3 parameter fits to empirical CDF for wind speed bin centered at 20 m/s for a variety of choices for how many peaks M pks we use for the fit for side-side (left) and fore-aft (right) tower base loads. In all cases we are fitting the analytical CDF to the highest M pks loads of the empirical CDF (a lower values of M pks is like a higher threshold in the peak-over-threshold method). Especially for the side-side moment, the extrapolated POE values depend heavily on how many peaks are used.
IFORM estimates instead of the traditional extrapolation estimates. In both cases ASIS optimizes the distribution of samples (wind speed bins in this case) that are then used to fit statistical distributions, which are then used to extrapolate to desired return periods.

Conclusions
In this paper we have built a bridge between bin-based extrapolative methods and sample-based importance sampling Monte 5 Carlo methods. With this, we proposed an adaptive stratified importance sampling (ASIS) algorithm that is both more efficient than existing Monte Carlo approaches and maintains contact with the extrapolation methods and thereby allows for iteratively increasing the extrapolation accuracy. This is important, because only the extrapolations are able to routinely make estimates of extremely long return period load exceedance probabilities.
The search for the optimal importance distribution is a stochastic optimization problem. As stated above, our algorithm is a 10 convergent algorithm. But stochastic optimization is an active area of research, and more sophisticated algorithms may exist to improve our approach. We need to keep in mind, however, that the optimization problem is a means to an end. The real goal is minimal variance estimates with the smallest amount of effort. We want to use the optimal importance distribution at the The top row is side-side, the bottom is fore-aft. The y-axis units are the ratio of the standard deviation to the mean estimate (relative standard deviation, measuring convergence). The x-axis is the number of FAST runs (measuring computational expense). The target POE for the empirical ASIS estimate (left) is 5 x 10 −2 and for the extrapolation estimate (middle and right) is 10 −5 . Adaptively selecting samples in a way designed to accelerate the convergence of the empirical estimate (ASIS, left) also accelerates the convergence of the extrapolation estimates (middle, right). There is a somewhat weak dependence on the number of peaks used for extrapolation, but M pks ∼ 40 appears robust for both loads. The side-side load estimate has larger initial relative variance, because (as shown above in Fig. 1) its extremes occur at high winds, but its relative variance is reduced more quickly by the adaptive procedure than the fore-aft load.  10,000s of kN-m, the y-axis is probability of exceedance in 10 minutes. The format is the same as Fig. 4. Again it is interesting to compare the visual representation with the statistics presented in Fig. 3. The single extremely large load discovered by ASIS was also seen in (Graf et al., 2017) (Fig 5, panel (f)). Though it is beyond the scope of the present paper to do so, one of our main conclusions is that the statistical methods have come to a point where the best course forward will be to pursue the exact causes of such loads and integrate a statistical description of such situations into our methods. same time as we are discovering it. Relatedly, we need to also keep in mind that we have the dual mission of both efficiently estimating the load POEs and accurately estimating their variance. We can use the peaks we sample to make unbiased estimates of variance just as we do expectation, but these are only estimates, and they themselves suffer from lack of convergence. The resampling method of bootstrapping described above offers a way to leverage a single data set to estimate statistics and their variance, and in a practical setting this would be recommended (as opposed to the completely separate runs we have described 5 above).
In principle there is no barrier to application of ASIS to higher dimensional problems. In particular, it is well known that turbulence intensity and turbulence standard deviation have a large role in wind turbine extreme loads (Bos et al., 2015), where a bimodal importance distribution is warranted. It would be interested to see if the variance minimization of ASIS would discover this distribution. This would open the door to trusting an automated procedure to derive the distribution.

10
This problem may be ripe for a machine learning approach: The physics is in the solver. To the extent it is possible, we should be able to learn from increasing amounts of data. For this, we need accurate variance estimation methods that can build a "loss function" for learning algorithms that examine data and decide how to process it to make the best next estimate, and to choose the best next places to sample; here we have presented a framework for extrapolating from such data that allows for learning the best extrapolation strategy from the variance minimization algorithm.

15
On the other hand, we should realize there is a physical source of extreme response variation, which is the combination of turbulent inflow and nonlinear turbine response. By "opening up the black box", i.e., circling back to the original physics, we hope to transfer what in the present setup is response variability into the realm of environmental variability, at which point we can use its probability distribution to hone in on just the loads of interest (i.e., the extreme loads) more quickly. Further studies into the root causes of extreme response variation of wind turbine loads and their ultimate incorporation into more efficient 20 statistical extreme loads estimation are ongoing.