1 Introduction
Let a sample of random variables (rvs) with cumulative distribution function (cdf) F(x) be given. We consider the nonparametric estimation of the extremal index (EI) of stochastic processes. Well-known nonparametric methods such as the blocks and runs estimators of the EI require the selection of two parameters, one of which is an appropriate threshold u [3]. Modifications of the blocks estimator [6, 20] and sliding blocks estimators [17, 19] require only the block size, without u. The intervals estimator depends only on u [9]. Less attention has been devoted to the estimation of the parameters required by these estimators.




















The EI takes values $\theta \in [0,1]$.
































The paper is organized as follows. In Sect. 2, related work is recalled. In Sect. 3, a limit distribution of the normalized statistic is obtained, and an algorithm of the discrepancy method based on the M-S statistic is given. A simulation study is presented in Sect. 4. Conclusions are given in Sect. 5.
2 Related Work
Our results are based on Lemmas 3.4.1 and 2.2.3 of [7] concerning limit distributions of order statistics. They are recalled here.







































To decluster the sample into approximately independent inter-cluster times, one can take a suitable number of the largest inter-exceedance times [9]. A larger u corresponds to larger inter-cluster times, whose number L(u) may be small. This leads to a larger variance of the estimates based on these inter-cluster times.
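As an illustration of the quantities above, the following Python sketch extracts the inter-exceedance times for a threshold u and evaluates an intervals-type estimate of the EI. It assumes the standard form of the intervals estimator from [9]. Since formula (5) is not reproduced in this section, the exact expression and the function names `inter_exceedance_times` and `intervals_estimator` are assumptions of this sketch, not the author's implementation.

```python
def inter_exceedance_times(xs, u):
    """Gaps between consecutive time indices at which xs exceeds u."""
    idx = [i for i, x in enumerate(xs) if x > u]
    return [b - a for a, b in zip(idx, idx[1:])]


def intervals_estimator(times):
    """Intervals-type estimate of the EI from inter-exceedance times.

    Assumed form (hypothetical stand-in for formula (5)):
    two moment ratios, chosen by whether the largest gap exceeds 2,
    each truncated at 1.
    """
    m = len(times)  # m = N - 1 for N exceedances
    if m == 0:
        raise ValueError("need at least two exceedances")
    if max(times) <= 2:
        num = 2.0 * sum(times) ** 2
        den = m * sum(t * t for t in times)
    else:
        num = 2.0 * sum(t - 1 for t in times) ** 2
        den = m * sum((t - 1) * (t - 2) for t in times)
    return min(1.0, num / den)
```

A higher threshold leaves fewer exceedances and hence fewer gaps, which is the variance effect described above.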
3 Main Results
3.1 Theory































$$k^* = [k/2]$$











































3.2 Discrepancy Equation Based on the Chi-Squared Statistic





$$k^* = [(k-2)/2]$$










Fig. 1 Empirical cdf of the left-hand side of (11), built from 300 re-samples of a Moving Maxima (MM) process (solid line), and the chi-squared cdf (points) (a); the left-hand side statistic in (11) against the threshold u, with the chi-squared mode as the discrepancy (b). The same statistics for an ARMAX process (c, d)
The discrepancy methods (3) and (11) are universal and can be used for any nonparametric estimator of the EI.
Algorithm 3.1
1. Using the chosen estimator of the EI and taking thresholds u corresponding to a set of quantile levels, generate the samples of inter-cluster times and of the normalized rvs, where N is the number of exceedances over the threshold u.
2. For each u, select the number k of the largest order statistics used in (11).
3. Use the sorted sample and find all solutions u of the discrepancy equation (11); their number l is random.
4. For each solution, calculate the corresponding estimate of the EI and find the resulting estimates, where the minimal and maximal values among the solutions are used.
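Steps 3 and 4 can be sketched in Python as follows, under stated assumptions. The left-hand side statistic of (11) is taken as already evaluated on a grid of thresholds (`stats`), the discrepancy value `delta` is supplied by the caller, and solutions are located by linear interpolation wherever the statistic crosses `delta` between neighboring grid points. Combining the estimates by their average and by their values at the smallest and largest solution is one plausible reading of step 4. All function and key names are hypothetical.

```python
def discrepancy_solutions(us, stats, delta):
    """Thresholds at which the gridded statistic crosses delta."""
    sols = []
    for i in range(len(us) - 1):
        d0 = stats[i] - delta
        d1 = stats[i + 1] - delta
        if d0 == 0.0:
            sols.append(us[i])  # exact hit at a grid point
        elif d0 * d1 < 0.0:
            # sign change: interpolate linearly between grid points
            sols.append(us[i] + (us[i + 1] - us[i]) * d0 / (d0 - d1))
    if us and stats[-1] == delta:
        sols.append(us[-1])
    return sols


def combine_estimates(sols, estimator):
    """Summaries of an EI estimator over all solutions (assumed step 4)."""
    if not sols:
        return None  # no solution of the discrepancy equation on this grid
    vals = [estimator(u) for u in sols]
    return {
        "mean": sum(vals) / len(vals),
        "at_min_u": estimator(min(sols)),
        "at_max_u": estimator(max(sols)),
    }
```

Here `estimator` is any callable that maps a threshold to an EI estimate, which reflects the remark that the discrepancy method does not depend on a particular estimator.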

Fig. 2 Left-hand side statistic in (11) against the threshold u, with the chi-squared mode as the discrepancy, for an ARMAX process
3.3 Estimation of k
It remains to select k. For declustering purposes, i.e., to have approximately independent clusters of exceedances over u, it is recommended in [9] to take the largest value k such that the kth largest inter-exceedance time is strictly larger than the next largest one.
We propose another approach. For each predetermined threshold u and the corresponding L(u), one may decrease the value of k until the discrepancy equations have solutions and select the largest such k. Figure 2 shows that a solution of (11) exists for one value of k and does not exist for another. Since several solutions u are possible, the average may be taken over all estimates with such u's.
4 Simulation Study
Our simulation study of the behavior of Algorithm 3.1 is based on 1000 replications of samples of a fixed size generated from a set of models. These models are Moving Maxima (MM), Autoregressive Maximum (ARMAX), AR(1), AR(2), MA(2), and GARCH. The AR(1) process is considered with uniform noise (ARu) and with Cauchy distributed noise (ARc). Using Algorithm 3.1, we check the accuracy of the intervals estimator (5), where u is selected based on (11). The root mean squared error (RMSE) and the absolute bias are given in Tables 1 and 2. The best results are shown in bold numbers.
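The two accuracy measures can be computed from the replicated estimates as in the short Python sketch below. It assumes the usual definitions (RMSE as the square root of the mean squared deviation from the true EI, absolute bias as the absolute deviation of the mean estimate). The function name `rmse_and_abs_bias` is hypothetical.

```python
import math


def rmse_and_abs_bias(estimates, theta):
    """RMSE and absolute bias of replicated EI estimates around theta."""
    n = len(estimates)
    mean = sum(estimates) / n
    rmse = math.sqrt(sum((e - theta) ** 2 for e in estimates) / n)
    return rmse, abs(mean - theta)
```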
4.1 Models
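Let us briefly recall the processes under study. The mth order MM process is $X_t = \max_{i=0,\dots,m}\{\alpha_i Z_{t-i}\}$, $t \in \mathbb{Z}$, where $\{\alpha_i\}$ are constants with $\alpha_i \ge 0$, $\sum_{i=0}^{m}\alpha_i = 1$, and $Z_t$ are iid standard Fréchet distributed rvs with the cdf $F(x) = \exp(-1/x)$, for $x > 0$. Its EI is equal to $\theta = \max_i\{\alpha_i\}$ [1]. Coefficients corresponding to $\theta = 0.5$ and $\theta = 0.8$ are taken.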
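A minimal Python sketch of an MM simulator is given below. It assumes the standard definition of the process, $X_t = \max_i \alpha_i Z_{t-i}$ with iid standard Fréchet noise, and draws the noise by inversion of $F(x) = \exp(-1/x)$. The coefficient vector in the usage note is only an illustration and is not taken from the experiment. All function names are hypothetical.

```python
import math
import random


def frechet(rng):
    """Standard Frechet draw by inversion of F(x) = exp(-1/x)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -1.0 / math.log(u)


def simulate_mm(n, alphas, rng):
    """n values of X_t = max_i alphas[i] * Z_{t-i}, i = 0..m."""
    m = len(alphas) - 1
    z = [frechet(rng) for _ in range(n + m)]  # z[t + m - i] plays Z_{t-i}
    return [max(a * z[t + m - i] for i, a in enumerate(alphas))
            for t in range(n)]


def mm_extremal_index(alphas):
    """EI of the MM process under the assumed definition."""
    return max(alphas)
```

For example, `simulate_mm(1000, [0.5, 0.3, 0.15, 0.05], random.Random(0))` produces a path whose EI is 0.5 under this definition.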
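The ARMAX process is determined as $X_t = \max\{\alpha X_{t-1}, (1-\alpha) Z_t\}$, $t \in \mathbb{Z}$, where $0 \le \alpha < 1$, $\{Z_t\}$ are iid standard Fréchet distributed rvs and $P\{X_t \le x\} = \exp(-1/x)$ holds assuming that $X_0$ is standard Fréchet distributed. Its EI is given by $\theta = 1 - \alpha$ [3]. We consider $\theta \in \{0.25, 0.75\}$.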
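The ARMAX recursion can be simulated with the Python sketch below. It assumes the standard form $X_t = \max\{\alpha X_{t-1}, (1-\alpha) Z_t\}$ with iid standard Fréchet $Z_t$ and a standard Fréchet starting value, so that the marginal distribution stays standard Fréchet and $\theta = 1 - \alpha$. The function names are hypothetical.

```python
import math
import random


def frechet(rng):
    """Standard Frechet draw by inversion of F(x) = exp(-1/x)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -1.0 / math.log(u)


def simulate_armax(n, alpha, rng):
    """n values of X_t = max(alpha * X_{t-1}, (1 - alpha) * Z_t)."""
    x = frechet(rng)  # assumed starting value: standard Frechet
    out = []
    for _ in range(n):
        x = max(alpha * x, (1.0 - alpha) * frechet(rng))
        out.append(x)
    return out
```

With `alpha = 0.25` the EI under this definition is 0.75, and with `alpha = 0.75` it is 0.25.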
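The ARu process is defined by $X_j = (1/r) X_{j-1} + \epsilon_j$, $j \ge 1$, and $X_0 \sim U(0,1)$ with $X_0$ independent of $\epsilon_j$. For a fixed integer $r \ge 2$, let $\epsilon_n$, $n \ge 1$, be iid rvs with $P\{\epsilon_1 = k/r\} = 1/r$, $k \in \{0, 1, \dots, r-1\}$. The EI of AR(1) is $\theta = 1 - 1/r$ [5]. Processes with $\theta \in \{0.5, 0.8\}$ are taken.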
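The AR(1) process with uniform noise can be sketched in Python as follows. The sketch assumes the classical construction $X_j = X_{j-1}/r + \epsilon_j$ with $\epsilon_j$ uniform on $\{0, 1/r, \dots, (r-1)/r\}$ and $X_0 \sim U(0,1)$, for which the EI is $1 - 1/r$. The function name `simulate_aru` is hypothetical.

```python
import random


def simulate_aru(n, r, rng):
    """n values of X_j = X_{j-1} / r + eps_j with discrete uniform eps_j."""
    x = rng.random()  # X_0 ~ U(0, 1)
    out = []
    for _ in range(n):
        x = x / r + rng.randrange(r) / r  # eps_j in {0, 1/r, ..., (r-1)/r}
        out.append(x)
    return out
```

Under this construction `r = 2` corresponds to an EI of 0.5 and `r = 5` to an EI of 0.8. The values stay in the unit interval, so the marginal distribution is light-tailed.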
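The MA(2) process is determined by $X_t = p Z_{t-2} + q Z_{t-1} + Z_t$, $t \ge 1$, with $p > 0$, $q < 1$, and iid Pareto rvs $Z_t$ with $P\{Z_t > x\} = 1$ if $x < 1$, and $P\{Z_t > x\} = x^{-\alpha}$ if $x \ge 1$ for some $\alpha > 0$ [20]. Its EI is $\theta = (1 + p^{\alpha} + q^{\alpha})^{-1}$. We consider parameter settings with corresponding $\theta \in \{1/2, 2/3\}$.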
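A Python sketch of the MA(2) process with Pareto noise follows. It assumes the form $X_t = p Z_{t-2} + q Z_{t-1} + Z_t$ with $P\{Z > x\} = x^{-\alpha}$ for $x \ge 1$, and the EI formula $(1 + p^{\alpha} + q^{\alpha})^{-1}$, which is taken here to hold for coefficients between 0 and 1. The function names and the parameter values in the usage note are illustrative assumptions.

```python
import random


def pareto(alpha, rng):
    """Pareto draw with P(Z > x) = x**(-alpha) for x >= 1, by inversion."""
    u = 1.0 - rng.random()  # u lies in (0, 1]
    return u ** (-1.0 / alpha)


def simulate_ma2(n, p, q, alpha, rng):
    """n values of X_t = p * Z_{t-2} + q * Z_{t-1} + Z_t."""
    z = [pareto(alpha, rng) for _ in range(n + 2)]
    return [p * z[t] + q * z[t + 1] + z[t + 2] for t in range(n)]


def ma2_extremal_index(p, q, alpha):
    """EI under the assumed formula (1 + p**alpha + q**alpha)**(-1)."""
    return 1.0 / (1.0 + p ** alpha + q ** alpha)
```

As an arithmetic check of the formula, `ma2_extremal_index(2 ** -0.5, 2 ** -0.5, 2)` is approximately 1/2 and `ma2_extremal_index(3 ** -0.5, 6 ** -0.5, 2)` is approximately 2/3.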













Table 1 The root mean squared error multiplied by $10^4$
MM | MM | ARMAX | ARMAX | ARu | ARu | MA(2) | MA(2) | ARc | AR(2) | GARCH | |
---|---|---|---|---|---|---|---|---|---|---|---|
0.5 | 0.8 | 0.25 | 0.75 | 0.5 | 0.8 | 0.5 | 2/3 | 0.3 | 0.25 | 0.328 | |
146 | 188 | 136 | 173 | 2719 | 1217 | 227 | 545 | 66 | 426 | 237 | |
141 | 164 | 122 | 156 | 2163 | 946 | 295 | 813 | 104 | 498 | 288 | |
360 | 440 | 291 | 430 | 3330 | 1527 | 336 | 470 | 67 | 432 | 467 | |
105 | 155 | 100 | 148 | 2519 | 1127 | 268 | 696 | 50 | 498 | 231 | |
139 | 144 | 97 | 151 | 1906 | 854 | 434 | 1107 | 161 | 692 | 860 | |
355 | 451 | 296 | 463 | 3325 | 1527 | 355 | 449 | 67 | 439 | 484 | |
96 | 120 | 88 | 115 | 2224 | 975 | 331 | 846 | 151 | 620 | 416 | |
149 | 135 | 117 | 143 | 1704 | 768 | 503 | 1197 | 358 | 953 | 1127 | |
350 | 453 | 285 | 434 | 3331 | 1518 | 336 | 441 | 67 | 431 | 474 | |
217 | 569 | 69 | 498 | 1883 | 199 | 309 | 466 | 33 | 3630 | 4028 | |
630 | 550 | 3640 | 320 | 1000 | |||||||
550 | 450 | 950 | 320 | 840 | |||||||
550 | 140 | 1550 | |||||||||
550 | 3640 | 220 | 485 | ||||||||
400 | 1070 | 375 | 210 | ||||||||
420 | 853 | ||||||||||
660 | 950 | 1580 | |||||||||
320 | 6080 | 3520 |
Table 2 The absolute bias multiplied by $10^4$
MM | MM | ARMAX | ARMAX | ARu | ARu | MA(2) | MA(2) | ARc | AR(2) | GARCH | |
---|---|---|---|---|---|---|---|---|---|---|---|
0.5 | 0.8 | 0.25 | 0.75 | 0.5 | 0.8 | 0.5 | 2/3 | 0.3 | 0.25 | 0.328 | |
7.4016 | 6.2708 | 2.9461 | 3.6254 | 2709 | 1204 | 182 | 516 | 66 | 399 | 70 | |
41 | 50 | 22 | 49 | 2139 | 921 | 267 | 791 | 104 | 477 | 139 | |
38 | 40 | 36 | 62 | 3302 | 1474 | 72 | 246 | 67 | 314 | 174 | |
37 | 12 | 15 | 13 | 2513 | 1118 | 251 | 684 | 50 | 485 | 137 | |
104 | 59 | 52 | 65 | 1893 | 841 | 424 | 1017 | 161 | 679 | 809 | |
11 | 61 | 48 | 45 | 3296 | 1474 | 47 | 227 | 67 | 319 | 210 | |
56 | 35 | 48 | 43 | 2221 | 970 | 322 | 842 | 151 | 613 | 391 | |
126 | 81 | 97 | 97 | 1701 | 760 | 497 | 1194 | 358 | 949 | 1119 | |
43 | 59 | 31 | 43 | 3303 | 1464 | 51 | 246 | 67 | 304 | 186 | |
0.14862 | 567 | 54 | 496 | 1878 | 196 | 306 | 462 | 33 | 3627 | 4027 | |
160 | 450 | 3530 | 80 | 690 | |||||||
100 | 180 | 340 | 80 | 630 | |||||||
50 | 160 | 630 | |||||||||
450 | 4170 | 87.5 | 270 | ||||||||
0 | 1070 | 40 | 25 | ||||||||
179 | 320 | ||||||||||
130 | 200 | 230 | |||||||||
20 | 6000 | 3230 |
4.2 Estimators and Their Comparison
In Tables 1 and 2, apart from our estimates, we include the available results of the simulation studies in [8, 17, 20, 21]. The following estimators are compared: the disjoint blocks and the sliding blocks estimators [17, 18]; the runs estimator [24]; the multilevel estimator [20]; the bias-corrected multilevel and the multilevel sliding blocks estimators [20, 21]; and the cycles and the max-stable cycles estimators [8]. We can compare only results for processes that overlap with our experiment. We also calculate the K-gaps estimates of [22] with IMT-selected pairs (u, K) (for details regarding the IMT test, see [10]).
We may conclude the following. The intervals estimator coupled with the discrepancy method demonstrates good performance in comparison with the other investigated estimators. By its definition, it is not appropriate for light-tailed distributed processes, as one can see from the example of the ARu process. The K-gaps estimator is indicated as one of the most promising methods in [8, 17]. Our estimators, especially for smaller s (which reflect a smaller number k of the largest order statistics), may perform better.
Comparing Tables 1 and 2 with Fig. 1 of [21], where the multilevel and the bias-corrected multilevel estimators were compared on data simulated from an ARMAX model only, one can see that the latter estimates show much larger errors. In particular, our estimate gives the best RMSE equal to 0.0088 and 0.0115, whereas the best among those estimates show about 0.04 and a bit less than 0.15 for $\theta = 0.25$ and $\theta = 0.75$, respectively.
In [8] the cycles, the max-stable cycles, the runs, the K-gaps, the disjoint blocks, and the sliding blocks estimators were compared. For the first three estimators, the misspecification IMT test was applied to choose the threshold and run parameters. As an alternative, quantiles were used as thresholds for these estimators, with the run parameter estimated by the latter test. We can compare only results related to the MM process with $\theta = 0.5$, the ARMAX process with $\theta = 0.75$, and the AR(1) process. For an MM process, the best bias equal to 0.002 and the best RMSE equal to 0.032 were achieved by the max-stable cycles estimator. For our estimator, the best absolute bias is 0.00074 and the RMSE is 0.0096 for an MM process. For an ARMAX process, the best were the cycles estimator with the bias equal to 0.003 and the max-stable cycles estimator with the RMSE equal to 0.032. Our estimator provides the best absolute bias 0.00036 and the RMSE 0.0115.
The MA(2) process has been studied in [20] for the multilevel and the bias-corrected multilevel blocks estimators with two specific weight functions. For MA(2) with $\theta = 1/2$, [20] reports ranges for the best absolute bias and the MSE; the corresponding values for our estimator follow from Tables 1 and 2. For MA(2) with $\theta = 2/3$, [20] gives a bias of about 0.0025 together with the MSE, whereas our estimator shows a bias of 0.0227.
5 Conclusions
The discrepancy method proposed for smoothing of pdf estimates is modified to select the threshold parameter u for the EI estimation. We derive the asymptotic distribution of the statistic related to the M-S statistic. This allows us to use its mode as the unknown discrepancy value. Since the discrepancy method may be applied to different estimators of the EI, other parameters, such as the block size for the blocks estimator of the EI, can be found in the same way instead of the threshold. The accuracy of the intervals estimator (5) with u selected by the new discrepancy method (11) is assessed by a simulation study. The comparison with several EI estimators shows its good performance for heavy-tailed distributed processes.
The author appreciates the partial financial support by the Russian Foundation for Basic Research, grant 19-01-00090.