© Springer Nature Switzerland AG 2020
M. La Rocca et al. (eds.), Nonparametric Statistics, Springer Proceedings in Mathematics & Statistics 339. https://doi.org/10.1007/978-3-030-57306-5_31

The Discrepancy Method for Extremal Index Estimation

Natalia Markovich
V.A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, Moscow, 117997, Russia

Abstract

We consider the nonparametric estimation of the extremal index of stochastic processes. The discrepancy method, which was proposed by the author as a data-driven smoothing tool for probability density function estimation, is extended to find a threshold parameter u for an extremal index estimator in the case of heavy-tailed distributions. To this end, the discrepancy statistics are based on the von Mises–Smirnov statistic and the k largest order statistics instead of an entire sample. The asymptotic chi-squared distribution of the discrepancy measure is derived. Its quantiles may be used as discrepancy values. An algorithm to select u for an estimator of the extremal index is proposed. The accuracy of the discrepancy method is checked by a simulation study.

Keywords
Nonparametric estimation · Discrepancy method · von Mises–Smirnov statistic · Extremal index · Heavy-tailed distribution

1 Introduction

Let $$X^n=\{X_1,\ldots ,X_n\}$$ be a sample of random variables (rvs) with cumulative distribution function (cdf) F(x). We consider the nonparametric estimation of the extremal index (EI) of stochastic processes. Nonparametric methods such as the well-known blocks and runs estimators of the EI require the selection of two parameters, one of which is an appropriate threshold u [3]. Modifications of the blocks estimator [6, 20] and sliding blocks estimators [17, 19] require only the block size and no threshold u. The intervals estimator depends only on u [9]. Comparatively little attention has been devoted to the estimation of the parameters required by these estimators.

The discrepancy method was proposed and studied in [13, 23] as a data-driven smoothing tool for light-tailed probability density function (pdf) estimation by independent identically distributed (iid) data. The idea is to find a smoothing parameter (i.e., a bandwidth) h as a solution of the discrepancy equation:
$$\begin{aligned}\rho (\widehat{F}_h,F_n)=\delta .\end{aligned}$$
Here, $$\widehat{F}_h(x)=\int _{-\infty }^x\widehat{f}_h(t)dt$$, $$\hat{f}_h(t)$$ is some pdf estimate, and $$\delta $$ is a known discrepancy value of the estimation of F(x) by the empirical distribution function $$F_n(t)$$, i.e., $$\delta =\rho (F,F_n)$$, where $$\rho (\cdot ,\cdot )$$ is a metric in the space of cdfs. Since $$\delta $$ is usually unknown, some quantiles of limit distributions of von Mises–Smirnov (M-S)
$$\begin{aligned} \omega ^2_n=n\int _{-\infty }^{\infty }\left( F_n(x)-F(x)\right) ^2dF(x), \end{aligned}$$
and Kolmogorov–Smirnov statistics were proposed as $$\delta $$. The distributions of these statistics are invariant with respect to F(x) [4]. In practice, the bandwidth h may be found as a solution of the equation [13]
$$\begin{aligned} \hat{\omega }^2_n(h)=0.05,\end{aligned}$$
(1)
where
$$\hat{\omega }^2_n(h) = \sum _{i=1}^n\left( \widehat{F}_h(X_{i,n})-\frac{i-0.5}{n}\right) ^2+\frac{1}{12n}$$
is based on the order statistics $$X_{1,n}\le \cdots \le X_{n,n}$$ corresponding to the sample $$X^n$$. The value 0.05, corresponding to the maximum (mode) of the pdf of the statistic $$\omega ^2_n$$, was found from the tables of the statistic $$\omega ^2_n$$ [4] and taken as the discrepancy value $$\delta $$.
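As an illustration (not from the paper), the discrepancy equation (1) can be solved numerically for a Gaussian-kernel cdf estimate. The sample, the bisection bracket, and the function names below are illustrative assumptions:

```python
import numpy as np
from math import erf, sqrt

def ms_statistic(data, h):
    """hat omega_n^2(h) from (1): a Gaussian-kernel cdf estimate F_h
    compared with the empirical cdf at the order statistics."""
    xs = np.sort(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # F_h(x) = mean_j Phi((x - X_j)/h), the cdf of a Gaussian KDE
        f_h = np.mean([0.5 * (1 + erf((x - xj) / (h * sqrt(2)))) for xj in xs])
        total += (f_h - (i - 0.5) / n) ** 2
    return total + 1 / (12 * n)

def discrepancy_bandwidth(data, delta=0.05, lo=1e-3, hi=10.0, iters=60):
    """Bisection for a solution h of ms_statistic(data, h) = delta;
    assumes the statistic is below delta at h = lo and above it at h = hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ms_statistic(data, mid) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a light-tailed sample, the returned h satisfies the discrepancy equation (1) up to bisection tolerance; for very small h the statistic is close to $$1/(12n)$$, so the bracket above is usually valid.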
It is noted in [14, 16] that for heavy-tailed distributions, the statistic $$\omega ^2_n$$ may not reach the value 0.05; hence, the discrepancy equation (1) may have no solution, or its solutions may provide values of h that are too small and thus unsatisfactory for pdf estimation. In order to estimate heavy-tailed pdfs, a modification of the discrepancy method based on the k largest order statistics instead of the entire sample was considered in [16]. Namely, the statistic
$$\begin{aligned}&\hat{\omega }^2_n(h) = \sum _{i=n-k+1}^n\left( \widehat{F}_h(X_{i,n})-\frac{i-0.5}{n}\right) ^2+\frac{1}{12n}\end{aligned}$$
was proposed to be used in (1). A similar idea was explored in [15] to estimate the EI.
Definition 1
([12, p. 67]) The stationary sequence $$\{X_n\}_{n\ge 1}$$ is said to have EI $$\theta \in [0,1]$$ if for each $$0<\tau <\infty $$ there is a sequence of real numbers $$u_n=u_n(\tau )$$ such that it holds
$$\begin{aligned} \lim _{n\rightarrow \infty }n(1-F(u_n))=\tau , \qquad \lim _{n\rightarrow \infty }P\{M_n\le u_n\}=e^{-\tau \theta },\end{aligned}$$
(2)
where $$M_n=\max \{X_1,\ldots ,X_n\}$$.
The EI reflects the cluster structure of a sequence. If $$X_1,\ldots ,X_n$$ are independent, then $$\theta =1$$ holds. A cluster can be defined as a run of consecutive observations exceeding a threshold u between two consecutive non-exceedances, [9]. The inter-cluster times $$T_1(u)$$ for a given threshold u are then defined as the numbers of consecutive non-exceedances between two consecutive clusters, [9]. We have
$$\begin{aligned} T_1(u)=\min \{j\ge 1: M_{1,j}\le u, X_{j+1}>u|X_{1}>u\}, \end{aligned}$$
$$M_{1,j}=\max \{X_{2},\ldots ,X_{j}\}$$, $$M_{1,1}=-\infty $$. The observations of $$T_1(u_n)$$ normalized by the tail function, $$\{Y_i=\overline{F}(u_n)T_1(u_n)_i\}$$, $$i=1,\ldots ,L$$,  $$L=L(u_n)$$, $$L< n$$, are shown to be asymptotically exponentially distributed with weight $$\theta $$, i.e.,
$$\begin{aligned}P\{\overline{F}(u_n)T_1(u_n)>t\}\rightarrow & {} \theta \exp (-\theta t)\qquad \text{ for }~~t>0\end{aligned}$$
as $$n\rightarrow \infty $$ under a specific mixing condition and $$u_n$$ satisfying (2), [9].
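For concreteness, the inter-exceedance times $$T_1(u)$$ and their normalized versions $$Y_i$$ can be computed from a series as follows (a minimal sketch; the function names are ours):

```python
import numpy as np

def inter_exceedance_times(x, u):
    """Observed values of T_1(u): gaps between successive exceedances of u."""
    exceed_idx = np.flatnonzero(np.asarray(x) > u)
    return np.diff(exceed_idx)

def normalized_times(x, u):
    """Y_i = (N_u / n) * T_1(u)_i, normalizing by the empirical tail function."""
    x = np.asarray(x)
    n_u = np.count_nonzero(x > u)
    return (n_u / len(x)) * inter_exceedance_times(x, u)
```

For example, for the series (5, 0, 0, 5, 0, 5) and u = 1 the exceedances occur at positions 1, 4, 6, giving the times (3, 2) and, with $$N_u/n = 1/2$$, the normalized values (1.5, 1.0).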
The discrepancy equation may be based on the k, $$1\le k\le L-1$$, largest order statistics of a sample $$\{Y_i=(N_u/n)T_1(u)_i\}$$ as follows:
$$\begin{aligned} \hat{\omega }^2_L(u) = \sum _{i=L-k+1}^L\left( \widehat{G}(Y_{i,L})-\frac{i-0.5}{L}\right) ^2+\frac{1}{12L}=\delta .\end{aligned}$$
(3)
Here, $$N_u=\sum _{i=1}^n\mathbf{1}\{X_i>u\}$$ is the number of observations which exceed a predetermined high threshold u. $$\widehat{G}(Y_{i,L})$$ is obtained from $$G(t)=1-\theta \exp (-\theta t)$$ by replacing $$\theta $$ with some estimate $$\widehat{\theta }(u)$$ and t with the order statistic $$Y_{i,L}$$, [15]. An appropriate value of the threshold u can then be found as a solution of the discrepancy equation (3) with a predetermined value $$\delta $$ for any consistent nonparametric estimator of the EI. Computing (3) from the entire sample may leave the discrepancy equation without a solution in u, in the same way as for heavy-tailed pdf estimation, or may yield values of u that are too large for the estimation of $$\theta $$. The selection of k and $$\delta $$ remains a problem. We aim to obtain a limit distribution of the discrepancy statistic related to (3) depending on k and to use its quantiles as $$\delta $$.

The paper is organized as follows. Related work is recalled in Sect. 2. In Sect. 3, a limit distribution of the normalized statistic $$\hat{\omega }^2_L(u)$$ is obtained, and an algorithm for the discrepancy method based on the M-S statistic is given. The simulation study is presented in Sect. 4. Conclusions are given in Sect. 5.

2 Related Work

Our results rely on Lemmas 3.4.1 and 2.2.3 of [7] concerning limit distributions of order statistics. They are recalled here.

Lemma 1
([7, p. 89]) Let $$X, X_1, X_2,\ldots ,X_n$$ be iid rvs with common cdf F, and let $$X_{1,n} \le X_{2,n} \le \cdots \le X_{n,n}$$ be the n order statistics. The joint distribution of $$\{X_{i,n}\}^n_{i=n-k+1}$$ given $$X_{n-k,n} = t$$, for some $$k\in \{1,\ldots , n-1\}$$, equals the joint distribution of the set of the order statistics $$\{X^*_{i,k}\}^k_{i=1}$$ of iid rvs $$\{X^*_{i}\}^k_{i=1}$$ with cdf
$$\begin{aligned} F_t(x)= & {} P\{X \le x|X> t\} = (F(x) - F(t))/(1-F(t)), ~~~ x>t.\end{aligned}$$
(4)
Lemma 2
([7, p. 41]) Let $$U_{1,n}\le U_{2,n}\le \cdots \le U_{n,n}$$ be the n order statistics from a standard uniform distribution. Then, as $$n\rightarrow \infty $$, $$k\rightarrow \infty $$, $$n-k\rightarrow \infty $$,
$$\begin{aligned} (U_{k,n}-b_{n})/(a_{n}) \end{aligned}$$
is asymptotically standard normal with
$$\begin{aligned}&b_{n}=(k-1)/(n-1),\qquad a_{n}=\sqrt{b_n(1-b_n)/(n-1)}.\end{aligned}$$
Definition 2
([9]) For real u and integers $$1\le k\le l$$, let $$\mathscr {F}_{k,l}(u)$$ be the $$\sigma $$-field generated by the events $$\{X_i>u\}$$, $$k\le i\le l$$. Define the mixing coefficients $$\alpha _{n,q}(u)$$,
$$\begin{aligned}&\alpha _{n,q}(u)=\max _{1\le k\le n-q}\sup |P(B|A)-P(B)|, \end{aligned}$$
where the supremum is taken over all $$A\in \mathscr {F}_{1,k}(u)$$ with $$P(A)> 0$$ and $$B\in \mathscr {F}_{k+q,n}(u)$$ and k, q are positive integers.
Theorem 1
([9, p. 547]) Let $$\{X_n\}_{n\ge 1}$$ be a stationary process of rvs with tail function $$\overline{F}(x)=1-F(x)$$. Let the positive integers $$\{r_n\}$$ and the thresholds $$\{u_n\}$$, $$n\ge 1$$ be such that $$r_n\rightarrow \infty $$, $$r_n\overline{F}(u_n)\rightarrow \tau $$, and $$P\{M_{r_n}\le u_n\}\rightarrow \exp (-\theta \tau )$$ hold as $$n\rightarrow \infty $$ for some $$\tau \in (0,\infty )$$ and $$\theta \in [0,1]$$. If there are positive integers $$q_n=o(r_n)$$ such that $$\alpha _{cr_n,q_n}(u_n)=o(1)$$ for any $$c>0$$, then we get for $$t>0$$
$$\begin{aligned}&P\{\overline{F}(u_n)T_1(u_n)>t\}\rightarrow \theta \exp (-\theta t),\qquad n\rightarrow \infty . \end{aligned}$$

For declustering the sample into approximately independent inter-cluster times $$\{(T_1(u))_i\}$$, one can take the $$k-1=\lfloor \theta \sum _{i=1}^n\mathbf{1}(X_i>u)\rfloor $$ largest inter-exceedance times, [9]. A larger u corresponds to larger inter-cluster times, whose number L(u) may be small. This leads to a larger variance of the estimates based on $$\{(T_1(u))_i\}$$.

The intervals estimator of the EI follows from Theorem 1 and depends only on u. It is defined as [3, p. 391],
$$\begin{aligned} \hat{\theta }_n(u)= & {} \Big \{ \begin{array}{ll} \min (1,\hat{\theta }_n^1(u)) , \text{ if } \max \{(T_1(u))_i : 1 \le i \le L -1\} \le 2,\\ \min (1,\hat{\theta }_n^2(u)) , \text{ if } \max \{(T_1(u))_i : 1 \le i \le L -1\} > 2, \end{array} \end{aligned}$$
(5)
$$\begin{aligned}&\hat{\theta }_n^1(u)=\frac{2(\sum _{i=1}^{L-1}(T_1(u))_i)^2}{(L-1)\sum _{i=1}^{L-1}(T_1(u))_i^2},\end{aligned}$$
$$\begin{aligned}&\hat{\theta }_n^2(u)=\frac{2(\sum _{i=1}^{L-1}((T_1(u))_i-1))^2}{(L-1)\sum _{i=1}^{L-1}((T_1(u))_i-1)((T_1(u))_i-2)}.\end{aligned}$$
The intervals estimator is shown to be consistent for m-dependent processes, [9]. The asymptotic normality of $$\sqrt{k_n}(\hat{\theta }_n(u)-\theta )$$, where $$k_n=\lfloor n/r_n\rfloor $$ and $$r_n\rightarrow \infty $$, $$r_n=o(n)$$ as $$n\rightarrow \infty $$, is proved for the intervals estimator in [18].
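A direct transcription of the intervals estimator (5) may look as follows (a sketch; the inter-exceedance times are taken as the gaps between successive exceedance indices, and the function name is ours):

```python
import numpy as np

def intervals_estimator(x, u):
    """Intervals estimator (5) of the EI from inter-exceedance times of u."""
    idx = np.flatnonzero(np.asarray(x) > u)
    t = np.diff(idx).astype(float)           # (T_1(u))_i, playing i = 1..L-1
    if len(t) < 2:
        return float('nan')                  # too few exceedances
    if t.max() <= 2:
        num = 2.0 * t.sum() ** 2
        den = len(t) * (t ** 2).sum()
    else:
        num = 2.0 * (t - 1.0).sum() ** 2
        den = len(t) * ((t - 1.0) * (t - 2.0)).sum()
    return min(1.0, num / den)
```

For iid data ($$\theta =1$$) and a high threshold, the estimate is close to 1, in line with the consistency mentioned above.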

3 Main Results

3.1 Theory

Let us rewrite the left-hand side of (3) in the following form:
$$\begin{aligned}\hat{\omega }^2_L(u)= & {} \sum _{i=L-k+1}^L\left( 1-\theta \exp (-Y_{i,L}\theta )-\frac{i-0.5}{L}\right) ^2+\frac{1}{12L}.\end{aligned}$$
We disregard the marginal distribution of the random number of inter-cluster times $$L=L(u)$$. This approach is referred to as a conditional one. This is possible since the limit distribution of $$L(u_n)$$ does not depend on $$u_n$$ as $$n\rightarrow \infty $$. Following [3], Sect. 10.3.1, the probability $$P\{L(u_n)=i\}$$ may be approximated by a binomial distribution with probability $$p_n^*=P\{M_{r_n}\le u_n\}$$ that tends to $$e^{-\tau \theta }$$ by (2). Here, $$r_n$$ denotes the length of a data block. The cluster is defined as a block of data with at least one exceedance over $$u_n$$. The same is true for the cluster defined as in [9]. We have
$$\begin{aligned} 1-\theta \exp (-Y_{i,L}\theta )\stackrel{d}{=} U_{i,L},\end{aligned}$$
(6)
where $$\{U_{i,L}\}$$ are order statistics derived from an iid uniform [0, 1] sample $$\{U_{i}\}_{i=1}^L$$.
By Lemma 1, the joint distribution of upper order statistics $$(U_{L-k+1,L},\ldots ,U_{L,L})$$ given $$U_{L-k,L}=t$$, $$t\in [0,1)$$, for some $$k= 1,\ldots , L-1$$, equals the joint distribution of order statistics $$(U^*_{1,k},\ldots ,U^*_{k,k})$$ of associated iid rvs $$\{U^*_{i}\}_{i=1}^k$$ with cdf
$$\begin{aligned} F_t(x)= & {} (x-t)/(1-t),\qquad 0\le t<x\le 1. \end{aligned}$$
(7)
Lemma 3 establishes the asymptotic normality of $$U^*_{i,k}$$, given $$U_{L-k,L} = t$$, after normalization.
Lemma 3
Let $$U^*_{1,k}\le U^*_{2,k}\le \cdots \le U^*_{k,k}$$ be the k order statistics of iid rvs $$\{U^*_{i}\}_{i=1}^k$$ with cdf (7). Then, as $$k\rightarrow \infty $$, $$i\rightarrow \infty $$, $$k-i\rightarrow \infty $$,
$$\begin{aligned}(U^*_{i,k}-b_{i,k})/a_{i,k}\end{aligned}$$
is asymptotically standard normal with
$$\begin{aligned} b^*_{i,k}= & {} \frac{i-1}{k-1}=\frac{b_{i,k}-t}{1-t},\qquad a_{i,k}=(1-t)\sqrt{\frac{b^*_{i,k}(1-b^*_{i,k})}{k-1}}.\end{aligned}$$
(8)
Proof
The pdf of $$(U^*_{i,k}-b_{i,k})/a_{i,k}$$ is given by [2]
$$\begin{aligned}&\frac{k!}{(i-1)!(k-i)!}\cdot f(a_{i,k}x+b_{i,k})F_t(a_{i,k}x+b_{i,k})^{i-1}(1-F_t(a_{i,k}x+b_{i,k}))^{k-i}\nonumber \\= & {} \frac{k!}{(i-1)!(k-i)!}\cdot \frac{a_{i,k}}{1-t}\cdot \left( \frac{xa_{i,k}+b_{i,k}-t}{1-t}\right) ^{i-1}\cdot \left( 1-\frac{xa_{i,k}+b_{i,k}-t}{1-t}\right) ^{k-i} \\= & {} \left( \frac{k!}{(i-1)!(k-i)!}(b^*_{i,k})^{i-1}(1-b^*_{i,k})^{k-i}\frac{a_{i,k}}{1-t}\right) \nonumber \\&\cdot \left( 1+\frac{xa_{i,k}}{b_{i,k}^*(1-t)}\right) ^{i-1}\left( 1-\frac{xa_{i,k}}{(1-b_{i,k}^*)(1-t)}\right) ^{k-i}.\nonumber \end{aligned}$$
(9)
In the same way as in the proof of Lemma 2.2.3 [7], we obtain
$$\begin{aligned}&(i-1)\log \left( 1+\frac{xa_{i,k}}{b_{i,k}^*(1-t)}\right) +(k-i)\log \left( 1-\frac{xa_{i,k}}{(1-b_{i,k}^*)(1-t)}\right) \\= & {} (i-1)\left( x\frac{a_{i,k}}{b_{i,k}^*(1-t)}-\frac{x^2}{2}\left( \frac{a_{i,k}}{b_{i,k}^*(1-t)}\right) ^2+\cdots \right) \\+ & {} (k-i)\left( -x\frac{a_{i,k}}{(1-b_{i,k}^*)(1-t)}-\frac{x^2}{2}\left( \frac{a_{i,k}}{(1-b_{i,k}^*)(1-t)}\right) ^2-\cdots \right) . \end{aligned}$$
From (8) it follows that $$(i-1)\left( \frac{a_{i,k}}{b_{i,k}^*(1-t)}\right) ^2+(k-i)\left( \frac{a_{i,k}}{(1-b_{i,k}^*)(1-t)}\right) ^2=1$$. The remaining terms are of smaller order. Using Stirling's formula for k!, we find that the factor in the third line of (9) tends to $$(2\pi )^{-1/2}$$. Thus, the statement follows.
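Lemma 3 can be checked by a quick Monte Carlo experiment (illustrative parameter values; not part of the derivation): simulate iid rvs with cdf (7), i.e., uniform on (t, 1), normalize a central order statistic by (8), and compare with the standard normal law.

```python
import numpy as np

rng = np.random.default_rng(0)
k, t, i, reps = 4000, 0.5, 2000, 2000      # illustrative values
bstar = (i - 1) / (k - 1)
b = t + (1 - t) * bstar                    # b_{i,k} recovered from (8)
a = (1 - t) * np.sqrt(bstar * (1 - bstar) / (k - 1))

z = np.empty(reps)
for r in range(reps):
    u = t + (1 - t) * rng.random(k)        # iid rvs with cdf (7)
    z[r] = (np.sort(u)[i - 1] - b) / a     # normalized i-th order statistic
# z should be approximately standard normal for large k, i, k - i
```

The empirical mean and standard deviation of z are close to 0 and 1, as the lemma predicts.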
Lemma 4
Let the conditions of Lemma 3 be fulfilled. Then the statistic
$$\begin{aligned}&\chi ^2=\sum _{i=1}^{k^*}\left( (U^*_{i,k}-b_{i,k})/a_{i,k}\right) ^2\end{aligned}$$
(10)
is asymptotically $$\chi ^2$$ distributed with $$k^*=[k/2]$$ degrees of freedom.
Proof
Let us denote $$Y^*_{i,k}=(U^*_{i,k}-b_{i,k})/a_{i,k}$$ and derive the distribution $$\Phi _{\zeta }(y)$$ of $$\zeta =\chi /\sqrt{k^*}$$. From Lemma 3, $$Y^*_{i,k}\sim N(0,1)$$ holds asymptotically. Due to the symmetry, we get $$Y^*_{1,k}\le \cdots \le Y^*_{k^*,k}$$ and $$Y^*_{k,k}\le \cdots \le Y^*_{k^*+1,k}$$ for odd k ($$Y^*_{k,k}\le \cdots \le Y^*_{k^*,k}$$ for even k). By Lemma 1, the joint pdf of the $$k^*$$ order statistics $$Y^*_{1,k}, \ldots , Y^*_{k^*,k}$$ is determined by [2]
$$\begin{aligned}f(x_1,\ldots ,x_{k^*})= & {} (k^*)!\prod _{i=1}^{k^*}f(x_i)=\frac{(k^*)!}{(\sqrt{2\pi })^{k^*}}\exp \left( -\frac{\sum _{i=1}^{k^*}x_i^2}{2}\right) , \\&t<x_1<x_2<\cdots<x_{k^*}<+\infty . \end{aligned}$$
For positive y, the cdf $$\Phi _{\zeta }(y)$$ is equal to the probability of falling inside the $$k^*$$-dimensional sphere $$\sum _{i=1}^{k^*}\left( Y^*_{i,k}\right) ^2=y^2 k^*$$. For negative y, we have $$\Phi _{\zeta }(y)=0$$. Hence, we obtain
$$\begin{aligned}&\Phi _{\zeta }(y)=\frac{(k^*)!}{(\sqrt{2\pi })^{k^*}}\cdot \\&\int \cdots \int _{\sum _{i=1}^{k^*} x_i^2<y^2k^*}\exp \left( -\frac{\sum _{i=1}^{k^*}x_i^2}{2}\right) \mathbf{1}(t<x_1<x_2<\cdots<x_{k^*}<+\infty )dx_1dx_2\ldots dx_{k^*}. \end{aligned}$$
Using spherical coordinates and replacing $$x_1=\rho \cos \theta _1\cos \theta _2\ldots \cos \theta _{k^*-1}$$, $$x_2=\rho \cos \theta _1\cos \theta _2\ldots \sin \theta _{k^*-1},$$ $$\ldots ,$$ $$x_{k^*-1}= \rho \cos \theta _1\sin \theta _2$$, $$x_{k^*}= \rho \sin \theta _1$$, we determine the range of each variable $$\rho $$ and $$\theta _i$$, $$i=1,\ldots ,k^*-1$$. Since $$t<x_1<x_2<\cdots<x_{k^*}<+\infty $$ holds, we get $$x_2/x_1 = \tan (\theta _{k^*-1})>1$$ and hence, $$\pi /4<\theta _{k^*-1}<\pi /2$$, $$\sqrt{2}/2<\sin (\theta _{k^*-1})<1$$. From $$x_3/x_2 = \tan (\theta _{k^*-2})/\sin (\theta _{k^*-1})>1$$ we then have $$\tan (\theta _{k^*-2})>\sin (\theta _{k^*-1})$$, and since the largest value of $$\sin (\theta _{k^*-1})$$ is 1, we get $$\tan (\theta _{k^*-2})>1$$ and $$\pi /4<\theta _{k^*-2}<\pi /2$$. Finally, we get $$\pi /4<\theta _{i}<\pi /2$$, $$i=1,\ldots ,k^*-1$$. Now, we can evaluate the integral:
$$\begin{aligned}&\Phi _{\zeta }(y)=\frac{(k^*)!}{(\sqrt{2\pi })^{k^*}}\cdot \\&\int _{\pi /4}^{\pi /2}\cdots \int _{\pi /4}^{\pi /2}\int _{0}^{y\sqrt{k^*}}\exp \left( -\frac{\rho ^2}{2}\right) \rho ^{k^*-1}D(\theta _1,\ldots ,\theta _{k^*-1}) d\rho d\theta _{k^*-1}\ldots d\theta _{1} \\= & {} C_{k^*}\int _{0}^{y\sqrt{k^*}}\exp \left( -\frac{\rho ^2}{2}\right) \rho ^{k^*-1}d\rho , \end{aligned}$$
where
$$\begin{aligned}C_{k^*}= & {} \frac{(k^*)!}{(\sqrt{2\pi })^{k^*}}\int _{\pi /4}^{\pi /2}\cdots \int _{\pi /4}^{\pi /2}D(\theta _1,\ldots ,\theta _{k^*-1}) d\theta _{k^*-1}\ldots d\theta _{1}. \end{aligned}$$
Then the constant $$C_{k^*}$$ can be obtained from the equation:
$$\begin{aligned}\Phi _{\zeta }(+\infty )= & {} 1=C_{k^*}\int _{0}^{\infty }\exp \left( -\frac{\rho ^2}{2}\right) \rho ^{k^*-1}d\rho =C_{k^*}\Gamma \left( \frac{k^*}{2}\right) 2^{k^*/2-1}. \end{aligned}$$
Hence, it follows
$$\begin{aligned}\Phi _{\zeta }(y)= & {} \frac{1}{\Gamma \left( k^*/2\right) 2^{k^*/2-1}}\int _{0}^{y\sqrt{k^*}}\exp \left( -\frac{\rho ^2}{2}\right) \rho ^{k^*-1}d\rho .\end{aligned}$$
The pdf of $$\zeta $$ is given by
$$\begin{aligned}\varphi _{\zeta }(y)= & {} \frac{\sqrt{2k^*}}{\Gamma \left( k^*/2\right) }\left( \frac{y\sqrt{k^*}}{\sqrt{2}}\right) ^{k^*-1}\exp \left( -\frac{k^*y^2}{2}\right) .\end{aligned}$$
Hence, one can get the chi-squared pdf of $$\chi ^2$$:
$$\begin{aligned} p(x)= & {} \frac{x^{k^*/2-1}\exp (-x/2)}{2^{k^*/2}\Gamma \left( k^*/2\right) }. \end{aligned}$$

3.2 Discrepancy Equation Based on the Chi-Squared Statistic

Given a consistent estimate $$\widehat{\theta }(u)$$, by (6) and denoting $$i^*=i-L+k^*$$, $$i=L-k^*+1,\ldots ,L$$, the threshold u can be selected as a solution of the discrepancy equation
$$\begin{aligned}&\sum _{i^*=2}^{k^*-1}\left( (1-\widehat{\theta }(u)\exp (-Y_{i^*+L-k^*,L}\widehat{\theta }(u))-b_{i^*,k^*})/a_{i^*,k^*}\right) ^2 =\delta \end{aligned}$$
(11)
for a given k such that $$k^*-1\ge 2$$, where $$k^*=[(k-2)/2]$$. Here, $$\delta $$ is the mode $$\max \{k^*-2,0\}$$ of the $$\chi ^2(k^*)$$ distribution, and $$t=1-\widehat{\theta }(u)\exp (-Y_{L-k,L}\widehat{\theta }(u))$$; $$b_{i,k}$$, $$a_{i,k}$$, and $$k^*$$ are calculated as in Lemmas 3 and 4. This could be an alternative to the method (3). In Fig. 1a, c, one can see that the empirical cdf of the left-hand side statistic in (11), where $$\widehat{\theta }(u)$$ is based on (5), and the chi-squared cdf are rather close. Figure 1b, d shows that the discrepancy equation (11) may have solutions, since the left-hand side statistic in (11) crosses the mode of the $$\chi ^2$$ distribution.
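A literal transcription of the left-hand side of (11) may look as follows (a sketch; the index conventions follow the displayed formula as printed, and `theta_hat` stands for any consistent estimate $$\widehat{\theta }(u)$$):

```python
import numpy as np

def chi2_discrepancy(y, theta_hat, k):
    """Left-hand side of (11): y are normalized inter-exceedance times,
    theta_hat a consistent EI estimate, k the number of largest order
    statistics used; k* = [(k-2)/2]."""
    ys = np.sort(np.asarray(y, dtype=float))
    L = len(ys)
    ks = (k - 2) // 2
    if ks < 3 or L <= k:
        raise ValueError("need k* >= 3 and L > k")
    t = 1 - theta_hat * np.exp(-ys[L - k - 1] * theta_hat)  # t from Y_{L-k,L}
    stat = 0.0
    for istar in range(2, ks):                              # i* = 2,...,k*-1
        bstar = (istar - 1) / (ks - 1)
        b = t + (1 - t) * bstar                             # b_{i*,k*} via (8)
        a = (1 - t) * np.sqrt(bstar * (1 - bstar) / (ks - 1))
        g = 1 - theta_hat * np.exp(-ys[istar + L - ks - 1] * theta_hat)
        stat += ((g - b) / a) ** 2
    return stat
```

The discrepancy equation (11) then amounts to finding a threshold u at which this statistic equals the $$\chi ^2(k^*)$$ mode.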
Fig. 1

Empirical cdf of the left-hand side of (11), built from 300 re-samples of a Moving Maxima (MM) process with sample size $$n=4\cdot 10^4$$, EI $$\theta =0.8$$, $$k^*=5$$, $$t=0.867$$, and $$u=10$$ (solid line), together with the chi-squared cdf (points) (a); the left-hand side statistic in (11) for $$k^*=5$$ against the threshold u, with the $$\chi ^2$$ mode as the discrepancy value (b). The same statistics for an ARMAX process with $$\theta =0.25$$, $$k^*=5$$, $$t=0.953$$, and $$u=50$$ (c, d)

Remark 1

The discrepancy methods (3) and (11) are universal and can be used for any nonparametric estimator $$\widehat{\theta }(u)$$.

Algorithm 3.1
  1. Using $$X^n=\{X_1, X_2,\ldots ,X_n\}$$ and taking thresholds u corresponding to quantile levels $$q\in \{0.90,0.905,\ldots ,0.995\}$$, generate samples of inter-cluster times $$\{T_1(u)_i\}$$ and the normalized rvs
     $$\begin{aligned}\{Y_{i}\}= & {} \{\overline{F}(u)T_1(u)_{i}\}=\{(N/n)T_1(u)_{i}\}, \qquad i=\overline{1,L}, \qquad L=L(u),\end{aligned}$$
     where N is the number of exceedances over the threshold u.
  2. For each u, select $$k=sL(u)$$, $$0<s<1$$, e.g., $$s=0.0001$$.
  3. Use the sorted sample $$Y_{L-k+1,L}\le \cdots \le Y_{L,L}$$ and find all solutions $$u_1,\ldots ,u_l$$ (here, l is a random number) of the discrepancy equation (11).
  4. For each $$u_j$$, $$j\in \{1,\ldots ,l\}$$, calculate $$\hat{\theta }(u_j)$$ and find
     $$\begin{aligned} \widehat{\theta }_1(u)= & {} \frac{1}{l}\sum _{i=1}^l\widehat{\theta }(u_i),\quad \widehat{\theta }_2(u) =\widehat{\theta }(u_{min}),\qquad \widehat{\theta }_3(u) =\widehat{\theta }(u_{max}) \end{aligned}$$
     as the resulting estimates, where $$u_{min}$$ and $$u_{max}$$ are the minimal and maximal values among $$\{u_1,\ldots ,u_l\}$$.
Fig. 2

Left-hand side statistic in (11) for $$k^*= 7, 8$$ against the threshold u, with the $$\chi ^2$$ mode $$\max \{k^*-2,0\}$$ as the discrepancy value, for an ARMAX process with $$\theta =0.75$$
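Algorithm 3.1 can be sketched end-to-end as below (an illustrative sketch, not the author's code: the intervals estimator (5) supplies $$\widehat{\theta }(u)$$, solutions of (11) are detected as sign changes of the statistic minus the $$\chi ^2(k^*)$$ mode over the quantile grid, and only the averaged estimate $$\widehat{\theta }_1$$ is returned):

```python
import numpy as np

def intervals_theta(x, u):
    """Intervals estimator (5) from the inter-exceedance times of u."""
    t = np.diff(np.flatnonzero(x > u)).astype(float)
    if len(t) < 2:
        return float('nan')
    if t.max() <= 2:
        est = 2 * t.sum() ** 2 / (len(t) * (t ** 2).sum())
    else:
        est = 2 * (t - 1).sum() ** 2 / (len(t) * ((t - 1) * (t - 2)).sum())
    return min(1.0, est)

def lhs_statistic(x, u, s=0.01):
    """Left-hand side of (11) with k = max(s*L(u), 8) largest order statistics."""
    exceed = np.flatnonzero(x > u)
    y = np.sort((len(exceed) / len(x)) * np.diff(exceed))
    L = len(y)
    k = max(int(s * L), 8)
    ks = (k - 2) // 2                       # k* = [(k-2)/2]
    th = intervals_theta(x, u)
    if not np.isfinite(th) or L <= k:
        return float('nan'), th, ks
    t = 1 - th * np.exp(-y[L - k - 1] * th) # t from Y_{L-k,L}
    stat = 0.0
    for i in range(2, ks):                  # i* = 2,...,k*-1
        bs = (i - 1) / (ks - 1)
        b = t + (1 - t) * bs
        a = (1 - t) * np.sqrt(bs * (1 - bs) / (ks - 1))
        g = 1 - th * np.exp(-y[i + L - ks - 1] * th)
        stat += ((g - b) / a) ** 2
    return stat, th, ks

def algorithm_31(x, qs=None, s=0.01):
    """Steps 1-4 of Algorithm 3.1, returning the averaged estimate theta_1."""
    qs = np.arange(0.90, 0.996, 0.005) if qs is None else qs
    sols, prev = [], None
    for q in qs:
        stat, th, ks = lhs_statistic(x, np.quantile(x, q), s)
        if not np.isfinite(stat):
            prev = None
            continue
        cur = stat - max(ks - 2, 0)         # statistic minus chi^2(k*) mode
        if prev is not None and prev * cur <= 0:
            sols.append(th)                 # a solution of (11) lies nearby
        prev = cur
    return float(np.mean(sols)) if sols else float('nan')
```

If no threshold on the grid solves (11), the sketch returns NaN, mirroring the possible lack of solutions discussed in Sect. 3.3.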

3.3 Estimation of k

It remains to select k. For declustering purposes, i.e., to have approximately independent clusters of exceedances over u, it is recommended in [9] to take the largest value k such that $$(T_1(u))_{L-k,L}$$ is strictly larger than $$(T_1(u))_{L-k-1,L}$$.

We propose another approach. For each predetermined threshold u and the corresponding L(u), one may decrease the value of k until the discrepancy equation has a solution, and select the largest such k. Figure 2 shows that a solution of (11) exists for $$k^*=7$$ but not for $$k^*=8$$. Since several solutions u are possible, the average may be taken over all estimates $$\widehat{\theta }(u)$$ with such u's.

4 Simulation Study

Our simulation study, examining the behavior of Algorithm 3.1, is based on 1000 replications of samples $$\{X_1,\ldots ,X_n\}$$ of size $$n=10^5$$ generated from a set of models. These models are Moving Maxima (MM), Autoregressive Maximum (ARMAX), AR(1), AR(2), MA(2), and GARCH. The AR(1) process is considered with uniform noise (ARu) and with Cauchy distributed noise (ARc). Using Algorithm 3.1, we check the accuracy of the intervals estimator (5), where u is selected based on (11). The root mean squared error (RMSE) and the absolute bias are given in Tables 1 and 2; the smallest values in each column indicate the best results.

4.1 Models

Let us briefly recall the processes under study. The mth order MM process is $$X_t=\max _{i=0,\ldots ,m}\{\alpha _i Z_{t-i}\}$$, $$t\in Z $$, where $$\{\alpha _i\}$$ are constants with $$\alpha _i \ge 0$$, $$\sum _{i=0}^m \alpha _i=1$$, and $$Z_t$$ are iid standard Fréchet distributed rvs with the cdf $$F(x)=\exp \left( -1/x\right) $$ for $$x>0$$. Its EI is equal to $$\theta =\max _i\{\alpha _i\}$$, [1]. The values $$m=3$$ and $$\theta \in \{0.5,0.8\}$$, corresponding to $$\alpha =(0.5,0.3,0.15,0.05)$$ and $$\alpha =(0.8,0.1,0.08,0.02)$$, are taken.

The ARMAX process is determined by $$X_t=\max \{\alpha X_{t-1},(1-\alpha )Z_t\}$$, $$t\in Z ,$$ where $$0\le \alpha < 1$$, $$\{Z_t\}$$ are iid standard Fréchet distributed rvs, and $$P\{X_t\le x\}=\exp \left( -1/x\right) $$ holds assuming $$X_0=Z_0$$. Its EI is given by $$\theta =1-\alpha $$, [3]. We consider $$\theta \in \{0.25,0.75\}$$.
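The MM and ARMAX processes are easy to simulate from iid standard Fréchet noise; the following sketch (our helper names, not from the paper) may be used to reproduce such samples:

```python
import numpy as np

def frechet(rng, size):
    """Standard Frechet rvs: F(x) = exp(-1/x) via inversion of uniforms."""
    u = np.clip(rng.random(size), 1e-12, 1 - 1e-12)
    return -1.0 / np.log(u)

def moving_maxima(rng, n, alpha):
    """MM process X_t = max_i alpha_i Z_{t-i}; its EI is max(alpha)."""
    m = len(alpha) - 1
    z = frechet(rng, n + m)
    return np.array([max(a * z[t + m - i] for i, a in enumerate(alpha))
                     for t in range(n)])

def armax(rng, n, alpha):
    """ARMAX X_t = max(alpha*X_{t-1}, (1-alpha)*Z_t); its EI is 1 - alpha."""
    z = frechet(rng, n)
    x = np.empty(n)
    x[0] = z[0]
    for t in range(1, n):
        x[t] = max(alpha * x[t - 1], (1 - alpha) * z[t])
    return x
```

In both cases the stationary marginal is standard Fréchet, so the empirical frequency of $$\{X_t\le 1\}$$ should be close to $$e^{-1}$$.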

The ARu process is defined by $$X_j=(1/r) X_{j-1}+\epsilon _j$$, $$j \ge 1$$, and $$X_0\sim U(0,1)$$ with $$X_0$$ independent of $$\epsilon _j$$. For a fixed integer $$r\ge 2$$, let $$\epsilon _n$$, $$n\ge 1$$ be iid rvs with $$P\{\epsilon _1=k/r\}=1/r$$, $$k=0,1, \ldots , r-1$$. The EI of this AR(1) process is $$\theta =1-1/r$$ [5]. The values $$\theta \in \{0.5,0.8\}$$, corresponding to $$r=2$$ and $$r=5$$, are taken.

The MA(2) process is determined by $$X_i= p Z_{i-2} + q Z_{i-1} + Z_i$$, $$i\ge 1,$$ with $$p>0$$, $$q<1$$, and iid Pareto rvs $$Z_{-1}, Z_0, Z_1,\ldots $$ with $$P\{Z_0>x\}=1$$ if $$x<1$$, and $$P\{Z_0>x\}=x^{-\alpha }$$ if $$x\ge 1$$ for some $$\alpha >0$$ [20]. Its EI is $$\theta =\left( 1+p^{\alpha }+q^{\alpha }\right) ^{-1}$$. We consider $$\alpha =2$$, $$(p,q)=(1/\sqrt{2}, 1/\sqrt{2}), (1/\sqrt{3}, 1/\sqrt{6})$$ with corresponding $$\theta =1/2, 2/3$$.

We consider also processes studied in [8, 17, 22]. The ARc process is $$X_j=0.7 X_{j-1}+\epsilon _j$$, where $$\epsilon _j$$ is standard Cauchy distributed and $$\theta =0.3$$. The AR(2) process is $$X_j=0.95 X_{j-1}-0.89X_{j-2} +\epsilon _j$$, where $$\epsilon _j$$ is Pareto distributed with tail index 2 and $$\theta =0.25$$. The GARCH(1,1) is $$X_j=\sigma _j\epsilon _j$$, with $$\sigma _j^2=\alpha +\lambda X_{j-1}^2+\beta \sigma _{j-1}^2$$, $$\alpha =10^{-6}$$, $$\beta =0.7$$, $$\lambda =0.25$$, with an iid sequence of standard Gaussian rvs $$\{\epsilon _j\}_{j\ge 1}$$ and $$\theta =0.447$$ [11].
Table 1

The root mean squared error, $$RMSE \cdot 10^{4}$$ (columns: process and true $$\theta $$; empty cells: result not available)

| Estimator | MM 0.5 | MM 0.8 | ARMAX 0.25 | ARMAX 0.75 | ARu 0.5 | ARu 0.8 | MA(2) 0.5 | MA(2) 2/3 | ARc 0.3 | AR(2) 0.25 | GARCH 0.328 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $$\widehat{\theta }_1$$ ($$s=0.001$$) | 146 | 188 | 136 | 173 | 2719 | 1217 | 227 | 545 | 66 | 426 | 237 |
| $$\widehat{\theta }_2$$ ($$s=0.001$$) | 141 | 164 | 122 | 156 | 2163 | 946 | 295 | 813 | 104 | 498 | 288 |
| $$\widehat{\theta }_3$$ ($$s=0.001$$) | 360 | 440 | 291 | 430 | 3330 | 1527 | 336 | 470 | 67 | 432 | 467 |
| $$\widehat{\theta }_1$$ ($$s=0.0005$$) | 105 | 155 | 100 | 148 | 2519 | 1127 | 268 | 696 | 50 | 498 | 231 |
| $$\widehat{\theta }_2$$ ($$s=0.0005$$) | 139 | 144 | 97 | 151 | 1906 | 854 | 434 | 1107 | 161 | 692 | 860 |
| $$\widehat{\theta }_3$$ ($$s=0.0005$$) | 355 | 451 | 296 | 463 | 3325 | 1527 | 355 | 449 | 67 | 439 | 484 |
| $$\widehat{\theta }_1$$ ($$s=0.0001$$) | 96 | 120 | 88 | 115 | 2224 | 975 | 331 | 846 | 151 | 620 | 416 |
| $$\widehat{\theta }_2$$ ($$s=0.0001$$) | 149 | 135 | 117 | 143 | 1704 | 768 | 503 | 1197 | 358 | 953 | 1127 |
| $$\widehat{\theta }_3$$ ($$s=0.0001$$) | 350 | 453 | 285 | 434 | 3331 | 1518 | 336 | 441 | 67 | 431 | 474 |
| $$\widehat{\theta }^{Kimt}$$ | 217 | 569 | 69 | 498 | 1883 | 199 | 309 | 466 | 33 | 3630 | 4028 |
| $$\widehat{\theta }^{db}$$ | 630 |  | 550 | 3640 |  |  |  |  | 320 |  | 1000 |
| $$\widehat{\theta }^{sb}$$ | 550 |  | 450 | 950 |  |  |  |  | 320 |  | 840 |
| $$\widehat{\theta }^{r}$$ | 550 |  |  |  |  |  |  |  | 140 |  | 1550 |
| $$\widehat{\theta }^{ml}$$ |  |  | 550 | 3640 |  |  | 220 | 485 |  |  |  |
| $$\widehat{\theta }^{bcml}$$ |  |  | 400 | 1070 |  |  | 375 | 210 |  |  |  |
| $$\widehat{\theta }^{mlsb}$$ |  |  | 420 | 853 |  |  |  |  |  |  |  |
| $$\widehat{\theta }^{C}$$ | 660 |  |  |  |  |  |  |  | 950 |  | 1580 |
| $$\widehat{\theta }^{Cms}$$ | 320 |  |  |  |  |  |  |  | 6080 |  | 3520 |
Table 2

The absolute bias, $$|Bias| \cdot 10^{4}$$ (columns: process and true $$\theta $$; empty cells: result not available)

| Estimator | MM 0.5 | MM 0.8 | ARMAX 0.25 | ARMAX 0.75 | ARu 0.5 | ARu 0.8 | MA(2) 0.5 | MA(2) 2/3 | ARc 0.3 | AR(2) 0.25 | GARCH 0.328 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $$\widehat{\theta }_1$$ ($$s=0.001$$) | 7.4016 | 6.2708 | 2.9461 | 3.6254 | 2709 | 1204 | 182 | 516 | 66 | 399 | 70 |
| $$\widehat{\theta }_2$$ ($$s=0.001$$) | 41 | 50 | 22 | 49 | 2139 | 921 | 267 | 791 | 104 | 477 | 139 |
| $$\widehat{\theta }_3$$ ($$s=0.001$$) | 38 | 40 | 36 | 62 | 3302 | 1474 | 72 | 246 | 67 | 314 | 174 |
| $$\widehat{\theta }_1$$ ($$s=0.0005$$) | 37 | 12 | 15 | 13 | 2513 | 1118 | 251 | 684 | 50 | 485 | 137 |
| $$\widehat{\theta }_2$$ ($$s=0.0005$$) | 104 | 59 | 52 | 65 | 1893 | 841 | 424 | 1017 | 161 | 679 | 809 |
| $$\widehat{\theta }_3$$ ($$s=0.0005$$) | 11 | 61 | 48 | 45 | 3296 | 1474 | 47 | 227 | 67 | 319 | 210 |
| $$\widehat{\theta }_1$$ ($$s=0.0001$$) | 56 | 35 | 48 | 43 | 2221 | 970 | 322 | 842 | 151 | 613 | 391 |
| $$\widehat{\theta }_2$$ ($$s=0.0001$$) | 126 | 81 | 97 | 97 | 1701 | 760 | 497 | 1194 | 358 | 949 | 1119 |
| $$\widehat{\theta }_3$$ ($$s=0.0001$$) | 43 | 59 | 31 | 43 | 3303 | 1464 | 51 | 246 | 67 | 304 | 186 |
| $$\widehat{\theta }^{Kimt}$$ | 0.14862 | 567 | 54 | 496 | 1878 | 196 | 306 | 462 | 33 | 3627 | 4027 |
| $$\widehat{\theta }^{db}$$ | 160 |  | 450 | 3530 |  |  |  |  | 80 |  | 690 |
| $$\widehat{\theta }^{sb}$$ | 100 |  | 180 | 340 |  |  |  |  | 80 |  | 630 |
| $$\widehat{\theta }^{r}$$ | 50 |  |  |  |  |  |  |  | 160 |  | 630 |
| $$\widehat{\theta }^{ml}$$ |  |  | 450 | 4170 |  |  | 87.5 | 270 |  |  |  |
| $$\widehat{\theta }^{bcml}$$ |  |  | 0 | 1070 |  |  | 40 | 25 |  |  |  |
| $$\widehat{\theta }^{mlsb}$$ |  |  | 179 | 320 |  |  |  |  |  |  |  |
| $$\widehat{\theta }^{C}$$ | 130 |  |  |  |  |  |  |  | 200 |  | 230 |
| $$\widehat{\theta }^{Cms}$$ | 20 |  |  |  |  |  |  |  | 6000 |  | 3230 |

4.2 Estimators and Their Comparison

In Tables 1 and 2, we include, apart from our estimates $$\widehat{\theta }_1(u)$$–$$\widehat{\theta }_3(u)$$, the available results of the simulation studies in [8, 17, 20, 21]. The estimators are denoted as follows. $$\widehat{\theta }^{db}$$ denotes the disjoint blocks and $$\widehat{\theta }^{sb}$$ the sliding blocks estimators [17, 18]; $$\widehat{\theta }^{r}$$ the runs estimator [24]; $$\widehat{\theta }^{ml}$$ the multilevel estimator [20]; $$\widehat{\theta }^{bcml}$$ and $$\widehat{\theta }^{mlsb}$$ the bias-corrected multilevel and the multilevel sliding blocks estimators [20, 21]; $$\widehat{\theta }^{C}$$ and $$\widehat{\theta }^{Cms}$$ the cycles and the max-stable cycles estimators [8]. We can compare only the results related to processes overlapping with our experiment. We calculate the $$K$$-gaps estimates of [22] with IMT-selected pairs (u, K) (for details regarding the IMT test, see [10]), denoted $$\widehat{\theta }^{Kimt}$$.

We may conclude the following. The intervals estimator coupled with the discrepancy method demonstrates good performance in comparison with the other investigated estimators. It is not appropriate for light-tailed processes (by its definition), as one can see from the example of the ARu process. The K-gaps estimator is indicated as one of the most promising methods in [8, 17]. Our estimators, especially $$\widehat{\theta }_1(u)$$ for smaller s (i.e., a smaller number k of the largest order statistics), may perform better.

Comparing Tables 1 and 2 with Fig. 1 of [21], where the multilevel and the bias-corrected multilevel estimators were compared on data simulated from an ARMAX model only, one can see that the latter estimates demonstrate much larger errors. In particular, our estimate gives the best RMSE values of 0.0088 and 0.0115, whereas the best among those estimates shows about 0.04 and slightly less than 0.15 for $$\theta =0.25$$ and $$\theta =0.75$$, respectively.

In [8], the cycles, max-stable cycles, runs, K-gaps, disjoint blocks, and sliding blocks estimators were compared. For the first three estimators, the misspecification IMT test was applied to choose the threshold-run parameter. As an alternative, the quantiles $$q\in \{0.90,0.95,0.975\}$$ were used as thresholds for these estimators, with the run parameter estimated by the latter test. We can compare only the results related to the MM process with $$\theta =0.5$$, the ARMAX process with $$\theta =0.75$$, and the AR(1) process with $$\theta =0.5$$. The best bias of 0.002 and RMSE of 0.032 for the MM process were achieved by the max-stable cycles estimator $$\widehat{\theta }^{Cms}$$. For our estimator, the best absolute bias is 0.00074 and the RMSE is 0.0096 for the MM process. For the ARMAX process, the best were the cycles estimator with a bias of 0.003 and the max-stable cycles estimator with an RMSE of 0.032. Our estimator provides the best absolute bias 0.00036 and RMSE 0.0115.

The MA(2) process was studied in [20] with respect to the multilevel and bias-corrected multilevel blocks estimators with two specific weight functions. For MA(2) with $$\theta =0.5$$, the best absolute bias obtained there lies in the interval $$(2.5,3.0)\cdot 10^{-3}$$, and the MSE in $$(0.75,1.0)\cdot 10^{-3}$$. Our estimator provides the best absolute bias $$4.7\cdot 10^{-3}$$ and MSE $$5.15259\cdot 10^{-4}$$. For MA(2) with $$\theta =2/3$$, we find in [20] a bias of about 0.0025 and an MSE of $$0.7\cdot 10^{-3}$$. Our estimator shows 0.0227 and $$2.025\cdot 10^{-3}$$ for the bias and the MSE, respectively.

5 Conclusions

The discrepancy method, proposed originally for the smoothing of pdf estimates, is modified to select the threshold parameter u for EI estimation. We derive the asymptotic $$\chi ^2$$ distribution of a statistic related to the M-S statistic. This allows us to use its mode as the unknown discrepancy value $$\delta $$. Since the discrepancy method may be applied to different estimators of the EI, one can find other parameters, such as the block size for the blocks estimator of the EI, in the same way as the threshold. The accuracy of the intervals estimator (5), with u selected by the new discrepancy method (11), is assessed by a simulation study. The comparison with several EI estimators shows its good performance for heavy-tailed processes.

Acknowledgments

The author appreciates the partial financial support by the Russian Foundation for Basic Research, grant 19-01-00090.