© Springer Nature Switzerland AG 2020
M. La Rocca et al. (eds.), Nonparametric Statistics, Springer Proceedings in Mathematics & Statistics 339, https://doi.org/10.1007/978-3-030-57306-5_18

Asymptotics for Relative Frequency When Population Is Driven by Arbitrary Unknown Evolution

Silvano Fiorin (1)
(1) Dipartimento di Scienze Statistiche, via C. Battisti 241, 35121 Padova, Italy
 
 

Abstract

Strongly consistent estimates, based on relative frequency, are obtained for the probability of white balls inside a dichotomous urn when that probability is an arbitrary unknown continuous time-dependent function over a bounded time interval. The asymptotic behaviour of the relative frequency is studied in a nonstationary context by means of a Riemann-Dini type theorem for the strong law of large numbers of random variables with arbitrarily different expectations; furthermore, the theoretical results concerning the strong law of large numbers can be applied to estimate the mean function, of unknown form, of a general nonstationary process.

1 Introduction

Several different areas of statistics deal with an urn model containing white and black balls with probability p and $$1-p$$, respectively. In this very classical context a time-dependent component is introduced: p is replaced by $$p_0(t)$$, a time-varying quantity with $$0\le p_0(t)\le 1$$, in such a way that at any instant $$t\in [0,T]$$ a single observation is taken from the urn with white-ball probability $$p_0(t)$$, yielding the random variable Y(t) with $$P(Y(t)=1)=p_0(t)$$, $$P(Y(t)=0)=1-p_0(t)$$ and $$E(Y(t))=p_0(t)$$ $$\forall t\in [0,T]$$. This defines the nonstationary process
$$\begin{aligned} Y=\{Y(t):t\in [0,T]\} \end{aligned}$$
(1)
with mean function $$E(Y(t))=p_0(t)$$. The description of the above model is completed by introducing some reasonable assumptions:
A 1

We assume continuity for the usually unknown mean function $$p_0:[0,T]\mapsto [0,1]$$.

A 2

For any fixed pair of instants $$t_1,t_2 \in [0,T]$$, the random variables $$Y(t_1)$$ and $$Y(t_2)$$ are assumed to be independent.

This assumption is introduced in order to apply the Rajchman theorem (see [5]) or the classical results on the Strong Law of Large Numbers (SLLN) (see [4]). Strictly speaking, only pairwise uncorrelatedness is required for $$Y(t_1)$$ and $$Y(t_2)$$, but it is easily checked that for dichotomous variables uncorrelatedness implies independence. Moreover, independence is a very mild condition here: we may suppose that the total number of white and black balls in the urn is large enough that knowing $$Y(t_1)=1$$ or $$Y(t_1)=0$$ does not meaningfully modify the probability distribution of $$Y(t_2)$$.
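To fix ideas, the process (1) under A1 and A2 is straightforward to simulate. The following minimal sketch (in Python; the particular $$p_0$$ and the function names are illustrative assumptions, since in practice the mean function is unknown) draws one independent Bernoulli observation per time point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice of the (in practice unknown) mean function p_0 on [0, T];
# any continuous map into [0, 1] would do.
T = 1.0
def p0(t):
    return 0.5 + 0.4 * np.sin(2 * np.pi * t / T)

def draw_Y(times):
    """One Bernoulli observation Y(t) per time point, independent across
    times (assumption A2), with P(Y(t) = 1) = p_0(t)."""
    times = np.asarray(times)
    return (rng.random(times.shape) < p0(times)).astype(int)

# Each instant yields at most one observation of the urn.
ts = np.linspace(0.0, T, 5)
print(draw_Y(ts))   # e.g. array([1, 1, 0, 0, 1])
```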

The main purpose of this paper is twofold:
  1. to study the asymptotic behaviour of the relative frequency in a nonstationary context;
  2. to estimate the unknown function $$p_0$$, i.e. the mean function $$p_0(t)=E(Y(t))$$ of the nonstationary process (1), which is an arbitrary continuous map from $$[0,T]$$ into $$[0,1]$$.

The urn evolution affects sampling; for instance, if the number of observations n is large, a non-negligible time interval is needed to collect the n observations, which therefore cannot all be values taken by the same random variable. Then, for the sake of simplicity, we assume that any r.v. Y(t) may be observed at most once. The point of view adopted here is thus characterized by a strong nonstationarity, and the consistent estimation of the mean $$m(t_0)$$ at a fixed time $$t_0$$ may appear a very hard objective.

An approach to the estimation of the mean function $$m(\cdot )$$ of a nonstationary process was given by M. B. Priestley (see [10], p. 587, and [11], p. 140) when the form of m is known; the case of a polynomial function in t is suggested there. Conversely, with no information on the form of m we obviously cannot construct a consistent estimate of it. The approach adopted here is quite different from the classical methods of time series analysis; the only information available on m is its continuity over $$[0,T]$$, and no approximation of m by continuous functions of a known form is introduced. The estimation technique involves the process (1), which is a specific instance of nonstationarity, but the theoretical results given in the last section hold true for a general nonstationary process. The case (1) is only a concrete example of a process having no regularity properties; nevertheless, continuity of the mean function m is a reasonable and not restrictive assumption, compatible with an arbitrary but not abrupt evolution of the composition of the urn.

Concerning the estimation problem for the mean function $$m(\cdot )$$ of a nonstationary process, some well-known approaches are available in the literature, for instance, the smoothing spline estimation of [13] or nonparametric regression estimation as in [7] and [9]. These classical approaches, following the sieves technique, need the first $$k_n$$ functions belonging to a basis of a vector space, and the usual assumptions imposed on the smooth function $$m(\cdot )$$ concern the derivatives $$m'$$, $$m''$$ and so on. Thus the estimation procedure developed in this paper may be seen as an alternative method; only continuity is assumed for $$m(\cdot )$$ and the sieves technique is not used.

The answer to the above questions is the relative frequency
$$\begin{aligned} \frac{1}{n}\sum _{j=1}^nY(t_j) \end{aligned}$$
(2)
where $$\{t_j:j=1,\ldots ,n\}$$ are the first n observation times of a sequence $$\{t_j:j\ge 1\} \subset [0,T]$$, and the main purpose is that of obtaining consistent estimates of $$m(t)=p_0(t)$$ via almost sure convergence of the sequence (2). The SLLN is then the theoretical tool needed in the analysis below, but the classical approach based on the zero-mean r.v.’s $$(Y(t_j)-p_0(t_j))$$, i.e.
$$\begin{aligned} \frac{1}{n}\sum _{j=1}^n(Y(t_j)-p_0(t_j)) \rightarrow 0 \text { a.s.} \end{aligned}$$
(3)
is not enough; in fact, we need convergence of (2), which involves the non-zero-mean r.v.’s $$Y(t_j)$$. This argument, investigated by Fiorin [8], is now improved with the help of new results given in Sect. 5.
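The gap between (3) and (2) is easy to visualize numerically. In the following minimal sketch (the particular $$p_0$$ and the two time designs are illustrative assumptions) the centered average (3) vanishes for any design, while the relative frequency (2) settles on a limit that depends entirely on how the observation times are distributed over $$[0,T]$$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1.0
p0 = lambda t: 0.5 + 0.4 * np.sin(2 * np.pi * t / T)   # illustrative p_0

n = 200_000
designs = {
    "times piled up near 0": T * rng.random(n) ** 4,
    "times spread uniformly": T * rng.random(n),
}
for name, ts in designs.items():
    Y = (rng.random(n) < p0(ts)).astype(float)
    centered = np.mean(Y - p0(ts))   # sequence (3): -> 0 for every design
    rel_freq = Y.mean()              # sequence (2): limit depends on the design
    print(f"{name}: centered={centered:+.4f}  rel_freq={rel_freq:.4f}")
```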

Nevertheless, applying the usual SLLN to study the asymptotic behaviour of (2) is not a trivial step, and several problems arise concerning the process (1). The family of r.v.’s $$\{Y(t_j):j\ge 1\}$$ is not a stationary process, so the classical ergodic theory (see, for instance, Chap. 3 in [2]), based on a stationary probability distribution over $$R^\infty $$ and a measure-preserving transformation, cannot be applied. Analogously, the generalizations of ergodic theory such as the Dunford and Schwartz pointwise ergodic theorem (see [6], p. 675) or the Chacon and Ornstein theorem [3] cannot be applied to our problem. Laws of large numbers for random functions cannot be adapted to the above problem either; taking, for instance, the Ranga Rao law for D[0, 1]-valued r.v.’s [12], the main argument involves observable trajectories inside the Skorohod space D[0, 1] of functions with discontinuities of the first kind only; the trajectories of process (1), which include arbitrary functions taking only the values 0 and 1, thus do not define a random element of D[0, T]. Moreover, let us observe that, because of the possible discontinuity at any point t, observing a trajectory over the whole interval [0, T], and hence any law of large numbers based on trajectories, is too hard a purpose. Consequently, the asymptotic arguments concern the sequence (2), where the number of observed r.v.’s $$Y(t_j)$$ tends to infinity.

The convergence of (2) is studied via the sequence $$\{E(Y(t_j))=p_0(t_j):j\ge 1 \}$$ and permutations (i.e. bijections) $$\pi :N \rightarrow N$$; in fact, once a permutation $$\pi $$ is introduced, the possible almost sure limit of
$$\begin{aligned} \frac{1}{n}\sum _{j=1}^nY(t_{\pi (j)}) \end{aligned}$$
(4)
depends on $$\pi $$. If $$\{P_{\pi n}^0\}$$ is the sequence of probability measures where each $$P_{\pi n}^0$$ assigns mass $$\frac{1}{n}$$ to each point of $$\{p_0(t_{\pi (j)}):j=1,\ldots ,n\}$$, then the weak or vague convergence of the sequence $$\{P_{\pi n}^0\}$$ to a probability measure $$P^0$$ implies almost sure convergence of (4) to the limit $$\int _{0}^{1}I(v) dP^0(v)$$, where I(v) is the identity map over $$[0,1]$$ and $$P^0$$ depends on the sequence $$\{Y(t_j):j\ge 1\}$$ and on the permutation $$\pi $$. All the analysis below rests on the possibility of finding a permutation $$\pi $$ such that the convergence of (4) is driven to a limit $$\int _{0}^{1} I(v) dP^0(v)$$ where $$P^0$$ is a previously chosen probability measure over $$[0,1]$$; from a theoretical point of view this is a result on the SLLN (4) which is the analogue of the well-known Riemann-Dini theorem for conditionally (but not absolutely) convergent real series. From an operational point of view, the strongly consistent estimates are the result of an experimental design based on choosing (a constructive sketch of the rearrangement idea is given after the list below):
  (I) the sequence of observation times $$\{t_j:j\ge 1\} \subset [0,T]$$;
  (II) the permutation $$\{t_{\pi (j)}:j\ge 1\}$$.
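The following sketch illustrates the Riemann-Dini flavour of the construction on a purely deterministic example. It is not the constructive procedure of [8]; it is a greedy stand-in, under the assumption that the given values are dense in $$[0,1]$$: at each step it picks the unused value closest to a low-discrepancy target drawn from the chosen limit measure $$P^0$$ (the names `vdc` and `driving_permutation` are illustrative).

```python
import numpy as np

def vdc(j, base=2):
    """Van der Corput low-discrepancy sequence in [0, 1)."""
    v, denom = 0.0, 1.0
    while j > 0:
        j, rem = divmod(j, base)
        denom *= base
        v += rem / denom
    return v

def driving_permutation(values, target_quantile, n):
    """Greedy rearrangement: pick values so that their empirical measure
    tracks the target distribution with quantile function `target_quantile`."""
    values = np.asarray(values)
    used = np.zeros(len(values), dtype=bool)
    perm = []
    for j in range(1, n + 1):
        goal = target_quantile(vdc(j))   # targets are asymptotically P^0-distributed
        idx = int(np.argmin(np.where(used, np.inf, np.abs(values - goal))))
        used[idx] = True
        perm.append(idx)
    return np.array(perm)

# Demo: a sequence dense in [0, 1], driven toward P^0 = uniform on [0.2, 0.6].
vals = np.array([vdc(j, base=3) for j in range(1, 4001)])
perm = driving_permutation(vals, lambda u: 0.2 + 0.4 * u, n=1000)
print(vals[perm].mean())   # ~ 0.4, the mean of P^0
```

Here the rearranged empirical mean approaches $$\int_0^1 I(v)\,dP^0(v)=0.4$$, the deterministic analogue of driving (4) to a chosen limit.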

2 Convergence Elements

If the observation times $$\{t_j:j\ge 1\}$$ are given jointly with the observable r.v.’s $$\{Y(t_j):j\ge 1\}$$, an intuitive approach to studying the almost sure convergence of (2) is suggested by the elementary equality
$$\begin{aligned} \frac{1}{n}\sum _{j=1}^nY(t_j)=\frac{1}{n}\sum _{j=1}^n(Y(t_j)-E(Y(t_j)))+\frac{1}{n}\sum _{j=1}^nE(Y(t_j)); \end{aligned}$$
(5)
if the $$Y(t_j)$$’s are pairwise uncorrelated and their second moments have a common bound (see [5]), then the a.s. convergence to 0 of $$\frac{1}{n}\sum _{j=1}^n(Y(t_j)-E(Y(t_j)))$$, jointly with the convergence to a limit L of the deterministic sequence
$$\begin{aligned} \frac{1}{n}\sum _{j=1}^nE(Y(t_j)) \end{aligned}$$
(6)
imply that (2) is a.s. convergent to the limit L. Thus the argument of the analysis below is the possible convergence to some limit L of the sequence (6). We then write (6) as an integral,
$$\begin{aligned} \frac{1}{n}\sum _{j=1}^nE(Y(t_j))=\int _0^1I(x) dP_n(x), \end{aligned}$$
(7)
where I(x) is the identity map and $$P_n$$ is the probability measure which assigns weight $$\frac{1}{n}$$ to each point of $$\{E(Y(t_j)):j=1,\ldots ,n\}$$, and study the possible limit of the sequence of integrals (7) through the weak or vague convergence of the measures $$P_n$$; in fact, by the definition of weak convergence of measures, if the $$P_n$$’s are weakly convergent to P then
$$\begin{aligned} \lim _{n \rightarrow \infty } \int _0^1 I(x) dP_n(x) = \int _0^1 I(x) dP(x). \end{aligned}$$
(8)
Nevertheless, the weak convergence of the $$P_n$$’s is not so easy to obtain, since the expectations $$\{E(Y(t_j)):j\ge 1\}$$ form an arbitrary deterministic sequence; weak convergence is therefore achieved via permutations.
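As a minimal numerical check of (6)-(8) (assuming an illustrative $$p_0$$ and equally spaced observation times, a benign case in which the $$P_n$$'s already converge weakly without any rearrangement), the integrals (7) stabilize at $$\int_0^1 I\,dP=\frac{1}{T}\int_0^T p_0(t)\,dt$$:

```python
import numpy as np

T = 1.0
p0 = lambda t: 0.5 + 0.4 * np.sin(2 * np.pi * t / T)   # illustrative E(Y(t))

# P_n puts mass 1/n on each expectation E(Y(t_j)) = p_0(t_j); the integral (7)
# of the identity map against P_n is just the average of those expectations.
for n in (10, 100, 10_000):
    tj = np.linspace(0.0, T, n, endpoint=False)         # equally spaced times
    print(n, p0(tj).mean())   # -> 0.5 = (1/T) * integral of p_0 over [0, T]
```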

3 A General SLLN via Permutations

A permutation is any bijection $$\pi :N \rightarrow N$$ over the naturals N; it induces the sequence of random variables
$$\begin{aligned} \{Y(t_{\pi (j)}):j\ge 1\} \text { with expectations } \{E(Y(t_{\pi (j)})):j\ge 1 \} \end{aligned}$$
(9)
and thus, for any given natural n, $$P_{\pi n}$$ is defined as the probability measure giving mass $$\frac{1}{n}$$ to each point of $$\{E(Y(t_{\pi (j)})):j=1,\ldots ,n\}$$. The main theoretical result shows the technique of finding a permutation $$\pi $$ such that the sequence $$P_{\pi n}$$ is weakly convergent to an assigned probability measure P. For a rigorous proof of the statement below see Theorem 7 in [8].
Theorem 1
For any assigned sequence of constants $$\{E(Y(t_j)):j\ge 1\} \subset [0,1]$$ there exists a class $$\mathscr {M}$$ of probability measures over $$[0,1]$$ such that, for each given $$P \in \mathscr {M}$$, a corresponding permutation $$\pi $$ can be constructed for which the sequence $$P_{\pi n}$$ is weakly (or vaguely) convergent to P, and then
$$\begin{aligned} \lim _{n \rightarrow \infty } \int _0^1 I(x) dP_{\pi n}(x) = \int _0^1 I(x) dP(x) \text { and} \end{aligned}$$
$$\begin{aligned} \lim _{n \rightarrow \infty }\frac{1}{n}\sum _{j=1}^nY(t_{\pi (j)})= \int _0^1 I(x) dP(x) \text { almost surely}. \end{aligned}$$
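A self-contained numerical illustration of the theorem's conclusion, under the assumption that the expectations are dense in $$[0,1]$$; the greedy rearrangement from the Sect. 1 sketch is inlined here as a stand-in for the construction of [8], and the target P, the sizes and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def vdc(j, base=2):
    """Van der Corput low-discrepancy sequence in [0, 1)."""
    v, denom = 0.0, 1.0
    while j > 0:
        j, rem = divmod(j, base)
        denom *= base
        v += rem / denom
    return v

# Illustrative expectations E(Y(t_j)) (assumed dense in [0, 1]) and target
# P = uniform on [0.2, 0.6], whose mean is 0.4.
exp_vals = np.array([vdc(j, base=3) for j in range(1, 20001)])
used = np.zeros(len(exp_vals), dtype=bool)
perm = []
for j in range(1, 5001):
    goal = 0.2 + 0.4 * vdc(j)                      # quantile of P at a vdc point
    idx = int(np.argmin(np.where(used, np.inf, np.abs(exp_vals - goal))))
    used[idx] = True
    perm.append(idx)

means = exp_vals[perm]                             # E(Y(t_pi(j))), j = 1..n
Y = (rng.random(len(means)) < means).astype(float) # simulate the Y(t_pi(j))'s
print(Y.mean())                                    # ~ int_0^1 I(x) dP(x) = 0.4
```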
Some comments and remarks may help to clarify the meaning of the above result:
  (a) The final goal is not only the construction of a permutation $$\pi $$ making the $$P_{\pi n}$$'s a weakly convergent sequence, but also that of driving the convergence to a chosen limit measure belonging to the class $$\mathscr {M}$$.
  (b) The definition of the class $$\mathscr {M}$$ is, of course, a central and rather technical argument: for details and a rigorous treatment see the construction leading to Definition 6 in [8].
  (c) The main theorem may appear as an analogue of the well-known Riemann-Dini theorem for convergent real series: both proofs clearly involve permutations, but the technique adopted in proving the above main result is a constructive one.
  (d) The above result is a generalization of the classical SLLN for a sequence of r.v.'s $$Y_j$$ having a common finite expectation $$\mu =E(Y_j), \forall j\ge 1$$. By the elementary equality
    $$\begin{aligned} \frac{1}{n}\sum _{j=1}^nY(t_j)=\frac{1}{n}\sum _{j=1}^n(Y(t_j)-E(Y(t_j)))+\frac{1}{n}\sum _{j=1}^nE(Y(t_j)) \end{aligned}$$
    and if the convergence
    $$\begin{aligned} \lim _{n \rightarrow \infty }\frac{1}{n}\sum _{j=1}^n(Y(t_j)-E(Y(t_j)))=0 \text { a.s.} \end{aligned}$$
    holds true, an easy direct comparison is possible:
    1. In the standard case, when $$E(Y(t_j))=\mu , \forall j\ge 1$$, we trivially have
      $$\frac{1}{n}\sum _{j=1}^nE(Y(t_j))=\mu , \forall n.$$
      This means that for each n the weight 1 is assigned to the value $$\mu $$; hence the probability measures $$P_{\pi n}=\delta _{\mu }$$ are invariant with respect to any given permutation $$\pi $$ and the $$P_{\pi n}$$'s are weakly convergent to the measure $$P=\delta _{\mu }$$.
    2. In the general case, when the expectations $$\{E(Y(t_j)):j\ge 1\}\subset [0,1]$$ are arbitrarily different values,
      $$\begin{aligned} \frac{1}{n}\sum _{j=1}^nE(Y(t_{\pi (j)}))=\int _0^1 I(x) dP_{\pi n} \end{aligned}$$
      depends on the sequence $$\{E(Y(t_j)):j\ge 1\}$$ and on $$\pi $$, and the technique based on weak convergence of the $$P_{\pi n}$$'s is a generalization of the standard case.

Moreover, the limit in the SLLN is written as an integral $$\int _0^1 I(x) dP(x)$$, i.e. as an expectation with respect to the probability measure P which is the weak limit of the $$P_{\pi n}$$'s; thus P is determined through $$\pi $$, independently of the probability distribution of the r.v.'s $$Y(t_j)$$.

Finally, let us observe that the main theorem cannot be applied directly to find $$\pi $$, because the proof technique is fully based on the knowledge of the values $$E(Y(t_j))$$, which are precisely the object of estimation.

4 Estimating E(Y(t))

Let us choose as observation times any sequence $$\{t_j:j\ge 1\}$$ which is dense in $$[0,T]$$, so that Theorem 1 can be applied to $$\{t_j:j\ge 1\}$$; because of the density of the $$t_j$$'s, the class $$\mathscr {M}$$ of the weak limit measures contains all the absolutely continuous probability measures over $$[0,T]$$. Thus $$P_U\in \mathscr {M}$$, where $$P_U$$ denotes the uniform probability measure over $$[0,T]$$ with density
$$\begin{aligned} f_U(t)=\frac{1}{T}\quad \forall \, t\in [0,T]\end{aligned}$$
and, applying the main theorem, a permutation $$\pi $$ can be found such that $$P_{\pi n}$$, which assigns weight $$\frac{1}{n}$$ to each point of $$\{t_{\pi (j)}:j=1,\ldots ,n\}$$, is weakly convergent to $$P_U$$. The continuity of the unknown function $$p_0(t)=E(Y(t))$$ at each $$t\in [0,T]$$ preserves weak convergence for the induced measures over $$[0,1]$$: $$p_0(P_{\pi n})$$ is then weakly convergent to $$p_0(P_U)$$, where $$p_0(P_{\pi n})$$ assigns weight $$\frac{1}{n}$$ to each point of
$$\begin{aligned} \{p_0(t_{\pi (j)})=E(Y(t_{\pi (j)})): j=1,\ldots ,n\} \end{aligned}$$
and then, by the mean value theorem for integrals, the following limits hold:
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{j=1}^nE(Y(t_{\pi (j)}))=\lim _{n\rightarrow \infty }\int _0^1I(x) dp_0(P_{\pi n})= \end{aligned}$$
$$\begin{aligned} =\int _0^1 I(x) dp_0(P_U)=\int _0^T p_0(t) dP_U =\frac{1}{T} \int _0^T p_0(t) dt=p_0(\underline{t}) \end{aligned}$$
for some point $$\underline{t}\in [0,T]$$, and
$$\begin{aligned} \lim _{n \rightarrow \infty }\frac{1}{n}\sum _{j=1}^nY(t_{\pi (j)})=p_0(\underline{t})\text { almost surely.} \end{aligned}$$
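A minimal simulation of this design follows (with an illustrative, in practice unknown, $$p_0$$; the van der Corput enumeration is an assumed stand-in for the permuted times $$\{t_{\pi (j)}\}$$, since it is dense in $$[0,T]$$ and its empirical measure converges weakly to $$P_U$$):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1.0
p0 = lambda t: 0.5 + 0.4 * np.sin(2 * np.pi * t / T)   # unknown in practice

def vdc(j, base=2):
    """Van der Corput sequence: dense in [0, 1), empirically uniform."""
    v, denom = 0.0, 1.0
    while j > 0:
        j, rem = divmod(j, base)
        denom *= base
        v += rem / denom
    return v

n = 100_000
times = T * np.array([vdc(j) for j in range(1, n + 1)])   # t_{pi(j)}, j = 1..n
Y = (rng.random(n) < p0(times)).astype(float)             # one draw per instant

# Relative frequency -> (1/T) * int_0^T p_0(t) dt = p_0(t_bar), here 0.5
print(Y.mean())
```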
An analogous version of the above result holds true for any assigned interval $$(a,b]\subset [0,T]$$. Using the same permutation $$\pi $$ as above, such that the $$P_{\pi n}$$'s are weakly convergent to $$P_U$$ over $$[0,T]$$, for each fixed n we collect from the set $$\{t_{\pi (j)}:j=1,\ldots ,n\}$$ all the $$t_{\pi (j)}$$'s falling into $$(a,b]$$, i.e. we define the set
$$\begin{aligned} A(\pi ,n,(a,b])=\{t_{\pi (j)} \in (a,b]:j=1,\ldots ,n\} \end{aligned}$$
and if $$n(a,b]$$ denotes its cardinality, the following statement holds true (see Theorem 4 in [8] for a complete proof).
Theorem 2
The sequence of r.v.’s
$$\begin{aligned} \frac{1}{n(a,b]}\sum _{t_{\pi (j)}\in A(\pi ,n,(a,b])}Y(t_{\pi (j)}), \end{aligned}$$
when $$n \rightarrow \infty $$, is a strongly consistent estimate of $$p_0(\underline{t})$$ for some point $$\underline{t} \in [a,b]$$.
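Combining Theorem 2 with a finite partition of $$(0,T]$$ (see Remark 1 in the next section) yields a piecewise reconstruction of $$p_0$$. A minimal sketch, under the same illustrative assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 1.0, 200_000
p0 = lambda t: 0.5 + 0.4 * np.sin(2 * np.pi * t / T)    # unknown in practice

def vdc(j, base=2):
    v, denom = 0.0, 1.0
    while j > 0:
        j, rem = divmod(j, base)
        denom *= base
        v += rem / denom
    return v

times = T * np.array([vdc(j) for j in range(1, n + 1)])  # empirically uniform
Y = (rng.random(n) < p0(times)).astype(float)

edges = np.linspace(0.0, T, 11)                 # ten subintervals (a, b]
for a, b in zip(edges[:-1], edges[1:]):
    in_ab = (times > a) & (times <= b)          # the set A(pi, n, (a, b])
    est = Y[in_ab].mean()                       # local relative frequency
    print(f"({a:.1f}, {b:.1f}]: est = {est:.3f}, p0(midpoint) = {p0((a+b)/2):.3f}")
```

Each local average estimates $$p_0(\underline{t})$$ for some $$\underline{t}$$ in the corresponding subinterval; by continuity of $$p_0$$, refining the partition localizes the estimate around any fixed $$t_0$$.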

5 Remarks

  1. Theorem 2 may be applied, at the same time, to several different subintervals of $$[0,T]$$; for instance, to all the subintervals belonging to a finite partition of $$(0,T]$$.
  2. The policy of choosing the observation times $$\{t_j:j\ge 1\}$$ as a dense subset of $$[0,T]$$ is a technique common to several areas of statistical inference. In this context it can be easily checked that
    (a) this choice derives directly from the evolution of the nonstationary process $$\{Y(t):t\in [0,T]\}$$; in fact, at most one observation is possible for any r.v. Y(t). Thus increasing the number of observations implies choosing new $$t_j$$'s, and their density in $$[0,T]$$ ensures a good knowledge of the process.
    (b) The density of the $$t_j$$'s makes the use of permutations necessary; in fact, the sequence $$\frac{1}{n}\sum _{j=1}^nY(t_j)$$ has no meaning until an order is assigned for choosing the $$t_j$$'s. But the choice of $$\pi $$, as shown above, has a deep effect on the measures $$P_{\pi n}$$ and on the convergence.