1 Introduction
Due to its natural intermittency, the estimation of a non-uniform density can be described as a nonequispaced multiscale problem, especially when the density contains singularities. Indeed, when the number and the locations of the singularities are unknown, the estimation procedure is bound to go through all possible combinations of locations and intersingular distances. Also, since a given bandwidth in a kernel-based method may be too small in a region of low intensity and too large in a region of high intensity, a local choice of the bandwidth can be considered an instance of multiscale processing, where the bandwidth plays the role of a scale.
A popular class of multiscale methods in smoothing and density estimation is based on a wavelet analysis of the data. The classical wavelet approach for density estimation [3, 6] requires an evaluation of the wavelet basis functions at the observed data points, or otherwise a binning of the data into fine-scale intervals, defined by equispaced knots on which the wavelet transform can be constructed. The preprocessing for the equispaced (and possibly dyadic) wavelet analysis may induce some loss of detail about the exact values of the observations.
This paper works with a family of multiscale transforms constructed on nonequispaced knots. With these constructions, and taking the observations as knots, no information is lost at this stage of the analysis. The construction of wavelet transforms on irregular point sets is based on the lifting scheme [11, 12]. Given the transformation matrix that maps a wavelet approximation at one scale onto the approximation and offsets at the next coarser scale, the lifting scheme factorizes this matrix into a product of simpler, readily invertible operations. Based on the lifting factorization, there exist two main directions in the design of wavelets on irregular point sets. The first direction factorizes the refinement schemes of basis functions that are known to be refinable and that serve as the approximation basis, termed the scaling basis in a wavelet analysis. The wavelet basis for the offsets between successive scales is then constructed within the lifting factorization of the refinement equation, taking into account typical design objectives such as vanishing moments and control of variance inflation. Examples of such existing refinable functions are B-spline functions defined on nested grids of knots [8]. The second approach for the construction of wavelets on irregular point sets does not factorize a scheme into lifting steps. Instead, it uses an interpolating or smoothing scheme as a basic tool in the construction of a lifting step from scratch. Using interpolating polynomials leads to the Deslauriers–Dubuc refinement scheme [2, 4]. To this refinement scheme, a wavelet transform can be associated by adding a single lifting step, designed for vanishing moments and control of variance inflation, as in the case of factorized refinement schemes.

This paper follows the second approach, using local polynomial smoothing [5, Chapter 3] as a basic tool in a lifting scheme. For reasons explained in Sect. 2, the resulting Multiscale Local Polynomial Transform (MLPT) is no longer a wavelet transform in the strict sense, as it must be slightly overcomplete. Then, in Sect. 3, the density estimation problem is reformulated in such a way that it can easily be handled by an MLPT. Section 4 discusses aspects of sparse selection and estimation in the MLPT domain for data from a density estimation problem. In Sect. 5, the sparse selection is fine-tuned, using information criteria and defining the degrees of freedom in this context. Finally, Sect. 6 presents some preliminary simulation results and a further outlook.
2 The Multiscale Local Polynomial Transform (MLPT)
The forward transform repeats the following steps, mapping the approximation vector $s_j$ at scale $j$ onto a coarser approximation $s_{j-1}$ and detail coefficients $d_j$.
Subsampling, i.e., keep a subset of the current approximation vector, $s_{j-1} := J_j s_j$, with $J_j$ a submatrix of the identity matrix.
Prediction, i.e., compute the detail coefficients at scale $j$ as offsets from a prediction based on the subsample, $d_j := s_j - P_j s_{j-1}$.
Update the remaining approximation coefficients, $s_{j-1} := s_{j-1} + U_j d_j$. The idea is that $s_{j-1}$ can be interpreted as smoothed, filtered, or averaged values of $s_j$.
Before elaborating on the different steps of this decomposition, we develop the inverse transform by straightforwardly undoing the two lifting steps in reverse order.
Undo the update, using $s_{j-1} := s_{j-1} - U_j d_j$.
Undo the prediction, using $s_j = P_j s_{j-1} + d_j$.
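The following sketch implements one stage of such a lifted, expansive decomposition and its inverse in Python. The matrices J, P and U are crude placeholders chosen only to make the example run; they are not the local polynomial prediction of Sect. 2.1 or the update design of Sect. 2.2. The sketch merely verifies that undoing the two lifting steps in reverse order reconstructs the input exactly.

```python
import numpy as np

def forward_stage(s, J, P, U):
    """One stage of the lifted (Laplacian-pyramid-type) decomposition.
    J: (n_coarse x n_fine) subsampling matrix (rows of the identity),
    P: (n_fine x n_coarse) prediction matrix,
    U: (n_coarse x n_fine) update matrix."""
    s_sub = J @ s                # subsampling: keep a subset of the approximation
    d = s - P @ s_sub            # prediction: offsets w.r.t. the coarse prediction
    s_coarse = s_sub + U @ d     # update: smoothed coarse approximation
    return s_coarse, d

def inverse_stage(s_coarse, d, P, U):
    """Undo the two lifting steps in reverse order."""
    s_sub = s_coarse - U @ d     # undo the update
    return P @ s_sub + d         # undo the prediction

# Toy example; the operators below are arbitrary placeholders for the demo.
rng = np.random.default_rng(0)
n = 8
s = rng.standard_normal(n)
J = np.eye(n)[::2]                              # keep the even-indexed knots
P = np.repeat(np.eye(n // 2), 2, axis=0)        # crude nearest-coarse-knot prediction
U = 0.5 * J                                     # crude update
s_coarse, d = forward_stage(s, J, P, U)
print(np.allclose(inverse_stage(s_coarse, d, P, U), s))   # True: perfect reconstruction
```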
2.1 Local Polynomial Smoothing as Prediction
The prediction matrix $P_j$ has dimension $n_j \times n_{j-1}$. This expansive or redundant prediction is in contrast with lifting schemes for critically downsampled wavelet transforms, such as the Deslauriers–Dubuc or B-spline refinement schemes. In these schemes, the prediction step takes the form $d_j = \widetilde{J}_j s_j - \widetilde{P}_j J_j s_j$, where $\widetilde{J}_j$ is the subsampling operation complementary to $J_j$, and where the prediction matrix $\widetilde{P}_j$ has only $n_j - n_{j-1}$ rows. In the MLPT, such a critical downsampling, i.e., computing offsets in the complementary subsample only, would lead to fractal-like basis functions [7]. The omission of the complementary subsampling leads to slight redundancy, where $n$ data points are transformed into roughly $2n$ MLPT coefficients, at least if $n_{j-1}$ is approximately half of $n_j$ at each scale. The corresponding scheme is known in the signal processing literature as the Laplacian pyramid [1]. With an output size of $2n$, the MLPT is less redundant than the non-decimated wavelet transform (cycle spinning, stationary wavelet transform), which produces outputs of size $n\log_2(n)$. Nevertheless, the inverse MLPT shares with the non-decimated wavelet transform an additional smoothing occurring in the reconstruction after processing. This is because processed coefficients are unlikely to be an exact decomposition of an existing data vector. The reconstruction thus involves some sort of projection.
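To make the role of the expansive prediction matrix concrete, the sketch below fills $P_j$ row by row with the weights of a kernel-weighted local linear fit on the coarse-scale knots, evaluated at each fine-scale knot. The triangular kernel, the bandwidth h and the fallback for neighborhoods with too few coarse knots are assumptions made for this illustration, not the specific choices of the paper.

```python
import numpy as np

def local_linear_prediction(x_fine, x_coarse, h):
    """Fill an (n_fine x n_coarse) prediction matrix P: row i contains the weights
    of a kernel-weighted linear fit on the coarse knots, evaluated at x_fine[i]."""
    P = np.zeros((len(x_fine), len(x_coarse)))
    for i, x in enumerate(x_fine):
        w = np.maximum(1.0 - np.abs(x_coarse - x) / h, 0.0)   # triangular kernel
        if np.count_nonzero(w) < 2:                           # too few neighbors:
            w = np.ones_like(x_coarse)                        # fall back to a global fit
        X = np.column_stack([np.ones_like(x_coarse), x_coarse - x])
        W = np.diag(w)
        # weights of the local linear estimator at x: e1^T (X^T W X)^{-1} X^T W
        beta_weights = np.linalg.solve(X.T @ W @ X, X.T @ W)
        P[i, :] = beta_weights[0, :]                          # intercept = fitted value at x
    return P

x_coarse = np.sort(np.random.default_rng(1).uniform(0, 1, 10))
x_fine = np.sort(np.random.default_rng(2).uniform(0, 1, 20))
P = local_linear_prediction(x_fine, x_coarse, h=0.3)
# A local linear prediction reproduces linear functions exactly:
print(np.allclose(P @ (2 + 3 * x_coarse), 2 + 3 * x_fine))    # True
```

The final line checks that the prediction reproduces linear functions exactly, which is the property behind the two dual vanishing moments mentioned in Sect. 6.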
2.2 The Update Lifting Step
The second lifting step, the update $U_j$, serves multiple goals, leading to a combination of design conditions [8]. An important objective, especially in the context of density estimation, is to make sure that all detail basis functions in the reconstruction have zero integral. When this is the case, any processing that modifies the detail coefficients $d_j$, e.g., using thresholding, preserves the integral of the reconstruction, which is interesting if we want to impose that an estimation or approximation of a density function integrates to one. Another important goal of the update, leading to additional design conditions, is to control the variance propagation throughout the transformation. This prevents the noise from a single observation from proceeding unchanged all the way to the coarse scales.
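In discrete form, the zero-integral objective can be phrased with quadrature weights $\Delta_j$ attached to the fine-scale knots: since the detail coefficients contribute $(I - P_j U_j)\, d_j$ to the reconstruction, one asks that $\Delta_j^T (I - P_j U_j) = 0$. The sketch below computes a minimum-norm update matrix satisfying this single condition; the rank-one formula is an assumption made for the illustration and ignores the variance-control conditions mentioned above.

```python
import numpy as np

def zero_integral_update(P, delta_fine):
    """Minimum-norm update U with delta_fine^T (I - P U) = 0, so that changing the
    detail coefficients does not change the (approximate) integral delta_fine^T s_j
    of the reconstruction."""
    a = P.T @ delta_fine                      # vector of length n_coarse
    return np.outer(a, delta_fine) / (a @ a)  # rank-one solution of a^T U = delta_fine^T

# Toy check on arbitrary knots (the local linear prediction above would also work).
rng = np.random.default_rng(3)
n_fine, n_coarse = 12, 6
P = rng.standard_normal((n_fine, n_coarse))
delta_fine = rng.uniform(0.5, 1.5, n_fine)    # quadrature weights (e.g., interval widths)
U = zero_integral_update(P, delta_fine)

s_coarse = rng.standard_normal(n_coarse)
d = rng.standard_normal(n_fine)
rec = lambda d: P @ (s_coarse - U @ d) + d    # inverse MLPT stage of Sect. 2
# The integral of the reconstruction is unaffected by the detail coefficients:
print(np.allclose(delta_fine @ rec(d), delta_fine @ rec(np.zeros(n_fine))))   # True
```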
2.3 The MLPT Frame
2.4 The MLPT on Highly Irregular Grids
The regression formulation of the density estimation problem in Sect. 3 will lead to regression on highly irregular grids, that is, grids that are far more irregular than ordered observations from a random variable. On these grids, it is not possible to operate at fine scales, even if these scales are taken a bit wider than in the equidistant case, as discussed in Sect. 2.1. In order to cope with the irregularity, the fine scales would have to be so wide that fine details are lost, and no asymptotic result would be possible. An alternative solution, adopted here, is to work with dyadic scales, but to process only those coefficients that have sufficiently many nearby neighbors within the current scale. Coefficients in sparsely sampled neighborhoods are forwarded to coarser scales. The implementation of such a scheme requires the introduction of a smooth transition between active and non-active areas at each scale [9].
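A minimal sketch of this idea follows, with the neighbor-count rule and the linear transition between non-active (weight 0, forwarded to a coarser scale) and active (weight 1) coefficients as assumptions for the illustration; the actual scheme is described in [9].

```python
import numpy as np

def activity_weights(x, h, k_min=2, k_full=4):
    """For each knot, count the neighbors within bandwidth h at this scale and map
    the count to a weight in [0, 1]: 0 = forward to a coarser scale, 1 = fully
    processed at this scale, with a linear transition in between."""
    counts = np.array([np.sum(np.abs(x - xi) <= h) - 1 for xi in x])
    return np.clip((counts - k_min) / (k_full - k_min), 0.0, 1.0)

x = np.sort(np.random.default_rng(4).uniform(0, 1, 30) ** 3)   # highly irregular knots
w = activity_weights(x, h=0.02)
print(np.round(w, 2))   # close to 1 in densely sampled regions, 0 in sparse ones
```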
3 A Regression Model for Density Estimation
Let $X_1, X_2, \ldots, X_n$ be a sample of independent realizations from an unknown density $f(x)$ on a bounded interval, which we take, without loss of generality, to be $[0, 1]$. The density function has an unknown number of singularities, i.e., points where $f(x) \to \infty$, as well as other discontinuities.
As in [9], we consider the spacings $\Delta_i = X_{(i)} - X_{(i-1)}$, i.e., the differences between the successive ordered observations $X_{(i)}$. Then, by the mean value theorem, there exists a value $\xi_i \in (X_{(i-1)}, X_{(i)})$ for which $F(X_{(i)}) - F(X_{(i-1)}) = f(\xi_i)\,\Delta_i$, where $F(x)$ denotes the cumulative distribution function. Unfortunately, the value of $\xi_i$ cannot be used as such in the subsequent asymptotic result, due to technical issues in the proof. Nevertheless, for a fairly free choice of $x_i$, close to $\xi_i$, the theorem below provides a nonparametric regression of $\Delta_i$ on $x_i$. For details on the proof, we refer to [9].
Let $f(x)$ be an almost everywhere twice continuously differentiable density function on $[0, 1]$. Define $I_{\delta,M}$ as the set where $f(x) \geq \delta$ and $|f''(x)| \leq M$, with $\delta$ and $M$ arbitrary, strictly positive real numbers. Then there exist values $x_i \in [X_{(i-1)}, X_{(i)}]$, so that with probability one, for all intervals $[X_{(i-1)}, X_{(i)}] \subset I_{\delta,M}$, the rescaled spacings $n\Delta_i$ are asymptotically independent and exponentially distributed with expected value $1/f(x_i)$.
We thus consider a model with exponentially distributed response variables $n\Delta_i$ and a vector of parameters $\boldsymbol{\mu}$ with components $\mu_i = 1/f(x_i)$, for which we propose a sparse MLPT model $\boldsymbol{\mu} = \widetilde{\boldsymbol{X}}\boldsymbol{\beta}$, with $\widetilde{\boldsymbol{X}}$ the inverse MLPT matrix defined on the knots in $\boldsymbol{x} = (x_1, \ldots, x_n)$.
The formulation of the density estimation problem as a sparse regression model induces no binning or any other loss of information. On the contrary, the information on the values of the observations is duplicated: a first, approximate copy can be found in the covariate values $x_i$. A second copy defines the design matrix. The duplication prevents loss of information when, in subsequent steps, some sort of binning is performed on the response variables.
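In code, the reformulation takes only a few lines: sort the observations, compute the spacings, and use the rescaled spacings as approximately exponential responses against covariate values close to the observations. The midpoint choice for the covariates and the toy density below are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
X = rng.beta(0.5, 1.0, n)             # toy density f(x) = 0.5 x^{-1/2} on [0, 1],
                                      # with a singularity at x = 0
Xs = np.sort(X)
spacings = np.diff(Xs)                # Delta_i = X_(i) - X_(i-1)
response = n * spacings               # approximately exponential with mean 1/f(x_i)
x_cov = (Xs[1:] + Xs[:-1]) / 2        # covariates close to the (unknown) xi_i

# Crude check of E[n Delta_i] ~ 1/f(x_i), away from the singular point:
mask = (x_cov > 0.4) & (x_cov < 0.6)
print(np.mean(response[mask]), np.mean(1.0 / (0.5 * x_cov[mask] ** -0.5)))
```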
4 Sparse Variable Selection and Estimation in the Exponential Regression Model
- (C1)
the expected value of the response $n\Delta_i$ is close to $1/f(x_i)$, so that the sparse regression model of Sect. 3 holds, at least approximately;
- (C2)
the MLPT decomposition is sparse,
- (C3)
the MLPT decomposition of the errors has no outliers, i.e., no heavy-tailed distributions.
5 Fine-Tuning the Selection Threshold
The estimator applies a threshold on the MLPT of the rescaled spacings $n\Delta_i$. The input is correlated and heteroscedastic, while the transform is not orthogonal. For all these reasons, the errors on the MLPT coefficients are correlated and heteroscedastic. In an additive normal model, where variance and mean are two separate parameters, the threshold would be taken proportional to the standard deviation. In the context of the exponential model, with approximate variance function $V(\mu) = \mu^2$, coefficients with large variances tend to carry more information, i.e., they have a larger expected value as well. As a result, there is no argument for a threshold depending linearly on the local standard deviation. This paper adopts a single threshold for all coefficients to begin with. Current research also investigates the use of block thresholding methods.
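As a concrete placeholder for this selection step, the snippet below applies one and the same soft threshold to the detail coefficients at all scales, leaving the coarsest approximation (handled separately) untouched. The soft-thresholding rule itself is a common choice and an assumption here; block thresholding would replace the coefficientwise rule by a blockwise one.

```python
import numpy as np

def soft_threshold(d, t):
    """Shrink a vector of detail coefficients towards zero by the threshold t."""
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

def threshold_details(details, t):
    """Apply one single threshold t to the detail coefficients at all scales."""
    return [soft_threshold(d, t) for d in details]

# Example: details from three scales, all thresholded with the same t.
details = [np.array([0.1, -2.3, 0.4]), np.array([1.7, -0.2]), np.array([3.0])]
print(threshold_details(details, t=0.5))
```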
The threshold, or any other selection parameter, can be fine-tuned by optimizing the estimated distance between the data generating process and the model under consideration. The estimation of that distance leads to an information criterion. This paper works with Akaike's Information Criterion for the estimation of the Kullback–Leibler distance. As data generating process, we consider the (asymptotic) independent exponential model for the spacings, and not the asymptotic additive, heteroscedastic normal model for the MLPT coefficients. This choice is motivated by the fact that a model for the MLPT coefficients is complicated, as it should account for the correlation structure, while the spacings are nearly independent. Moreover, fine-tuning w.r.t. the spacings is not affected by the loss of information in the computation of the coefficients.
An estimator of the degrees of freedom in this setting is developed in [9].
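The sketch below illustrates the principle of the fine-tuning on a toy example: for a grid of thresholds, fit the exponential means, count the retained parameters as a rough stand-in for the degrees of freedom, and minimize Akaike's criterion for the independent exponential model. The piecewise-constant fit and the simple parameter count are assumptions made for this illustration; they stand in for the MLPT fit and for the degrees-of-freedom estimator of [9].

```python
import numpy as np

def exp_aic(y, mu_hat, df):
    """Akaike's criterion for independent exponential responses with means mu_hat,
    using an estimate df of the degrees of freedom of the fit."""
    loglik = np.sum(-np.log(mu_hat) - y / mu_hat)
    return -2.0 * loglik + 2.0 * df

rng = np.random.default_rng(6)
mu_true = np.repeat([0.5, 2.0, 1.0, 4.0], 64)   # piecewise-constant exponential means
y = rng.exponential(mu_true)

def fit_with_threshold(y, t, block=8):
    """Average y over blocks, then keep only block means that deviate from the
    overall mean by more than t (a crude stand-in for MLPT thresholding)."""
    means = y.reshape(-1, block).mean(axis=1)
    overall = y.mean()
    keep = np.abs(means - overall) > t
    mu_hat = np.where(keep, means, overall)
    return np.repeat(mu_hat, block), int(keep.sum()) + 1   # df ~ number of fitted parameters

thresholds = np.linspace(0.0, 3.0, 31)
aics = [exp_aic(y, *fit_with_threshold(y, t)) for t in thresholds]
print("selected threshold:", thresholds[int(np.argmin(aics))])
```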
6 Illustration and Concluding Discussion
Ongoing research concentrates on motivating the choices of the tuning parameters in the proposed data transformation and processing. In particular, the data transformation depends on the choice of the finest-resolution bandwidth, the degree of the local polynomial in the prediction step, and the precise design of the update step. Also, the Unbalanced Haar prefilter is parametrized by a fine scale. Processing parameters include the threshold value, which is selected using the AIC approach of Sect. 5, and the sizes of the blocks in the block thresholding procedure.
For the result in Fig. 2, the MLPT adopted a local linear prediction step. In the wavelet literature, the transform is then said to have two dual vanishing moments, meaning that all detail coefficients of a linear function are zero. The MLPT for the figure also includes an update step designed for two primal vanishing moments, meaning that $\int_0^1 x^q\,\psi_{j,k}(x)\,dx = 0$ for $q = 0$ and $q = 1$, where $\psi_{j,k}$ denotes the detail basis functions. Block sizes were set to one, i.e., classical thresholding was used.
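The primal vanishing moment conditions can be imposed with a small extension of the zero-integral sketch of Sect. 2.2: with quadrature weights $\Delta_j$ and knots $x_j$, require $(\Delta_j \odot x_j^q)^T (I - P_j U_j) = 0$ for $q = 0$ and $q = 1$. The minimum-norm solution below is again an assumption made for the illustration.

```python
import numpy as np

def update_with_primal_moments(P, x_fine, delta_fine, q_max=1):
    """Minimum-norm update U such that the detail contribution (I - P U) d_j to the
    reconstruction has vanishing discrete moments of order 0, ..., q_max:
    (delta_fine * x_fine**q)^T (I - P U) = 0."""
    M = np.array([delta_fine * x_fine ** q for q in range(q_max + 1)])  # moment functionals
    A = M @ P
    return np.linalg.pinv(A) @ M             # minimum-norm solution of (M P) U = M

rng = np.random.default_rng(7)
x_fine = np.sort(rng.uniform(0, 1, 12))
delta_fine = np.gradient(x_fine)             # crude quadrature weights from knot spacings
P = rng.standard_normal((12, 6))             # stands in for a prediction matrix
U = update_with_primal_moments(P, x_fine, delta_fine)

M = np.array([delta_fine, delta_fine * x_fine])
print(np.allclose(M @ (np.eye(12) - P @ U), 0.0))   # True: moments 0 and 1 vanish
```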
The density function in the simulation study is a power law on the finite interval [0, 1], with a singular point inside the interval. The MLPT approach, unaware of the presence and the location of the singularity, is compared with a kernel density estimate applied to a probit transform of the observations. This transform uses the information on the singularity's location in order to create a random variable whose density has neither endpoints of a finite interval nor any singular points inside. In this experiment, the MLPT outperforms the probit-transformed kernel estimate, both in the reconstruction of the singular peak and in the reduction of the oscillations next to the peak. With the current procedure, this is not always the case. Further research concentrates on designs making the MLPT analysis as close as possible to an orthogonal projection, using appropriate update steps. With an analysis close to an orthogonal projection, the variance propagation throughout analysis, processing, and reconstruction can be more easily controlled, thereby reducing spurious effects in the reconstruction. Both the MLPT and the probit transformation outperform a straightforward uniscale kernel density estimate. This estimate, illustrated in Fig. 2d, oversmooths the sharp peak of the true density.
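For reference, the probit-transform benchmark can be sketched as follows: map the observations to the real line with the standard normal quantile function, apply an ordinary kernel density estimate there, and transform back with the change-of-variables formula. The toy density and the assumption that the singular point sits at the boundary $x = 0$ are choices made only for this illustration.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

# Transformation kernel density estimation with a probit transform, assuming
# (for this illustration) that the singular point sits at the boundary x = 0.
rng = np.random.default_rng(8)
X = rng.beta(0.5, 1.0, 2000)             # toy power-law density 0.5 x^{-1/2} on [0, 1]
Y = norm.ppf(X)                          # probit transform to the real line
kde = gaussian_kde(Y)                    # ordinary (uniscale) kernel estimate on Y

x = np.linspace(0.001, 0.999, 500)
f_hat = kde(norm.ppf(x)) / norm.pdf(norm.ppf(x))   # back-transform: change of variables
print(f_hat[:3])                         # back-transformed density estimate near x = 0
```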