M. La Rocca et al. (eds.)Nonparametric StatisticsSpringer Proceedings in Mathematics & Statistics339https://doi.org/10.1007/978-3-030-57306-5_2

Change of Measure Applications in Nonparametric Statistics

Mayer Alvo¹

(1)

Department of Mathematics, Statistics University of Ottawa, Ottawa, ON, Canada

Mayer Alvo

Email: malvo@uottawa.ca

Abstract

Neyman [7] was the first to propose a change in measure in the context of goodness of fit problems. This provided an alternative density to the one for the null hypothesis. Hoeffding introduced a change of measure formula for the ranks of the observed data which led to obtaining locally most powerful rank tests. In this paper, we review these methods and propose a new approach which leads on the one hand to new derivations of existing statistics. On the other hand, we exploit these methods to obtain Bayesian applications for ranking data.

Keywords

RanksChange of measureBayesian methods

Mathematics Subject Classification (2010)

62F0762G8662H11

1 Introduction

In a landmark paper, [7] considered the nonparametric goodness of fit problem and introduced the notion of smooth tests of fit by proposing a parametric family of alternative densities to the null hypothesis. In this article, we describe a number of applications of this change of measure. Hence, we obtain a new derivation of the well-known Friedman statistic as the locally most powerful test in an embedded family of distributions.

2 Smooth Models

Suppose that the probability mass function of a discrete k-dimensional random vector $\textit{\textbf{X}}$ is given by

$\begin{aligned} \pi \left( \textit{\textbf{x}}_{j};\varvec{\theta }\right) =\exp \left( \varvec{\theta }'\textit{\textbf{x}}_{j}-K(\varvec{\theta })\right) p_{j},\;j=1,\ldots ,m, \end{aligned}$

(1)

where $\textit{\textbf{x}}_{j}$ is the jth value of $\textit{\textbf{X}}$ and $\textit{\textbf{p}}=\left( p_{j}\right) '$ denotes the vector of probabilities when $\varvec{\theta }=\varvec{\theta }_{0}$ . Here $K\left( \varvec{\theta }\right)$ is a normalizing constant for which

$\sum _{j}\pi \left( \textit{\textbf{x}}_{j};\varvec{\theta }\right) =1.$

We see that the model in (1) prescribes a change of measure from the null to the alternative hypothesis. Let $\textit{\textbf{T}}=\left[ \textit{\textbf{x}}_{i},\ldots ,\textit{\textbf{x}}_{m}\right]$ be the $k\times m$ matrix of possible vector values of $\textit{\textbf{X}}$ . Then under the distribution specified by $\textit{\textbf{p}}$ ,

$\begin{aligned} \varvec{\Sigma }\equiv Cov_{\textit{\textbf{p}}}\left( \textit{\textbf{X}}\right)= & {} E_{\textit{\textbf{p}}}\left[ \left( \textit{\textbf{X}}-E\left[ \textit{\textbf{X}}\right] \right) \left( \textit{\textbf{X}}-E\left[ \textit{\textbf{X}}\right] \right) '\right] \end{aligned}$

(2)

$\begin{aligned}= & {} \textit{\textbf{T}}\left( diag\left( \textit{\textbf{p}}\right) \right) \textit{\textbf{T}}'-\left( \textit{\textbf{T}}\textit{\textbf{p}}\right) \left( \textit{\textbf{T}}\textit{\textbf{p}}\right) ', \end{aligned}$

(3)

where the expectations are with respect to the model (1). This particular situation arises often when dealing with the nonparametric randomized block design. Define

$\varvec{\pi }\left( \varvec{\theta }\right) =\left( \pi \left( \textit{\textbf{x}}_{1};\varvec{\theta }\right) ,\ldots ,\pi \left( \textit{\textbf{x}}_{m};\varvec{\theta }\right) \right) '$

and suppose that we would like to test

$H_{0}:\varvec{\theta }=\textit{\textbf{0}}\,\mathrm{vs}\,H_{1}:\varvec{\theta }\ne \textit{\textbf{0}}.$

Letting $\textit{\textbf{N}}$ denote a multinomial random vector with parameters $\left( n,\varvec{\pi }\left( \varvec{\theta }\right) \right)$ , we see that the log likelihood as a function of $\varvec{\theta }$ is, apart from a constant, proportional to

$\begin{aligned} \sum _{j=1}^{m}n_{j}\log \left( \pi \left( \textit{\textbf{x}}_{j};\varvec{\theta }\right) \right)= & {} \sum _{j=1}^{m}n_{j}\left( \varvec{\theta }'\textit{\textbf{x}}_{j}-K(\varvec{\theta })\right) \\= & {} \varvec{\theta }'\left( \sum _{j=1}^{m}n_{j}\textit{\textbf{x}}_{j}\right) -nK(\varvec{\theta }). \end{aligned}$

The score vector under the null hypothesis is then given by

$\begin{aligned} U\left( \varvec{\theta ;X}\right)= & {} \sum _{j=1}^{m}N_{j}\left( \frac{1}{\pi _{j}\left( \varvec{\theta }\right) }\frac{\partial \pi _{j}\left( \varvec{\theta }\right) }{\partial \varvec{\theta }}\right) \\= & {} \textit{\textbf{T}}\left( \textit{\textbf{N}}-n\textit{\textbf{p}}\right) . \end{aligned}$

Under the null hypothesis,

$E\left[ U\left( \varvec{\theta ;X}\right) \right] =0,$

and the score statistic is given by

$\begin{aligned} \frac{1}{n}\left[ \textit{\textbf{T}}(\textit{\textbf{N}}-n\textit{\textbf{p}})\right] '\varvec{\Sigma }^{-1}\left[ \textit{\textbf{T}}(\textit{\textbf{N}}-n\textit{\textbf{p}})\right] =\frac{1}{n}(\textit{\textbf{N}}-n\textit{\textbf{p}})'\left( \textit{\textbf{T}}'\varvec{\Sigma }^{-1}\textit{\textbf{T}}\right) (\textit{\textbf{N}}-n\textit{\textbf{p}})\xrightarrow {\mathcal {L}}\chi _{r}^{2}, \end{aligned}$

(4)

where $r=rank\left( \textit{\textbf{T}}'\varvec{\Sigma }^{-1}\textit{\textbf{T}}\right) .$

In the one-sample ranking problem whereby a group of judges are each asked to rank a set of t objects in accordance with some criterion, let $\mathcal {P}=\left\{ \varvec{\nu }_{j},j=1,\ldots ,t!\right\}$ be the space of all t! permutations of the integers $1,2,\ldots ,t$ and let the probability mass distribution defined on $\mathcal {P}$ be given by

$\textit{\textbf{p}}=\left( p_{1},\ldots ,p_{t!}\right) ,$

where $p_{j}=\Pr \left( \varvec{\nu }_{j}\right)$ . Conceptually, each judge selects a ranking $\varvec{\nu }$ in accordance with the probability mass distribution $\textit{\textbf{p}}.$ In order to test the null hypothesis that each of the rankings are selected with equal probability, that is,

$\begin{aligned} H_{0}:\textit{\textbf{p}}=\textit{\textbf{p}}_{0}\text { vs }H_{1}:\textit{\textbf{p}}\ne \textit{\textbf{p}}_{0}, \end{aligned}$

(5)

where $\textit{\textbf{p}}_{0}=\frac{1}{t!}\mathbf {1},$ define a k-dimensional vector score function $\textit{\textbf{X}}\left( \varvec{\nu }\right)$ on the space $\mathcal {P}$ and following (1), let its smooth probability mass function be given as

$\begin{aligned} \pi (\textit{\textbf{x}}_{j};\varvec{\theta })=\exp \left( \varvec{\theta }'\textit{\textbf{x}}_{j}-K(\varvec{\theta })\right) \frac{1}{t!},\quad j=1,\ldots ,t! \end{aligned}$

(6)

where $\varvec{\theta }$ is a t-dimensional vector, $K\left( \varvec{\theta }\right)$ is a normalizing constant and $\textit{\textbf{x}}_{j}$ is a t-dimensional score vector to be specified in (8). Since

$\sum _{j=1}^{t!}\pi \left( \textit{\textbf{x}}_{j};\varvec{\theta }\right) =1$

it can be seen that $K(\textit{\textbf{0}})=0$ and hence the hypotheses in (5) are equivalent to testing

$\begin{aligned} H_{0}:\varvec{\theta }=\textit{\textbf{0}}\text { vs }H_{1}:\varvec{\theta }\ne \textit{\textbf{0}}. \end{aligned}$

(7)

It follows that the log likelihood function is proportional to

$l\left( \varvec{\theta }\right) \sim n\left[ \varvec{\theta }'\hat{\varvec{\eta }}-K(\varvec{\theta })\right] ,$

where

$\hat{\varvec{\eta }}=\left[ \sum _{j=1}^{t!}\textit{\textbf{x}}_{j}\hat{p}_{nj}\right] ,\hat{p}_{nj}=\frac{n_{j}}{n}$

and $n_{j}$ represents the number of observed occurrences of the ranking $\varvec{\nu }_{j}$ . The Rao score statistic evaluated at $\varvec{\theta }=\textit{\textbf{0}}$ is

$\begin{aligned} U\left( \varvec{\theta };\textit{\textbf{X}}\right)= & {} n\frac{\partial }{\partial \theta }\left[ \varvec{\theta }'\varvec{\hat{\eta }}-K(\textit{\textbf{0}})\right] \\= & {} n\left[ \varvec{\hat{\eta }}-\frac{\partial }{\partial \theta }K(\textit{\textbf{0}})\right] , \end{aligned}$

whereas the information matrix is

$\begin{aligned} \textit{\textbf{I}}(\varvec{\theta })= & {} -n\left[ \frac{\partial ^{2}}{\partial \theta ^{2}}K(\textit{\textbf{0}})\right] . \end{aligned}$

The test then rejects the null hypothesis whenever

$n^{2}\left[ \hat{\varvec{\eta }}-\frac{\partial }{\partial \varvec{\theta }}K\left( \textit{\textbf{0}}\right) \right] '\textit{\textbf{I}}^{-1}\left( \textit{\textbf{0}}\right) \left[ \hat{\varvec{\eta }}-\frac{\partial }{\partial \varvec{\theta }}K\left( \textit{\textbf{0}}\right) \right] >\chi _{f}^{2}\left( \alpha \right) ,$

where $\chi _{f}^{2}\left( \alpha \right)$ is the upper $$100$$

$\left( 1-\alpha \right) \%$ critical value of a chi square distribution with $f=\text {rank}(\textit{\textbf{I}}\left( \varvec{\theta }\right) )$ degrees of freedom. We note that the test just obtained is the locally most powerful test of $H_{0}.$

Specializing this test statistic to the Spearman score function of adjusted ranks

$\begin{aligned} \textit{\textbf{x}}_{j}=\left( \nu _{j}\left( 1\right) -\frac{t+1}{2},\ldots ,\nu _{j}\left( t\right) -\frac{t+1}{2}\right) ',\,j=1,\ldots ,t!, \end{aligned}$

(8)

we can show that the Rao score statistic is the well-known Friedman test [5].

$\begin{aligned} W=\frac{12n}{t\left( t+1\right) }\sum _{i=1}^{t}\left[ \bar{R}_{i}-\frac{t+1}{2}\right] ^{2}, \end{aligned}$

(9)

where $\bar{R}_{i}$ is the average of the ranks assigned to the ith object.

2.1 The Two-Sample Ranking Problem

The approach just described can be used to deal with the two-sample ranking problem assuming again the Spearman score function. Let $\textit{\textbf{X}}_{1},\textit{\textbf{X}}_{2}$ be two independent random vectors whose distributions as in the one sample case are expressed for simplicity as

$\pi \left( \textit{\textbf{x}}_{j};\varvec{\theta }_{l}\right) =\exp \left\{ \varvec{\theta }_{l}'\textit{\textbf{x}}_{j}-K\left( \varvec{\theta }_{l}\right) \right\} p_{l}\left( j\right) ,\;j=1,\ldots ,t!,l=1,2,$

where $\varvec{\theta }_{l}=\left( \theta _{l1},\ldots ,\theta _{lt}\right) ^{\prime }$ represents the vector of parameters for population l. We are interested in testing

$H_{0}:\varvec{\theta }_{1}=\varvec{\theta }_{2}\text { vs }H_{1}:\varvec{\theta }_{1}\ne \varvec{\theta }_{2}.$

The probability distribution $\left\{ p_{l}\left( j\right) \right\}$ represents an unspecified null situation. Define

$\hat{\textit{\textbf{p}}}_{l}=\left( \frac{n_{l1}}{n_{l}},\ldots ,\frac{n_{lt!}}{n_{l}}\right) ',$

where $n_{ij}$ represents the number of occurrences of the ranking $\varvec{\nu }_{j}$ in sample l.

Also, for

, set $\sum _{j}n_{ij}\equiv n_{l}$ , $\varvec{\gamma }=\varvec{\theta }_{1}-\varvec{\theta }_{2}$ and

$\begin{aligned} \varvec{\theta }_{l}= & {} \textit{\textbf{m}}+b_{l}\varvec{\gamma }, \end{aligned}$

where

$\textit{\textbf{m}}=\frac{n_{1}\varvec{\theta }_{1}+n_{2}\varvec{\theta }_{2}}{n_{1}+n_{2}},b_{1}=\frac{n_{2,}}{n_{1}+n_{2}},b_{2}=-\frac{n_{1}}{n_{1}+n_{2}}.$

Let $\varvec{\Sigma }_{l}$ be the covariance matrix of $\textit{\textbf{X}}_{l}$ under the null hypothesis defined as

$\varvec{\Sigma }_{l}=\varvec{\Pi }_{l}-\textit{\textbf{p}}_{l}\textit{\textbf{p}}_{l}',$

where $\varvec{\Pi }_{l}=diag\left( p_{l}\left( 1\right) ,\ldots ,p_{l}\left( t!\right) \right)$ and $\textit{\textbf{p}}_{l}=\left( p_{l}\left( 1\right) ,\ldots ,p_{l}\left( t!\right) \right) '$ . The logarithm of the likelihood L as a function of $\left( \textit{\textbf{m}},\varvec{\gamma }\right)$ is proportional to

$\begin{aligned} \log \,L\left( \textit{\textbf{m}},\varvec{\gamma }\right)\sim & {} \sum _{l=1}^{2}\sum _{j=1}^{t!}n_{lj}\left\{ \left( \textit{\textbf{m}}+b_{l}\varvec{\gamma }\right) '\textit{\textbf{x}}_{j}-K\left( \varvec{\theta }_{l}\right) \right\} . \end{aligned}$

In order to test

$H_{0}:\varvec{\theta }_{1}=\varvec{\theta }_{2}\text { vs }H_{1}:\varvec{\theta }_{1}\ne \varvec{\theta }_{2}$

we calculate the Rao score test statistic which is given by

$\begin{aligned} n\left( \textit{\textbf{T}}_{S}\hat{\textit{\textbf{p}}}_{1}-\textit{\textbf{T}}_{S}\hat{\textit{\textbf{p}}}_{2}\right) '\hat{\textit{\textbf{D}}}\left( \textit{\textbf{T}}_{S}\hat{\textit{\textbf{p}}}_{1}-\textit{\textbf{T}}_{S}\hat{\textit{\textbf{p}}}_{2}\right) . \end{aligned}$

(10)

It can be shown to have asymptotically a $\chi _{f}^{2}$ whenever $n_{l}/n\rightarrow \lambda _{l}>0$ as $n\rightarrow \infty ,$ where $n=n_{1}+n_{2}.$ Here $\hat{\textit{\textbf{D}}}$ is the Moore–Penrose inverse of $\textit{\textbf{T}}_{S}\hat{\varvec{\Sigma }}\textit{\textbf{T}}_{S}'$ and $\hat{\varvec{\Sigma }}$ is a consistent estimator of $\varvec{\Sigma }=\frac{\varvec{\Sigma }_{1}}{\lambda _{1}}+\frac{\varvec{\Sigma }_{2}}{\lambda _{2}}$ and f is the rank of $\hat{\textit{\textbf{D}}}$ , as required.

2.2 The Use of Penalized Likelihood

In the previous sections, it was possible to derive test statistics for the one and two-sample ranking problems by means of the change of measure paradigm. This paradigm may be exploited to obtain new results for the ranking problems. Specifically, we consider a negative penalized likelihood function defined to be the negative log likelihood function subject to a constraint on the parameters which is then minimized with respect to the parameter. This approach yields further insight into ranking problems.

For the one-sample ranking problem, let

$\begin{aligned} \varLambda (\varvec{\theta },c)=-\varvec{\theta }'\left[ \sum _{j=1}^{t!}n_{j}\textit{\textbf{x}}_{j}\right] +nK(\varvec{\theta })+\lambda \bigg (\sum _{i=1}^{t}\theta _{i}^{2}-c\bigg ) \end{aligned}$

(11)

represent the penalizing function for some prescribed values of the constant c. We shall assume for simplicity that $\left\| \textit{\textbf{x}}_{j}\right\| =1$ . When t is large (say $t\ge 10$ ), the computation of the exact value of the normalizing constant $K(\varvec{\theta })$ involves a summation of t! terms. [6] noted the resemblance of (6) to the continuous von Mises-Fisher density

$f\left( \varvec{x;\theta }\right) =\frac{\left\| \varvec{\theta }\right\| ^{\frac{t-3}{2}}}{2^{\frac{t-3}{2}}t!I_{\frac{t-3}{2}}(\left\| \varvec{\theta }\right\| )\Gamma (\frac{t-1}{2})}\exp \left( \varvec{\theta }'\textit{\textbf{x}}\right) ,$

where $\left\| \varvec{\theta }\right\|$ is the norm of $\varvec{\theta }$ and $\textit{\textbf{x}}$ is on the unit sphere and $I_{\upsilon }(z)$ is the modified Bessel function of the first kind given by

$I_{\upsilon }(z)=\sum _{k=0}^{\infty }\frac{1}{\Gamma (k+1)\Gamma (\upsilon +k+1)}\left( \frac{z}{2}\right) ^{2k+\nu }.$

This seems to suggest the approximation of the constant $K\left( \varvec{\theta }\right)$ by

$\exp \left( -K(\varvec{\theta })\right) \approx \frac{1}{t!}\cdot \frac{\left\| \varvec{\theta }\right\| ^{\frac{t-3}{2}}}{2^{\frac{t-3}{2}}I_{\frac{t-3}{2}}(\left\| \varvec{\theta }\right\| )\Gamma (\frac{t-1}{2})}.$

In [1], penalized likelihood was used in ranking situations to obtain further insight into the differences between groups of rankers.

3 Bayesian Models for Ranking Data

The fact that the model in (1) is itself parametric in nature leads one to consider an extension to Bayesian considerations. Let $\textit{\textbf{R}}=(R(1),\ldots ,R(t))'$ be a ranking t items, labeled $1,\ldots ,t$ and define the standardized rankings as

$\textit{\textbf{y}}=\left( \textit{\textbf{R}}-\frac{t+1}{2}\mathbf {1}\right) /\sqrt{\frac{t(t^{2}-1)}{12}},$

where $\textit{\textbf{y}}$ is the $t\times 1$ vector with $\left\| \textit{\textbf{y}}\right\| \equiv \sqrt{\textit{\textbf{y}}'\textit{\textbf{y}}}=1$ . We consider the following more general ranking model:

$\pi (\textit{\textbf{y}}|\kappa ,\varvec{\theta })=C(\kappa ,\varvec{\theta })\exp \left\{ \kappa \varvec{\theta }'\textit{\textbf{y}}\right\} ,$

where the parameter $\varvec{\theta }$ is a $t\times 1$ vector with $\left\| \varvec{\theta }\right\| =1$ , parameter $\kappa \ge 0$ , and $C(\kappa ,\varvec{\theta })$ is the normalizing constant. This model has a close connection to the distance-based models considered in [3]. Here, $\varvec{\theta }$ is a real-valued vector, representing a consensus view of the relative preference of the items from the individuals. Since both $\left\| \varvec{\theta }\right\| =1$ and $\left\| \textit{\textbf{y}}\right\| =1$ , the term $\varvec{\theta }'\textit{\textbf{y}}$ can be seen as $\cos \phi$ where $\phi$ is the angle between the consensus score vector $\varvec{\theta }$ and the observation $\textit{\textbf{y}}$ . The probability of observing a ranking is proportional to the cosine of the angle from the consensus score vector. The parameter $\kappa$ can be viewed as a concentration parameter. For small $\kappa$ , the distribution of rankings will appear close to a uniform whereas for larger values of $\kappa$ , the distribution of rankings will be more concentrated around the consensus score vector. We call this new model an angle-based ranking model.

To compute the normalizing constant $C(\kappa ,\varvec{\theta })$ , let $\text {P}_{t}$ be the set of all possible permutations of the integers $1,\ldots ,t$ . Then

$\begin{aligned} \left( C(\kappa ,\varvec{\theta })\right) ^{-1}=\sum _{\textit{\textbf{y}}\in \mathcal {P}}\exp \left\{ \kappa \varvec{\theta }^{T}\textit{\textbf{y}}\right\} . \end{aligned}$

(12)

Notice that the summation is over the t! elements in $\mathcal {P}$ . When t is large, say greater than 15, the exact calculation of the normalizing constant is prohibitive. Using the fact that the set of t! permutations lie on a sphere in $$(t-1)$$

-space, our model resembles the continuous von Mises-Fisher distribution, abbreviated as $vMF(\textit{\textbf{x}}|\textit{\textbf{m}},\kappa )$ , which is defined on a $\left( p-1\right)$ unit sphere with mean direction $\textit{\textbf{m}}$ and concentration parameter $\kappa$ :

$p(\textit{\textbf{x}}|\kappa ,\textit{\textbf{m}})=V_{p}(\kappa )\exp (\kappa \textit{\textbf{m}}'\textit{\textbf{x}}),$

where

$V_{p}(\kappa )=\frac{\kappa ^{\frac{p}{2}-1}}{\left( 2\pi \right) ^{\frac{p}{2}}I_{\frac{p}{2}-1}(\kappa )},$

and $I_{a}(\kappa )$ is the modified Bessel function of the first kind with order a. Consequently, we may approximate the sum in (12) by an integral over the sphere:

$C(\kappa ,\varvec{\theta })\simeq C_{t}(\kappa )=\frac{\kappa ^{\frac{t-3}{2}}}{2^{\frac{t-3}{2}}t!I_{\frac{t-3}{2}}(\kappa )\Gamma (\frac{t-1}{2})},$

where $\Gamma (.)$ is the gamma function. In ( [9], it is shown that this approximation is very accurate for values of $\kappa$ ranging from 0.01 to 2 and t ranging from 4 to 11. Moreover, the error drops rapidly as t increases. Note that this approximation allows us to approximate the first and second derivatives of $\log \,C$ which can facilitate our computation in what follows.

3.1 Maximum Likelihood Estimation (MLE) of Our Model

Let $\textit{\textbf{Y}}=\left\{ \textit{\textbf{y}}_{1},\ldots ,\textit{\textbf{y}}_{N}\right\}$ be a random sample of N standardized rankings drawn from $p(\textit{\textbf{y}}|\kappa ,\varvec{\theta })$ . The log likelihood of $\left( \kappa ,\varvec{\theta }\right)$ is then given by

$\begin{aligned} l(\kappa ,\varvec{\theta })=n\log C_{t}(\kappa )+\sum _{i=1}^{n}\kappa \varvec{\theta }'\textit{\textbf{y}}_{i}. \end{aligned}$

(13)

Maximizing (13) subject to $\left\| \varvec{\theta }\right\| =1$ and $\kappa \ge 0$ , we find that the maximum likelihood estimator of $\varvec{\theta }$ is given by $\hat{\varvec{\theta }}_{MLE}=\frac{\sum _{i=1}^{N}\textit{\textbf{y}}_{i}}{\left\| \sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\| },$ and $\hat{\kappa }$ is the solution of

$\begin{aligned} A_{t}(\kappa )\equiv \frac{-C_{t}^{'}(\kappa )}{C_{t}(\kappa )}=\frac{I_{\frac{t-1}{2}}\left( \kappa \right) }{I_{\frac{t-3}{2}}\left( \kappa \right) }=\frac{\left\| \sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\| }{N}\equiv r. \end{aligned}$

(14)

A simple approximation to the solution of (14) following [4] is given by

$\hat{\kappa }_{MLE}=\frac{r(t-1-r^{2})}{1-r^{2}}.$

A more precise approximation can be obtained from a few iterations of Newton’s method. Using the method suggested by [8], starting from an initial value $\kappa _{0}$ , we can recursively update $\kappa$ by iteration:

$\kappa _{i+1}=\kappa _{i}-\frac{A_{t}(\kappa _{i})-r}{1-A_{t}(\kappa _{i})^{2}-\frac{t-2}{\kappa _{i}}A_{t}(\kappa _{i})},\;i=0,1,2,\ldots .$

3.2 One-Sample Bayesian Method with Conjugate Prior

Taking a Bayesian approach, we consider the following conjugate prior for $(\kappa ,\varvec{\theta })$ as

$\begin{aligned} p(\kappa ,\varvec{\theta })\propto \left[ C_{t}(\kappa )\right] ^{\nu _{0}}\exp \left\{ \beta _{0}\kappa \textit{\textbf{m}}_{0}'\varvec{\theta }\right\} , \end{aligned}$

(15)

where $\left\| \textit{\textbf{m}}_{0}\right\| =1$ , $\nu _{0},\beta _{0}\ge 0$ . Given $\textit{\textbf{y}}$ , the posterior density of $(\kappa ,\varvec{\theta })$ can be expressed by

$p(\alpha ,\varvec{\theta }|\textit{\textbf{y}})\propto \exp \left\{ \beta \kappa \textit{\textbf{m}}'\varvec{\theta }\right\} V_{t}(\beta \kappa )\cdot \frac{\left[ C_{t}(\kappa )\right] ^{N+\nu _{0}}}{V_{t}(\beta \kappa )},$

where $\textit{\textbf{m}}=\left( \beta _{0}\textit{\textbf{m}}_{\textit{\textbf{0}}}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right) \beta ^{-1},$ $\beta =\left\| \beta _{0}\textit{\textbf{m}}_{0}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\|$ . The posterior density can be factored as

$\begin{aligned} p(\kappa ,\theta |\textit{\textbf{y}})=p(\theta |\kappa ,\textit{\textbf{y}})p(\kappa |\textit{\textbf{y}}), \end{aligned}$

(16)

where $p(\varvec{\theta }|\kappa ,\textit{\textbf{y}})\sim vMF(\varvec{\theta }|\textit{\textbf{m}},\beta \kappa )$ and

$p(\kappa |\textit{\textbf{y}})\propto \frac{\left[ C_{t}(\kappa )\right] ^{N+\nu _{0}}}{V_{t}(\beta \kappa )}=\frac{\kappa ^{\frac{t-3}{2}(\upsilon _{0}+N)}I_{\frac{t-2}{2}}(\beta \kappa )}{\left[ I_{\frac{t-3}{2}}(\kappa )\right] ^{\nu _{0}+N}\left( \beta \kappa \right) ^{\frac{t-2}{2}}}.$

The normalizing constant for $p(\kappa |\textit{\textbf{y}})$ is not available in closed form. For reasons explained in [9], we approximate the posterior distribution using the method of variational inference (abbreviated VI from here on). Variational inference provides a deterministic approximation to an intractable posterior distribution through optimization. We first adopt a joint vMF- Gamma distribution as the prior for $(\kappa ,\varvec{\theta })$ :

$\begin{aligned} p(\kappa ,\varvec{\theta })&=p(\varvec{\theta }|\kappa )p(\kappa )\\&=vMF(\varvec{\theta }|\textit{\textbf{m}}_{0},\beta _{0}\kappa )\,Gamma(\kappa |a_{0},b_{0}), \end{aligned}$

where $Gamma(\kappa |a_{0},b_{0})$ is the Gamma density function with shape parameter $a_{0}$ and rate parameter $b_{0}$ (i.e., mean equal to $\frac{a_{0}}{b_{0}}$ ), and $p(\varvec{\theta }|\kappa )=vMF(\varvec{\theta }|\textit{\textbf{m}}_{0},\beta _{0}\kappa )$ . The choice of $Gamma(\kappa |a_{0},b_{0})$ for $p(\kappa )$ is motivated by the fact that for large values of $\kappa$ , $p(\kappa )$ in (15) tends to take the shape of a Gamma density. In fact, for large values of $\kappa$ , $I_{\frac{t-3}{2}}(\kappa )\simeq \frac{e^{\kappa }}{\sqrt{2\pi \kappa }}$ , and hence $p(\kappa )$ becomes the Gamma density with shape $(\nu _{0}-1)\frac{t-2}{2}+1$ and rate $\nu _{0}-\beta _{0}$ :

$p(\kappa )\propto \frac{\left[ C_{t}(\kappa )\right] ^{\nu _{0}}}{V_{t}(\beta _{0}\kappa )}\propto \kappa ^{(\nu _{0}-1)\frac{t-2}{2}}\exp (-(\nu _{0}-\beta _{0})\kappa ).$

Using the variational inference framework, [9] showed that the optimal posterior distribution of $\theta$ conditional on $\kappa$ is a von Mises-Fisher distribution $vMF(\varvec{\theta }|\textit{\textbf{m}},\kappa \beta )$ where

$\beta =\left\| \beta _{0}\textit{\textbf{m}}_{0}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\| \;\text { and }\;\textit{\textbf{m}}=\left( \beta _{0}\textit{\textbf{m}}_{0}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right) \beta ^{-1}.$

The optimal posterior distribution of $\kappa$ is a $Gamma(\kappa |a,b)$ with shape a and rate b with

$\begin{aligned} a=a_{0}+N\left( \frac{t-3}{2}\right) +\beta \bar{\kappa }\left[ \frac{\partial }{\partial \beta \kappa }\ln I_{\frac{t-2}{2}}(\beta \bar{\kappa })\right] , \end{aligned}$

(17)

$\begin{aligned} b=b_{0}+N\frac{\partial }{\partial \kappa }I_{\frac{t-3}{2}}(\bar{\kappa })+\beta _{0}\left[ \frac{\partial }{\partial \beta _{0}\kappa }\ln I_{\frac{t-2}{2}}(\beta _{0}\bar{\kappa })\right] . \end{aligned}$

(18)

Finally, the posterior mode $\bar{\kappa }$ can be obtained from the previous iteration as

$\begin{aligned} \bar{\kappa }={\left\{ \begin{array}{ll} \frac{a-1}{b} &{} \text{ if } a>1,\\ \frac{a}{b} &{} \text{ otherwise. } \end{array}\right. } \end{aligned}$

(19)

3.3 Two-Sample Bayesian Method with Conjugate Prior

Let $\textit{\textbf{Y}}_{i}=\left\{ \textit{\textbf{y}}_{i1},\ldots ,\textit{\textbf{y}}_{iN_{i}}\right\}$ for $$i=1,2,$$

be two independent random samples of standardized rankings each drawn, respectively, from $p(\varvec{y_{i}}|\kappa _{i},\varvec{\theta _{i}}).$ Taking a Bayesian approach, we assume that conditional on $\kappa$ , there are independent von Mises conjugate priors, respectively, for $(\varvec{\theta }_{1},\varvec{\theta }_{2})$ as

$p(\varvec{\theta _{i}}|\kappa )\propto \left[ C_{t}(\kappa )\right] ^{\nu _{i0}}\exp \left\{ \beta _{i0}\kappa \textit{\textbf{m}}_{i0}^{T}\varvec{\theta _{i}}\right\} ,$

where $\left\| \textit{\textbf{m}}_{i0}\right\| =1$ , $\nu _{i0},\beta _{i0}\ge 0$ . We shall be interested in computing the Bayes factor when considering two models. Under model 1, denoted $M_{1},$ $\varvec{\theta _{1}}=\varvec{\theta _{2}}$ whereas under model 2, denoted $M_{2}$ , equality is not assumed. The Bayes factor comparing the two models is defined to be

$\begin{aligned} B_{21}= & {} \frac{\int p(\varvec{y_{1}}|\kappa ,\varvec{\theta _{1}})p(\varvec{y_{2}}|\kappa ,\varvec{\theta _{2}})p(\varvec{\theta _{1}}|\kappa )p(\varvec{\theta _{2}}|\kappa )d\varvec{\theta _{1}}d\varvec{\theta _{2}}d\kappa }{\int p(\varvec{y_{1}}|\kappa ,\varvec{\theta })p(\varvec{y_{2}}|\kappa ,\varvec{\theta })p(\varvec{\theta }|\kappa )d\varvec{\theta }d\kappa }\\ \\= & {} \frac{\int \left[ \int p(\varvec{y_{1}}|\kappa ,\varvec{\theta _{1}})p(\varvec{\theta _{1}}|\kappa )d\varvec{\theta _{1}}\right] \left[ \int p(\varvec{y_{2}}|\kappa ,\varvec{\theta _{2}})p(\varvec{\theta _{2}}|\kappa )d\varvec{\theta _{2}}\right] d\kappa }{\int p(\varvec{y_{1}}|\kappa ,\varvec{\theta })p(\varvec{y_{2}}|\kappa ,\varvec{\theta })p(\varvec{\theta }|\kappa )d\varvec{\theta }d\kappa }. \end{aligned}$

The Bayes factor enables us to compute the posterior odds of model 2 to model 1. We fist deal with the denominator in $B_{21}.$ Under $M_{1}$ , we assume a joint von Mises-Fisher prior on $\theta$ and a Gamma prior on $\kappa \!:$

$p\left( \theta ,\kappa \right) =vMF\left( \theta |m_{0},\beta _{0}\kappa \right) G\left( \kappa |a_{0},b_{0}\right) .$

Hence,

$\begin{aligned} \int p(\varvec{y_{1}}|\kappa ,\varvec{\theta })p(\varvec{y_{2}}|\kappa ,\varvec{\theta })p(\varvec{\theta }|\kappa )d\varvec{\theta }d\kappa= & {} \int C_{t}^{N}\left( \kappa \right) exp\left\{ \beta \kappa \varvec{\theta }^{T}\textit{\textbf{m}}\right\} V_{t}\left( \beta _{0}\kappa \right) G\left( \kappa |a_{0},b_{0}\right) d\varvec{\theta }d\kappa \\ \\= & {} \int C_{t}^{N}\left( \kappa \right) V_{t}\left( \beta _{0}\kappa \right) V_{t}^{-1}\left( \beta \kappa \right) G\left( \kappa |a_{0},b_{0}\right) d\kappa , \end{aligned}$

where $N=N_{1}+N_{2}$ and

$\textit{\textbf{m}}=\left[ \beta _{0}\varvec{m_{0}}+\sum _{i=1}^{2}\sum _{j=1}^{N_{i}}\varvec{y_{ij}}\right] \beta ^{-1},\beta =\parallel \textit{\textbf{m}}\parallel .$

Now,

$\begin{aligned} \int C_{t}^{N}\left( \kappa \right) V_{t}\left( \beta _{0}\kappa \right) V_{t}^{-1}\left( \beta \kappa \right) G\left( \kappa |a_{0},b_{0}\right) d\kappa= & {} C\left( \frac{\beta _{0}}{\beta }\right) ^{\frac{t-2}{2}}\int \left[ \frac{\kappa ^{a_{0}+N\left( \frac{t-3}{2}\right) -1}e^{-b_{0}\kappa }I_{\left( \frac{t-2}{2}\right) }\left( \beta \kappa \right) }{I_{\left( \frac{t-2}{2}\right) }\left( \beta _{0}\kappa \right) I_{\left( \frac{t-3}{2}\right) }^{N}\left( \kappa \right) }\right] d\kappa \\ \\\approx & {} C\left( \frac{\beta _{0}}{\beta }\right) ^{\frac{t-2}{2}}\int \kappa ^{a-1}e^{-b\kappa }d\kappa , \end{aligned}$

where in the last step, we used the method of variational inference as an approximation, with

$\begin{aligned} C= & {} \frac{b_{0}^{a_{0}}}{\Gamma \left( a_{0}\right) }\left( 2^{N\left( \frac{t-3}{2}\right) }\left( t!\right) ^{N}\Gamma ^{N}\left( \frac{t-1}{2}\right) \right) ^{-1}\\ \\ a_{1}= & {} a_{0}+N\left( \frac{t-3}{2}\right) +\beta \bar{\kappa }\left[ \frac{\partial }{\partial \beta \kappa }\ln I_{\frac{t-2}{2}}(\beta \bar{\kappa })\right] ,\\ b_{1}= & {} b_{0}+N\frac{\partial }{\partial \kappa }I_{\frac{t-3}{2}}(\bar{\kappa })+\beta _{0}\left[ \frac{\partial }{\partial \beta _{0}\kappa }\ln I_{\frac{t-2}{2}}(\beta _{0}\bar{\kappa })\right] \end{aligned}$

and the posterior mode $\bar{\kappa }$ is

$\bar{\kappa }={\left\{ \begin{array}{ll} \frac{a_{1}-1}{b_{1}} &{} \text{ if } a_{1}>1,\\ \frac{a_{1}}{b_{1}} &{} \text{ otherwise. } \end{array}\right. }$

It follows that the denominator of $B_{21}$ is

$C\left( \frac{\beta _{0}}{\beta }\right) ^{\frac{t-2}{2}}\frac{\Gamma \left( a_{1}\right) }{b_{1}^{a_{1}}}.$

For the numerator, we shall assume that conditional on $\kappa$ , there are independent von Mises conjugate priors, respectively, for $\varvec{\theta }_{1},\varvec{\theta }_{2}$ given by

$p(\varvec{\theta _{i}}|\kappa )\propto \left[ C_{t}(\kappa )\right] \exp \left\{ \beta _{0}\kappa \textit{\textbf{m}}_{0}^{T}\varvec{\theta _{i}}\right\} ,i=1,2$

where $\left\| \textit{\textbf{m}}_{0}\right\| =1$ , $\beta _{0}\ge 0$ . Hence,

$\begin{aligned} \int \left[ \int p(\varvec{y_{1}}|\kappa ,\varvec{\theta _{1}})p(\varvec{\theta _{1}}|\kappa )d\varvec{\theta _{1}}\right] \left[ \int p(\varvec{y_{2}}|\kappa ,\varvec{\theta _{2}})p(\varvec{\theta _{2}}|\kappa )\mathrm{d}\varvec{\theta _{2}}\right] \mathrm{d}\kappa&\\ \\ =\int C_{t}^{N}\left( \kappa \right) exp\left\{ \beta _{1}\kappa \varvec{\theta _{1}}^{T}\varvec{m_{1}}\right\} exp\left\{ \beta _{2}\kappa \varvec{\theta _{2}}^{T}\varvec{m_{2}}\right\} V_{t}^{2}\left( \beta _{0}\kappa \right) G\left( \kappa |a_{0},b_{0}\right) \mathrm{d}\varvec{\theta _{1}}\mathrm{d}\varvec{\theta _{2}}\mathrm{d}\kappa&\\ \\ =\int C_{t}^{N}\left( \kappa \right) V_{t}^{-1}\left( \beta _{1}\kappa \right) V_{t}^{-1}\left( \beta _{2}\kappa \right) V_{t}^{2}\left( \beta _{0}\kappa \right) G\left( \kappa |a_{0},b_{0}\right) \mathrm{d}\kappa&\\ \\ = C\left( \frac{\beta _{0}}{\beta _{1}}\right) ^{\frac{t-2}{2}}\left( \frac{\beta _{0}}{\beta _{2}}\right) ^{\frac{t-2}{2}}\int \left[ \frac{\kappa ^{a_{0}+N\left( \frac{t-3}{2}\right) -1}e^{-b_{0}\kappa }I_{\left( \frac{t-2}{2}\right) }\left( \beta _{1}\kappa \right) I_{\left( \frac{t-2}{2}\right) }\left( \beta _{2}\kappa \right) }{I_{\left( \frac{t-2}{2}\right) }^{2}\left( \beta _{0}\kappa \right) I_{\left( \frac{t-3}{2}\right) }^{N}\left( \kappa \right) }\right] \mathrm{d}\kappa&\\ \\ = C\left( \frac{\beta _{0}}{\beta _{1}}\right) ^{\frac{t-2}{2}}\left( \frac{\beta _{0}}{\beta _{2}}\right) ^{\frac{t-2}{2}}\int \kappa ^{a_{2}-1}e^{-b_{2}\kappa }\mathrm{d}\kappa \end{aligned}$

where for

$\begin{aligned} \varvec{m_{i}}= & {} \left[ \beta _{0}\varvec{m_{0}}+\sum _{j=1}^{N_{i}}\varvec{y_{ij}}\right] \beta _{i}^{-1}=\parallel \varvec{m_{i}}\parallel \\ a_{2}= & {} a_{0}+N\left( \frac{t-3}{2}\right) +\sum _{i}\beta _{i}\bar{\kappa }\left[ \frac{\partial }{\partial \beta _{i}\kappa }\ln I_{\frac{t-2}{2}}(\beta _{i}\bar{\kappa })\right] \\ b_{2}= & {} b_{0}+N\frac{\partial }{\partial \kappa }lnI_{\frac{t-3}{2}}(\bar{\kappa })+2\beta _{0}\left[ \frac{\partial }{\partial \beta _{0}\kappa }\ln I_{\frac{t-2}{2}}(\beta _{0}\bar{\kappa })\right] \end{aligned}$

and the posterior mode $\bar{\kappa }$ is given recursively:

$\bar{\kappa }={\left\{ \begin{array}{ll} \frac{a_{2}-1}{b_{2}} &{} \text{ if } a_{2}>1,\\ \frac{a_{2}}{b_{2}} &{} \text{ otherwise. } \end{array}\right. }$

It follows that the numerator of the Bayes factor is

$C\left( \frac{\beta _{0}}{\beta _{1}}\right) ^{\frac{t-2}{2}}\left( \frac{\beta _{0}}{\beta _{2}}\right) ^{\frac{t-2}{2}}\frac{\Gamma \left( a_{1}\right) }{b_{1}^{a_{1}}}.$

The Bayes factor is then given by the ratio

$\begin{aligned} B_{21} \quad = \quad&\frac{\left( \frac{\beta _{0}}{\beta }\right) ^{\frac{t-2}{2}}\frac{\Gamma \left( a_{1}\right) }{b_{1}^{a_{1}}}}{\left( \frac{\beta _{0}}{\beta _{1}}\right) ^{\frac{t-2}{2}}\left( \frac{\beta _{0}}{\beta _{2}}\right) ^{\frac{t-2}{2}}\frac{\Gamma \left( a_{2}\right) }{b_{2}^{a_{2}}}}\\ =\left( \frac{\beta _{1}\beta _{2}}{\beta \beta _{0}}\right) ^{\frac{t-2}{2}}&\frac{\Gamma \left( a_{1}\right) b_{2}^{a_{2}}}{\Gamma \left( a_{2}\right) b_{1}^{a_{1}}}. \end{aligned}$

4 Conclusion

In this article, we have considered a few applications of the change of measure paradigm. In particular, it was possible to obtain a new derivation of the Friedman statistic. As well, extensions to the Bayesian models for ranking data were considered. Further applications as, for example, to the sign and Wilcoxon tests are found in [2].

Acknowledgements

Work supported by the Natural Sciences and Engineering Council of Canada, Grant OGP0009068.

References

1.
Alvo, M., Xu, H.: The analysis of ranking data using score functions and penalized likelihood. Austrian J. Stat. 46, 15–32 (2017)Crossref
2.
Alvo, M., Yu, Philip, L.H.: A Parametric Introduction to Nonparametric Statistics. Spinger, Berlin (2018)
3.
Alvo, M., Yu, Philip, L.H.: Statistical Methods for Ranking Data. Springer, Berlin (2014)
4.
Banerjee, A., Dhillon, IS., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von mises-fisher distributions. J. Mach. Learn. Res. 6, 1345–1382 (2005)
5.
Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32, 675–701 (1937)Crossref
6.
McCullagh, P.: Models on spheres and models for permutations. In: Fligner, M.A., Verducci, J.S. (eds.) Probability Models and Statistical Analyses for Ranking Data, p. 278283. Springer, Berlin
7.
Neyman, J.: Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift 20, 149–199 (1937)zbMATH
8.
Sra, S.: A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of is (x). Comput. Stat. 27(1), 177–190 (2012)MathSciNetCrossref
9.
Xu, H., Alvo, M., Yu, Philip, L.H.: Angle-based models for ranking data. Comput. Stat. Data Anal. 121, 113–136 (2018)