© Springer Nature Switzerland AG 2020
M. La Rocca et al. (eds.), Nonparametric Statistics, Springer Proceedings in Mathematics & Statistics 339, https://doi.org/10.1007/978-3-030-57306-5_21

Multivariate Permutation Tests for Ordered Categorical Data

Huiting Huang1, Fortunato Pesarin1, Rosa Arboretti2 and Riccardo Ceccato3
(1)
Department of Statistical Sciences, University of Padova, Padova, Italy
(2)
Department of Civil, Environmental and Architectural Engineering, University of Padova, Padova, Italy
(3)
Department of Management and Engineering, University of Padova, Padova, Italy
 
 
Corresponding author: Huiting Huang

Abstract

The main goal of this article is to test whether different groups with ordinal responses on the same measurement scale satisfy stochastic dominance and monotonic stochastic ordering. In the literature, the majority of inferential approaches to the univariate case are proposed within the likelihood framework. These solutions have very nice characterizations under their stringent assumptions. However, when the set of alternatives lies in a positive orthant with more than four dimensions, it is quite difficult to achieve proper inferences. Further, it is known that testing for stochastic dominance in the multivariate case by the likelihood approach is much more difficult than in the univariate case. This paper discusses the problem within the conditionality principle of inference, through the permutation testing approach and the nonparametric combination (NPC) of dependent permutation tests. The NPC approach, based on permutation theory, is generally appropriate for finding exact, good solutions to this kind of problem. Moreover, solutions for a typical medical example are provided.

Keywords
Multivariate permutation testing · Stochastic dominance · Conditional inference · Nonparametric combination

1 Introduction

Ordered categorical data are frequently encountered in many research and decision-making fields: records from patients under different treatments in clinical experiments, questionnaire feedback in the social sciences, data on feelings, thoughts or opinions collected in a natural way in psychology, quality examination of products in marketing and technology, and so on. Taking clinical trials as a guide, we intend to find out whether the results of cure plans satisfy stochastic ordering and to find the best treatment among the cure plans. The problem of testing whether different groups with ordinal responses on the same measurement scale satisfy stochastic dominance $$(C=2)$$ is our principal interest. We therefore provide tests of hypotheses with ordinal responses mainly through testing for stochastic dominance, since the test for stochastic ordering, $$(C>2)$$, is obtained as a combination of $$C-1$$ dominance partial tests. This is known to be a rather difficult problem. Many approaches have been proposed in the literature to settle it within likelihood frameworks. [14] proposed an iterative procedure with censored data, based on a pair-wise algorithm, to find the asymptotic MLEs of Kaplan–Meier form. [25] introduced numerical approximation of the MLEs of two V-dimensional distributions under stochastic ordering. [27] derived the null asymptotic distribution of the likelihood ratio test statistic for some testing procedures. Testing procedures based on maximum likelihood estimates of odds ratios have been considered by [2, 3] and others. Moreover, Kateri and Agresti (2013), in place of traditional frequentist methods, applied a Bayesian approach to test whether the structure of the association between the response variable and the explanatory variable in two samples is ordinal. When available, likelihood-based solutions under their stringent assumptions come with known inferential properties. In general, however, it is quite difficult to obtain proper testing inferences, especially in the multivariate case, which is much more difficult to analyze within likelihood frameworks than the univariate one. In such a setting, the number of underlying nuisance parameters and/or of observed variables can often be much larger than the sample sizes. So, unless clearly justified assumptions allow for a considerable reduction of the underlying complexity, the most intriguing of which is when one pseudo-parameter is expressed as a function of many underlying nuisance parameters, no correct general testing solution is possible within that approach.

Our approach to this kind of problem is within the conditionality principle of inference [13], where the conditioning is with respect to a set of statistics that are sufficient in the null hypothesis, as the pooled observed data usually are. That is, we use the permutation testing theory and the nonparametric combination (NPC) of dependent permutation tests [4–8, 18–23]. When the underlying population distribution is unknown, nonparametric permutation methods may become a necessity. This is especially true when the number of categories and/or of underlying nuisance parameters is not very small. [16] studied testing for marginal inhomogeneity and direction-independent marginal order by global permutation tests. [15] utilized the $$\chi ^2$$-P statistic with small sample sizes under the permutation approach. The NPC approach is a general methodology for multivariate problems, especially for stochastic dominance and stochastic ordering. The NPC testing solution performs the Union-Intersection (UI) approach of [24] when an equivalent set of sub-problems is properly carried out.

In principle, the exact calculation of the required testing distributions is obtained through complete enumeration of all data permutations. This, however, becomes impossible in practice when the cardinality of the permutation space is large. To this end, a conditional Monte Carlo procedure has been suggested to obtain estimates of these distributions in practice, at any desired degree of accuracy ([18, 20, 22]). The main NPC routines are available in MATLAB, R, Python, StatXact, SAS, etc.
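For illustration only, the following is a minimal sketch of such a conditional Monte Carlo estimate of a permutation p-value for a generic two-sample statistic; the function name, the statistic and the number R of random permutations are our own illustrative choices, not part of the authors' software.

```python
import numpy as np

def cmc_pvalue(x1, x2, stat, R=10_000, rng=None):
    """Conditional Monte Carlo estimate of a permutation p-value.

    x1, x2 : 1-D arrays holding the two samples (the pooled data are
             sufficient under H0, so units are freely re-assigned).
    stat   : function mapping (sample1, sample2) to a statistic whose
             large values speak against H0.
    R      : number of random permutations of the pooled data.
    """
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([x1, x2])
    n1 = len(x1)
    t_obs = stat(pooled[:n1], pooled[n1:])      # observed value T^o
    count = 0
    for _ in range(R):
        perm = rng.permutation(pooled)          # one random permutation X*
        if stat(perm[:n1], perm[n1:]) >= t_obs:
            count += 1
    return count / R                            # estimated p-value statistic

# Illustrative use with a simple difference-of-means statistic:
# p = cmc_pvalue(np.array([1, 2, 2, 3]), np.array([2, 3, 3, 4]),
#                stat=lambda a, b: b.mean() - a.mean(), R=5000, rng=0)
```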

The rest of the paper is organized as follows. Section 2 introduces a typical real example. Section 3 discusses the two-sample basic problem under unidimensional and multidimensional cases. Section 4 studies approaches for stochastic ordering restriction in C-sample designs. Solutions to the example are in Sect. 5. Some concluding remarks are in Sect. 6.

2 A Typical Medical Example

Let us consider the example in Table 1 from Chuang-Stein and Agresti (1997), also reported by [1, 2, 12, 26]. It regards a unidimensional survey on subarachnoid hemorrhage measured by the Glasgow outcome scale, where 210 patients received a Placebo, 190 a Low dose, 207 a Medium dose, and 195 a High dose. Response data, related to the extent of trauma and measured on the same ordinal scale, are classified according to $$C=4$$ doses of a treatment, {Placebo, Low, Medium, High}, with outcomes classified in $$K=5$$ ordered categories {Death, Vegetative state, Major disability, Minor disability, Good recovery}.
Table 1
Dose and Extent of trauma due to subarachnoid hemorrhage

Treatment | Death | Veget | Major | Minor | Recov | Total
Placebo   |    59 |    25 |    46 |    48 |    32 |   210
Low       |    48 |    21 |    44 |    47 |    30 |   190
Medium    |    44 |    14 |    54 |    64 |    31 |   207
High      |    43 |     4 |    49 |    58 |    41 |   195
Total     |   194 |    64 |   193 |   217 |   134 |   802

Based on our intuition, but also in accordance with the quoted authors, patients taking Placebo are expected to achieve a lower treatment effect than those taking the Low dose, patients taking the Low dose lower effects than those taking the Medium dose, and so forth. Therefore, it is expected that patients exhibit monotonically non-decreasing responses X as the dose increases. Thus, it is required to test whether there is a monotonic stochastic ordering on the related response data. Formally, the hypotheses to consider are $$H_0: X_{P}\overset{d}{=}X_{L}\overset{d}{=}X_{M}\overset{d}{=}X_{H}$$ against $$H_1: X_{P}\overset{d}{\preceq }X_{L}\overset{d}{\preceq }X_{M}\overset{d}{\preceq }X_{H}$$ with at least one strict inequality. If responses were quantitative, this problem would also be termed one of isotonic regression. Defining the cumulative distribution function for responses X at ordered categories $$c_1 \prec \ldots \prec c_K$$ as $$F_{X}(c_{k})=\Pr \{X\preceq c_{k}\}$$, the hypotheses are equivalently expressed as $$H_0: \{F_{X_{P}}=F_{X_{L}}=F_{X_{M}}=F_{X_{H}}\}$$ against $$H_1: \{F_{X_{P}}\ge F_{X_{L}}\ge F_{X_{M}}\ge F_{X_{H}}\}$$, with at least one strict inequality.

With clear meaning of the symbols, the rationale for this formulation resides in the fact that if, according to increasing doses, non-decreasing treatment effects $$\delta $$ occur at the latent variables Y, i.e., $$\delta _h \le \delta _j$$, $$1\le h < j \le C$$, then the latent responses should behave as $$Y_h = (Y+\delta _h) \overset{d}{\le } Y_j =(Y+\delta _j)$$.

The related testing problem has a rather difficult solution within likelihood-ratio theory, which with categorical data presents an additional serious difficulty: even for a moderate number of cells the solution is recognized to be non-unique ([10–12, 26, 27]; etc.). Moreover, to get a solution, important supplementary options, difficult to justify in terms of the real problem under study, are required. This difficulty mostly consists in the fact that the set of alternatives is restricted to lie in the $$(C-1)\times (K-1)$$-dimensional positive orthant, where the likelihood cannot be maximized under $$H_0$$ by ordinary methods of maximization.

Our solution firstly considers the setting of two treatments and then, following the UI approach of [24] and Jonckheere–Terpstra's approach, breaks the hypotheses down into $$C-1$$ pairs of sub-hypotheses. All resulting dependent partial tests are then combined by an NPC method.

3 The Two-Sample Basic Problem

Let us first consider the two-sample basic case, where data are in a $$2\times K$$ table and the specific hypotheses are expressed as $$H_0: X_1 \overset{d}{=} X_2 \equiv \{F_1(c_k)=F_2(c_k), k=1,\ldots ,K \}$$ against $$H_1: X_1 \overset{d}{\prec } X_2\equiv \{F_1(c_k)\ge F_2(c_k), k=1,\ldots ,K \}$$ with at least one strict inequality. The related testing problem can be equivalently set as $$H_0: \bigcap _{k=1}^{K-1}[F_{1}(c_{k})=F_{2}(c_{k})]$$ against the set of restricted alternatives $$H_1: \bigcup _{k=1}^{K-1}[F_{1}(c_{k})> F_{2}(c_{k})]$$.

It is worth noting that i) according to [24] the problem is equivalently broken down into $$K-1$$ one-sided sub-problems; ii) $$H_1$$ defines a multi-one-sided set of alternatives; iii) since under both $$H_0$$ and $$H_1$$ it is $$F_1(c_K) = F_2(c_K) = 1$$, category $$c_K$$ is not considered; iv) the global solution requires the joint comparison of the $$K-1$$ differences of random relative frequencies $$\hat{F}_1(c_k)-\hat{F}_2(c_k), k=1,\ldots ,K-1 $$.

Since the number of unknown nuisance parameters to take care of in any $$2\times K$$ testing process is $$2\times K-1$$, and the likelihood is to be maximized in the $$(K-1)$$-dimensional positive orthant, indeed a very difficult task especially when $$K>4$$, our approach is to stay within the conditionality principle of inference.

The conditioning should be on a set of statistics that are sufficient in the null hypothesis for the unknown underlying common distribution F. To this end, let $$p_{F}(X)$$ be the underlying likelihood related to F, and let the two independent samples of IID data, respectively sized $$n_1$$ and $$n_2$$, be $$\mathbf {X}_1=(X_{11},\ldots ,X_{1n_{1}})$$ and $$\mathbf {X}_2=(X_{21},\ldots ,X_{2n_{2}})$$. So the data set is $$\mathbf {X}=(\mathbf {X}_1, \mathbf {X}_2)$$, whose joint likelihood is $$p_{F}(\mathbf {X})=\prod _{i=1}^{n_{1}}p_{F_{1}}(X_{1i})\prod _{i=1}^{n_{2}}p_{F_{2}}(X_{2i})$$. Under the null hypothesis, it is assumed that there is no difference between the two distributions, namely, $$F_1=F_2=F$$. Thus, the joint null likelihood $$p_{F}(\mathbf {X})=\prod _{j=1}^{2}\prod _{i=1}^{n_{j}}p_{F}(X_{ji})$$ is invariant with respect to any permutation $$\mathbf {X}^*$$ of the observed pooled data $$\mathbf {X}=(\mathbf {X}_1 \biguplus \mathbf {X}_2)$$, where $$\biguplus $$ is the symbol for pooling two data sets. This shows that data under $$H_0$$ are exchangeable, i.e., permutable. Moreover, under $$H_0$$ the pooled data $$\mathbf {X}$$ are always a set of sufficient statistics for any underlying distribution F [18–20, 22]; so, any information on parameters defining F is wholly contained in $$\mathbf {X}$$. The set of all permutations $$\mathbf {X}^*$$ of $$\mathbf {X}$$ is indicated with $$\Pi (\mathbf {X})$$. It is worth noting that $$\Pi (\mathbf {X})=\Pi (\mathbf {X}^*)$$, i.e., the set of permutations of $$\mathbf {X}$$ coincides, $$\forall \mathbf {X}^* \in \Pi (\mathbf {X})$$, with that of $$\mathbf {X}^*$$. Of course, under the alternative $$H_1$$ the above invariance property does not hold, because the two distributions are different by assumption: indeed $$\mathbf {X}_1$$ is sufficient for $$F_1$$ and $$\mathbf {X}_2$$ is sufficient for $$F_2$$, and so the pooled data are not exchangeable.

The act of conditioning on a set of sufficient statistics for F in $$H_0$$ entails that any conditional inference is independent of the underlying population distribution F. This conditioning gives rise to the following fundamental property:

Let $$(\mathcal {X}, \mathcal {A}, F)$$ be the probability space related to data X; then sufficiency of $$\mathbf {X}$$ for the underlying F, under $$H_0$$, implies that the null conditional probability of any event $$A\in \mathcal {A}$$, given $$\mathbf {X}$$, is independent of F, i.e., $$Pr\{\mathbf {X}^* \in A; F\mid \mathbf {X}\} = Pr\{\mathbf {X}^* \in A\mid \mathbf {X}\} = P[A\mid \mathbf {X}]$$.

Three relevant consequences of this property are c1) under $$H_0$$ all M permutations $$\mathbf {X}^*$$ of $$\mathbf {X}$$ are equally likely; c2) so $$P[A\mid \mathbf {X}]=\# (\mathbf {X}^*\in A)/M$$, where $$\#(\cdot )$$ is the number of elements of $$\Pi (\mathbf {X})$$ that satisfy condition $$(\cdot )$$, i.e., $$P[A\mid \mathbf {X}]$$ is properly a count ratio; c3) if $$\mathbf {T}= (T_1,\ldots , T_S)^{\top } $$ is a vector of $$S\ge 1$$ permutation statistics (e.g., tests) and $$\varphi : \mathcal {R}^S \rightarrow \mathcal {R}^1$$ is any measurable function, then the conditional null distribution of $$\varphi $$ is independent of F; indeed,
$$\begin{aligned} Pr\{\varphi (T_1^*,\ldots ,T_S^*)\le z; F\mid \mathbf {X}\} = Pr\{\varphi (T_1^*,\ldots ,T_S^*)\le z \mid \mathbf {X}\} = Pr[\varphi _\mathbf {T}^{-1}(z)\mid \mathbf {X}]=\frac{\#[\mathbf {X}^*\in \varphi _{\mathbf {T}}^{-1}(z)]}{M}, \end{aligned}$$
(1)
since, due to measurability of $$\varphi $$, $$\forall z\in \mathcal {R}^1$$, it is $$\varphi _{\mathbf {T}}^{-1}(z)\in \mathcal {A}$$.

It is worth noting that (c3) is the central property for deducing and justifying the NPC of dependent permutation tests. Also worth noting are: (i) the conditional probability $$P[A\mid \mathbf {X}]$$ always has an objective existence; (ii) the conditional null distribution of $$\varphi $$ is independent of all dependence parameters underlying $$\mathbf {T}$$; (iii) to characterize sufficiency of $$\mathbf {X}$$ in $$H_0$$, permutation tests require the existence of a likelihood $$p_F(\mathbf {X}) >0$$, not its calculability; (iv) when $$\mathbf {X}$$ is minimal sufficient for F, it makes no sense to work outside the permutation testing principle [20]; (v) permutation tests are nonparametric, distribution-free, and intrinsically robust.

For operating with categorical data, instead of the usual contingency table we prefer using its unit-by-unit representation: $$\mathbf {X}=\{X(i),i=1,\ldots ,n;n_{1},n_{2}\}$$, with $$n=n_1+n_2$$, where it is intended that the first $$n_1$$ data belong to the first sample and the rest to the second. Such a representation is in one-to-one correspondence with the table for unidimensional variables, as in the example; it is much more efficient for multidimensional variables, where working with contingency tables becomes more and more difficult, up to impossibility, as the number of variables increases. Using that representation, a random permutation $$\mathbf {X}^*\in \Pi (\mathbf {X})$$ can be expressed as $$\mathbf {X}^*=\{X(u_i^*), i=1,\ldots ,n;n_1,n_2 \}$$, where $$\mathbf {u}^*\in \Pi (\mathbf {u})$$ is a random permutation of the unit labels $$\mathbf {u} = \{ 1,\ldots ,n\}$$. The corresponding permuted table is calculated as $$\{f_{jk}^{*}=\#(X_{ji}^{*}\in c_{k}),$$ $$k=1,\ldots ,K, j=1,2\}$$. Obviously, the marginal frequencies are permutation invariant quantities, since $$f_{\cdot k} =f_{1k}+f_{2k}=f_{1k}^{*}+f_{2k}^{*}=f_{\cdot k}^{*}$$, $$k=1,\ldots ,K$$. Similarly, the cumulative marginal frequencies are also invariant: $$N_{\cdot k}=N_{1k}+N_{2k}=N_{\cdot k}^{*}$$ with $$N_{jk}=\sum _{s\le k}f_{js}$$.
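As a sketch of this unit-by-unit representation (our own illustration, not the authors' code), the following shows how a random permutation $$\mathbf {X}^*$$ and the corresponding permuted table $$\{f_{jk}^{*}\}$$ can be obtained, assuming category labels coded as integers $$1,\ldots ,K$$.

```python
import numpy as np

def permuted_table(x, n1, K, rng=None):
    """From the unit-by-unit representation x (one category index 1..K per
    unit of the pooled sample, the first n1 units forming sample 1), draw a
    random permutation u* of the unit labels and build the permuted 2 x K
    table {f*_{jk}}; the column margins f_{.k} remain invariant."""
    rng = np.random.default_rng(rng)
    x_star = x[rng.permutation(len(x))]         # X* = X(u*)
    groups = [x_star[:n1], x_star[n1:]]
    f_star = np.array([[np.sum(g == k + 1) for k in range(K)] for g in groups])
    return f_star

# Illustrative use with K = 5 ordered categories, n1 = 4 and n2 = 5:
# x = np.array([1, 3, 2, 5, 4, 4, 2, 1, 3])
# f_star = permuted_table(x, n1=4, K=5, rng=0)
# f_star.sum(axis=0) reproduces the invariant marginal frequencies f_{.k}
```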

3.1 The $$2\times K$$ One-Dimensional Case

We start with the two-sample one-dimensional problem. For the case of $$C=2$$, the related stochastic dominance testing problem becomes $$H_{0}:F_{1}=F_{2}\equiv \bigcap _{k=1}^{K-1}[F_{1}(c_{k})=F_{2}(c_{k})]$$ against $$H_{1}:F_{1}>F_{2}\equiv \bigcup _{k=1}^{K-1}[F_{1}(c_{k})> F_{2}(c_{k})]$$, whose global analysis requires the joint comparison of the $$K-1$$ differences of random frequencies $$\hat{F}_1(c_k)-\hat{F}_2(c_k), k=1,\ldots ,K-1 $$. Since the crucial point for that joint analysis is the proper handling of all underlying dependences, to attain general solutions we must work within the UI-NPC of the related dependent permutation tests because, due to c3) (see Sect. 3), the estimation of dependence coefficients is not required: NPC works independently of such dependences, however complex these are.

Accordingly, the $$K-1$$ partial test statistics are
$$\begin{aligned} T_{k}^{*}=C(n_1,n_2)\cdot [\hat{F}_{1k}^{*}-\hat{F}_{2k}^{*}]\left[ \bar{F}_{\cdot k}(1-\bar{F}_{\cdot k})\right] ^{-\frac{1}{2}},\;\;k=1,\ldots ,K-1, \end{aligned}$$
(2)
where $$\hat{F}_{jk}^{*}=\hat{F}_{j}^{*}(c_{k})=N_{jk}^{*}/n_{j}$$, $$j=1,2$$; $$\bar{F}_{\cdot k}=N_{\mathbf {\cdot }k}/n$$ are permutation and marginal empirical distribution functions (EDFs); $$N_{1k}^{*}$$ and $$N_{2k}^{*}$$, $$k=1,\ldots ,K-1$$ are permutation cumulative frequencies obtained from the permuted table $$\{f_{jk}^{*},$$ $$k=1,\ldots ,K$$, $$j=1,2\}$$.

It is worth noting that (i) the EDFs $$\hat{F}_{jk}$$ are maximum likelihood unbiased estimates of the population CDFs $$F_{j}(c_{k})$$, $$k=1,\ldots ,K-1$$, $$j=1,2$$; (ii) each partial test $$T_{k}^{*}$$ is a reformulation of Fisher's exact probability test and so it is a best conditional test; (iii) large values of each partial test $$T_{k}^{*}$$ are significant against the related null sub-hypothesis $$H_{0k}$$, i.e., in favor of $$H_{1k}$$; (iv) the $$K-1$$ partial tests are positively dependent; (v) in the computation of $$T_k^*$$, 0 is assigned to expressions of the form 0/0; (vi) $$C(n_1,n_2) = [n_1 n_2 (n-1)/n^2]^{1/2} $$ is a permutation constant not dependent on k; (vii) for increasing sample sizes, each $$T_{k}^{*}$$ under $$H_0$$ converges in distribution to the standard normal: $$T_{k}^{*} \overset{d}{\rightarrow }\mathcal {N}(0,1) $$.

According to the approach discussed in [18, 19, 22], the global testing solution can be obtained by their UI-NPC while using any admissible combining function. The simplest admissible combination is by the direct sum of partial tests:
$$\begin{aligned} T_{AD}^{*}= \sum _{k=1}^{K-1}T_k^* =C(n_1,n_2)\cdot \sum _{k=1}^{K-1}[ \hat{F}_{1k}^{*}-\hat{F} _{2k}^{*}]\left[ \bar{F}_{\cdot k}(1-\bar{F}_{\cdot k})\right] ^{-\frac{1}{2}}. \end{aligned}$$
(3)
Such a solution looks like the discrete version of the Anderson–Darling goodness-of-fit type test for multi-one-sided alternatives. It is worth noting that (i) each partial test is unbiased and so $$T_{AD}^{*}$$ is unbiased; (ii) at least one partial test is consistent and so $$T_{AD}^{*}$$ is consistent; (iii) $$T_{AD}^{*}$$ is an admissible combination of partial best tests and so it is provided with good power behavior. Of course, by using other admissible combining functions one can obtain other good solutions, none of which, however, is uniformly better than any other.
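As an illustration of Eqs. (2)–(3), the following minimal sketch (ours, not the authors') computes the partial statistics $$T_k^*$$ and their direct sum $$T_{AD}^*$$ from a $$2\times K$$ table of (possibly permuted) frequencies; NumPy is assumed available, and the Placebo/High rows of Table 1 are used only as a usage hint.

```python
import numpy as np

def t_ad(f, n1, n2):
    """Anderson-Darling type statistics of Eqs. (2)-(3) for a 2 x K table f of
    (possibly permuted) frequencies; rows are the two samples, columns the K
    ordered categories. Returns the partial statistics T_k and their sum T_AD."""
    n = n1 + n2
    N1 = np.cumsum(f[0])[:-1]                   # cumulative frequencies N_{1k}, k < K
    N2 = np.cumsum(f[1])[:-1]
    F1, F2 = N1 / n1, N2 / n2                   # permutation EDFs
    Fbar = (N1 + N2) / n                        # pooled (marginal) EDF
    C = np.sqrt(n1 * n2 * (n - 1) / n**2)       # permutation constant C(n1, n2)
    with np.errstate(divide="ignore", invalid="ignore"):
        t_k = C * (F1 - F2) / np.sqrt(Fbar * (1 - Fbar))
    t_k = np.nan_to_num(t_k)                    # 0/0 terms set to 0, as in note (v)
    return t_k, t_k.sum()

# Illustrative use on the Placebo and High rows of Table 1:
# t_k, T_AD = t_ad(np.array([[59, 25, 46, 48, 32],
#                            [43,  4, 49, 58, 41]]), n1=210, n2=195)
```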

The corresponding p-value-like statistics can be written as $$\lambda _{AD}=\Pr \{T_{AD}^{*}\ge T_{AD}^{o}\mid \mathbf {X}\}$$, where $$T_{AD}^{o}=T_{AD}(\mathbf {X})$$ is the observed value of $$T_{AD}$$ on pooled data $$\mathbf {X}$$. So, remembering that p-value-like statistics play the role of tests whose common critical value is $$\alpha $$, if $$\lambda _{AD}\le \alpha $$, the null hypothesis is rejected at significance level $$\alpha >0$$.

Consider the representation displayed in Table 2. It corresponds to the NPC procedure for a general problem with K partial tests, R random permutations, and combining function $$\psi $$.
Table 2
Representation of the conditional Monte Carlo method in multivariate tests

$$\mathbf {X}$$   | $$\mathbf {X}_{1}^{*}$$ | $$\cdots $$ | $$\mathbf {X}_{r}^{*}$$ | $$\cdots $$ | $$\mathbf {X}_{R}^{*}$$
$$T_{1}^{o}$$     | $$T_{11}^{*}$$          | $$\cdots $$ | $$T_{1r}^{*}$$          | $$\cdots $$ | $$T_{1R}^{*}$$
$$\vdots $$       | $$\vdots $$             |             | $$\vdots $$             |             | $$\vdots $$
$$T_{K}^{o}$$     | $$T_{K1}^{*}$$          | $$\cdots $$ | $$T_{Kr}^{*}$$          | $$\cdots $$ | $$T_{KR}^{*}$$
                  | $$\downarrow $$         |             |                         |             |
$$T_{\psi }^{o}$$ | $$T_{\psi 1}^{*}$$      | $$\cdots $$ | $$T_{\psi r}^{*}$$      | $$\cdots $$ | $$T_{\psi R}^{*}$$

Under $$H_0$$, the sub-matrix $$\{T_{kr}^{*}\}_{K\times R}$$ simulates the K-dimensional null distribution of K partial permutation tests. The sub-vector $$\{T_{\psi r}^{*}\}_{R}$$ simulates the null permutation distribution of combined test $$T_\psi $$.

Thus, the statistic $$\hat{\lambda }_{\psi }=\#(T_{\psi }^{*}\ge T_{\psi }^{o})/R$$ gives an unbiased and, as R diverges, a strongly consistent estimate of the p-value statistic $$\lambda _{\psi }$$ of $$T_{\psi }$$.

Under $$H_1$$, at least one $$T_k^{o}$$ presents larger observed values than under $$H_0$$; so, if the combining function $$\psi $$ is non-decreasing in each argument, the p-value statistic satisfies the relation $$\hat{\lambda }_{\psi ; H_{1}}\overset{d}{\le } \hat{\lambda }_{\psi ; H_{0}}$$ uniformly for every data set $$\mathbf {X}$$ and every underlying distribution F. Hence, this justifies rejecting $$H_0$$ when $$\hat{\lambda }_{\psi }\le \alpha $$; moreover, it can be proved that $$T_\psi $$ enjoys the unbiasedness and consistency properties. Details and proofs for these and other properties are in [18–20, 22].
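Following the Table 2 scheme, the next sketch (again ours, not the authors' routines) shows the NPC step, assuming the K partial statistics have already been computed on the observed data and on R random permutations (for instance with the illustrative t_ad function above); NumPy and SciPy (only for the inverse normal CDF in Liptak's rule) are assumed available, and the finite-sample adjustment of the partial significance levels is one common convention among several.

```python
import numpy as np
from scipy.stats import norm

def npc(t_obs, t_star, psi="fisher"):
    """Nonparametric combination following the Table 2 scheme.

    t_obs  : length-K vector of observed partial statistics T_k^o.
    t_star : K x R matrix of the same statistics on R random permutations.
    psi    : 'direct', 'fisher', 'liptak' or 'tippett' combining function.
    Returns the estimated combined p-value statistic lambda_psi."""
    K, R = t_star.shape
    full = np.column_stack([t_obs, t_star])          # observed column first
    # row-wise significance levels: lambda_k(col) = #(T_k >= value in col)/(R+1)
    lam = np.array([(row[:, None] >= row[None, :]).mean(axis=0) for row in full])
    lam = np.clip(lam, 0.5 / (R + 1), 1 - 0.5 / (R + 1))   # keeps Fisher/Liptak finite
    if psi == "direct":
        comb = full.sum(axis=0)                      # direct sum of the partial tests
    elif psi == "fisher":
        comb = -2 * np.log(lam).sum(axis=0)
    elif psi == "liptak":
        comb = norm.ppf(1 - lam).sum(axis=0)
    else:                                            # Tippett: most significant partial test
        comb = (1 - lam).max(axis=0)
    return np.mean(comb[1:] >= comb[0])              # #(T_psi* >= T_psi^o)/R
```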

3.2 The $$2\times K$$ Multidimensional Case

In the general multidimensional case, let us start from the two-sample V-dimensional problem, $$V\ge 2$$. The multidimensional hypotheses are formulated as $$H_{0}:\mathbf {X}_{1}\overset{d}{=}\mathbf {X}_{2}$$ against $$H_{1}:\mathbf {X}_{1}\overset{d}{\prec }\mathbf {X}_{2}$$. According to [24], the hypotheses $$H_0$$ and $$H_1$$ are equivalently broken down into $$K\ge 2$$ sub-hypotheses, $$H_0\equiv \bigcap _{k=1}^{K} H_{0k}$$ and $$H_1\equiv \bigcup _{k=1}^{K}H_{1k}$$. Thus, with V-dimensional ordinal data and K ordered categories for each variable, the hypotheses are equivalently written as $$\bigcap _{v=1}^{V}\bigcap _{k=1}^{K-1}[F_{1v}(c_{k})= F_{2v}(c_{k})]$$ and $$\bigcup _{v=1}^{V}\bigcup _{k=1}^{K-1}[F_{1v}(c_{k})> F_{2v}(c_{k})]$$, respectively. Thus, for each variable $$v=1,\ldots ,V$$, the partial test is $$T_{AD v}^*$$ according to Sect. 3.1. Since all these partial tests are standardized, and thus share the same asymptotic null distribution, their combination can proceed by their direct sum. This provides the V-dimensional extension of the Anderson–Darling test for multi-one-sided alternatives:
$$\begin{aligned} T_{AD}^{*}=\sum _{v=1}^{V}T_{AD v}^*=C(n_1,n_2)\cdot \sum _{v=1}^{V}\sum _{k=1}^{K-1}[ \hat{F}_{1vk}^{*}-\hat{F} _{2vk}^{*}]\left[ \bar{F}_{\cdot vk}(1-\bar{F}_{\cdot vk})\right] ^{-\frac{1}{2}}. \end{aligned}$$
(4)
It is worth noting that the symbol $$\mathbf {X}$$ now represents both the V-dimensional variable and the pooled sample data matrix; the context generally suffices to avoid misunderstandings. Of course, the V-dimensional $$T_{AD}^*$$ enjoys the same good properties as the unidimensional one. In place of the direct combination of the V partial tests $$T_{AD v}^*$$, i.e., one Anderson–Darling test for each variable, it is possible to consider a more general combination such as $$T_{\psi }^{*}=\psi (T_{AD1}^{*},\ldots ,T_{ADV}^{*})$$. The most commonly used combining functions $$\psi $$ are Fisher's $$T_{F}^*=-2\sum _{v}\log (\lambda _{ADv}^*)$$ and Liptak's $$T_{L}^*=\sum _{v}\Phi ^{-1}(1-\lambda _{ADv}^*)$$, where $$\lambda _{ADv}^*$$ is the p-value statistic of $$T_{ADv}^{*}$$ and $$\Phi ^{-1}(\cdot )$$ is the inverse standard normal CDF. Since in $$T_{AD}^{*}$$ all summands are well defined, it is also of some interest to observe that the double summation can equivalently be computed as $$\sum _k \sum _v$$.
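As a minimal sketch, under the same conventions the V-dimensional direct combination of Eq. (4) simply sums the unidimensional statistics over variables; the function below reuses the illustrative t_ad sketch of Sect. 3.1 and assumes one $$2\times K$$ table per variable.

```python
def t_ad_multivariate(tables, n1, n2):
    """Direct V-dimensional combination of Eq. (4): one 2 x K frequency table
    per variable in `tables`; reuses the illustrative t_ad function above."""
    return sum(t_ad(f, n1, n2)[1] for f in tables)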

4 The C-sample Stochastic Ordering Problem

Considering the Jonckheere–Terpstra idea, the $$C\times K$$ table can be broken down into $$(C-1)$$ sub-tables. Accordingly, the testing problem is broken down into $$(C-1)$$ sub-problems, each based on a $$2\times K$$ sub-table. To be specific, for any $$j\in \{1,\ldots ,C-1\}$$, we divide the data set into two pooled pseudo-groups, where the first pseudo-group is obtained by pooling the data of the first j ordered groups and the second by pooling the rest. Thus, the procedure considers the first pooled pseudo-group as $$\mathbf {Y}_{1(j)}=\mathbf {X}_{1}\biguplus \ldots \biguplus \mathbf {X}_{j}$$ and the second as $$\mathbf {Y}_{2(j)}=\mathbf {X}_{j+1}\biguplus \ldots \biguplus \mathbf {X}_{C}$$, $$j=1,\ldots ,C-1$$, where $$\mathbf {X}_{j}=\{X_{ji},i=1,\ldots ,n_{j}\}$$ is the data set of the jth group.

In the null hypothesis $$H_0$$, the related pooled variables satisfy the relationships $$Y_{1(j)}\overset{d}{=}Y_{2(j)}$$, $$j=1,\ldots ,C-1$$; thus, data from every pair of pseudo-groups are exchangeable. In the alternative $$H_1$$, since the inequality $$X_j \overset{d}{\prec } X_{j+1}$$, $$1\le j\le C-1$$, is strict for at least one j, the corresponding stochastic dominance between each pair of pseudo-groups, $$Y_{1(j)}\overset{d}{\prec }Y_{2(j)}$$, holds for all $$j\le C-1$$. Therefore, the hypotheses for the monotonic stochastic ordering problem can be equivalently written as $$H_{0}:\{\bigcap _{j=1}^{C-1}(Y_{1(j)}\overset{d}{=}Y_{2(j)})\}$$ and $$H_{1}:\{\bigcup _{j=1}^{C-1}(Y_{1(j)}\overset{d}{\prec }Y_{2(j)})\}$$, emphasizing a break-down into a set of $$C-1$$ sub-hypotheses. For each sub-problem we can consider the test:
$$\begin{aligned} T_{AD(j)}^*=C(n_{1(j)}, n_{2(j)})\cdot \sum _{k=1}^{K-1}\left[ \hat{F}_{1(j)k}^{*}-\hat{F}_{2(j)k}^{*}\right] [\bar{F}_{\cdot (j)k}(1-\bar{F}_{\cdot (j)k})]^{-\frac{1}{2}},\;j=1,\ldots ,C-1, \end{aligned}$$
(5)
where $$n_{1(j)} = n_1 +\ldots +n_j$$, $$n_{2(j)}=n-n_{1(j)}$$; the permutation relative frequencies are $$\hat{F}_{l(j)k}^{*} = \#(X_{l(j)}^*\preceq c_k)/n_{l(j)}$$, $$l=1,2$$; the marginal relative frequencies are $$\bar{F}_{\cdot (j)k} =[\#(X_{1(j)}^*\preceq c_k)+\#(X_{2(j)}^*\preceq c_k)]/n$$; the partial tests $$T_{AD(j)}^*$$ are positively dependent; and the $$C(n_{1(j)}, n_{2(j)})$$ are permutation constants, invariant with respect to k. So the global problem is solved by combining the $$C-1$$ partial tests within the UI-NPC as, for instance, by
$$\begin{aligned} T_{AD}^* = \sum _{j=1}^{C-1}T_{AD(j)}^*. \end{aligned}$$
(6)
In our experience, apart from the direct one, the most suitable combining functions for this problem are Fisher's and Liptak's. Since in the stochastic ordering alternative all $$C-1$$ partial tests contain a positive non-centrality quantity, i.e., all lie in their respective sub-alternatives, Tippett's combination is less sensitive than the others.
Of course, if $$V>1$$ variables were involved, the multivariate stochastic ordering solution would require one stochastic ordering partial test for each variable $$v=1,\ldots ,V$$. So, with clear meanings of the symbols, the global test, by direct combination, is
$$\begin{aligned} T_{AD,V}^* =\sum _{j=1}^{C-1} C(n_{1(j)}, n_{2(j)})\cdot \sum _{v=1}^{V}\sum _{k=1}^{K-1}\left[ \hat{F}_{1v(j)k}^{*}-\hat{F}_{2v(j)k}^{*}\right] [\bar{F}_{\cdot v(j)k}(1-\bar{F}_{\cdot v(j)k})]^{-\frac{1}{2}}. \end{aligned}$$
(7)
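A minimal sketch of the break-down of Eqs. (5)–(6), again reusing the illustrative t_ad function of Sect. 3.1 and integer-coded categories, is given below; it returns the observed global statistic (6), to be compared with its permutation distribution as in Sect. 3.

```python
import numpy as np

def stochastic_ordering_stat(samples, K):
    """Break-down of Eqs. (5)-(6): for j = 1,...,C-1 the first j groups are
    pooled against the remaining ones and the Anderson-Darling type statistics
    T*_{AD(j)} are summed. `samples` is a list of C arrays of category indices
    1..K; reuses the illustrative t_ad function of Sect. 3.1."""
    C = len(samples)
    total = 0.0
    for j in range(1, C):
        y1 = np.concatenate(samples[:j])        # first pooled pseudo-group Y_{1(j)}
        y2 = np.concatenate(samples[j:])        # second pooled pseudo-group Y_{2(j)}
        f = np.array([[np.sum(y == k + 1) for k in range(K)] for y in (y1, y2)])
        total += t_ad(f, len(y1), len(y2))[1]   # partial statistic T*_{AD(j)}
    return total                                # global statistic of Eq. (6)
```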

5 Solution of Medical Example

The analyses of the data from the medical example, based on $$R=100\,000$$ random permutations, are shown in Table 3 for three tests, the Anderson–Darling $$T_{AD}^{*}$$, the test on scores $$T_{W}^{*}$$, and the test on mid-ranks $$T_{M}^{*}$$, together with their combinations: the direct $$T_{D}^{\prime \prime }$$, Fisher's $$T_{F}^{\prime \prime }$$, Liptak's $$T_{L}^{\prime \prime }$$, and Tippett's $$T_{T}^{\prime \prime }$$. Note that (i) the W scores are the ordered integer numbers $$(w_{1}=1, w_{2}=2, w_{3}=3, w_{4}=4, w_{5}=5)$$; (ii) since small p-value statistics are evidence for $$H_1$$, Fisher's, Liptak's, and Tippett's are non-increasing functions of the partial p-values. The p-values based on the UI-NPC method are the following:
Table 3
p-values based on UI-NPC approach

                           | $$T_{(1)}^{*}$$ | $$T_{(2)}^{*}$$ | $$T_{(3)}^{*}$$ | $$T_{D}^{\prime \prime }$$ | $$T_{F}^{\prime \prime }$$ | $$T_{L}^{\prime \prime }$$ | $$T_{T}^{\prime \prime }$$
$$\hat{\lambda }_{AD(j)}$$ | 0.0141 | 0.0025 | 0.0074 | 0.0017 | 0.0015 | 0.0012 | 0.0068
$$\hat{\lambda }_{W(j)}$$  | 0.0131 | 0.0021 | 0.0076 | 0.0010 | 0.0012 | 0.0010 | 0.0053
$$\hat{\lambda }_{M(j)}$$  | 0.0144 | 0.0024 | 0.0062 | 0.0011 | 0.0014 | 0.0011 | 0.0068

Results in Table 3 clearly show that the p-values based on the four different combination functions $$T_{D}^{^{\prime \prime }}$$, $$T_{F}^{^{\prime \prime }}$$, $$T_{L}^{^{\prime \prime }}$$, and $$T_{T}^{\prime \prime }$$ all lead to rejecting, at significance level $$\alpha =0.01$$, the null hypothesis in favor of monotonic stochastic ordering among the $$C=4$$ doses. So the inferential conclusion is that patients present non-decreasing responses as the dose increases.

It is worth noting that the three combined p-value statistics of $$T_{D}^{^{\prime \prime }}$$, $$T_{F}^{^{\prime \prime }}$$, and $$T_{L}^{^{\prime \prime }}$$ differ only slightly, in the fourth digit. This means that the related tests are all suitable for testing unidimensional dominance and stochastic ordering alternatives. In our case, if the stochastic ordering alternative is true, it is also jointly true by construction for all $$C-1$$ partial tests $$T_{(j)}^{*}$$. Tippett's $$T_{T}^{\prime \prime }$$ thus differs from the other combination functions because its power behavior is mostly sensitive when only one partial test lies in the alternative. Due to too many ties in the data set, the test with rank transformations was not considered.

Since all p-value statistics related to the third partial test $$T_{(3)}^{*}$$ are $$< 0.05/3$$, by the simple Bonferroni rule it follows that subjects taking the High dose exhibit significantly larger responses than those taking lower doses.
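To make the procedure concrete, the following self-contained sketch (reusing the illustrative t_ad function of Sect. 3.1) approximates the $$\hat{\lambda }_{AD(j)}$$ entries of Table 3 by conditional Monte Carlo; the number of permutations and the seed are arbitrary choices of ours, so the figures will only approximate those reported.

```python
import numpy as np

# Table 1 frequencies: rows Placebo, Low, Medium, High; K = 5 ordered categories
table1 = np.array([[59, 25, 46, 48, 32],
                   [48, 21, 44, 47, 30],
                   [44, 14, 54, 64, 31],
                   [43,  4, 49, 58, 41]])
K = table1.shape[1]
# unit-by-unit representation: one category index per patient, group by group
samples = [np.repeat(np.arange(1, K + 1), row) for row in table1]

def ordering_pvalues(samples, K, R=10_000, rng=0):
    """Conditional Monte Carlo p-values for the C-1 partial tests T*_{AD(j)}."""
    rng = np.random.default_rng(rng)
    sizes = [len(s) for s in samples]
    pooled = np.concatenate(samples)

    def partials(x):
        groups = np.split(x, np.cumsum(sizes)[:-1])
        out = []
        for j in range(1, len(sizes)):
            y1, y2 = np.concatenate(groups[:j]), np.concatenate(groups[j:])
            f = np.array([[np.sum(y == k + 1) for k in range(K)] for y in (y1, y2)])
            out.append(t_ad(f, len(y1), len(y2))[1])
        return np.array(out)

    t_obs = partials(pooled)                         # observed T^o_{AD(j)}
    t_star = np.array([partials(pooled[rng.permutation(len(pooled))])
                       for _ in range(R)])
    return (t_star >= t_obs).mean(axis=0)            # one p-value per split j

# lam_j = ordering_pvalues(samples, K, R=20_000)
# lam_j should fall close to the lambda_AD(j) entries of Table 3
```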

6 Concluding Remarks

The basic idea in this paper is to test for stochastic ordering restrictions with multivariate ordered categorical data through a suitable combination of a set of partial tests by the UI-NPC approach, based on permutation theory. Such problems have quite difficult solutions within likelihood ratio theory; these, when available, have nice characterizations under their usually too stringent assumptions.

The UI-NPC approach lies within the conditionality principle of inference, where the conditioning is with respect to a set of statistics that are sufficient in the null hypothesis, such as the pooled observed data. So, it is based on the permutation testing approach and the NPC of dependent permutation tests. The NPC approach shows good general power behavior; it is rather efficient and less demanding in terms of underlying assumptions compared to parametric competitors, when these exist and are available.