1 Introduction
Ordered categorical data are frequently encountered in many research and decision-making fields. For instance, records from patients under different treatments in clinical experiments, feedbacks of questionnaire in social sciences, data on some questions about feeling, thought or opinion collected in a natural way in psychology, quality examination of products in marketing and technology, etc. Taking clinical trails as a guide, we intend to find if results of cure plans satisfy stochastic ordering and to find the best treatment among cure plans. Problems of comparing whether different groups with ordinal responses on the same measurement scale satisfy stochastic dominance is our principal interest. Thereby, we intend to provide tests of hypotheses with ordinal responses especially by testing for stochastic dominance since that for stochastic ordering, , is obtained as a combination of dominance partial tests. This is known to be a rather difficult problem. Many approaches are proposed in the literature to settle it within likelihood frameworks. [14] proposed an iterative procedure with censored data which is based on a pair-wise algorithm to find the asymptotic MLE’s of Kaplan–Meier form. [25] introduced numerical approximation of MLE’s of two V-dimensional distributions under stochastic ordering. [27] derived the null asymptotic distribution for the likelihood ratio test statistic for some testing procedures. Testing procedures based on maximum likelihood estimates of odds ratios have been considered by [2, 3] and others. Moreover, Kateri and Agresti (2013), in place of traditional frequentist methods, applied a Bayesian approach to test if the structure of an association between the response variable and the explanatory variable in two samples is ordinal. When available, likelihood-based solutions within their stringent assumptions are provided with known inferential properties. In general, however, it is quite difficult to obtain proper testing inference, especially for the multivariate case. Multivariate case is much more difficult to be analyzed within likelihood frameworks than the univariate one. In such a setting, the number of underlying nuisance parameters and/or that of observed variables can often be much larger than sample sizes. So, unless clearly justified assumptions allowing for considerable reduction of underlying complexity, the most intriguing of which is when one pseudo-parameter is expressed as a function of many underlying nuisance parameters, no correct general testing solution is possible within that approach.
Our approach to this kind of problems is within the conditionality principle of inference [13], where the conditioning is with respect to a set of sufficient statistics in the null hypothesis as usually the pooled observed data is. That is, by using the permutation testing theory and the nonparametric combination (NPC) of dependent permutation tests [4–8, 18–23]. When the underlying population distribution is unknown, nonparametric permutation methods might become a necessity. This is especially true when the number of categories and/or that of underlying nuisance parameters are not very small. [16] studied the testing for marginal inhomogeneity and direction-independent marginal order under the global permutation tests. [15] utilized -P statistic with small sample size under the permutation approach. The NPC approach is a general methodology for multivariate problems, especially, for stochastic dominance and stochastic ordering. The NPC testing solution performs [24] Union-Intersection (UI) approach when an equivalent set of sub-problems is properly carried out.
In principle, the exact calculations of required testing distributions are obtained through complete enumeration of all data permutations. This, however, becomes impossible in practice when the cardinality of permutation spaces are large. To this end, a conditional Monte Carlo procedure was suggested to practically obtain their estimations, at any desired degree of accuracy ([18, 20, 22]). Main NPC routines are achieved in MATLAB, R, Python, StatXact, SAS, etc.
The rest of the paper is organized as follows. Section 2 introduces a typical real example. Section 3 discusses the two-sample basic problem under unidimensional and multidimensional cases. Section 4 studies approaches for stochastic ordering restriction in C-sample designs. Solutions to the example are in Sect. 5. Some concluding remarks are in Sect. 6.
2 A Typical Medical Example
Dose and Extent of trauma due to subarachnoid hemorrhage
Treatment | Death | Veget | Major | Minor | Recov | Total |
---|---|---|---|---|---|---|
Placebo | 59 | 25 | 46 | 48 | 32 | 210 |
Low | 48 | 21 | 44 | 47 | 30 | 190 |
Medium | 44 | 14 | 54 | 64 | 31 | 207 |
High | 43 | 4 | 49 | 58 | 41 | 195 |
Total | 194 | 64 | 193 | 217 | 134 | 802 |
Based on our intuition, but also in accordance with quoted authors, patients taking Placebo are expected to achieve lower treatment effect than those taking Low dose, patients taking Low dose have lower effects than those with Medium dose, and so forth. Therefore, it is expected that patients exhibit monotonically non-decreasing responses X as the dose increases. Thus, it is required to test whether there is a monotonic stochastic ordering on related response data. Formally, the hypotheses to consider are against with at least one strict inequality. If responses were quantitative, this problem is also termed of isotonic regression. Defining the cumulative distribution function for responses X at ordered categories as , namely, the hypotheses are equivalently expressed as against , with at least one strict inequality.
With clear meaning of the symbols, the rationale for this formulation resides in that if, according to increasing doses, non-decreasing treatment effects occur at latent variables Y, i.e., , , then latent responses should behave as .
The related testing problem has a rather difficult solution within the likelihood-ratio theory, which with categorical data in addition presents quite a serious difficulty: even for moderate number of cells it is recognized to be not unique ([10–12, 26, 27]; etc.). Moreover, to get a solution, important supplementary options, difficult to justify in terms of the real problem under study, are required. This difficulty mostly consists in that the set of alternatives is restricted to lie in the -Dimensional positive orthant where the likelihood cannot be maximized under by ordinary methods of maximization.
Our solution does firstly consider the setting of two treatments, and then, according to [24] UI and Jonckheere–Terpstra’s approaches, by a breakdown of the hypotheses into pairs of sub-hypotheses. Later, all resulting dependent partial tests are combined by a NPC method.
3 The Two-Sample Basic Problem
Let us firstly consider the two-sample basic case, where data are in a table and the specific hypotheses are expressed as against with at least one strict inequality. The related testing problem can be equivalently set as against the set of restricted alternatives .
It is worth noting that i) according to [24] the problem is equivalently broken-down into one-sided sub-problems; ii) defines a multi-one-sided set of alternatives; iii) since under both and it is , category is not considered; iv) the global solution requires the joint comparison of random relative frequencies: .
Since the number of unknown nuisance parameters to take care in any testing process is and the likelihood is to be maximized in the -dimensional positive orthant, indeed a very difficult task especially when , our approach is to stay within the conditional principle of inference.
The conditioning should be on a set of sufficient statistics in the null hypothesis for the unknown underlying common distribution F. To this end, let be the underlying likelihood related to F, and let the two independent samples of IID data, respectively, sized and , be and . So the data set is , whose joint likelihood is . In null hypothesis, it is assumed that there is no difference between two distributions, namely, . Thus, the joint null likelihood is invariable with respect to any permutation of the observed pooled data , where is the symbol for pooling two data sets. This shows that data under are exchangeable, i.e., permutable. Moreover, under pooled data are always a set of sufficient statistics for any underlying distribution F [18–20, 22]; so, any information on parameters defining F is wholly contained in . The set of all permutations of is indicated with . It is worth noting that , i.e., the set of permutations of coincides, , with that of . Of course, under the alternative the above invariable property does not work, because the two distributions are different by assumption: indeed is sufficient for and is sufficient for and so pooled data are not exchangeable.
The act of conditioning on a set of sufficient statistics for F in entails that any conditional inference is independent of the underlying population distribution F. This conditioning gives rise to the following fundamental property:
Let be the probability space related to data X, then sufficiency of for underlying F, under, implies that the null conditional probability of any event , given , is independent of F, i.e., .
It is worth noting that (c3) is the central property for deducing and justifying the NPC of dependent permutation tests. Also worth noting is (i) the conditional probability has always an objective existence; (ii) the conditional null distribution of is independent of all dependence parameters underlying ; (iii) to characterize sufficiency of in , permutation tests require the existence of a likelihood , not its calculability; (iv) when is minimal sufficient for F, it makes no sense to work outside the permutation testing principle [20]; (v) permutation tests are nonparametric, distribution-free, and intrinsically robust.
For operating with categorical data, to the usual contingency table we prefer using its unit-by-unit representation: , with , where it is intended that the first data belong to the first sample and the rest belong to the second. Such a representation, is one-to-one related with the table for unidimensional variables, as with the example; it is much more efficient for multidimensional variables, where working with contingency tables becomes more and more difficult, up to impossibility, as the number of variables increases. Using that representation, a random permutation can be expressed as , where is a random permutation of the unit labels . The corresponding permuted table is calculate as . Obviously, the marginal frequencies are permutation invariable quantities since , . Similarly, the cumulative marginal frequencies are also invariable: with .
3.1 The One-Dimensional Case
We start with the two-samples one-dimensional problem. For the case of , the related stochastic dominance testing problem becomes against , whose global analysis requires the joint comparison of differences of random frequencies: . Since the crucial point for that joint analysis is the proper handling of all underlying dependences, to attain general solutions we must work within the UI-NPC of related dependent permutation tests because, due to c3) (see Sect. 3), the estimation of dependence coefficients is not required since NPC works independently of such dependences, how complex these are.
It is worth noting that (i) EDFs are maximum likelihood unbiased estimates of population CDFs , , ; (ii) each partial tests is a reformulation of Fisher’s exact probability test and so it is a best conditional test; (iii) large values of each partial test are significant against its related null sub-hypothesis ; (iv) the partial tests are positively dependent; (v) for computation of , 0 is assigned to expressions with the form 0/0; (vi) is a permutation constant not dependent on k; (vii) for increasing sample sizes, each under converges to the standardized normal distribution: .
The corresponding p-value-like statistics can be written as , where is the observed value of on pooled data . So, remembering that p-value-like statistics play the role of tests whose common critical value is , if , the null hypothesis is rejected at significance level .
Representation of the conditional Monte Carlo method in multivariate tests
Under , the sub-matrix simulates the K-dimensional null distribution of K partial permutation tests. The sub-vector simulates the null permutation distribution of combined test .
Thus, the statistic gives an unbiased and, as R diverges, a strongly consistent estimate of the p-value statistic of .
Under , at least one presents larger observed values than in ; so, if the combining function is non-decreasing in each argument, the p-value statistic satisfies the relation: uniformly for every data set and every underlying distribution F. Hence, the latter justifies that is rejected when ; moreover, it can be proved that is provided with the unbiasedness and consistency properties. Details and proofs for these and other properties are in [18–20, 22].
3.2 The Multidimensional Case
4 The C-sample Stochastic Ordering Problem
Considering the Jonckheere–Terpstra idea, the table can be broken-down into sub-tables. Accordingly, the testing problem is broken-down into sub-problems each based on a sub-table. To be specific, for any , we divide the data set into two pooled pseudo-groups, where the first pseudo-group is obtained by pooling data of the first j ordered groups and the second by pooling the rest. Thus, the procedure considers the first pooled pseudo-group as and the second as , , where is the data set in the jth group.
5 Solution of Medical Example
p-values based on UI-NPC approach
0.0141 | 0.0025 | 0.0074 | 0.0017 | 0.0015 | 0.0012 | 0.0068 | |
0.0131 | 0.0021 | 0.0076 | 0.0010 | 0.0012 | 0.0010 | 0.0053 | |
0.0144 | 0.0024 | 0.0062 | 0.0011 | 0.0014 | 0.0011 | 0.0068 |
Results in Table 3 clearly show that the p-values based on four different combination functions , , , and , all reject the null hypothesis at significance level of monotonic stochastic ordering among the doses. So the inferential conclusion is that patients present non-decreasing responses as the dose increases.
It is worth noting that the three combined p-value statistics , , and differ only slightly in the fourth digit. This means that related tests are all suitable for testing unidimensional dominance and stochastic ordering alternatives. In our case, if the stochastic ordering alternative is true, it is also jointly true by construction for all partial tests . So, Tippett’s differs from other combination functions because its power behavior is mostly sensitive when only one partial test lies in the alternative. Due to too many ties in the data set, test with rank transformations was not considered.
Since all p-values statistics related to are , by simple Bonferroni’s rule it results that subjects taking High dose exhibit significantly lower responses than those taking lower doses.
6 Concluding Remarks
The basic idea in this paper is to test for stochastic ordering restrictions with multivariate ordered categorical data through a suitable combination of a set of partial tests by UI-NPC approach based within the permutation theory. Such problems have quite difficult solutions within the likelihood ratio theory which, when available, have nice characterizations under their usually too stringent assumptions.
The UI-NPC approach is within the conditionality principle of inference, where the conditioning is with respect to a set of sufficient statistics in the null hypothesis like the pooled observed data. So, it is based on the permutation testing approach and the NPC of dependent permutation tests. The NPC approach shows a good general power behavior, it is rather efficient and less demanding in terms of underlying assumptions comparing to parametric competitors when these exist and are available.