Statistical Depth Functions for Ranking Distributions: Definitions,
Statistical Learning and Applications
- URL: http://arxiv.org/abs/2201.08105v1
- Date: Thu, 20 Jan 2022 10:30:56 GMT
- Title: Statistical Depth Functions for Ranking Distributions: Definitions,
Statistical Learning and Applications
- Authors: Morgane Goibert, Stéphan Clémençon, Ekhine Irurozki, Pavlo
Mozharovskyi
- Abstract summary: The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data.
It is the purpose of this paper to define analogs of quantiles, ranks and statistical procedures based on such quantities.
- Score: 3.7564482287844205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The concept of median/consensus has been widely investigated in order to
provide a statistical summary of ranking data, i.e. realizations of a random
permutation $\Sigma$ of a finite set, $\{1,\; \ldots,\; n\}$ with $n\geq 1$
say. As it sheds light onto only one aspect of $\Sigma$'s distribution $P$, it
may neglect other informative features. It is the purpose of this paper to
define analogs of quantiles, ranks and statistical procedures based on such
quantities for the analysis of ranking data by means of a metric-based notion
of depth function on the symmetric group. Overcoming the absence of vector
space structure on $\mathfrak{S}_n$, the latter defines a center-outward
ordering of the permutations in the support of $P$ and extends the classic
metric-based formulation of consensus ranking (medians corresponding then to
the deepest permutations). The axiomatic properties that ranking depths should
ideally possess are listed, while computational and generalization issues are
studied at length. Beyond the theoretical analysis carried out, the relevance
of the novel concepts and methods introduced for a wide variety of statistical
tasks is also supported by numerous numerical experiments.
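The center-outward ordering described in the abstract can be illustrated with a minimal sketch: take Kendall's tau as the metric on the symmetric group and define the empirical depth of a permutation as (minus) its average distance to an observed sample of rankings, so that the deepest permutation is the metric-based consensus (median). The function names and the choice of Kendall's tau here are illustrative assumptions, not the paper's exact construction.

```python
from itertools import combinations, permutations

def kendall_tau(p, q):
    """Number of discordant item pairs between permutations p and q."""
    return sum(1 for i, j in combinations(range(len(p)), 2)
               if (p[i] - p[j]) * (q[i] - q[j]) < 0)

def metric_depth(sigma, sample):
    """Empirical metric-based depth of sigma w.r.t. a sample of rankings:
    larger depth means closer, on average, to the observed permutations."""
    avg_dist = sum(kendall_tau(sigma, s) for s in sample) / len(sample)
    return -avg_dist  # the deepest permutation minimizes expected distance

# Four observed rankings of n = 3 items.
sample = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (0, 1, 2)]
# Brute-force search over S_3: the deepest permutation is the consensus.
deepest = max(permutations(range(3)), key=lambda s: metric_depth(s, sample))
# deepest == (0, 1, 2), the Kemeny-style median of the sample
```

Sorting all permutations by depth then yields exactly the center-outward ordering of the support: the consensus is innermost, and permutations far from the sample are outermost.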
Related papers
- Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions.
In this paper, we assume that the matrix $M$ containing the probability that expert $i$ answers question $j$ correctly is bi-isotonic up to a permutation of its rows and columns.
We construct a computationally efficient algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z)
- Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides on a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
- Robust Consensus in Ranking Data Analysis: Definitions, Properties and
Computational Issues [2.867517731896504]
We introduce notions of robustness, together with dedicated statistical methods, for Consensus Ranking.
We propose specific extensions of the popular concept of breakdown point, tailored to consensus ranking.
arXiv Detail & Related papers (2023-03-22T19:36:56Z)
- Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
- Ranking Inferences Based on the Top Choice of Multiway Comparisons [2.468314282946207]
This paper considers ranking inference of $n$ items based on the observed data on the top choice among $M$ randomly selected items at each trial.
It is a useful modification of the Plackett-Luce model for $M$-way ranking with only the top choice observed and is an extension of the celebrated Bradley-Terry-Luce model that corresponds to $M=2$.
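The top-choice observation model described above has a simple closed form under Plackett-Luce: given positive item weights $w$, the probability that item $i$ is the top choice among a presented subset $S$ is $w_i / \sum_{j \in S} w_j$, and with $M = 2$ this reduces to the Bradley-Terry-Luce pairwise probability. A minimal sketch (the weights and item names are illustrative, not from the paper):

```python
def top_choice_prob(weights, subset, i):
    """P(item i is the top choice among `subset`) under the Plackett-Luce
    model, where `weights` maps items to positive scores."""
    assert i in subset
    return weights[i] / sum(weights[j] for j in subset)

w = {"a": 2.0, "b": 1.0, "c": 1.0}
# M = 2 recovers the Bradley-Terry-Luce pairwise probability: 2 / (2 + 1).
p_ab = top_choice_prob(w, {"a", "b"}, "a")
# M = 3: item "a" beats both others with probability 2 / (2 + 1 + 1).
p_abc = top_choice_prob(w, {"a", "b", "c"}, "a")
```

Each observed trial contributes one such multinomial term to the likelihood, which is what makes inference tractable even though only the top choice, not the full ranking, is recorded.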
arXiv Detail & Related papers (2022-11-22T02:34:52Z)
- Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties
and Finite Sample Analysis [1.2891210250935146]
We propose an extension of the integrated rank-weighted statistical depth (IRW depth in abbreviated form) originally introduced in [IRW].
The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study.
arXiv Detail & Related papers (2021-06-21T12:53:37Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
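For context, the exact quantity being estimated is the diagonal of the hat matrix: the leverage score of row $x_i$ of $X$ is $h_i = x_i (X^\top X)^{-1} x_i^\top$, and the scores sum to the rank of $X$. A minimal pure-Python sketch for a tall full-rank matrix with two columns (the randomized methods in the paper approximate these values without forming the inverse; this direct computation is only illustrative):

```python
def leverage_scores(X):
    """Exact leverage scores h_i = x_i (X^T X)^{-1} x_i^T for a tall
    full-rank matrix X given as a list of 2-column rows."""
    # Form the 2x2 Gram matrix X^T X ...
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    # ... and invert it in closed form.
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))
    scores = []
    for x0, x1 in X:
        y0 = inv[0][0] * x0 + inv[0][1] * x1
        y1 = inv[1][0] * x0 + inv[1][1] * x1
        scores.append(x0 * y0 + x1 * y1)  # quadratic form x_i (X^T X)^{-1} x_i^T
    return scores

X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
h = leverage_scores(X)
# The scores sum to rank(X) = 2; no single row dominates here.
```

In practice one never inverts the Gram matrix at scale, which is exactly why the rank-revealing and randomized-sketching combinations studied in the paper are of interest.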
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Distributed Estimation for Principal Component Analysis: an Enlarged
Eigenspace Analysis [45.829683377074524]
This paper studies distributed estimation for a fundamental statistical machine learning problem: principal component analysis (PCA).
We propose a novel multi-round algorithm for constructing top-$L$-dim eigenspace for distributed data.
Our algorithm takes advantage of shift-and-invert preconditioning and convex optimization.
arXiv Detail & Related papers (2020-04-05T22:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.