Robust Consensus in Ranking Data Analysis: Definitions, Properties and
Computational Issues
- URL: http://arxiv.org/abs/2303.12878v1
- Date: Wed, 22 Mar 2023 19:36:56 GMT
- Title: Robust Consensus in Ranking Data Analysis: Definitions, Properties and
Computational Issues
- Authors: Morgane Goibert, Clément Calauzènes, Ekhine Irurozki, Stéphan
Clémençon
- Abstract summary: We introduce notions of robustness, together with dedicated statistical methods, for Consensus Ranking.
We propose specific extensions of the popular concept of breakdown point, tailored to consensus ranking.
- Score: 2.867517731896504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the issue of robustness in AI systems becomes vital, statistical learning
techniques that are reliable even in presence of partly contaminated data have
to be developed. Preference data, in the form of (complete) rankings in the
simplest situations, are no exception and the demand for appropriate concepts
and tools is all the more pressing given that technologies fed by or producing
this type of data (e.g. search engines, recommending systems) are now massively
deployed. However, the lack of vector space structure for the set of rankings
(i.e. the symmetric group $\mathfrak{S}_n$) and the complex nature of
statistics considered in ranking data analysis make the formulation of
robustness objectives in this domain challenging. In this paper, we introduce
notions of robustness, together with dedicated statistical methods, for
Consensus Ranking, the flagship problem in ranking data analysis, aiming at
summarizing a probability distribution on $\mathfrak{S}_n$ by a median ranking.
Precisely, we propose specific extensions of the popular concept of breakdown
point, tailored to consensus ranking, and address the related computational
issues. Beyond the theoretical contributions, the relevance of the approach
proposed is supported by an experimental study.
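To make the notion of a median ranking concrete, here is a minimal sketch, assuming the Kendall tau distance and a brute-force search that is only feasible for very small $n$. The function names `kendall_tau` and `kemeny_median` and the toy sample are illustrative assumptions, not the paper's code, and the example does not reproduce the paper's breakdown-point definitions; it only shows informally how much contamination a median can absorb before it changes.

```python
from itertools import combinations, permutations


def kendall_tau(a, b):
    """Count the item pairs that rankings a and b order differently."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_median(rankings):
    """Brute-force search for a ranking minimizing the total Kendall tau
    distance to the sample (hypothetical helper, exponential in n)."""
    items = sorted(rankings[0])
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )


# Toy sample: 8 rankings of 4 items, mostly agreeing on (0, 1, 2, 3).
sample = [(0, 1, 2, 3)] * 6 + [(1, 0, 2, 3)] * 2
outlier = (3, 2, 1, 0)

clean = kemeny_median(sample)
# A few reversed rankings leave the median unchanged ...
contaminated = kemeny_median(sample + [outlier] * 3)
# ... while a majority of reversed rankings flips it.
flipped = kemeny_median(sample + [outlier] * 9)

print(clean, contaminated, flipped)
```

In this toy sample the median survives three adversarial rankings but not nine, which is the kind of behavior a breakdown-point analysis aims to quantify.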
Related papers
- Debiasing Synthetic Data Generated by Deep Generative Models [40.165159490379146]
Deep generative models (DGMs) for synthetic data generation induce bias and imprecision in synthetic data analyses.
We propose a new strategy that targets synthetic data created by DGMs for specific data analyses.
Our approach accounts for biases, enhances convergence rates, and facilitates the calculation of estimators with easily approximated large sample variances.
arXiv Detail & Related papers (2024-11-06T19:24:34Z)
- Sequential Manipulation Against Rank Aggregation: Theory and Algorithm [119.57122943187086]
We leverage an online attack on the vulnerable data collection process.
From the game-theoretic perspective, the confrontation scenario is formulated as a distributionally robust game.
The proposed method manipulates the results of rank aggregation methods in a sequential manner.
arXiv Detail & Related papers (2024-07-02T03:31:21Z)
- DAGnosis: Localized Identification of Data Inconsistencies using Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the probability distribution and independencies of the training set's features as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
- Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data [18.34490939288318]
We address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers.
We propose two-stage distributed and robust statistical inference procedures that cope with high-dimensional models by promoting sparsity.
arXiv Detail & Related papers (2022-08-17T11:17:47Z)
- Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications [3.7564482287844205]
The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data.
It is the purpose of this paper to define analogs of quantiles, ranks and statistical procedures based on such quantities.
arXiv Detail & Related papers (2022-01-20T10:30:56Z)
- Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z)
- Evaluating Model Robustness and Stability to Dataset Shift [7.369475193451259]
We propose a framework for analyzing stability of machine learning models.
We use the original evaluation data to determine distributions under which the algorithm performs poorly.
We estimate the algorithm's performance on the "worst-case" distribution.
arXiv Detail & Related papers (2020-10-28T17:35:39Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.