Fast and Robust Rank Aggregation against Model Misspecification
- URL: http://arxiv.org/abs/1905.12341v2
- Date: Fri, 5 May 2023 08:06:38 GMT
- Title: Fast and Robust Rank Aggregation against Model Misspecification
- Authors: Yuangang Pan, Weijie Chen, Gang Niu, Ivor W. Tsang, Masashi Sugiyama
- Abstract summary: In rank aggregation (RA), a collection of preferences from different users is summarized into a total order under the assumption of homogeneity of users.
Model misspecification arises in RA when the homogeneity assumption fails to hold in complex real-world situations.
We propose CoarsenRank, which possesses robustness against model misspecification.
- Score: 105.54181634234266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In rank aggregation (RA), a collection of preferences from different users is summarized into a total order under the assumption of homogeneity of users. Model misspecification arises in RA when the homogeneity assumption fails to hold in complex real-world situations. Existing robust RAs usually augment the ranking model to account for additional noise, treating the collected preferences as noisy perturbations of idealized preferences. Since most robust RAs rely on specific perturbation assumptions, they generalize poorly to real-world preferences corrupted by unknown noise. In this paper, we propose CoarsenRank, which possesses robustness against model misspecification. Specifically, the properties of CoarsenRank are summarized as follows: (1) CoarsenRank is designed for mild model misspecification, assuming that ideal preferences (consistent with the model assumption) exist within a neighborhood of the actual preferences. (2) CoarsenRank performs regular RA over a neighborhood of the preferences rather than over the original dataset directly, and therefore enjoys robustness against model misspecification within that neighborhood. (3) The neighborhood of the dataset is defined via its empirical data distribution. Further, we place an exponential prior on the unknown size of the neighborhood and derive a much-simplified posterior formula for CoarsenRank under particular divergence measures. (4) CoarsenRank is instantiated with three popular probabilistic ranking models, yielding Coarsened Thurstone, Coarsened Bradley-Terry, and Coarsened Plackett-Luce, and a tractable optimization strategy is introduced for each instantiation. Finally, we apply CoarsenRank to four real-world datasets.
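The "much-simplified posterior" in (3) is the computational crux: under a relative-entropy neighborhood with an exponential prior on its size, coarsened posteriors are known to reduce to a power (tempered) likelihood. Below is a minimal sketch of that reduction for the Coarsened Bradley-Terry instantiation; the gradient-ascent loop, standard-normal prior, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def coarsened_bt_map(pairs, n_items, alpha=50.0, lr=0.05, iters=500):
    """Sketch of a Coarsened Bradley-Terry MAP estimate.

    pairs: (winner, loser) index pairs. Under relative-entropy
    coarsening with an Exp(alpha) prior on the neighborhood size,
    the coarsened posterior tempers the likelihood by
    zeta = alpha / (alpha + n), downweighting misspecified data.
    """
    n = len(pairs)
    zeta = alpha / (alpha + n)       # likelihood tempering exponent
    theta = np.zeros(n_items)        # latent item scores
    for _ in range(iters):
        grad = -theta                # gradient of the N(0, I) log-prior
        for w, l in pairs:
            p_win = 1.0 / (1.0 + np.exp(theta[l] - theta[w]))
            grad[w] += zeta * (1.0 - p_win)
            grad[l] -= zeta * (1.0 - p_win)
        theta += lr * grad           # ascend the coarsened log-posterior
    return theta - theta.mean()     # center scores for identifiability

# Toy usage: item 0 usually beats 1, which usually beats 2.
pairs = [(0, 1)] * 8 + [(1, 0)] * 2 + [(1, 2)] * 7 + [(2, 1)] * 3
print(coarsened_bt_map(pairs, n_items=3))
```

As alpha grows the neighborhood shrinks, zeta approaches 1, and the estimate reverts to ordinary Bradley-Terry.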
Related papers
- Robust Gaussian Processes via Relevance Pursuit [17.39376866275623]
We propose and study a GP model that achieves robustness against sparse outliers by inferring data-point-specific noise levels.
We show, surprisingly, that the model can be parameterized such that the associated log marginal likelihood is strongly concave in the data-point-specific noise variances.
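To make the mechanism concrete (this is not the paper's relevance-pursuit procedure, just the standard GP marginal likelihood it builds on), data-point-specific noise variances simply enter the kernel diagonal, so an outlier can be absorbed by a large inferred variance:

```python
import numpy as np

def gp_lml_pointwise_noise(K, y, rho):
    """Log marginal likelihood of a GP whose i-th observation has its
    own noise variance rho[i]; large rho[i] downweights an outlier.
    K: (n, n) prior kernel matrix; y: (n,) targets; rho: (n,) >= 0."""
    n = len(y)
    C = K + np.diag(rho)                          # per-point noise on the diagonal
    L = np.linalg.cholesky(C + 1e-9 * np.eye(n))  # jitter for stability
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ a                          # data-fit term
            - np.log(np.diag(L)).sum()            # 0.5 * log det C
            - 0.5 * n * np.log(2 * np.pi))
```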
arXiv Detail & Related papers (2024-10-31T17:59:56Z)
- Inference at the data's edge: Gaussian processes for modeling and inference under model-dependency, poor overlap, and extrapolation [0.0]
The Gaussian Process (GP) is a flexible non-linear regression approach.
It provides a principled approach to handling our uncertainty over predicted (counterfactual) values.
This is especially valuable under conditions of extrapolation or weak overlap.
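For intuition, that uncertainty is the ordinary GP posterior variance, which grows as test points move away from the training data; a minimal sketch with precomputed kernel matrices:

```python
import numpy as np

def gp_predict(K, K_star, k_star_star, y, noise=1e-2):
    """GP posterior mean and variance.
    K: (n, n) train kernel; K_star: (m, n) test-train cross-kernel;
    k_star_star: (m,) prior variances at the test points; y: (n,)."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_star @ alpha                        # predictive mean
    V = np.linalg.solve(L, K_star.T)             # (n, m)
    var = k_star_star - np.sum(V * V, axis=0)    # predictive variance
    return mean, var
```

The variance term is what flags extrapolation and weak overlap: far from the data, K_star shrinks and var approaches the prior variance.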
arXiv Detail & Related papers (2024-07-15T05:09:50Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
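A hedged sketch of the resulting decision rule: abstain whenever the estimated ratio between the idealized and actual data densities is small (the callables `model` and `ratio` are hypothetical placeholders, not the paper's API):

```python
import numpy as np

def predict_with_rejection(model, ratio, X, tau=0.5):
    """Predict on X but abstain (label -1) on inputs whose estimated
    density ratio r(x) = q_idealized(x) / p_data(x) is below tau."""
    r = np.asarray([ratio(x) for x in X])
    preds = np.asarray(model(X))
    return np.where(r >= tau, preds, -1)   # -1 marks a rejection
```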
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Robust Estimation of Causal Heteroscedastic Noise Models [7.568978862189266]
Student's $t$-distribution is known for its robustness in accounting for sampling variability with smaller sample sizes and extreme values without significantly altering the overall distribution shape.
Our empirical evaluations demonstrate that our estimators are more robust and achieve better overall performance across synthetic and real benchmarks.
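To illustrate the robustness mechanism (a generic sketch, not the paper's estimator), a heteroscedastic noise model with Student-t errors can be fit by minimizing the negative log-likelihood; the linear mean and log-linear scale here are illustrative assumptions:

```python
import numpy as np
from scipy.stats import t as student_t

def heteroscedastic_t_nll(params, x, y, df=4.0):
    """NLL for y = f(x) + s(x) * eps with eps ~ t_df: heavy tails keep
    extreme residuals from dominating the fit."""
    a, b, c, d = params
    mean = a * x + b                 # illustrative linear mean
    scale = np.exp(c * x + d)        # positive, input-dependent scale
    return -student_t.logpdf(y, df=df, loc=mean, scale=scale).sum()

# e.g. scipy.optimize.minimize(heteroscedastic_t_nll, x0=np.zeros(4),
#                              args=(x, y))
```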
arXiv Detail & Related papers (2023-12-15T02:26:35Z)
- The Decaying Missing-at-Random Framework: Doubly Robust Causal Inference with Partially Labeled Data [10.021381302215062]
In real-world scenarios, data collection limitations often result in partially labeled datasets, leading to difficulties in drawing reliable causal inferences.
Traditional approaches in the semi-supervised (SS) and missing-data literature may not adequately handle these complexities, leading to biased estimates.
This framework tackles missing outcomes in high-dimensional settings and accounts for selection bias.
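The doubly robust idea underlying such frameworks can be sketched with a standard AIPW-style estimator (a generic illustration under missing-at-random labels, not the paper's decaying-MAR estimator):

```python
import numpy as np

def doubly_robust_mean(y, labeled, m_hat, pi_hat):
    """AIPW-style estimate of E[Y]: combine an outcome model m_hat(x)
    with inverse labeling-propensity weights pi_hat(x). The estimate
    stays consistent if either model is correctly specified.
    labeled: 1 where y is observed, 0 where it is missing."""
    y_filled = np.where(labeled == 1, y, 0.0)   # avoid NaN propagation
    return np.mean(m_hat + labeled * (y_filled - m_hat) / pi_hat)
```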
arXiv Detail & Related papers (2023-05-22T07:37:12Z)
- BRIO: Bringing Order to Abstractive Summarization [107.97378285293507]
We propose a novel training paradigm which assumes a non-deterministic distribution, so that different candidate summaries are assigned probability mass according to their quality.
Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
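Concretely, the non-deterministic view pairs the usual generation loss with a contrastive ranking term over candidate summaries; a rough sketch of such a term (details assumed from the paper's general idea, not its exact loss):

```python
def contrastive_ranking_loss(scores, margin=0.001):
    """scores: the model's length-normalized log-probabilities for
    candidate summaries, pre-sorted best-to-worst by a quality metric
    such as ROUGE. Each better candidate should outscore each worse
    one by a margin that grows with their rank gap."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss += max(0.0, scores[j] - scores[i] + (j - i) * margin)
    return loss
```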
arXiv Detail & Related papers (2022-03-31T05:19:38Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should equal the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
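For context, noise-contrastive estimation fits a density model by classifying data against noise samples, using the log-density ratio as the logit; the cited result concerns how the choice of `log_noise` below shapes the estimator. A minimal sketch:

```python
import numpy as np

def nce_loss(log_model, log_noise, x_data, x_noise):
    """Binary NCE objective with one noise sample per data point.
    log_model, log_noise: callables returning log-densities."""
    def softplus(z):
        return np.logaddexp(0.0, z)   # numerically stable log(1 + e^z)
    z_data = log_model(x_data) - log_noise(x_data)     # logits on data
    z_noise = log_model(x_noise) - log_noise(x_noise)  # logits on noise
    return softplus(-z_data).mean() + softplus(z_noise).mean()
```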
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
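For reference, the objective being reconstructed counts disagreements between a clustering and the observed pairwise labels; a small sketch of that standard cost (not the reconstruction algorithm itself):

```python
def cc_disagreements(labels, edges):
    """Correlation-clustering cost: a '+' edge across clusters or a
    '-' edge inside a cluster each contributes one disagreement.
    labels: dict or list mapping node -> cluster id;
    edges: iterable of (u, v, sign) with sign in {+1, -1}."""
    cost = 0
    for u, v, sign in edges:
        same = labels[u] == labels[v]
        cost += int((sign == +1 and not same) or (sign == -1 and same))
    return cost
```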
arXiv Detail & Related papers (2021-08-10T14:46:17Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
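The divergence in question decomposes into univariate Hyvärinen score-matching terms, one per conditional; a rough finite-difference sketch (the paper uses exact derivatives, and `score_fns` is a hypothetical list of per-coordinate score models):

```python
import numpy as np

def csm_objective(score_fns, X, eps=1e-4):
    """Sum over dimensions d of the 1-D score-matching objective
    E[0.5 * s_d(x)^2 + d s_d / d x_d], where s_d estimates
    d log p(x_d | x_<d) / d x_d.  X: (n, D) samples."""
    total = 0.0
    for d, s in enumerate(score_fns):
        sd = s(X)                        # (n,) conditional scores
        Xp = X.copy()
        Xp[:, d] += eps
        dsd = (s(Xp) - sd) / eps         # numeric derivative wrt x_d
        total += np.mean(0.5 * sd ** 2 + dsd)
    return total
```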
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Learning Inconsistent Preferences with Gaussian Processes [14.64963271587818]
We revisit the widely used preferential Gaussian processes (PGP) of Chu et al. (2005) and challenge their modelling assumption that imposes rankability of data items via latent utility function values.
We propose a generalisation of PGP which can capture more expressive latent preferential structures in the data.
Our experimental findings support the conjecture that violations of rankability are ubiquitous in real-world preferential data.
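The rankability assumption under challenge is explicit in the Chu et al. (2005) likelihood, where every comparison is mediated by a scalar latent utility; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm

def pref_loglik(f, pairs, sigma=0.1):
    """Preference log-likelihood with latent utilities f:
    P(a preferred to b) = Phi((f[a] - f[b]) / (sqrt(2) * sigma)).
    Inconsistent (non-rankable) preferences fit this form poorly."""
    f = np.asarray(f)
    diffs = np.array([f[a] - f[b] for a, b in pairs])
    return norm.logcdf(diffs / (np.sqrt(2) * sigma)).sum()
```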
arXiv Detail & Related papers (2020-06-06T11:57:45Z)