On Medians of (Randomized) Pairwise Means
- URL: http://arxiv.org/abs/2211.00603v1
- Date: Tue, 1 Nov 2022 17:18:15 GMT
- Title: On Medians of (Randomized) Pairwise Means
- Authors: Pierre Laforgue, Stephan Clémençon, Patrice Bertail
- Abstract summary: Tournament procedures, recently introduced in Lugosi & Mendelson, offer an appealing alternative to the principle of Empirical Risk Minimization in machine learning.
This paper extends this approach to other learning problems, in particular those for which the performance criterion takes the form of an expectation over pairs of observations.
- Score: 8.497456090408084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tournament procedures, recently introduced in Lugosi & Mendelson (2016),
offer an appealing alternative, from a theoretical perspective at least, to the
principle of Empirical Risk Minimization in machine learning. Statistical
learning by Median-of-Means (MoM) basically consists in segmenting the training
data into blocks of equal size and comparing the statistical performance of
every pair of candidate decision rules on each data block: that with highest
performance on the majority of the blocks is declared as the winner. In the
context of nonparametric regression, functions having won all their duels have
been shown to outperform empirical risk minimizers w.r.t. the mean squared
error under minimal assumptions, while exhibiting robustness properties. It is
the purpose of this paper to extend this approach in order to address other
learning problems, in particular those for which the performance criterion takes the
form of an expectation over pairs of observations rather than over one single
observation, as may be the case in pairwise ranking, clustering or metric
learning. Precisely, it is proved here that the bounds achieved by MoM are
essentially conserved when the blocks are built by means of independent
sampling without replacement schemes instead of a simple segmentation. These
results are next extended to situations where the risk is related to a pairwise
loss function and its empirical counterpart is of the form of a $U$-statistic.
Beyond theoretical results guaranteeing the performance of the
learning/estimation methods proposed, some numerical experiments provide
empirical evidence of their relevance in practice.
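
To make the construction concrete, below is a minimal NumPy sketch (not the authors' code) of the two estimators discussed in the abstract: the classical MoM mean, computed over a partition into blocks of equal size, and a randomized pairwise variant in which blocks are drawn by independent sampling without replacement and the block statistic is a degree-two U-statistic. The block count, block size, and kernel h(x, x') = |x - x'| are illustrative assumptions, not choices taken from the paper.

```python
# Illustrative sketch only: classical Median-of-Means (MoM) and a randomized
# pairwise variant in the spirit of the paper (blocks drawn by independent
# sampling without replacement, block statistic = degree-2 U-statistic).
import numpy as np


def median_of_means(z, n_blocks):
    """Classical MoM: partition z into n_blocks contiguous blocks of (almost)
    equal size, average within each block, return the median of block means."""
    blocks = np.array_split(np.asarray(z), n_blocks)
    return float(np.median([b.mean() for b in blocks]))


def median_of_randomized_pairwise_means(x, kernel, n_blocks, block_size, seed=None):
    """Randomized pairwise variant: each block is an independent sample drawn
    without replacement from x; on each block the pairwise mean (U-statistic
    with symmetric kernel h) is computed, and the median of these block-wise
    U-statistics is returned."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    block_stats = []
    for _ in range(n_blocks):
        idx = rng.choice(len(x), size=block_size, replace=False)  # SWoR block
        xb = x[idx]
        # U-statistic on the block: average of h(x_i, x_j) over all pairs i < j.
        pair_vals = [kernel(xb[i], xb[j])
                     for i in range(block_size) for j in range(i + 1, block_size)]
        block_stats.append(np.mean(pair_vals))
    return float(np.median(block_stats))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed sample, where the plain empirical mean is unreliable.
    z = rng.standard_t(df=2.5, size=2000)
    print("MoM estimate of E[Z]:", median_of_means(z, n_blocks=25))

    # Pairwise example: estimate E|X - X'| (a degree-2 U-statistic) from
    # randomized blocks built by sampling without replacement.
    h = lambda a, b: abs(a - b)
    print("Median of randomized pairwise means:",
          median_of_randomized_pairwise_means(z, h, n_blocks=25, block_size=40, seed=1))
```

On heavy-tailed samples such as the Student-t draw above, the medians over blocks are far less sensitive to a few extreme observations than the plain empirical mean or the full U-statistic, which is the robustness property the paper's bounds formalize.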
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Joint empirical risk minimization for instance-dependent
positive-unlabeled data [4.112909937203119]
Learning from positive and unlabeled data (PU learning) is an actively researched machine learning task.
The goal is to train a binary classification model from a dataset in which only part of the positive instances are labeled, while the remaining instances are unlabeled.
The unlabeled set contains the remaining positives together with all negative observations.
arXiv Detail & Related papers (2023-12-27T12:45:12Z) - A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment
for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z) - Mixture Proportion Estimation and PU Learning: A Modern Approach [47.34499672878859]
Given only positive examples and unlabeled examples, we might hope to estimate an accurate positive-versus-negative classifier.
Classical methods for both problems break down in high-dimensional settings.
We propose two simple techniques: Best Bin Estimation (BBE) and Conditional Value Ignoring Risk (CVIR).
arXiv Detail & Related papers (2021-11-01T14:42:23Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Robust Unsupervised Learning via L-Statistic Minimization [38.49191945141759]
We present a general approach to this problem focusing on unsupervised learning.
The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models.
We prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning.
arXiv Detail & Related papers (2020-12-14T10:36:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Generalization Bounds in the Presence of Outliers: a Median-of-Means
Study [8.905677748354364]
The Median-of-Means (MoM) is an estimator of the mean $\theta$ of a square-integrable r.v. $Z$.
Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning.
A new line of work is now trying to characterize and leverage MoM's ability to deal with corrupted data.
arXiv Detail & Related papers (2020-06-09T13:21:39Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)