Universal rates of ERM for agnostic learning
- URL: http://arxiv.org/abs/2506.14110v2
- Date: Tue, 15 Jul 2025 13:56:42 GMT
- Title: Universal rates of ERM for agnostic learning
- Authors: Steve Hanneke, Mingyue Xu
- Abstract summary: The Empirical Risk Minimization (ERM) principle is fundamental in PAC theory and ubiquitous in practical machine learning. We explore the possibilities of agnostic universal rates, which are either $e^{-n}$, $o(n^{-1/2})$, or arbitrarily slow.
- Score: 15.001378595582269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The universal learning framework has been developed to obtain guarantees on learning rates that hold for any fixed distribution, which can be much faster than rates that hold uniformly over all distributions. Given that the Empirical Risk Minimization (ERM) principle is fundamental in PAC theory and ubiquitous in practical machine learning, the recent work of arXiv:2412.02810 studied the universal rates of ERM for binary classification under the realizable setting. However, the assumption of realizability is too restrictive to hold in practice. Indeed, the majority of the literature on universal learning has focused on the realizable case, leaving the non-realizable case barely explored. In this paper, we consider the problem of universal learning by ERM for binary classification under the agnostic setting, where the "learning curve" reflects the decay of the excess risk as the sample size increases. We explore the possibilities of agnostic universal rates and reveal a compact trichotomy: there are three possible agnostic universal rates of ERM, namely $e^{-n}$, $o(n^{-1/2})$, or arbitrarily slow. We provide a complete characterization of which concept classes fall into each of these categories. Moreover, we also establish complete characterizations for the target-dependent universal rates as well as the Bayes-dependent universal rates.
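To make the "learning curve" notion concrete, here is a short LaTeX sketch of the standard agnostic-setting quantities the abstract refers to; the notation is our reconstruction of the usual definitions and may differ from the paper's own symbols.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Standard agnostic-learning quantities (our reconstruction of the usual
% notation; the paper may use different symbols).
Data $(X, Y) \sim P$ on $\mathcal{X} \times \{0,1\}$, concept class $\mathbb{H}$:
\[
  \operatorname{er}_P(h) \;=\; \Pr_{(X,Y) \sim P}\bigl[h(X) \neq Y\bigr],
  \qquad
  \hat{h}_n \;\in\; \operatorname*{arg\,min}_{h \in \mathbb{H}}
    \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{h(X_i) \neq Y_i\}.
\]
The learning curve tracks the expected excess risk
\[
  \mathbb{E}\bigl[\operatorname{er}_P(\hat{h}_n)\bigr]
    \;-\; \inf_{h \in \mathbb{H}} \operatorname{er}_P(h),
\]
which, per the abstract, decays at rate $e^{-n}$, $o(n^{-1/2})$, or
arbitrarily slowly, depending on $\mathbb{H}$.
\end{document}
```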
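And a minimal runnable sketch of how such a per-distribution learning curve can be estimated empirically: ERM over a toy class of threshold classifiers on $[0,1]$ with label noise, so the problem is genuinely agnostic. The class, noise rate, and function names here are all hypothetical illustrations, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy agnostic setup (illustrative, not from the paper):
# X ~ Uniform[0, 1]; the best label is 1{x >= 0.5}, flipped with prob NOISE.
THRESHOLDS = np.linspace(0.0, 1.0, 101)
NOISE = 0.2  # label noise rate; makes the problem non-realizable

def sample(n):
    x = rng.uniform(0.0, 1.0, size=n)
    y = (x >= 0.5).astype(int)
    flip = rng.uniform(size=n) < NOISE
    return x, np.where(flip, 1 - y, y)

def erm_threshold(x, y):
    # Empirical risk of every candidate threshold; return one minimizer.
    preds = (x[None, :] >= THRESHOLDS[:, None]).astype(int)
    empirical_risk = (preds != y[None, :]).mean(axis=1)
    return THRESHOLDS[np.argmin(empirical_risk)]

def true_risk(t):
    # P(h_t(X) != Y) = NOISE + (1 - 2*NOISE) * P(h_t disagrees with h_{0.5}),
    # and for X uniform that disagreement probability is |t - 0.5|.
    return NOISE + (1.0 - 2.0 * NOISE) * abs(t - 0.5)

BEST_IN_CLASS = NOISE  # the optimal in-class risk, attained at t = 0.5

for n in (10, 100, 1000, 10000):
    excess = np.mean([true_risk(erm_threshold(*sample(n))) - BEST_IN_CLASS
                      for _ in range(200)])
    print(f"n={n:6d}  mean excess risk ~ {excess:.4f}")
```

The printed excess risk shrinks as $n$ grows for this fixed distribution; the paper's contribution is to pin down exactly which decay profiles are possible for ERM on a given concept class, distribution by distribution rather than in the worst case.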
Related papers
- Universal Rates of Empirical Risk Minimization [15.001378595582269]
We study the problem of universal learning by empirical risk minimization (ERM) in the realizable case. There are only four possible universal learning rates by ERM; namely, the learning curve of any concept class learnable by ERM decays at either an $e^{-n}$, $1/n$, $\log(n)/n$, or arbitrarily slow rate.
arXiv Detail & Related papers (2024-12-03T20:20:39Z)
- Rethinking Multi-domain Generalization with A General Learning Objective [17.155829981870045]
Multi-domain generalization (mDG) universally aims to minimize the discrepancy between training and testing distributions. The existing mDG literature lacks a general learning-objective paradigm. We propose to leverage a $Y$-mapping to relax the constraint.
arXiv Detail & Related papers (2024-02-29T05:00:30Z)
- Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning [60.058083574671834]
This paper presents FCCL+, a novel federated correlation and similarity learning method with non-target distillation.
For the heterogeneity issue, we leverage irrelevant unlabeled public data for communication.
For catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non-Target Distillation.
arXiv Detail & Related papers (2023-09-28T09:32:27Z)
- A Universal Unbiased Method for Classification from Aggregate Observations [115.20235020903992]
This paper presents a novel universal method for classification from aggregate observations (CFAO), which admits an unbiased estimator of the classification risk for arbitrary losses.
Our proposed method not only guarantees risk consistency, owing to the unbiased risk estimator, but is also compatible with arbitrary losses.
arXiv Detail & Related papers (2023-06-20T07:22:01Z)
- On the Stability and Generalization of Triplet Learning [55.75784102837832]
Triplet learning, i.e. learning from triplet data, has attracted much attention in computer vision tasks.
This paper investigates the generalization guarantees of triplet learning by leveraging stability analysis.
arXiv Detail & Related papers (2023-02-20T07:32:50Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
arXiv Detail & Related papers (2022-11-11T11:28:46Z)
- Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes [31.2676304636432]
We study the problem of multiclass classification with a bounded number of different labels $k$, in the realizable setting.
We extend the traditional PAC model to a) distribution-dependent learning rates, and b) learning rates under data-dependent assumptions.
arXiv Detail & Related papers (2022-10-05T14:36:27Z)
- An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z)