Is Multi-Distribution Learning as Easy as PAC Learning: Sharp Rates with Bounded Label Noise
- URL: http://arxiv.org/abs/2602.21039v1
- Date: Tue, 24 Feb 2026 16:00:15 GMT
- Title: Is Multi-Distribution Learning as Easy as PAC Learning: Sharp Rates with Bounded Label Noise
- Authors: Rafael Hanashiro, Abhishek Shetty, Patrick Jaillet
- Abstract summary: We show that learning across $k$ distributions incurs slow rates scaling with $k/\varepsilon^2$, even under constant noise levels, unless each distribution is learned separately. A key technical contribution is a structured hypothesis-testing framework that captures the statistical cost of certifying near-optimality.
- Score: 26.182166506085114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Towards understanding the statistical complexity of learning from heterogeneous sources, we study the problem of multi-distribution learning. Given $k$ data sources, the goal is to output a classifier for each source by exploiting shared structure to reduce sample complexity. We focus on the bounded label noise setting to determine whether the fast $1/\varepsilon$ rates achievable in single-task learning extend to this regime with minimal dependence on $k$. Surprisingly, we show that this is not the case. We demonstrate that learning across $k$ distributions inherently incurs slow rates scaling with $k/\varepsilon^2$, even under constant noise levels, unless each distribution is learned separately. A key technical contribution is a structured hypothesis-testing framework that captures the statistical cost of certifying near-optimality under bounded noise, a cost we show is unavoidable in the multi-distribution setting. Finally, we prove that when competing with the stronger benchmark of each distribution's optimal Bayes error, the sample complexity incurs a \textit{multiplicative} penalty in $k$. This establishes a \textit{statistical} separation between random classification noise and Massart noise, highlighting a fundamental barrier unique to learning from multiple sources.
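As a rough illustration of the rates discussed in the abstract (a hedged sketch: the notation, the per-source excess-error criterion, and the single-task rate below are our own reading, not statements copied from the paper):

```latex
% Hedged sketch, not the paper's formal theorem. Assume k distributions
% D_1, ..., D_k and a hypothesis class H of VC dimension d; the learner must
% output a classifier h_i for each source with small excess error:
\[
  \max_{1 \le i \le k} \Bigl( \operatorname{err}_{D_i}(h_i) - \min_{h \in \mathcal{H}} \operatorname{err}_{D_i}(h) \Bigr) \;\le\; \varepsilon .
\]
% Under bounded (Massart) label noise a single distribution admits a fast rate
% of roughly d/eps samples, whereas the abstract's lower bound says the joint
% problem still needs on the order of k/eps^2 samples unless each distribution
% is learned separately (costing roughly k * d/eps in total):
\[
  n_{\text{single}}(\varepsilon) = \widetilde{O}\!\left(\tfrac{d}{\varepsilon}\right)
  \qquad \text{vs.} \qquad
  n_{\text{multi}}(\varepsilon) = \Omega\!\left(\tfrac{k}{\varepsilon^{2}}\right).
\]
% Against the stronger per-distribution Bayes-error benchmark, the abstract
% additionally states a multiplicative penalty in k.
```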
Related papers
- Characterizing Online and Private Learnability under Distributional Constraints via Generalized Smoothness [63.833913892018536]
We study sequential decision making under distributional adversaries that can adaptively choose data-generating distributions from a fixed family $U$.
We provide a near-complete characterization of the families $U$ that admit learnability in terms of a notion known as generalized smoothness.
We show that generalized smoothness also characterizes private learnability under distributional constraints.
arXiv Detail & Related papers (2026-02-24T06:15:59Z) - Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations [0.723486289593299]
In 2018, Ashtiani et al. reframed \emph{sample compressibility}, originally due to Littlestone and Warmuth (1986), as a structural property of distribution classes.
We establish that sample-compressible families remain learnable even from perturbed samples, subject to a set of necessary and sufficient conditions.
arXiv Detail & Related papers (2025-06-07T01:11:50Z) - Contextual Learning for Stochastic Optimization [1.0819408603463425]
Motivated by stochastic optimization, we introduce the problem of learning from samples of contextual value distributions.
A contextual value distribution can be understood as a family of real-valued distributions, where each sample consists of a context $x$ and a random variable drawn from the corresponding real-valued distribution $D_x$.
arXiv Detail & Related papers (2025-05-22T16:01:49Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Learning versus Refutation in Noninteractive Local Differential Privacy [133.80204506727526]
We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation.
Our main result is a complete characterization of the sample complexity of PAC learning for non-interactive LDP protocols.
arXiv Detail & Related papers (2022-10-26T03:19:24Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
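A minimal toy sketch of that zero-sum-game viewpoint (not the paper's on-demand sampling algorithm; the finite loss matrix, function names, and step size below are ours, purely for illustration):

```python
import numpy as np

def solve_mdl_game(loss, n_rounds=500, eta=0.1):
    """Toy multi-distribution learning as a zero-sum game.

    loss: (num_hypotheses, k) matrix, loss[h, i] = error of hypothesis h on
    distribution i. A real algorithm would estimate these from on-demand samples.
    """
    num_h, k = loss.shape
    adv_weights = np.ones(k) / k        # adversary's weighting over the k sources
    learner_counts = np.zeros(num_h)    # how often each hypothesis was played

    for _ in range(n_rounds):
        # Learner best-responds to the adversary's current weighting.
        weighted_err = loss @ adv_weights
        h = int(np.argmin(weighted_err))
        learner_counts[h] += 1

        # Adversary (multiplicative weights) shifts mass to distributions
        # where the learner's current hypothesis errs the most.
        adv_weights *= np.exp(eta * loss[h])
        adv_weights /= adv_weights.sum()

    # Empirical play frequencies form an approximately minimax-optimal
    # randomized hypothesis.
    return learner_counts / n_rounds

# Toy example: 3 hypotheses, k = 2 distributions.
loss = np.array([[0.10, 0.40],
                 [0.40, 0.10],
                 [0.25, 0.25]])
mix = solve_mdl_game(loss)
print("randomized hypothesis:", mix)
print("worst-case error:", (loss.T @ mix).max())
```

The adversary's multiplicative-weights updates concentrate attention on the hardest distributions, which is the intuition behind drawing samples on demand rather than uniformly.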
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Pitfalls of Gaussians as a noise distribution in NCE [22.23473249312549]
Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality.
We show that the choice of the noise distribution $q$ can severely impact the computational and statistical efficiency of NCE.
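For reference, a hedged sketch of the classical NCE objective (the standard Gutmann-Hyvärinen form with one noise sample per data sample; this formula is background, not taken from the paper itself):

```latex
% Background sketch of classical NCE: data distribution p, unnormalized model
% p_theta, noise distribution q, and logistic sigmoid sigma. NCE maximizes
\[
  J(\theta) \;=\; \mathbb{E}_{x \sim p}\bigl[\log \sigma\bigl(\log p_\theta(x) - \log q(x)\bigr)\bigr]
  \;+\; \mathbb{E}_{y \sim q}\bigl[\log\bigl(1 - \sigma\bigl(\log p_\theta(y) - \log q(y)\bigr)\bigr)\bigr].
\]
% The paper's point: when q (e.g., a Gaussian) is a poor match for p, this
% estimator can become statistically and computationally inefficient.
```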
arXiv Detail & Related papers (2022-10-01T04:42:56Z) - Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z) - Sample Complexity Bounds for Robustly Learning Decision Lists against Evasion Attacks [25.832511407411637]
A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks.
We work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability.
For every fixed $k$, the class of $k$-decision lists has polynomial sample complexity against a $\log(n)$-bounded adversary.
arXiv Detail & Related papers (2022-05-12T14:40:18Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)