Towards Fundamental Limits for Active Multi-distribution Learning
- URL: http://arxiv.org/abs/2506.17607v1
- Date: Sat, 21 Jun 2025 06:08:58 GMT
- Title: Towards Fundamental Limits for Active Multi-distribution Learning
- Authors: Chicheng Zhang, Yihan Zhou
- Abstract summary: We develop new algorithms for active multi-distribution learning and establish improved label complexity upper and lower bounds. We show that the bound in the realizable setting is information-theoretically optimal and that the $k\nu/\varepsilon^2$ term in the agnostic setting is fundamental for proper learners.
- Score: 16.639855803241524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-distribution learning extends agnostic Probably Approximately Correct (PAC) learning to the setting in which a family of $k$ distributions, $\{D_i\}_{i\in[k]}$, is considered and a classifier's performance is measured by its error under the worst distribution. This problem has attracted a lot of recent interest due to its applications in collaborative learning, fairness, and robustness. Despite a rather complete picture of the sample complexity of passive multi-distribution learning, research on active multi-distribution learning remains scarce, and the optimality of existing algorithms is unknown. In this paper, we develop new algorithms for active multi-distribution learning and establish improved label complexity upper and lower bounds, in both distribution-dependent and distribution-free settings. Specifically, we prove upper bounds of $\widetilde{O}\Bigl(\theta_{\max}(d+k)\ln\frac{1}{\varepsilon}\Bigr)$ and $\widetilde{O}\Bigl(\theta_{\max}(d+k)\Bigl(\ln\frac{1}{\varepsilon}+\frac{\nu^2}{\varepsilon^2}\Bigr)+\frac{k\nu}{\varepsilon^2}\Bigr)$ in the realizable and agnostic settings respectively, where $\theta_{\max}$ is the maximum disagreement coefficient among the $k$ distributions, $d$ is the VC dimension of the hypothesis class, $\nu$ is the multi-distribution error of the best hypothesis, and $\varepsilon$ is the target excess error. Moreover, we show that the bound in the realizable setting is information-theoretically optimal and that the $k\nu/\varepsilon^2$ term in the agnostic setting is fundamental for proper learners. We also establish an instance-dependent sample complexity bound for passive multi-distribution learning that smoothly interpolates between the realizable and agnostic regimes~\citep{blum2017collaborative,zhang2024optimal}, which may be of independent interest.
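To make the two rates concrete, the following minimal Python sketch evaluates their leading-order terms, dropping the constants and polylogarithmic factors hidden by $\widetilde{O}(\cdot)$. The parameter values at the bottom are hypothetical, chosen purely for illustration; they do not come from the paper.

```python
import math

def realizable_label_bound(theta_max: float, d: int, k: int, eps: float) -> float:
    """Leading order of the realizable upper bound: theta_max * (d + k) * ln(1/eps).
    Constants and polylog factors hidden by tilde-O are dropped."""
    return theta_max * (d + k) * math.log(1.0 / eps)

def agnostic_label_bound(theta_max: float, d: int, k: int,
                         eps: float, nu: float) -> float:
    """Leading order of the agnostic upper bound:
    theta_max * (d + k) * (ln(1/eps) + nu^2/eps^2) + k * nu / eps^2."""
    return (theta_max * (d + k) * (math.log(1.0 / eps) + nu**2 / eps**2)
            + k * nu / eps**2)

# Hypothetical instance: k = 10 distributions, VC dimension d = 50,
# max disagreement coefficient theta_max = 4, target excess error eps = 0.01.
print(f"realizable:          {realizable_label_bound(4.0, 50, 10, 0.01):.0f}")
print(f"agnostic (nu=0.02):  {agnostic_label_bound(4.0, 50, 10, 0.01, 0.02):.0f}")
print(f"agnostic (nu -> 0):  {agnostic_label_bound(4.0, 50, 10, 0.01, 0.0):.0f}")
```

Note that setting $\nu=0$ collapses the agnostic expression onto the realizable one, mirroring the smooth realizable-to-agnostic interpolation the abstract highlights for its instance-dependent passive bound.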
Related papers
- Robust learning of halfspaces under log-concave marginals [6.852292115526837]
We give an algorithm that learns linear threshold functions and returns a classifier with boundary volume $O(r+\varepsilon)$ at radius perturbation $r$. The time and sample complexity of $d^{\tilde{O}(1/\varepsilon^2)}$ matches the complexity of Boolean regression.
arXiv Detail & Related papers (2025-05-19T20:12:16Z) - Robust Learning of Multi-index Models via Iterative Subspace Approximation [36.138661719725626]
We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. We show that in the presence of random classification noise, the complexity of our algorithm scales agnostically with $1/\epsilon$.
arXiv Detail & Related papers (2025-02-13T17:37:42Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - The Sample Complexity of Multi-Distribution Learning for VC Classes [25.73730126599202]
Multi-distribution learning is a generalization of PAC learning to settings with multiple data distributions.
There remains a significant gap between the known upper and lower bounds for PAC-learnable classes.
We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
arXiv Detail & Related papers (2023-07-22T18:02:53Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p,\epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Agnostic Multi-Group Active Learning [24.97598179536084]
We consider a variant of this problem from the perspective of active learning, where the learner is endowed with the power to decide which examples are labeled from each distribution in the collection.
Our main challenge is that standard active learning techniques, such as disagreement-based active learning, do not directly apply to the multi-group learning objective; a single-distribution sketch of the disagreement-based technique appears after this list.
We modify existing algorithms to provide a consistent active learning algorithm for an agnostic formulation of multi-group learning.
arXiv Detail & Related papers (2023-06-02T21:24:13Z) - Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations [79.66404989555566]
We consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.
We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP.
arXiv Detail & Related papers (2021-06-22T03:20:40Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
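As background for the disagreement-based technique referenced in the Agnostic Multi-Group Active Learning entry above, and for the disagreement coefficient $\theta_{\max}$ in the main abstract, here is a minimal, hypothetical sketch of single-distribution CAL-style active learning over a finite hypothesis class. Every name and parameter is illustrative; none of it is taken from the papers listed here.

```python
import random

def disagreement_based_al(hypotheses, stream, label_oracle, budget):
    """CAL-style loop: keep every hypothesis consistent with the labels
    queried so far, and spend a label only on points where the surviving
    hypotheses disagree (the 'disagreement region')."""
    version_space = list(hypotheses)
    labels_used = 0
    for x in stream:
        predictions = {h(x) for h in version_space}
        if len(predictions) > 1 and labels_used < budget:
            y = label_oracle(x)  # x lies in the disagreement region: query it
            labels_used += 1
            version_space = [h for h in version_space if h(x) == y]
        # if all surviving hypotheses agree on x, its label is already implied
    return version_space, labels_used

# Toy usage: 1-D threshold classifiers on [0, 1], realizable labels.
random.seed(0)
hypotheses = [lambda x, t=t: int(x >= t) for t in [i / 20 for i in range(21)]]
target = lambda x: int(x >= 0.4)
stream = [random.random() for _ in range(500)]
survivors, used = disagreement_based_al(hypotheses, stream, target, budget=30)
print(len(survivors), "hypotheses survive after", used, "label queries")
```

The label savings come entirely from never querying outside the region where the surviving hypotheses disagree; the difficulty noted in the multi-group entry is that this region, and hence the query rule, is defined with respect to a single distribution.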