The Cost of Parallelizing Boosting
- URL: http://arxiv.org/abs/2402.15145v1
- Date: Fri, 23 Feb 2024 07:03:52 GMT
- Title: The Cost of Parallelizing Boosting
- Authors: Xin Lyu, Hongxun Wu, Junzhao Yang
- Abstract summary: We study the cost of parallelizing weak-to-strong boosting algorithms for learning.
We show that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training.
- Score: 1.9235143628887907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the cost of parallelizing weak-to-strong boosting algorithms for
learning, following the recent work of Karbasi and Larsen. Our main results are
two-fold:
- First, we prove a tight lower bound, showing that even "slight"
parallelization of boosting requires an exponential blow-up in the complexity
of training.
Specifically, let $\gamma$ be the weak learner's advantage over random
guessing. The famous \textsc{AdaBoost} algorithm produces an accurate
hypothesis by interacting with the weak learner for $\tilde{O}(1 / \gamma^2)$
rounds where each round runs in polynomial time.
Karbasi and Larsen showed that "significant" parallelization must incur
exponential blow-up: Any boosting algorithm either interacts with the weak
learner for $\Omega(1 / \gamma)$ rounds or incurs an $\exp(d / \gamma)$ blow-up
in the complexity of training, where $d$ is the VC dimension of the hypothesis
class. We close the gap by showing that any boosting algorithm either has
$\Omega(1 / \gamma^2)$ rounds of interaction or incurs a smaller exponential
blow-up of $\exp(d)$.
- Complementing our lower bound, we show that there exists a boosting
algorithm using $\tilde{O}(1/(t \gamma^2))$ rounds that suffers a blow-up
of only $\exp(d \cdot t^2)$.
Plugging in $t = \omega(1)$, this shows that the smaller blow-up in our lower
bound is tight. More interestingly, this provides the first trade-off between
the parallelism and the total work required for boosting.
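For concreteness, here is a minimal Python sketch (not taken from the paper) of the sequential interaction pattern the lower bound concerns: the booster queries a weak learner once per round, re-weights the sample, and runs for $\tilde{O}(1/\gamma^2)$ rounds, as in AdaBoost. The `weak_learner` callback and its signature are assumptions for illustration.

```python
import math
import numpy as np

def boost(X, y, weak_learner, gamma, n_rounds=None):
    """Sequential AdaBoost-style loop: one weak-learner call per round.

    X: (n, d) feature array; y: (n,) numpy array of labels in {-1, +1}.
    weak_learner(X, y, weights) -> h, where h(X) returns {-1, +1} predictions
    with advantage gamma over random guessing under `weights` (assumed).
    """
    n = len(y)
    if n_rounds is None:
        # ~O(log(n) / gamma^2) rounds suffice to drive training error below 1/n.
        n_rounds = math.ceil(math.log(n) / gamma**2)
    weights = np.full(n, 1.0 / n)
    ensemble = []  # list of (vote weight, hypothesis) pairs
    for _ in range(n_rounds):
        h = weak_learner(X, y, weights)          # one round of interaction
        preds = h(X)
        err = weights[preds != y].sum()          # weighted error of h
        err = min(max(err, 1e-12), 1 - 1e-12)    # numerical guard
        alpha = 0.5 * math.log((1 - err) / err)  # AdaBoost's vote weight
        ensemble.append((alpha, h))
        # Re-weight: up-weight mistakes, down-weight correct examples.
        weights *= np.exp(-alpha * y * preds)
        weights /= weights.sum()
    def strong_hypothesis(X_new):
        votes = sum(a * h(X_new) for a, h in ensemble)
        return np.sign(votes)
    return strong_hypothesis
```

Each iteration depends on the weights produced by the previous one, which is exactly the sequential bottleneck at issue: the paper's lower bound says this chain of $\tilde{O}(1/\gamma^2)$ dependent rounds cannot be shortened by more than a $t$ factor without paying $\exp(d \cdot t^2)$ in total work.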
Related papers
- The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem [53.446980306786095]
Smooth boosters generate distributions that do not place too much weight on any given example.
Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility, and quantum learning theory.
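For context, the standard formalization of smoothness (a textbook definition, not quoted from this abstract) bounds how far the booster's distribution can deviate from uniform:

```latex
% A distribution \mu over a sample of n examples is \kappa-smooth if it
% places at most \kappa times the uniform weight on any single example.
\mu \text{ is } \kappa\text{-smooth}
  \quad\iff\quad
  \max_{1 \le i \le n} \mu(i) \le \frac{\kappa}{n}
```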
arXiv Detail & Related papers (2024-09-17T23:09:25Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \le C \cdot \mathrm{OPT} + \epsilon$ with high probability.
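Here $F$ is presumably the population squared loss, the standard objective for this problem (an assumption worth checking against the paper itself):

```latex
% Population squared loss of a candidate w; OPT denotes \min_w F(w).
% The learner seeks a constant-factor approximation to OPT, up to
% an additive epsilon (assumed form of the guarantee).
F(\mathbf{w}) = \mathbb{E}_{(\mathbf{x},y) \sim \mathcal{D}}
  \big[ (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 \big],
\qquad
F(\widehat{\mathbf{w}}) \le C \cdot \mathrm{OPT} + \epsilon
```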
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
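As a reminder, the Massart (bounded) noise model is standardly defined as follows (a textbook definition, not quoted from this abstract):

```latex
% Massart noise at level \eta < 1/2: the label of each point x is flipped
% with an instance-dependent probability \eta(x), uniformly bounded by \eta.
\Pr\big[\, y \neq \mathrm{sign}(\mathbf{w}^* \cdot \mathbf{x}) \;\big|\; \mathbf{x} \,\big]
  = \eta(\mathbf{x}) \le \eta < \tfrac{1}{2}
```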
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Quantum learning algorithms imply circuit lower bounds [7.970954821067043]
We establish the first general connection between the design of quantum algorithms and circuit lower bounds.
Our proof builds on several works in learning theory, pseudorandomness, and computational complexity.
arXiv Detail & Related papers (2020-12-03T14:03:20Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision.
We propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z) - Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation [30.137884459159107]
We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces.
We develop a simple, iterative learning algorithm that finds an $\epsilon$-optimal $Q$-function with sample complexity $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank $r$ and the discount factor is below a certain threshold.
arXiv Detail & Related papers (2020-06-11T00:55:35Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Quantum Boosting [8.93449755281201]
Boosting is a technique that converts a weak and inaccurate machine learning algorithm into a strong accurate learning algorithm.
We show how quantum techniques can improve the time complexity of classical AdaBoost.
arXiv Detail & Related papers (2020-02-12T15:47:54Z)