The Cost of Parallelizing Boosting
- URL: http://arxiv.org/abs/2402.15145v1
- Date: Fri, 23 Feb 2024 07:03:52 GMT
- Title: The Cost of Parallelizing Boosting
- Authors: Xin Lyu, Hongxun Wu, Junzhao Yang
- Abstract summary: We study the cost of parallelizing weak-to-strong boosting algorithms for learning.
We show that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training.
- Score: 1.9235143628887907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the cost of parallelizing weak-to-strong boosting algorithms for
learning, following the recent work of Karbasi and Larsen. Our main results are
two-fold:
- First, we prove a tight lower bound, showing that even "slight"
parallelization of boosting requires an exponential blow-up in the complexity
of training.
Specifically, let $\gamma$ be the weak learner's advantage over random
guessing. The famous \textsc{AdaBoost} algorithm produces an accurate
hypothesis by interacting with the weak learner for $\tilde{O}(1 / \gamma^2)$
rounds where each round runs in polynomial time.
Karbasi and Larsen showed that "significant" parallelization must incur
exponential blow-up: Any boosting algorithm either interacts with the weak
learner for $\Omega(1 / \gamma)$ rounds or incurs an $\exp(d / \gamma)$ blow-up
in the complexity of training, where $d$ is the VC dimension of the hypothesis
class. We close the gap by showing that any boosting algorithm either has
$\Omega(1 / \gamma^2)$ rounds of interaction or incurs a smaller exponential
blow-up of $\exp(d)$.
- Complementing our lower bound, we show that there exists a boosting
algorithm using $\tilde{O}(1/(t \gamma^2))$ rounds that suffers a blow-up
of only $\exp(d \cdot t^2)$.
Plugging in $t = \omega(1)$, this shows that the smaller blow-up in our lower
bound is tight. More interestingly, this provides the first trade-off between
the parallelism and the total work required for boosting.
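For concreteness, here is a minimal Python sketch (not taken from the paper) of the sequential interaction pattern the lower bound concerns: the booster queries a weak learner once per round, re-weights the sample, and runs for $\tilde{O}(1/\gamma^2)$ rounds, as in AdaBoost. The `weak_learner` callback and its signature are assumptions for illustration.

```python
import math
import numpy as np

def boost(X, y, weak_learner, gamma, n_rounds=None):
    """Sequential AdaBoost-style loop: one weak-learner call per round.

    X: (n, d) feature array; y: (n,) numpy array of labels in {-1, +1}.
    weak_learner(X, y, weights) -> h, where h(X) returns {-1, +1} predictions
    with advantage gamma over random guessing under `weights` (assumed).
    """
    n = len(y)
    if n_rounds is None:
        # ~O(log(n) / gamma^2) rounds suffice to drive training error below 1/n.
        n_rounds = math.ceil(math.log(n) / gamma**2)
    weights = np.full(n, 1.0 / n)
    ensemble = []  # list of (vote weight, hypothesis) pairs
    for _ in range(n_rounds):
        h = weak_learner(X, y, weights)          # one round of interaction
        preds = h(X)
        err = weights[preds != y].sum()          # weighted error of h
        err = min(max(err, 1e-12), 1 - 1e-12)    # numerical guard
        alpha = 0.5 * math.log((1 - err) / err)  # AdaBoost's vote weight
        ensemble.append((alpha, h))
        # Re-weight: up-weight mistakes, down-weight correct examples.
        weights *= np.exp(-alpha * y * preds)
        weights /= weights.sum()
    def strong_hypothesis(X_new):
        votes = sum(a * h(X_new) for a, h in ensemble)
        return np.sign(votes)
    return strong_hypothesis
```

Each iteration depends on the weights produced by the previous one, which is exactly the sequential bottleneck at issue: the paper's lower bound says this chain of $\tilde{O}(1/\gamma^2)$ dependent rounds cannot be shortened by more than a $t$ factor without paying $\exp(d \cdot t^2)$ in total work.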
Related papers
- The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem [53.446980306786095]
Smooth boosters generate distributions that do not place too much weight on any given example.
Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility, and quantum learning theory.
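For context, the standard formalization of smoothness (a textbook definition, not quoted from this abstract) bounds how far the booster's distribution can deviate from uniform:

```latex
% A distribution \mu over a sample of n examples is \kappa-smooth if it
% places at most \kappa times the uniform weight on any single example.
\mu \text{ is } \kappa\text{-smooth}
  \quad\iff\quad
  \max_{1 \le i \le n} \mu(i) \le \frac{\kappa}{n}
```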
arXiv Detail & Related papers (2024-09-17T23:09:25Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \le C \cdot \mathrm{OPT} + \epsilon$ with high probability.
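Here $F$ is presumably the population squared loss, the standard objective for this problem (an assumption worth checking against the paper itself):

```latex
% Population squared loss of a candidate w; OPT denotes \min_w F(w).
% The learner seeks a constant-factor approximation to OPT, up to
% an additive epsilon (assumed form of the guarantee).
F(\mathbf{w}) = \mathbb{E}_{(\mathbf{x},y) \sim \mathcal{D}}
  \big[ (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 \big],
\qquad
F(\widehat{\mathbf{w}}) \le C \cdot \mathrm{OPT} + \epsilon
```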
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
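As a reminder, the Massart (bounded) noise model is standardly defined as follows (a textbook definition, not quoted from this abstract):

```latex
% Massart noise at level \eta < 1/2: the label of each point x is flipped
% with an instance-dependent probability \eta(x), uniformly bounded by \eta.
\Pr\big[\, y \neq \mathrm{sign}(\mathbf{w}^* \cdot \mathbf{x}) \;\big|\; \mathbf{x} \,\big]
  = \eta(\mathbf{x}) \le \eta < \tfrac{1}{2}
```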
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Quantum learning algorithms imply circuit lower bounds [7.970954821067043]
We establish the first general connection between the design of quantum algorithms and circuit lower bounds.
Our proof builds on several works in learning theory, pseudorandomness, and computational complexity.
arXiv Detail & Related papers (2020-12-03T14:03:20Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision.
We propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z) - Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation [30.137884459159107]
We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces.
We develop a simple, iterative learning algorithm that finds an $\epsilon$-optimal $Q$-function with sample complexity $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank $r$ and the discount factor is below a certain threshold.
arXiv Detail & Related papers (2020-06-11T00:55:35Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Quantum Boosting [8.93449755281201]
Boosting is a technique that converts a weak and inaccurate machine learning algorithm into a strong accurate learning algorithm.
We show how quantum techniques can improve the time complexity of classical AdaBoost.
arXiv Detail & Related papers (2020-02-12T15:47:54Z)