Boosting Simple Learners
- URL: http://arxiv.org/abs/2001.11704v8
- Date: Fri, 16 Jun 2023 14:06:37 GMT
- Title: Boosting Simple Learners
- Authors: Noga Alon and Alon Gonen and Elad Hazan and Shay Moran
- Abstract summary: We focus on two main questions: (i) Oracle Complexity: How many weak hypotheses are needed to produce an accurate hypothesis? (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class?
We design a novel boosting algorithm that circumvents a classical lower bound by Freund and Schapire ('95, '12).
As a corollary, we provide an affirmative answer to the second question for well-studied classes, including half-spaces and decision stumps.
- Score: 45.09968166110557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Boosting is a celebrated machine learning approach which is based on the idea
of combining weak and moderately inaccurate hypotheses into a strong and accurate
one. We study boosting under the assumption that the weak hypotheses belong to
a class of bounded capacity. This assumption is inspired by the common
convention that weak hypotheses are "rules-of-thumb" from an "easy-to-learn
class" (Schapire and Freund '12, Shalev-Shwartz and Ben-David '14). Formally,
we assume the class of weak hypotheses has a bounded VC dimension. We focus on
two main questions: (i) Oracle Complexity: How many weak hypotheses are needed
to produce an accurate hypothesis? We design a novel boosting algorithm and
demonstrate that it circumvents a classical lower bound by Freund and Schapire
('95, '12). Whereas the lower bound shows that $\Omega(1/\gamma^2)$ weak
hypotheses with $\gamma$-margin are sometimes necessary, our new method
requires only $\tilde{O}(1/\gamma)$ weak hypotheses, provided that they
belong to a class of bounded VC dimension. Unlike previous boosting algorithms
which aggregate the weak hypotheses by majority votes, the new boosting
algorithm uses more complex ("deeper") aggregation rules. We complement this
result by showing that complex aggregation rules are in fact necessary to
circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can
be learned by boosting weak hypotheses from a bounded VC class? Can complex
concepts that are "far away" from the class be learned? Towards answering the
first question we introduce combinatorial-geometric parameters which capture
expressivity in boosting. As a corollary we provide an affirmative answer to
the second question for well-studied classes, including half-spaces and
decision stumps. Along the way, we establish and exploit connections with
Discrepancy Theory.
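The paper's deeper-aggregation algorithm is not reproduced here; as a point of reference, below is a minimal sketch of classic AdaBoost over decision stumps (a weak class of bounded VC dimension). Its weighted-majority-vote aggregation is precisely the kind of "shallow" rule subject to the $\Omega(1/\gamma^2)$ lower bound that the paper circumvents. All function names and the toy data are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: classic AdaBoost with decision stumps, i.e. the
# majority-vote style of boosting the paper contrasts with its own method.
import numpy as np

def stump_predict(X, feature, threshold, sign):
    # A decision stump: h(x) = sign if x[feature] <= threshold, else -sign.
    # Stumps are an "easy-to-learn" class of small VC dimension.
    return sign * np.where(X[:, feature] <= threshold, 1.0, -1.0)

def best_stump(X, y, w):
    # Weak-learner oracle: exhaustively pick the stump with the smallest
    # weighted error under the current distribution w over examples.
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for sign in (1.0, -1.0):
                pred = stump_predict(X, feature, threshold, sign)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (feature, threshold, sign), err
    return best, best_err

def adaboost(X, y, rounds):
    # Maintain a distribution over examples; each round, call the weak
    # learner and up-weight the examples it misclassified.
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        (f, t, s), err = best_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)      # guard against 0/1 error
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight of this stump
        pred = stump_predict(X, f, t, s)
        w *= np.exp(-alpha * y * pred)             # reweight toward mistakes
        w /= w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def majority_vote(ensemble, X):
    # The "shallow" aggregation rule: a weighted majority vote of stumps.
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)

# Toy usage on a linearly separable 1-D sample (illustrative data).
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([1.0, 1.0, -1.0, -1.0])
ensemble = adaboost(X, y, rounds=5)
print(majority_vote(ensemble, X))  # -> [ 1.  1. -1. -1.]
```

Under a $\gamma$-margin assumption, this vote-based scheme may need on the order of $1/\gamma^2$ rounds; the paper's contribution is that replacing the vote with more complex ("deeper") aggregation rules reduces the oracle complexity to $\tilde{O}(1/\gamma)$ when the weak class has bounded VC dimension.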
Related papers
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Local Borsuk-Ulam, Stability, and Replicability [16.6959756920423]
We show that both list-replicable and globally stable learning are impossible in the PAC setting.
Specifically, we establish optimal bounds for list replicability and global stability numbers in finite classes.
arXiv Detail & Related papers (2023-11-02T21:10:16Z)
- Multiclass Boosting: Simple and Intuitive Weak Learning Criteria [72.71096438538254]
We give a simple and efficient boosting algorithm, that does not require realizability assumptions.
We present a new result on boosting for list learners, as well as provide a novel proof for the characterization of multiclass PAC learning.
arXiv Detail & Related papers (2023-07-02T19:26:58Z)
- Lower Bounds for Learning in Revealing POMDPs [88.23337313766355]
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting.
For multi-step revealing POMDPs, we show that the latent state-space dependence is at least $\Omega(S^{1.5})$ in the sample complexity.
arXiv Detail & Related papers (2023-02-02T18:59:30Z)
- The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest [0.0]
One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise.
We show that compression is neither necessary nor sufficient to maximise performance.
We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind's Apperception Engine is able to generalise effectively.
arXiv Detail & Related papers (2023-01-30T15:29:40Z)
- Comparative Learning: A Sample Complexity Theory for Two Hypothesis Classes [5.194264506657145]
We introduce comparative learning as a combination of the realizable and agnostic settings in PAC learning.
Even when both $S$ and $B$ have infinite VC dimensions, comparative learning can still have a small sample complexity.
We show that the sample complexity of comparative learning is characterized by the mutual VC dimension $\mathsf{VC}(S,B)$.
arXiv Detail & Related papers (2022-11-16T18:38:24Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Sample-efficient proper PAC learning with approximate differential privacy [51.09425023771246]
We prove that the sample complexity of properly learning a class of Littlestone dimension $d$ with approximate differential privacy is $\tilde{O}(d^6)$, ignoring privacy and accuracy parameters.
This result answers a question of Bun et al. (FOCS 2020) by improving upon their upper bound of $2^{O(d)}$ on the sample complexity.
arXiv Detail & Related papers (2020-12-07T18:17:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.