Fine-Grained Distribution-Dependent Learning Curves
- URL: http://arxiv.org/abs/2208.14615v1
- Date: Wed, 31 Aug 2022 03:29:21 GMT
- Title: Fine-Grained Distribution-Dependent Learning Curves
- Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Jonathan Shafer, Ilya
Tolstikhin
- Abstract summary: Learning curves plot the expected error of a learning algorithm as a function of the number of labeled input samples.
In this paper we introduce a new combinatorial characterization called the VCL dimension that improves and refines the recent results of Bousquet et al. (2021).
Our characterization sheds new light on the structure of learning curves by providing fine-grained bounds.
- Score: 27.09513298165498
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning curves plot the expected error of a learning algorithm as a function
of the number of labeled input samples. They are widely used by machine
learning practitioners as a measure of an algorithm's performance, but classic
PAC learning theory cannot explain their behavior. In this paper we introduce a
new combinatorial characterization called the VCL dimension that improves and
refines the recent results of Bousquet et al. (2021). Our characterization
sheds new light on the structure of learning curves by providing fine-grained
bounds, and showing that for classes with finite VCL, the rate of decay can be
decomposed into a linear component that depends only on the hypothesis class
and an exponential component that depends also on the target distribution. In
particular, the finer nuance of the VCL dimension implies lower bounds that are
quantitatively stronger than the bounds of Bousquet et al. (2021) and
qualitatively stronger than classic 'no free lunch' lower bounds. The VCL
characterization solves an open problem studied by Antos and Lugosi (1998), who
asked in what cases such lower bounds exist. As a corollary, we recover their
lower bound for half-spaces in $\mathbb{R}^d$, and we do so in a principled way
that should be applicable to other cases as well. Finally, to provide another
viewpoint on our work and how it compares to traditional PAC learning bounds,
we also present an alternative formulation of our results in a language that is
closer to the PAC setting.
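To illustrate the shape of the decomposition described in the abstract, here is a hedged schematic (the symbols $c_{\mathcal{H}}$, $C(\mathcal{D})$, and $c(\mathcal{D})$ are illustrative placeholders, not the paper's actual constants or theorem): for a class $\mathcal{H}$ with finite VCL and a target distribution $\mathcal{D}$, a fine-grained learning curve of this kind behaves roughly as
$$\mathbb{E}\left[\mathrm{err}_{\mathcal{D}}(A_n)\right] \;\approx\; \frac{c_{\mathcal{H}}}{n} \;+\; C(\mathcal{D})\, e^{-c(\mathcal{D})\, n},$$
where $A_n$ is the learner's output on $n$ labeled samples, the linear term $c_{\mathcal{H}}/n$ depends only on the hypothesis class, and the exponential term $C(\mathcal{D})\, e^{-c(\mathcal{D})\, n}$ depends also on the target distribution.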
Related papers
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Few-Shot Class-Incremental Learning with Prior Knowledge [94.95569068211195]
We propose Learning with Prior Knowledge (LwPK) to enhance the generalization ability of the pre-trained model.
Experimental results indicate that LwPK effectively enhances the model resilience against catastrophic forgetting.
arXiv Detail & Related papers (2024-02-02T08:05:35Z)
- Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation [26.277745106128197]
We propose an algorithm to tackle long planning horizon problems in reinforcement learning with general function approximation.
The derived regret bound is sharp, as it matches the minimax lower bound when specialized to linear mixture MDPs up to logarithmic factors.
The achievement of such a horizon-free, instance-dependent, and sharp regret bound hinges upon (i) novel algorithm designs and (ii) fine-grained analyses.
arXiv Detail & Related papers (2023-12-07T17:35:34Z)
- Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z)
- Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z)
- ELM: Embedding and Logit Margins for Long-Tail Learning [70.19006872113862]
Long-tail learning is the problem of learning under skewed label distributions.
We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space.
The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.
arXiv Detail & Related papers (2022-04-27T21:53:50Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Sample-efficient L0-L2 constrained structure learning of sparse Ising models [3.056751497358646]
We consider the problem of learning the underlying graph of a sparse Ising model with $p$ nodes from $n$ i.i.d. samples.
We leverage the cardinality constraint given by the L0 norm, which is known to properly induce sparsity, and further combine it with an L2 norm to better model the non-zero coefficients.
arXiv Detail & Related papers (2020-12-03T07:52:20Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Batch Value-function Approximation with Only Realizability [17.692408242465763]
We make progress on a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory dataset.
Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger notion of exploratory data) via a tournament procedure.
We also discuss how BVFT can be applied to model selection among other extensions and open problems.
arXiv Detail & Related papers (2020-08-11T20:09:37Z)