Fine-Grained Distribution-Dependent Learning Curves
- URL: http://arxiv.org/abs/2208.14615v1
- Date: Wed, 31 Aug 2022 03:29:21 GMT
- Title: Fine-Grained Distribution-Dependent Learning Curves
- Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Jonathan Shafer, Ilya
Tolstikhin
- Abstract summary: Learning curves plot the expected error of a learning algorithm as a function of the number of labeled input samples.
In this paper we introduce a new combinatorial characterization called the VCL dimension that improves and refines the recent results of Bousquet et al. (2021).
Our characterization sheds new light on the structure of learning curves by providing fine-grained bounds.
- Score: 27.09513298165498
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning curves plot the expected error of a learning algorithm as a function
of the number of labeled input samples. They are widely used by machine
learning practitioners as a measure of an algorithm's performance, but classic
PAC learning theory cannot explain their behavior. In this paper we introduce a
new combinatorial characterization called the VCL dimension that improves and
refines the recent results of Bousquet et al. (2021). Our characterization
sheds new light on the structure of learning curves by providing fine-grained
bounds, and showing that for classes with finite VCL, the rate of decay can be
decomposed into a linear component that depends only on the hypothesis class
and an exponential component that depends also on the target distribution. In
particular, the finer nuance of the VCL dimension implies lower bounds that are
quantitatively stronger than the bounds of Bousquet et al. (2021) and
qualitatively stronger than classic 'no free lunch' lower bounds. The VCL
characterization solves an open problem studied by Antos and Lugosi (1998), who
asked in what cases such lower bounds exist. As a corollary, we recover their
lower bound for half-spaces in $\mathbb{R}^d$, and we do so in a principled way
that should be applicable to other cases as well. Finally, to provide another
viewpoint on our work and how it compares to traditional PAC learning bounds,
we also present an alternative formulation of our results in a language that is
closer to the PAC setting.
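To illustrate the shape of the decomposition described in the abstract, here is a hedged schematic (the symbols $c_{\mathcal{H}}$, $C(\mathcal{D})$, and $c(\mathcal{D})$ are illustrative placeholders, not the paper's actual constants or theorem): for a class $\mathcal{H}$ with finite VCL and a target distribution $\mathcal{D}$, a fine-grained learning curve of this kind behaves roughly as
$$\mathbb{E}\left[\mathrm{err}_{\mathcal{D}}(A_n)\right] \;\approx\; \frac{c_{\mathcal{H}}}{n} \;+\; C(\mathcal{D})\, e^{-c(\mathcal{D})\, n},$$
where $A_n$ is the learner's output on $n$ labeled samples, the linear term $c_{\mathcal{H}}/n$ depends only on the hypothesis class, and the exponential term $C(\mathcal{D})\, e^{-c(\mathcal{D})\, n}$ depends also on the target distribution.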
Related papers
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Few-Shot Class-Incremental Learning with Prior Knowledge [94.95569068211195]
We propose Learning with Prior Knowledge (LwPK) to enhance the generalization ability of the pre-trained model.
Experimental results indicate that LwPK effectively enhances the model resilience against catastrophic forgetting.
arXiv Detail & Related papers (2024-02-02T08:05:35Z)
- Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation [26.277745106128197]
We propose an algorithm to tackle long planning horizon problems in reinforcement learning with general function approximation.
The derived regret bound is sharp, as it matches the minimax lower bound when specialized to linear mixture MDPs up to logarithmic factors.
The achievement of such a horizon-free, instance-dependent, and sharp regret bound hinges upon (i) novel algorithm designs and (ii) fine-grained analyses.
arXiv Detail & Related papers (2023-12-07T17:35:34Z)
- Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z)
- Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z)
- ELM: Embedding and Logit Margins for Long-Tail Learning [70.19006872113862]
Long-tail learning is the problem of learning under skewed label distributions.
We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space.
The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.
arXiv Detail & Related papers (2022-04-27T21:53:50Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Sample-efficient L0-L2 constrained structure learning of sparse Ising models [3.056751497358646]
We consider the problem of learning the underlying graph of a sparse Ising model with $p$ nodes from $n$ i.i.d. samples.
We leverage the cardinality constraint given by the L0 norm, which is known to properly induce sparsity, and further combine it with an L2 norm to better model the non-zero coefficients.
arXiv Detail & Related papers (2020-12-03T07:52:20Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Batch Value-function Approximation with Only Realizability [17.692408242465763]
We make progress on a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory dataset.
Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger notion of exploratory data) via a tournament procedure.
We also discuss how BVFT can be applied to model selection among other extensions and open problems.
arXiv Detail & Related papers (2020-08-11T20:09:37Z)