Generalization error bounds for iterative learning algorithms with
bounded updates
- URL: http://arxiv.org/abs/2309.05077v3
- Date: Sat, 14 Oct 2023 15:13:14 GMT
- Title: Generalization error bounds for iterative learning algorithms with
bounded updates
- Authors: Jingwen Fu and Nanning Zheng
- Abstract summary: We introduce a novel bound for the generalization error of iterative learning algorithms with bounded updates.
We employ a variance decomposition technique to decompose information across iterations.
To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models.
- Score: 41.87646434714452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the generalization characteristics of iterative learning
algorithms with bounded updates for non-convex loss functions, employing
information-theoretic techniques. Our key contribution is a novel bound for the
generalization error of these algorithms with bounded updates. Our approach
introduces two main novelties: 1) we reformulate the mutual information as the
uncertainty of updates, providing a new perspective, and 2) instead of using
the chaining rule of mutual information, we employ a variance decomposition
technique to decompose information across iterations, allowing for a simpler
surrogate process. We analyze our generalization bound under various settings
and demonstrate improved bounds. To bridge the gap between theory and practice,
we also examine the previously observed scaling behavior in large language
models. Ultimately, our work takes a further step toward developing practical
generalization theories.
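
For context (this baseline is not stated in the abstract above, but it is the standard starting point that information-theoretic analyses of this kind refine): the classical bound of Xu and Raginsky (2017) says that if the loss is sigma-sub-Gaussian under the data distribution for every hypothesis, then

    \[
      \bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr|
        \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)},
    \]

where S = (Z_1, ..., Z_n) is the training sample of size n, W is the algorithm's output, gen(W, S) is the gap between population and empirical risk, and I(W; S) is their mutual information. The paper's contribution, per the abstract, is to control this dependence for iterates W_1, ..., W_T with bounded updates via a variance decomposition across iterations instead of the chaining rule; the exact form of its bound is given in the paper itself.

The sketch below is a minimal, illustrative example of the kind of algorithm the title refers to: an SGD-style loop whose per-step update norm is clipped to a constant, so that w_{t+1} = w_t + u_t with ||u_t|| bounded. The function name, learning rate, and clipping constant are hypothetical, not taken from the paper.

    import numpy as np

    def train_with_bounded_updates(grad_fn, w0, steps=100, lr=0.1, max_update_norm=0.05):
        """Run iterates w_{t+1} = w_t + u_t with ||u_t|| <= max_update_norm (bounded updates)."""
        w = np.asarray(w0, dtype=float)
        for _ in range(steps):
            u = -lr * grad_fn(w)              # proposed update, e.g. a (stochastic) gradient step
            norm = np.linalg.norm(u)
            if norm > max_update_norm:        # rescale so the update norm stays bounded
                u *= max_update_norm / norm
            w = w + u
        return w

    # Usage: minimize the quadratic loss 0.5 * ||w||^2, whose gradient is w.
    w_final = train_with_bounded_updates(lambda w: w, w0=np.ones(3))
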
Related papers
- An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
arXiv Detail & Related papers (2024-08-20T10:08:21Z)
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- Minimum Description Length and Generalization Guarantees for Representation Learning [16.2444595840653]
This paper presents a framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm.
Rather than the mutual information between the encoder's input and the representation, our new bounds involve the "multi-letter" relative entropy.
To the best of the authors' knowledge, the established generalization bounds are the first of their kind for Information Bottleneck (IB) type encoders and representation learning.
arXiv Detail & Related papers (2024-02-05T18:12:28Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection with other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Information-theoretic generalization bounds for black-box learning algorithms [46.44597430985965]
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm.
We show experimentally that the proposed bounds closely follow the generalization gap in practical deep-learning scenarios.
arXiv Detail & Related papers (2021-10-04T17:28:41Z)
- Information Complexity and Generalization Bounds [0.0]
We show a unifying picture of PAC-Bayesian and mutual-information-based upper bounds on the generalization error of randomized learning algorithms.
We discuss two practical examples for learning with neural networks, namely Entropy-SGD and PAC-Bayes SGD.
arXiv Detail & Related papers (2021-05-04T20:37:57Z)
- Information-Theoretic Bounds on the Moments of the Generalization Error of Learning Algorithms [19.186110989897738]
Generalization error bounds are critical to understanding the performance of machine learning models.
We offer a more refined analysis of the generalization behaviour of machine learning models, based on a characterization of bounds on their generalization error moments.
arXiv Detail & Related papers (2021-02-03T11:38:00Z)
- Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information-theoretic concepts, namely mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
arXiv Detail & Related papers (2020-09-07T16:47:30Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)