Generalization error bounds for iterative learning algorithms with
bounded updates
- URL: http://arxiv.org/abs/2309.05077v3
- Date: Sat, 14 Oct 2023 15:13:14 GMT
- Title: Generalization error bounds for iterative learning algorithms with
bounded updates
- Authors: Jingwen Fu and Nanning Zheng
- Abstract summary: We introduce a novel bound for the generalization error of iterative learning algorithms with bounded updates.
We employ a variance decomposition technique to decompose information across iterations.
To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models.
- Score: 41.87646434714452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the generalization characteristics of iterative learning
algorithms with bounded updates for non-convex loss functions, employing
information-theoretic techniques. Our key contribution is a novel bound for the
generalization error of these algorithms with bounded updates. Our approach
introduces two main novelties: 1) we reformulate the mutual information as the
uncertainty of updates, providing a new perspective, and 2) instead of using
the chaining rule of mutual information, we employ a variance decomposition
technique to decompose information across iterations, allowing for a simpler
surrogate process. We analyze our generalization bound under various settings
and demonstrate improved bounds. To bridge the gap between theory and practice,
we also examine the previously observed scaling behavior in large language
models. Ultimately, our work takes a further step toward developing practical
generalization theories.
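
For context (this baseline is not stated in the abstract above, but it is the standard starting point that information-theoretic analyses of this kind refine): the classical bound of Xu and Raginsky (2017) says that if the loss is sigma-sub-Gaussian under the data distribution for every hypothesis, then

    \[
      \bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr|
        \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)},
    \]

where S = (Z_1, ..., Z_n) is the training sample of size n, W is the algorithm's output, gen(W, S) is the gap between population and empirical risk, and I(W; S) is their mutual information. The paper's contribution, per the abstract, is to control this dependence for iterates W_1, ..., W_T with bounded updates via a variance decomposition across iterations instead of the chaining rule; the exact form of its bound is given in the paper itself.

The sketch below is a minimal, illustrative example of the kind of algorithm the title refers to: an SGD-style loop whose per-step update norm is clipped to a constant, so that w_{t+1} = w_t + u_t with ||u_t|| bounded. The function name, learning rate, and clipping constant are hypothetical, not taken from the paper.

    import numpy as np

    def train_with_bounded_updates(grad_fn, w0, steps=100, lr=0.1, max_update_norm=0.05):
        """Run iterates w_{t+1} = w_t + u_t with ||u_t|| <= max_update_norm (bounded updates)."""
        w = np.asarray(w0, dtype=float)
        for _ in range(steps):
            u = -lr * grad_fn(w)              # proposed update, e.g. a (stochastic) gradient step
            norm = np.linalg.norm(u)
            if norm > max_update_norm:        # rescale so the update norm stays bounded
                u *= max_update_norm / norm
            w = w + u
        return w

    # Usage: minimize the quadratic loss 0.5 * ||w||^2, whose gradient is w.
    w_final = train_with_bounded_updates(lambda w: w, w0=np.ones(3))
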
Related papers
- An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
arXiv Detail & Related papers (2024-08-20T10:08:21Z)
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- Minimum Description Length and Generalization Guarantees for Representation Learning [16.2444595840653]
This paper presents a framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm.
Rather than the mutual information between the encoder's input and the representation, our new bounds involve the "multi-letter" relative entropy.
To the best of the authors' knowledge, the established generalization bounds are the first of their kind for Information Bottleneck (IB) type encoders and representation learning.
arXiv Detail & Related papers (2024-02-05T18:12:28Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection with other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Information-theoretic generalization bounds for black-box learning algorithms [46.44597430985965]
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm.
We show experimentally that the proposed bounds closely follow the generalization gap in practical deep-learning scenarios.
arXiv Detail & Related papers (2021-10-04T17:28:41Z)
- Information Complexity and Generalization Bounds [0.0]
We show a unifying picture of PAC-Bayesian and mutual-information-based upper bounds on the generalization error of randomized learning algorithms.
We discuss two practical examples for learning with neural networks, namely Entropy-SGD and PAC-Bayes SGD.
arXiv Detail & Related papers (2021-05-04T20:37:57Z)
- Information-Theoretic Bounds on the Moments of the Generalization Error of Learning Algorithms [19.186110989897738]
Generalization error bounds are critical to understanding the performance of machine learning models.
We offer a more refined analysis of the generalization behaviour of machine learning models, based on a characterization of bounds on their generalization error moments.
arXiv Detail & Related papers (2021-02-03T11:38:00Z)
- Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information-theoretic concepts, namely mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
arXiv Detail & Related papers (2020-09-07T16:47:30Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)