Generalization Bounds For Meta-Learning: An Information-Theoretic
Analysis
- URL: http://arxiv.org/abs/2109.14595v1
- Date: Wed, 29 Sep 2021 17:45:54 GMT
- Title: Generalization Bounds For Meta-Learning: An Information-Theoretic
Analysis
- Authors: Qi Chen, Changjian Shui, Mario Marchand
- Abstract summary: We propose a generic understanding of both the conventional learning-to-learn framework and the modern model-agnostic meta-learning algorithms.
We provide a data-dependent generalization bound for a variant of MAML, which is non-vacuous for deep few-shot learning.
- Score: 8.028776552383365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We derive a novel information-theoretic analysis of the generalization
property of meta-learning algorithms. Concretely, our analysis proposes a
generic understanding of both the conventional learning-to-learn framework and
the modern model-agnostic meta-learning (MAML) algorithms. Moreover, we provide
a data-dependent generalization bound for a stochastic variant of MAML, which
is non-vacuous for deep few-shot learning. As compared to previous bounds that
depend on the square norm of gradients, empirical validations on both simulated
data and a well-known few-shot benchmark show that our bound is orders of
magnitude tighter in most situations.
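For context, information-theoretic analyses of this kind build on the classical single-task bound of Xu and Raginsky (2017), stated here in our own notation as background rather than as the paper's result: if the loss is $\sigma$-sub-Gaussian for every hypothesis, an algorithm with output $W$ trained on a sample $S$ of $n$ i.i.d. points satisfies
$$\left|\mathbb{E}\left[L_\mu(W) - L_S(W)\right]\right| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W;S)},$$
where $L_\mu$ and $L_S$ denote the population and empirical risks. The meta-learning analysis extends mutual-information terms of this kind to both the environment (meta) level and the per-task level.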
Related papers
- Information Theoretic Lower Bounds for Information Theoretic Upper
Bounds [14.268363583731848]
We examine the relationship between the mutual information between the output model and the empirical sample, and the generalization of the algorithm, in the context of convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short of capturing the capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
In contrast to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection with other notions such as classical leave-one-out cross-validation (sketched below).
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
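As a concrete illustration of that leave-one-out connection, here is a minimal Python sketch (ours, not the paper's code; the 1-nearest-neighbour learner and synthetic data are arbitrary choices) of the classical leave-one-out estimate of a generalization gap:

```python
# Illustrative sketch only: classical leave-one-out cross-validation,
# the notion loo-CMI bounds are interpreted against. Not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

def nn_predict(X_train, y_train, x):
    """Predict with the label of the nearest training point (1-NN)."""
    i = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return y_train[i]

# 1-NN has zero training error, so its leave-one-out error directly
# estimates the generalization gap that CMI-style bounds control.
loo_errors = [
    nn_predict(X[np.arange(n) != i], y[np.arange(n) != i], X[i]) != y[i]
    for i in range(n)
]
print("LOO estimate of the generalization gap:", np.mean(loo_errors))
```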
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which capture how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
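For readers unfamiliar with the algorithm under analysis, here is a minimal first-order MAML sketch in Python (ours; the stepsizes, task distribution, and first-order approximation are illustrative assumptions, not the paper's setup):

```python
# Illustrative first-order MAML sketch on synthetic linear-regression
# tasks. All hyperparameters are assumed values, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
d, alpha, beta = 3, 0.1, 0.01  # dimension, inner and outer stepsizes (assumed)
theta = np.zeros(d)            # meta-parameters

def loss_grad(theta, X, y):
    """Gradient of the mean-squared-error loss for linear regression."""
    return 2 * X.T @ (X @ theta - y) / len(y)

for step in range(500):
    # Sample a task: a random target parameter and a small support set.
    w_task = rng.normal(size=d)
    X_s = rng.normal(size=(10, d))
    y_s = X_s @ w_task
    # Inner adaptation: one SGD step on the task's support set.
    theta_adapted = theta - alpha * loss_grad(theta, X_s, y_s)
    # Outer meta-update (first-order approximation of the MAML gradient):
    # gradient of the adapted loss on a fresh query set.
    X_q = rng.normal(size=(10, d))
    y_q = X_q @ w_task
    theta = theta - beta * loss_grad(theta_adapted, X_q, y_q)
```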
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over the objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
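In one standard formalization of single-policy MORL with linear preferences (our notation, not necessarily the paper's exact setting), a preference vector $w \in \mathbb{R}^m$ over $m$ reward components scalarizes the vector-valued value $V^\pi \in \mathbb{R}^m$, and the learner seeks
$$\pi^\star(w) \in \arg\max_\pi \; w^\top V^\pi.$$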
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
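The information-bottleneck objective referred to here is, in its generic Lagrangian form,
$$\min_{p(z \mid x)} \; I(X;Z) - \beta\, I(Z;Y),$$
which compresses the input $X$ into a representation $Z$ while preserving information about the target $Y$, with $\beta > 0$ controlling the trade-off. The paper's contribution is a variational, tractable surrogate for such mutual-information terms in the meta-learning setting; the display above is the generic objective, not their exact formulation.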
arXiv Detail & Related papers (2020-09-07T16:47:30Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm for semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
- Information-theoretic limits of a multiview low-rank symmetric spiked matrix model [19.738567726658875]
We consider a generalization of an important class of high-dimensional inference problems, namely spiked symmetric matrix models.
We rigorously establish the information-theoretic limits through the proof of single-letter formulas.
We improve the recently introduced adaptive interpolation method so that it can be used to study low-rank models.
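For background, the canonical single-view, rank-one instance of this model class is the spiked Wigner model (standard notation, not the paper's exact multiview setting):
$$Y = \sqrt{\frac{\lambda}{N}}\; x x^\top + W,$$
where $x \in \mathbb{R}^N$ is the low-rank signal, $W$ is a symmetric Gaussian noise matrix, and $\lambda$ is the signal-to-noise ratio; the multiview low-rank model couples several such observations of a shared signal.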
arXiv Detail & Related papers (2020-05-16T15:31:07Z)
- Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning [63.64636047748605]
We develop a new theoretical framework to provide a convergence guarantee for the general multi-step MAML algorithm.
In particular, our results suggest that the inner-stage stepsize needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence.
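Concretely, with inner-loop updates $\theta_{j+1} = \theta_j - \alpha \nabla \mathcal{L}_i(\theta_j)$ for $j = 0, \dots, N-1$ within each task $i$, the condition says the inner stepsize should scale as $\alpha = \Theta(1/N)$ for the outer meta-updates to converge (our paraphrase of the summary above, not the paper's exact statement).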
arXiv Detail & Related papers (2020-02-18T19:17:54Z)
- Reasoning About Generalization via Conditional Mutual Information [26.011933885798506]
We use Conditional Mutual Information (CMI) to quantify how well the input can be recognized given the output of the learning algorithm.
We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
We then show that bounded CMI implies various forms of generalization.
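In this framework, CMI is defined through a supersample: draw $\tilde{Z} \in \mathcal{Z}^{n \times 2}$ of $2n$ i.i.d. points and selectors $U \sim \mathrm{Unif}(\{0,1\}^n)$ choosing one point from each pair as the training set $\tilde{Z}_U$, so that
$$\mathrm{CMI}_{\mathcal{D}}(A) = I\big(A(\tilde{Z}_U);\, U \mid \tilde{Z}\big) \;\le\; n \log 2,$$
which is always finite; for losses bounded in $[0,1]$, the expected generalization gap is then bounded by roughly $\sqrt{2\,\mathrm{CMI}_{\mathcal{D}}(A)/n}$ (see the paper for the exact constants).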
arXiv Detail & Related papers (2020-01-24T18:13:04Z)