EMO: Episodic Memory Optimization for Few-Shot Meta-Learning
- URL: http://arxiv.org/abs/2306.05189v3
- Date: Mon, 26 Jun 2023 18:36:32 GMT
- Title: EMO: Episodic Memory Optimization for Few-Shot Meta-Learning
- Authors: Yingjun Du, Jiayi Shen, Xiantong Zhen, Cees G.M. Snoek
- Abstract summary: Episodic memory optimization for meta-learning, which we call EMO, is inspired by the human ability to recall past learning experiences from the brain's memory.
EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative.
EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods.
- Score: 69.50380510879697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot meta-learning presents a challenge for gradient descent optimization
due to the limited number of training samples per task. To address this issue,
we propose an episodic memory optimization for meta-learning, called EMO,
which is inspired by the human ability to recall past learning experiences from
the brain's memory. EMO retains the gradient history of past experienced tasks
in external memory, enabling few-shot learning in a memory-augmented way. By
learning to retain and recall the learning process of past training tasks, EMO
nudges parameter updates in the right direction, even when the gradients
provided by a limited number of examples are uninformative. We prove
theoretically that our algorithm converges for smooth, strongly convex
objectives. EMO is generic, flexible, and model-agnostic, making it a simple
plug-and-play optimizer that can be seamlessly embedded into existing
optimization-based few-shot meta-learning approaches. Empirical results show
that EMO scales well with most few-shot classification benchmarks and improves
the performance of optimization-based meta-learning methods, resulting in
accelerated convergence.
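The abstract describes EMO as retaining the gradient history of previously seen tasks in an external memory and using that history to steer parameter updates when the few-shot gradient of the current task is noisy. The sketch below is only a minimal illustration of that idea: the FIFO buffer, the cosine-similarity recall, and the mixing weight alpha are assumptions made for the example, not the update rule from the paper.

```python
import numpy as np

class EpisodicGradientMemory:
    """External buffer of past-task gradients (FIFO eviction is an assumption)."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.grads = []

    def store(self, grad):
        if len(self.grads) >= self.capacity:
            self.grads.pop(0)                 # drop the oldest stored gradient
        self.grads.append(grad.copy())

    def recall(self, grad):
        """Blend stored gradients, weighted by cosine similarity to the current one."""
        if not self.grads:
            return np.zeros_like(grad)
        mem = np.stack(self.grads)                                  # (M, d)
        sims = mem @ grad / (np.linalg.norm(mem, axis=1)
                             * np.linalg.norm(grad) + 1e-12)
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()                                    # softmax over memory slots
        return weights @ mem                                        # convex combination, shape (d,)


def memory_augmented_step(params, grad, memory, lr=0.01, alpha=0.5):
    """One update that mixes the noisy few-shot gradient with recalled history."""
    direction = (1.0 - alpha) * grad + alpha * memory.recall(grad)
    memory.store(grad)
    return params - lr * direction
```

Because such a step only modifies the gradient before it is applied, it can in principle wrap the inner loop of existing optimization-based meta-learners, which is consistent with the abstract's description of EMO as a plug-and-play optimizer.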
Related papers
- Fine-Grained Gradient Restriction: A Simple Approach for Mitigating Catastrophic Forgetting [41.891312602770746]
Gradient Episodic Memory (GEM) achieves balance by utilizing a subset of past training samples to restrict the update direction of the model parameters (a simplified projection sketch follows this list).
We show that memory strength is effective mainly because it improves GEM's generalization ability and therefore leads to a more favorable trade-off.
arXiv Detail & Related papers (2024-10-01T17:03:56Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
- Bootstrapped Meta-Learning [48.017607959109924]
We propose an algorithm that tackles a challenging meta-optimisation problem by letting the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric.
We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities.
arXiv Detail & Related papers (2021-09-09T18:29:05Z)
- Memory Augmented Optimizers for Deep Learning [10.541705775336657]
We propose a framework of memory-augmented gradient descents that retain a limited view of their gradient history in their internal memory.
We show that the proposed class of gradient descents with fixed-size memory converge under assumptions of strong convexity.
arXiv Detail & Related papers (2021-06-20T14:58:08Z)
- La-MAML: Look-ahead Meta Learning for Continual Learning [14.405620521842621]
We propose Look-ahead MAML (La-MAML), a fast optimisation-based meta-learning algorithm for online continual learning, aided by a small episodic memory.
La-MAML achieves performance superior to other replay-based, prior-based and meta-learning based approaches for continual learning on real-world visual classification benchmarks.
arXiv Detail & Related papers (2020-07-27T23:07:01Z)
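As referenced in the Fine-Grained Gradient Restriction entry above, GEM-type methods restrict the update direction using gradients computed on stored past samples. The sketch below shows the simplified single-constraint projection popularized by A-GEM, a close relative of GEM; full GEM instead solves a quadratic program with one constraint per past task. The name grad_ref and the epsilon term are illustrative choices.

```python
import numpy as np

def project_gradient(grad, grad_ref):
    """Project `grad` so it does not conflict with the memory gradient `grad_ref`."""
    dot = grad @ grad_ref
    if dot >= 0.0:                    # no interference with past tasks: keep the gradient
        return grad
    # Remove the conflicting component so the update cannot increase the memory loss.
    return grad - (dot / (grad_ref @ grad_ref + 1e-12)) * grad_ref
```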