Multi-step Estimation for Gradient-based Meta-learning
- URL: http://arxiv.org/abs/2006.04298v1
- Date: Mon, 8 Jun 2020 00:37:01 GMT
- Title: Multi-step Estimation for Gradient-based Meta-learning
- Authors: Jin-Hwa Kim, Junyoung Park, Yongseok Choi
- Abstract summary: We propose a simple yet effective method that reduces the cost by reusing the same gradient across a window of inner steps.
We show that our method significantly reduces training time and memory usage while maintaining competitive accuracy, even outperforming baselines in some cases.
- Score: 3.4376560669160385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-based meta-learning approaches have been successful in few-shot
learning, transfer learning, and a wide range of other domains. Despite their
efficacy and simplicity, the burden of calculating the Hessian matrix, with its
large memory footprint, is a critical challenge in large-scale applications.
To tackle this issue, we propose a simple yet effective method that reduces
the cost by reusing the same gradient across a window of inner steps. We describe
the dynamics of the multi-step estimation in the Lagrangian formalism and
discuss how to reduce the evaluation of second-order derivatives when estimating
the dynamics. To validate our method, we experiment on meta-transfer learning and
few-shot learning tasks under multiple settings. The meta-transfer experiment
highlights the applicability of training meta-networks, where other
approximations are limited. For few-shot learning, we evaluate time and memory
complexities against popular baselines. Our method significantly reduces
training time and memory usage while maintaining competitive accuracy, even
outperforming baselines in some cases.
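The core idea of the abstract can be illustrated with a minimal sketch (not the authors' implementation): on a toy quadratic inner loss, one gradient is reused for a window of `w` consecutive inner steps, so only ceil(steps / w) gradient evaluations occur. In the meta-learning setting, these are the expensive calls that would carry second-order terms, which is where the time and memory savings come from. All names and the toy loss below are illustrative assumptions.

```python
import numpy as np

# Toy quadratic inner loss L(theta) = 0.5 * ||A @ theta - b||^2.
# A counter tracks how many gradient evaluations each strategy needs.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])

def loss(theta):
    r = A @ theta - b
    return 0.5 * float(r @ r)

def inner_loop(theta, lr=0.1, steps=8, window=1):
    """Run `steps` inner updates, recomputing the gradient once per window.

    window=1 is ordinary gradient descent; window=w reuses one gradient for
    w consecutive steps, cutting gradient evaluations by roughly a factor of w.
    """
    theta = theta.copy()
    n_grad_evals = 0
    g = None
    for t in range(steps):
        if t % window == 0:            # refresh gradient at window boundaries
            g = A.T @ (A @ theta - b)  # the expensive evaluation
            n_grad_evals += 1
        theta = theta - lr * g         # reuse cached gradient within the window
    return theta, n_grad_evals

theta0 = np.zeros(2)
exact, n_exact = inner_loop(theta0, window=1)    # gradient at every step
approx, n_approx = inner_loop(theta0, window=4)  # one gradient per 4 steps
```

With `window=4`, the loop makes 2 gradient evaluations instead of 8, while both variants still decrease the inner loss; the trade-off is a coarser adaptation trajectory.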
Related papers
- Meta-Value Learning: a General Framework for Learning with Learning Awareness [1.4323566945483497]
We propose to judge joint policies by their long-term prospects as measured by the meta-value.
We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates.
arXiv Detail & Related papers (2023-07-17T21:40:57Z)
- Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning [38.37682598345653]
We introduce a multimodal meta-learning approach to bridge the gap between vision and language models.
We define a meta-mapper network, acting as a meta-learner, to efficiently bridge frozen large-scale vision and language models.
We evaluate our approach on recently proposed multimodal few-shot benchmarks, measuring how rapidly the model can bind novel visual concepts to words.
arXiv Detail & Related papers (2023-02-28T17:46:18Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Curriculum Meta-Learning for Few-shot Classification [1.5039745292757671]
We propose an adaptation of the curriculum training framework, applicable to state-of-the-art meta learning techniques for few-shot classification.
Our experiments with the MAML algorithm on two few-shot image classification tasks show significant gains with the curriculum training framework.
arXiv Detail & Related papers (2021-12-06T10:29:23Z)
- One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning [61.662504399411695]
We introduce a novel method mixing multiple inner steps that enjoys a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z)
- Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation.
We show our method is able to save two-thirds of the training time while maintaining comparable, or even achieving better, generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z)
- Large-Scale Meta-Learning with Continual Trajectory Shifting [76.29017270864308]
We show that allowing the meta-learners to take a larger number of inner gradient steps better captures the structure of heterogeneous and large-scale tasks.
In order to increase the frequency of meta-updates, we propose to estimate the required shift of the task-specific parameters.
We show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence.
arXiv Detail & Related papers (2021-02-14T18:36:33Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- La-MAML: Look-ahead Meta Learning for Continual Learning [14.405620521842621]
We propose Look-ahead MAML (La-MAML), a fast optimisation-based meta-learning algorithm for online-continual learning, aided by a small episodic memory.
La-MAML achieves performance superior to other replay-based, prior-based and meta-learning based approaches for continual learning on real-world visual classification benchmarks.
arXiv Detail & Related papers (2020-07-27T23:07:01Z)
- Regularizing Meta-Learning via Gradient Dropout [102.29924160341572]
Meta-learning models are prone to overfitting when there are not enough training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate the risk of overfitting for gradient-based meta-learning.
arXiv Detail & Related papers (2020-04-13T10:47:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.