Regularizing Meta-Learning via Gradient Dropout
- URL: http://arxiv.org/abs/2004.05859v1
- Date: Mon, 13 Apr 2020 10:47:02 GMT
- Title: Regularizing Meta-Learning via Gradient Dropout
- Authors: Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin,
Ming-Hsuan Yang
- Abstract summary: Meta-learning models are prone to overfitting when there are too few training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate the risk of overfitting for gradient-based meta-learning.
- Score: 102.29924160341572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing attention on learning-to-learn new tasks using only a few
examples, meta-learning has been widely used in numerous problems such as
few-shot classification, reinforcement learning, and domain generalization.
However, meta-learning models are prone to overfitting when there are not
enough training tasks for the meta-learners to generalize. Although
existing approaches such as Dropout are widely used to address the overfitting
problem, these methods are typically designed for regularizing models of a
single task in supervised training. In this paper, we introduce a simple yet
effective method to alleviate the risk of overfitting for gradient-based
meta-learning. Specifically, during the gradient-based adaptation stage, we
randomly drop the inner-loop gradient of each parameter in deep neural
networks, so that the augmented gradients improve generalization
to new tasks. We present a general form of the proposed gradient dropout
regularization and show that this term can be sampled from either the Bernoulli
or Gaussian distribution. To validate the proposed method, we conduct extensive
experiments and analysis on numerous computer vision tasks, demonstrating that
the gradient dropout regularization mitigates the overfitting problem and
improves the performance upon various gradient-based meta-learning frameworks.
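As a rough illustration of the idea in the abstract, the sketch below applies a Bernoulli or Gaussian dropout mask to each inner-loop gradient of a MAML-style adaptation on a toy quadratic task. All function names, constants, and the exact mask parameterization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(shape, mode="bernoulli", p=0.1, sigma=0.1):
    """Sample a per-parameter mask for the inner-loop gradient.

    mode="bernoulli": zero each gradient entry with probability p and
    rescale the survivors; mode="gaussian": multiply each entry by noise
    centred at 1. Both are illustrative choices, not the paper's exact
    parameterization.
    """
    if mode == "bernoulli":
        keep = rng.random(shape) >= p
        return keep / (1.0 - p)
    return 1.0 + sigma * rng.standard_normal(shape)

def inner_loop_adapt(theta, grad_fn, steps=5, lr=0.1, mode="bernoulli"):
    """MAML-style inner loop with gradient dropout applied at each update."""
    for _ in range(steps):
        g = grad_fn(theta)
        theta = theta - lr * dropout_mask(theta.shape, mode) * g
    return theta

# Toy task: quadratic loss L(theta) = 0.5 * ||theta - target||^2
target = np.array([1.0, -2.0, 3.0])
grad = lambda th: th - target
adapted = inner_loop_adapt(np.zeros(3), grad, steps=50, mode="gaussian")
```

In a full meta-learning setup the outer loop would backpropagate through these noisy inner updates; here the inner loop alone is shown to keep the sketch self-contained.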
Related papers
- Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning with gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones.
We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting.
We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
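The projection idea behind SGP can be sketched as follows; the basis, scaling factors, and function names are hypothetical stand-ins for the method's importance-based quantities, not its actual formulation.

```python
import numpy as np

def scaled_gradient_projection(g, basis, scale):
    """Attenuate the components of a new-task gradient that lie in the
    subspace spanned by `basis` (columns = orthonormal directions deemed
    important for earlier tasks). scale=1 removes a component entirely;
    scale=0 leaves it untouched. Illustrative only.
    """
    coeffs = basis.T @ g                 # coordinates in the old-task subspace
    return g - basis @ (scale * coeffs)  # subtract the scaled components

# Old tasks care about the x-axis; fully protect it, half-protect y.
basis = np.eye(3)[:, :2]                 # orthonormal columns e1, e2
scale = np.array([1.0, 0.5])
g_new = np.array([2.0, 2.0, 2.0])
g_proj = scaled_gradient_projection(g_new, basis, scale)
# g_proj = [0.0, 1.0, 2.0]: x-component removed, y halved, z untouched
```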
arXiv Detail & Related papers (2023-02-02T19:46:39Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
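A minimal sketch of adaptation as an ODE, assuming a plain forward-Euler integrator on a toy quadratic loss; COMLN itself differentiates through the flow with forward-mode sensitivities, which this toy omits.

```python
import numpy as np

def gradient_flow(theta0, grad_fn, T=2.0, dt=0.01):
    """Forward-Euler integration of the adaptation ODE
    d(theta)/dt = -grad L(theta) up to a continuous horizon T,
    so the 'number of inner steps' becomes the real-valued length T.
    """
    theta = np.asarray(theta0, dtype=float)
    t = 0.0
    while t < T:
        theta = theta - dt * grad_fn(theta)
        t += dt
    return theta

# Quadratic loss: the exact flow is theta(t) = target + (theta0 - target) * exp(-t)
target = np.array([1.0, -1.0])
adapted = gradient_flow(np.zeros(2), lambda th: th - target, T=3.0)
```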
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning [13.937644559223548]
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning.
We propose an effective method to improve the model generalization by penalizing the gradient norm of loss function during optimization.
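A sketch of gradient-norm penalization on a toy quadratic, assuming a finite-difference approximation of the Hessian-vector product that appears in the penalty's gradient; the constants and the approximation scheme are illustrative, not necessarily the paper's.

```python
import numpy as np

def penalized_step(theta, loss_grad, lr=0.05, lam=0.1, r=0.01):
    """One descent step on L(theta) + lam * ||grad L(theta)||.
    The gradient of ||grad L|| is H @ (g / ||g||); the Hessian-vector
    product is approximated by a finite difference along the normalized
    gradient (hypothetical constants)."""
    g = loss_grad(theta)
    gn = np.linalg.norm(g)
    if gn == 0:
        return theta
    u = g / gn
    hvp = (loss_grad(theta + r * u) - g) / r  # ~ H @ u
    return theta - lr * (g + lam * hvp)

# Toy quadratic: L = 0.5 * theta^T A theta, grad L = A @ theta
A = np.diag([1.0, 10.0])
theta = np.array([1.0, 1.0])
for _ in range(200):
    theta = penalized_step(theta, lambda th: A @ th)
```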
arXiv Detail & Related papers (2022-02-08T02:03:45Z)
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-perfect solutions to non-convex training problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Incremental Object Detection via Meta-Learning [77.55310507917012]
We propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared.
In comparison to existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new classes, and scales to high-capacity models for object detection.
arXiv Detail & Related papers (2020-03-17T13:40:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.