Improving Generalization in Meta-Learning via Meta-Gradient Augmentation
- URL: http://arxiv.org/abs/2306.08460v1
- Date: Wed, 14 Jun 2023 12:04:28 GMT
- Title: Improving Generalization in Meta-Learning via Meta-Gradient Augmentation
- Authors: Ren Wang, Haoliang Sun, Qi Wei, Xiushan Nie, Yuling Ma, Yilong Yin
- Abstract summary: We propose a data-independent Meta-Gradient Augmentation (MGAug) method to alleviate overfitting in meta-learning.
The proposed MGAug is theoretically guaranteed by the generalization bound from the PAC-Bayes framework.
- Score: 42.48021701246389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-learning methods typically follow a two-loop framework, where each loop
potentially suffers from notorious overfitting, hindering rapid adaptation and
generalization to new tasks. Existing schemes solve it by enhancing the
mutual-exclusivity or diversity of training samples, but these data
manipulation strategies are data-dependent and insufficiently flexible. This
work alleviates overfitting in meta-learning from the perspective of gradient
regularization and proposes a data-independent Meta-Gradient
Augmentation (MGAug) method. The key idea is to first break
the rote memories by network pruning to address memorization overfitting in the
inner loop, and then the gradients of pruned sub-networks naturally form the
high-quality augmentation of the meta-gradient to alleviate learner overfitting
in the outer loop. Specifically, we explore three pruning strategies, including
random width pruning, random parameter pruning, and a newly
proposed catfish pruning that measures a Meta-Memorization Carrying
Amount (MMCA) score for each parameter and prunes high-score ones to break rote
memories as much as possible. The proposed MGAug is theoretically guaranteed by
the generalization bound from the PAC-Bayes framework. In addition, we extend a
lightweight version, called MGAug-MaxUp, as a trade-off between performance
gains and resource overhead. Extensive experiments on multiple few-shot
learning benchmarks validate MGAug's effectiveness and significant improvement
over various meta-baselines. The code is publicly available at
https://github.com/xxLifeLover/Meta-Gradient-Augmentation.
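The mechanism described in the abstract can be illustrated with a toy sketch. The saliency score below (|weight × gradient|) is a hypothetical stand-in for the paper's MMCA score, and the toy linear-regression "task" and averaging rule are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression standing in for one few-shot task.
X, y = rng.normal(size=(8, 4)), rng.normal(size=8)
w = rng.normal(size=4)

def grad(weights):
    """Analytic MSE gradient for the toy model."""
    return 2.0 / len(X) * X.T @ (X @ weights - y)

def catfish_pruned_grad(ratio):
    """'Catfish pruning' sketch: score each parameter with a |weight * grad|
    saliency (a hypothetical proxy for the paper's MMCA score), zero out the
    highest-scoring fraction, and return the pruned sub-network's gradient."""
    score = np.abs(w * grad(w))
    k = max(1, int(ratio * len(w)))
    drop = np.argsort(score)[-k:]   # indices of the highest-score parameters
    w_sub = w.copy()
    w_sub[drop] = 0.0               # break the strongest "rote memories"
    return grad(w_sub)

# Meta-gradient augmentation: gradients of several pruned sub-networks
# augment (here simply averaged with) the full meta-gradient.
subnet_grads = [catfish_pruned_grad(r) for r in (0.25, 0.5)]
meta_grad = np.mean([grad(w)] + subnet_grads, axis=0)
```

In the actual method the pruned-sub-network gradients augment the outer-loop meta-gradient across sampled tasks; the plain average above only conveys the shape of the idea.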
Related papers
- Coarse-to-Fine Lightweight Meta-Embedding for ID-Based Recommendation [13.732081010190962]
We develop a novel graph neural network (GNN) based recommender where each user and item serves as a node.
In contrast to coarse-grained semantics, fine-grained semantics are well captured through sparse meta-embeddings.
We propose a weight bridging update strategy that focuses on matching each coarse-grained meta-embedding with several fine-grained meta-embeddings based on the users'/items' semantics.
arXiv Detail & Related papers (2025-01-21T03:56:23Z) - Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation [7.071677694758966]
We introduce Mix-of-Granularity (MoG), a method that determines the optimal granularity of a knowledge source based on input queries using a router.
We extend MoG to MoG-Graph (MoGG), where reference documents are pre-processed as graphs, enabling the retrieval of distantly situated snippets.
Experiments demonstrate that MoG and MoGG effectively predict optimal granularity levels, significantly enhancing the performance of the RAG system in downstream tasks.
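The router idea in the MoG summary can be sketched minimally. The linear router, candidate chunk sizes, and soft-mixture readout below are all illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate chunking granularities in tokens (illustrative values).
GRANULARITIES = np.array([64, 128, 256, 512], dtype=float)

# Hypothetical linear router mapping an 8-d query embedding to one score
# per granularity level.
W = rng.normal(size=(len(GRANULARITIES), 8))

def route(query_emb):
    """Softmax the router scores and return the expected granularity.
    A stand-in for MoG's learned router, which similarly soft-mixes
    retrieval results across granularity levels."""
    logits = W @ query_emb
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs @ GRANULARITIES)

g = route(rng.normal(size=8))
```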
arXiv Detail & Related papers (2024-06-01T14:45:03Z) - Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization
for Few-shot Generalization [40.45470744120691]
This paper proposes a novel Self-sUpervised meta-Prompt learning framework with MEta-gradient Regularization for few-shot generalization (SUPMER)
arXiv Detail & Related papers (2023-03-22T05:04:21Z) - ReMix: A General and Efficient Framework for Multiple Instance Learning
based Whole Slide Image Classification [14.78430890440035]
Whole slide image (WSI) classification often relies on weakly supervised multiple instance learning (MIL) methods to handle gigapixel resolution images and slide-level labels.
We propose ReMix, a general and efficient framework for MIL based WSI classification.
arXiv Detail & Related papers (2022-07-05T04:21:35Z) - RU-Net: Regularized Unrolling Network for Scene Graph Generation [92.95032610978511]
Scene graph generation (SGG) aims to detect objects and predict the relationships between each pair of objects.
Existing SGG methods usually suffer from several issues, including 1) ambiguous object representations, and 2) low diversity in relationship predictions.
We propose a regularized unrolling network (RU-Net) to address both problems.
arXiv Detail & Related papers (2022-05-03T04:21:15Z) - Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation.
We show our method is able to save two-thirds of the training time while maintaining comparable or achieving even better generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z) - ResLT: Residual Learning for Long-tailed Recognition [64.19728932445523]
We propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space.
We design an effective residual fusion mechanism: one main branch is optimized to recognize images from all classes, while two residual branches are gradually fused and optimized to enhance recognition of medium- and tail-class images, respectively.
We test our method on several benchmarks, i.e., long-tailed version of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018.
arXiv Detail & Related papers (2021-01-26T08:43:50Z) - Improving Generalization in Meta-learning via Task Augmentation [69.83677015207527]
We propose two task augmentation methods, including MetaMix and Channel Shuffle.
Both MetaMix and Channel Shuffle outperform state-of-the-art results by a large margin across many datasets.
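The task-augmentation idea behind MetaMix can be sketched as a mixup between a task's support and query examples. The Beta-distributed mixing coefficient follows standard mixup practice; the exact pairing and where the mixed examples enter the meta-objective are assumptions here, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def metamix(support_x, support_y, query_x, query_y, alpha=2.0):
    """MetaMix-style task augmentation (assumed form): interpolate support
    and query examples with a Beta(alpha, alpha) coefficient so the outer
    loop trains on mixed, harder-to-memorize targets."""
    lam = rng.beta(alpha, alpha)
    n = min(len(support_x), len(query_x))
    mixed_x = lam * support_x[:n] + (1 - lam) * query_x[:n]
    mixed_y = lam * support_y[:n] + (1 - lam) * query_y[:n]
    return mixed_x, mixed_y

# Toy 5-shot task with 3-d inputs and scalar regression targets.
sx, sy = rng.normal(size=(5, 3)), rng.normal(size=(5, 1))
qx, qy = rng.normal(size=(5, 3)), rng.normal(size=(5, 1))
mx, my = metamix(sx, sy, qx, qy)
```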
arXiv Detail & Related papers (2020-07-26T01:50:42Z) - 1st Place Solutions for OpenImage2019 -- Object Detection and Instance
Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, 'MMfruit' for the detection track and 'MMfruitSeg' for the segmentation track, in OpenImage Challenge 2019.
It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression.
We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.