G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
- URL: http://arxiv.org/abs/2308.03236v2
- Date: Sat, 19 Aug 2023 16:14:18 GMT
- Title: G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
- Authors: Xingyu Li and Bo Tang
- Abstract summary: We propose a new learning framework called Generalized-Mixup (G-Mix), which combines the strengths of Mixup and SAM for training DNN models.
We introduce two novel algorithms, Binary G-Mix (BG-Mix) and Decomposed G-Mix (DG-Mix), which partition the training data into two subsets based on the sharpness-sensitivity of each example.
Both theoretical explanations and experimental results reveal that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models.
- Score: 17.473268736086137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have demonstrated promising results in various
complex tasks. However, current DNNs encounter challenges with
over-parameterization, especially when there is limited training data
available. To enhance the generalization capability of DNNs, the Mixup
technique has gained popularity. Nevertheless, it still produces suboptimal
outcomes. Inspired by the successful Sharpness-Aware Minimization (SAM)
approach, which establishes a connection between the sharpness of the training
loss landscape and model generalization, we propose a new learning framework
called Generalized-Mixup (G-Mix), which combines the strengths of Mixup and SAM for
training DNN models. The theoretical analysis provided demonstrates how the
developed G-Mix framework enhances generalization. Additionally, to further
optimize DNN performance with the G-Mix framework, we introduce two novel
algorithms: Binary G-Mix and Decomposed G-Mix. These algorithms partition the
training data into two subsets based on the sharpness-sensitivity of each
example to address the issue of "manifold intrusion" in Mixup. Both theoretical
explanations and experimental results reveal that the proposed BG-Mix and
DG-Mix algorithms further enhance model generalization across multiple datasets
and models, achieving state-of-the-art performance.
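To make the framework concrete, below is a minimal, hypothetical PyTorch sketch of one G-Mix-style training step: a Mixup-interpolated batch is optimized with a SAM-style two-pass update (perturb the weights toward higher loss, then descend using the gradient computed at the perturbed point). The function names and the hyperparameters `alpha` (Beta distribution) and `rho` (perturbation radius) are illustrative assumptions, not the authors' released code, and the BG-Mix/DG-Mix sharpness-based data partitioning is omitted.

```python
import torch
import torch.nn.functional as F


def mixup_batch(x, y, alpha=0.2):
    # Sample the interpolation weight lambda ~ Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    # Convex combination of the batch with a shuffled copy of itself.
    return lam * x + (1.0 - lam) * x[perm], y, y[perm], lam


def mixed_loss(model, x_mix, y_a, y_b, lam):
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)


def gmix_step(model, optimizer, x, y, rho=0.05, alpha=0.2):
    x_mix, y_a, y_b, lam = mixup_batch(x, y, alpha)

    # Pass 1: gradient of the Mixup loss at the current weights.
    optimizer.zero_grad()
    mixed_loss(model, x_mix, y_a, y_b, lam).backward()

    # SAM-style ascent: move weights toward higher loss within an
    # L2 ball of radius rho around the current point.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = (rho / grad_norm) * p.grad
            p.add_(e)
            perturbations.append(e)

    # Pass 2: gradient of the Mixup loss at the perturbed (sharper) point.
    optimizer.zero_grad()
    loss = mixed_loss(model, x_mix, y_a, y_b, lam)
    loss.backward()

    # Undo the perturbation, then update with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    return loss.item()
```

The second backward pass is what ties the Mixup objective to flat minima: the update direction is the gradient of the Mixup loss evaluated at the locally sharpest nearby point, as in SAM.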
Related papers
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion: the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Mixed Semi-Supervised Generalized-Linear-Regression with applications to Deep-Learning and Interpolators [6.537685198688539]
We present a methodology for using unlabeled data to design semi-supervised learning (SSL) methods.
Each method includes a mixing parameter $\alpha$ that controls the weight given to the unlabeled data.
We demonstrate the effectiveness of our methodology in delivering substantial improvement compared to the standard supervised models.
arXiv Detail & Related papers (2023-02-19T09:55:18Z)
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training so that they better fit the infinite Gaussian mixture model (iGMM) clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach for regularizing neural networks based on the local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity [15.780905917870427]
We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data.
We also propose an iterative submodular computation algorithm, based on an efficient modular approximation, for mixup in each minibatch.
Our experiments show the proposed method achieves state-of-the-art generalization, calibration, and weakly supervised localization results.
arXiv Detail & Related papers (2021-02-05T09:12:02Z)
- MG-GCN: Fast and Effective Learning with Mix-grained Aggregators for Training Large Graph Convolutional Networks [20.07942308916373]
Graph convolutional networks (GCNs) generate node embeddings by aggregating the information of their neighbors layer by layer (a minimal sketch of this aggregation appears below).
The high computational and memory cost of GCNs makes training on large graphs infeasible.
A new model, named Mix-grained GCN (MG-GCN), achieves state-of-the-art performance in terms of accuracy, training speed, convergence speed, and memory cost.
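For context, the sketch below shows the standard symmetrically normalized neighbor aggregation that a single GCN layer performs; it is the baseline operation that MG-GCN's mix-grained aggregators are designed to make cheaper, not a reproduction of MG-GCN itself. The dense adjacency matrix and all tensor names are illustrative assumptions.

```python
import torch


def gcn_layer(adj, h, weight):
    """One GCN layer: normalized neighbor aggregation, then a linear transform."""
    a_hat = adj + torch.eye(adj.size(0))         # add self-loops
    deg = a_hat.sum(dim=1)                       # node degrees (>= 1 after self-loops)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))       # D^{-1/2}
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt   # D^{-1/2} (A + I) D^{-1/2}
    return torch.relu(norm_adj @ h @ weight)     # aggregate neighbors, then transform


# Toy usage: a 2-node graph with one edge.
adj = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
h = torch.randn(2, 8)        # 8-dimensional node features
w = torch.randn(8, 4)        # layer weights
out = gcn_layer(adj, h, w)   # embeddings of shape (2, 4)
```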
arXiv Detail & Related papers (2020-11-17T14:51:57Z)
- DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference [52.899219617256655]
We propose a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition.
In the DS-UI, we combine the last fully-connected (FC) layer with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer.
Experimental results show that DS-UI outperforms the state-of-the-art UI methods in misclassification detection.
arXiv Detail & Related papers (2020-11-17T12:35:02Z)
- Hyperspectral Unmixing Network Inspired by Unfolding an Optimization Problem [2.4016406737205753]
The hyperspectral image (HSI) unmixing task is essentially an inverse problem, which is commonly solved by optimization algorithms.
We propose two novel network architectures, named U-ADMM-AENet and U-ADMM-BUNet, for abundance estimation and blind unmixing.
We show that the unfolded structures can find corresponding interpretations in the machine learning literature, which further demonstrates the effectiveness of the proposed methods.
arXiv Detail & Related papers (2020-05-21T18:49:45Z)