G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
- URL: http://arxiv.org/abs/2308.03236v2
- Date: Sat, 19 Aug 2023 16:14:18 GMT
- Title: G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
- Authors: Xingyu Li and Bo Tang
- Abstract summary: We propose a new learning framework called Generalized-Mixup (G-Mix), which combines the strengths of Mixup and SAM for training DNN models.
We introduce two novel algorithms, Binary G-Mix (BG-Mix) and Decomposed G-Mix (DG-Mix), which partition the training data into two subsets based on the sharpness-sensitivity of each example.
Both theoretical explanations and experimental results reveal that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models.
- Score: 17.473268736086137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have demonstrated promising results in various
complex tasks. However, current DNNs encounter challenges with
over-parameterization, especially when there is limited training data
available. To enhance the generalization capability of DNNs, the Mixup
technique has gained popularity. Nevertheless, it still produces suboptimal
outcomes. Inspired by the successful Sharpness-Aware Minimization (SAM)
approach, which establishes a connection between the sharpness of the training
loss landscape and model generalization, we propose a new learning framework
called Generalized-Mixup (G-Mix), which combines the strengths of Mixup and SAM for
training DNN models. The theoretical analysis provided demonstrates how the
developed G-Mix framework enhances generalization. Additionally, to further
optimize DNN performance with the G-Mix framework, we introduce two novel
algorithms: Binary G-Mix and Decomposed G-Mix. These algorithms partition the
training data into two subsets based on the sharpness-sensitivity of each
example to address the issue of "manifold intrusion" in Mixup. Both theoretical
explanations and experimental results reveal that the proposed BG-Mix and
DG-Mix algorithms further enhance model generalization across multiple datasets
and models, achieving state-of-the-art performance.
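To make the framework concrete, below is a minimal, hypothetical PyTorch sketch of one G-Mix-style training step: a Mixup-interpolated batch is optimized with a SAM-style two-pass update (perturb the weights toward higher loss, then descend using the gradient computed at the perturbed point). The function names and the hyperparameters `alpha` (Beta distribution) and `rho` (perturbation radius) are illustrative assumptions, not the authors' released code, and the BG-Mix/DG-Mix sharpness-based data partitioning is omitted.

```python
import torch
import torch.nn.functional as F


def mixup_batch(x, y, alpha=0.2):
    # Sample the interpolation weight lambda ~ Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    # Convex combination of the batch with a shuffled copy of itself.
    return lam * x + (1.0 - lam) * x[perm], y, y[perm], lam


def mixed_loss(model, x_mix, y_a, y_b, lam):
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)


def gmix_step(model, optimizer, x, y, rho=0.05, alpha=0.2):
    x_mix, y_a, y_b, lam = mixup_batch(x, y, alpha)

    # Pass 1: gradient of the Mixup loss at the current weights.
    optimizer.zero_grad()
    mixed_loss(model, x_mix, y_a, y_b, lam).backward()

    # SAM-style ascent: move weights toward higher loss within an
    # L2 ball of radius rho around the current point.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = (rho / grad_norm) * p.grad
            p.add_(e)
            perturbations.append(e)

    # Pass 2: gradient of the Mixup loss at the perturbed (sharper) point.
    optimizer.zero_grad()
    loss = mixed_loss(model, x_mix, y_a, y_b, lam)
    loss.backward()

    # Undo the perturbation, then update with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    return loss.item()
```

The second backward pass is what ties the Mixup objective to flat minima: the update direction is the gradient of the Mixup loss evaluated at the locally sharpest nearby point, as in SAM.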
Related papers
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion: the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Mixed Semi-Supervised Generalized-Linear-Regression with applications to Deep-Learning and Interpolators [6.537685198688539]
We present a methodology for using unlabeled data to design semi-supervised learning (SSL) methods.
Each method includes a mixing parameter $\alpha$ that controls the weight given to the unlabeled data.
We demonstrate the effectiveness of our methodology in delivering substantial improvement compared to the standard supervised models.
arXiv Detail & Related papers (2023-02-19T09:55:18Z)
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training so that they better fit the infinite Gaussian mixture model (iGMM) clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach for regularizing neural networks based on the local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity [15.780905917870427]
We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data.
We also propose an iterative submodular computation algorithm, based on an efficient modular approximation, for mixup in each minibatch.
Our experiments show the proposed method achieves state-of-the-art generalization, calibration, and weakly supervised localization results.
arXiv Detail & Related papers (2021-02-05T09:12:02Z)
- MG-GCN: Fast and Effective Learning with Mix-grained Aggregators for Training Large Graph Convolutional Networks [20.07942308916373]
Graph convolutional networks (GCNs) generate node embeddings by aggregating the information of their neighbors layer by layer (a minimal sketch of this aggregation appears below).
The high computational and memory cost of GCNs makes training on large graphs infeasible.
A new model, named Mix-grained GCN (MG-GCN), achieves state-of-the-art performance in terms of accuracy, training speed, convergence speed, and memory cost.
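For context, the sketch below shows the standard symmetrically normalized neighbor aggregation that a single GCN layer performs; it is the baseline operation that MG-GCN's mix-grained aggregators are designed to make cheaper, not a reproduction of MG-GCN itself. The dense adjacency matrix and all tensor names are illustrative assumptions.

```python
import torch


def gcn_layer(adj, h, weight):
    """One GCN layer: normalized neighbor aggregation, then a linear transform."""
    a_hat = adj + torch.eye(adj.size(0))         # add self-loops
    deg = a_hat.sum(dim=1)                       # node degrees (>= 1 after self-loops)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))       # D^{-1/2}
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt   # D^{-1/2} (A + I) D^{-1/2}
    return torch.relu(norm_adj @ h @ weight)     # aggregate neighbors, then transform


# Toy usage: a 2-node graph with one edge.
adj = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
h = torch.randn(2, 8)        # 8-dimensional node features
w = torch.randn(8, 4)        # layer weights
out = gcn_layer(adj, h, w)   # embeddings of shape (2, 4)
```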
arXiv Detail & Related papers (2020-11-17T14:51:57Z)
- DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference [52.899219617256655]
We propose a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition.
In the DS-UI, we combine the last fully-connected (FC) layer with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer.
Experimental results show that DS-UI outperforms the state-of-the-art UI methods in misclassification detection.
arXiv Detail & Related papers (2020-11-17T12:35:02Z)
- Hyperspectral Unmixing Network Inspired by Unfolding an Optimization Problem [2.4016406737205753]
The hyperspectral image (HSI) unmixing task is essentially an inverse problem, which is commonly solved by optimization algorithms.
We propose two novel network architectures, named U-ADMM-AENet and U-ADMM-BUNet, for abundance estimation and blind unmixing.
We show that the unfolded structures can find corresponding interpretations in the machine learning literature, which further demonstrates the effectiveness of the proposed methods.
arXiv Detail & Related papers (2020-05-21T18:49:45Z)