PLACE dropout: A Progressive Layer-wise and Channel-wise Dropout for
Domain Generalization
- URL: http://arxiv.org/abs/2112.03676v2
- Date: Sun, 17 Sep 2023 15:58:05 GMT
- Title: PLACE dropout: A Progressive Layer-wise and Channel-wise Dropout for
Domain Generalization
- Authors: Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao
- Abstract summary: Domain generalization (DG) aims to learn a generic model from multiple observed source domains.
The major challenge in DG is that the model inevitably faces a severe overfitting issue due to the domain gap between source and target domains.
We develop a novel layer-wise and channel-wise dropout for DG, which randomly selects one layer and then randomly selects its channels to conduct dropout.
- Score: 29.824723021053565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain generalization (DG) aims to learn a generic model from multiple
observed source domains that generalizes well to arbitrary unseen target
domains without further training. The major challenge in DG is that the model
inevitably faces a severe overfitting issue due to the domain gap between
source and target domains. To mitigate this problem, some dropout-based methods
have been proposed to resist overfitting by discarding part of the
representation of the intermediate layers. However, we observe that most of
these methods only conduct the dropout operation in some specific layers,
leading to an insufficient regularization effect on the model. We argue that
applying dropout at multiple layers can produce stronger regularization
effects, which could alleviate the overfitting problem on source domains more
adequately than previous layer-specific dropout methods. In this paper, we
develop a novel layer-wise and channel-wise dropout for DG, which randomly
selects one layer and then randomly selects its channels to conduct dropout.
Particularly, the proposed method can generate a variety of data variants to
better deal with the overfitting issue. We also provide theoretical analysis
for our dropout method and prove that it can effectively reduce the
generalization error bound. Besides, we leverage the progressive scheme to
increase the dropout ratio with the training progress, which can gradually
boost the difficulty of training the model to enhance its robustness. Extensive
experiments on three standard benchmark datasets have demonstrated that our
method outperforms several state-of-the-art DG methods. Our code is available
at https://github.com/lingeringlight/PLACEdropout.
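For intuition, here is a minimal PyTorch-style sketch of the mechanism described above: at each training iteration one candidate layer is picked at random, a random subset of its channels is zeroed, and the drop ratio ramps up with training progress. The class name, the linear ramp schedule, and the per-sample channel mask are illustrative assumptions rather than the authors' implementation; see the linked repository for the official code.

```python
import random
import torch
import torch.nn as nn


class PlaceDropoutSketch(nn.Module):
    """Illustrative layer-wise and channel-wise dropout (not the official code).

    At each training iteration exactly one of the registered candidate layers
    drops a random subset of its channels; the drop ratio grows with the
    training progress (progressive scheme).
    """

    def __init__(self, num_candidate_layers, max_drop_ratio=0.33):
        super().__init__()
        self.num_layers = num_candidate_layers
        self.max_drop_ratio = max_drop_ratio
        self.progress = 0.0    # fraction of training completed, in [0, 1]
        self.active_layer = 0  # index of the layer chosen for this iteration

    def set_progress(self, progress):
        # e.g. progress = current_step / total_steps, called once per iteration
        self.progress = float(min(max(progress, 0.0), 1.0))

    def resample_layer(self):
        # Randomly select which candidate layer applies dropout this iteration
        self.active_layer = random.randrange(self.num_layers)

    def forward(self, x, layer_idx):
        if not self.training or layer_idx != self.active_layer:
            return x
        # Progressive ratio: assumed linear ramp from 0 up to max_drop_ratio
        drop_ratio = self.max_drop_ratio * self.progress
        keep_prob = 1.0 - drop_ratio
        if keep_prob >= 1.0:
            return x
        n, c = x.shape[0], x.shape[1]
        # One Bernoulli draw per (sample, channel): channel-wise dropout
        mask = torch.bernoulli(
            torch.full((n, c, 1, 1), keep_prob, device=x.device, dtype=x.dtype)
        )
        return x * mask / keep_prob  # rescale to preserve expected activation
```

In a backbone with several candidate layers, one would call resample_layer() and set_progress() once per iteration and route each candidate layer's feature map through forward(x, layer_idx).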
Related papers
- DAdEE: Unsupervised Domain Adaptation in Early Exit PLMs [5.402030962296633]
Early Exit (EE) strategies allow samples to exit early through classifiers attached to intermediate layers.
We propose Unsupervised Domain Adaptation in an EE framework (DAdEE), which employs multi-level adaptation using knowledge distillation.
Experiments on tasks such as sentiment analysis, entailment classification, and natural language inference demonstrate that DAdEE consistently outperforms early-exit methods.
arXiv Detail & Related papers (2024-10-06T09:44:58Z)
- MADG: Margin-based Adversarial Learning for Domain Generalization [25.45950080930517]
We propose a novel adversarial learning DG algorithm, MADG, motivated by a margin loss-based discrepancy metric.
The proposed MADG model learns domain-invariant features across all source domains and uses adversarial training to generalize well to the unseen target domain.
We extensively experiment with the MADG model on popular real-world DG datasets.
arXiv Detail & Related papers (2023-11-14T19:53:09Z)
- Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient signal-to-noise ratio (GSNR) of the network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z)
- GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GANs) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
- NormAUG: Normalization-guided Augmentation for Domain Generalization [60.159546669021346]
We propose a simple yet effective method called NormAUG (Normalization-guided Augmentation) for deep learning.
Our method introduces diverse information at the feature level and improves the generalization of the main path.
In the test stage, we leverage an ensemble strategy to combine the predictions from the auxiliary path of our model, further boosting performance.
arXiv Detail & Related papers (2023-07-25T13:35:45Z)
- SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification [44.27324696068285]
We propose a Single-dataset Unified Generalization (SUG) framework to alleviate the unforeseen domain differences faced by a well-trained source model.
Specifically, we first design a Multi-grained Sub-domain Alignment (MSA) method, which can constrain the learned representations to be domain-agnostic and discriminative.
Then, a Sample-level Domain-aware Attention (SDA) strategy is presented, which can selectively enhance easy-to-adapt samples from different sub-domains.
arXiv Detail & Related papers (2023-05-16T04:36:04Z)
- When Neural Networks Fail to Generalize? A Model Sensitivity Perspective [82.36758565781153]
Domain generalization (DG) aims to train a model to perform well in unseen domains under different distributions.
This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG).
We empirically identify a property of a model that correlates strongly with its generalization, which we coin "model sensitivity".
We propose a novel strategy of Spectral Adversarial Data Augmentation (SADA) to generate augmented images targeted at the highly sensitive frequencies.
arXiv Detail & Related papers (2022-12-01T20:15:15Z)
- Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts [133.99270341855728]
Real-world domain styles can vary substantially due to environment changes and sensor noises.
Deep models, however, only see the style of the training domains.
We propose Normalization Perturbation to overcome this domain style overfitting problem.
arXiv Detail & Related papers (2022-11-08T17:36:49Z)
- Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization [44.04047359057987]
We develop a new domain generalization algorithm named Flatness-aware Gradient-based Mixup (FGMix).
FGMix learns the similarity function towards flatter minima for better generalization.
On the DomainBed benchmark, we validate the efficacy of various designs of FGMix and demonstrate its superiority over other DG algorithms.
arXiv Detail & Related papers (2022-09-29T13:01:14Z)
- Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers [1.3766148734487902]
Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabeled domain by transferring weak knowledge from a bag of source models.
We propose to embed a Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% on their classification accuracies.
arXiv Detail & Related papers (2021-09-06T18:41:19Z)
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many works have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z)