Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
- URL: http://arxiv.org/abs/2405.18861v1
- Date: Wed, 29 May 2024 08:22:33 GMT
- Title: Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
- Authors: Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang
- Abstract summary: This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts.
It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains.
DISAM constrains the variance of the domain losses when generating sharpness perturbations; under this mechanism, we theoretically show that DISAM can achieve faster overall convergence and improved generalization in principle.
- Score: 38.78485556098491
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider domain-level convergence consistency in the sharpness estimation to prevent overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces the constraint of minimizing the variance of the domain losses, which allows elastic gradient calibration in perturbation generation: when one domain is optimized above the average level w.r.t. loss, the gradient perturbation towards that domain is weakened automatically, and vice versa. Under this mechanism, we theoretically show that DISAM achieves faster overall convergence and improved generalization in principle when inconsistent convergence emerges. Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. Furthermore, we show the superior efficiency of DISAM in parameter-efficient fine-tuning combined with pretrained models. The source code is released at https://github.com/MediaBrain-SJTU/DISAM.
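To make the perturbation-calibration idea concrete, the sketch below is a minimal, PyTorch-style illustration of how a SAM ascent direction could be reweighted by per-domain loss. It is not the released DISAM implementation; helper names such as `disam_perturbation` and `domain_batches`, and the softmax weighting rule, are assumptions chosen only to mirror the behaviour described in the abstract (better-optimized domains receive weaker perturbations).

```python
import torch

def disam_perturbation(model, domain_batches, loss_fn, rho=0.05, eps=1e-12):
    """Illustrative sketch (not the official DISAM code): build a SAM-style
    weight perturbation whose per-domain contributions are calibrated by how
    far each domain's loss sits above or below the average."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Per-domain losses and gradients at the current weights.
    domain_losses, domain_grads = [], []
    for x, y in domain_batches:          # one (inputs, labels) batch per domain
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        domain_losses.append(loss.detach())
        domain_grads.append([g.detach() for g in grads])

    losses = torch.stack(domain_losses)
    # Calibration weights (assumed form): domains with above-average loss get a
    # larger share of the ascent direction, well-optimized domains a smaller
    # one, which weakens the perturbation towards already-converged domains.
    weights = torch.softmax(losses, dim=0)

    # Weighted ascent direction, rescaled to the SAM radius rho.
    agg = [sum(w * g[i] for w, g in zip(weights, domain_grads))
           for i in range(len(params))]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in agg)) + eps
    return [rho * g / grad_norm for g in agg]
```

The returned tensors would then be added to the weights to form the ascent point, the training gradient recomputed there, the weights restored, and an ordinary optimizer step applied, as in standard SAM.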
Related papers
- GDO: Gradual Domain Osmosis [1.62060928868899]
We propose a new method, Gradual Domain Osmosis, which addresses the problem of smoothly migrating knowledge from the source domain to the target domain in Gradual Domain Adaptation (GDA).
Traditional Gradual Domain Adaptation methods mitigate domain bias by introducing intermediate domains and self-training strategies, but often face the challenges of inefficient knowledge migration or missing data in intermediate domains.
arXiv Detail & Related papers (2025-01-31T14:25:45Z)
- Sharpness-Aware Gradient Matching for Domain Generalization [84.14789746460197]
The goal of domain generalization (DG) is to enhance the generalization capability of the model learned from a source domain to other unseen domains.
The recently developed Sharpness-Aware Minimization (SAM) method aims to achieve this goal by minimizing the sharpness measure of the loss landscape.
We present two conditions to ensure that the model can converge to a flat minimum with a small loss, and propose an algorithm named Sharpness-Aware Gradient Matching (SAGM) to meet them.
Our proposed SAGM method consistently outperforms state-of-the-art methods on five DG benchmarks (a minimal sketch of the underlying SAM update follows this entry).
arXiv Detail & Related papers (2023-03-18T07:25:12Z)
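For context, the vanilla SAM update that SAGM (and DISAM above) builds on can be summarised by the two-pass step below. This is a generic, hedged sketch with assumed names (`sam_step`, `rho`), not code taken from either paper.

```python
import torch

def sam_step(model, optimizer, loss_fn, x, y, rho=0.05):
    """Generic sharpness-aware minimization step: (1) ascend to a nearby
    high-loss point within an L2 ball of radius rho, (2) take the gradient
    there, (3) restore the weights and apply the usual optimizer update."""
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient at the current weights.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Perturb the weights towards higher loss (the "sharpness" ascent).
    with torch.no_grad():
        eps = [rho * g / grad_norm for g in grads]
        for p, e in zip(params, eps):
            p.add_(e)

    # Second pass: gradient at the perturbed weights.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Restore the original weights and step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    return loss.detach()
```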
- Domain Generalisation via Domain Adaptation: An Adversarial Fourier Amplitude Approach [13.642506915023871]
We adversarially synthesise the worst-case target domain and adapt a model to that worst-case domain.
On the DomainBedNet dataset, the proposed approach yields significantly improved domain generalisation performance.
arXiv Detail & Related papers (2023-02-23T14:19:07Z)
- Constrained Maximum Cross-Domain Likelihood for Domain Generalization [14.91361835243516]
Domain generalization aims to learn a generalizable model on multiple source domains, which is expected to perform well on unseen test domains.
In this paper, we propose a novel domain generalization method which minimizes the KL-divergence between posterior distributions from different domains (a minimal sketch of such a KL penalty follows this entry).
Experiments on four standard benchmark datasets, i.e., Digits-DG, PACS, Office-Home and miniDomainNet, highlight the superior performance of our method.
arXiv Detail & Related papers (2022-10-09T03:41:02Z)
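As a hedged illustration of the kind of objective this entry describes, the snippet below computes a symmetrized KL penalty between the average class posteriors predicted on two domains. The function name, the batch-averaging, and the symmetrization are assumptions for illustration, not the paper's exact constrained maximum cross-domain likelihood formulation.

```python
import torch
import torch.nn.functional as F

def cross_domain_kl(logits_a, logits_b, eps=1e-8):
    """Symmetrized KL divergence between the average class posteriors of two
    domains (illustrative regularizer, not the paper's exact objective)."""
    p = F.softmax(logits_a, dim=-1).mean(dim=0).clamp_min(eps)  # posterior, domain A
    q = F.softmax(logits_b, dim=-1).mean(dim=0).clamp_min(eps)  # posterior, domain B
    kl_pq = (p * (p / q).log()).sum()
    kl_qp = (q * (q / p).log()).sum()
    return 0.5 * (kl_pq + kl_qp)

# Usage sketch: weight the penalty against the task loss, e.g.
# total_loss = task_loss + lambda_kl * cross_domain_kl(model(x_a), model(x_b))
```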
- Generalizing to Unseen Domains with Wasserstein Distributional Robustness under Limited Source Knowledge [22.285156929279207]
Domain generalization aims at learning a universal model that performs well on unseen target domains.
We propose a novel domain generalization framework called Wasserstein Distributionally Robust Domain Generalization (WDRDG).
arXiv Detail & Related papers (2022-07-11T14:46:50Z)
- Domain Adversarial Training: A Game Perspective [80.3821370633883]
This paper defines optimal solutions in domain-adversarial training from a game theoretical perspective.
We show that gradient descent in domain-adversarial training can violate its convergence guarantees, oftentimes hindering the transfer performance.
The proposed optimizers are easy to implement, free of additional parameters, and can be plugged into any domain-adversarial framework (a minimal domain-adversarial training sketch follows this entry).
arXiv Detail & Related papers (2022-02-10T22:17:30Z)
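To make "domain-adversarial framework" concrete, the sketch below shows the classic gradient-reversal setup such frameworks share. It illustrates the general training scheme the paper analyzes, not the specific optimizers it proposes; names such as `GradReverse` and `lambda_` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the
    backward pass -- the gradient-reversal layer of domain-adversarial training."""
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None

def domain_adversarial_loss(features, class_logits, y,
                            domain_classifier, domain_labels, lambda_=1.0):
    """Task loss plus a domain-classification loss on reversed features, which
    pushes the feature extractor towards domain-invariant representations."""
    task_loss = F.cross_entropy(class_logits, y)
    reversed_feats = GradReverse.apply(features, lambda_)
    domain_loss = F.cross_entropy(domain_classifier(reversed_feats), domain_labels)
    return task_loss + domain_loss
```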
- Model-Based Domain Generalization [96.84818110323518]
We propose a novel approach for the domain generalization problem called Model-Based Domain Generalization.
Our algorithms beat the current state-of-the-art methods on the very-recently-proposed WILDS benchmark by up to 20 percentage points.
arXiv Detail & Related papers (2021-02-23T00:59:02Z)
- On the Convergence of Overlapping Schwarz Decomposition for Nonlinear Optimal Control [7.856998585396421]
We study the convergence properties of an overlapping Schwarz decomposition algorithm for solving nonlinear optimal control problems.
We show that the algorithm exhibits local linear convergence, and that the convergence rate improves exponentially with the overlap size.
arXiv Detail & Related papers (2020-05-14T00:19:28Z)
- Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)
- Bi-Directional Generation for Unsupervised Domain Adaptation [61.73001005378002]
Unsupervised domain adaptation adapts a model to an unlabeled target domain by relying on well-established source-domain information.
Conventional methods that forcefully reduce the domain discrepancy in the latent space destroy the intrinsic structure of the data.
We propose a Bi-Directional Generation domain adaptation model with consistent classifiers interpolating two intermediate domains to bridge source and target domains.
arXiv Detail & Related papers (2020-02-12T09:45:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.