Flatness-Aware Minimization for Domain Generalization
- URL: http://arxiv.org/abs/2307.11108v1
- Date: Thu, 20 Jul 2023 05:48:20 GMT
- Title: Flatness-Aware Minimization for Domain Generalization
- Authors: Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng
Cu
- Abstract summary: Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts.
Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default for all datasets.
We propose Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG.
- Score: 17.430563368226853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain generalization (DG) seeks to learn robust models that generalize well
under unknown distribution shifts. As a critical aspect of DG, optimizer
selection has not been explored in depth. Currently, most DG methods follow the
widely used benchmark, DomainBed, and utilize Adam as the default optimizer for
all datasets. However, we reveal that Adam is not necessarily the optimal
choice for the majority of current DG methods and datasets. Based on the
perspective of loss landscape flatness, we propose a novel approach,
Flatness-Aware Minimization for Domain Generalization (FAD), which can
efficiently optimize both zeroth-order and first-order flatness simultaneously
for DG. We provide theoretical analyses of the FAD's out-of-distribution (OOD)
generalization error and convergence. Our experimental results demonstrate the
superiority of FAD on various DG datasets. Additionally, we confirm that FAD is
capable of discovering flatter optima in comparison to other zeroth-order and
first-order flatness-aware optimization methods.
Related papers
- Disentangling Masked Autoencoders for Unsupervised Domain Generalization [57.56744870106124]
Unsupervised domain generalization is fast gaining attention but is still far from well-studied.
Disentangled Masked Auto (DisMAE) aims to discover the disentangled representations that faithfully reveal intrinsic features.
DisMAE co-trains the asymmetric dual-branch architecture with semantic and lightweight variation encoders.
arXiv Detail & Related papers (2024-07-10T11:11:36Z) - PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization [24.413415998529754]
We propose a new benchmark Hybrid Domain Generalization (HDG) and a novel metric $H2$-CV, which construct various splits to assess the robustness of algorithms.
Our method outperforms state-of-the-art algorithms on multiple datasets, especially improving the robustness when confronting data scarcity.
arXiv Detail & Related papers (2024-04-13T13:41:13Z) - Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z) - MADG: Margin-based Adversarial Learning for Domain Generalization [25.45950080930517]
We propose a novel adversarial learning DG algorithm, MADG, motivated by a margin loss-based discrepancy metric.
The proposed MADG model learns domain-invariant features across all source domains and uses adversarial training to generalize well to the unseen target domain.
We extensively experiment with the MADG model on popular real-world DG datasets.
arXiv Detail & Related papers (2023-11-14T19:53:09Z) - Towards Optimization and Model Selection for Domain Generalization: A
Mixup-guided Solution [43.292274574847234]
We propose Mixup guided optimization and selection techniques for domain generalization.
For optimization, we utilize an out-of-distribution dataset that can guide the preference direction.
For model selection, we generate a validation dataset with a closer distance to the target distribution.
arXiv Detail & Related papers (2022-09-01T02:18:00Z) - On Certifying and Improving Generalization to Unseen Domains [87.00662852876177]
Domain Generalization aims to learn models whose performance remains high on unseen domains encountered at test-time.
It is challenging to evaluate DG algorithms comprehensively using a few benchmark datasets.
We propose a universal certification framework that can efficiently certify the worst-case performance of any DG method.
arXiv Detail & Related papers (2022-06-24T16:29:43Z) - Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data.
We first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG)
Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
arXiv Detail & Related papers (2021-11-27T07:36:32Z) - Reappraising Domain Generalization in Neural Networks [8.06370138649329]
Domain generalization (DG) of machine learning algorithms is defined as their ability to learn a domain agnostic hypothesis from multiple training distributions.
We find that a straightforward Empirical Risk Minimization (ERM) baseline consistently outperforms existing DG methods.
We propose a classwise-DG formulation, where for each class, we randomly select one of the domains and keep it aside for testing.
arXiv Detail & Related papers (2021-10-15T10:06:40Z) - Learning to Generate Novel Domains for Domain Generalization [115.21519842245752]
This paper focuses on the task of learning from multiple source domains a model that generalizes well to unseen domains.
We employ a data generator to synthesize data from pseudo-novel domains to augment the source domains.
Our method, L2A-OT, outperforms current state-of-the-art DG methods on four benchmark datasets.
arXiv Detail & Related papers (2020-07-07T09:34:17Z) - When Does Preconditioning Help or Hurt Generalization? [74.25170084614098]
We show how the textitimplicit bias of first and second order methods affects the comparison of generalization properties.
We discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD.
arXiv Detail & Related papers (2020-06-18T17:57:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.