ERM++: An Improved Baseline for Domain Generalization
- URL: http://arxiv.org/abs/2304.01973v4
- Date: Mon, 09 Dec 2024 19:26:08 GMT
- Title: ERM++: An Improved Baseline for Domain Generalization
- Authors: Piotr Teterwak, Kuniaki Saito, Theodoros Tsiligkaridis, Kate Saenko, Bryan A. Plummer
- Abstract summary: Empirical Risk Minimization (ERM) can outperform most of the more complex Domain Generalization (DG) methods when properly tuned.
ERM++ improves DG performance by over 5% compared to prior ERM baselines.
- Score: 69.80606575323691
- Abstract: Domain Generalization (DG) aims to develop classifiers that can generalize to new, unseen data distributions, a critical capability when collecting new domain-specific data is impractical. A common DG baseline minimizes the empirical risk on the source domains. Recent studies have shown that this approach, known as Empirical Risk Minimization (ERM), can outperform most of the more complex DG methods when properly tuned. However, these studies have primarily focused on a narrow set of hyperparameters, neglecting other factors that can enhance robustness and prevent overfitting and catastrophic forgetting, properties which are critical for strong DG performance. In our investigation of training data utilization (i.e., training duration and validation splits), initialization, and additional regularizers, we find that tuning these previously overlooked factors significantly improves model generalization across diverse datasets without adding much complexity. We call this improved, yet simple baseline ERM++. Despite its ease of implementation, ERM++ improves DG performance by over 5% compared to prior ERM baselines on a standard benchmark of 5 datasets with a ResNet-50 and over 15% with a ViT-B/16. It also outperforms all state-of-the-art methods on DomainBed datasets with both architectures. Importantly, ERM++ is easy to integrate into existing frameworks like DomainBed, making it a practical and powerful tool for researchers and practitioners. Overall, ERM++ challenges the need for more complex DG methods by providing a stronger, more reliable baseline that maintains simplicity and ease of use. Code is available at https://github.com/piotr-teterwak/erm_plusplus
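For concreteness, below is a minimal sketch of the ERM baseline that ERM++ builds on: all source domains are pooled and a pretrained backbone is fine-tuned with standard cross-entropy, i.e., the empirical risk is minimized over the combined source data. The dataset paths, backbone choice, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal ERM baseline for multi-source Domain Generalization (illustrative sketch).
# Source domains are pooled into one training set; a pretrained backbone is
# fine-tuned with cross-entropy. Paths, model, and hyperparameters are assumptions.
import torch
from torch import nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical layout: one ImageFolder per source domain (e.g., PACS-style splits).
source_dirs = ["data/art_painting", "data/cartoon", "data/photo"]
train_set = ConcatDataset([datasets.ImageFolder(d, transform) for d in source_dirs])
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)  # pretrained initialization
model.fc = nn.Linear(model.fc.in_features, 7)  # e.g., 7 classes as in PACS
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):  # training duration is one of the factors ERM++ tunes
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # empirical risk over pooled source data
        loss.backward()
        optimizer.step()
```

ERM++ keeps this basic recipe and loss; according to the abstract, its gains come from tuning training duration and validation splits, initialization, and additional regularizers rather than from changing the objective.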
Related papers
- Is Large-Scale Pretraining the Secret to Good Domain Generalization? [69.80606575323691]
Multi-Source Domain Generalization (DG) is the task of training on multiple source domains and achieving high classification performance on unseen target domains.
Recent methods combine robust features from web-scale pretrained backbones with new features learned from source data, and this has dramatically improved benchmark results.
We show that all evaluated DG methods struggle on DomainBed-OOP, while recent methods excel on DomainBed-IP.
arXiv Detail & Related papers (2024-12-03T21:43:11Z)
- Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization [1.1534313664323637]
Domain shift is a formidable issue in Machine Learning that causes a model to suffer from performance degradation when tested on unseen domains.
Federated Domain Generalization (FedDG) attempts to train a global model collaboratively across clients in a privacy-preserving manner, so that it generalizes well to unseen clients that may exhibit domain shift.
Here, we introduce a novel architectural method for FedDG, namely gPerXAN, which relies on a normalization scheme working with a guiding regularizer.
arXiv Detail & Related papers (2024-03-22T20:22:08Z)
- Cross Domain Generative Augmentation: Domain Generalization with Latent Diffusion Models [11.309433257851122]
Cross Domain Generative Augmentation (CDGA) generates synthetic images to fill the gap between all domains.
We show that CDGA outperforms SOTA DG methods under the Domainbed benchmark.
arXiv Detail & Related papers (2023-12-08T21:52:00Z)
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
- On-Device Domain Generalization [93.79736882489982]
Domain generalization is critical to on-device machine learning applications.
We find that knowledge distillation is a strong candidate for solving the problem.
We propose a simple idea called out-of-distribution knowledge distillation (OKD), which aims to teach the student how the teacher handles (synthetic) out-of-distribution data.
arXiv Detail & Related papers (2022-09-15T17:59:31Z)
- Back-to-Bones: Rediscovering the Role of Backbones in Domain Generalization [1.6799377888527687]
Domain Generalization studies the capability of a deep learning model to generalize to out-of-training distributions.
Recent research has provided a reproducible benchmark for DG, pointing out the effectiveness of naive empirical risk minimization (ERM) over existing algorithms.
In this paper, we evaluate the backbones themselves, presenting a comprehensive analysis of their intrinsic generalization capabilities.
arXiv Detail & Related papers (2022-09-02T15:30:17Z)
- On Certifying and Improving Generalization to Unseen Domains [87.00662852876177]
Domain Generalization aims to learn models whose performance remains high on unseen domains encountered at test-time.
It is challenging to evaluate DG algorithms comprehensively using a few benchmark datasets.
We propose a universal certification framework that can efficiently certify the worst-case performance of any DG method.
arXiv Detail & Related papers (2022-06-24T16:29:43Z)
- Improving Multi-Domain Generalization through Domain Re-labeling [31.636953426159224]
We study the important link between pre-specified domain labels and generalization performance.
We introduce a general approach for multi-domain generalization, MulDEns, that uses an ERM-based deep ensembling backbone.
We show that MulDEns does not require tailoring the augmentation strategy or the training process specific to a dataset.
arXiv Detail & Related papers (2021-12-17T23:21:50Z)
- META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification [68.39849081353704]
Domain generalizable (DG) person re-identification (ReID) aims to test across unseen domains without access to the target domain data at training time.
This paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID.
arXiv Detail & Related papers (2021-12-16T08:06:50Z)
- Reappraising Domain Generalization in Neural Networks [8.06370138649329]
Domain generalization (DG) of machine learning algorithms is defined as their ability to learn a domain agnostic hypothesis from multiple training distributions.
We find that a straightforward Empirical Risk Minimization (ERM) baseline consistently outperforms existing DG methods.
We propose a classwise-DG formulation, where for each class, we randomly select one of the domains and keep it aside for testing.
arXiv Detail & Related papers (2021-10-15T10:06:40Z)
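As an illustration of the classwise-DG formulation summarized in the last entry ("Reappraising Domain Generalization in Neural Networks"), the sketch below builds such a split: for every class, one domain is drawn at random and all of that class's samples from that domain are held out for testing, while everything else remains for training. The nested-dict data layout and variable names are assumptions for illustration, not the authors' code.

```python
# Illustrative classwise-DG split (assumed data layout: {domain: {class: [samples]}}).
# For each class, one randomly chosen domain is held out for testing; every other
# (domain, class) pair stays in the training set.
import random

def classwise_dg_split(data_by_domain, seed=0):
    rng = random.Random(seed)
    domains = list(data_by_domain.keys())
    classes = {c for per_class in data_by_domain.values() for c in per_class}

    held_out = {c: rng.choice(domains) for c in sorted(classes)}  # class -> held-out domain
    train, test = [], []
    for domain, per_class in data_by_domain.items():
        for cls, samples in per_class.items():
            bucket = test if held_out[cls] == domain else train
            bucket.extend((x, cls) for x in samples)
    return train, test, held_out

# Toy example with two classes spread over three domains.
toy = {
    "photo":   {"dog": ["p_d1", "p_d2"], "cat": ["p_c1"]},
    "sketch":  {"dog": ["s_d1"],          "cat": ["s_c1", "s_c2"]},
    "cartoon": {"dog": ["c_d1"],          "cat": ["c_c1"]},
}
train, test, held_out = classwise_dg_split(toy, seed=42)
print(held_out)  # e.g., {'cat': 'photo', 'dog': 'sketch'}
```

Under this split, a model never sees the held-out (domain, class) combinations during training, which is what makes the evaluation stricter than the standard leave-one-domain-out protocol.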