DART: Diversify-Aggregate-Repeat Training Improves Generalization of
Neural Networks
- URL: http://arxiv.org/abs/2302.14685v2
- Date: Sat, 10 Jun 2023 15:11:02 GMT
- Title: DART: Diversify-Aggregate-Repeat Training Improves Generalization of
Neural Networks
- Authors: Samyak Jain, Sravanti Addepalli, Pawan Sahu, Priyam Dey and R.
Venkatesh Babu
- Abstract summary: Generalization of neural networks is crucial for deploying them safely in the real world.
In this work, we first establish a surprisingly simple but strong benchmark for generalization which utilizes diverse augmentations within a training minibatch.
We then propose Diversify-Aggregate-Repeat Training (DART) strategy that first trains diverse models using different augmentations (or domains) to explore the loss basin.
We find that Repeating the step of aggregation throughout training improves the overall optimization trajectory and also ensures that the individual models have a sufficiently low loss barrier to obtain improved generalization on combining them.
- Score: 39.69378006723682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization of neural networks is crucial for deploying them safely in the
real world. Common training strategies to improve generalization involve the
use of data augmentations, ensembling and model averaging. In this work, we
first establish a surprisingly simple but strong benchmark for generalization
which utilizes diverse augmentations within a training minibatch, and show that
this can learn a more balanced distribution of features. Further, we propose
Diversify-Aggregate-Repeat Training (DART) strategy that first trains diverse
models using different augmentations (or domains) to explore the loss basin,
and further Aggregates their weights to combine their expertise and obtain
improved generalization. We find that Repeating the step of Aggregation
throughout training improves the overall optimization trajectory and also
ensures that the individual models have a sufficiently low loss barrier to
obtain improved generalization on combining them. We shed light on our approach
by casting it in the framework proposed by Shen et al. and theoretically show
that it indeed generalizes better. In addition to improvements in In-Domain
generalization, we demonstrate SOTA performance on the Domain Generalization
benchmarks in the popular DomainBed framework as well. Our method is generic
and can easily be integrated with several base training algorithms to achieve
performance gains.
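To make the Diversify-Aggregate-Repeat loop described above concrete, the following is a minimal sketch of the idea, not the authors' implementation: the backbone, augmentations, aggregation interval, and dummy data below are placeholder assumptions, and DART itself is agnostic to these choices.

```python
# Sketch of Diversify-Aggregate-Repeat training (hypothetical, simplified).
# Each branch model sees a different augmentation (Diversify); every
# `aggregate_every` steps the branch weights are averaged and broadcast
# back to all branches (Aggregate), and training continues (Repeat).
import copy
import torch
import torch.nn as nn

def make_model():
    # Stand-in backbone; DART does not depend on a particular architecture.
    return nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))

def aggregate_(models):
    """Average the weights of all branches and copy the mean back to each."""
    with torch.no_grad():
        avg = {k: torch.stack([m.state_dict()[k].float() for m in models]).mean(0)
               for k in models[0].state_dict()}
        for m in models:
            m.load_state_dict(avg)

# Diversify: one branch per augmentation (placeholder augmentations here).
augmentations = [
    lambda x: x,                                   # identity
    lambda x: x + 0.01 * torch.randn_like(x),      # noise
    lambda x: x.flip(-1),                          # horizontal flip
]
base = make_model()
models = [copy.deepcopy(base) for _ in augmentations]  # common starting point
optims = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
criterion = nn.CrossEntropyLoss()

aggregate_every = 100  # Repeat: interleave aggregation with training
for step in range(300):
    x = torch.randn(8, 3, 32, 32)            # dummy batch; replace with a real loader
    y = torch.randint(0, 10, (8,))
    for aug, model, opt in zip(augmentations, models, optims):
        loss = criterion(model(aug(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    if (step + 1) % aggregate_every == 0:
        aggregate_(models)  # keep branches within a shared low-loss basin
```

Averaging weights is only beneficial while the branches remain in a connected low-loss region, which is why the abstract emphasizes repeating the Aggregation step throughout training rather than averaging only once at the end.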
Related papers
- Improved Generalization Bounds for Communication Efficient Federated Learning [4.3707341422218215]
This paper focuses on reducing the communication cost of federated learning by exploring generalization bounds and representation learning.
We design a novel Federated Learning with Adaptive Local Steps (FedALS) algorithm based on our generalization bound and representation learning analysis.
arXiv Detail & Related papers (2024-04-17T21:17:48Z)
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
- Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation [62.738582127114704]
Adar is a framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) solvers.
arXiv Detail & Related papers (2023-10-22T03:15:36Z)
- NormAUG: Normalization-guided Augmentation for Domain Generalization [60.159546669021346]
We propose a simple yet effective method called NormAUG (Normalization-guided Augmentation) for deep learning.
Our method introduces diverse information at the feature level and improves the generalization of the main path.
In the test stage, we leverage an ensemble strategy to combine the predictions from the auxiliary path of our model, further boosting performance.
arXiv Detail & Related papers (2023-07-25T13:35:45Z)
- Augmentation-based Domain Generalization for Semantic Segmentation [2.179313476241343]
Unsupervised Domain Adaptation (UDA) and domain generalization (DG) aim to tackle the lack of generalization of Deep Neural Networks (DNNs) towards unseen domains.
We study the in- and out-of-domain generalization capabilities of simple, rule-based image augmentations like blur, noise, color jitter and many more.
Our experiments confirm the common finding that combining multiple different augmentations outperforms single augmentations.
arXiv Detail & Related papers (2023-04-24T14:26:53Z)
- Semi-Supervised Domain Generalization with Stochastic StyleMatch [90.98288822165482]
In real-world applications, we might have only a few labels available from each source domain due to high annotation cost.
In this work, we investigate semi-supervised domain generalization, a more realistic and practical setting.
Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling.
arXiv Detail & Related papers (2021-06-01T16:00:08Z)
- Contrastive Syn-to-Real Generalization [125.54991489017854]
We make a key observation that the diversity of the learned feature embeddings plays an important role in the generalization performance.
We propose contrastive synthetic-to-real generalization (CSG), a novel framework that leverages the pre-trained ImageNet knowledge to prevent overfitting to the synthetic domain.
We demonstrate the effectiveness of CSG on various synthetic training tasks, exhibiting state-of-the-art performance on zero-shot domain generalization.
arXiv Detail & Related papers (2021-04-06T05:10:29Z)
- Rethinking Domain Generalization Baselines [21.841393368012977]
Deep learning models can be brittle when deployed in scenarios different from those on which they were trained.
Data augmentation strategies have been shown to be helpful tools for increasing data variability, supporting model robustness across domains.
This issue opens new scenarios for domain generalization research, highlighting the need for novel methods able to properly take advantage of the introduced data variability.
arXiv Detail & Related papers (2021-01-22T11:35:58Z)