Diverse Weight Averaging for Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2205.09739v1
- Date: Thu, 19 May 2022 17:44:22 GMT
- Title: Diverse Weight Averaging for Out-of-Distribution Generalization
- Authors: Alexandre Rame, Matthieu Kirchmeyer, Thibaud Rahier, Alain
Rakotomamonjy, Patrick Gallinari, Matthieu Cord
- Abstract summary: We propose Diverse Weight Averaging (DiWA) to average weights obtained from several independent training runs rather than from a single run.
DiWA consistently improves the state of the art on the competitive DomainBed benchmark without inference overhead.
- Score: 100.22155775568761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard neural networks struggle to generalize under distribution shifts.
For out-of-distribution generalization in computer vision, the best current
approach averages the weights along a training run. In this paper, we propose
Diverse Weight Averaging (DiWA) that makes a simple change to this strategy:
DiWA averages the weights obtained from several independent training runs
rather than from a single run. Perhaps surprisingly, averaging these weights
performs well under soft constraints despite the network's nonlinearities. The
main motivation behind DiWA is to increase the functional diversity across
averaged models. Indeed, models obtained from different runs are more diverse
than those collected along a single run thanks to differences in
hyperparameters and training procedures. We motivate the need for diversity by
a new bias-variance-covariance-locality decomposition of the expected error,
exploiting similarities between DiWA and standard functional ensembling.
Moreover, this decomposition highlights that DiWA succeeds when the variance
term dominates, which we show happens when the marginal distribution changes at
test time. Experimentally, DiWA consistently improves the state of the art on
the competitive DomainBed benchmark without inference overhead.
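To make the core operation concrete, below is a minimal sketch (PyTorch-style Python, not the authors' code) of averaging the weights of several independently fine-tuned models that share one architecture and a common pre-trained initialization; the helper name `average_weights` and the handling of non-float buffers are illustrative assumptions.

```python
import copy
import torch


def average_weights(models):
    """Average the parameters of models that share one architecture.

    Minimal sketch of diverse weight averaging: each model is assumed to come
    from an independent fine-tuning run started from the same pre-trained
    initialization, so the corresponding weights can be averaged directly.
    """
    avg_model = copy.deepcopy(models[0])
    avg_state = avg_model.state_dict()
    for key, value in avg_state.items():
        if value.dtype.is_floating_point:
            avg_state[key] = torch.stack(
                [m.state_dict()[key] for m in models], dim=0
            ).mean(dim=0)
        # Non-float buffers (e.g. BatchNorm's num_batches_tracked) are kept
        # from the first model.
    avg_model.load_state_dict(avg_state)
    return avg_model
```

The averaged model is then evaluated like any single network, which is why the approach adds no inference overhead, unlike a functional ensemble that must run every member.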
Related papers
- DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World [6.816521410643928]
This paper proposes a new general method, named Diversity Adaptive Test-Time Adaptation (DATTA), aimed at improving Quality of Experience (QoE).
It features three key components: Diversity Discrimination (DD) to assess batch diversity, Diversity Adaptive Batch Normalization (DABN) to tailor normalization methods based on DD insights, and Diversity Adaptive Fine-Tuning (DAFT) to selectively fine-tune the model.
Experimental results show that our method achieves up to a 21% increase in accuracy compared to state-of-the-art methodologies.
arXiv Detail & Related papers (2024-08-15T09:50:11Z)
- WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average [21.029085451757368]
Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model.
We introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy.
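The summary gives no implementation details; the following is a hypothetical sketch of the shuffle-then-average idea, where ensemble members occasionally exchange a random fraction of their coordinates so that their weights stay close enough to be averaged at the end. The function name and the `shuffle_prob` parameter are assumptions, not values from the paper.

```python
import random
import torch


@torch.no_grad()
def shuffle_parameters(models, shuffle_prob=0.01):
    """Hypothetical WASH-style shuffling step between ensemble members.

    For each coordinate selected with probability `shuffle_prob`, the values
    held by the different models are exchanged through a random permutation.
    Training then continues independently; a final weight average (as in the
    sketch above) produces the single deployed model.
    """
    for tensors in zip(*[list(m.parameters()) for m in models]):
        mask = torch.rand_like(tensors[0]) < shuffle_prob  # coordinates to swap
        perm = list(range(len(models)))
        random.shuffle(perm)
        originals = [t[mask].clone() for t in tensors]
        for dst, src in enumerate(perm):
            tensors[dst][mask] = originals[src]
```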
arXiv Detail & Related papers (2024-05-27T09:02:57Z)
- Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition [114.96385572118042]
We argue that the variation in test label distributions can be broken down hierarchically into global and local levels.
We propose a new MoE strategy, $\mathsf{DirMixE}$, which assigns experts to different Dirichlet meta-distributions of the label distribution.
We show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization.
arXiv Detail & Related papers (2024-05-13T14:24:56Z)
- IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks [52.61590955479261]
Iterative Model Weight Averaging (IMWA) is a technique for class-imbalanced learning tasks.
Compared to vanilla MWA, IMWA achieves higher performance improvements with the same computational cost.
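As a rough illustration only (the episode structure and names below are assumptions inferred from the title, not the authors' exact recipe): iterative model weight averaging alternates short parallel training phases with a weight-averaging step whose result initializes the next phase.

```python
import copy


def iterative_weight_averaging(model, train_briefly, num_episodes=3, num_members=4):
    """Hypothetical sketch of an iterative train-then-average loop.

    `train_briefly(model)` is an assumed user-supplied routine that trains a
    model copy in place for a fraction of the total budget; `average_weights`
    is the helper sketched earlier for DiWA.
    """
    for _ in range(num_episodes):
        members = []
        for _ in range(num_members):
            member = copy.deepcopy(model)
            train_briefly(member)         # e.g. different data order per member
            members.append(member)
        model = average_weights(members)  # the average seeds the next episode
    return model
```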
arXiv Detail & Related papers (2024-04-25T04:37:35Z)
- Hierarchical Weight Averaging for Deep Neural Networks [39.45493779043969]
Stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).
Weight averaging (WA) which averages the weights of multiple models has recently received much attention in the literature.
In this work, we first attempt to incorporate online and offline WA into a general training framework termed Hierarchical Weight Averaging (HWA).
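A minimal sketch of one way to combine online and offline averaging, mirroring the summary's description rather than the authors' exact procedure (the class name, `decay` value, and snapshot schedule are assumptions):

```python
import copy
import torch


class OnlineOfflineAverager:
    """Hypothetical combination of online and offline weight averaging.

    Online: an exponential moving average (EMA) of the weights is updated
    after every optimizer step. Offline: EMA snapshots saved at the end of
    each epoch are averaged once more after training, reusing the
    `average_weights` helper sketched earlier.
    """

    def __init__(self, model, decay=0.999):
        self.ema = copy.deepcopy(model)
        self.decay = decay
        self.snapshots = []

    @torch.no_grad()
    def update_online(self, model):       # call after each optimizer step
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

    def snapshot(self):                   # call once per epoch
        self.snapshots.append(copy.deepcopy(self.ema))

    def offline_average(self):            # call once after training
        return average_weights(self.snapshots)
```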
arXiv Detail & Related papers (2023-04-23T02:58:03Z)
- Regularising for invariance to data augmentation improves supervised learning [82.85692486314949]
We show that using multiple augmentations per input can improve generalisation.
We propose an explicit regulariser that encourages this invariance on the level of individual model predictions.
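One common way to write such a prediction-level regulariser, given here as a sketch under assumptions rather than the paper's exact objective: penalize the divergence of each augmented view's prediction from the mean prediction over views, on top of the usual classification loss. The trade-off weight `lam` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F


def invariance_regularized_loss(model, views, labels, lam=1.0):
    """Sketch of an augmentation-invariance penalty on model predictions.

    `views` is a list of differently augmented versions of the same batch.
    The loss is the mean cross-entropy over views plus the KL divergence of
    each view's predictive distribution from the mean prediction.
    """
    logits = [model(v) for v in views]
    ce = sum(F.cross_entropy(l, labels) for l in logits) / len(logits)

    log_probs = [F.log_softmax(l, dim=-1) for l in logits]
    mean_prob = torch.stack([lp.exp() for lp in log_probs]).mean(dim=0)
    invariance = sum(
        F.kl_div(lp, mean_prob, reduction="batchmean") for lp in log_probs
    ) / len(log_probs)
    return ce + lam * invariance
```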
arXiv Detail & Related papers (2022-03-07T11:25:45Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains where the problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
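For intuition, here is a single-machine sketch of the extra-gradient update that underlies the method; the decentralized and local-update aspects of the paper are omitted, and the bilinear toy problem and step size are illustrative assumptions.

```python
import numpy as np


def extragradient_step(z, operator, step_size=0.1):
    """One extra-gradient step for a variational inequality with operator F.

    An extrapolation point is computed with F(z), and the actual update uses
    F evaluated at that extrapolation point.
    """
    z_half = z - step_size * operator(z)       # extrapolation
    return z - step_size * operator(z_half)    # update


def bilinear_operator(z):
    # Operator (df/dx, -df/dy) of the saddle problem min_x max_y f(x, y) = x*y,
    # where plain gradient descent-ascent cycles but extra-gradient converges.
    return np.array([z[1], -z[0]])


z = np.array([1.0, 1.0])
for _ in range(500):
    z = extragradient_step(z, bilinear_operator)   # z approaches the solution (0, 0)
```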
arXiv Detail & Related papers (2021-06-15T17:45:51Z)