Diverse Weight Averaging for Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2205.09739v1
- Date: Thu, 19 May 2022 17:44:22 GMT
- Title: Diverse Weight Averaging for Out-of-Distribution Generalization
- Authors: Alexandre Rame, Matthieu Kirchmeyer, Thibaud Rahier, Alain
Rakotomamonjy, Patrick Gallinari, Matthieu Cord
- Abstract summary: We propose Diverse Weight Averaging (DiWA) to average weights obtained from several independent training runs rather than from a single run.
DiWA consistently improves the state of the art on the competitive DomainBed benchmark without inference overhead.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard neural networks struggle to generalize under distribution shifts.
For out-of-distribution generalization in computer vision, the best current
approach averages the weights along a training run. In this paper, we propose
Diverse Weight Averaging (DiWA) that makes a simple change to this strategy:
DiWA averages the weights obtained from several independent training runs
rather than from a single run. Perhaps surprisingly, averaging these weights
performs well under soft constraints despite the network's nonlinearities. The
main motivation behind DiWA is to increase the functional diversity across
averaged models. Indeed, models obtained from different runs are more diverse
than those collected along a single run thanks to differences in
hyperparameters and training procedures. We motivate the need for diversity by
a new bias-variance-covariance-locality decomposition of the expected error,
exploiting similarities between DiWA and standard functional ensembling.
Moreover, this decomposition highlights that DiWA succeeds when the variance
term dominates, which we show happens when the marginal distribution changes at
test time. Experimentally, DiWA consistently improves the state of the art on
the competitive DomainBed benchmark without inference overhead.
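The core operation described in the abstract, averaging the weights of several independently trained networks that share an architecture, can be sketched as below. This is an illustrative minimal sketch, not the paper's implementation; it assumes all runs start from a shared initialization so their weights can be meaningfully averaged, and the model and helper names are hypothetical.

```python
# Minimal sketch of DiWA-style weight averaging: uniformly average the
# parameters of several independently trained models (same architecture).
# Assumes a shared initialization; names here are illustrative only.
import torch
import torch.nn as nn

def average_weights(state_dicts):
    """Uniformly average a list of state dicts with identical keys and shapes."""
    avg = {}
    for key in state_dicts[0]:
        # Stack the corresponding tensor from each run and take the mean.
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

# Toy example: three "independent runs" of the same small network.
torch.manual_seed(0)
models = [nn.Linear(4, 2) for _ in range(3)]
avg_state = average_weights([m.state_dict() for m in models])

# A single merged model carries the averaged weights, so inference cost
# is that of one network (no ensembling overhead at test time).
merged = nn.Linear(4, 2)
merged.load_state_dict(avg_state)
```

In the paper's setting the runs differ in hyperparameters and training procedure to increase functional diversity, but the averaging step itself is this simple uniform mean over parameters.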
Related papers
- WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model.
We introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy.
arXiv Detail & Related papers (2024-05-27T09:02:57Z)
- IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks
Iterative Model Weight Averaging (IMWA) is a technique for class-imbalanced learning tasks.
Compared to vanilla MWA, IMWA achieves higher performance improvements with the same computational cost.
arXiv Detail & Related papers (2024-04-25T04:37:35Z)
- Mitigating Biases with Diverse Ensembles and Diffusion Models
We propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs).
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z)
- Hierarchical Weight Averaging for Deep Neural Networks
Stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).
Weight averaging (WA) which averages the weights of multiple models has recently received much attention in the literature.
In this work, we first attempt to incorporate online and offline WA into a general training framework termed Hierarchical Weight Averaging (HWA).
arXiv Detail & Related papers (2023-04-23T02:58:03Z)
- Regularising for invariance to data augmentation improves supervised learning
We show that using multiple augmentations per input can improve generalisation.
We propose an explicit regulariser that encourages this invariance on the level of individual model predictions.
arXiv Detail & Related papers (2022-03-07T11:25:45Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
We consider distributed variational inequalities (VIs) on domains with the problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers fully decentralized settings.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.