Diverse Weight Averaging for Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2205.09739v1
- Date: Thu, 19 May 2022 17:44:22 GMT
- Title: Diverse Weight Averaging for Out-of-Distribution Generalization
- Authors: Alexandre Rame, Matthieu Kirchmeyer, Thibaud Rahier, Alain
Rakotomamonjy, Patrick Gallinari, Matthieu Cord
- Abstract summary: We propose Diverse Weight Averaging (DiWA) to average weights obtained from several independent training runs rather than from a single run.
DiWA consistently improves the state of the art on the competitive DomainBed benchmark without inference overhead.
- Score: 100.22155775568761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard neural networks struggle to generalize under distribution shifts.
For out-of-distribution generalization in computer vision, the best current
approach averages the weights along a training run. In this paper, we propose
Diverse Weight Averaging (DiWA) that makes a simple change to this strategy:
DiWA averages the weights obtained from several independent training runs
rather than from a single run. Perhaps surprisingly, averaging these weights
performs well under soft constraints despite the network's nonlinearities. The
main motivation behind DiWA is to increase the functional diversity across
averaged models. Indeed, models obtained from different runs are more diverse
than those collected along a single run thanks to differences in
hyperparameters and training procedures. We motivate the need for diversity by
a new bias-variance-covariance-locality decomposition of the expected error,
exploiting similarities between DiWA and standard functional ensembling.
Moreover, this decomposition highlights that DiWA succeeds when the variance
term dominates, which we show happens when the marginal distribution changes at
test time. Experimentally, DiWA consistently improves the state of the art on
the competitive DomainBed benchmark without inference overhead.
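To make the core operation concrete, below is a minimal sketch (PyTorch-style Python, not the authors' code) of averaging the weights of several independently fine-tuned models that share one architecture and a common pre-trained initialization; the helper name `average_weights` and the handling of non-float buffers are illustrative assumptions.

```python
import copy
import torch


def average_weights(models):
    """Average the parameters of models that share one architecture.

    Minimal sketch of diverse weight averaging: each model is assumed to come
    from an independent fine-tuning run started from the same pre-trained
    initialization, so the corresponding weights can be averaged directly.
    """
    avg_model = copy.deepcopy(models[0])
    avg_state = avg_model.state_dict()
    for key, value in avg_state.items():
        if value.dtype.is_floating_point:
            avg_state[key] = torch.stack(
                [m.state_dict()[key] for m in models], dim=0
            ).mean(dim=0)
        # Non-float buffers (e.g. BatchNorm's num_batches_tracked) are kept
        # from the first model.
    avg_model.load_state_dict(avg_state)
    return avg_model
```

The averaged model is then evaluated like any single network, which is why the approach adds no inference overhead, unlike a functional ensemble that must run every member.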
Related papers
- DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World [6.816521410643928]
This paper proposes a new general method, named Diversity Adaptive Test-Time Adaptation (DATTA), aimed at improving Quality of Experience (QoE).
It features three key components: Diversity Discrimination (DD) to assess batch diversity, Diversity Adaptive Batch Normalization (DABN) to tailor normalization methods based on DD insights, and Diversity Adaptive Fine-Tuning (DAFT) to selectively fine-tune the model.
Experimental results show that our method achieves up to a 21% increase in accuracy compared to state-of-the-art methodologies.
arXiv Detail & Related papers (2024-08-15T09:50:11Z)
- WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average [21.029085451757368]
Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model.
We introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy.
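The summary gives no implementation details; the following is a hypothetical sketch of the shuffle-then-average idea, where ensemble members occasionally exchange a random fraction of their coordinates so that their weights stay close enough to be averaged at the end. The function name and the `shuffle_prob` parameter are assumptions, not values from the paper.

```python
import random
import torch


@torch.no_grad()
def shuffle_parameters(models, shuffle_prob=0.01):
    """Hypothetical WASH-style shuffling step between ensemble members.

    For each coordinate selected with probability `shuffle_prob`, the values
    held by the different models are exchanged through a random permutation.
    Training then continues independently; a final weight average (as in the
    sketch above) produces the single deployed model.
    """
    for tensors in zip(*[list(m.parameters()) for m in models]):
        mask = torch.rand_like(tensors[0]) < shuffle_prob  # coordinates to swap
        perm = list(range(len(models)))
        random.shuffle(perm)
        originals = [t[mask].clone() for t in tensors]
        for dst, src in enumerate(perm):
            tensors[dst][mask] = originals[src]
```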
arXiv Detail & Related papers (2024-05-27T09:02:57Z)
- Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition [114.96385572118042]
We argue that the variation in test label distributions can be broken down hierarchically into global and local levels.
We propose a new MoE strategy, $\mathsf{DirMixE}$, which assigns experts to different Dirichlet meta-distributions of the label distribution.
We show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization.
arXiv Detail & Related papers (2024-05-13T14:24:56Z)
- IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks [52.61590955479261]
Iterative Model Weight Averaging (IMWA) is a technique for class-imbalanced learning tasks.
Compared to vanilla MWA, IMWA achieves higher performance improvements with the same computational cost.
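As a rough illustration only (the episode structure and names below are assumptions inferred from the title, not the authors' exact recipe): iterative model weight averaging alternates short parallel training phases with a weight-averaging step whose result initializes the next phase.

```python
import copy


def iterative_weight_averaging(model, train_briefly, num_episodes=3, num_members=4):
    """Hypothetical sketch of an iterative train-then-average loop.

    `train_briefly(model)` is an assumed user-supplied routine that trains a
    model copy in place for a fraction of the total budget; `average_weights`
    is the helper sketched earlier for DiWA.
    """
    for _ in range(num_episodes):
        members = []
        for _ in range(num_members):
            member = copy.deepcopy(model)
            train_briefly(member)         # e.g. different data order per member
            members.append(member)
        model = average_weights(members)  # the average seeds the next episode
    return model
```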
arXiv Detail & Related papers (2024-04-25T04:37:35Z)
- Hierarchical Weight Averaging for Deep Neural Networks [39.45493779043969]
Stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).
Weight averaging (WA) which averages the weights of multiple models has recently received much attention in the literature.
In this work, we first attempt to incorporate online and offline WA into a general training framework termed Hierarchical Weight Averaging (HWA).
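A minimal sketch of one way to combine online and offline averaging, mirroring the summary's description rather than the authors' exact procedure (the class name, `decay` value, and snapshot schedule are assumptions):

```python
import copy
import torch


class OnlineOfflineAverager:
    """Hypothetical combination of online and offline weight averaging.

    Online: an exponential moving average (EMA) of the weights is updated
    after every optimizer step. Offline: EMA snapshots saved at the end of
    each epoch are averaged once more after training, reusing the
    `average_weights` helper sketched earlier.
    """

    def __init__(self, model, decay=0.999):
        self.ema = copy.deepcopy(model)
        self.decay = decay
        self.snapshots = []

    @torch.no_grad()
    def update_online(self, model):       # call after each optimizer step
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

    def snapshot(self):                   # call once per epoch
        self.snapshots.append(copy.deepcopy(self.ema))

    def offline_average(self):            # call once after training
        return average_weights(self.snapshots)
```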
arXiv Detail & Related papers (2023-04-23T02:58:03Z)
- Regularising for invariance to data augmentation improves supervised learning [82.85692486314949]
We show that using multiple augmentations per input can improve generalisation.
We propose an explicit regulariser that encourages this invariance on the level of individual model predictions.
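One common way to write such a prediction-level regulariser, given here as a sketch under assumptions rather than the paper's exact objective: penalize the divergence of each augmented view's prediction from the mean prediction over views, on top of the usual classification loss. The trade-off weight `lam` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F


def invariance_regularized_loss(model, views, labels, lam=1.0):
    """Sketch of an augmentation-invariance penalty on model predictions.

    `views` is a list of differently augmented versions of the same batch.
    The loss is the mean cross-entropy over views plus the KL divergence of
    each view's predictive distribution from the mean prediction.
    """
    logits = [model(v) for v in views]
    ce = sum(F.cross_entropy(l, labels) for l in logits) / len(logits)

    log_probs = [F.log_softmax(l, dim=-1) for l in logits]
    mean_prob = torch.stack([lp.exp() for lp in log_probs]).mean(dim=0)
    invariance = sum(
        F.kl_div(lp, mean_prob, reduction="batchmean") for lp in log_probs
    ) / len(log_probs)
    return ce + lam * invariance
```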
arXiv Detail & Related papers (2022-03-07T11:25:45Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains where the problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
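For intuition, here is a single-machine sketch of the extra-gradient update that underlies the method; the decentralized and local-update aspects of the paper are omitted, and the bilinear toy problem and step size are illustrative assumptions.

```python
import numpy as np


def extragradient_step(z, operator, step_size=0.1):
    """One extra-gradient step for a variational inequality with operator F.

    An extrapolation point is computed with F(z), and the actual update uses
    F evaluated at that extrapolation point.
    """
    z_half = z - step_size * operator(z)       # extrapolation
    return z - step_size * operator(z_half)    # update


def bilinear_operator(z):
    # Operator (df/dx, -df/dy) of the saddle problem min_x max_y f(x, y) = x*y,
    # where plain gradient descent-ascent cycles but extra-gradient converges.
    return np.array([z[1], -z[0]])


z = np.array([1.0, 1.0])
for _ in range(500):
    z = extragradient_step(z, bilinear_operator)   # z approaches the solution (0, 0)
```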
arXiv Detail & Related papers (2021-06-15T17:45:51Z)