Model Merging by Uncertainty-Based Gradient Matching
- URL: http://arxiv.org/abs/2310.12808v2
- Date: Fri, 23 Aug 2024 16:25:44 GMT
- Title: Model Merging by Uncertainty-Based Gradient Matching
- Authors: Nico Daheim, Thomas Möllenhoff, Edoardo Maria Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan,
- Abstract summary: We propose a new uncertainty-based scheme to improve the performance by reducing the mismatch.
Our new method gives consistent improvements for large language models and vision transformers.
- Score: 70.54580972266096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters. Code available here.
Related papers
- Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance.
symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted.
Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo
Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z) - Faithful Heteroscedastic Regression with Neural Networks [2.2835610890984164]
Parametric methods that employ neural networks for parameter maps can capture complex relationships in the data.
We make two simple modifications to optimization to produce a heteroscedastic model with mean estimates that are provably as accurate as those from its homoscedastic counterpart.
Our approach provably retains the accuracy of an equally flexible mean-only model while also offering best-in-class variance calibration.
arXiv Detail & Related papers (2022-12-18T22:34:42Z) - Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters.
We show that our merging procedure makes it possible to combine models in previously unexplored ways.
arXiv Detail & Related papers (2021-11-18T17:59:35Z) - Closer Look at the Uncertainty Estimation in Semantic Segmentation under
Distributional Shift [2.05617385614792]
Uncertainty estimation for the task of semantic segmentation is evaluated under a varying level of domain shift.
It was shown that simple color transformations already provide a strong baseline.
ensemble of models was utilized in the self-training setting to improve the pseudo-labels generation.
arXiv Detail & Related papers (2021-05-31T19:50:43Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z) - Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.