Related papers: Model Merging by Uncertainty-Based Gradient Matching

Model Merging by Uncertainty-Based Gradient Matching

URL: http://arxiv.org/abs/2310.12808v2
Date: Fri, 23 Aug 2024 16:25:44 GMT
Title: Model Merging by Uncertainty-Based Gradient Matching
Authors: Nico Daheim, Thomas Möllenhoff, Edoardo Maria Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan,
Abstract summary: We propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. Our new method gives consistent improvements for large language models and vision transformers.
Score: 70.54580972266096
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters. Code available here.

Related papers

Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically involve scaling the parameters model-wise or integrating parameter importance parameter-wise. We unify these strategies into a more general merging framework, and introduce Dynamic Fisher-weighted Merging (DF-Merge) We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
FineGates: LLMs Finetuning with Compression using Stochastic Gates [7.093692674858257]
Large Language Models (LLMs) present significant challenges for full finetuning due to the high computational demands. Lightweight finetuning techniques have been proposed, like learning low-rank adapter layers. We propose an adaptor model based on gates that simultaneously sparsify the frozen base model with task-specific adaptation.
arXiv Detail & Related papers (2024-12-17T14:33:05Z)
Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance. symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging) It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance. Current methods with a fixed model do not work uniformly well across various datasets. This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
Faithful Heteroscedastic Regression with Neural Networks [2.2835610890984164]
Parametric methods that employ neural networks for parameter maps can capture complex relationships in the data. We make two simple modifications to optimization to produce a heteroscedastic model with mean estimates that are provably as accurate as those from its homoscedastic counterpart. Our approach provably retains the accuracy of an equally flexible mean-only model while also offering best-in-class variance calibration.
arXiv Detail & Related papers (2022-12-18T22:34:42Z)
Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one. Our approach effectively involves computing a weighted average of the models' parameters. We show that our merging procedure makes it possible to combine models in previously unexplored ways.
arXiv Detail & Related papers (2021-11-18T17:59:35Z)
Closer Look at the Uncertainty Estimation in Semantic Segmentation under Distributional Shift [2.05617385614792]
Uncertainty estimation for the task of semantic segmentation is evaluated under a varying level of domain shift. It was shown that simple color transformations already provide a strong baseline. ensemble of models was utilized in the self-training setting to improve the pseudo-labels generation.
arXiv Detail & Related papers (2021-05-31T19:50:43Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters. LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.