Superpose Singular Features for Model Merging
- URL: http://arxiv.org/abs/2502.10698v1
- Date: Sat, 15 Feb 2025 07:05:55 GMT
- Title: Superpose Singular Features for Model Merging
- Authors: Haiquan Qiu, You Wu, Quanming Yao
- Abstract summary: Superpose Features from Task Matrix (SFTM) is a novel approach that superposes features from individual task models into a merged model. Our method consistently outperforms existing methods, achieving superior performance and enhanced out-of-distribution generalization.
- Score: 29.728307343119894
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Model merging is a critical technique for combining the capabilities of multiple fine-tuned models without requiring additional training. While existing methods treat parameters as vectors, they overlook the intrinsic structure of linear transformation matrices - the core components that comprise the majority of model parameters. These matrices are fundamental to neural networks, mapping input representations to output features through linear combinations. Motivated by the linear representation hypothesis, we introduce the task matrix and propose Superpose Features from Task Matrix (SFTM), a novel approach that superposes features from individual task models into a merged model. SFTM employs singular value decomposition to identify feature bases of linear transformation matrices and solves a linear system to optimally combine them while preserving the input-output mappings of the individual task models. Extensive experiments on vision transformers and language models demonstrate that our method consistently outperforms existing methods, achieving superior performance and enhanced out-of-distribution generalization.
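The abstract names the two mechanical steps (SVD to extract feature bases, a linear system to combine them) but gives no pseudocode. Below is a minimal NumPy sketch of one plausible reading, where the task matrix is taken per layer as the fine-tuned weight minus the pretrained weight and `rank` is an assumed hyperparameter; it illustrates the general technique, not the authors' exact algorithm.

```python
import numpy as np

def merge_task_matrices(w_base, task_weights, rank=8):
    """Hedged sketch: superpose top singular features of task matrices.

    w_base: (d_out, d_in) pretrained weight; task_weights: list of
    fine-tuned weights of the same shape. `rank` (our choice, not from
    the paper) controls how many singular directions are kept per task.
    """
    inputs, outputs = [], []
    for w_task in task_weights:
        delta = w_task - w_base                  # task matrix
        u, s, vt = np.linalg.svd(delta, full_matrices=False)
        v = vt[:rank].T                          # top input feature basis
        inputs.append(v)                         # directions to preserve
        outputs.append(delta @ v)                # their mapped outputs
    a = np.concatenate(inputs, axis=1)           # (d_in,  rank * n_tasks)
    b = np.concatenate(outputs, axis=1)          # (d_out, rank * n_tasks)
    # Least-squares merge: find M with M @ a ~= b, i.e. the merged task
    # matrix reproduces each task's input-output mapping on its basis.
    m, *_ = np.linalg.lstsq(a.T, b.T, rcond=None)
    return w_base + m.T
```

The least-squares step is the "solves a linear system" part: the merged matrix is asked to reproduce each task's outputs on that task's top singular input directions.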
Related papers
- Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs [51.09983600916971]
Recent research indicates that models demonstrating linearity enhance the performance of task arithmetic.
We argue that this linearity already exists within the model's submodules.
We propose an innovative model merging strategy that independently merges these submodules.
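Task arithmetic itself has a one-line form: merged weights are the base weights plus a weighted sum of task vectors (fine-tuned minus base). A hedged sketch of the submodule-wise variant this entry describes, with the per-submodule coefficients `alphas` as an assumed interface rather than the paper's API:

```python
import numpy as np

def merge_submodules(base, task_models, alphas):
    """Submodule-wise task arithmetic (sketch, not the paper's exact code).

    base / task_models: dicts mapping submodule name -> weight array.
    alphas: dict mapping submodule name -> list of per-task coefficients.
    """
    merged = {}
    for name, w0 in base.items():
        task_vectors = [m[name] - w0 for m in task_models]  # task vectors
        coeffs = alphas[name]                               # one per task
        merged[name] = w0 + sum(a * tv for a, tv in zip(coeffs, task_vectors))
    return merged
```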
arXiv Detail & Related papers (2025-04-15T06:23:24Z)
- AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization [86.8133939108057]
We propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs.
Our method tackles the challenges in three steps: mapping, merging and searching.
As the first model merging method capable of merging heterogeneous MLLMs without labeled data, AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
arXiv Detail & Related papers (2025-03-31T05:13:02Z)
- Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation [53.88562288388169]
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks.
We propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix.
SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix.
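A quick NumPy check of that decomposition (standard linear algebra, independent of the paper's Householder-based adapters):

```python
import numpy as np

a = np.array([[3.0, 0.0], [4.0, 5.0]])      # any real matrix
u, s, vt = np.linalg.svd(a)                  # A = U @ diag(S) @ Vt
assert np.allclose(u @ np.diag(s) @ vt, a)   # reconstruction check
assert np.allclose(u.T @ u, np.eye(2))       # U orthogonal (unitary for real A)
assert np.allclose(vt @ vt.T, np.eye(2))     # V orthogonal (unitary for real A)
```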
arXiv Detail & Related papers (2024-10-30T12:08:30Z)
- Optimal Matrix-Mimetic Tensor Algebras via Variable Projection [0.0]
Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogously to matrices.
We learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data.
We provide an original theory of the uniqueness of the transformation and a convergence analysis of our variable-projection-based algorithm.
arXiv Detail & Related papers (2024-06-11T04:52:23Z)
- Input Guided Multiple Deconstruction Single Reconstruction neural network models for Matrix Factorization [0.0]
This paper develops two models based on the concept of Non-negative Matrix Factorization (NMF).
They aim to deal with high-dimensional data by discovering a low-rank approximation determined by a unique pair of factor matrices.
The superiority of the low-dimensional embedding over the original data, which justifies the need for dimension reduction, has been established.
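For reference, plain NMF approximates a non-negative matrix X by non-negative factors W and H with X ≈ WH. A minimal sketch using the classic Lee-Seung multiplicative updates, which is standard NMF rather than the two models proposed in this paper:

```python
import numpy as np

def nmf(x, rank, n_iter=200, eps=1e-9):
    """Basic NMF via Lee-Seung multiplicative updates (illustrative only)."""
    rng = np.random.default_rng(0)
    n, m = x.shape
    w = rng.random((n, rank))
    h = rng.random((rank, m))
    for _ in range(n_iter):
        h *= (w.T @ x) / (w.T @ w @ h + eps)   # update H, stays non-negative
        w *= (x @ h.T) / (w @ h @ h.T + eps)   # update W, stays non-negative
    return w, h

x = np.abs(np.random.default_rng(1).random((20, 10)))
w, h = nmf(x, rank=3)
print(np.linalg.norm(x - w @ h))               # reconstruction error
```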
arXiv Detail & Related papers (2024-05-22T08:41:32Z)
- Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques.
Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters.
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
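Linearization here means replacing a module's forward pass with its first-order Taylor expansion around the pre-trained parameters, so that adding task vectors approximately adds the corresponding function changes. A hedged PyTorch sketch of that expansion (an illustration of linearization in general, not the paper's implementation):

```python
import torch
from torch.func import functional_call, jvp

def linearized_forward(module, params0, params, x):
    """First-order Taylor expansion of `module` around params0 (sketch):
    f(x; p) ~= f(x; p0) + J_p f(x)|_{p0} . (p - p0)

    params0 / params: dicts of parameter name -> tensor.
    """
    delta = {k: params[k] - params0[k] for k in params0}
    f = lambda p: functional_call(module, p, (x,))
    out0, delta_out = jvp(f, (params0,), (delta,))  # forward-mode JVP
    return out0 + delta_out
```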
arXiv Detail & Related papers (2023-10-07T08:55:54Z)
- Classification of BCI-EEG based on augmented covariance matrix [0.0]
We propose a new framework based on the augmented covariance extracted from an autoregressive model to improve motor imagery classification.
We will test our approach on several datasets and several subjects using the MOABB framework.
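One common reading of an augmented covariance from an autoregressive model is the covariance of a time-lag-embedded signal: stack delayed copies of the channels and take the covariance of the stacked signal. A minimal NumPy sketch under that reading, with `order` and `lag` as assumed hyperparameters rather than values from the paper:

```python
import numpy as np

def augmented_covariance(x, order=2, lag=1):
    """Covariance of a time-lag-embedded signal (sketch of the idea).

    x: (n_channels, n_samples) EEG epoch. `order` and `lag` are the
    embedding depth and delay, assumed hyperparameters here.
    """
    n_ch, n_t = x.shape
    usable = n_t - (order - 1) * lag
    # Stack delayed copies: shape (order * n_channels, usable)
    aug = np.concatenate([x[:, i * lag : i * lag + usable]
                          for i in range(order)], axis=0)
    return np.cov(aug)
```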
arXiv Detail & Related papers (2023-02-09T09:04:25Z)
- Feature Weighted Non-negative Matrix Factorization [92.45013716097753]
We propose Feature Weighted Non-negative Matrix Factorization (FNMF) in this paper.
FNMF learns the weights of features adaptively according to their importance.
It can be solved efficiently with the proposed optimization algorithm.
arXiv Detail & Related papers (2021-03-24T21:17:17Z)
- Multi-Objective Matrix Normalization for Fine-grained Visual Recognition [153.49014114484424]
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC).
Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features.
We propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can normalize a bilinear representation with respect to multiple objectives simultaneously.
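Matrix power (e.g., matrix square-root) normalization of a symmetric bilinear feature B = XᵀX can be computed from its eigendecomposition. A small NumPy illustration of that standard operation, not of MOMN itself:

```python
import numpy as np

def matrix_power_normalize(b, p=0.5, eps=1e-12):
    """Matrix power normalization of a symmetric PSD bilinear feature."""
    vals, vecs = np.linalg.eigh(b)               # B = V diag(vals) V^T
    vals = np.clip(vals, eps, None)              # guard tiny negative eigenvalues
    return (vecs * vals**p) @ vecs.T             # B^p = V diag(vals^p) V^T

x = np.random.default_rng(0).random((64, 8))     # 64 local descriptors, dim 8
b = x.T @ x / x.shape[0]                         # bilinear (second-order) feature
b_norm = matrix_power_normalize(b, p=0.5)        # stabilized representation
```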
arXiv Detail & Related papers (2020-03-30T08:40:35Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)