Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
- URL: http://arxiv.org/abs/2503.08099v1
- Date: Tue, 11 Mar 2025 07:01:35 GMT
- Title: Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
- Authors: Runxi Cheng, Feng Xiong, Yongxian Wei, Wanyun Zhu, Chun Yuan,
- Abstract summary: textbfWUDI-Merging (textbfWhoever started the interference shotextbfUld entextbfD textbfIt) is a model merging method that eliminates interference without any additional data or rescaling coefficients.<n> Comprehensive empirical evaluations across vision and language benchmarks demonstrate our method's superiority.
- Score: 27.848233831749216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging seeks to integrate task-specific expert models into a unified architecture while preserving multi-task generalization capabilities, yet parameter interference between constituent models frequently induces performance degradation. Although prior work has explored many merging strategies, resolving interference without additional data for retraining or test-time computation remains challenging. In this paper, we theoretically demonstrate that the task vectors of the linear layer constitute an approximate linear subspace for its corresponding input. Therefore, we can minimize interference under the guidance of task vectors. Based on this insight, we propose \textbf{WUDI-Merging} (\textbf{W}hoever started the interference sho\textbf{U}ld en\textbf{D} \textbf{I}t), a simple yet effective model merging method that eliminates interference without any additional data or rescaling coefficients. Comprehensive empirical evaluations across vision and language benchmarks demonstrate our method's superiority, achieving state-of-the-art performance in data-free model merging scenarios (average 10.9\% improvement versus baseline methods) while even outperforming mainstream test-time adaptation approaches by 3.3\%, and only very few computing resources are required. The code will be publicly available soon.
Related papers
- Locating and Mitigating Gradient Conflicts in Point Cloud Domain Adaptation via Saliency Map Skewness [11.470070927586017]
We develop a new scoring mechanism based on the skewness of 3D saliency maps to estimate gradient conflicts without requiring target labels.
Our approach is scalable, introducing modest computational overhead, and can be integrated into all the point cloud UDA MTL frameworks.
arXiv Detail & Related papers (2025-04-22T11:16:19Z) - LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging [10.33844295243509]
We propose a unified framework for model merging based on low-rank estimation of task vectors without the need for access to the base model, named textscLoRE-Merging.<n>Our approach is motivated by the observation that task vectors from fine-tuned models frequently exhibit a limited number of dominant singular values, making low-rank estimations less prone to interference.
arXiv Detail & Related papers (2025-02-15T10:18:46Z) - Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.<n>We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.<n>Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z) - On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling on a continuous domain for the data prediction task of (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
We propose a novel non-parametric method for approximating the sum of conditional probability densities required by MIS.
arXiv Detail & Related papers (2024-10-11T18:02:46Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST)
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Transport with Support: Data-Conditional Diffusion Bridges [18.933928516349397]
We introduce the Iterative Smoothing Bridge (ISB) to solve constrained time-series data generation tasks.
We show that the ISB generalises well to high-dimensional data, is computationally efficient, and provides accurate estimates of the marginals at intermediate and terminal times.
arXiv Detail & Related papers (2023-01-31T13:50:16Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal
traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including missing and three fiber-like missing cases according to the mode-drivenn fibers.
Despite nonity of the objective function in our model, we derive the optimal solutions by integrating alternating data-mputation method of multipliers.
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Unsupervised Source Separation via Bayesian Inference in the Latent
Domain [4.583433328833251]
State of the art audio source separation models rely on supervised data-driven approaches.
We propose a simple yet effective unsupervised separation algorithm, which operates directly on a latent representation of time-domain signals.
We validate our approach on the Slakh dataset arXiv:1909.08494, demonstrating results in line with state of the art supervised approaches.
arXiv Detail & Related papers (2021-10-11T14:32:55Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.