Related papers: Escaping Optimization Stagnation: Taking Steps Beyond Task Arithmetic via Difference Vectors

URL: http://arxiv.org/abs/2511.17987v1
Date: Sat, 22 Nov 2025 09:01:05 GMT
Title: Escaping Optimization Stagnation: Taking Steps Beyond Task Arithmetic via Difference Vectors
Authors: Jinping Wang, Zhiqiang Gao, Dinggen Zhang, Zhiwu Xie
Abstract summary: Current methods for editing pre-trained models face significant challenges, primarily high computational costs and limited scalability. Task arithmetic has recently emerged as a promising solution, using simple arithmetic operations (addition and negation) based on task vectors. We propose the Difference Vector-based Anisotropic Scaling Iterative algorithm (DV-BASI) to enable a continuous optimization process for task arithmetic methods.
Score: 7.805099851866648
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current methods for editing pre-trained models face significant challenges, primarily high computational costs and limited scalability. Task arithmetic has recently emerged as a promising solution, using simple arithmetic operations (addition and negation) based on task vectors, which are the differences between fine-tuned and pre-trained model weights, to efficiently modify model behavior. However, the full potential of task arithmetic remains underexplored, primarily due to limited mechanisms for overcoming optimization stagnation. To address this challenge, we introduce the notion of difference vector, a generalized form of task vectors derived from the historical movements during optimization. Using difference vectors as directed perturbations, we propose the Difference Vector-based Anisotropic Scaling Iterative algorithm (DV-BASI) to enable a continuous optimization process for task arithmetic methods without relying on any additional modules or components. Notably, by leveraging the escapability and directional advantages of difference vectors, the average performance on different tasks of the multi-task model merged by DV-BASI may even outperform individually fine-tuned models. Based on this observation, we extend the application of difference vectors to a feasible fine-tuning method for single-task models. On the practical side, DV-BASI allows expressive search directions with few learnable parameters and forms a scalable framework. We also integrate DV-BASI with task arithmetic methods and advanced optimization techniques to achieve state-of-the-art performance on both supervised and unsupervised evaluation protocols.
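
As a rough illustration of the task-vector definition in the abstract (fine-tuned weights minus pre-trained weights, combined by addition and negation), the following is a minimal sketch of plain task arithmetic on toy NumPy arrays. It is not DV-BASI and is not taken from the paper; the helper names `task_vector` and `apply_task_vectors`, the toy weights, and the scaling coefficients are all assumptions made for illustration.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Per-parameter difference: fine-tuned weights minus pre-trained weights."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_vectors(pretrained, vectors, coeffs):
    """Add a weighted sum of task vectors to a copy of the pre-trained weights."""
    edited = {name: w.copy() for name, w in pretrained.items()}
    for vec, coeff in zip(vectors, coeffs):
        for name in edited:
            edited[name] += coeff * vec[name]
    return edited

# Toy "models": dictionaries mapping parameter names to arrays (hypothetical values).
pretrained = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
finetuned_a = {"w": np.array([1.5, 2.0]), "b": np.array([0.0])}
finetuned_b = {"w": np.array([1.0, 3.0]), "b": np.array([0.5])}

tau_a = task_vector(pretrained, finetuned_a)
tau_b = task_vector(pretrained, finetuned_b)

# Addition: merge both tasks into one set of weights.
merged = apply_task_vectors(pretrained, [tau_a, tau_b], [1.0, 1.0])

# Negation: move away from task A by subtracting its task vector.
forgot_a = apply_task_vectors(pretrained, [tau_a], [-1.0])
```

In practice the coefficients are usually tuned rather than fixed at 1.0 or -1.0; the values here are chosen only to keep the arithmetic easy to follow.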

Related papers

Decomposing Task Vectors for Refined Model Editing [21.799465464971092]
We propose a principled decomposition method that separates each task vector into two components. By identifying invariant subspaces across projections, our approach enables more precise control over concept manipulation.
arXiv Detail & Related papers (2025-12-27T07:53:44Z)
Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging [83.5273168208788]
Model merging aims to integrate task-specific abilities from individually fine-tuned models into a single model without extra training. The merged model often suffers from notable performance degradation due to the conflicts caused by task-irrelevant redundancy in task vectors. We propose Purifying TAsk Vectors (PAVE) in knowledge-aware subspace to overcome these challenges.
arXiv Detail & Related papers (2025-10-16T14:02:57Z)
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers [64.1656365676171]
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors. This paper theoretically proves the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks. We prove the proper selection for task arithmetic to achieve negation to out-of-domain tasks.
arXiv Detail & Related papers (2025-04-15T08:04:39Z)
Efficient Model Editing with Task-Localized Sparse Fine-tuning [14.792099973449794]
We propose TaLoS, which allows building sparse task vectors with minimal interference without requiring explicit linearization. We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks. Our experiments show that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation.
arXiv Detail & Related papers (2025-04-03T14:20:06Z)
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging [23.649762835129167]
Model merging has emerged as a promising approach for unifying independently fine-tuned models into an integrated framework. We propose AdaRank, a novel model merging framework that adaptively selects the most beneficial singular directions of task vectors to merge multiple models. AdaRank consistently achieves state-of-the-art performance with various backbones and numbers of tasks, reducing the performance gap between fine-tuned models to nearly 1%.
arXiv Detail & Related papers (2025-03-28T06:49:06Z)
Task Vector Bases: A Unified and Scalable Framework for Compressed Task Arithmetic [24.40854328492979]
We propose Task Vector Bases, a framework compressing $T$ task vectors into $M < T$ basis vectors while preserving the functionality of task arithmetic. By representing each task vector as a structured linear combination of basis atoms, our approach supports standard operations such as addition and negation, as well as more advanced arithmetic ones.
arXiv Detail & Related papers (2025-02-03T03:18:26Z)
Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging. We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z)
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
arXiv Detail & Related papers (2022-12-08T05:50:53Z)
Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference. Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.