When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
- URL: http://arxiv.org/abs/2504.10957v2
- Date: Fri, 18 Apr 2025 15:14:13 GMT
- Title: When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
- Authors: Hongkang Li, Yihua Zhang, Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen
- Abstract summary: Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors. This paper theoretically proves the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, and the success of task negation in unlearning one task from irrelevant or contradictory tasks. It also proves that a proper selection of linear coefficients for task arithmetic achieves guaranteed generalization to out-of-domain tasks.
- Score: 64.1656365676171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to fine-tuned models for certain tasks. This approach recently gained attention as a computationally efficient inference method for model editing, e.g., multi-task learning, forgetting, and out-of-domain generalization capabilities. However, the theoretical understanding of why task vectors can execute various conceptual operations remains limited, due to the highly non-convex nature of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove the proper selection of linear coefficients for task arithmetic to achieve guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings were validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
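For concreteness, the following is a minimal sketch of the task-vector operations the abstract describes, written against PyTorch state dicts; the function names and coefficient values are illustrative, not the paper's implementation.

```python
import torch

def task_vector(pretrained_sd, finetuned_sd):
    # Task vector: the weight update from the pre-trained model to a fine-tuned model.
    return {k: finetuned_sd[k] - pretrained_sd[k] for k in pretrained_sd}

def edit_model(pretrained_sd, task_vectors, coefficients):
    # Task arithmetic: pre-trained weights plus a weighted sum of task vectors.
    edited = {k: v.clone() for k, v in pretrained_sd.items()}
    for tau, lam in zip(task_vectors, coefficients):
        for k in edited:
            edited[k] = edited[k] + lam * tau[k]
    return edited

# Illustrative usage (all state dicts come from the same architecture):
#   tau_a = task_vector(pretrained, finetuned_a)
#   tau_b = task_vector(pretrained, finetuned_b)
#   added   = edit_model(pretrained, [tau_a, tau_b], [0.5, 0.5])  # task addition
#   negated = edit_model(pretrained, [tau_a], [-1.0])             # task negation / unlearning
```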
Related papers
- Efficient Model Editing with Task-Localized Sparse Fine-tuning [14.792099973449794]
We propose TaLoS, which builds sparse task vectors with minimal interference without requiring explicit linearization.
We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks.
Our experiments show that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation.
arXiv Detail & Related papers (2025-04-03T14:20:06Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - Efficient Model Editing with Task Vector Bases: A Theoretical Framework and Scalable Approach [27.395660760819133]
Saved task vectors can easily be manipulated with arithmetic for different purposes, but this compositional flexibility demands high memory usage.
This work addresses these issues with a theoretically grounded framework that explains task vector arithmetic and introduces task vector bases.
Our method significantly reduces the memory cost of downstream arithmetic with little effort, while achieving competitive performance and maintaining the compositional advantage.
arXiv Detail & Related papers (2025-02-03T03:18:26Z) - Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging.
We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z) - Task Arithmetic Through The Lens Of One-Shot Federated Learning [3.8230727103887943]
Task Arithmetic is a model merging technique that enables the combination of multiple models' capabilities into a single model.
We show that Task Arithmetic is mathematically equivalent to Federated Averaging (FedAvg), the algorithm commonly used in Federated Learning.
We adapt several algorithms from Federated Learning to improve the effectiveness of Task Arithmetic.
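On my reading of this abstract, the Federated Learning algorithm in question is one-shot Federated Averaging (FedAvg); under that assumption, the sketch below illustrates the equivalence: averaging K fine-tuned models equals adding their task vectors to the pre-trained weights with coefficient 1/K.

```python
import torch

def fedavg(finetuned_sds):
    # One-shot FedAvg: a plain average of the fine-tuned weights.
    keys = finetuned_sds[0].keys()
    return {k: torch.stack([sd[k] for sd in finetuned_sds]).mean(dim=0) for k in keys}

def task_arithmetic(pretrained_sd, finetuned_sds, lam):
    # Pre-trained weights plus lam times the sum of the task vectors.
    return {
        k: pretrained_sd[k] + lam * sum(sd[k] - pretrained_sd[k] for sd in finetuned_sds)
        for k in pretrained_sd
    }

# With lam = 1/K the two constructions coincide:
#   theta_0 + (1/K) * sum_k (theta_k - theta_0) = (1/K) * sum_k theta_k
```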
arXiv Detail & Related papers (2024-11-27T18:53:41Z) - Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [82.51626700527837]
Chain-of-Thought (CoT) is an efficient method that enables the reasoning ability of large language models by augmenting the query using examples with multiple intermediate steps.
We show that in-context learning without intermediate steps can fail to provide accurate generalization in settings where CoT succeeds.
arXiv Detail & Related papers (2024-10-03T03:12:51Z) - Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models [96.9373147383119]
We show that weight disentanglement is the crucial factor that makes task arithmetic effective.
We show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement.
This leads to substantial performance improvements across task arithmetic benchmarks and diverse models.
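As a sketch of what linearized (tangent-space) fine-tuning refers to here, the network is replaced by its first-order Taylor expansion around the pre-trained weights. The torch.func-based helper below is an illustrative rendering of that expansion under my assumptions, not the authors' code; it ignores buffers and all training-loop details.

```python
import torch
from torch.func import functional_call, jvp

def linearized_forward(model, params0, params, x):
    # First-order Taylor expansion of the network around params0:
    #   f_lin(x; params) = f(x; params0) + J_params f(x; params0) @ (params - params0)
    delta = {k: params[k] - params0[k] for k in params0}
    forward = lambda p: functional_call(model, p, (x,))
    out0, jvp_out = jvp(forward, (params0,), (delta,))
    return out0 + jvp_out

# Tangent-space fine-tuning then optimizes `params` through
# linearized_forward instead of the original nonlinear model.
```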
arXiv Detail & Related papers (2023-05-22T08:39:25Z) - Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems.
We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task.
We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
arXiv Detail & Related papers (2022-12-08T05:50:53Z) - Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime [9.184987303791292]
We show that in permuted MNIST image classification tasks, the performance of multilayer perceptrons trained by vanilla gradient descent can be improved by overparameterization.
We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem.
We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small.
arXiv Detail & Related papers (2022-06-01T18:04:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.