Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained
Models
- URL: http://arxiv.org/abs/2305.12827v3
- Date: Tue, 21 Nov 2023 18:43:43 GMT
- Title: Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained
Models
- Authors: Guillermo Ortiz-Jimenez, Alessandro Favero, Pascal Frossard
- Abstract summary: We show that weight disentanglement is the crucial factor that makes task arithmetic effective.
We show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement.
This leads to substantial performance improvements across task arithmetic benchmarks and diverse models.
- Score: 96.9373147383119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task arithmetic has recently emerged as a cost-effective and scalable
approach to edit pre-trained models directly in weight space: By adding the
fine-tuned weights of different tasks, the model's performance can be improved
on these tasks, while negating them leads to task forgetting. Yet, our
understanding of the effectiveness of task arithmetic and its underlying
principles remains limited. We present a comprehensive study of task arithmetic
in vision-language models and show that weight disentanglement is the crucial
factor that makes it effective. This property arises during pre-training and
manifests when distinct directions in weight space govern separate, localized
regions in function space associated with the tasks. Notably, we show that
fine-tuning models in their tangent space by linearizing them amplifies weight
disentanglement. This leads to substantial performance improvements across
multiple task arithmetic benchmarks and diverse models. Building on these
findings, we provide theoretical and empirical analyses of the neural tangent
kernel (NTK) of these models and establish a compelling link between task
arithmetic and the spatial localization of the NTK eigenfunctions. Overall, our
work uncovers novel insights into the fundamental mechanisms of task arithmetic
and offers a more reliable and effective approach to edit pre-trained models
through the NTK linearization.
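To make the operations described above concrete, here is a brief sketch in notation chosen for this summary (not quoted from the paper): a task vector is the difference between fine-tuned and pre-trained weights, task arithmetic adds scaled task vectors to the pre-trained weights, and tangent-space fine-tuning replaces the network with its first-order Taylor expansion around the pre-trained weights.
```latex
% Task vectors and task arithmetic (notation chosen for this summary)
\tau_t = \theta_t^{\mathrm{ft}} - \theta_0, \qquad
\theta_{\mathrm{edit}} = \theta_0 + \sum_t \alpha_t \tau_t
\quad (\alpha_t > 0 \text{ adds task } t, \ \alpha_t < 0 \text{ forgets it})

% Linearized (tangent-space) model: first-order Taylor expansion around \theta_0
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + (\theta - \theta_0)^{\top} \nabla_\theta f(x;\theta_0)

% In the tangent space, task contributions therefore combine additively in function space
f_{\mathrm{lin}}(x;\theta_{\mathrm{edit}}) = f(x;\theta_0) + \sum_t \alpha_t\, \tau_t^{\top} \nabla_\theta f(x;\theta_0)
```
In this picture, weight disentanglement corresponds to each per-task term being non-negligible only on the data of its own task, which is what allows adding and negating task vectors without interference.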
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the zero-shot generalization stability of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
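A minimal sketch of what orthogonal fine-tuning of a single pre-trained linear layer can look like, assuming an OFT-style Cayley parameterization; the exact OrthSR formulation and its self-regularization term (which keeps the fine-tuned model close to the frozen zero-shot model) are not reproduced here.
```python
# Hedged sketch: rotate a frozen pre-trained weight W0 with a learned orthogonal
# matrix R (Cayley transform of a skew-symmetric parameter). At initialization
# S = 0, so R = I and the layer reproduces the pre-trained layer exactly.
import torch
import torch.nn as nn

class OrthogonalLinear(nn.Module):
    def __init__(self, pretrained_linear: nn.Linear):
        super().__init__()
        out_features = pretrained_linear.out_features
        self.register_buffer("W0", pretrained_linear.weight.data.clone())  # frozen weight
        bias = pretrained_linear.bias
        self.register_buffer("b0", bias.data.clone() if bias is not None else None)
        self.S = nn.Parameter(torch.zeros(out_features, out_features))     # skew-symmetric seed

    def forward(self, x):
        S = self.S - self.S.T                    # enforce skew-symmetry
        I = torch.eye(S.shape[0], device=S.device, dtype=S.dtype)
        R = torch.linalg.solve(I + S, I - S)     # Cayley transform -> orthogonal matrix
        return nn.functional.linear(x, R @ self.W0, self.b0)
```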
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic [11.142414096809734]
We propose a method that only fine-tunes linear layers, which improves weight disentanglement and efficiency simultaneously.
Our study reveals that fine-tuning only the linear layers in the attention modules keeps the whole model in a linear regime.
In particular, we find that the representation model plays an important role in improving weight disentanglement, whereas task-specific modules such as the classification heads can degrade weight disentanglement.
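A minimal sketch of this recipe, assuming a PyTorch model whose attention projections are nn.Linear modules with names like q_proj/k_proj/v_proj/out_proj (an assumption that depends on the architecture): freeze everything, then unfreeze only those linear layers.
```python
# Hedged sketch: fine-tune only the linear projection layers inside attention blocks.
import torch.nn as nn

ATTN_LINEAR_PATTERNS = ("q_proj", "k_proj", "v_proj", "out_proj")  # assumed module names

def freeze_all_but_attention_linears(model: nn.Module) -> int:
    for param in model.parameters():
        param.requires_grad = False
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and any(p in name for p in ATTN_LINEAR_PATTERNS):
            for param in module.parameters():
                param.requires_grad = True
    # Return the trainable parameter count for a quick sanity check.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```
Consistent with the observation above, a task-specific classification head would stay frozen under this scheme.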
arXiv Detail & Related papers (2024-07-09T17:59:17Z) - Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks [12.146530928616386]
A common approach for targeted problems involves fine-tuning pre-trained foundation models for specific target tasks.
This work focuses on the problem of merging multiple fine-tunings of the same foundation model derived from a spectrum of auxiliary tasks.
We introduce a new simple method, Model Breadcrumbs, which consists of a sparsely defined weight set that guides model adaptation within the weight space of a pre-trained model.
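A minimal sketch of a breadcrumb-style sparse mask over a task vector (the thresholds are illustrative, not the paper's hyper-parameters): per weight tensor, drop both the smallest-magnitude entries and the largest-magnitude outliers of the fine-tuning difference, and keep only the remaining band.
```python
# Hedged sketch: sparsify a task vector (finetuned - pretrained, as a state_dict of
# per-tensor differences) by masking out small perturbations and large outliers.
import torch

def breadcrumb_mask(task_vector: dict, low_pct: float = 0.85, high_pct: float = 0.99) -> dict:
    masked = {}
    for name, delta in task_vector.items():
        mags = delta.abs().flatten()
        lo = torch.quantile(mags, low_pct)    # below: treated as noise, dropped
        hi = torch.quantile(mags, high_pct)   # above: treated as outliers, dropped
        keep = (delta.abs() >= lo) & (delta.abs() <= hi)
        masked[name] = delta * keep
    return masked
```
The masked task vectors from several tasks can then be merged with the usual task-arithmetic addition.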
arXiv Detail & Related papers (2023-12-11T19:10:55Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
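A minimal sketch of that setup, using one common token construction (context pairs (x_i, y_i) and a query with its label slot zeroed) and a single softmax-free attention layer; the exact parameterization in the paper may differ.
```python
# Hedged sketch: single-layer linear attention for in-context linear regression.
import torch
import torch.nn as nn

class LinearAttentionICL(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        scale = (d + 1) ** -0.5
        self.W_kq = nn.Parameter(torch.randn(d + 1, d + 1) * scale)  # merged key/query map
        self.W_pv = nn.Parameter(torch.randn(d + 1, d + 1) * scale)  # merged projection/value map

    def forward(self, X, y, x_query):
        # X: (n, d) context inputs, y: (n,) context targets, x_query: (d,) query input.
        Z = torch.cat([X, y.unsqueeze(1)], dim=1).T                        # (d+1, n) context tokens
        z_q = torch.cat([x_query, torch.zeros(1, dtype=X.dtype, device=X.device)])  # label slot = 0
        attn = Z.T @ (self.W_kq @ z_q) / Z.shape[1]                        # (n,) linear attention weights
        update = self.W_pv @ (Z @ attn)                                    # aggregated value update
        return (z_q + update)[-1]                                          # prediction read from label slot
```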
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - Scalable Weight Reparametrization for Efficient Transfer Learning [10.265713480189486]
Efficient transfer learning involves utilizing a pre-trained model trained on a larger dataset and repurposing it for downstream tasks.
Previous works increase the number of updated parameters and add task-specific modules, resulting in more computation, especially for tiny models.
We suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint for the number of updated parameters.
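A loose sketch of the general idea (learned per-layer gates selecting which layers to update under a parameter budget); this is a generic illustration, not the paper's policy network or training procedure.
```python
# Hedged sketch: per-layer gate scores decide which layers receive trainable updates,
# greedily respecting a budget on the total number of updated parameters.
import torch
import torch.nn as nn

class GatedUpdatePolicy(nn.Module):
    def __init__(self, layer_names: list, layer_param_counts: dict, budget: int):
        super().__init__()
        self.layer_names = layer_names
        self.param_counts = layer_param_counts
        self.budget = budget                                    # max number of updated parameters
        self.gate_logits = nn.Parameter(torch.zeros(len(layer_names)))

    def select_layers(self) -> list:
        scores = torch.sigmoid(self.gate_logits)
        order = torch.argsort(scores, descending=True)
        chosen, used = [], 0
        for idx in order.tolist():
            name = self.layer_names[idx]
            if used + self.param_counts[name] <= self.budget:
                chosen.append(name)
                used += self.param_counts[name]
        return chosen
```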
arXiv Detail & Related papers (2023-02-26T23:19:11Z) - Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems.
We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task.
We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
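A minimal sketch of the task-vector recipe on state_dicts (a straightforward reading of the description above, with an illustrative scaling coefficient):
```python
# Hedged sketch of task-vector arithmetic: subtract pre-trained from fine-tuned
# weights to build a task vector, then add scaled task vectors (negative scale to forget).
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, task_vectors: list, alpha: float = 0.3) -> dict:
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tau in task_vectors:
        for k in edited:
            edited[k] = edited[k] + alpha * tau[k]   # use a negative alpha (or -tau) to forget
    return edited
```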
arXiv Detail & Related papers (2022-12-08T05:50:53Z) - Transfer RL across Observation Feature Spaces via Model-Based
Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
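A minimal sketch of that transfer scheme, with illustrative losses and shapes: a latent dynamics model is learned in the source task, then frozen, and a new encoder for the target observation space is regularized to be consistent with it.
```python
# Hedged sketch: learn (encoder, latent dynamics) on source transitions, then reuse the
# frozen dynamics in the target task to regularize a new encoder for the new observations.
import torch
import torch.nn as nn

def source_dynamics_loss(encoder, dynamics, obs, action, next_obs):
    z, z_next = encoder(obs), encoder(next_obs)
    return nn.functional.mse_loss(dynamics(torch.cat([z, action], dim=-1)), z_next)

def target_consistency_loss(new_encoder, frozen_dynamics, obs, action, next_obs):
    # frozen_dynamics parameters have requires_grad=False; only the new encoder is trained.
    z, z_next = new_encoder(obs), new_encoder(next_obs)
    z_pred = frozen_dynamics(torch.cat([z, action], dim=-1))
    return nn.functional.mse_loss(z_pred, z_next)
```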
arXiv Detail & Related papers (2022-01-01T22:41:19Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
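A loose sketch of one way to make a forward model goal-aware, as an illustrative simplification of the idea above: condition the latent dynamics on the goal and train the decoder to predict the goal-relative residual of the next observation rather than the full scene.
```python
# Hedged sketch: goal-conditioned latent dynamics trained to predict only the
# goal-relevant residual (next observation minus goal observation).
import torch
import torch.nn as nn

def goal_aware_loss(encoder, dynamics, decoder, obs, action, next_obs, goal_obs):
    z, z_goal = encoder(obs), encoder(goal_obs)
    z_next_pred = dynamics(torch.cat([z, z_goal, action], dim=-1))  # goal-conditioned step
    target = next_obs - goal_obs                                    # goal-relevant residual
    return nn.functional.mse_loss(decoder(z_next_pred), target)
```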
arXiv Detail & Related papers (2020-07-14T16:42:59Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
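One way to write down a block-diagonal task-feature regularizer of this flavor (a hedged, illustrative formulation; the paper's exact regularizer may differ): penalize the smallest eigenvalues of the Laplacian of the bipartite task-feature affinity graph built from the coefficient magnitudes, which pushes tasks and features toward a small number of co-occurring blocks.
```latex
% Illustrative multi-task objective with a block-diagonal task-feature regularizer
\min_{W=[w_1,\dots,w_T]} \; \sum_{t=1}^{T} L_t(w_t) \;+\; \lambda\, \Omega_{\mathrm{BD}}(W),
\qquad
\Omega_{\mathrm{BD}}(W) = \sum_{i=1}^{k} \mu_i\!\left( \mathcal{L}_{G(|W|)} \right)

% G(|W|): bipartite task-feature graph weighted by the coefficient magnitudes |W|;
% \mathcal{L}_{G}: its graph Laplacian; \mu_1 \le \dots \le \mu_k: its k smallest eigenvalues.
% Driving these eigenvalues to zero encourages k connected components, i.e. k task-feature blocks.
```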
arXiv Detail & Related papers (2020-04-29T02:32:04Z)