Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
- URL: http://arxiv.org/abs/2305.12827v3
- Date: Tue, 21 Nov 2023 18:43:43 GMT
- Title: Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
- Authors: Guillermo Ortiz-Jimenez, Alessandro Favero, Pascal Frossard
- Abstract summary: We show that weight disentanglement is the crucial factor that makes task arithmetic effective.
We show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement.
This leads to substantial performance improvements across task arithmetic benchmarks and diverse models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task arithmetic has recently emerged as a cost-effective and scalable
approach to edit pre-trained models directly in weight space: By adding the
fine-tuned weights of different tasks, the model's performance can be improved
on these tasks, while negating them leads to task forgetting. Yet, our
understanding of the effectiveness of task arithmetic and its underlying
principles remains limited. We present a comprehensive study of task arithmetic
in vision-language models and show that weight disentanglement is the crucial
factor that makes it effective. This property arises during pre-training and
manifests when distinct directions in weight space govern separate, localized
regions in function space associated with the tasks. Notably, we show that
fine-tuning models in their tangent space by linearizing them amplifies weight
disentanglement. This leads to substantial performance improvements across
multiple task arithmetic benchmarks and diverse models. Building on these
findings, we provide theoretical and empirical analyses of the neural tangent
kernel (NTK) of these models and establish a compelling link between task
arithmetic and the spatial localization of the NTK eigenfunctions. Overall, our
work uncovers novel insights into the fundamental mechanisms of task arithmetic
and offers a more reliable and effective approach to edit pre-trained models
through the NTK linearization.
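The abstract's two core operations can be sketched in a few lines. This is a toy NumPy illustration under stated assumptions: the function names, the scalar model `f`, and the coefficients are hypothetical, not the paper's implementation. A task vector is the difference between fine-tuned and pre-trained weights; adding task vectors composes tasks, negating one induces forgetting, and the tangent-space (linearized) model is a first-order Taylor expansion around the pre-trained weights.

```python
import numpy as np

def task_vector(theta_pre, theta_ft):
    """Task vector: tau = theta_ft - theta_pre."""
    return theta_ft - theta_pre

def edit_model(theta_pre, task_vectors, coeffs):
    """Weight-space edit: theta = theta_pre + sum_i lambda_i * tau_i."""
    theta = theta_pre.copy()
    for tau, lam in zip(task_vectors, coeffs):
        theta = theta + lam * tau
    return theta

def linearized_predict(f, grad_f, theta_pre, theta, x):
    """Tangent-space model: f(theta_pre, x) + grad_f(theta_pre, x) . (theta - theta_pre)."""
    return f(theta_pre, x) + grad_f(theta_pre, x) @ (theta - theta_pre)

# Toy 3-dim "weights"; real task vectors span millions of parameters.
theta_pre = np.zeros(3)
tau_a = task_vector(theta_pre, np.array([1.0, 0.0, 0.0]))
tau_b = task_vector(theta_pre, np.array([0.0, 2.0, 0.0]))

theta_multi = edit_model(theta_pre, [tau_a, tau_b], [1.0, 1.0])    # learn A and B
theta_forget = edit_model(theta_pre, [tau_a, tau_b], [1.0, -1.0])  # negate B to forget it

# Hypothetical scalar model and its gradient, just to exercise the linearization.
f = lambda th, x: np.tanh(th) @ x
df = lambda th, x: (1.0 - np.tanh(th) ** 2) * x
x = np.array([1.0, 1.0, 1.0])
y_lin = linearized_predict(f, df, theta_pre, theta_multi, x)
```

In this linear regime the edited model's output decomposes exactly into per-task contributions, which is the weight-disentanglement property the paper argues makes task arithmetic work.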
Related papers
- Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective [125.00228936051657]
We introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features.
By fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks.
arXiv Detail & Related papers (2024-07-24T09:30:04Z)
- Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic [11.142414096809734]
We propose a method that only fine-tunes linear layers, which improves weight disentanglement and efficiency simultaneously.
Our study reveals that fine-tuning only the linear layers in the attention modules places the whole model in a linear regime.
In particular, we find that the representation model plays an important role in improving weight disentanglement, whereas task-specific modules such as classification heads can degrade it.
arXiv Detail & Related papers (2024-07-09T17:59:17Z) - Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks [14.349517221831364]
A common approach for targeted problems involves fine-tuning pre-trained foundation models for specific target tasks.
We introduce a new simple method, Model Breadcrumbs, which consists of a sparsely defined set of weights that carve out a trajectory within the weight space of a pre-trained model.
Our experiments demonstrate the effectiveness of Model Breadcrumbs to simultaneously improve performance across multiple tasks.
arXiv Detail & Related papers (2023-12-11T19:10:55Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - Scalable Weight Reparametrization for Efficient Transfer Learning [10.265713480189486]
Efficient transfer learning involves utilizing a pre-trained model trained on a larger dataset and repurposing it for downstream tasks.
Previous works have led to an increase in updated parameters and task-specific modules, resulting in more computations, especially for tiny models.
We suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint for the number of updated parameters.
arXiv Detail & Related papers (2023-02-26T23:19:11Z) - Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems.
We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task.
We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
arXiv Detail & Related papers (2022-12-08T05:50:53Z) - Transfer RL across Observation Feature Spaces via Model-Based
Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
arXiv Detail & Related papers (2022-01-01T22:41:19Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.