Efficient Model Editing with Task-Localized Sparse Fine-tuning
- URL: http://arxiv.org/abs/2504.02620v1
- Date: Thu, 03 Apr 2025 14:20:06 GMT
- Title: Efficient Model Editing with Task-Localized Sparse Fine-tuning
- Authors: Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
- Abstract summary: We propose TaLoS, which builds sparse task vectors with minimal interference without requiring explicit linearization. We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks. Our experiments show that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation.
- Score: 14.792099973449794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task arithmetic has emerged as a promising approach for editing models by representing task-specific knowledge as composable task vectors. However, existing methods rely on network linearization to derive task vectors, leading to computational bottlenecks during training and inference. Moreover, linearization alone does not ensure weight disentanglement, the key property that enables conflict-free composition of task vectors. To address this, we propose TaLoS, which builds sparse task vectors with minimal interference without requiring explicit linearization or information sharing across tasks. We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks, and that sparsely updating only these parameters promotes weight disentanglement during fine-tuning. Our experiments show that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation. By enabling modular parameter editing, our approach fosters practical deployment of adaptable foundation models in real-world applications.
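The core recipe described in the abstract — estimate per-parameter gradient sensitivity, fine-tune only a sparse subset of low-sensitivity parameters, and take the resulting weight difference as a sparse task vector — can be sketched framework-agnostically. The snippet below is a minimal illustration rather than the authors' implementation: the function names, the keep-ratio, the max-over-tasks sensitivity aggregation, and the dict-of-arrays model representation are all assumptions made for brevity, and the gradients are expected to come from whatever autodiff framework the reader uses.

```python
import numpy as np

def sensitivity_mask(grads_per_task, keep_ratio=0.1):
    """Mark as trainable the parameters whose gradient magnitude is
    consistently low across tasks; everything else stays frozen."""
    # Aggregate sensitivity: max |grad| over tasks, per parameter entry.
    agg = {name: np.max([np.abs(g[name]) for g in grads_per_task], axis=0)
           for name in grads_per_task[0]}
    # Global threshold so that roughly `keep_ratio` of entries are trainable.
    flat = np.concatenate([a.ravel() for a in agg.values()])
    thresh = np.quantile(flat, keep_ratio)
    return {name: (a <= thresh) for name, a in agg.items()}

def sparse_task_vector(pretrained, finetuned, mask):
    """Task vector = (finetuned - pretrained), restricted to the masked entries."""
    return {name: (finetuned[name] - pretrained[name]) * mask[name]
            for name in pretrained}

# Toy usage with random arrays standing in for real weights and gradients.
rng = np.random.default_rng(0)
pre = {"w": rng.normal(size=(4, 4)), "b": rng.normal(size=4)}
grads = [{k: rng.normal(size=v.shape) for k, v in pre.items()} for _ in range(3)]
mask = sensitivity_mask(grads, keep_ratio=0.25)
ft = {k: v + 0.05 * rng.normal(size=v.shape) for k, v in pre.items()}
tau = sparse_task_vector(pre, ft, mask)
print({k: int(m.sum()) for k, m in mask.items()})  # number of trainable entries
```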
Related papers
- When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers [64.1656365676171]
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors (a brief code sketch follows this entry).
This paper theoretically proves the effectiveness of task addition for simultaneously learning a set of irrelevant or aligned tasks.
It further proves the proper selection of task vectors for achieving negation of out-of-domain tasks.
arXiv Detail & Related papers (2025-04-15T08:04:39Z)
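The weighted-sum edit referred to above is simple to state in code. The sketch below is illustrative only: the dict-of-arrays weights and the coefficient values are assumptions, not anything prescribed by the papers listed here; a positive coefficient adds a task, a negative one negates (unlearns) it.

```python
import numpy as np

def apply_task_arithmetic(pretrained, task_vectors, coeffs):
    """Edit a model as theta = theta_pre + sum_i coeff_i * tau_i."""
    edited = {name: w.copy() for name, w in pretrained.items()}
    for tau, c in zip(task_vectors, coeffs):
        for name in edited:
            edited[name] += c * tau[name]
    return edited

# Toy example: add task A, negate task B.
rng = np.random.default_rng(1)
pre = {"w": rng.normal(size=(4, 4))}
tau_a = {"w": 0.1 * rng.normal(size=(4, 4))}
tau_b = {"w": 0.1 * rng.normal(size=(4, 4))}
edited = apply_task_arithmetic(pre, [tau_a, tau_b], coeffs=[1.0, -1.0])
```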
- Efficient Model Editing with Task Vector Bases: A Theoretical Framework and Scalable Approach [27.395660760819133]
It is easy to manipulate saved task vectors with arithmetic for different purposes, but compositional flexibility demands high memory usage.
This work addresses these issues with a theoretically grounded framework that explains task vector arithmetic and introduces the task vector bases framework.
Our method significantly reduces the memory cost for downstream arithmetic with little effort, while achieving competitive performance and maintaining compositional advantage.
arXiv Detail & Related papers (2025-02-03T03:18:26Z)
- Revisiting Weight Averaging for Model Merging [16.503826062785773]
Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training.
Weight averaging implicitly induces task vectors centered at the averaged weights.
Applying a low-rank approximation to these centered task vectors significantly improves merging performance (a code sketch follows this entry).
arXiv Detail & Related papers (2024-12-11T06:29:20Z)
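A minimal sketch of the idea above — center each fine-tuned model at the weight average, truncate the centered task vectors to low rank via SVD, and add them back — assuming all parameters are 2-D matrices. The truncation rank and the way the truncated vectors are recombined (simple averaging added back to the center) are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def merge_with_centered_lowrank(finetuned_models, rank=2):
    """Average the fine-tuned weights, re-express each model as a task vector
    centered at that average, truncate each centered vector to `rank`
    components, and fold the truncated vectors back into the average."""
    names = finetuned_models[0].keys()
    avg = {n: np.mean([m[n] for m in finetuned_models], axis=0) for n in names}
    merged = {n: avg[n].copy() for n in names}
    for m in finetuned_models:
        for n in names:
            centered = m[n] - avg[n]                 # task vector centered at the average
            u, s, vt = np.linalg.svd(centered, full_matrices=False)
            s[rank:] = 0.0                           # keep only the top-`rank` components
            merged[n] += (u * s) @ vt / len(finetuned_models)
    return merged

# Toy usage with three random "fine-tuned" models.
rng = np.random.default_rng(5)
models = [{"w": rng.normal(size=(6, 6))} for _ in range(3)]
merged = merge_with_centered_lowrank(models, rank=2)
```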
- Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging.
We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z)
- Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic [11.142414096809734]
In recent years, task arithmetic has garnered increasing attention.
This approach edits pre-trained models directly in weight space by combining the fine-tuned weights of various tasks into a unified model.
Applying such a unified model to individual tasks can lead to interference from other tasks (lack of weight disentanglement).
arXiv Detail & Related papers (2024-07-09T17:59:17Z)
- Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models [96.9373147383119]
We show that weight disentanglement is the crucial factor that makes task arithmetic effective.
We show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement.
This leads to substantial performance improvements across task arithmetic benchmarks and diverse models (a sketch of linearized fine-tuning follows this entry).
arXiv Detail & Related papers (2023-05-22T08:39:25Z)
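"Fine-tuning in the tangent space" means replacing the network with its first-order Taylor expansion around the pre-trained weights and fine-tuning that linearized function instead. The toy example below illustrates the expansion on a one-parameter-vector nonlinear function; the model and its analytic gradient are assumptions chosen purely to keep the sketch self-contained.

```python
import numpy as np

def f(w, x):
    """A toy nonlinear model: scalar output tanh(w . x)."""
    return np.tanh(w @ x)

def grad_f(w, x):
    """Analytic gradient of f with respect to the weights w."""
    return (1.0 - np.tanh(w @ x) ** 2) * x

def f_linearized(w, w0, x):
    """First-order Taylor expansion of f around the pre-trained weights w0:
    f_lin(x; w) = f(x; w0) + (w - w0) . grad_w f(x; w0).
    Fine-tuning f_lin keeps the model linear in its weights, which is what
    tangent-space (NTK-style) linearization does."""
    return f(w0, x) + (w - w0) @ grad_f(w0, x)

rng = np.random.default_rng(2)
x = rng.normal(size=3)
w0 = rng.normal(size=3)             # "pre-trained" weights
w = w0 + 0.01 * rng.normal(size=3)  # slightly fine-tuned weights
print(f(w, x), f_linearized(w, w0, x))  # nearly equal for small weight updates
```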
- Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems.
We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task.
We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
arXiv Detail & Related papers (2022-12-08T05:50:53Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers (sketched in code after this entry).
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
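The sketch below illustrates the layer-subset idea from the TAPS entry above: only a few layers get task-specific copies, the rest stay shared with the base model. Note the selection rule used here (per-layer gradient norm, top-k) is a stand-in for the learned per-layer indicators that TAPS actually optimizes, and all names and sizes are illustrative.

```python
import numpy as np

def select_task_layers(layer_grads, k=2):
    """Pick the k layers with the largest fine-tuning gradients; only these
    receive task-specific parameters (a stand-in for learned indicators)."""
    norms = {name: np.linalg.norm(g) for name, g in layer_grads.items()}
    return set(sorted(norms, key=norms.get, reverse=True)[:k])

def tune_step(base, layer_grads, active_layers, lr=1e-2):
    """Gradient step applied only to the selected (task-specific) layers."""
    return {name: (w - lr * layer_grads[name]) if name in active_layers else w
            for name, w in base.items()}

rng = np.random.default_rng(3)
base = {f"layer{i}": rng.normal(size=(3, 3)) for i in range(4)}
grads = {n: rng.normal(size=w.shape) * (1.0 if n in ("layer2", "layer3") else 0.1)
         for n, w in base.items()}
active = select_task_layers(grads, k=2)
updated = tune_step(base, grads, active)
print(sorted(active))
```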
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
Diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework (a code sketch follows this entry).
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
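Diff pruning parameterizes each task as a sparse difference vector added to frozen pretrained weights; in the paper the diff is learned end-to-end with a relaxed L0 penalty. The sketch below substitutes simple magnitude pruning of a given diff so that the resulting parameterization is visible in a few lines; the sparsity level and all names are illustrative assumptions, and the training procedure is omitted.

```python
import numpy as np

def prune_diff(diff, sparsity=0.99):
    """Keep only the largest-magnitude entries of the task-specific diff;
    everything else is reset to zero so the pretrained weight is reused."""
    flat = np.concatenate([d.ravel() for d in diff.values()])
    thresh = np.quantile(np.abs(flat), sparsity)
    return {name: np.where(np.abs(d) >= thresh, d, 0.0) for name, d in diff.items()}

def task_weights(pretrained, diff):
    """Task-specific model = frozen pretrained weights + sparse diff."""
    return {name: pretrained[name] + diff[name] for name in pretrained}

rng = np.random.default_rng(4)
pre = {"w": rng.normal(size=(8, 8))}
raw_diff = {"w": 0.01 * rng.normal(size=(8, 8))}  # would normally be learned
sparse = prune_diff(raw_diff, sparsity=0.9)
theta_task = task_weights(pre, sparse)
print(int((sparse["w"] != 0).sum()), "nonzero diff entries")
```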