Related papers: Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

URL: http://arxiv.org/abs/2407.02880v2
Date: Tue, 29 Oct 2024 05:10:30 GMT
Title: Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
Authors: Frederic Z. Zhang, Paul Albert, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad,
Abstract summary: We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level. We show that such linear combinations explicitly exploit the low intrinsicity of pre-trained models, with only a few coefficients being the learnable parameters. We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
Score: 51.4661186662329
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Pre-trained models produce strong generic representations that can be adapted via fine-tuning. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning. The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This paper builds on these properties of task vectors and aims to answer (1) whether components of task vectors, particularly parameter blocks, exhibit similar characteristics, and (2) how such blocks can be used to enhance knowledge composition and transfer. To this end, we introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level. We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters. Furthermore, composition of parameter blocks leverages the already learned representations, thereby reducing the dependency on large amounts of data. We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives. In particular, we show that (1) learned anisotropic scaling allows task vectors to be more disentangled, causing less interference in composition; (2) task vector composition excels with scarce or no labeled data and is less prone to domain shift, thus leading to better generalisability; (3) mixing the most informative parameter blocks across different task vectors prior to training can reduce the memory footprint and improve the flexibility of knowledge transfer. Moreover, we show the potential of aTLAS as a PEFT method, particularly with less data, and demonstrate its scalibility.

Related papers

Decomposing Task Vectors for Refined Model Editing [21.799465464971092]
We propose a principled decomposition method that separates each task vector into two components.<n>By identifying invariant subspaces across projections, our approach enables more precise control over concept manipulation.
arXiv Detail & Related papers (2025-12-27T07:53:44Z)
Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging [83.5273168208788]
Model merging aims to integrate task-specific abilities from individually fine-tuned models into a single model without extra training.<n>The merged model often suffers from notable performance degradation due to the conflicts caused by task-irrelevant redundancy in task vectors.<n>We propose Purifying TAsk Vectors (PAVE) in knowledge-aware subspace to overcome these challenges.
arXiv Detail & Related papers (2025-10-16T14:02:57Z)
Variational Task Vector Composition [53.476598858325985]
We propose variational task vector composition, where composition coefficients are taken as latent variables and estimated in a Bayesian inference framework.<n>Motivated by the observation of structural redundancy in task vectors, we introduce a Spike-and-Slab prior that promotes sparsity.<n>We develop a gated sampling mechanism that constructs a controllable posterior by filtering the composition coefficients based on both uncertainty and importance.
arXiv Detail & Related papers (2025-09-21T02:46:02Z)
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations [19.539276425108987]
This work proposes the Linear Combination Conjecture, positing that task vectors act as single in-context demonstrations formed through linear combinations of the original ones.<n>We show that task vectors naturally emerge in linear transformers trained on triplet-formatted prompts through loss landscape analysis.<n>We predict the failure of task vectors on representing high-rank mappings and confirm this on practical LLMs.
arXiv Detail & Related papers (2025-06-10T17:59:31Z)
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers [64.1656365676171]
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors. This paper theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or irrelevant tasks. We prove the proper selection for task arithmetic to achieve negation to out-of-domain tasks.
arXiv Detail & Related papers (2025-04-15T08:04:39Z)
Efficient Model Editing with Task-Localized Sparse Fine-tuning [14.792099973449794]
We propose TaLoS which allows to build sparse task vectors with minimal interference without requiring explicit linearization. We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks. Our experiments prove that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation.
arXiv Detail & Related papers (2025-04-03T14:20:06Z)
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters widely of the context of downstream task to learn, or the context of important knowledge to maintain. We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters. Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs [65.42104819071444]
Multitask learning (MTL) leverages task-relatedness to enhance performance. We employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices. We propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least square support vector machines (LSSVMs)
arXiv Detail & Related papers (2023-08-30T14:28:26Z)
Scalable variable selection for two-view learning tasks with projection operators [0.0]
We propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions.
arXiv Detail & Related papers (2023-07-04T08:22:05Z)
Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
arXiv Detail & Related papers (2022-12-08T05:50:53Z)
Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other. This facilitates cross-fertilization in which data collected across different domains help improving the learning performance at each other task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers [66.66228496844191]
The problem of learning one task with samples from another task is central to transfer learning (TL) In this paper, we examine a fundamental question: When does combining the data samples from a source task and a target task perform better than single-task learning with the target task alone?
arXiv Detail & Related papers (2020-10-22T14:14:20Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
An Advance on Variable Elimination with Applications to Tensor-Based Computation [11.358487655918676]
We present new results on the classical algorithm of variable elimination, which underlies many algorithms including for probabilistic inference. The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth.
arXiv Detail & Related papers (2020-02-21T14:17:44Z)
Graph-based Interpolation of Feature Vectors for Accurate Few-Shot Classification [2.922007656878633]
In few-shot classification, the aim is to learn models able to discriminate classes using only a small number of labeled examples. We propose a new method that relies on graphs only to interpolate feature vectors instead, resulting in a transductive learning setting.
arXiv Detail & Related papers (2020-01-27T15:12:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.