Task Matrices: Linear Maps for Cross-Model Finetuning Transfer
- URL: http://arxiv.org/abs/2512.14880v1
- Date: Tue, 16 Dec 2025 19:51:29 GMT
- Title: Task Matrices: Linear Maps for Cross-Model Finetuning Transfer
- Authors: Darrin O'Brien, Dhikshith Gajulapalli, Eric Xia
- Abstract summary: We develop the concept of a task matrix, a linear transformation from a base to finetuned embedding state. We show that for vision and text models and ten different datasets, a base model augmented with a task matrix achieves results surpassing linear probes. Our results validate the existence of cross-layer linear encodings between pretrained and finetuned architectures.
- Score: 4.243926243206826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Results in interpretability suggest that large vision and language models learn implicit linear encodings when models are biased by in-context prompting. However, the existence of similar linear representations in more general adaptation regimes has not yet been demonstrated. In this work, we develop the concept of a task matrix, a linear transformation from a base to finetuned embedding state. We demonstrate that for vision and text models and ten different datasets, a base model augmented with a task matrix achieves results surpassing linear probes, sometimes approaching finetuned levels. Our results validate the existence of cross-layer linear encodings between pretrained and finetuned architectures. Moreover, we show that a data-based approximation for such encodings is both efficient and generalizable to multiple domains. We make our implementation publicly available.
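The abstract describes a task matrix as a linear map from base-model embeddings to finetuned-model embeddings, fit from data. The paper's exact fitting procedure is not given in this listing; a minimal sketch of one plausible data-based approximation, assuming paired base/finetuned embeddings and an ordinary least-squares fit, is:

```python
import numpy as np

def fit_task_matrix(base_embs: np.ndarray, ft_embs: np.ndarray) -> np.ndarray:
    """Least-squares linear map W such that base_embs @ W ~ ft_embs.

    base_embs: (n_samples, d_base) embeddings from the pretrained model.
    ft_embs:   (n_samples, d_ft) embeddings from the finetuned model.
    """
    # lstsq solves min_W ||base_embs @ W - ft_embs||_F^2
    W, *_ = np.linalg.lstsq(base_embs, ft_embs, rcond=None)
    return W

# Toy check: recover a known linear map from noiseless synthetic data.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(8, 8))
X = rng.normal(size=(100, 8))
W_hat = fit_task_matrix(X, X @ true_W)
print(np.allclose(W_hat, true_W, atol=1e-6))  # True
```

At inference time, the base model's embedding would be multiplied by `W_hat` before the downstream head, standing in for the finetuned embedding.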
Related papers
- Neighborhood-Adaptive Generalized Linear Graph Embedding with Latent Pattern Mining [0.0]
Graph embedding has been widely applied in areas such as network analysis, social network mining, recommendation systems, and bioinformatics. We propose a novel model, Neighborhood-Adaptive Generalized Linear Graph Embedding (NGLGE), grounded in latent pattern mining. This model introduces an adaptive graph learning method tailored to the neighborhood, effectively revealing intrinsic data correlations.
arXiv Detail & Related papers (2025-10-07T09:37:29Z)
- Optimal Matrix-Mimetic Tensor Algebras via Variable Projection [0.0]
Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogous to matrices.
We learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data.
We provide original theory of uniqueness of the transformation and convergence analysis of our variable-projection-based algorithm.
arXiv Detail & Related papers (2024-06-11T04:52:23Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Location Sensitive Embedding for Knowledge Graph Reasoning [0.0]
A key challenge in translational distance models is their inability to effectively differentiate between 'head' and 'tail' entities in graphs. To address this problem, a novel location-sensitive embedding (LSE) method has been developed. LSE innovatively modifies the head entity using relation-specific mappings, conceptualizing relations as linear transformations rather than mere translations. A more streamlined variant, LSEd, which employs a diagonal matrix for transformations to enhance practical efficiency, is also proposed.
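The summary above describes relations as linear transformations applied to the head entity rather than pure translations. A minimal sketch of such a score function (the names and the TransE-style distance are assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def lse_score(head, rel_matrix, rel_vec, tail):
    """Location-sensitive-style score: transform the head with a
    relation-specific linear map, then translate and compare to the tail.
    Lower is better (smaller distance = more plausible triple)."""
    return np.linalg.norm(rel_matrix @ head + rel_vec - tail)

# Toy triple in a 4-dimensional embedding space.
head = np.array([1.0, 0.0, 0.0, 0.0])
rel_matrix = np.eye(4)            # LSEd would restrict this to a diagonal matrix
rel_vec = np.array([0.0, 1.0, 0.0, 0.0])
tail = np.array([1.0, 1.0, 0.0, 0.0])
print(lse_score(head, rel_matrix, rel_vec, tail))  # 0.0 for a perfectly consistent triple
```

The LSEd variant mentioned in the summary would replace the full `rel_matrix` with a diagonal matrix, cutting the per-relation parameter count from d² to d.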
arXiv Detail & Related papers (2023-12-01T22:35:19Z)
- Optimising for Interpretability: Convolutional Dynamic Alignment Networks [108.83345790813445]
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA Nets).
Their core building blocks are Dynamic Alignment Units (DAUs), which are optimised to transform their inputs with dynamically computed weight vectors that align with task-relevant patterns.
CoDA Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions.
arXiv Detail & Related papers (2021-09-27T12:39:46Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
- Convolutional Dynamic Alignment Networks for Interpretable Classifications [108.83345790813445]
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA-Nets).
Their core building blocks are Dynamic Alignment Units (DAUs), which linearly transform their input with weight vectors that dynamically align with task-relevant patterns.
CoDA-Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions.
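The DAU behaviour described above, weight vectors computed from the input and then applied linearly, can be sketched as follows. This is a toy stand-in, not the CoDA-Nets implementation; the weight-generating map `A @ x` is an arbitrary assumption chosen for brevity:

```python
import numpy as np

def dynamic_alignment_unit(x, A):
    """Toy DAU: compute an input-dependent weight vector w(x) = A @ x
    (normalized), then apply it linearly: output = w(x) . x.
    Because w depends on x the overall map is nonlinear, yet for any fixed
    input the prediction is an exact linear function of that input, which
    is what enables per-input linear decomposition of the output."""
    w = A @ x
    w = w / (np.linalg.norm(w) + 1e-12)   # normalization keeps weights bounded
    return w @ x, w                        # scalar output and its effective weights

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
x = rng.normal(size=5)
y, w = dynamic_alignment_unit(x, A)
# The output decomposes exactly into per-feature contributions w_i * x_i.
print(np.isclose(y, np.sum(w * x)))  # True
```

The decomposition check at the end mirrors the summary's claim: the prediction splits exactly into individual input contributions.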
arXiv Detail & Related papers (2021-03-31T18:03:53Z)
- Causality-aware counterfactual confounding adjustment for feature representations learned by deep models [14.554818659491644]
Causal modeling has been recognized as a potential solution to many challenging problems in machine learning (ML).
We describe how a recently proposed counterfactual approach can still be used to deconfound the feature representations learned by deep neural network (DNN) models.
arXiv Detail & Related papers (2020-04-20T17:37:36Z)
- BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders [9.51828574518325]
Variational Autoencoders (VAEs) provide a flexible and scalable framework for non-linear dimensionality reduction.
We show how a collapsed variational inference scheme leads to scalable and efficient inference for BasisVAE.
arXiv Detail & Related papers (2020-03-06T23:10:52Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs) which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.