Linear Interpolation In Parameter Space is Good Enough for Fine-Tuned
Language Models
- URL: http://arxiv.org/abs/2211.12092v1
- Date: Tue, 22 Nov 2022 08:49:22 GMT
- Title: Linear Interpolation In Parameter Space is Good Enough for Fine-Tuned
Language Models
- Authors: Mark Rofin, Nikita Balagansky, Daniil Gavrilov
- Abstract summary: We explore linear connectivity between parameters of pre-trained models after fine-tuning.
Surprisingly, we could perform linear interpolation without a performance drop at intermediate points for fine-tuned models.
For controllable text generation, such interpolation could be seen as moving a model towards or against the desired text attribute.
- Score: 0.21485350418225244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The simplest way to obtain continuous interpolation between two points in
high dimensional space is to draw a line between them. While previous works
focused on the general connectivity between model parameters, we explored
linear interpolation for parameters of pre-trained models after fine-tuning.
Surprisingly, we could perform linear interpolation without a performance drop
in intermediate points for fine-tuned models. For controllable text generation,
such interpolation could be seen as moving a model towards or against the
desired text attribute (e.g., positive sentiment), which could be used as
grounds for further methods for controllable text generation without inference
speed overhead.
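Below is a minimal, hedged sketch of the interpolation scheme the abstract describes; it is not the authors' code. It assumes two checkpoints sharing one architecture: the pre-trained GPT-2 and a fine-tuned copy whose name, my-org/gpt2-positive-sentiment, is a placeholder for whatever attribute-tuned checkpoint is at hand.

```python
# Hedged sketch: linear interpolation between a pre-trained and a fine-tuned
# checkpoint, tensor by tensor. Not the authors' implementation.
import torch
from transformers import AutoModelForCausalLM

def interpolate_state_dicts(base_sd, tuned_sd, alpha):
    """theta(alpha) = (1 - alpha) * theta_base + alpha * theta_tuned, per tensor."""
    merged = {}
    for name, base_t in base_sd.items():
        tuned_t = tuned_sd[name]
        if base_t.is_floating_point():
            merged[name] = torch.lerp(base_t, tuned_t, alpha)
        else:
            merged[name] = base_t  # integer/bool buffers (e.g. masks) are left untouched
    return merged

base = AutoModelForCausalLM.from_pretrained("gpt2")
tuned = AutoModelForCausalLM.from_pretrained("my-org/gpt2-positive-sentiment")  # placeholder name

# alpha = 0 recovers the pre-trained model, alpha = 1 the fine-tuned one; per the
# abstract, alpha outside [0, 1] moves the model further towards or against the
# tuned attribute.
alpha = 0.5
merged = AutoModelForCausalLM.from_pretrained("gpt2")
merged.load_state_dict(interpolate_state_dicts(base.state_dict(), tuned.state_dict(), alpha))
```

Because the merged weights are produced once and loaded into a single model, generation with any intermediate alpha runs at the same speed as the original model, which is the "no inference speed overhead" point made above.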
Related papers
- State Soup: In-Context Skill Learning, Retrieval and Mixing [22.485700977542127]
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems.
Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation.
Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined.
arXiv Detail & Related papers (2024-06-12T17:06:07Z)
- On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm [47.55215041326702]
We discover an intriguing linear phenomenon in models that start from a common pretrained checkpoint and are finetuned on different tasks, termed Cross-Task Linearity (CTL).
We show that if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are often approximately equal to the linear interpolation of the features of the two finetuned models at each layer (a minimal sketch of this check appears after this list).
We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps, mapping from the parameter space to the feature space.
arXiv Detail & Related papers (2024-02-06T03:28:36Z)
- Shuffled Autoregression For Motion Interpolation [53.61556200049156]
This work aims to provide a deep-learning solution for the motion interpolation task.
We propose a novel framework, referred to as Shuffled AutoRegression, which extends autoregression to generate in an arbitrary (shuffled) order.
We also propose an approach to constructing a particular kind of dependency graph, with three stages assembled into an end-to-end spatial-temporal motion Transformer.
arXiv Detail & Related papers (2023-06-10T07:14:59Z)
- Generalized Relation Modeling for Transformer Tracking [13.837171342738355]
One-stream trackers let the template interact with all parts inside the search region throughout all the encoder layers.
This could potentially lead to target-background confusion when the extracted feature representations are not sufficiently discriminative.
We propose a generalized relation modeling method based on adaptive token division.
Our method is superior to the two-stream and one-stream pipelines and achieves state-of-the-art performance on six challenging benchmarks with a real-time running speed.
arXiv Detail & Related papers (2023-03-29T10:29:25Z)
- Analyzing Transformers in Embedding Space [59.434807802802105]
We present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space.
We show that parameters of both pretrained and fine-tuned models can be interpreted in embedding space.
Our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
arXiv Detail & Related papers (2022-09-06T14:36:57Z)
- Long-term Video Frame Interpolation via Feature Propagation [95.18170372022703]
Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between the inputs and then warping the inputs to the target time with the estimated motion.
This approach is not optimal when the temporal distance between the inputs increases.
We propose a propagation network (PNet) by extending the classic feature-level forecasting with a novel motion-to-feature approach.
arXiv Detail & Related papers (2022-03-29T10:47:06Z)
- NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes.
NeuroMorph produces a smooth interpolation and point-to-point correspondences between them.
It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z)
- Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties, namely it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z)
- On Linear Interpolation in the Latent Space of Deep Generative Models [0.0]
Smoothness and plausibility of linear interpolations in latent space are associated with the quality of the underlying generative model.
We show that not all such curves are comparable as they can deviate arbitrarily from the shortest curve given by the geodesic.
This deviation is revealed by computing curve lengths with the pull-back metric of the generative model.
arXiv Detail & Related papers (2021-05-08T10:27:07Z)
- Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport [14.86310501896212]
In this work, we extend this selective rationalization approach to text matching.
The goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction.
Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs.
arXiv Detail & Related papers (2020-05-27T01:20:49Z)
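As a purely illustrative aside on the Cross-Task Linearity entry above, the sketch below shows what the CTL check amounts to in code: compare the features of a weight-interpolated model with the same interpolation applied to the two models' features. The two "fine-tuned" networks here are toy stand-ins (random perturbations of one base MLP), not real task-specific checkpoints.

```python
# Toy sketch of the Cross-Task Linearity (CTL) check; the "fine-tuned" models
# are random perturbations of a shared base network, used only for illustration.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
base = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())

def perturbed_copy(model, scale=0.05):
    clone = copy.deepcopy(model)
    with torch.no_grad():
        for p in clone.parameters():
            p.add_(scale * torch.randn_like(p))
    return clone

model_a, model_b = perturbed_copy(base), perturbed_copy(base)

def weight_interpolated(model_a, model_b, alpha):
    merged = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: torch.lerp(sd_a[k], sd_b[k], alpha) for k in sd_a})
    return merged

x, alpha = torch.randn(8, 16), 0.5
with torch.no_grad():
    f_interp_weights = weight_interpolated(model_a, model_b, alpha)(x)  # f((1-a)*theta_a + a*theta_b)(x)
    f_interp_features = torch.lerp(model_a(x), model_b(x), alpha)       # (1-a)*f_a(x) + a*f_b(x)
# CTL predicts these are approximately equal (the paper checks this layer by
# layer; here only the final features are compared).
print(torch.dist(f_interp_weights, f_interp_features).item())
```

With real checkpoints, model_a and model_b would be two models fine-tuned on different tasks from the same pretrained initialization.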
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.