StochCA: A Novel Approach for Exploiting Pretrained Models with
Cross-Attention
- URL: http://arxiv.org/abs/2402.16092v1
- Date: Sun, 25 Feb 2024 13:53:49 GMT
- Title: StochCA: A Novel Approach for Exploiting Pretrained Models with
Cross-Attention
- Authors: Seungwon Seo, Suho Lee, Sangheum Hwang
- Abstract summary: We introduce a novel fine-tuning method, called stochastic cross-attention (StochCA), specific to Transformer architectures.
This method modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning.
Our experimental results show the superiority of StochCA over state-of-the-art approaches in both areas.
- Score: 2.992602379681373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Utilizing large-scale pretrained models is a well-known strategy to enhance
performance on various target tasks. It is typically achieved through
fine-tuning pretrained models on target tasks. However, naïve fine-tuning
may not fully leverage knowledge embedded in pretrained models. In this study,
we introduce a novel fine-tuning method, called stochastic cross-attention
(StochCA), specific to Transformer architectures. This method modifies the
Transformer's self-attention mechanism to selectively utilize knowledge from
pretrained models during fine-tuning. Specifically, in each block, instead of
self-attention, cross-attention is performed stochastically according to the
predefined probability, where keys and values are extracted from the
corresponding block of a pretrained model. By doing so, queries and
channel-mixing multi-layer perceptron layers of a target model are fine-tuned
to target tasks to learn how to effectively exploit rich representations of
pretrained models. To verify the effectiveness of StochCA, extensive
experiments are conducted on benchmarks in the areas of transfer learning and
domain generalization, where the exploitation of pretrained models is critical.
Our experimental results show the superiority of StochCA over state-of-the-art
approaches in both areas. Furthermore, we demonstrate that StochCA is
complementary to existing approaches, i.e., it can be combined with them to
further improve performance. Our code is available at
https://github.com/daintlab/stochastic_cross_attention
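To make the mechanism above concrete, here is a minimal single-head sketch of stochastic cross-attention in PyTorch. The class name StochCABlock, the argument cross_attn_prob, and the assumption that an outer wrapper supplies the key/value tensors (k_pre, v_pre) computed in the matching block of the frozen pretrained model are illustrative choices, not the authors' released implementation (see the repository above for that).
```python
# Illustrative sketch only: a single-head Transformer block with stochastic
# cross-attention, assuming an outer wrapper supplies (k_pre, v_pre) from the
# matching block of a frozen pretrained model run on the same input.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochCABlock(nn.Module):
    def __init__(self, dim: int, cross_attn_prob: float = 0.1):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries always come from the target model
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.p = cross_attn_prob       # predefined cross-attention probability

    def forward(self, x, pretrained_kv=None):
        h = self.norm1(x)
        q = self.q(h)
        # With probability p during fine-tuning, attend to the pretrained
        # model's keys/values instead of the target model's own (self-attention).
        use_cross = (self.training and pretrained_kv is not None
                     and torch.rand(1).item() < self.p)
        if use_cross:
            k, v = pretrained_kv
        else:
            k, v = self.k(h), self.v(h)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        x = x + self.proj(attn @ v)
        x = x + self.mlp(self.norm2(x))  # channel-mixing MLP, fine-tuned on the target task
        return x
```
In this sketch the stochastic switch fires only in training mode, so inference reduces to ordinary self-attention and the frozen pretrained model would not be needed at test time; whether the released code follows exactly this convention, and how keys and values are extracted per block, should be checked against the repository.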
Related papers
- BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND)
Our approach is simple but effective: it first uses the weights and biases of multiple trained models as inputs to train an autoencoder and a latent diffusion model.
Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z) - FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained
Models in Few-Shot Learning [21.693779973263172]
In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align)
Our method aims to bolster the model's generalizability by preserving the consistency of spurious features.
Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements.
arXiv Detail & Related papers (2023-10-23T17:12:01Z) - RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z) - TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - What Language Model Architecture and Pretraining Objective Work Best for
Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z) - Bridging Pre-trained Models and Downstream Tasks for Source Code
Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks.
We exploit semantic-preserving transformation to enrich downstream data diversity.
We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
arXiv Detail & Related papers (2021-12-04T07:21:28Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words"
Our approach significantly outperforms an encoder-only model in a data-poor regime; a minimal scoring sketch follows this list.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
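As an illustration of the document-ranking entry directly above, the sketch below scores a query-document pair with a sequence-to-sequence model that emits a relevance label as a target word, using the Hugging Face transformers T5 classes. The prompt template, the "true"/"false" label words, and the t5-base checkpoint are assumptions for illustration; a model actually fine-tuned on relevance pairs would be needed for the scores to be meaningful.
```python
# Minimal sketch: query-document relevance from a seq2seq model that emits a
# relevance label as a target word. Prompt template, label words, and the
# "t5-base" checkpoint are assumptions, not the paper's exact setup.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()


def relevance_score(query: str, document: str) -> float:
    text = f"Query: {query} Document: {document} Relevant:"
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    # Feed only the decoder start token and read the logits at the first
    # generated position, then compare the "true" vs. "false" label words.
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**enc, decoder_input_ids=start).logits[0, -1]
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()  # probability mass on "true" = relevance score
```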