Transformer-F: A Transformer network with effective methods for learning universal sentence representation
- URL: http://arxiv.org/abs/2107.00653v1
- Date: Fri, 2 Jul 2021 03:20:11 GMT
- Title: Transformer-F: A Transformer network with effective methods for learning universal sentence representation
- Authors: Yu Shi
- Abstract summary: The Transformer model is widely used in natural language processing for sentence representation.
In this paper, two approaches are introduced to improve the performance of Transformers.
- Score: 8.225067988604351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer model is widely used in natural language processing for
sentence representation. However, previous Transformer-based models focus on
function words, which carry limited meaning in most cases, and can only extract
high-level semantic abstraction features. In this paper, two approaches are
introduced to improve the performance of Transformers. We calculate the
attention score by multiplying the part-of-speech weight vector with the
correlation coefficient, which helps extract the words with more practical
meaning. The weight vector is obtained from the input text sequence based on the
importance of each part of speech. Furthermore, we fuse the features of each
layer to make the sentence representation more comprehensive and accurate. In
experiments, we demonstrate the effectiveness of our model Transformer-F on
three standard text classification datasets. Experimental results show that the
proposed model significantly boosts text classification performance compared to
the baseline model. Specifically, we obtain a 5.28% relative improvement over
the vanilla Transformer on the simple tasks.
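The two ideas described in the abstract can be illustrated with a short, self-contained sketch. This is a minimal illustration based only on the abstract, not the authors' released code: the POS weight table, the helper names (pos_weighted_attention, fuse_layers), and the mean-pooling fusion are assumptions chosen for clarity.

```python
# Minimal sketch (NumPy) of (1) part-of-speech-weighted attention scores and
# (2) fusing per-layer features into one sentence vector. Names and weights
# are hypothetical; they mirror the abstract's description, not the paper's code.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Assumed part-of-speech importance: content words weigh more than function words.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.9, "ADJ": 0.8, "ADV": 0.7,
              "PRON": 0.3, "DET": 0.2, "ADP": 0.2, "PUNCT": 0.1}

def pos_weighted_attention(Q, K, V, pos_tags):
    """Scaled dot-product attention whose score matrix (the 'correlation
    coefficient') is rescaled by a per-token part-of-speech weight."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # (n, n) token correlations
    w = np.array([POS_WEIGHT.get(t, 0.5) for t in pos_tags])
    scores = scores * w[None, :]                        # emphasize content words
    return softmax(scores, axis=-1) @ V                 # (n, d)

def fuse_layers(layer_outputs):
    """Fuse per-layer features: mean-pool tokens within each layer, then average
    across layers (one simple way to combine low- and high-level features)."""
    return np.mean([h.mean(axis=0) for h in layer_outputs], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens, dim, n_layers = 5, 8, 3
    tags = ["DET", "NOUN", "VERB", "ADJ", "PUNCT"]
    H = rng.normal(size=(tokens, dim))
    layers = []
    for _ in range(n_layers):
        H = pos_weighted_attention(H, H, H, tags)
        layers.append(H)
    sentence_vec = fuse_layers(layers)
    print(sentence_vec.shape)                           # (8,) fused sentence vector
```

In the paper itself the weighting and fusion are part of the trained model; the fixed weight table and simple mean fusion above only stand in for those learned components.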
Related papers
- Differential Transformer [99.5117269150629]
Transformer tends to overallocate attention to irrelevant context.
We introduce Diff Transformer, which amplifies attention to relevant context while canceling noise.
It offers notable advantages in practical applications, such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers.
arXiv Detail & Related papers (2024-10-07T17:57:38Z)
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Enhanced Transformer Architecture for Natural Language Processing [2.6071653283020915]
The Transformer is a state-of-the-art model in the field of natural language processing (NLP).
In this paper, a novel structure of Transformer is proposed. It is featured by full layer normalization, weighted residual connection, positional encoding exploiting reinforcement learning, and zero masked self-attention.
The proposed Transformer model, which is called Enhanced Transformer, is validated by the bilingual evaluation understudy (BLEU) score obtained with the Multi30k translation dataset.
arXiv Detail & Related papers (2023-10-17T01:59:07Z)
- When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants [39.00433193973159]
We present the first unified study of the efficiency of self-attention-based Transformer variants spanning text, speech and vision.
We identify input length thresholds (tipping points) at which efficient Transformer variants become more efficient than vanilla models.
To conduct this analysis for speech, we introduce L-HuBERT, a novel local-attention variant of a self-supervised speech model.
arXiv Detail & Related papers (2023-06-14T17:59:02Z)
- A Length-Extrapolatable Transformer [98.54835576985664]
We focus on length extrapolation, i.e., training on short texts while evaluating longer sequences.
We introduce a relative position embedding to explicitly maximize attention resolution.
We evaluate different Transformer variants with language modeling.
arXiv Detail & Related papers (2022-12-20T18:56:20Z)
- Transformer over Pre-trained Transformer for Neural Text Segmentation with Enhanced Topic Coherence [6.73258176462356]
The proposed model, Transformer$^2$, consists of two components: bottom-level sentence encoders using pre-trained transformers, and an upper-level transformer-based segmentation model that operates on the sentence embeddings.
Our experiments show that Transformer$^2$ manages to surpass state-of-the-art text segmentation models in terms of a commonly used semantic coherence measure.
arXiv Detail & Related papers (2021-10-14T05:26:39Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Fastformer: Additive Attention Can Be All You Need [51.79399904527525]
We propose Fastformer, which is an efficient Transformer model based on additive attention.
In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use an additive attention mechanism to model global contexts.
In this way, Fastformer can achieve effective context modeling with linear complexity.
arXiv Detail & Related papers (2021-08-20T09:44:44Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.