A Frustratingly Easy Improvement for Position Embeddings via Random
Padding
- URL: http://arxiv.org/abs/2305.04859v1
- Date: Mon, 8 May 2023 17:08:14 GMT
- Title: A Frustratingly Easy Improvement for Position Embeddings via Random
Padding
- Authors: Mingxu Tao and Yansong Feng and Dongyan Zhao
- Abstract summary: In this paper, we propose a simple but effective strategy, Random Padding, which requires no modification to existing pre-trained language models.
Experiments show that Random Padding can significantly improve model performance on instances whose answers are located at rear positions.
- Score: 68.75670223005716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Position embeddings, encoding the positional relationships among tokens in
text sequences, make great contributions to modeling local context features in
Transformer-based pre-trained language models. However, in Extractive Question
Answering, position embeddings trained on instances of varied context lengths
may not perform as well as we expect. Since the embeddings of rear positions are
updated fewer times than those of front positions, the rear ones may not
be properly trained. In this paper, we propose a simple but effective strategy,
Random Padding, which requires no modification to the architectures of existing
pre-trained language models. We adjust the token order of input sequences during
fine-tuning to balance the number of times each position embedding is updated.
Experiments show that Random Padding can significantly improve model
performance on instances whose answers are located at rear positions,
especially when models are trained on short contexts but evaluated on long
contexts. Our code and data will be released for future research.
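The abstract describes Random Padding only at a high level, so the following is a minimal sketch of one plausible reading, assuming the strategy moves a random number of the trailing [PAD] tokens to the front of each fine-tuning input so that the real tokens are pushed toward rear positions. The function name random_padding and the special handling of the [CLS] token are illustrative assumptions, not the authors' released code.

```python
import random

def random_padding(input_ids, pad_token_id, cls_token_id=None):
    # Count the trailing [PAD] tokens of an already padded sequence.
    n_pad = 0
    while n_pad < len(input_ids) and input_ids[-(n_pad + 1)] == pad_token_id:
        n_pad += 1
    if n_pad == 0:
        return list(input_ids)  # nothing to move

    # Pick how many pads to move to the front; the real tokens are then
    # shifted rightwards, so rear position embeddings also get updated.
    k = random.randint(0, n_pad)
    content = list(input_ids[:len(input_ids) - n_pad])

    if cls_token_id is not None and content and content[0] == cls_token_id:
        # Keep [CLS] at position 0 and insert the moved pads right after it.
        shifted = [content[0]] + [pad_token_id] * k + content[1:]
    else:
        shifted = [pad_token_id] * k + content

    # Remaining pads stay at the tail; the total length is unchanged.
    return shifted + [pad_token_id] * (n_pad - k)

# Toy example: 0 is [PAD], 101 is [CLS], 102 is [SEP].
print(random_padding([101, 7, 8, 9, 102, 0, 0, 0], pad_token_id=0, cls_token_id=101))
```

In a real extractive QA pipeline, the attention mask and the answer-span labels would have to be shifted by the same offset; at inference time no shifting would be applied.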
Related papers
- Unlocking the Transferability of Tokens in Deep Models for Tabular Data [67.11727608815636]
Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks.
In this paper, we propose TabToken, a method aimed at enhancing the quality of feature tokens.
We introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features.
arXiv Detail & Related papers (2023-10-23T17:53:09Z) - Extending Input Contexts of Language Models through Training on Segmented Sequences [34.42433279419559]
We develop a training procedure to extend the input context size of pretrained models with no architectural changes.
We demonstrate that our method can extend input contexts by a factor of four while improving perplexity.
arXiv Detail & Related papers (2023-10-23T07:13:31Z) - Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z) - Dynamic Position Encoding for Transformers [18.315954297959617]
Recurrent models dominated the field of neural machine translation (NMT) for years, but the Transformers that have since replaced them can fail to properly encode sequential/positional information due to their non-recurrent nature.
We propose a novel architecture with new position embeddings depending on the input text to address this shortcoming.
arXiv Detail & Related papers (2022-04-18T03:08:48Z) - Towards the Unseen: Iterative Text Recognition by Distilling from Errors [41.43280922432707]
Prior art mostly struggles with recognising unseen (or rarely seen) character sequences.
We put forward a novel framework to tackle this "unseen" problem.
Key to our success is a unique cross-modal variational autoencoder.
arXiv Detail & Related papers (2021-07-26T10:06:42Z) - Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word
Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z) - Locally Aware Piecewise Transformation Fields for 3D Human Mesh
Registration [67.69257782645789]
We propose piecewise transformation fields that learn 3D translation vectors to map any query point in posed space to its corresponding position in rest-pose space.
We show that fitting parametric models with poses by our network results in much better registration quality, especially for extreme poses.
arXiv Detail & Related papers (2021-04-16T15:16:09Z) - Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z) - Improve Transformer Models with Better Relative Position Embeddings [18.59434691153783]
Transformer architectures rely on explicit position encodings to preserve a notion of word order.
We argue that existing work does not fully utilize position information.
We propose new techniques that encourage increased interaction between query, key, and relative position embeddings; a generic sketch of relative position attention follows this list.
arXiv Detail & Related papers (2020-09-28T22:18:58Z)