A Frustratingly Easy Improvement for Position Embeddings via Random
Padding
- URL: http://arxiv.org/abs/2305.04859v1
- Date: Mon, 8 May 2023 17:08:14 GMT
- Title: A Frustratingly Easy Improvement for Position Embeddings via Random
Padding
- Authors: Mingxu Tao and Yansong Feng and Dongyan Zhao
- Abstract summary: In this paper, we propose a simple but effective strategy, Random Padding, which requires no modification to existing pre-trained language models.
Experiments show that Random Padding can significantly improve model performance on instances whose answers are located at rear positions.
- Score: 68.75670223005716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Position embeddings, encoding the positional relationships among tokens in
text sequences, make great contributions to modeling local context features in
Transformer-based pre-trained language models. However, in Extractive Question
Answering, position embeddings trained on instances of varied context lengths
may not perform as well as we expect. Since the embeddings of rear positions are
updated fewer times than those of front positions, the rear ones may not
be properly trained. In this paper, we propose a simple but effective strategy,
Random Padding, which requires no modification to the architectures of existing
pre-trained language models. We adjust the token order of input sequences during
fine-tuning to balance the number of times each position embedding is updated.
Experiments show that Random Padding can significantly improve model
performance on instances whose answers are located at rear positions,
especially when models are trained on short contexts but evaluated on long
contexts. Our code and data will be released for future research.
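The abstract describes Random Padding only at a high level, so the following is a minimal sketch of one plausible reading, assuming the strategy moves a random number of the trailing [PAD] tokens to the front of each fine-tuning input so that the real tokens are pushed toward rear positions. The function name random_padding and the special handling of the [CLS] token are illustrative assumptions, not the authors' released code.

```python
import random

def random_padding(input_ids, pad_token_id, cls_token_id=None):
    # Count the trailing [PAD] tokens of an already padded sequence.
    n_pad = 0
    while n_pad < len(input_ids) and input_ids[-(n_pad + 1)] == pad_token_id:
        n_pad += 1
    if n_pad == 0:
        return list(input_ids)  # nothing to move

    # Pick how many pads to move to the front; the real tokens are then
    # shifted rightwards, so rear position embeddings also get updated.
    k = random.randint(0, n_pad)
    content = list(input_ids[:len(input_ids) - n_pad])

    if cls_token_id is not None and content and content[0] == cls_token_id:
        # Keep [CLS] at position 0 and insert the moved pads right after it.
        shifted = [content[0]] + [pad_token_id] * k + content[1:]
    else:
        shifted = [pad_token_id] * k + content

    # Remaining pads stay at the tail; the total length is unchanged.
    return shifted + [pad_token_id] * (n_pad - k)

# Toy example: 0 is [PAD], 101 is [CLS], 102 is [SEP].
print(random_padding([101, 7, 8, 9, 102, 0, 0, 0], pad_token_id=0, cls_token_id=101))
```

In a real extractive QA pipeline, the attention mask and the answer-span labels would have to be shifted by the same offset; at inference time no shifting would be applied.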
Related papers
- Unlocking the Transferability of Tokens in Deep Models for Tabular Data [67.11727608815636]
Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks.
In this paper, we propose TabToken, a method aimed at enhancing the quality of feature tokens.
We introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features.
arXiv Detail & Related papers (2023-10-23T17:53:09Z) - Extending Input Contexts of Language Models through Training on Segmented Sequences [34.42433279419559]
We develop a training procedure to extend the input context size of pretrained models with no architectural changes.
We demonstrate that our method can extend input contexts by a factor of four while improving perplexity.
arXiv Detail & Related papers (2023-10-23T07:13:31Z) - Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z) - Dynamic Position Encoding for Transformers [18.315954297959617]
Recurrent models dominated the field of neural machine translation (NMT) for years, but the Transformers that have since replaced them can fail to properly encode sequential/positional information due to their non-recurrent nature.
We propose a novel architecture with new position embeddings depending on the input text to address this shortcoming.
arXiv Detail & Related papers (2022-04-18T03:08:48Z) - Towards the Unseen: Iterative Text Recognition by Distilling from Errors [41.43280922432707]
Prior art mostly struggles with recognising unseen (or rarely seen) character sequences.
We put forward a novel framework to tackle this "unseen" problem.
Key to our success is a unique cross-modal variational autoencoder.
arXiv Detail & Related papers (2021-07-26T10:06:42Z) - Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word
Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z) - Locally Aware Piecewise Transformation Fields for 3D Human Mesh
Registration [67.69257782645789]
We propose piecewise transformation fields that learn 3D translation vectors to map any query point in posed space to its corresponding position in rest-pose space.
We show that fitting parametric models with poses by our network results in much better registration quality, especially for extreme poses.
arXiv Detail & Related papers (2021-04-16T15:16:09Z) - Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z) - Improve Transformer Models with Better Relative Position Embeddings [18.59434691153783]
Transformer architectures rely on explicit position encodings to preserve a notion of word order.
We argue that existing work does not fully utilize position information.
We propose new techniques that encourage increased interaction between query, key, and relative position embeddings; a generic sketch of relative position attention follows this list.
arXiv Detail & Related papers (2020-09-28T22:18:58Z)