Word Order Matters when you Increase Masking
- URL: http://arxiv.org/abs/2211.04427v1
- Date: Tue, 8 Nov 2022 18:14:04 GMT
- Title: Word Order Matters when you Increase Masking
- Authors: Karim Lasri and Alessandro Lenci and Thierry Poibeau
- Abstract summary: We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
- Score: 70.29624135819884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word order, an essential property of natural languages, is injected in
Transformer-based neural language models using position encoding. However,
recent experiments have shown that explicit position encoding is not always
useful, since some models without such a feature managed to achieve state-of-the-art
performance on some tasks. To better understand this phenomenon, we examine
the effect of removing position encodings on the pre-training objective itself
(i.e., masked language modelling), to test whether models can reconstruct
position information from co-occurrences alone. We do so by controlling the
amount of masked tokens in the input sentence, as a proxy to affect the
importance of position information for the task. We find that the necessity of
position information increases with the amount of masking, and that masked
language models without position encodings are not able to reconstruct this
information on the task. These findings point towards a direct relationship
between the amount of masking and the ability of Transformers to capture
order-sensitive aspects of language using position encoding.
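The claim that a masked language model cannot recover order without position encodings can be illustrated with a small sketch. It is not the authors' experimental setup: the toy vocabulary, the random 4-dimensional embeddings, the identity query/key/value projections, and the names `self_attention` and `encode` are all assumptions made for illustration. The sketch relies only on the fact that bidirectional self-attention without position encodings is permutation-equivariant, so the representation at a masked slot is the same whichever way the surrounding words are ordered.

```python
import math
import random


def self_attention(x):
    """Single-head dot-product self-attention over a list of vectors.

    Assumption: identity query/key/value projections and no position
    encodings, so the layer only sees the multiset of input vectors.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product scores of this query against every input vector.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Attention-weighted average of the input vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) / z for j in range(d)])
    return out


random.seed(0)
# Hypothetical toy vocabulary with random embeddings.
vocab = ["dog", "bites", "man", "[MASK]"]
embed = {w: [random.gauss(0, 1) for _ in range(4)] for w in vocab}


def encode(tokens):
    """Embed the tokens (no position information added) and apply attention."""
    return self_attention([embed[t] for t in tokens])


# Two sentences with opposite word order and the middle word masked.
a = encode(["dog", "[MASK]", "man"])
b = encode(["man", "[MASK]", "dog"])

# The representation at the masked slot does not depend on the order.
same_at_mask = all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(a[1], b[1]))
# The other outputs are simply permuted along with their inputs.
permuted = all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(a[0], b[2]))
print(same_at_mask, permuted)
```

In this toy setting, any prediction made from the masked slot is identical for "dog [MASK] man" and "man [MASK] dog". A model of this kind can therefore only exploit which words co-occur, not where they occur.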
Related papers
- Contextual Position Encoding: Learning to Count What's Important [42.038277620194]
We propose a new position encoding method, Contextual Position Encoding (CoPE).
CoPE allows positions to be conditioned on context by incrementing position on certain tokens determined by the model.
We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail.
arXiv Detail & Related papers (2024-05-29T02:57:15Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- The Curious Case of Absolute Position Embeddings [65.13827063579728]
Transformer language models encode the notion of word order using positional information.
In natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not been investigated.
We observe that models trained with APE over-rely on positional information to the point that they break down when subjected to sentences with shifted position information.
arXiv Detail & Related papers (2022-10-23T00:00:04Z)
- Position Prediction as an Effective Pretraining Strategy [20.925906203643883]
We propose a novel but surprisingly simple alternative to content reconstruction: predicting locations from content, without providing positional information for it.
Our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods.
arXiv Detail & Related papers (2022-07-15T17:10:48Z)
- Transformer Language Models without Positional Encodings Still Learn Positional Information [45.42248458957122]
We find that transformer language models without any explicit positional encoding are still competitive with standard models.
We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position.
arXiv Detail & Related papers (2022-03-30T19:37:07Z)
- Disentangling Representations of Text by Masking Transformers [27.6903196190087]
We learn binary masks over transformer weights or hidden units to uncover subsets of features that correlate with a specific factor of variation.
We evaluate this method with respect to its ability to disentangle representations of sentiment from genre in movie reviews, "toxicity" from dialect in Tweets, and syntax from semantics.
arXiv Detail & Related papers (2021-04-14T22:45:34Z)
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation [63.195935452646815]
We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
arXiv Detail & Related papers (2020-10-06T13:27:01Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.