Word Order Does Matter (And Shuffled Language Models Know It)
- URL: http://arxiv.org/abs/2203.10995v1
- Date: Mon, 21 Mar 2022 14:10:15 GMT
- Title: Word Order Does Matter (And Shuffled Language Models Know It)
- Authors: Vinit Ravishankar, Mostafa Abdou, Artur Kulmizev, Anders Søgaard
- Abstract summary: Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE.
We investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order.
- Score: 9.990431777927421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that language models pretrained and/or fine-tuned
on randomly permuted sentences exhibit competitive performance on GLUE, putting
into question the importance of word order information. Somewhat
counter-intuitively, some of these studies also report that position embeddings
appear to be crucial for models' good performance with shuffled text. We probe
these language models for word order information and investigate what position
embeddings learned from shuffled text encode, showing that these models retain
information pertaining to the original, naturalistic word order. We show this
is in part due to a subtlety in how shuffling is implemented in previous work
-- before rather than after subword segmentation. Surprisingly, we find that even
language models trained on text shuffled after subword segmentation retain some
semblance of information about word order because of the statistical
dependencies between sentence length and unigram probabilities. Finally, we
show that beyond GLUE, a variety of language understanding tasks do require
word order information, often to an extent that cannot be learned through
fine-tuning.
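To make the segmentation subtlety concrete, here is a minimal sketch (not the authors' code) contrasting the two shuffling regimes; the `segment` function is a hypothetical stand-in for a real subword tokenizer such as BPE or WordPiece.

```python
import random

def segment(word):
    # Hypothetical stand-in for a real subword tokenizer (e.g. BPE/WordPiece):
    # split every word into chunks of at most three characters.
    return [word[i:i + 3] for i in range(0, len(word), 3)]

def shuffle_before_segmentation(sentence, rng):
    # Permute whole words first, then segment: each word's subwords stay adjacent.
    words = sentence.split()
    rng.shuffle(words)
    return [piece for word in words for piece in segment(word)]

def shuffle_after_segmentation(sentence, rng):
    # Segment first, then permute the subword tokens themselves,
    # destroying word-internal order as well.
    pieces = [piece for word in sentence.split() for piece in segment(word)]
    rng.shuffle(pieces)
    return pieces

rng = random.Random(0)
sentence = "shuffled language models retain word order information"
print(shuffle_before_segmentation(sentence, rng))
print(shuffle_after_segmentation(sentence, rng))
```

Shuffling before segmentation leaves each word's subword sequence intact, which is part of how the probed models can still recover information about the original word order.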
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods such as typos and word order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
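As a rough illustration of such surface perturbations (a sketch under assumed details, not the paper's pipeline), the snippet below injects adjacent-character typos and then shuffles word order:

```python
import random

def typo(word, rng):
    # Swap two adjacent characters to simulate a simple keyboard typo.
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb(sentence, rng, typo_prob=0.3):
    # Apply random typos to some words, then shuffle the word order.
    words = [typo(w, rng) if rng.random() < typo_prob else w
             for w in sentence.split()]
    rng.shuffle(words)
    return " ".join(words)

rng = random.Random(1)
print(perturb("the quick brown fox jumps over the lazy dog", rng))
```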
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
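The summary does not describe the paper's method, so the following is only a toy greedy splitter over an assumed constituent vocabulary, included to make the decompounding task itself concrete:

```python
def decompound(word, vocab):
    # Toy greedy longest-prefix splitter: repeatedly strip the longest known
    # constituent; return the word unsplit if no full segmentation is found.
    parts, rest = [], word
    while rest:
        for end in range(len(rest), 0, -1):
            if rest[:end] in vocab:
                parts.append(rest[:end])
                rest = rest[end:]
                break
        else:
            return [word]
    return parts

vocab = {"book", "shelf", "sun", "flower", "butter", "fly"}
print(decompound("bookshelf", vocab))     # ['book', 'shelf']
print(decompound("sunflower", vocab))     # ['sun', 'flower']
print(decompound("notacompound", vocab))  # ['notacompound']
```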
arXiv Detail & Related papers (2023-05-23T16:32:27Z) - Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z) - Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - On the Difficulty of Segmenting Words with Attention [32.97060026226872]
We show, however, that even on monolingual data this approach is brittle.
In experiments with different input types, data sizes, and segmentation algorithms, only models trained to predict phones from words succeed in the task.
arXiv Detail & Related papers (2021-09-21T11:37:08Z) - Studying word order through iterative shuffling [14.530986799844873]
We show that word order encodes meaning essential to performing NLP benchmark tasks.
We use IBIS, a novel, efficient procedure that finds the ordering of a bag of words having the highest likelihood under a fixed language model.
We discuss how shuffling inference procedures such as IBIS can benefit language modeling and constrained generation.
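The summary leaves the IBIS procedure itself unspecified; the sketch below is merely a brute-force stand-in that scores every permutation of a small bag of words under an assumed log-likelihood function and keeps the best one (IBIS is designed to avoid exactly this exponential search):

```python
from itertools import permutations

def best_ordering(bag_of_words, log_likelihood):
    # Exhaustively score every ordering of a (small) bag of words and keep the
    # one with the highest log-likelihood under the given language model.
    best, best_score = None, float("-inf")
    for ordering in permutations(bag_of_words):
        score = log_likelihood(" ".join(ordering))
        if score > best_score:
            best, best_score = ordering, score
    return list(best), best_score

# Hypothetical scoring function: a tiny bigram table standing in for a real LM.
BIGRAM_LOGPROBS = {("the", "dog"): -1.0, ("dog", "barks"): -1.5}

def toy_log_likelihood(sentence):
    words = sentence.split()
    return sum(BIGRAM_LOGPROBS.get(pair, -5.0) for pair in zip(words, words[1:]))

print(best_ordering(["barks", "the", "dog"], toy_log_likelihood))
# (['the', 'dog', 'barks'], -2.5)
```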
arXiv Detail & Related papers (2021-09-10T13:27:06Z) - Does He Wink or Does He Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models [0.6091702876917281]
Recent progress in pretraining language models on large corpora has resulted in large performance gains on many NLP tasks.
To assess what kind of knowledge is acquired, language models are commonly probed by querying them with 'fill in the blank'-style cloze questions.
We introduce WDLMPro to evaluate word understanding directly using dictionary definitions of words.
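For context, the cloze-style probing mentioned above can be reproduced with the Hugging Face `fill-mask` pipeline (assuming `transformers` and a BERT checkpoint are available); note that WDLMPro itself instead asks models to match words to their dictionary definitions:

```python
from transformers import pipeline

# Cloze-style probe: ask a masked language model to fill in the blank.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("To wink is to briefly close one [MASK]."):
    print(f"{prediction['token_str']:>10}  {prediction['score']:.3f}")
```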
arXiv Detail & Related papers (2021-02-06T15:15:57Z) - Out of Order: How important is the sequential order of words in a sentence in Natural Language Understanding tasks? [34.18339528128342]
We find that state-of-the-art natural language understanding models don't care about word order when making predictions.
BERT-based models exploit superficial cues to make correct decisions when tokens are arranged in random orders.
Our work suggests that many GLUE tasks are not challenging machines to understand the meaning of a sentence.
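A quick way to get a feel for this finding (a sketch, not the paper's evaluation protocol) is to compare a classifier's prediction on a sentence with its prediction on a shuffled copy, again assuming `transformers` is installed:

```python
import random
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default English model

sentence = "the film was surprisingly good and the acting was excellent"
words = sentence.split()
random.Random(0).shuffle(words)
shuffled = " ".join(words)

# If the two predictions agree, word order contributed little to this decision.
print(classifier(sentence))
print(classifier(shuffled))
```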
arXiv Detail & Related papers (2020-12-30T14:56:12Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
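The summary only names the idea, so the sketch below shows one assumed way an output embedding could be composed from subword embeddings (a simple sum); the shapes, vocabulary, and composition function are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Embeddings exist only for a small, fixed subword vocabulary ...
subword_emb = {s: rng.normal(size=dim) for s in ["un", "break", "able", "do"]}

def word_output_embedding(subwords):
    # ... and a word's output embedding is composed (here: summed) from its
    # subwords, so the output layer does not grow with the word vocabulary.
    return np.sum([subword_emb[s] for s in subwords], axis=0)

hidden_state = rng.normal(size=dim)  # the model's final hidden state
for word, pieces in [("unbreakable", ["un", "break", "able"]),
                     ("undo", ["un", "do"])]:
    logit = float(hidden_state @ word_output_embedding(pieces))
    print(f"{word}: logit {logit:.3f}")
```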
arXiv Detail & Related papers (2020-09-24T07:21:14Z) - On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that overfit to the word order of the source language might fail to handle target languages with different word orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.