Out of Order: How important is the sequential order of words in a
sentence in Natural Language Understanding tasks?
- URL: http://arxiv.org/abs/2012.15180v1
- Date: Wed, 30 Dec 2020 14:56:12 GMT
- Title: Out of Order: How important is the sequential order of words in a
sentence in Natural Language Understanding tasks?
- Authors: Thang M. Pham, Trung Bui, Long Mai, Anh Nguyen
- Abstract summary: We find that state-of-the-art natural language understanding models don't care about word order when making predictions.
BERT-based models exploit superficial cues to make correct decisions when tokens are arranged in random orders.
Our work suggests that many GLUE tasks are not challenging machines to understand the meaning of a sentence.
- Score: 34.18339528128342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Do state-of-the-art natural language understanding models care about word
order - one of the most important characteristics of a sequence? Not always! We
found 75% to 90% of the correct predictions of BERT-based classifiers, trained
on many GLUE tasks, remain constant after input words are randomly shuffled.
Although BERT embeddings are famously contextual, the contribution of each
individual word to downstream tasks is almost unchanged even after the word's
context is shuffled. BERT-based models are able to exploit superficial cues
(e.g. the sentiment of keywords in sentiment analysis; or the word-wise
similarity between sequence-pair inputs in natural language inference) to make
correct decisions when tokens are arranged in random orders. Encouraging
classifiers to capture word order information improves the performance on most
GLUE tasks, SQuAD 2.0, and out-of-sample data. Our work suggests that many GLUE
tasks are not challenging machines to understand the meaning of a sentence.
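The shuffling experiment described in the abstract is straightforward to reproduce in miniature. The snippet below is a minimal sketch, not the authors' code: it assumes a Hugging Face sentiment classifier (`distilbert-base-uncased-finetuned-sst-2-english`, standing in for the paper's BERT-based GLUE classifiers), a naive whitespace-level word shuffle, and two made-up example sentences.

```python
import random
from transformers import pipeline

# Minimal sketch of the shuffling protocol: classify a sentence, shuffle its
# words, classify again, and check whether the predicted label changes.
# The checkpoint below is a stand-in classifier, not one trained by the authors.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its words in a random order."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

sentences = [
    "The movie was surprisingly good and I enjoyed every minute.",
    "A dull, lifeless plot that never goes anywhere.",
]

unchanged = 0
for sent in sentences:
    original = classifier(sent)[0]["label"]
    shuffled = classifier(shuffle_words(sent))[0]["label"]
    unchanged += original == shuffled
    print(f"{original} -> {shuffled} | {shuffle_words(sent)}")

print(f"Predictions unchanged after shuffling: {unchanged}/{len(sentences)}")
```

If most labels survive the shuffle, that mirrors the 75% to 90% figure reported in the abstract; the sketch only demonstrates the protocol, not the paper's models, datasets, or exact shuffling scheme.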
Related papers
- When does word order matter and when doesn't it? [31.092367724062644]
Language models (LMs) may appear insensitive to word order changes in natural language understanding tasks.
Linguistic redundancy can explain this phenomenon: word order and other linguistic cues provide overlapping and thus redundant information.
We quantify how informative word order is using mutual information (MI) between unscrambled and scrambled sentences.
arXiv Detail & Related papers (2024-02-29T04:11:10Z)
- Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features [19.261178173399784]
Our work studies word sensitivity (WS) in the prototypical setting of random features.
We show that attention layers enjoy high WS: there exists a vector in the embedding space that largely perturbs the random attention feature map.
We then translate these results on the word sensitivity into generalization bounds.
arXiv Detail & Related papers (2024-02-05T12:47:19Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by moving the task instructions to after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z)
- BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation [92.75908003533736]
We propose BLISS, a framework-level robust sequence-to-sequence learning approach based on self-supervised input representations.
We conduct comprehensive experiments to validate the effectiveness of BLISS on various tasks, including machine translation, grammatical error correction, and text summarization.
arXiv Detail & Related papers (2022-04-16T16:19:47Z)
- Word Order Does Matter (And Shuffled Language Models Know It) [9.990431777927421]
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE.
We investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order.
arXiv Detail & Related papers (2022-03-21T14:10:15Z)
- Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words [50.11559460111882]
We explore the possibility of developing a BERT-style pretrained model over a vocabulary of words instead of wordpieces.
Results show that, compared to standard wordpiece-based BERT, WordBERT makes significant improvements on cloze tests and machine reading comprehension.
Since the pipeline is language-independent, we also train WordBERT for Chinese and obtain significant gains on five natural language understanding datasets.
arXiv Detail & Related papers (2022-02-24T15:15:48Z)
- Studying word order through iterative shuffling [14.530986799844873]
We show that word order encodes meaning essential to performing NLP benchmark tasks.
We use IBIS, a novel, efficient procedure that finds the ordering of a bag of words having the highest likelihood under a fixed language model.
We discuss how shuffling inference procedures such as IBIS can benefit language modeling and constrained generation (a toy sketch of the ordering objective appears after this list).
arXiv Detail & Related papers (2021-09-10T13:27:06Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
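For the "Studying word order through iterative shuffling" entry above, the underlying objective is to pick the ordering of a bag of words with the highest likelihood under a fixed language model. The sketch below is a toy greedy search illustrating that objective only; it is not the IBIS procedure from the cited paper, and it assumes GPT-2 via Hugging Face as the fixed scoring model and a made-up six-word bag.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Toy illustration: order a bag of words to maximize likelihood under a fixed
# language model. This greedy search is NOT the IBIS procedure itself.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_likelihood(text: str) -> float:
    """Total log-likelihood of `text` under GPT-2, conditioned on the BOS token."""
    ids = tokenizer(tokenizer.bos_token + text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token;
    # multiply by the number of predicted tokens to recover a total.
    return -out.loss.item() * (ids.shape[1] - 1)

def greedy_order(bag: list[str]) -> list[str]:
    """Greedily append whichever remaining word gives the most likely prefix."""
    remaining, ordered = list(bag), []
    while remaining:
        best = max(remaining, key=lambda w: log_likelihood(" ".join(ordered + [w])))
        ordered.append(best)
        remaining.remove(best)
    return ordered

bag = ["mat", "the", "cat", "sat", "on", "the"]
print(" ".join(greedy_order(bag)))  # e.g. "the cat sat on the mat" (not guaranteed)
```

A greedy left-to-right search is only a crude stand-in: the cited paper describes IBIS as an efficient iterative procedure, and stronger inference schemes would typically recover better orderings.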
This list is automatically generated from the titles and abstracts of the papers on this site.