When does word order matter and when doesn't it?
- URL: http://arxiv.org/abs/2402.18838v2
- Date: Fri, 1 Mar 2024 17:40:04 GMT
- Title: When does word order matter and when doesn't it?
- Authors: Xuanda Chen and Timothy O'Donnell and Siva Reddy
- Abstract summary: Language models (LMs) may appear insensitive to word order changes in natural language understanding tasks.
We propose that linguistic redundancy can explain this phenomenon, whereby word order and other linguistic cues provide overlapping and thus redundant information.
We quantify how informative word order is using mutual information (MI) between unscrambled and scrambled sentences.
- Score: 31.092367724062644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) may appear insensitive to word order changes in natural
language understanding (NLU) tasks. In this paper, we propose that linguistic
redundancy can explain this phenomenon, whereby word order and other linguistic
cues such as case markers provide overlapping and thus redundant information.
Our hypothesis is that models exhibit insensitivity to word order when the
order provides redundant information, and the degree of insensitivity varies
across tasks. We quantify how informative word order is using mutual
information (MI) between unscrambled and scrambled sentences. Our results show
that the less informative word order is, the more consistent the model's
predictions are between unscrambled and scrambled sentences. We also find that
the effect varies across tasks: for some tasks, like SST-2, LMs' predictions
are almost always consistent with the original ones even when the
Pointwise-MI (PMI) changes, while for others, like RTE, consistency is near
random when the PMI gets lower, i.e., when word order is really important.
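The following is a toy calculation that makes the PMI idea concrete. It is not the authors' estimator. It assumes an explicit, finite distribution over sentences and a scrambler that applies a uniformly random permutation to the tokens. The function names and example sentences are hypothetical. Under these assumptions, every sentence with the same bag of words is equally likely to produce a given scramble. The PMI between a sentence and its scramble therefore depends only on how much probability mass shares its bag of words.

```python
import math
from collections import Counter
from typing import Mapping


def bag_of_words(sentence: str) -> frozenset:
    """Order-free view of a sentence: the multiset of its tokens."""
    return frozenset(Counter(sentence.split()).items())


def pmi_with_scramble(sentence: str, dist: Mapping[str, float]) -> float:
    """PMI in bits between `sentence` and a uniform random scramble of it.

    Toy assumptions (hypothetical, not the paper's method): sentences are drawn
    from the explicit distribution `dist`, and scrambling applies a uniformly
    random permutation. All sentences sharing a bag of words are then equally
    likely to yield any given scramble s', so
        p(s | s') = p(s) / P(bag(s))   and   PMI(s; s') = log2(1 / P(bag(s))).
    """
    target = bag_of_words(sentence)
    bag_mass = sum(p for s, p in dist.items() if bag_of_words(s) == target)
    return math.log2(1.0 / bag_mass)


# English: both readings share one bag of words, so a scramble is ambiguous.
english = {"the dog bit the man": 0.5, "the man bit the dog": 0.5}
# Hypothetical case-marked language: the markers already say who bit whom.
case_marked = {"dog-NOM bit man-ACC": 0.5, "man-NOM bit dog-ACC": 0.5}

print(pmi_with_scramble("the dog bit the man", english))      # 0.0
print(pmi_with_scramble("dog-NOM bit man-ACC", case_marked))  # 1.0
```

The case-marked sentence reaches the maximum possible PMI (1 bit, which equals -log2 p(s)). The scramble loses nothing, so word order is redundant with the case markers. The English sentence gets 0 bits: only word order distinguishes the two readings. This mirrors the abstract's claim that lower PMI means word order matters more.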
Related papers
- Word Order's Impacts: Insights from Reordering and Generation Analysis [9.0720895802828]
Existing works have studied the impacts of the order of words within natural text.
Considering these findings, different hypotheses about word order have been proposed.
ChatGPT relies on word order to make inferences, but cannot support or negate the redundancy relations between word order and lexical semantics.
(arXiv, 2024-03-18T04:45:44Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words.
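"Redundancy" here is an information-theoretic notion: the mutual information shared by the words and the prosody. The paper uses continuous prosodic features and LLM-based estimates. The sketch below instead assumes discrete, hypothetical per-word labels, a text feature and a prosodic label, and computes a simple plug-in estimate of their mutual information.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def plugin_mutual_information(pairs: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical aligned samples: (text feature, prosodic label) for each word.
redundant = [("punct", "pause"), ("none", "no-pause")] * 2
independent = [("punct", "pause"), ("punct", "no-pause"),
               ("none", "pause"), ("none", "no-pause")]

print(plugin_mutual_information(redundant))    # 1.0
print(plugin_mutual_information(independent))  # 0.0
```

Plug-in estimates are biased upward on small samples. This is one reason to prefer model-based estimates on real data.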
(arXiv, 2023-11-28T21:15:24Z)
- A Cross-Linguistic Pressure for Uniform Information Density in Word Order [79.54362557462359]
We use computational models to test whether real orders lead to greater information uniformity than counterfactual orders.
Among SVO languages, real word orders consistently have greater uniformity than reverse word orders.
Only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders.
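One common way to operationalize information uniformity is the variance of per-token surprisal. The lower the variance, the more evenly information is spread across the sentence. The sketch below assumes surprisal values are already available from some language model. The numbers are invented for illustration, and this is not necessarily the exact UID measure used in the paper.

```python
from statistics import pvariance
from typing import Sequence


def surprisal_variance(surprisals: Sequence[float]) -> float:
    """UID score for one sentence: population variance of per-token surprisal.

    Lower values mean information is spread more evenly across tokens.
    """
    return pvariance(surprisals)


# Invented surprisals (bits per token) for one sentence under two orders;
# both orders carry 16 bits in total, just distributed differently.
real_order = [4.0, 3.5, 4.5, 4.0]
reversed_order = [8.0, 1.0, 6.5, 0.5]

more_uniform = surprisal_variance(real_order) < surprisal_variance(reversed_order)
print(more_uniform)  # True
```

In a real comparison, these scores would be averaged over a corpus of sentences, once with the attested order and once with each counterfactual order.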
(arXiv, 2023-06-06T14:52:15Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
(arXiv, 2023-04-11T13:42:10Z)
- A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
(arXiv, 2022-10-19T10:06:03Z)
- Word Order Does Matter (And Shuffled Language Models Know It) [9.990431777927421]
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE.
We investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order.
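The following is a minimal sketch of how randomly permuted training sentences can be produced, assuming whitespace-tokenized input. The `n` parameter is an illustrative addition. With n=1 the function shuffles individual words. Larger values permute contiguous n-token chunks, which keeps local n-grams intact. This is not claimed to be the exact corruption used in the cited work.

```python
import random
from typing import List, Optional


def shuffle_sentence(tokens: List[str], n: int = 1,
                     seed: Optional[int] = None) -> List[str]:
    """Return a copy of `tokens` with contiguous n-token chunks randomly permuted.

    n=1 shuffles individual words; larger n keeps local n-grams intact while
    destroying the sentence's global order. The last chunk may be shorter.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)  # seeded for reproducible corruption
    chunks = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng.shuffle(chunks)
    return [tok for chunk in chunks for tok in chunk]


sentence = "the quick brown fox jumps over the lazy dog".split()
print(" ".join(shuffle_sentence(sentence, n=1, seed=0)))
print(" ".join(shuffle_sentence(sentence, n=3, seed=0)))
```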
(arXiv, 2022-03-21T14:10:15Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to measure the semantic distance between such highly overlapped texts with a mask-and-predict strategy.
We take the words in the longest common subsequence (LCS) as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed distance, Neighboring Distribution Divergence (NDD), to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
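The first step of this mask-and-predict strategy is to find the words the two texts share, in order. That is a longest-common-subsequence problem. The minimal token-level sketch below covers only the LCS step. It does not cover the masked-language-model prediction or the divergence computation.

```python
from typing import List


def longest_common_subsequence(a: List[str], b: List[str]) -> List[str]:
    """Classic O(len(a) * len(b)) dynamic program returning one LCS of tokens."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    out: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


a = "the cat sat on the mat".split()
b = "the dog sat on the mat".split()
print(longest_common_subsequence(a, b))  # ['the', 'sat', 'on', 'the', 'mat']
```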
(arXiv, 2021-10-04T03:59:15Z)
- Out of Order: How important is the sequential order of words in a sentence in Natural Language Understanding tasks? [34.18339528128342]
We find that state-of-the-art natural language understanding models don't care about word order when making predictions.
BERT-based models exploit superficial cues to make correct decisions when tokens are arranged in random orders.
Our work suggests that many GLUE tasks are not challenging machines to understand the meaning of a sentence.
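A simple way to quantify this insensitivity is to measure the fraction of word shuffles on which a classifier keeps its original prediction. This also matches the unscrambled/scrambled consistency notion in the main abstract. The two classifiers below are hypothetical toys standing in for a fine-tuned model. One is a bag-of-words scorer that cannot see order at all. The other is a rule that looks only at the first token.

```python
import random
from typing import Callable, List, Sequence

POSITIVE = {"good", "great", "fun"}
NEGATIVE = {"bad", "boring", "dull"}


def bag_of_words_sentiment(tokens: List[str]) -> str:
    # Hypothetical toy: counts polarity words and ignores their order entirely.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"


def first_word_rule(tokens: List[str]) -> str:
    # Hypothetical toy: deliberately order-sensitive, looks only at token 0.
    return "pos" if tokens[0] in POSITIVE else "neg"


def shuffle_consistency(model: Callable[[List[str]], str],
                        sentences: Sequence[List[str]],
                        n_perms: int = 10, seed: int = 0) -> float:
    """Fraction of (sentence, shuffle) pairs where the prediction on the
    shuffled tokens matches the prediction on the original order."""
    rng = random.Random(seed)
    agree = total = 0
    for tokens in sentences:
        original = model(tokens)
        for _ in range(n_perms):
            shuffled = tokens[:]  # copy so the input is not mutated
            rng.shuffle(shuffled)
            agree += int(model(shuffled) == original)
            total += 1
    return agree / total


reviews = [["great", "fun", "movie"], ["boring", "and", "dull", "plot"]]
print(shuffle_consistency(bag_of_words_sentiment, reviews))  # 1.0
print(shuffle_consistency(first_word_rule, reviews))
```

The bag-of-words scorer is consistent on every shuffle by construction. This is why high accuracy on shuffled inputs says little about whether a model uses word order.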
(arXiv, 2020-12-30T14:56:12Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that, since the models are not semantically grounded with world knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
(arXiv, 2020-12-27T06:19:20Z)
- Characterizing the Effect of Sentence Context on Word Meanings: Mapping Brain to Behavior [0.0]
This paper asks whether human subjects are aware of context-driven changes in word meaning and agree with them.
Subjects were asked to judge how much words changed from their generic meanings when used in specific sentences.
Results support the hypothesis that word meanings change systematically depending on sentence context.
(arXiv, 2020-07-27T20:12:30Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
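Under this view, I(T; R) = H(T) - H(T | R), where T is the linguistic property and R is the representation. A probe's cross-entropy upper-bounds H(T | R). Subtracting the probe's cross-entropy from H(T) therefore estimates the mutual information, and gives a lower bound on it when the true H(T) is used. The minimal sketch below assumes the probe's probabilities for the gold labels are already computed. It estimates H(T) from label frequencies, so its output is an estimate rather than a guaranteed bound.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def probe_mi_estimate(labels: Sequence[Hashable],
                      probe_probs_of_gold: Sequence[float]) -> float:
    """Estimate I(T; R) in bits as H(T) - CE(probe).

    labels: gold linguistic labels T, one per token.
    probe_probs_of_gold: probability the probe assigns to each gold label
        given that token's representation R.
    """
    if len(labels) != len(probe_probs_of_gold):
        raise ValueError("need one probe probability per label")
    if not all(0.0 < p <= 1.0 for p in probe_probs_of_gold):
        raise ValueError("probabilities must lie in (0, 1]")
    n = len(labels)
    # Plug-in entropy of the label distribution.
    h_t = -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
    # Average cross-entropy of the probe on the gold labels.
    cross_entropy = -sum(math.log2(p) for p in probe_probs_of_gold) / n
    return h_t - cross_entropy


tags = ["NOUN", "VERB", "NOUN", "VERB"]
print(probe_mi_estimate(tags, [1.0] * 4))  # 1.0: a perfect probe recovers H(T)
print(probe_mi_estimate(tags, [0.5] * 4))  # 0.0: the probe extracts nothing
```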
(arXiv, 2020-04-07T01:06:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.