Word Order and World Knowledge
- URL: http://arxiv.org/abs/2403.00876v1
- Date: Fri, 1 Mar 2024 08:13:48 GMT
- Title: Word Order and World Knowledge
- Authors: Qinghua Zhao, Vinit Ravishankar, Nicolas Garneau and Anders Søgaard
- Abstract summary: We study how word order affects the induction of world knowledge from raw text using language models.
Specifically, in addition to the natural word order, we first extract texts in six fixed word orders from five languages.
- Score: 9.22384870426709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word order is an important concept in natural language, and in this work, we
study how word order affects the induction of world knowledge from raw text
using language models. We use word analogies to probe for such knowledge.
study how word order affects the induction of world knowledge from raw text
using language models. We use word analogies to probe for such knowledge.
Specifically, in addition to the natural word order, we first extract texts in
six fixed word orders from five languages and then pretrain
the language models on these texts. Finally, we analyze the experimental
results of the fixed word orders on word analogies and show that i) certain
fixed word orders consistently outperform or underperform others, though the
specifics vary across languages, and ii) the Wov2Lex hypothesis does not hold in
pre-trained language models, and the natural word order typically yields
mediocre results. The source code will be made publicly available at
https://github.com/lshowway/probing_by_analogy.
Related papers
- Word Order's Impacts: Insights from Reordering and Generation Analysis [9.0720895802828]
Existing works have studied the impact of word order within natural text.
Considering these findings, different hypotheses about word order are proposed.
ChatGPT relies on word order to infer, but cannot support or negate the redundancy relations between word order and lexical semantics.
arXiv Detail & Related papers (2024-03-18T04:45:44Z)
- A Cross-Linguistic Pressure for Uniform Information Density in Word Order [79.54362557462359]
We use computational models to test whether real orders lead to greater information uniformity than counterfactual orders.
Among SVO languages, real word orders consistently have greater uniformity than reverse word orders.
Only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders.
arXiv Detail & Related papers (2023-06-06T14:52:15Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Word Order Does Matter (And Shuffled Language Models Know It) [9.990431777927421]
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE.
We investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order.
arXiv Detail & Related papers (2022-03-21T14:10:15Z)
- Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words [50.11559460111882]
We explore the possibility of developing a BERT-style pretrained model over a vocabulary of words instead of wordpieces.
Results show that, compared to standard wordpiece-based BERT, WordBERT makes significant improvements on cloze test and machine reading comprehension.
Since the pipeline is language-independent, we train WordBERT for the Chinese language and obtain significant gains on five natural language understanding datasets.
arXiv Detail & Related papers (2022-02-24T15:15:48Z)
- Dict-BERT: Enhancing Language Model Pre-training with Dictionary [42.0998323292348]
Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora.
In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words in dictionaries.
We propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions.
arXiv Detail & Related papers (2021-10-13T04:29:14Z)
- On the Evolution of Word Order [7.2610922684683645]
We show that an optimal language is one with fixed word order.
We also show that adding information to the sentence, such as case markers and noun-verb distinction, reduces the need for fixed word order.
arXiv Detail & Related papers (2021-01-23T20:30:17Z)
- Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)