Do Language Embeddings Capture Scales?
- URL: http://arxiv.org/abs/2010.05345v3
- Date: Tue, 24 Nov 2020 07:25:41 GMT
- Title: Do Language Embeddings Capture Scales?
- Authors: Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, Dan Roth
- Abstract summary: We show that pretrained language models capture a significant amount of information about the scalar magnitudes of objects.
We identify contextual information in pre-training and numeracy as two key factors affecting their performance.
- Score: 54.1633257459927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained Language Models (LMs) have been shown to possess significant
linguistic, common sense, and factual knowledge. One form of knowledge that has
not been studied yet in this context is information about the scalar magnitudes
of objects. We show that pretrained language models capture a significant
amount of this information but are short of the capability required for general
common-sense reasoning. We identify contextual information in pre-training and
numeracy as two key factors affecting their performance and show that a simple
method of canonicalizing numbers can have a significant effect on the results.
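The "simple method of canonicalizing numbers" mentioned in the abstract is not spelled out here, so the following is only a minimal sketch of one plausible normalization, assuming a mantissa-exponent scheme; the function names and regular expression are illustrative, not the authors' code.

```python
import math
import re

def canonicalize_number(token: str) -> str:
    """Rewrite a raw numeral into a scale-preserving canonical form.

    Example: "6,000" -> "6.0e3", "0.02" -> "2.0e-2".
    The exact scheme used in the paper may differ; this is only illustrative.
    """
    try:
        value = float(token.replace(",", ""))
    except ValueError:
        return token  # not a parseable number; leave untouched
    if value == 0:
        return "0.0e0"
    exponent = math.floor(math.log10(abs(value)))
    mantissa = value / (10 ** exponent)
    return f"{mantissa:.1f}e{exponent}"

def canonicalize_text(text: str) -> str:
    """Replace every numeral in the text with its canonical form."""
    return re.sub(r"\d[\d,]*(?:\.\d+)?",
                  lambda m: canonicalize_number(m.group()), text)

if __name__ == "__main__":
    print(canonicalize_text("An adult elephant weighs around 6,000 kg, a mouse about 0.02 kg."))
    # -> "An adult elephant weighs around 6.0e3 kg, a mouse about 2.0e-2 kg."
```

Rewriting numerals this way exposes their order of magnitude as explicit tokens, which is one way the training text could make scale information easier for a model to pick up.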
Related papers
- Enhancing elusive clues in knowledge learning by contrasting attention of language models [19.37767409898751]
The paper proposes a method to enhance knowledge learning during language model pretraining.
We found that larger language models pay more attention to non-obvious but important clues, which are often overlooked by smaller language models.
arXiv Detail & Related papers (2024-09-26T15:30:54Z) - Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention [58.817405319722596]
This work compares the self-attention of several large language models (LLMs) in different sizes to assess the effect of scaling and instruction tuning on language perception.
Results show that scaling enhances the human resemblance and improves the effective attention by reducing the trivial pattern reliance, while instruction tuning does not.
We also find that the attention patterns of current LLMs are consistently closer to those of non-native than native speakers, suggesting sub-optimal language perception in all models.
arXiv Detail & Related papers (2023-10-29T17:16:40Z) - What does BERT learn about prosody? [1.1548853370822343]
We study whether prosody is part of the structural information of the language that models learn.
Our results show that information about prosodic prominence spans many layers but is concentrated in the middle layers, suggesting that BERT relies mostly on syntactic and semantic information.
arXiv Detail & Related papers (2023-04-25T10:34:56Z) - Large Language Models Can Be Easily Distracted by Irrelevant Context [29.315230178997002]
We investigate how model problem-solving accuracy can be influenced by irrelevant context.
We use a benchmark to measure the distractibility of cutting-edge prompting techniques for large language models.
arXiv Detail & Related papers (2023-01-31T20:48:57Z) - Crawling the Internal Knowledge-Base of Language Models [53.95793060766248]
We describe a procedure for "crawling" the internal knowledge-base of a language model.
We evaluate our approach on graphs crawled starting from dozens of seed entities.
arXiv Detail & Related papers (2023-01-30T12:03:36Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Probing Linguistic Information For Logical Inference In Pre-trained Language Models [2.4366811507669124]
We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations.
We find that pre-trained language models do encode several types of linguistic information useful for inference, though some types are only weakly encoded.
We also demonstrate language models' potential as semantic and background knowledge bases for supporting symbolic inference methods.
arXiv Detail & Related papers (2021-12-03T07:19:42Z) - Generated Knowledge Prompting for Commonsense Reasoning [53.88983683513114]
We propose generating knowledge statements directly from a language model with a generic prompt format; a minimal sketch of such a prompt appears after this list.
This approach improves performance of both off-the-shelf and finetuned language models on four commonsense reasoning tasks.
Notably, we find that a model's predictions can improve when using its own generated knowledge.
arXiv Detail & Related papers (2021-10-15T21:58:03Z) - On the Multilingual Capabilities of Very Large-Scale English Language Models [0.0]
Generative Pre-trained Transformers (GPTs) have recently been scaled to unprecedented sizes in the history of machine learning.
In this work, we investigate the multilingual skills of GPT-3, focusing on one language that barely appears in the pre-training corpus, Catalan.
We find that the model shows outstanding performance, particularly on generative tasks, with predictable limitations mostly on language understanding tasks, yet still remarkable results given the zero-shot scenario.
arXiv Detail & Related papers (2021-08-30T16:18:50Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
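To make the "generic prompt format" idea from the Generated Knowledge Prompting entry above concrete, here is a minimal, hypothetical sketch of how such prompts could be assembled; the prompt wording, demonstration pairs, and helper names are assumptions, and the call to an actual language model is left out.

```python
# Hypothetical sketch of "generate knowledge, then answer"; not the paper's exact prompts.

def build_knowledge_prompt(question: str, demonstrations: list[tuple[str, str]]) -> str:
    """Build a generic few-shot prompt that asks the model to state relevant facts."""
    lines = ["Generate some knowledge about the input."]
    for demo_question, demo_knowledge in demonstrations:
        lines.append(f"Input: {demo_question}")
        lines.append(f"Knowledge: {demo_knowledge}")
    lines.append(f"Input: {question}")
    lines.append("Knowledge:")
    return "\n".join(lines)

def answer_with_knowledge(question: str, knowledge: str) -> str:
    """Prepend the generated knowledge to the question before asking for an answer."""
    return f"{knowledge}\n{question}\nAnswer:"

if __name__ == "__main__":
    demos = [("Do penguins fly?", "Penguins are flightless birds.")]
    # This prompt would be sent to a language model to elicit a knowledge statement.
    print(build_knowledge_prompt("Is an elephant heavier than a mouse?", demos))
    # A second call would then combine the returned knowledge with the question.
    print(answer_with_knowledge("Is an elephant heavier than a mouse?",
                                "An adult elephant weighs several tonnes."))
```

The two-step structure (elicit knowledge, then condition the answer on it) is the part the entry describes; everything model-specific here is a placeholder.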
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.