Probing Physical Reasoning with Counter-Commonsense Context
- URL: http://arxiv.org/abs/2306.02258v1
- Date: Sun, 4 Jun 2023 04:24:43 GMT
- Title: Probing Physical Reasoning with Counter-Commonsense Context
- Authors: Kazushi Kondo, Saku Sugawara, Akiko Aizawa
- Abstract summary: This study investigates how physical commonsense affects the contextualized size comparison task.
The proposed dataset tests the ability of language models to predict the size relationship between objects under various contexts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we create a CConS (Counter-commonsense Contextual Size
comparison) dataset to investigate how physical commonsense affects the
contextualized size comparison task; the proposed dataset consists of both
contexts that fit physical commonsense and those that do not. This dataset
tests the ability of language models to predict the size relationship between
objects under various contexts generated from our curated noun list and
templates. We measure the ability of several masked language models and
generative models. The results show that while large language models can use prepositions such as "in" and "into" in the provided context to infer size relationships, they fail to use verbs and thus make incorrect judgments, misled by their prior physical commonsense.
Related papers
- Towards a theory of how the structure of language is acquired by deep neural networks [6.363756171493383] (2024-05-28)
We use a tree-like generative model that captures many of the hierarchical structures found in natural languages.
We show that token-token correlations can be used to build a representation of the grammar's hidden variables.
We conjecture that the relationship between training set size and effective range of correlations holds beyond our synthetic datasets.
- Context versus Prior Knowledge in Language Models [49.17879668110546] (2024-04-06)
Language models often need to integrate prior knowledge learned during pretraining and new information presented in context.
We propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity.
- Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486] (2023-10-23)
We present an extensive analysis of various benchmarks, including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum on which large language models fall behind, are comparable to, or exceed the ability of smaller models.
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161] (2023-05-29)
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
- Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese [4.941630596191806] (2023-05-23)
We propose a set of intrinsic evaluation tasks that inspect the linguistic information encoded in models developed for Brazilian Portuguese.
These tasks are designed to evaluate how different language models generalise information related to grammatical structures and multiword expressions.
- Context vs Target Word: Quantifying Biases in Lexical Semantic Datasets [18.754562380068815] (2021-12-13)
State-of-the-art contextualized models such as BERT use tasks such as WiC and WSD to evaluate their word-in-context representations.
This study presents the first quantitative analysis (using probing baselines) of the context-word interaction tested in major contextual lexical semantic tasks.
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179] (2021-07-22)
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by a structured distributional model (SDM).
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856] (2020-09-24)
A language model's vocabulary, typically selected before training and permanently fixed, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
This list is automatically generated from the titles and abstracts of the papers on this site.