Reading Books is Great, But Not if You Are Driving! Visually Grounded
Reasoning about Defeasible Commonsense Norms
- URL: http://arxiv.org/abs/2310.10418v2
- Date: Sat, 11 Nov 2023 13:40:27 GMT
- Title: Reading Books is Great, But Not if You Are Driving! Visually Grounded
Reasoning about Defeasible Commonsense Norms
- Authors: Seungju Han and Junhyeok Kim and Jack Hessel and Liwei Jiang and Jiwan
Chung and Yejin Son and Yejin Choi and Youngjae Yu
- Abstract summary: We construct a new benchmark for studying visual-grounded commonsense norms: NORMLENS.
We find that state-of-the-art model judgments and explanations are not well-aligned with human annotation.
We present a new approach to better align models with humans by distilling social commonsense knowledge from large language models.
- Score: 65.17491295329991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonsense norms are defeasible by context: reading books is usually great,
but not when driving a car. While contexts can be explicitly described in
language, in embodied scenarios, contexts are often provided visually. This
type of visually grounded reasoning about defeasible commonsense norms is
generally easy for humans, but (as we show) poses a challenge for machines, as
it necessitates both visual understanding and reasoning about commonsense
norms. We construct a new multimodal benchmark for studying visual-grounded
commonsense norms: NORMLENS. NORMLENS consists of 10K human judgments
accompanied by free-form explanations covering 2K multimodal situations, and
serves as a probe to address two questions: (1) to what extent can models align
with average human judgment? and (2) how well can models explain their
predicted judgments? We find that state-of-the-art model judgments and
explanations are not well-aligned with human annotation. Additionally, we
present a new approach to better align models with humans by distilling social
commonsense knowledge from large language models. The data and code are
released at https://seungjuhan.me/normlens.
Related papers
- One Thousand and One Pairs: A "novel" challenge for long-context language models [56.60667988954638]
NoCha is a dataset of 1,001 pairs of true and false claims about 67 fictional books.
Our annotators confirm that the largest share of pairs in NoCha require global reasoning over the entire book to verify.
On average, models perform much better on pairs that require only sentence-level retrieval vs. global reasoning.
arXiv Detail & Related papers (2024-06-24T02:03:57Z) - Are language models rational? The case of coherence norms and belief revision [63.78798769882708]
We consider logical coherence norms as well as coherence norms tied to the strength of belief in language models.
We argue that rational norms tied to coherence do apply to some language models, but not to others.
arXiv Detail & Related papers (2024-06-05T16:36:21Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Language Models Understand Us, Poorly [0.0]
I investigate three views of human language understanding: as-mapping, as-reliability and as-representation.
I argue that while behavioral reliability is necessary for understanding, internal representations are sufficient.
We need work which probes model internals, adds more of human language, and measures what models can learn.
arXiv Detail & Related papers (2022-10-19T15:58:59Z) - NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning [59.16962123636579]
This paper proposes a new take on Prolog-based inference engines.
We replace handcrafted rules with a combination of neural language modeling, guided generation, and semi dense retrieval.
Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA.
arXiv Detail & Related papers (2022-09-16T00:54:44Z) - Machine Reading, Fast and Slow: When Do Models "Understand" Language? [59.897515617661874]
We investigate the behavior of reading comprehension models with respect to two linguistic'skills': coreference resolution and comparison.
We find that for comparison (but not coreference) the systems based on larger encoders are more likely to rely on the 'right' information.
arXiv Detail & Related papers (2022-09-15T16:25:44Z) - Norm Participation Grounds Language [16.726800816202033]
I propose a different, and more wide-ranging conception of how grounding should be understood: What grounds language is its normative nature.
There are standards for doing things right, these standards are public and authoritative, while at the same time acceptance of authority can be disputed and negotiated.
What grounds language, then, is the determined use that language users make of it, and what it is grounded in is the community of language users.
arXiv Detail & Related papers (2022-06-06T20:21:59Z) - Reframing Human-AI Collaboration for Generating Free-Text Explanations [46.29832336779188]
We consider the task of generating free-text explanations using a small number of human-written examples.
We find that crowdworkers often prefer explanations generated by GPT-3 to crowdsourced human-written explanations.
We create a pipeline that combines GPT-3 with a supervised filter that incorporates humans-in-the-loop via binary acceptability judgments.
arXiv Detail & Related papers (2021-12-16T07:31:37Z) - Delphi: Towards Machine Ethics and Norms [38.8316885346292]
We identify four underlying challenges towards machine ethics and norms.
Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning.
We present Commonsense Norm Bank, a moral textbook customized for machines.
arXiv Detail & Related papers (2021-10-14T17:38:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.