CLEVRER-Humans: Describing Physical and Causal Events the Human Way
- URL: http://arxiv.org/abs/2310.03635v1
- Date: Thu, 5 Oct 2023 16:09:48 GMT
- Title: CLEVRER-Humans: Describing Physical and Causal Events the Human Way
- Authors: Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah D. Goodman, Jiajun Wu
- Abstract summary: We present the CLEVRER-Humans benchmark, a video dataset for causal judgment of physical events with human labels.
We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models.
- Score: 55.44915246065028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building machines that can reason about physical events and their causal
relationships is crucial for flexible interaction with the physical world.
However, most existing physical and causal reasoning benchmarks are exclusively
based on synthetically generated events and synthetic natural language
descriptions of causal relationships. This design brings up two issues. First,
there is a lack of diversity in both event types and natural language
descriptions; second, causal relationships based on manually-defined heuristics
are different from human judgments. To address both shortcomings, we present
the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of
physical events with human labels. We employ two techniques to improve data
collection efficiency: first, a novel iterative event cloze task to elicit a
new representation of events in videos, which we term Causal Event Graphs
(CEGs); second, a data augmentation technique based on neural language
generative models. We convert the collected CEGs into questions and answers to
be consistent with prior work. Finally, we study a collection of baseline
approaches for CLEVRER-Humans question-answering, highlighting the great
challenges set forth by our benchmark.
Related papers
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z) - Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering [30.000134835133522]
Event Causality Identification (DECI) aims to identify causal relations between two events in documents.
Recent research tends to use pre-trained language models to generate the event causal relations.
We propose a multi-task learning framework to enhance event causality identification with rationale and structure-aware causal question answering.
arXiv Detail & Related papers (2024-03-17T07:41:58Z) - Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs [61.796960984541464]
We present COM2 (COMplex COMmonsense), a new dataset created by sampling logical queries.
We verbalize them using handcrafted rules and large language models into multiple-choice and text generation questions.
Experiments show that language models trained on COM2 exhibit significant improvements in complex reasoning ability.
arXiv Detail & Related papers (2024-03-12T08:13:52Z) - Towards Causal Foundation Model: on Duality between Causal Inference and Attention [18.046388712804042]
We take a first step towards building causally-aware foundation models for treatment effect estimations.
We propose a novel, theoretically justified method called Causal Inference with Attention (CInA)
arXiv Detail & Related papers (2023-10-01T22:28:34Z) - JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human's Interactive Fiction (IF) gameplay walkthroughs.
Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge.
Experiments show that the introduced dataset is challenging to previous machine reading models as well as the new large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z) - A Benchmark for Modeling Violation-of-Expectation in Physical Reasoning
Across Event Categories [4.4920673251997885]
Violation-of-Expectation (VoE) is used to label scenes as either expected or surprising with knowledge of only expected scenes.
Existing VoE-based 3D datasets in physical reasoning provide mainly vision data with little to no-truths or inductive biases.
We set up a benchmark to study physical reasoning by curating a novel large-scale synthetic 3D VoE dataset armed with ground-truth labels of causally relevant features and rules.
arXiv Detail & Related papers (2021-11-16T22:59:25Z) - A Targeted Assessment of Incremental Processing in Neural LanguageModels
and Humans [2.7624021966289605]
We present a scaled-up comparison of incremental processing in humans and neural language models.
Data comes from a novel online experimental paradigm called the Interpolated Maze task.
We find that both humans and language models show increased processing difficulty in ungrammatical sentence regions.
arXiv Detail & Related papers (2021-06-06T20:04:39Z) - Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z) - ESPRIT: Explaining Solutions to Physical Reasoning Tasks [106.77019206219984]
ESPRIT is a framework for commonsense reasoning about qualitative physics in natural language.
Our framework learns to generate explanations of how the physical simulation will causally evolve so that an agent or a human can easily reason about a solution.
Human evaluations indicate that ESPRIT produces crucial fine-grained details and has high coverage of physical concepts compared to even human annotations.
arXiv Detail & Related papers (2020-05-02T07:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.