CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
- URL: http://arxiv.org/abs/2012.04293v1
- Date: Tue, 8 Dec 2020 09:11:32 GMT
- Title: CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
- Authors: Tayfun Ates, Muhammed Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert
Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret
- Abstract summary: CRAFT is a new visual question answering dataset that requires causal reasoning about physical forces and object interactions.
It contains 38K video and question pairs that are generated from 3K videos from 10 different virtual environments.
Inspired by the theory of force dynamics from the field of human cognitive psychology, we introduce new question categories that involve understanding the intentions of objects.
- Score: 11.078508605894411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Artificial Intelligence and deep learning have revived the
interest in studying the gap between the reasoning capabilities of humans and
machines. In this ongoing work, we introduce CRAFT, a new visual question
answering dataset that requires causal reasoning about physical forces and
object interactions. It contains 38K video and question pairs that are
generated from 3K videos from 10 different virtual environments, each
containing a varying number of objects in motion that interact with each
other. Two of the question categories in CRAFT cover previously studied
descriptive and counterfactual questions. In addition, inspired by the theory of force dynamics
from the field of human cognitive psychology, we introduce new question
categories that involve understanding the intentions of objects through the
notions of cause, enable, and prevent. Our preliminary results demonstrate that
even though these tasks are very intuitive for humans, the implemented
baselines could not cope with the underlying challenges.
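To make the question taxonomy above concrete, the following is a minimal, hypothetical sketch (in Python) of how a single CRAFT-style question record could be represented. The field names and example questions are illustrative assumptions, not the dataset's actual schema; the abstract only specifies the dataset size (38K video-question pairs, 3K videos, 10 virtual environments) and the question categories (descriptive, counterfactual, and the force-dynamics categories cause, enable, prevent).
```python
# Hypothetical sketch of a CRAFT-style VQA record; field names and example
# questions are illustrative assumptions, not the dataset's actual format.
from dataclasses import dataclass

# The abstract names these question categories explicitly.
QUESTION_CATEGORIES = {
    "descriptive",     # e.g. "How many objects collide with the cube?"
    "counterfactual",  # e.g. "Would the ball reach the basket without the ramp?"
    "cause",           # force-dynamics categories introduced by CRAFT
    "enable",
    "prevent",
}

@dataclass
class CraftQuestion:
    video_id: str      # one of roughly 3K simulated videos
    environment: int   # one of 10 virtual environments
    category: str      # must be one of QUESTION_CATEGORIES
    question: str
    answer: str

    def __post_init__(self) -> None:
        if self.category not in QUESTION_CATEGORIES:
            raise ValueError(f"unknown question category: {self.category}")
```
Organizing records by category in this way would make it straightforward to report baseline accuracy separately for descriptive, counterfactual, and force-dynamics questions.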
Related papers
- Compositional Physical Reasoning of Objects and Events from Videos [122.6862357340911]
This paper addresses the challenge of inferring hidden physical properties from objects' motion and interactions.
We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties.
We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties.
arXiv Detail & Related papers (2024-08-02T15:19:55Z)
- Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries [91.70689724416698]
We present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources.
Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset.
arXiv Detail & Related papers (2024-05-30T17:55:28Z)
- STAR: A Benchmark for Situated Reasoning in Real-World Videos [94.78038233351758]
This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos.
The dataset includes four types of questions, including interaction, sequence, prediction, and feasibility.
We propose a diagnostic neuro-symbolic model that can disentangle visual perception, situation abstraction, language understanding, and functional reasoning.
arXiv Detail & Related papers (2024-05-15T21:53:54Z)
- BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind [21.806678376095576]
Theory of mind (ToM) can make AI more closely resemble human thought processes.
Video question answering (VideoQA) datasets focus on studying causal reasoning within events, but few of them genuinely incorporate human ToM.
This paper presents BDIQA, the first benchmark to explore the cognitive reasoning capabilities of VideoQA models in the context of ToM.
arXiv Detail & Related papers (2024-02-12T04:34:19Z)
- ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos [53.92440577914417]
ACQUIRED consists of 3.9K annotated videos, encompassing a wide range of event types and incorporating both first and third-person viewpoints.
Each video is annotated with questions that span three distinct dimensions of reasoning, including physical, social, and temporal.
We benchmark our dataset against several state-of-the-art language-only and multimodal models and experimental results demonstrate a significant performance gap.
arXiv Detail & Related papers (2023-11-02T22:17:03Z)
- CLEVRER-Humans: Describing Physical and Causal Events the Human Way [55.44915246065028]
We present the CLEVRER-Humans benchmark, a video dataset for causal judgment of physical events with human labels.
We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models.
arXiv Detail & Related papers (2023-10-05T16:09:48Z)
- EgoTaskQA: Understanding Human Tasks in Egocentric Videos [89.9573084127155]
The EgoTaskQA benchmark provides a home for crucial dimensions of task understanding through question answering on real-world egocentric videos.
We meticulously design questions that target the understanding of (1) action dependencies and effects, (2) intents and goals, and (3) agents' beliefs about others.
We evaluate state-of-the-art video reasoning models on our benchmark and show the significant gap between them and humans in understanding complex goal-oriented egocentric videos.
arXiv Detail & Related papers (2022-10-08T05:49:05Z)
- Understanding the computational demands underlying visual reasoning [10.308647202215708]
We systematically assess the ability of modern deep convolutional neural networks to learn to solve visual reasoning problems.
Our analysis leads to a novel taxonomy of visual reasoning tasks, which can be primarily explained by the type of relations and the number of relations used to compose the underlying rules.
arXiv Detail & Related papers (2021-08-08T10:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.