ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
- URL: http://arxiv.org/abs/2205.01089v1
- Date: Mon, 2 May 2022 17:59:13 GMT
- Title: ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
- Authors: Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
- Abstract summary: The compositionality between the visible and hidden properties poses unique challenges for AI models reasoning about the physical world.
Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interaction.
We propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution.
- Score: 113.2646904729092
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Objects' motions in nature are governed by complex interactions and their
properties. While some properties, such as shape and material, can be
identified via the object's visual appearances, others like mass and electric
charge are not directly visible. The compositionality between the visible and
hidden properties poses unique challenges for AI models reasoning about the
physical world, whereas humans can effortlessly infer them from limited
observations. Existing studies on video reasoning mainly focus on visually
observable elements such as object appearance, movement, and contact
interaction. In this paper, we take an initial step to highlight the importance
of inferring the hidden physical properties not directly observable from visual
appearances, by introducing the Compositional Physical Reasoning (ComPhy)
dataset. For a given set of objects, ComPhy includes a few videos of them moving
and interacting under different initial conditions. The model is evaluated
based on its capability to unravel the compositional hidden properties, such as
mass and charge, and to use this knowledge to answer a set of questions posed about
one of the videos. Evaluation results of several state-of-the-art video
reasoning models on ComPhy show unsatisfactory performance as they fail to
capture these hidden properties. We further propose an oracle neural-symbolic
framework named Compositional Physics Learner (CPL), combining visual
perception, physical property learning, dynamic prediction, and symbolic
execution into a unified framework. CPL can effectively identify objects'
physical properties from their interactions and predict their dynamics to
answer questions.
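As a concrete illustration of the kind of pipeline the abstract describes, the sketch below infers a hidden property (relative mass) from velocity changes around a collision in a reference video, then executes a symbolic question program against the inferred property. This is a minimal toy under strong simplifying assumptions, not the authors' CPL implementation: perception is replaced by hand-written centroid tracks, and all names here (Track, infer_mass_ratio, execute) are hypothetical stand-ins for learned modules.

```python
# Illustrative sketch of a CPL-style neuro-symbolic loop (hypothetical, not
# the authors' code): perceive -> infer hidden property -> execute program.
from dataclasses import dataclass


@dataclass
class Track:
    object_id: int
    positions: list[tuple[float, float]]  # per-frame (x, y) centroids


def velocity(track: Track, frame: int) -> tuple[float, float]:
    (x0, y0), (x1, y1) = track.positions[frame], track.positions[frame + 1]
    return (x1 - x0, y1 - y0)


def speed(v: tuple[float, float]) -> float:
    return (v[0] ** 2 + v[1] ** 2) ** 0.5


def infer_mass_ratio(a: Track, b: Track, collision_frame: int) -> float:
    """Estimate m_a / m_b via momentum conservation across a collision:
    m_a * |dv_a| = m_b * |dv_b|, so the lighter object deflects more."""
    def dv(t: Track) -> tuple[float, float]:
        pre = velocity(t, collision_frame - 1)   # just before impact
        post = velocity(t, collision_frame + 1)  # just after impact
        return (post[0] - pre[0], post[1] - pre[1])
    return speed(dv(b)) / max(speed(dv(a)), 1e-9)


def execute(program: str, mass_ratio: float) -> str:
    """Toy symbolic executor over inferred properties; a real system would
    compile a natural-language question into such a program."""
    if program == "heavier(a, b)":
        return "yes" if mass_ratio > 1.0 else "no"
    raise ValueError(f"unsupported program: {program}")


# Toy reference video: object a barely slows down after impact while object b
# bounces back hard, so a should be inferred as the heavier object.
a = Track(0, [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.8, 0.0), (3.6, 0.0)])
b = Track(1, [(4.0, 0.0), (3.5, 0.0), (3.0, 0.0), (4.2, 0.0), (5.4, 0.0)])
ratio = infer_mass_ratio(a, b, collision_frame=2)
print(execute("heavier(a, b)", ratio))  # -> yes (ratio is about 8.5)
```

In ComPhy, the analogous inference must additionally hold across compositions of properties (e.g., mass and charge jointly shaping a trajectory), which is what makes the few-shot setting hard for current video reasoning models.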
Related papers
- Compositional Physical Reasoning of Objects and Events from Videos [122.6862357340911]
This paper addresses the challenge of inferring hidden physical properties from objects' motion and interactions.
We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties.
We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties.
arXiv Detail & Related papers (2024-08-02T15:19:55Z)
- Physical Property Understanding from Language-Embedded Feature Fields [27.151380830258603]
We present a novel approach for dense prediction of the physical properties of objects using a collection of images.
Inspired by how humans reason about physics through vision, we leverage large language models to propose candidate materials for each object.
Our method is accurate, annotation-free, and applicable to any object in the open world.
arXiv Detail & Related papers (2024-04-05T17:45:07Z)
- Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models [86.25460882547581]
We introduce the PHYsical Concepts Inference NEtwork (PHYCINE), a system that infers physical concepts in different abstract levels without supervision.
We show that object representations containing the discovered physical concepts variables could help achieve better performance in causal reasoning tasks.
arXiv Detail & Related papers (2023-03-03T11:52:21Z)
- CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering [50.61988087577871]
We introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene.
CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning.
Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties.
arXiv Detail & Related papers (2022-11-07T18:55:26Z)
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning [135.2892665079159]
We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations.
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
arXiv Detail & Related papers (2021-12-09T18:59:34Z)
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.