Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties
- URL: http://arxiv.org/abs/2306.15668v2
- Date: Thu, 2 Nov 2023 03:35:51 GMT
- Title: Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties
- Authors: Hsiao-Yu Tung, Mingyu Ding, Zhenfang Chen, Daniel Bear, Chuang Gan,
Joshua B. Tenenbaum, Daniel LK Yamins, Judith E Fan, Kevin A. Smith
- Abstract summary: This work proposes a novel dataset and benchmark, termed Physion++, to rigorously evaluate visual physical prediction in artificial systems.
We test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability.
We evaluate the performance of a number of state-of-the-art prediction models that span a variety of levels of learning vs. built-in knowledge, and compare that performance to a set of human predictions.
- Score: 100.19685489335828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General physical scene understanding requires more than simply localizing and
recognizing objects -- it requires knowledge that objects can have different
latent properties (e.g., mass or elasticity), and that those properties affect
the outcome of physical events. While there has been great progress in physical
and video prediction models in recent years, benchmarks to test their
performance typically do not require an understanding that objects have
individual physical properties, or at best test only those properties that are
directly observable (e.g., size or color). This work proposes a novel dataset
and benchmark, termed Physion++, that rigorously evaluates visual physical
prediction in artificial systems under circumstances where those predictions
rely on accurate estimates of the latent physical properties of objects in the
scene. Specifically, we test scenarios where accurate prediction relies on
estimates of properties such as mass, friction, elasticity, and deformability,
and where the values of those properties can only be inferred by observing how
objects move and interact with other objects or fluids. We evaluate the
performance of a number of state-of-the-art prediction models that span a
variety of levels of learning vs. built-in knowledge, and compare that
performance to a set of human predictions. We find that models that have been
trained using standard regimes and datasets do not spontaneously learn to make
inferences about latent properties, but also that models that encode objectness
and physical states tend to make better predictions. However, there is still a
huge gap between all models and human performance, and all models' predictions
correlate poorly with those made by humans, suggesting that no state-of-the-art
model is learning to make physical predictions in a human-like way. Project
page: https://dingmyu.github.io/physion_v2/
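The evaluation described above can be made concrete with a short sketch: models are scored on a binary prediction readout (e.g., whether two cued objects will come into contact), and their per-trial predictions are compared both to ground truth and to human responses. The snippet below is a minimal illustration, not the official Physion++ evaluation code; the data layout (per-trial model probabilities, binary ground-truth labels, and per-trial human "yes" rates) and the use of Pearson correlation as the human-model agreement measure are assumptions made for this example.

```python
import numpy as np

def evaluate_readout(model_probs, labels, human_yes_rate, threshold=0.5):
    """Score a model on a Physion++-style binary readout (illustrative only).

    model_probs    : (n_trials,) model probability that the queried physical
                     event occurs in each trial.
    labels         : (n_trials,) ground-truth 0/1 outcomes from the simulator.
    human_yes_rate : (n_trials,) fraction of participants answering "yes".
    Returns model accuracy, mean human accuracy, and the Pearson correlation
    between per-trial model probabilities and human response rates.
    """
    model_probs = np.asarray(model_probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    human_yes_rate = np.asarray(human_yes_rate, dtype=float)

    # Accuracy of thresholded model predictions against ground truth.
    model_acc = np.mean((model_probs >= threshold) == labels)

    # Human accuracy: fraction of responses matching the true outcome per trial.
    human_acc = np.mean(np.where(labels == 1, human_yes_rate, 1.0 - human_yes_rate))

    # Per-trial agreement between model confidence and human judgments.
    corr = np.corrcoef(model_probs, human_yes_rate)[0, 1]
    return model_acc, human_acc, corr

# Toy usage with made-up numbers.
print(evaluate_readout([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 1], [0.8, 0.3, 0.6, 0.7]))
```

Reporting the model-human correlation alongside raw accuracy supports the distinction drawn in the abstract: a model can be accurate without being human-like, and the abstract reports that current models fall short on both counts.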
Related papers
- Compositional Physical Reasoning of Objects and Events from Videos [122.6862357340911]
This paper addresses the challenge of inferring hidden physical properties from objects' motion and interactions.
We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties.
We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties.
arXiv Detail & Related papers (2024-08-02T15:19:55Z)
- Physical Property Understanding from Language-Embedded Feature Fields [27.151380830258603]
We present a novel approach for dense prediction of the physical properties of objects using a collection of images.
Inspired by how humans reason about physics through vision, we leverage large language models to propose candidate materials for each object.
Our method is accurate, annotation-free, and applicable to any object in the open world.
arXiv Detail & Related papers (2024-04-05T17:45:07Z)
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
- ComPhy: Compositional Physical Reasoning of Objects and Events from Videos [113.2646904729092]
The compositionality between visible and hidden properties poses unique challenges for AI models that reason about the physical world.
Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interaction.
We propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution.
arXiv Detail & Related papers (2022-05-02T17:59:13Z)
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures this capability.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z)
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences.