Visual Grounding of Learned Physical Models
- URL: http://arxiv.org/abs/2004.13664v2
- Date: Mon, 29 Jun 2020 15:13:21 GMT
- Title: Visual Grounding of Learned Physical Models
- Authors: Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins,
Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba
- Abstract summary: Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
- Score: 66.04898704928517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans intuitively recognize objects' physical properties and predict their
motion, even when the objects are engaged in complicated interactions. The
abilities to perform physical reasoning and to adapt to new environments, while
intrinsic to humans, remain challenging to state-of-the-art computational
models. In this work, we present a neural model that simultaneously reasons
about physics and makes future predictions based on visual and dynamics priors.
The visual prior predicts a particle-based representation of the system from
visual observations. An inference module operates on those particles,
predicting and refining estimates of particle locations, object states, and
physical parameters, subject to the constraints imposed by the dynamics prior,
which we refer to as visual grounding. We demonstrate the effectiveness of our
method in environments involving rigid objects, deformable materials, and
fluids. Experiments show that our model can infer the physical properties
within a few observations, which allows the model to quickly adapt to unseen
scenarios and make accurate predictions into the future.
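To make the grounding idea concrete, here is a minimal sketch of the refinement loop the abstract describes: a dynamics prior rolls the system forward, and an estimate of a physical parameter is adjusted until the rollout matches the observed particle positions. Everything below is illustrative rather than the paper's code: the dynamics prior is a hand-written point-mass simulator with a single unknown gravity parameter, whereas the paper uses learned neural priors over full particle sets.

    import numpy as np

    # Toy dynamics prior: a point particle falling under unknown gravity g.
    def rollout(g, x0=0.0, v0=0.0, steps=20, dt=0.05):
        xs, x, v = [], x0, v0
        for _ in range(steps):
            v += g * dt          # semi-implicit Euler integration
            x += v * dt
            xs.append(x)
        return np.array(xs)

    # "Observed" particle positions, generated with the true parameter.
    true_g = -9.8
    obs = rollout(true_g)

    def loss(g):
        # Discrepancy between the prior's rollout and the observations.
        return float(np.mean((rollout(g) - obs) ** 2))

    # Inference: refine the parameter estimate by gradient descent on the
    # rollout discrepancy (finite differences stand in for backprop).
    g_hat, lr, eps = -5.0, 1.0, 1e-4
    for _ in range(100):
        grad = (loss(g_hat + eps) - loss(g_hat - eps)) / (2 * eps)
        g_hat -= lr * grad

    print(f"estimated g = {g_hat:.2f} (true {true_g})")

In this toy setting the loss is quadratic in the parameter, so a few observed frames pin it down quickly, which mirrors the abstract's claim that physical properties can be inferred within a few observations.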
Related papers
- Compositional Physical Reasoning of Objects and Events from Videos [122.6862357340911]
This paper addresses the challenge of inferring hidden physical properties from objects' motion and interactions.
We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties.
We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties.
arXiv Detail & Related papers (2024-08-02T15:19:55Z)
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
- Learning Physical Dynamics for Object-centric Visual Prediction [7.395357888610685]
The ability to model the underlying dynamics of visual scenes and reason about the future is central to human intelligence.
This paper proposes an unsupervised object-centric prediction model that makes future predictions by learning visual dynamics between objects.
arXiv Detail & Related papers (2024-03-15T07:45:25Z)
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluate a range of AI models and find that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that combines particle-based physical dynamics models with recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
- Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties [100.19685489335828]
This work proposes a novel dataset and benchmark, termed Physion++, to rigorously evaluate visual physical prediction in artificial systems.
We test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability.
We evaluate a number of state-of-the-art prediction models spanning a range of learned versus built-in knowledge, and compare their performance to a set of human predictions.
arXiv Detail & Related papers (2023-06-27T17:59:33Z)
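The Physion++ entry above stresses online estimation of properties such as mass and friction while a scene unfolds. As an illustration only (none of this is from the paper), here is a minimal Bayesian-filtering sketch that updates a belief over a friction coefficient one observed frame at a time; the sliding-block dynamics and all names are made up for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Candidate friction values and a uniform log-prior over them.
    mus = np.linspace(0.0, 1.0, 101)
    log_post = np.zeros_like(mus)

    def next_velocity(v, mu, dt=0.1, g=9.8):
        # Sliding block: speed decays under kinetic friction, floored at 0.
        return max(v - mu * g * dt, 0.0)

    true_mu, sigma, v = 0.3, 0.05, 5.0
    for _ in range(30):                      # one update per observed frame
        v_true = next_velocity(v, true_mu)
        obs = v_true + rng.normal(0.0, sigma)            # noisy observation
        preds = np.array([next_velocity(v, mu) for mu in mus])
        log_post += -0.5 * ((obs - preds) / sigma) ** 2  # Gaussian likelihood
        v = v_true

    print("MAP friction estimate:", mus[np.argmax(log_post)])

The posterior sharpens as more frames arrive, which is the kind of online property inference the benchmark is designed to probe.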
- ComPhy: Compositional Physical Reasoning of Objects and Events from Videos [113.2646904729092]
The compositionality between visible and hidden properties poses unique challenges for AI models reasoning about the physical world.
Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interactions.
We propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution.
arXiv Detail & Related papers (2022-05-02T17:59:13Z)
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures the ability to predict how physical scenarios will unfold.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z)
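The Physion finding above points at graph-network dynamics models that operate directly on object states. Below is a minimal single-step sketch of that model family; the state layout, dimensions, and randomly initialized weights are placeholders, not anything from the benchmark or its baselines.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative per-object state: e.g. [x, y, vx, vy, mass].
    N, D_STATE, D_MSG = 4, 5, 8
    states = rng.normal(size=(N, D_STATE))

    # Random weights stand in for learned edge/node networks.
    W_edge = 0.1 * rng.normal(size=(2 * D_STATE, D_MSG))
    W_node = 0.1 * rng.normal(size=(D_STATE + D_MSG, D_STATE))

    def relu(x):
        return np.maximum(x, 0.0)

    def step(states):
        # Edge model: a message for every ordered pair of objects,
        # aggregated at the receiver by summation.
        msgs = np.zeros((N, D_MSG))
        for i in range(N):
            for j in range(N):
                if i != j:
                    pair = np.concatenate([states[i], states[j]])
                    msgs[i] += relu(pair @ W_edge)
        # Node model: residual update of each state from itself and
        # its aggregated incoming messages.
        inp = np.concatenate([states, msgs], axis=1)
        return states + relu(inp @ W_node)

    print(step(states).shape)   # next-state prediction, shape (4, 5)

Stacking such steps and training on state trajectories is the usual recipe; the Physion result suggests this family tracks human judgments best when the physical state is available.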
This list is automatically generated from the titles and abstracts of the papers on this site.