Physion: Evaluating Physical Prediction from Vision in Humans and
Machines
- URL: http://arxiv.org/abs/2106.08261v2
- Date: Thu, 17 Jun 2021 17:20:27 GMT
- Title: Physion: Evaluating Physical Prediction from Vision in Humans and
Machines
- Authors: Daniel M. Bear, Elias Wang, Damian Mrowca, Felix J. Binder, Hsiao-Yu
Fish Tung, R.T. Pramod, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan-Yun
Sun, Li Fei-Fei, Nancy Kanwisher, Joshua B. Tenenbaum, Daniel L.K. Yamins,
Judith E. Fan
- Abstract summary: We present a visual and physical prediction benchmark that precisely measures this capability.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
- Score: 46.19008633309041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While machine learning algorithms excel at many challenging visual tasks, it
is unclear whether they can make predictions about commonplace real-world physical
events. Here, we present a visual and physical prediction benchmark that
precisely measures this capability. In realistically simulating a wide variety
of physical phenomena -- rigid and soft-body collisions, stable multi-object
configurations, rolling and sliding, projectile motion -- our dataset presents
a more comprehensive challenge than existing benchmarks. Moreover, we have
collected human responses for our stimuli so that model predictions can be
directly compared to human judgments. We compare an array of algorithms --
varying in their architecture, learning objective, input-output structure, and
training data -- on their ability to make diverse physical predictions. We find
that graph neural networks with access to the physical state best capture human
behavior, whereas among models that receive only visual input, those with
object-centric representations or pretraining do best but fall far short of
human accuracy. This suggests that extracting physically meaningful
representations of scenes is the main bottleneck to achieving human-like visual
prediction. We thus demonstrate how our benchmark can identify areas for
improvement and measure progress on this key aspect of physical understanding.
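To make the comparison protocol concrete: for each stimulus, a model's graded prediction can be scored against both the simulated outcome and the aggregate human judgment. Below is a minimal sketch of such an evaluation; the array layout and the choice of Pearson correlation as the human-model agreement measure are illustrative assumptions, not the paper's exact metrics.

```python
import numpy as np

def evaluate_predictions(model_probs, ground_truth, human_rates):
    """Score per-stimulus binary outcome predictions (hypothetical format).

    model_probs  : (n_stimuli,) model probability that the queried event occurs
    ground_truth : (n_stimuli,) 1 if the event occurred in the simulation
    human_rates  : (n_stimuli,) fraction of participants who answered "yes"
    """
    model_choices = (model_probs >= 0.5).astype(int)
    accuracy = (model_choices == ground_truth).mean()
    # Agreement: correlate graded model confidence with graded human
    # response rates across stimuli (one illustrative choice of metric).
    agreement = np.corrcoef(model_probs, human_rates)[0, 1]
    return {"accuracy": float(accuracy), "human_model_corr": float(agreement)}

# Toy usage with fabricated values for three stimuli.
print(evaluate_predictions(
    model_probs=np.array([0.9, 0.2, 0.6]),
    ground_truth=np.array([1, 0, 0]),
    human_rates=np.array([0.85, 0.10, 0.55]),
))
```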
Related papers
- Identifying Terrain Physical Parameters from Vision -- Towards Physical-Parameter-Aware Locomotion and Navigation [33.10872127224328]
We propose a cross-modal self-supervised learning framework for vision-based environmental physical parameter estimation.
We train a physical decoder in simulation to predict friction and stiffness from multi-modal input.
The trained decoder then labels real-world images with physical parameters in a self-supervised manner, and these labels are used to further train a visual network during deployment.
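A loose sketch of that deployment loop is below; the modules, feature dimensions, and training step are hypothetical stand-ins, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a decoder trained in simulation to map fused
# multi-modal features to physical parameters (friction, stiffness), and a
# visual network trained on its pseudo-labels during deployment.
physical_decoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
visual_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(visual_net.parameters(), lr=1e-3)

def self_supervised_step(fused_features, image_features):
    # 1) Label real-world samples with the simulation-trained decoder.
    with torch.no_grad():
        pseudo_labels = physical_decoder(fused_features)  # (batch, 2)
    # 2) Train the visual network to regress those labels from image features.
    pred = visual_net(image_features)
    loss = nn.functional.mse_loss(pred, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(self_supervised_step(torch.randn(8, 64), torch.randn(8, 128)))
```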
arXiv Detail & Related papers (2024-08-29T14:35:14Z)
- Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties [100.19685489335828]
This work proposes a novel dataset and benchmark, termed Physion++, to rigorously evaluate visual physical prediction in artificial systems.
We test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability.
We evaluate a number of state-of-the-art prediction models spanning a range of learned versus built-in knowledge, and compare their performance to a set of human predictions.
arXiv Detail & Related papers (2023-06-27T17:59:33Z)
- Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes [3.2744507958793143]
We combine a goal-driven modeling approach with dense neurophysiological data and human behavioral readouts to bear on this question.
Specifically, we construct and evaluate several classes of sensory-cognitive networks to predict the future state of rich, ethologically-relevant environments.
We find strong differentiation across these model classes in their ability to predict neural and behavioral data both within and across diverse environments.
arXiv Detail & Related papers (2023-05-19T15:56:06Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
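As a generic illustration of fusing per-node motion features with dynamic descriptors inside a graph layer (this is not the authors' HO-GCN; module names and dimensions are assumed):

```python
import torch
import torch.nn as nn

class FusedGraphLayer(nn.Module):
    """One message-passing step over an object/human graph (generic sketch)."""

    def __init__(self, motion_dim=16, descriptor_dim=8, out_dim=32):
        super().__init__()
        self.fuse = nn.Linear(motion_dim + descriptor_dim, out_dim)

    def forward(self, motion, descriptors, adjacency):
        # motion:      (n_nodes, motion_dim)     per-node motion features
        # descriptors: (n_nodes, descriptor_dim) per-node dynamic descriptors
        # adjacency:   (n_nodes, n_nodes)        row-normalized adjacency
        fused = torch.relu(self.fuse(torch.cat([motion, descriptors], dim=-1)))
        return adjacency @ fused  # aggregate fused features from neighbors

layer = FusedGraphLayer()
n = 5
out = layer(torch.randn(n, 16), torch.randn(n, 8), torch.eye(n))
print(out.shape)  # torch.Size([5, 32])
```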
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the diversity of motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a state-of-the-art visual-only method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
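Several of the papers above hinge on inferring physical properties from a few observations. As a toy worked example: under Coulomb friction a sliding object decelerates at a = mu * g, so mu can be recovered from a short velocity trace. The scenario and numbers below are invented for illustration.

```python
# Toy system identification: recover a friction coefficient from a short
# velocity trace of an object sliding to rest (invented observations).
G = 9.81  # gravitational acceleration, m/s^2

def estimate_friction(velocities, dt):
    """Average the observed deceleration a, then return mu = a / g."""
    decels = [(v0 - v1) / dt for v0, v1 in zip(velocities, velocities[1:])]
    return (sum(decels) / len(decels)) / G

# Speeds (m/s) sampled every 0.1 s; true mu = 0.3 implies a = 2.943 m/s^2.
obs = [2.0, 1.7057, 1.4114, 1.1171]
print(f"estimated mu = {estimate_friction(obs, dt=0.1):.3f}")  # ~0.300
```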
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.