X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
- URL: http://arxiv.org/abs/2308.10441v1
- Date: Mon, 21 Aug 2023 03:28:23 GMT
- Title: X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
- Authors: Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu
- Abstract summary: This study introduces X-VoE, a benchmark dataset to assess AI agents' grasp of intuitive physics.
X-VoE establishes a higher bar for the explanatory capacities of intuitive physics models.
We present an explanation-based learning system that captures physics dynamics and infers occluded object states.
- Score: 75.94926117990435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intuitive physics is pivotal for human understanding of the physical world,
enabling prediction and interpretation of events even in infancy. Nonetheless,
replicating this level of intuitive physics in artificial intelligence (AI)
remains a formidable challenge. This study introduces X-VoE, a comprehensive
benchmark dataset, to assess AI agents' grasp of intuitive physics. Built on
the developmental psychology-rooted Violation of Expectation (VoE) paradigm,
X-VoE establishes a higher bar for the explanatory capacities of intuitive
physics models. Each VoE scenario within X-VoE encompasses three distinct
settings, probing models' comprehension of events and their underlying
explanations. Beyond model evaluation, we present an explanation-based learning
system that captures physics dynamics and infers occluded object states solely
from visual sequences, without explicit occlusion labels. Experimental outcomes
highlight our model's alignment with human commonsense when tested against
X-VoE. A notable feature is our model's ability to visually explain VoE events
by reconstructing concealed scenes. In closing, we discuss the findings'
implications and outline future research directions. Through X-VoE, we aim to
catalyze the development of AI endowed with human-like intuitive physics
capabilities.
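The VoE paradigm underlying X-VoE is typically operationalized by checking whether a model registers more surprise at a physically implausible video than at a matched plausible one. Below is a minimal sketch of that evaluation protocol; the `model.surprise` interface, the paired-video data layout, and the aggregation rule are illustrative assumptions, not X-VoE's actual API.

```python
# Hedged sketch of a VoE-style evaluation loop (not the authors' code).
# Assumes `model.surprise(video)` returns per-frame anomaly scores for a
# video given as an array of frames; both are illustrative assumptions.
import numpy as np

def voe_relative_accuracy(model, paired_videos):
    """Fraction of (expected, surprising) pairs for which the model is
    more surprised by the physically implausible video."""
    correct = 0
    for expected, surprising in paired_videos:
        # Collapse per-frame surprise into a single score per video.
        s_expected = float(np.mean(model.surprise(expected)))
        s_surprising = float(np.mean(model.surprise(surprising)))
        correct += s_surprising > s_expected
    return correct / len(paired_videos)
```

Under this scoring, a model with no grasp of the depicted physics should sit near chance (0.5), while a model whose expectations align with human intuition should approach 1.0 on the expected-versus-surprising pairs.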
Related papers
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that combines particle-based physical dynamics models with recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z)
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z)
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures the ability to predict how physical scenarios will unfold.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z)
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.