ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
- URL: http://arxiv.org/abs/2402.06119v2
- Date: Sun, 28 Jul 2024 05:43:43 GMT
- Title: ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
- Authors: Zhicheng Zheng, Xin Yan, Zhenfang Chen, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, Chuang Gan
- Abstract summary: ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries particle-based physical dynamics models with recent large language models.
- Score: 86.63174804149216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the Continuum Physical Dataset (ContPhy), a novel benchmark for assessing machine physical commonsense. ContPhy complements existing physical reasoning benchmarks by covering the inference of diverse physical properties, such as mass and density, across various scenarios, and the prediction of the corresponding dynamics. We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy, which shows that current AI models still lack physical commonsense for continuum materials, especially soft bodies, and illustrates the value of the proposed dataset. We also introduce an oracle model (ContPRO) that marries particle-based physical dynamics models with recent large language models, enjoying the advantages of both: precise dynamic prediction and interpretable reasoning. ContPhy aims to spur progress in perception and reasoning within diverse physical settings, narrowing the divide between human and machine intelligence in understanding the physical world. Project page: https://physical-reasoning-project.github.io
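The abstract describes ContPRO only at a high level, so the sketch below illustrates just the general pattern it names: a particle-based dynamics model supplies precise predictions, which are serialized into text for a large language model to reason over. All names here (`simulate`, `to_text`, the toy gravity simulator) are hypothetical stand-ins, not the paper's interface.

```python
# Minimal sketch (hypothetical, not ContPRO's actual code): a particle simulator
# handles the numerics, and its rollout is serialized into a prompt so a language
# model can do the interpretable reasoning step.
from dataclasses import dataclass

@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float
    mass: float

def simulate(particles, steps=100, dt=0.01, g=-9.8):
    """Toy rollout under gravity; stands in for a real particle-based simulator."""
    for _ in range(steps):
        for p in particles:
            p.vy += g * dt
            p.x += p.vx * dt
            p.y += p.vy * dt
            if p.y < 0:  # crude ground bounce
                p.y, p.vy = 0.0, -0.5 * p.vy
    return particles

def to_text(particles):
    """Serialize simulator output so a language model can reason over it."""
    return "\n".join(
        f"particle {i}: pos=({p.x:.2f}, {p.y:.2f}), mass={p.mass}"
        for i, p in enumerate(particles)
    )

question = "Which particle comes to rest first, and why?"
state = simulate([Particle(0, 5, 1, 0, 1.0), Particle(0, 3, 1, 0, 2.0)])
prompt = f"Simulated final state:\n{to_text(state)}\n\nQuestion: {question}"
print(prompt)  # in a full system, this prompt would be sent to an LLM
```

The division of labor this illustrates is the point of such hybrids: numerics stay in the simulator, while the language model answers questions over the serialized rollout.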
Related papers
- Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts.
However, the capacity of these models to accurately represent intuitive physics remains largely unexplored.
We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv Detail & Related papers (2024-10-07T17:56:04Z)
- Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties [100.19685489335828]
This work proposes a novel dataset and benchmark, termed Physion++, to rigorously evaluate visual physical prediction in artificial systems.
We test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability.
We evaluate the performance of a number of state-of-the-art prediction models that span a variety of levels of learning vs. built-in knowledge, and compare that performance to a set of human predictions.
arXiv Detail & Related papers (2023-06-27T17:59:33Z)
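Physion++ is a benchmark rather than a method, but the online property inference it tests can be illustrated with a simple simulation-based estimator: hypothesize candidate property values, roll a model forward, and keep the value whose rollout best matches the motion observed so far. The sketch below is hypothetical, with invented dynamics and values.

```python
# Hedged sketch of the capability Physion++ probes: inferring a latent physical
# property (here, a friction coefficient) online, by comparing observed motion
# against simulated rollouts. Not code from the paper.
import numpy as np

rng = np.random.default_rng(0)

def rollout(v0, mu, steps=50, dt=0.1, g=9.8):
    """Positions of a block sliding with initial speed v0 under friction mu."""
    xs, x, v = [], 0.0, v0
    for _ in range(steps):
        v = max(0.0, v - mu * g * dt)  # friction decelerates the block to rest
        x += v * dt
        xs.append(x)
    return np.array(xs)

true_mu = 0.3
observed = rollout(v0=2.0, mu=true_mu) + rng.normal(0, 0.01, 50)

# Online estimate: after each new observation, pick the friction value whose
# simulated trajectory best explains the frames seen so far.
candidates = np.linspace(0.05, 0.6, 12)
for t in (5, 15, 45):
    errors = [np.sum((rollout(2.0, mu)[:t] - observed[:t]) ** 2) for mu in candidates]
    print(f"after {t} frames, best mu estimate: {candidates[int(np.argmin(errors))]:.2f}")
```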
- A Benchmark for Modeling Violation-of-Expectation in Physical Reasoning Across Event Categories [4.4920673251997885]
Violation-of-Expectation (VoE) is used to label scenes as either expected or surprising with knowledge of only expected scenes.
Existing VoE-based 3D datasets in physical reasoning provide mainly vision data with little to no ground truths or inductive biases.
We set up a benchmark to study physical reasoning by curating a novel large-scale synthetic 3D VoE dataset armed with ground-truth labels of causally relevant features and rules.
arXiv Detail & Related papers (2021-11-16T22:59:25Z)
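Because VoE models see only expected scenes, surprise detection reduces to anomaly detection: fit a predictor of expected dynamics and flag scenes whose prediction error is unusually large. The toy sketch below illustrates that protocol; it is not the benchmark's actual pipeline, and all dynamics and thresholds are invented.

```python
# Illustrative VoE-style surprise scoring: a model fit only on "expected" scenes
# flags a scene as surprising when its deviation from learned dynamics is large.
import numpy as np

rng = np.random.default_rng(0)

def expected_scene(n=20):
    """Object in free fall: the physically 'expected' behavior."""
    t = np.arange(n) * 0.1
    return 10.0 - 0.5 * 9.8 * t**2

# Fit on expected scenes only: learn the typical frame-to-frame change.
train = np.stack([expected_scene() + rng.normal(0, 0.05, 20) for _ in range(50)])
deltas = np.diff(train, axis=1)
mean_delta = deltas.mean(axis=0)
threshold = 3 * (deltas - mean_delta).std()  # tolerance from training residuals

def is_surprising(scene):
    """Flag scenes whose motion deviates sharply from the learned expectation."""
    return np.abs(np.diff(scene) - mean_delta).max() > threshold

print(is_surprising(expected_scene()))   # False: matches expectation
print(is_surprising(np.full(20, 10.0)))  # True: object hangs in mid-air (a VoE)
```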
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
arXiv Detail & Related papers (2021-10-28T17:59:13Z)
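A structural sketch of the three-component design named in that abstract, with hypothetical interfaces: perception maps frames to object states, the concept learner maps words to executable tests over states, and a physics engine (differentiable in the real system, trained end to end) rolls states forward.

```python
# Structural sketch only: toy stand-ins for the paper's three neural modules.
from typing import Callable, Dict, List

State = Dict[str, float]  # e.g. {"x": 0.0, "v": 1.0, "mass": 2.0}

def perception(frame: List[float]) -> State:
    """Toy 'perception': frames here already carry (x, v); a real module is a CNN."""
    x, v = frame
    return {"x": x, "v": v, "mass": 1.0}

def concept_learner(word: str) -> Callable[[State], bool]:
    """Map a language concept to an executable test over object states."""
    table = {"moving": lambda s: abs(s["v"]) > 1e-3,
             "heavy": lambda s: s["mass"] > 1.5}
    return table[word]

def physics_step(s: State, dt: float = 0.1) -> State:
    """One step of a physics engine (differentiable in the real system)."""
    return {**s, "x": s["x"] + s["v"] * dt}

# Pipeline: perceive, roll forward, then answer a concept query about the future.
state = perception([0.0, 2.0])
for _ in range(10):
    state = physics_step(state)
print("will it be moving?", concept_learner("moving")(state))
```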
- Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z)
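The key idea, a latent space partly grounded by physics, can be sketched as a decoder that sums a known physical model driven by one latent block and a neural correction driven by the rest. The example below is a hypothetical, untrained illustration, not the paper's architecture.

```python
# Sketch of a physics-grounded latent split: z_phys parameterizes a known physics
# model, z_aux feeds a data-driven correction. Hypothetical and untrained.
import numpy as np

rng = np.random.default_rng(1)

def physics_decoder(z_phys, t):
    """Grounded part: z_phys = (amplitude, frequency) of a damped oscillator."""
    amp, freq = z_phys
    return amp * np.exp(-0.1 * t) * np.cos(freq * t)

def nn_decoder(z_aux, t, W):
    """Data-driven part: a tiny random-feature 'network' capturing residuals."""
    feats = np.tanh(np.outer(t, W[0]) + z_aux)  # (T, hidden)
    return feats @ W[1]                         # (T,)

t = np.linspace(0, 10, 100)
W = (rng.normal(size=8), rng.normal(size=8) * 0.05)

# A sample from the generative model: physics signal plus learned correction.
z_phys, z_aux = (1.0, 2.0), rng.normal(size=8) * 0.1
x = physics_decoder(z_phys, t) + nn_decoder(z_aux, t, W)
print(x.shape)  # (100,): one generated trajectory
```

Grounding part of the latent code this way is what makes the representation interpretable: `z_phys` has physical meaning by construction, while `z_aux` absorbs whatever the incomplete physics model misses.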
- Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting [34.61959169976758]
APHYNITY is a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models.
It decomposes the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for the errors of the physical model.
arXiv Detail & Related papers (2020-10-09T09:31:03Z)
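APHYNITY's decomposition can be written as dx/dt = f_phys(x) + f_aug(x), with the augmentation f_aug kept as small as possible. In the sketch below, a fixed damping term stands in for the learned data-driven component; this illustrates the decomposition, not the paper's implementation.

```python
# Sketch of the decomposition dx/dt = f_phys(x) + f_aug(x): an incomplete
# physical prior plus a residual term. The residual here is hand-written
# damping, standing in for APHYNITY's learned network.
import numpy as np

def f_phys(state, omega2=9.8):
    """Prior knowledge: frictionless pendulum dynamics."""
    theta, dtheta = state
    return np.array([dtheta, -omega2 * np.sin(theta)])

def f_aug(state, gamma=0.4):
    """Data-driven component (stand-in): corrects unmodeled damping."""
    _, dtheta = state
    return np.array([0.0, -gamma * dtheta])

def integrate(f, x0, steps=200, dt=0.02):
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * f(x)  # forward Euler on dx/dt = f(x)
        traj.append(x.copy())
    return np.array(traj)

full = integrate(lambda s: f_phys(s) + f_aug(s), [1.0, 0.0])
phys_only = integrate(f_phys, [1.0, 0.0])
# The augmented model decays like a real damped pendulum; the prior alone does not.
print(full[-1], phys_only[-1])
```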
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.