PROST: Physical Reasoning of Objects through Space and Time
- URL: http://arxiv.org/abs/2106.03634v1
- Date: Mon, 7 Jun 2021 14:06:20 GMT
- Title: PROST: Physical Reasoning of Objects through Space and Time
- Authors: Stéphane Aroca-Ouellette, Cory Paik, Alessandro Roncone, and Katharina Kann
- Abstract summary: This dataset contains 18,736 multiple-choice questions made from 14 manually curated templates.
We conduct an analysis which demonstrates that state-of-the-art pretrained models are inadequate at physical reasoning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new probing dataset named PROST: Physical Reasoning about
Objects Through Space and Time. This dataset contains 18,736 multiple-choice
questions made from 14 manually curated templates, covering 10 physical
reasoning concepts. All questions are designed to probe both causal and masked
language models in a zero-shot setting. We conduct an extensive analysis which
demonstrates that state-of-the-art pretrained models are inadequate at physical
reasoning: they are influenced by the order in which answer options are
presented to them, they struggle when the superlative in a question is inverted
(e.g., most <-> least), and increasing the amount of pretraining data and
parameters only yields minimal improvements. These results provide support for
the hypothesis that current pretrained models' ability to reason about physical
interactions is inherently limited by a lack of real world experience. By
highlighting these limitations, we hope to motivate the development of models
with a human-like understanding of the physical world.
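The zero-shot probing setup described above can be sketched in a few lines: each answer option is substituted into the question template, the completed sentence is scored by a language model's log-likelihood, and the highest-scoring option is taken as the model's answer. The sketch below uses a toy add-one-smoothed unigram model as a stand-in for an actual pretrained causal or masked LM, and the template, corpus, and option names are illustrative, not drawn from PROST itself.

```python
import math
from collections import Counter

# Toy stand-in for a pretrained LM: an add-one-smoothed unigram model
# estimated from a tiny corpus. In practice this would be a causal or
# masked LM (e.g. GPT-2 or BERT); the scoring logic below is the same.
corpus = ("the heavy ball sinks and the heavy rock sinks "
          "while the light ball floats").split()
counts = Counter(corpus)
total = sum(counts.values())

def log_prob(token: str) -> float:
    """Add-one-smoothed unigram log-probability of a single token."""
    return math.log((counts[token] + 1) / (total + len(counts) + 1))

def sequence_log_prob(text: str) -> float:
    """Log-likelihood of a whitespace-tokenized sentence."""
    return sum(log_prob(t) for t in text.lower().split())

def zero_shot_answer(question_template: str, options: list[str]) -> str:
    """Fill the template with each option, score the completed sentence,
    and return the highest-scoring option (no fine-tuning: zero-shot)."""
    scored = {opt: sequence_log_prob(question_template.format(opt))
              for opt in options}
    return max(scored, key=scored.get)

# Hypothetical PROST-style template with four candidate answers.
print(zero_shot_answer("the {} ball sinks", ["heavy", "light", "red", "blue"]))
# prints "heavy"
```

Note that the answer depends only on the relative likelihoods the model assigns to the completed sentences, which is exactly why option-order effects such as those reported in the paper are a meaningful failure mode: an ideal scorer of this form would be invariant to the order in which options are presented.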
Related papers
- PhysiX: A Foundation Model for Physics Simulations [27.359872113159405]
We introduce PhysiX, the first large-scale foundation model for physics simulation. We show that PhysiX effectively addresses the data bottleneck, outperforming task-specific baselines. Our results indicate that knowledge learned from natural videos can be successfully transferred to physics simulation.
arXiv Detail & Related papers (2025-06-21T18:10:12Z)
- CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models [4.889577550694335]
CausalVQA is a benchmark dataset for video question answering (VQA). It consists of question-answer pairs that probe models' understanding of causality in the physical world.
arXiv Detail & Related papers (2025-06-11T17:10:36Z)
- Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT [24.085953089267772]
We show how OpenAI o3 and GPT-4o fail to grasp basic physical laws, spatial interactions, and causal effects in complex scenes. We introduce MVPBench, a benchmark designed to rigorously evaluate visual physical reasoning through the lens of visual chain-of-thought (CoT). Experimental results reveal a concerning trend: even cutting-edge MLLMs exhibit poor visual reasoning accuracy and weak image-text alignment in physical domains.
arXiv Detail & Related papers (2025-05-30T03:48:59Z)
- Compositional Physical Reasoning of Objects and Events from Videos [122.6862357340911]
This paper addresses the challenge of inferring hidden physical properties from objects' motion and interactions.
We evaluate state-of-the-art video reasoning models on ComPhy and reveal their limited ability to capture these hidden properties.
We also propose a novel neuro-symbolic framework, Physical Concept Reasoner (PCR), that learns and reasons about both visible and hidden physical properties.
arXiv Detail & Related papers (2024-08-02T15:19:55Z)
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
- Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties [100.19685489335828]
This work proposes a novel dataset and benchmark, termed Physion++, to rigorously evaluate visual physical prediction in artificial systems.
We test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability.
We evaluate the performance of a number of state-of-the-art prediction models that span a variety of levels of learning vs. built-in knowledge, and compare that performance to a set of human predictions.
arXiv Detail & Related papers (2023-06-27T17:59:33Z)
- PACS: A Dataset for Physical Audiovisual CommonSense Reasoning [119.0100966278682]
This paper contributes PACS: the first audiovisual benchmark annotated for physical commonsense attributes.
PACS contains a total of 13,400 question-answer pairs, involving 1,377 unique physical commonsense questions and 1,526 videos.
Using PACS, we evaluate multiple state-of-the-art models on this new challenging task.
arXiv Detail & Related papers (2022-03-21T17:05:23Z)
- CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions [11.078508605894411]
CRAFT is a new visual question answering dataset that requires causal reasoning about physical forces and object interactions.
It contains 38K video and question pairs that are generated from 3K videos from 10 different virtual environments.
Inspired by the theory of force dynamics from the field of human cognitive psychology, we introduce new question categories that involve understanding the intentions of objects.
arXiv Detail & Related papers (2020-12-08T09:11:32Z)
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.