PROST: Physical Reasoning of Objects through Space and Time
- URL: http://arxiv.org/abs/2106.03634v1
- Date: Mon, 7 Jun 2021 14:06:20 GMT
- Title: PROST: Physical Reasoning of Objects through Space and Time
- Authors: Stéphane Aroca-Ouellette, Cory Paik, Alessandro Roncone, and Katharina Kann
- Abstract summary: This dataset contains 18,736 multiple-choice questions made from 14 manually curated templates.
We conduct an analysis which demonstrates that state-of-the-art pretrained models are inadequate at physical reasoning.
- Score: 68.69796589964076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new probing dataset named PROST: Physical Reasoning about
Objects Through Space and Time. This dataset contains 18,736 multiple-choice
questions made from 14 manually curated templates, covering 10 physical
reasoning concepts. All questions are designed to probe both causal and masked
language models in a zero-shot setting. We conduct an extensive analysis which
demonstrates that state-of-the-art pretrained models are inadequate at physical
reasoning: they are influenced by the order in which answer options are
presented to them, they struggle when the superlative in a question is inverted
(e.g., most <-> least), and increasing the amount of pretraining data and
parameters only yields minimal improvements. These results provide support for
the hypothesis that current pretrained models' ability to reason about physical
interactions is inherently limited by a lack of real world experience. By
highlighting these limitations, we hope to motivate the development of models
with a human-like understanding of the physical world.
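The zero-shot protocol described above — ranking answer options by the language model's likelihood, with no fine-tuning — can be sketched as follows. This is a minimal illustration, not PROST's released evaluation code; the log-likelihood values are made-up placeholders for what an actual causal or masked LM would produce when scoring the question with each option substituted in.

```python
# Hypothetical per-option log-likelihoods. In a real evaluation these
# would come from a pretrained causal or masked LM scoring the question
# with each candidate answer filled in; the numbers below are invented.
FAKE_LOGPROBS = {
    "a ball": -4.1,
    "an egg": -5.3,
    "a glass": -6.0,
    "a plate": -6.2,
}

def zero_shot_answer(options, logprob):
    """Return the option the model scores highest.

    This mirrors the zero-shot setting: no gradient updates, no
    task-specific training -- the pretrained model's likelihoods
    alone decide the answer.
    """
    return max(options, key=logprob)

question = "Which object is the least likely to break if dropped?"
options = list(FAKE_LOGPROBS)
print(zero_shot_answer(options, FAKE_LOGPROBS.get))  # prints: a ball
```

The paper's inversion finding (most ↔ least) corresponds to swapping the superlative in the template and checking whether the model's ranking flips accordingly; with frozen scores like the placeholders above, a model that ignores the superlative would return the same answer for both variants.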
Related papers
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [90.97595947781426]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that combines particle-based physical dynamics models with recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
- Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties [100.19685489335828]
This work proposes a novel dataset and benchmark, termed Physion++, to rigorously evaluate visual physical prediction in artificial systems.
We test scenarios where accurate prediction relies on estimates of properties such as mass, friction, elasticity, and deformability.
We evaluate the performance of a number of state-of-the-art prediction models that span a variety of levels of learning vs. built-in knowledge, and compare that performance to a set of human predictions.
arXiv Detail & Related papers (2023-06-27T17:59:33Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark consisting of 21k multimodal multiple-choice questions covering a diverse set of science topics, with answers annotated with corresponding lectures and explanations.
We show that SQA improves question answering performance by 1.20% for few-shot GPT-3 and 3.99% for fine-tuned UnifiedQA.
Our analysis further shows that language models, like humans, benefit from explanations: they learn from less data, achieving the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- PACS: A Dataset for Physical Audiovisual CommonSense Reasoning [119.0100966278682]
This paper contributes PACS: the first audiovisual benchmark annotated for physical commonsense attributes.
PACS contains a total of 13,400 question-answer pairs, involving 1,377 unique physical commonsense questions and 1,526 videos.
Using PACS, we evaluate multiple state-of-the-art models on this new challenging task.
arXiv Detail & Related papers (2022-03-21T17:05:23Z)
- CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions [11.078508605894411]
CRAFT is a new visual question answering dataset that requires causal reasoning about physical forces and object interactions.
It contains 38K video-question pairs generated from 3K videos across 10 different virtual environments.
Inspired by the theory of force dynamics from the field of human cognitive psychology, we introduce new question categories that involve understanding the intentions of objects.
arXiv Detail & Related papers (2020-12-08T09:11:32Z)
- Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.