Phy-Q: A Benchmark for Physical Reasoning
- URL: http://arxiv.org/abs/2108.13696v1
- Date: Tue, 31 Aug 2021 09:11:27 GMT
- Title: Phy-Q: A Benchmark for Physical Reasoning
- Authors: Cheng Xue, Vimukthini Pinto, Chathura Gamage, Ekaterina Nikonova, Peng
Zhang, Jochen Renz
- Abstract summary: We propose a new benchmark that requires an agent to reason about physical scenarios and take an action accordingly.
Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, we identify 15 essential physical scenarios.
For each scenario, we create a wide variety of distinct task templates, and we ensure all the task templates within the same scenario can be solved by using one specific physical rule.
- Score: 5.45672244836119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans are well-versed in reasoning about the behaviors of physical objects
when choosing actions to accomplish tasks, while it remains a major challenge
for AI. To facilitate research addressing this problem, we propose a new
benchmark that requires an agent to reason about physical scenarios and take an
action accordingly. Inspired by the physical knowledge acquired in infancy and
the capabilities required for robots to operate in real-world environments, we
identify 15 essential physical scenarios. For each scenario, we create a wide
variety of distinct task templates, and we ensure all the task templates within
the same scenario can be solved by using one specific physical rule. With
this design, we evaluate two distinct levels of generalization: local
generalization and broad generalization. We conduct an extensive
evaluation with human players, learning agents with varying input types and
architectures, and heuristic agents with different strategies. The benchmark
gives a Phy-Q (physical reasoning quotient) score that reflects the physical
reasoning ability of the agents. Our evaluation shows that 1) all agents fail
to reach human performance, and 2) learning agents, even with good local
generalization ability, struggle to learn the underlying physical reasoning
rules and fail to generalize broadly. We encourage the development of
intelligent agents with broad generalization abilities in physical domains.
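The abstract describes a benchmark structured as 15 scenarios, each containing multiple task templates, with agents scored on local generalization (within a template) and broad generalization (across unseen templates of a scenario). As a rough illustration of that structure only, the sketch below averages per-template pass rates into per-scenario and overall scores; the function names and the simple averaging scheme are assumptions for illustration, not the paper's actual Phy-Q formula.

```python
# Hypothetical sketch of aggregating per-template pass rates into a single
# physical-reasoning score. The Phy-Q paper defines its own scoring formula;
# this only illustrates the scenario/template structure described above.

def scenario_score(template_pass_rates):
    """Average pass rate over the task templates of one scenario."""
    return sum(template_pass_rates) / len(template_pass_rates)

def aggregate_score(scenarios):
    """Average of per-scenario scores (e.g. over 15 scenarios in Phy-Q).

    scenarios: list of lists, one inner list of pass rates per scenario.
    """
    return sum(scenario_score(s) for s in scenarios) / len(scenarios)

# Example: two scenarios, each evaluated on a few task templates.
local = aggregate_score([[0.9, 0.8], [0.7, 0.6, 0.8]])  # seen templates
broad = aggregate_score([[0.4, 0.3], [0.2, 0.5, 0.2]])  # unseen templates
```

Under this toy aggregation, an agent with high `local` but low `broad` matches the paper's finding that learning agents can generalize locally yet fail to learn the underlying physical rules.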
Related papers
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing
Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Benchmarks for Physical Reasoning AI [28.02418565463541]
We offer an overview of existing benchmarks and their solution approaches for measuring the physical reasoning capacity of AI systems.
We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks.
We group the presented set of physical reasoning benchmarks into subcategories so that more narrow generalist AI agents can be tested first on these groups.
arXiv Detail & Related papers (2023-12-17T14:24:03Z)
- Physical Reasoning and Object Planning for Household Embodied Agents [19.88210708022216]
We introduce the CommonSense Object Affordance Task (COAT), a novel framework designed to analyze reasoning capabilities in commonsense scenarios.
COAT offers insights into the complexities of practical decision-making in real-world environments.
Our contributions include insightful human preference mappings for all three factors and four extensive QA datasets.
arXiv Detail & Related papers (2023-11-22T18:32:03Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
- Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
- NovPhy: A Testbed for Physical Reasoning in Open-world Environments [5.736794130342911]
In the real world, we constantly face novel situations we have not encountered before.
An agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment.
We propose a new testbed, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties.
arXiv Detail & Related papers (2023-03-03T04:59:03Z)
- QKSA: Quantum Knowledge Seeking Agent [0.0]
We present the motivation and the core thesis towards the implementation of a Quantum Knowledge Seeking Agent (QKSA).
QKSA is a general reinforcement learning agent that can be used to model classical and quantum dynamics.
arXiv Detail & Related papers (2021-07-03T13:07:58Z)
- CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning [68.74447489372037]
We present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning.
A core component of our work is to introduce agency, such that it is simple to define and create complex scenarios.
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment.
arXiv Detail & Related papers (2021-06-25T00:21:41Z)
- Hi-Phy: A Benchmark for Hierarchical Physical Reasoning [5.854222601444695]
Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds.
We propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities.
Our benchmark tests capabilities according to this hierarchy through generated physical reasoning tasks in the video game Angry Birds.
arXiv Detail & Related papers (2021-06-17T17:46:50Z)
- AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions.
Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.
We present a benchmark consisting of procedurally generated 3D animations, AGENT, structured around four scenarios.
arXiv Detail & Related papers (2021-02-24T14:58:23Z)
- Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article addresses aspects of modeling commonsense reasoning, focusing on the domain of interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.