Hi-Phy: A Benchmark for Hierarchical Physical Reasoning
- URL: http://arxiv.org/abs/2106.09692v1
- Date: Thu, 17 Jun 2021 17:46:50 GMT
- Title: Hi-Phy: A Benchmark for Hierarchical Physical Reasoning
- Authors: Cheng Xue, Vimukthini Pinto, Chathura Gamage, Peng Zhang and Jochen
Renz
- Abstract summary: Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds.
We propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities.
Our benchmark tests capabilities according to this hierarchy through generated physical reasoning tasks in the video game Angry Birds.
- Score: 5.854222601444695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reasoning about the behaviour of physical objects is a key capability of
agents operating in physical worlds. Humans are very experienced in physical
reasoning while it remains a major challenge for AI. To facilitate research
addressing this problem, several benchmarks have been proposed recently.
However, these benchmarks do not enable us to measure an agent's granular
physical reasoning capabilities when solving a complex reasoning task. In this
paper, we propose a new benchmark for physical reasoning that allows us to test
individual physical reasoning capabilities. Inspired by how humans acquire
these capabilities, we propose a general hierarchy of physical reasoning
capabilities with increasing complexity. Our benchmark tests capabilities
according to this hierarchy through generated physical reasoning tasks in the
video game Angry Birds. This benchmark enables us to conduct a comprehensive
agent evaluation by measuring the agent's granular physical reasoning
capabilities. We conduct an evaluation with human players, learning agents, and
heuristic agents and determine their capabilities. Our evaluation shows that
learning agents, with good local generalization ability, still struggle to
learn the underlying physical reasoning capabilities and perform worse than
current state-of-the-art heuristic agents and humans. We believe that this
benchmark will encourage researchers to develop intelligent agents with
advanced, human-like physical reasoning capabilities. URL:
https://github.com/Cheng-Xue/Hi-Phy
Related papers
- Benchmarks for Physical Reasoning AI [28.02418565463541]
We offer an overview of existing benchmarks and their solution approaches for measuring the physical reasoning capacity of AI systems.
We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks.
We group the presented physical reasoning benchmarks into subcategories so that narrower generalist AI agents can first be tested on these groups.
arXiv Detail & Related papers (2023-12-17T14:24:03Z)
- NovPhy: A Testbed for Physical Reasoning in Open-world Environments [5.736794130342911]
In the real world, we constantly face novel situations we have not encountered before.
An agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment.
We propose a new testbed, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties.
arXiv Detail & Related papers (2023-03-03T04:59:03Z)
- JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human Interactive Fiction (IF) gameplay walkthroughs.
Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge.
Experiments show that the introduced dataset is challenging for previous machine reading models as well as new large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z)
- EgoTaskQA: Understanding Human Tasks in Egocentric Videos [89.9573084127155]
The EgoTaskQA benchmark provides a single home for crucial dimensions of task understanding through question answering on real-world egocentric videos.
We meticulously design questions that target the understanding of (1) action dependencies and effects, (2) intents and goals, and (3) agents' beliefs about others.
We evaluate state-of-the-art video reasoning models on our benchmark and show the significant gaps between them and humans in understanding complex goal-oriented egocentric videos.
arXiv Detail & Related papers (2022-10-08T05:49:05Z)
- On some Foundational Aspects of Human-Centered Artificial Intelligence [52.03866242565846]
There is no clear definition of what is meant by Human-Centered Artificial Intelligence (HCAI).
This paper introduces the term HCAI agent to refer to any physical or software computational agent equipped with AI components.
We see the notion of HCAI agent, together with its components and functions, as a way to bridge the technical and non-technical discussions on human-centered AI.
arXiv Detail & Related papers (2021-12-29T09:58:59Z)
- Phy-Q: A Benchmark for Physical Reasoning [5.45672244836119]
We propose a new benchmark that requires an agent to reason about physical scenarios and take an action accordingly.
Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, we identify 15 essential physical scenarios.
For each scenario, we create a wide variety of distinct task templates, and we ensure all the task templates within the same scenario can be solved by using one specific physical rule.
arXiv Detail & Related papers (2021-08-31T09:11:27Z)
- CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning [68.74447489372037]
We present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning.
A core component of our work is to introduce "agency", such that it is simple to define and create complex scenarios.
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment.
arXiv Detail & Related papers (2021-06-25T00:21:41Z)
- AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions.
Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.
We present AGENT, a benchmark consisting of procedurally generated 3D animations structured around four scenarios.
arXiv Detail & Related papers (2021-02-24T14:58:23Z)
- Thinking Fast and Slow in AI [38.8581204791644]
This paper proposes a research direction to advance AI which draws inspiration from cognitive theories of human decision making.
The premise is that if we gain insights about the causes of some human capabilities that are still lacking in AI, we may obtain similar capabilities in an AI system.
arXiv Detail & Related papers (2020-10-12T20:10:05Z)
- Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article addresses aspects of modeling commonsense reasoning, focusing on the domain of interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.