NovPhy: A Testbed for Physical Reasoning in Open-world Environments
- URL: http://arxiv.org/abs/2303.01711v2
- Date: Sat, 5 Aug 2023 12:47:07 GMT
- Title: NovPhy: A Testbed for Physical Reasoning in Open-world Environments
- Authors: Chathura Gamage, Vimukthini Pinto, Cheng Xue, Peng Zhang, Ekaterina
Nikonova, Matthew Stephenson, Jochen Renz
- Abstract summary: In the real world, we constantly face novel situations we have not encountered before.
An agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment.
We propose a new testbed, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties.
- Score: 5.736794130342911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the emergence of AI systems that interact with the physical
environment, there is an increased interest in incorporating physical reasoning
capabilities into those AI systems. But is it enough to only have physical
reasoning capabilities to operate in a real physical environment? In the real
world, we constantly face novel situations we have not encountered before. As
humans, we are competent at successfully adapting to those situations.
Similarly, an agent needs to have the ability to function under the impact of
novelties in order to properly operate in an open-world physical environment.
To facilitate the development of such AI systems, we propose a new testbed,
NovPhy, that requires an agent to reason about physical scenarios in the
presence of novelties and take actions accordingly. The testbed consists of
tasks that require agents to detect and adapt to novelties in physical
scenarios. To create tasks in the testbed, we develop eight novelties
representing a diverse novelty space and apply them to five commonly
encountered scenarios in a physical environment. According to our testbed
design, we evaluate two capabilities of an agent: the performance on a novelty
when it is applied to different physical scenarios and the performance on a
physical scenario when different novelties are applied to it. We conduct a
thorough evaluation with human players, learning agents, and heuristic agents.
Our evaluation shows that humans' performance is far beyond the agents'
performance. Some agents, even with good normal task performance, perform
significantly worse when there is a novelty, and the agents that can adapt to
novelties typically adapt slower than humans. We promote the development of
intelligent agents capable of performing at the human level or above when
operating in open-world physical environments. Testbed website:
https://github.com/phy-q/novphy
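The two evaluation views described in the abstract (one novelty across all scenarios, and one scenario across all novelties) can be read as the row and column averages of an 8 x 5 grid of results. The following is a minimal sketch of that reading, not the testbed's own code: the novelty and scenario labels are placeholders, the use of a mean pass rate as the score is an assumption, and the numbers are made up for illustration.

```python
from statistics import mean

# Placeholder labels: the abstract states only the counts (8 novelties, 5 scenarios).
NOVELTIES = [f"novelty_{i}" for i in range(1, 9)]
SCENARIOS = [f"scenario_{j}" for j in range(1, 6)]


def per_novelty(pass_rates):
    """Average each novelty's pass rate over every scenario it is applied to."""
    return {n: mean(pass_rates[(n, s)] for s in SCENARIOS) for n in NOVELTIES}


def per_scenario(pass_rates):
    """Average each scenario's pass rate over every novelty applied to it."""
    return {s: mean(pass_rates[(n, s)] for n in NOVELTIES) for s in SCENARIOS}


# Made-up pass rates for illustration only; these are NOT results from the paper.
pass_rates = {(n, s): 0.5 for n in NOVELTIES for s in SCENARIOS}
pass_rates[("novelty_1", "scenario_1")] = 1.0

novelty_view = per_novelty(pass_rates)
scenario_view = per_scenario(pass_rates)
print(novelty_view["novelty_1"], scenario_view["scenario_1"])
```

Keeping the results keyed by (novelty, scenario) pairs means both views are computed from the same grid, so an agent that handles one novelty well in only a single scenario shows up as a modest row average rather than a headline score.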
Related papers
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing
Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible.
We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains.
We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z)
- Novelty Accommodating Multi-Agent Planning in High Fidelity Simulated
Open World [0.0]
Novelty is an unexpected phenomenon that can alter the core characteristics, composition, and dynamics of the environment.
Previous studies show that novelty has a catastrophic impact on agent performance.
In this work, we demonstrate that a domain-independent AI agent can be adapted to successfully perform and reason with novelty in a realistic high-fidelity simulator of the military domain.
arXiv Detail & Related papers (2023-06-22T03:44:04Z)
- Characterizing Novelty in the Military Domain [0.0]
In operation, a rich environment is likely to present challenges not seen in training sets or accounted for in engineered models.
A program at the Defense Advanced Research Projects Agency (DARPA) seeks to develop agents that are robust to novelty.
This capability will be required before AI has the role envisioned within mission-critical environments.
arXiv Detail & Related papers (2023-02-23T20:21:24Z)
- Towards the Neuroevolution of Low-level Artificial General Intelligence [5.2611228017034435]
We argue that the search for Artificial General Intelligence (AGI) should start from a much lower level than human-level intelligence.
Our hypothesis is that learning occurs through sensory feedback when an agent acts in an environment.
We evaluate a method to evolve a biologically-inspired artificial neural network that learns from environment reactions.
arXiv Detail & Related papers (2022-07-27T15:30:50Z)
- The Introspective Agent: Interdependence of Strategy, Physiology, and
Sensing for Embodied Agents [51.94554095091305]
We argue for an introspective agent, which considers its own abilities in the context of its environment.
Just as in nature, we hope to reframe strategy as one tool, among many, to succeed in an environment.
arXiv Detail & Related papers (2022-01-02T20:14:01Z)
- OPEn: An Open-ended Physics Environment for Learning Without a Task [132.6062618135179]
We will study if models of the world learned in an open-ended physics environment, without any specific tasks, can be reused for downstream physics reasoning tasks.
We build a benchmark Open-ended Physics ENvironment (OPEn) and also design several tasks to test learning representations in this environment explicitly.
We find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.
arXiv Detail & Related papers (2021-10-13T17:48:23Z)
- Phy-Q: A Benchmark for Physical Reasoning [5.45672244836119]
We propose a new benchmark that requires an agent to reason about physical scenarios and take an action accordingly.
Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, we identify 15 essential physical scenarios.
For each scenario, we create a wide variety of distinct task templates, and we ensure all the task templates within the same scenario can be solved by using one specific physical rule.
arXiv Detail & Related papers (2021-08-31T09:11:27Z)
- Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.
To build agents that can robustly interact with humans, we would ideally train them while they interact with humans.
We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z)
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.