Thinking agents for zero-shot generalization to qualitatively novel tasks
- URL: http://arxiv.org/abs/2503.19815v1
- Date: Tue, 25 Mar 2025 16:26:31 GMT
- Title: Thinking agents for zero-shot generalization to qualitatively novel tasks
- Authors: Thomas Miconi, Kevin McKee, Yicong Zheng, Jed McCaleb
- Abstract summary: We propose a method to train agents endowed with world models to make use of their mental simulation abilities. The resulting agent successfully simulated alternative scenarios and used the resulting information to guide its behavior in the actual environment.
- Score: 0.974963895316339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent organisms can solve truly novel problems which they have never encountered before, either in their lifetime or their evolution. An important component of this capacity is the ability to "think", that is, to mentally manipulate objects, concepts and behaviors in order to plan and evaluate possible solutions to novel problems, even without environment interaction. To generate problems that are truly qualitatively novel, while still solvable zero-shot (by mental simulation), we use the combinatorial nature of environments: we train the agent while withholding a specific combination of the environment's elements. The novel test task, based on this combination, is thus guaranteed to be truly novel, while still mentally simulable since the agent has been exposed to each individual element (and their pairwise interactions) during training. We propose a method to train agents endowed with world models to make use of their mental simulation abilities, by selecting tasks based on the difference between the agent's pre-thinking and post-thinking performance. When tested on the novel, withheld problem, the resulting agent successfully simulated alternative scenarios and used the resulting information to guide its behavior in the actual environment, solving the novel task in a single real-environment trial (zero-shot).
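To make the selection criterion concrete, here is a minimal sketch in Python of the idea described above: during training, prefer tasks where "thinking" (rolling out the agent's learned world model before acting) yields the largest gap between pre-thinking and post-thinking performance. This is an illustrative reconstruction, not the authors' code; the `evaluate_fn` callable, the task representation, and the cutoff `k` are all assumptions.

```python
# Sketch (assumed, not the paper's implementation) of task selection based on
# the gap between pre-thinking and post-thinking performance.
# `evaluate_fn(task, think)` is a hypothetical user-supplied callable that
# returns the agent's average return on `task`, with or without mental
# simulation in the learned world model.
from typing import Callable, List, Sequence, Tuple


def thinking_gain(evaluate_fn: Callable[[object, bool], float], task: object) -> float:
    """Gap between post-thinking and pre-thinking performance on one task."""
    pre = evaluate_fn(task, False)   # act directly, without mental simulation
    post = evaluate_fn(task, True)   # simulate alternative scenarios first
    return post - pre


def select_training_tasks(
    evaluate_fn: Callable[[object, bool], float],
    candidate_tasks: Sequence[object],
    k: int = 16,
) -> List[object]:
    """Keep the k tasks where thinking helps the most (largest performance gap)."""
    scored: List[Tuple[float, int]] = [
        (thinking_gain(evaluate_fn, t), i) for i, t in enumerate(candidate_tasks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate_tasks[i] for _, i in scored[:k]]
```

Under this scheme, the withheld combination of environment elements would simply never appear among `candidate_tasks` during training; it would only be presented at test time for the single zero-shot trial.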
Related papers
- Metacognition for Unknown Situations and Environments (MUSE) [3.2020845462590697]
We propose the Metacognition for Unknown Situations and Environments (MUSE) framework.
MUSE integrates metacognitive processes--specifically self-awareness and self-regulation--into autonomous agents.
Agents show significant improvements in self-awareness and self-regulation.
arXiv Detail & Related papers (2024-11-20T18:41:03Z) - HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z) - MacGyver: Are Large Language Models Creative Problem Solvers? [87.70522322728581]
We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting. We create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems. We present our collection to both LLMs and humans to compare and contrast their problem-solving abilities.
arXiv Detail & Related papers (2023-11-16T08:52:27Z) - Novelty Accommodating Multi-Agent Planning in High Fidelity Simulated Open World [7.821603097781892]
We address the challenge that arises when unexpected phenomena, termed "novelties", emerge within the environment. The introduction of novelties into the environment can lead to inaccuracies within the planner's internal model, rendering previously generated plans obsolete. We propose a general-purpose AI agent framework designed to detect, characterize, and adapt to support concurrent actions and external scheduling.
arXiv Detail & Related papers (2023-06-22T03:44:04Z) - Reflexion: Language Agents with Verbal Reinforcement Learning [44.85337947858337]
Reflexion is a novel framework to reinforce language agents not by updating weights, but through linguistic feedback.
It is flexible enough to incorporate various types (scalar values or free-form language) and sources (external or internally simulated) of feedback signals.
For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%.
arXiv Detail & Related papers (2023-03-20T18:08:50Z) - NovPhy: A Testbed for Physical Reasoning in Open-world Environments [5.736794130342911]
In the real world, we constantly face novel situations we have not encountered before.
An agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment.
We propose a new testbed, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties.
arXiv Detail & Related papers (2023-03-03T04:59:03Z) - CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning [68.74447489372037]
We present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning.
A core component of our work is to introduce "agency", such that it is simple to define and create complex scenarios.
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment.
arXiv Detail & Related papers (2021-06-25T00:21:41Z) - HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving [104.79156980475686]
Humans learn compositional and causal abstraction, i.e., knowledge, in response to the structure of naturalistic tasks.
We argue there shall be three levels of generalization in how an agent represents its knowledge: perceptual, conceptual, and algorithmic.
This benchmark is centered around a novel task domain, HALMA, for visual concept development and rapid problem-solving.
arXiv Detail & Related papers (2021-02-22T20:37:01Z) - Latent Skill Planning for Exploration and Transfer [49.25525932162891]
In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent.
We leverage the idea of partial amortization for fast adaptation at test time.
We demonstrate the benefits of our design decisions across a suite of challenging locomotion tasks.
arXiv Detail & Related papers (2020-11-27T18:40:03Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Transforming task representations to perform novel tasks [12.008469282323492]
An important aspect of intelligence is the ability to adapt to a novel task without any direct experience (zero-shot).
We propose a general computational framework for adapting to novel tasks based on their relationship to prior tasks.
arXiv Detail & Related papers (2020-05-08T23:41:57Z)