Brain-Inspired Planning for Better Generalization in Reinforcement Learning
- URL: http://arxiv.org/abs/2511.06470v1
- Date: Sun, 09 Nov 2025 17:32:55 GMT
- Title: Brain-Inspired Planning for Better Generalization in Reinforcement Learning
- Authors: Mingde "Harry" Zhao
- Abstract summary: This thesis explores the direction of enhancing agents' zero-shot systematic generalization abilities. We introduce a top-down attention mechanism, which allows a decision-time planning agent to dynamically focus its reasoning on the most relevant aspects of the environmental state. We also develop the Skipper framework to automatically decompose complex tasks into simpler, more manageable sub-tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing Reinforcement Learning (RL) systems encounter significant challenges when applied to real-world scenarios, primarily due to poor generalization across environments that differ from their training conditions. This thesis explores the direction of enhancing agents' zero-shot systematic generalization abilities by granting RL agents reasoning behaviors that are found to help systematic generalization in the human brain. Inspired by human conscious planning behaviors, we first introduce a top-down attention mechanism, which allows a decision-time planning agent to dynamically focus its reasoning on the most relevant aspects of the environmental state given its instantaneous intentions, a process we call "spatial abstraction". This approach significantly improves systematic generalization outside the training tasks. Subsequently, building on spatial abstraction, we develop the Skipper framework to automatically decompose complex tasks into simpler, more manageable sub-tasks. Skipper provides robustness against distributional shifts and efficacy in long-term, compositional planning by focusing on pertinent spatial and temporal elements of the environment. Finally, we identify a common failure mode and safety risk in planning agents that rely on generative models to generate state targets during planning. It is revealed that most agents blindly trust the targets they hallucinate, resulting in delusional planning behaviors. Inspired by how the human brain rejects delusional intentions, we propose learning a feasibility evaluator to enable rejecting hallucinated infeasible targets, which leads to significant performance improvements in various kinds of planning agents. We conclude by suggesting directions for future research, aimed at achieving general task abstraction and fully enabling abstract planning.
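The target-rejection idea in the abstract can be illustrated with a minimal sketch. All names here are hypothetical (not from the thesis): a generative model proposes candidate state targets, and a learned feasibility evaluator filters out hallucinated, infeasible ones before the planner commits to them; a toy stand-in plays the role of the learned evaluator.

```python
# Minimal sketch of rejecting hallucinated state targets via a feasibility
# evaluator. All names are illustrative; the thesis learns the evaluator from
# experience, while here a hand-written stand-in is used.
from typing import Callable, List, Tuple

State = Tuple[float, float]  # toy 2-D state for illustration


def reject_infeasible(
    candidates: List[State],
    feasibility: Callable[[State], float],
    threshold: float = 0.5,
) -> List[State]:
    """Keep only the proposed targets the evaluator scores as feasible."""
    return [s for s in candidates if feasibility(s) >= threshold]


def toy_evaluator(s: State) -> float:
    """Stand-in for a learned evaluator: states inside the unit square
    count as reachable, everything else as a hallucination."""
    x, y = s
    return 1.0 if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


proposed = [(0.2, 0.9), (1.7, 0.1), (0.5, 0.5)]  # (1.7, 0.1) is infeasible
feasible = reject_infeasible(proposed, toy_evaluator)
print(feasible)  # [(0.2, 0.9), (0.5, 0.5)]
```

The planner would then pursue only the surviving targets, avoiding the delusional behaviors described above.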
Related papers
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [103.32591749156416]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL). This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z) - Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning [51.54559117314768]
Recent work investigated the use of Reinforcement Learning (RL) to synthesize guidance that improves the performance of temporal planners. We propose an evolution of this learning-and-planning framework that focuses on exploiting the information provided by symbolic heuristics during both the RL and planning phases.
arXiv Detail & Related papers (2025-05-19T17:19:13Z) - Interpreting Emergent Planning in Model-Free Reinforcement Learning [13.820891288919002]
We present the first evidence that model-free reinforcement learning agents can learn to plan. This is achieved by applying a methodology based on concept-based interpretability to a model-free agent in Sokoban.
arXiv Detail & Related papers (2025-04-02T16:24:23Z) - Rejecting Hallucinated State Targets during Planning [84.179112256683]
In planning processes, generative or predictive models are often used to propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator.
arXiv Detail & Related papers (2024-10-09T17:35:25Z) - Synthesizing Evolving Symbolic Representations for Autonomous Systems [2.4233709516962785]
This paper presents an open-ended learning system able to synthesize from scratch its experience into a PPDDL representation and update it over time.
The system explores the environment and iteratively: (a) discover options, (b) explore the environment using options, (c) abstract the knowledge collected and (d) plan.
arXiv Detail & Related papers (2024-09-18T07:23:26Z) - AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation [81.32722475387364]
Large Language Model-based agents have garnered significant attention and are becoming increasingly popular. Planning ability is a crucial component of an LLM-based agent and generally entails achieving a desired goal from an initial state. Recent studies have demonstrated that instruction-tuning LLMs on expert-level trajectories effectively enhances their planning capabilities.
arXiv Detail & Related papers (2024-08-01T17:59:46Z) - Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games [6.532258098619471]
We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data. We show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal-reaching rates.
arXiv Detail & Related papers (2024-03-16T03:53:55Z) - Dynamic planning in hierarchical active inference [0.0]
This study focuses on dynamic planning in active inference, by which we refer to the ability of the human brain to infer and impose motor trajectories related to cognitive decisions.
arXiv Detail & Related papers (2024-02-18T17:32:53Z) - Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z) - AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - Novelty Accommodating Multi-Agent Planning in High Fidelity Simulated Open World [7.821603097781892]
We address the challenge that arises when unexpected phenomena, termed "novelties", emerge within the environment. The introduction of novelties into the environment can lead to inaccuracies in the planner's internal model, rendering previously generated plans obsolete. We propose a general-purpose AI agent framework designed to detect, characterize, and adapt to novelties while supporting concurrent actions and external scheduling.
arXiv Detail & Related papers (2023-06-22T03:44:04Z) - SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning [24.663586662594703]
Symbolic planning models allow decision-making agents to sequence actions in arbitrary ways to achieve a variety of goals in dynamic domains.
Reinforcement learning approaches do not require such models, and instead learn domain dynamics by exploring the environment and collecting rewards.
We propose an integrated framework named SPOTTER that uses RL to augment and support ("spot") a planning agent by discovering new operators needed to accomplish goals that are initially unreachable for the agent.
arXiv Detail & Related papers (2020-12-24T00:31:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.