Deliberative Acting, Online Planning and Learning with Hierarchical
Operational Models
- URL: http://arxiv.org/abs/2010.01909v3
- Date: Mon, 15 Nov 2021 21:12:54 GMT
- Title: Deliberative Acting, Online Planning and Learning with Hierarchical
Operational Models
- Authors: Sunandita Patra, James Mason, Malik Ghallab, Dana Nau, Paolo Traverso
- Abstract summary: In AI research, synthesizing a plan of action has typically used descriptive models of the actions that abstractly specify what might happen as a result of an action.
Executing the planned actions, however, has needed operational models, in which rich computational control structures and closed-loop online decision-making are used.
We implement an integrated acting and planning system in which both planning and acting use the same operational models.
- Score: 5.597986898418404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In AI research, synthesizing a plan of action has typically used descriptive
models of the actions that abstractly specify what might happen as a result of
an action, and are tailored for efficiently computing state transitions.
However, executing the planned actions has needed operational models, in which
rich computational control structures and closed-loop online decision-making
are used to specify how to perform an action in a nondeterministic execution
context, react to events and adapt to an unfolding situation. Deliberative
actors, which integrate acting and planning, have typically needed to use both
of these models together -- which causes problems when attempting to develop
the different models, verify their consistency, and smoothly interleave acting
and planning.
As an alternative, we define and implement an integrated acting and planning
system in which both planning and acting use the same operational models. These
rely on hierarchical task-oriented refinement methods offering rich control
structures. The acting component, called Reactive Acting Engine (RAE), is
inspired by the well-known PRS system. At each decision step, RAE can get
advice from a planner for a near-optimal choice with respect to a utility
function. The anytime planner uses a UCT-like Monte Carlo Tree Search
procedure, called UPOM, whose rollouts are simulations of the actor's
operational models. We also present learning strategies for use with RAE and
UPOM that acquire, from online acting experiences and/or simulated planning
results, a mapping from decision contexts to method instances as well as a
heuristic function to guide UPOM. We demonstrate the asymptotic convergence of
UPOM towards optimal methods in static domains, and show experimentally that
UPOM and the learning strategies significantly improve the acting efficiency
and robustness.
Related papers
- Adaptive Planning with Generative Models under Uncertainty [20.922248169620783]
Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains.
While continuous replanning at each timestep might seem intuitive because it allows decisions to be made based on the most recent environmental observations, it results in substantial computational challenges.
Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories.
arXiv Detail & Related papers (2024-08-02T18:07:53Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that searches for a best response given those inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Meta-operators for Enabling Parallel Planning Using Deep Reinforcement Learning [0.8287206589886881]
We introduce the concept of a meta-operator: the result of simultaneously applying multiple planning operators.
We show that including meta-operators in the RL action space enables new planning perspectives to be addressed using RL, such as parallel planning.
arXiv Detail & Related papers (2024-03-13T19:00:36Z)
- AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential to act as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
- A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning [104.3643447579578]
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state.
The design allows agents to learn to plan effectively, by attending to the relevant objects, leading to better out-of-distribution generalization.
arXiv Detail & Related papers (2021-06-03T19:35:19Z)
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)
- Integrating Acting, Planning and Learning in Hierarchical Operational Models [7.009282389520865]
We present new planning and learning algorithms for RAE, the Refinement Acting Engine.
Our planning procedure, UPOM, does a UCT-like search in the space of operational models in order to find a near-optimal method to use for the task and context at hand.
Our experimental results show that UPOM and our learning strategies significantly improve RAE's performance in four test domains.
arXiv Detail & Related papers (2020-03-09T06:05:25Z)
- STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing.
We propose a new algorithm that synthesizes STRIPS action models, without supervision, using a classical planner when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)