Deliberative Acting, Online Planning and Learning with Hierarchical
Operational Models
- URL: http://arxiv.org/abs/2010.01909v3
- Date: Mon, 15 Nov 2021 21:12:54 GMT
- Title: Deliberative Acting, Online Planning and Learning with Hierarchical
Operational Models
- Authors: Sunandita Patra, James Mason, Malik Ghallab, Dana Nau, Paolo Traverso
- Abstract summary: In AI research, synthesizing a plan of action has typically used descriptive models of the actions that abstractly specify what might happen as a result of an action.
Executing the planned actions, however, has needed operational models, in which rich computational control structures and closed-loop online decision-making are used.
We implement an integrated acting and planning system in which both planning and acting use the same operational models.
- Score: 5.597986898418404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In AI research, synthesizing a plan of action has typically used descriptive
models of the actions that abstractly specify what might happen as a result of
an action, and are tailored for efficiently computing state transitions.
However, executing the planned actions has needed operational models, in which
rich computational control structures and closed-loop online decision-making
are used to specify how to perform an action in a nondeterministic execution
context, react to events and adapt to an unfolding situation. Deliberative
actors, which integrate acting and planning, have typically needed to use both
of these models together -- which causes problems when attempting to develop
the different models, verify their consistency, and smoothly interleave acting
and planning.
As an alternative, we define and implement an integrated acting and planning
system in which both planning and acting use the same operational models. These
rely on hierarchical task-oriented refinement methods offering rich control
structures. The acting component, called Reactive Acting Engine (RAE), is
inspired by the well-known PRS system. At each decision step, RAE can get
advice from a planner for a near-optimal choice with respect to a utility
function. The anytime planner uses a UCT-like Monte Carlo Tree Search
procedure, called UPOM, whose rollouts are simulations of the actor's
operational models. We also present learning strategies for use with RAE and
UPOM that acquire, from online acting experiences and/or simulated planning
results, a mapping from decision contexts to method instances as well as a
heuristic function to guide UPOM. We demonstrate the asymptotic convergence of
UPOM towards optimal methods in static domains, and show experimentally that
UPOM and the learning strategies significantly improve the acting efficiency
and robustness.
Related papers
- Adaptive Planning with Generative Models under Uncertainty [20.922248169620783]
Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains.
While continuous replanning at each timestep might seem intuitive because it allows decisions to be made based on the most recent environmental observations, it results in substantial computational challenges.
Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories.
arXiv Detail & Related papers (2024-08-02T18:07:53Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that searches for a best response given those inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Meta-operators for Enabling Parallel Planning Using Deep Reinforcement Learning [0.8287206589886881]
We introduce the concept of a meta-operator: the result of simultaneously applying multiple planning operators.
We show that including meta-operators in the RL action space enables new planning perspectives to be addressed using RL, such as parallel planning.
arXiv Detail & Related papers (2024-03-13T19:00:36Z)
- AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential to act as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
- A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning [104.3643447579578]
We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state.
The design allows agents to learn to plan effectively, by attending to the relevant objects, leading to better out-of-distribution generalization.
arXiv Detail & Related papers (2021-06-03T19:35:19Z)
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)
- Integrating Acting, Planning and Learning in Hierarchical Operational Models [7.009282389520865]
We present new planning and learning algorithms for RAE, the Refinement Acting Engine.
Our planning procedure, UPOM, does a UCT-like search in the space of operational models in order to find a near-optimal method to use for the task and context at hand.
Our experimental results show that UPOM and our learning strategies significantly improve RAE's performance in four test domains.
arXiv Detail & Related papers (2020-03-09T06:05:25Z)
- STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing.
We propose a new algorithm that synthesizes STRIPS action models, without supervision, using a classical planner when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)