AI Planning Annotation for Sample Efficient Reinforcement Learning
- URL: http://arxiv.org/abs/2203.00669v1
- Date: Tue, 1 Mar 2022 18:38:41 GMT
- Title: AI Planning Annotation for Sample Efficient Reinforcement Learning
- Authors: Junkyu Lee, Michael Katz, Don Joven Agravante, Miao Liu, Tim Klinger,
Murray Campbell, Shirin Sohrabi, Gerald Tesauro
- Abstract summary: We show that a suitably defined planning model can be used to improve the efficiency of Reinforcement Learning (RL)
Our experiments demonstrate an improved sample efficiency on a variety of RL environments over the previous state-of-the-art.
- Score: 39.4624736757278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI planning and Reinforcement Learning (RL) both solve sequential
decision-making problems under the different formulations. AI Planning requires
operator models, but then allows efficient plan generation. RL requires no
operator model, instead learns a policy to guide an agent to high reward
states. Planning can be brittle in the face of noise whereas RL is more
tolerant. However, RL requires a large number of training examples to learn the
policy. In this work, we aim to bring AI planning and RL closer by showing that
a suitably defined planning model can be used to improve the efficiency of RL.
Specifically, we show that the options in the hierarchical RL can be derived
from a planning task and integrate planning and RL algorithms for training
option policy functions. Our experiments demonstrate an improved sample
efficiency on a variety of RL environments over the previous state-of-the-art.
Related papers
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - Theoretically Guaranteed Policy Improvement Distilled from Model-Based
Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach to distill from model-based planning to the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z) - Action and Trajectory Planning for Urban Autonomous Driving with
Hierarchical Reinforcement Learning [1.3397650653650457]
We propose an action and trajectory planner using Hierarchical Reinforcement Learning (atHRL) method.
We empirically verify the efficacy of atHRL through extensive experiments in complex urban driving scenarios.
arXiv Detail & Related papers (2023-06-28T07:11:02Z) - LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Decision Processes (MDPs)
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z) - Human-in-the-loop: Provably Efficient Preference-based Reinforcement
Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z) - Towards Deployment-Efficient Reinforcement Learning: Lower Bound and
Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL)
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z) - Learning to Execute: Efficient Learning of Universal Plan-Conditioned
Policies in Robotics [20.148408520475655]
We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans.
In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
arXiv Detail & Related papers (2021-11-15T16:58:50Z) - Hierarchies of Planning and Reinforcement Learning for Robot Navigation [22.08479169489373]
In many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available.
Previous work has demonstrated efficient learning by hierarchal approaches consisting of path planning in the HL representation.
This work proposes a novel hierarchical framework that utilizes a trainable planning policy for the HL representation.
arXiv Detail & Related papers (2021-09-23T07:18:15Z) - POAR: Efficient Policy Optimization via Online Abstract State
Representation Learning [6.171331561029968]
State Representation Learning (SRL) is proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance to leverage expert demonstration to improve SRL interpretations.
We empirically verify POAR to efficiently handle tasks in high dimensions and facilitate training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z) - Model-Based Offline Planning with Trajectory Pruning [15.841609263723575]
offline reinforcement learning (RL) enables learning policies using pre-collected datasets without environment interaction.
We propose a new light-weighted model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning.
Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
arXiv Detail & Related papers (2021-05-16T05:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.