Related papers: AI Planning Annotation for Sample Efficient Reinforcement Learning

URL: http://arxiv.org/abs/2203.00669v1
Date: Tue, 1 Mar 2022 18:38:41 GMT
Title: AI Planning Annotation for Sample Efficient Reinforcement Learning
Authors: Junkyu Lee, Michael Katz, Don Joven Agravante, Miao Liu, Tim Klinger, Murray Campbell, Shirin Sohrabi, Gerald Tesauro
Abstract summary: We show that a suitably defined planning model can be used to improve the efficiency of Reinforcement Learning (RL). Our experiments demonstrate an improved sample efficiency on a variety of RL environments over the previous state-of-the-art.
Score: 39.4624736757278
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI planning and Reinforcement Learning (RL) both solve sequential decision-making problems, but under different formulations. AI planning requires operator models, but then allows efficient plan generation. RL requires no operator model and instead learns a policy to guide an agent to high-reward states. Planning can be brittle in the face of noise, whereas RL is more tolerant. However, RL requires a large number of training examples to learn the policy. In this work, we aim to bring AI planning and RL closer by showing that a suitably defined planning model can be used to improve the efficiency of RL. Specifically, we show that the options in hierarchical RL can be derived from a planning task, and we integrate planning and RL algorithms for training option policy functions. Our experiments demonstrate an improved sample efficiency on a variety of RL environments over the previous state-of-the-art.
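Illustrative sketch (not the authors' implementation): in the standard options framework, an option consists of an initiation set, an intra-option policy, and a termination condition. The Python below assumes STRIPS-like operators with preconditions and add effects, and shows one plausible way to read an initiation set and a termination condition off such an operator. The names `Operator`, `Option`, `option_from_operator`, and the key/door facts are hypothetical, and the option policy, which RL would learn, is left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Assumption: a state is the set of symbolic facts that currently hold.
State = FrozenSet[str]


@dataclass(frozen=True)
class Operator:
    """Hypothetical STRIPS-like planning operator."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]


@dataclass
class Option:
    """Initiation and termination tests of an option (policy omitted)."""
    name: str
    can_start: Callable[[State], bool]  # initiation set as a predicate
    is_done: Callable[[State], bool]    # termination condition


def option_from_operator(op: Operator) -> Option:
    # The option may start where the operator's preconditions hold and
    # terminates once the operator's add effects have been achieved.
    return Option(
        name=op.name,
        can_start=lambda s: op.preconditions <= s,
        is_done=lambda s: op.add_effects <= s,
    )


# Hypothetical two-step task: pick up a key, then open a door.
operators = [
    Operator("get_key", frozenset(), frozenset({"has_key"})),
    Operator("open_door", frozenset({"has_key"}), frozenset({"door_open"})),
]
options = [option_from_operator(op) for op in operators]
```

Under these assumptions, an RL agent would only need to learn a policy inside each option, while the planning model supplies where each option can begin and when it ends.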

Related papers

Unsupervised Data Generation for Offline Reinforcement Learning: A Perspective from Model [57.20064815347607]
Offline reinforcement learning (RL) has recently gained growing interest from RL researchers. The performance of offline RL suffers from the out-of-distribution problem, which can be corrected by feedback in online RL. In this paper, we first theoretically build a bridge between the batch data and the performance of offline RL algorithms. We show that in task-agnostic settings, a series of policies trained by unsupervised RL can minimize the worst-case regret in the performance gap.
arXiv Detail & Related papers (2025-06-24T14:08:36Z)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models. We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone [72.17534881026995]
We develop an offline and online fine-tuning approach called policy-agnostic RL (PA-RL). We show the first result that successfully fine-tunes OpenVLA, a 7B generalist robot policy, autonomously with Cal-QL, an online RL fine-tuning algorithm.
arXiv Detail & Related papers (2024-12-09T17:28:03Z)
Reinforced Imitative Trajectory Planning for Urban Automated Driving [3.2436298824947434]
This paper proposes a novel RL-based trajectory planning method that integrates RL with imitation learning to enable multi-step planning. A transformer-based Bayesian reward function is developed, providing effective reward signals for RL in urban scenarios. The proposed methods were validated on the large-scale real-world urban automated driving nuPlan dataset.
arXiv Detail & Related papers (2024-10-21T03:04:29Z)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models. Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks. Recent practices tend to distill optimized action sequences into an RL policy during the training phase. We develop an approach to distill from model-based planning to the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
Action and Trajectory Planning for Urban Autonomous Driving with Hierarchical Reinforcement Learning [1.3397650653650457]
We propose an action and trajectory planner using a Hierarchical Reinforcement Learning method (atHRL). We empirically verify the efficacy of atHRL through extensive experiments in complex urban driving scenarios.
arXiv Detail & Related papers (2023-06-28T07:11:02Z)
LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs). We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences. Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer. We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z)
Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics [20.148408520475655]
We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans. In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
arXiv Detail & Related papers (2021-11-15T16:58:50Z)
Hierarchies of Planning and Reinforcement Learning for Robot Navigation [22.08479169489373]
In many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available. Previous work has demonstrated efficient learning by hierarchical approaches consisting of path planning in the HL representation. This work proposes a novel hierarchical framework that utilizes a trainable planning policy for the HL representation.
arXiv Detail & Related papers (2021-09-23T07:18:15Z)
POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) is proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states. We introduce a new SRL prior called domain resemblance to leverage expert demonstration to improve SRL interpretations. We empirically verify that POAR efficiently handles tasks in high dimensions and facilitates training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.