Related papers: Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

URL: http://arxiv.org/abs/2405.03379v1
Date: Mon, 6 May 2024 11:33:12 GMT
Title: Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
Authors: Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su,
Abstract summary: Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often require a lot of high-quality demonstration data. We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency.
Score: 17.092640837991883
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often require a lot of high-quality demonstration data that is difficult to obtain, especially for domains such as robotics. Our approach consists of a reverse curriculum followed by a forward curriculum. Unique to our approach compared to past work is the ability to efficiently leverage more than one demonstration via a per-demonstration reverse curriculum generated via state resets. The result of our reverse curriculum is an initial policy that performs well on a narrow initial state distribution and helps overcome difficult exploration problems. A forward curriculum is then used to accelerate the training of the initial policy to perform well on the full initial state distribution of the task and improve demonstration and sample efficiency. We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency compared against various state-of-the-art learning-from-demonstration baselines, even solving previously unsolvable tasks that require high precision and control.

Related papers

Reinforcement Learning via Implicit Imitation Guidance [49.88208134736617]
A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy.<n>We propose to use prior data solely for guiding exploration via noise added to the policy, sidestepping the need for explicit behavior cloning constraints.<n>Our approach achieves up to 2-3x improvement over prior reinforcement learning from offline methods across seven simulated continuous control tasks.
arXiv Detail & Related papers (2025-06-09T07:32:52Z)
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning [8.860366821983211]
STRAP is a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion. This work proposes STRAP, a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion.
arXiv Detail & Related papers (2024-12-19T18:54:06Z)
MILES: Making Imitation Learning Easy with Self-Supervision [12.314942459360605]
MILES is a fully autonomous, self-supervised data collection paradigm. We show that MILES enables efficient policy learning from just a single demonstration and a single environment reset.
arXiv Detail & Related papers (2024-10-25T17:06:50Z)
Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation [29.155620517531656]
Current reinforcement learning (RL) often suffers when solving a challenging exploration problem where the desired outcomes or high rewards are rarely observed. We propose an uncertainty & temporal distance-aware curriculum goal generation method for the outcome-directed RL via solving a bipartite matching problem. It could not only provide precisely calibrated guidance of the curriculum to the desired outcome states but also bring much better sample efficiency and geometry-agnostic curriculum goal proposal capability compared to previous curriculum RL methods.
arXiv Detail & Related papers (2023-01-27T14:25:04Z)
A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting. We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations. Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel. On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations. On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states. VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning. Visual demonstrations of desired behaviors often presents an easier and more natural way to teach agents. We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials. We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL) Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks. As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.