Learning and reusing primitive behaviours to improve Hindsight
Experience Replay sample efficiency
- URL: http://arxiv.org/abs/2310.01827v2
- Date: Sun, 19 Nov 2023 15:55:56 GMT
- Title: Learning and reusing primitive behaviours to improve Hindsight
Experience Replay sample efficiency
- Authors: Francisco Roldan Sanchez, Qiang Wang, David Cordova Bulens, Kevin
McGuinness, Stephen Redmond, Noel O'Connor
- Abstract summary: We propose a method that uses primitive behaviours, previously learned to solve simple tasks, to guide the agent toward more rewarding actions while it learns more complex tasks.
This guidance is not provided by a manually designed curriculum; instead, a critic network decides at each timestep whether or not to use the actions proposed by the previously learned primitive policies.
We demonstrate that agents can learn a successful policy faster with our proposed method, in terms of both sample efficiency and computation time.
- Score: 7.806014635635933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hindsight Experience Replay (HER) is a technique used in reinforcement
learning (RL) that has proven to be very efficient for training off-policy
RL-based agents to solve goal-based robotic manipulation tasks using sparse
rewards. Even though HER improves the sample efficiency of RL-based agents by
learning from mistakes made in past experiences, it does not provide any
guidance while exploring the environment. This leads to very large training
times due to the volume of experience required to train an agent using this
replay strategy. In this paper, we propose a method that uses primitive
behaviours that have been previously learned to solve simple tasks in order to
guide the agent toward more rewarding actions during exploration while learning
other more complex tasks. This guidance, however, is not executed by a manually
designed curriculum, but rather using a critic network to decide at each
timestep whether or not to use the actions proposed by the previously-learned
primitive policies. We evaluate our method by comparing its performance against
HER and other more efficient variations of this algorithm in several block
manipulation tasks. We demonstrate that the agents can learn a successful
policy faster when using our proposed method, in terms of both sample
efficiency and computation time. Code is available at
https://github.com/franroldans/qmp-her.
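As a concrete illustration of the critic-gated guidance described in the abstract, the following is a minimal sketch, assuming a goal-conditioned off-policy actor-critic agent (e.g. DDPG-style). The names `select_action`, `actor`, `critic`, and `primitive_policies` are illustrative placeholders and do not reflect the API of the linked repository.

```python
import numpy as np

def select_action(obs, goal, actor, critic, primitive_policies, noise_std=0.1):
    """Pick the action with the highest critic estimate among the main actor's
    proposal and the actions suggested by previously learned primitive policies.

    Assumed (hypothetical) interfaces:
      actor(obs, goal)          -> action (numpy array)
      critic(obs, goal, action) -> scalar Q-value estimate
      each primitive(obs, goal) -> action (numpy array)
    """
    candidates = [actor(obs, goal)] + [p(obs, goal) for p in primitive_policies]
    q_values = [critic(obs, goal, a) for a in candidates]
    best = candidates[int(np.argmax(q_values))]
    # Exploration noise is still added to the executed action, as is usual
    # for deterministic off-policy agents.
    return best + np.random.normal(0.0, noise_std, size=np.shape(best))
```

In words: rather than following a fixed curriculum, the critic arbitrates at every timestep between the learner's own proposal and the primitive behaviours, so exploration is biased toward actions the critic currently rates as more promising.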
Related papers
- Learning Diverse Policies with Soft Self-Generated Guidance [2.9602904918952695]
Reinforcement learning with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained.
This paper develops an approach that uses diverse past trajectories for faster and more efficient online RL.
arXiv Detail & Related papers (2024-02-07T02:53:50Z)
- Backward Curriculum Reinforcement Learning [0.0]
Current reinforcement learning algorithms train an agent using forward-generated trajectories.
While sufficient exploration is what lets the agent realize the value of reinforcement learning, this approach trades away sample efficiency.
We propose novel backward curriculum reinforcement learning that begins training the agent using the backward trajectory of the episode instead of the original forward trajectory.
arXiv Detail & Related papers (2022-12-29T08:23:39Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that the exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
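Several of the entries above, like the main paper, build on hindsight relabeling of past experience. Below is a minimal sketch of HER's "final" relabeling strategy, assuming hypothetical transition fields (`obs`, `action`, `achieved_goal`, `desired_goal`, `reward`) and a sparse `compute_reward(achieved, desired)` function; it illustrates the general idea and is not any specific paper's implementation.

```python
import copy

def her_final_relabel(episode, compute_reward):
    """Replay an episode as if the goal had been the state actually achieved
    at the end of the episode (HER's 'final' strategy).

    `episode` is a list of dict transitions with the assumed fields above;
    `compute_reward(achieved, desired)` is assumed to return the sparse task
    reward, e.g. 0 on success and -1 otherwise.
    """
    final_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        new_t = copy.deepcopy(t)
        new_t["desired_goal"] = final_goal  # substitute the achieved goal
        new_t["reward"] = compute_reward(t["achieved_goal"], final_goal)  # recompute sparse reward
        relabeled.append(new_t)
    return relabeled
```

Storing both the original and the relabeled transitions in the replay buffer is what lets sparse-reward agents learn from otherwise unsuccessful episodes.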
This list is automatically generated from the titles and abstracts of the papers in this site.