Generalized Hindsight for Reinforcement Learning
- URL: http://arxiv.org/abs/2002.11708v1
- Date: Wed, 26 Feb 2020 18:57:05 GMT
- Title: Generalized Hindsight for Reinforcement Learning
- Authors: Alexander C. Li, Lerrel Pinto, Pieter Abbeel
- Abstract summary: Low-reward data collected while trying to solve one task provides little to no signal for that particular task; we argue it is instead a rich source of information for other tasks.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
- Score: 154.0545226284078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key reasons for the high sample complexity in reinforcement
learning (RL) is the inability to transfer knowledge from one task to another.
In standard multi-task RL settings, low-reward data collected while trying to
solve one task provides little to no signal for solving that particular task
and is hence effectively wasted. However, we argue that this data, which is
uninformative for one task, is likely a rich source of information for other
tasks. To leverage this insight and efficiently reuse data, we present
Generalized Hindsight: an approximate inverse reinforcement learning technique
for relabeling behaviors with the right tasks. Intuitively, given a behavior
generated under one task, Generalized Hindsight returns a different task that
the behavior is better suited for. Then, the behavior is relabeled with this
new task before being used by an off-policy RL optimizer. Compared to standard
relabeling techniques, Generalized Hindsight provides a substantially more
efficient reuse of samples, which we empirically demonstrate on a suite of
multi-task navigation and manipulation tasks. Videos and code can be accessed
here: https://sites.google.com/view/generalized-hindsight.
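The abstract outlines a concrete loop: collect a trajectory while attempting one task, use approximate inverse RL to find a candidate task the behavior is better suited for, relabel the trajectory with that task, and hand the relabeled experience to an off-policy learner. Below is a minimal Python sketch of that loop under stated assumptions: `reward_fn`, `sample_candidate_tasks`, `replay_buffer`, and `agent` are hypothetical stand-ins rather than the paper's released code, and the paper's actual relabeling strategies rank a trajectory against other trajectories per task instead of taking a raw maximum return.

```python
import numpy as np

def relabel_with_generalized_hindsight(trajectory, original_task,
                                        sample_candidate_tasks, reward_fn,
                                        num_candidates=10):
    """Hypothetical sketch of the relabeling step described in the abstract.

    trajectory: list of (state, action, next_state) tuples collected while
        attempting `original_task`.
    sample_candidate_tasks: draws candidate task parameters (assumed helper).
    reward_fn(state, action, next_state, task): task-conditioned reward,
        assumed available as in standard multi-task RL settings.
    Returns the candidate task under which the trajectory scores highest
    (a simplified stand-in for "which task best explains this behavior").
    """
    candidates = [original_task] + [sample_candidate_tasks()
                                    for _ in range(num_candidates)]
    returns = [sum(reward_fn(s, a, s2, z) for (s, a, s2) in trajectory)
               for z in candidates]
    return candidates[int(np.argmax(returns))]


def training_step(agent, replay_buffer, trajectory, original_task,
                  sample_candidate_tasks, reward_fn):
    """Store the trajectory under its original task and the relabeled task,
    then run an off-policy update (e.g. a SAC-style optimizer)."""
    new_task = relabel_with_generalized_hindsight(
        trajectory, original_task, sample_candidate_tasks, reward_fn)
    for task in (original_task, new_task):
        for (s, a, s2) in trajectory:
            replay_buffer.add(s, a, reward_fn(s, a, s2, task), s2, task)
    agent.update(replay_buffer)  # assumed off-policy optimizer interface
```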
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that reuses primitive behaviours previously learned on simple tasks.
This guidance is provided not by a manually designed curriculum, but by a critic network that decides at each timestep whether or not to use the proposed actions.
We demonstrate that agents learn a successful policy faster with our method, in terms of both sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss favoring representations that align points which typically share the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models [33.78307982736911]
Cross-task generalization is of strong research and application value.
We propose a large-scale benchmark that includes 216 existing code-related tasks.
arXiv Detail & Related papers (2023-02-08T13:04:52Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z) - Rewriting History with Inverse RL: Hindsight Inference for Policy
Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
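The entry directly above reads hindsight relabeling as inverse RL. As a rough illustration of that reading (a sketch under MaxEnt soft-optimality assumptions, not the paper's implementation), the relabeling distribution over candidate tasks can weight each task by the exponentiated trajectory return minus a per-task log-partition term, so tasks with uniformly large reward scales do not dominate. All names below are hypothetical.

```python
import numpy as np

def soft_relabel_distribution(trajectory_return_per_task,
                              log_partition_per_task, temperature=1.0):
    """Hypothetical sketch of relabeling-as-inverse-RL:
    p(task | trajectory) is proportional to exp(return(trajectory, task) - log Z(task)).

    trajectory_return_per_task: returns of one trajectory evaluated under each
        candidate task's reward function.
    log_partition_per_task: per-task normalizer (e.g. estimated from returns of
        other trajectories in the buffer) that offsets reward-scale differences.
    """
    logits = (np.asarray(trajectory_return_per_task)
              - np.asarray(log_partition_per_task)) / temperature
    logits -= logits.max()          # numerical stability before exponentiation
    probs = np.exp(logits)
    return probs / probs.sum()

# Example usage: sample a relabeled task index from the soft distribution.
p = soft_relabel_distribution([2.0, 5.0, 1.5], [1.0, 4.5, 0.5])
new_task_index = np.random.choice(len(p), p=p)
```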
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.