Coverage as a Principle for Discovering Transferable Behavior in
Reinforcement Learning
- URL: http://arxiv.org/abs/2102.13515v1
- Date: Wed, 24 Feb 2021 16:51:02 GMT
- Title: Coverage as a Principle for Discovering Transferable Behavior in
Reinforcement Learning
- Authors: Víctor Campos, Pablo Sprechmann, Steven Hansen, Andre Barreto,
Steven Kapturowski, Alex Vitvitskyi, Adrià Puigdomènech Badia, Charles
Blundell
- Abstract summary: We argue that representation alone is not enough for efficient transfer in challenging domains and explore how to transfer knowledge through behavior.
The behavior of pre-trained policies may be used for solving the task at hand (exploitation) or for collecting useful data to solve the problem (exploration).
- Score: 16.12658895065585
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Designing agents that acquire knowledge autonomously and use it to solve new
tasks efficiently is an important challenge in reinforcement learning, and
unsupervised learning provides a useful paradigm for autonomous acquisition of
task-agnostic knowledge. In supervised settings, representations discovered
through unsupervised pre-training offer important benefits when transferred to
downstream tasks. Given the nature of the reinforcement learning problem, we
argue that representation alone is not enough for efficient transfer in
challenging domains and explore how to transfer knowledge through behavior. The
behavior of pre-trained policies may be used for solving the task at hand
(exploitation), as well as for collecting useful data to solve the problem
(exploration). We argue that policies pre-trained to maximize coverage will
produce behavior that is useful for both strategies. When using these policies
for both exploitation and exploration, our agents discover better solutions.
The largest gains are generally observed in domains requiring structured
exploration, including settings where the behavior of the pre-trained policies
is misaligned with the downstream task.
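
As a rough illustration of the exploration side of this recipe (a sketch under assumed interfaces, not the authors' implementation), pre-trained coverage-maximizing policies can be mixed into data collection alongside the task policy; the names `env`, `task_policy`, and `coverage_policies` are illustrative:

```python
import random

def collect_episode(env, task_policy, coverage_policies, mix_prob=0.3):
    """Behavior-transfer sketch: with probability mix_prob, an episode is
    driven by a pre-trained coverage-maximizing policy to gather diverse
    data (exploration); otherwise the task policy being learned acts
    (exploitation of its current solution). All names are illustrative,
    not the paper's actual API."""
    transitions = []
    obs = env.reset()
    # Commit to one behavior policy for the whole episode.
    if random.random() < mix_prob:
        behavior = random.choice(coverage_policies)
    else:
        behavior = task_policy
    done = False
    while not done:
        action = behavior.act(obs)
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions
```

The exploitation side would instead use the pre-trained policies' behavior directly on the task, for example by bootstrapping the task policy from whichever pre-trained policy earns the most reward; the paper's actual mechanism is more involved than this sketch.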
Related papers
- Introspective Action Advising for Interpretable Transfer Learning [7.673465837624365]
Transfer learning can be applied in deep reinforcement learning to accelerate the training of a policy in a target task.
We propose an alternative approach to transfer learning between tasks based on action advising, in which a teacher trained in a source task actively guides a student's exploration in a target task.
arXiv Detail & Related papers (2023-06-21T14:53:33Z)
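
A minimal sketch of the action-advising pattern from the entry above, assuming a hypothetical student/teacher interface; the confidence-based advising criterion here is an illustrative stand-in, not the paper's introspection mechanism:

```python
import numpy as np

def student_step(student, teacher, obs, advice_threshold=0.5):
    """Action advising: the teacher, trained on a source task, overrides
    the student's action in the target task when the student is uncertain.
    The max-probability confidence test is an illustrative assumption."""
    probs = student.action_probs(obs)   # student's action distribution
    if np.max(probs) < advice_threshold:
        return teacher.act(obs)         # low confidence: follow the teacher
    return int(np.argmax(probs))        # high confidence: act on its own
```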
- Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery [22.32327908453603]
Current reinforcement learning (RL) in robotics often struggles to generalize to new downstream tasks.
We propose a framework that pre-trains the agent in a task-agnostic manner without access to the task-specific reward.
We show that our approach achieves the most diverse interacting behavior and significantly improves sample efficiency in downstream tasks.
arXiv Detail & Related papers (2022-04-29T06:57:46Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
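
For reference, the entropy-regularized policy objective mentioned in the entry above is standardly written as follows (the common textbook form, not a formulation specific to this paper):

```latex
% Entropy-regularized RL objective: reward plus a policy-entropy bonus,
% weighted by a temperature \alpha.
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \left( r(s_t, a_t) + \alpha \, \mathcal{H}\left(\pi(\cdot \mid s_t)\right) \right) \right],
\quad
\mathcal{H}(\pi(\cdot \mid s)) = - \sum_{a} \pi(a \mid s) \log \pi(a \mid s).
```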
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
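
A minimal sketch of the hierarchical control pattern described in the entry above: a high-level policy picks a skill at fixed intervals and the chosen low-level skill emits actions. The interface and the fixed decision interval are illustrative assumptions, not the paper's design:

```python
def hierarchical_rollout(env, high_policy, skills, horizon=500, k=10):
    """High-level policy selects a skill index every k steps; the selected
    low-level skill maps observations to actions in between. All names and
    the fixed-interval schedule are illustrative assumptions."""
    obs = env.reset()
    total_reward = 0.0
    for t in range(horizon):
        if t % k == 0:                       # high-level decision point
            skill_id = high_policy.act(obs)
        obs, reward, done, _ = env.step(skills[skill_id].act(obs))
        total_reward += reward
        if done:
            break
    return total_reward
```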
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Adaptive Policy Transfer in Reinforcement Learning [9.594432031144715]
We introduce a principled mechanism that can "Adapt-to-Learn", that is, adapt the source policy to learn to solve a target task.
We show that the presented method learns to seamlessly combine adaptation and exploration, leading to a robust policy transfer algorithm.
arXiv Detail & Related papers (2021-05-10T22:42:03Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real life, information acquisition might correspond to performing a medical test on a patient.
We propose a model-based reinforcement learning framework that learns an active feature acquisition policy.
Key to its success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
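
Plan2Explore's exploration signal is usually described as disagreement across an ensemble of learned dynamics models; a rough sketch of that idea follows (the `ensemble` interface is an assumption for illustration):

```python
import numpy as np

def disagreement_bonus(ensemble, obs, action):
    """Intrinsic reward in the spirit of Plan2Explore: each model in an
    ensemble predicts the next (latent) state, and the variance across
    predictions serves as an exploration bonus. High variance signals
    high epistemic uncertainty, i.e. states worth exploring."""
    preds = np.stack([model.predict(obs, action) for model in ensemble])
    return float(preds.var(axis=0).mean())
```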