DisCo RL: Distribution-Conditioned Reinforcement Learning for
General-Purpose Policies
- URL: http://arxiv.org/abs/2104.11707v1
- Date: Fri, 23 Apr 2021 16:51:58 GMT
- Title: DisCo RL: Distribution-Conditioned Reinforcement Learning for
General-Purpose Policies
- Authors: Soroush Nasiriany, Vitchyr H. Pong, Ashvin Nair, Alexander Khazatsky,
Glen Berseth, Sergey Levine
- Abstract summary: We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
- Score: 116.12670064963625
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Can we use reinforcement learning to learn general-purpose policies that can
perform a wide range of different tasks, resulting in flexible and reusable
skills? Contextual policies provide this capability in principle, but the
representation of the context determines the degree of generalization and
expressivity. Categorical contexts preclude generalization to entirely new
tasks. Goal-conditioned policies may enable some generalization, but cannot
capture all tasks that might be desired. In this paper, we propose goal
distributions as a general and broadly applicable task representation suitable
for contextual policies. Goal distributions are general in the sense that they
can represent any state-based reward function when equipped with an appropriate
distribution class, while the particular choice of distribution class allows us
to trade off expressivity and learnability. We develop an off-policy algorithm
called distribution-conditioned reinforcement learning (DisCo RL) to
efficiently learn these policies. We evaluate DisCo RL on a variety of robot
manipulation tasks and find that it significantly outperforms prior methods on
tasks that require generalization to new goal distributions.
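As a concrete illustration of the task representation described in the abstract, the sketch below shows one way a goal distribution could serve as both context and reward: a Gaussian over states is flattened into a context vector for the policy and critic, and the reward is the log-likelihood of the reached state under that distribution. This is a minimal, hypothetical sketch under an assumed Gaussian distribution class, not the authors' implementation; the names (GaussianGoalDistribution, distribution_conditioned_reward) are illustrative only.

```python
# Illustrative sketch (not the authors' code): a Gaussian goal-distribution
# context and a log-likelihood reward, as one way to instantiate a
# distribution-conditioned task representation.
import numpy as np
from scipy.stats import multivariate_normal


class GaussianGoalDistribution:
    """Hypothetical goal-distribution context: a Gaussian over (part of) the state."""

    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=np.float64)
        self.cov = np.asarray(cov, dtype=np.float64)

    def log_prob(self, state):
        # How likely the reached state is under the goal distribution.
        return multivariate_normal.logpdf(state, mean=self.mean, cov=self.cov)

    def as_context(self):
        # Flatten (mean, covariance) into the context vector that conditions pi and Q.
        return np.concatenate([self.mean, self.cov.flatten()])


def distribution_conditioned_reward(state, goal_dist):
    """Log-likelihood reward: high when the state falls where the goal distribution puts mass."""
    return goal_dist.log_prob(state)


# Usage: a 2-D "reach roughly this region" task.
goal = GaussianGoalDistribution(mean=[1.0, 2.0], cov=0.1 * np.eye(2))
context = goal.as_context()          # would be fed to pi(a | s, context) and Q(s, a, context)
reward = distribution_conditioned_reward(np.array([0.9, 2.1]), goal)
print(context.shape, reward)
```

In this reading, a narrow covariance recovers ordinary goal reaching, while widening the covariance along some state dimensions yields rewards that effectively ignore those dimensions, which is one way the choice of distribution class trades off expressivity and learnability.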
Related papers
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
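One way to read the Hilbert-representation summary above, sketched below under loose assumptions: if a pretrained representation phi preserves temporal structure (latent distance roughly tracks how many steps separate two states), then a goal-conditioned reward is available zero-shot as the negative latent distance to the goal. The encoder here is a stand-in linear map and the function names are hypothetical; this is not the paper's method, only an illustration of the idea.

```python
# Illustrative sketch (hypothetical names, not the paper's method): zero-shot
# goal-conditioned reward from a representation whose latent distances track
# temporal distances in the environment.
import numpy as np


def phi(state, projection):
    """Stand-in for a pretrained state representation (here just a fixed linear map)."""
    return projection @ state


def zero_shot_goal_reward(state, goal, projection):
    """No task-specific training signal needed: negative latent distance to the goal."""
    return -np.linalg.norm(phi(state, projection) - phi(goal, projection))


rng = np.random.default_rng(0)
projection = rng.standard_normal((8, 4))      # pretend pretrained encoder weights
state, goal = rng.standard_normal(4), rng.standard_normal(4)
print(zero_shot_goal_reward(state, goal, projection))
```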
- Interpretable Reinforcement Learning with Multilevel Subgoal Discovery [77.34726150561087]
We propose a novel Reinforcement Learning model for discrete environments.
In the model, an agent learns information about the environment in the form of probabilistic rules.
No reward function is required for learning; an agent only needs to be given a primary goal to achieve.
arXiv Detail & Related papers (2022-02-15T14:04:44Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Generalization in Mean Field Games by Learning Master Policies [34.67098179276852]
Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents.
We study how to leverage generalization properties to learn policies enabling a typical agent to behave optimally against any population distribution.
arXiv Detail & Related papers (2021-09-20T17:47:34Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
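For the imagined-subgoal idea summarized above, the toy sketch below shows the two-level structure: a high-level policy proposes an intermediate subgoal, and a goal-conditioned low-level policy is directed at that subgoal rather than at the distant final goal. Both functions here are hand-written stand-ins for learned networks, and the names are hypothetical; this is not the paper's implementation.

```python
# Illustrative sketch (hypothetical, not the paper's code): a high-level policy
# imagines an intermediate subgoal; a goal-conditioned low-level policy chases it.
import numpy as np


def high_level_subgoal(state, goal, rng):
    """Stand-in for a learned high-level policy: imagine a subgoal roughly
    halfway between the current state and the final goal."""
    return 0.5 * (state + goal) + 0.05 * rng.standard_normal(state.shape)


def low_level_action(state, subgoal):
    """Stand-in for a learned goal-conditioned policy: step toward the subgoal."""
    direction = subgoal - state
    return np.clip(direction / (np.linalg.norm(direction) + 1e-8), -1.0, 1.0)


rng = np.random.default_rng(0)
state, goal = np.zeros(2), np.array([4.0, 4.0])
for _ in range(3):
    subgoal = high_level_subgoal(state, goal, rng)   # imagined intermediate target
    state = state + 0.1 * low_level_action(state, subgoal)
print(state)
```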
- Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic [11.601356612579641]
This paper introduces the minimax formulation and distributional framework to improve the generalization ability of RL algorithms.
We implement our method on the decision-making tasks of autonomous vehicles at intersections and test the trained policy in distinct environments.
arXiv Detail & Related papers (2020-02-13T14:09:22Z)
- BRPO: Batch Residual Policy Optimization [79.53696635382592]
In batch reinforcement learning, one often constrains a learned policy to be close to the behavior (data-generating) policy.
We propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.
We derive a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance.
arXiv Detail & Related papers (2020-02-08T01:59:33Z)
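The residual-policy idea in the BRPO summary above can be pictured with the small sketch below: the learned policy is expressed as a correction to the behavior policy, and the size of the allowed correction is itself a per-(state, action) quantity rather than a single global constraint. This is a hypothetical illustration with made-up names and numbers, not the BRPO algorithm itself.

```python
# Illustrative sketch (hypothetical, not the BRPO implementation): a residual
# policy that perturbs the behavior policy within a state-action-dependent budget.
import numpy as np


def residual_policy_probs(behavior_probs, residual_logits, allowed_deviation):
    """Mix the behavior policy with a bounded learned correction, where the
    correction budget is specified per (state, action), then renormalize."""
    residual = np.tanh(residual_logits)                  # bounded in [-1, 1]
    mixed = behavior_probs * (1.0 + allowed_deviation * residual)
    mixed = np.clip(mixed, 1e-8, None)
    return mixed / mixed.sum()


behavior = np.array([0.5, 0.3, 0.2])     # data-generating policy at some state
logits = np.array([0.8, -0.4, -0.2])     # learned residual preferences
deviation = np.array([0.1, 0.6, 0.3])    # learned allowable deviation per action
print(residual_policy_probs(behavior, logits, deviation))
```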