Variational Empowerment as Representation Learning for Goal-Based
Reinforcement Learning
- URL: http://arxiv.org/abs/2106.01404v1
- Date: Wed, 2 Jun 2021 18:12:26 GMT
- Title: Variational Empowerment as Representation Learning for Goal-Based
Reinforcement Learning
- Authors: Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang
Shane Gu
- Abstract summary: We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the optimization objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
- Score: 114.07623388322048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning to reach goal states and learning diverse skills through mutual
information (MI) maximization have been proposed as principled frameworks for
self-supervised reinforcement learning, allowing agents to acquire broadly
applicable multitask policies with minimal reward engineering. Starting from a
simple observation that the standard goal-conditioned RL (GCRL) is encapsulated
by the optimization objective of variational empowerment, we discuss how GCRL
and MI-based RL can be generalized into a single family of methods, which we
name variational GCRL (VGCRL), interpreting variational MI maximization, or
variational empowerment, as representation learning methods that acquire
functionally-aware state representations for goal reaching. This novel
perspective allows us to: (1) derive simple but unexplored variants of GCRL to
study how adding small representation capacity can already expand its
capabilities; (2) investigate how discriminator function capacity and
smoothness determine the quality of discovered skills, or latent goals, through
modifying latent dimensionality and applying spectral normalization; (3) adapt
techniques such as hindsight experience replay (HER) from GCRL to MI-based RL;
and lastly, (4) propose a novel evaluation metric, named latent goal reaching
(LGR), for comparing empowerment algorithms with different choices of latent
dimensionality and discriminator parameterization. Through principled
mathematical derivations and careful experimental studies, our work lays a
novel foundation from which to evaluate, analyze, and develop representation
learning techniques in goal-based RL.
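To make the VGCRL view concrete, here is a minimal sketch of the variational empowerment intrinsic reward log q_phi(z|s) - log p(z), in which the discriminator q_phi acts as a learned, functionally-aware goal representation. This is an illustrative PyTorch sketch under assumed names and sizes (Discriminator, empowerment_reward, a uniform prior over discrete skills), not the paper's code.

```python
# DIAYN-style variational empowerment reward r(s, z) = log q_phi(z|s) - log p(z);
# VGCRL reads the discriminator q_phi as a representation-learning module
# for goal reaching. Names and network sizes here are illustrative.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Variational posterior q_phi(z | s) over K discrete latent goals."""
    def __init__(self, state_dim: int, n_skills: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # unnormalized logits over z

def empowerment_reward(disc: Discriminator, state: torch.Tensor,
                       z: torch.Tensor, n_skills: int) -> torch.Tensor:
    """Intrinsic reward for states s and sampled skill indices z ([B] long)."""
    log_q = torch.log_softmax(disc(state), dim=-1)           # log q(. | s)
    log_q_z = log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)  # log q(z | s)
    log_p_z = -torch.log(torch.tensor(float(n_skills)))      # uniform prior
    return log_q_z - log_p_z
```

In this family, standard GCRL amounts to fixing q(z|s) to a simple hand-specified form (e.g., an indicator or Gaussian around the goal), while richer discriminators add representation capacity; the smoothness intervention in point (2) corresponds to wrapping each linear layer with torch.nn.utils.spectral_norm.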
Related papers
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
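As a rough illustration of the general idea (this is not the LINVIT algorithm itself; q_values, llm_log_probs, and lam are hypothetical names), LLM guidance can enter value-based RL as a regularizer that biases action selection toward the language model's suggested policy:

```python
# Sketch: add lam * log pi_llm(a | s) to the action values so that greedy
# selection is pulled toward actions the LLM favors (a KL-toward-prior
# effect). Illustrative only; LINVIT's actual update differs.
import numpy as np

def regularized_greedy_action(q_values: np.ndarray,
                              llm_log_probs: np.ndarray,
                              lam: float = 0.1) -> int:
    """q_values, llm_log_probs: per-action arrays for the current state."""
    return int(np.argmax(q_values + lam * llm_log_probs))
```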
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning [48.79569442193824]
We show that COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds.
This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning.
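One concrete instance of such an approximate bound is InfoNCE; the sketch below (shapes and names are assumptions, not the paper's code) estimates a lower bound on I(M; Z) from a batch of aligned task/representation pairs:

```python
# InfoNCE lower bound on I(M; Z): for B aligned pairs, log(B) minus the
# contrastive cross-entropy lower-bounds the mutual information.
import math
import torch
import torch.nn.functional as F

def infonce_lower_bound(z: torch.Tensor, m_emb: torch.Tensor) -> torch.Tensor:
    """z: [B, D] latent representations; m_emb: [B, D] task embeddings."""
    logits = z @ m_emb.t()            # [B, B] pairwise similarity scores
    labels = torch.arange(z.size(0))  # matching pairs lie on the diagonal
    return math.log(z.size(0)) - F.cross_entropy(logits, labels)
```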
arXiv Detail & Related papers (2024-02-04T09:58:42Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
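A minimal sketch of one such discretizing bottleneck (vector-quantization with a straight-through gradient; the codebook size and class name are assumptions, not the paper's implementation):

```python
# Continuous goal encodings are snapped to their nearest codebook vector,
# giving a discrete goal space while gradients pass straight through the
# quantization step.
import torch
import torch.nn as nn

class DiscretizingBottleneck(nn.Module):
    def __init__(self, n_codes: int = 64, dim: int = 32):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(n_codes, dim))

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        """g: [B, dim] continuous goal encoding."""
        dists = torch.cdist(g, self.codebook)  # [B, n_codes] distances
        idx = dists.argmin(dim=-1)             # nearest code per goal
        quantized = self.codebook[idx]         # [B, dim] discrete codes
        return g + (quantized - g).detach()    # straight-through estimator
```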
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning [24.09547181095033]
A causal graph is a structure built upon the relations between objects and events.
We propose a framework with theoretical performance guarantees that alternates between two steps.
Our performance improvement is attributed to the virtuous cycle of causal discovery, transition modeling, and policy training.
arXiv Detail & Related papers (2022-07-19T05:31:16Z)
- Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
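As a hedged illustration of that "subsequent linear RL" step (phi and the regression target are placeholders, not the paper's construction), the target-task Q-function can be fit as a linear function of the frozen pretrained features:

```python
# Least-squares fit of Q(s, a) ~= phi(s, a) @ w on frozen pretrained features.
import numpy as np

def linear_q_fit(phi: np.ndarray, bellman_targets: np.ndarray) -> np.ndarray:
    """phi: [N, d] features of (s, a) pairs; bellman_targets: [N] estimates
    of r + gamma * max_a' Q(s', a')."""
    w, *_ = np.linalg.lstsq(phi, bellman_targets, rcond=None)
    return w
```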
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
- Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning [15.698612710580447]
We propose DR-GRL, a skill-learning framework that aims to improve sample efficiency and policy generalization.
We introduce a Spatial Transform AutoEncoder (STAE) that learns an interpretable and controllable representation in a weakly supervised manner.
We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
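For intuition about the spatial-transform component, here is a generic sketch of a spatial-transformer autoencoder (not the paper's STAE architecture; the learned template decoder, image size, and grayscale input are assumptions): the encoder predicts affine pose parameters, making those dimensions of the representation directly interpretable and controllable.

```python
# Generic spatial-transformer autoencoder: encode an image to 2x3 affine
# pose parameters, then re-render a learned template through that transform.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformAE(nn.Module):
    def __init__(self, img_hw: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(img_hw * img_hw, 128), nn.ReLU(),
            nn.Linear(128, 6),  # 2x3 affine pose parameters
        )
        self.template = nn.Parameter(torch.rand(1, 1, img_hw, img_hw))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: [N, 1, H, W] grayscale images; returns reconstructions."""
        theta = self.encoder(x).view(-1, 2, 3)  # pose per image
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        template = self.template.expand(x.size(0), -1, -1, -1)
        return F.grid_sample(template, grid, align_corners=False)
```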
arXiv Detail & Related papers (2022-02-28T09:05:14Z)
- Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103]
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments.
We decouple policy training from task-inference learning and efficiently train the inference mechanism using an unsupervised reconstruction objective.
We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
arXiv Detail & Related papers (2021-08-08T19:32:44Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)