Control-Aware Representations for Model-based Reinforcement Learning
- URL: http://arxiv.org/abs/2006.13408v1
- Date: Wed, 24 Jun 2020 01:00:32 GMT
- Title: Control-Aware Representations for Model-based Reinforcement Learning
- Authors: Brandon Cui and Yinlam Chow and Mohammad Ghavamzadeh
- Abstract summary: A major challenge in modern reinforcement learning (RL) is efficient control of dynamical systems from high-dimensional sensory observations.
Learning controllable embedding (LCE) is a promising approach that addresses this challenge by embedding the observations into a lower-dimensional latent space.
Two important questions in this area are how to learn a representation that is amenable to the control problem at hand, and how to achieve an end-to-end framework for representation learning and control.
- Score: 36.221391601609255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge in modern reinforcement learning (RL) is efficient control
of dynamical systems from high-dimensional sensory observations. Learning
controllable embedding (LCE) is a promising approach that addresses this
challenge by embedding the observations into a lower-dimensional latent space,
estimating the latent dynamics, and utilizing it to perform control in the
latent space. Two important questions in this area are how to learn a
representation that is amenable to the control problem at hand, and how to
achieve an end-to-end framework for representation learning and control. In
this paper, we take a few steps towards addressing these questions. We first
formulate an LCE model to learn representations that are suitable for use by
a policy iteration style algorithm in the latent space. We call this model
control-aware representation learning (CARL). We derive a loss function for
CARL that has a close connection to the prediction, consistency, and curvature
(PCC) principle for representation learning. We derive three implementations of
CARL. In the offline implementation, we replace the locally-linear control
algorithm (e.g., iLQR) used by existing LCE methods with an RL algorithm,
namely model-based soft actor-critic, and show that it results in significant
improvement. In online CARL, we interleave representation learning and control,
and demonstrate further gain in performance. Finally, we propose value-guided
CARL, a variation in which we optimize a weighted version of the CARL loss
function, where the weights depend on the TD-error of the current policy. We
evaluate the proposed algorithms by extensive experiments on benchmark tasks
and compare them with several LCE baselines.
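The value-guided variant above weights the CARL representation loss by the TD-error of the current policy, so that transitions the policy currently mispredicts contribute more to representation learning. Below is a minimal PyTorch-style sketch of that weighting step, under stated assumptions: the encoder, latent dynamics model, critic, the one-step prediction term standing in for the full prediction/consistency/curvature loss, and the softmax normalization of TD-errors are illustrative placeholders, not the paper's implementation.

```python
import torch


def value_guided_carl_loss(encoder, dynamics, critic, batch,
                           gamma=0.99, temperature=1.0):
    """Weight a per-sample representation loss by the TD-error of the current policy.

    `encoder`, `dynamics`, and `critic` are hypothetical torch.nn.Module objects:
    encoder maps observations to latent states, dynamics predicts the next latent
    state from (latent state, action), and critic estimates latent-state values.
    """
    obs, action, reward, next_obs = batch  # reward has shape (B,)

    # Embed high-dimensional observations into the lower-dimensional latent space.
    z = encoder(obs)
    z_next = encoder(next_obs)

    # Per-sample representation loss; a one-step latent prediction error stands in
    # for the full prediction/consistency/curvature (PCC-style) terms here.
    z_next_pred = dynamics(z, action)
    per_sample_loss = ((z_next_pred - z_next) ** 2).mean(dim=-1)  # shape (B,)

    # TD-error of the current policy's value estimates, computed in latent space.
    with torch.no_grad():
        v = critic(z).squeeze(-1)
        v_next = critic(z_next).squeeze(-1)
        td_error = (reward + gamma * v_next - v).abs()
        # Turn TD-errors into sample weights with mean ~1 (softmax is one
        # illustrative normalization choice, not prescribed by the paper).
        weights = torch.softmax(td_error / temperature, dim=0) * td_error.shape[0]

    # Value-guided objective: transitions with larger TD-error get larger weight,
    # focusing representation learning on decision-critical parts of latent space.
    return (weights * per_sample_loss).mean()
```

In this sketch the weights are detached from the computation graph, so the critic's TD-errors only rescale the representation loss rather than being trained through it; the temperature controls how sharply high-TD-error transitions are emphasized.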
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning [24.684363928059113]
We propose an efficient representation learning method using only a self-supervised latent-state consistency loss.
We achieve high performance and prevent representation collapse by quantizing the latent representation.
Our method, named iQRL (implicitly Quantized Reinforcement Learning), is straightforward and compatible with any model-free RL algorithm.
arXiv Detail & Related papers (2024-06-04T18:15:44Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR [9.355903533901023]
We propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner.
As opposed to the standard model-based approaches to meta-RL, our method exploits the value information in order to rapidly capture the decision-critical part of the environment.
arXiv Detail & Related papers (2023-12-09T04:52:28Z)
- Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems [1.8799681615947088]
We leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning algorithms.
We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone.
arXiv Detail & Related papers (2023-05-20T10:11:09Z)
- Large Language Models can Implement Policy Iteration [18.424558160071808]
In-Context Policy Iteration (ICPI) is an algorithm for performing reinforcement learning (RL) in-context, using foundation models.
ICPI learns to perform RL tasks without expert demonstrations or gradients.
ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment.
arXiv Detail & Related papers (2022-10-07T21:18:22Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning.
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
- Deep RL With Information Constrained Policies: Generalization in Continuous Control [21.46148507577606]
We study whether a natural constraint on information flow might confer generalization benefits onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)