The Difficulty of Passive Learning in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2110.14020v1
- Date: Tue, 26 Oct 2021 20:50:49 GMT
- Title: The Difficulty of Passive Learning in Deep Reinforcement Learning
- Authors: Georg Ostrovski, Pablo Samuel Castro, Will Dabney
- Abstract summary: Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL).
Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset.
We propose the "tandem learning" experimental paradigm which facilitates our empirical analysis of the difficulties in offline reinforcement learning.
- Score: 26.124032923011328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning to act from observational data without active environmental
interaction is a well-known challenge in Reinforcement Learning (RL). Recent
approaches involve constraints on the learned policy or conservative updates,
preventing strong deviations from the state-action distribution of the dataset.
Although these methods are evaluated using non-linear function approximation,
theoretical justifications are mostly limited to the tabular or linear cases.
Given the impressive results of deep reinforcement learning, we argue for a
need to more clearly understand the challenges in this setting.
In the vein of Held & Hein's classic 1963 experiment, we propose the "tandem
learning" experimental paradigm which facilitates our empirical analysis of the
difficulties in offline reinforcement learning. We identify function
approximation in conjunction with fixed data distributions as the strongest
factors, thereby extending but also challenging hypotheses stated in past work.
Our results provide relevant insights for offline deep reinforcement learning,
while also shedding new light on phenomena observed in the online case of
learning control.
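The abstract names the tandem learning paradigm but does not describe its mechanics. As we read it, an active agent learns online from its own interactions, while a passive agent using the same learning algorithm is trained on exactly the same stream of transitions but never chooses an action. The Python sketch below illustrates that setup under our own simplifying assumptions: a hypothetical six-state chain environment (`step`), tabular Q-learning in place of deep networks (`QAgent`), and illustrative hyperparameters and helper names (`tandem`, `greedy_return`) that do not come from the paper.

```python
import random

N_ACTIONS = 2  # hypothetical toy actions: 0 = step left, 1 = step right


def step(s, a, n_states):
    """Hypothetical deterministic chain: reaching the last state pays 1 and ends the episode."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n_states - 1
    return s2, (1.0 if done else 0.0), done


class QAgent:
    """Tabular Q-learner; a simplified stand-in (our assumption) for the deep agents in the paper."""

    def __init__(self, n_states, seed, lr=0.5, gamma=0.9):
        rng = random.Random(seed)
        # Random initial values in [0, 1); the seed decides whether two agents start identical.
        self.q = [[rng.random() for _ in range(N_ACTIONS)] for _ in range(n_states)]
        self.init_q = [row[:] for row in self.q]
        self.lr, self.gamma = lr, gamma

    def greedy(self, s):
        row = self.q[s]
        return max(range(N_ACTIONS), key=row.__getitem__)

    def update(self, s, a, r, s2, done):
        # One-step Q-learning target; no bootstrapping from the terminal state.
        target = r if done else r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.lr * (target - self.q[s][a])


def tandem(n_states=6, episodes=200, max_steps=20, eps=0.2,
           active_seed=0, passive_seed=1, explore_seed=42):
    rng = random.Random(explore_seed)  # exploration noise, used only by the active agent
    active = QAgent(n_states, active_seed)
    passive = QAgent(n_states, passive_seed)
    visited = set()  # (state, action) pairs that appear in the shared data stream
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Only the active agent selects actions (epsilon-greedy on its own Q-values).
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = active.greedy(s)
            s2, r, done = step(s, a, n_states)
            # Both agents consume the identical transition with the identical update rule.
            active.update(s, a, r, s2, done)
            passive.update(s, a, r, s2, done)
            visited.add((s, a))
            s = s2
            if done:
                break
    return active, passive, visited


def greedy_return(agent, n_states, max_steps=20):
    """Undiscounted return of the greedy policy from state 0 (1.0 if it reaches the goal)."""
    s = 0
    for _ in range(max_steps):
        s, r, done = step(s, agent.greedy(s), n_states)
        if done:
            return r
    return 0.0


if __name__ == "__main__":
    active, passive, visited = tandem()
    print("active:", greedy_return(active, 6), "passive:", greedy_return(passive, 6))
```

In this toy, setting `active_seed == passive_seed` keeps the two Q-tables exactly equal, because tabular updates on identical data are deterministic. With different seeds, the passive agent's values for (state, action) pairs that never occur in the data stay at their random initialization and feed into its bootstrapped targets, so its greedy policy can favor actions the data never covered. The sketch omits function approximation, which the abstract identifies as one of the strongest factors, so it only illustrates the fixed-data-distribution side of the problem.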
Related papers
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage? [35.15844215216846]
EDL methods are trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function.
Recent studies identify limitations of the existing methods and conclude that their learned uncertainties are unreliable.
We provide a sharper understanding of the behavior of a wide class of EDL methods by unifying various objective functions.
We conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities.
arXiv Detail & Related papers (2024-02-09T03:23:39Z)
- Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback [5.683832910692926]
We find a challenge in applying two-phase learning in the offline preference-based RL (PBRL) setting.
We propose a two-phase learning approach under behavior regularization through action clipping.
Our method ignores such state-actions during the second learning phase to achieve higher learning efficiency.
arXiv Detail & Related papers (2023-12-30T21:37:18Z)
- Causal Reinforcement Learning: A Survey [57.368108154871]
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty.
One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world.
Causality offers a notable advantage as it can formalize knowledge in a systematic manner.
arXiv Detail & Related papers (2023-07-04T03:00:43Z)
- Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z)
- CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning [26.05184273238923]
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL).
We devise a principled algorithm (namely CLARE) that solves offline IRL efficiently via integrating "conservatism" into a learned reward function.
Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy.
arXiv Detail & Related papers (2023-02-09T17:16:29Z)
- On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z)
- Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z)
- Causal Reinforcement Learning using Observational and Interventional Data [14.856472820492364]
Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs.
We consider a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment.
We then ask the following question: can the online and offline experiences be safely combined for learning a causal model?
arXiv Detail & Related papers (2021-06-28T06:58:20Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable to state-of-the-art methods on locomotion tasks in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.