The Difficulty of Passive Learning in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2110.14020v1
- Date: Tue, 26 Oct 2021 20:50:49 GMT
- Title: The Difficulty of Passive Learning in Deep Reinforcement Learning
- Authors: Georg Ostrovski, Pablo Samuel Castro, Will Dabney
- Abstract summary: Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL).
Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset.
We propose the "tandem learning" experimental paradigm which facilitates our empirical analysis of the difficulties in offline reinforcement learning.
- Score: 26.124032923011328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning to act from observational data without active environmental
interaction is a well-known challenge in Reinforcement Learning (RL). Recent
approaches involve constraints on the learned policy or conservative updates,
preventing strong deviations from the state-action distribution of the dataset.
Although these methods are evaluated using non-linear function approximation,
theoretical justifications are mostly limited to the tabular or linear cases.
Given the impressive results of deep reinforcement learning, we argue for a
need to more clearly understand the challenges in this setting.
In the vein of Held & Hein's classic 1963 experiment, we propose the "tandem
learning" experimental paradigm which facilitates our empirical analysis of the
difficulties in offline reinforcement learning. We identify function
approximation in conjunction with fixed data distributions as the strongest
factors, thereby extending but also challenging hypotheses stated in past work.
Our results provide relevant insights for offline deep reinforcement learning,
while also shedding new light on phenomena observed in the online case of
learning control.
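To make the tandem paradigm concrete, below is a minimal sketch of such a setup on a toy chain MDP with linear Q-functions in plain NumPy. Everything here (the chain dynamics, the step and td_update helpers, the hyperparameters) is an illustrative assumption rather than the paper's code, which uses deep Q-networks on Atari: an "active" agent selects actions epsilon-greedily from its own values, while an independently initialized "passive" agent receives exactly the same transitions and applies the same update rule, but never influences data collection.

```python
# Minimal sketch of a tandem setup (an illustrative assumption, not the
# paper's code): a toy chain MDP with linear Q-functions in plain NumPy.
# The paper's actual experiments use deep Q-networks on Atari.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 10, 2        # chain: action 1 moves right, action 0 resets
GAMMA, LR, EPS, STEPS = 0.9, 0.1, 0.1, 20_000

def one_hot(s):
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

def step(s, a):
    """Toy dynamics: reward 1 only when the right end of the chain is reached."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else 0
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def q_values(w, s):
    return w @ one_hot(s)          # w has shape (N_ACTIONS, N_STATES)

def td_update(w, s, a, r, s2):
    """Q-learning update with linear function approximation."""
    target = r + GAMMA * np.max(q_values(w, s2))
    td_err = target - q_values(w, s)[a]
    w[a] += LR * td_err * one_hot(s)

# Active agent: its values drive data collection.
# Passive agent: independently initialized, sees the same transitions and
# applies the same update rule, but never influences behaviour.
w_active = rng.normal(scale=0.01, size=(N_ACTIONS, N_STATES))
w_passive = rng.normal(scale=0.01, size=(N_ACTIONS, N_STATES))

s = 0
for _ in range(STEPS):
    # Behaviour is epsilon-greedy with respect to the ACTIVE agent's values only.
    if rng.random() < EPS:
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(np.argmax(q_values(w_active, s)))
    s2, r = step(s, a)

    # Both agents consume the identical transition stream.
    td_update(w_active, s, a, r, s2)
    td_update(w_passive, s, a, r, s2)
    s = s2

greedy = lambda w: [int(np.argmax(q_values(w, st))) for st in range(N_STATES)]
print("active greedy policy :", greedy(w_active))
print("passive greedy policy:", greedy(w_passive))
```

On a problem this small the two agents usually recover the same greedy policy; the paper's finding is that with deep function approximation and a fixed data distribution the passive learner can fail badly even under this seemingly fair setup.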
Related papers
- Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement [19.883973457999282]
Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large scale. This paper investigates idealized scenarios with mostly bimodal-quality data distributions and introduces a method to learn from such data. Our method adapts RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data.
arXiv Detail & Related papers (2025-07-09T09:55:23Z) - Global Convergence of Continual Learning on Non-IID Data [51.99584235667152]
We provide a general and comprehensive theoretical analysis for continual learning of regression models.
We establish the almost sure convergence results of continual learning under a general data condition for the first time.
arXiv Detail & Related papers (2025-03-24T10:06:07Z) - Accurate Forgetting for Heterogeneous Federated Continual Learning [89.08735771893608]
We propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method which selectively utilizes previous knowledge in federated networks.
We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge.
arXiv Detail & Related papers (2025-02-20T02:35:17Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z) - Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage? [35.15844215216846]
EDL methods are trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function.
Recent studies identify limitations of existing methods and conclude that their learned uncertainties are unreliable.
We provide a sharper understanding of the behavior of a wide class of EDL methods by unifying various objective functions.
We conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities.
arXiv Detail & Related papers (2024-02-09T03:23:39Z) - Loss Dynamics of Temporal Difference Reinforcement Learning [36.772501199987076]
We study learning curves for temporal difference learning of a value function with linear function approximators.
We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
arXiv Detail & Related papers (2023-07-10T18:17:50Z) - Causal Reinforcement Learning: A Survey [57.368108154871]
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty.
One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world.
Causality offers a notable advantage as it can formalize knowledge in a systematic manner.
arXiv Detail & Related papers (2023-07-04T03:00:43Z) - Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z) - CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning [26.05184273238923]
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL).
We devise a principled algorithm (namely CLARE) that solves offline IRL efficiently via integrating "conservatism" into a learned reward function.
Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy.
arXiv Detail & Related papers (2023-02-09T17:16:29Z) - On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z) - Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z) - Causal Reinforcement Learning using Observational and Interventional Data [14.856472820492364]
Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs.
We consider a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment.
We then ask the following question: can the online and offline experiences be safely combined for learning a causal model?
arXiv Detail & Related papers (2021-06-28T06:58:20Z) - Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with state-of-the-art methods on locomotion tasks in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)