Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
- URL: http://arxiv.org/abs/2312.17168v1
- Date: Thu, 28 Dec 2023 17:54:56 GMT
- Title: Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
- Authors: Gunshi Gupta, Tim G. J. Rudner, Rowan Thomas McAllister, Adrien Gaidon, Yarin Gal
- Abstract summary: Causal confusion is a phenomenon where an agent learns a policy that reflects imperfect spurious correlations in the data.
This phenomenon is particularly pronounced in domains such as robotics.
In this paper, we study causal confusion in offline reinforcement learning.
- Score: 58.942118128503104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal confusion is a phenomenon where an agent learns a policy that reflects
imperfect spurious correlations in the data. Such a policy may falsely appear
to be optimal during training if most of the training data contain such
spurious correlations. This phenomenon is particularly pronounced in domains
such as robotics, with potentially large gaps between the open- and closed-loop
performance of an agent. In such settings, causally confused models may appear
to perform well according to open-loop metrics during training but fail
catastrophically when deployed in the real world. In this paper, we study
causal confusion in offline reinforcement learning. We investigate whether
selectively sampling appropriate points from a dataset of demonstrations may
enable offline reinforcement learning agents to disambiguate the underlying
causal mechanisms of the environment, alleviate causal confusion in offline
reinforcement learning, and produce a safer model for deployment. To answer
this question, we consider a set of tailored offline reinforcement learning
datasets that exhibit causal ambiguity and assess the ability of active
sampling techniques to reduce causal confusion at evaluation. We provide
empirical evidence that uniform and active sampling techniques are able to
consistently reduce causal confusion as training progresses and that active
sampling is able to do so significantly more efficiently than uniform sampling.
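To make the contrast between uniform and active sampling concrete, the following is a minimal sketch of drawing batches from an offline demonstration dataset either uniformly or in proportion to a per-sample priority. The toy dataset, the linear policy, and the loss-based priority are illustrative assumptions, not the sampling criteria or models used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset of (state, action) pairs; placeholders for real demonstrations.
num_samples = 1000
states = rng.normal(size=(num_samples, 4))
actions = rng.integers(0, 2, size=num_samples)

def per_sample_loss(states, actions, params):
    """Placeholder loss of a linear policy; stands in for the agent's training loss."""
    logits = states @ params
    probs = 1.0 / (1.0 + np.exp(-logits))
    return -(actions * np.log(probs + 1e-8) + (1 - actions) * np.log(1 - probs + 1e-8))

def uniform_batch(batch_size):
    """Uniform sampling: every demonstration is equally likely to be drawn."""
    return rng.choice(num_samples, size=batch_size, replace=False)

def active_batch(batch_size, params, temperature=1.0):
    """Active sampling: draw points in proportion to their current loss,
    so ambiguous or poorly-fit demonstrations are revisited more often."""
    losses = per_sample_loss(states, actions, params)
    weights = np.exp(losses / temperature)
    probs = weights / weights.sum()
    return rng.choice(num_samples, size=batch_size, replace=False, p=probs)

params = np.zeros(4)
print("uniform batch:", uniform_batch(8))
print("active batch: ", active_batch(8, params))
```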
Related papers
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Causal Deep Reinforcement Learning Using Observational Data [11.790171301328158]
We propose two deconfounding methods in deep reinforcement learning (DRL).
The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function.
We prove the effectiveness of our deconfounding methods and validate them experimentally.
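A minimal sketch of this kind of loss reweighting is given below; the importance scores are placeholders standing in for the causal-inference estimates described in the paper, not its actual procedure.

```python
import numpy as np

def weighted_loss(per_sample_losses, importance):
    """Scale each sample's contribution to the training loss by its importance.
    `importance` stands in for causal importance degrees computed by a
    deconfounding procedure; here it is just a normalized placeholder."""
    weights = importance / importance.sum()
    return float(np.sum(weights * per_sample_losses))

losses = np.array([0.8, 0.2, 1.5, 0.4])
importance = np.array([0.1, 1.0, 0.5, 1.0])  # placeholder importance degrees
print(weighted_loss(losses, importance))
```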
arXiv Detail & Related papers (2022-11-28T14:34:39Z)
- An Empirical Study of Implicit Regularization in Deep Offline RL [44.62587507925864]
We study the relation between effective rank and performance on three offline RL datasets.
We identify three phases of learning that explain the impact of implicit regularization on the learning dynamics.
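For reference, one common definition of effective rank in this line of work is the smallest number of singular values needed to capture a 1 − δ fraction of the feature matrix's spectrum. The snippet below is an assumed illustration of that definition and may not match the exact measure used in the paper.

```python
import numpy as np

def effective_rank(features, delta=0.01):
    """Smallest k such that the top-k singular values of the feature matrix
    account for at least a (1 - delta) fraction of the total spectrum."""
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / singular_values.sum()
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

# Toy feature matrix standing in for penultimate-layer activations of a Q-network.
phi = np.random.default_rng(0).normal(size=(256, 64))
print(effective_rank(phi))
```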
arXiv Detail & Related papers (2022-07-05T15:07:31Z)
- Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, based on which we deduce a theoretical guarantee that the causality-inspired learning is with reduced sample complexity and better generalization ability.
arXiv Detail & Related papers (2022-02-17T00:38:35Z)
- Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
- Causal Reinforcement Learning using Observational and Interventional Data [14.856472820492364]
Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs.
We consider a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment.
We then ask the following question: can the online and offline experiences be safely combined for learning a causal model?
arXiv Detail & Related papers (2021-06-28T06:58:20Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
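A loose sketch of this reweighting idea, assuming transitions are soft down-weighted by an estimate of the error in their bootstrap targets; the error values are placeholders and this is not the DisCor algorithm itself.

```python
import numpy as np

def soft_error_weights(target_error_estimates, temperature=1.0):
    """Assumed illustration: give lower weight to transitions whose
    next-state value estimates are believed to carry larger error,
    so training focuses on transitions with more trustworthy targets."""
    return np.exp(-np.asarray(target_error_estimates) / temperature)

errors = np.array([0.1, 2.0, 0.5])  # placeholder error estimates
weights = soft_error_weights(errors)
print(weights / weights.sum())
```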
arXiv Detail & Related papers (2020-03-16T16:18:52Z)