Causal Deep Reinforcement Learning Using Observational Data
- URL: http://arxiv.org/abs/2211.15355v2
- Date: Fri, 9 Jun 2023 17:03:15 GMT
- Title: Causal Deep Reinforcement Learning Using Observational Data
- Authors: Wenxuan Zhu, Chao Yu, Qiang Zhang
- Abstract summary: We propose two deconfounding methods in deep reinforcement learning (DRL)
The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function.
We prove the effectiveness of our deconfounding methods and validate them experimentally.
- Score: 11.790171301328158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) requires the collection of interventional
data, which is sometimes expensive and even unethical in the real world, such
as in the autonomous driving and the medical field. Offline reinforcement
learning promises to alleviate this issue by exploiting the vast amount of
observational data available in the real world. However, observational data may
mislead the learning agent to undesirable outcomes if the behavior policy that
generates the data depends on unobserved random variables (i.e., confounders).
In this paper, we propose two deconfounding methods in DRL to address this
problem. The methods first calculate the importance degree of different samples
based on the causal inference technique, and then adjust the impact of
different samples on the loss function by reweighting or resampling the offline
dataset to ensure its unbiasedness. These deconfounding methods can be flexibly
combined with existing model-free DRL algorithms such as soft actor-critic and
deep Q-learning, provided that a weak condition can be satisfied by the loss
functions of these algorithms. We prove the effectiveness of our deconfounding
methods and validate them experimentally.
Related papers
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA)
Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations [10.931620604044486]
This study provides an in-depth analysis of the impact of diverse data resampling methods on existingAD approaches.
We assess the performance of theseAD approaches across four datasets with different levels of class imbalance.
We evaluate the effectiveness of the data resampling methods when utilizing optimal resampling ratios of normal to abnormal data.
arXiv Detail & Related papers (2024-05-06T14:01:05Z) - Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z) - Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z) - A Data-Centric Approach for Improving Adversarial Training Through the
Lens of Out-of-Distribution Detection [0.4893345190925178]
We propose detecting and removing hard samples directly from the training procedure rather than applying complicated algorithms to mitigate their effects.
Our results on SVHN and CIFAR-10 datasets show the effectiveness of this method in improving the adversarial training without adding too much computational cost.
arXiv Detail & Related papers (2023-01-25T08:13:50Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Sample-Efficient Reinforcement Learning via Counterfactual-Based Data
Augmentation [15.451690870640295]
In some scenarios such as healthcare, usually only few records are available for each patient, impeding the application of currentReinforcement learning algorithms.
We propose a data-efficient RL algorithm that exploits structural causal models (SCMs) to model the state dynamics.
We show that counterfactual outcomes are identifiable under mild conditions and that Q- learning on the counterfactual-based augmented data set converges to the optimal value function.
arXiv Detail & Related papers (2020-12-16T17:21:13Z) - Provably Efficient Causal Reinforcement Learning with Confounded
Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z) - How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as $rho$-gap.
We show how the $rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.