Efficient Deep Reinforcement Learning Requires Regulating Overfitting
- URL: http://arxiv.org/abs/2304.10466v1
- Date: Thu, 20 Apr 2023 17:11:05 GMT
- Title: Efficient Deep Reinforcement Learning Requires Regulating Overfitting
- Authors: Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine
- Abstract summary: We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
- Score: 91.88004732618381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning algorithms that learn policies by trial-and-error
must learn from limited amounts of data collected by actively interacting with
the environment. While many prior works have shown that proper regularization
techniques are crucial for enabling data-efficient RL, a general understanding
of the bottlenecks in data-efficient RL has remained elusive. Consequently, it
has been difficult to devise a universal technique that works well across all
domains. In this paper, we attempt to understand the primary bottleneck in
sample-efficient deep RL by examining several potential hypotheses such as
non-stationarity, excessive action distribution shift, and overfitting. We
perform a thorough empirical analysis on state-based DeepMind Control suite (DMC)
tasks in a controlled and systematic way to show that high temporal-difference
(TD) error on the validation set of transitions is the main culprit that
severely degrades the performance of deep RL algorithms, and that prior methods
that lead to good performance do, in fact, keep the validation TD error low.
This observation gives us a robust principle for making deep RL efficient: we
can hill-climb on the validation TD error by utilizing any form of
regularization technique from supervised learning. We show that a simple
online model selection method that targets the validation TD error is effective
across state-based DMC and Gym tasks.
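To make the main idea concrete, here is a minimal sketch (not the authors' released code) of the two ingredients the abstract describes: measuring TD error on a held-out validation set of transitions, and selecting online among candidate regularization settings by keeping whichever critic attains the lowest validation TD error. The `QNetwork` class, the dropout-based regularizer, and the batch layout are illustrative assumptions.

```python
# A minimal sketch, assuming a torch-based critic; QNetwork, the dropout
# regularizer, and the (s, a, r, s', a', done) batch layout are hypothetical
# placeholders, not the authors' implementation.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Tiny critic; dropout stands in for 'any regularization technique
    from supervised learning' mentioned in the abstract."""

    def __init__(self, obs_dim, act_dim, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


@torch.no_grad()
def validation_td_error(q, q_target, batch, gamma=0.99):
    """Mean squared TD error on a held-out set of transitions."""
    obs, act, rew, next_obs, next_act, done = batch
    target = rew + gamma * (1.0 - done) * q_target(next_obs, next_act)
    return ((q(obs, act) - target) ** 2).mean().item()


def select_by_validation_td_error(candidates, q_target, val_batch):
    """Online model selection: keep the candidate critic (e.g., trained with
    different dropout rates or weight decays) with the lowest validation TD error."""
    errors = [validation_td_error(q, q_target, val_batch) for q in candidates]
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best], errors
```

In an actual training loop, the candidates would be trained alongside one another (or spawned periodically with different regularization strengths) and the selection repeated as new validation transitions are collected; the paper's concrete procedure may differ in details not covered by the abstract.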
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning [25.071018803326254]
Distribution shift is a major obstacle in offline reinforcement learning.
Previous conservative offline RL algorithms struggle to generalize to unseen actions.
We propose to use the gradient fields of the dataset density generated from a pre-trained offline RL algorithm to adjust the original actions.
arXiv Detail & Related papers (2024-06-11T17:59:29Z)
- Exploiting Estimation Bias in Clipped Double Q-Learning for Continuous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without degrading performance or increasing computational complexity.
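For context, the "clipped" target in double Q-learning takes the minimum of two critic estimates, which biases value estimates downward; the BE mechanism above chooses which bias is advantageous during training. The sketch below shows only the two standard target variants a selection rule could switch between; the actual selection mechanism is the paper's contribution and is not reproduced here.

```python
# Minimal, illustrative sketch of two TD-target choices a bias-selection
# mechanism could switch between; not the paper's implementation.
import torch

def td_target(q1_next, q2_next, reward, done, gamma=0.99, pessimistic=True):
    """TD target from two target-critic estimates of Q(s', a').

    pessimistic=True  -> clipped double-Q (element-wise min), as in TD3
    pessimistic=False -> average of the two critics, reducing underestimation
    """
    q_next = torch.min(q1_next, q2_next) if pessimistic else 0.5 * (q1_next + q2_next)
    return reward + gamma * (1.0 - done) * q_next
```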
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- A Transferable and Automatic Tuning of Deep Reinforcement Learning for Cost Effective Phishing Detection [21.481974148873807]
Many challenging real-world problems require the deployment of ensembles of multiple complementary learning models.
Deep Reinforcement Learning (DRL) offers a cost-effective alternative, where detectors are dynamically chosen based on the output of their predecessors.
arXiv Detail & Related papers (2022-09-19T14:09:07Z)
- OPAL: Occlusion Pattern Aware Loss for Unsupervised Light Field Disparity Estimation [22.389903710616508]
Unsupervised methods can achieve accuracy comparable to supervised methods, but with much higher generalization capacity and efficiency.
We present OPAL, which successfully extracts and encodes the general occlusion patterns inherent in the light field for loss calculation.
arXiv Detail & Related papers (2022-03-04T10:32:18Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
- Data-efficient Weakly-supervised Learning for On-line Object Detection under Domain Shift in Robotics [24.878465999976594]
Several object detection methods have been proposed in the literature, the vast majority based on Deep Convolutional Neural Networks (DCNNs).
These methods have important limitations for robotics: learning solely from off-line data may introduce biases and prevent adaptation to novel tasks.
In this work, we investigate how weakly-supervised learning can cope with these problems.
arXiv Detail & Related papers (2020-12-28T16:36:11Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)