Automating Control of Overestimation Bias for Continuous Reinforcement
Learning
- URL: http://arxiv.org/abs/2110.13523v1
- Date: Tue, 26 Oct 2021 09:27:12 GMT
- Title: Automating Control of Overestimation Bias for Continuous Reinforcement
Learning
- Authors: Arsenii Kuznetsov, Alexander Grishin, Artem Tsypin, Arsenii Ashukha,
Dmitry Vetrov
- Abstract summary: We present a data-driven approach for guiding bias correction.
We demonstrate its effectiveness on the Truncated Quantile Critics -- a state-of-the-art continuous control algorithm.
- Score: 65.63607016094305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bias correction techniques are used by most of the high-performing methods
for off-policy reinforcement learning. However, these techniques rely on a
pre-defined bias correction policy that is either not flexible enough or
requires environment-specific tuning of hyperparameters. In this work, we
present a simple data-driven approach for guiding bias correction. We
demonstrate its effectiveness on the Truncated Quantile Critics -- a
state-of-the-art continuous control algorithm. The proposed technique can
adjust the bias correction across environments automatically. As a result, it
eliminates the need for an extensive hyperparameter search, significantly
reducing the number of environment interactions and the computation required.
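For context, TQC controls overestimation by pooling the quantile predictions of several critics, dropping the most optimistic atoms, and averaging the rest into the critic target; the number of dropped atoms sets the strength of the bias correction, and this is presumably the knob the data-driven approach adjusts. A minimal sketch of the truncation step (shapes and example values are illustrative):

```python
# A minimal sketch of TQC-style truncation, assuming `atoms` holds the
# pooled quantile predictions of all critics for one transition. Dropping
# the largest (most optimistic) atoms makes the target more pessimistic;
# the number dropped is the bias-correction knob.
import numpy as np

def truncated_target(atoms: np.ndarray, n_dropped: int) -> float:
    """Average the pooled quantile atoms after dropping the top n_dropped."""
    pooled = np.sort(atoms.ravel())
    return float(pooled[: pooled.size - n_dropped].mean())

# Example: 2 critics x 25 quantiles each; drop the 5 most optimistic atoms.
rng = np.random.default_rng(0)
atoms = rng.normal(loc=1.0, scale=0.5, size=(2, 25))
print(truncated_target(atoms, n_dropped=5))
```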
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of these correction methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
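As a hedged illustration of the kind of estimator involved: for any constant baseline, the corrected inverse-propensity-scoring (IPS) estimate below stays unbiased, and the baseline can be chosen to reduce variance. The closed form here is the standard control-variate optimum, not necessarily the paper's exact solution:

```python
# Baseline-corrected IPS: for any constant beta the estimate stays unbiased
# because importance weights average to one under the logging policy. The
# beta below is the standard control-variate optimum, offered as a sketch;
# the paper derives its own closed-form solution.
import numpy as np

def baseline_ips(rewards, weights, beta):
    """Unbiased off-policy value estimate with a constant baseline."""
    return float(np.mean(weights * (rewards - beta)) + beta)

def control_variate_beta(rewards, weights):
    """Plug-in estimate of the variance-minimizing constant baseline."""
    wr = weights * rewards
    cov = np.mean(weights * wr) - np.mean(wr) * np.mean(weights)
    var = np.mean(weights ** 2) - np.mean(weights) ** 2
    return float(cov / max(var, 1e-12))
```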
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
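A minimal sketch of the sparse-increment idea, assuming the selection rule is simply the k largest gradient magnitudes (the paper's exact criterion may differ):

```python
# A sketch of sparse fine-tuning in the spirit of SIFT: only the k
# parameters with the largest gradient magnitude are updated, so the
# increment over the pre-trained weights stays sparse. The top-k rule
# is an illustrative assumption.
import numpy as np

def sparse_update(params: np.ndarray, grads: np.ndarray, lr: float, k: int):
    """SGD step applied only to the k entries with the largest |gradient|."""
    p, g = params.ravel(), grads.ravel()
    top_k = np.argpartition(np.abs(g), -k)[-k:]
    p[top_k] -= lr * g[top_k]  # all other parameters stay frozen
    return params
```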
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, with adapter tuning consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than for BERT, and that the methods are less effective at mitigating racial and religious bias.
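For reference, a sketch of the adapter module this comparison refers to: a small trainable bottleneck with a residual connection, inserted into an otherwise frozen model (dimensions, activation, and initialization are illustrative):

```python
# A sketch of adapter tuning: a small trainable bottleneck with a residual
# connection inserted into a frozen model. Dimensions, activation, and
# initialization here are illustrative assumptions.
import numpy as np

class Adapter:
    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.w_up = np.zeros((d_bottleneck, d_model))  # identity map at init

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # Residual path preserves the frozen layer's output at initialization.
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up
```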
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
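A hedged sketch of the selection signal, assuming SARSA-style held-out transitions; the paper's online selection procedure is more involved:

```python
# A sketch of the selection signal: mean squared TD error measured on
# held-out transitions. The (s, a, r, s', a') tuple format is a SARSA-style
# simplification assumed for this example.
import numpy as np

def validation_td_error(q_fn, transitions, gamma: float = 0.99) -> float:
    """Mean squared TD error of q_fn on held-out transition tuples."""
    errors = [
        (r + gamma * q_fn(s_next, a_next) - q_fn(s, a)) ** 2
        for (s, a, r, s_next, a_next) in transitions
    ]
    return float(np.mean(errors))

# Candidate with the lowest validation TD error wins (hypothetical usage):
# best = min(agents, key=lambda ag: validation_td_error(ag.q, val_set))
```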
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning [36.643572071860554]
We propose a general method called Adaptively Calibrated Critics (ACC).
ACC uses the most recent high-variance but unbiased on-policy rollouts to alleviate the bias of the low-variance temporal-difference targets.
We show that ACC is quite general by further applying it to TD3 and showing an improved performance also in this setting.
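A minimal sketch of the calibration step implied here, with the update rule and clipping range as assumptions: the critic's estimates are compared against unbiased Monte Carlo returns, and a scalar bias-control parameter is nudged to cancel the measured bias:

```python
# A sketch of the calibration step: estimate the critic's bias against
# unbiased Monte Carlo returns from recent on-policy rollouts, then nudge
# a scalar bias-control parameter toward more pessimism when the critic
# overestimates. Step size and clipping range are assumptions.
import numpy as np

def calibrate_beta(beta, q_estimates, mc_returns, step=0.1, lo=0.0, hi=10.0):
    """Adjust the pessimism parameter to cancel the measured bias."""
    bias = float(np.mean(np.asarray(q_estimates) - np.asarray(mc_returns)))
    return float(np.clip(beta + step * np.sign(bias), lo, hi))
```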
arXiv Detail & Related papers (2021-11-24T18:07:33Z)
- Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control [0.0]
We introduce a novel, parameter-free deep Q-learning variant that reduces the underestimation bias in continuous control.
We test the performance of our improvement on a set of MuJoCo and Box2D continuous control tasks.
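For context, the underestimation in question plausibly stems from the clipped double-Q target used by TD3-style methods, sketched below; taking the minimum of two critics guards against overestimation but can err in the other direction:

```python
# The clipped double-Q target used by TD3-style methods: the min over two
# critics prevents overestimation but can underestimate instead. q1 and q2
# are stand-in critic functions; linking this entry to clipped double-Q is
# an assumption based on the stated goal.
def clipped_double_q_target(r, s_next, a_next, q1, q2, gamma=0.99):
    """Bellman target with the pessimistic minimum over two critics."""
    return r + gamma * min(q1(s_next, a_next), q2(s_next, a_next))
```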
arXiv Detail & Related papers (2021-09-24T07:41:07Z)
- Semantic Perturbations with Normalizing Flows for Improved Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations.
We find that latent adversarial perturbations that adapt to the classifier throughout its training are the most effective.
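A minimal sketch of the augmentation, relying only on the flow's invertibility; `flow_forward` and `flow_inverse` are stand-ins for a trained normalizing flow:

```python
# A sketch of latent-space augmentation with a normalizing flow: encode,
# perturb in latent space, decode. `flow_forward` and `flow_inverse` are
# stand-ins for a trained invertible model.
import numpy as np

def semantic_perturb(x, flow_forward, flow_inverse, sigma, rng):
    """Return a semantically perturbed copy of x via the flow's latent space."""
    z = flow_forward(x)                          # encode: x -> z
    z_noisy = z + sigma * rng.normal(size=z.shape)
    return flow_inverse(z_noisy)                 # decode back to input space
```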
arXiv Detail & Related papers (2021-08-18T03:20:00Z)
- Efficient Hyperparameter Tuning with Dynamic Accuracy Derivative-Free Optimization [0.27074235008521236]
We apply a recent dynamic accuracy derivative-free optimization method to hyperparameter tuning.
This method allows inexact evaluations of the learning problem while retaining convergence guarantees.
We demonstrate its robustness and efficiency compared to a fixed accuracy approach.
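A loose sketch of the dynamic-accuracy idea, with a shrinking random search standing in for the paper's derivative-free optimizer: cheap, inexact evaluations early, larger training budgets as the search narrows:

```python
# A loose sketch of dynamic-accuracy tuning: a shrinking random search
# (standing in for the paper's derivative-free optimizer) scores candidates
# with a small training budget early on and doubles the budget each round.
# The schedule and shrink factor are illustrative assumptions.
import numpy as np

def tune(loss_at_budget, lo, hi, rounds=5, samples=8, seed=0):
    """loss_at_budget(theta, budget) -> inexact loss at the given budget."""
    rng = np.random.default_rng(seed)
    best = None
    for k in range(rounds):
        budget = 2 ** k                              # accuracy grows per round
        thetas = rng.uniform(lo, hi, size=samples)
        scored = [(loss_at_budget(t, budget), t) for t in thetas]
        best = min(scored + ([best] if best else []))
        lo, hi = best[1] - (hi - lo) / 4, best[1] + (hi - lo) / 4
    return best[1]
```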
arXiv Detail & Related papers (2020-11-06T00:59:51Z)
- Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning [41.24484153212002]
This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system.
It does not require the learned inverse model to be invertible at all instances of time.
A simulated example of a double pendulum demonstrates the utility of the proposed theory.
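A hedged sketch of the underlying control law for dynamics of the form x_ddot = f(x, x_dot) + g(x, x_dot) * u, with learned estimates f_hat, g_hat and illustrative PD gains:

```python
# A sketch of a feedback-linearization tracking law. With learned estimates
# f_hat and g_hat of the unknown dynamics, the control cancels the
# nonlinearities and a PD term drives tracking; gains and the learned-model
# interfaces are illustrative assumptions.
def fl_tracking_control(x, x_dot, x_ref, x_ref_dot, f_hat, g_hat,
                        kp=10.0, kd=5.0):
    """Feedback-linearizing controller built on learned dynamics estimates."""
    v = kp * (x_ref - x) + kd * (x_ref_dot - x_dot)   # virtual linear input
    return (v - f_hat(x, x_dot)) / g_hat(x, x_dot)    # cancel f, invert g
```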
arXiv Detail & Related papers (2020-04-06T15:50:31Z)