Automating Control of Overestimation Bias for Continuous Reinforcement
Learning
- URL: http://arxiv.org/abs/2110.13523v1
- Date: Tue, 26 Oct 2021 09:27:12 GMT
- Title: Automating Control of Overestimation Bias for Continuous Reinforcement
Learning
- Authors: Arsenii Kuznetsov, Alexander Grishin, Artem Tsypin, Arsenii Ashukha,
Dmitry Vetrov
- Abstract summary: We present a data-driven approach for guiding bias correction.
We demonstrate its effectiveness on the Truncated Quantile Critics -- a state-of-the-art continuous control algorithm.
- Score: 65.63607016094305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bias correction techniques are used by most of the high-performing methods
for off-policy reinforcement learning. However, these techniques rely on a
pre-defined bias correction policy that is either not flexible enough or
requires environment-specific tuning of hyperparameters. In this work, we
present a simple data-driven approach for guiding bias correction. We
demonstrate its effectiveness on the Truncated Quantile Critics -- a
state-of-the-art continuous control algorithm. The proposed technique can
adjust the bias correction across environments automatically. As a result, it
eliminates the need for an extensive hyperparameter search, significantly
reducing the number of environment interactions and the computation required.
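For context, TQC controls overestimation by pooling the quantile predictions of several critics, dropping the most optimistic atoms, and averaging the rest into the critic target; the number of dropped atoms sets the strength of the bias correction, and this is presumably the knob the data-driven approach adjusts. A minimal sketch of the truncation step (shapes and example values are illustrative):

```python
# A minimal sketch of TQC-style truncation, assuming `atoms` holds the
# pooled quantile predictions of all critics for one transition. Dropping
# the largest (most optimistic) atoms makes the target more pessimistic;
# the number dropped is the bias-correction knob.
import numpy as np

def truncated_target(atoms: np.ndarray, n_dropped: int) -> float:
    """Average the pooled quantile atoms after dropping the top n_dropped."""
    pooled = np.sort(atoms.ravel())
    return float(pooled[: pooled.size - n_dropped].mean())

# Example: 2 critics x 25 quantiles each; drop the 5 most optimistic atoms.
rng = np.random.default_rng(0)
atoms = rng.normal(loc=1.0, scale=0.5, size=(2, 25))
print(truncated_target(atoms, n_dropped=5))
```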
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of these correction methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
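As a hedged illustration of the kind of estimator involved: for any constant baseline, the corrected inverse-propensity-scoring (IPS) estimate below stays unbiased, and the baseline can be chosen to reduce variance. The closed form here is the standard control-variate optimum, not necessarily the paper's exact solution:

```python
# Baseline-corrected IPS: for any constant beta the estimate stays unbiased
# because importance weights average to one under the logging policy. The
# beta below is the standard control-variate optimum, offered as a sketch;
# the paper derives its own closed-form solution.
import numpy as np

def baseline_ips(rewards, weights, beta):
    """Unbiased off-policy value estimate with a constant baseline."""
    return float(np.mean(weights * (rewards - beta)) + beta)

def control_variate_beta(rewards, weights):
    """Plug-in estimate of the variance-minimizing constant baseline."""
    wr = weights * rewards
    cov = np.mean(weights * wr) - np.mean(wr) * np.mean(weights)
    var = np.mean(weights ** 2) - np.mean(weights) ** 2
    return float(cov / max(var, 1e-12))
```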
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
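A minimal sketch of the sparse-increment idea, assuming the selection rule is simply the k largest gradient magnitudes (the paper's exact criterion may differ):

```python
# A sketch of sparse fine-tuning in the spirit of SIFT: only the k
# parameters with the largest gradient magnitude are updated, so the
# increment over the pre-trained weights stays sparse. The top-k rule
# is an illustrative assumption.
import numpy as np

def sparse_update(params: np.ndarray, grads: np.ndarray, lr: float, k: int):
    """SGD step applied only to the k entries with the largest |gradient|."""
    p, g = params.ravel(), grads.ravel()
    top_k = np.argpartition(np.abs(g), -k)[-k:]
    p[top_k] -= lr * g[top_k]  # all other parameters stay frozen
    return params
```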
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, with adapter tuning consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than for BERT, and that the methods are less effective at mitigating racial and religious bias.
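For reference, a sketch of the adapter module this comparison refers to: a small trainable bottleneck with a residual connection, inserted into an otherwise frozen model (dimensions, activation, and initialization are illustrative):

```python
# A sketch of adapter tuning: a small trainable bottleneck with a residual
# connection inserted into a frozen model. Dimensions, activation, and
# initialization here are illustrative assumptions.
import numpy as np

class Adapter:
    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.w_up = np.zeros((d_bottleneck, d_model))  # identity map at init

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # Residual path preserves the frozen layer's output at initialization.
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up
```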
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
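A hedged sketch of the selection signal, assuming SARSA-style held-out transitions; the paper's online selection procedure is more involved:

```python
# A sketch of the selection signal: mean squared TD error measured on
# held-out transitions. The (s, a, r, s', a') tuple format is a SARSA-style
# simplification assumed for this example.
import numpy as np

def validation_td_error(q_fn, transitions, gamma: float = 0.99) -> float:
    """Mean squared TD error of q_fn on held-out transition tuples."""
    errors = [
        (r + gamma * q_fn(s_next, a_next) - q_fn(s, a)) ** 2
        for (s, a, r, s_next, a_next) in transitions
    ]
    return float(np.mean(errors))

# Candidate with the lowest validation TD error wins (hypothetical usage):
# best = min(agents, key=lambda ag: validation_td_error(ag.q, val_set))
```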
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning [36.643572071860554]
We propose a general method called Adaptively Calibrated Critics (ACC).
ACC uses the most recent high-variance but unbiased on-policy rollouts to alleviate the bias of the low-variance temporal-difference targets.
We show that ACC is quite general by further applying it to TD3 and showing an improved performance also in this setting.
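A minimal sketch of the calibration step implied here, with the update rule and clipping range as assumptions: the critic's estimates are compared against unbiased Monte Carlo returns, and a scalar bias-control parameter is nudged to cancel the measured bias:

```python
# A sketch of the calibration step: estimate the critic's bias against
# unbiased Monte Carlo returns from recent on-policy rollouts, then nudge
# a scalar bias-control parameter toward more pessimism when the critic
# overestimates. Step size and clipping range are assumptions.
import numpy as np

def calibrate_beta(beta, q_estimates, mc_returns, step=0.1, lo=0.0, hi=10.0):
    """Adjust the pessimism parameter to cancel the measured bias."""
    bias = float(np.mean(np.asarray(q_estimates) - np.asarray(mc_returns)))
    return float(np.clip(beta + step * np.sign(bias), lo, hi))
```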
arXiv Detail & Related papers (2021-11-24T18:07:33Z)
- Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control [0.0]
We introduce a novel, parameter-free deep Q-learning variant that reduces the underestimation bias in continuous control.
We test the performance of our improvement on a set of MuJoCo and Box2D continuous control tasks.
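For context, the underestimation in question plausibly stems from the clipped double-Q target used by TD3-style methods, sketched below; taking the minimum of two critics guards against overestimation but can err in the other direction:

```python
# The clipped double-Q target used by TD3-style methods: the min over two
# critics prevents overestimation but can underestimate instead. q1 and q2
# are stand-in critic functions; linking this entry to clipped double-Q is
# an assumption based on the stated goal.
def clipped_double_q_target(r, s_next, a_next, q1, q2, gamma=0.99):
    """Bellman target with the pessimistic minimum over two critics."""
    return r + gamma * min(q1(s_next, a_next), q2(s_next, a_next))
```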
arXiv Detail & Related papers (2021-09-24T07:41:07Z)
- Semantic Perturbations with Normalizing Flows for Improved Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations.
We find that latent adversarial perturbations that adapt to the classifier throughout its training are the most effective.
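A minimal sketch of the augmentation, relying only on the flow's invertibility; `flow_forward` and `flow_inverse` are stand-ins for a trained normalizing flow:

```python
# A sketch of latent-space augmentation with a normalizing flow: encode,
# perturb in latent space, decode. `flow_forward` and `flow_inverse` are
# stand-ins for a trained invertible model.
import numpy as np

def semantic_perturb(x, flow_forward, flow_inverse, sigma, rng):
    """Return a semantically perturbed copy of x via the flow's latent space."""
    z = flow_forward(x)                          # encode: x -> z
    z_noisy = z + sigma * rng.normal(size=z.shape)
    return flow_inverse(z_noisy)                 # decode back to input space
```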
arXiv Detail & Related papers (2021-08-18T03:20:00Z)
- Efficient Hyperparameter Tuning with Dynamic Accuracy Derivative-Free Optimization [0.27074235008521236]
We apply a recent dynamic accuracy derivative-free optimization method to hyperparameter tuning.
This method allows inexact evaluations of the learning problem while retaining convergence guarantees.
We demonstrate its robustness and efficiency compared to a fixed accuracy approach.
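A loose sketch of the dynamic-accuracy idea, with a shrinking random search standing in for the paper's derivative-free optimizer: cheap, inexact evaluations early, larger training budgets as the search narrows:

```python
# A loose sketch of dynamic-accuracy tuning: a shrinking random search
# (standing in for the paper's derivative-free optimizer) scores candidates
# with a small training budget early on and doubles the budget each round.
# The schedule and shrink factor are illustrative assumptions.
import numpy as np

def tune(loss_at_budget, lo, hi, rounds=5, samples=8, seed=0):
    """loss_at_budget(theta, budget) -> inexact loss at the given budget."""
    rng = np.random.default_rng(seed)
    best = None
    for k in range(rounds):
        budget = 2 ** k                              # accuracy grows per round
        thetas = rng.uniform(lo, hi, size=samples)
        scored = [(loss_at_budget(t, budget), t) for t in thetas]
        best = min(scored + ([best] if best else []))
        lo, hi = best[1] - (hi - lo) / 4, best[1] + (hi - lo) / 4
    return best[1]
```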
arXiv Detail & Related papers (2020-11-06T00:59:51Z)
- Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning [41.24484153212002]
This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system.
It does not require the learned inverse model to be invertible at all instances of time.
A simulated example of a double pendulum demonstrates the utility of the proposed theory.
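A hedged sketch of the underlying control law for dynamics of the form x_ddot = f(x, x_dot) + g(x, x_dot) * u, with learned estimates f_hat, g_hat and illustrative PD gains:

```python
# A sketch of a feedback-linearization tracking law. With learned estimates
# f_hat and g_hat of the unknown dynamics, the control cancels the
# nonlinearities and a PD term drives tracking; gains and the learned-model
# interfaces are illustrative assumptions.
def fl_tracking_control(x, x_dot, x_ref, x_ref_dot, f_hat, g_hat,
                        kp=10.0, kd=5.0):
    """Feedback-linearizing controller built on learned dynamics estimates."""
    v = kp * (x_ref - x) + kd * (x_ref_dot - x_dot)   # virtual linear input
    return (v - f_hat(x, x_dot)) / g_hat(x, x_dot)    # cancel f, invert g
```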
arXiv Detail & Related papers (2020-04-06T15:50:31Z)