Balanced Q-learning: Combining the Influence of Optimistic and
Pessimistic Targets
- URL: http://arxiv.org/abs/2111.02787v1
- Date: Wed, 3 Nov 2021 07:30:19 GMT
- Title: Balanced Q-learning: Combining the Influence of Optimistic and
Pessimistic Targets
- Authors: Thommen George Karimpanal, Hung Le, Majid Abdolshah, Santu Rana, Sunil
Gupta, Truyen Tran, Svetha Venkatesh
- Abstract summary: We show that specific types of biases may be preferable, depending on the scenario.
We design a novel reinforcement learning algorithm, Balanced Q-learning, in which the target is modified to be a convex combination of a pessimistic and an optimistic term.
- Score: 74.04426767769785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The optimistic nature of the Q-learning target leads to an overestimation
bias, which is an inherent problem associated with standard $Q$-learning. Such
a bias fails to account for the possibility of low returns, particularly in
risky scenarios. However, the existence of biases, whether overestimation or
underestimation, need not be undesirable. In this paper, we
analytically examine the utility of biased learning, and show that specific
types of biases may be preferable, depending on the scenario. Based on this
finding, we design a novel reinforcement learning algorithm, Balanced
Q-learning, in which the target is modified to be a convex combination of a
pessimistic and an optimistic term, whose associated weights are determined
online, analytically. We prove the convergence of this algorithm in a tabular
setting, and empirically demonstrate its superior learning performance in
various environments.
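
The abstract does not spell out the exact pessimistic and optimistic terms or the analytical weighting rule, so the following is only a minimal tabular sketch: it assumes the optimistic term is the usual max over next-state action values, takes the corresponding min as a stand-in pessimistic term, and uses a fixed mixing weight beta in place of the paper's online, analytically determined weights.

```python
import numpy as np

# Minimal sketch of a balanced tabular Q-learning update, assuming:
#   - optimistic term  = max_a' Q(s', a')   (standard Q-learning target)
#   - pessimistic term = min_a' Q(s', a')   (an assumed lower-bound proxy)
#   - beta is a fixed mixing weight; the paper instead determines the
#     weights online and analytically, which is not reproduced here.

def balanced_q_update(Q, s, a, r, s_next, done,
                      alpha=0.1, gamma=0.99, beta=0.5):
    """One tabular update toward a convex combination of targets."""
    optimistic = np.max(Q[s_next])      # usual optimistic bootstrap
    pessimistic = np.min(Q[s_next])     # assumed pessimistic bootstrap
    bootstrap = beta * pessimistic + (1.0 - beta) * optimistic
    target = r + (0.0 if done else gamma * bootstrap)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Usage: Q = np.zeros((n_states, n_actions)); call once per transition.
```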
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
- Regularized Q-learning through Robust Averaging [3.4354636842203026]
We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner.
One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance.
We show that 2RA Q-learning converges to the optimal policy and analyze its theoretical mean-squared error.
arXiv Detail & Related papers (2024-05-03T15:57:26Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- On the Estimation Bias in Double Q-Learning [20.856485777692594]
Double Q-learning is not fully unbiased and suffers from underestimation bias.
We show that such underestimation bias may lead to multiple non-optimal fixed points under an approximated Bellman operator.
We propose a simple but effective approach as a partial fix for the underestimation bias in double Q-learning.
arXiv Detail & Related papers (2021-09-29T13:41:24Z)
- Using Pareto Simulated Annealing to Address Algorithmic Bias in Machine Learning [2.055949720959582]
We present a multi-objective optimisation strategy that optimises for both balanced accuracy and underestimation.
We demonstrate the effectiveness of this strategy on one synthetic and two real-world datasets.
arXiv Detail & Related papers (2021-05-31T15:51:43Z)
- Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
arXiv Detail & Related papers (2020-10-23T19:06:03Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value from a randomly selected network (a minimal tabular sketch of this ensemble-target idea follows this list).
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- Provable tradeoffs in adversarially robust classification [96.48180210364893]
We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry.
Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when data is imbalanced.
arXiv Detail & Related papers (2020-06-09T09:58:19Z)
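
As a complement to the cross Q-learning entry above, here is a hypothetical tabular sketch of bootstrapping from a randomly selected member of an ensemble of Q-tables. The ensemble size and the one-random-table-per-step update schedule are assumptions for illustration, not details taken from that paper's deep-network implementation.

```python
import numpy as np

# Hypothetical tabular illustration of the ensemble-target idea described
# in the "Cross Learning in Deep Q-Networks" entry: keep K parallel
# Q-tables and bootstrap each update from a randomly selected member.
# K and the single-random-table-per-step update are illustrative choices.

def cross_q_update(Qs, s, a, r, s_next, done,
                   alpha=0.1, gamma=0.99, rng=np.random):
    K = len(Qs)
    i = rng.randint(K)                  # table to update this step
    j = rng.randint(K)                  # table supplying the bootstrap
    bootstrap = 0.0 if done else gamma * np.max(Qs[j][s_next])
    target = r + bootstrap
    Qs[i][s, a] += alpha * (target - Qs[i][s, a])
    return Qs

# Usage: Qs = [np.zeros((n_states, n_actions)) for _ in range(5)]
```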