Robust Losses for Learning Value Functions
- URL: http://arxiv.org/abs/2205.08464v2
- Date: Mon, 17 Apr 2023 21:33:30 GMT
- Title: Robust Losses for Learning Value Functions
- Authors: Andrew Patterson, Victor Liao, Martha White
- Abstract summary: Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error.
We build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem.
We derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
- Score: 26.515147684526124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most value function learning algorithms in reinforcement learning are based
on the mean squared (projected) Bellman error. However, squared errors are
known to be sensitive to outliers, both skewing the solution of the objective
and resulting in high-magnitude and high-variance gradients. To control these
high-magnitude updates, typical strategies in RL involve clipping gradients,
clipping rewards, rescaling rewards, or clipping errors. While these strategies
appear to be related to robust losses -- like the Huber loss -- they are built
on semi-gradient update rules which do not minimize a known loss. In this work,
we build on recent insights reformulating squared Bellman errors as a
saddlepoint optimization problem and propose a saddlepoint reformulation for a
Huber Bellman error and Absolute Bellman error. We start from a formalization
of robust losses, then derive sound gradient-based approaches to minimize these
losses in both the online off-policy prediction and control settings. We
characterize the solutions of the robust losses, providing insight into the
problem settings where the robust losses define notably better solutions than
the mean squared Bellman error. Finally, we show that the resulting
gradient-based algorithms are more stable, for both prediction and control,
with less sensitivity to meta-parameters.
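The abstract describes reformulating robust Bellman errors (such as a Huber Bellman error) as saddlepoint problems and minimizing them with gradient-based, two-timescale updates. The snippet below is a minimal sketch of that general idea, not the authors' algorithm: it assumes linear value functions and a GTD2-style update in which a secondary weight vector estimates the expected TD error, with that estimate clipped to [-tau, tau] as the convex conjugate of the Huber loss suggests. The function name, step sizes, and `tau` are illustrative assumptions.

```python
import numpy as np

def huber_saddlepoint_td(features, rewards, next_features,
                         gamma=0.99, alpha=0.01, beta=0.05, tau=1.0, epochs=20):
    """Two-timescale saddle-point sketch for a Huber-style Bellman error
    with linear value functions v(s) = w @ x(s).

    A secondary weight vector `theta` estimates the expected TD error; for the
    Huber loss, the conjugate view constrains that estimate to [-tau, tau],
    which is the only difference from a squared-error version of this sketch.
    """
    n = features.shape[1]
    w = np.zeros(n)        # primary (value-function) weights
    theta = np.zeros(n)    # secondary (dual) weights

    for _ in range(epochs):
        for x, r, xp in zip(features, rewards, next_features):
            delta = r + gamma * (w @ xp) - (w @ x)       # TD error
            h = np.clip(theta @ x, -tau, tau)            # Huber-constrained dual estimate
            theta += beta * (delta - theta @ x) * x      # dual step toward the TD error
            w += alpha * h * (x - gamma * xp)            # gradient-correction update for w
    return w
```

As tau grows without bound, the clipping becomes inactive and the sketch reduces to a squared-error saddlepoint update; small tau is what bounds the magnitude of the correction and gives the robustness the abstract refers to.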
Related papers
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
An adversary can introduce outliers by corrupting the loss functions in an arbitrary number k of rounds, unknown to the learner.
We present a robust online optimization framework for this outlier-contaminated setting.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation [29.69428894587431]
In this paper, we study the offline RL problem with linear function approximation.
Our main structural assumption is that the MDP has low inherent Bellman error.
We show that the scaling of the suboptimality with $\sqrt{\varepsilon_{\mathrm{BE}}}$ cannot be improved for any algorithm.
arXiv Detail & Related papers (2024-06-17T16:04:06Z) - Symmetric Q-learning: Reducing Skewness of Bellman Error in Online
Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values so that the resulting error distribution becomes approximately Gaussian (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-03-12T14:49:19Z) - Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that the training speed and final accuracy of neural networks can depend significantly on the loss function used to train them.
Two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks are proposed.
arXiv Detail & Related papers (2023-03-17T12:52:06Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - When is Realizability Sufficient for Off-Policy Reinforcement Learning? [17.317841035807696]
We analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class.
We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error.
arXiv Detail & Related papers (2022-11-10T03:15:31Z) - Do We Need to Penalize Variance of Losses for Learning with Label Noise? [91.38888889609002]
We find that the variance should be increased for the problem of learning with noisy labels.
By exploiting the label noise transition matrix, regularizers can be easily designed to reduce the variance of losses.
Empirically, the proposed method, which increases the variance of losses, significantly improves the generalization ability of baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-01-30T06:19:08Z) - Analysis and Optimisation of Bellman Residual Errors with Neural
Function Approximation [0.0]
Recent developments in Deep Reinforcement Learning have demonstrated the superior performance of neural networks in solving challenging problems with large or even continuous state spaces.
One specific approach is to deploy neural networks to approximate the value function by minimising the Mean Squared Bellman Error.
arXiv Detail & Related papers (2021-06-16T13:35:14Z) - A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning [25.39784277231972]
We introduce a new generalized MSPBE that extends the linear MSPBE to the nonlinear setting.
We derive an easy-to-use, but sound, algorithm to minimize the generalized objective.
arXiv Detail & Related papers (2021-04-28T15:50:34Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with
Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more than the unbiasedness of the risk estimator in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z) - Learning Adaptive Loss for Robust Learning with Noisy Labels [59.06189240645958]
Robust losses are an important strategy for handling the problem of learning with noisy labels.
We propose a meta-learning method capable of robustly tuning the hyperparameters of such losses.
Four kinds of SOTA robust loss functions are integrated into the proposed method, demonstrating its general applicability and effectiveness.
arXiv Detail & Related papers (2020-02-16T00:53:37Z)
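As referenced in the Symmetric Q-learning entry above, here is a minimal sketch of the target-noise idea, assuming a tabular Q-learning update. The paper chooses the zero-mean noise distribution so that the noise plus the skewed Bellman error becomes approximately Gaussian; the Gaussian placeholder, its scale, and the function name below are illustrative assumptions, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_q_update(Q, s, a, r, s_next, gamma=0.99, lr=0.1, noise_scale=0.1):
    """Tabular Q-learning step with zero-mean synthetic noise added to the
    bootstrapped target, in the spirit of Symmetric Q-learning.  A simple
    Gaussian stands in for the paper's noise distribution."""
    noise = rng.normal(0.0, noise_scale)              # zero-mean synthetic noise
    target = r + gamma * np.max(Q[s_next]) + noise    # noisy bootstrapped target
    Q[s, a] += lr * (target - Q[s, a])                # standard TD update toward the noisy target
    return Q
```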
This list is automatically generated from the titles and abstracts of the papers on this site.