Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement
for Value Error
- URL: http://arxiv.org/abs/2201.12417v1
- Date: Fri, 28 Jan 2022 21:03:59 GMT
- Title: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement
for Value Error
- Authors: Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane
Gu
- Abstract summary: We study the use of the Bellman equation as a surrogate objective for value prediction accuracy.
We find that the Bellman error is a poor proxy for the accuracy of the value function.
- Score: 83.10489974736404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study the use of the Bellman equation as a surrogate
objective for value prediction accuracy. While the Bellman equation is uniquely
solved by the true value function over all state-action pairs, we find that the
Bellman error (the difference between both sides of the equation) is a poor
proxy for the accuracy of the value function. In particular, we show that (1)
due to cancellations from both sides of the Bellman equation, the magnitude of
the Bellman error is only weakly related to the distance to the true value
function, even when considering all state-action pairs, and (2) in the finite
data regime, the Bellman equation can be satisfied exactly by infinitely many
suboptimal solutions. This means that the Bellman error can be minimized
without improving the accuracy of the value function. We demonstrate these
phenomena through a series of propositions, illustrative toy examples, and
empirical analysis in standard benchmark domains.
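The following is a minimal numerical sketch (not from the paper) of the two effects described in the abstract, assuming a small random single-action MDP built with NumPy: a constant offset to the true value function produces a large value error but only a (1 - gamma) fraction of it as Bellman error, and a finite dataset admits infinitely many exact solutions of the sampled Bellman equation.

```python
# Minimal sketch (not from the paper): Bellman error vs. value error.
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 5, 0.99

# Random single-action MDP: row-stochastic transition matrix P, reward vector r.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)

# True value function solves (I - gamma * P) V = r.
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

def bellman_error(V):
    """Max-norm Bellman residual |V - (r + gamma * P V)|."""
    return np.max(np.abs(V - (r + gamma * P @ V)))

# Point (1): a constant offset of 100 gives value error 100 at every state,
# but the two sides of the Bellman equation cancel, leaving (1 - gamma) * 100.
V_bad = V_true + 100.0
print("value error  :", np.max(np.abs(V_bad - V_true)))   # -> 100.0
print("Bellman error:", bellman_error(V_bad))             # -> 1.0 (= 0.01 * 100)

# Point (2): with finite data, infinitely many value functions satisfy the
# sampled Bellman equation exactly.  Dataset: s0 -(r=0)-> s1 -(r=1)-> s2,
# where s2 never appears as a source state, so V(s2) is a free parameter.
for v2 in (0.0, 50.0, -30.0):
    v1 = 1.0 + gamma * v2   # zeroes the residual of transition (s1, r=1, s2)
    v0 = 0.0 + gamma * v1   # zeroes the residual of transition (s0, r=0, s1)
    print(f"exact solution: V(s0)={v0:.2f}, V(s1)={v1:.2f}, V(s2)={v2:.2f}")
```

Only one of these solutions matches the true value function, yet every one of them drives the Bellman error on the dataset to exactly zero.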
Related papers
- The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation [29.69428894587431]
In this paper, we study the offline RL problem with linear function approximation.
Our main structural assumption is that the MDP has low inherent Bellman error.
We show that the scaling of the suboptimality with $\sqrt{\varepsilon_{\mathrm{BE}}}$ cannot be improved for any algorithm.
arXiv Detail & Related papers (2024-06-17T16:04:06Z) - On the Uniqueness of Solution for the Bellman Equation of LTL Objectives [12.918524838804016]
We observe that the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed in prior work.
We then propose a condition for the Bellman equation to have the expected return as the unique solution.
arXiv Detail & Related papers (2024-04-07T21:06:52Z) - Symmetric Q-learning: Reducing Skewness of Bellman Error in Online
Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values so that the resulting error distribution is Gaussian (see the sketch after this list).
arXiv Detail & Related papers (2024-03-12T14:49:19Z) - Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL)
We propose a novel alternative approach based on learning an approximate version of the Bellman operator, the projected Bellman operator (PBO).
We formulate an optimization problem to learn the PBO for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z) - When is Realizability Sufficient for Off-Policy Reinforcement Learning? [17.317841035807696]
We analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class.
We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error.
arXiv Detail & Related papers (2022-11-10T03:15:31Z) - Robust Losses for Learning Value Functions [26.515147684526124]
Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error.
We build on recent insights that reformulate squared Bellman errors as a saddle-point optimization problem.
We derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
arXiv Detail & Related papers (2022-05-17T16:10:05Z) - Bellman-consistent Pessimism for Offline Reinforcement Learning [46.97637726255375]
We introduce the notion of Bellman-consistent pessimism for general function approximation.
Our theoretical guarantees require only Bellman closedness, as is standard in the exploratory setting.
arXiv Detail & Related papers (2021-06-13T05:50:36Z) - Non-Boolean Hidden Variables model reproduces Quantum Mechanics'
predictions for Bell's experiment [91.3755431537592]
A theory aiming to violate Bell's inequalities must start by giving up Boolean logic.
The "hard" problem is to predict the time values at which single particles are detected.
The "soft" problem is to explain the violation of Bell's inequalities within (non-Boolean) Local Realism.
arXiv Detail & Related papers (2020-05-20T21:46:35Z) - Learning Near Optimal Policies with Low Inherent Bellman Error [115.16037976819331]
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning.
We show that exploration is possible using only batch assumptions with an algorithm that achieves the optimal statistical rate for the setting we consider.
arXiv Detail & Related papers (2020-02-29T02:02:40Z)
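The Symmetric Q-learning entry above describes adding zero-mean synthetic noise to the target values. The sketch below is a hypothetical illustration of that idea only (not the authors' code); the Gaussian noise distribution and the toy tabular update are assumptions made for brevity.

```python
# Hypothetical sketch of a noise-added TD target (not the authors' implementation).
import numpy as np

rng = np.random.default_rng(0)

def noisy_td_target(r, q_next, gamma=0.99, noise_scale=0.1):
    """Standard TD target r + gamma * max_a' Q(s', a'), plus zero-mean noise."""
    target = r + gamma * np.max(q_next)
    return target + rng.normal(loc=0.0, scale=noise_scale)  # assumed Gaussian noise

# Toy usage: a single tabular update toward the noisy target.
Q = np.zeros((4, 2))                      # 4 states, 2 actions
s, a, r, s_next = 0, 1, 1.0, 2
y = noisy_td_target(r, Q[s_next])
Q[s, a] += 0.5 * (y - Q[s, a])            # move Q(s, a) toward the target
```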
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.