When is Realizability Sufficient for Off-Policy Reinforcement Learning?
- URL: http://arxiv.org/abs/2211.05311v2
- Date: Tue, 6 Jun 2023 01:40:23 GMT
- Title: When is Realizability Sufficient for Off-Policy Reinforcement Learning?
- Authors: Andrea Zanette
- Abstract summary: We analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class.
We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error.
- Score: 17.317841035807696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-free algorithms for reinforcement learning typically require a
condition called Bellman completeness in order to successfully operate
off-policy with function approximation, unless additional conditions are met.
However, Bellman completeness is a requirement that is much stronger than
realizability and that is deemed to be too strong to hold in practice. In this
work, we relax this structural assumption and analyze the statistical
complexity of off-policy reinforcement learning when only realizability holds
for the prescribed function class.
We establish finite-sample guarantees for off-policy reinforcement learning
that are free of the approximation error term known as inherent Bellman error,
and that depend on the interplay of three factors. The first two are well
known: they are the metric entropy of the function class and the
concentrability coefficient that represents the cost of learning off-policy.
The third factor is new, and it measures the violation of Bellman completeness,
namely the mis-alignment between the chosen function class and its image
through the Bellman operator.
In essence, these error bounds establish that off-policy reinforcement
learning remains statistically viable even in the absence of Bellman completeness,
and characterize the intermediate situation between the favorable Bellman
complete setting and the worst-case scenario where exponential lower bounds are
in force. Our analysis directly applies to the solution found by temporal
difference algorithms when they converge.
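A minimal sketch of the conditions discussed above, in our own notation rather than the paper's (the norm and the exact definition of the misalignment term are assumptions here): write $\mathcal{F}$ for the prescribed function class, $Q^{\pi}$ for the action-value function of the target policy $\pi$, and $\mathcal{T}^{\pi}$ for the associated Bellman operator.
```latex
% Realizability: the target value function lies in the class.
Q^{\pi} \in \mathcal{F}

% Bellman completeness: the class is closed under the Bellman operator.
\mathcal{T}^{\pi} f \in \mathcal{F} \quad \text{for every } f \in \mathcal{F}

% One way to quantify the violation of completeness (the mis-alignment
% between the class and its image under the Bellman operator) is a
% worst-case projection distance; the paper's exact measure may differ.
\sup_{f \in \mathcal{F}} \; \inf_{g \in \mathcal{F}} \, \big\| g - \mathcal{T}^{\pi} f \big\|
```
Under Bellman completeness this misalignment term is zero and the bound is driven only by the metric entropy of $\mathcal{F}$ and the concentrability coefficient; as the term grows, the guarantees degrade toward the worst-case regime where the exponential lower bounds apply.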
Related papers
- Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions [29.69428894587431]
It is assumed that Bellman completeness holds, which ensures that these regression problems are well-specified.
We give the first computationally efficient algorithm for RL under linear Bellman completeness when the number of actions is any constant.
arXiv Detail & Related papers (2024-06-17T15:24:49Z) - Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - Robust Losses for Learning Value Functions [26.515147684526124]
Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error.
We build on recent insights reformulating squared Bellman errors as a saddle-point optimization problem (a brief sketch of this reformulation appears after this list).
We derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
arXiv Detail & Related papers (2022-05-17T16:10:05Z) - Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space.
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class.
arXiv Detail & Related papers (2022-03-24T01:04:17Z) - Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error [83.10489974736404]
We study the use of the Bellman equation as a surrogate objective for value prediction accuracy.
We find that the Bellman error is a poor proxy for the accuracy of the value function.
arXiv Detail & Related papers (2022-01-28T21:03:59Z) - Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection [136.4014229319618]
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function.
arXiv Detail & Related papers (2021-10-27T22:07:08Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in Markov decision processes, with complexity independent of the number of states.
We give an efficient boosting algorithm that improves the accuracy of weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Bellman-consistent Pessimism for Offline Reinforcement Learning [46.97637726255375]
We introduce the notion of Bellman-consistent pessimism for general function approximation.
Our theoretical guarantees only require Bellman closedness, as is standard in the exploratory setting.
arXiv Detail & Related papers (2021-06-13T05:50:36Z)
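As a supplement to the Robust Losses entry above, here is the standard saddle-point reformulation of the mean squared Bellman error that such work builds on, written in our own notation (the paper's exact loss may differ). The inner maximization over an auxiliary function $h$ replaces the squared conditional expectation, sidestepping the double-sampling problem.
```latex
% TD error of a value estimate v_theta on a transition (s, r, s'):
\delta_{\theta}(s, r, s') = r + \gamma\, v_{\theta}(s') - v_{\theta}(s)

% Mean squared Bellman error and its saddle-point form, obtained from the
% conjugate identity x^2 = \max_h (2 h x - h^2) applied pointwise in s:
\min_{\theta} \; \mathbb{E}_{s}\!\left[ \big( \mathbb{E}[\, \delta_{\theta} \mid s \,] \big)^{2} \right]
\;=\;
\min_{\theta} \, \max_{h} \; \mathbb{E}\!\left[ 2\, h(s)\, \delta_{\theta}(s, r, s') - h(s)^{2} \right]
```
In practice the inner maximizer is approximated by a second function approximator, which is the structure that gradient-based prediction and control methods of this kind exploit.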