Parameterized Projected Bellman Operator
- URL: http://arxiv.org/abs/2312.12869v3
- Date: Wed, 6 Mar 2024 15:14:11 GMT
- Title: Parameterized Projected Bellman Operator
- Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters,
Marcello Restelli and Carlo D'Eramo
- Abstract summary: Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator.
We formulate an optimization problem to learn PBO for generic sequential decision-making problems.
- Score: 64.129598593852
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Approximate value iteration (AVI) is a family of algorithms for reinforcement
learning (RL) that aims to obtain an approximation of the optimal value
function. Generally, AVI algorithms implement an iterated procedure where each
step consists of (i) an application of the Bellman operator and (ii) a
projection step into a considered function space. Notoriously, the Bellman
operator leverages transition samples, which strongly determine its behavior,
as uninformative samples can result in negligible updates or long detours,
whose detrimental effects are further exacerbated by the computationally
intensive projection step. To address these issues, we propose a novel
alternative approach based on learning an approximate version of the Bellman
operator rather than estimating it through samples as in AVI approaches. This
way, we are able to (i) generalize across transition samples and (ii) avoid the
computationally intensive projection step. For this reason, we call our novel
operator projected Bellman operator (PBO). We formulate an optimization problem
to learn PBO for generic sequential decision-making problems, and we
theoretically analyze its properties in two representative classes of RL
problems. Furthermore, we theoretically study our approach under the lens of
AVI and devise algorithmic implementations to learn PBO in offline and online
settings by leveraging neural network parameterizations. Finally, we
empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on
several RL problems.
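To make the contrast concrete, the following is a minimal, illustrative sketch on a toy tabular MDP with a linear (one-hot) Q-function: the AVI iterate computes sampled Bellman targets and projects them back onto the function space by regression, while a PBO-style operator is fitted once and then iterated directly in parameter space. The linear form f_psi(w) = A w + b and all names below are assumptions made for illustration; the paper itself uses neural-network parameterizations of the operator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: n_s states, n_a actions, random dynamics and rewards.
n_s, n_a, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] = distribution over next states
R = rng.uniform(size=(n_s, n_a))                  # deterministic reward r(s, a)

def phi(s, a):
    """One-hot feature of the (s, a) pair, so Q_w(s, a) = phi(s, a) @ w."""
    f = np.zeros(n_s * n_a)
    f[s * n_a + a] = 1.0
    return f

def bellman_target(w, s, a, r, s_next):
    """Sampled Bellman target: r + gamma * max_b Q_w(s', b)."""
    return r + gamma * max(phi(s_next, b) @ w for b in range(n_a))

def avi_step(w, transitions):
    """One AVI iteration: Bellman targets from samples, then projection onto
    the linear function space via least-squares regression."""
    X = np.stack([phi(s, a) for s, a, _, _ in transitions])
    y = np.array([bellman_target(w, *t) for t in transitions])
    w_next, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_next

def fit_linear_pbo(transitions, n_weights=200):
    """PBO idea (sketch): fit an operator f_psi acting on the Q-parameters,
    f_psi(w) = A w + b, to reproduce the projected Bellman update for a batch
    of randomly drawn weight vectors."""
    ws = rng.normal(size=(n_weights, n_s * n_a))
    targets = np.stack([avi_step(w, transitions) for w in ws])
    ws_aug = np.hstack([ws, np.ones((n_weights, 1))])  # bias column
    Ab, *_ = np.linalg.lstsq(ws_aug, targets, rcond=None)
    A, b = Ab[:-1].T, Ab[-1]
    return lambda w: A @ w + b

# One fixed batch of transitions.
transitions = []
for _ in range(500):
    s, a = rng.integers(n_s), rng.integers(n_a)
    transitions.append((s, a, R[s, a], rng.choice(n_s, p=P[s, a])))

# AVI: one regression per iteration.
w_avi = np.zeros(n_s * n_a)
for _ in range(50):
    w_avi = avi_step(w_avi, transitions)

# PBO: fit the operator once, then iterate it purely in parameter space.
pbo = fit_linear_pbo(transitions)
w_pbo = np.zeros(n_s * n_a)
for _ in range(50):
    w_pbo = pbo(w_pbo)

# A linear f_psi only approximates the (piecewise-linear) projected operator,
# so a small gap between the two fixed points remains.
print("||Q_AVI - Q_PBO||_inf =", np.abs(w_avi - w_pbo).max())
```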
Related papers
- Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Free from Bellman Completeness: Trajectory Stitching via Model-based
Return-conditioned Supervised Learning [22.287106840756483]
We show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent challenges of Bellman completeness.
We propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories.
arXiv Detail & Related papers (2023-10-30T07:03:14Z) - Multi-Bellman operator for convergence of $Q$-learning with linear
function approximation [3.6218162133579694]
We study the convergence of $Q$-learning with linear function approximation.
By exploring the properties of a novel multi-Bellman operator, we identify conditions under which the projected multi-Bellman operator becomes contractive.
We demonstrate that this algorithm converges to the fixed-point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy.
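As a rough illustration of composing the Bellman operator several times before a single projection (the "multi-Bellman" idea referenced above), the sketch below uses an exact tabular operator and an unweighted least-squares projection onto random linear features. It is not the cited paper's algorithm or its contraction conditions; the parameter choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma, m = 4, 2, 0.85, 8                # m = number of Bellman applications
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # transition kernel P[s, a, s']
R = rng.uniform(size=(n_s, n_a))                  # rewards r(s, a)
Phi = rng.normal(size=(n_s * n_a, 3))             # linear features for Q (3 per state-action pair)

def bellman(q):
    """Exact optimal Bellman operator on a tabular Q of shape (n_s, n_a)."""
    return R + gamma * P @ q.max(axis=1)

def project(q):
    """Unweighted least-squares projection of a tabular Q onto span(Phi)."""
    w, *_ = np.linalg.lstsq(Phi, q.reshape(-1), rcond=None)
    return (Phi @ w).reshape(n_s, n_a)

def projected_multi_bellman(q, m):
    """Apply the Bellman operator m times, then project once."""
    for _ in range(m):
        q = bellman(q)
    return project(q)

q = np.zeros((n_s, n_a))
for _ in range(100):
    q_new = projected_multi_bellman(q, m)
    delta = np.abs(q_new - q).max()
    q = q_new
print("sup-norm change at the last iteration:", delta)
```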
arXiv Detail & Related papers (2023-09-28T19:56:31Z) - Parameter and Computation Efficient Transfer Learning for
Vision-Language Pre-trained Models [79.34513906324727]
In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for vision-language pre-trained models.
We propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL.
arXiv Detail & Related papers (2023-09-04T09:34:33Z) - Learning Bellman Complete Representations for Offline Policy Evaluation [51.96704525783913]
Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage.
We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL.
arXiv Detail & Related papers (2022-07-12T21:02:02Z) - ES-Based Jacobian Enables Faster Bilevel Optimization [53.675623215542515]
Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.
Existing gradient-based methods require second-order derivative approximations via Jacobian- or/and Hessian-vector computations.
We propose a novel BO algorithm, which adopts Evolution Strategies (ES) based method to approximate the response Jacobian matrix in the hypergradient of BO.
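For context on the ES-based Jacobian approximation mentioned above, the sketch below shows a generic evolution-strategies-style Jacobian estimator built from antithetic forward evaluations only. It is a standard construction given for illustration, not necessarily the estimator or hyperparameters used in the cited paper.

```python
import numpy as np

def es_jacobian(f, x, sigma=1e-2, n_samples=256, rng=None):
    """Evolution-strategies-style estimate of the Jacobian of f at x.

    Uses antithetic Gaussian perturbations:
        J ~= (1 / (2 * sigma * n)) * sum_i (f(x + sigma*eps_i) - f(x - sigma*eps_i)) eps_i^T
    Only forward evaluations of f are needed (no automatic differentiation).
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x, dtype=float)
    J = np.zeros((len(f(x)), len(x)))
    for _ in range(n_samples):
        eps = rng.normal(size=x.shape)
        J += np.outer(f(x + sigma * eps) - f(x - sigma * eps), eps)
    return J / (2 * sigma * n_samples)

# Example: f(x) = A x has Jacobian A.
A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]])
J_hat = es_jacobian(lambda x: A @ x, x=np.zeros(2), n_samples=4000)
print(np.round(J_hat, 2))  # close to A; some Monte-Carlo noise remains
```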
arXiv Detail & Related papers (2021-10-13T19:36:50Z) - Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical
Comparison [17.692408242465763]
We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning.
One of the algorithms uses a novel and explicit importance-weighting correction to overcome the infamous "double sampling" difficulty in Bellman error estimation.
arXiv Detail & Related papers (2020-03-09T05:12:39Z)