Multi-Bellman operator for convergence of $Q$-learning with linear
function approximation
- URL: http://arxiv.org/abs/2309.16819v1
- Date: Thu, 28 Sep 2023 19:56:31 GMT
- Title: Multi-Bellman operator for convergence of $Q$-learning with linear
function approximation
- Authors: Diogo S. Carvalho, Pedro A. Santos, Francisco S. Melo
- Abstract summary: We study the convergence of $Q$-learning with linear function approximation.
By exploring the properties of a novel multi-Bellman operator, we identify conditions under which the projected multi-Bellman operator becomes contractive.
We demonstrate that this algorithm converges to the fixed-point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy.
- Score: 3.6218162133579694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the convergence of $Q$-learning with linear function approximation.
Our key contribution is the introduction of a novel multi-Bellman operator that
extends the traditional Bellman operator. By exploring the properties of this
operator, we identify conditions under which the projected multi-Bellman
operator becomes contractive, providing improved fixed-point guarantees
compared to the Bellman operator. To leverage these insights, we propose the
multi $Q$-learning algorithm with linear function approximation. We demonstrate
that this algorithm converges to the fixed-point of the projected multi-Bellman
operator, yielding solutions of arbitrary accuracy. Finally, we validate our
approach by applying it to well-known environments, showcasing the
effectiveness and applicability of our findings.
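To make the construction more concrete, below is a minimal numerical sketch of the projected multi-Bellman fixed-point iteration on a toy MDP. It is not the authors' algorithm: the names P, R, Phi, m and the explicit-model setup are illustrative assumptions, and the paper's multi $Q$-learning is a sample-based stochastic approximation scheme, whereas this sketch uses exact Bellman backups and an exact least-squares projection purely to illustrate the operator whose fixed point the algorithm converges to.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): projected multi-Bellman iteration
# for a linear approximation Q(s,a) ~= phi(s,a)^T w on a small explicit MDP.
# Assumed (hypothetical) inputs: transition tensor P[s,a,s'], reward R[s,a],
# feature tensor Phi[s,a,:]. The paper identifies conditions under which this
# projected m-fold operator is contractive for large enough m.

def bellman_backup(q, P, R, gamma):
    """One application of the Bellman optimality operator to a (S, A) table q."""
    v = q.max(axis=1)                       # greedy state values
    return R + gamma * P @ v                # (S, A) backed-up Q-values

def projected_multi_bellman_step(w, Phi, P, R, gamma, m):
    """Apply the Bellman operator m times to Q_w, then project onto span(Phi)."""
    S, A, d = Phi.shape
    q = Phi @ w                             # current linear Q estimate, shape (S, A)
    for _ in range(m):
        q = bellman_backup(q, P, R, gamma)  # multi-Bellman target
    X = Phi.reshape(S * A, d)
    y = q.reshape(S * A)
    w_new, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares projection
    return w_new

# Toy usage: random 5-state, 2-action MDP with 3 random features.
rng = np.random.default_rng(0)
S, A, d, gamma, m = 5, 2, 3, 0.9, 10
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))
Phi = rng.random((S, A, d))
w = np.zeros(d)
for _ in range(200):
    w = projected_multi_bellman_step(w, Phi, P, R, gamma, m)
print("approximate fixed-point weights:", w)
```

With m = 1 this reduces to ordinary projected Bellman iteration, which is not contractive in general; the point of the larger m in this sketch is that repeated backups shrink the target before projection, mirroring the contraction conditions discussed in the abstract.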
Related papers
- Operator Learning of Lipschitz Operators: An Information-Theoretic Perspective [2.375038919274297]
This work addresses the complexity of neural operator approximations for the general class of Lipschitz continuous operators.
Our main contribution establishes lower bounds on the metric entropy of Lipschitz operators in two approximation settings.
It is shown that, regardless of the activation function used, neural operator architectures attaining an approximation accuracy $\epsilon$ must have a size that is exponentially large in $\epsilon^{-1}$.
arXiv Detail & Related papers (2024-06-26T23:36:46Z) - Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning [19.4531905603925]
i-QN is a principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions.
We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods.
arXiv Detail & Related papers (2024-03-04T15:07:33Z) - Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator.
We formulate an optimization problem to learn a parameterized projected Bellman operator (PBO) for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z) - Improved Regret for Efficient Online Reinforcement Learning with Linear
Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z) - Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function
Approximation [92.3161051419884]
We study the reinforcement learning for finite-horizon episodic Markov decision processes with adversarial reward and full information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, can provide strong PAC guarantees on learning a near optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.