Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning
- URL: http://arxiv.org/abs/2203.02628v1
- Date: Sat, 5 Mar 2022 00:54:58 GMT
- Title: Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning
- Authors: Zaiwei Chen, John Paul Clarke, and Siva Theja Maguluri
- Abstract summary: We propose a stable design for $Q$-learning with linear function approximation using target network and truncation.
Our result implies an $\mathcal{O}(\epsilon^{-2})$ sample complexity up to a function approximation error.
This is the first variant of $Q$-learning with linear function approximation that is provably stable without requiring strong assumptions or modifying the problem parameters.
- Score: 7.532013242448151
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: $Q$-learning with function approximation is one of the most empirically
successful while theoretically mysterious reinforcement learning (RL)
algorithms, and was identified in Sutton (1999) as one of the most important
theoretical open problems in the RL community. Even in the basic linear
function approximation setting, there are well-known divergent examples. In
this work, we propose a stable design for $Q$-learning with linear function
approximation using target network and truncation, and establish its
finite-sample guarantees. Our result implies an $\mathcal{O}(\epsilon^{-2})$
sample complexity up to a function approximation error. This is the first
variant of $Q$-learning with linear function approximation that is provably
stable without requiring strong assumptions or modifying the problem
parameters, and achieves the optimal sample complexity.
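To make the design concrete, the sketch below illustrates the general recipe the abstract describes: semi-gradient $Q$-learning on linear features, a target network whose weights are synced only periodically, and truncation of the bootstrapped next-state value to a bounded range. This is an illustrative reconstruction, not the paper's exact algorithm; the Gym-like environment interface, feature map `phi`, exploration scheme, step size, and value bound `v_max` are all assumed placeholders.

```python
# Minimal sketch (assumptions labeled): Q-learning with linear function
# approximation, a periodically synced target network, and truncation of the
# bootstrapped target value. Not the authors' exact algorithm.
import numpy as np

def q_learning_target_truncation(env, phi, num_actions, dim,
                                 gamma=0.99, alpha=0.1,
                                 sync_every=1000, num_steps=100_000,
                                 v_max=None, seed=0):
    """phi(s, a) -> feature vector in R^dim; env follows a Gym-like API (assumed)."""
    rng = np.random.default_rng(seed)
    if v_max is None:
        v_max = 1.0 / (1.0 - gamma)          # assumes rewards in [0, 1]
    theta = np.zeros(dim)                    # online weights
    theta_target = np.zeros(dim)             # target-network weights (updated slowly)

    s, _ = env.reset(seed=seed)
    for t in range(num_steps):
        # Epsilon-greedy behavior policy (exploration scheme is illustrative only).
        if rng.random() < 0.1:
            a = int(rng.integers(num_actions))
        else:
            a = int(np.argmax([phi(s, b) @ theta for b in range(num_actions)]))

        s_next, r, terminated, truncated, _ = env.step(a)

        # Bootstrapped target uses the *frozen* target weights, and the predicted
        # next-state value is truncated to [-v_max, v_max] for stability.
        q_next = max(phi(s_next, b) @ theta_target for b in range(num_actions))
        q_next = float(np.clip(q_next, -v_max, v_max))
        target = r + (0.0 if terminated else gamma * q_next)

        # Semi-gradient TD update on the online weights.
        feat = phi(s, a)
        theta += alpha * (target - feat @ theta) * feat

        # Periodically copy the online weights into the target network.
        if (t + 1) % sync_every == 0:
            theta_target = theta.copy()

        s = env.reset()[0] if (terminated or truncated) else s_next
    return theta
```

In this sketch the two stabilizing ingredients named in the title are the frozen `theta_target`, which changes only every `sync_every` steps, and the `np.clip` truncation of the predicted next-state value; the rest is standard semi-gradient $Q$-learning.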
Related papers
- A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning
with General Function Approximation [66.26739783789387]
We propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for reinforcement learning.
MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large, and near-optimal policy switching cost.
Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.
arXiv Detail & Related papers (2023-11-26T08:31:57Z) - On the Model-Misspecification in Reinforcement Learning [9.864462523050843]
We present a unified theoretical framework for addressing model misspecification in reinforcement learning.
We show that value-based and model-based methods can achieve robustness under local misspecification error bounds.
We also propose an algorithmic framework that can achieve the same order of regret bound without prior knowledge of $\zeta$.
arXiv Detail & Related papers (2023-06-19T04:31:59Z) - Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement
Learning? [30.065091907118827]
This work considers the sample complexity of finding an $\epsilon$-optimal policy in a Markov decision process (MDP).
We solve this problem via a plug-in approach: build an empirical model from samples and plan in it with an arbitrary plug-in solver (see the sketch after this list).
We show that a plug-in approach can be sample efficient as well, providing a flexible way to design model-based algorithms for reinforcement learning.
arXiv Detail & Related papers (2020-10-12T13:13:01Z) - Reinforcement Learning with General Value Function Approximation:
Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
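For the plug-in solver paper above, the following is a minimal sketch of the generic model-based recipe it refers to: estimate an empirical MDP from transition samples, then plan in that model with any solver (tabular value iteration here). This is a hedged illustration of the idea only, not that paper's algorithm or its feature-based setting; all function and variable names are hypothetical.

```python
# Sketch of the generic "plug-in" model-based idea: build an empirical MDP from
# samples, then plan in it with any solver (value iteration here).
import numpy as np

def empirical_mdp(samples, num_states, num_actions):
    """samples: iterable of (s, a, r, s_next) transitions with integer states/actions."""
    counts = np.zeros((num_states, num_actions, num_states))
    reward_sum = np.zeros((num_states, num_actions))
    for s, a, r, s_next in samples:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
    visits = counts.sum(axis=2)
    visits_safe = np.maximum(visits, 1)                  # avoid division by zero
    p_hat = counts / visits_safe[:, :, None]             # empirical transition kernel
    p_hat[visits == 0] = 1.0 / num_states                # uniform for unvisited pairs
    r_hat = reward_sum / visits_safe                     # empirical mean rewards
    return p_hat, r_hat

def value_iteration(p_hat, r_hat, gamma=0.95, tol=1e-8, max_iters=10_000):
    """Plan in the empirical model; returns a greedy policy and value estimates."""
    v = np.zeros(r_hat.shape[0])
    for _ in range(max_iters):
        q = r_hat + gamma * p_hat @ v                    # shape (S, A)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return q.argmax(axis=1), v
```

The point of the plug-in viewpoint is the separation of concerns visible here: `empirical_mdp` handles estimation from data, while any off-the-shelf planner can replace `value_iteration` without changing the estimation step.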
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.