Uniform-PAC Bounds for Reinforcement Learning with Linear Function
Approximation
- URL: http://arxiv.org/abs/2106.11612v1
- Date: Tue, 22 Jun 2021 08:48:56 GMT
- Title: Uniform-PAC Bounds for Reinforcement Learning with Linear Function
Approximation
- Authors: Jiafan He and Dongruo Zhou and Quanquan Gu
- Abstract summary: We study reinforcement learning with linear function approximation.
Existing algorithms have only high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees.
We propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability.
- Score: 92.3161051419884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study reinforcement learning (RL) with linear function approximation.
Existing algorithms for this problem have only high-probability regret and/or
Probably Approximately Correct (PAC) sample complexity guarantees, neither of
which guarantees convergence to the optimal policy. In this paper, in order to
overcome the limitation of existing algorithms, we propose a new algorithm
called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with
high probability. The uniform-PAC guarantee is the strongest possible guarantee
for reinforcement learning in the literature, which can directly imply both PAC
and high probability regret bounds, making our algorithm superior to all
existing algorithms with linear function approximation. At the core of our
algorithm are a novel minimax value function estimator and a multi-level
partition scheme for selecting training samples from historical observations.
Both of these techniques are new and of independent interest.
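For context, the uniform-PAC criterion referenced above follows the standard definition from the uniform-PAC literature (Dann et al., 2017); the sketch below uses our own notation for episodes and policies and is not the paper's exact statement.

```latex
% Uniform-PAC (sketch of the standard definition; notation is ours):
% with probability at least 1 - \delta, simultaneously for every accuracy
% level \epsilon > 0, the number of episodes in which the policy \pi_k
% played in episode k is more than \epsilon-suboptimal is bounded as
\[
  N_\epsilon \;:=\; \sum_{k=1}^{\infty}
    \mathbf{1}\!\left\{ V^{*}(s_1^{k}) - V^{\pi_k}(s_1^{k}) > \epsilon \right\}
  \;\le\; F\!\left(\tfrac{1}{\epsilon}, \log\tfrac{1}{\delta}\right),
\]
% where F is polynomial in its arguments. Because the bound holds for all
% \epsilon at once, fixing a single \epsilon recovers a PAC sample-complexity
% bound, and summing the per-episode suboptimality gaps yields a
% high-probability regret bound, which is why uniform-PAC implies both.
```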
Related papers
- A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees [28.974797385513263]
We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs).
In this paper, we introduce a novel policy gradient PD algorithm with uniform probably approximately correct (Uniform-PAC) guarantees, simultaneously ensuring convergence to optimal policies, sublinear regret, and sample complexity bounds for any target accuracy.
Notably, this represents the first Uniform-PAC algorithm for the online CMDP problem.
arXiv Detail & Related papers (2024-01-31T12:23:24Z)
- Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability [17.771354881467435]
We show that a simple algorithm with a universal and instance-independent step size is sufficient to obtain near-optimal variance and bias terms.
Our proof technique is based on refined error bounds for linear approximation together with the novel stability result for the product of random matrices.
arXiv Detail & Related papers (2023-10-22T12:37:25Z)
- Uniform-PAC Guarantees for Model-Based RL with Bounded Eluder Dimension [86.3584476711976]
We propose algorithms for both nonlinear bandits and model-based episodic RL using general function classes with bounded eluder dimension.
The achieved uniform-PAC sample complexity is tight in the sense that it matches the state-of-the-art regret bounds or sample complexity guarantees when reduced to the linear case.
arXiv Detail & Related papers (2023-05-15T05:07:45Z)
- A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser [6.2484576862659065]
This paper presents a new convergent Plug-and-Play (PnP) algorithm based on a relaxed proximal gradient descent scheme.
The algorithm converges for a wider range of regularization parameters, thus allowing more accurate restoration of an image.
arXiv Detail & Related papers (2023-01-31T16:11:47Z)
- PAC-Bayesian Learning of Optimization Algorithms [6.624726878647541]
We apply the PAC-Bayes theory to the setting of learning-to-optimize.
We learn optimization algorithms with provable generalization guarantees (PAC bounds) and an explicit trade-off between a high probability of convergence and a high convergence speed.
Our results rely on PAC-Bayes bounds for general, unbounded loss functions based on exponential families.
arXiv Detail & Related papers (2022-10-20T09:16:36Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm capable of improving the accuracy of weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation, gives rise to the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.