Reinforcement Learning with Unbiased Policy Evaluation and Linear
Function Approximation
- URL: http://arxiv.org/abs/2210.07338v1
- Date: Thu, 13 Oct 2022 20:16:19 GMT
- Title: Reinforcement Learning with Unbiased Policy Evaluation and Linear
Function Approximation
- Authors: Anna Winnicki, R. Srikant
- Abstract summary: We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes.
We analyze two algorithms; the first algorithm involves a least squares approach where a new set of weights associated with feature vectors is obtained via at least squares at each iteration.
The second algorithm involves a two-time-scale approximation algorithm taking several steps of gradient descent towards the least squares solution.
- Score: 11.345796608258434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide performance guarantees for a variant of simulation-based policy
iteration for controlling Markov decision processes that involves the use of
stochastic approximation algorithms along with state-of-the-art techniques that
are useful for very large MDPs, including lookahead, function approximation,
and gradient descent. Specifically, we analyze two algorithms; the first
algorithm involves a least squares approach where a new set of weights
associated with feature vectors is obtained via least squares minimization at
each iteration and the second algorithm involves a two-time-scale stochastic
approximation algorithm taking several steps of gradient descent towards the
least squares solution before obtaining the next iterate using a stochastic
approximation algorithm.
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that the application of this algorithm results in a fixed point of probabilistic policies that converge to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z) - Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we propose the first unified study of variance reduction techniques for proximal point algorithms.
We introduce a generic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants.
Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
arXiv Detail & Related papers (2023-08-18T05:11:50Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint [8.087699764574788]
We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning context.
Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme.
arXiv Detail & Related papers (2021-01-06T17:06:42Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Differentially Private Accelerated Optimization Algorithms [0.7874708385247353]
We present two classes of differentially private optimization algorithms.
The first algorithm is inspired by Polyak's heavy ball method.
The second class of algorithms are based on Nesterov's accelerated gradient method.
arXiv Detail & Related papers (2020-08-05T08:23:01Z) - Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z) - Finite-sample Analysis of Greedy-GQ with Linear Function Approximation
under Markovian Noise [23.62008807533706]
This paper develops the first finite-sample analysis for the Greedy-GQ algorithm.
Our paper extends the finite-sample analyses of two timescale reinforcement learning learning algorithms.
arXiv Detail & Related papers (2020-05-20T16:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.