The Role of Lookahead and Approximate Policy Evaluation in Policy
Iteration with Linear Value Function Approximation
- URL: http://arxiv.org/abs/2109.13419v1
- Date: Tue, 28 Sep 2021 01:20:08 GMT
- Title: The Role of Lookahead and Approximate Policy Evaluation in Policy
Iteration with Linear Value Function Approximation
- Authors: Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant
- Abstract summary: We show that when linear function approximation is used to represent the value function, a certain minimum amount of lookahead and multi-step return is needed.
And when this condition is met, we characterize the finite-time performance of policies obtained using such approximate policy iteration.
- Score: 14.528756508275622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When the sizes of the state and action spaces are large, solving MDPs can be
computationally prohibitive even if the probability transition matrix is known.
So in practice, a number of techniques are used to approximately solve the
dynamic programming problem, including lookahead, approximate policy evaluation
using an m-step return, and function approximation. In a recent paper, Efroni
et al. (2019) studied the impact of lookahead on the convergence rate of
approximate dynamic programming. In this paper, we show that these convergence
results change dramatically when function approximation is used in conjunction
with lookahead and approximate policy evaluation using an m-step return.
Specifically, we show that when linear function approximation is used to
represent the value function, a certain minimum amount of lookahead and
multi-step return is needed for the algorithm to even converge. And when this
condition is met, we characterize the finite-time performance of policies
obtained using such approximate policy iteration. Our results are presented for
two different procedures to compute the function approximation: linear
least-squares regression and gradient descent.
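The interplay the abstract describes can be sketched in code: each iteration computes a greedy policy with H-step lookahead, evaluates it approximately with an m-step return starting from the current linear value estimate, and then projects the result back onto the feature space with least squares. The sketch below is a minimal illustration on a small synthetic MDP, not the paper's algorithm or experiments; the MDP sizes, feature matrix, and constants H, m are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic MDP with a known transition matrix (illustrative sizes).
n, a, gamma = 20, 3, 0.9
P = rng.random((a, n, n))
P /= P.sum(axis=2, keepdims=True)   # row-stochastic transitions per action
r = rng.random((a, n))              # reward r[action, state]

# Linear value function: V(s) ~ Phi[s] @ theta, with random features.
d = 5
Phi = rng.random((n, d))

def greedy_lookahead(theta, H):
    """H-step lookahead: apply the Bellman optimality operator H-1 times
    to the current estimate Phi @ theta, then act greedily."""
    V = Phi @ theta
    for _ in range(H - 1):
        V = np.max(r + gamma * P @ V, axis=0)
    return np.argmax(r + gamma * P @ V, axis=0)

def m_step_return(pi, theta, m):
    """Approximate policy evaluation: m applications of T_pi to Phi @ theta."""
    P_pi = P[pi, np.arange(n)]      # transitions under policy pi
    r_pi = r[pi, np.arange(n)]      # rewards under policy pi
    V = Phi @ theta
    for _ in range(m):
        V = r_pi + gamma * P_pi @ V
    return V

theta = np.zeros(d)
for _ in range(30):
    pi = greedy_lookahead(theta, H=5)
    target = m_step_return(pi, theta, m=10)
    # Project the m-step return onto the span of the features (least squares).
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
```

The paper's point is that with H and m too small relative to the feature projection's contraction properties, this loop need not converge at all; the projection step is where the linear function approximation enters.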
Related papers
- Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs [44.69257217086967]
The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives.
In this paper, we address the learning problem given linear function approximation with $q_\pi$-realizability.
arXiv Detail & Related papers (2024-06-26T17:57:13Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Policy Gradient and Actor-Critic Learning in Continuous Time and Space:
Theory and Algorithms [1.776746672434207]
We study policy gradient (PG) for reinforcement learning in continuous time and space.
We propose two types of actor-critic algorithms for RL, where we learn and update value functions and policies simultaneously and alternately.
arXiv Detail & Related papers (2021-11-22T14:27:04Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\mathrm{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation (FA), gives rise to the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that fits a difference of convex (DC) functions to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
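The DC idea above can be illustrated concretely: since a maximum of affine functions is convex, a model of the form max_i(a_i x + b_i) - max_j(c_j x + d_j) is a difference of convex functions and can represent any continuous piecewise linear function. The sketch below fits such a model to 1-D toy data with plain subgradient descent; it is an illustrative assumption about the approach, not the paper's algorithm, and the number of pieces, learning rate, and target function are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D toy data from a piecewise linear target, |x|.
x = rng.uniform(-1, 1, size=(200, 1))
y = np.abs(x[:, 0])

# DC model: f(x) = max_i (A x + b)_i - max_j (C x + d)_j.
# Each max over affine pieces is convex, so f is a DC function.
k = 4
A = rng.normal(size=(k, 1)); b = rng.normal(size=k)
C = rng.normal(size=(k, 1)); dvec = rng.normal(size=k)

def predict(x):
    return np.max(x @ A.T + b, axis=1) - np.max(x @ C.T + dvec, axis=1)

mse0 = np.mean((predict(x) - y) ** 2)   # loss before training

lr, m = 0.05, len(x)
for _ in range(3000):
    i = np.argmax(x @ A.T + b, axis=1)     # active convex piece per sample
    j = np.argmax(x @ C.T + dvec, axis=1)  # active concave piece per sample
    err = predict(x) - y
    # Subgradient of squared loss flows only through the active pieces.
    gA = np.zeros_like(A); gb = np.zeros_like(b)
    gC = np.zeros_like(C); gd = np.zeros_like(dvec)
    np.add.at(gA, i, err[:, None] * x)
    np.add.at(gb, i, err)
    np.add.at(gC, j, -err[:, None] * x)
    np.add.at(gd, j, -err)
    A -= lr * gA / m; b -= lr * gb / m
    C -= lr * gC / m; dvec -= lr * gd / m

mse = np.mean((predict(x) - y) ** 2)
```

Because the subgradient only updates the pieces that attain the max at each sample, different affine pieces specialize to different regions of the input, which is what yields the piecewise linear fit.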
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - Reinforcement Learning with General Value Function Approximation:
Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z) - Adaptive Approximate Policy Iteration [22.915651391812187]
We present a learning scheme which enjoys a $\tilde{O}(T^{2/3})$ regret bound for undiscounted, continuing learning in uniformly ergodic MDPs.
This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation.
arXiv Detail & Related papers (2020-02-08T02:27:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.