Biased Gradient Estimate with Drastic Variance Reduction for Meta
Reinforcement Learning
- URL: http://arxiv.org/abs/2112.07328v1
- Date: Tue, 14 Dec 2021 12:29:43 GMT
- Title: Biased Gradient Estimate with Drastic Variance Reduction for Meta
Reinforcement Learning
- Authors: Yunhao Tang
- Abstract summary: biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates.
We propose linearized score function (LSF) gradient estimates, which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$.
We establish theoretical guarantees for the LSF gradient estimates in meta-RL regarding their convergence to stationary points, showing better dependency on $N$ than prior work when $N$ is large.
- Score: 25.639542287310768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the empirical success of meta reinforcement learning (meta-RL), there
are still a number of poorly-understood discrepancies between theory and practice.
Critically, biased gradient estimates are almost always implemented in
practice, whereas prior theory on meta-RL only establishes convergence under
unbiased gradient estimates. In this work, we investigate such a discrepancy.
In particular, (1) We show that unbiased gradient estimates have variance
$\Theta(N)$ which linearly depends on the sample size $N$ of the inner loop
updates; (2) We propose linearized score function (LSF) gradient estimates,
which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$; (3)
We show that most empirical prior work in fact implements variants of the LSF
gradient estimates. This implies that practical algorithms "accidentally"
introduce bias to achieve better performance; (4) We establish theoretical
guarantees for the LSF gradient estimates in meta-RL regarding their convergence
to stationary points, showing better dependency on $N$ than prior work when $N$
is large.
Related papers
- Analysis of the expected $L_2$ error of an over-parametrized deep neural
network estimate learned by gradient descent without regularization [7.977229957867868]
Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent.
In this paper, we show that the regularization term is not necessary to obtain similar results.
arXiv Detail & Related papers (2023-11-24T17:04:21Z) - The Role of Baselines in Policy Gradient Optimization [83.42050606055822]
We show that the state value baseline allows on-policy natural policy gradient (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate.
We find that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance.
arXiv Detail & Related papers (2023-01-16T06:28:00Z) - Provably Efficient Offline Reinforcement Learning with Trajectory-Wise
Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards (a minimal sketch of this redistribution step is given below).
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient for general MDPs with trajectory-wise rewards.
arXiv Detail & Related papers (2022-06-13T19:11:22Z) - Human-in-the-loop: Provably Efficient Preference-based Reinforcement
Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for preference-based RL (PbRL) with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z) - Provably Efficient Convergence of Primal-Dual Actor-Critic with
Nonlinear Function Approximation [15.319335698574932]
We show the first efficient convergence result for primal-dual actor-critic with nonlinear function approximation, with a convergence rate of $\mathcal{O}\left(\sqrt{\ln N/N}\right)$ under Markovian sampling.
Results are reported on OpenAI Gym continuous control tasks.
arXiv Detail & Related papers (2022-02-28T15:16:23Z) - A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning [16.824515577815696]
Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures.
We show that existing meta-gradient estimators adopted by GMRL are actually biased.
We conduct experiments on Iterated Prisoner's Dilemma and Atari games to show how other methods such as off-policy learning and low-bias estimators can help fix the gradient bias for GMRL algorithms in general.
arXiv Detail & Related papers (2021-12-31T11:56:40Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) method.
A hyperparameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
VCSG needs $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite
Epochs [25.158203665218164]
We show that adaptive gradient methods can be faster than random shuffling SGD after finite time.
To the best of our knowledge, this is the first work to demonstrate that adaptive gradient methods can be faster than SGD after finite time.
arXiv Detail & Related papers (2020-06-12T09:39:47Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)