Partial advantage estimator for proximal policy optimization
- URL: http://arxiv.org/abs/2301.10920v1
- Date: Thu, 26 Jan 2023 03:42:39 GMT
- Title: Partial advantage estimator for proximal policy optimization
- Authors: Xiulei Song, Yizhao Jin, Greg Slabaugh, Simon Lucas
- Abstract summary: Generalized Advantage Estimation (GAE) is an exponentially-weighted estimator of an advantage function similar to $\lambda$-return.
In practical applications, a truncated GAE is used due to the incompleteness of the trajectory, which results in a large bias during estimation.
We propose to take a part of it when calculating updates, which significantly reduces the bias resulting from the incomplete trajectory.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimation of value in policy gradient methods is a fundamental problem.
Generalized Advantage Estimation (GAE) is an exponentially-weighted estimator
of an advantage function similar to $\lambda$-return. It substantially reduces
the variance of policy gradient estimates at the expense of bias. In practical
applications, a truncated GAE is used due to the incompleteness of the
trajectory, which results in a large bias during estimation. To address this
challenge, instead of using the entire truncated GAE, we propose to take a part
of it when calculating updates, which significantly reduces the bias resulting
from the incomplete trajectory. We perform experiments in MuJoCo and $\mu$RTS
to investigate the effect of different partial coefficients and sampling
lengths. We show that our partial GAE approach yields better empirical results
in both environments.
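To make the idea concrete, below is a minimal Python sketch of truncated GAE and of a partial variant that keeps only the earliest advantages, where the truncation bias is smallest. The partial coefficient `p` and the keep-the-earliest rule are our reading of the abstract, not the paper's exact algorithm.

```python
import numpy as np

def truncated_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """GAE over a truncated trajectory via the backward recursion
    A_t = delta_t + gamma * lam * A_{t+1}, with
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    Bootstrapping from `last_value` at the cut-off is what injects
    bias, and the bias is largest at the latest timesteps."""
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
        next_value = values[t]
    return adv

def partial_gae(rewards, values, last_value, p=0.8, **gae_kwargs):
    """Keep only the earliest ceil(p * T) advantages, discarding the
    tail where the truncation bias concentrates. `p` is a hypothetical
    partial coefficient; the paper's exact rule may differ."""
    adv = truncated_gae(rewards, values, last_value, **gae_kwargs)
    keep = int(np.ceil(p * len(adv)))
    return adv[:keep]
```

With `p=1.0` this reduces to the standard truncated GAE used in PPO implementations.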
Related papers
- Generalized Advantage Estimation for Distributional Policy Gradients [3.878500880725885]
Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL). We propose a novel approach that utilizes optimal transport theory to introduce a Wasserstein-like directional metric, which measures both the distance and the directional discrepancies between probability distributions. Using exponentially weighted estimation, we leverage this Wasserstein-like directional metric to derive a distributional GAE (DGAE).
arXiv Detail & Related papers (2025-07-23T14:07:56Z)
- Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning [50.93804891554481]
We introduce a novel estimator based on the log-sum-exponential (LSE) operator, which outperforms traditional inverse propensity score estimators. Our LSE estimator demonstrates variance reduction and robustness under heavy-tailed conditions. In the off-policy learning scenario, we establish bounds on the regret, i.e., the performance gap between our LSE estimator and the optimal policy.
arXiv Detail & Related papers (2025-06-07T17:37:10Z)
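As a rough illustration only: assuming the LSE operator is applied to importance-weighted rewards with a temperature `lam` (a form we are guessing from the abstract; the paper's estimator may differ), an LSE-style estimate could look like:

```python
import numpy as np

def lse_estimator(weights, rewards, lam=-1.0):
    """Hypothetical LSE-style off-policy value estimate,
    (1/lam) * log(mean(exp(lam * w_i * r_i))).
    A negative lam damps large importance-weighted terms,
    giving robustness to heavy-tailed weights."""
    x = lam * weights * rewards
    m = np.max(x)
    # stable log-mean-exp via the max trick
    return (m + np.log(np.mean(np.exp(x - m)))) / lam

def ips_estimator(weights, rewards):
    """Plain inverse propensity score (IPS) mean, for comparison."""
    return np.mean(weights * rewards)
```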
- Pathwise optimization for bridge-type estimators and its applications [49.1574468325115]
Pathwise methods allow one to efficiently compute the full solution path of penalized estimators.
We apply these algorithms to the penalized estimation of processes observed at discrete times.
arXiv Detail & Related papers (2024-12-05T10:38:29Z)
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterates of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, through schemes such as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA).
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain its advantages over SGD from the perspective of optimization and generalization.
arXiv Detail & Related papers (2024-11-20T10:08:22Z)
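A minimal sketch under the assumption that FWA averages a finite window of the most recent SGD checkpoints; the window size `k` is a hypothetical parameter:

```python
import numpy as np

def finite_weight_average(checkpoints, k=5):
    """Average the last k parameter vectors saved during SGD training.
    k=1 recovers the final SGD iterate; averaging a finite window of
    recent iterates is the FWA-style scheme sketched here."""
    return np.mean(np.stack(checkpoints[-k:]), axis=0)
```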
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
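For context, a minimal sketch of the standard importance-sampled policy gradient estimate that this line of work builds on; the paper's active search for the variance-minimizing behavioral policy is not reproduced here:

```python
import numpy as np

def is_policy_gradient(grad_logp, is_weights, advantages):
    """Off-policy gradient estimate
    mean_i [ w_i * A_i * grad log pi(a_i|s_i) ],
    where w_i = pi(a_i|s_i) / b(a_i|s_i) re-weights samples drawn
    from a behavioral policy b toward the target policy pi.
    grad_logp: (N, d) array of score vectors."""
    return np.mean(is_weights[:, None] * advantages[:, None] * grad_logp,
                   axis=0)
```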
- U-Statistics for Importance-Weighted Variational Inference [29.750633016889655]
We propose the use of U-statistics to reduce the variance of estimators in importance-weighted variational inference.
We find empirically that U-statistic variance reduction can lead to modest to significant improvements in inference performance on a range of models.
arXiv Detail & Related papers (2023-02-27T16:08:43Z)
- Variance Reduction for Score Functions Using Optimal Baselines [0.0]
This paper studies baselines, a variance reduction technique for score functions.
Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline.
arXiv Detail & Related papers (2022-12-27T19:17:28Z)
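A minimal sketch of baseline-based variance reduction for score-function (REINFORCE-style) gradients; the mean-return baseline below is a common illustrative choice, not the optimal state-dependent baseline derived in the paper:

```python
import numpy as np

def score_function_gradient(grad_logp, returns, use_baseline=True):
    """REINFORCE-style estimate mean_i [(G_i - b) * grad log pi].
    A return-independent baseline b leaves the estimator unbiased,
    since E[b * grad log pi] = 0, but a good b cuts variance."""
    b = returns.mean() if use_baseline else 0.0
    return np.mean((returns - b)[:, None] * grad_logp, axis=0)
```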
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most existing methods can only optimize the PAUC approximately, leading to inevitable, uncontrollable biases.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Rethinking Collaborative Metric Learning: Toward an Efficient Alternative without Negative Sampling [156.7248383178991]
The Collaborative Metric Learning (CML) paradigm has attracted wide interest in the area of recommender systems (RS).
We find that negative sampling leads to a biased estimate of the generalization error.
Motivated by this, we propose an efficient alternative for CML without negative sampling, named Sampling-Free Collaborative Metric Learning (SFCML).
arXiv Detail & Related papers (2022-06-23T08:50:22Z)
- Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning [25.639542287310768]
Biased gradient estimates are almost always used in practice, whereas prior theory on meta-RL establishes convergence only under unbiased gradient estimates.
We propose linearized score function (LSF) gradient estimates, which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$.
We establish theoretical guarantees for the LSF gradient estimates in meta-RL regarding their convergence to stationary points, showing better dependency on $N$ than prior work when $N$ is large.
arXiv Detail & Related papers (2021-12-14T12:29:43Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
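For reference, a short sketch of the weighted least squares estimator named in the Gauss-Markov statement above:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """beta = (X^T W X)^{-1} X^T W y with W = diag(w): the linear
    MVUE in the Gauss-Markov setting when w holds the inverse
    noise variances of the observations."""
    XtW = X.T * w  # equivalent to X.T @ np.diag(w), without forming W
    return np.linalg.solve(XtW @ X, XtW @ y)
```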
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
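A minimal sketch of a clipped gradient step with a fixed threshold; the paper's streaming setting and threshold schedule are not reproduced:

```python
import numpy as np

def clipped_sgd_step(params, grad, lr=0.01, clip=1.0):
    """One descent step with gradient-norm clipping, which bounds
    the influence of any single heavy-tailed gradient sample:
    the update direction is kept but its length is capped."""
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)
    return params - lr * grad
```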
We analyze the convergence of biased stochastic gradient methods (SGD), where individual updates are corrupted by compression.
We quantify how the magnitude of the bias impacts the attainable accuracy and the convergence rate.
arXiv Detail & Related papers (2020-07-31T19:37:59Z)
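A minimal sketch of the kind of biased update analyzed above, using top-k sparsification (one standard compressor, chosen here for illustration) as the source of bias:

```python
import numpy as np

def top_k_compress(grad, k):
    """Keep the k largest-magnitude coordinates; zero the rest.
    This deterministic compressor is biased: E[C(g)] != g."""
    out = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]
    out[idx] = grad[idx]
    return out

def biased_sgd_step(params, grad, lr=0.01, k=10):
    """SGD step driven by the compressed, hence biased, gradient."""
    return params - lr * top_k_compress(grad, k)
```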