Variance Reduction for Policy-Gradient Methods via Empirical Variance
Minimization
- URL: http://arxiv.org/abs/2206.06827v2
- Date: Wed, 15 Jun 2022 07:49:16 GMT
- Title: Variance Reduction for Policy-Gradient Methods via Empirical Variance
Minimization
- Authors: Maxim Kaledin, Alexander Golubev, Denis Belomestny
- Abstract summary: Policy-gradient methods in Reinforcement Learning suffer from the high variance of the gradient estimate.
In this paper we investigate, for the first time, the performance of the criterion called Empirical Variance (EV).
Our experiments indicate that EV-based methods achieve considerably stronger variance reduction than A2C.
- Score: 69.32510868632988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy-gradient methods in Reinforcement Learning (RL) are highly general and
widely applied in practice, but their performance suffers from the high variance
of the gradient estimate. Several procedures have been proposed to reduce it,
including actor-critic (AC) and advantage actor-critic (A2C) methods. Recently
these approaches have gained a new perspective with the introduction of Deep RL:
both new control variates (CV) and new sub-sampling procedures became available
in the setting of complex models such as neural networks. A vital part of
CV-based methods is the goal functional used to train the CV; the most
popular choice is the least-squares criterion of A2C. Despite its practical
success, this criterion is not the only one possible. In this paper we
investigate, for the first time, the performance of the criterion called
Empirical Variance (EV). We observe in experiments that the EV criterion
performs no worse than A2C and is sometimes considerably better. In addition,
we prove theoretical guarantees on the actual variance reduction under very
general assumptions and show that the A2C least-squares goal functional is an
upper bound for the EV goal. Our experiments indicate that EV-based methods
achieve considerably stronger variance reduction than A2C.
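The upper-bound relation stated in the abstract, that the A2C least-squares goal functional bounds the EV goal from above, follows from Var[X] = E[X^2] - (E[X])^2 <= E[X^2] applied to the residual X_t = R_t - V_phi(s_t). The sketch below is a minimal numerical illustration of that relation, not the authors' implementation; the batch, the baseline V_phi, and all variable names are hypothetical.

```python
# Minimal sketch (not the paper's code): compare the A2C-style least-squares
# criterion with the empirical-variance (EV) criterion on a synthetic batch.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: returns R_t and baseline predictions V_phi(s_t).
N = 1024
returns = rng.normal(loc=1.0, scale=2.0, size=N)        # R_t
baseline = returns + rng.normal(scale=0.5, size=N)      # V_phi(s_t), imperfect fit

# Residual of the control variate: X_t = R_t - V_phi(s_t).
residual = returns - baseline

# A2C-style least-squares criterion: mean squared residual, i.e. E[X^2].
ls_criterion = np.mean(residual ** 2)

# EV criterion: sample variance of the residual,
# Var[X] = E[X^2] - (E[X])^2, which is insensitive to a constant offset in V_phi.
ev_criterion = np.var(residual)

# Since Var[X] <= E[X^2], the least-squares objective upper-bounds the EV objective.
print(f"least-squares criterion: {ls_criterion:.4f}")
print(f"empirical-variance criterion: {ev_criterion:.4f}")
assert ev_criterion <= ls_criterion + 1e-12
```

In an actual training loop both criteria would be minimized over the parameters of a neural-network control variate; the sketch only evaluates them on a fixed batch to make the ordering visible.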
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- Regularized DeepIV with Model Selection [72.17508967124081]
Regularized DeepIV (RDIV) regression can converge to the least-norm IV solution.
Our method matches the current state-of-the-art convergence rate.
arXiv Detail & Related papers (2024-03-07T05:38:56Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Robust Long-Tailed Learning via Label-Aware Bounded CVaR [36.26100472960534]
We propose two novel approaches to improve the performance of long-tailed learning with a solid theoretical ground.
Specifically, we introduce a Label-Aware Bounded CVaR loss to overcome the pessimistic result of the original CVaR.
We additionally propose a LAB-CVaR with logit adjustment to stabilize the optimization process.
arXiv Detail & Related papers (2023-08-29T16:07:18Z)
- Mixture Proportion Estimation and PU Learning: A Modern Approach [47.34499672878859]
Given only positive examples and unlabeled examples, we might hope to estimate an accurate positive-versus-negative classifier.
Classical methods for both problems break down in high-dimensional settings.
We propose two simple techniques: Best Bin Estimation (BBE) and Conditional Value Ignoring Risk (CVIR).
arXiv Detail & Related papers (2021-11-01T14:42:23Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)