Variance Reduction for Policy-Gradient Methods via Empirical Variance
Minimization
- URL: http://arxiv.org/abs/2206.06827v2
- Date: Wed, 15 Jun 2022 07:49:16 GMT
- Title: Variance Reduction for Policy-Gradient Methods via Empirical Variance
Minimization
- Authors: Maxim Kaledin, Alexander Golubev, Denis Belomestny
- Abstract summary: Policy-gradient methods in Reinforcement Learning suffer from the high variance of the gradient estimate.
In this paper we investigate, for the first time, the performance of the criterion called Empirical Variance (EV).
Our experiments indicate that EV-based methods achieve considerably stronger variance reduction than A2C.
- Score: 69.32510868632988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy-gradient methods in Reinforcement Learning (RL) are highly general and
widely applied in practice, but their performance suffers from the high variance
of the gradient estimate. Several procedures have been proposed to reduce it,
including actor-critic (AC) and advantage actor-critic (A2C) methods. Recently
these approaches have gained a new perspective with the introduction of Deep RL:
both new control variates (CV) and new sub-sampling procedures became available
in the setting of complex models such as neural networks. A vital part of
CV-based methods is the goal functional used to train the CV; the most
popular choice is the least-squares criterion of A2C. Despite its practical
success, this criterion is not the only one possible. In this paper we
investigate, for the first time, the performance of the criterion called
Empirical Variance (EV). We observe in experiments that the EV criterion
performs no worse than A2C and is sometimes considerably better. In addition,
we prove theoretical guarantees on the actual variance reduction under very
general assumptions and show that the A2C least-squares goal functional is an
upper bound for the EV goal. Our experiments indicate that EV-based methods
achieve considerably stronger variance reduction than A2C.
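The upper-bound relation stated in the abstract, that the A2C least-squares goal functional bounds the EV goal from above, follows from Var[X] = E[X^2] - (E[X])^2 <= E[X^2] applied to the residual X_t = R_t - V_phi(s_t). The sketch below is a minimal numerical illustration of that relation, not the authors' implementation; the batch, the baseline V_phi, and all variable names are hypothetical.

```python
# Minimal sketch (not the paper's code): compare the A2C-style least-squares
# criterion with the empirical-variance (EV) criterion on a synthetic batch.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: returns R_t and baseline predictions V_phi(s_t).
N = 1024
returns = rng.normal(loc=1.0, scale=2.0, size=N)        # R_t
baseline = returns + rng.normal(scale=0.5, size=N)      # V_phi(s_t), imperfect fit

# Residual of the control variate: X_t = R_t - V_phi(s_t).
residual = returns - baseline

# A2C-style least-squares criterion: mean squared residual, i.e. E[X^2].
ls_criterion = np.mean(residual ** 2)

# EV criterion: sample variance of the residual,
# Var[X] = E[X^2] - (E[X])^2, which is insensitive to a constant offset in V_phi.
ev_criterion = np.var(residual)

# Since Var[X] <= E[X^2], the least-squares objective upper-bounds the EV objective.
print(f"least-squares criterion: {ls_criterion:.4f}")
print(f"empirical-variance criterion: {ev_criterion:.4f}")
assert ev_criterion <= ls_criterion + 1e-12
```

In an actual training loop both criteria would be minimized over the parameters of a neural-network control variate; the sketch only evaluates them on a fixed batch to make the ordering visible.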
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- Regularized DeepIV with Model Selection [72.17508967124081]
Regularized DeepIV (RDIV) regression can converge to the least-norm IV solution.
Our method matches the current state-of-the-art convergence rate.
arXiv Detail & Related papers (2024-03-07T05:38:56Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Robust Long-Tailed Learning via Label-Aware Bounded CVaR [36.26100472960534]
We propose two novel approaches to improve the performance of long-tailed learning with a solid theoretical ground.
Specifically, we introduce a Label-Aware Bounded CVaR loss to overcome the pessimistic result of the original CVaR.
We additionally propose a LAB-CVaR with logit adjustment to stabilize the optimization process.
arXiv Detail & Related papers (2023-08-29T16:07:18Z)
- Mixture Proportion Estimation and PU Learning: A Modern Approach [47.34499672878859]
Given only positive examples and unlabeled examples, we might hope to estimate an accurate positive-versus-negative classifier.
Classical methods for both problems break down in high-dimensional settings.
We propose two simple techniques: Best Bin Estimation (BBE) and Conditional Value Ignoring Risk (CVIR).
arXiv Detail & Related papers (2021-11-01T14:42:23Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)