Smoothing Advantage Learning
- URL: http://arxiv.org/abs/2203.10445v1
- Date: Sun, 20 Mar 2022 03:52:32 GMT
- Title: Smoothing Advantage Learning
- Authors: Yaozhong Gan, Zhe Zhang, Xiaoyang Tan
- Abstract summary: We propose a simple variant of Advantage learning (AL), named smoothing advantage learning (SAL).
The proposed value smoothing technique not only helps to stabilize the training procedure of AL by controlling the trade-off between the convergence rate and the upper bound of the approximation errors, but also increases the action gap between the optimal and sub-optimal action values.
- Score: 20.760987175553645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advantage learning (AL) aims to improve the robustness of value-based
reinforcement learning against estimation errors with action-gap-based
regularization. Unfortunately, the method tends to be unstable in the case of
function approximation. In this paper, we propose a simple variant of AL, named
smoothing advantage learning (SAL), to alleviate this problem. The key to our
method is to replace the original Bellman optimality operator in AL with a smooth
one, so as to obtain a more reliable estimate of the temporal-difference target.
We give a detailed account of the resulting action gap and the performance
bound for approximate SAL. Further theoretical analysis reveals that the
proposed value smoothing technique not only helps to stabilize the training
procedure of AL by controlling the trade-off between the convergence rate and
the upper bound of the approximation errors, but also increases the action gap
between the optimal and sub-optimal action values.
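As a rough sketch of the idea in a tabular setting, the snippet below contrasts the standard AL temporal-difference target with a smoothed variant. The smoothing here is modeled as a convex combination of the max-based Bellman backup and the current estimate, controlled by a coefficient `beta`; this is an illustrative assumption, since the paper defines its own smoothed operator.

```python
import numpy as np

def al_target(Q, s, a, r, s_next, gamma=0.99, alpha=0.9):
    """Standard advantage-learning (AL) TD target: the Bellman backup plus an
    action-gap penalty on non-greedy actions."""
    bellman = r + gamma * np.max(Q[s_next])
    gap = Q[s, a] - np.max(Q[s])               # advantage term, <= 0
    return bellman + alpha * gap

def sal_target(Q, s, a, r, s_next, gamma=0.99, alpha=0.9, beta=0.5):
    """Smoothed AL target (illustrative): the max-based backup is mixed with
    the current estimate Q[s, a] before the advantage term is added.
    `beta` is a hypothetical smoothing coefficient, not the paper's exact
    parameterization."""
    bellman = r + gamma * np.max(Q[s_next])
    smoothed = (1.0 - beta) * bellman + beta * Q[s, a]
    gap = Q[s, a] - np.max(Q[s])
    return smoothed + alpha * gap
```

Setting `beta = 0` recovers the plain AL target; larger values damp how fast the target can move between updates, which is the kind of trade-off between convergence rate and error amplification the abstract refers to.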
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\left(\ln(T)/T^{1-\frac{1}{\alpha}}\right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
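For context, a minimal sketch of federated AdaGrad under a crude over-the-air model: the server applies an AdaGrad step to an average of client gradients corrupted by additive Gaussian noise. The noise model, schedules, and all names here are illustrative assumptions rather than the paper's protocol.

```python
import numpy as np

def federated_adagrad(w0, client_grads, rounds=200, lr=0.1, eps=1e-8,
                      noise_std=0.01, seed=0):
    """Server-side AdaGrad applied to a noisy average of client gradients.
    `client_grads` is a list of callables g_i(w); the additive Gaussian term
    stands in for over-the-air aggregation noise (an assumption here)."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    G = np.zeros_like(w)                          # accumulated squared gradients
    for _ in range(rounds):
        grads = np.stack([g(w) for g in client_grads])
        agg = grads.mean(axis=0) + noise_std * rng.normal(size=w.shape)
        G += agg ** 2
        w -= lr * agg / (np.sqrt(G) + eps)        # AdaGrad step on the aggregate
    return w

# Toy usage: two clients with slightly different quadratic objectives.
clients = [lambda w: 2.0 * (w - 1.0), lambda w: 2.0 * (w + 0.5)]
print(federated_adagrad(np.zeros(3), clients))    # roughly 0.25 per coordinate
```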
- Smooth Sailing: Improving Active Learning for Pre-trained Language Models with Representation Smoothness Analysis [3.490038106567192]
Active learning (AL) methods aim to reduce label complexity in supervised learning.
We propose an early stopping technique that does not require a validation set.
We find that task adaptation improves AL, whereas standard short fine-tuning in AL does not provide improvements over random sampling.
arXiv Detail & Related papers (2022-12-20T19:37:20Z)
- Robust Action Gap Increasing with Clipped Advantage Learning [20.760987175553645]
We present a novel method, named clipped Advantage Learning (clipped AL) to address this issue.
Our simple clipped AL operator not only enjoys a fast convergence guarantee but also retains proper action gaps, achieving a good balance between a large action gap and fast convergence.
arXiv Detail & Related papers (2022-03-20T03:41:26Z)
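A minimal sketch of how a clipped advantage correction might look in a tabular Q-table setting; the clipping form and the threshold `delta` are assumptions for illustration, not the operator defined in the paper.

```python
import numpy as np

def clipped_al_target(Q, s, a, r, s_next, gamma=0.99, alpha=0.9, delta=1.0):
    """AL target whose action-gap correction is clipped to a bounded range, so
    one large (possibly erroneous) advantage estimate cannot dominate the
    TD target.  `delta` is a hypothetical clipping threshold."""
    bellman = r + gamma * np.max(Q[s_next])
    gap = Q[s, a] - np.max(Q[s])                     # advantage term, <= 0
    return bellman + alpha * np.clip(gap, -delta, 0.0)
```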
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
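The structural point is easy to sketch: annealed importance sampling whose transitions are unadjusted Langevin steps, so no Metropolis-Hastings accept/reject is needed and every operation stays differentiable. The toy 1-D targets, annealing schedule, and step size below are illustrative choices, not the paper's setup.

```python
import numpy as np

def log_f0(x):                      # unnormalized base density: N(0, 1) shape
    return -0.5 * x ** 2
def log_f1(x):                      # unnormalized target: 3 * N(2, 1) shape
    return -0.5 * (x - 2.0) ** 2 + np.log(3.0)

def ais_no_mh(n=2000, K=50, step=0.05, seed=0):
    """AIS estimate of log(Z1/Z0) with unadjusted Langevin transitions.
    Dropping the Metropolis-Hastings correction keeps every step
    differentiable, at the price of some bias."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, K + 1)
    x = rng.normal(size=n)                           # exact samples from f0
    logw = np.zeros(n)
    for k in range(1, K + 1):
        # incremental importance weight along the geometric annealing path
        logw += (betas[k] - betas[k - 1]) * (log_f1(x) - log_f0(x))
        # unadjusted Langevin step targeting f0^(1-beta_k) * f1^beta_k
        grad = -(1.0 - betas[k]) * x - betas[k] * (x - 2.0)
        x = x + step * grad + np.sqrt(2.0 * step) * rng.normal(size=n)
    m = logw.max()
    return np.log(np.mean(np.exp(logw - m))) + m     # log-mean-exp of weights

print(ais_no_mh())   # near log(3) ~= 1.10; the gap is the cost of skipping MH
```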
- Distributionally Robust Learning with Stable Adversarial Training [34.74504615726101]
Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts.
We propose a novel Stable Adversarial Learning (SAL) algorithm that leverages heterogeneous data sources to construct a more practical uncertainty set.
arXiv Detail & Related papers (2021-06-30T03:05:45Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
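A rough sketch of the reweighting idea: a single Fitted Q-Iteration update with linear features, solved as a weighted least-squares problem in which each Bellman residual is down-weighted by an estimated variance. The feature inputs, variance estimates, and regularization are placeholders, not the paper's estimator.

```python
import numpy as np

def weighted_fqi_step(phi_sa, rewards, phi_next, var_hat, theta,
                      gamma=0.99, reg=1e-3):
    """One Fitted Q-Iteration update with linear features, solved as weighted
    ridge regression: each Bellman residual is down-weighted by an estimated
    variance of the next-state value (var_hat, assumed given here).

    phi_sa:   (n, d) features of the visited (state, action) pairs
    phi_next: (n, d) features of (next state, target-policy action)
    var_hat:  (n,)   variance estimates used as inverse weights
    theta:    (d,)   current linear Q-function parameters"""
    targets = rewards + gamma * phi_next @ theta          # TD regression targets
    w = 1.0 / np.maximum(var_hat, 1e-6)                   # precision weights
    A = phi_sa.T @ (w[:, None] * phi_sa) + reg * np.eye(phi_sa.shape[1])
    b = phi_sa.T @ (w * targets)
    return np.linalg.solve(A, b)                          # updated theta
```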
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Stable Adversarial Learning under Distributional Shifts [46.98655899839784]
Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts.
We propose Stable Adversarial Learning (SAL) algorithm that leverages heterogeneous data sources to construct a more practical uncertainty set.
arXiv Detail & Related papers (2020-06-08T08:42:34Z)
- The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z)
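A small sketch of the kind of update involved: a subgradient method with Nesterov-style extrapolation for an $\ell_1$-regularized hinge loss, with the $\ell_1$ term handled by soft-thresholding. The step-size and extrapolation schedules are generic choices, not the ones analyzed in the paper.

```python
import numpy as np

def hinge_subgrad(w, X, y):
    """Subgradient of the average hinge loss mean(max(0, 1 - y * Xw))."""
    margins = y * (X @ w)
    active = margins < 1.0
    return -(X[active] * y[active, None]).sum(axis=0) / len(y)

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def nesterov_l1_hinge(X, y, lam=0.01, T=500):
    """Subgradient method with Nesterov-style extrapolation for
    min_w mean_hinge(w) + lam * ||w||_1  (illustrative schedules)."""
    w = np.zeros(X.shape[1])
    w_prev = w.copy()
    for t in range(1, T + 1):
        v = w + (t - 1.0) / (t + 2.0) * (w - w_prev)   # extrapolation point
        step = 1.0 / np.sqrt(t)
        w_prev = w
        # subgradient step at the extrapolated point, then soft-threshold
        w = soft_threshold(v - step * hinge_subgrad(v, X, y), step * lam)
    return w
```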
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
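The mechanism behind the patience-based exit is simple to sketch: an internal classifier after each layer makes a prediction, and inference stops once the prediction has stayed the same for `patience` consecutive layers. The sketch below assumes per-layer logits are already computed and omits the model internals.

```python
from typing import Sequence

def patience_based_early_exit(layer_logits: Sequence[Sequence[float]],
                              patience: int = 3) -> int:
    """Return the predicted class, exiting as soon as the per-layer prediction
    has been identical for `patience` consecutive layers.
    `layer_logits[i]` are the logits of an internal classifier at layer i."""
    streak, last_pred = 0, None
    for logits in layer_logits:
        pred = max(range(len(logits)), key=lambda j: logits[j])   # argmax
        streak = streak + 1 if pred == last_pred else 1
        last_pred = pred
        if streak >= patience:          # prediction is stable: stop early
            return pred
    return last_pred                    # fell through: use the final layer

# Example: the prediction settles on class 2, so the exit fires early.
logits_per_layer = [[0.1, 0.5, 0.2], [0.3, 0.1, 0.4], [0.0, 0.2, 0.9],
                    [0.1, 0.1, 0.8], [0.0, 0.3, 0.7]]
print(patience_based_early_exit(logits_per_layer, patience=3))    # -> 2
```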
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.