Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration,
Convergence, and Stabilization
- URL: http://arxiv.org/abs/2110.08896v2
- Date: Wed, 20 Oct 2021 05:12:46 GMT
- Title: Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration,
Convergence, and Stabilization
- Authors: Ke Sun, Yafei Wang, Yi Liu, Yingnan Zhao, Bo Pan, Shangling Jui, Bei
Jiang, Linglong Kong
- Abstract summary: We provide insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms.
Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor.
We propose a stabilization strategy by introducing a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator that can allow both faster convergence and more stable behavior.
- Score: 7.418163369920758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anderson mixing has been heuristically applied to reinforcement learning (RL)
algorithms for accelerating convergence and improving the sampling efficiency
of deep RL. Despite its heuristic improvement of convergence, a rigorous
mathematical justification for the benefits of Anderson mixing in RL has not
yet been put forward. In this paper, we provide deeper insights into a class of
acceleration schemes built on Anderson mixing that improve the convergence of
deep RL algorithms. Our main results establish a connection between Anderson
mixing and quasi-Newton methods and prove that Anderson mixing increases the
convergence radius of policy iteration schemes by an extra contraction factor.
The analysis is rooted in the fixed-point iteration nature of RL.
We further propose a stabilization strategy by introducing a stable
regularization term in Anderson mixing and a differentiable, non-expansive
MellowMax operator that can allow both faster convergence and more stable
behavior. Extensive experiments demonstrate that our proposed method enhances
the convergence, stability, and performance of RL algorithms.
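To make the mechanics concrete, here is a minimal sketch of the idea, assuming a small tabular MDP: damped, Tikhonov-regularized Anderson mixing wrapped around value iteration, with the MellowMax operator as a differentiable, non-expansive stand-in for the hard max. The window size `m`, damping factor `beta`, and regularization weight `lam` follow the generic Anderson acceleration template and are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch: damped, regularized Anderson mixing on value iteration with a
# MellowMax backup. Illustrative only; not the paper's exact method.
import numpy as np

def mellowmax(q, omega=5.0, axis=-1):
    """Differentiable, non-expansive softening of max:
    mm_omega(q) = (1/omega) * log(mean(exp(omega * q)))."""
    m = q.max(axis=axis, keepdims=True)  # shift for numerical stability
    return np.squeeze(m, axis=axis) + \
        np.log(np.mean(np.exp(omega * (q - m)), axis=axis)) / omega

def mellow_backup(V, P, R, gamma=0.9, omega=5.0):
    """One application of the fixed-point map G: Q = R + gamma * P V,
    then MellowMax instead of a hard max over actions."""
    Q = R + gamma * (P @ V)              # P: (S, A, S'), V: (S,) -> Q: (S, A)
    return mellowmax(Q, omega=omega)

def anderson_vi(P, R, gamma=0.9, m=5, beta=0.5, lam=1e-8,
                iters=500, tol=1e-8):
    """Damped Anderson mixing on x_{k+1} = G(x_k), with Tikhonov
    regularization of the small least-squares subproblem."""
    x = np.zeros(P.shape[0])
    X, F = [], []                        # histories of iterates / residuals
    for k in range(iters):
        f = mellow_backup(x, P, R, gamma) - x   # residual f_k = G(x_k) - x_k
        X.append(x.copy()); F.append(f.copy())
        if len(X) > m + 1:               # keep a window of m differences
            X.pop(0); F.pop(0)
        if len(X) == 1:
            x_new = x + beta * f         # plain damped step to bootstrap
        else:
            dX = np.diff(np.array(X), axis=0).T   # (S, window)
            dF = np.diff(np.array(F), axis=0).T
            # Regularized least squares: argmin ||f - dF g||^2 + lam ||g||^2
            g = np.linalg.solve(dF.T @ dF + lam * np.eye(dF.shape[1]),
                                dF.T @ f)
            x_new = x + beta * f - (dX + beta * dF) @ g
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, iters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 20, 4
    P = rng.random((S, A, S)); P /= P.sum(-1, keepdims=True)  # random MDP
    R = rng.random((S, A))
    V, k = anderson_vi(P, R)
    print(f"Anderson-mixed value iteration converged in {k} iterations")
```

The Tikhonov term keeps the small least-squares subproblem well conditioned, which is one simple way to realize the kind of stable regularization the abstract describes.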
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Rényi Divergence Deep Mutual Learning [10.357597530261664]
This paper revisits Deep Mutual Learning (DML) as a simple yet effective computing paradigm.
We propose using Rényi divergence instead of the KL divergence, as it is more flexible.
Our empirical results demonstrate the advantage of combining DML and Rényi divergence, leading to further improvements in model generalization.
arXiv Detail & Related papers (2022-09-13T04:58:35Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Stochastic Anderson Mixing for Nonconvex Stochastic Optimization [12.65903351047816]
Anderson mixing (AM) is an acceleration method for fixed-point iterations.
We propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems.
arXiv Detail & Related papers (2021-10-04T16:26:15Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining theoretical convergence guarantees.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - SIBRE: Self Improvement Based REwards for Adaptive Feedback in
Reinforcement Learning [5.868852957948178]
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL).
The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance (a minimal sketch follows this list).
We prove that SIBRE converges in expectation under the same conditions as the original RL algorithm.
arXiv Detail & Related papers (2020-04-21T09:22:16Z) - Mixed Reinforcement Learning with Additive Stochastic Uncertainty [19.229447330293546]
Reinforcement learning (RL) methods often rely on massive exploration data to search optimal policies, and suffer from poor sampling efficiency.
This paper presents a mixed RL algorithm by simultaneously using dual representations of environmental dynamics to search the optimal policy.
The effectiveness of the mixed RL is demonstrated by a typical optimal control problem of non-affine nonlinear systems.
arXiv Detail & Related papers (2020-02-28T08:02:34Z)
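As a concrete illustration of the SIBRE entry above, the following is a hypothetical sketch of rewarding improvement over the agent's own past performance, written as a Gymnasium-style wrapper. The class name, the exponential-moving-average baseline, and the terminal-bonus placement are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of SIBRE-style shaping: at the end of each episode the
# agent earns a bonus equal to the improvement of its episode return over a
# running baseline of its own past returns.
import gymnasium as gym

class SelfImprovementReward(gym.Wrapper):
    def __init__(self, env, alpha=0.1):
        super().__init__(env)
        self.alpha = alpha       # EMA rate for the performance baseline
        self.baseline = 0.0      # running estimate of past episode returns
        self.ep_return = 0.0

    def reset(self, **kwargs):
        self.ep_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.ep_return += reward
        if terminated or truncated:
            # Bonus = improvement over the agent's own past performance.
            reward += self.ep_return - self.baseline
            self.baseline += self.alpha * (self.ep_return - self.baseline)
        return obs, reward, terminated, truncated, info
```

A wrapped environment, e.g. `SelfImprovementReward(gym.make("CartPole-v1"))`, can then be passed to any existing RL algorithm, matching the plug-in design described in the entry.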