Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration,
Convergence, and Stabilization
- URL: http://arxiv.org/abs/2110.08896v2
- Date: Wed, 20 Oct 2021 05:12:46 GMT
- Title: Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration,
Convergence, and Stabilization
- Authors: Ke Sun, Yafei Wang, Yi Liu, Yingnan Zhao, Bo Pan, Shangling Jui, Bei
Jiang, Linglong Kong
- Abstract summary: We provide insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms.
Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor.
We propose a stabilization strategy by introducing a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator that can allow both faster convergence and more stable behavior.
- Score: 7.418163369920758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anderson mixing has been heuristically applied to reinforcement learning (RL)
algorithms for accelerating convergence and improving the sampling efficiency
of deep RL. Despite its heuristic improvement of convergence, a rigorous
mathematical justification for the benefits of Anderson mixing in RL has not
yet been put forward. In this paper, we provide deeper insights into a class of
acceleration schemes built on Anderson mixing that improve the convergence of
deep RL algorithms. Our main results establish a connection between Anderson
mixing and quasi-Newton methods and prove that Anderson mixing increases the
convergence radius of policy iteration schemes by an extra contraction factor.
The analysis is rooted in the fixed-point iteration nature of RL.
We further propose a stabilization strategy by introducing a stable
regularization term in Anderson mixing and a differentiable, non-expansive
MellowMax operator that can allow both faster convergence and more stable
behavior. Extensive experiments demonstrate that our proposed method enhances
the convergence, stability, and performance of RL algorithms.
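To make the mechanics concrete, here is a minimal sketch of the idea, assuming a small tabular MDP: damped, Tikhonov-regularized Anderson mixing wrapped around value iteration, with the MellowMax operator as a differentiable, non-expansive stand-in for the hard max. The window size `m`, damping factor `beta`, and regularization weight `lam` follow the generic Anderson acceleration template and are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch: damped, regularized Anderson mixing on value iteration with a
# MellowMax backup. Illustrative only; not the paper's exact method.
import numpy as np

def mellowmax(q, omega=5.0, axis=-1):
    """Differentiable, non-expansive softening of max:
    mm_omega(q) = (1/omega) * log(mean(exp(omega * q)))."""
    m = q.max(axis=axis, keepdims=True)  # shift for numerical stability
    return np.squeeze(m, axis=axis) + \
        np.log(np.mean(np.exp(omega * (q - m)), axis=axis)) / omega

def mellow_backup(V, P, R, gamma=0.9, omega=5.0):
    """One application of the fixed-point map G: Q = R + gamma * P V,
    then MellowMax instead of a hard max over actions."""
    Q = R + gamma * (P @ V)              # P: (S, A, S'), V: (S,) -> Q: (S, A)
    return mellowmax(Q, omega=omega)

def anderson_vi(P, R, gamma=0.9, m=5, beta=0.5, lam=1e-8,
                iters=500, tol=1e-8):
    """Damped Anderson mixing on x_{k+1} = G(x_k), with Tikhonov
    regularization of the small least-squares subproblem."""
    x = np.zeros(P.shape[0])
    X, F = [], []                        # histories of iterates / residuals
    for k in range(iters):
        f = mellow_backup(x, P, R, gamma) - x   # residual f_k = G(x_k) - x_k
        X.append(x.copy()); F.append(f.copy())
        if len(X) > m + 1:               # keep a window of m differences
            X.pop(0); F.pop(0)
        if len(X) == 1:
            x_new = x + beta * f         # plain damped step to bootstrap
        else:
            dX = np.diff(np.array(X), axis=0).T   # (S, window)
            dF = np.diff(np.array(F), axis=0).T
            # Regularized least squares: argmin ||f - dF g||^2 + lam ||g||^2
            g = np.linalg.solve(dF.T @ dF + lam * np.eye(dF.shape[1]),
                                dF.T @ f)
            x_new = x + beta * f - (dX + beta * dF) @ g
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, iters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 20, 4
    P = rng.random((S, A, S)); P /= P.sum(-1, keepdims=True)  # random MDP
    R = rng.random((S, A))
    V, k = anderson_vi(P, R)
    print(f"Anderson-mixed value iteration converged in {k} iterations")
```

The Tikhonov term keeps the small least-squares subproblem well conditioned, which is one simple way to realize the kind of stable regularization the abstract describes.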
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Rényi Divergence Deep Mutual Learning [10.357597530261664]
This paper revisits Deep Mutual Learning (DML) as a simple yet effective computing paradigm.
We propose using Rényi divergence instead of the KL divergence, as it is more flexible.
Our empirical results demonstrate the advantage of combining DML and Rényi divergence, leading to further improvements in model generalization.
arXiv Detail & Related papers (2022-09-13T04:58:35Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Stochastic Anderson Mixing for Nonconvex Stochastic Optimization [12.65903351047816]
Anderson mixing (AM) is an acceleration method for fixed-point iterations.
We propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems.
arXiv Detail & Related papers (2021-10-04T16:26:15Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining theoretical convergence guarantees.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - SIBRE: Self Improvement Based REwards for Adaptive Feedback in
Reinforcement Learning [5.868852957948178]
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL).
The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance (a minimal sketch follows this list).
We prove that SIBRE converges in expectation under the same conditions as the original RL algorithm.
arXiv Detail & Related papers (2020-04-21T09:22:16Z) - Mixed Reinforcement Learning with Additive Stochastic Uncertainty [19.229447330293546]
Reinforcement learning (RL) methods often rely on massive exploration data to search optimal policies, and suffer from poor sampling efficiency.
This paper presents a mixed RL algorithm by simultaneously using dual representations of environmental dynamics to search the optimal policy.
The effectiveness of the mixed RL is demonstrated by a typical optimal control problem of non-affine nonlinear systems.
arXiv Detail & Related papers (2020-02-28T08:02:34Z)
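As a concrete illustration of the SIBRE entry above, the following is a hypothetical sketch of rewarding improvement over the agent's own past performance, written as a Gymnasium-style wrapper. The class name, the exponential-moving-average baseline, and the terminal-bonus placement are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of SIBRE-style shaping: at the end of each episode the
# agent earns a bonus equal to the improvement of its episode return over a
# running baseline of its own past returns.
import gymnasium as gym

class SelfImprovementReward(gym.Wrapper):
    def __init__(self, env, alpha=0.1):
        super().__init__(env)
        self.alpha = alpha       # EMA rate for the performance baseline
        self.baseline = 0.0      # running estimate of past episode returns
        self.ep_return = 0.0

    def reset(self, **kwargs):
        self.ep_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.ep_return += reward
        if terminated or truncated:
            # Bonus = improvement over the agent's own past performance.
            reward += self.ep_return - self.baseline
            self.baseline += self.alpha * (self.ep_return - self.baseline)
        return obs, reward, terminated, truncated, info
```

A wrapped environment, e.g. `SelfImprovementReward(gym.make("CartPole-v1"))`, can then be passed to any existing RL algorithm, matching the plug-in design described in the entry.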