Stochastic Anderson Mixing for Nonconvex Stochastic Optimization
- URL: http://arxiv.org/abs/2110.01543v1
- Date: Mon, 4 Oct 2021 16:26:15 GMT
- Title: Stochastic Anderson Mixing for Nonconvex Stochastic Optimization
- Authors: Fuchao Wei, Chenglong Bao, Yang Liu
- Abstract summary: Anderson mixing (AM) is an acceleration method for fixed-point iterations.
We propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems.
- Score: 12.65903351047816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anderson mixing (AM) is an acceleration method for fixed-point iterations.
Despite its success and wide usage in scientific computing, the convergence
theory of AM remains unclear, and its applications to machine learning problems
are not well explored. In this paper, by introducing damped projection and
adaptive regularization to classical AM, we propose a Stochastic Anderson
Mixing (SAM) scheme to solve nonconvex stochastic optimization problems. Under
mild assumptions, we establish the convergence theory of SAM, including the
almost sure convergence to stationary points and the worst-case iteration
complexity. Moreover, the complexity bound can be improved when randomly
choosing an iterate as the output. To further accelerate the convergence, we
incorporate a variance reduction technique into the proposed SAM. We also
propose a preconditioned mixing strategy for SAM which can empirically achieve
faster convergence or better generalization ability. Finally, we apply the SAM
method to train various neural networks including the vanilla CNN, ResNets,
WideResNet, ResNeXt, DenseNet and RNN. Experimental results on image
classification and language model demonstrate the advantages of our method.
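As a rough illustration of the classical, deterministic Anderson mixing that the abstract builds on, the following Python sketch accelerates a fixed-point iteration x = g(x). It is not the SAM scheme from the paper: the damped projection, adaptive regularization, variance reduction, and preconditioning are all omitted. The names anderson_mixing, m (history size), and beta (mixing parameter) are assumptions chosen for this example.

```python
import numpy as np


def anderson_mixing(g, x0, m=3, beta=1.0, tol=1e-10, max_iter=100):
    """Classical Anderson mixing for the fixed-point problem x = g(x).

    Illustrative sketch only; this is NOT the stochastic SAM scheme.
    Returns the final iterate and the number of iterations used.
    """
    x = np.asarray(x0, dtype=float)
    r = g(x) - x                      # residual of the fixed-point map
    dX, dR = [], []                   # histories of iterate/residual differences
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            return x, k
        if dX:
            X = np.stack(dX, axis=1)  # shape (n, history length)
            R = np.stack(dR, axis=1)
            # Least-squares coefficients that minimize ||r - R @ gamma||
            gamma, *_ = np.linalg.lstsq(R, r, rcond=None)
            x_new = x + beta * r - (X + beta * R) @ gamma
        else:
            # No history yet: plain (damped) fixed-point step
            x_new = x + beta * r
        r_new = g(x_new) - x_new
        dX.append(x_new - x)
        dR.append(r_new - r)
        if len(dX) > m:               # keep only the m most recent differences
            dX.pop(0)
            dR.pop(0)
        x, r = x_new, r_new
    return x, max_iter


# Usage: solve x = cos(x) component-wise from several starting values.
solution, iterations = anderson_mixing(np.cos, np.array([1.0, 0.5, 0.2, 1.3]))
print(solution, iterations)
```

The least-squares step is what distinguishes Anderson mixing from a plain fixed-point iteration: it combines the last m residual differences to extrapolate toward a point where the residual vanishes.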
Related papers
- Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization [17.670203551488218]
We propose Asymptotic Unbiased Sampling to accelerate Sharpness-Aware Minimization (AUSAM).
AUSAM maintains the model's generalization capacity while significantly enhancing computational efficiency.
As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks.
arXiv Detail & Related papers (2024-06-12T08:47:44Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization [50.10952609321302]
Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
arXiv Detail & Related papers (2023-08-18T14:57:21Z)
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss by minimizing the change of landscape when adding a perturbation.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves perturbation by a binary mask.
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Debiasing Conditional Stochastic Optimization [15.901623717313493]
We study the conditional stochastic optimization (CSO) problem, which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, etc.
We develop new algorithms for the finite-sum variant of the CSO problem that significantly improve upon existing results.
We believe that our technique has the potential to be a useful tool for addressing similar challenges in other optimization problems.
arXiv Detail & Related papers (2023-04-20T19:19:55Z)
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-aware minimization (SAM) has been extensively explored as it can generalize better for training deep neural networks.
Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM could achieve superior performance compared with SGD, AMSGrad, and SAM.
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
- Rényi Divergence Deep Mutual Learning [3.682680183777648]
This paper revisits Deep Mutual Learning (DML) as a simple yet effective computing paradigm.
We propose using Rényi divergence instead of the KL divergence, which is more flexible and limited.
Our empirical results demonstrate the advantage of combining DML and Rényi divergence, leading to further improvement in model generalization.
arXiv Detail & Related papers (2022-09-13T04:58:35Z)
- Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
- Geom-SPIDER-EM: Faster Variance Reduced Stochastic Expectation Maximization for Nonconvex Finite-Sum Optimization [21.81837334970773]
We propose an extension of the Path-Integrated Differential Estimator to the Expectation Maximization (EM) algorithm.
We show that it achieves the same state-of-the-art bounds as SPIDER-EM, and we provide results on its convergence rate.
arXiv Detail & Related papers (2020-11-24T21:20:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.