Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
- URL: http://arxiv.org/abs/2310.05898v5
- Date: Fri, 19 Apr 2024 09:06:57 GMT
- Title: Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
- Authors: Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu
- Abstract summary: Lion (Evolved Sign Momentum) has shown promising results in training large AI models.
It performs comparably or favorably to AdamW but with greater memory efficiency.
Our analysis is made possible by the development of a new Lyapunov function for the Lion updates.
- Score: 8.393403749426097
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As one might expect from the output of a random program search, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, and Polyak and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
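As a concrete reference for the update being analyzed, below is a minimal NumPy sketch of one Lion step (hyperparameter defaults here are illustrative; the update rule itself follows the published Lion algorithm). Per the paper's analysis, the decoupled weight decay term is what confines the iterates to the box $\|x\|_\infty \leq 1/\lambda$.

```python
import numpy as np

def lion_step(x, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, lam=0.1):
    """One Lion update (sketch; default hyperparameter values are illustrative).

    The sign of an interpolated momentum drives the step; per this paper,
    the decoupled weight decay term lam * x is what enforces the bound
    constraint ||x||_inf <= 1/lam at convergence.
    """
    c = beta1 * m + (1 - beta1) * grad       # interpolate momentum and gradient
    x_new = x - lr * (np.sign(c) + lam * x)  # signed update + decoupled weight decay
    m_new = beta2 * m + (1 - beta2) * grad   # momentum tracks the gradient
    return x_new, m_new
```

In the Lion-$\kappa$ family described above, `np.sign(c)`, a subgradient of $\|c\|_1$, would be replaced by a subgradient of a general convex function $\kappa$, yielding the composite problem $\min_x f(x) + \kappa^*(x)$.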
Related papers
- Lion Cub: Minimizing Communication Overhead in Distributed Lion [9.360174471655977]
Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects.
We analyze three factors critical to distributed learning with Lion: optimizing communication methods, identifying effective quantization methods, and assessing the necessity of momentum synchronization.
We combine these into Lion Cub, which enables up to 5x speedups in end-to-end training compared to Lion.
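One property that makes Lion amenable to such communication reduction (a general observation, not Lion Cub's specific codec): its update is coordinate-wise $\pm 1$, so each coordinate fits in a single bit. A minimal sketch:

```python
import numpy as np

def pack_signs(update):
    """Pack a {-1, +1} sign update into one bit per coordinate (illustrative codec)."""
    bits = (update > 0).astype(np.uint8)  # map -1 -> 0, +1 -> 1
    return np.packbits(bits)              # ~32x smaller than sending float32

def unpack_signs(packed, n):
    """Recover the first n coordinates of a packed sign update."""
    bits = np.unpackbits(packed)[:n]
    return bits.astype(np.int8) * 2 - 1   # map 0 -> -1, 1 -> +1
```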
arXiv Detail & Related papers (2024-11-25T15:08:24Z) - Convergence Rate Analysis of LION [54.28350823319057]
We show that LION converges at a rate of $\mathcal{O}(\sqrt{d}\,K^{-1/4})$ over $K$ iterations, measured by the $\ell_1$ norm of the gradient, to Karush-Kuhn-Tucker (KKT) points of the constrained problem.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - Communication Efficient Distributed Training with Distributed Lion [25.39333175634972]
We introduce Distributed Lion, an innovative adaptation of Lion for distributed training environments.
We demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems.
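A hedged sketch of the sign-and-vote idea behind this line of work (the published protocol may differ in its details): each worker communicates only its local signed update, and the server aggregates by coordinate-wise majority vote.

```python
import numpy as np

def aggregate_majority(worker_updates):
    """Coordinate-wise majority vote over per-worker sign updates (illustrative).

    worker_updates: shape (num_workers, num_params), entries in {-1, +1}.
    Ties produce 0, i.e., no movement on that coordinate this step.
    """
    return np.sign(worker_updates.sum(axis=0))
```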
arXiv Detail & Related papers (2024-03-30T18:07:29Z) - Depth Dependence of $\mu$P Learning Rates in ReLU MLPs [72.14317069090407]
We study the dependence on $n$ and $L$ of the maximal update ($\mu$P) learning rate.
We find that it has a non-trivial dependence on $L$, scaling like $L^{-3/2}$.
arXiv Detail & Related papers (2023-05-13T01:10:49Z) - Symbolic Discovery of Optimization Algorithms [132.62397077095787]
We use efficient search techniques to explore an infinite and sparse program space.
Our method discovers a simple and effective optimization algorithm, $\textbf{Lion}$.
Lion has been successfully deployed in production systems such as the Google search ads CTR model.
arXiv Detail & Related papers (2023-02-13T20:27:30Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - The Trimmed Lasso: Sparse Recovery Guarantees and Practical Optimization by the Generalized Soft-Min Penalty [14.85926834924458]
We present a new approach to the sparse approximation, or best subset selection, problem, based on a generalized soft-min penalty that interpolates between the classical lasso and penalties covering general $k$-sparse patterns.
We derive a polynomial-time algorithm to compute the generalized soft-min penalty.
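For context, the trimmed lasso penalty referenced here has a standard closed form (this definition comes from the trimmed lasso literature and is not restated in the summary above): it sums the $n-k$ smallest-magnitude entries, so it vanishes exactly on $k$-sparse vectors.

```latex
% Trimmed lasso penalty, with |x_{(1)}| >= ... >= |x_{(n)}| the sorted magnitudes:
\tau_k(x) \;=\; \sum_{i=k+1}^{n} \bigl|x_{(i)}\bigr|
\;=\; \min_{|S| \le k} \bigl\|x_{S^c}\bigr\|_1,
\qquad \tau_k(x) = 0 \;\Longleftrightarrow\; \|x\|_0 \le k.
```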
arXiv Detail & Related papers (2020-05-18T18:43:06Z) - A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization [60.90222082871258]
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set.
We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case.
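To make the oracle interface concrete, here is a minimal sketch of plain Frank-Wolfe over an $\ell_1$ ball (a generic illustration of LMO calls, not the Newton-augmented method proposed in the paper):

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """LMO over the l1 ball: argmin of <grad, s> subject to ||s||_1 <= radius.
    The minimizer is a signed vertex along the largest-magnitude coordinate."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, radius=1.0, steps=100):
    """Vanilla Frank-Wolfe with the standard 2/(t+2) step size (illustrative)."""
    x = x0.copy()
    for t in range(steps):
        s = lmo_l1_ball(grad_f(x), radius)  # one LMO call per iteration
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s     # convex combination stays feasible
    return x
```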
arXiv Detail & Related papers (2020-02-17T15:28:31Z)