H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
- URL: http://arxiv.org/abs/2406.09958v2
- Date: Mon, 17 Jun 2024 11:25:33 GMT
- Title: H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
- Authors: Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu
- Abstract summary: We develop H-Fac, which incorporates a factorized approach to momentum and scaling parameters.
Our algorithm demonstrates competitive performance on both ResNets and Vision Transformers.
These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.
- Score: 11.01832755213396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a factorized approach to momentum and scaling parameters. Our algorithm demonstrates competitive performance on both ResNets and Vision Transformers, while achieving sublinear memory costs through the use of rank-1 parameterizations for moment estimators. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings. These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.
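To make the sublinear-memory claim concrete: for an m x n weight matrix, full moment buffers cost O(mn) extra memory, while rank-1 row/column statistics cost only O(m + n). The sketch below illustrates this idea with an Adafactor-style factorization of the second moment; it is a minimal illustration, not the authors' H-Fac algorithm (which also factorizes the momentum term), and all names, update rules, and hyperparameters here are assumptions.

```python
# Hedged sketch of a rank-1 factorized moment estimator, in the spirit of the
# abstract's "rank-1 parameterizations for moment estimators". Illustrative
# only; this is an Adafactor-style reconstruction, not the H-Fac update.
import numpy as np

def factored_moment_update(row_stat, col_stat, grad_sq, beta=0.999):
    """EMA update of rank-1 factors for the second moment of an (m, n) gradient."""
    row_stat = beta * row_stat + (1 - beta) * grad_sq.mean(axis=1)  # shape (m,)
    col_stat = beta * col_stat + (1 - beta) * grad_sq.mean(axis=0)  # shape (n,)
    return row_stat, col_stat

def reconstruct_moment(row_stat, col_stat):
    """Rank-1 reconstruction: outer(row, col) normalized by the shared mean."""
    return np.outer(row_stat, col_stat) / max(row_stat.mean(), 1e-30)

# Toy usage: one RMSProp-like step with the factored scaling
# (bias correction omitted for brevity).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weight matrix, m=4, n=3
g = rng.normal(size=(4, 3))   # its gradient
r, c = np.zeros(4), np.zeros(3)
r, c = factored_moment_update(r, c, g**2)
v_hat = reconstruct_moment(r, c)            # approximate second moment, (4, 3)
W -= 1e-3 * g / (np.sqrt(v_hat) + 1e-8)     # scaled gradient step
```

Note the memory ratio: the buffers `r` and `c` hold 4 + 3 = 7 floats where a full second-moment buffer would hold 12, and the gap widens rapidly for realistic layer sizes.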
Related papers
- Efficient Inverse Design Optimization through Multi-fidelity Simulations, Machine Learning, and Search Space Reduction Strategies [0.8646443773218541]
This paper introduces a methodology designed to augment the inverse design optimization process in scenarios constrained by limited compute.
The proposed methodology is analyzed on two distinct engineering inverse design problems: airfoil inverse design and the scalar field reconstruction problem.
Notably, this method is adaptable across any inverse design application, facilitating a synergy between a representative low-fidelity ML model and a high-fidelity simulation, and can be seamlessly applied across a variety of population-based optimization algorithms.
arXiv Detail & Related papers (2023-12-06T18:20:46Z) - Hybrid GRU-CNN Bilinear Parameters Initialization for Quantum Approximate Optimization Algorithm [7.502733639318316]
We propose a hybrid optimization approach that integrates Gated Recurrent Units (GRU), Convolutional Neural Networks (CNN), and a bilinear strategy as an innovative alternative to conventional approximations for predicting optimal parameters of QAOA circuits.
We employ the bilinear strategy to initialize QAOA circuit parameters at greater depths, with reference parameters obtained from GRU-CNN optimization.
arXiv Detail & Related papers (2023-11-14T03:00:39Z) - Ensemble-based Hybrid Optimization of Bayesian Neural Networks and Traditional Machine Learning Algorithms [0.0]
This research introduces a novel methodology for optimizing Bayesian Neural Networks (BNNs) by synergistically integrating them with traditional machine learning algorithms such as Random Forests (RF), Gradient Boosting (GB), and Support Vector Machines (SVM).
Feature integration solidifies these results by emphasizing the second-order conditions for optimality, including stationarity and positive definiteness of the Hessian matrix.
Overall, the ensemble method stands out as a robust, algorithmically optimized approach.
arXiv Detail & Related papers (2023-10-09T06:59:17Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and meta-learning (MAML).
This paper proposes conditional stochastic optimization algorithms for the federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Meta-Learning Digitized-Counterdiabatic Quantum Optimization [3.0638256603183054]
We tackle the problem of finding suitable initial parameters for variational optimization by employing a meta-learning technique using recurrent neural networks.
We investigate this technique with the recently proposed digitized-counterdiabatic quantum approximate optimization algorithm (DC-QAOA).
The combination of meta-learning and DC-QAOA enables us to find optimal initial parameters for different models, such as the MaxCut problem and the Sherrington-Kirkpatrick model.
arXiv Detail & Related papers (2022-06-20T18:57:50Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory (a toy sketch of such closed-form adaptation appears after this list).
arXiv Detail & Related papers (2021-02-07T20:53:23Z) - Particle Swarm Optimization: Fundamental Study and its Application to Optimization and to Jetty Scheduling Problems [0.0]
The advantages of evolutionary algorithms with respect to traditional methods have been greatly discussed in the literature.
While particle swarms share such advantages, they outperform evolutionary algorithms in that they incur lower computational cost and are easier to implement.
This paper does not intend to study their tuning: general-purpose settings are taken from previous studies, and virtually the same algorithm is used to optimize a variety of notably different problems.
arXiv Detail & Related papers (2021-01-25T02:06:30Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization has arisen as a powerful tool for many machine learning problems.
We propose a novel sample-efficient stochastic gradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools for maximizing the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted systems-and-control (S&C) approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence for objectives that do not satisfy the Polyak-Lojasiewicz (PL) condition while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
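As flagged in the NTK meta-learning entry above, the sketch below illustrates what "solving the adaptation analytically" can look like: one-shot kernel ridge regression in place of MAML's iterative inner gradient loop. This is a hedged illustration under stated assumptions; the RBF kernel stands in for a real Neural Tangent Kernel, and every name here is illustrative rather than that paper's implementation.

```python
# Toy sketch of closed-form task adaptation via kernel regression.
# The RBF kernel is a placeholder; a real NTK would be derived from the
# meta-model's Jacobians. Names and hyperparameters are assumptions.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Stand-in kernel between two point sets of shape (k, d) and (l, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def analytic_adaptation(X_support, y_support, X_query, reg=1e-3):
    """One-shot adaptation: kernel ridge regression solved in closed form."""
    K = rbf_kernel(X_support, X_support)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_support)), y_support)
    return rbf_kernel(X_query, X_support) @ alpha  # query predictions

# Toy task: fit a sine from 5 support points, predict on a query grid.
rng = np.random.default_rng(0)
Xs = rng.uniform(-3, 3, size=(5, 1))
ys = np.sin(Xs).ravel()
Xq = np.linspace(-3, 3, 7)[:, None]
print(analytic_adaptation(Xs, ys, Xq))
```

Solving one regularized linear system replaces the entire inner optimization loop, which is the appeal of the analytic route over MAML-style adaptation.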