A Trainable Optimizer
- URL: http://arxiv.org/abs/2508.01764v1
- Date: Sun, 03 Aug 2025 14:06:07 GMT
- Title: A Trainable Optimizer
- Authors: Ruiqi Wang, Diego Klabjan
- Abstract summary: We present a framework that jointly trains the full gradient estimator and the trainable weights of the model. Pseudo-linear TO incurs negligible computational overhead, requiring only minimal additional multiplications. Experiments demonstrate that TO methods converge faster than benchmark algorithms.
- Score: 18.195022468462753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The concept of learning to optimize involves utilizing a trainable optimization strategy rather than relying on manually defined full gradient estimations such as ADAM. We present a framework that jointly trains the full gradient estimator and the trainable weights of the model. Specifically, we prove that pseudo-linear TO (Trainable Optimizer), a linear approximation of the full gradient, matches SGD's convergence rate while effectively reducing variance. Pseudo-linear TO incurs negligible computational overhead, requiring only minimal additional tensor multiplications. To further improve computational efficiency, we introduce two simplified variants of pseudo-linear TO. Experiments demonstrate that TO methods converge faster than benchmark algorithms (e.g., ADAM) in both strongly convex and non-convex settings, as well as in fine-tuning an LLM.
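As a rough illustration of the learning-to-optimize idea only (the paper's exact parameterization differs), the sketch below applies a trainable elementwise linear map to each stochastic gradient and trains that map jointly with the model weights via a one-step lookahead. The quadratic problem, step sizes, and update rule for the map are assumptions.

```python
# Hypothetical sketch of a "trainable optimizer": the gradient estimate is a
# learned elementwise linear map d * g of the stochastic gradient g, and d is
# trained jointly with the model weights w via a one-step lookahead
# (hypergradient-style) update.  Problem, names, and step sizes are
# illustrative assumptions, not the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
b = rng.normal(size=200)

def full_grad(w):                       # gradient of 0.5 * ||A w - b||^2
    return A.T @ (A @ w - b)

def minibatch_grad(w, idx):             # stochastic gradient on a mini-batch
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) * (len(b) / len(idx))

w = np.zeros(10)
d = np.ones(10)                         # trainable elementwise gradient map
eta_w, eta_d = 1e-3, 1e-4

for t in range(500):
    idx = rng.choice(len(b), size=20, replace=False)
    g = minibatch_grad(w, idx)
    w_next = w - eta_w * d * g          # model step with the learned estimator
    d -= eta_d * (-eta_w * g * full_grad(w_next))   # lookahead gradient w.r.t. d
    w = w_next

print("final loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2)
```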
Related papers
- VAMO: Efficient Large-Scale Nonconvex Optimization via Adaptive Zeroth Order Variance Reduction [3.130722489512822]
VAMO combines first-order (FO) mini-batch gradients with zeroth-order (ZO) finite-difference probes under an SVRG-style framework. VAMO outperforms established FO and ZO methods, offering a faster, more flexible option for improved efficiency.
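Purely as an illustration of mixing FO and ZO information in an SVRG-style control variate (not the actual VAMO update rule), one might write:

```python
# Illustration only (not the exact VAMO update): an SVRG-style control variate
# whose anchor is a first-order (FO) full gradient and whose per-iteration
# correction uses zeroth-order (ZO) finite-difference mini-batch probes.
# The problem, batch size, and step size are assumptions.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(300, 20))
b = rng.normal(size=300)

def batch_loss(w, idx):                      # mini-batch least-squares loss
    r = A[idx] @ w - b[idx]
    return 0.5 * np.mean(r ** 2)

def full_grad(w):                            # FO gradient of the full loss
    return A.T @ (A @ w - b) / len(b)

def zo_batch_grad(w, idx, h=1e-5):
    """Coordinate-wise central differences: a ZO mini-batch gradient probe."""
    g = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = h
        g[j] = (batch_loss(w + e, idx) - batch_loss(w - e, idx)) / (2 * h)
    return g

w, eta = np.zeros(20), 0.02
for epoch in range(10):
    w_snap, g_snap = w.copy(), full_grad(w)  # FO anchor gradient
    for t in range(200):
        idx = rng.choice(len(b), size=10, replace=False)
        g = zo_batch_grad(w, idx) - zo_batch_grad(w_snap, idx) + g_snap
        w -= eta * g

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```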
arXiv Detail & Related papers (2025-05-20T05:31:15Z) - A Triple-Inertial Accelerated Alternating Optimization Method for Deep Learning Training [3.246129789918632]
The stochastic gradient descent (SGD) algorithm has achieved remarkable success in training deep learning models. Alternating minimization (AM) methods have emerged as a promising alternative for model training. We propose a novel Triple-Inertial Accelerated Alternating Minimization (TIAM) framework for neural network training.
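The sketch below shows the generic idea of adding inertial (extrapolation) terms to block-wise alternating minimization on a toy matrix-factorization problem; it is a plain two-block inertial AM baseline, not the triple-inertial TIAM scheme itself, and all constants are assumptions.

```python
# Generic sketch of inertial (extrapolated) alternating minimization on a toy
# matrix-factorization problem.  This is a plain two-block inertial AM
# baseline, not the TIAM algorithm; all constants are assumptions.
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(50, 40))
U, V = 0.1 * rng.normal(size=(50, 5)), 0.1 * rng.normal(size=(5, 40))
U_prev, V_prev = U.copy(), V.copy()
beta = 0.3                                    # inertial extrapolation weight

for t in range(500):
    # U block: extrapolate, then one gradient step on 0.5 * ||U V - M||_F^2
    U_hat = U + beta * (U - U_prev)
    U_prev = U
    step_u = 1.0 / (np.linalg.norm(V, 2) ** 2 + 1e-8)
    U = U_hat - step_u * (U_hat @ V - M) @ V.T
    # V block: same inertial gradient step, using the freshly updated U
    V_hat = V + beta * (V - V_prev)
    V_prev = V
    step_v = 1.0 / (np.linalg.norm(U, 2) ** 2 + 1e-8)
    V = V_hat - step_v * U.T @ (U @ V_hat - M)

print("residual:", np.linalg.norm(M - U @ V))
```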
arXiv Detail & Related papers (2025-03-11T14:42:17Z) - Optimal DLT-based Solutions for the Perspective-n-Point [0.0]
We propose a modified direct linear transform (DLT) algorithm for solving the perspective-n-point (PnP) problem. The modification consists in analytically weighting the different measurements in the linear system, with a negligible increase in computational load. Our approach achieves clear improvements in both performance and runtime.
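A minimal sketch of the underlying mechanism, a weighted DLT-style solve of a homogeneous linear system via the SVD, follows; the synthetic data and the uniform weights are placeholders, since the paper derives the weights analytically.

```python
# Minimal sketch of a weighted DLT-style solve: each measurement (row) of the
# homogeneous system A x = 0 is scaled by a weight before the SVD.  The
# synthetic data and the uniform weights are placeholders.
import numpy as np

def weighted_dlt(A, weights):
    """Solve min ||diag(sqrt(w)) A x|| subject to ||x|| = 1 via the SVD."""
    Aw = np.sqrt(weights)[:, None] * A
    _, _, Vt = np.linalg.svd(Aw)
    return Vt[-1]                  # right singular vector of smallest sigma

rng = np.random.default_rng(3)
x_true = rng.normal(size=6)
x_true /= np.linalg.norm(x_true)
A = rng.normal(size=(30, 6))
A -= np.outer(A @ x_true, x_true)             # enforce A @ x_true == 0
A += 0.01 * rng.normal(size=A.shape)          # add measurement noise
x_hat = weighted_dlt(A, weights=np.ones(30))  # uniform weights for the demo
print("|<x_hat, x_true>| =", abs(x_hat @ x_true))
```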
arXiv Detail & Related papers (2024-10-18T04:04:58Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
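A hedged sketch of the basic ingredients, server-side AdaGrad applied to averaged client gradients with additive noise standing in for the over-the-air aggregation channel, is given below; the clients, noise level, and step size are illustrative assumptions.

```python
# Hedged sketch of server-side AdaGrad on averaged client gradients, with
# additive noise standing in for the over-the-air (analog) aggregation
# channel.  Client data, noise level, and step size are illustrative.
import numpy as np

rng = np.random.default_rng(4)
clients = [(rng.normal(size=(40, 8)), rng.normal(size=40)) for _ in range(5)]

def client_grad(w, data):
    A, b = data
    return A.T @ (A @ w - b) / len(b)

w = np.zeros(8)
accum = np.zeros(8)                       # AdaGrad accumulator
eta, eps, noise_std = 0.5, 1e-8, 0.01

for t in range(200):
    grads = np.stack([client_grad(w, d) for d in clients])
    g = grads.mean(axis=0)
    g += noise_std * rng.normal(size=8)   # over-the-air aggregation noise
    accum += g ** 2
    w -= eta * g / (np.sqrt(accum) + eps)

print("final average loss:",
      np.mean([0.5 * np.mean((A @ w - b) ** 2) for A, b in clients]))
```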
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
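For reference, the block below runs classical Sinkhorn iterations on an entropy-regularized OT instance; this is a standard baseline for the same objective, not the paper's extragradient method, and the data and regularization strength are arbitrary.

```python
# For reference only: classical Sinkhorn iterations for entropy-regularized
# optimal transport; the cited paper uses an extragradient method instead.
import numpy as np

def sinkhorn(C, mu, nu, reg=0.05, iters=500):
    """Approximate OT cost between histograms mu, nu with cost matrix C."""
    K = np.exp(-C / reg)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]       # transport plan
    return np.sum(P * C)

rng = np.random.default_rng(5)
x, y = np.sort(rng.random(30)), np.sort(rng.random(30))
C = (x[:, None] - y[None, :]) ** 2
mu = nu = np.full(30, 1 / 30)
print("entropic OT cost:", sinkhorn(C, mu, nu))
```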
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Adaptive Importance Sampling for Finite-Sum Optimization and Sampling
with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes.
Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(T^{5/6})$ step sizes.
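An illustrative (not Avare-specific) adaptive importance sampling loop for SGD is sketched below: sampling probabilities track recent per-example gradient norms, sampled gradients are reweighted to stay unbiased, and the step size decreases over time. The data, smoothing factor, and step-size schedule are assumptions.

```python
# Illustrative adaptive importance sampling for SGD (not the exact Avare
# update): sampling probabilities track recent per-example gradient norms,
# sampled gradients are reweighted by 1/(n * p_i) to stay unbiased, and the
# step size decreases over time.  Data and constants are assumptions.
import numpy as np

rng = np.random.default_rng(6)
n, d = 500, 10
A = rng.normal(size=(n, d)) * rng.uniform(0.1, 3.0, size=(n, 1))
b = rng.normal(size=n)

w = np.zeros(d)
norms = np.ones(n)                        # running per-example gradient norms

for t in range(1, 3001):
    p = norms / norms.sum()
    i = rng.choice(n, p=p)
    g_i = A[i] * (A[i] @ w - b[i])        # gradient of 0.5 * (a_i w - b_i)^2
    norms[i] = 0.9 * norms[i] + 0.1 * np.linalg.norm(g_i)
    w -= (0.02 / np.sqrt(t)) * g_i / (n * p[i])   # unbiased reweighted step

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```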
arXiv Detail & Related papers (2021-03-23T00:28:15Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
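A minimal sketch of the general idea, a zeroth-order gradient estimate built from coordinate-wise finite differences with importance-sampled coordinates, follows; the uniform sampling weights and the test function are placeholders, and the paper's actual sampling design is more involved.

```python
# Sketch of a zeroth-order gradient estimate built from coordinate-wise
# central differences with importance-sampled coordinates; the uniform
# sampling weights and the test function are placeholders.
import numpy as np

rng = np.random.default_rng(7)

def f(x):                                   # black-box objective, no gradients
    return np.sum((x - 1.0) ** 2) + 0.1 * np.sum(np.abs(x))

def zo_grad(f, x, probs, k=5, h=1e-5):
    """Unbiased ZO gradient estimate from k importance-sampled coordinates."""
    d = len(x)
    g = np.zeros(d)
    for j in rng.choice(d, size=k, replace=True, p=probs):
        e = np.zeros(d)
        e[j] = h
        deriv = (f(x + e) - f(x - e)) / (2 * h)   # central difference
        g[j] += deriv / (probs[j] * k)            # importance reweighting
    return g

x = np.zeros(20)
probs = np.full(20, 1 / 20)                 # uniform weights for the demo
for t in range(400):
    x -= 0.05 * zo_grad(f, x, probs)
print("f(x) after ZO descent:", f(x))
```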
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose stocBiO, a novel stochastic bilevel optimizer built on a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods
to Deep Binary Model [34.093978443640616]
Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but guaranteeing the effective and efficient training of BNNs remains an unsolved problem.
We propose the BAMSProd algorithm, based on the key observation that the convergence property of optimizing a deep binary model is strongly related to the quantization errors.
arXiv Detail & Related papers (2020-09-29T06:12:32Z) - Efficient Learning of Generative Models via Finite-Difference Score
Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
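Since this reduces to a one-line central difference, a tiny self-contained example (with an arbitrary test function) is:

```python
# Minimal example of approximating a directional derivative with a central
# finite difference, using only (parallelizable) function evaluations and no
# gradient computations; the specific test function is an arbitrary choice.
import numpy as np

def directional_derivative(f, x, v, h=1e-5):
    """Central-difference estimate of the derivative of f at x along v."""
    return (f(x + h * v) - f(x - h * v)) / (2 * h)

f = lambda x: np.sum(np.sin(x) ** 2)
x = np.array([0.3, -1.2, 2.0])
v = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)

fd = directional_derivative(f, x, v)
exact = np.dot(2 * np.sin(x) * np.cos(x), v)     # analytic check
print(f"finite difference: {fd:.6f}   exact: {exact:.6f}")
```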
arXiv Detail & Related papers (2020-07-07T10:05:01Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
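A rough, generic sketch of coupled gradient descent on a bilinear objective with a sparsity term on one variable is shown below; it uses a plain alternating proximal-gradient loop rather than the paper's projection-based coordination rule, and all data and constants are made up.

```python
# Rough sketch of coupled gradient descent on the bilinear objective
# 0.5 * ||y - A x||^2 with an l1 (sparsity) term on x: a plain alternating
# proximal-gradient loop, not the CoGD coordination rule from the paper.
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(size=60)
A = rng.normal(size=(60, 15))
x = np.zeros(15)
lam, eta = 0.1, 0.005

def soft_threshold(z, thr):                 # proximal operator of thr * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

for t in range(500):
    r = A @ x - y
    x = soft_threshold(x - eta * (A.T @ r), eta * lam)   # prox step on x
    A -= eta * np.outer(A @ x - y, x)                    # gradient step on A

print("residual:", np.linalg.norm(y - A @ x))
```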
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.