A Trainable Optimizer
- URL: http://arxiv.org/abs/2508.01764v1
- Date: Sun, 03 Aug 2025 14:06:07 GMT
- Title: A Trainable Optimizer
- Authors: Ruiqi Wang, Diego Klabjan
- Abstract summary: We present a framework that jointly trains the full gradient estimator and the trainable weights of the model. Pseudo-linear TO incurs negligible computational overhead, requiring only minimal additional multiplications. Experiments demonstrate that TO methods converge faster than benchmark algorithms.
- Score: 18.195022468462753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The concept of learning to optimize involves utilizing a trainable optimization strategy rather than relying on manually defined full gradient estimations such as ADAM. We present a framework that jointly trains the full gradient estimator and the trainable weights of the model. Specifically, we prove that pseudo-linear TO (Trainable Optimizer), a linear approximation of the full gradient, matches SGD's convergence rate while effectively reducing variance. Pseudo-linear TO incurs negligible computational overhead, requiring only minimal additional tensor multiplications. To further improve computational efficiency, we introduce two simplified variants of pseudo-linear TO. Experiments demonstrate that TO methods converge faster than benchmark algorithms (e.g., ADAM) in both strongly convex and non-convex settings, as well as in fine-tuning an LLM.
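As a rough illustration of the learning-to-optimize idea only (the paper's exact parameterization differs), the sketch below applies a trainable elementwise linear map to each stochastic gradient and trains that map jointly with the model weights via a one-step lookahead. The quadratic problem, step sizes, and update rule for the map are assumptions.

```python
# Hypothetical sketch of a "trainable optimizer": the gradient estimate is a
# learned elementwise linear map d * g of the stochastic gradient g, and d is
# trained jointly with the model weights w via a one-step lookahead
# (hypergradient-style) update.  Problem, names, and step sizes are
# illustrative assumptions, not the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
b = rng.normal(size=200)

def full_grad(w):                       # gradient of 0.5 * ||A w - b||^2
    return A.T @ (A @ w - b)

def minibatch_grad(w, idx):             # stochastic gradient on a mini-batch
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) * (len(b) / len(idx))

w = np.zeros(10)
d = np.ones(10)                         # trainable elementwise gradient map
eta_w, eta_d = 1e-3, 1e-4

for t in range(500):
    idx = rng.choice(len(b), size=20, replace=False)
    g = minibatch_grad(w, idx)
    w_next = w - eta_w * d * g          # model step with the learned estimator
    d -= eta_d * (-eta_w * g * full_grad(w_next))   # lookahead gradient w.r.t. d
    w = w_next

print("final loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2)
```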
Related papers
- VAMO: Efficient Large-Scale Nonconvex Optimization via Adaptive Zeroth Order Variance Reduction [3.130722489512822]
VAMO combines first-order (FO) mini-batch gradients with zeroth-order (ZO) finite-difference probes under an SVRG-style framework. VAMO outperforms established FO and ZO methods, offering a faster, more flexible option for improved efficiency.
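Purely as an illustration of mixing FO and ZO information in an SVRG-style control variate (not the actual VAMO update rule), one might write:

```python
# Illustration only (not the exact VAMO update): an SVRG-style control variate
# whose anchor is a first-order (FO) full gradient and whose per-iteration
# correction uses zeroth-order (ZO) finite-difference mini-batch probes.
# The problem, batch size, and step size are assumptions.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(300, 20))
b = rng.normal(size=300)

def batch_loss(w, idx):                      # mini-batch least-squares loss
    r = A[idx] @ w - b[idx]
    return 0.5 * np.mean(r ** 2)

def full_grad(w):                            # FO gradient of the full loss
    return A.T @ (A @ w - b) / len(b)

def zo_batch_grad(w, idx, h=1e-5):
    """Coordinate-wise central differences: a ZO mini-batch gradient probe."""
    g = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = h
        g[j] = (batch_loss(w + e, idx) - batch_loss(w - e, idx)) / (2 * h)
    return g

w, eta = np.zeros(20), 0.02
for epoch in range(10):
    w_snap, g_snap = w.copy(), full_grad(w)  # FO anchor gradient
    for t in range(200):
        idx = rng.choice(len(b), size=10, replace=False)
        g = zo_batch_grad(w, idx) - zo_batch_grad(w_snap, idx) + g_snap
        w -= eta * g

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```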
arXiv Detail & Related papers (2025-05-20T05:31:15Z) - A Triple-Inertial Accelerated Alternating Optimization Method for Deep Learning Training [3.246129789918632]
The stochastic gradient descent (SGD) algorithm has achieved remarkable success in training deep learning models. Alternating minimization (AM) methods have emerged as a promising alternative for model training. We propose a novel Triple-Inertial Accelerated Alternating Minimization (TIAM) framework for neural network training.
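The sketch below shows the generic idea of adding inertial (extrapolation) terms to block-wise alternating minimization on a toy matrix-factorization problem; it is a plain two-block inertial AM baseline, not the triple-inertial TIAM scheme itself, and all constants are assumptions.

```python
# Generic sketch of inertial (extrapolated) alternating minimization on a toy
# matrix-factorization problem.  This is a plain two-block inertial AM
# baseline, not the TIAM algorithm; all constants are assumptions.
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(50, 40))
U, V = 0.1 * rng.normal(size=(50, 5)), 0.1 * rng.normal(size=(5, 40))
U_prev, V_prev = U.copy(), V.copy()
beta = 0.3                                    # inertial extrapolation weight

for t in range(500):
    # U block: extrapolate, then one gradient step on 0.5 * ||U V - M||_F^2
    U_hat = U + beta * (U - U_prev)
    U_prev = U
    step_u = 1.0 / (np.linalg.norm(V, 2) ** 2 + 1e-8)
    U = U_hat - step_u * (U_hat @ V - M) @ V.T
    # V block: same inertial gradient step, using the freshly updated U
    V_hat = V + beta * (V - V_prev)
    V_prev = V
    step_v = 1.0 / (np.linalg.norm(U, 2) ** 2 + 1e-8)
    V = V_hat - step_v * U.T @ (U @ V_hat - M)

print("residual:", np.linalg.norm(M - U @ V))
```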
arXiv Detail & Related papers (2025-03-11T14:42:17Z) - Optimal DLT-based Solutions for the Perspective-n-Point [0.0]
We propose a modified direct linear transform (DLT) algorithm for solving the perspective-n-point (PnP) problem. The modification consists in analytically weighting the different measurements in the linear system, with a negligible increase in computational load. Our approach achieves clear improvements in both performance and runtime.
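A minimal sketch of the underlying mechanism, a weighted DLT-style solve of a homogeneous linear system via the SVD, follows; the synthetic data and the uniform weights are placeholders, since the paper derives the weights analytically.

```python
# Minimal sketch of a weighted DLT-style solve: each measurement (row) of the
# homogeneous system A x = 0 is scaled by a weight before the SVD.  The
# synthetic data and the uniform weights are placeholders.
import numpy as np

def weighted_dlt(A, weights):
    """Solve min ||diag(sqrt(w)) A x|| subject to ||x|| = 1 via the SVD."""
    Aw = np.sqrt(weights)[:, None] * A
    _, _, Vt = np.linalg.svd(Aw)
    return Vt[-1]                  # right singular vector of smallest sigma

rng = np.random.default_rng(3)
x_true = rng.normal(size=6)
x_true /= np.linalg.norm(x_true)
A = rng.normal(size=(30, 6))
A -= np.outer(A @ x_true, x_true)             # enforce A @ x_true == 0
A += 0.01 * rng.normal(size=A.shape)          # add measurement noise
x_hat = weighted_dlt(A, weights=np.ones(30))  # uniform weights for the demo
print("|<x_hat, x_true>| =", abs(x_hat @ x_true))
```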
arXiv Detail & Related papers (2024-10-18T04:04:58Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
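A hedged sketch of the basic ingredients, server-side AdaGrad applied to averaged client gradients with additive noise standing in for the over-the-air aggregation channel, is given below; the clients, noise level, and step size are illustrative assumptions.

```python
# Hedged sketch of server-side AdaGrad on averaged client gradients, with
# additive noise standing in for the over-the-air (analog) aggregation
# channel.  Client data, noise level, and step size are illustrative.
import numpy as np

rng = np.random.default_rng(4)
clients = [(rng.normal(size=(40, 8)), rng.normal(size=40)) for _ in range(5)]

def client_grad(w, data):
    A, b = data
    return A.T @ (A @ w - b) / len(b)

w = np.zeros(8)
accum = np.zeros(8)                       # AdaGrad accumulator
eta, eps, noise_std = 0.5, 1e-8, 0.01

for t in range(200):
    grads = np.stack([client_grad(w, d) for d in clients])
    g = grads.mean(axis=0)
    g += noise_std * rng.normal(size=8)   # over-the-air aggregation noise
    accum += g ** 2
    w -= eta * g / (np.sqrt(accum) + eps)

print("final average loss:",
      np.mean([0.5 * np.mean((A @ w - b) ** 2) for A, b in clients]))
```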
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
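For reference, the block below runs classical Sinkhorn iterations on an entropy-regularized OT instance; this is a standard baseline for the same objective, not the paper's extragradient method, and the data and regularization strength are arbitrary.

```python
# For reference only: classical Sinkhorn iterations for entropy-regularized
# optimal transport; the cited paper uses an extragradient method instead.
import numpy as np

def sinkhorn(C, mu, nu, reg=0.05, iters=500):
    """Approximate OT cost between histograms mu, nu with cost matrix C."""
    K = np.exp(-C / reg)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]       # transport plan
    return np.sum(P * C)

rng = np.random.default_rng(5)
x, y = np.sort(rng.random(30)), np.sort(rng.random(30))
C = (x[:, None] - y[None, :]) ** 2
mu = nu = np.full(30, 1 / 30)
print("entropic OT cost:", sinkhorn(C, mu, nu))
```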
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Adaptive Importance Sampling for Finite-Sum Optimization and Sampling
with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes.
Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(T^{5/6})$ step sizes.
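An illustrative (not Avare-specific) adaptive importance sampling loop for SGD is sketched below: sampling probabilities track recent per-example gradient norms, sampled gradients are reweighted to stay unbiased, and the step size decreases over time. The data, smoothing factor, and step-size schedule are assumptions.

```python
# Illustrative adaptive importance sampling for SGD (not the exact Avare
# update): sampling probabilities track recent per-example gradient norms,
# sampled gradients are reweighted by 1/(n * p_i) to stay unbiased, and the
# step size decreases over time.  Data and constants are assumptions.
import numpy as np

rng = np.random.default_rng(6)
n, d = 500, 10
A = rng.normal(size=(n, d)) * rng.uniform(0.1, 3.0, size=(n, 1))
b = rng.normal(size=n)

w = np.zeros(d)
norms = np.ones(n)                        # running per-example gradient norms

for t in range(1, 3001):
    p = norms / norms.sum()
    i = rng.choice(n, p=p)
    g_i = A[i] * (A[i] @ w - b[i])        # gradient of 0.5 * (a_i w - b_i)^2
    norms[i] = 0.9 * norms[i] + 0.1 * np.linalg.norm(g_i)
    w -= (0.02 / np.sqrt(t)) * g_i / (n * p[i])   # unbiased reweighted step

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```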
arXiv Detail & Related papers (2021-03-23T00:28:15Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
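A minimal sketch of the general idea, a zeroth-order gradient estimate built from coordinate-wise finite differences with importance-sampled coordinates, follows; the uniform sampling weights and the test function are placeholders, and the paper's actual sampling design is more involved.

```python
# Sketch of a zeroth-order gradient estimate built from coordinate-wise
# central differences with importance-sampled coordinates; the uniform
# sampling weights and the test function are placeholders.
import numpy as np

rng = np.random.default_rng(7)

def f(x):                                   # black-box objective, no gradients
    return np.sum((x - 1.0) ** 2) + 0.1 * np.sum(np.abs(x))

def zo_grad(f, x, probs, k=5, h=1e-5):
    """Unbiased ZO gradient estimate from k importance-sampled coordinates."""
    d = len(x)
    g = np.zeros(d)
    for j in rng.choice(d, size=k, replace=True, p=probs):
        e = np.zeros(d)
        e[j] = h
        deriv = (f(x + e) - f(x - e)) / (2 * h)   # central difference
        g[j] += deriv / (probs[j] * k)            # importance reweighting
    return g

x = np.zeros(20)
probs = np.full(20, 1 / 20)                 # uniform weights for the demo
for t in range(400):
    x -= 0.05 * zo_grad(f, x, probs)
print("f(x) after ZO descent:", f(x))
```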
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose stocBiO, a novel stochastic bilevel optimizer built on a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods
to Deep Binary Model [34.093978443640616]
Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but guaranteeing the effective and efficient training of BNNs remains an unsolved problem.
We propose the BAMSProd algorithm, based on the key observation that the convergence property of optimizing a deep binary model is strongly related to the quantization errors.
arXiv Detail & Related papers (2020-09-29T06:12:32Z) - Efficient Learning of Generative Models via Finite-Difference Score
Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
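Since this reduces to a one-line central difference, a tiny self-contained example (with an arbitrary test function) is:

```python
# Minimal example of approximating a directional derivative with a central
# finite difference, using only (parallelizable) function evaluations and no
# gradient computations; the specific test function is an arbitrary choice.
import numpy as np

def directional_derivative(f, x, v, h=1e-5):
    """Central-difference estimate of the derivative of f at x along v."""
    return (f(x + h * v) - f(x - h * v)) / (2 * h)

f = lambda x: np.sum(np.sin(x) ** 2)
x = np.array([0.3, -1.2, 2.0])
v = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)

fd = directional_derivative(f, x, v)
exact = np.dot(2 * np.sin(x) * np.cos(x), v)     # analytic check
print(f"finite difference: {fd:.6f}   exact: {exact:.6f}")
```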
arXiv Detail & Related papers (2020-07-07T10:05:01Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
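A rough, generic sketch of coupled gradient descent on a bilinear objective with a sparsity term on one variable is shown below; it uses a plain alternating proximal-gradient loop rather than the paper's projection-based coordination rule, and all data and constants are made up.

```python
# Rough sketch of coupled gradient descent on the bilinear objective
# 0.5 * ||y - A x||^2 with an l1 (sparsity) term on x: a plain alternating
# proximal-gradient loop, not the CoGD coordination rule from the paper.
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(size=60)
A = rng.normal(size=(60, 15))
x = np.zeros(15)
lam, eta = 0.1, 0.005

def soft_threshold(z, thr):                 # proximal operator of thr * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

for t in range(500):
    r = A @ x - y
    x = soft_threshold(x - eta * (A.T @ r), eta * lam)   # prox step on x
    A -= eta * np.outer(A @ x - y, x)                    # gradient step on A

print("residual:", np.linalg.norm(y - A @ x))
```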
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.