Frugality in second-order optimization: floating-point approximations for Newton's method
- URL: http://arxiv.org/abs/2511.17660v1
- Date: Thu, 20 Nov 2025 21:05:45 GMT
- Title: Frugality in second-order optimization: floating-point approximations for Newton's method
- Authors: Giuseppe Carrino, Elena Loli Piccolomini, Elisa Riccietti, Theo Mary
- Abstract summary: This work analyzes the impact of finite-precision arithmetic on Newton steps. It establishes a convergence theorem for mixed-precision Newton optimizers, including "quasi" and "inexact" variants. It also introduces GN_k, a generalized Gauss-Newton method that enables partial computation of second-order derivatives.
- Score: 4.04818342174938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are often avoided due to their computational cost. This work analyzes the impact of finite-precision arithmetic on Newton steps and establishes a convergence theorem for mixed-precision Newton optimizers, including "quasi" and "inexact" variants. The theorem provides not only convergence guarantees but also a priori estimates of the achievable solution accuracy. Empirical evaluations on standard regression benchmarks demonstrate that the proposed methods outperform Adam on the Australian and MUSH datasets. The second part of the manuscript introduces GN_k, a generalized Gauss-Newton method that enables partial computation of second-order derivatives. GN_k attains performance comparable to full Newton's method on regression tasks while requiring significantly fewer derivative evaluations.
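As a rough, hypothetical illustration of the setup, the sketch below performs one mixed-precision Newton step in NumPy: the gradient is kept in float64, while the Hessian is cast to a lower precision (float32 here, standing in for whatever low-precision format is targeted) before the linear solve. The function names and the toy least-squares problem are illustrative, not taken from the paper.

```python
import numpy as np

def newton_step_mixed(grad_fn, hess_fn, w, low=np.float32):
    """One Newton step with the Hessian formed and solved in low precision.

    grad_fn(w) -> gradient (float64); hess_fn(w) -> Hessian (float64).
    Casting the Hessian and the solve to `low` mimics a "quasi"/"inexact"
    Newton step computed in reduced-precision arithmetic.
    """
    g = grad_fn(w)                                # high-precision gradient
    H_low = hess_fn(w).astype(low)                # low-precision Hessian
    d = np.linalg.solve(H_low, g.astype(low))     # low-precision solve
    return w - d.astype(np.float64)               # promote the step back

# Toy least-squares problem f(w) = 0.5*||X w - y||^2, where the
# Gauss-Newton matrix X^T X coincides with the exact Hessian.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = rng.standard_normal(100)
w = np.zeros(5)
for _ in range(10):
    w = newton_step_mixed(lambda v: X.T @ (X @ v - y),
                          lambda v: X.T @ X, w)
```

Per the abstract, the paper's convergence theorem quantifies a priori how the choice of low-precision format limits the achievable solution accuracy; the sketch above fixes float32 purely for concreteness.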
Related papers
- Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization [0.5120567378386615]
First-order descent methods and variants such as Adam and AdaGrad are commonly used in the field of deep learning. However, these methods do not exploit curvature information. Quasi-Newton methods re-use previously computed low-rank Hessian approximations.
arXiv Detail & Related papers (2025-02-17T20:20:11Z)
- Online Covariance Matrix Estimation in Sketched Newton Methods [7.441891256588546]
We study an online sketched Newton method that leverages a randomized sketching technique to perform an approximate Newton step in each iteration. We also discuss the extension of our estimator to constrained problems, and demonstrate its superior performance on regression problems.
arXiv Detail & Related papers (2025-02-10T23:11:02Z)
- Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian
We consider an unconstrained optimization problem: minimizing a twice differentiable objective function whose Hessian is Hölder continuous. We first propose a Newton-conjugate gradient (Newton-CG) method for finding an approximate first- and second-order stationary point. We present preliminary numerical results to demonstrate the superior practical performance of our parameter-free Newton-CG method (a generic inexact-Newton sketch follows this entry).
arXiv Detail & Related papers (2023-11-22T01:50:43Z)
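Newton-CG methods like the one summarized above solve the Newton system only approximately, using conjugate gradients driven by Hessian-vector products. The sketch below is a generic inexact-Newton illustration of that idea, not the paper's parameter-free algorithm (which additionally exploits negative curvature); `grad_fn` and `hessvec_fn` are hypothetical callables.

```python
import numpy as np

def newton_cg_step(grad_fn, hessvec_fn, w, tol=1e-4, max_iter=50):
    """Inexact Newton step: solve H d = -g approximately with CG.

    Only Hessian-vector products hessvec_fn(w, p) ~ H(w) @ p are needed,
    so the full Hessian is never formed.
    """
    g = grad_fn(w)
    d = np.zeros_like(w)
    r = -g                        # residual of H d = -g at d = 0
    p = r.copy()
    for _ in range(max_iter):
        Hp = hessvec_fn(w, p)
        alpha = (r @ r) / (p @ Hp)
        d = d + alpha * p
        r_new = r - alpha * Hp
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(g):
            break                 # Newton system solved accurately enough
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return w + d

# Example on a convex quadratic (hypothetical problem):
A = np.diag(np.arange(1.0, 6.0))
b = np.ones(5)
w = newton_cg_step(lambda w: A @ w - b, lambda w, p: A @ p, np.zeros(5))
```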
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis [71.05708939639537]
We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max problems. Our method improves on existing line-search-based min-max optimization methods by shaving off an $O(\log\log(1/\epsilon))$ factor in the required number of Schur decompositions.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
- Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches [0.0]
We propose a finite-sum minimization algorithm that provably accelerates existing Newton methods. Surprisingly, this acceleration gets more significant the larger the data size. Our algorithm retains the key advantages of Newton-type methods, such as easily parallelizable large-batch operations and a simple unit step size.
arXiv Detail & Related papers (2022-06-06T16:00:18Z)
- Fast Projected Newton-like Method for Precision Matrix Estimation under Total Positivity [15.023842222803058]
Current algorithms are designed using the block coordinate descent method or the proximal point algorithm.
We propose a novel algorithm based on the two-metric projection method, incorporating a carefully designed search direction and variable partitioning scheme.
Experimental results on synthetic and real-world datasets demonstrate that our proposed algorithm provides a significant improvement in computational efficiency compared to the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-03T14:39:10Z)
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a stochastic Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate stochastic gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
- Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings (a toy sketch of the sketched-Hessian idea follows this entry).
arXiv Detail & Related papers (2021-07-15T17:33:05Z)
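As a rough illustration of the Newton-LESS idea, the following toy sketch approximates a least-squares Hessian X^T X by (SX)^T(SX), where S is a sparsified Gaussian sketch with far fewer rows than X. The sparsification below is a simplified stand-in for the paper's LESS embeddings, and all names and problem sizes are hypothetical.

```python
import numpy as np

def sketched_newton_step(X, y, w, m, rng, density=0.1):
    """One Newton step for f(w) = 0.5*||X w - y||^2 with a sketched Hessian.

    The exact Hessian X^T X is replaced by (S X)^T (S X), where S has
    m << n rows and mostly zero entries, making the sketch cheap to apply.
    """
    n = X.shape[0]
    # Sparsified Gaussian sketch: keep each entry with prob. `density`,
    # rescaling the survivors so that E[S^T S] = I.
    mask = rng.random((m, n)) < density
    S = rng.standard_normal((m, n)) * mask / np.sqrt(m * density)
    SX = S @ X
    H_hat = SX.T @ SX                 # sketched Hessian approximation
    g = X.T @ (X @ w - y)             # exact gradient
    return w - np.linalg.solve(H_hat, g)

# Usage on a toy over-determined problem (hypothetical sizes).
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 20))
y = rng.standard_normal(2000)
w = np.zeros(20)
for _ in range(5):
    w = sketched_newton_step(X, y, w, m=200, rng=rng)
```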
- Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton method and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
- SPAN: A Stochastic Projected Approximate Newton Method [17.94221425332409]
We propose SPAN, a novel fast approximate Newton method for computing the inverse of the Hessian matrix.
SPAN outperforms existing first-order and second-order optimization methods in terms of wall-clock convergence time.
arXiv Detail & Related papers (2020-02-10T12:42:42Z)
- On Newton Screening [14.040371216692645]
We develop a new screening method called Newton screening (NS), which is a generalized Newton method with a built-in screening mechanism.
We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence.
arXiv Detail & Related papers (2020-01-27T11:25:33Z)