Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models
- URL: http://arxiv.org/abs/2510.19999v1
- Date: Wed, 22 Oct 2025 20:01:25 GMT
- Title: Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models
- Authors: Yixiao Wang, Zishan Shao, Ting Jiang, Aditya Devarakonda
- Abstract summary: We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints. We show consistent performance improvements of $3\times$ on average for the regularization path variant on diverse benchmark datasets.
- Score: 9.835110912860591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time in comparison to existing state-of-the-art methods. We redesign the CD method by performing a Taylor expansion around the current iterate to avoid nonlinear operations arising in the gradient computation. By introducing this approximation, we are able to unroll the vector recurrences occurring in the CD method and reformulate the resulting computations into more efficient batched computations. We show empirically that the recurrence can be unrolled by a tunable integer parameter, $s$, such that $s > 1$ yields performance improvements without affecting convergence, whereas $s = 1$ yields the original CD method. A key advantage of ECCD is that it avoids the convergence delay and numerical instability exhibited by block coordinate descent. Finally, we implement our proposed method in C++ using Eigen to accelerate linear algebra computations. Comparison of our method against existing state-of-the-art solvers shows consistent performance improvements of $3\times$ on average for the regularization path variant on diverse benchmark datasets. Our implementation is available at https://github.com/Yixiao-Wang-Stats/ECCD.
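Since the paper describes ECCD as a batched reformulation of classic cyclic CD, a minimal NumPy sketch of the $s = 1$ baseline may help; the function names and the $(1/2n)$ least-squares scaling are illustrative assumptions, not the authors' Eigen/C++ implementation.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|_1 (elementwise soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cyclic_cd_elastic_net(X, y, lam, alpha, n_sweeps=100):
    # Plain cyclic coordinate descent (the s = 1 case in the paper) for
    #   (1/2n)||y - X b||^2 + lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()           # residual y - X @ beta (beta = 0)
    col_sq = (X ** 2).sum(axis=0) / n    # per-coordinate curvature X_j^T X_j / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the partial residual (j excluded).
            rho = X[:, j] @ r / n + col_sq[j] * beta[j]
            b_new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            r += X[:, j] * (beta[j] - b_new)   # sequential recurrence ECCD unrolls
            beta[j] = b_new
    return beta
```

The sequential dependence through the residual `r` is exactly the vector recurrence that ECCD unrolls: with a Taylor approximation around the current iterate, $s$ consecutive coordinate updates can be regrouped into one batched matrix computation, recovering this loop when $s = 1$.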
Related papers
- A Trainable Optimizer [18.195022468462753]
We present a framework that jointly trains the full gradient estimator and the trainable weights of the model. Pseudo-linear TO incurs negligible computational overhead, requiring only minimal additional multiplications. Experiments demonstrate that TO methods converge faster than benchmark algorithms.
arXiv Detail & Related papers (2025-08-03T14:06:07Z)
- Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling [44.31966204357333]
We develop memory-efficient algorithms for large-scale machine learning problems. We use two key techniques to make our approach memory-efficient and avoid full gradient computations.
arXiv Detail & Related papers (2025-02-20T15:37:45Z)
- Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization [0.5120567378386615]
First-order methods such as gradient descent, and variants such as Adam and AdaGrad, are commonly used in the field of deep learning. However, these methods do not exploit curvature information. Quasi-Newton methods re-use previously computed low-rank Hessian approximations.
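For background, the classical symmetric rank-one (SR1) secant update is the textbook identity below (standard material, not quoted from this paper):

```latex
B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k},
\qquad s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).
```

SR1 approximations can be indefinite and the denominator can vanish, which is one standard motivation for pairing SR1 with a cubic-regularization safeguard.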
arXiv Detail & Related papers (2025-02-17T20:20:11Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
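A hedged sketch of the scheme in question, assuming it follows the standard lookahead-style averaging of a base optimizer step $T$:

```latex
x_{k+1} = (1 - \alpha)\, x_k + \alpha\, T(x_k), \qquad \alpha \in (0, 1].
```

This is the Krasnosel'skii-Mann iteration; when $T$ is nonexpansive, the averaging damps the oscillations that plain iteration of $T$ can exhibit on nonmonotone landscapes.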
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
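For reference, the standard ARC model subproblem (textbook form, not taken from this paper's text) replaces the trust-region constraint with a cubic penalty:

```latex
s_k = \arg\min_{s}\; g_k^{\top} s + \tfrac{1}{2}\, s^{\top} B_k s + \tfrac{\sigma_k}{3}\, \|s\|^3,
```

where $g_k$ and $B_k$ may be inexact gradient and Hessian estimates and the regularization weight $\sigma_k$ is adapted across iterations much like a trust-region radius.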
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Reduced Contraction Costs of Corner-Transfer Methods for PEPS [0.0]
We propose a pair of approximations that allows the leading order computational cost of contracting an infinite projected entangled-pair state to be reduced.
The improvement in computational cost enables us to perform large bond dimension calculations, extending the method's potential to solve challenging problems.
arXiv Detail & Related papers (2023-06-14T02:54:12Z)
- Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems. We apply a sketched Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
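A hedged sketch of the classical equality-constrained SQP subproblem that StoSQP-type methods solve at each iterate (textbook form; the stochastic and sketched variants replace exact derivatives with estimates and solve the system approximately):

```latex
\min_{\Delta x}\; \nabla f(x_k)^{\top} \Delta x + \tfrac{1}{2}\, \Delta x^{\top} B_k\, \Delta x
\quad \text{s.t.} \quad c(x_k) + \nabla c(x_k)^{\top} \Delta x = 0.
```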
arXiv Detail & Related papers (2022-05-27T00:34:03Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted Hessian-averaging scheme that transitions to superlinear local convergence at an optimal stage.
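A hedged illustration of the idea in its simplest uniform-weight form (the paper's universal scheme uses more general weights):

```latex
\bar{H}_k = \frac{1}{k+1} \sum_{i=0}^{k} \hat{H}_i,
\qquad x_{k+1} = x_k - \bar{H}_k^{-1} \nabla f(x_k),
```

where the $\hat{H}_i$ are noisy Hessian estimates; averaging drives the Hessian noise to zero, so Newton-like local convergence can eventually take over.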
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-second-order stationary point using an optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- Coordinate Methods for Matrix Games [41.821881312775496]
We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$.
Our methods push existing sublinear methods towards their limits in terms of per-iteration complexity and sample complexity.
arXiv Detail & Related papers (2020-09-17T17:55:03Z)
- On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad class of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z)
- Variance reduction for Random Coordinate Descent-Langevin Monte Carlo [7.464874233755718]
Langevin Monte Carlo (LMC) provides fast convergence but requires gradient computations at every iteration.
In practice one uses finite-differencing approximations as surrogates, and the method is expensive in high dimensions.
We introduce a new variance reduction approach, termed Random Coordinates Averaging Descent (RCAD), and incorporate it with both overdamped and underdamped LMC.
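For context, the standard overdamped LMC update that such variance-reduced schemes build on (textbook form; RCAD presumably swaps the full gradient for an averaged random-coordinate estimate):

```latex
x_{k+1} = x_k - \eta\, \nabla f(x_k) + \sqrt{2\eta}\, \xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I),
```

which targets the distribution with density proportional to $e^{-f}$.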
arXiv Detail & Related papers (2020-06-10T21:08:38Z)