Related papers: Gradient Descent Methods for Regularized Optimization

URL: http://arxiv.org/abs/2412.20115v1
Date: Sat, 28 Dec 2024 10:54:15 GMT
Title: Gradient Descent Methods for Regularized Optimization
Authors: Filip Nikolovski, Irena Stojkovska, Katerina Hadzi-Velkova Saneva, Zoran Hadzi-Velkov
Abstract summary: The gradient descent (GD) method is one of the primary methods used for numerical optimization of differentiable objective functions. A more effective version of GD, called proximal gradient descent, employs a technique known as soft-thresholding to shrink the iteration updates toward zero. This paper proposes a novel algorithm for the proximal GD method that incorporates a variable step size.
Score: 0.6624754673303327
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Regularization is a widely recognized technique in mathematical optimization. It can be used to smooth out objective functions, refine the feasible solution set, or prevent overfitting in machine learning models. Due to its simplicity and robustness, the gradient descent (GD) method is one of the primary methods used for numerical optimization of differentiable objective functions. However, GD is not well-suited for solving $\ell^1$ regularized optimization problems, since their objective functions are non-differentiable at zero, causing iteration updates to oscillate or fail to converge. Instead, a more effective version of GD, called proximal gradient descent, employs a technique known as soft-thresholding to shrink the iteration updates toward zero, thus enabling sparsity in the solution. Motivated by the widespread applications of proximal GD in sparse and low-rank recovery across various engineering disciplines, we provide an overview of the GD and proximal GD methods for solving regularized optimization problems. Furthermore, this paper proposes a novel algorithm for the proximal GD method that incorporates a variable step size. Unlike conventional proximal GD, which uses a fixed step size based on the global Lipschitz constant, our method estimates the Lipschitz constant locally at each iteration and uses its reciprocal as the step size. This eliminates the need for a global Lipschitz constant, which can be impractical to compute. Numerical experiments we performed on synthetic and real data sets show notable performance improvement of the proposed method compared to the conventional proximal GD with constant step size, both in number of iterations and in running time.
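
Illustration (not from the paper): the abstract describes proximal GD with soft-thresholding and a step size equal to the reciprocal of a locally estimated Lipschitz constant. The following is a minimal sketch of that idea for the lasso objective 0.5*||Ax - b||^2 + lam*||x||_1. The abstract does not say how the local constant is estimated, so the ratio ||grad f(x_k) - grad f(x_{k-1})|| / ||x_k - x_{k-1}|| used here is an assumption, as are the function names and the initial guess L0. The sketch has no safeguards such as backtracking, so it should not be read as the authors' algorithm.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_gd_local_lipschitz(A, b, lam, L0=1.0, max_iter=500, tol=1e-10):
    """Proximal GD for 0.5*||Ax - b||^2 + lam*||x||_1 with step size 1/L_k.

    L_k is a local Lipschitz estimate (an assumption of this sketch):
        L_k = ||grad f(x_k) - grad f(x_{k-1})|| / ||x_k - x_{k-1}||
    L0 is a hypothetical initial guess, used only until an estimate exists.
    """
    x = np.zeros(A.shape[1])
    grad = A.T @ (A @ x - b)          # gradient of the smooth part at x
    L = L0
    for _ in range(max_iter):
        step = 1.0 / L
        # Gradient step on the smooth part, then soft-thresholding.
        x_new = soft_threshold(x - step * grad, step * lam)
        s = x_new - x
        if np.linalg.norm(s) <= tol:  # iterates stopped moving
            x = x_new
            break
        grad_new = A.T @ (A @ x_new - b)
        L_est = np.linalg.norm(grad_new - grad) / np.linalg.norm(s)
        if L_est > 0.0:               # keep the previous L if the estimate is degenerate
            L = L_est
        x, grad = x_new, grad_new
    return x
```

For example, calling prox_gd_local_lipschitz(np.eye(3), b, lam) reduces to a single soft-thresholding of b, because the smooth part then has Lipschitz constant 1 and the local estimate recovers it exactly.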

Related papers

Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems. We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models [37.63410634069547]
We propose to exponentially increase the step-size of the gradient descent (GD) algorithm. We then consider using the EGD algorithm for solving parameter estimation under non-regular statistical models. The total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of the GD for solving parameter estimation in non-regular statistical models.
arXiv Detail & Related papers (2022-05-16T21:36:22Z)
An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms [19.41328109094503]
We propose and analyze several novel adaptive variants of the popular SAGA algorithm. We establish its convergence guarantees under general settings. We improve the analysis of SAGA to support non-Euclidean norms.
arXiv Detail & Related papers (2022-04-28T09:43:07Z)
Continuation Newton methods with deflation techniques for global optimization problems [3.705839280172101]
A global minimum point of an optimization problem is of interest in engineering. In this article, we consider a new memetic algorithm for this nonlinear large-scale problem. According to our numerical experiments, the new algorithm works well for unconstrained problems.
arXiv Detail & Related papers (2021-07-29T09:53:49Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work concerns zeroth-order (ZO) optimization, which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms. We devise a novel algorithm, referred to as Recursive One-Over-T SGD, based on an easily implementable averaging of past gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
Obtaining Adjustable Regularization for Free via Iterate Averaging [43.75491612671571]
Regularization for optimization is a crucial technique to avoid overfitting in machine learning. We establish an averaging scheme that converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart. Our approaches can be used for accelerated and preconditioned optimization methods as well.
arXiv Detail & Related papers (2020-08-15T15:28:05Z)
Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in time up to an error. Rather than fixing the minibatch and the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
An adaptive stochastic gradient-free approach for high-dimensional blackbox optimization [0.0]
We propose an adaptive stochastic gradient-free (ASGF) approach for high-dimensional non-smoothing problems. We illustrate the performance of this method on benchmark global problems and learning tasks.
arXiv Detail & Related papers (2020-06-18T22:47:58Z)
Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching. We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points. The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding. In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
