Related papers: Accelerated nonlinear primal-dual hybrid gradient algorithms with applications to machine learning

Accelerated nonlinear primal-dual hybrid gradient algorithms with applications to machine learning

URL: http://arxiv.org/abs/2109.12222v1
Date: Fri, 24 Sep 2021 22:37:10 GMT
Title: Accelerated nonlinear primal-dual hybrid gradient algorithms with applications to machine learning
Authors: J\'er\^ome Darbon and Gabriel Provencher Langlois
Abstract summary: primal-dual hybrid gradient (PDHG) is a first-order method that splits convex optimization problems with saddle-point structure into smaller subproblems. PDHG requires stepsize parameters fine-tuned for the problem at hand. We introduce accelerated nonlinear variants of the PDHG algorithm that can achieve, for a broad class of optimization problems relevant to machine learning.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The primal-dual hybrid gradient (PDHG) algorithm is a first-order method that splits convex optimization problems with saddle-point structure into smaller subproblems. Those subproblems, unlike those obtained from most other splitting methods, can generally be solved efficiently because they involve simple operations such as matrix-vector multiplications or proximal mappings that are easy to evaluate. In order to work fast, however, the PDHG algorithm requires stepsize parameters fine-tuned for the problem at hand. Unfortunately, the stepsize parameters must often be estimated from quantities that are prohibitively expensive to compute for large-scale optimization problems, such as those in machine learning. In this paper, we introduce accelerated nonlinear variants of the PDHG algorithm that can achieve, for a broad class of optimization problems relevant to machine learning, an optimal rate of convergence with stepsize parameters that are simple to compute. We prove rigorous convergence results, including for problems posed on infinite-dimensional reflexive Banach spaces. We also provide practical implementations of accelerated nonlinear PDHG algorithms for solving several regression tasks in machine learning, including support vector machines without offset, kernel ridge regression, elastic net regularized linear regression, and the least absolute shrinkage selection operator.

Related papers

Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms [1.4747234049753448]
In high-stakes engineering applications, optimization algorithms must come with provable provablecase guarantees over a mathematically defined class of problems.<n>We describe the class of algorithms that achieve linear convergence for classes of nonsmooth composite optimization problems.
arXiv Detail & Related papers (2025-08-01T16:56:42Z)
WENDy for Nonlinear-in-Parameters ODEs [1.9573380763700712]
The Weak-form Estimation of Non-linear Dynamics (WEN) is extended to accommodate systems of ordinary differential equations that are nonlinear-ins. We present results on a suite of benchmark systems to demonstrate the practical benefits of our approach.
arXiv Detail & Related papers (2025-02-13T01:40:21Z)
A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations [9.588717577573684]
We propose a scalable preconditioned primal hybrid gradient algorithm for solving partial differential equations (PDEs) We compare the performance of the proposed method with several commonly used deep learning algorithms. The numerical results suggest that the proposed method performs efficiently and robustly and converges more stably.
arXiv Detail & Related papers (2024-11-09T20:39:10Z)
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties. We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
Fast Screening Rules for Optimal Design via Quadratic Lasso Reformulation [0.135975510645475]
In this work, we derive safe screening rules that can be used to discard inessential samples. The new tests are much faster to compute, especially for problems involving a parameter space of high dimension. We show how an existing homotopy algorithm to compute the regularization path of the lasso method can be reparametrized with respect to the squared $ell_$-penalty.
arXiv Detail & Related papers (2023-10-13T08:10:46Z)
Accelerating Cutting-Plane Algorithms via Reinforcement Learning Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms. Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability. We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z)
Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonIBS optimization problems. We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints [9.301728976515255]
This article provides new practical and theoretical developments for the landing algorithm. First, the method is extended to the Stiefel manifold. We also consider variance reduction algorithms when the cost function is an average of many functions.
arXiv Detail & Related papers (2023-03-29T07:36:54Z)
Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit between first-order algorithms for constrained optimization and non-smooth systems to design a new class of accelerated first-order algorithms. An important property of these algorithms is that constraints are expressed in terms of velocities instead of sparse variables.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
Continuation Path with Linear Convergence Rate [18.405645120971496]
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems are solved sequentially. We present a primal dual analysis of the path-following algorithm as well as determining how accurately each subproblem should be solved to guarantee a linear convergence rate on a target problem. Considering optimization with a sparsity-inducing penalty, we analyze the change of the active sets with respect to the regularization parameter.
arXiv Detail & Related papers (2021-12-09T18:42:13Z)
Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error. Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching. We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
Solution Path Algorithm for Twin Multi-class Support Vector Machine [6.97711662470035]
The paper is devoted to the fast regularization parameter tuning algorithm for the twin multi-class support vector machine. A new sample dataset division method is adopted and the Lagrangian multipliers are proved to be piecewise linear. The proposed method can achieve good classification performance with reducing the computational cost of grid search method from exponential level to the constant level.
arXiv Detail & Related papers (2020-05-30T14:05:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.