Optimal Diagonal Preconditioning: Theory and Practice
- URL: http://arxiv.org/abs/2209.00809v1
- Date: Fri, 2 Sep 2022 04:21:28 GMT
- Title: Optimal Diagonal Preconditioning: Theory and Practice
- Authors: Zhaonan Qu, Wenzhi Gao, Oliver Hinder, Yinyu Ye, Zhengyuan Zhou
- Abstract summary: This paper studies the problem of optimal diagonal preconditioning: achieving the maximal reduction in the condition number of any full-rank matrix by scaling its rows or columns separately or simultaneously.
We provide a baseline bisection algorithm that is easy to implement in practice.
Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems.
- Score: 23.79536881427839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Preconditioning has been a staple technique in optimization and machine
learning. It often reduces the condition number of the matrix it is applied to,
thereby speeding up convergence of optimization algorithms. Although there are
many popular preconditioning techniques in practice, most lack theoretical
guarantees for reductions in condition number. In this paper, we study the
problem of optimal diagonal preconditioning to achieve maximal reduction in the
condition number of any full-rank matrix by scaling its rows or columns
separately or simultaneously. We first reformulate the problem as a
quasi-convex problem and provide a baseline bisection algorithm that is easy to
implement in practice, where each iteration consists of an SDP feasibility
problem. Then we propose a polynomial time potential reduction algorithm with
$O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration
consists of a Newton update based on the Nesterov-Todd direction. Our algorithm
is based on a formulation of the problem which is a generalized version of the
Von Neumann optimal growth problem. Next, we specialize to one-sided optimal
diagonal preconditioning problems, and demonstrate that they can be formulated
as standard dual SDP problems, to which we apply efficient customized solvers
and study the empirical performance of our optimal diagonal preconditioners.
Our extensive experiments on large matrices demonstrate the practical appeal of
optimal diagonal preconditioners at reducing condition numbers compared to
heuristics-based preconditioners.
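To make the bisection baseline concrete, below is a minimal sketch for a symmetric positive-definite matrix $M$, assuming numpy and cvxpy with the SCS solver; the helper names are illustrative and this is not the authors' implementation. For a candidate $\kappa$, the SDP feasibility problem asks for a diagonal $D \succ 0$ with $D \preceq M \preceq \kappa D$ in the Loewner order, which certifies that $D^{-1/2} M D^{-1/2}$ has condition number at most $\kappa$; feasibility is monotone in $\kappa$, so bisection applies.

```python
# Minimal sketch of the bisection baseline for optimal diagonal
# preconditioning of a symmetric positive-definite M (illustrative,
# not the authors' code). Requires numpy and cvxpy with SCS.
import numpy as np
import cvxpy as cp

def feasible(M, kappa):
    # SDP feasibility: does a diagonal D > 0 exist with
    # D <= M <= kappa * D in the Loewner order? If so,
    # cond(D^{-1/2} M D^{-1/2}) <= kappa.
    n = M.shape[0]
    d = cp.Variable(n)
    D = cp.diag(d)
    cons = [d >= 1e-8, M - D >> 0, kappa * D - M >> 0]
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve(solver=cp.SCS)
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)

def bisect_optimal_kappa(M, tol=1e-2):
    # D = lambda_min(M) * I already certifies kappa = cond(M),
    # so cond(M) is a valid upper bound for the bisection.
    lo, hi = 1.0, np.linalg.cond(M)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if feasible(M, mid) else (mid, hi)
    return hi
```

Each feasibility check is one SDP solve, matching the abstract's description of the baseline; the paper's potential reduction method and customized dual-SDP solvers are designed to scale beyond what a generic SDP solver handles.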
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
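As a generic illustration of the trust-region setting (a textbook Cauchy-point step, not this paper's algorithm), one TR step with an inexact Hessian approximation B can be sketched as:

```python
# One trust-region step via the Cauchy point, with the exact Hessian
# replaced by an inexact approximation B (e.g., subsampled).
# Generic textbook sketch; not the algorithm of the cited paper.
import numpy as np

def cauchy_point_step(g, B, radius):
    # Minimize g@p + 0.5 * p@B@p over ||p|| <= radius along -g
    # (assumes g != 0).
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (radius * gBg))
    return -(tau * radius / gnorm) * g
```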
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Fast Screening Rules for Optimal Design via Quadratic Lasso
Reformulation [0.135975510645475]
In this work, we derive safe screening rules that can be used to discard inessential samples.
The new tests are much faster to compute, especially for problems involving a parameter space of high dimension.
We show how an existing homotopy algorithm that computes the regularization path of the lasso can be reparametrized with respect to the squared $\ell_1$-penalty.
arXiv Detail & Related papers (2023-10-13T08:10:46Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints [9.301728976515255]
This article provides new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold.
We also consider variance reduction algorithms when the cost function is an average of many functions.
arXiv Detail & Related papers (2023-03-29T07:36:54Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep neural networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Mixed-Projection Conic Optimization: A New Paradigm for Modeling Rank
Constraints [3.179831861897336]
We provide a framework for solving low-rank optimization problems to certifiable optimality.
Our framework also provides near-optimal solutions through rounding and local search techniques.
arXiv Detail & Related papers (2020-09-22T08:59:06Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
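For a rough sense of the sketching paradigm (a generic sketch-and-solve baseline, not the cited paper's adaptive method), a Gaussian embedding $S \in \mathbb{R}^{m \times n}$ compresses the tall problem before solving:

```python
# Generic sketch-and-solve for L2-regularized least squares
#   min_x ||A x - b||^2 + lam * ||x||^2
# using a Gaussian embedding; illustrative, not the paper's method.
import numpy as np

def sketched_ridge(A, b, lam, m, seed=0):
    n, d = A.shape
    S = np.random.default_rng(seed).standard_normal((m, n)) / np.sqrt(m)
    SA, Sb = S @ A, S @ b
    # Solve the much smaller sketched normal equations.
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

# Usage: for tall A (n >> d), the sketch size m trades accuracy for speed.
rng = np.random.default_rng(1)
A = rng.standard_normal((10000, 50))
b = A @ np.ones(50) + 0.1 * rng.standard_normal(10000)
x = sketched_ridge(A, b, lam=1.0, m=500)
```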
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or summaries (including all information) and is not responsible for any consequences arising from their use.