Related papers: Beyond Discretization: Learning the Optimal Solution Path

Beyond Discretization: Learning the Optimal Solution Path

URL: http://arxiv.org/abs/2410.14885v2
Date: Tue, 12 Nov 2024 18:42:01 GMT
Title: Beyond Discretization: Learning the Optimal Solution Path
Authors: Qiran Dong, Paul Grigas, Vishal Gupta,
Abstract summary: We propose an approach that parameterizes the solution path with a set of basis functions and solves a emphsingle optimization problem. Our method offers substantial complexity improvements over discretization. We also prove stronger results for special cases common in machine learning.
Score: 3.9675360887122646
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many applications require minimizing a family of optimization problems indexed by some hyperparameter $\lambda \in \Lambda$ to obtain an entire solution path. Traditional approaches proceed by discretizing $\Lambda$ and solving a series of optimization problems. We propose an alternative approach that parameterizes the solution path with a set of basis functions and solves a \emph{single} stochastic optimization problem to learn the entire solution path. Our method offers substantial complexity improvements over discretization. When using constant-step size SGD, the uniform error of our learned solution path relative to the true path exhibits linear convergence to a constant related to the expressiveness of the basis. When the true solution path lies in the span of the basis, this constant is zero. We also prove stronger results for special cases common in machine learning: When $\lambda \in [-1, 1]$ and the solution path is $\nu$-times differentiable, constant step-size SGD learns a path with $\epsilon$ uniform error after at most $O(\epsilon^{\frac{1}{1-\nu}} \log(1/\epsilon))$ iterations, and when the solution path is analytic, it only requires $O\left(\log^2(1/\epsilon)\log\log(1/\epsilon)\right)$. By comparison, the best-known discretization schemes in these settings require at least $O(\epsilon^{-1/2})$ discretization points (and even more gradient calls). Finally, we propose an adaptive variant of our method that sequentially adds basis functions and demonstrates strong numerical performance through experiments.

Related papers

Global Optimization with A Power-Transformed Objective and Gaussian Smoothing [4.275224221939364]
We show that our method converges to a solution in the $delta$-neighborhood of $f$'s global optimum point. The convergence rate is $O(d2sigma4varepsilon-2)$, which is faster than both the standard and single-loop homotopy methods.
arXiv Detail & Related papers (2024-12-06T17:33:43Z)
Improved Complexity for Smooth Nonconvex Optimization: A Two-Level Online Learning Approach with Quasi-Newton Methods [18.47705532817026]
We study the problem of finding an $epsilon$firstorder stationary point (FOSP) of a smooth function. We introduce a novel optimistic quasi- gradient approximation method to solve this online learning problem.
arXiv Detail & Related papers (2024-12-03T05:20:05Z)
A fast algorithm to minimize prediction loss of the optimal solution in inverse optimization problem of MILP [0.0]
We consider the inverse optimization problem of estimating the weights of the objective function for a mixed integer linear program (MILP)<n>We show and demonstrate that the proposed method efficiently learns the weights.<n>Experiments demonstrate that the proposed method solves the inverse optimization problems of MILP using fewer than $1/7$ the number of MILP calls required by known methods.
arXiv Detail & Related papers (2024-05-23T07:51:05Z)
A quantum central path algorithm for linear optimization [5.450016817940232]
We propose a novel quantum algorithm for solving linear optimization problems by quantum-mechanical simulation of the central path. This approach yields an algorithm for solving linear optimization problems involving $m$ constraints and $n$ variables to $varepsilon$-optimality. In the standard gate model (i.e., without access to quantum RAM), our algorithm can obtain highly-precise solutions to LO problems using at most $$mathcalO left( sqrtm + n textsfnnz (A) fracR_1
arXiv Detail & Related papers (2023-11-07T13:26:20Z)
Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem [16.9187409976238]
We study a class of convex bilevel optimization problems, also known as simple bilevel optimization. We introduce novel bilevel optimization methods that approximate the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-08-15T02:37:11Z)
An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting. Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function. We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) that minimizes a twice different subject general non conic optimization. In particular, we propose a Newton-CG based-augmentedconjugate method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z)
Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of emphcon unconstrained min-max optimization problems. We show that the proposed methods generate iterates that remain within a bounded set and that the iterations converge to an $epsilon$-saddle point within $O(epsilon-2/3)$ in terms of a restricted function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity. We proposed a method to solve problems master to agents' communication and local calls. The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
An algorithmic view of $\ell_2$ regularization and some path-following algorithms [7.6146285961466]
We establish an equivalence between the $ell$-regularized solution path for a convex loss function and the solution of an ordinary differentiable equation (ODE) This equivalence reveals that the solution path can be viewed as the flow of a hybrid of gradient descent and Newton method applying to the empirical loss. New path-following algorithms based on homotopy methods and numerical ODE solvers are proposed to numerically approximate the solution path.
arXiv Detail & Related papers (2021-07-07T16:00:13Z)
Saddle Point Optimization with Approximate Minimization Oracle [8.680676599607125]
A major approach to saddle point optimization $min_xmax_y f(x, y)$ is a gradient based approach as is popularized by generative adversarial networks (GANs) In contrast, we analyze an alternative approach relying only on an oracle that solves a minimization problem approximately. Our approach locates approximate solutions $x'$ and $y'$ to $min_x'f(x', y)$ at a given point $(x, y)$ and updates $(x, y)$ toward these approximate solutions $(x', y'
arXiv Detail & Related papers (2021-03-29T23:03:24Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
Second-order Conditional Gradient Sliding [79.66739383117232]
We present the emphSecond-Order Conditional Gradient Sliding (SOCGS) algorithm. The SOCGS algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations. It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
arXiv Detail & Related papers (2020-02-20T17:52:18Z)
Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions. In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.