Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method
for Empirical Risk Minimization
- URL: http://arxiv.org/abs/2208.13933v2
- Date: Tue, 21 Nov 2023 23:06:51 GMT
- Title: Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method
for Empirical Risk Minimization
- Authors: Zikai Xiong and Robert M. Freund
- Abstract summary: In the setting of Empirical Risk Minimization, we propose amending the Frank-Wolfe method with Taylor series-approximated gradients, with variants for both deterministic and stochastic settings.
We also present a novel adaptive step-size approach for which we have computational guarantees.
We show that our methods exhibit very significant speed-ups over existing methods on real-world binary classification datasets.
- Score: 1.4504054468850665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Frank-Wolfe method has become increasingly useful in statistical and
machine learning applications, due to the structure-inducing properties of the
iterates, and especially in settings where linear minimization over the
feasible set is more computationally efficient than projection. In the setting
of Empirical Risk Minimization -- one of the fundamental optimization problems
in statistical and machine learning -- the computational effectiveness of
Frank-Wolfe methods typically grows linearly in the number of data observations
$n$. This is in stark contrast to the case for typical stochastic projection
methods. In order to reduce this dependence on $n$, we look to second-order
smoothness of typical smooth loss functions (least squares loss and logistic
loss, for example) and we propose amending the Frank-Wolfe method with Taylor
series-approximated gradients, including variants for both deterministic and
stochastic settings. Compared with current state-of-the-art methods in the
regime where the optimality tolerance $\varepsilon$ is sufficiently small, our
methods are able to simultaneously reduce the dependence on large $n$ while
obtaining optimal convergence rates of Frank-Wolfe methods, in both the convex
and non-convex settings. We also propose a novel adaptive step-size approach
for which we have computational guarantees. Last of all, we present
computational experiments which show that our methods exhibit very significant
speed-ups over existing methods on real-world datasets for both convex and
non-convex binary classification problems.
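The sketch below is a minimal illustration of the general idea described in the abstract, not the authors' exact algorithm or its complexity guarantees: a Frank-Wolfe loop in which exact first- and second-order information is refreshed only at a periodically updated reference point, and the gradient at the current iterate is replaced by its first-order Taylor expansion around that reference point. The problem instance (logistic loss over an $\ell_1$ ball), the refresh schedule, the step-size rule, and the function names `taylor_fw_logistic` and `lmo_l1_ball` are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's exact method): Frank-Wolfe with a
# Taylor-approximated gradient. Exact gradient/Hessian information is computed only
# every `refresh` iterations at a reference point; in between, the gradient at the
# current iterate is approximated by a first-order Taylor expansion around it.
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle for the l1 ball: argmin_{||v||_1 <= radius} <grad, v>."""
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -radius * np.sign(grad[i])
    return v

def taylor_fw_logistic(A, b, radius=1.0, iters=200, refresh=20):
    """Frank-Wolfe for (1/n) sum_i log(1 + exp(-b_i a_i^T x)) over the l1 ball,
    using gradients Taylor-expanded around a periodically refreshed reference point."""
    n, d = A.shape
    x = np.zeros(d)
    for k in range(iters):
        if k % refresh == 0:
            # Refresh exact first- and second-order information at the reference point.
            # (Forming the full Hessian here is only for illustration of the expansion's
            # structure; it does not reproduce the paper's reduced dependence on n.)
            x_ref = x.copy()
            margins = b * (A @ x_ref)
            sigma = 1.0 / (1.0 + np.exp(margins))        # sigma(-margin)
            g_ref = -(A.T @ (b * sigma)) / n             # exact gradient at x_ref
            w = sigma * (1.0 - sigma)                    # logistic curvature weights
            H_ref = (A.T * w) @ A / n                    # exact Hessian at x_ref
        # First-order Taylor approximation of the gradient at the current iterate.
        g_approx = g_ref + H_ref @ (x - x_ref)
        v = lmo_l1_ball(g_approx, radius)
        gamma = 2.0 / (k + 2)                            # standard open-loop FW step size
        x = (1 - gamma) * x + gamma * v
    return x

# Usage on synthetic data (assumed setup):
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50))
b = np.sign(A @ rng.standard_normal(50) + 0.1 * rng.standard_normal(500))
x_hat = taylor_fw_logistic(A, b, radius=5.0)
```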
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with smoothed gradient clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the Frank-Wolfe algorithm for constrained finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - Reducing Discretization Error in the Frank-Wolfe Method [2.806911268410107]
The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications.
One major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions.
We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error.
arXiv Detail & Related papers (2023-04-04T00:43:05Z) - A Multistep Frank-Wolfe Method [2.806911268410107]
We study the zig-zagging phenomenon in the Frank-Wolfe method as an artifact of discretization.
We propose multistep Frank-Wolfe variants where the truncation errors decay as $O(\Delta^p)$, where $p$ is the method's order.
arXiv Detail & Related papers (2022-10-14T21:12:01Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling; a sketch of the basic random-direction ZO gradient estimator that such methods build on appears after this list.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - RNN Training along Locally Optimal Trajectories via Frank-Wolfe
Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs by iteratively seeking a local minimum on the loss surface within a small region.
Surprisingly, even with this additional cost, the overall training cost of the method is empirically observed to be lower than that of back-propagation.
arXiv Detail & Related papers (2020-10-12T01:59:18Z) - Projection-Free Adaptive Gradients for Large-Scale Optimization [22.0439695290991]
Frank-Wolfe algorithms occupy a unique position as they alleviate both computational burdens (costly projections and exact gradient evaluations) by querying only approximate first-order information from the objective.
We show that our method can improve the performance of adaptive algorithms for constrained optimization.
arXiv Detail & Related papers (2020-09-29T15:56:12Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization [30.980431848476925]
We propose an algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure.
The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size.
We provide an implementation of all considered methods in an open-source package.
arXiv Detail & Related papers (2020-02-27T00:47:21Z) - Faster Projection-free Online Learning [34.96927532439896]
We give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization.
Our algorithm is derived using the Follow-the-Perturbed-Leader method and is analyzed using an online primal-dual framework.
arXiv Detail & Related papers (2020-01-30T21:18:39Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze an Optimistic Adagrad variant for nonconvex min-max problems.
Our experiments show that adaptive gradient algorithms empirically outperform non-adaptive gradient algorithms for training GANs.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
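As referenced in the ZO-RL entry above, here is a minimal sketch of the standard random-direction zeroth-order gradient estimator that such methods build on. This is an assumed illustration, not ZO-RL itself: ZO-RL replaces the random sampling of perturbation directions with a learned policy, which is not shown, and the function name `zo_gradient_estimate` and the quadratic test function are hypothetical.

```python
# Minimal sketch (assumed): two-point zeroth-order gradient estimation.
# The gradient is estimated from function-value differences along randomly
# sampled unit perturbation directions.
import numpy as np

def zo_gradient_estimate(f, x, num_dirs=20, mu=1e-4, rng=None):
    """Two-point ZO gradient estimate averaged over random unit directions."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    fx = f(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                      # uniform direction on the unit sphere
        g += (f(x + mu * u) - fx) / mu * u          # directional finite difference
    return (d / num_dirs) * g

# Usage on a simple quadratic (assumed test function); the estimate approaches
# the true gradient (here, x itself) as num_dirs grows.
f = lambda x: 0.5 * np.dot(x, x)
x = np.ones(10)
print(zo_gradient_estimate(f, x, num_dirs=200))
```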