Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator
- URL: http://arxiv.org/abs/2206.09370v1
- Date: Sun, 19 Jun 2022 10:24:12 GMT
- Title: Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator
- Authors: Lior Danon, Dan Garber
- Abstract summary: One variant uses standard Frank-Wolfe steps, the second also considers \textit{away-steps} (AFW), and the third is a \textit{geodesic} version of AFW (GAFW).
- Score: 19.24470467199451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tyler's M-estimator is a well known procedure for robust and heavy-tailed
covariance estimation. Tyler himself suggested an iterative fixed-point
algorithm for computing his estimator; however, it requires super-linear (in the size of the data) runtime per iteration, which may be prohibitive at large scale. In this work we propose, to the best of our knowledge, the first
Frank-Wolfe-based algorithms for computing Tyler's estimator. One variant uses
standard Frank-Wolfe steps, the second also considers \textit{away-steps}
(AFW), and the third is a \textit{geodesic} version of AFW (GAFW). AFW provably
requires, up to a log factor, only linear time per iteration, while GAFW runs
in linear time (up to a log factor) in a large $n$ (number of data-points)
regime. All three variants are shown to provably converge to the optimal
solution at a sublinear rate, under standard assumptions, despite the fact that the underlying optimization problem is neither convex nor smooth. Under an additional fairly mild assumption, which holds with probability 1 when the
(normalized) data-points are i.i.d. samples from a continuous distribution
supported on the entire unit sphere, AFW and GAFW are proved to converge with
linear rates. Importantly, all three variants are parameter-free and use
adaptive step-sizes.
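As context for the fixed-point scheme the abstract contrasts with, below is a minimal NumPy sketch of Tyler's classical iteration (the baseline, not the paper's Frank-Wolfe variants); the trace normalization, tolerance, and stopping rule are illustrative assumptions.

```python
import numpy as np

def tyler_fixed_point(X, max_iters=100, tol=1e-8):
    """Tyler's classical fixed-point iteration for the shape matrix.

    X: (n, p) array of data points (rows). Returns a p x p shape matrix,
    normalized here so that trace(Sigma) = p (an illustrative choice;
    Tyler's estimator is only defined up to scale).
    """
    n, p = X.shape
    Sigma = np.eye(p)
    for _ in range(max_iters):
        # Mahalanobis-type weights: w_i = x_i^T Sigma^{-1} x_i.
        inv_Sigma = np.linalg.inv(Sigma)
        w = np.einsum('ij,jk,ik->i', X, inv_Sigma, X)
        # Fixed-point update: Sigma <- (p/n) * sum_i x_i x_i^T / w_i.
        Sigma_next = (p / n) * (X.T * (1.0 / w)) @ X
        # Remove the scale ambiguity before checking convergence.
        Sigma_next *= p / np.trace(Sigma_next)
        if np.linalg.norm(Sigma_next - Sigma, 'fro') < tol:
            return Sigma_next
        Sigma = Sigma_next
    return Sigma
```

Each iteration forms the inverse and a weighted scatter matrix, costing roughly on the order of $np^2 + p^3$ arithmetic operations; this is the kind of super-linear per-iteration cost in the size of the data that the abstract refers to.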
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a $n^{-2/3}$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints (a minimal FW step is sketched at the end of this list).
We present two new variants of the algorithm for finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Frank Wolfe Meets Metric Entropy [0.0]
We provide a technique for establishing domain-specific and easy-to-estimate lower bounds for Frank-Wolfe.
We show that a dimension-free linear upper bound must fail not only in the worst case, but in the \emph{average} case.
We also establish this phenomenon for the nuclear norm ball.
arXiv Detail & Related papers (2022-05-17T21:23:36Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps [66.88729048402082]
Generalized self-concordance is a key property present in the objective function of many learning problems.
We show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
arXiv Detail & Related papers (2021-05-28T15:26:36Z) - Enhancing Parameter-Free Frank Wolfe with an Extra Subproblem [75.83472151564185]
This work introduces and analyzes a variant of the Frank Wolfe (FW) algorithm termed ExtraFW.
The distinct feature of ExtraFW is the pair of gradients leveraged per iteration, thanks to which the decision variable is updated in a prediction-correction (PC) format.
Compared with other parameter-free FW variants, ExtraFW achieves improved rates on the same problems, with a fine-grained analysis enabled by its PC update.
arXiv Detail & Related papers (2020-12-09T19:47:24Z) - Revisiting Projection-free Online Learning: the Strongly Convex Case [21.30065439295409]
Projection-free optimization algorithms are mostly based on the classical Frank-Wolfe method.
The online Frank-Wolfe method achieves a faster rate of $O(T^{2/3})$ on strongly convex functions.
arXiv Detail & Related papers (2020-10-15T07:45:01Z) - Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization [30.980431848476925]
We propose an algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure.
The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost independent of the dataset size.
We provide an implementation of all considered methods in an open-source package.
arXiv Detail & Related papers (2020-02-27T00:47:21Z) - Self-Concordant Analysis of Frank-Wolfe Algorithms [3.3598755777055374]
In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature.
We use the theory of SC functions to provide a new adaptive step size for FW methods and prove a global convergence rate of $O(1/k)$ after $k$ iterations.
arXiv Detail & Related papers (2020-02-11T11:30:33Z)
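The Frank-Wolfe entries above all revolve around the same projection-free primitive, sketched below as promised in the Sarah Frank-Wolfe entry. The simplex feasible set, least-squares objective, and the classical step size $2/(k+2)$ are illustrative assumptions; the papers above replace one or more of these ingredients (stochastic gradients, away-steps, adaptive step sizes).

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, num_iters=200):
    """Minimal Frank-Wolfe loop over the probability simplex.

    Each iteration calls a linear minimization oracle (LMO) over the
    constraint set -- for the simplex this reduces to a coordinate
    argmin -- and moves toward the resulting vertex, so no projection
    is ever needed.
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad_f(x)
        # LMO over the simplex: the vertex e_i minimizing <g, s>.
        i = np.argmin(g)
        s = np.zeros_like(x)
        s[i] = 1.0
        # Convex combination keeps the iterate feasible.
        gamma = 2.0 / (k + 2.0)
        x = x + gamma * (s - x)
    return x

# Illustrative use: least squares restricted to the simplex.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(30, 10)), rng.normal(size=30)
grad = lambda x: A.T @ (A @ x - b)
x_star = frank_wolfe_simplex(grad, np.ones(10) / 10)
```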