Hyperparameter tuning via trajectory predictions: Stochastic prox-linear
methods in matrix sensing
- URL: http://arxiv.org/abs/2402.01599v1
- Date: Fri, 2 Feb 2024 17:55:12 GMT
- Title: Hyperparameter tuning via trajectory predictions: Stochastic prox-linear
methods in matrix sensing
- Authors: Mengqi Lou and Kabir Aladin Verchand and Ashwin Pananjady
- Abstract summary: We analyze a problem recovering an unknown rank-1 matrix from rank-1 Gaussian measurements corrupted by noise.
We derive a deterministic recursion that predicts an accurate error of this method.
We show, using a non-asymotic framework, that this is for any batch-size and a large range of step-sizes.
- Score: 6.446062819763263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the desire to understand stochastic algorithms for nonconvex
optimization that are robust to their hyperparameter choices, we analyze a
mini-batched prox-linear iterative algorithm for the problem of recovering an
unknown rank-1 matrix from rank-1 Gaussian measurements corrupted by noise. We
derive a deterministic recursion that predicts the error of this method and
show, using a non-asymptotic framework, that this prediction is accurate for
any batch-size and a large range of step-sizes. In particular, our analysis
reveals that this method, though stochastic, converges linearly from a local
initialization with a fixed step-size to a statistical error floor. Our
analysis also exposes how the batch-size, step-size, and noise level affect the
(linear) convergence rate and the eventual statistical estimation error, and we
demonstrate how to use our deterministic predictions to perform hyperparameter
tuning (e.g. step-size and batch-size selection) without ever running the
method. On a technical level, our analysis is enabled in part by showing that
the fluctuations of the empirical iterates around our deterministic predictions
scale with the error of the previous iterate.
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Low-rank extended Kalman filtering for online learning of neural
networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of quadratic Approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution.
arXiv Detail & Related papers (2022-10-23T23:23:23Z) - Formal guarantees for heuristic optimization algorithms used in machine
learning [6.978625807687497]
Gradient Descent (SGD) and its variants have become the dominant methods in the large-scale optimization machine learning (ML) problems.
We provide formal guarantees of a few convex optimization methods and proposing improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z) - Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization [5.900674344455754]
We show a technique for estimating properties of a rank random matrix with i.i.d.
We show sharp convergence guarantees exact recovery in a single step.
Our analysis also exposes several other properties of this problem.
arXiv Detail & Related papers (2022-07-20T05:31:05Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Sparse recovery by reduced variance stochastic approximation [5.672132510411465]
We discuss application of iterative quadratic optimization routines to the problem of sparse signal recovery from noisy observation.
We show how one can straightforwardly enhance reliability of the corresponding solution by using Median-of-Means like techniques.
arXiv Detail & Related papers (2020-06-11T12:31:20Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles are in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation.
arXiv Detail & Related papers (2020-02-26T12:53:04Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.