Max-affine regression via first-order methods
- URL: http://arxiv.org/abs/2308.08070v1
- Date: Tue, 15 Aug 2023 23:46:44 GMT
- Title: Max-affine regression via first-order methods
- Authors: Seonho Kim and Kiryung Lee
- Abstract summary: The max-affine model ubiquitously arises in applications in signal processing and statistics.
We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch gradient descent (SGD) for max-affine regression.
- Score: 7.12511675782289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider regression of a max-affine model that produces a piecewise linear
model by combining affine models via the max function. The max-affine model
ubiquitously arises in applications in signal processing and statistics
including multiclass classification, auction problems, and convex regression.
It also generalizes phase retrieval and learning rectifier linear unit
activation functions. We present a non-asymptotic convergence analysis of
gradient descent (GD) and mini-batch stochastic gradient descent (SGD) for
max-affine regression when the model is observed at random locations following
the sub-Gaussianity and an anti-concentration with additive sub-Gaussian noise.
Under these assumptions, a suitably initialized GD and SGD converge linearly to
a neighborhood of the ground truth specified by the corresponding error bound.
We provide numerical results that corroborate the theoretical finding.
Importantly, SGD not only converges faster in run time with fewer observations
than alternating minimization and GD in the noiseless scenario but also
outperforms them in low-sample scenarios with noise.
Related papers
- Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms [22.79595679373698]
Mixed linear regression is a well-studied problem in statistics and machine learning.
In this paper, we consider the more general problem of learning of mixed linear regression from samples.
We show that the AM and EM algorithms lead to learning in mixed linear regression by converging to the population loss minimizers.
arXiv Detail & Related papers (2024-06-03T09:43:24Z) - Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression [5.883916678819683]
We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR)
Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high SNR settings.
arXiv Detail & Related papers (2024-05-28T14:46:20Z) - Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled
Gradient Descent, Even with Overparameterization [48.65416821017865]
This chapter introduces a new algorithmic approach, dubbed scaled gradient (ScaledGD)
It converges linearly at a constant rate independent of the condition number of the low-rank object.
It maintains the low periteration cost of gradient descent for a variety of tasks.
arXiv Detail & Related papers (2023-10-09T21:16:57Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of
Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS)
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD)
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Convergence Analysis of Homotopy-SGD for non-convex optimization [43.71213126039448]
We present a first-order algorithm based on a combination of homotopy methods and SGD, called Gradienty-Stoch Descent (H-SGD)
Under some assumptions, we conduct a theoretical analysis of the proposed problem.
Experimental results show that H-SGD can outperform SGD.
arXiv Detail & Related papers (2020-11-20T09:50:40Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.