Over-Parametrized Matrix Factorization in the Presence of Spurious
Stationary Points
- URL: http://arxiv.org/abs/2112.13269v1
- Date: Sat, 25 Dec 2021 19:13:37 GMT
- Title: Over-Parametrized Matrix Factorization in the Presence of Spurious
Stationary Points
- Authors: Armin Eftekhari
- Abstract summary: This work considers the computational aspects of over-parametrized matrix factorization.
We establish that the gradient flow of the corresponding merit function converges to a global minimizer.
We numerically observe that a proposed discretization of the gradient flow, inspired by primal-dual algorithms, is successful when initialized randomly.
- Score: 20.515985046189268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the emerging role of interpolating machines in signal processing
and machine learning, this work considers the computational aspects of
over-parametrized matrix factorization. In this context, the optimization
landscape may contain spurious stationary points (SSPs), which are proved to be
full-rank matrices. The presence of these SSPs means that it is impossible to
hope for any global guarantees in over-parametrized matrix factorization. For
example, when initialized at an SSP, the gradient flow will be trapped there
forever. Nevertheless, despite these SSPs, we establish in this work that the
gradient flow of the corresponding merit function converges to a global
minimizer, provided that its initialization is rank-deficient and sufficiently
close to the feasible set of the optimization problem. We numerically observe
that a heuristic discretization of the proposed gradient flow, inspired by
primal-dual algorithms, is successful when initialized randomly. Our result is
in sharp contrast with the local refinement methods which require an
initialization close to the optimal set of the optimization problem. More
specifically, we successfully avoid the traps set by the SSPs because the
gradient flow remains rank-deficient at all times, and not because there are no
SSPs nearby. The latter is the case for the local refinement methods. Moreover,
the widely-used restricted isometry property plays no role in our main result.
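To make the setting concrete, the following is a minimal sketch, assuming the simplest merit function f(X) = ||X X^T - M||_F^2 and a plain Euclidean gradient descent discretization; neither is necessarily the merit function or the primal-dual-inspired scheme used in the paper. It illustrates the mechanism highlighted above: an over-parametrized factor X initialized with zero columns stays rank-deficient along the whole trajectory, since zero columns of X receive zero gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth low-rank PSD matrix M = Z Z^T with rank r.
n, r, k = 30, 2, 10          # k > r: the factorization is over-parametrized
Z = rng.standard_normal((n, r))
M = Z @ Z.T

# Illustrative merit function f(X) = ||X X^T - M||_F^2 (an assumption,
# not necessarily the merit function analyzed in the paper).
def f(X):
    return np.linalg.norm(X @ X.T - M, "fro") ** 2

def grad_f(X):
    return 4.0 * (X @ X.T - M) @ X

# Rank-deficient, near-feasible initialization: only the first r of the
# k columns are nonzero, so rank(X0) <= r < k.
X = np.zeros((n, k))
X[:, :r] = Z + 1e-2 * rng.standard_normal((n, r))

eta = 1e-3                   # step size, assumed small enough for stability
for _ in range(20000):
    X -= eta * grad_f(X)     # zero columns get zero gradient and stay zero

print("final loss :", f(X))
print("rank of X  :", np.linalg.matrix_rank(X))   # still <= r, never full rank
```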
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z)
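Since the entry above only names its ingredients, here is a generic decentralized gradient-tracking iteration with heavy-ball momentum, a textbook sketch rather than the DNCFL algorithm itself; the quadratic local losses, ring topology, and step sizes are assumptions, and the proximal step for a non-smooth regularizer is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, d = 4, 5

# Illustrative smooth local losses f_i(x) = 0.5 * ||A_i x - b_i||^2 (assumed).
A = [rng.standard_normal((8, d)) for _ in range(n_nodes)]
b = [rng.standard_normal(8) for _ in range(n_nodes)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring of four nodes.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

alpha, beta = 0.005, 0.3                     # step size and momentum (assumed)
x = np.zeros((n_nodes, d))                   # one row per node
x_prev = x.copy()
y = np.array([grad(i, x[i]) for i in range(n_nodes)])   # gradient trackers

for _ in range(5000):
    x_new = W @ x - alpha * y + beta * (x - x_prev)     # mix, descend, momentum
    y = W @ y + np.array([grad(i, x_new[i]) - grad(i, x[i])
                          for i in range(n_nodes)])     # track the average gradient
    x_prev, x = x, x_new

# Every node should approach the minimizer of the global objective sum_i f_i.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print("max deviation from global minimizer:", np.max(np.abs(x - x_star)))
```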
- New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition [11.530542389959347]
We present fundamental limits of first-order stochastic optimization in a range of non-convex settings, including quasi-convexity (QC), quadratic growth (QG), and restricted secant inequalities (RSI)
arXiv Detail & Related papers (2025-02-19T19:21:00Z)
- Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks [0.0]
We show sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks.
We link this to the phenomenon of phase transitions in generalized hardness of approximation.
Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem.
arXiv Detail & Related papers (2023-07-13T13:27:51Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Convergence of the mini-batch SIHT algorithm [0.0]
The Iterative Hard Thresholding (IHT) algorithm has been considered extensively as an effective deterministic algorithm for solving sparse optimization problems.
We show that the sequence generated by the sparse mini-batch SIHT is a supermartingale sequence and converges with probability one.
arXiv Detail & Related papers (2022-09-29T03:47:46Z)
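For context on the entry above, this is a generic mini-batch stochastic iterative hard thresholding (IHT) sketch for sparse least squares; the data model, batch size, and step size are assumptions, not the exact setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, s = 200, 50, 5                  # measurements, dimension, sparsity level

# Sparse ground truth and noisy linear measurements (illustrative data model).
w_true = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
w_true[support] = rng.choice([-1.0, 1.0], size=s) * rng.uniform(1.0, 2.0, size=s)
A = rng.standard_normal((n, d))
y = A @ w_true + 0.01 * rng.standard_normal(n)

def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and zero out the rest."""
    out = np.zeros_like(w)
    keep = np.argpartition(np.abs(w), -s)[-s:]
    out[keep] = w[keep]
    return out

w = np.zeros(d)
eta, batch = 0.2, 32                  # step size and mini-batch size (assumed)
for _ in range(1000):
    idx = rng.choice(n, size=batch, replace=False)      # sample a mini-batch
    g = A[idx].T @ (A[idx] @ w - y[idx]) / batch        # mini-batch gradient
    w = hard_threshold(w - eta * g, s)                  # gradient step + thresholding

print("recovery error:", np.linalg.norm(w - w_true))
```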
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction [35.585697639325105]
In this paper we note that the role of small random initialization is not fully understood.
We show that gradient descent from a small random initialization behaves akin to spectral learning, yielding optimization and generalization guarantees for over-parameterized low-rank matrix reconstruction.
arXiv Detail & Related papers (2021-06-28T22:52:39Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM)
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for this behavior by studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape undesirable stationary points such as saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
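To make the role of perturbations in the entry above concrete, here is a minimal sketch of perturbed gradient descent on a toy function with a strict saddle at the origin; the test function, noise scale, and trigger rule are illustrative assumptions rather than any algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy objective with a strict saddle at the origin and minimizers at (0, +/-1):
#   f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4
def grad(p):
    x, y = p
    return np.array([x, -y + y**3])

def run(perturb):
    p = np.array([1.0, 0.0])          # start on the saddle's stable manifold (y = 0)
    eta = 0.1
    for _ in range(2000):
        g = grad(p)
        if perturb and np.linalg.norm(g) < 1e-3:
            p = p + 1e-2 * rng.standard_normal(2)   # small kick near a stationary point
            continue
        p = p - eta * g
    return p

print("plain gradient descent     ->", run(False))  # stuck at the saddle (0, 0)
print("perturbed gradient descent ->", run(True))   # ends near a minimizer (0, +/-1)
```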
- Stochastic Zeroth-order Riemannian Derivative Estimation and Optimization [15.78743548731191]
We propose an oracle version of the Gaussian smoothing function to overcome the difficulty of the non-linearity of the manifold.
We demonstrate the applicability of our algorithms with numerical results and real-world applications, including black-box stiffness control for robotics and black-box attacks on neural networks.
arXiv Detail & Related papers (2020-03-25T06:58:19Z)
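The Gaussian smoothing estimator named in the entry above is shown here in its basic Euclidean (non-Riemannian) form, as an assumption-laden sketch: the gradient is replaced by finite differences along random Gaussian directions, so only function evaluations are needed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Black-box objective (illustrative): only function evaluations are available.
Q = np.diag(np.arange(1.0, 6.0))
f = lambda x: 0.5 * x @ Q @ x

def zo_gradient(f, x, mu=1e-4, samples=20):
    """Gaussian-smoothing gradient estimator:
       average of (f(x + mu*u) - f(x)) / mu * u over u ~ N(0, I)."""
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.standard_normal(x.size)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / samples

x = rng.standard_normal(5)
eta = 0.05                            # step size (assumed)
for _ in range(500):
    x = x - eta * zo_gradient(f, x)

print("f(x) after zeroth-order descent:", f(x))   # close to the minimum value 0
```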
- Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex and nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
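To illustrate the ingredients of the last entry, here is a standard proximal gradient iteration with FISTA-style momentum and a simple function-value restart rule, applied to an l1-regularized least-squares toy problem; the data and the restart criterion are assumptions, and the paper's specific algorithm and its nonconvex analysis are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, lam = 100, 40, 0.1

# Illustrative l1-regularized least squares: min 0.5*||Ax - b||^2 + lam*||x||_1.
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part

def soft_threshold(v, tau):            # proximal operator of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()

x = np.zeros(d)
z, t = x.copy(), 1.0                   # extrapolation point and momentum counter
prev_obj = objective(x)
for _ in range(300):
    g = A.T @ (A @ z - b)
    x_new = soft_threshold(z - g / L, lam / L)          # proximal gradient step
    t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))    # FISTA momentum schedule
    z = x_new + (t - 1.0) / t_new * (x_new - x)         # extrapolation
    obj = objective(x_new)
    if obj > prev_obj:                 # function-value restart: reset the momentum
        z, t_new = x_new.copy(), 1.0
    prev_obj, x, t = obj, x_new, t_new

print("final objective:", objective(x))
```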