Escaping Local Minima Provably in Non-convex Matrix Sensing: A Deterministic Framework via Simulated Lifting
- URL: http://arxiv.org/abs/2602.05887v2
- Date: Wed, 11 Feb 2026 12:40:37 GMT
- Title: Escaping Local Minima Provably in Non-convex Matrix Sensing: A Deterministic Framework via Simulated Lifting
- Authors: Tianqi Shen, Jinji Yang, Junze He, Kunhan Gao, Ziye Ma
- Abstract summary: Low-rank matrix sensing is a fundamental yet challenging nonconvex problem. We design a framework that projects over-parametrized escape directions onto the original parameter space to guarantee a decrease of the objective from existing local minima.
- Score: 4.6910869230336045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank matrix sensing is a fundamental yet challenging nonconvex problem whose optimization landscape typically contains numerous spurious local minima, making it difficult for gradient-based optimizers to converge to the global optimum. Recent work has shown that over-parameterization via tensor lifting can convert such local minima into strict saddle points, an insight that also partially explains why massive scaling can improve generalization and performance in modern machine learning. Motivated by this observation, we propose a Simulated Oracle Direction (SOD) escape mechanism that simulates the landscape and escape direction of the over-parametrized space without actually lifting the problem, since that would be computationally intractable. In essence, we design a mathematical framework that projects over-parametrized escape directions onto the original parameter space to guarantee a strict decrease of the objective value from existing local minima. To the best of our knowledge, this is the first deterministic framework that escapes spurious local minima with a guarantee, in particular without using random perturbations or heuristic estimates. Numerical experiments demonstrate that our framework reliably escapes local minima and facilitates convergence to global optima, while incurring minimal computational cost compared to explicit tensor over-parameterization. We believe this framework has non-trivial implications for nonconvex optimization beyond matrix sensing, by showcasing how simulated over-parameterization can be leveraged to tame challenging optimization landscapes.
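To make the setting concrete, the sketch below spells out the standard Burer-Monteiro matrix sensing objective and a generic accept-only-on-strict-decrease escape step. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the Simulated Oracle Direction is computed, so `candidate_direction` is a hypothetical placeholder interface, filled here with a random direction purely to make the example executable.

```python
# Minimal sketch (not the paper's SOD implementation).
# Objective: f(U) = (1/2m) * sum_i (<A_i, U U^T> - y_i)^2 for U in R^{n x r}.
import numpy as np

def sensing_loss(U, A, y):
    """Matrix sensing loss for measurement matrices A (m, n, n) and observations y (m,)."""
    X = U @ U.T
    residuals = np.einsum('mij,ij->m', A, X) - y
    return 0.5 * np.mean(residuals ** 2)

def try_escape(U, A, y, candidate_direction, step_sizes=(1.0, 0.1, 0.01)):
    """Accept the first step along the candidate direction that strictly
    decreases the loss; otherwise keep the current iterate."""
    f0 = sensing_loss(U, A, y)
    D = candidate_direction(U, A, y)  # hypothetical oracle interface
    for eta in step_sizes:
        U_new = U + eta * D
        if sensing_loss(U_new, A, y) < f0:
            return U_new, True
    return U, False

# Tiny usage example with a random placeholder direction.
rng = np.random.default_rng(0)
n, r, m = 8, 2, 40
U_star = rng.normal(size=(n, r))
A = rng.normal(size=(m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2            # symmetric measurement matrices
y = np.einsum('mij,ij->m', A, U_star @ U_star.T)
U0 = rng.normal(size=(n, r))
U1, moved = try_escape(U0, A, y, lambda U, A, y: rng.normal(size=U.shape))
```

In the paper, the decrease guarantee comes from the direction itself, obtained by projecting an escape direction of the lifted (tensor over-parameterized) landscape back onto the original U-space; the random placeholder above carries no such guarantee and only serves to exercise the accept/reject step.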
Related papers
- A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization [37.51825281790747]
The proliferation of saddle points, rather than poor local minima, is an obstacle in large-scale non-convex optimization for machine learning. We show that variable elimination fundamentally reshapes the critical-point structure of the reduced landscape.
arXiv Detail & Related papers (2025-11-03T05:19:43Z) - Zeroth-Order Optimization Finds Flat Minima [51.41529512093436]
We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian. We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions.
arXiv Detail & Related papers (2025-06-05T17:59:09Z) - A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z) - Optimal Guarantees for Algorithmic Reproducibility and Gradient
Complexity in Convex Optimization [55.115992622028685]
Previous work suggests that first-order methods would need to trade off convergence rate (gradient complexity) for better reproducibility.
We demonstrate that both optimal complexity and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems.
arXiv Detail & Related papers (2023-10-26T19:56:52Z) - Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization [10.009748368458409]
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity.
Our method enables fully differentiable approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning.
arXiv Detail & Related papers (2023-07-07T13:06:12Z) - A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new particle-swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on a descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z) - The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation
Optimization under Uncertain Feature Positions [53.478856119297284]
We introduce the probabilistic normal epipolar constraint (PNEC) that overcomes the limitation by accounting for anisotropic and inhomogeneous uncertainties in the feature positions.
In experiments on synthetic data, we demonstrate that the novel PNEC yields more accurate rotation estimates than the original NEC.
We integrate the proposed method into a state-of-the-art monocular rotation-only odometry system and achieve consistently improved results for the real-world KITTI dataset.
arXiv Detail & Related papers (2022-04-05T14:47:11Z) - Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth
Nonconvex Statistical Learning [2.049702429898688]
This paper focuses on a broad Bregman-surrogate framework including the adaptive approximation, mirror, iterative threshold descent, DC programming and many others as examples.
arXiv Detail & Related papers (2021-12-16T20:37:40Z) - Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems.
Our approach can be up to 10x faster with comparable accuracy against the state of the art on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z) - A Graduated Filter Method for Large Scale Robust Estimation [32.08441889054456]
We introduce a novel solver for robust estimation that possesses a strong ability to escape poor local minima.
Our algorithm is built upon graduated optimization, a state-of-the-art approach for solving problems with many poor local minima.
arXiv Detail & Related papers (2020-03-20T02:51:31Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)