A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
- URL: http://arxiv.org/abs/2511.01234v1
- Date: Mon, 03 Nov 2025 05:19:43 GMT
- Title: A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
- Authors: Min Gan, Guang-Yong Chen, Yang Yi, Lin Yang
- Abstract summary: The proliferation of saddle points, rather than poor local minima, is a primary obstacle in large-scale non-convex optimization for machine learning. We show that variable elimination fundamentally reshapes the critical point structure, so that local maxima in the reduced landscape correspond to saddle points of the original formulation.
- Score: 37.51825281790747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of saddle points, rather than poor local minima, is increasingly understood to be a primary obstacle in large-scale non-convex optimization for machine learning. Variable elimination algorithms, like Variable Projection (VarPro), have long been observed to exhibit superior convergence and robustness in practice, yet a principled understanding of why they so effectively navigate these complex energy landscapes has remained elusive. In this work, we provide a rigorous geometric explanation by comparing the optimization landscapes of the original and reduced formulations. Through an analysis based on Hessian inertia and the Schur complement, we prove that variable elimination fundamentally reshapes the critical point structure of the objective function, revealing that local maxima in the reduced landscape are created from, and correspond directly to, saddle points in the original formulation. Our findings are illustrated on the canonical problem of non-convex matrix factorization, visualized directly on two-parameter neural networks, and finally validated in training deep Residual Networks, where our approach yields dramatic improvements in stability and convergence to superior minima. This work goes beyond explaining an existing method; it establishes landscape simplification via saddle point transformation as a powerful principle that can guide the design of a new generation of more robust and efficient optimization algorithms.
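To make the variable-elimination mechanism concrete, the sketch below applies Variable Projection to the canonical factorization problem min over (U, V) of 0.5 * ||A - U V^T||_F^2, which the abstract cites as its illustrative example. This is not the authors' implementation: the problem sizes, the L-BFGS-B solver, and helper names such as `full_loss` and `reduced_loss` are illustrative assumptions. The key step is that, for fixed U, the inner problem in V is linear least squares, so V can be eliminated in closed form and the outer optimization runs over U alone on the reduced landscape.

```python
# A minimal sketch of Variable Projection (VarPro) on non-convex matrix
# factorization, min over (U, V) of 0.5 * ||A - U V^T||_F^2.  Illustration
# under assumed problem sizes, not the paper's code.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n, r = 20, 15, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # exactly rank r

def full_loss(z):
    # Joint objective over both factors, packed into a single vector.
    U = z[: m * r].reshape(m, r)
    V = z[m * r :].reshape(n, r)
    return 0.5 * np.linalg.norm(A - U @ V.T, "fro") ** 2

def reduced_loss(u):
    # Variable elimination: for fixed U the inner problem in V is linear
    # least squares, solved in closed form, so the objective depends on U alone.
    U = u.reshape(m, r)
    VT = np.linalg.lstsq(U, A, rcond=None)[0]   # optimal V^T given U
    return 0.5 * np.linalg.norm(A - U @ VT, "fro") ** 2

z0 = 0.1 * rng.standard_normal(m * r + n * r)   # shared random initialization
joint = minimize(full_loss, z0, method="L-BFGS-B")
varpro = minimize(reduced_loss, z0[: m * r], method="L-BFGS-B")
print("joint (U, V) loss  :", joint.fun)
print("reduced VarPro loss:", varpro.fun)
```

For this rank-r target both runs typically reach a near-zero loss, but the reduced problem has m*r rather than (m+n)*r variables and, as the abstract argues, a simpler critical-point structure: by the classical Haynsworth inertia-additivity identity, the inertia of the full Hessian equals the inertia of the eliminated block plus that of its Schur complement, which is why local maxima of the reduced objective correspond to saddle points of the joint formulation.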
Related papers
- Escaping Local Minima Provably in Non-convex Matrix Sensing: A Deterministic Framework via Simulated Lifting [4.6910869230336045]
Low-rank matrix sensing is a fundamental yet challenging non-convex problem. We design a framework that maps escape directions found in an over-parametrized (lifted) space back onto the original parameter space, guaranteeing a decrease in the objective from existing local minima.
arXiv Detail & Related papers (2026-02-05T17:05:02Z) - Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings [38.819359908152656]
We show that well-designed reduction mappings improve curvature properties of the objective, leading to better-conditioned problems and theoretically faster convergence for gradient-based methods. Our analysis unifies a range of scenarios where structural information at optimality is leveraged to accelerate convergence, offering a principled explanation for the empirical gains observed in such optimisation algorithms.
arXiv Detail & Related papers (2025-06-10T04:03:59Z) - Energy Landscape Plummeting in Variational Quantum Eigensolver: Subspace Optimization, Non-iterative Corrections and Generator-informed Initialization for Improved Quantum Efficiency [0.0]
Variational Quantum Eigensolver (VQE) faces significant challenges due to hardware noise and the presence of barren plateaus and local traps. We introduce a general formalism that optimizes hardware resource utilization and accuracy by projecting VQE optimizations onto a reduced-dimensional subspace. Numerical simulations show that, when integrated with any chemistry-inspired ansatz, our method can provide one to two orders of magnitude better estimation of the minima.
arXiv Detail & Related papers (2025-04-17T17:07:09Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks [0.0]
We show sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks.
We link this to the phenomenon of phase transitions in generalized hardness of approximation.
Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem.
arXiv Detail & Related papers (2023-07-13T13:27:51Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable parts separately, linearizing only the smooth components.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Smooth over-parameterized solvers for non-smooth structured optimization [3.756550107432323]
Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low-rank structure, and sharp edges.
We operate a non-convex but smooth over-parametrization of the underlying non-smooth optimization problems.
Our main contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables.
arXiv Detail & Related papers (2022-05-03T09:23:07Z) - Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage rotation averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems.
Our approach can run up to 10x faster with comparable accuracy against the state of the art on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address bilinear optimization problems.
We solve for one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to problems with one variable under a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for this behavior by studying the structure of nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points and other undesirable stationary points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.