Can Learning Be Explained By Local Optimality In Robust Low-rank Matrix Recovery?
- URL: http://arxiv.org/abs/2302.10963v3
- Date: Fri, 04 Apr 2025 15:57:51 GMT
- Title: Can Learning Be Explained By Local Optimality In Robust Low-rank Matrix Recovery?
- Authors: Jianhao Ma, Salar Fattahi,
- Abstract summary: We show that the true solutions corresponding to $X^\star$ do not emerge as local optima, but rather as strict saddle points. Our findings challenge the conventional belief that all strict saddle points are undesirable and should be avoided.
- Score: 18.49274803854387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the local landscape of low-rank matrix recovery, focusing on reconstructing a $d_1\times d_2$ matrix $X^\star$ with rank $r$ from $m$ linear measurements, some potentially noisy. When the noise is distributed according to an outlier model, minimizing a nonsmooth $\ell_1$-loss with a simple sub-gradient method can often perfectly recover the ground truth matrix $X^\star$. Given this, a natural question is what optimization property (if any) enables such learning behavior. The most plausible answer is that the ground truth $X^\star$ manifests as a local optimum of the loss function. In this paper, we provide a strong negative answer to this question, showing that, under moderate assumptions, the true solutions corresponding to $X^\star$ do not emerge as local optima, but rather as strict saddle points -- critical points with strictly negative curvature in at least one direction. Our findings challenge the conventional belief that all strict saddle points are undesirable and should be avoided.
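As a rough illustration of the recovery procedure the abstract describes, the following sketch runs a subgradient method on the nonsmooth $\ell_1$-loss under a Burer-Monteiro factorization $X = UV^\top$. The Gaussian measurement model, outlier fraction, and geometrically decaying step size are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r, m = 20, 15, 2, 600          # illustrative sizes

# Rank-r ground truth X* and Gaussian sensing matrices A_i.
X_star = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
A = rng.standard_normal((m, d1, d2))
y = np.einsum('mij,ij->m', A, X_star)

# Outlier noise model: a fraction of measurements is grossly corrupted.
bad = rng.random(m) < 0.2
y[bad] += 10.0 * rng.standard_normal(bad.sum())

# Subgradient method on f(U, V) = (1/m) * sum_i |<A_i, U V^T> - y_i|
# with the factorization X = U V^T and a geometrically decaying step.
U = rng.standard_normal((d1, r))
V = rng.standard_normal((d2, r))
for t in range(2000):
    resid = np.einsum('mij,ij->m', A, U @ V.T) - y
    G = np.einsum('m,mij->ij', np.sign(resid), A) / m   # subgradient w.r.t. X
    step = 0.1 * 0.997 ** t
    dU, dV = G @ V, G.T @ U
    U -= step * dU
    V -= step * dV

print("relative error:", np.linalg.norm(U @ V.T - X_star) / np.linalg.norm(X_star))
```

Despite the loss being nonsmooth and nonconvex, this kind of simple scheme often drives the relative error to near zero under the outlier model, which is exactly the learning behavior whose optimization explanation the paper examines.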
Related papers
- Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes [29.466981306355066]
We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions.
We also prove a nearly-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points.
arXiv Detail & Related papers (2024-06-10T22:57:27Z) - Low-rank Matrix Bandits with Heavy-tailed Rewards [55.03293214439741]
We study the problem of low-rank matrix bandits with heavy-tailed rewards (LowHTR).
By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS.
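LOTUS itself is more involved than a single estimator; the sketch below only illustrates the payoff-truncation ingredient: observations beyond a sample-size-dependent threshold are discarded before averaging, trading a small bias for much lighter tails. The threshold schedule is the textbook truncated-mean choice and is an assumption here, not necessarily the one used by LOTUS.

```python
import numpy as np

def truncated_mean(rewards, eps=0.5, v=1.0):
    """Truncated empirical mean for heavy-tailed samples with a bounded
    (1+eps)-th raw moment v: samples exceeding a growing threshold are
    zeroed out before averaging (small bias, much lighter tails)."""
    n = len(rewards)
    t = np.arange(1, n + 1)
    thresh = (v * t / np.log(2 * n)) ** (1.0 / (1.0 + eps))
    return np.where(np.abs(rewards) <= thresh, rewards, 0.0).mean()

rng = np.random.default_rng(1)
samples = rng.pareto(1.5, size=10_000)   # finite mean, infinite variance
print("plain mean:    ", samples.mean())
print("truncated mean:", truncated_mean(samples, eps=0.4))
```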
arXiv Detail & Related papers (2024-04-26T21:54:31Z) - Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery [45.601316850669406]
We present efficient algorithms for policy evaluation, best policy identification and regret minimization.
For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal.
All the proposed algorithms consist of two phases: they first leverage spectral methods to estimate the left and right singular subspaces of the low-rank reward matrix.
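A minimal sketch of that spectral first phase, assuming an entrywise-noisy estimate of the reward matrix is available: a rank-$r$ truncated SVD yields the left and right singular subspace estimates. (The paper's analysis controls the finer two-to-infinity norm; plain spectral-norm projector distances are used below for simplicity.)

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, r = 50, 40, 3

# Low-rank reward matrix and a noisy empirical estimate of it.
M = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
M_hat = M + 0.1 * rng.standard_normal((d1, d2))

# Spectral phase: rank-r truncated SVD of the estimate gives the
# left/right singular subspace estimates.
U_hat, _, Vt_hat = np.linalg.svd(M_hat)
U_r, V_r = U_hat[:, :r], Vt_hat[:r, :].T

# Subspace recovery error as the spectral norm of projector differences.
U, _, Vt = np.linalg.svd(M)
proj = lambda B: B @ B.T
print("left error: ", np.linalg.norm(proj(U[:, :r]) - proj(U_r), 2))
print("right error:", np.linalg.norm(proj(Vt[:r, :].T) - proj(V_r), 2))
```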
arXiv Detail & Related papers (2024-02-24T06:36:08Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - One-sided Matrix Completion from Two Observations Per Row [95.87811229292056]
We propose a natural algorithm that involves imputing the missing values of the matrix $X^T X$.
We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing.
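The sketch below is a naive, hypothetical version of the Gram-matrix imputation idea, assuming each row reveals a uniformly random pair of columns; inverse-probability weighting then gives unbiased entrywise estimates of $X^T X$. The paper's actual algorithm and guarantees are more refined.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
n, d, r = 20_000, 8, 2

# Tall low-rank matrix X; each row reveals a uniformly random pair of columns.
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
pairs = np.array([rng.choice(d, size=2, replace=False) for _ in range(n)])

# Impute the Gram matrix G = X^T X entrywise. For j != k, only rows whose
# observed pair is exactly {j, k} contribute; weighting by the number of
# column pairs makes the estimate unbiased under uniform pair sampling.
G_hat = np.zeros((d, d))
num_pairs = comb(d, 2)
for i in range(n):
    j, k = pairs[i]
    G_hat[j, k] += X[i, j] * X[i, k] * num_pairs
    G_hat[k, j] = G_hat[j, k]
    G_hat[j, j] += X[i, j] ** 2 * (d / 2)   # P(column j observed) = 2/d
    G_hat[k, k] += X[i, k] ** 2 * (d / 2)

G = X.T @ X
print("relative error:", np.linalg.norm(G_hat - G) / np.linalg.norm(G))
```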
arXiv Detail & Related papers (2023-06-06T22:35:16Z) - A General Algorithm for Solving Rank-one Matrix Sensing [15.543065204102714]
The goal of matrix sensing is to recover a matrix $A_\star \in \mathbb{R}^{n \times n}$, based on a sequence of measurements.
In this paper, we relax that rank-$k$ assumption and solve a much more general matrix sensing problem.
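To make the measurement model concrete, the following sketch generates rank-one measurements $y_i = u_i^\top A v_i = \langle A, u_i v_i^\top \rangle$ and, absent any rank assumption, recovers $A$ by plain linear least squares over $\mathrm{vec}(A)$. Dimensions and the recovery step are illustrative; the paper's algorithm targets structured settings with far fewer measurements.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 200   # m >= n^2 generic rank-one measurements identify A

# Rank-one sensing: y_i = u_i^T A v_i, a linear functional of A
# whose sensing matrix u_i v_i^T has rank one.
A_star = rng.standard_normal((n, n))
U = rng.standard_normal((m, n))
V = rng.standard_normal((m, n))
y = np.einsum('mi,ij,mj->m', U, A_star, V)

# Without a rank assumption this is linear least squares in vec(A):
# the i-th design row is vec(u_i v_i^T).
design = np.einsum('mi,mj->mij', U, V).reshape(m, n * n)
A_hat = np.linalg.lstsq(design, y, rcond=None)[0].reshape(n, n)
print("relative error:", np.linalg.norm(A_hat - A_star) / np.linalg.norm(A_star))
```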
arXiv Detail & Related papers (2023-03-22T04:07:26Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
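For context, here is the basic (static) Hutchinson estimator that these lower bounds concern: it estimates $\mathrm{tr}(A)$ from matrix-vector products alone, since $\mathbb{E}[g^\top A g] = \mathrm{tr}(A)$ for Rademacher probes $g$. The paper's dynamic binary-tree procedure builds on such primitives; this sketch does not reproduce it.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_queries, rng):
    """Estimate tr(A) using only matrix-vector products with A:
    E[g^T A g] = tr(A) for random sign vectors g. The error decays
    like 1/sqrt(num_queries)."""
    total = 0.0
    for _ in range(num_queries):
        g = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        total += g @ matvec(g)
    return total / num_queries

rng = np.random.default_rng(5)
n = 500
A = rng.standard_normal((n, n))
A = A @ A.T                                   # PSD test matrix
print("estimate:", hutchinson_trace(lambda v: A @ v, n, 200, rng))
print("exact:   ", np.trace(A))
```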
arXiv Detail & Related papers (2022-09-30T04:15:44Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the
Optimization Landscape Around the True Solution [4.7464518249313805]
This work characterizes the effect of depth on the optimization landscape of regression.
We show that, despite their nonconvexity, deeper models enjoy a more desirable optimization landscape.
arXiv Detail & Related papers (2022-07-15T17:11:26Z) - Low-rank Matrix Recovery With Unknown Correspondence [62.634051913953485]
We show that it is possible to recover $M$ via solving a nuclear norm minimization problem under a proper low-rank condition on $M$, with a provable non-asymptotic error bound for the recovery of $M$.
Experiments on simulated data, the MovieLens 100K dataset and Yale B database show that $\text{M}^3\text{O}$ achieves state-of-the-art performance over several baselines and can recover the ground-truth correspondence with high accuracy.
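As a sketch of the low-rank recovery ingredient only (assuming the correspondence is known, which is exactly what $\text{M}^3\text{O}$ does not assume), nuclear norm minimization over the observed entries can be written in a few lines with cvxpy:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
d, r = 20, 2

# Low-rank ground truth, observed on a random subset of entries.
M_star = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
mask = (rng.random((d, d)) < 0.5).astype(float)

# Nuclear norm minimization: the convex surrogate for rank,
# constrained to match the observed entries.
M = cp.Variable((d, d))
prob = cp.Problem(cp.Minimize(cp.norm(M, "nuc")),
                  [cp.multiply(mask, M - M_star) == 0])
prob.solve()
print("relative error:",
      np.linalg.norm(M.value - M_star) / np.linalg.norm(M_star))
```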
arXiv Detail & Related papers (2021-10-15T09:27:50Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
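A minimal sketch of the perturbation idea, assuming a linear value model: adding i.i.d. Gaussian noise to the regression targets before a ridge-regression fit makes each fitted weight vector a random plausible hypothesis, and acting greedily with respect to it drives exploration. The hyperparameters and single-step setting are illustrative.

```python
import numpy as np

def perturbed_least_squares(Phi, targets, noise_std, reg, rng):
    """One randomized regression step in the RLSVI spirit: perturb each
    target with i.i.d. Gaussian noise, then solve ridge regression.
    The randomness in the fitted weights plays the role of exploration."""
    noisy = targets + noise_std * rng.standard_normal(len(targets))
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ noisy)

rng = np.random.default_rng(7)
Phi = rng.standard_normal((100, 5))   # features of visited (state, action) pairs
targets = Phi @ np.array([1., -2., 0., 3., .5]) + 0.1 * rng.standard_normal(100)

# Each resample gives a different plausible value estimate.
for _ in range(3):
    print(perturbed_least_squares(Phi, targets, noise_std=1.0, reg=1.0, rng=rng))
```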
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Unique sparse decomposition of low rank matrices [17.037882881652617]
We find a unique decomposition of a low-rank matrix $Y \in \mathbb{R}^{r \times n}$.
We prove that, up to some inherent ambiguity, $Y \in \mathbb{R}^{r \times n}$ admits a sparse-wise decomposition in terms of $X \in \mathbb{R}^{r \times n}$.
arXiv Detail & Related papers (2021-06-14T20:05:59Z) - Saddle Point Optimization with Approximate Minimization Oracle [8.680676599607125]
A major approach to saddle point optimization $\min_x \max_y f(x, y)$ is the gradient-based approach popularized by generative adversarial networks (GANs).
In contrast, we analyze an alternative approach relying only on an oracle that solves a minimization problem approximately.
Our approach locates approximate solutions $x'$ and $y'$ to $\min_{x'} f(x', y)$ and $\max_{y'} f(x, y')$ at a given point $(x, y)$, and updates $(x, y)$ toward these approximate solutions $(x', y')$.
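A toy sketch of this oracle-based scheme on a strongly convex-concave quadratic (the test function, coupling strength, and step size are chosen so that the simple simultaneous update converges; they are assumptions, not the paper's conditions):

```python
from scipy.optimize import minimize

# Strongly convex-concave test function with saddle point at (0, 0).
f = lambda x, y: 0.5 * x**2 + 0.3 * x * y - 0.5 * y**2

x, y, eta = 3.0, -2.0, 0.5
for _ in range(50):
    # Oracle calls: approximate min over x at fixed y, and approximate
    # max over y at fixed x (via minimizing -f).
    x_best = minimize(lambda x_: f(x_[0], y), [x]).x[0]
    y_best = minimize(lambda y_: -f(x, y_[0]), [y]).x[0]
    # Move the current point a step toward the approximate solutions.
    x += eta * (x_best - x)
    y += eta * (y_best - y)

print(x, y)   # approaches the saddle point (0, 0)
```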
arXiv Detail & Related papers (2021-03-29T23:03:24Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Rank $2r$ iterative least squares: efficient recovery of ill-conditioned
low rank matrices from few entries [4.230158563771147]
We present a new, simple and computationally efficient iterative method for low rank matrix completion.
Our algorithm, denoted R2RILS for rank $2r$ iterative least squares, has low memory requirements.
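For contrast with R2RILS, here is the generic alternating least squares baseline from the same "iterative least squares" family: it alternately refits each factor by solving small least-squares problems over the observed entries. This is explicitly not R2RILS, which instead solves a single global least-squares problem over a rank-$2r$ subspace at each iteration.

```python
import numpy as np

rng = np.random.default_rng(8)
d, r = 30, 2

# Low-rank matrix observed on a random subset of entries.
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
mask = rng.random((d, d)) < 0.4

# Alternating least squares on the factorization M ~ U V^T.
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))
for _ in range(100):
    for i in range(d):                     # refit row i of U given V
        cols = mask[i]
        U[i] = np.linalg.lstsq(V[cols], M[i, cols], rcond=None)[0]
    for j in range(d):                     # refit row j of V given U
        rows = mask[:, j]
        V[j] = np.linalg.lstsq(U[rows], M[rows, j], rcond=None)[0]

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```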
arXiv Detail & Related papers (2020-02-05T16:20:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.