Related papers: Efficient Over-parameterized Matrix Sensing from Noisy Measurements via Alternating Preconditioned Gradient Descent

Efficient Over-parameterized Matrix Sensing from Noisy Measurements via Alternating Preconditioned Gradient Descent

URL: http://arxiv.org/abs/2502.00463v2
Date: Thu, 06 Feb 2025 14:32:10 GMT
Title: Efficient Over-parameterized Matrix Sensing from Noisy Measurements via Alternating Preconditioned Gradient Descent
Authors: Zhiyu Liu, Zhi Han, Yandong Tang, Hai Zhang, Shaojie Tang, Yao Wang,
Abstract summary: Preconditioning methods have been proposed to accelerate the convergence of matrix sensing problem.<n>We propose an alternating preconditioned descent (APGD) algorithm, which alternately updates the two factor parameter.<n>We theoretically prove that APGD achieves near-optimal convergence at a linear rate, starting from arbitrary randoms.
Score: 17.73720530889677
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the noisy matrix sensing problem in the over-parameterization setting, where the estimated rank $r$ is larger than the true rank $r_\star$. Specifically, our main objective is to recover a matrix $ X_\star \in \mathbb{R}^{n_1 \times n_2} $ with rank $ r_\star $ from noisy measurements using an over-parameterized factorized form $ LR^\top $, where $ L \in \mathbb{R}^{n_1 \times r}, \, R \in \mathbb{R}^{n_2 \times r} $ and $ \min\{n_1, n_2\} \ge r > r_\star $, with the true rank $ r_\star $ being unknown. Recently, preconditioning methods have been proposed to accelerate the convergence of matrix sensing problem compared to vanilla gradient descent, incorporating preconditioning terms $ (L^\top L + \lambda I)^{-1} $ and $ (R^\top R + \lambda I)^{-1} $ into the original gradient. However, these methods require careful tuning of the damping parameter $\lambda$ and are sensitive to initial points and step size. To address these limitations, we propose the alternating preconditioned gradient descent (APGD) algorithm, which alternately updates the two factor matrices, eliminating the need for the damping parameter and enabling faster convergence with larger step sizes. We theoretically prove that APGD achieves near-optimal error convergence at a linear rate, starting from arbitrary random initializations. Through extensive experiments, we validate our theoretical results and demonstrate that APGD outperforms other methods, achieving the fastest convergence rate. Notably, both our theoretical analysis and experimental results illustrate that APGD does not rely on the initialization procedure, making it more practical and versatile.

Related papers

Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization [85.91302339486673]
We study robust Markov decision processes (RMDPs) with general policy parameterization under s-rectangular and non-rectangular uncertainty sets.<n>We prove novel Lipschitz and Lipschitz-smoothness properties for general policy parameterizations that extends to infinite state spaces.<n>We design a projected gradient descent algorithm for s-rectangular uncertainty and a Frank-Wolfe algorithm for non-rectangular uncertainty.
arXiv Detail & Related papers (2026-02-11T21:44:20Z)
Parameter-free Algorithms for the Stochastically Extended Adversarial Model [59.81852138768642]
Existing approaches for the Extended Adversarial (SEA) model require prior knowledge of problem-specific parameters, such as the diameter of the domain $D$ and the Lipschitz constant of the loss functions $G$.<n>We develop parameter-free methods by leveraging the Optimistic Online Newton Step (OONS) algorithm to eliminate the need for these parameters.
arXiv Detail & Related papers (2025-10-06T10:53:37Z)
Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization [19.32160757444549]
In practical instances of nonspecified matrix factorization, the rank of the true solutionrstar$ is often unknown, so the rankr$ of the model can be singular as $r>rstar$. We propose an inexpensive suber for matrix sensing variant non matrix factorization that restores the convergence factor back to linear, even in agnosticized case. Our numerical experiments find that PrecGD works equally well in restoring the convergence of other variants non matrix factorization.
arXiv Detail & Related papers (2025-04-13T20:06:49Z)
Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation [49.78792404811239]
We study inference in simulation-based models where the analytical form of the likelihood is unknown.<n>We use a ratio-free nested multi-time-scale approximation (SA) method that simultaneously tracks the score and drives the parameter update.<n>We show that our algorithm can eliminate the original bias $Obig(sqrtfrac1Nbig)$ and accelerate the convergence rate from $Obig(beta_k+sqrtfracalpha_kNbig)$.
arXiv Detail & Related papers (2024-11-20T02:46:15Z)
Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance reduced ZO estimators for complex gradient problems. We improve the state-of-the-art function complexities from $mathcalOleft(minfracdn1/2epsilon2, fracdepsilon3right)$ to $tildecalOleft(fracdepsilon2right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel. An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion [21.846732043706318]
We show that the vanilla gradient descent provably converges to the ground truth $rmXstar$ without requiring any explicit regularization. Surprisingly, neither the convergence rate nor the final accuracy depends on the over- parameterized search rank $r'$, and they are only governed by the true rank $r$.
arXiv Detail & Related papers (2024-02-09T19:39:23Z)
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in citelu2021low under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrain of existing algorithms, we first propose the G-ESTT framework. We show that G-ESTT can achieve the $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret while G-ESTS can achineve the $tildeO
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization [46.55524654398093]
We show how over- parameterization changes the convergence behaviors of descent. The goal is to recover an unknown low-rank ground-rank ground-truth matrix from near-isotropic linear measurements. We propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $alpha$.
arXiv Detail & Related papers (2023-10-03T03:34:22Z)
On the Linear Convergence of Policy Gradient under Hadamard Parameterization [4.182089296199263]
We study the convergence of deterministic policy gradient under the Hadamard parameterization. We show that the error decreases at an $O(frac1k)$ rate for all the iterations.
arXiv Detail & Related papers (2023-05-31T05:51:15Z)
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing [42.905196856926615]
$textsfScaledGD($lambda$)$ is a preconditioned gradient descent method to tackle the low-rank matrix sensing problem. We show that $textsfScaledGD($lambda$)$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations.
arXiv Detail & Related papers (2023-02-02T16:13:27Z)
Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes. This paper provides two algorithms that improve the regret to $tildemathcal O(K2/3)$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
Accelerated Single-Call Methods for Constrained Min-Max Optimization [5.266784779001398]
Existing methods either require two gradient calls or two projections in each iteration, which may costly in some applications. In this paper, we first show that a variant of the Optimistic Gradient (RGOG) has a rich set of non-comonrate min-max convergence rate problems. Our convergence rates hold for standard measures such as and the natural and the natural.
arXiv Detail & Related papers (2022-10-06T17:50:42Z)
Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly. We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $epsilon$ error with $delta$ failure probability. Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
arXiv Detail & Related papers (2022-09-30T04:15:44Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model. The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication. We prove that preconditioning has an additional benefit that has been previously unexplored. It simultaneously can reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum- Single-timescale Approximation (MSTSA) -- for tackling problems. MSTSA allows us to control the error in iterations due to inaccurate solution to the lower level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We show a proof of convergence between the Adam Adagrad and $O(d(N)/st)$ algorithms. Adam converges with the same convergence $O(d(N)/st)$ when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.