Implicit Bias and Convergence of Matrix Stochastic Mirror Descent
- URL: http://arxiv.org/abs/2602.18997v2
- Date: Fri, 27 Feb 2026 03:18:41 GMT
- Title: Implicit Bias and Convergence of Matrix Stochastic Mirror Descent
- Authors: Danil Akhtiamov, Reza Ghane, Omead Pooladzandi, Babak Hassibi
- Abstract summary: We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions. We prove that SMD with matrix mirror functions $\psi(\cdot)$ converges exponentially to a global interpolator. These findings reveal how matrix mirror maps dictate inductive bias in high-dimensional, multi-output problems.
- Score: 14.235906242589659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove that SMD with matrix mirror functions $\psi(\cdot)$ converges exponentially to a global interpolator. Furthermore, we generalize classical implicit bias results of vector SMD by demonstrating that the matrix SMD algorithm converges to the unique solution minimizing the Bregman divergence induced by $\psi(\cdot)$ from initialization subject to interpolating the data. These findings reveal how matrix mirror maps dictate inductive bias in high-dimensional, multi-output problems.
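As a concrete illustration of the algorithm the abstract describes, the sketch below runs matrix SMD on an overparameterized linear model $Y \approx XW$ using the entrywise potential $\psi(W) = \frac{1}{q}\sum_{ij} |W_{ij}|^q$, a standard choice in the vector SMD literature. The model, potential, and hyperparameters here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def matrix_smd(X, Y, q=3.0, lr=0.01, steps=5000, seed=0):
    """Stochastic Mirror Descent with a matrix parameter W (d x k).

    Potential (an illustrative choice): psi(W) = (1/q) * sum_ij |W_ij|^q,
    so the mirror map and its inverse act entrywise:
        grad psi(W)        = sign(W) * |W|^(q-1)
        (grad psi)^{-1}(Z) = sign(Z) * |Z|^(1/(q-1))
    The update keeps a dual variable Z = grad psi(W) and accumulates
    stochastic gradients of the squared loss there.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W = 0.01 * rng.standard_normal((d, k))           # initialization W_0
    Z = np.sign(W) * np.abs(W) ** (q - 1.0)          # dual variable grad psi(W)
    for _ in range(steps):
        i = rng.integers(n)                          # sample one training point
        x, y = X[i:i + 1], Y[i:i + 1]
        grad = x.T @ (x @ W - y)                     # grad of 0.5*||xW - y||^2
        Z -= lr * grad                               # mirror (dual) update
        W = np.sign(Z) * np.abs(Z) ** (1.0 / (q - 1.0))  # map back to primal
    return W

# Overparameterized regime: d*k = 400 parameters vs. n*k = 300 equations.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 40))
Y = X @ rng.standard_normal((40, 10))
W = matrix_smd(X, Y)
print(np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))  # shrinks toward 0
```

Because the gradients accumulate in the dual variable $Z = \nabla\psi(W)$, the choice of $\psi$ determines which interpolator the iterates drift toward; that dependence is the implicit bias the paper characterizes.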
Related papers
- Testing for latent structure via the Wilcoxon--Wigner random matrix of normalized rank statistics [0.20052993723676893]
We introduce and systematically study certain symmetric matrices, called Wilcoxon--Wigner random matrices. We establish that the leading eigenvalue and corresponding eigenvector of Wilcoxon--Wigner random matrices admit fluctuations with explicit scaling terms. These results enable a rigorous, parameter-free and distribution-free spectral methodology for addressing two hypothesis testing problems.
arXiv Detail & Related papers (2025-12-21T23:49:22Z)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy [57.54306942529943]
We propose an Approximately Orthogonal Fine-Tuning (AOFT) strategy for representing the low-rank weight matrices. Our method achieves competitive performance across a range of downstream image classification tasks.
arXiv Detail & Related papers (2025-07-17T16:09:05Z)
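The summary above does not spell out the mechanism. One plausible reading, shown below purely as a hypothetical sketch, is a LoRA-style low-rank adapter whose factor is nudged toward orthonormal columns by the soft penalty $\|B^\top B - I\|_F^2$; the parameterization, penalty, and all names here are assumptions, not AOFT as published.

```python
import numpy as np

def orthogonality_penalty(B):
    """|| B^T B - I ||_F^2: zero iff the columns of B are orthonormal."""
    G = B.T @ B - np.eye(B.shape[1])
    return float(np.sum(G * G))

# Hypothetical LoRA-style adapter on a frozen weight: W = W0 + A @ B.T.
rng = np.random.default_rng(0)
d, r = 64, 4
A = 0.01 * rng.standard_normal((d, r))
B = rng.standard_normal((d, r)) / np.sqrt(d)

lr = 0.05
print("penalty before:", orthogonality_penalty(B))
for _ in range(500):
    # d/dB ||B^T B - I||_F^2 = 4 B (B^T B - I); in real fine-tuning this
    # term would be added to the task-loss gradient with some weight.
    B -= lr * 4.0 * B @ (B.T @ B - np.eye(r))
print("penalty after: ", orthogonality_penalty(B))
```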
- Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications [34.781211988554425]
We provide rigorous equations characterizing the Bayes-optimal learning performance from a number of samples. We mathematically establish predictions previously obtained via non-rigorous methods from statistical physics.
arXiv Detail & Related papers (2025-03-18T10:36:30Z)
- Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435]
Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. This paper provides a comprehensive and unified understanding of the matrix logarithm and power from a Riemannian geometry perspective.
arXiv Detail & Related papers (2024-07-15T07:11:44Z)
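For background on the matrix functions this paper analyzes, the sketch below shows the standard way matrix-logarithm normalization is implemented in a GCP layer: form a channel covariance and apply $\log$ to its eigenvalues via an eigendecomposition. This is a generic illustration, not the paper's code.

```python
import numpy as np

def matrix_log_pooling(F, eps=1e-5):
    """Covariance pooling with matrix-logarithm normalization.

    F: (n, c) array of n spatial features with c channels.
    Returns log(Cov + eps*I), computed via eigendecomposition:
    for SPD S = U diag(s) U^T,  log(S) = U diag(log s) U^T.
    """
    F = F - F.mean(axis=0, keepdims=True)        # center the features
    S = F.T @ F / (F.shape[0] - 1)               # (c, c) sample covariance
    S += eps * np.eye(S.shape[0])                # ensure positive definiteness
    s, U = np.linalg.eigh(S)                     # symmetric eigendecomposition
    return (U * np.log(s)) @ U.T                 # U diag(log s) U^T

F = np.random.default_rng(0).standard_normal((49, 8))  # e.g. 7x7 map, 8 channels
L = matrix_log_pooling(F)
print(L.shape, np.allclose(L, L.T))  # (8, 8) True: a symmetric descriptor
```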
- Regularized Projection Matrix Approximation with Applications to Community Detection [1.3761665705201904]
This paper introduces a regularized projection matrix approximation framework designed to recover cluster information from the affinity matrix.
We investigate three distinct penalty functions, each specifically tailored to address bounded, positive, and sparse scenarios.
Numerical experiments conducted on both synthetic and real-world datasets reveal that our regularized projection matrix approximation approach significantly outperforms state-of-the-art methods in clustering performance.
arXiv Detail & Related papers (2024-05-26T15:18:22Z)
- The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models [37.6314945221565]
Deep networks are known to generalize well to unseen data.
Regularization properties ensure that interpolating solutions with "good" properties are found.
We present simulation results that validate the theory and introduce two data models.
arXiv Detail & Related papers (2023-02-18T22:23:42Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Adversarially-Trained Nonnegative Matrix Factorization [77.34726150561087]
We consider an adversarially-trained version of nonnegative matrix factorization.
In our formulation, an attacker adds an arbitrary matrix of bounded norm to the given data matrix.
We design efficient algorithms inspired by adversarial training to optimize for dictionary and coefficient matrices.
arXiv Detail & Related papers (2021-04-10T13:13:17Z)
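To make the formulation concrete: for an attacker with a Frobenius-norm budget, the worst-case perturbation against the current factors has a closed form (it inflates the residual), and the factors can then be updated against the perturbed data. The sketch below combines that inner step with standard Lee--Seung multiplicative updates; it illustrates the idea, not the paper's algorithm.

```python
import numpy as np

def adversarial_nmf(X, r=5, eps_atk=0.5, iters=300, tol=1e-10, seed=0):
    """Toy adversarially-trained NMF (an illustration of the idea only).

    Inner step: for an attacker with Frobenius budget eps_atk, the
    perturbation maximizing ||X + Delta - W @ H||_F is
        Delta = eps_atk * (X - W @ H) / ||X - W @ H||_F.
    Outer step: Lee-Seung multiplicative updates on the perturbed data,
    clipped to stay nonnegative (a simplification, not the paper's scheme).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        R = X - W @ H
        Delta = eps_atk * R / max(np.linalg.norm(R), tol)  # attacker's best response
        V = np.maximum(X + Delta, 0.0)           # keep perturbed data nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + tol)     # multiplicative updates
        W *= (V @ H.T) / (W @ H @ H.T + tol)
    return W, H

X = np.random.default_rng(1).random((40, 30))
W, H = adversarial_nmf(X)
print("residual:", np.linalg.norm(X - W @ H))
```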
- Robust Low-rank Matrix Completion via an Alternating Manifold Proximal Gradient Continuation Method [47.80060761046752]
Robust low-rank matrix completion (RMC) has been studied extensively for computer vision, signal processing and machine learning applications.
This problem aims to decompose a partially observed matrix into the superposition of a low-rank matrix and a sparse matrix, where the sparse matrix captures the grossly corrupted entries of the matrix.
A widely used approach to tackle RMC is to consider a convex formulation, which minimizes the nuclear norm of the low-rank matrix (to promote low-rankness) and the l1 norm of the sparse matrix (to promote sparsity).
In this paper, motivated by some recent works on low-
arXiv Detail & Related papers (2020-08-18T04:46:22Z)
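The convex formulation quoted above is typically attacked with two proximal operators: singular-value thresholding for the nuclear norm and entrywise soft-thresholding for the $\ell_1$ norm. The sketch below applies them in a simple proximal-gradient loop with the interpolation constraint relaxed to a penalty; it illustrates the baseline convex approach, not the alternating manifold proximal gradient method this paper proposes.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft-thresholding: prox of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rmc(M_obs, mask, lam=0.1, step=0.5, iters=300):
    """Toy solver for  min ||L||_* + lam * ||S||_1
    with P_Omega(L + S) = P_Omega(M) relaxed to a quadratic penalty."""
    L = np.zeros_like(M_obs)
    S = np.zeros_like(M_obs)
    for _ in range(iters):
        R = mask * (M_obs - L - S)          # residual on observed entries
        L = svt(L + step * R, step)          # low-rank proximal step
        S = soft(S + step * R, step * lam)   # sparse proximal step
    return L, S

rng = np.random.default_rng(0)
truth = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 30))
mask = (rng.random((30, 30)) < 0.6).astype(float)
L, S = rmc(truth * mask, mask)
print(np.linalg.norm(mask * (truth - L - S)) / np.linalg.norm(mask * truth))
```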
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Information-Theoretic Limits for the Matrix Tensor Product [8.206394018475708]
This paper studies a high-dimensional inference problem involving the matrix tensor product of random matrices.
On the technical side, this paper introduces some new techniques for the analysis of high-dimensional matrix-preserving signals.
arXiv Detail & Related papers (2020-05-22T17:03:48Z)
- Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
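For context on what iterative sketching computes: instead of solving the full normal equations, one preconditions Newton-like steps with a sketched Hessian $(SA)^\top (SA)$ while keeping exact gradients. The sketch below uses a plain Gaussian sketch for brevity in place of the subsampled randomized Hadamard transform the paper analyzes, and omits the refreshed sketches and tuned step sizes its optimality results rely on.

```python
import numpy as np

def iterative_sketching(A, b, m=1000, iters=20, seed=0):
    """Iterative Hessian sketching for min_x ||A x - b||^2.

    A single sketch S is drawn once (Gaussian here for simplicity), and
    each iteration solves a small d x d system with the exact gradient:
        x_{t+1} = x_t - ((S A)^T (S A))^{-1} A^T (A x_t - b).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # sketching matrix
    SA = S @ A
    H = SA.T @ SA                                   # sketched Hessian (d x d)
    x = np.zeros(d)
    for _ in range(iters):
        g = A.T @ (A @ x - b)                       # exact gradient
        x -= np.linalg.solve(H, g)                  # sketched Newton step
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((5000, 50))
b = rng.standard_normal(5000)
x = iterative_sketching(A, b)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print("error vs. exact solution:", np.linalg.norm(x - x_star))
```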