On subdifferential chain rule of matrix factorization and beyond
- URL: http://arxiv.org/abs/2410.05022v1
- Date: Mon, 7 Oct 2024 13:24:59 GMT
- Title: On subdifferential chain rule of matrix factorization and beyond
- Authors: Jiewen Guan, Anthony Man-Cho So
- Abstract summary: We show equality-type Clarke subdifferential chain rules of matrix factorization and factorization machine.
Some tensor generalizations and neural extensions are also discussed, albeit they remain mostly open.
- Score: 20.82938951566065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study equality-type Clarke subdifferential chain rules of matrix factorization and factorization machine. Specifically, we show for these problems that provided the latent dimension is larger than some multiple of the problem size (i.e., slightly overparameterized) and the loss function is locally Lipschitz, the subdifferential chain rules hold everywhere. In addition, we examine the tightness of the analysis through some interesting constructions and make some important observations from the perspective of optimization; e.g., we show that for all problems of this type, computing a stationary point is trivial. Some tensor generalizations and neural extensions are also discussed, albeit they remain mostly open.
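As a rough illustration of what an equality-type chain rule looks like for matrix factorization, here is a minimal sketch. The parameterization $f(X, Y) = h(XY^\top)$ and the symbols $X$, $Y$, $h$, $G$, $m$, $n$, $r$ are assumptions made for this example and are not taken from the abstract; the paper's exact setting and conditions may differ.

```latex
% Assumed setting (illustrative, not quoted from the paper):
%   X \in \mathbb{R}^{m \times r},\; Y \in \mathbb{R}^{n \times r},\;
%   h : \mathbb{R}^{m \times n} \to \mathbb{R} locally Lipschitz,\;
%   f(X, Y) = h(X Y^\top).
\[
  \partial f(X, Y)
  \;=\;
  \bigl\{\, (G Y,\; G^\top X) \;:\; G \in \partial h(X Y^\top) \,\bigr\}.
\]
% Evaluating the right-hand side at the origin X = 0, Y = 0:
\[
  \partial f(0, 0)
  \;=\;
  \bigl\{\, (G \cdot 0,\; G^\top \cdot 0) \;:\; G \in \partial h(0) \,\bigr\}
  \;=\;
  \{ (0, 0) \}.
\]
```

Under these assumptions the origin is a Clarke stationary point for any locally Lipschitz loss, which is one way to read the abstract's remark that computing a stationary point is trivial.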
Related papers
- $\ell_1$-norm rank-one symmetric matrix factorization has no spurious second-order stationary points [20.82938951566065]
We show that any second-order stationary point (and thus local minimizer) of the problem is actually globally optimal.
Our techniques can potentially be applied to analyze the optimization landscapes of a variety of other more sophisticated nonsmooth learning problems.
arXiv Detail & Related papers (2024-10-07T13:25:37Z) - Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435]
Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations.
arXiv Detail & Related papers (2024-07-15T07:11:44Z) - The polarization hierarchy for polynomial optimization over convex bodies, with applications to nonnegative matrix rank [0.6963971634605796]
We construct a convergent family of outer approximations for the problem of optimizing functions over convex bodies subject to constraints.
A numerical implementation of the third level of the hierarchy is shown to give rise to a very tight approximation for this problem.
arXiv Detail & Related papers (2024-06-13T18:00:09Z) - High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shifts.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance.
For bias, we analyze the regularization of the arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales.
arXiv Detail & Related papers (2024-06-05T12:03:27Z) - The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Function Space and Critical Points of Linear Convolutional Networks [4.483341215742946]
We study the geometry of linear networks with one-dimensional convolutional layers.
We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points.
arXiv Detail & Related papers (2023-04-12T10:15:17Z) - Approximation of optimization problems with constraints through kernel Sum-Of-Squares [77.27820145069515]
We show that pointwise inequalities are turned into equalities within a class of nonnegative kSoS functions.
We also show that focusing on pointwise equality constraints enables the use of scattering inequalities to mitigate the curse of dimensionality in sampling the constraints.
arXiv Detail & Related papers (2023-01-16T10:30:04Z) - Identifiability in Exact Two-Layer Sparse Matrix Factorization [0.0]
Sparse matrix factorization is the problem of approximating a matrix Z by a product of L sparse factors X(L) X(L-1).
This paper focuses on identifiability issues that appear in this problem, in view of better understanding under which sparsity constraints the problem is well-posed.
arXiv Detail & Related papers (2021-10-04T07:56:37Z) - Exact recursive calculation of circulant permanents: A band of different diagonals inside a uniform matrix [0.0]
The proposed system of linear recurrence equations with variable coefficients provides a powerful tool for the analysis of the circulants.
Its solution would be tremendously important for a unified analysis of a wide range of nature's #P-hard problems.
arXiv Detail & Related papers (2021-09-03T21:56:14Z) - Sparse Quantized Spectral Clustering [85.77233010209368]
We exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations.
We show that very little change occurs in the informative eigenstructure even under drastic sparsification/quantization.
arXiv Detail & Related papers (2020-10-03T15:58:07Z) - Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.