Related papers: An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function

An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function

URL: http://arxiv.org/abs/2503.23832v1
Date: Mon, 31 Mar 2025 08:27:41 GMT
Title: An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function
Authors: Nicolas Gillis, Margherita Porcelli, Giovanni Seraghiti,
Abstract summary: We show that the two formulations may yield different low-rank $Theta in particular.<n>We also consider another alternative model, called 3B-ReLUNMD, which parameterizes $Theta$.
Score: 12.059960289576154
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Nonlinear matrix decomposition (NMD) with the ReLU function, denoted ReLU-NMD, is the following problem: given a sparse, nonnegative matrix $X$ and a factorization rank $r$, identify a rank-$r$ matrix $\Theta$ such that $X\approx \max(0,\Theta)$. This decomposition finds application in data compression, matrix completion with entries missing not at random, and manifold learning. The standard ReLU-NMD model minimizes the least squares error, that is, $\|X - \max(0,\Theta)\|_F^2$. The corresponding optimization problem is nondifferentiable and highly nonconvex. This motivated Saul to propose an alternative model, Latent-ReLU-NMD, where a latent variable $Z$ is introduced and satisfies $\max(0,Z)=X$ while minimizing $\|Z - \Theta\|_F^2$ (``A nonlinear matrix decomposition for mining the zeros of sparse data'', SIAM J. Math. Data Sci., 2022). Our first contribution is to show that the two formulations may yield different low-rank solutions $\Theta$; in particular, we show that Latent-ReLU-NMD can be ill-posed when ReLU-NMD is not, meaning that there are instances in which the infimum of Latent-ReLU-NMD is not attained while that of ReLU-NMD is. We also consider another alternative model, called 3B-ReLU-NMD, which parameterizes $\Theta=WH$, where $W$ has $r$ columns and $H$ has $r$ rows, allowing one to get rid of the rank constraint in Latent-ReLU-NMD. Our second contribution is to prove the convergence of a block coordinate descent (BCD) applied to 3B-ReLU-NMD and referred to as BCD-NMD. Our third contribution is a novel extrapolated variant of BCD-NMD, dubbed eBCD-NMD, which we prove is also convergent under mild assumptions. We illustrate the significant acceleration effect of eBCD-NMD compared to BCD-NMD, and also show that eBCD-NMD performs well against the state of the art on synthetic and real-world data sets.

Related papers

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions [10.559775618998044]
We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD)<n>We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.
arXiv Detail & Related papers (2025-12-19T11:40:06Z)
A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization [0.7373617024876725]
Lipschitz-of-$nabla f$. $mathcalS_k|p$. $mathcalS_k|p$. $nabla f$. $mathcalS_k|p$.
arXiv Detail & Related papers (2024-09-28T18:16:32Z)
A Momentum Accelerated Algorithm for ReLU-based Nonlinear Matrix Decomposition [0.0]
We propose a Tikhonov regularized ReLU-NMD model, referred to as ReLU-NMD-T. We introduce a momentum accelerated algorithm for handling the ReLU-NMD-T model. Our numerical experiments on real-world datasets show the effectiveness of the proposed model and algorithm.
arXiv Detail & Related papers (2024-02-04T10:43:35Z)
One-sided Matrix Completion from Two Observations Per Row [95.87811229292056]
We propose a natural algorithm that involves imputing the missing values of the matrix $XTX$. We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing.
arXiv Detail & Related papers (2023-06-06T22:35:16Z)
Accelerated Algorithms for Nonlinear Matrix Decomposition with the ReLU function [17.17890385027466]
We focus on the case where $f(cdot) = max(0, cdot)$, the rectified unit (ReLU) non-linear activation. We introduce two new algorithms: (1) aggressive accelerated NMD (A-NMD) which uses an adaptive Nesterov extrapolation to accelerate an existing algorithm, and (2) three-block NMD (3B-NMD) which parametrizes $Theta = WH$ and leads to a significant reduction in the computational cost.
arXiv Detail & Related papers (2023-05-15T14:43:27Z)
Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight. We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z)
Learning (Very) Simple Generative Models Is Hard [45.13248517769758]
We show that no-time algorithm can solve problem even when output coordinates of $mathbbRdtobbRd'$ are one-hidden-layer ReLU networks with $mathrmpoly(d)$ neurons. Key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with neurally-bounded slopes such that the pushforward of $mathcalN(0,1)$ under $f$ matches all low-degree moments of $mathcal
arXiv Detail & Related papers (2022-05-31T17:59:09Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements. We derive an information-theoretic lower bound for the minimax risk under this setting. We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
Low-rank Matrix Recovery With Unknown Correspondence [62.634051913953485]
We show that it is possible to recover $M$ via solving a nuclear norm minimization problem under a proper low-rank condition on $M$, with provable non-asymptotic error bound for the recovery of $M$. Experiments on simulated data, the MovieLens 100K dataset and Yale B database show that $textM3textO achieves state-of-the-art performance over several baselines and can recover the ground-truth correspondence with high accuracy.
arXiv Detail & Related papers (2021-10-15T09:27:50Z)
Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally [58.463668865380946]
We study the statistical limits of Imitation Learning in episodic Markov Decision Processes (MDPs) with a state space $mathcalS$. We establish an upper bound $O(|mathcalS|H3/2/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al ( 2020) We show the minimax suboptimality grows as $Omega( H3/2/N)$ when $mathcalS|geq 3$ while the unknown-transition setting suffers from a larger sharp rate
arXiv Detail & Related papers (2021-02-25T15:50:19Z)
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model. We propose a new, computationally efficient algorithm with linear function approximation named $textUCRL-VTR+$ for the aforementioned linear mixture MDPs. To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z)
Nonparametric Learning of Two-Layer ReLU Residual Units [22.870658194212744]
We describe an algorithm that learns two-layer residual units with rectified linear unit (ReLU) activation. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. We prove the statistical strong consistency of our algorithm, and demonstrate the robustness and sample efficiency of our algorithm by experiments.
arXiv Detail & Related papers (2020-08-17T22:11:26Z)
FRMDN: Flow-based Recurrent Mixture Density Network [3.007949058551534]
In this paper, we generalize recurrent mixture density networks by defining a Gaussian mixture model on a non-linearly transformed target sequence in each time-step. We observed that this model significantly improves the fit to image sequences measured by the log-likelihood. We also applied the proposed model on some speech and image data, and observed that the model has significant modeling power outperforming other state-of-the-art methods in terms of the log-likelihood.
arXiv Detail & Related papers (2020-08-05T14:05:37Z)
Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $widetildeTheta(sqrtd_mathbfu2 d_mathbfx T)$, where $T$ is the number of time steps, $d_mathbfu$ is the dimension of the input space, and $d_mathbfx$ is the dimension of the system state. Our lower bounds rule out the possibility of a $mathrmpoly(logT)$-regret algorithm, which had been
arXiv Detail & Related papers (2020-01-27T03:44:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.