Related papers: Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

URL: http://arxiv.org/abs/2512.17473v2
Date: Mon, 22 Dec 2025 14:13:49 GMT
Title: Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions
Authors: Atharva Awari, Nicolas Gillis, Arnaud Vandaele,
Abstract summary: We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD)<n>We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.
Score: 10.559775618998044
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n}$ and a factorization rank $r \ll \min(m, n)$, NMD seeks matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. We evaluate our method on several representative nonlinear models: the rectified linear unit activation $f(x) = \max(0, x)$, suitable for nonnegative sparse data approximation, the component-wise square $f(x) = x^2$, applicable to probabilistic circuit representation, and the MinMax transform $f(x) = \min(b, \max(a, x))$, relevant for recommender systems. The proposed framework flexibly supports diverse loss functions, including least squares, $\ell_1$ norm, and the Kullback-Leibler divergence, and can be readily extended to other nonlinearities and metrics. We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.

Related papers

Surrogate to Poincaré inequalities on manifolds for dimension reduction in nonlinear feature spaces [49.1574468325115]
We aim to approximate a continuously differentiable function $u:mathbbRd rightarrow mathbbRm$ by a composition of functions $fcirc g$ where $g:mathbbRd rightarrow mathbbRm$, $mleq d$, and $f : mathbbRm rightarrow mathbbRR$.<n>For a fixed $g$, we build $f$ using classical regression methods, involving evaluations
arXiv Detail & Related papers (2025-05-03T12:37:27Z)
An approximation of the $S$ matrix for solving the Marchenko equation [0.0]
I present a new approximation of the $S$-matrix dependence on momentum $q$, formulated as a sum of a rational function and a truncated Sinc series. This approach enables pointwise determination of the $S$ matrix with specified resolution, capturing essential features such as resonance behavior with high accuracy.
arXiv Detail & Related papers (2024-10-27T11:06:28Z)
How to Inverting the Leverage Score Distribution? [16.744561210470632]
Despite leverage scores being widely used as a tool, in this paper, we study a novel problem, namely the inverting leverage score. We use iterative shrinking and the induction hypothesis to ensure global convergence rates for the Newton method. This important study on inverting statistical leverage opens up numerous new applications in interpretation, data recovery, and security.
arXiv Detail & Related papers (2024-04-21T21:36:42Z)
Accelerated Algorithms for Nonlinear Matrix Decomposition with the ReLU function [17.17890385027466]
We focus on the case where $f(cdot) = max(0, cdot)$, the rectified unit (ReLU) non-linear activation. We introduce two new algorithms: (1) aggressive accelerated NMD (A-NMD) which uses an adaptive Nesterov extrapolation to accelerate an existing algorithm, and (2) three-block NMD (3B-NMD) which parametrizes $Theta = WH$ and leads to a significant reduction in the computational cost.
arXiv Detail & Related papers (2023-05-15T14:43:27Z)
Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes. This paper provides two algorithms that improve the regret to $tildemathcal O(K2/3)$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $texttr(AR(z))$, for $R(z) = (frac1nXXT- zI_p)-1$ and $Ain mathcal M_p$ deterministic, have a standard deviation of order $O(|A|_* / sqrt n)$. Here, we show that $|mathbb E[R(z)] - tilde R(z)|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
Multiscale regression on unknown manifolds [13.752772802705978]
We construct low-dimensional coordinates on $mathcalM$ at multiple scales and perform multiscale regression by local fitting. We analyze the generalization error of our method by proving finite sample bounds in high probability on rich classes of priors. Our algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$.
arXiv Detail & Related papers (2021-01-13T15:14:31Z)
Nonparametric Learning of Two-Layer ReLU Residual Units [22.870658194212744]
We describe an algorithm that learns two-layer residual units with rectified linear unit (ReLU) activation. We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities. We prove the statistical strong consistency of our algorithm, and demonstrate the robustness and sample efficiency of our algorithm by experiments.
arXiv Detail & Related papers (2020-08-17T22:11:26Z)
Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data. We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $ntimes n$ kernel matrix that is repeatedly applied to a vector. We propose to use instead ground costs of the form $c(x,y)=-logdotpvarphi(x)varphi(y)$ where $varphi$ is a map from the ground space onto the positive orthant $RRr_+$, with $rll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
Learning nonlinear dynamical systems from a single trajectory [102.60042167341956]
We introduce algorithms for learning nonlinear dynamical systems of the form $x_t+1=sigma(Thetastarx_t)+varepsilon_t$. We give an algorithm that recovers the weight matrix $Thetastar$ from a single trajectory with optimal sample complexity and linear running time.
arXiv Detail & Related papers (2020-04-30T10:42:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.