Joint Majorization-Minimization for Nonnegative Matrix Factorization
with the $\beta$-divergence
- URL: http://arxiv.org/abs/2106.15214v4
- Date: Mon, 17 Apr 2023 08:30:42 GMT
- Title: Joint Majorization-Minimization for Nonnegative Matrix Factorization
with the $\beta$-divergence
- Authors: Arthur Marmin and José Henrique de Morais Goulart and Cédric Févotte
- Abstract summary: This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function.
We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts.
- Score: 4.468952886990851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article proposes new multiplicative updates for nonnegative matrix
factorization (NMF) with the $\beta$-divergence objective function. Our new
updates are derived from a joint majorization-minimization (MM) scheme, in
which an auxiliary function (a tight upper bound of the objective function) is
built for the two factors jointly and minimized at each iteration. This is in
contrast with the classic approach in which a majorizer is derived for each
factor separately. Like that classic approach, our joint MM algorithm also
results in multiplicative updates that are simple to implement. They however
yield a significant drop of computation time (for equally good solutions), in
particular for some $\beta$-divergences of important applicative interest, such
as the squared Euclidean distance and the Kullback-Leibler or Itakura-Saito
divergences. We report experimental results using diverse datasets: face
images, an audio spectrogram, hyperspectral data and song play counts.
Depending on the value of $\beta$ and on the dataset, our joint MM approach can
yield CPU time reductions from about $13\%$ to $78\%$ in comparison to the
classic alternating scheme.
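For reference, the sketch below shows the classic alternating multiplicative updates for $\beta$-NMF that the joint MM scheme is benchmarked against; it is a minimal NumPy illustration (the function names, random initialization, and eps safeguard are illustrative choices), not the joint updates proposed in the paper. The exponent gamma(beta) is the standard choice from the MM literature for the classic per-factor majorizers.

```python
import numpy as np

def gamma(beta):
    """Exponent of the classic MM-derived multiplicative updates."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0

def beta_nmf_mu(V, K, beta=1.0, n_iter=200, eps=1e-12, seed=0):
    """Classic alternating multiplicative updates for beta-NMF.

    One majorizer is built per factor (the classic scheme the paper
    compares against), not the joint majorizer proposed in the paper.
    beta = 2: squared Euclidean, beta = 1: Kullback-Leibler,
    beta = 0: Itakura-Saito.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    g = gamma(beta)
    for _ in range(n_iter):
        # Update H with W fixed
        WH = W @ H
        H *= ((W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)) ** g
        # Update W with H fixed
        WH = W @ H
        W *= (((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)) ** g
    return W, H

# Example: KL-divergence factorization (beta = 1) of a random nonnegative matrix
V = np.abs(np.random.default_rng(1).random((50, 40))) + 1e-12
W, H = beta_nmf_mu(V, K=5, beta=1.0)
```

The joint-MM updates introduced in the paper keep this simple multiplicative form but are derived from a single majorizer built for both factors jointly, which is where the reported CPU-time reductions over the alternating scheme come from.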
Related papers
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Block Majorization Minimization with Extrapolation and Application to
$\beta$-NMF [17.97403029273243]
We propose a Block Majorization Minimization method with Extrapolation (BMMe) for solving a class of multi-convex optimization problems.
We show that BMMe can be reformulated as a block mirror descent method with the Bregman divergence adaptively updated at each iteration.
We empirically illustrate the significant acceleration provided by BMMe for $\beta$-NMF through extensive experiments.
arXiv Detail & Related papers (2024-01-12T15:52:02Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - FeDXL: Provable Federated Learning for Deep X-Risk Optimization [105.17383135458897]
We tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing algorithms are applicable.
The challenges for designing an FL algorithm for X-risks lie in the non-decomposability of the objective over multiple machines and the interdependency between different machines.
arXiv Detail & Related papers (2022-10-26T00:23:36Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Majorization-minimization for Sparse Nonnegative Matrix Factorization
with the $\beta$-divergence [2.3787352248749376]
When the activation coefficients are sparsely regularized, it is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation.
Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem.
We derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_1$-regularization or the more "aggressive" log-regularization (a heuristic $\ell_1$-penalized sketch is given after this list).
arXiv Detail & Related papers (2022-07-13T16:09:29Z) - Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better
than Dot-Product Self-Attention [9.205331586765613]
Self-attention is a widely used building block in neural modeling to mix long-range data elements.
We propose a novel scalable and effective mixing building block called Paramixer.
The overall computing cost of the new building block is as low as $O(N \log N)$.
arXiv Detail & Related papers (2022-04-22T12:35:08Z) - Probabilistic semi-nonnegative matrix factorization: a Skellam-based
framework [0.7310043452300736]
We present a new probabilistic model to address semi-nonnegative matrix factorization (SNMF), called Skellam-SNMF.
It is a hierarchical generative model consisting of prior components, Skellam-distributed hidden variables and observed data.
Two inference algorithms are derived: an Expectation-Maximization (EM) algorithm for maximum a posteriori estimation and a Variational Bayes EM (VBEM) algorithm for full Bayesian inference.
arXiv Detail & Related papers (2021-07-07T15:56:22Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)