Majorization-minimization for Sparse Nonnegative Matrix Factorization
with the $\beta$-divergence
- URL: http://arxiv.org/abs/2207.06316v4
- Date: Tue, 12 Mar 2024 11:29:58 GMT
- Title: Majorization-minimization for Sparse Nonnegative Matrix Factorization
with the $\beta$-divergence
- Authors: Arthur Marmin, José Henrique de Morais Goulart, Cédric Févotte
- Abstract summary: It is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation.
Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem.
We derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_1$-regularization or the more "aggressive" log-regularization.
- Score: 2.3787352248749376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article introduces new multiplicative updates for nonnegative matrix
factorization with the $\beta$-divergence and sparse regularization of one of
the two factors (say, the activation matrix). It is well known that the norm of
the other factor (the dictionary matrix) needs to be controlled in order to
avoid an ill-posed formulation. Standard practice consists in constraining the
columns of the dictionary to have unit norm, which leads to a nontrivial
optimization problem. Our approach leverages a reparametrization of the
original problem into the optimization of an equivalent scale-invariant
objective function. From there, we derive block-descent
majorization-minimization algorithms that result in simple multiplicative
updates for either $\ell_{1}$-regularization or the more "aggressive"
log-regularization. In contrast with other state-of-the-art methods, our
algorithms are universal in the sense that they can be applied to any
$\beta$-divergence (i.e., any value of $\beta$) and that they come with
convergence guarantees. We report numerical comparisons with existing heuristic
and Lagrangian methods using various datasets: face images, an audio
spectrogram, hyperspectral data, and song play counts. We show that our methods
obtain solutions of similar quality at convergence (similar objective values)
but with significantly reduced CPU times.
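For concreteness, below is a minimal NumPy sketch of the standard heuristic baseline the abstract alludes to: multiplicative updates for $\beta$-NMF with an $\ell_1$ penalty on the activation matrix, where the dictionary columns are renormalized to unit norm after each update. This is an illustrative assumption-laden sketch, not the paper's scale-invariant block-descent MM algorithm; the function name, the penalty weight `lam`, and the other parameters are hypothetical.

```python
import numpy as np

def sparse_beta_nmf(V, rank, beta=1.0, lam=0.1, n_iter=200, eps=1e-12, seed=0):
    """Heuristic multiplicative updates for NMF with the beta-divergence
    (V ~ W @ H, all entries nonnegative) and an l1 penalty on H.

    NOTE: illustrative sketch only. Columns of W are renormalized to unit
    l1 norm after each sweep, which is the standard practice the paper
    improves upon; this is NOT the paper's block-descent MM scheme.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps

    for _ in range(n_iter):
        # Update H with the sparsity penalty lam added to the denominator.
        WH = W @ H + eps
        H *= (W.T @ (V * WH ** (beta - 2))) / (W.T @ (WH ** (beta - 1)) + lam)

        # Update W (no penalty on the dictionary).
        WH = W @ H + eps
        W *= ((V * WH ** (beta - 2)) @ H.T) / ((WH ** (beta - 1)) @ H.T)

        # Renormalize dictionary columns to unit l1 norm, rescaling H to keep W @ H unchanged.
        scale = W.sum(axis=0, keepdims=True) + eps
        W /= scale
        H *= scale.T

    return W, H
```

For $\beta = 1$ these updates reduce to the familiar Kullback-Leibler multiplicative rules; the rescaling step is exactly the norm-control heuristic whose convergence the paper's reparametrized MM algorithms are designed to guarantee.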
Related papers
- Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Optimization without Retraction on the Random Generalized Stiefel Manifold [9.301728976515255]
We propose a cheap iterative method that solves the optimization problem while having access only to random estimates of $B$.
Our method does not enforce the constraint in every iteration; instead, it produces iterations that converge to critical points on the generalized Stiefel manifold defined in expectation.
arXiv Detail & Related papers (2024-05-02T19:55:30Z) - Mirror Natural Evolution Strategies [10.495496415022064]
We focus on the theory of zeroth-order optimization which utilizes both the first-order and second-order information approximated by the zeroth-order queries.
We show that the estimated covariance matrix of MiNES converges to the inverse of the Hessian matrix of the objective function.
arXiv Detail & Related papers (2023-08-01T11:45:24Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method to solve large-scale multimedia retrieval issue.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Joint Majorization-Minimization for Nonnegative Matrix Factorization
with the $\beta$-divergence [4.468952886990851]
This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function.
We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts.
arXiv Detail & Related papers (2021-06-29T09:58:21Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)