MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging
Auxiliary Information
- URL: http://arxiv.org/abs/2303.02566v2
- Date: Mon, 12 Feb 2024 20:13:17 GMT
- Title: MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging
Auxiliary Information
- Authors: Zhiwei Wang, Fa Zhang, Cong Zheng, Xianghong Hu, Mingxuan Cai, Can
Yang
- Abstract summary: We propose to integrate gradient boosted trees into the probabilistic matrix factorization framework to leverage auxiliary information (MFAI).
MFAI naturally inherits several salient features of gradient boosted trees, such as the capability of flexibly modeling nonlinear relationships.
MFAI is computationally efficient and scalable to large datasets by exploiting variational inference.
- Score: 8.42894516984735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In various practical situations, matrix factorization methods suffer from
poor data quality, such as high data sparsity and low signal-to-noise ratio
(SNR). Here, we consider a matrix factorization problem by utilizing auxiliary
information, which is massively available in real-world applications, to
overcome the challenges caused by poor data quality. Unlike existing methods
that mainly rely on simple linear models to combine auxiliary information with
the main data matrix, we propose to integrate gradient boosted trees in the
probabilistic matrix factorization framework to effectively leverage auxiliary
information (MFAI). Thus, MFAI naturally inherits several salient features of
gradient boosted trees, such as the capability of flexibly modeling nonlinear
relationships and robustness to irrelevant features and missing values in
auxiliary information. The parameters in MFAI can be automatically determined
under the empirical Bayes framework, making it adaptive to the utilization of
auxiliary information and immune to overfitting. Moreover, MFAI is
computationally efficient and scalable to large datasets by exploiting
variational inference. We demonstrate the advantages of MFAI through
comprehensive numerical results from simulation studies and real data analyses.
Our approach is implemented in the R package mfair available at
https://github.com/YangLabHKUST/mfair.
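The core idea of combining a low-rank model Y = U V^T with boosted trees that predict the row factors from auxiliary features can be sketched in a few lines. The following is a simplified alternating scheme, not the empirical-Bayes/variational algorithm of the paper: it uses scikit-learn's GradientBoostingRegressor as the tree component, and the shrinkage weight `lam` and all problem dimensions are illustrative assumptions.

```python
# Hedged sketch: alternating least squares for Y ~ U V^T where the row
# factors U are shrunk toward predictions from gradient boosted trees
# fitted on auxiliary features X. Not the authors' MFAI algorithm.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, m, k = 60, 40, 2
X = rng.normal(size=(n, 3))                 # auxiliary information per row
U_true = np.tanh(X[:, :k])                  # nonlinear link: factors depend on X
V_true = rng.normal(size=(m, k))
Y = U_true @ V_true.T + 0.1 * rng.normal(size=(n, m))

U = rng.normal(size=(n, k))
V = rng.normal(size=(m, k))
lam = 1.0                                   # shrinkage toward the tree fit (assumed)
for _ in range(20):
    # V-step: ordinary least squares given U
    V = np.linalg.solve(U.T @ U + 1e-6 * np.eye(k), U.T @ Y).T
    # tree-step: one boosted regressor per latent dimension
    F = np.column_stack([
        GradientBoostingRegressor(n_estimators=50).fit(X, U[:, j]).predict(X)
        for j in range(k)
    ])
    # U-step: ridge-like update pulling U toward the tree predictions F
    U = np.linalg.solve(V.T @ V + lam * np.eye(k), (Y @ V + lam * F).T).T

err = np.linalg.norm(Y - U @ V.T) / np.linalg.norm(Y)
print(round(err, 3))
```

The U-step minimizes ||Y - U V^T||^2 + lam ||U - F||^2 in closed form, which is how auxiliary information enters the factorization in this toy version; MFAI instead determines the analogous trade-off automatically under the empirical Bayes framework.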
Related papers
- LaFA: Latent Feature Attacks on Non-negative Matrix Factorization [3.45173496229657]
We introduce a novel class of attacks in NMF termed Latent Feature Attacks (LaFA)
Our method utilizes the Feature Error (FE) loss directly on the latent features.
To handle large peak-memory overhead gradient from back-propagation in FE attacks, we develop a method based on implicit differentiation.
arXiv Detail & Related papers (2024-08-07T17:13:46Z)
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
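A minimal spectral estimator of the kind referred to here is plain rank-r SVD truncation of the noisy observation matrix. The sketch below is an illustration of that baseline under assumed dimensions and noise level, not the paper's exact estimator or its error analysis.

```python
# Hedged illustration: truncated SVD as a simple spectral estimator of a
# noisy low-rank matrix (e.g. an expected-reward matrix in low-rank bandits).
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 50, 50, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))   # true rank-r matrix
Y = M + 0.05 * rng.normal(size=(n, m))                  # noisy observations

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
M_hat = U[:, :r] * s[:r] @ Vt[:r]                       # keep top r singular triplets

entrywise = np.max(np.abs(M_hat - M))                   # entry-wise (sup-norm) error
print(round(entrywise, 3))
```

The paper's point is that such spectral approaches recover the singular subspaces with nearly minimal entry-wise error, a stronger guarantee than the usual Frobenius-norm bounds.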
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis as well as a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Besides the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
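The precision-matrix claim can be illustrated with a small Gaussian sampling example: given a precision matrix Lambda, one can draw samples from N(0, Lambda^{-1}) using only a Cholesky factor and a triangular system, with no explicit inversion. This is a generic sketch of that idea, not the MFA training procedure; the explicit inverse below is used only to verify the result.

```python
# Hedged sketch: sampling from a Gaussian parameterized by its precision
# matrix Lambda without inverting it. If Lam = L L^T (Cholesky), then
# x = L^{-T} z with z ~ N(0, I) has covariance Lam^{-1}.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
Lam = A @ A.T + 5 * np.eye(5)          # a symmetric positive-definite precision

L = np.linalg.cholesky(Lam)            # Lam = L L^T
z = rng.normal(size=(5, 10000))
x = np.linalg.solve(L.T, z)            # solves L^T x = z; x ~ N(0, Lam^{-1})

# verification only: compare the empirical covariance to the true inverse
emp_cov = x @ x.T / z.shape[1]
err = np.abs(emp_cov - np.linalg.inv(Lam)).max()
print(round(err, 4))
```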
arXiv Detail & Related papers (2023-08-26T06:12:33Z)
- Quadratic Matrix Factorization with Applications to Manifold Learning [1.6795461001108094]
We propose a quadratic matrix factorization (QMF) framework to learn the curved manifold on which the dataset lies.
Algorithmically, we propose an alternating minimization algorithm to optimize QMF and establish its theoretical convergence properties.
Experiments on a synthetic manifold learning dataset and two real datasets, including the MNIST handwritten digit dataset and a cryogenic electron microscopy dataset, demonstrate the superiority of the proposed method over its competitors.
arXiv Detail & Related papers (2023-01-30T15:09:00Z)
- Non-Negative Matrix Factorization with Scale Data Structure Preservation [23.31865419578237]
The model described in this paper belongs to the family of non-negative matrix factorization methods designed for data representation and dimension reduction.
The idea is to add, to the NMF cost function, a penalty term to impose a scale relationship between the pairwise similarity matrices of the original and transformed data points.
The proposed clustering algorithm is compared to some existing NMF-based algorithms and to some manifold learning-based algorithms when applied to some real-life datasets.
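For reference, the NMF baseline that the scale-preserving penalty augments is the classical multiplicative-update scheme for Y = W H with W, H nonnegative. The sketch below shows only that baseline (the pairwise-similarity penalty term is omitted); dimensions and iteration count are illustrative.

```python
# Hedged sketch: Lee-Seung multiplicative updates for plain NMF, the base
# cost to which the paper adds a scale-relationship penalty (omitted here).
import numpy as np

rng = np.random.default_rng(2)
Y = np.abs(rng.normal(size=(30, 20)))   # nonnegative data matrix
k = 4
W = np.abs(rng.normal(size=(30, k)))
H = np.abs(rng.normal(size=(k, 20)))

for _ in range(200):
    # multiplicative updates keep W and H nonnegative by construction
    H *= (W.T @ Y) / (W.T @ W @ H + 1e-9)
    W *= (Y @ H.T) / (W @ H @ H.T + 1e-9)

err = np.linalg.norm(Y - W @ H) / np.linalg.norm(Y)
print(round(err, 3))
```

The penalty in the paper would add an extra term to the cost (and hence to these update rules) tying the pairwise similarity matrix of W's rows to that of the original data points.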
arXiv Detail & Related papers (2022-09-22T09:32:18Z)
- Unitary Approximate Message Passing for Matrix Factorization [90.84906091118084]
We consider matrix factorization (MF) with certain constraints, which finds wide applications in various areas.
We develop a Bayesian approach to MF with an efficient message passing implementation, called UAMPMF.
We show that UAMPMF significantly outperforms state-of-the-art algorithms in terms of recovery accuracy, robustness and computational complexity.
arXiv Detail & Related papers (2022-07-31T12:09:32Z)
- Data Fusion with Latent Map Gaussian Processes [0.0]
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design.
We introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion.
arXiv Detail & Related papers (2021-12-04T00:54:19Z)
- Feature Weighted Non-negative Matrix Factorization [92.45013716097753]
We propose the Feature weighted Non-negative Matrix Factorization (FNMF) in this paper.
FNMF learns the weights of features adaptively according to their importance.
It can be solved efficiently with the suggested optimization algorithm.
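A weighted variant of the multiplicative updates conveys the flavor of feature weighting: each feature (row of Y) carries a weight in the reconstruction loss. In the sketch below the weights are fixed and randomly chosen for illustration, whereas FNMF learns them adaptively; the update rules shown are the standard weighted-NMF ones, not the paper's algorithm.

```python
# Hedged sketch of feature-weighted NMF: minimize ||sqrt(a) * (Y - W H)||_F^2
# with per-feature weights a_i > 0 held fixed (FNMF would learn them).
import numpy as np

rng = np.random.default_rng(4)
Y = np.abs(rng.normal(size=(25, 15)))
k = 3
a = rng.uniform(0.1, 1.0, size=(25, 1))     # per-feature weights (assumed fixed)
W = np.abs(rng.normal(size=(25, k)))
H = np.abs(rng.normal(size=(k, 15)))

for _ in range(300):
    # weighted multiplicative updates: weights enter both numerator and denominator
    H *= (W.T @ (a * Y)) / (W.T @ (a * (W @ H)) + 1e-9)
    W *= ((a * Y) @ H.T) / ((a * (W @ H)) @ H.T + 1e-9)

werr = np.linalg.norm(np.sqrt(a) * (Y - W @ H)) / np.linalg.norm(np.sqrt(a) * Y)
print(round(werr, 3))
```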
arXiv Detail & Related papers (2021-03-24T21:17:17Z)
- Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.