A Multi-Metric Latent Factor Model for Analyzing High-Dimensional and
Sparse data
- URL: http://arxiv.org/abs/2204.07819v1
- Date: Sat, 16 Apr 2022 15:08:00 GMT
- Title: A Multi-Metric Latent Factor Model for Analyzing High-Dimensional and
Sparse data
- Authors: Di Wu, Peng Zhang, Yi He, Xin Luo
- Abstract summary: High-dimensional and sparse (HiDS) matrices are omnipresent in a variety of big data-related applications.
Current LFA-based models mainly focus on a single-metric representation, where the representation strategy designed for the approximation loss function is fixed and exclusive.
We in this paper propose a multi-metric latent factor (MMLF) model.
Our proposed MMLF enjoys the merits of a set of disparate metric spaces all at once, achieving a comprehensive and unbiased representation of HiDS matrices.
- Score: 11.800988109180285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional and sparse (HiDS) matrices are omnipresent in a variety of
big data-related applications. Latent factor analysis (LFA) is a typical
representation learning method that extracts useful yet latent knowledge from
HiDS matrices via low-rank approximation. Current LFA-based models mainly focus
on a single-metric representation, where the representation strategy designed
for the approximation loss function is fixed and exclusive. However, real-world
HiDS matrices are commonly heterogeneous and inclusive, with diverse underlying
patterns, so a single-metric representation is likely to yield inferior
performance. Motivated by this, we propose a multi-metric latent factor (MMLF)
model. Its main idea is two-fold: 1) two vector spaces and three Lp-norms are
simultaneously employed to develop six variants of the LFA model, each of which
resides in a unique metric representation space, and 2) all the variants are
ensembled with a tailored, self-adaptive weighting strategy. As such, our
proposed MMLF enjoys the merits of a set of disparate metric spaces all at
once, achieving a comprehensive and unbiased representation of HiDS matrices.
A theoretical study guarantees that MMLF attains a performance gain. Extensive
experiments on eight real-world HiDS datasets, spanning a wide range of
industrial and scientific domains, verify that our MMLF significantly
outperforms ten state-of-the-art shallow and deep counterparts.
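The two-fold idea above can be illustrated with a minimal sketch: several latent factor variants are trained under different Lp approximation losses, then combined with an adaptive weighting. The SGD updates, the choice of p in {1, 2}, and the inverse-validation-RMSE weighting are illustrative assumptions standing in for the paper's exact formulation.

```python
import numpy as np

def train_lfa(entries, shape, p=2, rank=5, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Train one LFA variant on sparse (row, col, value) entries by SGD,
    minimizing an Lp approximation loss (p = 1 or 2 in this sketch)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(shape[0], rank))
    Q = rng.normal(scale=0.1, size=(shape[1], rank))
    for _ in range(epochs):
        for i, j, v in entries:
            e = v - P[i] @ Q[j]
            g = e if p == 2 else np.sign(e)  # d|e|^p/d(pred), up to a constant
            pi = P[i].copy()                 # use old P[i] for Q's update
            P[i] += lr * (g * Q[j] - reg * P[i])
            Q[j] += lr * (g * pi - reg * Q[j])
    return P, Q

def ensemble(models, val_entries):
    """Weight each variant by its inverse validation RMSE: a simple stand-in
    for the paper's tailored, self-adaptive weighting strategy."""
    rmse = np.array([np.sqrt(np.mean([(v - P[i] @ Q[j]) ** 2
                                      for i, j, v in val_entries]))
                     for P, Q in models])
    w = 1.0 / (rmse + 1e-12)
    w /= w.sum()
    return lambda i, j: sum(wk * (P[i] @ Q[j]) for wk, (P, Q) in zip(w, models))
```

Each variant lives in its own metric space (here, only the loss differs), and the ensemble lets the better-fitting variants dominate the prediction.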
Related papers
- Measuring Orthogonality in Representations of Generative Models [81.13466637365553]
In unsupervised representation learning, models aim to distill essential features from high-dimensional data into lower-dimensional learned representations.
Disentanglement of independent generative processes has long been credited with producing high-quality representations.
We propose two novel metrics: Importance-Weighted Orthogonality (IWO) and Importance-Weighted Rank (IWR).
arXiv Detail & Related papers (2024-07-04T08:21:54Z) - VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition [3.4923338594757674]
Large language models (LLMs) can be used to train a model capable of extracting various types of entities.
In this paper, we utilize the open-sourced LLM LLaMA2 as the backbone model, and design specific instructions to distinguish between different types of entities and datasets.
Our model VANER, trained with a small partition of parameters, significantly outperforms previous LLM-based models and, for the first time, as an LLM-based model, surpasses the majority of conventional state-of-the-art BioNER systems.
arXiv Detail & Related papers (2024-04-27T09:00:39Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - Synergistic eigenanalysis of covariance and Hessian matrices for
enhanced binary classification [75.90957645766676]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our approach is substantiated by formal proofs that establish its capability to maximize between-class mean distance and minimize within-class variances.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Task Aware Modulation using Representation Learning: An Approach for Few
Shot Learning in Heterogeneous Systems [16.524898421921108]
TAM-RL is a framework that enhances personalized predictions in few-shot settings for heterogeneous systems.
We show that TAM-RL can significantly outperform existing baseline approaches such as MAML and multi-modal MAML.
We show that TAM-RL significantly improves predictive performance for cases where it is possible to learn distinct representations for different tasks.
arXiv Detail & Related papers (2023-10-07T07:55:22Z) - Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis as well as a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Besides the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
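The precision-matrix claim can be made concrete with the standard Woodbury identity for a factor-analyzer covariance Λ Λᵀ + Ψ: its inverse is computable by solving only a small k × k system, so no d × d inversion is needed once the factors are trained. The function name and shapes below are illustrative, not the paper's API.

```python
import numpy as np

def fa_precision(Lam, psi):
    """Precision of the factor-analyzer covariance Lam @ Lam.T + diag(psi),
    via the Woodbury identity:
    (L L^T + Psi)^-1 = Psi^-1 - Psi^-1 L (I + L^T Psi^-1 L)^-1 L^T Psi^-1.
    Only a k x k system is solved, never a d x d inversion."""
    d, k = Lam.shape
    psi_inv = 1.0 / psi                  # diagonal inverse, O(d)
    A = psi_inv[:, None] * Lam           # Psi^{-1} Lam, shape (d, k)
    M = np.eye(k) + Lam.T @ A            # small k x k matrix
    return np.diag(psi_inv) - A @ np.linalg.solve(M, A.T)
```

Because the identity is exact, evaluating Gaussian densities or responsibilities with this precision matches direct inversion while scaling with the latent dimension k rather than the data dimension d.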
arXiv Detail & Related papers (2023-08-26T06:12:33Z) - Multi-constrained Symmetric Nonnegative Latent Factor Analysis for
Accurately Representing Large-scale Undirected Weighted Networks [2.1797442801107056]
An Undirected Weighted Network (UWN) is frequently encountered in big-data-related applications.
An analysis model should carefully consider symmetric topology when describing a UWN's intrinsic symmetry.
This paper proposes a Multi-constrained Symmetric Nonnegative Latent-factor-analysis model with two-fold ideas.
arXiv Detail & Related papers (2023-06-06T14:13:16Z) - Graph-incorporated Latent Factor Analysis for High-dimensional and
Sparse Matrices [9.51012204233452]
A high-dimensional and sparse (HiDS) matrix is frequently encountered in big data-related applications such as e-commerce systems or social network services.
This paper proposes a graph-incorporated latent factor analysis (GLFA) model to perform representation learning on an HiDS matrix.
Experimental results on three real-world datasets demonstrate that GLFA outperforms six state-of-the-art models in predicting the missing data of an HiDS matrix.
arXiv Detail & Related papers (2022-04-16T15:04:34Z) - A Differential Evolution-Enhanced Latent Factor Analysis Model for
High-dimensional and Sparse Data [11.164847043777703]
This paper proposes a Sequential-Group-Differential-Evolution (SGDE) algorithm to refine the latent factors optimized by a PLFA model.
As demonstrated by the experiments on four HiDS matrices, the SGDE-PLFA model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2022-04-02T13:41:19Z) - Feature Weighted Non-negative Matrix Factorization [92.45013716097753]
We propose the Feature weighted Non-negative Matrix Factorization (FNMF) in this paper.
FNMF learns the weights of features adaptively according to their importance.
It can be solved efficiently with the suggested optimization algorithm.
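Adaptive feature weighting in NMF can be sketched as follows; the row-weighted multiplicative updates and the inverse-reconstruction-error weight rule are illustrative assumptions, not FNMF's exact optimization algorithm.

```python
import numpy as np

def feature_weighted_nmf(V, rank=2, iters=300, seed=0):
    """Sketch of feature-weighted NMF: minimize sum_i d_i ||V_i - (WH)_i||^2
    with multiplicative updates, adapting the per-feature (row) weights d
    from each row's reconstruction error."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    d = np.ones(m)                       # feature weights, start uniform
    eps = 1e-9
    for _ in range(iters):
        D = np.diag(d)
        # multiplicative updates for the row-weighted Frobenius loss
        H *= (W.T @ D @ V) / (W.T @ D @ W @ H + eps)
        W *= (D @ V @ H.T) / (D @ W @ H @ H.T + eps)
        # adapt weights: poorly reconstructed features get lower weight
        err = np.sum((V - W @ H) ** 2, axis=1)
        d = 1.0 / (err + eps)
        d *= m / d.sum()                 # normalize weights to mean 1
    return W, H, d
```

The multiplicative form keeps both factors nonnegative throughout, and the weight update couples the factorization to feature importance in a single loop.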
arXiv Detail & Related papers (2021-03-24T21:17:17Z) - A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network
Representation Learning [52.83948119677194]
We propose a multi-semantic metapath (MSM) model for large-scale heterogeneous network representation learning.
Specifically, we generate multi-semantic metapath-based random walks to construct the heterogeneous neighborhood to handle the unbalanced distributions.
We conduct systematical evaluations for the proposed framework on two challenging datasets: Amazon and Alibaba.
arXiv Detail & Related papers (2020-07-19T22:50:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.