A Multi-Metric Latent Factor Model for Analyzing High-Dimensional and
Sparse data
- URL: http://arxiv.org/abs/2204.07819v1
- Date: Sat, 16 Apr 2022 15:08:00 GMT
- Title: A Multi-Metric Latent Factor Model for Analyzing High-Dimensional and
Sparse data
- Authors: Di Wu, Peng Zhang, Yi He, Xin Luo
- Abstract summary: High-dimensional and sparse (HiDS) matrices are omnipresent in a variety of big data-related applications.
Current LFA-based models mainly focus on a single-metric representation, where the representation strategy designed for the approximation loss function is fixed and exclusive.
We in this paper propose a multi-metric latent factor (MMLF) model.
Our proposed MMLF enjoys the merits of a set of disparate metric spaces all at once, achieving a comprehensive and unbiased representation of HiDS matrices.
- Score: 11.800988109180285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional and sparse (HiDS) matrices are omnipresent in a variety of
big data-related applications. Latent factor analysis (LFA) is a typical
representation learning method that extracts useful yet latent knowledge from
HiDS matrices via low-rank approximation. Current LFA-based models mainly focus
on a single-metric representation, where the representation strategy designed
for the approximation loss function is fixed and exclusive. However, real-world
HiDS matrices are commonly heterogeneous and inclusive, with diverse underlying
patterns, so a single-metric representation is likely to yield inferior
performance. Motivated by this, we propose a multi-metric latent factor (MMLF)
model. Its main idea is two-fold: 1) two vector spaces and three Lp-norms are
simultaneously employed to develop six variants of the LFA model, each of which
resides in a unique metric representation space, and 2) all the variants are
ensembled with a tailored, self-adaptive weighting strategy. As such, our
proposed MMLF enjoys the merits of a set of disparate metric spaces all at
once, achieving a comprehensive and unbiased representation of HiDS matrices.
A theoretical study guarantees that MMLF attains a performance gain. Extensive
experiments on eight real-world HiDS datasets, spanning a wide range of
industrial and scientific domains, verify that our MMLF significantly
outperforms ten state-of-the-art shallow and deep counterparts.
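The two-fold idea above can be illustrated with a minimal sketch: several latent factor variants are trained under different Lp approximation losses, then combined with an adaptive weighting. The SGD updates, the choice of p in {1, 2}, and the inverse-validation-RMSE weighting are illustrative assumptions standing in for the paper's exact formulation.

```python
import numpy as np

def train_lfa(entries, shape, p=2, rank=5, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Train one LFA variant on sparse (row, col, value) entries by SGD,
    minimizing an Lp approximation loss (p = 1 or 2 in this sketch)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(shape[0], rank))
    Q = rng.normal(scale=0.1, size=(shape[1], rank))
    for _ in range(epochs):
        for i, j, v in entries:
            e = v - P[i] @ Q[j]
            g = e if p == 2 else np.sign(e)  # d|e|^p/d(pred), up to a constant
            pi = P[i].copy()                 # use old P[i] for Q's update
            P[i] += lr * (g * Q[j] - reg * P[i])
            Q[j] += lr * (g * pi - reg * Q[j])
    return P, Q

def ensemble(models, val_entries):
    """Weight each variant by its inverse validation RMSE: a simple stand-in
    for the paper's tailored, self-adaptive weighting strategy."""
    rmse = np.array([np.sqrt(np.mean([(v - P[i] @ Q[j]) ** 2
                                      for i, j, v in val_entries]))
                     for P, Q in models])
    w = 1.0 / (rmse + 1e-12)
    w /= w.sum()
    return lambda i, j: sum(wk * (P[i] @ Q[j]) for wk, (P, Q) in zip(w, models))
```

Each variant lives in its own metric space (here, only the loss differs), and the ensemble lets the better-fitting variants dominate the prediction.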
Related papers
- Measuring Orthogonality in Representations of Generative Models [81.13466637365553]
In unsupervised representation learning, models aim to distill essential features from high-dimensional data into lower-dimensional learned representations.
Disentanglement of independent generative processes has long been credited with producing high-quality representations.
We propose two novel metrics: Importance-Weighted Orthogonality (IWO) and Importance-Weighted Rank (IWR).
arXiv Detail & Related papers (2024-07-04T08:21:54Z) - VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition [3.4923338594757674]
Large language models (LLMs) can be used to train a model capable of extracting various types of entities.
In this paper, we utilize the open-sourced LLM LLaMA2 as the backbone model, and design specific instructions to distinguish between different types of entities and datasets.
Our model VANER, trained with a small partition of parameters, significantly outperforms previous LLM-based models and, for the first time, as an LLM-based model, surpasses the majority of conventional state-of-the-art BioNER systems.
arXiv Detail & Related papers (2024-04-27T09:00:39Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - Synergistic eigenanalysis of covariance and Hessian matrices for
enhanced binary classification [75.90957645766676]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our approach is substantiated by formal proofs that establish its capability to maximize between-class mean distance and minimize within-class variances.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Task Aware Modulation using Representation Learning: An Approach for Few
Shot Learning in Heterogeneous Systems [16.524898421921108]
TAM-RL is a framework that enhances personalized predictions in few-shot settings for heterogeneous systems.
We show that TAM-RL can significantly outperform existing baseline approaches such as MAML and multi-modal MAML.
We show that TAM-RL significantly improves predictive performance for cases where it is possible to learn distinct representations for different tasks.
arXiv Detail & Related papers (2023-10-07T07:55:22Z) - Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis as well as a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Besides the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
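The precision-matrix claim can be made concrete with the standard Woodbury identity for a factor-analyzer covariance Λ Λᵀ + Ψ: its inverse is computable by solving only a small k × k system, so no d × d inversion is needed once the factors are trained. The function name and shapes below are illustrative, not the paper's API.

```python
import numpy as np

def fa_precision(Lam, psi):
    """Precision of the factor-analyzer covariance Lam @ Lam.T + diag(psi),
    via the Woodbury identity:
    (L L^T + Psi)^-1 = Psi^-1 - Psi^-1 L (I + L^T Psi^-1 L)^-1 L^T Psi^-1.
    Only a k x k system is solved, never a d x d inversion."""
    d, k = Lam.shape
    psi_inv = 1.0 / psi                  # diagonal inverse, O(d)
    A = psi_inv[:, None] * Lam           # Psi^{-1} Lam, shape (d, k)
    M = np.eye(k) + Lam.T @ A            # small k x k matrix
    return np.diag(psi_inv) - A @ np.linalg.solve(M, A.T)
```

Because the identity is exact, evaluating Gaussian densities or responsibilities with this precision matches direct inversion while scaling with the latent dimension k rather than the data dimension d.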
arXiv Detail & Related papers (2023-08-26T06:12:33Z) - Multi-constrained Symmetric Nonnegative Latent Factor Analysis for
Accurately Representing Large-scale Undirected Weighted Networks [2.1797442801107056]
An Undirected Weighted Network (UWN) is frequently encountered in big-data-related applications.
An analysis model should carefully consider symmetric topology when describing a UWN's intrinsic symmetry.
This paper proposes a Multi-constrained Symmetric Nonnegative Latent-factor-analysis model with two-fold ideas.
arXiv Detail & Related papers (2023-06-06T14:13:16Z) - Graph-incorporated Latent Factor Analysis for High-dimensional and
Sparse Matrices [9.51012204233452]
A high-dimensional and sparse (HiDS) matrix is frequently encountered in big data-related applications such as e-commerce systems or social network services.
This paper proposes a graph-incorporated latent factor analysis (GLFA) model to perform representation learning on an HiDS matrix.
Experimental results on three real-world datasets demonstrate that GLFA outperforms six state-of-the-art models in predicting the missing data of an HiDS matrix.
arXiv Detail & Related papers (2022-04-16T15:04:34Z) - A Differential Evolution-Enhanced Latent Factor Analysis Model for
High-dimensional and Sparse Data [11.164847043777703]
This paper proposes a Sequential-Group-Differential-Evolution (SGDE) algorithm to refine the latent factors optimized by a PLFA model.
As demonstrated by the experiments on four HiDS matrices, the SGDE-PLFA model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2022-04-02T13:41:19Z) - Feature Weighted Non-negative Matrix Factorization [92.45013716097753]
We propose the Feature weighted Non-negative Matrix Factorization (FNMF) in this paper.
FNMF learns the weights of features adaptively according to their importance.
It can be solved efficiently with the suggested optimization algorithm.
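Adaptive feature weighting in NMF can be sketched as follows; the row-weighted multiplicative updates and the inverse-reconstruction-error weight rule are illustrative assumptions, not FNMF's exact optimization algorithm.

```python
import numpy as np

def feature_weighted_nmf(V, rank=2, iters=300, seed=0):
    """Sketch of feature-weighted NMF: minimize sum_i d_i ||V_i - (WH)_i||^2
    with multiplicative updates, adapting the per-feature (row) weights d
    from each row's reconstruction error."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    d = np.ones(m)                       # feature weights, start uniform
    eps = 1e-9
    for _ in range(iters):
        D = np.diag(d)
        # multiplicative updates for the row-weighted Frobenius loss
        H *= (W.T @ D @ V) / (W.T @ D @ W @ H + eps)
        W *= (D @ V @ H.T) / (D @ W @ H @ H.T + eps)
        # adapt weights: poorly reconstructed features get lower weight
        err = np.sum((V - W @ H) ** 2, axis=1)
        d = 1.0 / (err + eps)
        d *= m / d.sum()                 # normalize weights to mean 1
    return W, H, d
```

The multiplicative form keeps both factors nonnegative throughout, and the weight update couples the factorization to feature importance in a single loop.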
arXiv Detail & Related papers (2021-03-24T21:17:17Z) - A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network
Representation Learning [52.83948119677194]
We propose a multi-semantic metapath (MSM) model for large-scale heterogeneous network representation learning.
Specifically, we generate multi-semantic metapath-based random walks to construct the heterogeneous neighborhood to handle the unbalanced distributions.
We conduct systematical evaluations for the proposed framework on two challenging datasets: Amazon and Alibaba.
arXiv Detail & Related papers (2020-07-19T22:50:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.