Personalized Coupled Tensor Decomposition for Multimodal Data Fusion: Uniqueness and Algorithms
- URL: http://arxiv.org/abs/2412.01102v2
- Date: Thu, 12 Dec 2024 13:08:13 GMT
- Title: Personalized Coupled Tensor Decomposition for Multimodal Data Fusion: Uniqueness and Algorithms
- Authors: Ricardo Augusto Borsoi, Konstantin Usevich, David Brie, Tülay Adali
- Abstract summary: We introduce a personalized CTD framework tackling key challenges of data fusion.
A flexible model is proposed where each dataset is represented as the sum of two components.
Two algorithms are proposed to compute the common and distinct components.
- Score: 15.520496676392955
- Abstract: Coupled tensor decompositions (CTDs) perform data fusion by linking factors from different datasets. Although many CTDs have already been proposed, current works do not address important challenges of data fusion, where: 1) the datasets are often heterogeneous, constituting different "views" of a given phenomenon (multimodality); and 2) each dataset can contain personalized or dataset-specific information, constituting distinct factors that are not coupled with other datasets. In this work, we introduce a personalized CTD framework tackling these challenges. A flexible model is proposed where each dataset is represented as the sum of two components, one related to a common tensor through a multilinear measurement model, and another specific to each dataset. Both the common and distinct components are assumed to admit a polyadic decomposition. This generalizes several existing CTD models. We provide conditions for specific and generic uniqueness of the decomposition that are easy to interpret. These conditions employ uni-mode uniqueness of different individual datasets and properties of the measurement model. Two algorithms are proposed to compute the common and distinct components: a semi-algebraic one and a coordinate-descent optimization method. Experimental results illustrate the advantage of the proposed framework compared with state-of-the-art approaches.
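As a rough illustration of the data model described in the abstract (not the authors' implementation; the tensor sizes, CPD ranks, and random measurement matrices below are invented for the example), a minimal NumPy sketch of "common tensor through a multilinear measurement model plus a dataset-specific component":

```python
# Minimal sketch of the personalized CTD data model (illustrative only).
# Assumptions: third-order tensors, random mode-wise measurement matrices P_k,
# and CPD (polyadic) structure for both the common and the distinct components.
import numpy as np

def cpd(factors):
    """Build a third-order tensor from its CPD factors (I x R, J x R, K x R)."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def multilinear(T, mats):
    """Apply a mode-wise measurement model: T x_1 P1 x_2 P2 x_3 P3."""
    P1, P2, P3 = mats
    return np.einsum('abc,ia,jb,kc->ijk', T, P1, P2, P3)

rng = np.random.default_rng(0)
R_common, R_distinct = 3, 2                       # CPD ranks (illustrative)
common = [rng.standard_normal((10, R_common)) for _ in range(3)]
C = cpd(common)                                   # shared latent tensor

datasets = []
for k in range(2):                                # two heterogeneous "views"
    P_k = [rng.standard_normal((8, 10)) for _ in range(3)]      # measurement model
    distinct = [rng.standard_normal((8, R_distinct)) for _ in range(3)]
    Y_k = multilinear(C, P_k) + cpd(distinct)     # common part + personalized part
    datasets.append(Y_k)

# The paper's semi-algebraic and coordinate-descent algorithms would estimate the
# common factors (shared across k) and the distinct factors (one set per k) from {Y_k}.
```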
Related papers
- A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
arXiv Detail & Related papers (2024-08-05T23:20:32Z)
- Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets [11.105392318582677]
We propose a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees.
Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure.
We show that in a high-dimensional regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables.
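A minimal NumPy sketch of that idea, hedged as an illustration rather than the authors' method: the synthetic data, the entropic regularization level, and the embedding dimension are arbitrary choices made for the example.

```python
# Illustrative sketch: compute an entropic OT plan with Sinkhorn iterations and use
# its leading singular vectors as a joint embedding of the two datasets.
import numpy as np

def sinkhorn_plan(X, Y, reg=0.05, n_iter=200):
    """Entropic OT plan between uniform empirical measures on the rows of X and Y."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    eps = reg * C.max()                                   # scale regularization to cost size
    K = np.exp(-C / eps)                                  # Gibbs kernel
    a = np.full(len(X), 1.0 / len(X))                     # uniform source weights
    b = np.full(len(Y), 1.0 / len(Y))                     # uniform target weights
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):                               # Sinkhorn scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]                    # the entropic OT plan

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))        # dataset 1 (one "view")
Y = rng.standard_normal((120, 20))        # dataset 2 (another "view")

P = sinkhorn_plan(X, Y)
U, s, Vt = np.linalg.svd(P)
d = 3                                     # embedding dimension (illustrative)
embed_X, embed_Y = U[:, 1:d + 1], Vt[1:d + 1].T   # skip the trivial leading pair
```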
arXiv Detail & Related papers (2024-07-01T18:48:55Z)
- ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models [65.82630283336051]
We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by the existing training schemes of diffusion generative models.
We present a simple fix to this problem by constructing processes that fully exploit these structures, hence the name ComboStoc.
arXiv Detail & Related papers (2024-05-22T15:23:10Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Personalized PCA: Decoupling Shared and Unique Features [4.976703689624386]
We propose personalized PCA (PerPCA) to decouple shared and unique features from heterogeneous datasets.
We show that, under mild conditions, both unique and shared features can be identified and recovered by a constrained optimization problem.
As a systematic approach to decouple shared and unique features from heterogeneous datasets, PerPCA finds applications in several tasks, including video segmentation, topic extraction, and feature clustering.
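For illustration only (this is not the PerPCA algorithm from the paper), a naive NumPy sketch of the shared-versus-unique decoupling idea: estimate a shared subspace from the pooled data, then dataset-specific directions from the residuals after projecting the shared part out.

```python
# Naive shared/unique subspace separation, as an illustration of the problem setting.
import numpy as np

def top_subspace(X, r):
    """Leading r right singular vectors (principal directions) of a data matrix."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:r].T                                       # (features x r)

rng = np.random.default_rng(2)
datasets = [rng.standard_normal((200, 30)) for _ in range(3)]   # heterogeneous sources

r_shared, r_unique = 2, 2                                 # illustrative ranks
V_shared = top_subspace(np.vstack(datasets), r_shared)    # shared principal directions

V_unique = []
for X in datasets:
    residual = X - X @ V_shared @ V_shared.T              # remove the shared component
    V_unique.append(top_subspace(residual, r_unique))     # dataset-specific directions
```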
arXiv Detail & Related papers (2022-07-17T00:09:47Z)
- AVIDA: Alternating method for Visualizing and Integrating Data [1.6637373649145604]
AVIDA is a framework for simultaneously performing data alignment and dimension reduction.
We show that AVIDA correctly aligns high-dimensional datasets without common features.
In general applications, other methods can be used for the alignment and dimension reduction modules.
arXiv Detail & Related papers (2022-05-31T22:36:10Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
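A hedged NumPy sketch of the general idea (not the paper's algorithm): real-valued (Gaussian) and categorical (multinomial) features generated from a shared latent code, i.e., the kind of heterogeneous data such an exponential-family model would be fitted to.

```python
# Illustrative heterogeneous data: Gaussian and categorical views coupled by one latent code.
import numpy as np

rng = np.random.default_rng(3)
n, d_latent, d_real, n_cats = 500, 4, 10, 5       # sizes chosen for the example

Z = rng.standard_normal((n, d_latent))            # shared latent representation
W_real = rng.standard_normal((d_latent, d_real))
W_cat = rng.standard_normal((d_latent, n_cats))

X_real = Z @ W_real + 0.1 * rng.standard_normal((n, d_real))    # Gaussian features

logits = Z @ W_cat
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
X_cat = np.array([rng.choice(n_cats, p=p) for p in probs])      # categorical features

# A learning algorithm would fit Z, W_real, W_cat by maximizing the joint
# (Gaussian + multinomial) log-likelihood of the observed heterogeneous data.
```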
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- sJIVE: Supervised Joint and Individual Variation Explained [0.0]
Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in biomedical research.
We propose a method called supervised joint and individual variation explained (sJIVE) that can simultaneously identify shared (joint) and source-specific (individual) underlying structure.
arXiv Detail & Related papers (2021-02-26T02:54:45Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High dimensionality and non-linearity are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.