AVIDA: Alternating method for Visualizing and Integrating Data
- URL: http://arxiv.org/abs/2206.00135v2
- Date: Fri, 7 Apr 2023 22:25:07 GMT
- Title: AVIDA: Alternating method for Visualizing and Integrating Data
- Authors: Kathryn Dover, Zixuan Cang, Anna Ma, Qing Nie, and Roman Vershynin
- Abstract summary: AVIDA is a framework for simultaneously performing data alignment and dimension reduction.
We show that AVIDA correctly aligns high-dimensional datasets without common features.
In general applications, other methods can be used for the alignment and dimension reduction modules.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional multimodal data arises in many scientific fields. The
integration of multimodal data becomes challenging when there is no known
correspondence between the samples and the features of different datasets. To
tackle this challenge, we introduce AVIDA, a framework for simultaneously
performing data alignment and dimension reduction. In the numerical
experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic
neighbor embedding are used as the alignment and dimension reduction modules
respectively. We show that AVIDA correctly aligns high-dimensional datasets
without common features with four synthesized datasets and two real multimodal
single-cell datasets. Compared to several existing methods, we demonstrate that
AVIDA better preserves structures of individual datasets, especially distinct
local structures in the joint low-dimensional visualization, while achieving
comparable alignment performance. Such a property is important in multimodal
single-cell data analysis as some biological processes are uniquely captured by
one of the datasets. In general applications, other methods can be used for the
alignment and dimension reduction modules.
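The alternating structure the abstract describes can be illustrated with a minimal numpy-only sketch. This is not the paper's implementation: classical MDS stands in for the t-SNE module, and a nearest-neighbor match followed by an orthogonal Procrustes rotation stands in for the Gromov-Wasserstein alignment module. All function names here are illustrative.

```python
import numpy as np

def classical_mds(X, dim=2):
    """Classical MDS: a simple stand-in for the dimension reduction module."""
    D2 = np.square(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n   # double-centering matrix
    B = -0.5 * J @ D2 @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]       # top `dim` eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

def avida_sketch(X, Y, dim=2, iters=5):
    """Alternate between dimension reduction and a crude alignment step.
    AVIDA itself uses Gromov-Wasserstein optimal transport; here a
    nearest-neighbor Procrustes rotation is a hypothetical stand-in."""
    Ex, Ey = classical_mds(X, dim), classical_mds(Y, dim)
    for _ in range(iters):
        # crude correspondence: nearest neighbor in the shared embedding
        match = np.argmin(
            np.linalg.norm(Ey[:, None] - Ex[None, :], axis=-1), axis=1)
        # orthogonal Procrustes: rotate Ey onto its matched Ex points
        U, _, Vt = np.linalg.svd(Ey.T @ Ex[match])
        Ey = Ey @ (U @ Vt)
    return Ex, Ey
```

The key design point mirrored here is that alignment operates on the low-dimensional embeddings, so the two modules can be swapped for other alignment and dimension reduction methods, as the abstract notes.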
Related papers
- Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection [64.08296187555095]
Uni$^2$Det is a framework for unified and universal multi-dataset training on 3D detection.
We introduce multi-stage prompting modules for multi-dataset 3D detection.
Results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method.
arXiv Detail & Related papers (2024-09-30T17:57:50Z)
- A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
arXiv Detail & Related papers (2024-08-05T23:20:32Z)
- Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets [11.105392318582677]
We propose a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees.
Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure.
We show that in a high-dimensional regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables.
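The mechanism summarized above can be sketched in a few lines of numpy: compute an entropic OT plan between the two datasets with Sinkhorn iterations, then embed each dataset using the leading singular vectors of the plan matrix. This is an illustrative reading with uniform marginals and hypothetical function names, not the authors' implementation.

```python
import numpy as np

def entropic_ot_plan(C, eps=0.1, iters=300):
    """Entropic OT plan between uniform marginals via Sinkhorn iterations."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                  # Gibbs kernel of the cost matrix
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def eot_embedding(X, Y, dim=2, eps=0.5):
    """Joint embedding from the leading singular vectors of the EOT plan."""
    C = np.square(np.linalg.norm(X[:, None] - Y[None, :], axis=-1))
    P = entropic_ot_plan(C / C.max(), eps)
    U, s, Vt = np.linalg.svd(P)
    # skip the trivial leading pair; embed each dataset with the next `dim`
    return U[:, 1:dim + 1] * s[1:dim + 1], Vt[1:dim + 1].T * s[1:dim + 1]
```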
arXiv Detail & Related papers (2024-07-01T18:48:55Z)
- MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction [8.993992124170624]
MergeOcc is developed to simultaneously handle different LiDARs by leveraging multiple datasets.
The effectiveness of MergeOcc is validated through experiments on two prominent datasets for autonomous vehicles.
arXiv Detail & Related papers (2024-03-13T13:23:05Z)
- AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis [98.3959800235485]
Recently, several methods have explored multiple modalities within a single field, aiming to share implicit features across modalities to enhance reconstruction performance.
In this work, we conduct comprehensive analyses on the multimodal implicit field of LiDAR-camera joint synthesis, revealing the underlying issue lies in the misalignment of different sensors.
We introduce AlignMiF, a geometrically aligned multimodal implicit field with two proposed modules: Geometry-Aware Alignment (GAA) and Shared Geometry Initialization (SGI).
arXiv Detail & Related papers (2024-02-27T13:08:47Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data [24.457344926393397]
Single-cell data integration can provide a comprehensive molecular view of cells.
Existing methods suffer from several fundamental limitations.
We present a spectral manifold alignment and inference framework.
arXiv Detail & Related papers (2023-08-03T16:04:14Z)
- Unsupervised Manifold Alignment with Joint Multidimensional Scaling [4.683612295430957]
We introduce Joint Multidimensional Scaling, which maps datasets from two different domains to a common low-dimensional Euclidean space.
Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem.
We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment.
arXiv Detail & Related papers (2022-07-06T21:02:42Z)
- Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration [0.0]
We propose a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data.
Compared with other state-of-the-art methods, our method performs better on both simulated and real single-cell data.
arXiv Detail & Related papers (2021-12-05T13:00:58Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has attracted increasing attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- Shared Space Transfer Learning for analyzing multi-site fMRI data [83.41324371491774]
Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data.
MVPA works best with a well-designed feature set and an adequate sample size.
Most fMRI datasets are noisy, high-dimensional, expensive to collect, and with small sample sizes.
This paper proposes the Shared Space Transfer Learning (SSTL) as a novel transfer learning approach.
arXiv Detail & Related papers (2020-10-24T08:50:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.