Single-Cell Multimodal Prediction via Transformers
- URL: http://arxiv.org/abs/2303.00233v3
- Date: Fri, 13 Oct 2023 15:32:57 GMT
- Title: Single-Cell Multimodal Prediction via Transformers
- Authors: Wenzhuo Tang, Hongzhi Wen, Renming Liu, Jiayuan Ding, Wei Jin, Yuying
Xie, Hui Liu, Jiliang Tang
- Abstract summary: We propose scMoFormer to model the complex interactions among different modalities.
scMoFormer won a Kaggle silver medal with the rank of 24/1221 (Top 2%) without ensemble in a NeurIPS 2022 competition.
- Score: 36.525050229323845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent development of multimodal single-cell technology has made
it possible to acquire multiple omics data from individual cells, thereby
enabling a deeper understanding of cellular states and dynamics. Nevertheless,
the proliferation of multimodal single-cell data also introduces tremendous
challenges in modeling the complex interactions among different modalities. The
recently advanced methods focus on constructing static interaction graphs and
applying graph neural networks (GNNs) to learn from multimodal data. However,
such static graphs can be suboptimal as they do not take advantage of
downstream task information; meanwhile, GNNs suffer from inherent limitations
when many layers are stacked. To tackle these issues, in this work, we
investigate how to leverage transformers for multimodal single-cell data in an
end-to-end manner while exploiting downstream task information. In particular,
we propose scMoFormer, a framework that readily incorporates external domain
knowledge and models interactions both within and across modalities.
Extensive experiments demonstrate that scMoFormer achieves superior performance
on various benchmark datasets. Remarkably, scMoFormer won a Kaggle silver medal
with the rank of 24/1221 (Top 2%) without ensemble in a NeurIPS 2022
competition. Our implementation is publicly available on GitHub.
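The cross-modal interaction the abstract describes can be illustrated with a minimal scaled dot-product cross-attention sketch, where tokens from one modality (e.g. proteins) attend to tokens from another (e.g. genes). This is an illustrative NumPy sketch only; the names `gene_emb` and `protein_emb` are hypothetical and do not reflect the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: one modality attends to another."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_k) similarity
    return softmax(scores) @ values          # weighted mix of the other modality

rng = np.random.default_rng(0)
gene_emb = rng.standard_normal((5, 8))     # 5 gene tokens, embedding dim 8
protein_emb = rng.standard_normal((3, 8))  # 3 protein tokens, embedding dim 8

# Protein tokens query the gene modality (cross-modal message passing)
updated_protein = cross_attention(protein_emb, gene_emb, gene_emb)
print(updated_protein.shape)  # (3, 8)
```

Unlike a static interaction graph, the attention weights here are recomputed from the current embeddings at every layer, so the effective cross-modal connectivity can adapt to the downstream task during training.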
Related papers
- MINIMA: Modality Invariant Image Matching [52.505282811925454]
We present MINIMA, a unified image matching framework for multiple cross-modal cases.
We scale up the modalities from cheap but rich RGB-only matching data, by means of generative models.
With MD-syn, we can directly train any advanced matching pipeline on randomly selected modality pairs to obtain cross-modal ability.
arXiv Detail & Related papers (2024-12-27T02:39:50Z)
- Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation [59.4356484322228]
Graph Neural Networks (GNNs) have shown promising performance in this domain.
We propose GNNs with Modality-Independent Receptive Fields, which employ separate GNNs with independent receptive fields.
Our results indicate that the optimal $K$ for certain modalities on specific datasets can be as low as 1 or 2, which may restrict the GNNs' capacity to capture global information.
arXiv Detail & Related papers (2024-12-18T16:12:26Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Graph Neural Networks for Multimodal Single-Cell Data Integration [32.8390339109358]
We present a general Graph Neural Network framework, scMoGNN, to tackle three tasks.
scMoGNN demonstrates superior results in all three tasks compared with state-of-the-art and conventional approaches.
arXiv Detail & Related papers (2022-03-03T17:59:02Z)
- Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition [8.727216421226814]
We propose a Progressive Multi-Stage Interactive training method with a Recursive Mosaic Generator (RMG-PMSI)
First, we propose a Recursive Mosaic Generator (RMG) that generates images with different granularities in different phases.
Then, the features of different stages pass through a Multi-Stage Interaction (MSI) module, which strengthens and complements the corresponding features of different stages.
Experiments on three prestigious fine-grained benchmarks show that RMG-PMSI can significantly improve the performance with good robustness and transferability.
arXiv Detail & Related papers (2021-12-08T10:50:03Z)
- Graph Capsule Aggregation for Unaligned Multimodal Sequences [16.679793708015534]
We introduce Graph Capsule Aggregation (GraphCAGE) to model unaligned multimodal sequences with graph-based neural model and Capsule Network.
By converting sequence data into graphs, the previously mentioned problems of RNNs are avoided.
In addition, the aggregation capability of Capsule Network and the graph-based structure enable our model to be interpretable and better solve the problem of long-range dependency.
arXiv Detail & Related papers (2021-08-17T10:04:23Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.