AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival
Analysis with Whole Slide Images and Gene Expression Data
- URL: http://arxiv.org/abs/2108.12565v1
- Date: Sat, 28 Aug 2021 04:02:10 GMT
- Title: AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival
Analysis with Whole Slide Images and Gene Expression Data
- Authors: Ruoqi Wang, Ziwang Huang, Haitao Wang, Hejun Wu
- Abstract summary: We propose a new asymmetrical multi-modal method, termed AMMASurv.
AMMASurv can effectively utilize the intrinsic information within every modality and flexibly adapt to modalities of different importance.
- Score: 2.0329335234511974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of multi-modal data such as the combination of whole slide images
(WSIs) and gene expression data for survival analysis can lead to more accurate
survival predictions. Previous multi-modal survival models cannot efficiently
extract the intrinsic information within each modality. Moreover, although
experimental results show that WSIs provide more effective information than
gene expression data, previous methods treat the information from different
modalities as similarly important, so they cannot flexibly exploit the
potential connections between the modalities. To address these problems, we
propose a new asymmetrical multi-modal method, termed AMMASurv.
Specifically, we design an asymmetrical multi-modal attention mechanism (AMMA)
in the Transformer encoder for multi-modal data to enable more flexible
multi-modal information fusion for survival prediction. Unlike previous
works, AMMASurv can effectively utilize the intrinsic information within every
modality and flexibly adapt to modalities of different importance.
Extensive experiments are conducted to validate the effectiveness of the
proposed model. Encouraging results demonstrate the superiority of our method
over other state-of-the-art methods.
Related papers
- Supervised Multi-Modal Fission Learning [19.396207029419813]
Learning from multimodal datasets can leverage complementary information and improve performance in prediction tasks.
We propose a Multi-Modal Fission Learning model that simultaneously identifies globally joint, partially joint, and individual components.
arXiv Detail & Related papers (2024-09-30T17:58:03Z)
- Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z)
- Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information.
We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results.
Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
arXiv Detail & Related papers (2024-05-24T11:18:13Z)
- FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival [3.4686401890974197]
We propose a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information.
Cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis.
The hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features.
We also propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities.
arXiv Detail & Related papers (2024-05-13T12:39:08Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Joint Self-Supervised and Supervised Contrastive Learning for Multimodal MRI Data: Towards Predicting Abnormal Neurodevelopment [5.771221868064265]
We present a novel joint self-supervised and supervised contrastive learning method to learn the robust latent feature representation from multimodal MRI data.
Our method has the capability to facilitate computer-aided diagnosis within clinical practice, harnessing the power of multimodal data.
arXiv Detail & Related papers (2023-12-22T21:05:51Z)
- HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on Whole Slide Images and multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA).
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.