Cross-Modal Translation and Alignment for Survival Analysis
- URL: http://arxiv.org/abs/2309.12855v1
- Date: Fri, 22 Sep 2023 13:29:14 GMT
- Title: Cross-Modal Translation and Alignment for Survival Analysis
- Authors: Fengtao Zhou, Hao Chen
- Abstract summary: We present a framework to explore the intrinsic cross-modal correlations and transfer potential complementary information.
Our experiments on five public TCGA datasets demonstrate that our proposed framework outperforms the state-of-the-art methods.
- Score: 7.657906359372181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advances in high-throughput sequencing technologies, the focus
of survival analysis has shifted from examining clinical indicators to
incorporating genomic profiles with pathological images. However, existing
methods either directly adopt a straightforward fusion of pathological features
and genomic profiles for survival prediction, or take genomic profiles as
guidance to integrate the features of pathological images. The former would
overlook intrinsic cross-modal correlations. The latter would discard
pathological information irrelevant to gene expression. To address these
issues, we present a Cross-Modal Translation and Alignment (CMTA) framework to
explore the intrinsic cross-modal correlations and transfer potential
complementary information. Specifically, we construct two parallel
encoder-decoder structures for multi-modal data to integrate intra-modal
information and generate cross-modal representation. Taking the generated
cross-modal representation to enhance and recalibrate intra-modal
representation can significantly improve its discrimination for comprehensive
survival analysis. To explore the intrinsic crossmodal correlations, we further
design a cross-modal attention module as the information bridge between
different modalities to perform cross-modal interactions and transfer
complementary information. Our extensive experiments on five public TCGA
datasets demonstrate that our proposed framework outperforms the
state-of-the-art methods.
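To make the described architecture concrete, below is a minimal PyTorch sketch of the cross-modal attention "bridge" and the enhance-and-recalibrate step outlined in the abstract. It is an illustration only: the module names, token shapes, mean pooling, and the discrete-hazard survival head are assumptions for this sketch, not the authors' released CMTA implementation, and the parallel encoder-decoder branches that would produce the token embeddings are omitted.
```python
# Hypothetical sketch (not the authors' code): a cross-modal attention bridge
# between a pathology-feature branch and a genomic-profile branch, loosely
# following the abstract. All names, dimensions, and pooling are assumptions.
import torch
import torch.nn as nn


class CrossModalBridge(nn.Module):
    """Lets each modality attend to the other and returns enhanced tokens."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Queries come from one modality, keys/values from the other.
        self.patho_to_geno = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.geno_to_patho = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patho_tokens, geno_tokens):
        # patho_tokens: (B, N_p, dim) pathology patch embeddings
        # geno_tokens:  (B, N_g, dim) genomic group/pathway embeddings
        patho_cross, _ = self.patho_to_geno(patho_tokens, geno_tokens, geno_tokens)
        geno_cross, _ = self.geno_to_patho(geno_tokens, patho_tokens, patho_tokens)
        # Residual "enhance and recalibrate" step described in the abstract.
        return patho_tokens + patho_cross, geno_tokens + geno_cross


class SurvivalHead(nn.Module):
    """Pools the two enhanced representations and predicts discrete hazards."""

    def __init__(self, dim: int = 256, num_bins: int = 4):
        super().__init__()
        self.classifier = nn.Linear(2 * dim, num_bins)

    def forward(self, patho_tokens, geno_tokens):
        fused = torch.cat([patho_tokens.mean(dim=1), geno_tokens.mean(dim=1)], dim=-1)
        return self.classifier(fused)  # logits over discrete time intervals


if __name__ == "__main__":
    bridge, head = CrossModalBridge(), SurvivalHead()
    patho = torch.randn(2, 100, 256)   # e.g. 100 WSI patch features per patient
    geno = torch.randn(2, 6, 256)      # e.g. 6 genomic pathway embeddings
    p_enh, g_enh = bridge(patho, geno)
    print(head(p_enh, g_enh).shape)    # torch.Size([2, 4])
```
In a full pipeline, such discrete-hazard logits are typically trained with a negative log-likelihood survival loss over censored and uncensored patients, as is common for WSI-plus-genomics survival models.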
Related papers
- Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition [64.56321246196859]
We propose a novel dyNamically Evolving dUal skeleton-semantic syneRgistic framework.
We first construct the spatial-temporal evolving micro-prototypes and integrate dynamic context-aware side information.
We introduce the spatial compression and temporal memory mechanisms to guide the growth of spatial-temporal micro-prototypes.
arXiv Detail & Related papers (2024-11-18T05:16:11Z)
- Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration [21.97457095780378]
We propose a novel semi-supervised multimodal segmentation framework that is robust to scarce labeled data and misaligned modalities.
Our framework employs a novel cross modality collaboration strategy to distill modality-independent knowledge, which is inherently associated with each modality.
It also integrates contrastive consistent learning to regulate anatomical structures, facilitating anatomical-wise prediction alignment on unlabeled data.
arXiv Detail & Related papers (2024-08-14T07:34:12Z)
- GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation [68.63955715643974]
We propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o)
arXiv Detail & Related papers (2024-07-08T01:06:13Z)
- Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images [10.996711454572331]
Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis.
Existing multimodal methods often rely on alignment strategies to integrate complementary information.
We propose a Multimodal Cross-Task Interaction (MCTI) framework to explore the intrinsic correlations between subtype classification and survival analysis tasks.
arXiv Detail & Related papers (2024-06-25T02:18:35Z)
- FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival [3.4686401890974197]
We propose a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information.
A cross-fusion transformer utilizes features at the cellular, tissue, and tumor-heterogeneity levels and correlates them with prognosis.
The hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features.
We also propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities.
arXiv Detail & Related papers (2024-05-13T12:39:08Z)
- HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on Whole Slide Images and Multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA)
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z)
- TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction [7.646155781863875]
We propose a novel framework named Two-stream Transformer-based Multimodal Fusion Network for survival prediction (TTMFN)
In TTMFN, we present a two-stream multimodal co-attention transformer module to take full advantage of the complex relationships between different modalities.
The experiment results on four datasets from The Cancer Genome Atlas demonstrate that TTMFN can achieve the best performance or competitive results.
arXiv Detail & Related papers (2023-11-13T02:31:20Z)
- Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection [82.94413676131545]
We propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection.
KhiCL exploits a cross-modal joint dictionary to transfer heterogeneous unimodal features into a common feature space.
It extracts visual and textual entities from images and text, and designs a knowledge relevance reasoning strategy.
arXiv Detail & Related papers (2023-06-28T06:08:20Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH)
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.